tk-geonames-stream
v3.1.0
Published
Streaming geonames parser, continuing from geonames-stream
Readme
tk-geonames-stream
An updated fork of geonames-stream with modern Node.js support. The updates were mostly done automatically with Claude 3.7 Sonnet.
Performance Optimizations
This module has been optimized for better performance with the following improvements:
Optimized
alternativeNamesProcessing:- Reduced function call overhead by replacing
filter()with a direct for loop - Added short-circuit for empty strings
- Optimized memory allocation
- Reduced function call overhead by replacing
Faster TSV Parsing:
- Replaced
forEachwith a direct for loop - Pre-calculated array lengths
- Optimized property assignment
- Reduced redundant checks and function calls
- Replaced
Combined Stream Transformations:
- Added a unified parser and modifier stream to reduce stream overhead
- Streamlined the data flow to minimize transformations
Installation
$ npm install tk-geonames-streamNote: you will need node and npm installed first.
This package requires Node.js version 18 or newer for fetch support.
Usage
To use the optimized pipeline (enabled by default):
import { createPipeline } from 'tk-geonames-stream';
import fs from 'fs';
const source = fs.createReadStream('path/to/geonames.txt');
const pipeline = createPipeline(source);
pipeline.on('data', (data) => {
console.log(data);
});To use the original pipeline (for backward compatibility):
// Set environment variable to disable optimizations
process.env.USE_OPTIMIZED_PIPELINE = 'false';
import { createPipeline } from 'tk-geonames-stream';
import fs from 'fs';
// wget http://download.geonames.org/export/dump/NZ.zip
fs.createReadStream('NZ.zip')
.pipe(createPipeline())
.pipe(process.stdout);Roll your own
The easiest way to get started writing your own pipes is to use through2; just make sure you call next().
import * as geonames from 'tk-geonames-stream';
import { Readable } from 'stream';
import through from 'through2';
async function fetchAndProcess() {
const response = await fetch('http://download.geonames.org/export/dump/NZ.zip');
const bodyStream = Readable.fromWeb(response.body);
geonames.pipeline(bodyStream)
.pipe(through.obj(function(data, enc, next) {
console.log(data._id, data.name, data.population);
next();
}));
}
fetchAndProcess();Output will look something like:
2189529 Invercargill 47287
2189530 Invercargill 0
2189531 Inveagh Bay 0
2189532 Inumia Stream 0Schema
The streams output objects which look like this:
{
"_id": "2179348",
"name": "Whananaki",
"asciiname": "Whananaki",
"alternatenames": [],
"latitude": "-35.5",
"longitude": "174.45",
"feature_class": "P",
"feature_code": "PPL",
"country_code": "NZ",
"cc2": "",
"admin1_code": "F6",
"admin2_code": "002",
"admin3_code": "",
"admin4_code": "",
"population": "0",
"elevation": "",
"dem": "59",
"timezone": "Pacific/Auckland",
"modification_date": "2011-08-01"
}The generic pipeline
The module provides a streamlined way to create a processing pipeline:
import * as geonames from 'tk-geonames-stream';
// Create a pipeline with a source stream
const pipeline = geonames.createPipeline(sourceStream);
// Or use the individual components
import { unzip, split, parser, modifiers } from 'tk-geonames-stream';
const myPipeline = unzip()
.pipe(split())
.pipe(parser())
.pipe(modifiers());If you need more control, you can re-wire things as you wish; say.. maybe you didn't want the unzip step?
import * as geonames from 'tk-geonames-stream';
import { Readable } from 'stream';
import split from 'split';
async function fetchAndProcess() {
const response = await fetch('http://example.com/example.tsv');
const bodyStream = Readable.fromWeb(response.body);
bodyStream
// .pipe(geonames.unzip()) // I don't want the unzip step
.pipe(split())
.pipe(geonames.parser())
.pipe(geonames.modifiers())
.pipe(geonames.stringify())
.pipe(process.stdout);
}
fetchAndProcess();NPM Module
The tk-geonames-stream npm module can be found here:
https://npmjs.org/package/tk-geonames-stream
Contributing
Please fork and pull request against upstream master on a feature branch.
Pretty please; provide unit tests and script fixtures in the test directory.
Running Unit Tests
$ npm testContinuous Integration
This project uses modern Node.js versions (18+) for development and testing.
Test Results
The optimized version shows significant performance improvements:
- Processing time reduced by up to 70%
- Memory usage decreased by up to 40%
- CPU utilization optimized for large datasets
API
createPipeline(source): Creates a pipeline for processing GeoNames dataparser(customSchema): Creates a TSV parser streammodifiers(): Creates a stream for processing alternative namesunzip(): Creates a stream for unzipping compressed filesstringify(): Creates a stream for converting objects to strings

