@map-colonies/shapefile-reader
v1.0.1
A Node.js library for reading large shapefiles in memory-controlled chunks. It processes GeoJSON features in vertex-bounded batches, with built-in support for resumable processing, progress tracking, and metrics collection.
Features
- Chunk-based processing — splits features into chunks bounded by a configurable vertex limit, keeping memory usage predictable
- Oversized feature handling — features that exceed the vertex limit are captured as skippedFeatures within their chunk rather than silently dropped
- Resumable processing — save and restore processing state to continue an interrupted run from where it left off
- Progress tracking — real-time percentage, speed (features/vertices/chunks per second), and estimated time remaining
- Metrics collection — per-chunk and per-file timing and feature count callbacks
- Auto-generated feature IDs — optionally assign a UUID to features that have no identifier
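The chunking and oversized-feature behaviour described in the list above can be sketched roughly as follows. This is an illustration of the idea only, not the library's implementation: `SizedFeature`, `newChunk`, and `chunkByVertices` are hypothetical names, and each feature is assumed to carry a pre-computed vertex count.

```typescript
// Illustrative sketch only; the package's internal logic may differ.
interface SizedFeature {
  id: string;
  vertices: number; // assumed to be pre-computed for each feature
}

interface Chunk {
  id: number;
  features: SizedFeature[];
  verticesCount: number;
  skippedFeatures: SizedFeature[];
  skippedVerticesCount: number;
}

function newChunk(id: number): Chunk {
  return { id, features: [], verticesCount: 0, skippedFeatures: [], skippedVerticesCount: 0 };
}

function chunkByVertices(features: SizedFeature[], maxVerticesPerChunk: number): Chunk[] {
  const chunks: Chunk[] = [];
  let current = newChunk(0);

  for (const feature of features) {
    if (feature.vertices > maxVerticesPerChunk) {
      // Too large for any chunk: record it instead of silently dropping it.
      current.skippedFeatures.push(feature);
      current.skippedVerticesCount += feature.vertices;
      continue;
    }
    if (current.verticesCount + feature.vertices > maxVerticesPerChunk) {
      // Adding this feature would exceed the limit, so close the current chunk.
      chunks.push(current);
      current = newChunk(chunks.length);
    }
    current.features.push(feature);
    current.verticesCount += feature.vertices;
  }

  // Emit the last chunk if it holds anything at all.
  if (current.features.length > 0 || current.skippedFeatures.length > 0) {
    chunks.push(current);
  }
  return chunks;
}

const chunks = chunkByVertices(
  [
    { id: 'a', vertices: 40 },
    { id: 'b', vertices: 70 },
    { id: 'c', vertices: 250 }, // exceeds the limit on its own
    { id: 'd', vertices: 30 },
  ],
  100
);
console.log(chunks.map((c) => `${c.id}: ${c.verticesCount} vertices, ${c.skippedFeatures.length} skipped`));
```

In this sketch a feature is only ever skipped when it could not fit into an empty chunk, which is why peak memory stays bounded by the vertex limit.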
Requirements
- Node.js >= 24
- GDAL native binaries (provided by gdal-async)
Installation
npm install @map-colonies/shapefile-reader
Quick Start
import { ShapefileChunkReader } from '@map-colonies/shapefile-reader';
const reader = new ShapefileChunkReader({
  maxVerticesPerChunk: 50_000,
});
await reader.readAndProcess('/path/to/file.shp', {
  process: async (chunk) => {
    console.log(`Chunk ${chunk.id}: ${chunk.features.length} features, ${chunk.verticesCount} vertices`);
    // handle the GeoJSON features in chunk.features
  },
});
API
ShapefileChunkReader
The main class. Construct it once with your options and reuse it across multiple files.
const reader = new ShapefileChunkReader(options: ReaderOptions);
readAndProcess(shapefilePath, processor)
Reads the shapefile at shapefilePath and calls processor.process(chunk) for each chunk.
await reader.readAndProcess(shapefilePath: string, processor: ChunkProcessor): Promise<void>
- If a stateManager is provided, state is saved after each successfully processed chunk and on error.
- If a previous state exists (loaded via stateManager.loadState()), processing resumes from the last saved position.
getShapefileStats(shapefilePath)
Pre-scans the shapefile to return total feature and vertex counts. Useful for estimating progress before processing begins.
const { totalFeatures, totalVertices } = await reader.getShapefileStats(shapefilePath: string);
Throws if the file has no valid features or vertices.
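As a rough illustration of how these totals can feed a progress estimate, the sketch below derives a percentage, a speed, and a remaining-time figure from the vertex counts. The helper names (`roughChunkCount`, `estimateProgress`) and the `ProgressEstimate` shape are hypothetical and are not part of the package; the built-in progress tracking may compute these values differently.

```typescript
// Mirrors the two fields returned by getShapefileStats, as documented above.
interface ShapefileStats {
  totalFeatures: number;
  totalVertices: number;
}

// Hypothetical result shape for this sketch.
interface ProgressEstimate {
  percentage: number;
  verticesPerSecond: number;
  estimatedRemainingMs: number;
}

// Rough chunk count only: real chunks are rarely filled exactly,
// and oversized features are skipped rather than packed.
function roughChunkCount(stats: ShapefileStats, maxVerticesPerChunk: number): number {
  return Math.ceil(stats.totalVertices / maxVerticesPerChunk);
}

function estimateProgress(stats: ShapefileStats, processedVertices: number, elapsedMs: number): ProgressEstimate {
  const percentage = (processedVertices / stats.totalVertices) * 100;
  const verticesPerSecond = elapsedMs > 0 ? processedVertices / (elapsedMs / 1000) : 0;
  const remainingVertices = stats.totalVertices - processedVertices;
  // Without a measured speed there is nothing to extrapolate from.
  const estimatedRemainingMs = verticesPerSecond > 0 ? (remainingVertices / verticesPerSecond) * 1000 : Infinity;
  return { percentage, verticesPerSecond, estimatedRemainingMs };
}

const stats: ShapefileStats = { totalFeatures: 1_000, totalVertices: 200_000 };
const estimate = estimateProgress(stats, 50_000, 10_000);
console.log(roughChunkCount(stats, 50_000), estimate);
```

The division by `totalVertices` is safe here because, as noted above, `getShapefileStats` throws when the file has no valid vertices.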
ReaderOptions
| Option | Type | Required | Description |
|--------|------|----------|-------------|
| maxVerticesPerChunk | number | Yes | Maximum total vertices allowed per chunk. Controls peak memory usage. |
| generateFeatureId | boolean | No | When true, assigns a random UUID to each feature that has no id. Default: false. |
| logger | Logger | No | Any logger with info, debug, warn, error methods accepting an object (e.g. pino, @map-colonies/js-logger). |
| stateManager | StateManager | No | Enables resumable processing. See StateManager. |
| metricsCollector | MetricsCollector | No | Receives per-chunk and per-file metrics callbacks. See MetricsCollector. |
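If neither pino nor @map-colonies/js-logger is available, a small hand-written logger may be enough. The sketch below assumes the logger shape implied by the table (four methods, each accepting an object); the package's exported Logger type may be stricter, and `MinimalLogger`, `createJsonLineLogger`, and the `sink` parameter are hypothetical names used only for this example.

```typescript
// Assumed logger shape, inferred from the options table; the real Logger type may differ.
type LogMethod = (obj: Record<string, unknown>) => void;

interface MinimalLogger {
  info: LogMethod;
  debug: LogMethod;
  warn: LogMethod;
  error: LogMethod;
}

// `sink` is a hypothetical hook so output can be redirected; it defaults to console.log.
function createJsonLineLogger(sink: (line: string) => void = console.log): MinimalLogger {
  const write =
    (level: string): LogMethod =>
    (obj) =>
      sink(JSON.stringify({ level, ...obj }));
  return {
    info: write('info'),
    debug: write('debug'),
    warn: write('warn'),
    error: write('error'),
  };
}

// Collect log lines in memory instead of printing them.
const lines: string[] = [];
const logger = createJsonLineLogger((line) => lines.push(line));
logger.info({ msg: 'chunk processed', chunkId: 0 });
logger.warn({ msg: 'feature skipped', chunkId: 0 });
```

An object like `logger` could then be passed as the `logger` option, provided it satisfies the package's actual type.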
ShapefileChunk
The object passed to your ChunkProcessor for each chunk.
interface ShapefileChunk {
  id: number; // zero-based chunk index
  features: Feature[]; // GeoJSON features that fit within the vertex limit
  verticesCount: number; // total vertices across features in this chunk
  skippedFeatures: Feature[]; // features whose vertex count alone exceeds maxVerticesPerChunk
  skippedVerticesCount: number;
}
Features in skippedFeatures have a vertices property added to their properties object, recording their vertex count.
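One way to make use of the recorded vertex count is to build a small report of skipped features for later inspection. The sketch below uses local stand-in types (`MinimalFeature`, `ChunkLike`) rather than the package's own, and `reportSkipped` is a hypothetical helper; it assumes only what is stated above, namely that `properties.vertices` holds the count.

```typescript
// Local stand-ins for the GeoJSON Feature and ShapefileChunk types.
interface MinimalFeature {
  type: 'Feature';
  id?: string | number;
  properties: Record<string, unknown> | null;
  geometry: unknown;
}

interface ChunkLike {
  id: number;
  features: MinimalFeature[];
  verticesCount: number;
  skippedFeatures: MinimalFeature[];
  skippedVerticesCount: number;
}

interface SkippedReport {
  chunkId: number;
  featureId: string | number | undefined;
  vertices: number | undefined;
}

// Collects the vertex count recorded on each skipped feature.
function reportSkipped(chunk: ChunkLike): SkippedReport[] {
  return chunk.skippedFeatures.map((feature) => {
    const vertices = feature.properties?.['vertices'];
    return {
      chunkId: chunk.id,
      featureId: feature.id,
      // Guard against a missing or non-numeric value rather than trusting the shape.
      vertices: typeof vertices === 'number' ? vertices : undefined,
    };
  });
}

const sampleChunk: ChunkLike = {
  id: 3,
  features: [],
  verticesCount: 0,
  skippedFeatures: [{ type: 'Feature', id: 'big-1', properties: { name: 'coastline', vertices: 120_000 }, geometry: null }],
  skippedVerticesCount: 120_000,
};
const report = reportSkipped(sampleChunk);
console.log(report);
```

A report like this could be logged or persisted from inside `process`, so that oversized features can be simplified or handled in a separate pass.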
StateManager
Implement this interface to enable resumable processing.
interface StateManager {
  saveState: (state: ProcessingState) => Promise<void> | void;
  loadState: () => (ProcessingState | null) | Promise<ProcessingState | null>;
}
saveState is called after each successfully processed chunk and on processing errors. loadState is called once at the start of readAndProcess — return null to start fresh.
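A minimal sketch of such an implementation is shown below. It serializes the state to JSON and restores the `timestamp` field as a `Date` on load, since JSON turns dates into strings. The `StringStore` interface and `createJsonStateManager` are hypothetical, the local `ProcessingStateLike` type omits the optional `progress` field for brevity, and an in-memory `Map` stands in for whatever file, database, or cache you would use in practice.

```typescript
// Local mirror of the documented ProcessingState (without the optional progress field).
interface ProcessingStateLike {
  filePath: string;
  lastProcessedChunkIndex: number;
  lastProcessedFeatureIndex: number;
  timestamp: Date;
}

// Local mirror of the documented StateManager interface.
interface StateManagerLike {
  saveState: (state: ProcessingStateLike) => Promise<void> | void;
  loadState: () => (ProcessingStateLike | null) | Promise<ProcessingStateLike | null>;
}

// Hypothetical minimal key-value store; a Map<string, string> satisfies it.
interface StringStore {
  get(key: string): string | undefined;
  set(key: string, value: string): unknown;
}

interface SyncStateManager {
  saveState: (state: ProcessingStateLike) => void;
  loadState: () => ProcessingStateLike | null;
}

function createJsonStateManager(store: StringStore, key: string): SyncStateManager {
  return {
    saveState: (state) => {
      store.set(key, JSON.stringify(state));
    },
    loadState: () => {
      const raw = store.get(key);
      if (raw === undefined) return null; // nothing saved yet: start fresh
      const parsed = JSON.parse(raw) as Omit<ProcessingStateLike, 'timestamp'> & { timestamp: string };
      // JSON stores dates as ISO strings, so rebuild the Date explicitly.
      return { ...parsed, timestamp: new Date(parsed.timestamp) };
    },
  };
}

const memoryStore = new Map<string, string>();
const manager = createJsonStateManager(memoryStore, 'shapefile_state');
// The synchronous manager is assignable to the (sync or async) interface.
const asStateManager: StateManagerLike = manager;

const initialState = manager.loadState();
manager.saveState({
  filePath: '/data/large-file.shp',
  lastProcessedChunkIndex: 4,
  lastProcessedFeatureIndex: 1234,
  timestamp: new Date('2024-01-01T00:00:00.000Z'),
});
const restoredState = manager.loadState();
```

Using one key per shapefile path is a reasonable precaution if the same store serves several files, since the saved state includes `filePath`.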
ProcessingState
interface ProcessingState {
  filePath: string;
  lastProcessedChunkIndex: number;
  lastProcessedFeatureIndex: number;
  timestamp: Date;
  progress?: ProgressInfo; // full progress snapshot at time of save
}
MetricsCollector
Implement this interface to receive performance metrics.
interface MetricsCollector {
  onChunkMetrics?: (metrics: ChunkMetrics) => void;
  onFileMetrics?: (metrics: FileMetrics) => void;
}
ChunkMetrics — emitted after each chunk is processed:
| Field | Type | Description |
|-------|------|-------------|
| chunkIndex | number | Chunk ID |
| featuresCount | number | Features in this chunk |
| skippedFeaturesCount | number | Skipped features in this chunk |
| verticesCount | number | Vertices in this chunk |
| readTimeMs | number | Time to read the chunk from disk |
| processTimeMs | number | Time your processor took |
| totalTimeMs | number | readTimeMs + processTimeMs |
| timestamp | Date | When the chunk finished processing |
FileMetrics — emitted once after all chunks are processed:
| Field | Type | Description |
|-------|------|-------------|
| totalFeatures | number | Total processed features |
| totalSkippedFeatures | number | Total skipped features |
| totalVertices | number | Total processed vertices |
| totalChunks | number | Number of chunks |
| totalReadTimeMs | number | Cumulative read time |
| totalProcessTimeMs | number | Cumulative process time |
| totalTimeMs | number | Cumulative total time |
| startTime | Date | When processing started |
| endTime | Date \| undefined | When processing ended |
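The two callbacks can be combined into a small summary, for example to find the slowest chunk and the overall throughput. The sketch below declares only the fields it uses, taken from the tables above; `createSummaryCollector` and its `summary` method are hypothetical additions for this example and not part of the MetricsCollector interface.

```typescript
// Only the fields used here are declared; names follow the tables above.
interface ChunkMetricsLike {
  chunkIndex: number;
  featuresCount: number;
  totalTimeMs: number;
}

interface FileMetricsLike {
  totalFeatures: number;
  totalChunks: number;
  totalTimeMs: number;
}

function createSummaryCollector() {
  let slowest: ChunkMetricsLike | undefined;
  let featuresPerSecond: number | undefined;

  return {
    onChunkMetrics: (metrics: ChunkMetricsLike): void => {
      // Remember the chunk with the largest combined read + process time.
      if (slowest === undefined || metrics.totalTimeMs > slowest.totalTimeMs) {
        slowest = metrics;
      }
    },
    onFileMetrics: (metrics: FileMetricsLike): void => {
      featuresPerSecond = metrics.totalTimeMs > 0 ? (metrics.totalFeatures * 1000) / metrics.totalTimeMs : 0;
    },
    // Hypothetical helper for reading the results back.
    summary: () => ({ slowestChunkIndex: slowest?.chunkIndex, featuresPerSecond }),
  };
}

// Simulate the callbacks the reader would make.
const collector = createSummaryCollector();
collector.onChunkMetrics({ chunkIndex: 0, featuresCount: 500, totalTimeMs: 120 });
collector.onChunkMetrics({ chunkIndex: 1, featuresCount: 400, totalTimeMs: 300 });
collector.onFileMetrics({ totalFeatures: 900, totalChunks: 2, totalTimeMs: 500 });
const summary = collector.summary();
console.log(summary);
```

An object built this way could be passed as the `metricsCollector` option, assuming its callback parameter types are compatible with the package's ChunkMetrics and FileMetrics.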
openShapefile(path) / GdalShapefileReader
Lower-level access to the GDAL-backed shapefile reader. Implements the IShapefileSource interface.
import { openShapefile } from '@map-colonies/shapefile-reader';
const source = await openShapefile('/path/to/file.shp');
while (true) {
  const { done, value: feature } = await source.read();
  if (done) break;
  // feature is a GeoJSON Feature
}
source.close();
countVertices(geometry)
Utility that counts the total number of vertices in any GeoJSON geometry, including nested rings and sub-geometries in GeometryCollection.
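To illustrate what such a count involves, here is a minimal recursive sketch over the GeoJSON geometry types. It is not the package's implementation: `countVerticesSketch` and the local `Geometry` type are hypothetical, and the sketch counts every position it sees, including the repeated closing position of polygon rings, which may or may not match how `countVertices` treats them.

```typescript
// Local stand-in for the GeoJSON geometry union.
type Position = number[];
type Geometry =
  | { type: 'Point'; coordinates: Position }
  | { type: 'MultiPoint' | 'LineString'; coordinates: Position[] }
  | { type: 'MultiLineString' | 'Polygon'; coordinates: Position[][] }
  | { type: 'MultiPolygon'; coordinates: Position[][][] }
  | { type: 'GeometryCollection'; geometries: Geometry[] };

function countVerticesSketch(geometry: Geometry): number {
  switch (geometry.type) {
    case 'Point':
      return 1;
    case 'MultiPoint':
    case 'LineString':
      return geometry.coordinates.length;
    case 'MultiLineString':
    case 'Polygon':
      // Sum the positions of every line or ring.
      return geometry.coordinates.reduce((sum, ring) => sum + ring.length, 0);
    case 'MultiPolygon':
      return geometry.coordinates.reduce(
        (sum, polygon) => sum + polygon.reduce((ringSum, ring) => ringSum + ring.length, 0),
        0
      );
    case 'GeometryCollection':
      // Recurse into each sub-geometry.
      return geometry.geometries.reduce((sum, child) => sum + countVerticesSketch(child), 0);
  }
}

const sampleGeometry: Geometry = {
  type: 'GeometryCollection',
  geometries: [
    { type: 'Polygon', coordinates: [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]] },
    { type: 'Point', coordinates: [5, 5] },
  ],
};
const sampleCount = countVerticesSketch(sampleGeometry);
console.log(sampleCount);
```

The package's own utility is used as follows: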
import { countVertices } from '@map-colonies/shapefile-reader';
const count = countVertices(feature.geometry);
Advanced Example
import { ShapefileChunkReader } from '@map-colonies/shapefile-reader';
import pino from 'pino';
// `db` below stands in for your own persistence layer; it is not part of this package.
const reader = new ShapefileChunkReader({
  maxVerticesPerChunk: 100000,
  generateFeatureId: true,
  logger: pino({ level: 'info' }),
  stateManager: {
    saveState: async (state) => {
      await db.save('shapefile_state', state);
    },
    loadState: async () => {
      return db.load('shapefile_state');
    },
  },
  metricsCollector: {
    onChunkMetrics: (metrics) => {
      console.log(`Chunk ${metrics.chunkIndex}: ${metrics.featuresCount} features in ${metrics.totalTimeMs}ms`);
    },
    onFileMetrics: (metrics) => {
      console.log(`Done — ${metrics.totalFeatures} features across ${metrics.totalChunks} chunks`);
    },
  },
});
await reader.readAndProcess('/data/large-file.shp', {
  process: async (chunk) => {
    if (chunk.skippedFeatures.length > 0) {
      console.warn(`${chunk.skippedFeatures.length} features skipped in chunk ${chunk.id}`);
    }
    console.log(`${chunk.features.length} features in chunk ${chunk.id}`);
  },
});
Development
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Lint
npm run lint
# Format
npm run format:fix