
bigjsondb

v1.0.0


Efficient library for querying massive compressed JSONL files with streaming and indexing support


BigJsonDB

Efficiently query massive compressed JSONL files (hundreds or thousands of gigabytes) with a MongoDB-like API.

BigJsonDB uses streaming decompression and optional indexing to handle enormous .jsonl.gz files without loading them entirely into memory. Perfect for big data processing, log analysis, and large dataset queries.

Features

  • 🚀 Streaming Architecture - Process files of any size without memory constraints
  • 📊 Index Support - Create indexes on fields for lightning-fast lookups
  • 🔍 Rich Query API - MongoDB-like queries with operators ($eq, $gt, $in, $regex, etc.)
  • 📄 Pagination & Sorting - Skip, limit, and sort results efficiently
  • 💾 Memory Efficient - Handles terabyte-scale files on commodity hardware
  • 📦 Zero Dependencies - Uses only Node.js built-in modules
  • 🎯 TypeScript Support - Full type definitions included

Installation

npm install bigjsondb

Quick Start

import { BigJsonDB } from 'bigjsondb';

// Open a compressed JSONL file
const db = new BigJsonDB('data.jsonl.gz');

// Simple query
const users = await db.find({ age: { $gte: 18 } });

// Query with pagination
const results = await db.find(
  { status: 'active', country: 'US' },
  { skip: 100, limit: 20 }
);

// Count matching documents
const count = await db.count({ role: 'admin' });

API Reference

Constructor

new BigJsonDB(filePath: string, config?: BigJsonDBConfig)

Parameters:

  • filePath - Path to the .jsonl.gz file
  • config (optional):
    • autoIndex?: boolean - Auto-index on first access (default: false)
    • maxCacheSize?: number - Max memory for caching in bytes (default: 100MB)
    • chunkSize?: number - Chunk size for streaming in bytes (default: 64KB)

Example:

const db = new BigJsonDB('logs.jsonl.gz', {
  maxCacheSize: 500 * 1024 * 1024, // 500MB
  chunkSize: 128 * 1024 // 128KB chunks
});

Query Methods

find(query, options)

Find documents matching a query.

async find(query?: Query, options?: QueryOptions): Promise<any[]>

Parameters:

  • query - Query conditions (MongoDB-like syntax)
  • options:
    • skip?: number - Number of documents to skip
    • limit?: number - Maximum number of documents to return
    • sort?: { [field: string]: 'asc' | 'desc' } - Sort specification
    • projection?: { [field: string]: 0 | 1 } - Fields to include/exclude

Examples:

// Simple equality
await db.find({ name: 'John' });

// Comparison operators
await db.find({ 
  age: { $gte: 21, $lt: 65 },
  status: { $ne: 'banned' }
});

// Array operators
await db.find({ 
  role: { $in: ['admin', 'moderator'] },
  tags: { $nin: ['spam', 'deleted'] }
});

// Regular expressions
await db.find({ 
  email: { $regex: '@gmail\\.com$' }
});

// Nested fields
await db.find({ 
  'address.city': 'New York',
  'profile.verified': true
});

// With options
await db.find(
  { category: 'electronics' },
  { 
    skip: 20, 
    limit: 10,
    sort: { price: 'desc' },
    projection: { name: 1, price: 1 }
  }
);

findOne(query, options)

Find a single document.

async findOne(query?: Query, options?: QueryOptions): Promise<any | null>

Example:

const user = await db.findOne({ email: 'user@example.com' });

count(query)

Count documents matching a query.

async count(query?: Query): Promise<number>

Example:

const activeUsers = await db.count({ status: 'active' });

distinct(field, query)

Get distinct values for a field.

async distinct(field: string, query?: Query): Promise<any[]>

Example:

const countries = await db.distinct('country');
const activeDepartments = await db.distinct('department', { status: 'active' });

stream(query, callback)

Stream documents for custom processing (memory-efficient for large result sets).

async stream(query: Query, callback: (doc: any) => void | Promise<void>): Promise<void>

Example:

let sum = 0;
await db.stream(
  { category: 'sales' },
  (doc) => { sum += doc.amount; }
);
console.log('Total sales:', sum);

Index Methods

createIndex(field)

Create an index on a field for faster lookups. This scans the entire file once to build the index.

async createIndex(field: string): Promise<void>

Example:

// Create indexes on frequently queried fields
await db.createIndex('userId');
await db.createIndex('timestamp');
await db.createIndex('status');

// Now queries on these fields will be much faster
const user = await db.findOne({ userId: '12345' }); // Uses index!

listIndexes()

List all indexed fields.

listIndexes(): string[]
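
Example (a minimal sketch; the ordering of the returned names is an assumption):

await db.createIndex('userId');
await db.createIndex('status');

db.listIndexes(); // ['userId', 'status']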

dropIndex(field)

Remove an index.

dropIndex(field: string): void
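
Example (a minimal sketch building on listIndexes above):

// Free the memory held by an index that is no longer needed
db.dropIndex('status');
db.listIndexes(); // no longer includes 'status'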

getStats()

Get database statistics.

getStats(): DbStats

Example:

const stats = db.getStats();
console.log('Total records:', stats.totalRecords);
console.log('Indexes:', stats.indexes);

Query Operators

BigJsonDB supports the following MongoDB-like operators:

| Operator | Description           | Example                              |
|----------|-----------------------|--------------------------------------|
| $eq      | Equal to              | { age: { $eq: 25 } }                 |
| $ne      | Not equal to          | { status: { $ne: 'deleted' } }       |
| $gt      | Greater than          | { price: { $gt: 100 } }              |
| $gte     | Greater than or equal | { age: { $gte: 18 } }                |
| $lt      | Less than             | { score: { $lt: 50 } }               |
| $lte     | Less than or equal    | { count: { $lte: 10 } }              |
| $in      | Value in array        | { role: { $in: ['admin', 'user'] } } |
| $nin     | Value not in array    | { status: { $nin: ['banned'] } }     |
| $regex   | Regular expression    | { name: { $regex: '^John' } }        |
| $exists  | Field exists          | { phone: { $exists: true } }         |
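
For instance, $exists takes a boolean and filters on whether a field is present at all (phone and deletedAt are illustrative field names, not part of the library):

// Documents that have a phone number but no deletion marker
await db.find({
  phone: { $exists: true },
  deletedAt: { $exists: false }
});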

Performance Tips

1. Use Indexes for Frequent Queries

If you frequently query by specific fields, create indexes:

// One-time index creation
await db.createIndex('userId');
await db.createIndex('timestamp');

// Subsequent queries will be much faster
await db.find({ userId: '12345' }); // Lightning fast!

2. Use Projections to Reduce Data Transfer

Only retrieve the fields you need:

// Instead of this:
const users = await db.find({ role: 'admin' });

// Do this:
const users = await db.find(
  { role: 'admin' },
  { projection: { name: 1, email: 1 } }
);

3. Use Streaming for Large Result Sets

When processing many results, use streaming to avoid memory issues:

// Instead of loading everything into memory:
const allLogs = await db.find({ level: 'error' }); // Could be millions!

// Stream and process incrementally:
await db.stream({ level: 'error' }, (log) => {
  processLog(log);
});

4. Limit Results When Possible

Always use limit if you don't need all results:

// Get just the top 10
const topProducts = await db.find(
  { category: 'electronics' },
  { limit: 10, sort: { sales: 'desc' } }
);

5. Combine Multiple Operators

Make queries more specific to reduce scanning:

// More specific = faster
await db.find({ 
  status: 'active',
  created: { $gte: '2024-01-01' },
  country: { $in: ['US', 'CA', 'GB'] }
});

Use Cases

Log Analysis

const db = new BigJsonDB('server-logs.jsonl.gz');

// Find all 500 errors in the last hour
const errors = await db.find({
  status: 500,
  timestamp: { $gte: Date.now() - 3600000 }
});

// List the distinct endpoints seen in the logs
const endpoints = await db.distinct('endpoint');
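
To turn that into actual per-endpoint counts, one option is to combine distinct with count; each count call triggers its own scan, so creating an index on endpoint first is advisable for repeated runs:

// Count requests per endpoint
for (const endpoint of endpoints) {
  const requests = await db.count({ endpoint });
  console.log(`${endpoint}: ${requests} requests`);
}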

Data Analytics

const db = new BigJsonDB('transactions.jsonl.gz');

// Calculate total revenue by category
const categories = await db.distinct('category');
for (const category of categories) {
  let total = 0;
  await db.stream(
    { category },
    (tx) => { total += tx.amount; }
  );
  console.log(`${category}: $${total}`);
}
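
Streaming once per category means one full pass over the file per category. For very large files, a single aggregated pass is likely cheaper (this sketch assumes an empty query matches every document):

// One pass: accumulate revenue per category as documents stream by
const totals = new Map();
await db.stream({}, (tx) => {
  totals.set(tx.category, (totals.get(tx.category) ?? 0) + tx.amount);
});
for (const [category, total] of totals) {
  console.log(`${category}: $${total}`);
}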

User Data Processing

const db = new BigJsonDB('users.jsonl.gz');

// Index for fast lookups
await db.createIndex('email');
await db.createIndex('userId');

// Find user by email
const user = await db.findOne({ email: 'john@example.com' });

// Get active users in a region
const activeUsers = await db.find({
  status: 'active',
  'location.country': 'US',
  lastLogin: { $gte: '2024-01-01' }
});

File Format

BigJsonDB works with gzip-compressed JSONL (JSON Lines) files. Each line should be a valid JSON object:

{"id": 1, "name": "Alice", "age": 30, "city": "NYC"}
{"id": 2, "name": "Bob", "age": 25, "city": "LA"}
{"id": 3, "name": "Charlie", "age": 35, "city": "Chicago"}

Compress with gzip:

gzip data.jsonl

This creates data.jsonl.gz ready for BigJsonDB.
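
If you are generating the file from code rather than the gzip CLI, Node's built-in fs and zlib modules are all you need (a minimal sketch; the records array stands in for your own data source):

import { createWriteStream } from 'node:fs';
import { createGzip } from 'node:zlib';

const records = [
  { id: 1, name: 'Alice', age: 30, city: 'NYC' },
  { id: 2, name: 'Bob', age: 25, city: 'LA' }
];

// Pipe gzip output straight into the target file
const gzip = createGzip();
gzip.pipe(createWriteStream('data.jsonl.gz'));
for (const record of records) {
  gzip.write(JSON.stringify(record) + '\n'); // one JSON object per line
}
gzip.end();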

Limitations

  1. Write Operations - Currently read-only. For writes, decompress, modify, and re-compress (see the sketch after this list).
  2. In-Memory Sorting - Sorting loads results into memory. Use indexes and limits for large sorts.
  3. Index Storage - Indexes are held in memory, so monitor index memory usage when working with very large datasets.
  4. Compressed Seeking - Random access in compressed files requires full decompression up to the target point.
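
For limitation 1, the decompress-modify-recompress round trip can be done with Node built-ins alone; a minimal sketch (the status rewrite is a placeholder transform):

import { createReadStream, createWriteStream } from 'node:fs';
import { createGunzip, createGzip } from 'node:zlib';
import { createInterface } from 'node:readline';

// Stream data.jsonl.gz line by line, edit each document, and
// write the result to a new compressed file.
const gzip = createGzip();
gzip.pipe(createWriteStream('data.updated.jsonl.gz'));

const lines = createInterface({
  input: createReadStream('data.jsonl.gz').pipe(createGunzip()),
  crlfDelay: Infinity
});

for await (const line of lines) {
  const doc = JSON.parse(line);
  if (doc.status === 'banned') doc.status = 'inactive'; // placeholder edit
  gzip.write(JSON.stringify(doc) + '\n');
}
gzip.end();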

TypeScript Types

Full TypeScript definitions are included:

interface Query {
  [field: string]: any | {
    $eq?: any;
    $ne?: any;
    $gt?: any;
    $gte?: any;
    $lt?: any;
    $lte?: any;
    $in?: any[];
    $nin?: any[];
    $regex?: string | RegExp;
    $exists?: boolean;
  };
}

interface QueryOptions {
  skip?: number;
  limit?: number;
  sort?: { [field: string]: 'asc' | 'desc' };
  projection?: { [field: string]: 0 | 1 };
}

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Author

ale


Made with ❤️ for big data processing