JSONL Explorer MCP
A Model Context Protocol (MCP) server for analyzing JSONL (JSON Lines) files. Designed for local development workflows with files ranging from 1MB to 1GB.
Why JSONL Explorer?
Working with large JSONL files in development can be challenging:
- Log files grow too large to open in editors
- Data exports need exploration before processing
- Event streams require real-time monitoring
- Schema drift happens silently across records
JSONL Explorer solves these problems by providing streaming analysis tools that work efficiently with large files while integrating seamlessly with AI assistants via MCP.
Features
| Feature | Description |
|---------|-------------|
| Streaming Architecture | Process files of any size without loading them into memory |
| Schema Inference | Automatically detect and track schema across records |
| Statistical Analysis | Field-level stats including distributions, percentiles, cardinality |
| Flexible Querying | Simple comparisons, regex, JSONPath, and compound queries |
| JSON Schema Validation | Validate syntax and structure against schemas |
| Live File Tailing | Monitor actively-written files with cursor-based tracking |
| File Comparison | Diff two JSONL files by key field |
Quick Start
Installation
npm install -g jsonl-explorer-mcp
Or run directly with npx:
npx jsonl-explorer-mcp
MCP Client Configuration
Claude Desktop
Add to ~/.config/claude/claude_desktop_config.json (Linux/macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"jsonl-explorer": {
"command": "npx",
"args": ["jsonl-explorer-mcp"]
}
}
}
Claude Code
Add to your project's .mcp.json:
{
"mcpServers": {
"jsonl-explorer": {
"command": "npx",
"args": ["jsonl-explorer-mcp"]
}
}
}
Transport Modes
Stdio Mode (default) - For MCP clients that communicate via stdin/stdout:
jsonl-explorer-mcp
HTTP Mode - For web-based integrations:
jsonl-explorer-mcp --http --port=3000
Tools Reference
jsonl_inspect
Get a comprehensive overview of a JSONL file including size, record count, inferred schema, and field statistics.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| sampleSize | number | 100 | Records to sample for schema inference |
Example Response:
{
"file": "/data/events.jsonl",
"size": "156.2 MB",
"lineCount": 1248392,
"validRecords": 1248392,
"malformedLines": 0,
"schema": {
"type": "object",
"fields": [
{ "name": "id", "types": ["string"], "nullable": false },
{ "name": "timestamp", "types": ["string"], "nullable": false },
{ "name": "event_type", "types": ["string"], "nullable": false },
{ "name": "payload", "types": ["object"], "nullable": true }
]
}
}
jsonl_sample
Retrieve sample records using various sampling strategies.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| count | number | 10 | Number of records to sample |
| mode | string | "first" | Sampling mode: first, last, random, range |
| rangeStart | number | - | Start line for range mode (1-indexed) |
| rangeEnd | number | - | End line for range mode (1-indexed) |
Sampling Modes:
- first - First N records (fast, streaming)
- last - Last N records (requires file scan)
- random - Random sample using reservoir sampling
- range - Specific line range
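The random mode's reservoir sampling can be sketched in a few lines of TypeScript. This is the textbook algorithm, not the server's own code: it keeps a uniform random sample of k items from a stream of unknown length in O(k) memory.

```typescript
// Reservoir sampling: after seeing n items, each item has probability
// k/n of being in the reservoir, without knowing n in advance.
function reservoirSample<T>(stream: Iterable<T>, k: number): T[] {
  const reservoir: T[] = [];
  let seen = 0;
  for (const item of stream) {
    seen++;
    if (reservoir.length < k) {
      reservoir.push(item);
    } else {
      // Replace a random slot with probability k / seen.
      const j = Math.floor(Math.random() * seen);
      if (j < k) reservoir[j] = item;
    }
  }
  return reservoir;
}
```

This is why random sampling streams the whole file once but never holds more than `count` records in memory.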
jsonl_schema
Infer the schema of records by sampling.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| sampleSize | number | 1000 | Records to sample |
| outputFormat | string | "inferred" | Format: inferred, json-schema, formatted |
Output Formats:
- inferred - Internal schema representation with type frequencies
- json-schema - Standard JSON Schema (draft-07)
- formatted - Human-readable summary
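The core of sample-based schema inference is tracking which JSON types each field takes across records. A minimal sketch of that building block (not the server's actual inferrer, which also tracks nullability and nesting):

```typescript
// Name the JSON type of a value.
function jsonType(v: unknown): string {
  if (v === null) return "null";
  if (Array.isArray(v)) return "array";
  return typeof v; // "string" | "number" | "boolean" | "object"
}

// Accumulate the set of observed types per top-level field.
function inferFieldTypes(
  records: Record<string, unknown>[]
): Map<string, Set<string>> {
  const types = new Map<string, Set<string>>();
  for (const rec of records) {
    for (const [field, value] of Object.entries(rec)) {
      if (!types.has(field)) types.set(field, new Set());
      types.get(field)!.add(jsonType(value));
    }
  }
  return types;
}
```

A field whose set grows beyond one type is exactly the "schema drift" the tool surfaces.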
jsonl_stats
Collect aggregate statistics for fields.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| fields | string[] | all | Specific fields to analyze |
| maxRecords | number | all | Maximum records to analyze |
Statistics Provided:
- Numeric fields: min, max, mean, median, stdDev, percentiles (p50, p90, p95, p99)
- String fields: minLength, maxLength, avgLength, cardinality, value distribution
- Boolean fields: true/false counts and percentages
- All fields: null count, unique count
jsonl_search
Search for records where a field matches a regex pattern.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| field | string | required | Field path (supports dot notation) |
| pattern | string | required | Regex pattern to match |
| caseSensitive | boolean | false | Case-sensitive matching |
| maxResults | number | 100 | Maximum results to return |
| returnFields | string[] | all | Fields to include in results |
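Dot-notation field paths (e.g. `payload.user.id`) can be resolved with a small helper along these lines. This is a sketch of the idea, not the server's actual query engine:

```typescript
// Walk a record along a dot-separated path, returning undefined
// if any intermediate step is missing or not an object.
function getByPath(record: unknown, path: string): unknown {
  let current: any = record;
  for (const key of path.split(".")) {
    if (current == null || typeof current !== "object") return undefined;
    current = current[key];
  }
  return current;
}
```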
Example:
{
"file": "/data/logs.jsonl",
"field": "message",
"pattern": "error|failed|exception",
"caseSensitive": false,
"maxResults": 50
}
jsonl_filter
Filter records using powerful query expressions.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| query | string | required | Query expression |
| outputFormat | string | "records" | Output: records, count, lines |
| limit | number | 1000 | Maximum results |
Query Syntax:
| Type | Example | Description |
|------|---------|-------------|
| Equality | status == "active" | Exact match |
| Comparison | age > 30 | Numeric comparison (>, >=, <, <=, !=) |
| Regex | email =~ "@gmail\\.com$" | Pattern matching |
| Null check | deleted_at == null | Check for null values |
| JSONPath | $[?(@.price < 100)] | Full JSONPath expressions |
| Compound | status == "active" AND age > 30 | Combine with AND/OR |
Examples:
// Find active premium users
"subscription == \"premium\" AND active == true"
// Find orders over $100
"total > 100"
// Find emails from specific domain
"email =~ \"@company\\.com$\""
// Complex JSONPath
"$[?(@.items[*].quantity > 10)]"
jsonl_validate
Validate file syntax and optionally against a JSON Schema.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| schema | object/string | - | JSON Schema (inline or file path) |
| stopOnFirstError | boolean | false | Stop on first error |
| maxErrors | number | 100 | Maximum errors to report |
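Syntax-level validation amounts to parsing each line and collecting failures up to the error cap. A minimal sketch of that layer (schema validation against a JSON Schema would be layered on top with a validator library; the details below are illustrative, not the server's code):

```typescript
interface LineError {
  line: number; // 1-indexed line number
  error: string;
}

// Parse each line as JSON, recording up to maxErrors failures.
function validateSyntax(
  lines: string[],
  maxErrors = 100
): { valid: boolean; validRecords: number; errors: LineError[] } {
  const errors: LineError[] = [];
  let validRecords = 0;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue; // blank lines are skipped, not errors
    try {
      JSON.parse(line);
      validRecords++;
    } catch (e) {
      if (errors.length < maxErrors) {
        errors.push({ line: i + 1, error: (e as Error).message });
      }
    }
  }
  return { valid: errors.length === 0, validRecords, errors };
}
```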
Response:
{
"valid": false,
"totalRecords": 10000,
"validRecords": 9987,
"invalidRecords": 13,
"errors": [
{
"line": 1523,
"error": "must have required property 'user_id'",
"path": "/user_id"
}
]
}
jsonl_tail
Monitor actively-written files for new records using cursor-based tracking.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| cursor | number | 0 | Byte position to start from |
| maxRecords | number | 100 | Maximum records to return |
| timeout | number | 0 | Wait time for new content (ms) |
Usage Pattern:
// Initial call - start from beginning
{ "file": "/var/log/app.jsonl", "cursor": 0 }
// Response: { records: [...], newCursor: 15234, hasMore: false }
// Subsequent calls - continue from cursor
{ "file": "/var/log/app.jsonl", "cursor": 15234, "timeout": 5000 }
// Waits up to 5s for new content
jsonl_diff
Compare two JSONL files and report differences.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file1 | string | required | Path to first file |
| file2 | string | required | Path to second file |
| keyField | string | - | Field to use as unique key for matching |
| compareFields | string[] | all | Specific fields to compare |
| maxDiffs | number | 100 | Maximum differences to report |
Diff Types:
- added - Record exists only in file2
- removed - Record exists only in file1
- modified - Record exists in both but differs
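Key-based diffing boils down to two keyed maps and a three-way comparison. A sketch assuming whole-record comparison (the compareFields parameter would narrow which fields are compared):

```typescript
type Rec = Record<string, unknown>;

// Classify records from two files as added, removed, or modified,
// matching them by the value of keyField.
function diffByKey(a: Rec[], b: Rec[], keyField: string) {
  const index = (recs: Rec[]) => {
    const m = new Map<string, Rec>();
    for (const r of recs) m.set(String(r[keyField]), r);
    return m;
  };
  const aByKey = index(a);
  const bByKey = index(b);
  const added: string[] = [];
  const removed: string[] = [];
  const modified: string[] = [];
  for (const key of bByKey.keys()) {
    if (!aByKey.has(key)) added.push(key); // only in file2
  }
  for (const [key, rec] of aByKey) {
    const other = bByKey.get(key);
    if (other === undefined) removed.push(key); // only in file1
    else if (JSON.stringify(rec) !== JSON.stringify(other)) modified.push(key);
  }
  return { added, removed, modified };
}
```

Note the memory trade-off: keying both sides means the diff holds both files' records, unlike the purely streaming tools.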
Use Cases
Exploring Log Files
"Inspect the application log file at /var/log/app.jsonl and show me
the schema and any error messages from the last hour"
Data Quality Analysis
"Validate /data/export.jsonl against this schema and show me
statistics on the user_id field to check for duplicates"
Real-time Monitoring
"Tail the events file and alert me when you see any records
with event_type containing 'error'"
Comparing Exports
"Diff these two data exports using 'id' as the key field
and show me what changed"
Architecture
See ARCHITECTURE.md for detailed technical documentation including:
- Streaming parser design
- Schema inference algorithm
- Statistics collection with Welford's algorithm
- Query engine implementation
- Memory efficiency strategies
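Welford's algorithm, mentioned above, is what lets statistics collection run in a single pass with O(1) memory per field. A minimal sketch of the idea (the server's own collector also tracks min/max, percentiles, and cardinality):

```typescript
// Welford's online algorithm: numerically stable running mean and
// variance, updated one value at a time with constant memory.
class RunningStats {
  private n = 0;
  private mean = 0;
  private m2 = 0; // sum of squared deviations from the running mean

  push(x: number): void {
    this.n++;
    const delta = x - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (x - this.mean);
  }

  get count(): number { return this.n; }
  get average(): number { return this.mean; }
  get variance(): number { return this.n > 1 ? this.m2 / (this.n - 1) : 0; }
  get stdDev(): number { return Math.sqrt(this.variance); }
}
```

Unlike the naive sum-of-squares approach, this stays accurate even when the mean is large relative to the spread of the values.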
Development
Prerequisites
- Node.js >= 18
- npm >= 9
Setup
# Clone the repository
git clone https://github.com/YOUR_USERNAME/jsonl-explorer-mcp.git
cd jsonl-explorer-mcp
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
Scripts
| Command | Description |
|---------|-------------|
| npm run build | Compile TypeScript to JavaScript |
| npm run dev | Run with auto-reload (development) |
| npm run start | Run compiled server (stdio mode) |
| npm run start:http | Run compiled server (HTTP mode) |
| npm test | Run test suite in watch mode |
| npm run test:run | Run tests once |
| npm run typecheck | Type-check without emitting |
Project Structure
src/
├── index.ts # Entry point, transport setup
├── server.ts # MCP server configuration
├── core/ # Core processing modules
│ ├── streaming-parser.ts # Line-by-line JSONL processing
│ ├── schema-inferrer.ts # Schema detection
│ ├── statistics.ts # Stats collection
│ ├── query-engine.ts # Query parsing/execution
│ └── file-tailer.ts # Cursor-based tailing
├── tools/ # MCP tool implementations
│ ├── inspect.ts
│ ├── sample.ts
│ ├── schema.ts
│ ├── stats.ts
│ ├── search.ts
│ ├── filter.ts
│ ├── validate.ts
│ ├── tail.ts
│ └── diff.ts
└── utils/ # Shared utilities
├── format.ts
├── file-info.ts
    └── types.ts
Performance
Designed for efficiency with large files:
| File Size | Records | Inspect Time | Memory |
|-----------|---------|--------------|--------|
| 10 MB | 50,000 | ~0.5s | ~20 MB |
| 100 MB | 500,000 | ~3s | ~25 MB |
| 1 GB | 5,000,000 | ~25s | ~30 MB |
Memory usage stays constant regardless of file size due to streaming architecture.
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
MIT - see LICENSE for details.
Related Projects
- Model Context Protocol - The protocol this server implements
- MCP Servers - Official MCP server implementations
