JSONL Explorer MCP
A Model Context Protocol (MCP) server for analyzing JSONL (JSON Lines) files. Designed for local development workflows with files ranging from 1MB to 1GB.
Why JSONL Explorer?
Working with large JSONL files in development can be challenging:
- Log files grow too large to open in editors
- Data exports need exploration before processing
- Event streams require real-time monitoring
- Schema drift happens silently across records
JSONL Explorer solves these problems by providing streaming analysis tools that work efficiently with large files while integrating seamlessly with AI assistants via MCP.
Features
| Feature | Description |
|---------|-------------|
| Streaming Architecture | Process files of any size without loading them into memory |
| Schema Inference | Automatically detect and track schema across records |
| Statistical Analysis | Field-level stats including distributions, percentiles, cardinality |
| Flexible Querying | Simple comparisons, regex, JSONPath, and compound queries |
| JSON Schema Validation | Validate syntax and structure against schemas |
| Live File Tailing | Monitor actively-written files with cursor-based tracking |
| File Comparison | Diff two JSONL files by key field |
Quick Start
Installation
npm install -g jsonl-explorer-mcp
Or run directly with npx:
npx jsonl-explorer-mcp
MCP Client Configuration
Claude Desktop
Add to ~/.config/claude/claude_desktop_config.json (Linux/macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"jsonl-explorer": {
"command": "npx",
"args": ["jsonl-explorer-mcp"]
}
}
}
Claude Code
Add to your project's .mcp.json:
{
"mcpServers": {
"jsonl-explorer": {
"command": "npx",
"args": ["jsonl-explorer-mcp"]
}
}
}
Transport Modes
Stdio Mode (default) - For MCP clients that communicate via stdin/stdout:
jsonl-explorer-mcp
HTTP Mode - For web-based integrations:
jsonl-explorer-mcp --http --port=3000
Tools Reference
jsonl_inspect
Get a comprehensive overview of a JSONL file including size, record count, inferred schema, and field statistics.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| sampleSize | number | 100 | Records to sample for schema inference |
Example Response:
{
"file": "/data/events.jsonl",
"size": "156.2 MB",
"lineCount": 1248392,
"validRecords": 1248392,
"malformedLines": 0,
"schema": {
"type": "object",
"fields": [
{ "name": "id", "types": ["string"], "nullable": false },
{ "name": "timestamp", "types": ["string"], "nullable": false },
{ "name": "event_type", "types": ["string"], "nullable": false },
{ "name": "payload", "types": ["object"], "nullable": true }
]
}
}
jsonl_sample
Retrieve sample records using various sampling strategies.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| count | number | 10 | Number of records to sample |
| mode | string | "first" | Sampling mode: first, last, random, range |
| rangeStart | number | - | Start line for range mode (1-indexed) |
| rangeEnd | number | - | End line for range mode (1-indexed) |
Sampling Modes:
- first - First N records (fast, streaming)
- last - Last N records (requires file scan)
- random - Random sample using reservoir sampling
- range - Specific line range
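The random mode's reservoir sampling can be sketched in a few lines of TypeScript. This is the textbook algorithm, not the server's own code: it keeps a uniform random sample of k items from a stream of unknown length in O(k) memory.

```typescript
// Reservoir sampling: after seeing n items, each item has probability
// k/n of being in the reservoir, without knowing n in advance.
function reservoirSample<T>(stream: Iterable<T>, k: number): T[] {
  const reservoir: T[] = [];
  let seen = 0;
  for (const item of stream) {
    seen++;
    if (reservoir.length < k) {
      reservoir.push(item);
    } else {
      // Replace a random slot with probability k / seen.
      const j = Math.floor(Math.random() * seen);
      if (j < k) reservoir[j] = item;
    }
  }
  return reservoir;
}
```

This is why random sampling streams the whole file once but never holds more than `count` records in memory.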
jsonl_schema
Infer the schema of records by sampling.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| sampleSize | number | 1000 | Records to sample |
| outputFormat | string | "inferred" | Format: inferred, json-schema, formatted |
Output Formats:
- inferred - Internal schema representation with type frequencies
- json-schema - Standard JSON Schema (draft-07)
- formatted - Human-readable summary
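The core of sample-based schema inference is tracking which JSON types each field takes across records. A minimal sketch of that building block (not the server's actual inferrer, which also tracks nullability and nesting):

```typescript
// Name the JSON type of a value.
function jsonType(v: unknown): string {
  if (v === null) return "null";
  if (Array.isArray(v)) return "array";
  return typeof v; // "string" | "number" | "boolean" | "object"
}

// Accumulate the set of observed types per top-level field.
function inferFieldTypes(
  records: Record<string, unknown>[]
): Map<string, Set<string>> {
  const types = new Map<string, Set<string>>();
  for (const rec of records) {
    for (const [field, value] of Object.entries(rec)) {
      if (!types.has(field)) types.set(field, new Set());
      types.get(field)!.add(jsonType(value));
    }
  }
  return types;
}
```

A field whose set grows beyond one type is exactly the "schema drift" the tool surfaces.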
jsonl_stats
Collect aggregate statistics for fields.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| fields | string[] | all | Specific fields to analyze |
| maxRecords | number | all | Maximum records to analyze |
Statistics Provided:
- Numeric fields: min, max, mean, median, stdDev, percentiles (p50, p90, p95, p99)
- String fields: minLength, maxLength, avgLength, cardinality, value distribution
- Boolean fields: true/false counts and percentages
- All fields: null count, unique count
jsonl_search
Search for records where a field matches a regex pattern.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| field | string | required | Field path (supports dot notation) |
| pattern | string | required | Regex pattern to match |
| caseSensitive | boolean | false | Case-sensitive matching |
| maxResults | number | 100 | Maximum results to return |
| returnFields | string[] | all | Fields to include in results |
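Dot-notation field paths (e.g. `payload.user.id`) can be resolved with a small helper along these lines. This is a sketch of the idea, not the server's actual query engine:

```typescript
// Walk a record along a dot-separated path, returning undefined
// if any intermediate step is missing or not an object.
function getByPath(record: unknown, path: string): unknown {
  let current: any = record;
  for (const key of path.split(".")) {
    if (current == null || typeof current !== "object") return undefined;
    current = current[key];
  }
  return current;
}
```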
Example:
{
"file": "/data/logs.jsonl",
"field": "message",
"pattern": "error|failed|exception",
"caseSensitive": false,
"maxResults": 50
}
jsonl_filter
Filter records using powerful query expressions.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| query | string | required | Query expression |
| outputFormat | string | "records" | Output: records, count, lines |
| limit | number | 1000 | Maximum results |
Query Syntax:
| Type | Example | Description |
|------|---------|-------------|
| Equality | status == "active" | Exact match |
| Comparison | age > 30 | Numeric comparison (>, >=, <, <=, !=) |
| Regex | email =~ "@gmail\\.com$" | Pattern matching |
| Null check | deleted_at == null | Check for null values |
| JSONPath | $[?(@.price < 100)] | Full JSONPath expressions |
| Compound | status == "active" AND age > 30 | Combine with AND/OR |
Examples:
// Find active premium users
"subscription == \"premium\" AND active == true"
// Find orders over $100
"total > 100"
// Find emails from specific domain
"email =~ \"@company\\.com$\""
// Complex JSONPath
"$[?(@.items[*].quantity > 10)]"
jsonl_validate
Validate file syntax and optionally against a JSON Schema.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| schema | object/string | - | JSON Schema (inline or file path) |
| stopOnFirstError | boolean | false | Stop on first error |
| maxErrors | number | 100 | Maximum errors to report |
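Syntax-level validation amounts to parsing each line and collecting failures up to the error cap. A minimal sketch of that layer (schema validation against a JSON Schema would be layered on top with a validator library; the details below are illustrative, not the server's code):

```typescript
interface LineError {
  line: number; // 1-indexed line number
  error: string;
}

// Parse each line as JSON, recording up to maxErrors failures.
function validateSyntax(
  lines: string[],
  maxErrors = 100
): { valid: boolean; validRecords: number; errors: LineError[] } {
  const errors: LineError[] = [];
  let validRecords = 0;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue; // blank lines are skipped, not errors
    try {
      JSON.parse(line);
      validRecords++;
    } catch (e) {
      if (errors.length < maxErrors) {
        errors.push({ line: i + 1, error: (e as Error).message });
      }
    }
  }
  return { valid: errors.length === 0, validRecords, errors };
}
```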
Response:
{
"valid": false,
"totalRecords": 10000,
"validRecords": 9987,
"invalidRecords": 13,
"errors": [
{
"line": 1523,
"error": "must have required property 'user_id'",
"path": "/user_id"
}
]
}
jsonl_tail
Monitor actively-written files for new records using cursor-based tracking.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file | string | required | Absolute path to the JSONL file |
| cursor | number | 0 | Byte position to start from |
| maxRecords | number | 100 | Maximum records to return |
| timeout | number | 0 | Wait time for new content (ms) |
Usage Pattern:
// Initial call - start from beginning
{ "file": "/var/log/app.jsonl", "cursor": 0 }
// Response: { records: [...], newCursor: 15234, hasMore: false }
// Subsequent calls - continue from cursor
{ "file": "/var/log/app.jsonl", "cursor": 15234, "timeout": 5000 }
// Waits up to 5s for new content
jsonl_diff
Compare two JSONL files and report differences.
Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| file1 | string | required | Path to first file |
| file2 | string | required | Path to second file |
| keyField | string | - | Field to use as unique key for matching |
| compareFields | string[] | all | Specific fields to compare |
| maxDiffs | number | 100 | Maximum differences to report |
Diff Types:
- added - Record exists only in file2
- removed - Record exists only in file1
- modified - Record exists in both but differs
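Key-based diffing boils down to two keyed maps and a three-way comparison. A sketch assuming whole-record comparison (the compareFields parameter would narrow which fields are compared):

```typescript
type Rec = Record<string, unknown>;

// Classify records from two files as added, removed, or modified,
// matching them by the value of keyField.
function diffByKey(a: Rec[], b: Rec[], keyField: string) {
  const index = (recs: Rec[]) => {
    const m = new Map<string, Rec>();
    for (const r of recs) m.set(String(r[keyField]), r);
    return m;
  };
  const aByKey = index(a);
  const bByKey = index(b);
  const added: string[] = [];
  const removed: string[] = [];
  const modified: string[] = [];
  for (const key of bByKey.keys()) {
    if (!aByKey.has(key)) added.push(key); // only in file2
  }
  for (const [key, rec] of aByKey) {
    const other = bByKey.get(key);
    if (other === undefined) removed.push(key); // only in file1
    else if (JSON.stringify(rec) !== JSON.stringify(other)) modified.push(key);
  }
  return { added, removed, modified };
}
```

Note the memory trade-off: keying both sides means the diff holds both files' records, unlike the purely streaming tools.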
Use Cases
Exploring Log Files
"Inspect the application log file at /var/log/app.jsonl and show me
the schema and any error messages from the last hour"
Data Quality Analysis
"Validate /data/export.jsonl against this schema and show me
statistics on the user_id field to check for duplicates"
Real-time Monitoring
"Tail the events file and alert me when you see any records
with event_type containing 'error'"
Comparing Exports
"Diff these two data exports using 'id' as the key field
and show me what changed"
Architecture
See ARCHITECTURE.md for detailed technical documentation including:
- Streaming parser design
- Schema inference algorithm
- Statistics collection with Welford's algorithm
- Query engine implementation
- Memory efficiency strategies
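Welford's algorithm, mentioned above, is what lets statistics collection run in a single pass with O(1) memory per field. A minimal sketch of the idea (the server's own collector also tracks min/max, percentiles, and cardinality):

```typescript
// Welford's online algorithm: numerically stable running mean and
// variance, updated one value at a time with constant memory.
class RunningStats {
  private n = 0;
  private mean = 0;
  private m2 = 0; // sum of squared deviations from the running mean

  push(x: number): void {
    this.n++;
    const delta = x - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (x - this.mean);
  }

  get count(): number { return this.n; }
  get average(): number { return this.mean; }
  get variance(): number { return this.n > 1 ? this.m2 / (this.n - 1) : 0; }
  get stdDev(): number { return Math.sqrt(this.variance); }
}
```

Unlike the naive sum-of-squares approach, this stays accurate even when the mean is large relative to the spread of the values.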
Development
Prerequisites
- Node.js >= 18
- npm >= 9
Setup
# Clone the repository
git clone https://github.com/YOUR_USERNAME/jsonl-explorer-mcp.git
cd jsonl-explorer-mcp
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
Scripts
| Command | Description |
|---------|-------------|
| npm run build | Compile TypeScript to JavaScript |
| npm run dev | Run with auto-reload (development) |
| npm run start | Run compiled server (stdio mode) |
| npm run start:http | Run compiled server (HTTP mode) |
| npm test | Run test suite in watch mode |
| npm run test:run | Run tests once |
| npm run typecheck | Type-check without emitting |
Project Structure
src/
├── index.ts # Entry point, transport setup
├── server.ts # MCP server configuration
├── core/ # Core processing modules
│ ├── streaming-parser.ts # Line-by-line JSONL processing
│ ├── schema-inferrer.ts # Schema detection
│ ├── statistics.ts # Stats collection
│ ├── query-engine.ts # Query parsing/execution
│ └── file-tailer.ts # Cursor-based tailing
├── tools/ # MCP tool implementations
│ ├── inspect.ts
│ ├── sample.ts
│ ├── schema.ts
│ ├── stats.ts
│ ├── search.ts
│ ├── filter.ts
│ ├── validate.ts
│ ├── tail.ts
│ └── diff.ts
└── utils/ # Shared utilities
├── format.ts
├── file-info.ts
    └── types.ts
Performance
Designed for efficiency with large files:
| File Size | Records | Inspect Time | Memory |
|-----------|---------|--------------|--------|
| 10 MB | 50,000 | ~0.5s | ~20 MB |
| 100 MB | 500,000 | ~3s | ~25 MB |
| 1 GB | 5,000,000 | ~25s | ~30 MB |
Memory usage stays constant regardless of file size due to streaming architecture.
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
MIT - see LICENSE for details.
Related Projects
- Model Context Protocol - The protocol this server implements
- MCP Servers - Official MCP server implementations
