@ev3lynx/md-analyzer
v0.2.1
Markdown document analyzer for AI agents - extract metadata, headings, links, tables, tokens, and key points
md-analyzer
Markdown document analyzer for AI agents - extract metadata, headings, links, tables, tokens, and key points from .md files.
md-analyzer is a lightweight document analysis tool designed for AI agent workflows. It provides single-shot document overviews, token budget tracking, and document relationship graphs, and is intended for use with OpenCode, kiro-cli, or any other AI agent framework.
Quick Overview
| Feature | Description |
|---------|-------------|
| Keypoints | Single-shot document overview (ideal for agents) |
| Token Tracking | Session-based token budget with /tmp/md-analyzer-session.json |
| Graph | Document relationship topology (backlinks, orphans) |
| Search | Keyword search with relevance ranking |
| Logs | Structured JSON logs in log/{sessionId}.json |
Why md-analyzer?
- Agent-native — Designed for AI agent workflows with single-shot outputs
- Token-safe — Built-in limits prevent context blowout (default: 20 results)
- Extensible — Simple TypeScript source, easy to extend for plugins
- Zero config — Runtime TOML parser, no config file required
Prerequisites
System Requirements
| Requirement | Version | Notes |
|-------------|---------|-------|
| Node.js | ≥ 18.0.0 | LTS recommended |
| npm | ≥ 8.0.0 | Comes with Node.js |
Verify Installation
node --version # Should be >= 18.0.0
npm --version # Should be >= 8.0.0
Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| micromark | ^4.0.0 | Markdown parsing |
| js-tiktoken | ^1.0.0 | GPT token counting |
Installation
From npm (recommended)
npm install -g @ev3lynx/md-analyzer
md-analyzer --help
From source
git clone https://github.com/Ev3lynx727/md-analyzer.git
cd md-analyzer
npm install
npm run build
Quick test
node md-analyzer.js . --keypoints --json
Usage
Basic CLI
# With npx (no install required)
npx @ev3lynx/md-analyzer <directory> [options]
# After global install
npm install -g @ev3lynx/md-analyzer
md-analyzer <directory> [options]
# From source (development)
node md-analyzer.js <directory> [options]
Options
| Flag | Description | Example |
|------|-------------|---------|
| --json | Output as JSON | --json |
| --search <kw> | Search keyword in content | --search "task" |
| --filter <k=v> | Filter by metadata field | --filter "category=guides" |
| --rank | Rank results by relevance | --search "task" --rank |
| --graph | Document relationship graph | --graph |
| --deps | Dependency graph (DAG order) | --deps |
| --orphans | Find unreferenced docs | --orphans |
| --backlinks <doc> | Find docs linking to <doc> | --backlinks "adr-2026-01" |
| --keypoints | Quick overview (single-shot) | --keypoints |
| --lint-fragments | Frontmatter health check | --lint-fragments |
| --session | Token budget report | --session |
| --budget <n> | Set token budget limit | --budget 50000 |
| --max-results <n> | Limit output | --max-results 10 |
Examples
# Quick overview (single-shot for agents)
npx @ev3lynx/md-analyzer /path/to/docs --keypoints --json
# Search with ranking
md-analyzer . --search "task lifecycle" --rank --json
# Find backlinks
md-analyzer . --backlinks adr-2026-04-01 --json
# Token budget tracking
md-analyzer . --session --budget 100000 --json
# Find orphans
md-analyzer . --orphans --json
# Lint frontmatter
md-analyzer . --lint-fragments --json
--keypoints JSON output
[
  {
    "fileName": "adr-2026-04",
    "title": "ADR-2026-04: Configuration Management",
    "summary": {
      "totalHeadings": 5,
      "totalLinks": 3,
      "totalWikilinks": 1,
      "totalTokens": 1200,
      "wordCount": 850
    },
    "keyHeadings": [
      { "level": 1, "text": "ADR-2026-04: Configuration Management", "line": 1, "tokens": 320 },
      { "level": 2, "text": "Status", "line": 4, "tokens": 85 },
      { "level": 2, "text": "Context", "line": 8, "tokens": 410 },
      { "level": 2, "text": "Decision", "line": 22, "tokens": 290 },
      { "level": 2, "text": "Consequences", "line": 35, "tokens": 95 }
    ],
    "importantLinks": [
      { "text": "Memory FAQ", "url": "https://help.openai.com/..." }
    ],
    "internalReferences": ["adr-2026-01", "adr-2026-02"],
    "readingTime": "4 min"
  }
]
Agent hook integration (pre-read)
The --keypoints output powers a pre-read hook for OpenAI-compatible agents (opencode, kiro-cli). Before reading a file, the hook injects the structured overview so the LLM can decide what to read.
// opencode.json / kiro-cli hook config
{
  "preToolUse": [
    {
      "matcher": { "tool_name": "read" },
      "command": "uv run --project ~/.kiro/hooks python ~/.kiro/hooks/pre_read_md.py"
    }
  ]
}
See python/pre_read.py for the full hook implementation.
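The actual hook logic lives in python/pre_read.py and is not reproduced here. As a rough sketch of the idea only, assuming the hook already holds one parsed entry from the --keypoints output, the overview handed to the LLM might be rendered like this. `format_overview` is a hypothetical helper, not part of md-analyzer.

```python
def format_overview(doc: dict) -> str:
    """Render one --keypoints entry as a short plain-text overview (hypothetical helper)."""
    summary = doc.get("summary", {})
    lines = [
        doc.get("title", doc.get("fileName", "?")),
        f"tokens: {summary.get('totalTokens', 0)}, reading time: {doc.get('readingTime', 'n/a')}",
    ]
    # One outline line per heading, indented by heading level.
    for heading in doc.get("keyHeadings", []):
        indent = "  " * (heading["level"] - 1)
        lines.append(
            f"{indent}- {heading['text']} (line {heading['line']}, ~{heading['tokens']} tokens)"
        )
    return "\n".join(lines)


# Sample entry trimmed from the --keypoints output shown earlier.
sample = {
    "fileName": "adr-2026-04",
    "title": "ADR-2026-04: Configuration Management",
    "summary": {"totalTokens": 1200},
    "keyHeadings": [
        {"level": 1, "text": "ADR-2026-04: Configuration Management", "line": 1, "tokens": 320},
        {"level": 2, "text": "Status", "line": 4, "tokens": 85},
    ],
    "readingTime": "4 min",
}
overview = format_overview(sample)
print(overview)
```

Keeping the per-heading line numbers and token counts in the text lets the agent request only the section it needs instead of the whole file.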
Calling from code
md-analyzer is a CLI tool — integrate via subprocess:
// Node.js
const { execSync } = require('child_process');
const result = execSync('md-analyzer ./docs --keypoints --json', { encoding: 'utf-8' });
const docs = JSON.parse(result);
# Python
import subprocess, json
result = subprocess.run(["md-analyzer", "./docs", "--keypoints", "--json"],
                        capture_output=True, text=True)
docs = json.loads(result.stdout)
For agent hooks, see the pre-read integration section above.
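Once the JSON is parsed, the `summary.totalTokens` field of each entry can drive simple decisions in the calling code. The sketch below is one possible use, not something md-analyzer does itself: it greedily keeps the smallest documents that fit a token budget. `select_within_budget` is a hypothetical helper, and the input is assumed to have the shape of the --keypoints output shown earlier.

```python
def select_within_budget(docs: list, budget: int) -> list:
    """Return file names of the smallest documents whose combined tokens fit the budget."""
    chosen, used = [], 0
    # Visit documents from cheapest to most expensive.
    for doc in sorted(docs, key=lambda d: d["summary"]["totalTokens"]):
        cost = doc["summary"]["totalTokens"]
        if used + cost > budget:
            break  # everything after this one costs at least as much
        chosen.append(doc["fileName"])
        used += cost
    return chosen


# Stand-in for the parsed --keypoints output; values are illustrative.
docs = [
    {"fileName": "a", "summary": {"totalTokens": 1200}},
    {"fileName": "b", "summary": {"totalTokens": 300}},
    {"fileName": "c", "summary": {"totalTokens": 900}},
]
picked = select_within_budget(docs, budget=1500)
print(picked)
```

Smallest-first is only one policy; a caller that cares about relevance could sort by a search rank instead and apply the same budget check.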
Configuration
hooks.toml
[tool.md-analyzer.config]
# Path configuration
default_directory = "/path/to/docs"
# Token budget configuration
default_budget = 100000
max_tokens = 200000
# Output safety (prevent token blowout)
max_results_default = 0
Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| MD_ANALYZER_PATH | Path to md-analyzer.js | md-analyzer.js |
| MD_ANALYZER_DEFAULT_DIR | Default directory | . |
| MD_ANALYZER_MAX_TOKENS | Max token limit | 200000 |
| MD_ANALYZER_DEFAULT_BUDGET | Default budget | 100000 |
| MD_ANALYZER_MAX_RESULTS | Max results | 20 |
Priority Chain
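The chain in this section runs from highest to lowest priority: the CLI flag, then the environment variable, then hooks.toml, then 0 (no limit). A minimal sketch of that lookup follows, assuming each source is simply absent (None or unset) when not provided. `resolve_max_results` is a hypothetical helper for illustration, not a function exported by md-analyzer.

```python
from typing import Mapping, Optional


def resolve_max_results(
    cli_value: Optional[int],
    env: Mapping[str, str],
    toml_value: Optional[int],
) -> int:
    """Return the first value that is set, from highest to lowest priority."""
    if cli_value is not None:  # --max-results <n>
        return cli_value
    env_value = env.get("MD_ANALYZER_MAX_RESULTS")
    if env_value is not None:  # environment variable
        return int(env_value)
    if toml_value is not None:  # max_results_default in hooks.toml
        return toml_value
    return 0  # nothing set: 0 means no limit


# The CLI flag wins over the environment variable and hooks.toml.
print(resolve_max_results(3, {"MD_ANALYZER_MAX_RESULTS": "5"}, 0))
```

In real use the `env` argument would typically be `os.environ`; passing it explicitly keeps the function easy to test.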
CLI --max-results 3
↓
MD_ANALYZER_MAX_RESULTS=5
↓
max_results_default=0 (hooks.toml, default: no limit)
↓
0 (no limit, raw results)
Session & Logging
Session File
Location: /tmp/md-analyzer-session.json
{
  "sessionId": "session-1234567890",
  "calls": 5,
  "totalTokens": 1500,
  "filesProcessed": 25,
  "startTime": "2026-05-03T12:00:00.000Z"
}
Run Logs
Location: {project}/log/{sessionId}.json
[
  {
    "timestamp": "2026-05-03T12:00:00.000Z",
    "sessionId": "session-1234567890",
    "directory": "/path/to/docs",
    "flags": ["--keypoints", "--json"],
    "filesFound": 10,
    "filesProcessed": 10,
    "tokensThisCall": 300,
    "totalSessionTokens": 1500,
    "errors": [],
    "durationMs": 450,
    "mode": "keypoints"
  }
]
Architecture
md-analyzer/
├── src/
│ └── md-analyzer.ts # TypeScript source
├── md-analyzer.js # Compiled output
├── md-analyzer.d.ts # Type declarations
├── hooks.toml # Configuration
├── log/ # Run logs
├── python/ # Agent hook integration
│ ├── pre_read.py
│ └── README.md
└── embedded-docs/ # Sample documents for testing
Key Functions
| Function | Description |
|----------|-------------|
| extractFrontmatter() | YAML metadata extraction |
| extractHeadings() | Parse H1-H6 structure |
| extractLinks() | Internal/external link analysis |
| extractTables() | Markdown table parsing |
| scanMarkdownFiles() | Recursive directory scanner |
| buildGraph() | Document relationship topology |
| extractKeyPoints() | Single-shot overview |
| loadSession() / saveSession() | Token budget tracking |
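To make the relationship topology concrete, here is a small sketch of how backlinks and orphans can be derived from a map of outgoing references. It illustrates the general idea only and is not the tool's `buildGraph()`; the helper names and the input shape (`{doc: [docs it links to]}`) are assumptions.

```python
from collections import defaultdict


def build_backlinks(references: dict) -> dict:
    """Invert {doc: [docs it links to]} into {doc: [docs linking to it]}."""
    backlinks = defaultdict(list)
    for source, targets in references.items():
        for target in targets:
            backlinks[target].append(source)
    return dict(backlinks)


def find_orphans(references: dict) -> list:
    """Return known docs that no other doc links to, sorted by name."""
    backlinks = build_backlinks(references)
    return sorted(doc for doc in references if doc not in backlinks)


# Illustrative input; document names follow the examples in this README.
references = {
    "adr-2026-04": ["adr-2026-01", "adr-2026-02"],
    "adr-2026-02": ["adr-2026-01"],
    "adr-2026-01": [],
    "notes": [],
}
backlinks = build_backlinks(references)
orphans = find_orphans(references)
print(backlinks)
print(orphans)
```

Under this definition a top-level document that links out but is never linked to counts as an orphan; whether the CLI treats such entry points the same way is not stated here.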
Error Handling
| Error | Description |
|-------|-------------|
| permission_denied | Skip inaccessible directories |
| file_read_error | Return partial results |
| token_count_fallback | Use charCount/4 estimation |
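The `token_count_fallback` row describes estimating tokens as the character count divided by 4 when exact counting is unavailable. A minimal sketch of that pattern, assuming the estimate is rounded up (the rounding rule is not specified above) and using hypothetical function names:

```python
import math


def estimate_tokens(text: str) -> int:
    """Fallback estimate: about one token per four characters, rounded up (assumed)."""
    return math.ceil(len(text) / 4)


def count_tokens(text: str, encode=None) -> int:
    """Use the supplied tokenizer when it works, otherwise fall back to the estimate."""
    if encode is not None:
        try:
            return len(encode(text))
        except Exception:
            pass  # tokenizer failed: fall through to the estimate
    return estimate_tokens(text)


def broken_encoder(_text: str):
    """Stand-in for a tokenizer that fails at runtime."""
    raise RuntimeError("tokenizer unavailable")


# str.split stands in for a real tokenizer's encode function.
exact = count_tokens("hello world", encode=str.split)
fallback = count_tokens("hello world", encode=broken_encoder)
print(exact, fallback)
```

The estimate is coarse, so budgets computed from it are best treated as approximate.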
License
MIT - See LICENSE file.
Links
- npm: https://npmjs.com/package/@ev3lynx/md-analyzer
- GitHub: https://github.com/Ev3lynx727/md-analyzer
- Issues: https://github.com/Ev3lynx727/md-analyzer/issues
