markdown-docs-mcp
v0.1.5
MCP server for efficient navigation of large markdown documents (datasheets, standards, manuals)
MCP server for efficient navigation of large markdown documents — datasheets, IEC/ISO standards, reference manuals.
Lets agents read what they need from a 100 000+ line markdown file without dumping the whole thing into context.
Tools
- view_toc: get document structure (TOC with line ranges, sizes, anomaly hints; auto-trims to fit a configurable byte cap).
- read_section: fetch a single section by opaque id; raw and logical reading modes; comment stripping; continuation via from_line.
- search: literal or regex search across titles and content with section context.
- analyze_document: diagnostic report on structural anomalies (self-nesting, level jumps, PDF-conversion artifacts, etc.).
Installation
Claude Code plugin
Run inside a Claude Code session:
/plugin marketplace add hacker-cb/markdown-docs-mcp
/plugin install markdown-docs@hacker-cb
The plugin installs into user scope by default. To install for a team-shared project instead (writes .claude/settings.json in the project), pass --scope project to the second command:
/plugin install markdown-docs@hacker-cb --scope project
Direct MCP config (any MCP-compatible client)
Add to ~/.claude/settings.json, project .mcp.json, Cursor settings, Continue config, etc.:
{
"mcpServers": {
"markdown-docs": {
"command": "npx",
"args": ["-y", "markdown-docs-mcp@latest"]
}
}
}
Example prompt
Once installed, point the agent at a large markdown file and ask anything:
Explain the peripherals of @tests/fixtures/public/esp32-p4-trm.md
The agent will call view_toc to skim the 143k-line manual, drill into relevant sections via read_section, and answer from a focused subset, without ever loading the full file into its context.
The @<path> syntax is Claude Code's file-attachment shortcut. In Cursor /
Continue / other MCP clients, pass the absolute or workspace-relative path as
plain text instead.
Configuration
Two optional env vars override response caps (see src/config.ts):
| Variable | Default | Ceiling | Purpose |
| ----------------------------------- | -------- | ------- | ---------------------------------------- |
| MARKDOWN_DOCS_MAX_TOC_BYTES | 51 200 | 500 000 | Cap on view_toc response payload |
| MARKDOWN_DOCS_MAX_SECTION_BYTES | 204 800 | 500 000 | Cap on read_section.content |
Invalid values warn to stderr and fall back to default; the server does not refuse to start.
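The fallback rule above can be sketched as follows. This is a minimal illustration, not the code in src/config.ts: the names CapSpec and resolveCap are hypothetical, and clamping an over-ceiling value (rather than falling back to the default) is an assumption, since the table only states that a ceiling exists.

```typescript
// Hypothetical helper; the real parsing lives in src/config.ts and may differ.
interface CapSpec {
  name: string;     // environment variable name
  fallback: number; // default from the table above
  ceiling: number;  // upper bound from the table above
}

const TOC_CAP: CapSpec = { name: "MARKDOWN_DOCS_MAX_TOC_BYTES", fallback: 51_200, ceiling: 500_000 };
const SECTION_CAP: CapSpec = { name: "MARKDOWN_DOCS_MAX_SECTION_BYTES", fallback: 204_800, ceiling: 500_000 };

function resolveCap(
  raw: string | undefined,
  spec: CapSpec,
  warn: (msg: string) => void = console.warn, // console.warn writes to stderr in Node
): number {
  // Unset or empty: use the default silently.
  if (raw === undefined || raw.trim() === "") return spec.fallback;

  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    // Invalid input never aborts startup; it only warns and falls back.
    warn(`${spec.name}: invalid value "${raw}", using default ${spec.fallback}`);
    return spec.fallback;
  }
  if (value > spec.ceiling) {
    // Assumption: over-ceiling values are clamped, not rejected.
    warn(`${spec.name}: ${value} exceeds ceiling ${spec.ceiling}, clamping`);
    return spec.ceiling;
  }
  return value;
}
```

A caller would pass the raw environment value, for example resolveCap(process.env.MARKDOWN_DOCS_MAX_TOC_BYTES, TOC_CAP). Taking the warn function as a parameter keeps the helper easy to test without capturing stderr.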
Known limitations
Issues identified during the pre-release code review but deliberately deferred — they have not surfaced on real workloads so far, and fixing them prematurely would add complexity for no current benefit. Revisit when observed in practice:
- Regex DoS surface in search. A user-supplied regex (regex: true) is evaluated against every body line of the indexed document (up to 143k lines / 5 MB on the largest fixture). A pathological pattern with nested quantifiers ((a+)+$ and similar) can hang the server thread with no way for the MCP client to cancel it. Mitigation when needed: depend on re2 (linear-time engine) or wrap .exec in a setImmediate-paced loop with a wall-clock budget.
- Concurrent getOrBuild race in the LRU cache. Two near-simultaneous tool calls on a freshly-evicted (or fresh) file both miss, both run buildIndex end-to-end (90–120 s on the largest fixture), and the second result overwrites the first. Stdio MCP transports are typically single-flight per session so this is mostly theoretical today, but parallel-aware clients (Promise.all([...])) can hit it. Fix: store the in-flight Promise<Index> in a Map<string, Promise<Index>> keyed by the resolved path.
- stripComments is O(lines × ranges). For each comment range the function builds a per-line Set and probes it line-by-line. On a section containing thousands of <!-- PDF_PAGE_BEGIN n --> markers this becomes the dominant cost of read_section. Today it stays cheap because the section byte cap keeps the inner loop small; if MARKDOWN_DOCS_MAX_SECTION_BYTES is raised toward the 500 KB ceiling it will degrade. Fix: sort ranges once at index-build time and use a two-pointer walk, or precompute a boolean line-flag array.
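Two of the fixes proposed above are small enough to sketch. The block below shows a single-flight map for in-flight index builds and the boolean line-flag variant of comment stripping. The Index and LineRange shapes, the build callback, and the name stripCommentLines are placeholders for illustration; the sketch also assumes comment ranges cover whole lines (inclusive, 0-based), which may not match the real data model.

```typescript
// Hypothetical shapes; the project's real Index and range types may differ.
type Index = { path: string; lineCount: number };
type LineRange = { start: number; end: number }; // inclusive, 0-based

const inFlight = new Map<string, Promise<Index>>();

// Single-flight: concurrent callers for the same resolved path share one build.
// The LRU lookup for already-built indexes would happen before this call.
function getOrBuild(
  resolvedPath: string,
  build: (path: string) => Promise<Index>,
): Promise<Index> {
  const pending = inFlight.get(resolvedPath);
  if (pending) return pending;

  const promise = build(resolvedPath).finally(() => {
    // Drop the entry once settled so a later miss (after eviction) can rebuild.
    inFlight.delete(resolvedPath);
  });
  inFlight.set(resolvedPath, promise);
  return promise;
}

// Line-flag stripping: mark comment lines once, then filter in a single pass.
// For non-overlapping ranges the total work is linear in the number of lines.
function stripCommentLines(lines: string[], ranges: LineRange[]): string[] {
  const isComment = new Array<boolean>(lines.length).fill(false);
  for (const { start, end } of ranges) {
    const last = Math.min(end, lines.length - 1);
    for (let i = Math.max(start, 0); i <= last; i++) isComment[i] = true;
  }
  return lines.filter((_, i) => !isComment[i]);
}
```

Because the map stores the promise rather than the finished index, the second caller never starts a build of its own; it simply awaits the first one. A rejected build is also removed from the map, so a transient failure does not poison later calls.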
Not in scope (MVP)
Deliberate omissions — out of scope for the current MCP, not bugs:
- Semantic search (embeddings). search is grep-like by design.
- Table parsing into JSON / structured form.
- Image parsing / OCR.
- On-disk index cache (in-memory LRU is enough for the realistic working set).
- A numbering_filter parameter on view_toc: add it if a real workflow needs it.
- Performance benchmarks as a CI gate (smoke tests only).
- Wrappers for other platforms (Cursor extension, Copilot CLI plugin, Gemini CLI extension, etc.). The MCP server already works with any MCP-compatible client over stdio; a platform-specific wrapper is separate work, done on demand.
Development
pnpm install
pnpm test
pnpm build
Releasing
Versions live in four places that must stay in lockstep: package.json, .claude-plugin/plugin.json, .claude-plugin/marketplace.json (plugins[0].version), and .mcp.json (args[1]). scripts/release.mjs bumps them atomically and creates a release: vX.Y.Z commit plus a vX.Y.Z tag. From master:
pnpm release --dry-run 0.2.0 # preview diff
pnpm release 0.2.0 # apply + commit + tag
git push --follow-tags origin master
Flag forwarding note: pnpm 10 (this repo's pinned version) forwards unknown flags like --dry-run to the script. If you use a different package manager that intercepts the flag, insert -- to disambiguate (pnpm release -- --dry-run 0.2.0) or invoke the script directly (node scripts/release.mjs --dry-run 0.2.0).
The release.yml workflow runs on v* tags: full test matrix → build → npm publish --provenance --access public via npm Trusted Publisher (OIDC, no NPM_TOKEN needed) → GitHub Release with auto-generated notes. Trusted Publisher must be configured once on npmjs.com under the maintainer account.
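The lockstep requirement across the four version locations can be illustrated with a small drift check. This is a sketch under stated assumptions and not the logic of scripts/release.mjs: the names VersionSources, collectVersions and findDrift are hypothetical, plugin.json is assumed to carry a top-level version field, and .mcp.json is assumed to mirror the direct MCP config shown earlier, with the server keyed "markdown-docs" and args[1] shaped like "markdown-docs-mcp@X.Y.Z".

```typescript
// Hypothetical shapes: only the fields named in the Releasing section are modelled.
interface VersionSources {
  packageJson: { version: string };
  pluginJson: { version: string }; // assumption: top-level "version"
  marketplaceJson: { plugins: { version: string }[] };
  // Assumption: same layout as the direct MCP config example.
  mcpJson: { mcpServers: Record<string, { args: string[] }> };
}

function collectVersions(src: VersionSources): Record<string, string | undefined> {
  const pinned = src.mcpJson.mcpServers["markdown-docs"]?.args[1];
  return {
    "package.json": src.packageJson.version,
    ".claude-plugin/plugin.json": src.pluginJson.version,
    ".claude-plugin/marketplace.json": src.marketplaceJson.plugins[0]?.version,
    // Keep only the part after the last "@", e.g. "markdown-docs-mcp@0.2.0" gives "0.2.0".
    ".mcp.json": pinned?.slice(pinned.lastIndexOf("@") + 1),
  };
}

// Returns the files whose version differs from `expected`; an empty list means lockstep.
function findDrift(src: VersionSources, expected: string): string[] {
  return Object.entries(collectVersions(src))
    .filter(([, version]) => version !== expected)
    .map(([file]) => file);
}
```

The function takes already-parsed JSON so it stays independent of the file system; a caller would read the four files, parse them, and fail the release if findDrift returns anything.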
License
MIT — see LICENSE.
