markdown-matters

v0.3.5

Published

19 hours ago

Token-efficient markdown analysis tool for LLM consumption

0High
0Medium
0Low

alphab.io

markdown llm ai analysis search embeddings

markdown-matters

Give LLMs exactly the markdown they need. Nothing more.

QUICK REFERENCE
  mdm init [options]              Initialize mdm in a directory
  mdm index [path] [options]      Index markdown files (add --embed for semantic search)
  mdm fix [path] [options]        Repair malformed YAML frontmatter
  mdm search <query> [options]    Search by meaning or structure
  mdm context <files...>          Get LLM-ready summary
  mdm tree [path]                 Show files or document outline
  mdm config <command>            Configuration management (init, show, check)
  mdm duplicates [path]           Find duplicate content
  mdm embeddings <command>        Manage embedding namespaces
  mdm links <file>                Outgoing links
  mdm backlinks <file>            Incoming links
  mdm stats [path]                Index statistics

Why?

Your documentation is 50K tokens of markdown. LLM context windows are limited. Raw markdown dumps waste tokens on structure, headers, and noise.

mdm extracts structure instead of dumping text. The result: 80%+ fewer tokens while preserving everything needed to understand your docs.

npm install -g markdown-matters
mdm index .                     # Index your docs
mdm fix .                       # Preview malformed frontmatter repairs
mdm search "authentication"     # Find by meaning
mdm context README.md           # Get LLM-ready summary

Installation

npm install -g markdown-matters

Requires Node.js 18+. Semantic search requires an embedding provider (OpenAI, Ollama, LM Studio, OpenRouter, or Voyage). See docs/CONFIG.md for provider setup.

Commands

init

Initialize mdm in a directory. Supports both local project setup and global shared indexing.

mdm init                        # Interactive setup (prompts for local or global)
mdm init --local                # Initialize locally (.mdm/ in current directory)
mdm init --global               # Initialize globally (~/.mdm/)
mdm init --yes                  # Accept all defaults without prompting

Local setup creates .mdm/ and .mdm.toml in your project. Global setup creates ~/.mdm/ with source registration for multi-project indexing.

Config resolution: Local .mdm.toml takes precedence over ~/.mdm/.mdm.toml, which falls back to built-in defaults.

index

Index markdown files for fast searching.

mdm index                       # Index current directory (prompts for semantic)
mdm index ./docs                # Index specific path
mdm index --embed               # Build embeddings for semantic search
mdm index --no-embed            # Skip the semantic search prompt
mdm index --watch               # Watch for changes and re-index automatically
mdm index --force               # Bypass cache, re-process all files
mdm index --all                 # Index all registered global sources from ~/.mdm/.mdm.toml
mdm index --exclude "*.draft.md,research/**"  # Exclude patterns (comma-separated)
mdm index --no-gitignore        # Ignore .gitignore file

By default, mdm respects .gitignore and .mdmignore patterns. Use --exclude to add CLI-level patterns (highest priority).

fix

Repair malformed YAML frontmatter in markdown files.

mdm fix                         # Preview repairs in current directory
mdm fix ./docs                  # Preview repairs in a directory
mdm fix docs/broken.md          # Preview repairs for one file
mdm fix ./docs --write          # Apply repairs
mdm fix ./docs --write --force  # Apply even when tracked files are modified
mdm fix ./docs --json           # Output structured report

By default, mdm fix is a dry run. It lists files that would change and shows line-level - and + diffs for repaired frontmatter lines. Use --write to apply repairs.

When writing, mdm skips tracked files with uncommitted modifications and prints skipped (uncommitted changes): <path>. Untracked files, clean tracked files, and files outside git repositories proceed normally. Use --force to bypass the dirty-file guard.

search

Search by meaning (semantic) or keyword (text match).

mdm search "how to authenticate"        # Semantic search (if embeddings exist)
mdm search -k "auth.*flow"              # Keyword search (text match)
mdm search -n 5 "setup"                 # Limit to 5 results
mdm search --threshold 0.25 "deploy"    # Lower threshold for more results

Similarity Threshold

Semantic search filters results by similarity score (0-1). Default: 0.35 (35%).

0 results? Content may exist below the threshold. Try --threshold 0.25
Typical scores: Single-word queries score ~30-40%, multi-word phrases ~50-70%
Higher threshold = stricter matching, fewer results
Lower threshold = more results, possibly less relevant

mdm search "authentication"              # Uses default 0.35 threshold
mdm search --threshold 0.25 "auth"       # Lower threshold for broad queries
mdm search --threshold 0.6 "specific"    # Higher threshold for precision

Context Lines

Show surrounding lines around matches (like grep):

mdm search "checkpoint" -C 3            # 3 lines before AND after each match
mdm search "error" -B 2 -A 5            # 2 lines before, 5 lines after

Auto-detection: Uses semantic search if embeddings exist and query looks like natural language. Use -k to force keyword search.

Advanced Search

Quality Modes - Control speed vs. accuracy tradeoff:

mdm search "query" --quality fast       # 40% faster, good recall
mdm search "query" -q thorough          # Best recall, 30% slower

Re-ranking - Boost precision by 20-35%:

mdm search "query" --rerank             # First use downloads 90MB model
npm install @huggingface/transformers         # Required dependency

HyDE - Better results for complex questions:

mdm search "how to implement auth" --hyde   # Expands query semantically

AI Summarization

Generate AI-powered summaries of search results:

mdm search "authentication" --summarize     # Get AI summary of results
mdm search "error handling" -s --yes        # Skip cost confirmation
mdm search "database" -s --stream           # Stream output in real-time

Uses your existing AI subscription (Claude Code, Copilot CLI) for free, or pay-per-use API providers. See AI Summarization for setup.

context

Get LLM-ready summaries from one or more files.

mdm context README.md                   # Single file
mdm context README.md docs/api.md       # Multiple files
mdm context docs/*.md                   # Glob patterns work
mdm context -t 500 README.md            # Token budget
mdm context --brief README.md           # Minimal output
mdm context --full README.md            # Include full content

Section Filtering

Extract specific sections instead of entire files:

mdm context doc.md --sections           # List available sections
mdm context doc.md --section "Setup"    # Extract by section name
mdm context doc.md --section "2.1"      # Extract by section number
mdm context doc.md --section "API*"     # Glob pattern matching
mdm context doc.md --section "Config" --shallow  # Top-level only (no nested subsections)

The --sections flag shows all sections with their numbers and token counts, helping you target exactly what you need.

tree

Show file structure or document outline.

mdm tree                        # List markdown files in current directory
mdm tree ./docs                 # List files in specific directory
mdm tree README.md              # Show document outline (heading hierarchy)

Auto-detection: Directory shows file list, file shows document outline.

links / backlinks

Analyze link relationships.

mdm links README.md             # What does this file link to?
mdm backlinks docs/api.md       # What files link to this?

stats

Show index statistics.

mdm stats                       # Current directory
mdm stats ./docs                # Specific path

duplicates

Detect duplicate content in markdown files.

mdm duplicates                  # Find duplicates in current directory
mdm duplicates docs/            # Find duplicates in specific directory
mdm duplicates --min-length 100 # Only flag sections over 100 characters
mdm duplicates -p "docs/**"     # Filter by path pattern

embeddings

Manage embedding providers and namespaces.

mdm embeddings list             # List all embedding namespaces
mdm embeddings current          # Show active namespace
mdm embeddings switch openai    # Switch to OpenAI embeddings
mdm embeddings remove ollama    # Remove Ollama embeddings
mdm embeddings remove openai -f # Force remove active namespace

Namespaces store embeddings separately by provider/model. Switching is instant without rebuild.

Workflows

Before Adding Context to LLM

mdm tree docs/                          # See what's available
mdm tree docs/api.md                    # Check document structure
mdm context -t 500 docs/api.md          # Get summary within token budget

Finding Documentation

mdm search "authentication"             # By meaning
mdm search -k "Setup|Install"           # By keyword pattern

Setting Up Semantic Search

mdm supports multiple embedding providers for semantic search:

OpenAI (default) - Cloud-based, requires API key
Ollama - Free, local, daemon-based
LM Studio - Free, local, GUI-based (development only)
OpenRouter - Multi-provider gateway
Voyage - Premium quality, competitive pricing

Quick start with OpenAI:

export OPENAI_API_KEY=sk-...
mdm index --embed                       # Build embeddings
mdm search "how to deploy"              # Now works semantically

Using Ollama (free, local):

ollama serve && ollama pull nomic-embed-text
mdm index --embed --provider ollama --provider-model nomic-embed-text

See docs/CONFIG.md for complete provider setup, comparison, and configuration options.

MCP Integration

For Claude Desktop, add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "mdm": {
      "command": "mdm-mcp",
      "args": []
    }
  }
}

For Claude Code, add to .claude/settings.json:

{
  "mcpServers": {
    "mdm": {
      "command": "mdm-mcp",
      "args": []
    }
  }
}

MCP Tools

| Tool | Description | |------|-------------| | md_search | Semantic search by meaning; returns relevant sections | | md_context | Token-compressed file summaries at brief, summary, or full detail | | md_structure | Heading hierarchy with token counts | | md_keyword_search | Structural search by heading, code, list, or table presence | | md_index | Build or rebuild the index | | md_links | Outgoing links from a file | | md_backlinks | Incoming links to a file |

Configuration

mdm supports a layered configuration system for persistent settings:

# Create a config file
mdm config init

# Check your configuration
mdm config check

# Customize settings in .mdm.toml

# .mdm.toml
[index]
maxDepth = 10
excludePatterns = ["node_modules", ".git", "dist", "build"]

[search]
defaultLimit = 20
minSimilarity = 0.35

Configuration precedence: CLI flags > Environment variables > Config file > Defaults

See docs/CONFIG.md for the complete configuration reference.

Index Location

Indexes are stored in .mdm/ in your project root:

.mdm/
  indexes/
    documents.json    # Document metadata
    sections.json     # Section index
    links.json        # Link graph
    vectors.bin       # Embeddings (if enabled)

Environment Variables

| Variable | Description | |----------|-------------| | OPENAI_API_KEY | Required for OpenAI semantic search (default provider) | | OPENROUTER_API_KEY | Required for OpenRouter semantic search | | MDM_* | Configuration overrides (see CONFIG.md) |

AI Summarization

Transform search results into actionable insights using AI.

Quick Start

# Basic usage (auto-detects installed CLI tools)
mdm search "authentication" --summarize

# Skip confirmation for scripts
mdm search "error handling" --summarize --yes

# Stream output in real-time
mdm search "database" --summarize --stream

First-Time Setup

On first use, mdm auto-detects available providers:

Using claude (subscription - FREE)

--- AI Summary ---

Based on the search results, here are the key findings...

Providers

CLI Providers (FREE with subscription):

| Provider | Command | Subscription Required | |----------|---------|----------------------| | Claude Code | claude | Claude Pro/Team | | GitHub Copilot | copilot | Copilot subscription | | OpenCode | opencode | BYOK (any provider) |

API Providers (pay-per-use):

| Provider | Cost per 1M tokens | Notes | |----------|-------------------|-------| | DeepSeek | $0.14-0.56 | Ultra-cheap | | Qwen | $0.03-0.12 | Budget option | | Google Gemini | $0.30-2.50 | Balanced | | OpenAI GPT | $1.75-14.00 | Premium | | Anthropic Claude | $3.00-15.00 | Premium |

Configuration

Option 1: Auto-detection (recommended)

Just run --summarize - mdm finds installed CLI tools automatically.

Option 2: Config file

# .mdm.toml
[aiSummarization]
mode = "cli"        # 'cli' (free) or 'api' (paid)
provider = "claude" # Provider name

Option 3: Environment variables

export MDM_AISUMMARIZATION_MODE=api
export MDM_AISUMMARIZATION_PROVIDER=deepseek
export DEEPSEEK_API_KEY=sk-...

CLI Flags

| Flag | Short | Description | |------|-------|-------------| | --summarize | -s | Enable AI summarization | | --yes | -y | Skip cost confirmation | | --stream | | Stream output in real-time |

Cost Transparency

API providers show cost estimates before proceeding:

Cost Estimate:
  Provider: deepseek
  Input tokens: ~2,500
  Output tokens: ~500
  Estimated cost: $0.0007

Continue with summarization? [Y/n]:

CLI providers show free status:

Using claude (subscription - FREE)

See docs/summarization.md for architecture details and troubleshooting.

Performance

| Metric | Raw Markdown | mdm | Savings | |--------|--------------|---------|---------| | Context for single doc | 2,500 tokens | 400 tokens | 84% | | Context for 10 docs | 25,000 tokens | 4,000 tokens | 84% | | Search latency | N/A | <100ms | - |

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

markdown-matters

Why?

Installation

Commands

init

index

fix

search

Similarity Threshold

Context Lines

Advanced Search

AI Summarization

context

Section Filtering

tree

links / backlinks

stats

duplicates

embeddings

Workflows

Before Adding Context to LLM

Finding Documentation

Setting Up Semantic Search

MCP Integration

MCP Tools

Configuration

Index Location

Environment Variables

AI Summarization

Quick Start

First-Time Setup

Providers

Configuration

CLI Flags

Cost Transparency

Performance

License