@orkestrel/llms-txt
v1.1.0
Dependency-free Node toolkit to aggregate Markdown docs into LLM-friendly .txt outputs with link rewriting and optional validation.
Aggregate Markdown docs into LLM‑friendly .txt files with deterministic transforms, link rewriting, and optional validation.
- Dependency‑free, ESM‑only, Node 18+
- Deterministic outputs for reliable ingestion and CI
- Two formats: compact plain text (llms.txt) and richer full text (llms-full.txt)
- Link rewriting with optional extension trimming and base URL resolution
- Optional HTTP/local link validation with concurrency and timeouts
- Per-link validation progress: programmatic callback and a simple CLI progress line
Repository: https://github.com/orkestrel/llms-txt
Contents
- Install
- Requirements
- Why this library
- Quick start
- Outputs and modes
- Transform rules
- Rewriting
- Validation
- Whitespace and aggregation
- TypeScript and build
- Testing
- Guides
- Contributing
- License
Install
npm i -D @orkestrel/llms-txt
Requirements
- Node 18+
- ESM‑only (package.json "type": "module")
Why this library
- Deterministic, low‑friction conversion of Markdown/MDX to .txt for LLM ingestion
- Stable, predictable transforms that preserve code blocks and make links explicit
- Optional link validation (HTTP/local) to catch broken docs in CI
- Zero runtime dependencies, fast enough for large doc sets
Quick start
CLI
# Both files with base URL (writes into docs by default)
npx llms-txt all --docs docs --base-url https://example.com/docs
# Only compact plain output to a custom folder
npx llms-txt plain --docs docs --out dist --base-url https://example.com/docs
# Only richer full output to a custom folder
npx llms-txt full --docs docs --out dist --base-url https://example.com/docs
# Validate links and fail fast (prints a simple progress line)
npx llms-txt all --docs docs --base-url https://example.com/docs --validate-links --fail-fast
API
import { generateAll, generateLlm, generateLlmFull } from '@orkestrel/llms-txt'
await generateAll({
  docsDir: 'docs',
  // outDir optional; defaults to docsDir
  baseUrl: 'https://example.com/docs',
  includeExtensions: ['.md', '.mdx'],
  excludeSubstrings: [],
  validateLinks: false,
  failFast: false,
  concurrency: 8,
  timeoutMs: 10_000,
})
// Per-link validation callbacks
await generateLlm({
  docsDir: 'docs',
  // outDir optional; defaults to docsDir
  validateLinks: true,
  onValidateProgress: (e) => {
    // e.link, e.total, e.validated, e.broken
  },
  onValidateValid: (link) => {
    // link.url was confirmed reachable/existing
  },
  onValidateBroken: (link) => {
    // link.url failed validation
  },
})
// Or one output at a time
await generateLlm({ docsDir: 'docs', baseUrl: 'https://example.com/docs' })
await generateLlmFull({ docsDir: 'docs', baseUrl: 'https://example.com/docs' })
Outputs and modes
- llms.txt (plain)
  - Table of contents: organized directory structure with document links
  - Project metadata (title, description) followed by sections
  - Each section lists documents as clickable links
  - Minimal token count for navigation and discovery
  - Format: H1 title, blockquote description, H2 sections with bullet list of document links
- llms-full.txt (full)
  - Complete documentation: includes full transformed content of each document
  - Project metadata followed by sections containing document content
  - Each document includes title, source link, and full text
  - Links and images are textualized to make destinations explicit
  - Preserves headings, lists, emphasis, paragraphs, and code blocks
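The llms.txt layout described above (H1 title, blockquote description, H2 sections with bulleted document links) can be illustrated with a small renderer. This is a minimal sketch, not the package's implementation: `Doc`, `Section`, and `renderIndex` are hypothetical names, and the exact placement of blank lines is an assumption.

```typescript
// Hypothetical data shapes for illustration only.
interface Doc {
  title: string
  url: string
  description?: string
}

interface Section {
  heading: string
  docs: Doc[]
}

// Render an llms.txt-style index: H1, blockquote, then one H2 per section.
function renderIndex(title: string, description: string, sections: Section[]): string {
  const lines: string[] = [`# ${title}`, '', `> ${description}`]
  for (const section of sections) {
    lines.push('', `## ${section.heading}`, '')
    for (const doc of section.docs) {
      const link = `- [${doc.title}](${doc.url})`
      // Append the optional description after a colon.
      lines.push(doc.description ? `${link}: ${doc.description}` : link)
    }
  }
  // End with exactly one trailing newline.
  return lines.join('\n') + '\n'
}
```

Calling `renderIndex('Project', 'Docs for LLMs', [{ heading: 'Guide', docs: [{ title: 'Start', url: 'https://example.com/docs/guide/start' }] }])` would produce a title, a description line, and a single "Guide" section with one link.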
Transform rules
- Frontmatter: leading YAML blocks (--- ... ---) are removed
- llms.txt (table of contents):
  - Generates project metadata (H1 title, blockquote description)
  - Organizes documents into sections (H2 headings) based on directory structure
  - Each document appears as a list item with a link to its source
  - Document titles are extracted from the first H1 heading in each file
- llms-full.txt (complete content):
  - Includes project metadata and section organization
  - For each document: H3 title, source link, and full transformed content
  - Preserves all markdown structure: headings, paragraphs, lists, blockquotes, code blocks
  - Keeps markdown markers (headings, lists, emphasis) intact
- Content transformation (applies to llms-full.txt):
  - Links: [text](url) → text (url)
  - Images: ![alt](url) → alt (image: url)
  - Reference definitions are resolved; autolinks <https://...> are unwrapped
See Guides: Transform details and Formats.
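To make the frontmatter and link rules above concrete, here is a rough regex-based sketch. It is an illustration under simplifying assumptions, not the library's transform: `stripFrontmatter` and `textualize` are hypothetical names, reference-style links are not handled, and the regexes do not skip fenced code blocks, which a real transform that preserves code would need to do.

```typescript
// Remove a leading YAML frontmatter block delimited by --- lines.
function stripFrontmatter(markdown: string): string {
  return markdown.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, '')
}

// Make link and image destinations explicit in plain text.
function textualize(markdown: string): string {
  return markdown
    // Images first, so the link rule does not match the "[alt](url)" part.
    .replace(/!\[([^\]]*)\]\(([^)\s]+)\)/g, '$1 (image: $2)')
    // Inline links: [text](url) becomes "text (url)".
    .replace(/\[([^\]]+)\]\(([^)\s]+)\)/g, '$1 ($2)')
    // Autolinks: <https://...> loses its angle brackets.
    .replace(/<(https?:\/\/[^>\s]+)>/g, '$1')
}
```

For instance, `textualize('See [docs](https://example.com/docs)')` yields `See docs (https://example.com/docs)`.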
Rewriting
- baseUrl: relative links/images are resolved against the source file path, then made absolute via baseUrl
- Extension trimming: by default trims .md/.mdx (configurable); keep extensions with --keep-extensions
- Anchors and query strings are preserved
Examples
start.md → https://example.com/docs/guide/start
./start.md → https://example.com/docs/guide/start
../index.md → https://example.com/docs/index
/api/users.md → https://example.com/api/users
See Guides: Rewriting.
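The resolution steps above map closely onto the standard WHATWG `URL` class available in Node 18+. The following is a minimal sketch of one way to reproduce the examples, not the package's actual code: `rewriteLink` and its parameters are hypothetical, the source file is assumed to be `guide/intro.md` relative to the docs root, and leaving absolute URLs untouched is an assumption.

```typescript
function rewriteLink(
  href: string,
  sourcePath: string, // source file path relative to the docs root (assumed)
  baseUrl: string,
  trimExtensions: string[] = ['.md', '.mdx'],
): string {
  // Assumption: links that already carry a scheme are left as they are.
  if (/^[a-z][a-z0-9+.-]*:/i.test(href)) return href

  // A trailing slash makes the base behave like a directory.
  const root = baseUrl.endsWith('/') ? baseUrl : baseUrl + '/'
  // Locate the source file under the base, then resolve the link against it.
  const fileUrl = new URL(sourcePath, root)
  const url = new URL(href, fileUrl)

  // Trim a configured extension from the path only; hash and query stay intact.
  for (const ext of trimExtensions) {
    if (url.pathname.endsWith(ext)) {
      url.pathname = url.pathname.slice(0, -ext.length)
      break
    }
  }
  return url.href
}
```

With `sourcePath` set to `'guide/intro.md'` and `baseUrl` set to `'https://example.com/docs'`, the four example inputs above resolve to the listed URLs, and `./start.md#install` keeps its `#install` anchor.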
Validation
Optionally verify link destinations.
- HTTP(S): HEAD with GET fallback; success on 2xx/3xx; timeouts configurable
- Local: existence checks against docsDir
- Controls: validateLinks, failFast, concurrency, timeoutMs
- Outputs: checkedLinks, brokenLinks with source path and position when available
- Callbacks:
  - onValidateProgress(e): per link (valid or broken)
  - onValidateValid(link): for each valid link
  - onValidateBroken(link): for each broken link
- CLI also renders a simple progress line when --validate-links is used
CLI
npx llms-txt all --docs docs --validate-links --concurrency 8 --timeout-ms 15000
See Guides: Validation.
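The HTTP strategy described above (HEAD first, GET as fallback, 2xx/3xx counts as success, bounded by a timeout and a concurrency limit) can be sketched as follows. This is an illustrative approximation and not the package's internals: `FetchLike`, `checkHttpLink`, and `findBrokenLinks` are hypothetical names, and the fetch function is passed in so the logic can be exercised without network access.

```typescript
// Minimal subset of fetch that this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string; signal: AbortSignal },
) => Promise<{ status: number }>

// Try HEAD, then GET; succeed on any 2xx or 3xx status.
async function checkHttpLink(url: string, fetchImpl: FetchLike, timeoutMs = 10_000): Promise<boolean> {
  for (const method of ['HEAD', 'GET']) {
    const controller = new AbortController()
    const timer = setTimeout(() => controller.abort(), timeoutMs)
    try {
      const res = await fetchImpl(url, { method, signal: controller.signal })
      if (res.status >= 200 && res.status < 400) return true
    } catch {
      // Network error or timeout: count as a failed attempt, try the next method.
    } finally {
      clearTimeout(timer)
    }
  }
  return false
}

// Check links with at most `concurrency` requests in flight.
async function findBrokenLinks(
  urls: string[],
  concurrency: number,
  check: (url: string) => Promise<boolean>,
): Promise<string[]> {
  const broken: string[] = []
  let next = 0
  const worker = async (): Promise<void> => {
    while (next < urls.length) {
      const url = urls[next++]
      if (!(await check(url))) broken.push(url)
    }
  }
  const size = Math.max(1, Math.min(concurrency, urls.length))
  await Promise.all(Array.from({ length: size }, () => worker()))
  // Sort so the result does not depend on completion order.
  return broken.sort()
}
```

In Node 18+, the global `fetch` could be supplied as `fetchImpl`, for example `findBrokenLinks(urls, 8, (u) => checkHttpLink(u, fetch, 15_000))`.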
Whitespace and aggregation
- Project structure: H1 title, blockquote description, H2 sections
- llms.txt: each document listed as - [Title](url) or - [Title](url): description
- llms-full.txt: each document prefixed with ### Title, Source: url, followed by full content
- Normalization: CRLF→LF, trim trailing spaces, collapse 3+ blank lines to 2, ensure trailing newline
- Determinism: traversal order defines emission order; use naming to influence sequence if needed
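The normalization rules listed above are simple enough to express in a few lines. The sketch below is one possible reading of them, not the library's code: `normalizeWhitespace` is a hypothetical name, and "3+ blank lines" is interpreted as four or more consecutive newlines.

```typescript
function normalizeWhitespace(text: string): string {
  // CRLF to LF.
  const lf = text.replace(/\r\n/g, '\n')
  // Trim trailing spaces and tabs on every line.
  const trimmed = lf.replace(/[ \t]+$/gm, '')
  // Three or more blank lines (4+ newlines) collapse to two blank lines.
  const collapsed = trimmed.replace(/\n{4,}/g, '\n\n\n')
  // Ensure the output ends with a newline.
  return collapsed.endsWith('\n') ? collapsed : collapsed + '\n'
}
```

Because each step is a pure string transform, running the function twice gives the same result as running it once, which fits the determinism goal stated above.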
TypeScript and build
Recommended TS config (excerpt):
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ES2022",
    "moduleResolution": "Bundler",
    "lib": ["ES2022"],
    "strict": true
  }
}
ESM‑only usage in Node:
node --version # 18+ recommended
Testing
- Vitest for tests and assertions
- Golden outputs for both modes and CLI paths
Run:
npm test
Guides
- Overview
- Start
- Concepts
- Formats
- Transform details
- Rewriting
- Validation
- Examples
- Tips
- Tests
- Ecosystem
- Contribute
- FAQ
Contributing
We value determinism, strict typing, and small, composable APIs. See Contribute for principles and workflow. For issues and feature requests, visit https://github.com/orkestrel/llms-txt/issues.
License
MIT © Orkestrel
