llms-furl
v0.1.3
Split llms-full.txt into individual markdown files
Furl llms-full.txt (or llms.txt link lists) into a structured tree of small, searchable files you can assemble into LLM context with standard Unix tools.
                                        ├── api/
                                        │   ├── auth.md
┌──────────────────┐                    │   └── rate-limits.md
│  llms-full.txt   │ ─ npx llms-furl ─▶ ├── concepts/
│  (400KB blob)    │                    │   ├── context.md
└──────────────────┘                    │   └── rag.md
                                        └── examples/
                                            └── sdk.md

furl /fɜːrl/ — to roll up; to make compact. A play on "full."
No vectors, no tools. Just files and bash.
Why filesystem-based context?
"The primary lesson from the actually successful agents so far is the return to Unix fundamentals: file systems, shells, processes & CLIs. Don't fight the models, embrace the abstractions they're tuned for. Bash is all you need." — @rauch
LLM agents perform well with Unix-style workflows built on `find`, `grep`, `jq`, and pipes. Rather than stuffing everything into the prompt, you can keep large context local in a filesystem and let agents retrieve smaller slices on demand — this is filesystem-based context retrieval.
llms-furl turns context selection into a Unix problem, not a prompt-engineering problem.
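As a minimal sketch of the idea (the paths and file contents here are made up for illustration): leaf files live on disk, `grep` selects the relevant ones, and a byte cap keeps the assembled context within budget.

```shell
# Hypothetical fixture: two small leaf files.
mkdir -p /tmp/ctx/docs
printf 'rate limits apply per token\n' > /tmp/ctx/docs/rate.md
printf 'auth uses bearer tokens\n'     > /tmp/ctx/docs/auth.md

# Select only leaves mentioning "token"; keep at most ~16KB of context.
grep -rl "token" /tmp/ctx/docs | xargs cat | head -c 16384 > /tmp/ctx/context.txt
```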
Install
Requirements: Node.js >= 20.
npm install -g llms-furl
# or one-off
npx llms-furl --help

Quickstart
llms-furl https://vercel.com/docs/llms-full.txt
tree -L 3 llms-furl/vercel.com

llms-furl/
└── vercel.com
    ├── index.json
    ├── api/
    │   ├── auth.md
    │   ├── files.md
    │   └── rate-limits.md
    ├── concepts/
    │   ├── context.md
    │   ├── rag.md
    │   └── tasks.md
    └── examples/
        ├── file-upload.md
        └── sdk.md

Each file is a leaf — a small, self-contained piece of the original document, split along its natural section boundaries.
Now you can use standard Unix tools to build exactly the context you need.
# Find anything related to rate limits
rg "rate" llms-furl/vercel.com
# Collect all API-related docs
fd . llms-furl/vercel.com/api | xargs cat
# Build a context for "file upload"
rg -l "file upload" llms-furl/vercel.com | xargs cat > context.txt

Pipe that directly into your LLM:
cat context.txt | llm "Summarize how file uploads work in this API"

Usage
llms-furl <input> [output-dir]
llms-furl split <input> [output-dir]
llms-furl list [output-dir]
llms-furl remove <files...>
llms-furl rm <files...>
llms-furl clean [output-dir]

Options:
--debug, -d      show split diagnostics and flattening info
--help, -h       show help
--version, -v    show version
Output layout
- URL input defaults to `llms-furl/<host>` (for example, `llms-furl/vercel.com`).
- File input defaults to the current directory unless `output-dir` is given.
- Output file paths are derived from each page URL (strip leading/trailing slashes and `.md`/`.html`).
- If all pages share one top-level directory (for example, `docs/`), llms-furl flattens it for URL inputs (shown in `--debug`).
- `index.json` is written alongside the output and contains a tree plus `source` (and `name` for URL inputs).
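The path-derivation rule described above can be sketched with `sed` (a toy illustration, not llms-furl's actual code):

```shell
# Sketch: drop the scheme and host, trim leading/trailing slashes,
# strip a trailing .md/.html, then write out as a .md leaf path.
url="https://vercel.com/docs/api/rate-limits.html"
path=$(printf '%s\n' "$url" \
  | sed -E 's#^[a-z]+://[^/]+/##; s#^/+##; s#/+$##; s#\.(md|html)$##')
echo "$path.md"   # docs/api/rate-limits.md
```

For a URL input, a shared `docs/` prefix like this one would then be flattened away.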
Split patterns
llms-furl detects common llms-full formats automatically:
- Pattern A: `# Title` followed by `Source: https://...`
- Pattern B: `<page>...</page>` with frontmatter containing `source_url`
- Pattern C: `# Title`, a blank line, then `URL: https://...`
- Pattern D: Vercel-style dash-separated blocks with `title:` and `source:`
Code blocks are ignored when detecting boundaries.
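To make Pattern A concrete, here is a toy `awk` splitter (not the real implementation — unlike llms-furl, this sketch does not skip headings inside code blocks):

```shell
# Hypothetical llms-full.txt in Pattern A form.
cat > /tmp/llms-full.txt <<'EOF'
# Auth
Source: https://example.com/api/auth
How to authenticate.

# Rate limits
Source: https://example.com/api/rate-limits
Stay under the limit.
EOF

awk '
  /^# /                { title = $0; next }   # candidate heading
  /^Source: / && title {                      # heading + Source => new leaf
    n++; file = "/tmp/leaf-" n ".md"
    print title > file; print > file
    title = ""; next
  }
  file                 { print > file }       # body goes to current leaf
' /tmp/llms-full.txt
```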
llms.txt link lists
If a site only provides llms.txt, llms-furl reads every link in the list,
fetches each page, and writes each one as a leaf file. Relative links are
resolved against the llms.txt URL.
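The link-list handling can be sketched with `grep` and `sed` (again, just the idea, not the tool's internals — the llms.txt content and base URL here are invented):

```shell
# Hypothetical llms.txt with one relative and one absolute link.
cat > /tmp/llms.txt <<'EOF'
# Example docs
- [Auth](/docs/auth.md)
- [Limits](https://example.com/docs/limits.md)
EOF

# Extract markdown link targets; resolve relative ones against the base.
base="https://example.com"
grep -oE '\]\([^)]+\)' /tmp/llms.txt \
  | sed -E 's/^\]\(//; s/\)$//' \
  | while read -r link; do
      case "$link" in
        http*) printf '%s\n' "$link" ;;       # already absolute
        *)     printf '%s\n' "$base$link" ;;  # resolve relative link
      esac
    done > /tmp/links.txt
cat /tmp/links.txt
```

Each resolved URL would then be fetched and written as its own leaf file.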
llms-furl https://cursor.com/llms.txt

Integration hints
When output is inside llms-furl/, llms-furl maintains llms-furl/AGENTS.md and may offer to update:
- `.gitignore` to ignore `llms-furl/`
- `tsconfig.json` to exclude `llms-furl`
- `AGENTS.md` to add a llms-furl section
In a TTY, you get a y/n prompt; in non-interactive runs it prints hints only. Consent is stored in `llms-furl/.llms-furl.json`.
llms-furl lets you treat your LLM documentation the way Unix always wanted you to:
as a living, searchable filesystem of knowledge.
Acknowledgments
This project was inspired by opensrc. Thank you for the great idea!
