@snaf/butt-rag
v0.1.2
fetch, filter, and compact documentation into your local commode
🧻 butt-rag
Bundler for Unified Text Transform, Retrieval-Augmented Generation.
A CLI that fetches documentation from git repos and URLs, filters it, optionally compacts it, and deposits it in a local .commode/ directory for your AI tools to consume. Think of it as plumbing for your project's docs.
Also works as a declarative docs manifest. Instead of scattering links across READMEs or bookmarking docs you'll lose in a week, declare all your dependency docs in one .turd.yaml and pull them down with a single command. Sometimes you just want local docs you can cmd+f through without fighting a search bar that thinks it knows better. The RAG stuff is a bonus, not a prerequisite.
📦 Install
pnpm add -g @snaf/butt-rag
Or, for the commitment-averse:
pnpx @snaf/butt-rag
🚽 Usage
Drop a docs.turd.yaml (Tree for Unified RAG Dump) in your project root:
compact: false # set true to also produce a single load.md with everything
defaults:
  compact: false # per-source default: squash files into one .md
  wipe: false # per-source default: delete raw files after compaction
  bidet: false # per-source default: delete compact file after top-level compaction
sources:
  tailwind:
    type: git
    url: https://github.com/tailwindlabs/tailwindcss.com.git
    subDirectories: ['src/docs']
    extensions: ['.mdx']
    compact: true
    wipe: true # only keep the compacted file
  svelte:
    type: txt
    url: https://svelte.dev/llms-medium.txt
Then let 'er rip:
butt-rag
Your docs land in .commode/. You're welcome.
🪠 The Pipeline
Every good system needs a flow.
- Fetch each source (sparse git clone or HTTP fetch)
- Filter git sources by subdirectory and file extension
- Compact per-source if configured (concatenate with file markers)
- Wipe raw intermediates if configured
- Compact all into load.md if top-level compact is on
- Bidet per-source compact files if configured
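The stage order above can be sketched as a small planner. This is an illustration, not butt-rag's actual implementation: `planSteps`, `PlanSource`, `PlanConfig`, and the step strings are hypothetical names. It assumes each stage runs over every source before the next stage begins, and that per-source flags have already been resolved against `defaults`.

```typescript
// Hypothetical shapes for illustration; field names follow the documented config.
interface PlanSource {
  type: "git" | "txt";
  compact: boolean; // assumed to be already resolved against defaults
  wipe: boolean;
  bidet: boolean;
}

interface PlanConfig {
  compact: boolean; // top-level flag: also produce load.md
  sources: { [name: string]: PlanSource };
}

// List the stages in the documented order: fetch, filter, compact, wipe, load, bidet.
function planSteps(config: PlanConfig): string[] {
  const names = Object.keys(config.sources);
  // Per-source flags only matter for git sources; they are no-ops for txt.
  const git = names.filter((n) => config.sources[n].type === "git");
  const compacted = git.filter((n) => config.sources[n].compact);

  const steps: string[] = [];
  for (const n of names) steps.push("fetch " + n);
  for (const n of git) steps.push("filter " + n);
  for (const n of compacted) steps.push("compact " + n);
  for (const n of compacted) {
    // Wipe only follows a per-source compaction.
    if (config.sources[n].wipe) steps.push("wipe " + n);
  }
  if (config.compact) {
    steps.push("compact all into load.md");
    for (const n of compacted) {
      // Bidet only follows the top-level compaction.
      if (config.sources[n].bidet) steps.push("bidet " + n);
    }
  }
  return steps;
}

// The example config from the Usage section, with top-level compact switched on.
const examplePlan = planSteps({
  compact: true,
  sources: {
    tailwind: { type: "git", compact: true, wipe: true, bidet: false },
    svelte: { type: "txt", compact: false, wipe: false, bidet: false },
  },
});
// examplePlan: fetch tailwind, fetch svelte, filter tailwind, compact tailwind,
//              wipe tailwind, compact all into load.md
```

Returning a list of step names rather than doing the work keeps the ordering easy to inspect, which is roughly the idea behind a dry run.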
🗺️ What Goes Where
| scenario | output |
| ---------------------------- | -------------------------------------------------------------- |
| git source, compact: false | .commode/<name>/ (preserves dir structure) |
| git source, compact: true | .commode/<name>.md (one file, file markers between sections) |
| txt source | .commode/<name>.txt |
| top-level compact: true | .commode/load.md (the whole load in one file) |
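The table reads naturally as a lookup. Here is a minimal sketch, assuming a hypothetical helper `outputPath` and constant `LOAD_PATH` (neither is part of butt-rag's documented interface); the paths themselves come straight from the table.

```typescript
// Hypothetical constant: where the combined file lands when top-level compact is true.
const LOAD_PATH = ".commode/load.md";

// Hypothetical helper: map one source to the output location listed in the table.
function outputPath(sourceName: string, type: "git" | "txt", compact: boolean): string {
  if (type === "txt") {
    // txt sources are already a single file, so compact makes no difference.
    return ".commode/" + sourceName + ".txt";
  }
  return compact
    ? ".commode/" + sourceName + ".md" // one file with file markers between sections
    : ".commode/" + sourceName + "/"; // raw tree, directory structure preserved
}

const tailwindPath = outputPath("tailwind", "git", true); // ".commode/tailwind.md"
const sveltePath = outputPath("svelte", "txt", false); // ".commode/svelte.txt"
```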
🔧 CLI Options
butt-rag # run with ./docs.turd.yaml
butt-rag --config-file my.turd.yaml # use a different config
butt-rag --dry-run # print the plan without touching disk (sounds painful, isn't)
butt-rag --flush # nuke the commode
📋 Config Reference
Top-level
- compact (boolean, default false): produce a combined load.md from all sources
- defaults (object): default values for per-source flags
Defaults
- compact (boolean, default false): squash source files into a single .md
- wipe (boolean, default false): delete raw files after per-source compaction
- bidet (boolean, default false): remove per-source compactions after top-level compaction
Source (git)
- type: git
- url: git clone URL
- subDirectories (string[]): only check out these paths (uses sparse checkout)
- extensions (string[]): only keep files with these extensions
- skip (boolean, default false): skip this source without commenting it out
- compact, wipe, bidet: override defaults
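The effect of subDirectories and extensions together can be sketched as a path filter. This only illustrates the matching logic; it is not how butt-rag does it (the tool uses sparse checkout for subDirectories). `keepFile` is a hypothetical name, paths are assumed to be repo-relative with forward slashes, and treating an omitted or empty list as "no restriction" is an assumption.

```typescript
// Hypothetical filter: keep a file only if it sits under one of the listed
// subdirectories AND ends with one of the listed extensions.
function keepFile(
  path: string,
  subDirectories: string[] = [],
  extensions: string[] = []
): boolean {
  const inDir =
    subDirectories.length === 0 ||
    subDirectories.some((dir) => {
      // Normalize "src/docs/" and "src/docs" to the same prefix, so that
      // "src/docs-old/x.mdx" does not match "src/docs".
      const prefix = dir.replace(/\/+$/, "") + "/";
      return path.indexOf(prefix) === 0;
    });
  const hasExt =
    extensions.length === 0 ||
    extensions.some((ext) => path.slice(-ext.length) === ext);
  return inDir && hasExt;
}

// File names below are made up for the example.
const candidateFiles = [
  "src/docs/installation.mdx",
  "src/docs/img/logo.png",
  "src/pages/index.mdx",
  "README.md",
];
const keptFiles = candidateFiles.filter((f) => keepFile(f, ["src/docs"], [".mdx"]));
// keptFiles: ["src/docs/installation.mdx"]
```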
Source (txt)
- type: txt
- url: URL to fetch (typically an llms.txt endpoint)
- skip (boolean, default false): skip this source without commenting it out
- compact, wipe, bidet: no-ops for txt sources (already single files, nothing to squeeze)
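How per-source flags, defaults, and the built-in false fallbacks stack up can be sketched as below. This is one plausible reading of the reference above rather than the tool's actual code: `resolveFlags`, `Flags`, and `SourceFlags` are hypothetical names, and the precedence (source value, then defaults, then false) is inferred from "override defaults".

```typescript
// Hypothetical shapes; field names follow the documented config.
interface Flags {
  compact: boolean;
  wipe: boolean;
  bidet: boolean;
}

interface SourceFlags {
  type: "git" | "txt";
  compact?: boolean;
  wipe?: boolean;
  bidet?: boolean;
}

// Resolve the effective flags for one source.
function resolveFlags(source: SourceFlags, defaults: Partial<Flags> = {}): Flags {
  if (source.type === "txt") {
    // Documented as no-ops for txt sources: already single files.
    return { compact: false, wipe: false, bidet: false };
  }
  return {
    // Assumed precedence: per-source value, then defaults, then false.
    compact: source.compact ?? defaults.compact ?? false,
    wipe: source.wipe ?? defaults.wipe ?? false,
    bidet: source.bidet ?? defaults.bidet ?? false,
  };
}

// The tailwind source from the Usage example, under all-false defaults.
const tailwindFlags = resolveFlags(
  { type: "git", compact: true, wipe: true },
  { compact: false, wipe: false, bidet: false }
);
// tailwindFlags: { compact: true, wipe: true, bidet: false }
```

Using `??` rather than `||` matters here: an explicit `compact: false` on a source should still beat a `compact: true` default.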
💩 Glossary
Because you can't spell "documentation" without... actually, you can. Anyway.
- TURD 📜 Tree for Unified RAG Dump. Your config file. Treat it with respect.
- commode 🚽 The .commode/ output directory. Where processed docs end up. Classy name for a classy destination.
- compact 🧱 Squash multiple files into one. Fiber for your docs.
- flush 🌊 Nuclear option. Delete everything in the commode. Start fresh. No debris left behind.
- wipe 🧻 Clean up raw files after compaction. Basic hygiene.
- bidet 💦 Clean up per-source compact files after top-level compaction. Fancy hygiene.
- load 📦 The final combined output file. The whole payload.
- skip ⏭️ Leave a source in your config but don't process it. For when you're not ready to commit.
