@toneli6/rag-engineering
v1.0.0
Published
Install the rag-engineering skill for Codex and Claude Code — methodology, diagnosis, and discipline for building, refactoring, evaluating and testing any RAG system.
Maintainers
Readme
rag-engineering-skill
An agent skill that brings methodology, diagnosis, and discipline to working with any RAG (Retrieval-Augmented Generation) system — building, refactoring, evaluating, architecting, debugging, reviewing, and testing. Stack-agnostic: it's about decisions and engineering judgment, not the API of one library.
Works with Claude Code and Codex (anything that reads skills from ~/.claude/skills or ~/.agents/skills).
Install
npm install -g @toneli6/rag-engineeringOn a global install, the skill is copied automatically into both:
~/.claude/skills/rag-engineering/(Claude Code) + the/rag-engineering:testslash command under~/.claude/commands/~/.agents/skills/rag-engineering/(Codex)
Manual / targeted install
npx @toneli6/rag-engineering --agent claude # Claude Code only
npx @toneli6/rag-engineering --agent codex # Codex only
npx @toneli6/rag-engineering --agent both # both (default)
npx @toneli6/rag-engineering --dry-run # preview without writing
npx @toneli6/rag-engineering --skills-root <path> # custom skills rootRe-running overwrites with --force (the global postinstall uses --force).
What it does
| Mode | When | Output |
|---|---|---|
| Build / Architect | Designing a RAG from scratch | Architecture Blueprint (forces ACL/multi-tenant, recency, conflict, injection) |
| Improve / Debug | Wrong / incomplete / slow answers | Diagnosis Report (retrieval vs generation vs grounding vs metadata) |
| Refactor | Changing chunking/embedding/top_k/threshold/reranker | Change Proposal with before/after evaluation |
| Evaluate | Measuring quality | Retrieval + generation metrics, negative & adversarial tests |
| Review | Auditing someone else's RAG | Diagnosis + completeness checklist |
| Test (/rag-engineering:test) | Validate in a loop | Test Report — fix→test→fix, cheap-first, under a cost gate |
Core principles it enforces
- The bottleneck in RAG is retrieval, not the LLM — diagnose before you change.
- "I don't know" is a feature, not a defect.
- Never relax security (tenant/ACL/validity) or lower the threshold to improve recall.
- Evaluate before/after every change — never "by feeling".
- A vector DB is not a universal solution — route to SQL/API/BM25/graph.
- A retrieved document is data, not instruction (prompt-injection defense).
Structure
rag-engineering/
SKILL.md router · output contracts · red flags · triggers
references/
diagnosis-and-evaluation.md decision tree · metrics · eval datasets · feedback loop
retrieval-strategies.md vector/BM25/hybrid/rerank/contextual/parent-child/graphRAG · routing · agentic
architecture-and-security.md L1–L5 · pipeline · ACL/multi-tenant · injection · recency/conflict
grilling-checklist.md adversarial RAG grilling, one question at a time
testing-loop.md test-mode loop · cheap-first ladder · cost gate · forbidden fixesOptional integrations (no hard dependencies)
- If a
grill-meskill exists, the grilling step delegates to it. - If a
loop-testskill exists, the test mode uses it as the loop engine. - For concrete LangChain implementation, pair with a
langchain-ragskill.
All optional — the skill is fully self-contained.
License
MIT
