@toneli6/rag-engineering

v1.0.0

Published

25 days ago

Install the rag-engineering skill for Codex and Claude Code — methodology, diagnosis, and discipline for building, refactoring, evaluating and testing any RAG system.

0High
0Medium
0Low

toneli6

rag retrieval-augmented-generation vector-search skill claude-code codex agent-skill llm

rag-engineering-skill

An agent skill that brings methodology, diagnosis, and discipline to working with any RAG (Retrieval-Augmented Generation) system — building, refactoring, evaluating, architecting, debugging, reviewing, and testing. Stack-agnostic: it's about decisions and engineering judgment, not the API of one library.

Works with Claude Code and Codex (anything that reads skills from ~/.claude/skills or ~/.agents/skills).

Install

npm install -g @toneli6/rag-engineering

On a global install, the skill is copied automatically into both:

~/.claude/skills/rag-engineering/ (Claude Code) + the /rag-engineering:test slash command under ~/.claude/commands/
~/.agents/skills/rag-engineering/ (Codex)

Manual / targeted install

npx @toneli6/rag-engineering --agent claude     # Claude Code only
npx @toneli6/rag-engineering --agent codex      # Codex only
npx @toneli6/rag-engineering --agent both       # both (default)
npx @toneli6/rag-engineering --dry-run          # preview without writing
npx @toneli6/rag-engineering --skills-root <path>   # custom skills root

Re-running overwrites with --force (the global postinstall uses --force).

What it does

| Mode | When | Output | |---|---|---| | Build / Architect | Designing a RAG from scratch | Architecture Blueprint (forces ACL/multi-tenant, recency, conflict, injection) | | Improve / Debug | Wrong / incomplete / slow answers | Diagnosis Report (retrieval vs generation vs grounding vs metadata) | | Refactor | Changing chunking/embedding/top_k/threshold/reranker | Change Proposal with before/after evaluation | | Evaluate | Measuring quality | Retrieval + generation metrics, negative & adversarial tests | | Review | Auditing someone else's RAG | Diagnosis + completeness checklist | | Test (/rag-engineering:test) | Validate in a loop | Test Report — fix→test→fix, cheap-first, under a cost gate |

Core principles it enforces

The bottleneck in RAG is retrieval, not the LLM — diagnose before you change.
"I don't know" is a feature, not a defect.
Never relax security (tenant/ACL/validity) or lower the threshold to improve recall.
Evaluate before/after every change — never "by feeling".
A vector DB is not a universal solution — route to SQL/API/BM25/graph.
A retrieved document is data, not instruction (prompt-injection defense).

Structure

rag-engineering/
  SKILL.md                          router · output contracts · red flags · triggers
  references/
    diagnosis-and-evaluation.md     decision tree · metrics · eval datasets · feedback loop
    retrieval-strategies.md         vector/BM25/hybrid/rerank/contextual/parent-child/graphRAG · routing · agentic
    architecture-and-security.md    L1–L5 · pipeline · ACL/multi-tenant · injection · recency/conflict
    grilling-checklist.md           adversarial RAG grilling, one question at a time
    testing-loop.md                 test-mode loop · cheap-first ladder · cost gate · forbidden fixes

Optional integrations (no hard dependencies)

If a grill-me skill exists, the grilling step delegates to it.
If a loop-test skill exists, the test mode uses it as the loop engine.
For concrete LangChain implementation, pair with a langchain-rag skill.

All optional — the skill is fully self-contained.

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme