tea-rags

v1.31.1

Published

8 days ago

MCP server for semantic search using local Qdrant and Ollama (default) with support for OpenAI, Cohere, and Voyage AI

artk0de

mcp qdrant vector-search semantic-search embeddings

Your coding agent copies the first code it finds — not the right one.

TeaRAGs is an MCP server for code search that enriches every retrieved chunk with git history: authorship, churn, bug-fix rate, ownership. Your agent stops learning from hotspots and starts learning from stable, owned, battle-tested code.

📖 Full documentation · 🏁 15-minute quickstart · 🧠 Core concepts

The Problem

1. Understanding a monorepo is expensive — for humans AND agents

Every new developer pays in hours. Every fresh agent session pays in tokens. Naming conventions, domain logic, local idioms — all of it has to be rebuilt from scratch, every time.

2. Bad code hygiene is a tax on your agent

Confusing names mean the agent reads more files. More files mean more tokens, slower responses, and a higher chance of picking the wrong example. Your codebase's technical debt is now your AI bill.

3. Agents can't tell stable code from a hotspot

Standard code search ranks by embedding similarity alone. It doesn't know which function gets bug-fixed every sprint, which module hasn't been touched in two years, or whose name is on the commits. So the agent copies whatever looks similar — including the broken examples.

The Solution

TeaRAGs gives your agent two things it can't get from vanilla code search.

1. Every chunk carries its own history

Retrieved code comes with signals about who wrote it, how stable it is, how often it gets bug-fixed, and how impactful a change would be. Semantic similarity stops being the whole answer — it becomes the floor.

2. Pre-built skills, not just raw tools

TeaRAGs ships agent skills — ready-made playbooks that tell your agent when and how to use the signals. No prompt engineering required:

explore — orient in an unfamiliar codebase
data-driven-generation — write code backed by stable, owned templates
risk-assessment — know what you'd break before you break it
refactoring-scan · bug-hunt · pattern-search — and more

Install the plugin and your agent learns the workflow. See all skills →

Bonus: dinopowers, a companion plugin with 10 wrappers over superpowers:* skills (Jesse Vincent's skills library for Claude Code). The wrappers inject tea-rags signals into brainstorming, planning, debugging, TDD, review, and completion flows. Mean eval delta: +71 percentage points across 136 cases. Learn more →

Use Cases

🛡️ Safe code generation

Your agent writes new code backed by stable, canonical templates — modules with a low bug-fix rate, long stability, and a clear owner. No more copying from last sprint's hotspot. Skill: data-driven-generation · Why stable code is safer →

🔧 Refactoring planning & problem-pattern discovery

Find the 5% of code responsible for 80% of incidents. High churn + high bug-fix rate + concentrated ownership = your next production issue — and your next refactoring candidate. Skills: refactoring-scan, bug-hunt

🎯 Risk assessment before changes

Before modifying a function, the agent checks who depends on it, how often it breaks, and what its ticket history says. Know the blast radius before you blast. Skill: risk-assessment · Coupling & blast radius theory →

🗺️ Learning an unfamiliar codebase

Ask questions instead of reading directory trees. "How does auth work?" returns the stable, canonical implementation with its history attached — not a random similar-looking snippet. Skill: explore

How It Works

flowchart LR
    User([👤 You])

    subgraph mcp["TeaRAGs MCP Server"]
        Agent[🤖 Agent<br/>runs skills]
        TeaRAGs[🍵 TeaRAGs<br/>search · enrich · rerank]
        Agent <--> TeaRAGs
    end

    Qdrant[(🗄️ Qdrant<br/>vector DB)]
    Embeddings[✨ Embeddings<br/>Ollama/OpenAI]
    Codebase[📁 Your Codebase<br/>+ Git History]

    User <--> Agent
    TeaRAGs <--> Qdrant
    TeaRAGs <--> Embeddings
    TeaRAGs <--> Codebase

You talk to your agent. The agent runs a TeaRAGs skill. TeaRAGs searches your code, enriches each result with git history, and ranks by what the skill needs — stability, ownership, risk, or pure relevance.
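To make the "enrich, then rank" step concrete, here is a minimal sketch of a stability-oriented rerank. It is an illustration only, not the TeaRAGs reranker: the `Chunk` fields, the weights, and the scoring formula are all assumptions chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    similarity: float    # embedding similarity in [0, 1]
    bug_fix_rate: float  # hypothetical signal: share of commits that are bug fixes, in [0, 1]
    churn: float         # hypothetical signal: normalized change frequency, in [0, 1]


def stability_score(chunk: Chunk, w_sim: float = 0.6, w_stable: float = 0.4) -> float:
    """Blend similarity with a stability term (weights are illustrative)."""
    stability = 1.0 - (chunk.bug_fix_rate + chunk.churn) / 2.0
    return w_sim * chunk.similarity + w_stable * stability


def rerank(chunks):
    """Return chunks ordered from highest to lowest blended score."""
    return sorted(chunks, key=stability_score, reverse=True)


hits = [
    Chunk("billing/retry_v2.py", similarity=0.92, bug_fix_rate=0.60, churn=0.80),
    Chunk("core/retry.py", similarity=0.88, bug_fix_rate=0.05, churn=0.10),
]
ranked = rerank(hits)
print([c.path for c in ranked])
```

In this toy data the slightly less similar but far calmer `core/retry.py` ranks first, which is the behavior the stability-focused skills aim for.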

What You Get

🧬 Trajectory-aware retrieval — the only open-source code RAG that scores results by git history, not just embedding similarity
📚 Ships with agent skills — 6 ready-made playbooks for exploration, generation, risk assessment, and index management (plus 2 internal strategies)
🔒 Local-first, privacy-first — works fully offline with Ollama; your code never leaves your machine (cloud providers optional)
🚀 Built for monorepos — AST-aware chunking across 10+ languages, incremental reindexing, parallel pipelines, millions of LOC tested

🕸️ Codegraph Enrichments (beta)

⚠️ Beta. Structural graph signals are still being calibrated across languages and may change between releases.

Beyond git history, tea-rags can enrich chunks with structural graph signals — call graph and import graph (fan-in, fan-out, instability, PageRank, transitive impact) — and expose graph-query MCP tools (get_callers, get_callees, find_cycles, trace_path). This powers blast-radius and architectural-hub ranking.
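As a rough illustration of three of these signals, the sketch below computes fan-in, fan-out, and instability from a list of import edges. It uses the common definition instability = fan-out / (fan-in + fan-out); whether tea-rags computes it exactly this way is an assumption, and the function and module names are hypothetical.

```python
from collections import defaultdict


def graph_signals(edges):
    """edges: iterable of (importer, imported) pairs; duplicates are ignored."""
    fan_in = defaultdict(int)
    fan_out = defaultdict(int)
    nodes = set()
    for src, dst in set(edges):
        fan_out[src] += 1  # src depends on one more module
        fan_in[dst] += 1   # dst is depended on by one more module
        nodes.update((src, dst))

    signals = {}
    for node in nodes:
        total = fan_in[node] + fan_out[node]
        signals[node] = {
            "fan_in": fan_in[node],
            "fan_out": fan_out[node],
            # 0.0 = only depended upon (stable), 1.0 = only depends on others
            "instability": fan_out[node] / total if total else 0.0,
        }
    return signals


edges = [("api", "auth"), ("jobs", "auth"), ("auth", "db"), ("api", "db")]
signals = graph_signals(edges)
print(signals["auth"])
```

A module with high fan-in and low instability, such as `db` here, is the kind of architectural hub where a change has a wide blast radius.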

Codegraph is disabled by default (beta). Opt in with the CODEGRAPH_ENABLED environment variable, then reindex:

CODEGRAPH_ENABLED=true   # enable graph signals + tools
CODEGRAPH_ENABLED=false  # default — graph extraction off

See the Codegraph Enrichments docs for signals, presets, supported languages, and configuration.

Who It's For

Developers in large monorepos — where "find similar code" returns a dozen near-duplicates and you need the canonical one
Solo devs doing agentic development — agent-driven workflows produce bursts of micro-commits that wreck churn metrics. TeaRAGs ships a GIT SESSIONS mode (TRAJECTORY_GIT_SQUASH_AWARE_SESSIONS=true) that groups commits by (author, time gap) so a 20-commit refactor session counts as one. Churn, bug-fix rate, and ownership stay meaningful even with a single human + an agent as the only contributors.
Tech leads worried about AI code quality — who want their team's agents to learn from stable modules, not from last sprint's hotspot
Privacy-sensitive teams — finance, healthcare, defense, or anyone who can't send source code to a cloud API

Not for: repos without git history (no signal to enrich) or teams that only need autocomplete (use Copilot).
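The session grouping described above for solo agentic workflows can be sketched as follows. This is a minimal illustration of grouping by (author, time gap), not the tea-rags implementation; the 30-minute threshold and the function name are assumptions.

```python
from datetime import datetime, timedelta


def group_sessions(commits, max_gap=timedelta(minutes=30)):
    """Group (author, timestamp) commits into per-author sessions.

    A commit joins the author's open session when it lands within
    max_gap of that session's latest commit; otherwise it starts a new one.
    """
    sessions = []
    open_session = {}  # author -> that author's most recent session
    for author, ts in sorted(commits, key=lambda c: c[1]):
        current = open_session.get(author)
        if current is not None and ts - current[-1][1] <= max_gap:
            current.append((author, ts))
        else:
            current = [(author, ts)]
            sessions.append(current)
            open_session[author] = current
    return sessions


start = datetime(2024, 1, 1, 9, 0)
# 20 micro-commits five minutes apart, then one commit six hours after the start
commits = [("dev", start + timedelta(minutes=5 * i)) for i in range(20)]
commits.append(("dev", start + timedelta(hours=6)))
sessions = group_sessions(commits)
print([len(s) for s in sessions])
```

Counting sessions instead of raw commits means the 20-commit burst contributes one unit of churn rather than twenty.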

🚀 Quick Start

Inside Claude Code, install the TeaRAGs plugins and run the setup wizard:

/plugin marketplace add artk0de/TeaRAGs-MCP
/plugin install tea-rags-setup@tea-rags
/tea-rags-setup:install

Then install the skills plugin (Claude-only, final step):

/plugin install tea-rags@tea-rags

Optionally install dinopowers for wrappers over superpowers:* skills:

/plugin install dinopowers@tea-rags

Index your codebase:

/tea-rags:index

Ask your agent anything: "How does auth work in this project?", "Find stable examples of retry logic", "What should I know before touching the payment module?".

For other MCP clients, CI, or air-gapped setups, see the manual install: Node, npm install -g tea-rags, and one of Ollama, ONNX, OpenAI, Cohere, or Voyage.

🗂️ Project Registry

TeaRAGs maintains a per-machine registry at ~/.tea-rags/registry.json (or $TEA_RAGS_DATA_DIR/registry.json) that records collection metadata and lets you address indexed projects by a short name instead of an absolute path or opaque collection id.
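A small sketch of how the registry location could be resolved. It assumes that TEA_RAGS_DATA_DIR, when set, takes precedence over the home-directory default; the helper name is hypothetical and not part of the tea-rags API.

```python
import os
from pathlib import Path


def registry_path(env=None):
    """Return the registry file path, preferring TEA_RAGS_DATA_DIR when set."""
    env = os.environ if env is None else env
    data_dir = env.get("TEA_RAGS_DATA_DIR")
    # Assumption: the env var overrides the ~/.tea-rags default
    base = Path(data_dir) if data_dir else Path.home() / ".tea-rags"
    return base / "registry.json"


print(registry_path({"TEA_RAGS_DATA_DIR": "/tmp/tea-rags-data"}))
```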

CLI:

tea-rags register-project --path ./my-repo --name myrepo
tea-rags list-projects
tea-rags list-projects --json
tea-rags tune --project myrepo
tea-rags unregister-project --name myrepo

MCP tools: register_project, list_projects, unregister_project.

Every project-aware tool and command also accepts an optional project parameter alongside path and collection. When more than one is supplied, resolution priority is collection > project > path. The registry is auto-populated at the end of each indexing or reindexing run with the embedding model, embedding dimensions, Qdrant URL, indexedAt timestamp, tea-rags version, and chunk count, so collections you index through tea-rags need no manual register-project call.
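The resolution priority can be pictured with a short sketch. The function below is hypothetical and only mirrors the collection > project > path ordering stated above; it does not look anything up in the registry.

```python
def resolve_target(collection=None, project=None, path=None):
    """Pick which identifier wins when several are supplied."""
    if collection:
        return ("collection", collection)
    if project:
        return ("project", project)
    if path:
        return ("path", path)
    raise ValueError("one of collection, project, or path is required")


# project outranks path, so the short name is used here
print(resolve_target(project="myrepo", path="./my-repo"))
```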

📚 Documentation

artk0de.github.io/TeaRAGs-MCP

| I want to… | Start here |
| --- | --- |
| Get it running | Quickstart (15 min): install, index, first query |
| Understand the concept | Core Concepts: vectorization, trajectory enrichment, reranking |
| See what my agent can do | Skills: 6 ready-made agent playbooks for exploration, generation, risk |
| Look under the hood | Architecture: pipelines, data model, reranker internals |
| Learn the theory | Knowledge Base: RAG, code search, software evolution |

🤝 Contributing

See CONTRIBUTING.md for workflow and conventions.

🙏 Acknowledgments

Built on a fork of mhalder/qdrant-mcp-server (clean architecture, solid tests, open-source spirit), which in turn descends from qdrant/mcp-server-qdrant. Code vectorization inspired by claude-context (Zilliz).

Feel free to fork this fork. It's forks all the way down. 🐢

⚖️ License

MIT — see LICENSE. Brand policy in BRAND.md.