cairn-index

v5.1.0

Published

a month ago

Local hybrid index (FTS5 + vector embeddings + AST graph) over a single sqlite file. Curate, ingest, retrieve.


brightside_solutions

rag hybrid-search vector-search fts5 sqlite sqlite-vec code-search knowledge-graph tree-sitter mcp local-first

cairn

A local, code-aware retrieval engine for grounding LLMs and coding agents — one SQLite file, no model required. cairn indexes codebases, docs, and web pages into hybrid search (lexical + vector) and a real AST knowledge graph — functions, types, and the calls/depends_on/imports edges between them — traversable inside a repo and across repos you link. On top of that it packs cursor-shaped context for completion and serves an MCP endpoint agents query directly.

Not embeddings-only RAG:

A knowledge graph, not just chunks. tree-sitter extracts entities + call/reference edges; graph / path / ask traverse them. Ask "what reaches this function," not only "what reads like it."
Cross-source linking. cairn link sdk program resolves names from one indexed repo into another, so a call in your SDK leads straight to its on-chain program definition. Multi-repo systems become one graph.
Context packing for completion. pack assembles a token-budgeted neighborhood — callee signatures, touched types, callers — for a cursor. Deterministic, model-free, drift-flagged. The same context an editor plugin would build, exposed as a tool.
Inference-optional, tiered. Runs with zero models (FTS + graph). Add an embed endpoint for hybrid search; add a chat endpoint for doc-extract. You opt in, layer by layer.
Local and yours. No telemetry, no bundled model; inference is HTTP to an endpoint you control. MCP-native, so any agent harness binds by URL.

pnpm add cairn-index   # npm install cairn-index — the bins are cairn and cairn-mcp

cairn is the pipeline; inference is yours — and optional. [v3] cairn bundles no model; [v8] it needs no model at all. Three tiers, each opt-in to the next:

| tier | needs | adds | enable |
| --- | --- | --- | --- |
| 0 (default) | nothing | FTS5 chunk search + AST graph (calls/imports/depends, cross-source) + path | make up (zero setup) |
| 1 | embed endpoint | vector arm of search (RRF hybrid) + semantic graph(query) | make up-embed / set CAIRN_EMBED_URL |
| 2 | chat endpoint | markdown doc-extract → concept/tag/doc-edges | make up-extract / set CAIRN_CHAT_URL |
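As a rough illustration of how the tiers stack, the sketch below derives a tier from the two endpoint variables in the table. `resolveTier` and `TierEnv` are hypothetical names, not part of cairn's API, and treating "chat set, embed unset" as tier 0 is an assumption based on tier 2 requiring an embed endpoint.

```typescript
// Hypothetical helper, not cairn's API. Only the env var names come from the README.
type Tier = 0 | 1 | 2;

interface TierEnv {
  CAIRN_EMBED_URL?: string;
  CAIRN_CHAT_URL?: string;
}

function resolveTier(env: TierEnv): Tier {
  const hasEmbed = Boolean(env.CAIRN_EMBED_URL?.trim());
  const hasChat = Boolean(env.CAIRN_CHAT_URL?.trim());
  // Assumption: a chat endpoint without an embed endpoint does not enable tier 2.
  if (!hasEmbed) return 0;
  return hasChat ? 2 : 1;
}
```

Passing the environment in as a plain object keeps the function easy to test; a caller would hand it `process.env` or a parsed `.env` file.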

Endpoints are any OpenAI-compatible server (llama.cpp's llama-server, Ollama, vLLM, LM Studio, OpenAI, Azure). make up-embed/up-extract run host-native llama.cpp models so they get Metal on Mac / GPU on Linux (Docker on macOS cannot expose the GPU to a container, and on Linux GPU passthrough needs extra runtime setup); cairn reaches them over host.docker.internal. See setup / inference.

Tier 0 is distinct from CAIRN_OFFLINE — that flag blocks non-localhost egress but still assumes a (localhost) model; tier 0 means no model at all.
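To make the CAIRN_OFFLINE behaviour concrete, here is a minimal sketch of a "no non-localhost egress" check. All three function names are hypothetical, and the loopback rule (localhost, 127.x.x.x, ::1) is an assumption; whether cairn also treats host.docker.internal as local is not stated here.

```typescript
// Hypothetical guard, not cairn's implementation.
function isOffline(flag: string | undefined): boolean {
  const v = (flag ?? "").trim().toLowerCase();
  return v === "1" || v === "true";
}

function isLoopbackUrl(raw: string): boolean {
  // WHATWG URL keeps the brackets on IPv6 hostnames, hence "[::1]".
  const host = new URL(raw).hostname.toLowerCase();
  return host === "localhost" || host === "[::1]" || /^127(\.\d{1,3}){3}$/.test(host);
}

function assertEgressAllowed(raw: string, offline: boolean): void {
  if (offline && !isLoopbackUrl(raw)) {
    throw new Error(`CAIRN_OFFLINE is set; refusing non-localhost URL: ${raw}`);
  }
}
```

Comparing the parsed hostname rather than the raw string avoids being fooled by hosts such as localhost.example.com.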

Embedding dim is pinned at 1024 (what's tested) and guarded on every call. Switching tier 0 → 1 (or changing embed models) on an existing index: run cairn reindex to (re-)embed — plain refresh is hash-gated and skips unchanged files. Upgrading from v2: inference moved out of process — there's no embedded model or ~/.cairn/models anymore.
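A dimension guard of the kind described above could look roughly like this. `assertEmbeddingDim` and `EXPECTED_DIM` are illustrative names, not cairn's; only the 1024 figure comes from the README.

```typescript
// Hypothetical guard: reject any embedding whose length is not the pinned dimension.
const EXPECTED_DIM = 1024;

function assertEmbeddingDim(vectors: number[][], expected: number = EXPECTED_DIM): void {
  vectors.forEach((vec, i) => {
    if (vec.length !== expected) {
      throw new Error(`embedding ${i} has dim ${vec.length}, expected ${expected}`);
    }
  });
}
```

Running the check on every response, rather than once at startup, catches an endpoint whose model is swapped while the index is live.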

quick start

Fastest path — make up (tier 0, just Docker; no models):

cp .env.example .env          # set CAIRN_INGEST_ROOT to the code you want indexed
make up                       # cairn-mcp (:8093) — FTS + AST graph, no models
# make up-embed               # …tier 1: also start the embed server (:8091) for hybrid search
# make up-extract             # …tier 2: also start the chat server + enable doc-extract
# point your MCP harness at http://localhost:8093/mcp   (make down to stop)

Library — createCairn is the composition root; pass inference to opt into a tier (omit for tier 0):

import { createCairn } from 'cairn-index'

const cairn = createCairn() // tier 0; or createCairn({ inference: { embed: { url, model } } }) for tier 1
await cairn.start()          // optional: stale-job recovery + eager embed health probe
await cairn.ingest.add({ kind: 'code', path: './src', label: 'my-project' })
const hits = await cairn.retrieve.search('how does the chunker handle overlap', { k: 5 })
cairn.close()

CLI:

cairn add ./src --label my-project
cairn search "how does the chunker handle overlap" -k 5
cairn graph "fee invariant"
cairn refresh all

what cairn is for

Local-first retrieval grounding for an LLM. You curate what's indexed (no automatic crawling), cairn add brings it in, and either you or a model running over MCP can query the result. Five query surfaces:

Hybrid chunk search (search) — FTS5 + vector embeddings fused via RRF. Default mode; returns ranked text chunks. In tier 0 (no embed) it degrades cleanly to FTS-only ranking (vec_rank is null on every hit).
Knowledge graph (graph) — entities (functions, structs, concepts) and edges (calls, depends_on, mitigates, references, verifies) extracted from code (tree-sitter, AST-based) and markdown (LLM, hash-gated). graph(query) uses semantic entity search in tier 1+, falling back to lexical entity match (entities_fts) in tier 0.
Composed retrieval (ask) — hybrid search + per-hit entity context in one call.
Shortest path (path) — BFS between two entities via batched layer fetch (one SQL per BFS layer).
Tag-filtered retrieval (tags, --tag) — concept entities carry free-form LLM-emitted tags; filter search / ask / graph by tag.
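The fusion behind search can be sketched with textbook reciprocal rank fusion. This is a generic RRF illustration using the k=60 constant named in the architecture notes, not cairn's code; `rrfFuse`, `FusedHit`, and `fts_rank` are hypothetical names, while `vec_rank` mirrors the field mentioned above. Passing `null` for the vector list models tier 0.

```typescript
interface FusedHit {
  id: number;
  fts_rank: number | null;
  vec_rank: number | null;
  score: number;
}

// Each list holds chunk ids in rank order (best first); ids are assumed unique per list.
function rrfFuse(ftsIds: number[], vecIds: number[] | null, k = 60): FusedHit[] {
  const hits = new Map<number, FusedHit>();
  const hitFor = (id: number): FusedHit => {
    let hit = hits.get(id);
    if (!hit) {
      hit = { id, fts_rank: null, vec_rank: null, score: 0 };
      hits.set(id, hit);
    }
    return hit;
  };
  ftsIds.forEach((id, i) => {
    const hit = hitFor(id);
    hit.fts_rank = i + 1;
    hit.score += 1 / (k + i + 1); // RRF contribution: 1 / (k + rank)
  });
  (vecIds ?? []).forEach((id, i) => {
    const hit = hitFor(id);
    hit.vec_rank = i + 1;
    hit.score += 1 / (k + i + 1);
  });
  return Array.from(hits.values()).sort((a, b) => b.score - a.score);
}
```

With no vector list, every score is a single 1 / (k + rank) term, so the output order is exactly the FTS order and `vec_rank` stays null on every hit.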

Cross-source linking (cairn link sdk program) lets you resolve names across two related sources — SDK calling its on-chain program is the canonical case. Soft-delete + FK cascades keep the graph clean across refreshes and removals.
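Because linked sources share one graph, the path surface listed earlier can cross repo boundaries. A layered BFS with one batched lookup per layer might look like the sketch below. `shortestPath` and `FetchLayer` are hypothetical; the callback stands in for the single SQL query per layer, and following edges in their stored direction with a depth cap of 8 are assumptions.

```typescript
// Given the ids in the current frontier, return each id's outgoing neighbours.
type FetchLayer = (ids: string[]) => Map<string, string[]>;

function rebuildPath(parent: Map<string, string>, from: string, to: string): string[] {
  const path = [to];
  let cur = to;
  while (cur !== from) {
    cur = parent.get(cur)!; // every visited node except `from` has a parent
    path.push(cur);
  }
  return path.reverse();
}

function shortestPath(from: string, to: string, fetchLayer: FetchLayer, maxDepth = 8): string[] | null {
  if (from === to) return [from];
  const parent = new Map<string, string>();
  const seen = new Set<string>([from]);
  let frontier = [from];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const layer = fetchLayer(frontier); // one batched lookup for the whole layer
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbour of layer.get(node) ?? []) {
        if (seen.has(neighbour)) continue;
        seen.add(neighbour);
        parent.set(neighbour, node);
        if (neighbour === to) return rebuildPath(parent, from, to);
        next.push(neighbour);
      }
    }
    frontier = next;
  }
  return null; // unreachable within maxDepth
}
```

Batching by layer means the number of lookups grows with path length rather than with the number of nodes visited.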

Beyond query-shaped retrieval, cairn exposes a cursor-shaped surface for code completion:

Context packing (complete.contextPack, CLI cairn pack, MCP pack) — given an enclosing symbol, assemble a token-budgeted pack of its graph neighborhood (callees' signatures, the types it touches, its callers) for a completion model's context window. Deterministic, no model call on the hot path (pure graph walk + indexed seeks); drift made visible via a per-snippet stale flag. The MCP tool lets a thinking-loop agent pull the same cursor-context — no separate completion model needed. See completion.
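The token-budgeted assembly can be pictured as a deterministic greedy fill. The sketch below is an illustration only: `packContext`, the `Snippet` shape, the callee → type → caller priority, and the four-characters-per-token estimate are all assumptions, not cairn's actual policy.

```typescript
type SnippetKind = "callee" | "type" | "caller";

interface Snippet {
  kind: SnippetKind;
  name: string;
  text: string;
  stale: boolean; // drift flag, carried through untouched
}

const KIND_ORDER: SnippetKind[] = ["callee", "type", "caller"];

// Crude, model-free estimate (assumption): about four characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function packContext(snippets: Snippet[], budget: number): { picked: Snippet[]; tokens: number } {
  // Fixed ordering (kind, then name) keeps the output deterministic for a given input.
  const ordered = snippets.slice().sort((a, b) => {
    const byKind = KIND_ORDER.indexOf(a.kind) - KIND_ORDER.indexOf(b.kind);
    if (byKind !== 0) return byKind;
    return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  });
  const picked: Snippet[] = [];
  let tokens = 0;
  for (const snippet of ordered) {
    const cost = estimateTokens(snippet.text);
    if (tokens + cost > budget) continue; // skip what does not fit, keep trying smaller ones
    picked.push(snippet);
    tokens += cost;
  }
  return { picked, tokens };
}
```

Nothing here calls a model: ordering and cost are pure functions of the input, which is what makes the result reproducible.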

architecture

data layer — better-sqlite3 + sqlite-vec. Flat tables (sources, files, chunks, entities, edges, source_links) plus FTS5 + vec0 virtual tables. FK cascades from sources through entities into edges; triggers keep chunks_vec and entities_vec in sync.
ingestion — gitignore-aware walker → boring chunkers (code: 60-line / 10-overlap; text: ~2000-char / paragraph-snapping) → (tier 1+) batch embed via the configured endpoint (1024-dim, guarded) → atomic insert. Tier 0 skips embedding and writes no vectors; cairn reindex backfills them once an embed endpoint is configured.
graph — tree-sitter (Rust, TypeScript / TSX, Python) extracts entities and per-fn AST call/ref maps. Edge layer derives calls / depends_on parse edges intra-file, cross-file, and cross-source (when sources are explicitly linked via cairn link). Optional LLM pass over markdown emits mitigates / references / verifies doc edges (capped confidence 0.7, hash-gated) plus free-form tags on concept entities. Fails open: if the chat endpoint is unreachable, doc-extract is skipped and chunk indexing continues. Soft-delete on entities; clean rebuild on every refresh.
retrieval — FTS5 + (tier 1+) vec each return top-50; reciprocal rank fusion with k=60; hydrate via narrow indexed seeks. No reranker, no tuning. In tier 0 the fusion degenerates to FTS rank (vec_rank: null) and graph(query) falls back to lexical entity match over entities_fts. Empty-query short-circuit.
inference — Embed / Chat providers wrap an EmbedRuntime / ChatRuntime; the runtimes are thin OpenAI-compatible HTTP clients (/v1/embeddings, /v1/chat/completions with response_format json_schema). Swapping the backend is config, not code. See inference.
serving — cairn-mcp is a passive Streamable-HTTP MCP server (CAIRN_MCP_TRANSPORT=http, default :8093) — any harness binds by URL; stdio is still available for spawn-based hosts. See serving.
interface — createCairn(opts) builds the graph at one composition root and injects it into the Cairn class (side-effect-free ctor; per-concern providers Db / Embed? / Chat? / Ingest / Retrieve / Complete); await cairn.start() runs recovery + an eager health probe. CLI covers ingestion, graph queries, and admin (init / add / list / search / ask / graph / path / pack / tags / refresh / reindex / link / unlink / links / remove / gc). MCP exposes the same surface plus job / jobs (search list add link unlink links graph ask path tags refresh remove reindex gc job jobs pack) for agent-driven workflows; only init is CLI-only.
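The "60-line / 10-overlap" code chunker from the ingestion step can be sketched as a sliding window. `chunkLines` and `Chunk` are hypothetical names, and reading "10-overlap" as "consecutive chunks share 10 lines" is an assumption.

```typescript
interface Chunk {
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

function chunkLines(source: string, size = 60, overlap = 10): Chunk[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const lines = source.split("\n");
  const step = size - overlap; // each window starts `step` lines after the previous one
  const chunks: Chunk[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + size, lines.length);
    chunks.push({ startLine: start + 1, endLine: end, text: lines.slice(start, end).join("\n") });
    if (end === lines.length) break; // last window reached the end of the file
  }
  return chunks;
}
```

A 130-line file yields three chunks under these defaults: lines 1-60, 51-110, and 101-130.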

configuration & safety

Cairn is a curated index — you trust what you put in, and you control the surface around ingestion via env vars.

| Env var | Default | Purpose |
|---|---|---|
| CAIRN_EMBED_URL / CAIRN_EMBED_MODEL | unset → tier 0 | Set to an OpenAI-compatible embedding endpoint (must return 1024-dim) to enable tier 1 (hybrid search). Unset = lexical + graph only. |
| CAIRN_CHAT_URL / CAIRN_CHAT_MODEL | unset | Set to enable tier 2 markdown doc-extract (structured outputs). Best-effort; fails open. Requires an embed endpoint too. |
| CAIRN_MCP_TRANSPORT / CAIRN_MCP_PORT | http / 8093 | MCP transport + port for cairn-mcp (stdio for spawn-based hosts). |
| CAIRN_MCP_TOKEN | unset | Bearer token required on every MCP request when set (use to expose beyond localhost). |
| CAIRN_OFFLINE | unset | When 1/true, blocks fetchWeb (cairn add <url>) and refuses any non-localhost inference URL: one coherent "no non-localhost egress" switch. |
| CAIRN_INGEST_ROOT | ./ | Compose only: host dir bind-mounted read-only at /projects. Address sources over MCP by their container path ($CAIRN_INGEST_ROOT/foo → /projects/foo), not the host path; see setup.md. |
| CAIRN_ALLOWED_ROOTS | unset | Comma-separated absolute paths; cairn add rejects local paths outside them. In Docker, the read-only mount is the hard boundary on top. |
| CAIRN_MAX_INGEST_FILES / CAIRN_MAX_INGEST_BYTES | 10000 / 500 MB | Abort directory ingestion before chunking if it exceeds the cap. CLI --force bypasses; MCP does not. |
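As one example of these guards, the ingest caps in the last row amount to a pre-flight check over the walked file list. The sketch below is hypothetical (`checkIngestCaps`, `IngestCaps`, and `WalkedFile` are not cairn names) and assumes "500 MB" means 500 × 1024 × 1024 bytes.

```typescript
interface IngestCaps {
  maxFiles: number;
  maxBytes: number;
}

interface WalkedFile {
  path: string;
  bytes: number;
}

// Defaults from the table; the binary interpretation of "MB" is an assumption.
const DEFAULT_CAPS: IngestCaps = { maxFiles: 10_000, maxBytes: 500 * 1024 * 1024 };

// Throws before any chunking happens. `force` models the CLI --force bypass;
// an MCP caller would always pass false.
function checkIngestCaps(files: WalkedFile[], caps: IngestCaps = DEFAULT_CAPS, force = false): void {
  if (force) return;
  if (files.length > caps.maxFiles) {
    throw new Error(`ingest aborted: ${files.length} files exceeds cap of ${caps.maxFiles}`);
  }
  const totalBytes = files.reduce((sum, f) => sum + f.bytes, 0);
  if (totalBytes > caps.maxBytes) {
    throw new Error(`ingest aborted: ${totalBytes} bytes exceeds cap of ${caps.maxBytes}`);
  }
}
```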

All inference is HTTP to an endpoint you control — no telemetry, no bundled model. Keys (CAIRN_EMBED_KEY / CAIRN_CHAT_KEY) are sent only to the endpoints you configure.
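For orientation, an OpenAI-compatible embeddings call is a POST to /v1/embeddings with a model and input, plus an optional bearer key. The builder below is a sketch, not cairn's runtime: `buildEmbedRequest` and `EmbedConfig` are hypothetical, and it assumes the configured URL is a base URL without the /v1 path.

```typescript
interface EmbedConfig {
  url: string;   // assumed to be a base URL, e.g. http://localhost:8091
  model: string;
  key?: string;  // sent only to this endpoint, and only when set
}

interface EmbedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildEmbedRequest(cfg: EmbedConfig, inputs: string[]): EmbedRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (cfg.key) headers["Authorization"] = `Bearer ${cfg.key}`;
  return {
    url: cfg.url.replace(/\/+$/, "") + "/v1/embeddings",
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ model: cfg.model, input: inputs }),
    },
  };
}
```

The result can be passed to fetch as `fetch(req.url, req.init)`; keeping construction separate from the network call makes the key handling easy to inspect.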

stack

| dep | purpose |
|---|---|
| better-sqlite3 | sync sqlite driver |
| sqlite-vec | vector search extension |
| tree-sitter + tree-sitter-rust / tree-sitter-typescript / tree-sitter-python | AST entity + call extraction |
| @modelcontextprotocol/sdk | MCP server (streamable-http + stdio) |
| linkedom / unpdf | HTML / PDF → text |
| zod | tool input + LLM output schemas |

Inference is not a dependency — it's an external OpenAI-compatible endpoint. The compose stack runs qwen3-embedding-0.6b (1024-dim) for embeddings and qwen3-0.6b for doc-extract, but any compatible models work (embeddings must be 1024-dim).

docs

setup — install, the compose stack, bring-your-own-endpoints, harness wiring, troubleshooting.
inference — the OpenAI-compatible embed/chat contract, dim guard, config.
serving — passive Streamable-HTTP MCP server, transports, auth seam.
design — data model, retrieval pipeline, decisions, what's out of scope.
completion — cursor-shaped context packing for code completion.
graph — knowledge-graph layer (entities, edges, soft-delete, AST extraction).
tags — concept-tag design + LLM emission + storage shape.
next — completed roadmap with retrospectives plus remaining open items.
enterprise — pitch / reference for an enterprise-shaped variant on Postgres + Azure OpenAI.
