cairn-index

v1.2.1

Local hybrid index (FTS5 + vector embeddings + AST graph) over a single sqlite file. Curate, ingest, retrieve.

cairn

Local hybrid index for things you intentionally collect — web pages, codebases, files, raw text. Hybrid retrieval (FTS5 + vector embeddings, RRF fused) over a single sqlite file. Lightweight, fast, no daemons.

pnpm add cairn-index   # npm install cairn-index — the bins are cairn and cairn-mcp

Requires ollama running locally with nomic-embed-text pulled. See setup for the full prereq list and OpenCode wiring.

A daemon-free embedded runtime is available as an opt-in (in-process via node-llama-cpp, GGUFs auto-download to ~/.cairn/models on first use, ~785 MB). Pass runtime: 'embedded' (lib) or set CAIRN_RUNTIME=embedded (CLI). Set CAIRN_OFFLINE=1 to refuse the auto-download and require pre-cached models — useful for air-gapped or strict-egress environments.

quick start

Library:

import { Cairn } from 'cairn-index'

const cairn = new Cairn() // defaults to ~/.cairn, ollama @ 127.0.0.1:11434
await cairn.ingest.add({ kind: 'code', path: './src', label: 'my-project' })
const hits = await cairn.retrieve.search('how does the chunker handle overlap', { k: 5 })
cairn.close()

CLI:

cairn add ./src --label my-project
cairn search "how does the chunker handle overlap" -k 5
cairn graph "fee invariant"
cairn refresh all

what cairn is for

Local-first retrieval grounding for an LLM. You curate what's indexed (no automatic crawling), cairn add brings it in, and either you or a model running over MCP can query the result. Five query surfaces:

  • Hybrid chunk search (search) — FTS5 + vector embeddings fused via RRF. Default mode; returns ranked text chunks.
  • Knowledge graph (graph) — entities (functions, structs, concepts) and edges (calls, depends_on, mitigates, references, verifies) extracted from code (tree-sitter, AST-based) and markdown (LLM, hash-gated).
  • Composed retrieval (ask) — hybrid search + per-hit entity context in one call.
  • Shortest path (path) — BFS between two entities via batched layer fetch (one SQL per BFS layer).
  • Tag-filtered retrieval (tags, --tag) — concept entities carry free-form LLM-emitted tags; filter search / ask / graph by tag.
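
The layer-at-a-time BFS behind path can be sketched as below. This is a minimal illustration, not cairn's implementation: the Edge shape, the FetchLayer callback, the maxDepth limit, and the choice to treat edges as directed are all assumptions made for the example. The callback stands in for a single batched query per layer (for instance one SELECT with an IN list over the current frontier).

```typescript
// Hypothetical edge shape; cairn's real schema may differ.
type Edge = { from: string; to: string }

// Stand-in for ONE batched query per BFS layer: given every node in the
// current frontier, return all outgoing edges in a single call.
type FetchLayer = (frontier: string[]) => Edge[]

function shortestPath(
  start: string,
  goal: string,
  fetchLayer: FetchLayer,
  maxDepth = 8, // illustrative safety limit, not a documented cairn default
): string[] | null {
  if (start === goal) return [start]
  const parent = new Map<string, string>() // child -> node it was reached from
  const seen = new Set<string>([start])
  let frontier = [start]

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = []
    // One fetch for the whole layer instead of one fetch per node.
    for (const { from, to } of fetchLayer(frontier)) {
      if (seen.has(to)) continue
      seen.add(to)
      parent.set(to, from)
      if (to === goal) return rebuild(parent, start, goal)
      next.push(to)
    }
    frontier = next
  }
  return null // no path within maxDepth
}

// Walk the parent links back from goal to start, then reverse.
function rebuild(parent: Map<string, string>, start: string, goal: string): string[] {
  const path = [goal]
  let node = goal
  while (node !== start) {
    node = parent.get(node)!
    path.push(node)
  }
  return path.reverse()
}

// In-memory stand-in for the edges table.
const demoEdges: Edge[] = [
  { from: 'a', to: 'b' },
  { from: 'a', to: 'c' },
  { from: 'b', to: 'd' },
  { from: 'c', to: 'd' },
  { from: 'd', to: 'e' },
]
let fetchCalls = 0
const fetchDemoLayer: FetchLayer = (frontier) => {
  fetchCalls++
  const wanted = new Set(frontier)
  return demoEdges.filter((e) => wanted.has(e.from))
}

const demoPath = shortestPath('a', 'e', fetchDemoLayer)
// demoPath → ['a', 'b', 'd', 'e'], found with 3 layer fetches
```

The number of fetches grows with path length rather than with the number of nodes visited, which is the point of batching by layer.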

Cross-source linking (cairn link sdk program) lets you resolve names across two related sources — SDK calling its on-chain program is the canonical case. Soft-delete + FK cascades keep the graph clean across refreshes and removals.

architecture

  • data layer — better-sqlite3 + sqlite-vec. Flat tables (sources, files, chunks, entities, edges, source_links) plus FTS5 + vec0 virtual tables. FK cascades from sources through entities into edges; triggers keep chunks_vec and entities_vec in sync.
  • ingestion — gitignore-aware walker → boring chunkers (code: 60-line / 10-overlap; text: ~2000-char / paragraph-snapping) → batch embed via ollama → atomic insert.
  • graph — tree-sitter (Rust, TypeScript / TSX, Python) extracts entities and per-fn AST call/ref maps. Edge layer derives calls / depends_on parse edges intra-file, cross-file, and cross-source (when sources are explicitly linked via cairn link). Optional LLM pass over markdown emits mitigates / references / verifies doc edges (capped confidence 0.7, hash-gated) plus free-form tags on concept entities (slugified, multi-tag, queryable via --tag). Soft-delete on entities; clean rebuild on every refresh.
  • retrieval — query embedded once; FTS5 + vec each return top-50; reciprocal rank fusion with k=60; hydrate via narrow indexed seeks. No reranker, no tuning. Empty-query short-circuit.
  • interface — Cairn class wires per-concern providers (Db, Embed, Chat, Ingest, Retrieve). Embed and Chat are runtime-agnostic: each takes an EmbedRuntime / ChatRuntime (Ollama or in-process llama.cpp), so swapping runtimes is one line. CLI for ingestion, graph queries, and admin (add / list / search / ask / graph / path / tags / refresh / reindex / link / unlink / links / remove). MCP server exposes search / list / add / graph / ask / path / tags / refresh so models can both query and maintain the index. Mutating ops remove / link / unlink / reindex / init are intentionally CLI-only — destructive or topology-changing actions require a human at the terminal.
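
For readers unfamiliar with reciprocal rank fusion, the retrieval step can be sketched as follows. This is a generic RRF sketch under stated assumptions, not cairn's code: rrfFuse, the chunk ids, and the 1-based rank convention are illustrative, while k = 60 comes from the retrieval bullet above. Each result list contributes 1 / (k + rank) to every id it contains, and ids are ordered by the summed score.

```typescript
// Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank),
// with rank starting at 1 (an assumption; some implementations start at 0).
function rrfFuse(
  rankings: string[][],
  k = 60,
  limit = 10,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>()
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1))
    })
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    // Highest fused score first; ties broken by id so output is deterministic.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
    .slice(0, limit)
}

// Hypothetical chunk ids. In cairn each list would hold up to 50 entries.
const ftsHits = ['c1', 'c2', 'c3'] // keyword (FTS5) order
const vecHits = ['c3', 'c1', 'c4'] // vector-similarity order

const fused = rrfFuse([ftsHits, vecHits])
// fused ids in order → c1, c3, c2, c4
```

Only ranks enter the score, so the keyword and vector lists need no score normalization before fusing. An id that appears in both lists (c1, c3) outranks one that appears in only one.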

configuration & safety (v1.2+)

Cairn is a curated index — you trust what you put in, and you control the surface around ingestion via env vars. Defaults are sensible for a single-user developer setup; the env vars matter in shared, agent-driven, or compliance-sensitive deployments.

| Env var | Default | Purpose |
|---|---|---|
| CAIRN_OFFLINE | unset | When 1 or true, blocks fetchWeb (no cairn add <url>) and blocks non-local model resolution (no Hugging Face GGUF auto-download). Localhost ollama still allowed. |
| CAIRN_ALLOWED_ROOTS | unset | Comma-separated absolute paths. When set, cairn add rejects any local path (code / file / pdf) outside these roots. Defense-in-depth for MCP-driven ingestion. |
| CAIRN_MAX_INGEST_FILES | 10000 | Aborts directory ingestion before any chunking/embedding if file count exceeds this. CLI --force bypasses; MCP intentionally has no force option. |
| CAIRN_MAX_INGEST_BYTES | 500 MB | Same shape as the file cap, on total bytes. |
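
A rough sketch of how the path and size guards in the table might be checked before ingestion. The helper names (isInsideAllowedRoots, checkIngestCaps), the Env type, the byte unit of CAIRN_MAX_INGEST_BYTES, and reading "500 MB" as 500 × 1024 × 1024 are assumptions for illustration; only the env var names and defaults come from the table. The example paths assume a POSIX filesystem.

```typescript
import * as path from 'node:path'

type Env = Record<string, string | undefined>

// True when `target` is one of the allowed roots or nested inside one.
// With CAIRN_ALLOWED_ROOTS unset (or empty) nothing is restricted.
function isInsideAllowedRoots(target: string, env: Env): boolean {
  const roots = (env.CAIRN_ALLOWED_ROOTS ?? '')
    .split(',')
    .map((r) => r.trim())
    .filter(Boolean)
  if (roots.length === 0) return true
  const resolved = path.resolve(target) // collapses any ".." segments first
  return roots.some((root) => {
    const rel = path.relative(path.resolve(root), resolved)
    const escapes = rel === '..' || rel.startsWith('..' + path.sep)
    return !escapes && !path.isAbsolute(rel)
  })
}

const DEFAULT_MAX_FILES = 10_000
const DEFAULT_MAX_BYTES = 500 * 1024 * 1024 // assumption: MB read as MiB

// Returns a rejection reason, or null when ingestion may proceed.
// `force` models the CLI --force bypass; an MCP caller would never pass it.
function checkIngestCaps(
  fileCount: number,
  totalBytes: number,
  env: Env,
  force = false,
): string | null {
  if (force) return null
  const maxFiles = Number(env.CAIRN_MAX_INGEST_FILES) || DEFAULT_MAX_FILES
  const maxBytes = Number(env.CAIRN_MAX_INGEST_BYTES) || DEFAULT_MAX_BYTES
  if (fileCount > maxFiles) return `too many files: ${fileCount} > ${maxFiles}`
  if (totalBytes > maxBytes) return `too many bytes: ${totalBytes} > ${maxBytes}`
  return null
}

const demoEnv: Env = { CAIRN_ALLOWED_ROOTS: '/work/repo, /data' }
const insideOk = isInsideAllowedRoots('/work/repo/src', demoEnv) // → true
const siblingOk = isInsideAllowedRoots('/work/repo-other', demoEnv) // → false
```

Comparing with path.relative rather than a string prefix keeps a sibling such as /work/repo-other from passing as if it were inside /work/repo.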

Network egress. Three categories, all bounded: (a) explicit cairn add <url> (user-initiated web ingest), (b) localhost ollama only when CAIRN_RUNTIME=ollama, (c) Hugging Face GGUF download on first use only when CAIRN_RUNTIME=embedded and the model isn't pre-cached. CAIRN_OFFLINE=1 blocks (a) and (c); (b) stays available because it's localhost.
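
The three egress categories reduce to a small decision function. The sketch below restates the rules in this paragraph and nothing more: the type names, isOffline, and egressAllowed are hypothetical and are not part of cairn's API.

```typescript
type Runtime = 'ollama' | 'embedded'
type Egress = 'web-ingest' | 'localhost-ollama' | 'gguf-download'

// CAIRN_OFFLINE counts as enabled when set to "1" or "true".
function isOffline(value: string | undefined): boolean {
  return value === '1' || value === 'true'
}

interface EgressContext {
  runtime: Runtime
  offline: boolean
  modelCached?: boolean // only relevant to the embedded runtime
}

// True when the given kind of network access can happen under this context.
function egressAllowed(kind: Egress, ctx: EgressContext): boolean {
  switch (kind) {
    case 'web-ingest': // (a) explicit cairn add <url>
      return !ctx.offline
    case 'localhost-ollama': // (b) stays available offline because it is localhost
      return ctx.runtime === 'ollama'
    case 'gguf-download': // (c) first use only, and never when offline
      return ctx.runtime === 'embedded' && !ctx.modelCached && !ctx.offline
  }
}

const offline = isOffline('1')
const webWhenOffline = egressAllowed('web-ingest', { runtime: 'ollama', offline }) // → false
const ollamaWhenOffline = egressAllowed('localhost-ollama', { runtime: 'ollama', offline }) // → true
```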

Trust model. Cairn doesn't auto-crawl — every source enters via an explicit cairn add (CLI, library, or MCP) by you or an agent you authorized. Indexed content is queryable later, including by future MCP-connected agents — that is the point. For sensitive content, isolate it in a separate dbPath. The MCP host (Claude Desktop, OpenCode, etc.) controls which agents can connect; cairn assumes that gating is done host-side.

stack

| dep | purpose |
|---|---|
| better-sqlite3 | sync sqlite driver |
| sqlite-vec | vector search extension |
| tree-sitter + tree-sitter-rust / tree-sitter-typescript / tree-sitter-python | AST entity + call extraction |
| linkedom | HTML → text |
| @modelcontextprotocol/sdk | MCP server |
| zod | tool input + LLM output schemas |

Embeddings via ollama (nomic-embed-text, 768-dim) — required prerequisite. Doc-extraction uses an ollama chat model (default Qwen3-0.6B-GGUF:UD-Q8_K_XL); skipped silently if not pulled. The embedded runtime substitutes equivalent in-process GGUFs. See setup.

docs

  • setup — prereqs, install, CLI, cross-source linking, OpenCode wiring, troubleshooting.
  • design — data model, retrieval pipeline, decisions, what's out of scope.
  • graph — knowledge-graph layer (entities, edges, soft-delete, AST extraction) layered over the chunk index. Additive; chunk search is unchanged.
  • next — completed roadmap with retrospectives plus the remaining open items (more language grammars, impl-method ID disambiguation).
  • v1.1 tags — concept-tag design + LLM emission + storage shape.
  • enterprise — pitch / reference for an enterprise-shaped variant on Postgres + Azure OpenAI.