git-wik
v0.1.2
MCP server that builds a local relation graph of a GitHub repo's conversation history and serves token-efficient intelligence to Claude Code.
Local knowledge graph for GitHub repos. Gives Claude Code token-efficient answers to "why does this code exist?" instead of making it read thousands of lines of raw PRs and issues.
git-wik builds a local SQLite graph of a GitHub repo's full history — issues, PRs, reviewer decisions, cross-references, and file co-change patterns. When you ask Claude Code to touch a file or implement a feature, it can call git-wik's MCP tool and get a compact, structured answer (~700 tokens) instead of spending its context window reading raw GitHub pages.
Why this exists
When Claude Code starts working on a file, it has no idea:
- Which PRs previously changed that file, and why
- What approaches were tried and rejected
- What constraints maintainers have enforced in reviews
- Which open issues are related to what it's about to touch
- What other files always change together with this one
Without that context, it re-invents decisions that were already made, misses constraints that are implicit in the codebase, and asks you to explain things that are already in the git history.
git-wik pre-builds that context into a queryable graph so Claude Code can retrieve it in one call, within its token budget.
How it works
Your GitHub repo
│
▼
git-wik index <repo> ← fetches via gh CLI, builds local SQLite graph
│
▼
~/.git-wik/<owner>-<repo>/graph.db
│
▼ (on every MCP call)
Hybrid retrieval engine
├─ BFS graph traversal ← follows edges: TOUCHES, FIXES, DEPENDS_ON, REFERENCES
└─ BM25 full-text search ← searches indexed issue/PR titles and keywords
│
▼
Scored candidate pool ← merged, deduplicated, ranked by relevance density
│
▼
Token-budget formatter ← greedy-fills 700 tokens with highest-signal entities
│
▼
Claude Code ← gets ≤700 tokens of structured context
What's in the graph
Every indexed repo gets a SQLite database with:
- Issues and PRs — titles, state, outcomes, full-text indexed for BM25 search
- FIXES edges — which PRs closed which issues (parsed from bodies + commit messages)
- TOUCHES edges — which files each PR modified
- DEPENDS_ON / SUPERSEDES edges — PR dependency chains
- CO_CHANGES_WITH edges — files that change together in commits (from git history)
- REFERENCES / MENTIONS edges: cross-references parsed from comment bodies (fixes #N, related to #N, bare #N)
- REVIEWED_BY edges: who reviewed each PR, with outcome states
- LLM-extracted decisions — WHY a PR made the choice it did (optional, confidence-gated)
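As a rough illustration of how reference edges might be pulled out of a comment body, here is a minimal TypeScript sketch. The function name parseRefs, the regular expression, and the mapping of keywords to edge kinds (fixes to FIXES, related to to REFERENCES, bare #N to MENTIONS) are assumptions made for this example; they are not taken from git-wik's actual parser.

```typescript
// Hypothetical edge kinds for this sketch; the keyword-to-edge mapping is assumed.
type RefKind = "FIXES" | "REFERENCES" | "MENTIONS";

interface ParsedRef {
  kind: RefKind;
  issue: number;
}

// Higher rank wins when the same issue is referenced more than once.
const REF_RANK: Record<RefKind, number> = { MENTIONS: 0, REFERENCES: 1, FIXES: 2 };

function parseRefs(body: string): ParsedRef[] {
  // Optional keyword ("fixes", "closes", "resolves", or "related to") followed by #N.
  const pattern = /(?:\b(?:(fix(?:es|ed)?|close[sd]?|resolve[sd]?)|(related to))\s+)?#(\d+)/gi;
  const strongest = new Map<number, RefKind>();
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(body)) !== null) {
    const issue = Number(m[3]);
    const kind: RefKind = m[1] ? "FIXES" : m[2] ? "REFERENCES" : "MENTIONS";
    const prev = strongest.get(issue);
    if (prev === undefined || REF_RANK[kind] > REF_RANK[prev]) {
      strongest.set(issue, kind);
    }
  }
  const refs: ParsedRef[] = [];
  strongest.forEach((kind, issue) => refs.push({ kind, issue }));
  return refs;
}
```

Keeping only the strongest relationship per issue number is a design choice of this sketch: a body that both fixes and mentions the same issue yields a single FIXES entry.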
Token efficiency
A typical file query using raw GitHub API calls might consume 5,000–15,000 tokens (PR descriptions, issue threads, review comments). git-wik condenses the same material into ≤700 tokens by:
- Structural compression — one line per PR/issue instead of raw thread text
- Relevance scoring — only the highest-signal entities fill the token budget
- Evidence-gated inference: LLM-extracted insights are shown only when confidence ≥ 0.7 AND a supporting rationale exists, and are always marked [inferred:N.NN]
Benchmark snapshot
These are representative values from real repos and will vary by project size:
| Workflow | Raw GitHub reading | With git-wik |
|----------|---------------------|--------------|
| "Why does this line/file exist?" | 5,000–15,000 tokens | 150–700 tokens |
| "Find related implementation history" | 10–30 manual link hops | 1 query (get_context) |
| "Time to first contextual answer" | 2–10 minutes | 5–20 seconds |
Prerequisites
- Node.js 20+
- GitHub CLI (gh), installed and authenticated (gh auth login)
- For LLM enrichment (optional): ANTHROPIC_API_KEY or GOOGLE_GENAI_API_KEY
Installation
npm install -g git-wik
Or use without installing:
npx git-wik <command>
Maintainers: see RELEASING.md for npm publish setup and release steps.
Quick start
0. Run setup checks:
git-wik doctor
1. Index a repo (fetches and builds the local graph):
git-wik index expressjs/express
This takes 30–90 seconds for most repos. It fetches issues and PRs (the last 200 items by default), builds the relationship graph, and runs co-change analysis on recent commits.
2. Generate MCP config automatically:
git-wik init-mcp
This writes .mcp.json in your current directory. To inspect the config without writing it:
git-wik init-mcp --print
3. Add to Claude Code manually (if you do not want init-mcp):
{
"mcpServers": {
"git-wik": {
"command": "node",
"args": ["/path/to/git-wik/dist/mcp/server.js"]
}
}
}
Or with npx (no global install needed):
{
"mcpServers": {
"git-wik": {
"command": "npx",
"args": ["-y", "git-wik", "serve"]
}
}
}
4. Use in Claude Code:
What changed in src/auth/middleware.ts and why?
Claude Code calls get_context({ repo: "your/repo", query: "src/auth/middleware.ts" }) and gets back a structured answer, without reading raw files or making GitHub API calls.
Why this still matters if git log exists
For narrow questions about a single file's commit history, plain git log can be enough.
git-wik is designed for cases where agents need cross-artifact reasoning in a compact output:
- PR review rationale (not just commit diff)
- linked issues and references (fixes, mentions, related)
- constraints and rejected alternatives from discussion
- co-changed files likely impacted by your edit
- one compact answer instead of multiple shell/API hops
Killer demo (where git-wik wins)
Ask your agent:
Use git-wik MCP tools first. For src/auth/middleware.ts, explain:
1) what changed recently,
2) why that direction was chosen,
3) what constraints reviewers enforced,
4) which 3 files co-change most with it,
5) which open issues may regress if we edit it.
With raw git log, this usually requires many follow-up calls and manual synthesis.
With git-wik, this is a single get_context call plus optional get_file_lore deepening.
Make agents use MCP first
If your agent defaults to shell commands, add a rule that forces MCP-first behavior.
Copy-paste policy and examples live in docs/AGENT_USAGE.md.
CLI commands
git-wik index [repo]
Populates the local graph for a repo. Run this before using any MCP tools.
If run from inside a git repo with a GitHub remote, repo is auto-detected — you don't need to type it.
# Auto-detect from current directory (must be inside a GitHub repo)
git-wik index
# Explicit
git-wik index owner/repo
# Options:
git-wik index owner/repo --local-path ./path/to/clone # faster co-change analysis from local git
git-wik index owner/repo --depth full # index all history (default: last 200 items)
git-wik index owner/repo --since 2024-01-01 # only items updated after this date
git-wik index owner/repo --no-enrich # skip LLM enrichment (faster, no API key)
git-wik index owner/repo --provider anthropic # force a specific LLM provider
What it does:
- Fetches issues and PRs via gh CLI (the last 200 items by default; use --depth full for all history)
- Parses relationship edges (FIXES, DEPENDS_ON, REFERENCES, MENTIONS) from bodies and comments
- Builds file co-change pairs from git history
- Optionally runs LLM enrichment on merged PRs to extract decisions and constraints
Storage: ~/.git-wik/<owner>-<repo>/graph.db
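The co-change step listed above can be pictured as pair counting over commits. The sketch below is a minimal TypeScript illustration under stated assumptions: each commit is represented simply as the list of file paths it changed, and the function name coChangePairs and the "a <-> b" key format are hypothetical, not git-wik's actual implementation.

```typescript
// Count how often each unordered pair of files appears in the same commit.
// Input shape (one string[] of changed paths per commit) is an assumption.
function coChangePairs(commits: string[][]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const files of commits) {
    // De-duplicate and sort so (a, b) and (b, a) map to the same key.
    const unique = Array.from(new Set(files)).sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]} <-> ${unique[j]}`;
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Example: two commits touch a.ts and b.ts together.
const pairCounts = coChangePairs([
  ["a.ts", "b.ts"],
  ["b.ts", "a.ts", "c.ts"],
  ["c.ts"],
]);
console.log(pairCounts.get("a.ts <-> b.ts")); // 2
```

Pair counting is quadratic in the number of files per commit, so a real indexer would probably cap or skip very large commits; that cutoff is left out here because the source does not describe one.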
git-wik enrich <repo>
Runs LLM enrichment over already-indexed PRs to extract design decisions, constraints, and rejected alternatives. Requires ANTHROPIC_API_KEY or GOOGLE_GENAI_API_KEY.
git-wik enrich owner/repo
# Options:
git-wik enrich owner/repo --provider anthropic # or gemini
git-wik enrich owner/repo --force # re-enrich already-processed items
git-wik enrich owner/repo --limit 50 # cap at 50 items per run
Enrichment is gated to PRs that are likely to have meaningful discussion (merged with a substantial body, or with review round-trips). Low-confidence extractions are not shown. Results that pass the gate are marked [inferred:N.NN] in output.
git-wik status <repo>
Shows the current state of the local graph for a repo.
git-wik status owner/repo
Output: node counts, edge counts, last indexed timestamp, enrichment coverage.
git-wik issue <repo> <number>
Research a specific issue and print structured intelligence (JSON).
git-wik issue expressjs/express 4321
git-wik file <repo> <path>
Get the history and design decisions for a specific file (JSON).
git-wik file expressjs/express src/router/index.js
git-wik why [target]
Explain why a specific line of code exists. Traces the line through git blame → commit → PR → design decisions.
# Why does line 42 in this file exist?
git-wik why src/auth/middleware.ts:42
# What's the most recent PR that touched this file?
git-wik why src/auth/middleware.ts
# Options:
git-wik why src/auth/middleware.ts:42 --repo owner/name # override repo (default: auto-detected)
git-wik why src/auth/middleware.ts:42 --json # raw JSON output
Auto-detects the GitHub repo from your git remote. Output includes:
- The commit SHA, author, and date from git blame
- The PR that introduced that commit
- LLM-extracted rationale (confidence-gated, always marked [inferred])
- Rejected alternatives and constraints from the PR discussion
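The target argument accepts either a bare path or path:line. A small TypeScript sketch of how such a target could be split is shown here; parseWhyTarget and the WhyTarget shape are hypothetical names introduced for illustration, and the real CLI may parse its argument differently.

```typescript
// Hypothetical parsed form of a `git-wik why` target.
interface WhyTarget {
  file: string;
  line?: number; // omitted when the target is a whole file
}

function parseWhyTarget(target: string): WhyTarget {
  // A trailing ":<digits>" is treated as a line number; anything else is a path.
  const m = /^(.+):(\d+)$/.exec(target);
  if (m === null) {
    return { file: target };
  }
  return { file: m[1], line: Number(m[2]) };
}

console.log(parseWhyTarget("src/auth/middleware.ts:42")); // file plus line 42
console.log(parseWhyTarget("src/auth/middleware.ts"));    // file only
```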
git-wik doctor
Checks whether your local environment is ready:
git-wik doctor
Checks:
- Node.js version
- gh CLI availability
- gh auth status
- GitHub repo auto-detection
- local graph DB presence
- .mcp.json presence
git-wik init-mcp
Creates or updates .mcp.json with a git-wik MCP server entry:
git-wik init-mcp
# Options:
git-wik init-mcp --target .cursor/mcp.json # custom config path
git-wik init-mcp --print # print config only
git-wik init-mcp --force # overwrite existing git-wik entry
Default entry:
{
"mcpServers": {
"git-wik": {
"command": "npx",
"args": ["-y", "git-wik", "serve"]
}
}
}
git-wik serve
Start the MCP server on stdio. This is what .mcp.json calls.
git-wik serve
MCP tools
Once the server is running, Claude Code can call these tools:
get_context ★ Start here
The primary tool. Accepts any query type and returns a token-budgeted context package.
query can be:
"src/auth/middleware.ts" → file path
"#123" or "123" → issue number
"pr#456" or "pr456" → PR number
"rate limiting" → keyword / feature name
Example output (for a file path query):
# Context: src/auth/middleware.ts · expressjs/express
## PRs (4 matched)
• PR#5123 "Replace session storage with httpOnly cookies" [merged 2024-01] → fixes #4821
Review: @dougwilson APPROVED | @wesleytodd CHANGES_REQUESTED→APPROVED (2 rounds)
[inferred:0.91] WHY: localStorage tokens exposed to XSS; httpOnly prevents JS access
[inferred:0.87] CHOICE: httpOnly cookies with secure flag + SameSite=Strict
• PR#4987 "Session timeout enforcement" [merged 2023-11]
Review: @dougwilson APPROVED
• PR#4801 "Initial auth middleware" [merged 2023-08]
Review: @dougwilson APPROVED | @wesleytodd APPROVED
## Open Issues (2 related)
• #4901 "Session not clearing on logout" [open, 3 comments]
• #5034 "Refresh token race condition" [open]
## Closed Issues
• #4821 "XSS vulnerability in session handling" [closed] ← fixed by PR#5123
## Co-Changed Files (top 3)
• src/session/store.js (8×)
• src/middleware/csrf.js (5×)
• src/types/session.d.ts (4×)
Total: ~180 tokens. Reading those PRs and issues raw: ~8,000 tokens.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| repo | string | yes | "owner/name" |
| query | string | yes | File path, issue#, PR#, or keyword |
| token_budget | number | no | Max output tokens (default: 700, max: 900) |
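The parameter table above can be restated as a TypeScript type with its defaults applied. This is a sketch only: the GetContextParams name and the normalizeParams helper are hypothetical, and clamping an oversized token_budget to 900 (rather than rejecting it) is an assumption about how the limit is enforced.

```typescript
// Parameter shape taken from the table; the type and helper names are hypothetical.
interface GetContextParams {
  repo: string;          // "owner/name"
  query: string;         // file path, issue#, PR#, or keyword
  token_budget?: number; // default 700, max 900
}

const DEFAULT_TOKEN_BUDGET = 700;
const MAX_TOKEN_BUDGET = 900;

function normalizeParams(p: GetContextParams): Required<GetContextParams> {
  if (!/^[^\/\s]+\/[^\/\s]+$/.test(p.repo)) {
    throw new Error(`repo must look like "owner/name", got "${p.repo}"`);
  }
  const requested = p.token_budget ?? DEFAULT_TOKEN_BUDGET;
  // Assumed behavior: clamp into [1, MAX_TOKEN_BUDGET] instead of rejecting.
  const token_budget = Math.min(Math.max(1, Math.floor(requested)), MAX_TOKEN_BUDGET);
  return { repo: p.repo, query: p.query.trim(), token_budget };
}

console.log(normalizeParams({ repo: "expressjs/express", query: "rate limiting" }));
```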
explain_line
Explain why a specific line of code exists. Runs git blame to find the last commit that touched the line, then looks up the PR that introduced that commit, and returns the full design context: rationale, rejected alternatives, constraints, and linked issues.
Use this before modifying a line — it tells you the original intent so you don't accidentally revert a deliberate decision.
Parameters: repo, file, line (optional — omit for most-recent PR context for the whole file)
find_implementation_context
Pre-implementation research. Given a natural-language description of a change, finds relevant prior PRs, design decisions, and constraints.
query: "add rate limiting to the auth middleware"
Returns: relevant PRs with outcomes and file lists, design decisions with rationale, constraints, rejected alternatives.
Parameters: repo, query, max_prs (default: 3), max_files (default: 3)
research_issue
Full intelligence on a specific issue: linked PRs, contributors, discussion summary.
Parameters: repo, issue_number
get_file_lore
Design history for a specific file: PRs that touched it, decisions extracted from those PRs, constraints, and files that co-change with it.
Parameters: repo, file_path
find_similar
BM25 similarity search. Find issues or PRs that are similar to a query — useful for detecting duplicates or finding prior art.
Parameters: repo, query, limit (default: 5, max: 10)
get_pr_context
Full context for a specific PR: what it changed, why, what was rejected, what constraints it enforced.
Parameters: repo, pr_number
How the retrieval works
When get_context receives a query, it runs two passes in parallel and merges them:
Pass 1 — Structural BFS
Traverses the graph starting from the detected seed type:
- File seed → finds all PRs that touched that file (TOUCHES edges, reversed), then their linked PRs (DEPENDS_ON)
- Issue seed → finds PRs that fix the issue (FIXES edges, reversed), plus cross-referenced issues (REFERENCES / MENTIONS edges)
- PR seed → finds issues it fixes, linked PRs, and files it changed
- Keyword seed → skips BFS (no structural anchor), goes straight to FTS
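Before BFS can start, the query string has to be classified into one of these four seed types. The TypeScript sketch below shows one way that classification could work, based on the query forms documented for get_context ("#123" or "123", "pr#456" or "pr456", a file path, or free text). The detectSeed name, the Seed type, and especially the file-path heuristic (no whitespace plus a slash or a file extension) are assumptions for illustration.

```typescript
// Hypothetical seed representation for this sketch.
type Seed =
  | { type: "pr"; number: number }
  | { type: "issue"; number: number }
  | { type: "file"; path: string }
  | { type: "keyword"; text: string };

function detectSeed(query: string): Seed {
  const q = query.trim();

  // "pr#456" or "pr456"
  const pr = /^pr#?(\d+)$/i.exec(q);
  if (pr !== null) return { type: "pr", number: Number(pr[1]) };

  // "#123" or "123"
  const issue = /^#?(\d+)$/.exec(q);
  if (issue !== null) return { type: "issue", number: Number(issue[1]) };

  // Assumed heuristic: no whitespace, and either a slash or a file extension.
  const looksLikePath = !/\s/.test(q) && (q.indexOf("/") !== -1 || /\.[A-Za-z0-9]+$/.test(q));
  if (looksLikePath) return { type: "file", path: q };

  // Everything else is treated as a keyword and goes straight to FTS.
  return { type: "keyword", text: q };
}

console.log(detectSeed("src/auth/middleware.ts").type); // "file"
console.log(detectSeed("rate limiting").type);          // "keyword"
```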
Pass 2 — BM25 full-text search
Runs in parallel regardless of seed type. Searches BM25-indexed issue/PR titles and keywords. Returns top 15 matches.
Merge + score
Results from both passes are merged by entity ID. Each entity gets a score:
score = (DISTANCE_BASE − distance) × mergeWeight × recencyScore + FTS_MATCH_BONUS
- distance: 0 for the seed itself, 1 for direct edges, 2 for two-hop edges
- mergeWeight: 3 for merged PRs, 2 for open, 1 for closed-no-merge
- recencyScore: 1 / (1 + daysSince / 180), which drops to 0.5 at 180 days (about 6 months)
- FTS_MATCH_BONUS: +0.5 if the entity appeared in both BFS and FTS results
Items appearing in both passes get a natural boost, rising to the top of the ranking.
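The scoring formula can be written out as a short TypeScript function. This is a sketch under assumptions: the value DISTANCE_BASE = 3 is a guess (the source names the constant but not its value), the Candidate shape is hypothetical, and the sketch covers only PR candidates that have a BFS distance.

```typescript
type PrState = "merged" | "open" | "closed";

// Hypothetical candidate shape for this sketch.
interface Candidate {
  distance: 0 | 1 | 2; // hops from the seed
  state: PrState;
  daysSince: number;   // days since the entity was last updated
  inBfs: boolean;
  inFts: boolean;
}

const DISTANCE_BASE = 3;     // ASSUMPTION: value not given in the source
const FTS_MATCH_BONUS = 0.5; // from the formula description
const MERGE_WEIGHT: Record<PrState, number> = { merged: 3, open: 2, closed: 1 };

function recencyScore(daysSince: number): number {
  return 1 / (1 + daysSince / 180);
}

function score(c: Candidate): number {
  const base = (DISTANCE_BASE - c.distance) * MERGE_WEIGHT[c.state] * recencyScore(c.daysSince);
  // The bonus applies only when the entity was found by both passes.
  return base + (c.inBfs && c.inFts ? FTS_MATCH_BONUS : 0);
}

// A merged PR one hop away, 180 days old, found by both passes:
// (3 - 1) * 3 * 0.5 + 0.5 = 3.5
console.log(score({ distance: 1, state: "merged", daysSince: 180, inBfs: true, inFts: true }));
```

With these assumed constants, a fresh closed-without-merge PR one hop away scores 2, so a six-month-old merged PR that also matched full-text search still outranks it.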
Token budget greedy fill
Sorted by score, entities fill the token budget greedily until the limit (~700 tokens by default) is reached. The formatter reserves ~30 tokens for the header.
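A greedy fill of this kind can be sketched in a few lines of TypeScript. The names greedyFill and estimateTokens are hypothetical, the 4-characters-per-token estimate is a crude stand-in for whatever tokenizer git-wik uses, and skipping an entity that does not fit (instead of stopping at the first miss) is an assumption.

```typescript
interface ScoredEntity {
  text: string;  // formatted line(s) for one entity
  score: number;
}

// ASSUMPTION: roughly 4 characters per token; a real tokenizer will differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function greedyFill(entities: ScoredEntity[], budget = 700, headerReserve = 30): string[] {
  let remaining = budget - headerReserve;
  const selected: string[] = [];
  // Highest score first; copy the array so the caller's order is untouched.
  const sorted = entities.slice().sort((a, b) => b.score - a.score);
  for (const entity of sorted) {
    const cost = estimateTokens(entity.text);
    if (cost > remaining) continue; // does not fit; a smaller entity still might
    selected.push(entity.text);
    remaining -= cost;
  }
  return selected;
}
```

For example, with a budget of 100 and the 30-token header reserve, entities costing 50 and 20 tokens fill the remaining 70 exactly, and a lower-scored 10-token entity is left out.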
Seed fallback
If a file/issue/PR seed is not found in the index (e.g., the repo hasn't been indexed, or the entity doesn't exist), the query is automatically retried as a keyword search. The output notes that this happened.
LLM enrichment
Enrichment is optional and only improves the [inferred:N.NN] lines in get_context output. The structural graph works without any API key.
When enrichment runs on a PR, it sends the PR thread to an LLM (Claude Haiku or Gemini) and extracts:
- Problem statement (why the PR was made)
- Choice made (what approach was taken)
- Rationale (why that approach)
- Constraints discovered
- Alternatives that were rejected
Quality gate: Extracted data is only surfaced when:
- confidence ≥ 0.7 (LLM's self-assessed confidence)
- A supporting rationale was extracted (not just a bare claim)
Items that pass both gates are shown as [inferred:0.87] WHY: .... Items that fail are left out of the body and only counted in the footer as *N inferred items omitted (low evidence)*. This reduces the chance that confidently wrong data appears in the output.
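The two-part quality gate is simple enough to sketch directly in TypeScript. The Inference shape and the applyQualityGate name are hypothetical; the 0.7 threshold, the [inferred:N.NN] prefix, and the footer wording come from the description above.

```typescript
// Hypothetical shape of one extracted item.
interface Inference {
  kind: "WHY" | "CHOICE";
  text: string;
  confidence: number; // the LLM's self-assessed confidence, 0..1
  rationale?: string; // supporting rationale, if one was extracted
}

const MIN_CONFIDENCE = 0.7;

function applyQualityGate(items: Inference[]): { lines: string[]; footer: string | null } {
  const lines: string[] = [];
  let omitted = 0;
  for (const item of items) {
    const hasRationale = (item.rationale ?? "").trim().length > 0;
    if (item.confidence >= MIN_CONFIDENCE && hasRationale) {
      lines.push(`[inferred:${item.confidence.toFixed(2)}] ${item.kind}: ${item.text}`);
    } else {
      omitted++; // counted, but never shown
    }
  }
  const footer = omitted > 0 ? `*${omitted} inferred items omitted (low evidence)*` : null;
  return { lines, footer };
}
```

Both conditions must hold: a high-confidence item with no rationale is omitted just like a low-confidence one.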
Enrichment uses Anthropic's prompt caching to reduce the cost of the system prompt across bulk runs (~90% cache hit rate after warmup).
Environment variables
| Variable | Purpose |
|----------|---------|
| ANTHROPIC_API_KEY | Enables Claude Haiku LLM enrichment |
| GOOGLE_GENAI_API_KEY | Enables Gemini LLM enrichment (fallback) |
Neither variable is required for indexing or MCP queries. They are needed only for LLM enrichment (git-wik enrich, or the optional enrichment step of git-wik index).
Data storage
All data is stored locally. Nothing is sent to any server except:
- GitHub API calls (via gh CLI, using your existing gh credentials)
- LLM API calls during git-wik enrich (only if you set an API key)
Graphs are stored at:
~/.git-wik/<owner>-<repo>/graph.db
Each repo gets its own SQLite database. You can delete it at any time; git-wik index will rebuild from scratch.
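If you want to locate a repo's database from a script, the path pattern above can be reproduced with a small helper. This TypeScript sketch assumes the home directory is passed in explicitly and that owner and repo are joined with a single hyphen as shown; graphDbPath is a hypothetical name, not a git-wik export.

```typescript
// Build the documented path ~/.git-wik/<owner>-<repo>/graph.db for "owner/repo".
function graphDbPath(repo: string, homeDir: string): string {
  const parts = repo.split("/");
  if (parts.length !== 2 || parts[0] === "" || parts[1] === "") {
    throw new Error(`expected "owner/repo", got "${repo}"`);
  }
  return `${homeDir}/.git-wik/${parts[0]}-${parts[1]}/graph.db`;
}

console.log(graphDbPath("expressjs/express", "/home/me"));
// /home/me/.git-wik/expressjs-express/graph.db
```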
Limitations
- Read-only: git-wik only reads GitHub data. It has no write path.
- Public repos / authenticated access: Uses gh CLI, so it respects your GitHub authentication. Works on private repos if your gh session has access.
- Index freshness: The repo index has a 24-hour TTL. Run git-wik index again to refresh.
- LLM extraction quality: Inferred content can be wrong. The confidence + evidence gate reduces noise but doesn't eliminate it. Always verify inferred items against the actual PR.
- Large repos: The default index fetches up to 200 recent items. Use --depth full for complete history (slower).
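The 24-hour TTL mentioned above amounts to a timestamp comparison. The TypeScript sketch below shows that check in isolation; isIndexStale is a hypothetical helper, and what git-wik actually does when the TTL expires (warn, refetch, or something else) is not specified in this README.

```typescript
const INDEX_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// True when the last index run is older than the TTL.
function isIndexStale(lastIndexedAt: Date, now: Date = new Date()): boolean {
  return now.getTime() - lastIndexedAt.getTime() > INDEX_TTL_MS;
}

const indexedAt = new Date("2024-01-01T00:00:00Z");
console.log(isIndexStale(indexedAt, new Date("2024-01-01T12:00:00Z"))); // false
console.log(isIndexStale(indexedAt, new Date("2024-01-02T00:00:01Z"))); // true
```

A script could use the last indexed timestamp reported by git-wik status as the input and re-run git-wik index when the check returns true.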
Development
git clone https://github.com/your-org/git-wik
cd git-wik
npm install
npm run dev -- index expressjs/express # run CLI without building
npm test # vitest
npm run typecheck # tsc --noEmit
npm run build # compile to dist/
How to test locally (with expected output)
- Environment checks
git-wik doctor
Expected:
- PASS for Node and GitHub CLI
- WARN for missing index or .mcp.json is acceptable on first run
- Build index
git-wik index expressjs/express
Expected:
- progress phases (fetching + co-change)
- final line similar to: Done. <N> nodes · <M> edges · <K> file pairs
- DB path under ~/.git-wik/expressjs-express/graph.db
- Create MCP config
git-wik init-mcp
Expected:
Wrote MCP config to .../.mcp.json
- Explain a file or line
git-wik why src/router/index.js --repo expressjs/express
Expected:
- file header
- PR title/number
- reviewer list and changed files
- optional WHY / constraints / rejected approaches if enriched data exists
- Run quality checks
npm run typecheck
npm test
Expected:
- no TypeScript errors
- all tests passing
Project structure
src/
graph/ SQLite schema, migrations, DB helpers, BM25 FTS
fetcher/ GitHub API wrappers, parse.ts, populate.ts, cochange.ts
extract/ LLM enrichment (Anthropic + Gemini providers)
intelligence/ Retrieval engine: context.ts, context-formatter.ts
mcp/ MCP server + tool handlers
cli/ Commander CLI wrapper
shared/ token-budget.ts, keywords.ts
Key files:
- src/intelligence/context.ts: Hybrid BFS+FTS retrieval, scoring algorithm, seed detection
- src/intelligence/context-formatter.ts: Token-budget greedy formatter
- src/fetcher/parse.ts: All regex parsing (relationships, cross-references)
- src/graph/db.ts: All SQLite node/edge types and upsert/query helpers
