@jchaffin/gh-rag
v0.4.7
Hybrid RAG over GitHub repos: BM25 + Pinecone + cited answers.
import { createGhRag } from "@jchaffin/gh-rag";
const rag = await createGhRag({
openaiApiKey: process.env.OPENAI_API_KEY!,
pinecone: { apiKey: process.env.PINECONE_API_KEY!, index: "repo-chunks" },
});
await rag.ingest({ gitUrl: "https://github.com/owner/repo.git" });
const { text } = await rag.answer({
repo: "owner/repo",
question: "Tell me about the payments project"
});
console.log(text);

Realtime ask (low-latency retrieval)
For voice or streaming clients, pull context snippets fast without generating a full answer:
const rag = await createGhRag({
openaiApiKey: process.env.OPENAI_API_KEY!,
pine: { index: /* Pinecone index handle */ } as any,
});
const snippets = await rag.ask({ repo: "owner/repo", query: "auth flow", limit: 6 });
// Each snippet: { path, start, end, text }

Server endpoint (Fastify):
- Start: `npm run build && npm run start`
- POST `http://localhost:3000/ask` with JSON `{ "repo": "owner/repo", "query": "auth flow", "limit": 6 }` (for GitHub ingests, use `owner/repo`, the same string as the Pinecone namespace)
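A client for the endpoint above can be sketched as follows. This is a minimal sketch, assuming the server listens on `http://localhost:3000` as in the example; `AskBody`, `AskRequest`, and `buildAskRequest` are hypothetical names introduced here and are not part of the package. Treating `limit` as optional and validating it are also assumptions. The response shape is not documented in this README, so the sketch stops at building the request.

```typescript
// Hypothetical helper, not part of @jchaffin/gh-rag: builds the POST /ask request.
interface AskBody {
  repo: string;   // "owner/repo" for GitHub ingests (same string as the Pinecone namespace)
  query: string;
  limit?: number; // assumption: treated as optional here
}

interface AskRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildAskRequest(
  body: AskBody,
  baseUrl: string = "http://localhost:3000", // assumed from the example above
): AskRequest {
  if (!body.repo.trim()) throw new Error("repo is required");
  if (!body.query.trim()) throw new Error("query is required");
  if (body.limit !== undefined && (!Number.isInteger(body.limit) || body.limit < 1)) {
    throw new Error("limit must be a positive integer");
  }
  return {
    // Strip trailing slashes so the path is always exactly "/ask".
    url: `${baseUrl.replace(/\/+$/, "")}/ask`,
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  };
}

const req = buildAskRequest({ repo: "owner/repo", query: "auth flow", limit: 6 });
// With Node 18+ or a browser: const res = await fetch(req.url, req);
```

Keeping request construction separate from the network call makes the payload easy to check before it reaches a voice or streaming client.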
Notes:
- Pinecone namespaces: For GitHub URLs, ingestion uses one namespace per repository, named `owner/repo` (GitHub `full_name` style, e.g. `ProsodyAI/website`). For local paths, the namespace is the repo folder name (or `repoName` if you pass it). Search and `ask` must use that same string. `findBySkill` / cross-repo discovery still queries every namespace returned by index stats (not only the default namespace).
- Set `OPENAI_EMBED_MODEL` to match your ingested index (e.g., `text-embedding-3-small` for speed). Ingestion also respects this.
- In-memory caching smooths identical queries for ~10s; embeddings cache for ~60s.
- Local BM25 index is optional. By default, ingest does not write any local files. To enable BM25 text ranking (used by `ask` when available), either pass `writeBm25: true` to `ingestRepo` / `rag.ingest`, or set `GH_RAG_WRITE_BM25=1` and provide a `workdir` if you don't want `.`.
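The namespace rule in the first note can be illustrated with a small helper. This is a sketch of the naming convention as described above, not the package's actual implementation; `namespaceFor` is a hypothetical name, and the URL pattern it accepts is an assumption.

```typescript
// Illustrative only: mirrors the naming rule described in the notes,
// not the package's actual implementation.
function namespaceFor(source: string, repoName?: string): string {
  // GitHub URL (https or ssh form): use owner/repo, dropping a trailing ".git".
  const gh = source.match(/github\.com[/:]([^/\s]+)\/([^/\s]+?)(?:\.git)?\/?$/);
  if (gh) return `${gh[1]}/${gh[2]}`;
  // Local path: an explicit repoName wins, otherwise the last path segment.
  if (repoName) return repoName;
  const parts = source.split(/[\\/]+/).filter(Boolean);
  const last = parts[parts.length - 1];
  if (!last) throw new Error(`cannot derive a namespace from "${source}"`);
  return last;
}

const ns = namespaceFor("https://github.com/ProsodyAI/website.git");
// ns is the string to pass as `repo` when searching or calling `ask`.
```

Whatever string ingestion used is the one to pass as `repo` later; a mismatch would query a different (possibly empty) namespace.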
CLI
Ask questions from the command line after ingesting a repo into Pinecone.
- Env: set `OPENAI_API_KEY`, `PINECONE_API_KEY`, optional `PINECONE_INDEX` (default `repo-chunks`), optional `GITHUB_TOKEN`.
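If you wrap the CLI in your own script, a pre-flight check of these variables can fail fast with a clear message. The sketch below is an assumption about how such a check might look; `CliEnv` and `resolveCliEnv` are hypothetical names, not exports of the package. Only the variable names and the `repo-chunks` default come from this README.

```typescript
// Hypothetical pre-flight check; the package may validate its env differently.
interface CliEnv {
  openaiApiKey: string;
  pineconeApiKey: string;
  pineconeIndex: string;           // falls back to "repo-chunks"
  githubToken: string | undefined; // optional
}

function resolveCliEnv(env: Record<string, string | undefined>): CliEnv {
  const required = ["OPENAI_API_KEY", "PINECONE_API_KEY"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env: ${missing.join(", ")}`);
  }
  return {
    openaiApiKey: env["OPENAI_API_KEY"] as string,
    pineconeApiKey: env["PINECONE_API_KEY"] as string,
    pineconeIndex: env["PINECONE_INDEX"] || "repo-chunks",
    githubToken: env["GITHUB_TOKEN"] || undefined,
  };
}

// In a Node script: const cfg = resolveCliEnv(process.env);
```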
Examples:
# Build once
npm run build
# Ask (uses env REPO and QUESTION if set)
npm run ask -- --repo owner/repo --question "What does the auth flow look like?"
# With JSON output
npm run ask -- -r owner/repo -q "Key modules?" --json
# If installed globally (after publish or npm link)
gh-rag-ask -r ProsodyAI/prosodyai "How do I run this?"
# Ingest a repo (GitHub URL or local path). After GitHub ingest, `rag.ingest` returns `repo` as owner/repo.
npm run ask -- --repo-url https://github.com/owner/repo.git --repo owner/repo
# Ingest then immediately ask in one command (repo defaults from ingest when omitted after --repo-url)
npm run ask -- --repo-url https://github.com/owner/repo.git -q "What are the core services?"
# Ingest ALL your GitHub repos (requires GITHUB_TOKEN)
# Default filters: excludes forks and archived repos
npm run ingest:all -- --affiliation owner --visibility all --concurrency 2
# Or if installed globally
gh-rag-ingest-all --affiliation owner --visibility all
# Flags:
# --include-forks Include forked repos
# --include-archived Include archived repos
# --dry-run List what would be ingested
# --index <name> Override Pinecone index