github-vec
v1.0.0
Semantic search across GitHub repositories
GitHub READMEs, vectorized.
"Ever searched GitHub for a project you knew existed but couldn't find?"
"You remember the concept, maybe a few keywords, but GitHub search returns nothing."
I got frustrated enough to embed 23M unique GitHub READMEs into a vector database. Now you can search by meaning, not just keywords.
Designed to work with claude-code subagents, keeping contexts lean.
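To make "search by meaning" concrete, here is a toy sketch of ranking items by cosine similarity between embedding vectors. The three-dimensional vectors and repo names are made up for illustration, and cosine similarity is only one common metric. In github-vec the embeddings come from an embedding model and the nearest-neighbor search runs inside Qdrant, not in code like this.

```typescript
// Toy illustration of vector search; vectors and repo names are hypothetical.
type Embedded = { repo: string; vector: number[] };

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k items whose vectors are most similar to the query vector.
function topK(query: number[], items: Embedded[], k: number): Embedded[] {
  return items
    .slice()
    .sort((x, y) => cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector))
    .slice(0, k);
}
```

A brute-force scan like this does not scale to millions of vectors, which is the reason to use a vector database with an approximate nearest-neighbor index.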
CLI
# Install globally
bun install -g github-vec
# Search by meaning
github-vec "vector database for embeddings"
github-vec "lightweight web framework" --limit 20
Options:
-l, --limit <n> - Number of results (default: 10, max: 50)
-h, --help - Show help
Uses the hosted API at https://github-vec.com
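The documented bounds on `--limit` (default 10, max 50) can be sketched as a small resolver. This is a minimal illustration, not the CLI's actual code: `resolveLimit` is a hypothetical helper, and clamping oversized values to 50 (rather than rejecting them) is an assumption.

```typescript
// Bounds come from the options list above; the clamping behaviour is an assumption.
const DEFAULT_LIMIT = 10;
const MAX_LIMIT = 50;

// Hypothetical helper: turn the raw --limit argument into a usable number.
function resolveLimit(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_LIMIT; // flag not given
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) {
    throw new Error(`--limit must be a positive integer, got "${raw}"`);
  }
  return Math.min(n, MAX_LIMIT); // cap at the documented maximum
}
```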
Why use this
"Someone already made something like your project. You just can't find it."
"Stop reinventing. Start finding."
Setup
./setup.sh # Install deps + Qdrant
qdrant # Start Qdrant server (in separate terminal)
bun scripts/ingest.ts # Ingest READMEs into Qdrant
Requires:
DEEPINFRA_API_KEY - for embeddings
DATA_DIR - path to data directory (default: /home/root/data)
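A minimal sketch of how a script might resolve these environment variables. `loadConfig` and `IngestConfig` are hypothetical names, and the real `scripts/ingest.ts` may read its configuration differently. `QDRANT_URL` is the variable used in the production ingest command later in this README; treating `http://localhost:6333` as its fallback is an assumption based on the local instance being described as the default.

```typescript
// Sketch only: the real scripts may read configuration differently.
type IngestConfig = {
  deepinfraApiKey: string;
  dataDir: string;
  qdrantUrl: string;
};

function loadConfig(env: Record<string, string | undefined>): IngestConfig {
  const deepinfraApiKey = env.DEEPINFRA_API_KEY;
  if (!deepinfraApiKey) {
    // Embeddings cannot be requested without a key, so fail early.
    throw new Error("DEEPINFRA_API_KEY is required for embeddings");
  }
  return {
    deepinfraApiKey,
    dataDir: env.DATA_DIR ?? "/home/root/data", // documented default
    qdrantUrl: env.QDRANT_URL ?? "http://localhost:6333", // assumed default
  };
}
```

In a Bun script this could be called as `loadConfig(process.env)`.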
Qdrant Servers
| Server | URL | Description |
|--------|-----|-------------|
| Local | http://localhost:6333 | Default development instance |
| Production | http://db.todofor.ai:6333 | Remote production instance |
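For orientation, this is roughly what a vector search request against either server looks like when using Qdrant's REST search endpoint (`POST /collections/{name}/points/search`). `buildSearchRequest` is a hypothetical helper that only builds the request and does not send it. The collection name is an assumption, since this README does not name it.

```typescript
// Hypothetical helper: builds (but does not send) a Qdrant REST search request.
// The collection name is supplied by the caller; this README does not state it.
type SearchRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

function buildSearchRequest(
  baseUrl: string,
  collection: string,
  vector: number[],
  limit: number = 10,
): SearchRequest {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${root}/collections/${encodeURIComponent(collection)}/points/search`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // with_payload asks Qdrant to return stored fields alongside scores
      body: JSON.stringify({ vector, limit, with_payload: true }),
    },
  };
}
```

The result can be passed to `fetch(req.url, req.init)`. The query vector must come from the same embedding model that was used at ingest time.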
To ingest to production:
QDRANT_URL="http://db.todofor.ai:6333" bun scripts/ingest.ts
To sync local storage to production (stops remote Qdrant, rsyncs, restarts):
./scripts/sync-qdrant.sh
Data
| Property | Value |
|----------|-------|
| Records | 23M unique READMEs (100M+ with forks) |
| Size | ~350 GB |
| Source | BigQuery bigquery-public-data.github_repos |
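The gap between 23M unique READMEs and 100M+ rows with forks comes from identical content appearing in many repositories. A minimal sketch of deduplication, assuming rows are keyed on `content_hash` and the first repository seen is kept; the actual pipeline may dedupe in BigQuery or pick a different representative. `ReadmeRow` and `dedupeByHash` are hypothetical names.

```typescript
// Assumption: one row per (repo, README); duplicates share a content_hash.
type ReadmeRow = { content_hash: string; repo_name: string; content: string };

function dedupeByHash(rows: ReadmeRow[]): ReadmeRow[] {
  const seen = new Map<string, ReadmeRow>();
  for (const row of rows) {
    // Keep the first row for each hash; later forks are skipped.
    if (!seen.has(row.content_hash)) seen.set(row.content_hash, row);
  }
  return Array.from(seen.values());
}
```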
Schema:
{"content_hash": "9d6a7cca...", "repo_name": "owner/repo", "content": "# Title\n..."}
| Field | Type | Description |
|-------|------|-------------|
| content_hash | string | SHA-1 hash (unique ID) |
| repo_name | string | GitHub repo owner/repo |
| content | string | Raw README.md markdown |
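The schema maps directly onto a TypeScript type. The sketch below adds a small parser, assuming records are stored one JSON object per line; the storage layout and the validation rules (lowercase 40-character hex hash, a single `/` in `repo_name`) are assumptions, and `parseReadmeRecord` is a hypothetical helper.

```typescript
// Shape taken from the schema table; the validation rules are assumptions.
interface ReadmeRecord {
  content_hash: string; // SHA-1 hash, 40 hex characters
  repo_name: string; // "owner/repo"
  content: string; // raw README.md markdown
}

function parseReadmeRecord(line: string): ReadmeRecord {
  const value: unknown = JSON.parse(line);
  if (typeof value !== "object" || value === null) {
    throw new Error("record must be a JSON object");
  }
  const { content_hash, repo_name, content } = value as Record<string, unknown>;
  if (typeof content_hash !== "string" || !/^[0-9a-f]{40}$/.test(content_hash)) {
    throw new Error("content_hash must be a 40-character hex SHA-1");
  }
  if (typeof repo_name !== "string" || !/^[^\/]+\/[^\/]+$/.test(repo_name)) {
    throw new Error("repo_name must look like owner/repo");
  }
  if (typeof content !== "string") {
    throw new Error("content must be a string");
  }
  return { content_hash, repo_name, content };
}
```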
Sample:
{
"content_hash": "9d6a7cca12ed5fc9831fec6d97fed2e88b1bb884",
"repo_name": "nyc-squirrels-2015/dbc_pair_mate_v2",
"content": "# dbc_pair_mate_v2\nThis a verion 2 of the dbc pair mate ported to Rails.\n"
}
Pull data (optional)
To re-pull from BigQuery (~$16):
bun scripts/pull-readmes.ts ./data