# ghcrawl

ghcrawl terminal UI and CLI runtime (v0.6.0)
ghcrawl is a local-first GitHub issue and pull request crawler for maintainers.

## Install

Install the published CLI package:

```sh
npm install -g ghcrawl
```

That package exposes the `ghcrawl` command directly.
If you are working from source or maintaining the repo, see CONTRIBUTING.md.
## Requirements
Normal ghcrawl use needs both:
- a GitHub personal access token
- an OpenAI API key
GitHub is required to crawl issue and PR data. OpenAI is required for embeddings and the maintainer clustering and search workflow. If you already have a populated local DB you can still browse it without live keys, but a fresh sync + embed + cluster or refresh run needs both.
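The 1Password wrapper later in this README exports both keys as the environment variables `GITHUB_TOKEN` and `OPENAI_API_KEY`. As a minimal sketch (the function name is ours, not part of ghcrawl), you can preflight a live run by confirming both variables are set before calling anything that touches remote services:

```sh
# Hypothetical preflight helper: fail fast if either key is missing
# from the environment before a sync/embed/refresh run.
check_ghcrawl_env() {
  local missing=0
  for var in GITHUB_TOKEN OPENAI_API_KEY; do
    # ${!var} is bash indirect expansion: the value of the named variable.
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

This only checks presence, not validity; `ghcrawl doctor` is the real check, since it also does shape validation and live auth smoke tests.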
## Quick Start

```sh
ghcrawl init
ghcrawl doctor
ghcrawl refresh owner/repo
ghcrawl tui owner/repo
```

`ghcrawl init` runs the setup wizard. It can either:
- save plaintext keys in `~/.config/ghcrawl/config.json`
- or guide you through a 1Password CLI (`op`) setup that keeps keys out of the config file

`ghcrawl refresh owner/repo` is the main pipeline command. It pulls the latest open GitHub issues and pull requests, refreshes embeddings for changed items, and rebuilds the clusters you browse in the TUI.
## Typical Commands

```sh
ghcrawl doctor
ghcrawl refresh owner/repo
ghcrawl tui owner/repo
```

`refresh`, `sync`, and `embed` call remote services and should be run intentionally.
`cluster` does not call remote services, but it is still time consuming. On a repo with roughly 12k issues and PRs, a full cluster rebuild can take around 10 minutes.
`clusters` explores the clusters already stored in the local SQLite database and is expected to be the fast, read-only inspection path.
## Refresh Command Example

```sh
ghcrawl refresh owner/repo
```
## TUI Screenshots

| User open issue/PR list modal | Refresh modal |
| --- | --- |
| *(screenshot)* | *(screenshot)* |
| Press `u` to open the current user's issue and PR list modal. | Press `g` to open the GitHub/embed/cluster refresh modal. |

| Closed members in a cluster | Fully closed cluster |
| --- | --- |
| *(screenshot)* | *(screenshot)* |
| Closed members stay visible in gray so overlap is still easy to inspect. | A cluster with no open members is grayed out as a whole until you hide closed items. |

Press `l` on wide screens to toggle the stacked layout with the cluster list on the left and members/detail stacked on the right.
## Controlling The Refresh Flow More Intentionally

Most users should run `ghcrawl refresh owner/repo` and let it do the full pipeline in the right order.
If you need tighter control, you can run the three stages yourself:

```sh
ghcrawl sync owner/repo    # pull the latest open issues and pull requests from GitHub
ghcrawl embed owner/repo   # generate or refresh OpenAI embeddings for changed items
ghcrawl cluster owner/repo # rebuild local related-work clusters from the current vectors
```

Run them in that order. `cluster` is local-only, but can take ~10 minutes on a ~12k issue/PR repo. `refresh` is just the safe convenience command that performs the same sequence for you.
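The staged sequence above can be wrapped in a small fail-fast script. This is a sketch of our own, not something ghcrawl ships: the `GHCRAWL_BIN` override is a convention introduced here purely so the wrapper can be dry-run with `echo` instead of the real binary.

```sh
# Run sync -> embed -> cluster in order, stopping at the first failure
# so a failed GitHub pull never feeds stale data into embed/cluster.
# Set GHCRAWL_BIN=echo to preview the commands without calling anything.
run_refresh_stages() {
  local repo="$1"
  local bin="${GHCRAWL_BIN:-ghcrawl}"
  for stage in sync embed cluster; do
    echo "==> $bin $stage $repo"
    "$bin" "$stage" "$repo" || return
  done
}
```

Dry-run usage: `GHCRAWL_BIN=echo run_refresh_stages owner/repo` prints each stage command without touching GitHub or OpenAI. In practice this buys you nothing over `ghcrawl refresh`; it is only useful when you want to insert your own steps between stages.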
## Init And Doctor

First run:

```sh
ghcrawl init
ghcrawl doctor
```

`init` behavior:
- prompts you to choose one of two secret-storage modes:
  - plaintext: saves both keys to `~/.config/ghcrawl/config.json`
  - 1Password CLI: stores only vault and item metadata and tells you how to run `ghcrawl` through `op`
- if you choose plaintext storage, `init` warns that anyone who can read that file can use your keys and that resulting API charges are your responsibility
- if you choose 1Password CLI mode, `init` tells you to create a Secure Note with concealed fields named `GITHUB_TOKEN` and `OPENAI_API_KEY`
GitHub token guidance:
- recommended: a fine-grained PAT scoped to the repositories you want to crawl
- repository permissions: Metadata: Read-only, Issues: Read-only, Pull requests: Read-only
- if you use a classic PAT and need private repositories, `repo` is the safe fallback scope
`doctor` checks:
- config file presence and path
- local DB path wiring
- GitHub token presence, token-shape validation, and a live auth smoke check
- OpenAI key presence, key-shape validation, and a live auth smoke check
- if `init` is configured for 1Password CLI but you forgot to run through your `op` wrapper, `doctor` tells you that explicitly
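To make "token-shape validation" concrete, here is an illustrative version of the kind of prefix check a doctor-style command can do. These are the publicly documented prefix conventions (`ghp_` for classic GitHub PATs, `github_pat_` for fine-grained ones, `sk-` for OpenAI keys); ghcrawl's actual checks may be stricter or different.

```sh
# Rough shape checks mirroring (not reproducing) what `doctor` does.
# A passing shape check says nothing about whether the key actually
# authenticates; that needs the live smoke check.
looks_like_github_token() {
  case "$1" in
    ghp_*|github_pat_*) return 0 ;;  # classic / fine-grained PAT prefixes
    *) return 1 ;;
  esac
}

looks_like_openai_key() {
  case "$1" in
    sk-*) return 0 ;;
    *) return 1 ;;
  esac
}
```

The value of a shape check is catching swapped or truncated keys instantly, before any network round trip spends your rate limit or money.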
## 1Password CLI Example

If you choose 1Password CLI mode, create a 1Password Secure Note with concealed fields named exactly `GITHUB_TOKEN` and `OPENAI_API_KEY`.

Then add this wrapper to `~/.zshrc`:

```sh
ghcrawl-op() {
  env GITHUB_TOKEN="$(op read 'op://Private/ghcrawl/GITHUB_TOKEN')" \
      OPENAI_API_KEY="$(op read 'op://Private/ghcrawl/OPENAI_API_KEY')" \
      ghcrawl "$@"
}
```

Then use:

```sh
ghcrawl-op doctor
ghcrawl-op refresh owner/repo
ghcrawl-op tui owner/repo
```

## Using The CLI To Extract JSON Data
These commands are intended more for scripts, bots, and agent integrations than for normal day-to-day terminal browsing:
```sh
ghcrawl threads owner/repo --numbers 42,43,44
ghcrawl threads owner/repo --numbers 42,43,44 --include-closed
ghcrawl author owner/repo --login lqquan
ghcrawl close-thread owner/repo --number 42
ghcrawl close-cluster owner/repo --id 123
ghcrawl clusters owner/repo --min-size 10 --limit 20
ghcrawl clusters owner/repo --min-size 10 --limit 20 --include-closed
ghcrawl cluster-detail owner/repo --id 123
ghcrawl cluster-detail owner/repo --id 123 --include-closed
ghcrawl search owner/repo --query "download stalls"
```

Use `threads --numbers ...` when you want several specific issue or PR records in one CLI call instead of paying process startup overhead repeatedly.
Use `author --login ...` when you want all currently open issue/PR records from one user plus the strongest stored same-author similarity match for each item.
By default, JSON list commands filter out locally closed issues/PRs and completely closed clusters. Use `--include-closed` when you need to inspect those records too.
Use `close-thread` when you know a local issue/PR should be treated as closed before the next GitHub sync catches up. If that was the last open item in its cluster, ghcrawl automatically marks the cluster closed too.
Use `close-cluster` when you want to locally suppress a whole cluster from default JSON exploration without waiting for a rebuild.
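When a script collects issue numbers incrementally, it is worth joining them into one `--numbers` argument rather than looping over single-number calls. A sketch of that batching pattern (the `batch_threads` helper and `GHCRAWL_BIN` dry-run override are ours, not part of ghcrawl):

```sh
# Build one comma-separated --numbers argument from the remaining
# arguments, so several records come back in a single CLI call
# instead of paying process startup once per number.
batch_threads() {
  local repo="$1"; shift
  local joined
  joined="$(IFS=,; echo "$*")"   # e.g. 42 43 44 -> "42,43,44"
  "${GHCRAWL_BIN:-ghcrawl}" threads "$repo" --numbers "$joined"
}
```

Usage: `batch_threads owner/repo 42 43 44`, or `GHCRAWL_BIN=echo batch_threads owner/repo 42 43 44` to preview the command it would run.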
## Cost To Operate

The main variable cost is OpenAI embeddings. Current model pricing is published on OpenAI's API pricing page.
On a real local run against roughly 12k issues plus about 1.2x related PR and issue inputs, text-embedding-3-large came out to about $0.65 USD total to embed the repo. Treat that as an approximate data point for something like ~14k issue and PR inputs, not a hard guarantee.
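A back-of-envelope sanity check of that data point: the inputs-times-tokens-times-price arithmetic, sketched below with assumed numbers. The $0.13 per 1M tokens price and the ~350 average tokens per input are our illustrative assumptions, not figures ghcrawl reports; check OpenAI's current pricing before relying on this.

```sh
# Back-of-envelope embedding cost: inputs x avg tokens x price.
# ASSUMPTIONS (ours, not ghcrawl's): price quoted per 1M tokens,
# average tokens per input passed in by the caller.
estimate_embed_cost() {
  local inputs="$1" avg_tokens="$2" price_per_million="$3"
  awk -v n="$inputs" -v t="$avg_tokens" -v p="$price_per_million" \
    'BEGIN { printf "%.2f\n", n * t / 1e6 * p }'
}

# ~14k inputs at an assumed ~350 tokens each, $0.13 per 1M tokens;
# lands in the same ballpark as the ~$0.65 observed above.
estimate_embed_cost 14000 350 0.13
```

The useful takeaway is the shape of the formula: cost scales linearly with input count and average input length, so `--limit` smoke tests are cheap and full-repo embeds are where the money goes.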
This screenshot is the reference point for that estimate:

*(screenshot)*
## Agent Skill
This repo ships an installable skill at skills/ghcrawl/SKILL.md.
For installation and usage conventions, point users at vercel-labs/skills.
Install the CLI first, then install the skill:

```sh
npm i -g ghcrawl
npx skills add -g pwrdrvr/ghcrawl
```

The skill is built around the stable JSON CLI surface and is intentionally conservative:
- default mode assumes no valid API keys and stays read-only
- API-backed operations only become available after `ghcrawl doctor --json` shows healthy auth
- even then, `refresh`, `sync`, `embed`, and `cluster` should only run when the user explicitly asks for them
- JSON list commands hide locally closed issues/PRs and closed clusters by default unless `--include-closed` is passed

```sh
ghcrawl doctor --json
ghcrawl refresh owner/repo
ghcrawl threads owner/repo --numbers 42,43,44
ghcrawl clusters owner/repo --min-size 10 --limit 20 --sort recent
ghcrawl cluster-detail owner/repo --id 123 --member-limit 20 --body-chars 280
```

## Video Walkthrough
*(video preview)*

GitHub README links cannot force a new tab, but clicking the preview above opens the YouTube walkthrough from the repo page.
The agent and build contract for this repo lives in SPEC.md.
## Current Caveats

- `serve` starts the local HTTP API only. The web UI is not built yet.
- `sync` only pulls open issues and PRs.
- a plain `sync owner/repo` is incremental by default after the first full completed open scan for that repo
- `sync` is metadata-only by default
- `sync --include-comments` enables issue comments, PR reviews, and review comments for deeper context
- `embed` defaults to `text-embedding-3-large`
- `embed` generates separate vectors for `title` and `body`, and also uses stored summary text when present
- `embed` stores an input hash per source kind and will not resubmit unchanged text for re-embedding
- `sync --since` accepts ISO timestamps and relative durations like `15m`, `2h`, `7d`, and `1mo`
- `sync --limit <count>` is the best smoke-test path on a busy repository
- `tui` remembers sort order and min cluster size per repository in the persisted config file
- the TUI shows locally closed threads and clusters in gray; press `x` to hide or show them
- on wide screens, press `l` to toggle between three columns and a wider cluster list with members/detail stacked on the right
- if you add a brand-new repo from the TUI with `p`, ghcrawl runs sync -> embed -> cluster and opens that repo with min cluster size `1+`
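For scripts that compute their own `--since` windows, here is an illustrative converter for those relative duration strings into seconds. This is not ghcrawl's internal parser, and it approximates `1mo` as 30 days, which may not match ghcrawl's interpretation.

```sh
# Convert ghcrawl-style relative durations (15m, 2h, 7d, 1mo) to seconds.
# Illustrative only; "mo" is approximated as 30 days here.
duration_to_seconds() {
  local d="$1" n unit
  case "$d" in
    *mo) n="${d%mo}"; unit=$((30 * 86400)) ;;  # must match before *m
    *m)  n="${d%m}";  unit=60 ;;
    *h)  n="${d%h}";  unit=3600 ;;
    *d)  n="${d%d}";  unit=86400 ;;
    *)   echo "unrecognized duration: $d" >&2; return 1 ;;
  esac
  echo $(( n * unit ))
}
```

Note the ordering trick: `*mo` has to be matched before `*m`, or `1mo` would be misparsed as minutes with a trailing `o`.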
## Responsibility Attestation
By operating ghcrawl, you accept that you, and any employer or organization you operate it for, are fully responsible for:
- obtaining GitHub and OpenAI API keys through legitimate means
- ensuring that your use of this tool complies with the agreements, usage terms, and platform policies that apply to those keys
- storing those API keys securely
- any misuse, theft, unexpected charges, or other consequences resulting from those keys being exposed or abused
- monitoring spend and stopping or reconfiguring the tool if usage is higher than you intended
The creators and contributors of ghcrawl accept no liability for API charges, account actions, policy violations, data loss, or misuse resulting from operation of this tool.

