codebite
v0.7.5
An agentic codebase analysis CLI — explore, understand, and analyze any codebase using LLM agents. Think Claude Code or Cursor, but optimized purely for reading and understanding code, not writing it.
All LLM calls go through the Vercel AI SDK regardless of provider, giving you a unified, streaming-capable agent runtime.
Features
- Multi-provider LLM support — 14 providers including OpenAI, Anthropic, Google, Vercel AI Gateway, and LiteLLM
- Smart agentic loop — agent takes as many steps as needed, uses tools in parallel
- Deep indexing — LLM analyzes each file to extract purpose, per-function summaries, external service integrations, and dependencies — stored in a vector DB for semantic search
- Built-in analysis tools — chunked file reads, child-folder inspection, dependency analysis, git history, semantic search, web search, and Context7 docs lookup
- Persistent chats — create, restore, and list project-local conversations under .codebite/
- Deep mode — exhaustive multi-angle exploration for complex questions
- Parallel subagents — in deep mode the main agent can spawn up to 5 focused subagents in parallel, each independently investigating a sub-question, and then synthesize their findings
- Context-optimized — agent starts with the top-level tree, reads file chunks when needed, and supports full diagnostics logging for troubleshooting and auditing
- Technology agnostic — works with any language and any project structure
Installation
# Global install
npm install -g codebite
# Or run without installing
npx codebite <command>

Requirements: Node.js ≥ 18
Quick Start
# 1. Go to any project
cd /path/to/your-project
# 2. Initialize
codebite init --provider vercel --model openai/gpt-4o-mini --apikey vck_your-key-here
# 3. Or move the API key into local config after init (never committed)
echo '{ "apiKey": "vck_your-key-here" }' > .codebite.local.json
# 4. Index the codebase (optional, but required for semantic search)
codebite index
# 5. Commit the index so your whole team benefits from it
git add .codebite-index/
git commit -m "Add codebase vector index"
# 6. Ask questions
codebite ask "What does this project do and how is it structured?"
codebite ask "Which files integrate with Stripe?"

API Key Management
API keys are never stored in .codebite.json. Instead, use one of these approaches:
Local development — .codebite.local.json
Create .codebite.local.json in your project root (it is gitignored automatically):
{
"apiKey": "vck_your-api-key-here"
}

This file overrides any field in .codebite.json, so you can also use it to override other settings locally (e.g. switch models without touching the committed config). It is used by all commands that load config, including codebite index and codebite ask.
Environment variable — CODEBITE_API_KEY
CODEBITE_API_KEY=vck_your-key codebite ask "What is this project?"

Priority order
Config is resolved in this order (later values win):
1. .codebite.json (committed, no secrets)
2. .codebite.local.json (gitignored, local overrides)
3. CODEBITE_API_KEY environment variable
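A minimal sketch of this precedence, assuming a simple shallow merge where later layers win and the environment variable only supplies apiKey (the helper name and exact merge semantics are illustrative, not the actual codebite source):

```typescript
interface CodebiteConfig {
  provider?: string;
  model?: string;
  apiKey?: string;
  [key: string]: unknown;
}

// Later layers win: committed config < local overrides < env var (apiKey only).
function resolveConfig(
  committed: CodebiteConfig, // .codebite.json
  localOverrides: CodebiteConfig, // .codebite.local.json
  envApiKey?: string, // CODEBITE_API_KEY
): CodebiteConfig {
  const merged: CodebiteConfig = { ...committed, ...localOverrides };
  if (envApiKey) merged.apiKey = envApiKey;
  return merged;
}
```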
Supported Providers
codebite supports 14 providers via the Vercel AI SDK. See docs/providers.md for the full list of models, setup instructions, and embedding support details.
| Provider | --provider | Example model |
|----------|-------------|--------------|
| OpenAI | openai | gpt-4o, gpt-4o-mini |
| Anthropic | anthropic | claude-opus-4-5, claude-haiku-4-5-20251001 |
| Google Gemini | google | gemini-2.0-flash, gemini-2.5-pro |
| Mistral | mistral | mistral-large-latest |
| Vercel AI Gateway | vercel | openai/gpt-4o-mini, anthropic/claude-sonnet-4-5 |
| Groq | groq | llama-3.3-70b-versatile, mixtral-8x7b-32768 |
| xAI (Grok) | xai | grok-3, grok-2-1212 |
| Cohere | cohere | command-r-plus, command-a-03-2025 |
| DeepSeek | deepseek | deepseek-chat, deepseek-reasoner |
| AWS Bedrock | bedrock | anthropic.claude-3-5-sonnet-20241022-v2:0 |
| Azure OpenAI | azure | your deployment name |
| Together AI | togetherai | meta-llama/Llama-3.3-70B-Instruct-Turbo |
| Fireworks AI | fireworks | accounts/fireworks/models/llama-v3p3-70b-instruct |
| LiteLLM | litellm | ollama/llama3, any OpenAI-compatible model |
Vercel AI Gateway
Route all calls through your Vercel AI Gateway — just set provider: vercel and your Vercel API key:
codebite init --provider vercel --model openai/gpt-4o-mini
echo '{ "apiKey": "vck_your-vercel-key" }' > .codebite.local.json

The gateway URL is constructed from two optional environment variables:
VERCEL_TEAM_ID=your-team-slug # defaults to "default"
VERCEL_GATEWAY_NAME=my-gateway # defaults to "default"

Resulting URL: https://gateway.ai.vercel.sh/v1/{VERCEL_TEAM_ID}/{VERCEL_GATEWAY_NAME}
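The URL construction above can be sketched as follows — both segments fall back to "default" when the corresponding env var is unset (the function name is hypothetical):

```typescript
// Build the Vercel AI Gateway URL from the two optional env vars.
function gatewayUrl(teamId?: string, gatewayName?: string): string {
  const team = teamId ?? "default";
  const name = gatewayName ?? "default";
  return `https://gateway.ai.vercel.sh/v1/${team}/${name}`;
}
```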
Configuration
Base settings are stored in .codebite.json in your project root (commit this, but omit apiKey):
{
"provider": "vercel",
"model": "openai/gpt-4o-mini",
"maxSteps": 30,
"deepMode": false
}

Local overrides go in .codebite.local.json (gitignored):
{
"apiKey": "vck_your-api-key-here",
"tools": {
"tavilyApiKey": "tvly-your-tavily-key",
"context7ApiKey": "ctx7-your-key"
}
}

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| provider | Yes | — | Provider name — see docs/providers.md |
| model | Yes | — | Model ID for the chosen provider |
| apiKey | Yes* | — | API key — use .codebite.local.json or CODEBITE_API_KEY |
| baseURL | No | — | Custom base URL (required for azure; defaults to http://localhost:4000 for litellm) |
| maxSteps | No | 30 | Max agent steps per query (1–200) |
| deepMode | No | false | Enable deep mode globally |
| disableSubagents | No | false | Disable subagent spawning in deep mode |
| tools.tavilyApiKey | No | — | Tavily key for web search |
| tools.context7ApiKey | No | — | Context7 key for MCP-backed documentation lookup |
*apiKey must be provided via .codebite.local.json or CODEBITE_API_KEY env var. tools.context7ApiKey can also come from CONTEXT7_API_KEY.
CLI Reference
codebite init
codebite init \
--provider openai \ # provider name (see docs/providers.md)
--model gpt-4o \ # model ID
[--apikey sk-...] \ # LLM API key (prefer .codebite.local.json instead)
[--base-url https://...] \ # optional: custom base URL (LiteLLM defaults to http://localhost:4000)
[--tavily-key tvly-...] \ # optional: enable web search
[--context7-key ctx7-...] \ # optional: enable Context7 MCP docs lookup
[--max-steps 50] \ # optional: override default 30
[--deep] # optional: enable deep mode globally

Shorthand — you can combine provider and model into one flag:
codebite init --model openai/gpt-4o
# ^^^^^^ auto-parsed as provider=openai, model=gpt-4o

codebite index
Analyzes every source file with the LLM and builds a vector index at .codebite-index/.
codebite index

codebite index reads config the same way as ask, so putting apiKey in .codebite.local.json is supported.
How it works:
- Scans all files (respects .gitignore, skips binaries and files > 100 KB)
- LLM analyzes each file and produces a structured analysis:
  - purpose — one-sentence description of the file
  - summary — 2–4 sentence overview of the file's role in the project
  - functions — per-function/method/class descriptions (name + what it does)
  - services — external services/APIs integrated (e.g. "AWS S3", "Azure Notification Hub", "Redis", "Stripe")
  - exports — top-level public API surface
  - dependencies — external packages imported
  - patterns — architectural/design patterns used
- Generates embeddings from the combined analysis text (purpose + summary + services + function descriptions)
- Stores everything in a vectra LocalIndex at .codebite-index/ alongside a meta.json with creation timestamp
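A hypothetical shape for the per-file analysis record and the text that gets embedded (purpose + summary + services + function descriptions, per the steps above). Field names follow this README; the actual stored schema may differ:

```typescript
interface FileAnalysis {
  purpose: string; // one-sentence description
  summary: string; // 2–4 sentence overview
  functions: { name: string; description: string }[];
  services: string[]; // e.g. ["Stripe", "Redis"]
  exports: string[];
  dependencies: string[];
  patterns: string[];
}

// Combine the embeddable fields into a single text blob for the vector DB.
function embeddingText(a: FileAnalysis): string {
  return [
    a.purpose,
    a.summary,
    a.services.join(", "),
    ...a.functions.map((f) => `${f.name}: ${f.description}`),
  ].join("\n");
}
```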
Git storage — commit the index to your repo:
git add .codebite-index/
git commit -m "Add codebase vector index"

Committing .codebite-index/ lets your whole team run semantic search without re-indexing. The index is plain JSON, diffs cleanly, and only needs rebuilding when the codebase changes significantly.
Staleness warning:
codebite ask automatically checks how old the index is. If it is older than 2 weeks, a warning is printed before each query:
⚠ Warning: Codebase index is 18 days old. Run "codebite index" to refresh it.
.codebite/ (chats, local config) stays gitignored. Only .codebite-index/ (the vector DB) should be committed.
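The two-week staleness check can be sketched like this, assuming meta.json stores a creation timestamp in milliseconds (the helper name is illustrative):

```typescript
const TWO_WEEKS_MS = 14 * 24 * 60 * 60 * 1000;

// Returns the warning text if the index is older than two weeks, else null.
function staleWarning(createdAt: number, now: number): string | null {
  const age = now - createdAt;
  if (age <= TWO_WEEKS_MS) return null;
  const days = Math.floor(age / (24 * 60 * 60 * 1000));
  return `⚠ Warning: Codebase index is ${days} days old. Run "codebite index" to refresh it.`;
}
```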
codebite ask
codebite ask "your question"
codebite ask --deep "exhaustive analysis question"
codebite ask --max-steps 60 "complex question on large codebase"
codebite ask --diagnostics "trace this investigation"
# Flags can be combined freely
codebite ask --deep --max-steps 80 --diagnostics "Explain the full auth lifecycle"

| Flag | Description |
|------|-------------|
| --deep | Enable deep analysis mode for this turn (spawns parallel subagents, exhaustive exploration). Overrides the global deepMode setting for one query. |
| --max-steps <n> | Override the maximum number of agent steps for this run (integer 1–200). Takes precedence over the maxSteps field in .codebite.json. Useful for bumping the limit on large codebases or trimming it for quick lookups. |
| --diagnostics [path] | Write a full JSONL event log of the run (see Diagnostics below). |
If an active chat exists, ask continues that conversation automatically and persists the new turn.
Typical flow after creating or restoring a chat:
# Start or restore a chat first
codebite new "auth-review"
# or: codebite restore "auth-review"
# Normal follow-up turns
codebite ask "Where is authentication implemented?"
codebite ask "Now explain the token validation path"
# Deep-mode follow-up turn in the same chat
codebite ask --deep "Give me an exhaustive auth flow analysis"
# Back to normal mode in the same chat
codebite ask "Summarize the auth risks in 5 bullets"

Each ask command appends to the currently active chat, so the agent keeps the earlier conversation context. --deep changes only that one turn unless deep mode is enabled globally in config.
--diagnostics — full run logging
codebite ask --diagnostics "Why does the auth flow break on token refresh?"
# writes to .codebite/diagnostics/adhoc-2026-04-15-....jsonl
codebite ask --diagnostics ./my-run.jsonl "trace this"
# writes to a custom path

When --diagnostics is set, every event during the agentic run is appended as a JSON line to a JSONL file. The log contains:
| Event type | What is captured |
|------------|-----------------|
| run-start | Timestamp, question, system prompt, initial conversation messages, repository structure, config (provider, model, maxSteps, deepMode) |
| step-start | Timestamp, step number, full input context sent to the LLM (system, messages, active tools list, tool choice) |
| step-finish | Timestamp, step number, duration, finish reason, token usage, raw LLM response metadata (id, modelId, headers), generated text, all tool calls with arguments and results, response messages |
| error | Timestamp, error details (name, message, stack trace), context label |
| run-finish | Timestamp, total step count, total token usage, finish reason, final answer text |
The log file path defaults to .codebite/diagnostics/{chatId}-{timestamp}.jsonl (or adhoc-{timestamp}.jsonl outside a chat). Pass an explicit path to write elsewhere.
This is useful for:
- Debugging unexpected agent behavior step-by-step
- Auditing all LLM calls and tool interactions in a run
- Performance analysis — token usage and duration per step
- Replaying a run to understand which tools were called and why
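For the performance-analysis use case above, a log like this is easy to post-process. A minimal sketch that sums token usage across step-finish events, assuming each line is a JSON object with a type field and that step-finish events carry a numeric usage.totalTokens (the real event schema may differ):

```typescript
interface StepFinish {
  type: "step-finish";
  step: number;
  usage?: { totalTokens?: number };
}

// Sum totalTokens across all step-finish events in a JSONL diagnostics log.
function totalTokens(jsonl: string): number {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line))
    .filter((e): e is StepFinish => e.type === "step-finish")
    .reduce((sum, e) => sum + (e.usage?.totalTokens ?? 0), 0);
}
```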
Accuracy and completeness
For questions that enumerate files, integrations, or usages ("which files use X?", "where is Y integrated?", "list all Z"), the agent follows a layered strategy to avoid missing anything:
1. Semantic search + grep in parallel — when a semantic index exists, semantic_search runs alongside grep_search in the first step. grep catches literal keyword matches; semantic search catches files that implement the concept without containing the exact term (e.g. a Redis client wrapper that never uses the word "Redis" in top-level identifiers).
2. Indirect consumer detection — after finding direct matches, the agent greps for import patterns targeting the central module (e.g. import.*provider) to surface files that depend on a capability indirectly. This catches wrappers, adapters, and initializers that are indirect consumers.
3. Dependency cross-reference — the agent verifies that every declared dependency relevant to the question (from dependency_analysis) is represented by at least one file in the answer. Missing entries trigger a deeper investigation rather than a silent omission.
4. Final verification pass — before writing the answer, the agent runs a broad grep with alternative phrasings/synonyms to confirm completeness.
These rules are baked into the agent's system prompt and apply automatically to every ask run — no extra flags needed.
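The core idea of the first layer — union the literal hits with the semantic hits so neither pass can silently drop a file — can be illustrated in a few lines (the function and its inputs are illustrative, not codebite APIs):

```typescript
// Union two lists of file paths, deduplicated and sorted for a stable answer.
function mergeHits(grepHits: string[], semanticHits: string[]): string[] {
  return [...new Set([...grepHits, ...semanticHits])].sort();
}
```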
codebite new
Create a new persistent chat and make it active:
codebite new
codebite new "auth-review"

If you omit the name, the chat starts as Untitled chat and is automatically renamed from the first user message, capped at 100 characters. After codebite new, start talking with that chat by using codebite ask .... You do not need a separate chat command for each reply.
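The auto-naming rule can be sketched as follows — the helper name is hypothetical, and the fallback/cap values are those stated above:

```typescript
// Derive a chat name from the first user message, capped at 100 characters.
function chatNameFrom(firstMessage: string): string {
  const trimmed = firstMessage.trim();
  if (trimmed.length === 0) return "Untitled chat";
  return trimmed.length <= 100 ? trimmed : trimmed.slice(0, 100);
}
```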
codebite restore
Restore an existing chat by name or id:
codebite restore "auth-review"
codebite resture "auth-review" # typo-compatible alias

codebite list
List saved chats for the current project:
codebite list

Example Questions
codebite ask "What does this project do and how is it structured?"
codebite ask "Where is authentication implemented?"
codebite ask "Find all API endpoints and explain what each one does"
codebite ask "What are the gaps in test coverage?"
codebite ask "Explain how the database connection is managed"
codebite ask "What external dependencies are used and what are they for?"
codebite ask "Are there any obvious security concerns?"
# Semantic search shines for service/integration queries (requires codebite index)
codebite ask "Which files integrate with Azure Notification Hub?"
codebite ask "Where is Stripe used and what does each integration do?"
codebite ask "Show me everything that touches Redis"
codebite ask "Which modules send emails?"
# Deep mode — exhaustive, cross-referenced analysis
codebite ask --deep "Explain the full request lifecycle from HTTP to database"
codebite ask --deep "Find security vulnerabilities in this codebase"
codebite ask --deep "What design patterns are used and are they applied consistently?"
# Deep mode can delegate independent sub-investigations to parallel subagents
codebite ask --deep "Analyze this codebase from three angles: auth flow, test coverage, and external integrations"
# ^ the main agent may spawn parallel subagents for each angle and synthesize the results

Running Against Any Project
# Clone any open-source project
git clone https://github.com/expressjs/express /tmp/express
cd /tmp/express
# Initialize (no API key in config)
codebite init --provider vercel --model openai/gpt-4o-mini
# Add your API key locally
echo '{ "apiKey": "vck_your-key" }' > .codebite.local.json
# Build semantic index (optional but recommended)
codebite index
# Commit the index for team-wide semantic search
git add .codebite-index/
git commit -m "Add codebase vector index"
# Ask away
codebite ask "How does Express handle middleware chains?"
codebite ask "How are route parameters extracted?"
codebite ask --deep "Explain the full request-response cycle"

Large Codebases
The agent handles large projects automatically:
- Uses glob_search + grep_search to narrow scope before reading files
- Starts with the root tree up to 2 levels deep already in context
- Reads files in focused chunks with read_file_chunk or read_file offset/limit navigation
- Uses folder_children for one-level folder inspection without recursive noise
- Uses semantic_search to jump to relevant files by concept
- Summarizes findings progressively — never holds entire files in context
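The offset/limit navigation above amounts to slicing a file's lines into windows. A minimal illustration (the function is a sketch of the idea, not the read_file_chunk implementation):

```typescript
// Return `limit` lines starting at line index `offset` (0-based).
function readChunk(content: string, offset: number, limit: number): string[] {
  return content.split("\n").slice(offset, offset + limit);
}
```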
For very deep analyses on large repos, increase --max-steps:
codebite ask --max-steps 80 "Explain the entire auth system"

Agent Tools
| Tool | What it does |
|------|-------------|
| read_file | Read file contents with line numbers, offset and limit |
| read_file_chunk | Read a smaller targeted slice of a file for tighter context control |
| glob_search | Find files by pattern (**/*.ts, src/**/*.test.js) |
| grep_search | Search file contents by text or regex with surrounding context |
| directory_tree | Show project structure (respects .gitignore) |
| list_directory | List direct child files and folders in a directory |
| folder_children | Alias focused on one-level folder structure only |
| file_stats | File size, line count, language detection |
| get_cwd | Get project root path |
| shell_command | Read-only git commands (git log, git blame, git diff, …) |
| dependency_analysis | Parse package.json, go.mod, Cargo.toml, requirements.txt, … |
| semantic_search | Find files by semantic meaning — matches on purpose, functions, and service integrations (requires codebite index) |
| web_search | Search the web for docs and library info (requires Tavily key) |
| context7_docs | Query up-to-date docs via Context7 MCP (requires Context7 key) |
| spawn_subagents | Spawn 1–5 parallel subagents for independent investigations (deep mode only) |
The agent calls tools in parallel when independent — a native feature of the Vercel AI SDK.
Development
npm install
npm run build # tsc → dist/
npm test # vitest run (unit tests only)
npm run test:e2e # integration tests (requires VERCEL_API_KEY)
npm run test:watch # watch mode
npm link # optional: point the global `codebite` command at this checkout
# Run without building (dev mode)
npx tsx src/cli.ts ask "What is this project?"

Local API key for development
Create .codebite.local.json in the project root (it is gitignored):
{
"apiKey": "vck_your-local-key"
}

This file is merged on top of .codebite.json at runtime. You can also override any other config field here.
Troubleshooting local config
If codebite says apiKey is missing even though .codebite.local.json exists:
- Confirm you are running the expected binary with Get-Command codebite in PowerShell.
- A stale global install can point to an older package version that does not match your local checkout.
- From this repo, npm link will repoint the global codebite command at the current source tree.
- As a fallback during development, run node .\dist\cli.js <command> or npx tsx src/cli.ts <command> from the repo root.
CI / CD
PR checks
Every pull request targeting main runs two jobs automatically via GitHub Actions:
- Unit Tests — npm test (fast, no API key needed)
- E2E Tests — npm run test:e2e (calls the live LLM; requires VERCEL_API_KEY secret)
The E2E test asks "What is this repo?" against the actual codebase and asserts the answer contains relevant keywords (codebase, codebite, cli, agent, llm, or analysis).
Publishing to npm
Push a version tag to trigger automatic publishing:
npm version patch # or minor / major
git push --follow-tags

The publish workflow runs unit tests, builds, then publishes to npm using the NPM_TOKEN secret.
Required GitHub Secrets
| Secret | Used by | Description |
|--------|---------|-------------|
| NPM_TOKEN | publish workflow | npm automation token with publish rights |
| VERCEL_API_KEY | CI e2e workflow | Vercel AI Gateway key for live LLM calls |
Ignoring Files
Both the agent and indexer respect:
- Your project's .gitignore
- Always ignored (never scanned or shown in directory trees): node_modules, .git, .codebite, .codebite-index, dist, build, coverage, __pycache__, target, vendor
.codebite-index/ is excluded from scanning even though it is committed to git — it is a database artifact, not source code.
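A sketch of the always-ignored check, using the directory names listed above (the matching logic is illustrative — codebite's actual scanner may differ):

```typescript
const ALWAYS_IGNORED = new Set([
  "node_modules", ".git", ".codebite", ".codebite-index",
  "dist", "build", "coverage", "__pycache__", "target", "vendor",
]);

// True if any path segment is one of the always-ignored directory names.
function isAlwaysIgnored(relPath: string): boolean {
  return relPath.split(/[\\/]/).some((part) => ALWAYS_IGNORED.has(part));
}
```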
License
MIT
