
codebite

v0.7.5
An agentic codebase analysis CLI — explore, understand, and analyze any codebase using LLM agents. Think Claude Code or Cursor, but optimized purely for reading and understanding code, not writing it.

All LLM calls go through the Vercel AI SDK regardless of provider, giving you a unified, streaming-capable agent runtime.

Features

  • Multi-provider LLM support — 14 providers including OpenAI, Anthropic, Google, Vercel AI Gateway, and LiteLLM
  • Smart agentic loop — agent takes as many steps as needed, uses tools in parallel
  • Deep indexing — LLM analyzes each file to extract purpose, per-function summaries, external service integrations, and dependencies — stored in a vector DB for semantic search
  • Built-in analysis tools — chunked file reads, child-folder inspection, dependency analysis, git history, semantic search, web search, and Context7 docs lookup
  • Persistent chats — create, restore, and list project-local conversations under .codebite/
  • Deep mode — exhaustive multi-angle exploration for complex questions
  • Parallel subagents — in deep mode the main agent can spawn up to 5 focused subagents in parallel, each independently investigating a sub-question, and then synthesize their findings
  • Context-optimized — agent starts with the top-level tree, reads file chunks when needed, and supports full diagnostics logging for troubleshooting and auditing
  • Technology agnostic — works with any language and any project structure

Installation

# Global install
npm install -g codebite

# Or run without installing
npx codebite <command>

Requirements: Node.js ≥ 18

Quick Start

# 1. Go to any project
cd /path/to/your-project

# 2. Initialize
codebite init --provider vercel --model openai/gpt-4o-mini --apikey vck_your-key-here

# 3. Or move the API key into local config after init (never committed)
echo '{ "apiKey": "vck_your-key-here" }' > .codebite.local.json

# 4. Index the codebase (optional, but required for semantic search)
codebite index

# 5. Commit the index so your whole team benefits from it
git add .codebite-index/
git commit -m "Add codebase vector index"

# 6. Ask questions
codebite ask "What does this project do and how is it structured?"
codebite ask "Which files integrate with Stripe?"

API Key Management

API keys are never stored in .codebite.json. Instead, use one of these approaches:

Local development — .codebite.local.json

Create .codebite.local.json in your project root (it is gitignored automatically):

{
  "apiKey": "vck_your-api-key-here"
}

This file overrides any field in .codebite.json, so you can also use it to override other settings locally (e.g. switch models without touching the committed config). It is used by all commands that load config, including codebite index and codebite ask.

Environment variable — CODEBITE_API_KEY

CODEBITE_API_KEY=vck_your-key codebite ask "What is this project?"

Priority order

Config is resolved in this order (later values win):

  1. .codebite.json (committed, no secrets)
  2. .codebite.local.json (gitignored, local overrides)
  3. CODEBITE_API_KEY environment variable
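
To make the precedence concrete, here is a sketch of a committed config plus a local override (the provider, model, and key values are illustrative):

.codebite.json (committed):

```json
{
  "provider": "vercel",
  "model": "openai/gpt-4o-mini"
}
```

.codebite.local.json (gitignored — its model wins over the committed one, and CODEBITE_API_KEY would in turn win over its apiKey):

```json
{
  "model": "anthropic/claude-sonnet-4-5",
  "apiKey": "vck_your-key-here"
}
```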

Supported Providers

codebite supports 14 providers via the Vercel AI SDK. See docs/providers.md for the full list of models, setup instructions, and embedding support details.

| Provider | --provider | Example model |
|----------|------------|---------------|
| OpenAI | openai | gpt-4o, gpt-4o-mini |
| Anthropic | anthropic | claude-opus-4-5, claude-haiku-4-5-20251001 |
| Google Gemini | google | gemini-2.0-flash, gemini-2.5-pro |
| Mistral | mistral | mistral-large-latest |
| Vercel AI Gateway | vercel | openai/gpt-4o-mini, anthropic/claude-sonnet-4-5 |
| Groq | groq | llama-3.3-70b-versatile, mixtral-8x7b-32768 |
| xAI (Grok) | xai | grok-3, grok-2-1212 |
| Cohere | cohere | command-r-plus, command-a-03-2025 |
| DeepSeek | deepseek | deepseek-chat, deepseek-reasoner |
| AWS Bedrock | bedrock | anthropic.claude-3-5-sonnet-20241022-v2:0 |
| Azure OpenAI | azure | your deployment name |
| Together AI | togetherai | meta-llama/Llama-3.3-70B-Instruct-Turbo |
| Fireworks AI | fireworks | accounts/fireworks/models/llama-v3p3-70b-instruct |
| LiteLLM | litellm | ollama/llama3, any OpenAI-compatible model |

Vercel AI Gateway

Route all calls through your Vercel AI Gateway — just set provider: vercel and your Vercel API key:

codebite init --provider vercel --model openai/gpt-4o-mini
echo '{ "apiKey": "vck_your-vercel-key" }' > .codebite.local.json

The gateway URL is constructed from two optional environment variables:

VERCEL_TEAM_ID=your-team-slug       # defaults to "default"
VERCEL_GATEWAY_NAME=my-gateway      # defaults to "default"

Resulting URL: https://gateway.ai.vercel.sh/v1/{VERCEL_TEAM_ID}/{VERCEL_GATEWAY_NAME}

Configuration

Base settings are stored in .codebite.json in your project root (commit this, but omit apiKey):

{
  "provider": "vercel",
  "model": "openai/gpt-4o-mini",
  "maxSteps": 30,
  "deepMode": false
}

Local overrides go in .codebite.local.json (gitignored):

{
  "apiKey": "vck_your-api-key-here",
  "tools": {
    "tavilyApiKey": "tvly-your-tavily-key",
    "context7ApiKey": "ctx7-your-key"
  }
}

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| provider | Yes | — | Provider name — see docs/providers.md |
| model | Yes | — | Model ID for the chosen provider |
| apiKey | Yes* | — | API key — use .codebite.local.json or CODEBITE_API_KEY |
| baseURL | No | — | Custom base URL (required for azure; defaults to http://localhost:4000 for litellm) |
| maxSteps | No | 30 | Max agent steps per query (1–200) |
| deepMode | No | false | Enable deep mode globally |
| disableSubagents | No | false | Disable subagent spawning in deep mode |
| tools.tavilyApiKey | No | — | Tavily key for web search |
| tools.context7ApiKey | No | — | Context7 key for MCP-backed documentation lookup |

*apiKey must be provided via .codebite.local.json or CODEBITE_API_KEY env var. tools.context7ApiKey can also come from CONTEXT7_API_KEY.

CLI Reference

codebite init

codebite init \
  --provider openai \         # provider name (see docs/providers.md)
  --model gpt-4o \            # model ID
  [--apikey sk-...] \         # LLM API key (prefer .codebite.local.json instead)
  [--base-url https://...] \  # optional: custom base URL (LiteLLM defaults to http://localhost:4000)
  [--tavily-key tvly-...] \   # optional: enable web search
  [--context7-key ctx7-...] \ # optional: enable Context7 MCP docs lookup
  [--max-steps 50] \          # optional: override default 30
  [--deep]                    # optional: enable deep mode globally

Shorthand — you can combine provider and model into one flag:

codebite init --model openai/gpt-4o
#                     ^^^^^^ auto-parsed as provider=openai, model=gpt-4o

codebite index

Analyzes every source file with the LLM and builds a vector index at .codebite-index/.

codebite index

codebite index reads config the same way as ask, so putting apiKey in .codebite.local.json is supported.

How it works:

  1. Scans all files (respects .gitignore, skips binaries and files > 100 KB)
  2. LLM analyzes each file and produces a structured analysis:
    • purpose — one-sentence description of the file
    • summary — 2–4 sentence overview of the file's role in the project
    • functions — per-function/method/class descriptions (name + what it does)
    • services — external services/APIs integrated (e.g. "AWS S3", "Azure Notification Hub", "Redis", "Stripe")
    • exports — top-level public API surface
    • dependencies — external packages imported
    • patterns — architectural/design patterns used
  3. Generates embeddings from the combined analysis text (purpose + summary + services + function descriptions)
  4. Stores everything in a vectra LocalIndex at .codebite-index/ alongside a meta.json with creation timestamp
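
As an illustration of the analysis fields listed above, a single index entry might conceptually look like the following. This is a sketch, not the exact on-disk vectra record format, and the file described is hypothetical:

```json
{
  "purpose": "Thin wrapper around the Stripe SDK for creating and refunding charges",
  "summary": "Centralizes all Stripe calls so the rest of the codebase never imports the SDK directly. Maps SDK errors to domain errors.",
  "functions": [
    { "name": "createCharge", "description": "Creates a Stripe charge for an order" },
    { "name": "refundCharge", "description": "Refunds a previously created charge" }
  ],
  "services": ["Stripe"],
  "exports": ["createCharge", "refundCharge"],
  "dependencies": ["stripe"],
  "patterns": ["adapter"]
}
```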

Git storage — commit the index to your repo:

git add .codebite-index/
git commit -m "Add codebase vector index"

Committing .codebite-index/ lets your whole team run semantic search without re-indexing. The index is plain JSON, diffs cleanly, and only needs rebuilding when the codebase changes significantly.

Staleness warning:

codebite ask automatically checks how old the index is. If it is older than 2 weeks, a warning is printed before each query:

⚠ Warning: Codebase index is 18 days old. Run "codebite index" to refresh it.

.codebite/ (chats, local config) stays gitignored. Only .codebite-index/ (the vector DB) should be committed.

codebite ask

codebite ask "your question"
codebite ask --deep "exhaustive analysis question"
codebite ask --max-steps 60 "complex question on large codebase"
codebite ask --diagnostics "trace this investigation"

# Flags can be combined freely
codebite ask --deep --max-steps 80 --diagnostics "Explain the full auth lifecycle"

| Flag | Description |
|------|-------------|
| --deep | Enable deep analysis mode for this turn (spawns parallel subagents, exhaustive exploration). Overrides the global deepMode setting for one query. |
| --max-steps <n> | Override the maximum number of agent steps for this run (integer 1–200). Takes precedence over the maxSteps field in .codebite.json. Useful for bumping the limit on large codebases or trimming it for quick lookups. |
| --diagnostics [path] | Write a full JSONL event log of the run (see Diagnostics below). |

If an active chat exists, ask continues that conversation automatically and persists the new turn.

Typical flow after creating or restoring a chat:

# Start or restore a chat first
codebite new "auth-review"
# or: codebite restore "auth-review"

# Normal follow-up turns
codebite ask "Where is authentication implemented?"
codebite ask "Now explain the token validation path"

# Deep-mode follow-up turn in the same chat
codebite ask --deep "Give me an exhaustive auth flow analysis"

# Back to normal mode in the same chat
codebite ask "Summarize the auth risks in 5 bullets"

Each ask command appends to the currently active chat, so the agent keeps the earlier conversation context. --deep changes only that one turn unless deep mode is enabled globally in config.

--diagnostics — full run logging

codebite ask --diagnostics "Why does the auth flow break on token refresh?"
# writes to .codebite/diagnostics/adhoc-2026-04-15-....jsonl

codebite ask --diagnostics ./my-run.jsonl "trace this"
# writes to a custom path

When --diagnostics is set, every event during the agentic run is appended as a JSON line to a JSONL file. The log contains:

| Event type | What is captured |
|------------|------------------|
| run-start | Timestamp, question, system prompt, initial conversation messages, repository structure, config (provider, model, maxSteps, deepMode) |
| step-start | Timestamp, step number, full input context sent to the LLM (system, messages, active tools list, tool choice) |
| step-finish | Timestamp, step number, duration, finish reason, token usage, raw LLM response metadata (id, modelId, headers), generated text, all tool calls with arguments and results, response messages |
| error | Timestamp, error details (name, message, stack trace), context label |
| run-finish | Timestamp, total step count, total token usage, finish reason, final answer text |

The log file path defaults to .codebite/diagnostics/{chatId}-{timestamp}.jsonl (or adhoc-{timestamp}.jsonl outside a chat). Pass an explicit path to write elsewhere.

This is useful for:

  • Debugging unexpected agent behavior step-by-step
  • Auditing all LLM calls and tool interactions in a run
  • Performance analysis — token usage and duration per step
  • Replaying a run to understand which tools were called and why
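
Because each event is one JSON line, the log is easy to mine with standard tools. As a sketch — the exact field names below are assumptions based on the event table, so inspect a real log before relying on them — jq can aggregate per-step token usage:

```shell
# Hypothetical two-event excerpt in the shape described above (field names assumed)
cat > /tmp/sample-diagnostics.jsonl <<'EOF'
{"type":"step-finish","step":1,"durationMs":1200,"usage":{"totalTokens":850}}
{"type":"step-finish","step":2,"durationMs":900,"usage":{"totalTokens":430}}
EOF

# Sum total token usage across all step-finish events
jq -s '[.[] | select(.type == "step-finish") | .usage.totalTokens] | add' /tmp/sample-diagnostics.jsonl
```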

Accuracy and completeness

For questions that enumerate files, integrations, or usages ("which files use X?", "where is Y integrated?", "list all Z"), the agent follows a layered strategy to avoid missing anything:

  1. Semantic search + grep in parallel — when a semantic index exists, semantic_search runs alongside grep_search in the first step. grep catches literal keyword matches; semantic search catches files that implement the concept without containing the exact term (e.g. a Redis client wrapper that never uses the word "Redis" in top-level identifiers).

  2. Indirect consumer detection — after finding direct matches, the agent greps for import patterns targeting the central module (e.g. import.*provider) to surface files that depend on a capability indirectly. This catches wrappers, adapters, and initializers that are indirect consumers.

  3. Dependency cross-reference — the agent verifies that every declared dependency relevant to the question (from dependency_analysis) is represented by at least one file in the answer. Missing entries trigger a deeper investigation rather than a silent omission.

  4. Final verification pass — before writing the answer, the agent runs a broad grep with alternative phrasings/synonyms to confirm completeness.

These rules are baked into the agent's system prompt and apply automatically to every ask run — no extra flags needed.
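
The indirect-consumer pass in step 2 is essentially a grep for import statements that name the central module. You can reproduce the idea manually — the module and paths here are hypothetical:

```shell
# A file that uses Redis only through a local wrapper, never mentioning "Redis" itself
mkdir -p /tmp/grep-demo/src
cat > /tmp/grep-demo/src/cache.ts <<'EOF'
import { client } from "./redisClient";
export const cache = (key: string) => client.get(key);
EOF

# List files importing the central module — these are the indirect consumers
grep -rEl 'import .*redisClient' /tmp/grep-demo/src
```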

codebite new

Create a new persistent chat and make it active:

codebite new
codebite new "auth-review"

If you omit the name, the chat starts as Untitled chat and is automatically renamed from the first user message, capped at 100 characters. After codebite new, continue the conversation with plain codebite ask ... commands — you do not need a separate chat command for each reply.

codebite restore

Restore an existing chat by name or id:

codebite restore "auth-review"

codebite list

List saved chats for the current project:

codebite list

Example Questions

codebite ask "What does this project do and how is it structured?"
codebite ask "Where is authentication implemented?"
codebite ask "Find all API endpoints and explain what each one does"
codebite ask "What are the gaps in test coverage?"
codebite ask "Explain how the database connection is managed"
codebite ask "What external dependencies are used and what are they for?"
codebite ask "Are there any obvious security concerns?"

# Semantic search shines for service/integration queries (requires codebite index)
codebite ask "Which files integrate with Azure Notification Hub?"
codebite ask "Where is Stripe used and what does each integration do?"
codebite ask "Show me everything that touches Redis"
codebite ask "Which modules send emails?"

# Deep mode — exhaustive, cross-referenced analysis
codebite ask --deep "Explain the full request lifecycle from HTTP to database"
codebite ask --deep "Find security vulnerabilities in this codebase"
codebite ask --deep "What design patterns are used and are they applied consistently?"

# Deep mode can delegate independent sub-investigations to parallel subagents
codebite ask --deep "Analyze this codebase from three angles: auth flow, test coverage, and external integrations"
# ^ the main agent may spawn parallel subagents for each angle and synthesize the results

Running Against Any Project

# Clone any open-source project
git clone https://github.com/expressjs/express /tmp/express
cd /tmp/express

# Initialize (no API key in config)
codebite init --provider vercel --model openai/gpt-4o-mini

# Add your API key locally
echo '{ "apiKey": "vck_your-key" }' > .codebite.local.json

# Build semantic index (optional but recommended)
codebite index

# Commit the index for team-wide semantic search
git add .codebite-index/
git commit -m "Add codebase vector index"

# Ask away
codebite ask "How does Express handle middleware chains?"
codebite ask "How are route parameters extracted?"
codebite ask --deep "Explain the full request-response cycle"

Large Codebases

The agent handles large projects automatically:

  • Uses glob_search + grep_search to narrow scope before reading files
  • Starts with the root tree up to 2 levels deep already in context
  • Reads files in focused chunks with read_file_chunk or read_file offset/limit navigation
  • Uses folder_children for one-level folder inspection without recursive noise
  • Uses semantic_search to jump to relevant files by concept
  • Summarizes findings progressively — never holds entire files in context

For very deep analyses on large repos, increase --max-steps:

codebite ask --max-steps 80 "Explain the entire auth system"

Agent Tools

| Tool | What it does |
|------|--------------|
| read_file | Read file contents with line numbers, offset and limit |
| read_file_chunk | Read a smaller targeted slice of a file for tighter context control |
| glob_search | Find files by pattern (**/*.ts, src/**/*.test.js) |
| grep_search | Search file contents by text or regex with surrounding context |
| directory_tree | Show project structure (respects .gitignore) |
| list_directory | List direct child files and folders in a directory |
| folder_children | Alias focused on one-level folder structure only |
| file_stats | File size, line count, language detection |
| get_cwd | Get project root path |
| shell_command | Read-only git commands (git log, git blame, git diff, …) |
| dependency_analysis | Parse package.json, go.mod, Cargo.toml, requirements.txt, … |
| semantic_search | Find files by semantic meaning — matches on purpose, functions, and service integrations (requires codebite index) |
| web_search | Search the web for docs and library info (requires Tavily key) |
| context7_docs | Query up-to-date docs via Context7 MCP (requires Context7 key) |
| spawn_subagents | Spawn 1–5 parallel subagents for independent investigations (deep mode only) |

The agent calls tools in parallel when independent — a native feature of the Vercel AI SDK.

Development

npm install
npm run build          # tsc → dist/
npm test               # vitest run (unit tests only)
npm run test:e2e       # integration tests (requires VERCEL_API_KEY)
npm run test:watch     # watch mode
npm link               # optional: point the global `codebite` command at this checkout

# Run without building (dev mode)
npx tsx src/cli.ts ask "What is this project?"

Local API key for development

Create .codebite.local.json in the project root (it is gitignored):

{
  "apiKey": "vck_your-local-key"
}

This file is merged on top of .codebite.json at runtime. You can also override any other config field here.

Troubleshooting local config

If codebite says apiKey is missing even though .codebite.local.json exists:

  • Confirm you are running the expected binary with Get-Command codebite in PowerShell.
  • A stale global install can point to an older package version that does not match your local checkout.
  • From this repo, npm link will repoint the global codebite command at the current source tree.
  • As a fallback during development, run node .\dist\cli.js <command> or npx tsx src/cli.ts <command> from the repo root.

CI / CD

PR checks

Every pull request targeting main runs two jobs automatically via GitHub Actions:

  • Unit Tests — npm test (fast, no API key needed)
  • E2E Tests — npm run test:e2e (calls the live LLM; requires VERCEL_API_KEY secret)

The E2E test asks "What is this repo?" against the actual codebase and asserts the answer contains relevant keywords (codebase, codebite, cli, agent, llm, or analysis).

Publishing to npm

Push a version tag to trigger automatic publishing:

npm version patch   # or minor / major
git push --follow-tags

The publish workflow runs unit tests, builds, then publishes to npm using the NPM_TOKEN secret.

Required GitHub Secrets

| Secret | Used by | Description |
|--------|---------|-------------|
| NPM_TOKEN | publish workflow | npm automation token with publish rights |
| VERCEL_API_KEY | CI e2e workflow | Vercel AI Gateway key for live LLM calls |

Ignoring Files

Both the agent and indexer respect:

  • Your project's .gitignore
  • Always ignored (never scanned or shown in directory trees): node_modules, .git, .codebite, .codebite-index, dist, build, coverage, __pycache__, target, vendor

.codebite-index/ is excluded from scanning even though it is committed to git — it is a database artifact, not source code.
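
A minimal .gitignore arrangement consistent with the rules above might look like this (a sketch — codebite manages some of these entries automatically, so adjust to your project):

```gitignore
# Local codebite state: chats and local config — never commit
.codebite/
.codebite.local.json

# Note: .codebite-index/ is intentionally NOT listed here — the vector index
# is committed so the whole team can run semantic search without re-indexing.
```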

License

MIT