# codeprimer `v0.5.1`

> Auto-generate AI context files (CLAUDE.md, .cursorrules, copilot-instructions.md, GEMINI.md) from your codebase
CodePrimer analyzes your codebase and generates context files for every major AI coding tool — so your AI agents understand your architecture, patterns, and conventions from the first prompt.
```sh
npx codeprimer ./my-repo
```

By default, this generates CLAUDE.md. Pass `--only` to generate other formats:
| Format | File written | Tool |
|--------|--------------|------|
| claude (default) | CLAUDE.md | Claude Code |
| cursor | .cursor/rules/codeprimer.mdc | Cursor |
| copilot | .github/copilot-instructions.md | GitHub Copilot |
| gemini | GEMINI.md | Google Gemini |
| windsurf | .windsurf/rules/codeprimer.md | Windsurf |
| cline | .clinerules | Cline / Roo Code |
| aider | .aider.instructions.md | Aider |
| continue | .continue/prompts/codeprimer.md | Continue.dev |
| agents | AGENTS.md | OpenAI Codex / others |
| context | .codeprimer/CONTEXT.md | Any AI tool |
Each tool reads from a fixed path it expects, so the file locations are not configurable. Generic context lives under .codeprimer/ to keep the repo root clean.
## Why?
Every time an AI agent opens your repo, it starts from zero. It doesn't know your architecture, your naming conventions, or that you never use ORMs. It hallucinates file paths, ignores your auth patterns, and writes code that doesn't match your team's style.
IDE tools like Cursor and Windsurf build ephemeral context that disappears when you close the tab. CodePrimer generates permanent, committed, shared context that every developer and every AI tool can use — including headless agents in CI that don't have an IDE.
## Quick Start

```sh
# Use any LLM provider — auto-detected from your API key
export ANTHROPIC_API_KEY=sk-ant-...   # or OPENAI_API_KEY or GEMINI_API_KEY

# Generate context for your repo
npx codeprimer ./my-repo

# Or install globally for the c8r shorthand
npm install -g codeprimer
c8r ./my-repo
```

## How It Works
1. **SCAN**: Discover files, filter noise, rank by architectural importance
2. **EXTRACT**: Parse the AST (TypeScript, Python, Go, Rust, Java, C#, Ruby, Kotlin, Swift)
3. **SUMMARIZE**: Call a fast LLM (Haiku / GPT-4o-mini / Gemini Flash) per file, in parallel
4. **SYNTHESIZE**: Call a smart LLM (Sonnet / GPT-4o / Gemini Pro) for the architecture overview
5. **GENERATE**: Format for each AI tool and write to the repo

**Hybrid AST + LLM approach:** For supported languages, CodePrimer extracts the structural skeleton (exports, imports, classes, signatures) via AST parsing and sends only that to the LLM — not the full source code. This makes summarization faster and more accurate.
## Options

```
Usage: codeprimer [options] <path>

Common
  --only <formats>     Restrict output: claude,cursor,copilot,gemini,windsurf,cline,aider,continue,agents,context
  --scan-only          Preview files that would be analyzed — no API calls, free
  --dry-run            Print output without writing files
  --verbose            Show detailed progress and token usage

Provider
  --provider <name>    LLM provider: anthropic, openai, gemini (auto-detected from API key)
  --list-models        List available models for your provider, then exit
  --model <name>       Override the LLM model for both summary and synthesis steps

Advanced
  --out <dir>          Output directory (default: repo root)
  --batch              Use batch API for bulk processing (slower, 50% cheaper)
  --max-files <n>      Limit number of files to process (default: 300)
  --ignore <patterns>  Glob patterns to exclude, comma-separated
  --subpackages        Generate context files per detected subpackage (writes CLAUDE.md,
                       AGENTS.md, GEMINI.md, .clinerules inside each subpackage directory)
```

## Examples
```sh
# Preview what files would be analyzed (free, no API calls)
c8r --scan-only ./my-repo

# Generate only CLAUDE.md
c8r --only claude ./my-repo

# Limit to the 50 most important files (cheaper, faster)
c8r --max-files 50 ./my-repo

# Use batch mode (slower, 50% cheaper)
c8r --batch ./my-repo

# Exclude directories
c8r --ignore "tests,docs,vendor" ./my-repo

# Use OpenAI instead of Anthropic
export OPENAI_API_KEY=sk-...
c8r --provider openai ./my-repo
```

## LLM Providers
CodePrimer auto-detects your provider from environment variables:
| Provider | Env Variable | Summary Model | Synthesis Model |
|----------|-------------|---------------|-----------------|
| Anthropic | ANTHROPIC_API_KEY | Claude Haiku 4.5 | Claude Sonnet 4.5 |
| OpenAI | OPENAI_API_KEY | GPT-4o-mini | GPT-4o |
| Google | GEMINI_API_KEY | Gemini 2.5 Flash | Gemini 2.5 Flash |
Recommended: Claude Haiku 4.5 — in our testing, Haiku produces the best context files for this task: fast, accurate, and cost-effective. Use --model to override the default for any provider:
```sh
# Use Haiku for everything (faster + cheaper than Sonnet for this task)
c8r --model claude-haiku-4-5-20251001 ./my-repo

# Use a specific OpenAI model
c8r --provider openai --model gpt-4o-mini ./my-repo
```

**Your API key, your tokens, your data.** CodePrimer is a local tool — nothing is sent to CodePrimer servers.
## Languages
AST extraction (optimized, ~70% token savings):
- TypeScript, JavaScript, JSX, TSX (TypeScript Compiler API)
- Python, Go, Rust, Java, C#, Ruby, Kotlin, Swift, Scala, C, C++, PHP (tree-sitter)
LLM-only fallback (works for any language):
- Vue, Svelte, Astro, Dart, Elixir, Zig, and any other language
## Large Repos
For repos with thousands of files, CodePrimer uses smart file selection:
- **Scores every file by architectural importance** — entry points, route handlers, schemas, models, config files, and hub files (imported by many others) rank highest
- **Monorepo-aware** — detects `packages/`, `services/`, and `apps/` directories and ensures each service gets proportional coverage
- **Processes the top 300 files by default** — override with `--max-files`
- **Two-pass scanning** — scores file paths first (instant), then reads only the top candidates
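The first, path-only pass can be sketched roughly as below. The keywords and weights are illustrative stand-ins, not CodePrimer's actual heuristic:

```typescript
// Toy path-importance scorer: ranks files without reading their
// contents, so it stays fast even on very large repos.
// Keywords and weights here are illustrative, not the real heuristic.
function scorePath(path: string): number {
  let score = 0;
  if (/(^|\/)(index|main|app|server)\.\w+$/.test(path)) score += 5; // entry points
  if (/(routes?|handlers?|controllers?)\//.test(path)) score += 4;  // API surface
  if (/(models?|schemas?)\//.test(path)) score += 4;                // data shapes
  if (/\.(config|rc)\.\w+$/.test(path)) score += 3;                 // configuration
  if (/(^|\/)(tests?|__tests__|fixtures)\//.test(path)) score -= 5; // low signal
  return score;
}

const files = [
  "src/index.ts",
  "src/routes/users.ts",
  "tests/users.test.ts",
  "src/models/user.ts",
  "README.md",
];

// Rank every path; only the top N would then be read and summarized.
const ranked = [...files].sort((a, b) => scorePath(b) - scorePath(a));
console.log(ranked[0]); // the entry point outranks everything else
```

Because the second pass only reads the winners, the expensive work (file I/O and LLM calls) is bounded by `--max-files` rather than by repo size.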
```sh
# See what it would pick
c8r --scan-only --verbose ./big-monorepo

# Override the default 300-file limit
c8r --max-files 1000 ./big-monorepo
```

## Caching
Summaries are cached in .codeprimer/cache.json in your repo. Re-running on the same repo skips unchanged files — cached summaries are reused instantly. Only the final synthesis call runs.
Cache invalidates when:
- File content changes
- LLM model changes
- Prompt version changes
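Those three conditions amount to keying each cached summary on everything that could change its output. A minimal sketch of that idea (the function name and key layout are illustrative, not CodePrimer's actual cache schema):

```typescript
import { createHash } from "node:crypto";

// A cached summary stays valid only while file content, model, and
// prompt version are all unchanged — so hash all three into the key.
// Function name and key layout are illustrative, not the real schema.
function cacheKey(fileContent: string, model: string, promptVersion: string): string {
  return createHash("sha256")
    .update(fileContent)
    .update("\0" + model)          // separators avoid ambiguous concatenation
    .update("\0" + promptVersion)
    .digest("hex");
}

const k1 = cacheKey("export const a = 1;", "claude-haiku-4-5", "v3");
const k2 = cacheKey("export const a = 1;", "claude-haiku-4-5", "v3");
const k3 = cacheKey("export const a = 2;", "claude-haiku-4-5", "v3");

console.log(k1 === k2); // identical inputs → cache hit
console.log(k1 === k3); // changed content → cache miss
```

Hashing the inputs rather than tracking timestamps is what makes the cache safely shareable: any machine that sees the same content, model, and prompt version computes the same key.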
Commit the cache — checking in .codeprimer/cache.json is recommended. Team members and CI share cached summaries, so only changed files ever hit the LLM again.
If you prefer to keep the cache local, add just the cache file (not the whole directory) to your .gitignore:
```
.codeprimer/cache.json
```

This keeps .codeprimer/CONTEXT.md committable while the cache stays off git.
## Augment, Never Overwrite
If your repo already has a hand-written CLAUDE.md (or .cursorrules, etc.), CodePrimer never overwrites your content. It appends an auto-generated block with clear markers:
```markdown
# Your hand-written rules (untouched)
- Never use ORMs
- Always wrap errors in AppError

<!-- BEGIN AUTO-GENERATED CONTEXT — do not edit below this line -->
<!-- Generated by CodePrimer (codeprimer.dev) -->

## Architecture
...

<!-- END AUTO-GENERATED CONTEXT -->
```

Everything outside the markers is yours forever. CodePrimer only replaces content between the markers on re-runs.
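The marker-based update can be sketched like this. The marker strings match the example above; the function name and append behavior are an illustrative reconstruction, not CodePrimer's source:

```typescript
const BEGIN = "<!-- BEGIN AUTO-GENERATED CONTEXT — do not edit below this line -->";
const END = "<!-- END AUTO-GENERATED CONTEXT -->";

// If markers already exist, swap only the content between them;
// otherwise append a fresh marked block after the user's own text.
// Illustrative sketch, not CodePrimer's actual implementation.
function upsertGeneratedBlock(existing: string, generated: string): string {
  const block = `${BEGIN}\n${generated}\n${END}`;
  const start = existing.indexOf(BEGIN);
  const end = existing.indexOf(END);
  if (start !== -1 && end !== -1) {
    return existing.slice(0, start) + block + existing.slice(end + END.length);
  }
  return existing.trimEnd() + "\n\n" + block + "\n";
}

// First run appends; second run replaces only the marked region.
const v1 = upsertGeneratedBlock("# My rules\n- Never use ORMs\n", "## Architecture\nold");
const v2 = upsertGeneratedBlock(v1, "## Architecture\nnew");

console.log(v2.includes("- Never use ORMs")); // hand-written content untouched
console.log(v2.includes("old"));              // stale generated block replaced
```

Keeping the markers on their own lines also makes diffs clean: re-runs only ever touch the region between `BEGIN` and `END`.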
## Working with Claude Code's `/init`
CodePrimer and Claude Code's `/init` command are complementary:

- `/init` creates the human layer — build commands, rules, team conventions
- CodePrimer adds the AI-synthesized layer — architecture, API routes, flows, pitfalls
Recommended order:

1. Run `/init` first (or write CLAUDE.md by hand)
2. Run `c8r ./my-repo` — it appends the auto-generated block below your content
3. On re-runs, only the auto-generated block is updated
**What if `/init` runs after CodePrimer?** `/init` overwrites the file, but your CodePrimer cache is still intact. Just run `c8r` again — it restores the auto-generated block in seconds (cached summaries are reused).
## Custom Ignore
Two ways to exclude files/directories:
```sh
# Per-run (ad-hoc)
c8r --ignore "tests,docs,vendor" ./my-repo

# Persistent (commit to repo)
printf "tests/**\ndocs/**\nvendor/**\n" > .c8rignore
c8r ./my-repo
```

`.c8rignore` uses the same syntax as `.gitignore`. Both stack with `.gitignore` — all three sources are applied together.
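Conceptually, stacking means the patterns from all three sources are unioned before filtering. A rough sketch with deliberately naive matching (real `.gitignore` semantics also cover negation, anchoring, and more; function names are illustrative):

```typescript
// Naive union of ignore sources. Real gitignore matching is richer
// (negation, anchoring, wildcards mid-path) — this only shows stacking.
function isIgnored(path: string, sources: string[][]): boolean {
  const patterns = sources.flat(); // .gitignore + .c8rignore + --ignore
  return patterns.some((p) => {
    const prefix = p.replace(/\/?\*\*$/, ""); // treat "dir/**" as a prefix
    return path === prefix || path.startsWith(prefix + "/");
  });
}

const gitignore = ["node_modules/**"];
const c8rignore = ["tests/**", "docs/**"];
const cliIgnore = ["vendor"];

console.log(isIgnored("tests/app.test.ts", [gitignore, c8rignore, cliIgnore])); // true
console.log(isIgnored("src/app.ts", [gitignore, c8rignore, cliIgnore]));        // false
```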
## FAQ
### Does CodePrimer read my code?
Yes — locally. Your code is sent to the LLM provider you configured (Anthropic, OpenAI, or Google) using YOUR API key. CodePrimer has no servers, no accounts, no telemetry. It's a local CLI tool.
### How is this different from Cursor/Windsurf indexing my codebase?
IDE indexing is ephemeral (gone when you close the tab), tool-specific (Cursor's index doesn't help Copilot), and single-repo (no cross-repo awareness). CodePrimer generates permanent, committed files that work across all 10 tools, persist in git, and are reviewable by humans.
### Will this conflict with my existing CLAUDE.md / .cursorrules?
No. CodePrimer appends an auto-generated block with markers. Your hand-written content above the markers is never touched. On re-runs, only the content between markers is replaced.
### What about Claude Code's `/init` command?
They're complementary. /init creates the human-written rules. CodePrimer adds the AI-synthesized architecture. Run /init first, then c8r — both coexist in the same file.
### Can I use this in CI / GitHub Actions?
Yes. Run `npx codeprimer ./repo` in a GitHub Action with your API key stored as a secret. This keeps context files auto-updated on every PR, and with the committed cache only changed files ever hit the LLM.
### What if I have a monorepo with services in different languages?
CodePrimer handles this. It detects workspace directories (packages/, services/, apps/), ensures each service gets proportional file coverage, and synthesizes a unified architecture view across all languages.
### Can it generate separate context files per service in a monorepo?
Yes — use --subpackages. CodePrimer detects workspace directories (packages/, services/, apps/) and generates a dedicated set of context files inside each one: CLAUDE.md, AGENTS.md, GEMINI.md, and .clinerules. Each service gets its own synthesized architecture view, not just a slice of the global one. The root-level files are still generated as normal.
```sh
c8r --subpackages ./my-monorepo
# writes packages/api/CLAUDE.md, packages/api/AGENTS.md, ...
# writes packages/web/CLAUDE.md, packages/web/AGENTS.md, ...
```

### How does it handle 20,000+ file repos?
Two-pass scanning: it scores all file paths by architectural importance (instant, no file reading), then reads only the top candidates. By default it processes 300 files; override with `--max-files`. Use `--scan-only` to preview the selection.
### Can I generate for just one tool?
Yes: c8r --only claude ./my-repo generates only CLAUDE.md. Use comma-separated values for multiple: --only claude,cursor,copilot.
### Is the generated output good enough to commit?
Yes — but review it first. Use --dry-run to preview output. The quality depends on your codebase size and the LLM model used. We recommend a quick review of the Architecture and Common Pitfalls sections before committing.
### What languages are supported?
All of them. AST extraction (faster, more accurate) supports TypeScript, JavaScript, Python, Go, Rust, Java, C#, Ruby, Kotlin, Swift, Scala, C, C++, PHP. All other languages fall back to LLM-only analysis — works fine.
## License
MIT
