# codeprimer `v0.5.1`

> Auto-generate AI context files (CLAUDE.md, .cursorrules, copilot-instructions.md, GEMINI.md) from your codebase
CodePrimer analyzes your codebase and generates context files for every major AI coding tool — so your AI agents understand your architecture, patterns, and conventions from the first prompt.
```sh
npx codeprimer ./my-repo
```

By default, this generates CLAUDE.md. Pass `--only` to generate other formats:
| Format | File written | Tool |
|--------|--------------|------|
| claude (default) | CLAUDE.md | Claude Code |
| cursor | .cursor/rules/codeprimer.mdc | Cursor |
| copilot | .github/copilot-instructions.md | GitHub Copilot |
| gemini | GEMINI.md | Google Gemini |
| windsurf | .windsurf/rules/codeprimer.md | Windsurf |
| cline | .clinerules | Cline / Roo Code |
| aider | .aider.instructions.md | Aider |
| continue | .continue/prompts/codeprimer.md | Continue.dev |
| agents | AGENTS.md | OpenAI Codex / others |
| context | .codeprimer/CONTEXT.md | Any AI tool |
Each tool reads from a fixed path it expects, so the file locations are not configurable. Generic context lives under .codeprimer/ to keep the repo root clean.
## Why?
Every time an AI agent opens your repo, it starts from zero. It doesn't know your architecture, your naming conventions, or that you never use ORMs. It hallucinates file paths, ignores your auth patterns, and writes code that doesn't match your team's style.
IDE tools like Cursor and Windsurf build ephemeral context that disappears when you close the tab. CodePrimer generates permanent, committed, shared context that every developer and every AI tool can use — including headless agents in CI that don't have an IDE.
## Quick Start

```sh
# Use any LLM provider — auto-detected from your API key
export ANTHROPIC_API_KEY=sk-ant-...   # or OPENAI_API_KEY or GEMINI_API_KEY

# Generate context for your repo
npx codeprimer ./my-repo

# Or install globally for the c8r shorthand
npm install -g codeprimer
c8r ./my-repo
```

## How It Works
1. **SCAN**: Discover files, filter noise, rank by architectural importance
2. **EXTRACT**: Parse the AST (TypeScript, Python, Go, Rust, Java, C#, Ruby, Kotlin, Swift)
3. **SUMMARIZE**: Call a fast LLM (Haiku / GPT-4o-mini / Gemini Flash) per file, in parallel
4. **SYNTHESIZE**: Call a smart LLM (Sonnet / GPT-4o / Gemini Pro) for the architecture overview
5. **GENERATE**: Format for each AI tool and write to the repo

**Hybrid AST + LLM approach:** For supported languages, CodePrimer extracts the structural skeleton (exports, imports, classes, signatures) via AST parsing and sends only that to the LLM — not the full source code. This makes summarization faster and more accurate.
## Options

```
Usage: codeprimer [options] <path>

Common
  --only <formats>     Restrict output: claude,cursor,copilot,gemini,windsurf,cline,aider,continue,agents,context
  --scan-only          Preview files that would be analyzed — no API calls, free
  --dry-run            Print output without writing files
  --verbose            Show detailed progress and token usage

Provider
  --provider <name>    LLM provider: anthropic, openai, gemini (auto-detected from API key)
  --list-models        List available models for your provider, then exit
  --model <name>       Override the LLM model for both summary and synthesis steps

Advanced
  --out <dir>          Output directory (default: repo root)
  --batch              Use batch API for bulk processing (slower, 50% cheaper)
  --max-files <n>      Limit number of files to process (default: 300)
  --ignore <patterns>  Glob patterns to exclude, comma-separated
  --subpackages        Generate context files per detected subpackage (writes CLAUDE.md,
                       AGENTS.md, GEMINI.md, .clinerules inside each subpackage directory)
```

## Examples
```sh
# Preview what files would be analyzed (free, no API calls)
c8r --scan-only ./my-repo

# Generate only CLAUDE.md
c8r --only claude ./my-repo

# Limit to the 50 most important files (cheaper, faster)
c8r --max-files 50 ./my-repo

# Use batch mode (slower, 50% cheaper)
c8r --batch ./my-repo

# Exclude directories
c8r --ignore "tests,docs,vendor" ./my-repo

# Use OpenAI instead of Anthropic
export OPENAI_API_KEY=sk-...
c8r --provider openai ./my-repo
```

## LLM Providers
CodePrimer auto-detects your provider from environment variables:
| Provider | Env Variable | Summary Model | Synthesis Model |
|----------|-------------|---------------|-----------------|
| Anthropic | ANTHROPIC_API_KEY | Claude Haiku 4.5 | Claude Sonnet 4.5 |
| OpenAI | OPENAI_API_KEY | GPT-4o-mini | GPT-4o |
| Google | GEMINI_API_KEY | Gemini 2.5 Flash | Gemini 2.5 Flash |
Recommended: Claude Haiku 4.5 — in our testing, Haiku produces the best context files for this task: fast, accurate, and cost-effective. Use --model to override the default for any provider:
```sh
# Use Haiku for everything (faster + cheaper than Sonnet for this task)
c8r --model claude-haiku-4-5-20251001 ./my-repo

# Use a specific OpenAI model
c8r --provider openai --model gpt-4o-mini ./my-repo
```

**Your API key, your tokens, your data.** CodePrimer is a local tool — nothing is sent to CodePrimer servers.
## Languages
AST extraction (optimized, ~70% token savings):
- TypeScript, JavaScript, JSX, TSX (TypeScript Compiler API)
- Python, Go, Rust, Java, C#, Ruby, Kotlin, Swift, Scala, C, C++, PHP (tree-sitter)
LLM-only fallback (works for any language):
- Vue, Svelte, Astro, Dart, Elixir, Zig, and any other language
## Large Repos
For repos with thousands of files, CodePrimer uses smart file selection:
- **Scores every file by architectural importance** — entry points, route handlers, schemas, models, config files, and hub files (imported by many others) rank highest
- **Monorepo-aware** — detects `packages/`, `services/`, and `apps/` directories and ensures each service gets proportional coverage
- **Processes the top 300 files by default** — override with `--max-files`
- **Two-pass scanning** — scores file paths first (instant), then reads only the top candidates
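The first, path-only pass can be sketched roughly as below. The keywords and weights are illustrative stand-ins, not CodePrimer's actual heuristic:

```typescript
// Toy path-importance scorer: ranks files without reading their
// contents, so it stays fast even on very large repos.
// Keywords and weights here are illustrative, not the real heuristic.
function scorePath(path: string): number {
  let score = 0;
  if (/(^|\/)(index|main|app|server)\.\w+$/.test(path)) score += 5; // entry points
  if (/(routes?|handlers?|controllers?)\//.test(path)) score += 4;  // API surface
  if (/(models?|schemas?)\//.test(path)) score += 4;                // data shapes
  if (/\.(config|rc)\.\w+$/.test(path)) score += 3;                 // configuration
  if (/(^|\/)(tests?|__tests__|fixtures)\//.test(path)) score -= 5; // low signal
  return score;
}

const files = [
  "src/index.ts",
  "src/routes/users.ts",
  "tests/users.test.ts",
  "src/models/user.ts",
  "README.md",
];

// Rank every path; only the top N would then be read and summarized.
const ranked = [...files].sort((a, b) => scorePath(b) - scorePath(a));
console.log(ranked[0]); // the entry point outranks everything else
```

Because the second pass only reads the winners, the expensive work (file I/O and LLM calls) is bounded by `--max-files` rather than by repo size.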
```sh
# See what it would pick
c8r --scan-only --verbose ./big-monorepo

# Override the default 300-file limit
c8r --max-files 1000 ./big-monorepo
```

## Caching
Summaries are cached in .codeprimer/cache.json in your repo. Re-running on the same repo skips unchanged files — cached summaries are reused instantly. Only the final synthesis call runs.
Cache invalidates when:
- File content changes
- LLM model changes
- Prompt version changes
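Those three conditions amount to keying each cached summary on everything that could change its output. A minimal sketch of that idea (the function name and key layout are illustrative, not CodePrimer's actual cache schema):

```typescript
import { createHash } from "node:crypto";

// A cached summary stays valid only while file content, model, and
// prompt version are all unchanged — so hash all three into the key.
// Function name and key layout are illustrative, not the real schema.
function cacheKey(fileContent: string, model: string, promptVersion: string): string {
  return createHash("sha256")
    .update(fileContent)
    .update("\0" + model)          // separators avoid ambiguous concatenation
    .update("\0" + promptVersion)
    .digest("hex");
}

const k1 = cacheKey("export const a = 1;", "claude-haiku-4-5", "v3");
const k2 = cacheKey("export const a = 1;", "claude-haiku-4-5", "v3");
const k3 = cacheKey("export const a = 2;", "claude-haiku-4-5", "v3");

console.log(k1 === k2); // identical inputs → cache hit
console.log(k1 === k3); // changed content → cache miss
```

Hashing the inputs rather than tracking timestamps is what makes the cache safely shareable: any machine that sees the same content, model, and prompt version computes the same key.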
Commit the cache — checking in .codeprimer/cache.json is recommended. Team members and CI share cached summaries, so only changed files ever hit the LLM again.
If you prefer to keep the cache local, add just the cache file (not the whole directory) to your .gitignore:
```
.codeprimer/cache.json
```

This keeps .codeprimer/CONTEXT.md committable while the cache stays off git.
## Augment, Never Overwrite
If your repo already has a hand-written CLAUDE.md (or .cursorrules, etc.), CodePrimer never overwrites your content. It appends an auto-generated block with clear markers:
```markdown
# Your hand-written rules (untouched)
- Never use ORMs
- Always wrap errors in AppError

<!-- BEGIN AUTO-GENERATED CONTEXT — do not edit below this line -->
<!-- Generated by CodePrimer (codeprimer.dev) -->

## Architecture
...

<!-- END AUTO-GENERATED CONTEXT -->
```

Everything outside the markers is yours forever. CodePrimer only replaces content between the markers on re-runs.
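The marker-based update can be sketched like this. The marker strings match the example above; the function name and append behavior are an illustrative reconstruction, not CodePrimer's source:

```typescript
const BEGIN = "<!-- BEGIN AUTO-GENERATED CONTEXT — do not edit below this line -->";
const END = "<!-- END AUTO-GENERATED CONTEXT -->";

// If markers already exist, swap only the content between them;
// otherwise append a fresh marked block after the user's own text.
// Illustrative sketch, not CodePrimer's actual implementation.
function upsertGeneratedBlock(existing: string, generated: string): string {
  const block = `${BEGIN}\n${generated}\n${END}`;
  const start = existing.indexOf(BEGIN);
  const end = existing.indexOf(END);
  if (start !== -1 && end !== -1) {
    return existing.slice(0, start) + block + existing.slice(end + END.length);
  }
  return existing.trimEnd() + "\n\n" + block + "\n";
}

// First run appends; second run replaces only the marked region.
const v1 = upsertGeneratedBlock("# My rules\n- Never use ORMs\n", "## Architecture\nold");
const v2 = upsertGeneratedBlock(v1, "## Architecture\nnew");

console.log(v2.includes("- Never use ORMs")); // hand-written content untouched
console.log(v2.includes("old"));              // stale generated block replaced
```

Keeping the markers on their own lines also makes diffs clean: re-runs only ever touch the region between `BEGIN` and `END`.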
## Working with Claude Code's `/init`
CodePrimer and Claude Code's `/init` command are complementary:

- `/init` creates the human layer — build commands, rules, team conventions
- CodePrimer adds the AI-synthesized layer — architecture, API routes, flows, pitfalls
Recommended order:

1. Run `/init` first (or write CLAUDE.md by hand)
2. Run `c8r ./my-repo` — it appends the auto-generated block below your content
3. On re-runs, only the auto-generated block is updated
**What if `/init` runs after CodePrimer?** `/init` overwrites the file, but your CodePrimer cache is still intact. Just run `c8r` again — it restores the auto-generated block in seconds (cached summaries are reused).
## Custom Ignore
Two ways to exclude files/directories:
```sh
# Per-run (ad-hoc)
c8r --ignore "tests,docs,vendor" ./my-repo

# Persistent (commit to repo)
printf "tests/**\ndocs/**\nvendor/**\n" > .c8rignore
c8r ./my-repo
```

`.c8rignore` uses the same syntax as `.gitignore`. Both stack with `.gitignore` — all three sources are applied together.
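Conceptually, stacking means the patterns from all three sources are unioned before filtering. A rough sketch with deliberately naive matching (real `.gitignore` semantics also cover negation, anchoring, and more; function names are illustrative):

```typescript
// Naive union of ignore sources. Real gitignore matching is richer
// (negation, anchoring, wildcards mid-path) — this only shows stacking.
function isIgnored(path: string, sources: string[][]): boolean {
  const patterns = sources.flat(); // .gitignore + .c8rignore + --ignore
  return patterns.some((p) => {
    const prefix = p.replace(/\/?\*\*$/, ""); // treat "dir/**" as a prefix
    return path === prefix || path.startsWith(prefix + "/");
  });
}

const gitignore = ["node_modules/**"];
const c8rignore = ["tests/**", "docs/**"];
const cliIgnore = ["vendor"];

console.log(isIgnored("tests/app.test.ts", [gitignore, c8rignore, cliIgnore])); // true
console.log(isIgnored("src/app.ts", [gitignore, c8rignore, cliIgnore]));        // false
```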
## FAQ
### Does CodePrimer read my code?
Yes — locally. Your code is sent to the LLM provider you configured (Anthropic, OpenAI, or Google) using YOUR API key. CodePrimer has no servers, no accounts, no telemetry. It's a local CLI tool.
### How is this different from Cursor/Windsurf indexing my codebase?
IDE indexing is ephemeral (gone when you close the tab), tool-specific (Cursor's index doesn't help Copilot), and single-repo (no cross-repo awareness). CodePrimer generates permanent, committed files that work across all 10 tools, persist in git, and are reviewable by humans.
### Will this conflict with my existing CLAUDE.md / .cursorrules?
No. CodePrimer appends an auto-generated block with markers. Your hand-written content above the markers is never touched. On re-runs, only the content between markers is replaced.
### What about Claude Code's `/init` command?
They're complementary. /init creates the human-written rules. CodePrimer adds the AI-synthesized architecture. Run /init first, then c8r — both coexist in the same file.
### Can I use this in CI / GitHub Actions?
Yes. Run `npx codeprimer ./repo` in a GitHub Action with your API key stored as a secret. This keeps context files auto-updated on every PR, and with the committed cache only changed files ever hit the LLM.
### What if I have a monorepo with services in different languages?
CodePrimer handles this. It detects workspace directories (packages/, services/, apps/), ensures each service gets proportional file coverage, and synthesizes a unified architecture view across all languages.
### Can it generate separate context files per service in a monorepo?
Yes — use --subpackages. CodePrimer detects workspace directories (packages/, services/, apps/) and generates a dedicated set of context files inside each one: CLAUDE.md, AGENTS.md, GEMINI.md, and .clinerules. Each service gets its own synthesized architecture view, not just a slice of the global one. The root-level files are still generated as normal.
```sh
c8r --subpackages ./my-monorepo
# writes packages/api/CLAUDE.md, packages/api/AGENTS.md, ...
# writes packages/web/CLAUDE.md, packages/web/AGENTS.md, ...
```

### How does it handle 20,000+ file repos?
Two-pass scanning: it scores all file paths by architectural importance (instant, no file reading), then reads only the top candidates. By default it processes 300 files; override with `--max-files`. Use `--scan-only` to preview the selection.
### Can I generate for just one tool?
Yes: c8r --only claude ./my-repo generates only CLAUDE.md. Use comma-separated values for multiple: --only claude,cursor,copilot.
### Is the generated output good enough to commit?
Yes — but review it first. Use --dry-run to preview output. The quality depends on your codebase size and the LLM model used. We recommend a quick review of the Architecture and Common Pitfalls sections before committing.
### What languages are supported?
All of them. AST extraction (faster, more accurate) supports TypeScript, JavaScript, Python, Go, Rust, Java, C#, Ruby, Kotlin, Swift, Scala, C, C++, PHP. All other languages fall back to LLM-only analysis — works fine.
## License
MIT
