@qlucent/code-dna v0.1.3
Zero-Token Pre-Analysis Layer for codebase analysis
code-dna
Zero-Token Pre-Analysis Layer — give any LLM instant codebase understanding
Table of Contents
- The Problem
- The Solution
- Quick Start
- What It Extracts (4 Layers)
- Supported Languages
- CLI Usage
- MCP Integration
- Configuration
- Programmatic API
- Example Output
- Contributing
- License
The Problem
LLMs waste 50,000–200,000 tokens exploring unfamiliar codebases. Typical workflows involve asking the model to read file trees, open individual files, trace imports, and re-derive architecture facts it will forget next session. Context packers ship raw source code. Knowledge graphs need infrastructure.
The result: slow, expensive, and inconsistent onboarding every time a new LLM session touches your codebase.
The Solution
code-dna runs static analysis in under 5 seconds and produces a compact 5–10k token "DNA file" that gives any LLM architectural understanding — without reading source files.
The DNA file captures:
- The project's module structure and symbol inventory
- Architectural style, detected framework, and layer organisation
- Coding conventions derived from the actual codebase
- Hot files, risk scores, and dependency centrality
- Git churn data and ownership information
Give any LLM the DNA file as its first context document and it hits the ground running.
Quick Start
# Run once, output to stdout
npx code-dna analyze
# Save to a file (recommended)
npx code-dna analyze --output CODEBASE-DNA.md
# YAML output for programmatic consumption
npx code-dna analyze --format yaml --output CODEBASE-DNA.yaml
# Analyse a specific directory
npx code-dna analyze /path/to/project --output CODEBASE-DNA.md
What It Extracts (4 Layers)
code-dna runs four analysis layers: Layers 1 and 2 execute in parallel, then Layers 3 and 4 build on their results:
Layer 1: Structural Skeleton
Discovers all source files, parses them with Tree-sitter AST grammars, and builds:
- File tree with language and role annotations (controller, service, model, etc.)
- Module map — hierarchical directory structure with per-file symbol inventories
- Dependency graph — import/export edges with fan-in/fan-out metrics and circular dependency detection
- Symbol index — every exported function, class, interface, type, and variable
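The fan-in/fan-out metrics and circular-dependency detection above can be sketched over a plain edge list. This is an illustrative sketch only — the type and function names here are not code-dna's actual internals:

```typescript
// Illustrative sketch: compute fan-in/fan-out per file and detect
// import cycles with a depth-first search over the dependency graph.
type Edge = { from: string; to: string };

function fanMetrics(edges: Edge[]): Map<string, { fanIn: number; fanOut: number }> {
  const metrics = new Map<string, { fanIn: number; fanOut: number }>();
  const entry = (file: string) => {
    if (!metrics.has(file)) metrics.set(file, { fanIn: 0, fanOut: 0 });
    return metrics.get(file)!;
  };
  for (const { from, to } of edges) {
    entry(from).fanOut++; // `from` imports one more file
    entry(to).fanIn++;    // `to` is imported one more time
  }
  return metrics;
}

function hasCycle(edges: Edge[]): boolean {
  const adj = new Map<string, string[]>();
  for (const { from, to } of edges) {
    if (!adj.has(from)) adj.set(from, []);
    adj.get(from)!.push(to);
  }
  const state = new Map<string, "visiting" | "done">();
  const visit = (node: string): boolean => {
    if (state.get(node) === "visiting") return true; // back edge → cycle
    if (state.get(node) === "done") return false;
    state.set(node, "visiting");
    for (const next of adj.get(node) ?? []) {
      if (visit(next)) return true;
    }
    state.set(node, "done");
    return false;
  };
  return [...adj.keys()].some((n) => visit(n));
}
```

A file with high fan-in (heavily imported) is a candidate for the centrality score used later in Layer 4.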
Layer 2: Git Archaeology
Queries the local git history to surface temporal patterns:
- Commit heatmap — files ranked by total commits
- Ownership map — primary author per file
- Co-change coupling — files that change together frequently (configurable window)
- Hot files — churn hotspots with commit counts and last-modified timestamps
Gracefully skipped when no git history is available.
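The co-change coupling idea reduces to pair counting over commit file lists. A minimal sketch, assuming commits are already parsed into arrays of changed file paths (the function name is illustrative, not code-dna's internal API):

```typescript
// Illustrative sketch: count, for each file pair, how many commits
// touched both files. High counts indicate co-change coupling.
function coChangeCounts(commits: string[][]): Map<string, number> {
  const pairs = new Map<string, number>();
  for (const files of commits) {
    const sorted = [...files].sort(); // canonical pair order
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = `${sorted[i]}|${sorted[j]}`;
        pairs.set(key, (pairs.get(key) ?? 0) + 1);
      }
    }
  }
  return pairs;
}
```

In practice the counts would be restricted to the configurable window (see `coupling_window` in Configuration) before ranking pairs.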
Layer 3: Pattern Inference
Uses Layer 1 results to infer higher-level patterns without configuration:
- Framework detection — identifies Next.js, Express, FastAPI, Spring Boot, NestJS, and more from dependency manifests and file markers
- Architecture style — classifies projects as MVC, hexagonal, layered, event-driven, or monolith
- Naming conventions — detects camelCase, PascalCase, snake_case, kebab-case across files, functions, classes, and variables
- File organisation — by-feature, by-layer, by-type, or hybrid
- Import and export style — relative vs. aliased paths, named vs. default exports
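Naming-convention detection of this kind typically comes down to regex classification plus a majority vote. A hedged sketch under that assumption (the classifier below is invented for illustration):

```typescript
// Illustrative sketch: classify identifiers by naming convention,
// then pick the dominant convention across a set of names.
type Case = "camelCase" | "PascalCase" | "snake_case" | "kebab-case" | "unknown";

function detectCase(name: string): Case {
  if (/^[a-z][a-zA-Z0-9]*$/.test(name) && /[A-Z]/.test(name)) return "camelCase";
  if (/^[A-Z][a-zA-Z0-9]*$/.test(name)) return "PascalCase";
  if (/^[a-z0-9]+(_[a-z0-9]+)+$/.test(name)) return "snake_case";
  if (/^[a-z0-9]+(-[a-z0-9]+)+$/.test(name)) return "kebab-case";
  return "unknown";
}

function dominant(names: string[]): Case {
  const tally = new Map<Case, number>();
  for (const n of names) {
    const c = detectCase(n);
    tally.set(c, (tally.get(c) ?? 0) + 1);
  }
  // Highest count wins; ties resolve by first-seen order.
  return [...tally.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```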
Layer 4: Risk Surface
Combines all previous layers to produce a risk-ranked file list:
- Centrality score — files with the highest in-degree (most imported)
- Churn score — correlation between frequency of change and dependency weight
- Coverage proxy — estimated test coverage based on co-located test files
- Composite risk score — 0–100 rank with per-factor breakdowns
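A composite 0–100 score of this shape is usually a clamped weighted sum of normalised factors. The weights below are invented for illustration; code-dna's real weighting may differ:

```typescript
// Illustrative sketch: combine per-factor scores (each normalised to
// 0–1) into a 0–100 composite risk score with fixed weights.
interface RiskFactors {
  centrality: number;  // share of max fan-in, 0–1
  churn: number;       // share of max commit count, 0–1
  coverageGap: number; // 1 - estimated test coverage, 0–1
}

function compositeRisk(f: RiskFactors): number {
  const raw = 0.4 * f.centrality + 0.4 * f.churn + 0.2 * f.coverageGap;
  return Math.round(Math.min(1, Math.max(0, raw)) * 100); // clamp, scale
}
```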
Supported Languages
| Language | Extensions | Support Tier |
|----------|-----------|--------------|
| TypeScript | .ts, .tsx | Full AST parsing |
| JavaScript | .js, .jsx, .mjs, .cjs | Full AST parsing |
| Python | .py, .pyi | Full AST parsing |
| Go | .go | File discovery + framework detection |
| Rust | .rs | File discovery + framework detection |
| Java | .java | File discovery + framework detection |
| Vue | .vue | File discovery + framework detection |
| C# | .cs | File discovery + framework detection |
| Ruby | .rb | File discovery + framework detection |
| Kotlin | .kt, .kts | File discovery + framework detection |
| Swift | .swift | File discovery + framework detection |
| PHP | .php | File discovery + framework detection |
| C / C++ | .c, .h, .cpp, .cc, .cxx, .hpp | File discovery + framework detection |
| Solidity | .sol | Discovery only |
Run code-dna info to verify the languages and tiers detected by your installed version.
CLI Usage
analyze [path]
Run the full analysis pipeline and output DNA.
code-dna analyze [path] [options]
Arguments:
| Argument | Description | Default |
|----------|-------------|---------|
| path | Directory to analyse | Current working directory |
Options:
| Flag | Description | Default |
|------|-------------|---------|
| -f, --format <format> | Output format: md or yaml | md |
| -o, --output <file> | Write output to file instead of stdout | stdout |
| -l, --layers <layers> | Comma-separated layers to run | 1,2,3,4 |
| --languages <langs> | Language filter, e.g. ts,py,go | all languages |
| --scope <dir> | Scope analysis to a subdirectory | none |
| --token-budget <n> | Target token count for Markdown output | 8000 |
| --git-depth <n> | Maximum git commits to traverse | 1000 |
| --no-git | Skip git archaeology (disables Layer 2) | false |
| -q, --quiet | Suppress progress output | false |
Examples:
# Full analysis, Markdown output to stdout
code-dna analyze
# Save to file with YAML format
code-dna analyze . --format yaml --output CODEBASE-DNA.yaml
# Only structural skeleton, no git or risk analysis
code-dna analyze --layers 1,3
# Analyse only TypeScript and Python files
code-dna analyze --languages ts,py
# Scope to a single service in a monorepo
code-dna analyze --scope services/api --output services/api/DNA.md
# Large repo with tight token budget
code-dna analyze --token-budget 5000 --git-depth 500
diff <dna-a> <dna-b>
Compare two DNA YAML snapshots and produce a Markdown diff report.
code-dna diff before.yaml after.yaml
code-dna diff before.yaml after.yaml --output diff-report.md
The diff report covers: files added/removed/modified, symbols added/removed, dependency graph changes, risk score movements, convention and framework shifts.
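The symbol-level part of such a diff is essentially a set difference between the two snapshots' symbol inventories. A minimal sketch, with an invented function name, not code-dna's diff engine:

```typescript
// Illustrative sketch: compute symbols added and removed between two
// DNA snapshots, each represented as a flat list of exported symbols.
function symbolDiff(before: string[], after: string[]) {
  const b = new Set(before);
  const a = new Set(after);
  return {
    added: after.filter((s) => !b.has(s)),   // in after, not in before
    removed: before.filter((s) => !a.has(s)), // in before, not in after
  };
}
```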
mcp
Start the code-dna MCP server over stdio for use with MCP-compatible clients.
code-dna mcp
code-dna mcp --path /path/to/project
code-dna mcp --path /path/to/project --watch
See MCP Integration for client configuration details.
info
Show version, Node.js version, platform, and supported languages with their tiers.
code-dna info
MCP Integration
code-dna exposes its analysis pipeline as an MCP server, allowing LLM clients to query codebase DNA directly without running CLI commands.
Starting the Server
# Start against current directory
code-dna mcp
# Start against a specific project
code-dna mcp --path /path/to/project
# Watch mode: auto-refresh cache on file changes
code-dna mcp --path /path/to/project --watch
Claude Code Configuration
Add code-dna to your .mcp.json (project-scoped) or your global Claude Code settings:
{
"mcpServers": {
"code-dna": {
"command": "npx",
"args": ["code-dna", "mcp", "--path", "/absolute/path/to/project", "--watch"]
}
}
}
Cursor Configuration
In Cursor settings, add a new MCP server:
{
"mcp": {
"servers": {
"code-dna": {
"command": "npx",
"args": ["code-dna", "mcp", "--path", "${workspaceFolder}", "--watch"]
}
}
}
}
Available MCP Resources
Once connected, clients can read these resources:
| URI | Content |
|-----|---------|
| codedna://full | Complete DNA Markdown output |
| codedna://skeleton | Architecture and Module Map sections |
| codedna://dependencies | Dependencies section |
| codedna://conventions | Conventions section |
| codedna://risks | Risk Surface and Hot Files sections |
| codedna://hotfiles | Hot Files section only |
Available MCP Tools
| Tool | Description |
|------|-------------|
| analyze | Run analysis on a directory, update the cache, return full DNA |
| diff | Compute a structural diff between two DNA Markdown strings |
See docs/MCP.md for the full MCP reference including tool parameter schemas.
Configuration
Create a .codedna.yaml file in your project root to customise analysis:
# Additional glob patterns to ignore (built-in ignores always apply)
ignore:
- "generated/**"
- "vendor/**"
- "*.pb.go"
# Toggle individual analysis layers
layers:
skeleton: true
git: true
patterns: true
risk: true
# Git archaeology settings
git:
max_commits: 1000
max_blame_files: 50
coupling_window: 30 # days
# Per-language overrides
languages:
python:
enabled: true
framework: "fastapi" # override auto-detection
solidity:
enabled: false # skip entirely
# Output preferences
output:
format: md
token_budget: 8000
filename: CODEBASE-DNA.md
sections:
architecture: 15
module_map: 25
dependencies: 15
conventions: 15
hot_files: 10
risk_surface: 10
api_surface: 5
# Monorepo: include/exclude sub-directories
scope:
include:
- "services/api"
- "packages/shared"
exclude:
    - "packages/legacy"
All fields are optional and fall back to sensible defaults.
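"All fields are optional" implies merging user config over built-in defaults. A sketch of that semantics for the git section, assuming a simple per-section shallow merge (the merge strategy itself is an assumption, not documented behaviour):

```typescript
// Illustrative sketch: resolve the `git` section of .codedna.yaml by
// overlaying user-supplied fields on the documented defaults.
interface GitConfig {
  max_commits: number;     // default 1000
  max_blame_files: number; // default 50
  coupling_window: number; // days, default 30
}

const GIT_DEFAULTS: GitConfig = {
  max_commits: 1000,
  max_blame_files: 50,
  coupling_window: 30,
};

function resolveGitConfig(user: Partial<GitConfig> = {}): GitConfig {
  return { ...GIT_DEFAULTS, ...user }; // user fields win, rest defaulted
}
```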
Programmatic API
code-dna can be used as a library from TypeScript or JavaScript:
npm install code-dna
import { analyze, formatMarkdown, formatYaml } from 'code-dna/lib';
// Run the full 4-layer analysis
const dna = await analyze('/path/to/project', {
layers: [1, 2, 3, 4],
tokenBudget: 8000,
});
// Render as Markdown (token-budget aware)
const markdown = formatMarkdown(dna, 8000);
// Render as YAML (full data, no truncation)
const yaml = formatYaml(dna);
See docs/API.md for the complete programmatic API reference.
Example Output
The following is a truncated excerpt from code-dna analysing itself:
# Codebase DNA -- code-dna
> Generated by code-dna v0.1.0 on 2026-03-26.
> Languages: typescript (99%), javascript (1%) | Files: 101 | LOC: 35,864
## Architecture
**Style:** layered (85% confidence)
**Framework:** Node.js / Commander CLI
### Layers
- **cli** (3 files): entry point, MCP command
- **core** (8 files): engine, types, diff engine, token budget
- **analyzers** (6 files): git, framework, architecture, conventions, risk
- **parsers** (19 files): Tree-sitter extractors for 14 languages
- **output** (3 files): Markdown and YAML formatters
- **mcp** (2 files): MCP server
## Conventions
- **Files:** kebab-case
- **Functions:** camelCase
- **Classes:** PascalCase
- **Exports:** named
- **Imports:** external-first, relative paths
- **Tests:** co-located
## Risk Surface
| File | Score | Factors |
|------|-------|---------|
| src/core/engine.ts | 82 | high-centrality, high-churn |
| src/core/types.ts | 74 | high-centrality |
| src/parsers/parser-engine.ts | 65 | high-centrality |
Contributing
- Clone the repository and install dependencies: npm install
- Build: npm run build
- Run all tests: npm test (1199 tests, Node.js 20+ required)
- Lint: npm run lint
- Typecheck: npm run typecheck
All code changes require tests written first (TDD). Commits follow Conventional Commits (feat(scope):, fix(scope):).
License
MIT
