@qlucent/code-dna v0.1.3
Zero-Token Pre-Analysis Layer for codebase analysis
code-dna
Zero-Token Pre-Analysis Layer — give any LLM instant codebase understanding
Table of Contents
- The Problem
- The Solution
- Quick Start
- What It Extracts (4 Layers)
- Supported Languages
- CLI Usage
- MCP Integration
- Configuration
- Programmatic API
- Example Output
- Contributing
- License
The Problem
LLMs waste 50,000–200,000 tokens exploring unfamiliar codebases. Typical workflows involve asking the model to read file trees, open individual files, trace imports, and re-derive architecture facts it will forget next session. Context packers ship raw source code. Knowledge graphs need infrastructure.
The result: slow, expensive, and inconsistent onboarding every time a new LLM session touches your codebase.
The Solution
code-dna runs static analysis in under 5 seconds and produces a compact 5–10k token "DNA file" that gives any LLM architectural understanding — without reading source files.
The DNA file captures:
- The project's module structure and symbol inventory
- Architectural style, detected framework, and layer organisation
- Coding conventions derived from the actual codebase
- Hot files, risk scores, and dependency centrality
- Git churn data and ownership information
Give any LLM the DNA file as its first context document and it hits the ground running.
Quick Start
# Run once, output to stdout
npx code-dna analyze
# Save to a file (recommended)
npx code-dna analyze --output CODEBASE-DNA.md
# YAML output for programmatic consumption
npx code-dna analyze --format yaml --output CODEBASE-DNA.yaml
# Analyse a specific directory
npx code-dna analyze /path/to/project --output CODEBASE-DNA.md
What It Extracts (4 Layers)
code-dna runs four analysis layers: Layers 1 and 2 execute in parallel, then Layers 3 and 4 build on their results:
Layer 1: Structural Skeleton
Discovers all source files, parses them with Tree-sitter AST grammars, and builds:
- File tree with language and role annotations (controller, service, model, etc.)
- Module map — hierarchical directory structure with per-file symbol inventories
- Dependency graph — import/export edges with fan-in/fan-out metrics and circular dependency detection
- Symbol index — every exported function, class, interface, type, and variable
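The fan-in/fan-out metrics and circular-dependency detection above can be sketched over a plain edge list. This is an illustrative sketch only — the type and function names here are not code-dna's actual internals:

```typescript
// Illustrative sketch: compute fan-in/fan-out per file and detect
// import cycles with a depth-first search over the dependency graph.
type Edge = { from: string; to: string };

function fanMetrics(edges: Edge[]): Map<string, { fanIn: number; fanOut: number }> {
  const metrics = new Map<string, { fanIn: number; fanOut: number }>();
  const entry = (file: string) => {
    if (!metrics.has(file)) metrics.set(file, { fanIn: 0, fanOut: 0 });
    return metrics.get(file)!;
  };
  for (const { from, to } of edges) {
    entry(from).fanOut++; // `from` imports one more file
    entry(to).fanIn++;    // `to` is imported one more time
  }
  return metrics;
}

function hasCycle(edges: Edge[]): boolean {
  const adj = new Map<string, string[]>();
  for (const { from, to } of edges) {
    if (!adj.has(from)) adj.set(from, []);
    adj.get(from)!.push(to);
  }
  const state = new Map<string, "visiting" | "done">();
  const visit = (node: string): boolean => {
    if (state.get(node) === "visiting") return true; // back edge → cycle
    if (state.get(node) === "done") return false;
    state.set(node, "visiting");
    for (const next of adj.get(node) ?? []) {
      if (visit(next)) return true;
    }
    state.set(node, "done");
    return false;
  };
  return [...adj.keys()].some((n) => visit(n));
}
```

A file with high fan-in (heavily imported) is a candidate for the centrality score used later in Layer 4.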
Layer 2: Git Archaeology
Queries the local git history to surface temporal patterns:
- Commit heatmap — files ranked by total commits
- Ownership map — primary author per file
- Co-change coupling — files that change together frequently (configurable window)
- Hot files — churn hotspots with commit counts and last-modified timestamps
Gracefully skipped when no git history is available.
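The co-change coupling idea reduces to pair counting over commit file lists. A minimal sketch, assuming commits are already parsed into arrays of changed file paths (the function name is illustrative, not code-dna's internal API):

```typescript
// Illustrative sketch: count, for each file pair, how many commits
// touched both files. High counts indicate co-change coupling.
function coChangeCounts(commits: string[][]): Map<string, number> {
  const pairs = new Map<string, number>();
  for (const files of commits) {
    const sorted = [...files].sort(); // canonical pair order
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = `${sorted[i]}|${sorted[j]}`;
        pairs.set(key, (pairs.get(key) ?? 0) + 1);
      }
    }
  }
  return pairs;
}
```

In practice the counts would be restricted to the configurable window (see `coupling_window` in Configuration) before ranking pairs.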
Layer 3: Pattern Inference
Uses Layer 1 results to infer higher-level patterns without configuration:
- Framework detection — identifies Next.js, Express, FastAPI, Spring Boot, NestJS, and more from dependency manifests and file markers
- Architecture style — classifies projects as MVC, hexagonal, layered, event-driven, or monolith
- Naming conventions — detects camelCase, PascalCase, snake_case, kebab-case across files, functions, classes, and variables
- File organisation — by-feature, by-layer, by-type, or hybrid
- Import and export style — relative vs. aliased paths, named vs. default exports
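Naming-convention detection of this kind typically comes down to regex classification plus a majority vote. A hedged sketch under that assumption (the classifier below is invented for illustration):

```typescript
// Illustrative sketch: classify identifiers by naming convention,
// then pick the dominant convention across a set of names.
type Case = "camelCase" | "PascalCase" | "snake_case" | "kebab-case" | "unknown";

function detectCase(name: string): Case {
  if (/^[a-z][a-zA-Z0-9]*$/.test(name) && /[A-Z]/.test(name)) return "camelCase";
  if (/^[A-Z][a-zA-Z0-9]*$/.test(name)) return "PascalCase";
  if (/^[a-z0-9]+(_[a-z0-9]+)+$/.test(name)) return "snake_case";
  if (/^[a-z0-9]+(-[a-z0-9]+)+$/.test(name)) return "kebab-case";
  return "unknown";
}

function dominant(names: string[]): Case {
  const tally = new Map<Case, number>();
  for (const n of names) {
    const c = detectCase(n);
    tally.set(c, (tally.get(c) ?? 0) + 1);
  }
  // Highest count wins; ties resolve by first-seen order.
  return [...tally.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```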
Layer 4: Risk Surface
Combines all previous layers to produce a risk-ranked file list:
- Centrality score — files with the highest in-degree (most imported)
- Churn score — correlation between frequency of change and dependency weight
- Coverage proxy — estimated test coverage based on co-located test files
- Composite risk score — 0–100 rank with per-factor breakdowns
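A composite 0–100 score of this shape is usually a clamped weighted sum of normalised factors. The weights below are invented for illustration; code-dna's real weighting may differ:

```typescript
// Illustrative sketch: combine per-factor scores (each normalised to
// 0–1) into a 0–100 composite risk score with fixed weights.
interface RiskFactors {
  centrality: number;  // share of max fan-in, 0–1
  churn: number;       // share of max commit count, 0–1
  coverageGap: number; // 1 - estimated test coverage, 0–1
}

function compositeRisk(f: RiskFactors): number {
  const raw = 0.4 * f.centrality + 0.4 * f.churn + 0.2 * f.coverageGap;
  return Math.round(Math.min(1, Math.max(0, raw)) * 100); // clamp, scale
}
```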
Supported Languages
| Language | Extensions | Support Tier |
|----------|-----------|--------------|
| TypeScript | .ts, .tsx | Full AST parsing |
| JavaScript | .js, .jsx, .mjs, .cjs | Full AST parsing |
| Python | .py, .pyi | Full AST parsing |
| Go | .go | File discovery + framework detection |
| Rust | .rs | File discovery + framework detection |
| Java | .java | File discovery + framework detection |
| Vue | .vue | File discovery + framework detection |
| C# | .cs | File discovery + framework detection |
| Ruby | .rb | File discovery + framework detection |
| Kotlin | .kt, .kts | File discovery + framework detection |
| Swift | .swift | File discovery + framework detection |
| PHP | .php | File discovery + framework detection |
| C / C++ | .c, .h, .cpp, .cc, .cxx, .hpp | File discovery + framework detection |
| Solidity | .sol | Discovery only |
Run code-dna info to verify the languages and tiers detected by your installed version.
CLI Usage
analyze [path]
Run the full analysis pipeline and output DNA.
code-dna analyze [path] [options]
Arguments:
| Argument | Description | Default |
|----------|-------------|---------|
| path | Directory to analyse | Current working directory |
Options:
| Flag | Description | Default |
|------|-------------|---------|
| -f, --format <format> | Output format: md or yaml | md |
| -o, --output <file> | Write output to file instead of stdout | stdout |
| -l, --layers <layers> | Comma-separated layers to run | 1,2,3,4 |
| --languages <langs> | Language filter, e.g. ts,py,go | all languages |
| --scope <dir> | Scope analysis to a subdirectory | none |
| --token-budget <n> | Target token count for Markdown output | 8000 |
| --git-depth <n> | Maximum git commits to traverse | 1000 |
| --no-git | Skip git archaeology (disables Layer 2) | false |
| -q, --quiet | Suppress progress output | false |
Examples:
# Full analysis, Markdown output to stdout
code-dna analyze
# Save to file with YAML format
code-dna analyze . --format yaml --output CODEBASE-DNA.yaml
# Only structural skeleton, no git or risk analysis
code-dna analyze --layers 1,3
# Analyse only TypeScript and Python files
code-dna analyze --languages ts,py
# Scope to a single service in a monorepo
code-dna analyze --scope services/api --output services/api/DNA.md
# Large repo with tight token budget
code-dna analyze --token-budget 5000 --git-depth 500
diff <dna-a> <dna-b>
Compare two DNA YAML snapshots and produce a Markdown diff report.
code-dna diff before.yaml after.yaml
code-dna diff before.yaml after.yaml --output diff-report.md
The diff report covers: files added/removed/modified, symbols added/removed, dependency graph changes, risk score movements, convention and framework shifts.
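The symbol-level part of such a diff is essentially a set difference between the two snapshots' symbol inventories. A minimal sketch, with an invented function name, not code-dna's diff engine:

```typescript
// Illustrative sketch: compute symbols added and removed between two
// DNA snapshots, each represented as a flat list of exported symbols.
function symbolDiff(before: string[], after: string[]) {
  const b = new Set(before);
  const a = new Set(after);
  return {
    added: after.filter((s) => !b.has(s)),   // in after, not in before
    removed: before.filter((s) => !a.has(s)), // in before, not in after
  };
}
```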
mcp
Start the code-dna MCP server over stdio for use with MCP-compatible clients.
code-dna mcp
code-dna mcp --path /path/to/project
code-dna mcp --path /path/to/project --watch
See MCP Integration for client configuration details.
info
Show version, Node.js version, platform, and supported languages with their tiers.
code-dna info
MCP Integration
code-dna exposes its analysis pipeline as an MCP server, allowing LLM clients to query codebase DNA directly without running CLI commands.
Starting the Server
# Start against current directory
code-dna mcp
# Start against a specific project
code-dna mcp --path /path/to/project
# Watch mode: auto-refresh cache on file changes
code-dna mcp --path /path/to/project --watch
Claude Code Configuration
Add code-dna to your .mcp.json (project-scoped) or your global Claude Code settings:
{
"mcpServers": {
"code-dna": {
"command": "npx",
"args": ["code-dna", "mcp", "--path", "/absolute/path/to/project", "--watch"]
}
}
}
Cursor Configuration
In Cursor settings, add a new MCP server:
{
"mcp": {
"servers": {
"code-dna": {
"command": "npx",
"args": ["code-dna", "mcp", "--path", "${workspaceFolder}", "--watch"]
}
}
}
}
Available MCP Resources
Once connected, clients can read these resources:
| URI | Content |
|-----|---------|
| codedna://full | Complete DNA Markdown output |
| codedna://skeleton | Architecture and Module Map sections |
| codedna://dependencies | Dependencies section |
| codedna://conventions | Conventions section |
| codedna://risks | Risk Surface and Hot Files sections |
| codedna://hotfiles | Hot Files section only |
Available MCP Tools
| Tool | Description |
|------|-------------|
| analyze | Run analysis on a directory, update the cache, return full DNA |
| diff | Compute a structural diff between two DNA Markdown strings |
See docs/MCP.md for the full MCP reference including tool parameter schemas.
Configuration
Create a .codedna.yaml file in your project root to customise analysis:
# Additional glob patterns to ignore (built-in ignores always apply)
ignore:
- "generated/**"
- "vendor/**"
- "*.pb.go"
# Toggle individual analysis layers
layers:
skeleton: true
git: true
patterns: true
risk: true
# Git archaeology settings
git:
max_commits: 1000
max_blame_files: 50
coupling_window: 30 # days
# Per-language overrides
languages:
python:
enabled: true
framework: "fastapi" # override auto-detection
solidity:
enabled: false # skip entirely
# Output preferences
output:
format: md
token_budget: 8000
filename: CODEBASE-DNA.md
sections:
architecture: 15
module_map: 25
dependencies: 15
conventions: 15
hot_files: 10
risk_surface: 10
api_surface: 5
# Monorepo: include/exclude sub-directories
scope:
include:
- "services/api"
- "packages/shared"
exclude:
    - "packages/legacy"
All fields are optional and fall back to sensible defaults.
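"All fields are optional" implies merging user config over built-in defaults. A sketch of that semantics for the git section, assuming a simple per-section shallow merge (the merge strategy itself is an assumption, not documented behaviour):

```typescript
// Illustrative sketch: resolve the `git` section of .codedna.yaml by
// overlaying user-supplied fields on the documented defaults.
interface GitConfig {
  max_commits: number;     // default 1000
  max_blame_files: number; // default 50
  coupling_window: number; // days, default 30
}

const GIT_DEFAULTS: GitConfig = {
  max_commits: 1000,
  max_blame_files: 50,
  coupling_window: 30,
};

function resolveGitConfig(user: Partial<GitConfig> = {}): GitConfig {
  return { ...GIT_DEFAULTS, ...user }; // user fields win, rest defaulted
}
```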
Programmatic API
code-dna can be used as a library from TypeScript or JavaScript:
npm install code-dna
import { analyze, formatMarkdown, formatYaml } from 'code-dna/lib';
// Run the full 4-layer analysis
const dna = await analyze('/path/to/project', {
layers: [1, 2, 3, 4],
tokenBudget: 8000,
});
// Render as Markdown (token-budget aware)
const markdown = formatMarkdown(dna, 8000);
// Render as YAML (full data, no truncation)
const yaml = formatYaml(dna);
See docs/API.md for the complete programmatic API reference.
Example Output
The following is a truncated excerpt from code-dna analysing itself:
# Codebase DNA -- code-dna
> Generated by code-dna v0.1.0 on 2026-03-26.
> Languages: typescript (99%), javascript (1%) | Files: 101 | LOC: 35,864
## Architecture
**Style:** layered (85% confidence)
**Framework:** Node.js / Commander CLI
### Layers
- **cli** (3 files): entry point, MCP command
- **core** (8 files): engine, types, diff engine, token budget
- **analyzers** (6 files): git, framework, architecture, conventions, risk
- **parsers** (19 files): Tree-sitter extractors for 14 languages
- **output** (3 files): Markdown and YAML formatters
- **mcp** (2 files): MCP server
## Conventions
- **Files:** kebab-case
- **Functions:** camelCase
- **Classes:** PascalCase
- **Exports:** named
- **Imports:** external-first, relative paths
- **Tests:** co-located
## Risk Surface
| File | Score | Factors |
|------|-------|---------|
| src/core/engine.ts | 82 | high-centrality, high-churn |
| src/core/types.ts | 74 | high-centrality |
| src/parsers/parser-engine.ts | 65 | high-centrality |
Contributing
- Clone the repository and install dependencies: npm install
- Build: npm run build
- Run all tests: npm test (1199 tests, Node.js 20+ required)
- Lint: npm run lint
- Typecheck: npm run typecheck
All code changes require tests written first (TDD). Commits follow Conventional Commits (feat(scope):, fix(scope):).
License
MIT
