codebase-dna

v1.3.0

Published

4 days ago

Codebase intelligence layer for AI coding agents - auto-discovers conventions, architecture, and static contract signals via MCP

adurin04

mcp ai coding-agent codebase conventions architecture vibe-coding developer-tools

codebase-dna 🧬

The Codebase Intelligence Layer for AI Coding Agents

codebase-dna is a local MCP (Model Context Protocol) server that statically analyzes your JavaScript/TypeScript and PHP/Laravel codebase and extracts evidence about its local patterns. It serves this evidence to AI coding agents, helping them generate code that aligns with your project's conventions, architecture, and static contract signals.

Runs 100% locally. Zero cost.

Documentation

Start with docs/getting-started.md if you are new to the project.

What Is It All About?

Existing AI coding agents (like Cursor, Claude Code, GitHub Copilot CLI, Windsurf) can read files, but they don't natively infer the implicit rules of your codebase. codebase-dna acts as a static-analysis bridge by extracting three layers of evidence:

Conventions — Naming patterns, async style, error handling, export styles, and more.
Architecture — Module boundaries, allowed import directions, and layer classification (e.g., flagging services that import directly from the DB layer when explicit rules say they should not).
Static Contract Signals — Function signatures, parameter types, return annotations, side effects, throws, guard clauses, confidence, and extraction provenance.

Agents query this information via MCP before writing code, turning them from generic code generators into "team-aware contributors".

The Problem It Solves

When you tell an AI agent to "Add a payment processing service", it might:

✗ Use function declarations instead of your team's preferred arrow functions.
✗ Use console.log instead of your structured logger.
✗ Name the file PaymentService.ts instead of payment-service.ts.
✗ Violate architecture by importing database modules directly into a web route.

Why existing solutions fall short:

AGENTS.md / .cursorrules: Require manual updates and go stale quickly.
ESLint / Prettier: Enforce configured rules, but don't discover what the codebase actually does.
Codegraph / GitNexus: Map import relationships but miss conventions and static contract signals.
Full code intelligence platforms (Sourcegraph, SonarQube, CodeScene, etc.): Much broader and deeper, but heavier to adopt. codebase-dna aims to be the local, lightweight MCP context layer that agents can query before edits.

What codebase-dna does differently:

Auto-discovery: Zero manual rule writing. It scans and summarizes evidence from your existing code.
Pre-generation Signals: Agents query before generating code, reducing avoidable mistakes.
File-change Updates: When a file changes, its contract, import, and call signals are refreshed quickly; conventions and inferred architecture are recomputed by full scans.
MCP-Native: Purpose-built for AI agents via the standard Model Context Protocol.
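To make the auto-discovery idea concrete, here is a minimal TypeScript sketch that tallies file-naming styles and reports the dominant one. It is an illustration only, not codebase-dna's implementation: `classifyName`, `discoverFileNaming`, and the regular expressions are hypothetical. Treating a style as a convention once its share reaches a threshold (0.75 here, echoing the `conventionThreshold` value in the Configuration example) is an assumption about how such a threshold could be applied.

```typescript
type NamingStyle = "kebab-case" | "snake_case" | "PascalCase" | "camelCase" | "other";

// Hypothetical classifier: single lowercase words such as "index" are
// ambiguous between styles, so they fall through to "other".
function classifyName(baseName: string): NamingStyle {
  if (/^[a-z0-9]+(-[a-z0-9]+)+$/.test(baseName)) return "kebab-case";
  if (/^[a-z0-9]+(_[a-z0-9]+)+$/.test(baseName)) return "snake_case";
  if (/^[A-Z][a-zA-Z0-9]*$/.test(baseName)) return "PascalCase";
  if (/^[a-z][a-zA-Z0-9]*[A-Z][a-zA-Z0-9]*$/.test(baseName)) return "camelCase";
  return "other";
}

interface NamingEvidence {
  dominant: NamingStyle; // most frequent classified style ("other" if none)
  share: number;         // fraction of classified files using that style
  isConvention: boolean; // true when share >= threshold (assumed semantics)
}

function discoverFileNaming(fileNames: string[], threshold = 0.75): NamingEvidence {
  const counts: Record<NamingStyle, number> = {
    "kebab-case": 0, "snake_case": 0, "PascalCase": 0, "camelCase": 0, "other": 0,
  };
  for (const fileName of fileNames) {
    // Drop everything from the first dot: "a-b.test.ts" -> "a-b".
    counts[classifyName(fileName.replace(/\..*$/, ""))] += 1;
  }

  const styles: NamingStyle[] = ["kebab-case", "snake_case", "PascalCase", "camelCase"];
  let dominant: NamingStyle = "other";
  let best = 0;
  let classified = 0;
  for (const style of styles) {
    classified += counts[style];
    if (counts[style] > best) {
      best = counts[style];
      dominant = style;
    }
  }

  const share = classified === 0 ? 0 : best / classified;
  return { dominant, share, isConvention: classified > 0 && share >= threshold };
}

const evidence = discoverFileNaming([
  "payment-service.ts",
  "user-repository.ts",
  "order-service.ts",
  "PaymentService.ts",
  "index.ts",
]);
console.log(evidence.dominant, evidence.share, evidence.isConvention);
```

In this sample, three of the four classifiable names are kebab-case, so an agent asked to add a payment service would be steered toward `payment-service.ts` rather than `PaymentService.ts`.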

Setup & Installation

codebase-dna connects directly to your AI coding agent (Cursor, Claude Desktop, Windsurf, Antigravity, etc.) via the Model Context Protocol (MCP).

Prerequisites

Node.js (v20+): Required to run the server. Check your version by running node -v in your terminal. If you don't have it, download it from nodejs.org.

Installation Steps

Step 1: Open your AI Agent's MCP Settings

Locate your AI tool's MCP configuration file. It is usually named mcp.json or claude_desktop_config.json, or it can be reached from the tool's settings menu under "MCP Servers".

Step 2: Add the Configuration

[!IMPORTANT] codebase-dna needs to know which project directory to scan. By default it uses the server process's working directory (cwd), but many IDEs set this to their own installation folder — not your project. You must tell the server your project root using one of the methods below.

Option A: Workspace-specific config (Recommended for Cursor / VS Code)

Place this in your workspace-level MCP settings (e.g., .cursor/mcp.json or .vscode/mcp.json). The IDE will typically set cwd to the workspace root automatically:

{
  "mcpServers": {
    "codebase-dna": {
      "command": "npx",
      "args": ["-y", "codebase-dna@latest", "serve"]
    }
  }
}

Option B: Global config with explicit `--rootDir` (Recommended for Antigravity / global setups)

If your IDE only supports a global MCP config, you must pass --rootDir to tell the server which project to scan:

{
  "mcpServers": {
    "codebase-dna": {
      "command": "npx",
      "args": [
        "-y",
        "codebase-dna@latest",
        "serve",
        "--rootDir", "C:/Projects/your-project"
      ]
    }
  }
}

[!NOTE] Replace C:/Projects/your-project with the absolute path to the project you want to analyze. When switching projects, update this path accordingly.

Option C: Environment variable

You can also set the CODEBASE_DNA_ROOT environment variable instead of using --rootDir:

{
  "mcpServers": {
    "codebase-dna": {
      "command": "npx",
      "args": ["-y", "codebase-dna@latest", "serve"],
      "env": {
        "CODEBASE_DNA_ROOT": "C:/Projects/your-project"
      }
    }
  }
}

Priority order: --rootDir flag > CODEBASE_DNA_ROOT env var > process.cwd().
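The priority order above can be sketched as a small pure function. This is a hedged illustration rather than the server's actual code: `resolveProjectRoot` and the `RootInputs` shape are hypothetical names, and treating empty or whitespace-only values as "not set" is an assumption.

```typescript
interface RootInputs {
  rootDirFlag?: string; // value passed via --rootDir, if any
  envRoot?: string;     // value of CODEBASE_DNA_ROOT, if set
  cwd: string;          // the server process's working directory
}

interface ResolvedRoot {
  root: string;
  source: "flag" | "env" | "cwd";
}

// Assumption: blank strings count as unset.
function isSet(value: string | undefined): value is string {
  return value !== undefined && value.trim() !== "";
}

// Mirrors the documented order: --rootDir > CODEBASE_DNA_ROOT > process.cwd().
function resolveProjectRoot(inputs: RootInputs): ResolvedRoot {
  if (isSet(inputs.rootDirFlag)) return { root: inputs.rootDirFlag, source: "flag" };
  if (isSet(inputs.envRoot)) return { root: inputs.envRoot, source: "env" };
  return { root: inputs.cwd, source: "cwd" };
}

// With neither the flag nor the variable set, the IDE's working directory wins,
// which is how a scan can end up pointed at the IDE's installation folder.
const fallback = resolveProjectRoot({ cwd: "C:/Program Files/SomeIDE" });
console.log(fallback.source, fallback.root);
```

Reporting the `source` alongside the path makes it easier to tell at a glance whether the scan root came from configuration or from the fallback.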

Step 3: Restart and Scan

Restart your AI application to load the new configuration.
Open a chat in your AI tool and say: "Please use the dna_scan tool to refresh analysis for this project."
After you review the initial report/context and trust the current state, run dna_accept_baseline once so future dna_verify checks have a trusted comparison point.

[!NOTE]
Generated Files: Once the scan runs successfully, codebase-dna will automatically create a .codebase-dna folder in your project directory to store the analyzed knowledge. Add .codebase-dna/ to your project's .gitignore file.

Troubleshooting

Symptoms of wrong project root:

The scan runs but finds files from the IDE's own installation directory
dna_conventions returns examples pointing to IDE internal files (e.g., resources/app/extensions/...)
dna_boundaries only finds one layer that doesn't match your project
dna_contract can't find your project's functions
dna_can_import rejects project files with "path traversal" errors
No .codebase-dna/ folder appears in your project directory

Fix: Add --rootDir to your MCP config args pointing to your project's absolute path (see Option B above).
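The "path traversal" symptom is easier to understand with a sketch of a containment check. The exact check codebase-dna performs is not shown in this README, so the following is an assumption-laden illustration: `normalizeSegments` and `isInsideRoot` are hypothetical helpers, and the comparison is case-sensitive for simplicity.

```typescript
// Split a path into segments, resolving "." and ".." and accepting
// both "/" and "\" separators.
function normalizeSegments(path: string): string[] {
  const segments: string[] = [];
  for (const segment of path.replace(/\\/g, "/").split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      segments.pop();
      continue;
    }
    segments.push(segment);
  }
  return segments;
}

// A file is "inside" the root when the root's segments are a prefix of
// the file's segments after normalization.
function isInsideRoot(rootDir: string, filePath: string): boolean {
  const root = normalizeSegments(rootDir);
  const file = normalizeSegments(filePath);
  if (root.length > file.length) return false;
  return root.every((segment, index) => segment === file[index]);
}

const projectFile = "C:/Projects/your-project/src/payment-service.ts";

// Correct root: the project file is accepted.
console.log(isInsideRoot("C:/Projects/your-project", projectFile));

// Wrong root (an IDE folder): the same file now looks like an escape
// from the root, which a server could reasonably report as traversal.
console.log(isInsideRoot("C:/Program Files/SomeIDE", projectFile));
```

Under this model, every file in your project lies outside a wrongly detected root, which would explain why project paths are rejected until --rootDir is set.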

Configuration (Optional)

codebase-dna works out of the box with zero configuration, but you can customize its behavior by creating a codebase-dna.config.json file in your project root:

{
  "include": ["**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs,php}"],
  "exclude": [
    "**/node_modules/**", 
    "**/vendor/**",
    "**/dist/**", 
    "**/*.test.*",
    "**/__tests__/**"
  ],
  "conventionThreshold": 0.75,
  "architectureMode": "strict",
  "boundaryOverrides": [
    {
      "layer": "services",
      "allowedImports": ["repositories", "utils"]
    }
  ]
}

[!NOTE] include controls which files are considered for scanning, but only supported languages are parsed. Adding extensions such as py to include will not enable Python analysis until a Python parser is implemented.
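As a rough illustration of how a `boundaryOverrides` entry could be evaluated, the sketch below checks an import between two layers. It is not the tool's implementation: `checkImport` and the `ImportVerdict` values are hypothetical, reading `allowedImports` as an allow-list is an assumption, and so is treating same-layer imports as always allowed. Layers without an explicit override are reported as advisory, in line with the README's statement that inferred boundaries are not enforced.

```typescript
interface BoundaryOverride {
  layer: string;
  allowedImports: string[];
}

// "advisory" means no explicit rule exists, so nothing is enforced.
type ImportVerdict = "allowed" | "violation" | "advisory";

function checkImport(
  overrides: BoundaryOverride[],
  fromLayer: string,
  toLayer: string,
): ImportVerdict {
  // Assumption: a layer may always import from itself.
  if (fromLayer === toLayer) return "allowed";

  const rule = overrides.find((override) => override.layer === fromLayer);
  if (rule === undefined) return "advisory";

  return rule.allowedImports.includes(toLayer) ? "allowed" : "violation";
}

// Same shape as the boundaryOverrides entry in the config example.
const exampleOverrides: BoundaryOverride[] = [
  { layer: "services", allowedImports: ["repositories", "utils"] },
];

console.log(checkImport(exampleOverrides, "services", "repositories"));
console.log(checkImport(exampleOverrides, "services", "controllers"));
console.log(checkImport(exampleOverrides, "controllers", "services"));
```

The three-valued result keeps explicit failures separate from cases where the tool has only an inferred opinion.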

Available MCP Tools

Once connected, your AI agent will have access to 14 native tools:

dna_scan: Refreshes analysis without changing the verification baseline.
dna_conventions: Retrieves discovered coding conventions and patterns.
dna_check_style: Verifies if a proposed code snippet adheres to local styles.
dna_boundaries: Explains the architectural layers and directory structures.
dna_can_import: Checks if importing from directory A to B violates boundaries.
dna_contract: Retrieves static contract signals for specific functions or symbols, including confidence/provenance.
dna_verify: Checks current code against the latest explicit baseline by default.
dna_context: Provides comprehensive intelligence for a specific task.
dna_accept_baseline: Accepts the current codebase state as the new verification baseline after intentional changes.
dna_report: Generates a Markdown report of conventions, boundaries, contracts, side effects, and risks.
dna_callees: Lists static calls made by a function or method.
dna_callers: Lists static callers of a function or method.
dna_impact: Estimates upstream impact by walking static callers of a symbol.
dna_suggest_boundaries: Emits advisory boundaryOverrides JSON with confidence labels and warnings; nothing is enforced unless copied into config.
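The idea behind dna_impact, walking static callers of a symbol, can be pictured as a breadth-first traversal over reversed call edges. The sketch below is a simplified model under stated assumptions: the `CallEdge` shape, the `resolved` flag, and `upstreamImpact` are hypothetical, and skipping unresolved edges is one possible policy, not necessarily the tool's.

```typescript
interface CallEdge {
  caller: string;
  callee: string;
  resolved: boolean; // hypothetical: true when the callee maps to a known internal symbol
}

interface ImpactEntry {
  symbol: string;
  depth: number; // 1 = direct caller, 2 = caller of a caller, ...
}

function upstreamImpact(edges: CallEdge[], target: string): ImpactEntry[] {
  // Index resolved edges by callee so callers can be looked up quickly.
  const callersOf = new Map<string, string[]>();
  for (const edge of edges) {
    if (!edge.resolved) continue;
    const callers = callersOf.get(edge.callee) ?? [];
    callers.push(edge.caller);
    callersOf.set(edge.callee, callers);
  }

  const seen = new Set<string>([target]); // guards against cycles
  const impact: ImpactEntry[] = [];
  let frontier: string[] = [target];
  let depth = 0;

  while (frontier.length > 0) {
    depth += 1;
    const next: string[] = [];
    for (const symbol of frontier) {
      for (const caller of callersOf.get(symbol) ?? []) {
        if (seen.has(caller)) continue;
        seen.add(caller);
        impact.push({ symbol: caller, depth });
        next.push(caller);
      }
    }
    frontier = next;
  }
  return impact;
}

const sampleEdges: CallEdge[] = [
  { caller: "handleCheckout", callee: "chargeCard", resolved: true },
  { caller: "chargeCard", callee: "saveTransaction", resolved: true },
  { caller: "refundOrder", callee: "saveTransaction", resolved: true },
  { caller: "retryJob", callee: "handleCheckout", resolved: true },
  { caller: "dynamicDispatch", callee: "saveTransaction", resolved: false },
];

for (const entry of upstreamImpact(sampleEdges, "saveTransaction")) {
  console.log(entry.depth, entry.symbol);
}
```

Recording the depth gives a rough sense of distance: direct callers of a changed function usually deserve closer review than symbols several hops away.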

Trust Model

codebase-dna separates hard static evidence from heuristic inference:

Observed imports are evidence. Inferred architecture never treats an import that already exists in the scanned codebase as a violation by itself.
Enforced architecture is explicit. Inferred boundaries are advisory; boundaryOverrides are the mechanism for rules that should fail dna_can_import or dna_verify.
Contracts are static signals, not runtime proofs. Parameters and return annotations are extracted from syntax; side effects, throws, and guards include confidence/provenance metadata.
Call graph results are graded. dna_callees, dna_callers, dna_impact, and dna_report distinguish resolved internal targets from unresolved static call sites.
Quality is testable. This repository currently passes npm run lint, npm test, npm run build, and npm run benchmark with 13/13 static-signal scenarios passing; dna_report also surfaces resolved call-edge counts so teams can judge usefulness on their own codebase.

CLI Commands

codebase-dna scan --rootDir <path>: Refresh analysis without changing the baseline.
codebase-dna accept-baseline --rootDir <path>: Accept the current state as the new baseline.
codebase-dna verify --rootDir <path>: Verify current analysis against the accepted baseline.
codebase-dna report --rootDir <path> --out DNA_REPORT.md: Generate a Markdown intelligence report.
codebase-dna doctor --rootDir <path>: Check root detection, store readiness, aliases, and baseline status.
codebase-dna mcp-config cursor --rootDir <path>: Print ready-to-paste MCP config.
codebase-dna suggest-boundaries --rootDir <path>: Print advisory boundary override suggestions.
codebase-dna callees <symbol> --rootDir <path>: List static calls from a symbol.
codebase-dna callers <symbol> --rootDir <path>: List static callers of a symbol.
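The relationship between scan, accept-baseline, and verify can be modeled in a few lines. This is a conceptual sketch only: the README does not specify what a baseline contains, so representing a snapshot as a map from symbol name to signature text is an assumption, and `AnalysisStore`, `Snapshot`, and `Drift` are hypothetical names. Returning `null` when no baseline has been accepted is likewise an illustrative choice.

```typescript
// Assumed representation: symbol name -> signature text.
type Snapshot = Record<string, string>;

interface Drift {
  added: string[];
  removed: string[];
  changed: string[];
}

function hasKey(snapshot: Snapshot, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(snapshot, key);
}

class AnalysisStore {
  private current: Snapshot = {};
  private baseline: Snapshot | null = null;

  // Refreshes the analysis; the baseline is deliberately left untouched.
  scan(latest: Snapshot): void {
    this.current = { ...latest };
  }

  // Promotes the current analysis to the trusted comparison point.
  acceptBaseline(): void {
    this.baseline = { ...this.current };
  }

  // Compares the current analysis with the accepted baseline.
  verify(): Drift | null {
    const baseline = this.baseline;
    if (baseline === null) return null; // nothing to compare against yet
    const current = this.current;
    return {
      added: Object.keys(current).filter((key) => !hasKey(baseline, key)).sort(),
      removed: Object.keys(baseline).filter((key) => !hasKey(current, key)).sort(),
      changed: Object.keys(current)
        .filter((key) => hasKey(baseline, key) && baseline[key] !== current[key])
        .sort(),
    };
  }
}

const store = new AnalysisStore();
store.scan({ chargeCard: "(amount: number) => Promise<void>" });
store.acceptBaseline();
store.scan({
  chargeCard: "(amount: number, currency: string) => Promise<void>",
  refundOrder: "(orderId: string) => Promise<void>",
});
console.log(store.verify());
```

Because `scan` never touches the baseline in this model, repeated scans keep reporting drift until someone reviews the change and accepts a new baseline.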

Benchmarks

Run npm run benchmark to execute the local static-signal benchmark suite. The current verified result is 13/13 passing scenarios across naming drift, explicit boundary violations, contract signals, and call-impact resolution.

[!NOTE] Inferred architecture is advisory by default. Add boundaryOverrides when a rule should be enforced by dna_can_import and dna_verify.

Project Structure

src/server.ts - MCP Server & Tool Handlers
src/scanner.ts - Orchestrates full codebase scanning
src/analyzers/ - Logic for extracting conventions, architecture boundaries, and contracts
src/store.ts - SQLite database layer for state persistence
src/watcher.ts - File watcher for incremental live updates

Alternative: Running from Source

If you prefer to clone the repository and run the server directly from source instead of running the published npm package through npx, follow these steps:

Clone and build the project in your terminal:

git clone https://github.com/your-username/codebase-dna.git
cd codebase-dna
npm install
npm run build

Update your MCP JSON configuration to use node instead of npx, and point it to the absolute path of the compiled dist/bin/codebase-dna.js file on your computer:

{
  "mcpServers": {
    "codebase-dna": {
      "command": "node",
      "args": [
        "C:/absolute/path/to/cloned/codebase-dna/dist/bin/codebase-dna.js",
        "serve",
        "--rootDir", "C:/Projects/your-project"
      ]
    }
  }
}

(Replace both paths with the actual absolute paths on your system.)

License

MIT