codebase-ctx

v0.4.2

Published

17 days ago

Generate AI-optimized codebase summaries via static analysis

0High
0Medium
0Low

silupanda

codebase-ctx

Generate AI-optimized codebase summaries via static analysis.

codebase-ctx is a zero-dependency static analysis tool that reads a project directory and produces structured, token-efficient context about its setup, dependencies, scripts, language, runtime, and architecture. The output is designed to be injected into AI instruction files (CLAUDE.md, .cursorrules), passed as prompt context, or consumed programmatically by other tools.

Unlike source code dumpers that concatenate entire repositories (producing hundreds of thousands of tokens), codebase-ctx runs a pipeline of modular analyzers that each extract a specific dimension of project context and compress the results into a structured summary. A typical output is 300--800 tokens: enough to convey what a developer learns in their first hour with a codebase, compact enough to leave the vast majority of the context window for the actual task.

All analysis is deterministic, offline, and fast (under 500ms for most projects). No LLM calls, no network requests, no API keys required.

Installation

npm install codebase-ctx

Or install globally for CLI usage:

npm install -g codebase-ctx

Requires Node.js >= 18.

Quick Start

import { analyzeProject, analyzeDependencies, analyzeScripts } from 'codebase-ctx';

// Analyze project metadata
const project = analyzeProject('/path/to/project');
console.log(project);
// {
//   name: 'my-app',
//   version: '1.0.0',
//   description: 'My application',
//   license: 'MIT',
//   language: 'TypeScript',
//   runtime: 'Node.js',
//   repository: 'https://github.com/user/my-app',
//   nodeVersion: '>=18',
// }

// Analyze dependencies with automatic categorization
const deps = analyzeDependencies('/path/to/project');
console.log(deps.summary);
// {
//   totalProduction: 5,
//   totalDev: 3,
//   frameworks: ['react', 'next'],
//   databases: ['@prisma/client'],
//   testingTools: ['vitest'],
// }

// Analyze npm scripts
const scripts = analyzeScripts('/path/to/project');
console.log(scripts.hasBuild, scripts.hasTest);
// true true

Features

Zero dependencies -- all analysis uses Node.js built-ins (node:fs, node:path). No AST parsers, no tree-sitter, no external tools.
Modular analyzers -- each analyzer extracts one dimension of context (project metadata, dependencies, scripts) and runs independently. Analyzers fail gracefully when their input files are missing.
Dependency categorization -- a built-in registry of 190+ common npm packages automatically classifies dependencies into categories: framework, database, testing, build, lint, ui, auth, api, observability, and type-definitions.
Language detection -- infers TypeScript or JavaScript from tsconfig.json, file extensions in src/, or the presence of typescript in dependencies.
Runtime detection -- infers Node.js, Bun, Deno, or Browser from engines fields and framework dependencies.
Script categorization -- classifies npm scripts into build, test, lint, start, deploy, and other categories by name pattern matching.
Token estimation -- estimates LLM token counts for any text using ceil(chars / 4).
Deterministic output -- same codebase always produces the same analysis result.
TypeScript-first -- full type definitions shipped with the package.

API Reference

Analyzers

`analyzeProject(projectPath: string): ProjectInfo`

Reads package.json and inspects the project directory to produce a ProjectInfo summary.

Parameters:

| Parameter | Type | Description | |---|---|---| | projectPath | string | Absolute path to the project directory |

Returns: ProjectInfo

| Field | Type | Description | |---|---|---| | name | string \| null | Package name from package.json, or directory name as fallback | | version | string \| null | Package version | | description | string \| null | Package description | | license | string \| null | License identifier (e.g. "MIT") | | language | string | Detected language: "TypeScript", "JavaScript", or "unknown" | | runtime | string | Detected runtime: "Node.js", "Bun", "Deno", "Browser", or "unknown" | | repository | string \| null | Repository URL (cleaned of git+ prefix and .git suffix) | | nodeVersion | string \| null | Node.js version constraint from engines.node |

Language detection checks in order: tsconfig.json existence, .ts/.tsx files in src/, typescript in dependencies. Falls back to "JavaScript".

Runtime detection checks in order: engines.bun, engines.deno, browser-only framework dependencies (React/Vue/Angular/Svelte without an SSR framework like Next/Nuxt), then defaults to "Node.js".

Fallback behavior: When no package.json exists, returns the directory name as name, "unknown" as language and runtime, and null for all other fields.

Example:

import { analyzeProject } from 'codebase-ctx';

const info = analyzeProject('/home/user/my-express-app');
// info.language === 'TypeScript'
// info.runtime === 'Node.js'
// info.repository === 'https://github.com/user/my-express-app'

`analyzeDependencies(projectPath: string): DependencyInfo`

Reads package.json and extracts all dependency sections, categorizing each dependency by its purpose using a built-in registry.

Parameters:

| Parameter | Type | Description | |---|---|---| | projectPath | string | Absolute path to the project directory |

Returns: DependencyInfo

| Field | Type | Description | |---|---|---| | production | DependencyEntry[] | Dependencies from dependencies | | dev | DependencyEntry[] | Dependencies from devDependencies | | peer | DependencyEntry[] | Dependencies from peerDependencies | | optional | DependencyEntry[] | Dependencies from optionalDependencies | | summary | object | Aggregated summary (see below) |

summary fields:

| Field | Type | Description | |---|---|---| | totalProduction | number | Count of production dependencies | | totalDev | number | Count of dev dependencies | | frameworks | string[] | Names of all framework dependencies across all sections | | databases | string[] | Names of all database dependencies across all sections | | testingTools | string[] | Names of all testing dependencies across all sections |

Each DependencyEntry has the shape:

{
  name: string;      // Package name (e.g. "react")
  version: string;   // Version range (e.g. "^18.2.0")
  category: DependencyCategory;
}

Category inference: Known packages are mapped via a built-in registry of 190+ entries. Unrecognized packages in dependencies default to "utility"; unrecognized packages in devDependencies default to "build". Packages matching @types/* are always categorized as "type-definitions".

Fallback behavior: When package.json does not exist, returns empty arrays and zero counts for all fields.

Example:

import { analyzeDependencies } from 'codebase-ctx';

const deps = analyzeDependencies('/home/user/my-project');

// Inspect categorized production deps
for (const dep of deps.production) {
  console.log(`${dep.name} (${dep.category}): ${dep.version}`);
}
// react (framework): ^18.2.0
// @prisma/client (database): ^5.0.0
// winston (observability): ^3.0.0

// Use the summary for quick context
console.log(deps.summary.frameworks);  // ['react']
console.log(deps.summary.databases);   // ['@prisma/client']

`analyzeScripts(projectPath: string): ScriptInfo`

Reads the scripts field from package.json and categorizes each script by name.

Parameters:

| Parameter | Type | Description | |---|---|---| | projectPath | string | Absolute path to the project directory |

Returns: ScriptInfo

| Field | Type | Description | |---|---|---| | scripts | ScriptEntry[] | All scripts with their names, commands, and categories | | hasBuild | boolean | true if any script is categorized as "build" | | hasTest | boolean | true if any script is categorized as "test" | | hasLint | boolean | true if any script is categorized as "lint" | | hasStart | boolean | true if any script is categorized as "start" |

Each ScriptEntry has the shape:

{
  name: string;      // Script name (e.g. "build")
  command: string;   // Script command (e.g. "tsc")
  category: 'build' | 'test' | 'lint' | 'start' | 'deploy' | 'other';
}

Category mapping:

| Category | Matched script names | |---|---| | build | build, compile, bundle, tsc, build:* | | test | test, spec, e2e, coverage, test:* | | lint | lint, check, format, prettier, lint:* | | start | start, dev, serve, develop | | deploy | deploy, release, publish | | other | Everything else |

Scripts prefixed with pre or post (e.g. pretest, postbuild, prepublishOnly) inherit the category of their base script.

Fallback behavior: When package.json has no scripts field or does not exist, returns an empty array and all boolean flags as false.

Example:

import { analyzeScripts } from 'codebase-ctx';

const scripts = analyzeScripts('/home/user/my-project');

if (scripts.hasTest) {
  const testScripts = scripts.scripts.filter(s => s.category === 'test');
  for (const s of testScripts) {
    console.log(`${s.name}: ${s.command}`);
  }
}
// test: vitest run
// test:unit: vitest run src/

Dependency Registry

`DEPENDENCY_REGISTRY`

A Record<string, DependencyCategory> mapping 190+ common npm package names to their categories. This is the lookup table used by analyzeDependencies.

import { DEPENDENCY_REGISTRY } from 'codebase-ctx';

console.log(DEPENDENCY_REGISTRY['react']);      // 'framework'
console.log(DEPENDENCY_REGISTRY['prisma']);      // 'database'
console.log(DEPENDENCY_REGISTRY['vitest']);      // 'testing'
console.log(DEPENDENCY_REGISTRY['typescript']);  // 'build'
console.log(DEPENDENCY_REGISTRY['eslint']);      // 'lint'
console.log(DEPENDENCY_REGISTRY['tailwindcss']); // 'ui'
console.log(DEPENDENCY_REGISTRY['passport']);    // 'auth'
console.log(DEPENDENCY_REGISTRY['axios']);       // 'api'
console.log(DEPENDENCY_REGISTRY['winston']);     // 'observability'

Categories covered: framework, database, testing, build, lint, ui, auth, api, observability.

`categorize(name: string, isDevDep: boolean): DependencyCategory`

Categorizes a single package name. Checks the built-in registry first, then @types/* prefix, then falls back to "utility" for production dependencies or "build" for dev dependencies.

Parameters:

| Parameter | Type | Description | |---|---|---| | name | string | The npm package name | | isDevDep | boolean | Whether the package is a devDependency |

Returns: DependencyCategory -- one of "framework", "database", "testing", "build", "lint", "utility", "type-definitions", "ui", "auth", "api", "observability".

Example:

import { categorize } from 'codebase-ctx';

categorize('react', false);          // 'framework'
categorize('@types/node', true);     // 'type-definitions'
categorize('some-unknown-pkg', false); // 'utility'
categorize('some-unknown-pkg', true);  // 'build'

File Utilities

`fileExists(filePath: string): boolean`

Synchronously checks whether a file exists at the given path.

import { fileExists } from 'codebase-ctx';

if (fileExists('/path/to/tsconfig.json')) {
  // TypeScript project
}

`readFileContent(filePath: string): string`

Reads a file synchronously and returns its contents as a UTF-8 string. Throws if the file does not exist.

import { readFileContent } from 'codebase-ctx';

const content = readFileContent('/path/to/package.json');

`readLines(filePath: string): string[]`

Reads a file and splits it into an array of lines. Throws if the file does not exist.

import { readLines } from 'codebase-ctx';

const lines = readLines('/path/to/src/index.ts');
console.log(`${lines.length} lines`);

`readJsonFile<T>(filePath: string): T | null`

Reads and parses a JSON file. Returns null if the file does not exist or contains invalid JSON. Never throws.

Type parameter: T -- the expected shape of the parsed JSON object.

import { readJsonFile } from 'codebase-ctx';

interface PkgJson {
  name: string;
  version: string;
}

const pkg = readJsonFile<PkgJson>('/path/to/package.json');
if (pkg) {
  console.log(pkg.name, pkg.version);
}

`estimateTokens(text: string): number`

Estimates the LLM token count for a string using the approximation Math.ceil(text.length / 4).

Returns 0 for empty strings.

import { estimateTokens } from 'codebase-ctx';

const tokens = estimateTokens('Hello, world!');
// 4 (ceil(13 / 4))

Configuration

`AnalyzeOptions`

Options accepted by the analysis pipeline:

interface AnalyzeOptions {
  analyzers?: AnalyzerName[];   // Which analyzers to run (default: all)
  exclude?: string[];           // Directory/file patterns to exclude
  detailLevel?: DetailLevel;    // 'minimal' | 'standard' | 'detailed'
  maxSampleFiles?: number;      // Max files for pattern sampling
  maxDepth?: number;            // Max directory traversal depth
}

`FormatOptions`

Options for formatting output:

interface FormatOptions {
  detailLevel?: DetailLevel;                    // 'minimal' | 'standard' | 'detailed'
  formatter?: (context: CodebaseContext) => string;  // Custom formatter function
  includeTokenCount?: boolean;                  // Append token count to output
}

`AnalyzerName`

Valid analyzer names:

`DependencyCategory`

Valid dependency categories:

`DetailLevel`

Controls information density in formatted output:

| Level | Target tokens | Description | |---|---|---| | minimal | 150--300 | Language, framework, architecture, entry point, build/test commands only | | standard | 400--800 | All major context dimensions with summaries | | detailed | 800--2,000 | Full dependency lists, complete API surface, all patterns with evidence |

`OutputFormat`

Supported output formats:

| Format | Description | |---|---| | markdown | Structured markdown for CLAUDE.md / .cursorrules injection | | json | Machine-readable JSON for programmatic consumption | | compact | Minimal token count, maximum information density | | custom | User-provided formatter function via FormatOptions.formatter |

Error Handling

All analyzers follow a graceful degradation pattern:

Missing package.json: Analyzers that depend on package.json return sensible defaults -- empty arrays, zero counts, null fields, or the directory name as a fallback project name.
Invalid JSON: readJsonFile returns null when a file contains malformed JSON. Analyzers that use it handle the null case explicitly.
Missing files: fileExists returns false; analyzers skip analysis for files that do not exist rather than throwing.
No matching data: Analyzers return empty result objects (empty arrays, false flags) rather than throwing when the data they look for is absent.

The general contract: individual analyzers never throw. If an analyzer cannot extract data, it returns a typed fallback value. Only infrastructure-level errors (directory does not exist, permission denied) produce exceptions.

Advanced Usage

Combining Analyzers

Run multiple analyzers against the same project and assemble the results:

import { analyzeProject, analyzeDependencies, analyzeScripts } from 'codebase-ctx';

const projectPath = '/home/user/my-app';

const project = analyzeProject(projectPath);
const deps = analyzeDependencies(projectPath);
const scripts = analyzeScripts(projectPath);

// Build a context summary for prompt injection
const summary = [
  `Project: ${project.name} v${project.version}`,
  `Language: ${project.language}, Runtime: ${project.runtime}`,
  `Frameworks: ${deps.summary.frameworks.join(', ') || 'none'}`,
  `Databases: ${deps.summary.databases.join(', ') || 'none'}`,
  `Testing: ${deps.summary.testingTools.join(', ') || 'none'}`,
  `Build: ${scripts.hasBuild ? 'yes' : 'no'}, Test: ${scripts.hasTest ? 'yes' : 'no'}`,
].join('\n');

console.log(summary);

Extending the Dependency Registry

You can use categorize alongside your own logic to handle packages not in the built-in registry:

import { categorize, DEPENDENCY_REGISTRY } from 'codebase-ctx';
import type { DependencyCategory } from 'codebase-ctx';

const CUSTOM_REGISTRY: Record<string, DependencyCategory> = {
  'my-internal-framework': 'framework',
  '@company/auth-sdk': 'auth',
};

function customCategorize(name: string, isDev: boolean): DependencyCategory {
  if (CUSTOM_REGISTRY[name]) return CUSTOM_REGISTRY[name];
  return categorize(name, isDev);
}

Estimating Prompt Budget

Use estimateTokens to verify that your assembled context fits within a model's context window:

import { estimateTokens } from 'codebase-ctx';

const context = '... assembled context string ...';
const tokens = estimateTokens(context);

const MODEL_LIMIT = 128_000;
const TASK_BUDGET = MODEL_LIMIT - tokens;
console.log(`Context: ${tokens} tokens, leaving ${TASK_BUDGET} for the task`);

Implementing the Analyzer Interface

The Analyzer interface provides a contract for custom analyzer implementations:

import type { Analyzer, CodebaseContext, OutputFormat, FormatOptions } from 'codebase-ctx';

const myAnalyzer: Analyzer = {
  async analyze(projectPath?: string): Promise<CodebaseContext> {
    // Run analysis and return a CodebaseContext
  },
  async analyzeAndFormat(
    projectPath?: string,
    outputFormat?: OutputFormat,
    formatOptions?: FormatOptions,
  ): Promise<string> {
    // Analyze and return formatted string
  },
};

TypeScript

This package is written in TypeScript and ships type declarations (dist/index.d.ts). All public interfaces and type aliases are exported from the package root:

import type {
  // Core result types
  CodebaseContext,
  ProjectInfo,
  DependencyInfo,
  DependencyEntry,
  ScriptInfo,
  ScriptEntry,

  // Additional analyzer result types
  StructureInfo,
  DirectoryEntry,
  TypeScriptInfo,
  APISurface,
  APIEntry,
  ConfigInfo,
  ConfigEntry,
  GitInfo,
  StatsInfo,
  FileStat,
  LanguageStat,
  PatternInfo,
  DetectedPattern,
  AnalysisMeta,

  // Option and configuration types
  AnalyzeOptions,
  FormatOptions,
  AnalyzerConfig,
  Analyzer,

  // String union types
  DetailLevel,
  OutputFormat,
  AnalyzerName,
  DependencyCategory,
} from 'codebase-ctx';

Compiled with target: ES2022, module: commonjs, strict: true.

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

codebase-ctx

Installation

Quick Start

Features

API Reference

Analyzers

analyzeProject(projectPath: string): ProjectInfo

analyzeDependencies(projectPath: string): DependencyInfo

analyzeScripts(projectPath: string): ScriptInfo

Dependency Registry

DEPENDENCY_REGISTRY

categorize(name: string, isDevDep: boolean): DependencyCategory

File Utilities

fileExists(filePath: string): boolean

readFileContent(filePath: string): string

readLines(filePath: string): string[]

readJsonFile<T>(filePath: string): T | null

estimateTokens(text: string): number

Configuration

AnalyzeOptions

FormatOptions

AnalyzerName

DependencyCategory

DetailLevel

OutputFormat

Error Handling

Advanced Usage

Combining Analyzers

Extending the Dependency Registry

Estimating Prompt Budget

Implementing the Analyzer Interface

TypeScript

License

`analyzeProject(projectPath: string): ProjectInfo`

`analyzeDependencies(projectPath: string): DependencyInfo`

`analyzeScripts(projectPath: string): ScriptInfo`

`DEPENDENCY_REGISTRY`

`categorize(name: string, isDevDep: boolean): DependencyCategory`

`fileExists(filePath: string): boolean`

`readFileContent(filePath: string): string`

`readLines(filePath: string): string[]`

`readJsonFile<T>(filePath: string): T | null`

`estimateTokens(text: string): number`

`AnalyzeOptions`

`FormatOptions`

`AnalyzerName`

`DependencyCategory`

`DetailLevel`

`OutputFormat`