codeoracle

v0.1.0

Published 19 days ago

Multi-agent system for analyzing legacy codebases using AI

Vulnerabilities: 0 high, 0 medium, 0 low

legacy-code code-analysis ai-agents multi-agent claude anthropic documentation modernization refactoring neo4j graph-database

codeoracle

A multi-agent system for analyzing and understanding legacy codebases.

codeoracle uses specialized AI agents to analyze complex, inherited code, generating comprehensive documentation, identifying issues, and preparing codebases for modernization.

Why codeoracle

Legacy codebases present unique challenges:

| Challenge | codeoracle Solution |
|-----------|---------------------|
| Undocumented code | Documentation Agent generates missing docs |
| Unknown dependencies | Dependency Agent maps internal/external deps |
| Hidden security issues | Security Agent audits for vulnerabilities |
| Dead code accumulation | Structure Agent identifies unused code |
| Complex architecture | Integrator creates architecture diagrams |
| Knowledge silos | Chat interface enables conversational exploration |

Key Features

Multi-agent architecture - Specialized agents for different analysis aspects
Context optimization - Handles large codebases through chunking and caching
Model selection - Uses appropriate models (Haiku/Sonnet/Opus) per task
Neo4j integration - Visualize dependencies as an interactive graph
Incremental analysis - Resume long-running analyses
Comprehensive reports - Markdown reports, JSON data, visual diagrams

Installation

Prerequisites

Bun >= 1.0.0
Neo4j (optional, for graph visualization)
Anthropic API key

Install globally

bun install -g codeoracle

Or run directly

bunx codeoracle <command>

Environment Setup

export ANTHROPIC_API_KEY="your-api-key"

# Optional: Neo4j connection
export NEO4J_URI="bolt://localhost:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="password"
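
A startup check for these variables might look like the sketch below. `readEnv` and `OracleEnv` are hypothetical names, not part of the codeoracle API, and the default user simply mirrors the example value above.

```typescript
// Hypothetical helper: not part of the codeoracle API.
interface OracleEnv {
  anthropicApiKey: string;
  neo4j?: { uri: string; user: string; password: string };
}

function readEnv(env: Record<string, string | undefined>): OracleEnv {
  const anthropicApiKey = env.ANTHROPIC_API_KEY;
  if (!anthropicApiKey) {
    throw new Error("ANTHROPIC_API_KEY is required");
  }
  // Neo4j is optional: only configure it when a URI is present.
  const uri = env.NEO4J_URI;
  if (!uri) return { anthropicApiKey };
  return {
    anthropicApiKey,
    neo4j: {
      uri,
      user: env.NEO4J_USER ?? "neo4j",
      password: env.NEO4J_PASSWORD ?? "",
    },
  };
}
```

Passing `process.env` to `readEnv` would run the check against the real environment.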

Quick Start

1. Initialize configuration

cd your-legacy-project
codeoracle init

This creates codeoracle.config.yaml:

project:
  name: "my-legacy-project"
  path: "./src"

languages:
  - javascript
  - typescript

goals:
  - understand_architecture
  - generate_documentation

2. Scan the codebase

codeoracle scan

Output:

Scanning project structure...

Files found: 847
Languages: TypeScript (623), JavaScript (189), JSON (35)
Total size: 12.4 MB
Entry points detected: 3

Structure saved to .codeoracle/structure.json
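
The per-language tally in this output could be derived from file extensions. The following is a minimal sketch under that assumption; the extension map and the `countLanguages` name are illustrative, and codeoracle's actual detection may differ.

```typescript
// Assumed extension-to-language map; codeoracle's real detection may differ.
const LANGUAGE_BY_EXTENSION: Record<string, string> = {
  ".ts": "TypeScript",
  ".tsx": "TypeScript",
  ".js": "JavaScript",
  ".jsx": "JavaScript",
  ".json": "JSON",
};

function countLanguages(paths: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const path of paths) {
    const dot = path.lastIndexOf(".");
    if (dot === -1) continue; // no extension
    const language = LANGUAGE_BY_EXTENSION[path.slice(dot).toLowerCase()];
    if (!language) continue; // unknown extension
    counts.set(language, (counts.get(language) ?? 0) + 1);
  }
  return counts;
}
```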

3. Run analysis

codeoracle analyze

Output:

Starting multi-agent analysis...

[1/5] Structure Agent (Haiku)
      Mapping modules and relationships...
      Found 42 modules, 156 exports

[2/5] Dependency Agent (Haiku)
      Analyzing imports...
      Internal: 234, External: 47

[3/5] Pattern Agent (Sonnet)
      Detecting patterns and anti-patterns...
      Patterns: 12, Anti-patterns: 8

[4/5] Security Agent (Opus)
      Auditing for vulnerabilities...
      Critical: 0, High: 2, Medium: 5

[5/5] Integration Agent (Sonnet)
      Consolidating results...

Analysis complete. Reports saved to ./output/

4. View reports

codeoracle report

Opens generated reports in your default viewer.

Commands

init

Initialize a new codeoracle configuration.

codeoracle init [options]

Options:

| Option | Description |
|--------|-------------|
| --template <name> | Use a preset template (node, python, java) |
| --force | Overwrite existing configuration |

scan

Scan and map the codebase structure.

codeoracle scan [options]

Options:

| Option | Description |
|--------|-------------|
| --depth <n> | Maximum directory depth (default: 10) |
| --output <path> | Output file path |

analyze

Run multi-agent analysis on the codebase.

codeoracle analyze [options]

Options:

| Option | Description |
|--------|-------------|
| --full | Run all agents (default) |
| --structure | Only structure analysis |
| --deps | Only dependency analysis |
| --patterns | Only pattern detection |
| --security | Only security audit |
| --resume <id> | Resume previous analysis |
| --parallel | Run agents in parallel |

report

Generate and view reports.

codeoracle report [options]

Options:

| Option | Description |
|--------|-------------|
| --format <fmt> | Output format: md, html, json |
| --open | Open in default viewer |

graph

Visualize the codebase graph in Neo4j.

codeoracle graph [options]

Options:

| Option | Description |
|--------|-------------|
| --browser | Open Neo4j browser |
| --export <path> | Export graph as JSON |

chat

Interactive chat with your codebase.

codeoracle chat

Example session:

> What are the main entry points?
The codebase has 3 main entry points:
1. src/index.ts - Main application
2. src/api/server.ts - REST API
3. src/cli/main.ts - CLI tool

> How is authentication handled?
Authentication is handled in src/auth/ module...

Configuration

Full configuration reference

# codeoracle.config.yaml

project:
  name: "project-name"
  path: "./src"
  description: "Optional project description"

languages:
  - typescript
  - javascript
  - sql
  - graphql

structure:
  entry_points:
    - src/index.ts
    - src/api/server.ts
  ignore:
    - node_modules
    - dist
    - coverage
    - "*.test.ts"
    - "*.spec.ts"
  max_file_size: 1048576  # 1MB

context:
  documentation: "./docs"
  diagrams: "./architecture"
  readme: "./README.md"

goals:
  - understand_architecture
  - find_dead_code
  - security_audit
  - generate_documentation
  - identify_patterns

analysis:
  parallel: true
  cache_ttl: 3600  # 1 hour
  max_tokens_per_file: 8000

security:
  check_secrets: true
  check_vulnerabilities: true
  owasp_top_10: true

output:
  directory: "./output"
  formats:
    - markdown
    - json

neo4j:
  enabled: true
  uri: "bolt://localhost:7687"
  database: "codeoracle"
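
A loader could check a parsed configuration before analysis starts. The sketch below covers only a subset of the keys above; `ProjectConfig`, `KNOWN_GOALS`, and `validateConfig` are hypothetical names inferred from this reference, not the tool's actual schema.

```typescript
// Hypothetical shape for a subset of codeoracle.config.yaml, inferred from the reference above.
interface ProjectConfig {
  project: { name: string; path: string; description?: string };
  languages: string[];
  goals: string[];
  analysis?: { parallel?: boolean; cache_ttl?: number; max_tokens_per_file?: number };
}

// Goals listed in the reference; the real tool may accept others.
const KNOWN_GOALS = new Set([
  "understand_architecture",
  "find_dead_code",
  "security_audit",
  "generate_documentation",
  "identify_patterns",
]);

// Returns a list of human-readable problems; an empty list means the config passed.
function validateConfig(config: ProjectConfig): string[] {
  const problems: string[] = [];
  if (config.project.name.trim() === "") problems.push("project.name is empty");
  if (config.languages.length === 0) problems.push("languages is empty");
  for (const goal of config.goals) {
    if (!KNOWN_GOALS.has(goal)) problems.push(`unknown goal: ${goal}`);
  }
  const ttl = config.analysis?.cache_ttl;
  if (ttl !== undefined && ttl <= 0) problems.push("analysis.cache_ttl must be positive");
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a CLI report every issue in a single run.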

Architecture

Multi-Agent System

┌─────────────────────────────────────────────────────────────┐
│                         USER INPUT                           │
│              codeoracle.config.yaml + CLI flags              │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                    COORDINATOR (Sonnet)                      │
│                                                              │
│  - Parses configuration                                      │
│  - Orchestrates agent execution                              │
│  - Manages context and caching                               │
│  - Consolidates results                                      │
└─────────────────────────┬───────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
        ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│   STRUCTURE   │ │   ANALYZER    │ │   SECURITY    │
│    (Haiku)    │ │   (Sonnet)    │ │    (Opus)     │
│               │ │               │ │               │
│ - File tree   │ │ - Patterns    │ │ - Vulns       │
│ - Modules     │ │ - Dependencies│ │ - Secrets     │
│ - Exports     │ │ - Data flow   │ │ - OWASP       │
│ - Entry pts   │ │ - Complexity  │ │ - Best pracs  │
└───────────────┘ └───────────────┘ └───────────────┘
        │                 │                 │
        └─────────────────┼─────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                    INTEGRATOR (Sonnet)                       │
│                                                              │
│  - Merges agent outputs                                      │
│  - Generates documentation                                   │
│  - Creates architecture diagrams                             │
│  - Produces actionable recommendations                       │
└─────────────────────────┬───────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
        ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│    REPORTS    │ │    NEO4J      │ │    CACHE      │
│               │ │               │ │               │
│ - Markdown    │ │ - Nodes       │ │ - Prompts     │
│ - JSON        │ │ - Relations   │ │ - Sessions    │
│ - HTML        │ │ - Queries     │ │ - Results     │
└───────────────┘ └───────────────┘ └───────────────┘
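
The fan-out and fan-in in the diagram can be sketched as follows. This is an illustration of the control flow only: `Agent`, `AgentResult`, and `runPipeline` are hypothetical names, and each agent is modeled as a plain async function rather than a model call.

```typescript
// Hypothetical types: the real agents call language models; here each is a plain async function.
type AgentResult = { agent: string; findings: string[] };
type Agent = (projectPath: string) => Promise<AgentResult>;

async function runPipeline(
  projectPath: string,
  agents: Agent[],
  integrate: (results: AgentResult[]) => string,
): Promise<string> {
  // Fan out: specialized agents run concurrently.
  const results = await Promise.all(agents.map((agent) => agent(projectPath)));
  // Fan in: the integrator merges every agent's output into one report.
  return integrate(results);
}
```

`Promise.all` preserves input order, so the integrator sees results in the same order the agents were listed.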

Agents

Coordinator Agent

Model: Sonnet
Role: Orchestration and consolidation

Responsibilities:

Parse user configuration
Plan analysis strategy
Delegate to specialized agents
Manage context budget
Handle errors and retries
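
Error handling with retries is commonly implemented as exponential backoff around each agent call. The sketch below shows that general technique; `withRetry`, the attempt count, and the delays are illustrative assumptions, not the coordinator's documented behavior.

```typescript
// Generic retry helper with exponential backoff; attempt counts and delays are illustrative.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;
      // The delay doubles after each failure: with the defaults, 500 ms and then 1000 ms.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```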

Structure Agent

Model: Haiku (fast, cost-effective)
Tools: Glob, Grep, Read

Analyzes:

Directory structure
File types and sizes
Module boundaries
Export/import relationships
Entry points

Dependency Agent

Model: Haiku
Tools: Read, Grep

Analyzes:

Internal dependencies
External packages
Circular dependencies
Unused dependencies
Version conflicts
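
Circular dependencies of any length can be found with a depth-first search over the import graph. The sketch below shows that standard technique on an in-memory graph; `ImportGraph` and `findCycle` are hypothetical names, and the Dependency Agent's own method is not documented here.

```typescript
// Import graph: module name -> modules it imports. Names are illustrative.
type ImportGraph = Record<string, string[]>;

// Returns one cycle as a path (first module repeated at the end), or null if the graph is acyclic.
function findCycle(graph: ImportGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  function visit(node: string): string[] | null {
    state.set(node, "visiting");
    stack.push(node);
    for (const next of graph[node] ?? []) {
      if (state.get(next) === "visiting") {
        // Back edge: the cycle is the stack slice from `next` onward.
        return [...stack.slice(stack.indexOf(next)), next];
      }
      if (!state.has(next)) {
        const cycle = visit(next);
        if (cycle) return cycle;
      }
    }
    stack.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of Object.keys(graph)) {
    if (!state.has(node)) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}
```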

Pattern Agent

Model: Sonnet (requires reasoning)
Tools: Read, Grep

Detects:

Design patterns (Factory, Singleton, Observer, etc.)
Anti-patterns (God class, Spaghetti code, etc.)
Code smells
Complexity hotspots

Security Agent

Model: Opus (critical analysis)
Tools: Read, Grep

Checks:

Hardcoded secrets
SQL injection vectors
XSS vulnerabilities
Insecure dependencies
OWASP Top 10
Authentication issues
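
To make the hardcoded-secrets check concrete, here is a deliberately naive regex-based scan. It is not how the Security Agent works (the agent is model-driven); `SECRET_PATTERNS` and `scanForSecrets` are hypothetical, and the two rules are examples only.

```typescript
// Deliberately simple patterns for illustration; real scanners use far larger rule sets.
const SECRET_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "AWS access key ID", pattern: /AKIA[0-9A-Z]{16}/ },
  {
    name: "Quoted secret assignment",
    pattern: /(api[_-]?key|secret|password)\s*[:=]\s*["'][^"']{8,}["']/i,
  },
];

interface SecretHit {
  line: number; // 1-based line number
  rule: string;
}

function scanForSecrets(source: string): SecretHit[] {
  const hits: SecretHit[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { name, pattern } of SECRET_PATTERNS) {
      if (pattern.test(text)) hits.push({ line: index + 1, rule: name });
    }
  });
  return hits;
}
```

A scan like this flags literal values assigned in code but ignores values read from the environment, which matches the recommendation in the sample report to move keys into environment variables.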

Documentation Agent

Model: Sonnet
Tools: Read, Write

Generates:

Module documentation
API documentation
Architecture overview
Getting started guide

Integrator Agent

Model: Sonnet
Tools: Read, Write

Produces:

Executive summary
Consolidated findings
Priority recommendations
Migration roadmap

Output

Generated Files

output/
├── report.md                 # Executive summary
├── architecture.md           # System architecture
├── architecture.mermaid      # Mermaid diagram
├── dependencies.json         # Dependency graph
├── dependencies.md           # Dependency analysis
├── security-findings.md      # Security audit results
├── security-findings.json    # Machine-readable findings
├── patterns.md               # Detected patterns
├── dead-code.md              # Unused code report
├── recommendations.md        # Prioritized actions
└── documentation/
    ├── modules/
    │   ├── auth.md
    │   ├── api.md
    │   └── ...
    └── api/
        ├── endpoints.md
        └── schemas.md

Report Format

# Analysis Report: my-legacy-project

## Executive Summary

- **Total files**: 847
- **Languages**: TypeScript, JavaScript
- **Complexity score**: 7.2/10
- **Security issues**: 2 high, 5 medium
- **Documentation coverage**: 34%

## Critical Findings

### Security

1. **Hardcoded API key** in `src/config/api.ts:42`
   - Severity: High
   - Recommendation: Move to environment variables

...

Neo4j Integration

Setup

Install Neo4j Desktop or use Docker:

docker run -d \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:latest

Enable in configuration:

neo4j:
  enabled: true
  uri: "bolt://localhost:7687"
  database: "codeoracle"

Graph Schema

Nodes:

Module - Code modules/files
Function - Functions/methods
Class - Classes
Package - External dependencies

Relationships:

IMPORTS - Import relationships
CALLS - Function calls
EXTENDS - Class inheritance
DEPENDS_ON - Package dependencies
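
One way to write this schema to the database is to turn each edge into a parameterized `MERGE` statement. The sketch below only builds the query text and parameters; `Edge` and `toCypher` are hypothetical names, and the `name` property is assumed from the example queries in this README.

```typescript
// Node labels and relationship types taken from the schema above.
type NodeLabel = "Module" | "Function" | "Class" | "Package";
type RelationshipType = "IMPORTS" | "CALLS" | "EXTENDS" | "DEPENDS_ON";

interface Edge {
  from: { label: NodeLabel; name: string };
  to: { label: NodeLabel; name: string };
  type: RelationshipType;
}

// Builds a parameterized Cypher statement. Labels and relationship types cannot be
// passed as plain query parameters, so they are interpolated from the closed unions above.
function toCypher(edge: Edge): { query: string; params: { from: string; to: string } } {
  const query =
    `MERGE (a:${edge.from.label} {name: $from}) ` +
    `MERGE (b:${edge.to.label} {name: $to}) ` +
    `MERGE (a)-[:${edge.type}]->(b)`;
  return { query, params: { from: edge.from.name, to: edge.to.name } };
}
```

Using `MERGE` rather than `CREATE` keeps repeated runs from duplicating nodes and relationships.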

Example Queries

// Find modules with most dependencies
MATCH (m:Module)-[r:IMPORTS]->(d)
RETURN m.name, count(r) as deps
ORDER BY deps DESC
LIMIT 10

// Find direct circular dependencies (two modules that import each other)
MATCH (a:Module)-[:IMPORTS]->(b:Module)-[:IMPORTS]->(a)
WHERE a.name < b.name  // report each pair once
RETURN a.name, b.name

// Find modules that nothing imports (dead-code candidates; entry points also match)
MATCH (m:Module)
WHERE NOT (m)<-[:IMPORTS]-()
RETURN m.name

Context Optimization

codeoracle uses several techniques to handle large codebases:

1. Prompt Caching

Static content (documentation, project structure) is cached between requests, with two cache lifetimes:

5-minute ephemeral cache (default)
1-hour extended cache (for batch processing)
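
If this caching is built on Anthropic's prompt caching, which the 5-minute and 1-hour lifetimes suggest but the README does not state, a request might mark the static context as shown below. The sketch builds a request body only; `buildCachedRequest` is a hypothetical helper, and the `ttl` field for the 1-hour option should be verified against the current API documentation.

```typescript
// Shape of a Messages API request body with a cached system block.
// The model ID is supplied by the caller; the prompt text is illustrative.
function buildCachedRequest(
  model: string,
  staticContext: string,
  question: string,
  extended = false,
) {
  return {
    model,
    max_tokens: 1024,
    system: [
      {
        type: "text" as const,
        text: staticContext,
        // "ephemeral" marks a cache breakpoint; without a ttl the lifetime is 5 minutes.
        cache_control: extended
          ? { type: "ephemeral" as const, ttl: "1h" as const }
          : { type: "ephemeral" as const },
      },
    ],
    messages: [{ role: "user" as const, content: question }],
  };
}
```

Placing the stable material (documentation, structure) in the cached block and the changing question in `messages` is what lets repeated requests reuse the cached prefix.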

2. Chunking

Large files are split into semantic chunks:

Function-level chunking
Class-level chunking
Block-level chunking
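
Function- and class-level chunking need a parser for the language in question. The simplest of the three, block-level chunking, can be approximated without one. The sketch below splits on blank lines and packs blocks up to a size budget; `chunkByBlocks` and the character-based budget are assumptions made for illustration.

```typescript
// Naive block-level chunker: splits on blank lines, then packs blocks into chunks
// of at most maxChars. A single block larger than maxChars becomes its own oversized chunk.
function chunkByBlocks(source: string, maxChars: number): string[] {
  const blocks = source.split(/\n\s*\n/).filter((block) => block.trim() !== "");
  const chunks: string[] = [];
  let current = "";
  for (const block of blocks) {
    const candidate = current === "" ? block : `${current}\n\n${block}`;
    if (candidate.length > maxChars && current !== "") {
      // Adding this block would exceed the budget: close the current chunk.
      chunks.push(current);
      current = block;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```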

3. Summarization Chain

For very large files:

Chunk the file
Summarize each chunk
Summarize the summaries
Use summary for high-level analysis
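
The steps above can be sketched as a two-level summarization. `summarize` stands in for a model call and is injected by the caller; `Summarizer` and `summarizeFile` are hypothetical names.

```typescript
// `summarize` stands in for a model call and is injected so the chain stays testable.
type Summarizer = (text: string) => Promise<string>;

async function summarizeFile(chunks: string[], summarize: Summarizer): Promise<string> {
  // Summarize each chunk independently.
  const partials = await Promise.all(chunks.map((chunk) => summarize(chunk)));
  // Summarize the joined chunk summaries into one high-level summary.
  return summarize(partials.join("\n"));
}
```

For a file with n chunks this makes n + 1 summarizer calls, and only the final call sees material from the whole file.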

4. Model Selection

| Task | Model | Reason |
|------|-------|--------|
| File scanning | Haiku | Fast, cheap |
| Pattern detection | Sonnet | Good reasoning |
| Security audit | Opus | Critical accuracy |
| Documentation | Sonnet | Quality writing |
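
Expressed in code, the table is a static lookup from task to model tier. The sketch below uses hypothetical type and function names and leaves concrete model IDs to the caller.

```typescript
// Tiers from the table above; concrete model IDs are left to the caller.
type ModelTier = "haiku" | "sonnet" | "opus";
type Task = "file_scanning" | "pattern_detection" | "security_audit" | "documentation";

const MODEL_FOR_TASK: Record<Task, ModelTier> = {
  file_scanning: "haiku",
  pattern_detection: "sonnet",
  security_audit: "opus",
  documentation: "sonnet",
};

function selectModel(task: Task): ModelTier {
  return MODEL_FOR_TASK[task];
}
```

Typing the map as `Record<Task, ModelTier>` makes the compiler reject a new task that has no tier assigned.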

5. Session Resumption

Long analyses can be resumed:

# Start analysis
codeoracle analyze
# Output: Analysis ID: abc123

# Resume if interrupted
codeoracle analyze --resume abc123
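
Resumption implies some record of which agents have already finished. The sketch below shows one possible shape for that record; `Checkpoint`, `AGENT_ORDER`, and `remainingAgents` are hypothetical, and the format codeoracle actually stores is not documented here.

```typescript
// Hypothetical checkpoint shape; the format codeoracle actually stores is not documented here.
interface Checkpoint {
  analysisId: string;
  completedAgents: string[];
}

// Order taken from the sample `codeoracle analyze` output.
const AGENT_ORDER = ["structure", "dependency", "pattern", "security", "integrator"];

// Returns the agents that still need to run, preserving pipeline order.
function remainingAgents(checkpoint: Checkpoint): string[] {
  const done = new Set(checkpoint.completedAgents);
  return AGENT_ORDER.filter((agent) => !done.has(agent));
}
```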

API Usage

codeoracle can be used programmatically:

import { CodeOracle } from 'codeoracle';

const oracle = new CodeOracle({
  projectPath: './src',
  languages: ['typescript'],
  goals: ['security_audit']
});

// Run analysis
const results = await oracle.analyze();

// Access specific reports
console.log(results.security.findings);
console.log(results.architecture.modules);

// Chat with codebase
const answer = await oracle.chat('How is auth handled?');

Contributing

Contributions are welcome. Please read the contributing guidelines before submitting a pull request.

Development Setup

git clone https://github.com/yourusername/codeoracle.git
cd codeoracle
bun install
bun run dev

Running Tests

bun test
bun test --coverage

License

MIT License. See LICENSE for details.

Changelog

See CHANGELOG.md for version history.
