codeoracle
v0.1.0
Multi-agent system for analyzing legacy codebases using AI
codeoracle
A multi-agent system for analyzing and understanding legacy codebases.
codeoracle uses specialized AI agents to analyze complex, inherited code, generating comprehensive documentation, identifying issues, and preparing codebases for modernization.
Table of Contents
- Why codeoracle
- Installation
- Quick Start
- Commands
- Configuration
- Architecture
- Agents
- Output
- Neo4j Integration
- Context Optimization
- Contributing
- License
Why codeoracle
Legacy codebases present unique challenges:
| Challenge | codeoracle Solution |
|-----------|---------------------|
| Undocumented code | Documentation Agent generates missing docs |
| Unknown dependencies | Dependency Agent maps internal/external deps |
| Hidden security issues | Security Agent audits for vulnerabilities |
| Dead code accumulation | Structure Agent identifies unused code |
| Complex architecture | Integrator creates architecture diagrams |
| Knowledge silos | Chat interface enables conversational exploration |
Key Features
- Multi-agent architecture - Specialized agents for different analysis aspects
- Context optimization - Handles large codebases through chunking and caching
- Model selection - Uses appropriate models (Haiku/Sonnet/Opus) per task
- Neo4j integration - Visualize dependencies as an interactive graph
- Incremental analysis - Resume long-running analyses
- Comprehensive reports - Markdown reports, JSON data, visual diagrams
Installation
Prerequisites
Install globally
bun install -g codeoracle
Or run directly
bunx codeoracle <command>
Environment Setup
export ANTHROPIC_API_KEY="your-api-key"
# Optional: Neo4j connection
export NEO4J_URI="bolt://localhost:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="password"
Quick Start
1. Initialize configuration
cd your-legacy-project
codeoracle init
This creates codeoracle.config.yaml:
project:
  name: "my-legacy-project"
  path: "./src"
  languages:
    - javascript
    - typescript
goals:
  - understand_architecture
  - generate_documentation
2. Scan the codebase
codeoracle scan
Output:
Scanning project structure...
Files found: 847
Languages: TypeScript (623), JavaScript (189), JSON (35)
Total size: 12.4 MB
Entry points detected: 3
Structure saved to .codeoracle/structure.json
3. Run analysis
codeoracle analyze
Output:
Starting multi-agent analysis...
[1/5] Structure Agent (Haiku)
Mapping modules and relationships...
Found 42 modules, 156 exports
[2/5] Dependency Agent (Haiku)
Analyzing imports...
Internal: 234, External: 47
[3/5] Pattern Agent (Sonnet)
Detecting patterns and anti-patterns...
Patterns: 12, Anti-patterns: 8
[4/5] Security Agent (Opus)
Auditing for vulnerabilities...
Critical: 0, High: 2, Medium: 5
[5/5] Integration Agent (Sonnet)
Consolidating results...
Analysis complete. Reports saved to ./output/
4. View reports
codeoracle report
Opens generated reports in your default viewer.
Commands
init
Initialize a new codeoracle configuration.
codeoracle init [options]
Options:
| Option | Description |
|--------|-------------|
| --template <name> | Use a preset template (node, python, java) |
| --force | Overwrite existing configuration |
scan
Scan and map the codebase structure.
codeoracle scan [options]
Options:
| Option | Description |
|--------|-------------|
| --depth <n> | Maximum directory depth (default: 10) |
| --output <path> | Output file path |
analyze
Run multi-agent analysis on the codebase.
codeoracle analyze [options]
Options:
| Option | Description |
|--------|-------------|
| --full | Run all agents (default) |
| --structure | Only structure analysis |
| --deps | Only dependency analysis |
| --patterns | Only pattern detection |
| --security | Only security audit |
| --resume <id> | Resume previous analysis |
| --parallel | Run agents in parallel |
report
Generate and view reports.
codeoracle report [options]
Options:
| Option | Description |
|--------|-------------|
| --format <fmt> | Output format: md, html, json |
| --open | Open in default viewer |
graph
Visualize the codebase graph in Neo4j.
codeoracle graph [options]
Options:
| Option | Description |
|--------|-------------|
| --browser | Open Neo4j browser |
| --export <path> | Export graph as JSON |
chat
Interactive chat with your codebase.
codeoracle chat
Example session:
> What are the main entry points?
The codebase has 3 main entry points:
1. src/index.ts - Main application
2. src/api/server.ts - REST API
3. src/cli/main.ts - CLI tool
> How is authentication handled?
Authentication is handled in src/auth/ module...
Configuration
Full configuration reference
# codeoracle.config.yaml
project:
  name: "project-name"
  path: "./src"
  description: "Optional project description"
  languages:
    - typescript
    - javascript
    - sql
    - graphql
structure:
  entry_points:
    - src/index.ts
    - src/api/server.ts
  ignore:
    - node_modules
    - dist
    - coverage
    - "*.test.ts"
    - "*.spec.ts"
  max_file_size: 1048576 # 1MB
context:
  documentation: "./docs"
  diagrams: "./architecture"
  readme: "./README.md"
goals:
  - understand_architecture
  - find_dead_code
  - security_audit
  - generate_documentation
  - identify_patterns
analysis:
  parallel: true
  cache_ttl: 3600 # 1 hour
  max_tokens_per_file: 8000
security:
  check_secrets: true
  check_vulnerabilities: true
  owasp_top_10: true
output:
  directory: "./output"
  formats:
    - markdown
    - json
neo4j:
  enabled: true
  uri: "bolt://localhost:7687"
  database: "codeoracle"
Architecture
Multi-Agent System
┌─────────────────────────────────────────────────────────────┐
│                         USER INPUT                          │
│             codeoracle.config.yaml + CLI flags              │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                    COORDINATOR (Sonnet)                     │
│                                                             │
│  - Parses configuration                                     │
│  - Orchestrates agent execution                             │
│  - Manages context and caching                              │
│  - Consolidates results                                     │
└─────────────────────────┬───────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
        ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│   STRUCTURE   │ │   ANALYZER    │ │   SECURITY    │
│    (Haiku)    │ │   (Sonnet)    │ │    (Opus)     │
│               │ │               │ │               │
│ - File tree   │ │ - Patterns    │ │ - Vulns       │
│ - Modules     │ │ - Dependencies│ │ - Secrets     │
│ - Exports     │ │ - Data flow   │ │ - OWASP       │
│ - Entry pts   │ │ - Complexity  │ │ - Best pracs  │
└───────────────┘ └───────────────┘ └───────────────┘
        │                 │                 │
        └─────────────────┼─────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                     INTEGRATOR (Sonnet)                     │
│                                                             │
│  - Merges agent outputs                                     │
│  - Generates documentation                                  │
│  - Creates architecture diagrams                            │
│  - Produces actionable recommendations                      │
└─────────────────────────┬───────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
        ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│    REPORTS    │ │     NEO4J     │ │     CACHE     │
│               │ │               │ │               │
│ - Markdown    │ │ - Nodes       │ │ - Prompts     │
│ - JSON        │ │ - Relations   │ │ - Sessions    │
│ - HTML        │ │ - Queries     │ │ - Results     │
└───────────────┘ └───────────────┘ └───────────────┘
Agents
Coordinator Agent
Model: Sonnet
Role: Orchestration and consolidation
Responsibilities:
- Parse user configuration
- Plan analysis strategy
- Delegate to specialized agents
- Manage context budget
- Handle errors and retries
Structure Agent
Model: Haiku (fast, cost-effective)
Tools: Glob, Grep, Read
Analyzes:
- Directory structure
- File types and sizes
- Module boundaries
- Export/import relationships
- Entry points
Dependency Agent
Model: Haiku
Tools: Read, Grep
Analyzes:
- Internal dependencies
- External packages
- Circular dependencies
- Unused dependencies
- Version conflicts
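To make the circular-dependency check concrete, here is a minimal sketch of cycle detection over an import graph using depth-first search. The `ImportGraph` shape and the `findImportCycles` helper are hypothetical names chosen for illustration; they are not part of codeoracle's documented API, and the agent itself may work differently.

```typescript
// Hypothetical import graph: module path -> modules it imports.
type ImportGraph = Record<string, string[]>;

// Reports one cycle for each back edge the depth-first search encounters.
// Each cycle is a path that starts and ends at the same module.
function findImportCycles(graph: ImportGraph): string[][] {
  const cycles: string[][] = [];
  const done = new Set<string>(); // modules that are fully explored
  const stack: string[] = [];     // modules on the current DFS path

  function visit(node: string): void {
    const at = stack.indexOf(node);
    if (at !== -1) {
      // The node is already on the current path, so the path closes a cycle.
      cycles.push([...stack.slice(at), node]);
      return;
    }
    if (done.has(node)) return;
    stack.push(node);
    for (const dep of graph[node] ?? []) visit(dep);
    stack.pop();
    done.add(node);
  }

  for (const node of Object.keys(graph)) visit(node);
  return cycles;
}

// Example graph (made-up paths): session.ts and user.ts import each other.
const exampleGraph: ImportGraph = {
  "src/index.ts": ["src/auth/session.ts"],
  "src/auth/session.ts": ["src/auth/user.ts"],
  "src/auth/user.ts": ["src/auth/session.ts"],
};
const importCycles = findImportCycles(exampleGraph);
```

Unlike a fixed two-hop pattern match, this approach also catches cycles that pass through three or more modules.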
Pattern Agent
Model: Sonnet (requires reasoning)
Tools: Read, Grep
Detects:
- Design patterns (Factory, Singleton, Observer, etc.)
- Anti-patterns (God class, Spaghetti code, etc.)
- Code smells
- Complexity hotspots
Security Agent
Model: Opus (critical analysis)
Tools: Read, Grep
Checks:
- Hardcoded secrets
- SQL injection vectors
- XSS vulnerabilities
- Insecure dependencies
- OWASP Top 10
- Authentication issues
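As an illustration of the simplest item on this list, hardcoded secrets, the sketch below scans source text line by line against a couple of regular expressions. The rule names, patterns, and the `findHardcodedSecrets` helper are assumptions made for this example; the Security Agent is model-driven and its actual checks are not documented here.

```typescript
interface SecretFinding {
  line: number; // 1-based line number
  rule: string; // name of the rule that matched
}

// Illustrative rules only; real scanners use far larger rule sets.
const SECRET_RULES: { rule: string; pattern: RegExp }[] = [
  // AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
  { rule: "aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  // A secret-like name assigned a quoted literal of 8 or more characters.
  {
    rule: "assigned-secret-literal",
    pattern: /\b(api[_-]?key|secret|password|token)\b\s*[:=]\s*["'][^"']{8,}["']/i,
  },
];

// Returns one finding per (line, rule) match.
function findHardcodedSecrets(source: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of SECRET_RULES) {
      if (pattern.test(text)) findings.push({ line: index + 1, rule });
    }
  });
  return findings;
}
```

Pattern matching of this kind produces false positives and misses obfuscated values, which is one reason to pair it with a reasoning model.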
Documentation Agent
Model: Sonnet
Tools: Read, Write
Generates:
- Module documentation
- API documentation
- Architecture overview
- Getting started guide
Integrator Agent
Model: Sonnet
Tools: Read, Write
Produces:
- Executive summary
- Consolidated findings
- Priority recommendations
- Migration roadmap
Output
Generated Files
output/
├── report.md                  # Executive summary
├── architecture.md            # System architecture
├── architecture.mermaid       # Mermaid diagram
├── dependencies.json          # Dependency graph
├── dependencies.md            # Dependency analysis
├── security-findings.md       # Security audit results
├── security-findings.json     # Machine-readable findings
├── patterns.md                # Detected patterns
├── dead-code.md               # Unused code report
├── recommendations.md         # Prioritized actions
└── documentation/
    ├── modules/
    │   ├── auth.md
    │   ├── api.md
    │   └── ...
    └── api/
        ├── endpoints.md
        └── schemas.md
Report Format
# Analysis Report: my-legacy-project
## Executive Summary
- **Total files**: 847
- **Languages**: TypeScript, JavaScript
- **Complexity score**: 7.2/10
- **Security issues**: 2 high, 5 medium
- **Documentation coverage**: 34%
## Critical Findings
### Security
1. **Hardcoded API key** in `src/config/api.ts:42`
   - Severity: High
   - Recommendation: Move to environment variables
...
Neo4j Integration
Setup
- Install Neo4j Desktop or use Docker:
docker run -d \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:latest
- Enable in configuration:
neo4j:
  enabled: true
  uri: "bolt://localhost:7687"
  database: "codeoracle"
Graph Schema
Nodes:
- Module - Code modules/files
- Function - Functions/methods
- Class - Classes
- Package - External dependencies
Relationships:
- IMPORTS - Import relationships
- CALLS - Function calls
- EXTENDS - Class inheritance
- DEPENDS_ON - Package dependencies
Example Queries
// Find modules with most dependencies
MATCH (m:Module)-[r:IMPORTS]->(d)
RETURN m.name, count(r) as deps
ORDER BY deps DESC
LIMIT 10
// Find direct circular dependencies (two modules that import each other)
MATCH (a:Module)-[:IMPORTS]->(b:Module)-[:IMPORTS]->(a)
RETURN a.name, b.name
// Find modules that nothing imports (unused-code candidates; entry points also match)
MATCH (m:Module)
WHERE NOT (m)<-[:IMPORTS]-()
RETURN m.name
Context Optimization
codeoracle uses several techniques to handle large codebases:
1. Prompt Caching
Static content (documentation, structure) is cached:
- 5-minute ephemeral cache (default)
- 1-hour extended cache (for batch processing)
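The two lifetimes above correspond to the cache options in Anthropic's prompt caching, where a content block is marked with `cache_control`. The sketch below shows how a request's system prompt could be assembled so that the static project context is cacheable. It assumes codeoracle relies on that API feature; `buildSystemBlocks` and its parameters are hypothetical names, and depending on the API version the 1-hour option may require an additional beta header.

```typescript
// Shape of a system block in an Anthropic Messages API request.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl?: "5m" | "1h" };
}

// Builds the system prompt so that the static project context is marked
// cacheable. The API caches the prompt prefix up to and including the
// marked block, so static content goes first and per-file content later.
function buildSystemBlocks(
  instructions: string,
  staticContext: string,
  extended = false, // true for long batch runs
): SystemBlock[] {
  return [
    { type: "text", text: instructions },
    {
      type: "text",
      text: staticContext,
      cache_control: extended
        ? { type: "ephemeral", ttl: "1h" }
        : { type: "ephemeral" }, // default lifetime is 5 minutes
    },
  ];
}
```

The returned array would be passed as the `system` field of the request, with the file under analysis sent in the user message so that it does not invalidate the cached prefix.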
2. Chunking
Large files are split into semantic chunks:
- Function-level chunking
- Class-level chunking
- Block-level chunking
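A minimal sketch of the block-level variant follows, assuming chunks are delimited by blank lines and packed up to a character budget. The `chunkByBlocks` helper is hypothetical; function-level and class-level chunking would need a real parser for each language rather than this text-based heuristic.

```typescript
// Splits source text into chunks of at most maxChars characters, breaking
// only at blank lines so that adjacent statements stay together.
// A single block larger than maxChars is kept as its own oversized chunk.
function chunkByBlocks(source: string, maxChars: number): string[] {
  const blocks = source.split(/\n\s*\n/).filter((b) => b.trim() !== "");
  const chunks: string[] = [];
  let current = "";
  for (const block of blocks) {
    const candidate = current === "" ? block : current + "\n\n" + block;
    if (candidate.length <= maxChars || current === "") {
      // Either the block still fits, or it starts a new chunk on its own.
      current = candidate;
    } else {
      chunks.push(current);
      current = block;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

A token-based budget (compare `max_tokens_per_file` in the configuration) would be closer to what a model actually consumes; characters are used here only to keep the sketch dependency-free.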
3. Summarization Chain
For very large files:
- Chunk the file
- Summarize each chunk
- Summarize the summaries
- Use summary for high-level analysis
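The steps above amount to a map-reduce over chunks, which can be sketched as follows. The `Summarizer` type and `summarizeChain` function are hypothetical; the summarizer is injected so the chain can be exercised without a model call, whereas a real run would wrap a model request.

```typescript
// A summarizer takes text and returns a shorter version. In practice this
// would call a model; here it is injected so the chain can run offline.
type Summarizer = (text: string) => Promise<string>;

// Summarizes each chunk independently, then summarizes the concatenated
// chunk summaries into a single high-level summary.
async function summarizeChain(
  chunks: string[],
  summarize: Summarizer,
): Promise<string> {
  if (chunks.length === 0) return "";
  // Map step: chunk summaries do not depend on each other, so run them
  // concurrently.
  const partials = await Promise.all(chunks.map((chunk) => summarize(chunk)));
  // Reduce step: summarize the summaries.
  return summarize(partials.join("\n"));
}
```

For files so large that the joined summaries still exceed the context budget, the reduce step could itself be chunked and repeated; that extension is omitted here.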
4. Model Selection
| Task | Model | Reason |
|------|-------|--------|
| File scanning | Haiku | Fast, cheap |
| Pattern detection | Sonnet | Good reasoning |
| Security audit | Opus | Critical accuracy |
| Documentation | Sonnet | Quality writing |
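The table can be expressed as a typed lookup, sketched below. The task identifiers, the `ModelTier` type, and `selectModel` are hypothetical names for illustration, and the tiers are model families rather than concrete model IDs.

```typescript
type ModelTier = "haiku" | "sonnet" | "opus";
type Task =
  | "file_scanning"
  | "pattern_detection"
  | "security_audit"
  | "documentation";

// Record<Task, ModelTier> makes the compiler reject a missing task, so
// adding a new task forces a decision about which tier it uses.
const MODEL_FOR_TASK: Record<Task, ModelTier> = {
  file_scanning: "haiku",
  pattern_detection: "sonnet",
  security_audit: "opus",
  documentation: "sonnet",
};

function selectModel(task: Task): ModelTier {
  return MODEL_FOR_TASK[task];
}
```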
5. Session Resumption
Long analyses can be resumed:
# Start analysis
codeoracle analyze
# Output: Analysis ID: abc123
# Resume if interrupted
codeoracle analyze --resume abc123
API Usage
codeoracle can be used programmatically:
import { CodeOracle } from 'codeoracle';
const oracle = new CodeOracle({
  projectPath: './src',
  languages: ['typescript'],
  goals: ['security_audit']
});
// Run analysis
const results = await oracle.analyze();
// Access specific reports
console.log(results.security.findings);
console.log(results.architecture.modules);
// Chat with codebase
const answer = await oracle.chat('How is auth handled?');
Contributing
Contributions are welcome. Please read the contributing guidelines before submitting a pull request.
Development Setup
git clone https://github.com/yourusername/codeoracle.git
cd codeoracle
bun install
bun run dev
Running Tests
bun test
bun test --coverage
License
MIT License. See LICENSE for details.
Changelog
See CHANGELOG.md for version history.
