@cloneglobal/indexer (cf-indexer) v2.0.2
High-performance universal codebase indexer for AI assistants and development tools.
Universal code indexer for AI assistants and development tools with Claude Code integration.
Production Status: Successfully deployed on the Clone Global monorepo (1,408+ files indexed in 13.2s).
Claude Integration: Full -i flag support with the indexer-analyzer subagent for enhanced code intelligence.
📊 Implementation Status
The indexer exceeds original specifications with enhanced capabilities:
- 9 language parsers implemented (vs 4 specified)
- 5 export formats (vs 0 specified)
- Real-time updates with 300ms debouncing via Chokidar
- Monorepo support with cross-service dependency tracking
- Claude Code hooks for AI assistant integration
Run one command and get:
- Complete Code Index: All languages, all files, all relationships
- Multi-Repo Knowledge Graph: Cross-repository dependency visualization
- Interactive Dashboard: Beautiful HTML interface showing everything
- Mermaid Diagrams: Three types of visualizations for VS Code/Cursor
- All Export Formats: JSON, Markdown, GraphViz, ASCII - generated simultaneously
- IDE Integration: VS Code and Cursor support configured automatically
- Service Detection: Monorepo, multi-repo, frameworks - all detected
See docs/spec-compare.md for detailed implementation vs specification comparison.
🚀 Quick Install
# One-line installer
curl -fsSL https://raw.githubusercontent.com/clone-global/indexer/main/install.sh | bash
# Or clone and install manually
git clone https://github.com/clone-global/indexer.git
cd indexer
yarn install
yarn build
🎣 Claude Code Integration
Hook Integration (-i flag)
Use the -i flag in any Claude prompt to automatically index your project:
claude "fix auth bug -i" # Auto-index with 50k tokens (default)
claude "refactor database -i75" # Target 75k tokens
claude "analyze security -ic200" # Export 200k tokens to clipboard for external AI
Direct Reference (@PROJECT_INDEX.json)
For small projects, reference the index directly in your prompts:
# First, create the index
indexer init
# Then reference it directly
claude "@PROJECT_INDEX.json what functions handle authentication?"
Auto-Regeneration
The index automatically updates:
- After every Claude session (via Stop hook)
- When you use the -i flag, which captures all changes made during your work
Smart Features
- Auto-sizing: Remembers your last token preference per project
- Progressive compression: Automatically adjusts detail level to fit token limits
- Session awareness: Stop hook regenerates index after each session
- Clipboard export: Share index with any AI tool supporting large contexts
- Specialized agent: the indexer-analyzer agent provides strategic code insights
- Direct references: use @PROJECT_INDEX.json for instant access
✨ Key Features
Core Capabilities
- ⚡ Lightning Fast - Index 1,408+ files in 13.2s (106 files/second)
- 🔄 Real-time Updates - Auto-refresh with 300ms debounced file watching
- 🌐 Multi-language - 9 languages: JS/TS, Python, Go, SQL, GraphQL, YAML, Astro, JSON
- 🤖 AI-Optimized - Claude Code integration with the indexer-analyzer subagent
- 📊 Rich Visualizations - 6 export formats: JSON, GraphViz, Markdown, Mermaid, ASCII, Compressed
- 🎨 Interactive Diagrams - HTML-rendered visualizations with zoom and export
- 🏗️ Monorepo Aware - Cross-repository dependency tracking and service detection
- 💾 Smart Caching - LRU cache with 500MB limit prevents memory leaks
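The size-bounded LRU behavior described above can be sketched as follows. This is an illustrative sketch only: the package's real CacheManager API is not documented here, so the class name, byte accounting, and eviction details are assumptions.

```javascript
// Size-bounded LRU cache sketch (illustrative; not the real CacheManager).
class LruCache {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.bytes = 0;
    this.entries = new Map(); // Map iteration order = insertion order
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    // Re-insert to mark as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    const size = Buffer.byteLength(JSON.stringify(value));
    if (this.entries.has(key)) {
      this.bytes -= this.entries.get(key).size;
      this.entries.delete(key);
    }
    this.entries.set(key, { value, size });
    this.bytes += size;
    // Evict least-recently-used entries until back under budget.
    while (this.bytes > this.maxBytes && this.entries.size > 1) {
      const oldest = this.entries.keys().next().value;
      this.bytes -= this.entries.get(oldest).size;
      this.entries.delete(oldest);
    }
  }
}
```

The fixed byte budget is what prevents unbounded memory growth during long watch sessions.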
🆕 Call Graph Analysis (v2.0.2)
- Function Call Tracking - Bidirectional mapping of which functions call which others
- Dead Code Detection - Automatically identifies unused functions
- Circular Dependencies - Finds and reports dependency cycles
- Impact Analysis - Shows all functions affected by changes
- Execution Path Tracing - Follows code flow from entry points
- Cross-File Dependencies - Tracks calls across file boundaries
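The circular-dependency detection above boils down to finding back edges in the call graph. A minimal sketch, assuming a simplified adjacency-list shape (caller mapped to an array of callees) rather than the indexer's actual internal format:

```javascript
// DFS-based cycle detection over a call graph (assumed adjacency-list shape).
function findCycles(graph) {
  const visiting = new Set(); // nodes on the current DFS path
  const done = new Set();     // fully explored nodes
  const cycles = [];

  function visit(node, path) {
    if (done.has(node)) return;
    if (visiting.has(node)) {
      // Back edge found: slice out the cyclic portion of the path.
      cycles.push(path.slice(path.indexOf(node)).concat(node));
      return;
    }
    visiting.add(node);
    for (const callee of graph[node] || []) visit(callee, path.concat(node));
    visiting.delete(node);
    done.add(node);
  }

  for (const node of Object.keys(graph)) visit(node, []);
  return cycles;
}
```

The same three-set traversal also yields dead-code candidates for free: any node never reached from an entry point stays unvisited.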
🆕 AI-Optimized Compression (v2.0.2)
- 50-70% Size Reduction - Compressed format for maximum information density
- Token-Aware Sizing - Respects LLM token limits (configurable 1k-800k)
- Progressive Compression - Three levels: minimal, standard, aggressive
- Unified Exporter - Single JSONExporter handles all formats
- Smart Truncation - Preserves most important information when hitting limits
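The progressive-compression idea can be sketched like this. The ~4-characters-per-token heuristic and the way detail levels are stripped are assumptions for illustration; the real exporter's behavior and field names may differ.

```javascript
// Token-budgeted export sketch: strip detail levels until the index fits.
const CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer

const estimateTokens = (obj) =>
  Math.ceil(JSON.stringify(obj).length / CHARS_PER_TOKEN);

function compressToBudget(index, tokenLimit) {
  const levels = [
    (idx) => idx,                                                        // full detail
    (idx) => ({ ...idx, docstrings: undefined }),                        // minimal
    (idx) => ({ ...idx, docstrings: undefined, signatures: undefined }), // standard
    (idx) => ({ files: Object.keys(idx.files || {}) }),                  // aggressive
  ];
  let result = index;
  for (const level of levels) {
    result = level(index);
    if (estimateTokens(result) <= tokenLimit) break; // stop at first fit
  }
  return { index: result, tokens: estimateTokens(result) };
}
```

Each pass drops the most expendable information first, so the output stays as rich as the budget allows.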
Advanced Claude Code SDK Integration
- Native TypeScript SDK: Full @anthropic-ai/claude-code integration with streaming
- MCP (Model Context Protocol): 16+ custom tool integrations (Slack, Linear, Datadog, Semgrep, Snyk)
- Specialized AI Agents: Security, Performance, Architecture, and Testing agents
- Real-time Streaming: SSE/WebSocket for live analysis updates
- Session Management: Multi-turn conversations with context retention
- Concurrent Analysis: 4x faster with parallel agent execution
Language Parsing Excellence
- Tree-sitter Python Parser: 100% accurate AST parsing for Python 3 (decorators, type hints, async/await, metaclasses)
- Babel JavaScript/TypeScript: Full ES2024 and TypeScript 5+ support
- GraphQL AST Parser: Complete schema and query analysis
- Multi-language Support: 9 languages with extensible architecture
Multi-Repository Knowledge Graph
- Cross-Repo Dependencies: Visualize how backend APIs connect to frontend, skills, and other services
- API Flow Mapping: Track REST and GraphQL calls across repository boundaries
- Service Architecture: Understand the complete microservices ecosystem
- Mermaid Diagrams: Generate beautiful visualizations viewable in VS Code/Cursor
Capabilities
- Bug Prediction: 70-85% accuracy using specialized agents
- Security Scanning: OWASP Top 10 compliance with MCP-powered Semgrep/Snyk
- Code Smell Detection: Pattern recognition with architectural analysis
- Test Generation: Automated test creation with Jest/Mocha integration
- Refactoring Suggestions: Context-aware improvements with session continuity
- Performance Analysis: Real-time profiling with Datadog metrics
- Secret Detection: Hardcoded credentials and API key scanning
- Dependency Analysis: CVE tracking and vulnerability assessment
What Is This?
The @cloneglobal/indexer is an AI-powered code intelligence platform that analyzes your entire codebase and creates a comprehensive JSON index containing:
- All functions, classes, and their relationships
- Import/export dependencies across files
- Cross-service dependencies in monorepos
- Code complexity metrics
- Language statistics
- AI-powered bug predictions and security analysis
This index enables:
- AI assistants (like Claude) to understand your entire codebase context across multiple repos
- Predictive Bug Detection using Claude Code SDK
- Security Vulnerability Scanning with AI-powered analysis
- Automated Test Generation with 80% coverage targets
- Automated Bug Triage via Slack bot that creates Linear tickets
- Real-time Monitoring through Datadog dashboards
- Instant Code Search across millions of lines
- Architecture Visualization with dependency graphs
- Unified Dashboard: Single interface showing everything:
- Statistics cards
- Feature cards with direct links
- Quick action buttons
- Beautiful gradient UI
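The kind of instant lookup the index enables can be sketched in a few lines. The file shape follows the JSON Export example later in this README; treat the field names as illustrative rather than a stable API.

```javascript
// Query a generated PROJECT_INDEX.json for functions matching a pattern.
function findFunctions(index, pattern) {
  const re = new RegExp(pattern);
  const hits = [];
  for (const [file, data] of Object.entries(index.files || {})) {
    for (const fn of data.functions || []) {
      // Entries may be plain names or objects like { name, async }.
      const name = typeof fn === 'string' ? fn : fn.name;
      if (re.test(name)) hits.push({ file, name });
    }
  }
  return hits;
}

// Usage (assuming the index file exists at the project root):
// const fs = require('fs');
// const index = JSON.parse(fs.readFileSync('PROJECT_INDEX.json', 'utf8'));
// findFunctions(index, '^handle');
```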
First-Run Experience
The indexer includes a friendly welcome wizard for new users:
- Automatically detects first-time usage
- Shows getting started guide
- Displays success message after first index
- Creates marker file to track initialization
Current Issues
Test Coverage
- Current: 8% (4 test files)
- Target: 80% coverage
- Risk: Limited test coverage may impact reliability
Technical Debt
- Type safety: noImplicitAny: false allows unsafe any types
- Memory management: CacheManager underutilized by parsers
- Performance: No incremental parsing (full re-parse on changes)
Installation
For Users (via NPM Registry)
# Install globally with yarn (recommended)
yarn global add cf-indexer
# Or with npm
npm install -g cf-indexer
For Contributors (from Source)
# Clone the private repository
git clone [email protected]:clone-global/indexer.git
cd indexer
# Install dependencies
yarn install
# Build the project
yarn build
# Link globally for CLI usage
yarn link
Future yarn Installation (Not Yet Available)
# Will be available after yarn publish
yarn global add @cloneglobal/indexer
Quick Start
# Initialize in your project
indexer init
# Scan and create index
indexer scan
# Watch for changes
indexer watch
# Query the index
indexer query "handleClick"
# Export in different formats
indexer export json --pretty # Standard format with indentation
indexer export compressed --token-limit 50000 # AI-optimized compressed format
indexer export markdown --output docs/index.md
indexer export graphviz --output deps.dot
indexer export mermaid --output diagram.mmd
indexer export ascii --ascii-style tree --output structure.txt
Visualization Exports
The indexer provides multiple visualization formats to help you understand your codebase:
Mermaid Diagrams
Generate interactive flowcharts, class diagrams, and ER diagrams:
# Basic Mermaid flowchart (limited to 50 nodes by default)
indexer export mermaid --output dependencies.mmd
# Interactive HTML viewer with zoom and export
indexer export mermaid --render-html --theme dark --output viewer.html
# IMPORTANT: For large codebases, use filtering to avoid size limits
# Limit number of nodes (default: 50)
indexer export mermaid --max-nodes 30 --render-html --output viewer.html
# Filter by service
indexer export mermaid --service-filter frontend --render-html --output frontend.html
# Filter by pattern
indexer export mermaid --filter-pattern "component|service" --output filtered.mmd
# Different diagram types
indexer export mermaid --diagram-type classDiagram --max-nodes 20 --output classes.mmd
indexer export mermaid --diagram-type erDiagram --output entities.mmd
# View online at https://mermaid.live
# Or render locally: mmdc -i diagram.mmd -o diagram.png
Note: Mermaid has a maximum text size limit. For large projects (>100 files), use --max-nodes, --service-filter, or --filter-pattern to reduce diagram complexity.
ASCII Visualizations
Generate text-based visualizations for documentation and terminals:
# Tree structure (like 'tree' command)
indexer export ascii --ascii-style tree --output structure.txt
# Dependency graph
indexer export ascii --ascii-style graph --output dependencies.txt
# Statistics table
indexer export ascii --ascii-style table --colorize --output stats.txt
# Service architecture boxes
indexer export ascii --ascii-style boxes --output services.txt
GraphViz (DOT)
Traditional dependency graphs:
# Generate DOT file
indexer export graphviz --output deps.dot
# Convert to image
dot -Tpng deps.dot -o deps.png
dot -Tsvg deps.dot -o deps.svg
Export Options
| Format | Options | Description |
|--------|---------|-------------|
| mermaid | --render-html | Generate interactive HTML viewer |
| | --theme | Theme: default, dark, forest, neutral |
| | --diagram-type | flowchart, classDiagram, erDiagram |
| | --max-nodes | Limit nodes (default: 50, important for large projects) |
| | --service-filter | Filter by service: frontend, backend, skills, etc. |
| | --filter-pattern | Regex pattern to filter files |
| ascii | --ascii-style | tree, graph, table, boxes |
| | --colorize | Add terminal colors |
| | --max-width | Maximum output width |
| graphviz | --rankdir | TB, BT, LR, RL |
| | --group-by-dir | Group files by directory |
Usage Guide
For New Projects
# Navigate to your project root
cd /path/to/your/project
# Initialize indexer
indexer init
# This creates:
# - .indexer.yml (configuration)
# - PROJECT_INDEX.json (the index file)
# - Starts file watcher
# Query your codebase
indexer query "function_name" --type function
indexer query "Component" --type class
indexer query "import" --type import
# Export for external AI tools
indexer export clipboard --token-limit 200000 # Copy to clipboard with 200k tokens
indexer export clipboard --token-limit 50000 # Default 50k for Claude
# Export documentation
indexer export markdown --output CODE_DOCUMENTATION.md
For Existing Large Projects (Monorepos)
# Navigate to monorepo root
cd /path/to/monorepo
# Create comprehensive configuration
cat > .indexer.yml << 'EOF'
version: 2
name: my-monorepo
description: Monorepo indexing configuration
include:
- "**/*.{js,jsx,ts,tsx,py,go,sql,graphql,gql,yaml,yml,astro}"
ignore:
- "**/node_modules/**"
- "**/dist/**"
- "**/build/**"
- "**/.git/**"
performance:
parallel: true
workers: 6
cache: true
EOF
# Full scan (may take 1-2 minutes for large projects)
indexer scan --parallel 6
# Start watching for changes
indexer watch --debounce 300 --quiet
# Query across all services
indexer query "GraphQL" --type function
indexer query "skill" --type all | grep "function\|class"
Daily Development Workflow
# Check project health
indexer health
# Find specific patterns
indexer query "useState" --type function
indexer query "mutation" --type function | grep -i skill
# Export dependency graph
indexer export graphviz --output deps.dot
dot -Tpng deps.dot -o dependencies.png
# Get project statistics
indexer stats --json
# Force refresh if needed
indexer scan --incremental
Advanced Querying
# Find all React components
indexer query "Props$" --type class
# Find GraphQL operations
indexer query "query\|mutation" --type function --json
# Find cross-service dependencies
indexer query "" --circular
# Search within specific file types
indexer query "handleClick" --type function | grep ".tsx"
# Export filtered results
indexer query "API" --type all --json > api_analysis.json
Claude Code Integration
The indexer automatically integrates with Claude Code through hooks:
# Check hook status
indexer hook --status
# Install hooks (if not auto-installed)
indexer hook --install
# The hooks will:
# - Update index before Claude reads files
# - Refresh index after Claude writes files
# - Provide project context without consuming tokens
Monitoring & Maintenance
# Regular health check
indexer health
# Validate index integrity
indexer validate
# Clean and rebuild if issues
indexer clean
indexer scan --force
# Monitor file watcher
ps aux | grep "indexer watch"
# Check index age and size
ls -lah PROJECT_INDEX.json
Troubleshooting
Index not updating:
# Kill existing watcher
pkill -f "indexer watch"
# Restart with verbose output
indexer watch --verbose --debounce 1000
Large index size:
# Use quick mode
indexer init --mode quick
# Add more ignore patterns
echo "**/*.test.*" >> .indexerignore
echo "**/coverage/**" >> .indexerignore
Memory issues:
# Reduce parallel workers
indexer scan --parallel 2
# Use incremental updates only
indexer scan --incremental
Integration Examples
Package.json scripts:
{
"scripts": {
"index": "indexer scan --incremental",
"index:watch": "indexer watch --quiet",
"index:docs": "indexer export markdown --output docs/CODE_INDEX.md",
"index:deps": "indexer export graphviz --output docs/dependencies.dot",
"predev": "indexer scan --incremental --quiet"
}
}
Git hooks (.git/hooks/post-merge):
#!/bin/bash
indexer scan --incremental --quiet
Option 3: Separate Indexer Repository
For organizations managing multiple repositories, consider creating a dedicated indexer service repository:
Benefits of Separate Repository Approach
- Centralized Index Management: Single source of truth for all code intelligence
- Scalable CI/CD: GitHub Actions can index multiple repos on schedule
- Team Collaboration: Dedicated repository for indexer improvements and configuration
- Version Control: Track indexer configuration changes across your organization
- Distribution: Publish as npm/yarn package for consistent usage across teams
Implementation Structure
clone-global-indexer/
├── .github/workflows/
│ ├── multi-repo-index.yml # Index multiple repos
│ ├── publish-package.yml # Publish to npm registry
│ └── scheduled-updates.yml # Daily/weekly indexing
├── configs/
│ ├── monorepo.yml # Config for large monorepos
│ ├── microservices.yml # Config for service-oriented
│ └── standard.yml # Default configuration
├── scripts/
│ ├── index-org-repos.js # Script to index all org repos
│ ├── compare-indexes.js # Compare indexes between versions
│ └── health-check.js # Validate index integrity
├── src/ # Indexer source code
└── indexes/ # Generated indexes storage
├── repo1-PROJECT_INDEX.json
├── repo2-PROJECT_INDEX.json
└── combined-INDEX.json # Merged organization index
Multi-Repository GitHub Actions Workflow
# .github/workflows/multi-repo-index.yml
name: Index Organization Repositories
on:
schedule:
- cron: '0 */6 * * *' # Every 6 hours
workflow_dispatch: # Manual trigger
jobs:
index-repos:
runs-on: ubuntu-latest
strategy:
matrix:
repo: [
'clone-global/frontend',
'clone-global/backend',
'clone-global/skills',
'clone-global/data-ops',
'clone-global/marketing'
]
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v4
with:
repository: ${{ matrix.repo }}
path: ./repos/${{ matrix.repo }}
token: ${{ secrets.ORG_ACCESS_TOKEN }}
- uses: actions/setup-node@v4
with:
node-version: '18'
- name: Install indexer
run: |
yarn install
yarn build
- name: Index repository
run: |
cd repos/${{ matrix.repo }}
../../bin/indexer.js scan --output ../../indexes/${{ matrix.repo }}-INDEX.json
- name: Upload index artifact
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.repo }}-index
path: indexes/${{ matrix.repo }}-INDEX.json
combine-indexes:
needs: index-repos
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/download-artifact@v4
with:
path: ./indexes
- name: Combine indexes
run: |
node scripts/combine-indexes.js
- name: Commit updated indexes
run: |
git config --local user.email "[email protected]"
git config --local user.name "Indexer Bot"
git add indexes/
git diff --staged --quiet || git commit -m "Auto-update organization code indexes"
git push
Organization-wide Index Script
// scripts/index-org-repos.js
const { Octokit } = require("@octokit/rest");
const { execSync } = require("child_process");
const fs = require("fs");
const path = require("path");
class OrgIndexer {
constructor(org, token) {
this.octokit = new Octokit({ auth: token });
this.org = org;
}
async indexAllRepos() {
const repos = await this.getOrgRepos();
const indexes = [];
for (const repo of repos) {
console.log(`Indexing ${repo.name}...`);
const index = await this.indexRepository(repo);
if (index) indexes.push(index);
}
// Combine all indexes
const combinedIndex = this.combineIndexes(indexes);
fs.writeFileSync('./indexes/combined-INDEX.json', JSON.stringify(combinedIndex, null, 2));
return combinedIndex;
}
async getOrgRepos() {
const { data } = await this.octokit.repos.listForOrg({
org: this.org,
type: 'all',
sort: 'updated',
per_page: 100
});
return data.filter(repo =>
!repo.archived &&
!repo.fork &&
!repo.name.includes('archive')
);
}
async indexRepository(repo) {
try {
// Clone repo to temp directory
const tempDir = `./temp/${repo.name}`;
execSync(`git clone ${repo.clone_url} ${tempDir}`, { stdio: 'pipe' });
// Run indexer
const outputPath = `./indexes/${repo.name}-INDEX.json`;
execSync(`./bin/indexer.js scan ${tempDir} --output ${outputPath}`, { stdio: 'pipe' });
// Load and return index
const index = JSON.parse(fs.readFileSync(outputPath, 'utf-8'));
index.repository = repo;
// Cleanup
execSync(`rm -rf ${tempDir}`, { stdio: 'pipe' });
return index;
} catch (error) {
console.error(`Failed to index ${repo.name}:`, error.message);
return null;
}
}
combineIndexes(indexes) {
const combined = {
version: '2.0.0',
timestamp: new Date().toISOString(),
organization: this.org,
repositories: indexes.length,
totalFiles: indexes.reduce((sum, idx) => sum + Object.keys(idx.files || {}).length, 0),
files: {},
statistics: {
totalFiles: 0,
totalFunctions: 0,
totalClasses: 0,
languages: {}
}
};
for (const index of indexes) {
// Merge files with repository prefix
for (const [filePath, fileData] of Object.entries(index.files || {})) {
const prefixedPath = `${index.repository.name}/${filePath}`;
combined.files[prefixedPath] = fileData;
}
// Merge statistics
if (index.statistics) {
combined.statistics.totalFiles += index.statistics.totalFiles || 0;
combined.statistics.totalFunctions += index.statistics.totalFunctions || 0;
combined.statistics.totalClasses += index.statistics.totalClasses || 0;
for (const [lang, count] of Object.entries(index.statistics.languages || {})) {
combined.statistics.languages[lang] = (combined.statistics.languages[lang] || 0) + count;
}
}
}
return combined;
}
}
// Usage
if (require.main === module) {
const org = process.env.GITHUB_ORG || 'clone-global';
const token = process.env.GITHUB_TOKEN;
if (!token) {
console.error('GITHUB_TOKEN environment variable required');
process.exit(1);
}
const indexer = new OrgIndexer(org, token);
indexer.indexAllRepos().then(() => {
console.log('Organization indexing complete');
}).catch(error => {
console.error('Indexing failed:', error);
process.exit(1);
});
}
Setup Instructions for Option 3
# 1. Create new repository
git clone https://github.com/clone-global/indexer.git
cd indexer
# 2. Install and configure
yarn install
yarn build
# 3. Set up GitHub secrets
# - GITHUB_TOKEN: Personal access token with repo access
# - ORG_ACCESS_TOKEN: Organization access token
# 4. Configure organization settings
cp configs/monorepo.yml .indexer.yml
# Edit .indexer.yml for your organization
# 5. Test locally
node scripts/index-org-repos.js
# 6. Enable GitHub Actions workflows
git add .github/workflows/
git commit -m "Add multi-repo indexing workflows"
git push
Integration with Claude Code
# Install organization indexer package globally
yarn global add @cloneglobal/indexer
# Or use specific repository indexes
export CLAUDE_CODE_CONTEXT="https://raw.githubusercontent.com/clone-global/indexer/main/indexes/combined-INDEX.json"
Supported Languages
| Language | Extensions | Features |
|------------|-------------------------------|-------------------------------------------|
| JavaScript | .js, .jsx | Functions, classes, imports/exports |
| TypeScript | .ts, .tsx | Types, interfaces, decorators |
| Python | .py | Functions, classes, decorators |
| Go | .go | Functions, structs, interfaces |
| SQL | .sql | Tables, functions, procedures |
| GraphQL | .graphql, .gql | Types, queries, mutations |
| YAML | .yaml, .yml | Configuration structures |
| Astro | .astro | Components, frontmatter, client scripts |
Output Organization
All indexer outputs are organized in a centralized .indexer-output/ directory:
project-root/
├── .indexer.yml # Configuration file (at project root)
└── .indexer-output/ # All outputs (gitignored)
├── indexes/
│ └── PROJECT_INDEX.json # Main index
├── docs/
│ └── CODE_INDEX.md # Documentation
├── visualizations/
│ ├── dependencies.dot # GraphViz
│ └── dependencies.mmd # Mermaid
└── viewers/
└── *.html # Interactive viewers
Configuration
Important: Place .indexer.yml at the root of the project being indexed, not in the indexer tool directory.
.indexer.yml
version: 2
name: my-project
description: Project indexing configuration
# Include/exclude patterns
include:
- "**/*.{js,jsx,ts,tsx,py,go,sql,graphql,gql,yaml,yml,astro}"
ignore:
- "**/node_modules/**"
- "**/dist/**"
- "**/.git/**"
# Performance tuning
performance:
parallel: true
workers: 4
cache: true
cacheDir: .indexer-cache
maxFileSize: 2MB
maxFiles: 10000
# Export settings
export:
outputDirectory: .indexer-output # Centralized output location
formats:
json:
path: indexes/PROJECT_INDEX.json
markdown:
path: docs/CODE_INDEX.md
graphviz:
path: visualizations/dependencies.dot
# AI assistant integration
integrations:
claude:
enabled: true
hooks: auto
github:
actions: true
pr_comments: true
.indexerconfig.json
{
"include": ["**/*.{js,ts,py,go,astro}"],
"ignore": ["**/node_modules/**", "**/dist/**"],
"parallel": true,
"workers": 4,
"cache": true,
"monorepo": true
}
CLI Commands
Core Commands
# Initialize project
indexer init [options]
--mode <type> # quick|full|deep (default: full)
--watch # Start watching immediately
--output <path> # Custom output path
# Scan files
indexer scan [path] [options]
--incremental # Update existing index
--parallel <n> # Number of workers
--quiet # Suppress output
# Watch for changes
indexer watch [options]
--debounce <ms> # Debounce delay (default: 300)
--ignore <glob> # Additional ignore patterns
Query Commands
# Search by name
indexer query "functionName"
indexer query "ClassName" --type class
indexer query "*.test.*" --type function
# Advanced queries
indexer query --type all --unused # Find unused exports
indexer query --type import --circular # Detect circular deps
Export Commands
# Export to different formats
indexer export json --output index.json --pretty
indexer export graphviz --output deps.dot --group-by-dir
indexer export markdown --output README.md --sort-by complexity
# Export options
--pretty # Pretty print JSON
--statistics # Include statistics
--group-by-dir # Group files by directory (GraphViz)
--sort-by <field> # Sort by: name|type|size|complexity
Utility Commands
# Project health
indexer health
indexer health --check circular # Check for circular dependencies
indexer health --check unused # Find unused exports
indexer health --check complexity # Analyze complexity
# Statistics
indexer stats # Show project statistics
indexer stats --json # JSON output
# Maintenance
indexer validate # Validate index integrity
indexer migrate # Migrate to latest format
Hook Management
# Manage AI assistant hooks
indexer hook --install # Install Claude Code hooks
indexer hook --status # Show hook status
indexer hook --uninstall # Remove hooks
Export Formats
JSON Export
{
"version": "2.0.0",
"timestamp": "2025-08-21T07:45:32.000Z",
"projectRoot": "/path/to/project",
"files": {
"src/app.js": {
"functions": [{"name": "main", "async": true}],
"imports": [{"source": "./utils", "specifiers": [...]}],
"language": "JavaScript"
}
},
"statistics": {
"totalFiles": 150,
"totalFunctions": 423,
"languages": {"JavaScript": 45, "TypeScript": 105}
}
}
GraphViz Export
Generates .dot files for dependency visualization:
# Generate dependency graph
indexer export graphviz --output deps.dot
# Create PNG image (requires Graphviz)
dot -Tpng deps.dot -o deps.png
Markdown Export
Creates comprehensive documentation with:
- Project statistics
- File index with metrics
- Dependency analysis
- Language breakdown
- Function and class listings
Monorepo Support
The indexer automatically detects monorepo structures and provides:
- Service detection - Identifies frontend, backend, skills, etc.
- Cross-service dependencies - Tracks dependencies between services
- Service-specific configuration - Different indexing depth per service
- Health monitoring - Circular dependency detection across services
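Cross-service dependency tracking can be sketched by grouping files under their service directory. This assumes index file paths are prefixed with the service name (e.g. "frontend/src/app.ts") and a graph mapping each file to the files it imports; the real service detection is more involved.

```javascript
// Collect dependency edges that cross service boundaries (sketch).
function crossServiceDeps(dependencyGraph) {
  const serviceOf = (file) => file.split('/')[0]; // assumed path convention
  const edges = new Set();
  for (const [from, imports] of Object.entries(dependencyGraph)) {
    for (const to of imports) {
      const a = serviceOf(from);
      const b = serviceOf(to);
      if (a !== b) edges.add(`${a} -> ${b}`); // keep boundary-crossing edges only
    }
  }
  return [...edges].sort();
}
```

Running cycle detection over these service-level edges is what surfaces cross-service circular dependencies.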
Detected Service Patterns
packages/ # Lerna/Yarn workspaces
├── frontend/ # React/Vue/Angular apps
├── backend/ # API services
├── shared/ # Common utilities
└── docs/ # Documentation
apps/ # Nx workspaces
├── web/
├── mobile/
└── api/
services/ # Microservices
├── auth-service/
├── user-service/
└── payment-service/
AI Assistant Integration
Claude Code Integration
The indexer integrates seamlessly with Claude Code via hooks:
Install hooks (one-time setup):
indexer hook --install
- Automatic indexing - Updates index when Claude accesses files
- Context injection - Provides relevant code context without consuming tokens
- Zero overhead - Hooks run outside Claude's context window
GitHub Actions
name: Code Index
on: [push, pull_request]
jobs:
index:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
- run: yarn global add @cloneglobal/indexer
- run: indexer scan --quiet
- run: indexer export markdown --output code-index.md
- uses: actions/upload-artifact@v4
with:
name: code-index
path: code-index.md
Performance
Benchmarks
| Project Size | Files | Time | Memory | Example |
|--------------|--------|------|--------|-----------------|
| Small | 100 | 2s | 50MB | Single service |
| Medium | 1,000 | 15s | 100MB | Standard app |
| Large | 1,400 | 13s | 85MB | Clone Global |
| Enterprise | 10,000 | 90s | 300MB | Major platform |
Optimization Tips
- Use parallel processing: --parallel 8 for faster scanning
- Enable caching: speeds up incremental updates by 10x
- Exclude large files: set maxFileSize to skip binaries
- Use ignore patterns: exclude test files and generated code
API Reference
Programmatic Usage
import { Indexer, ConfigLoader, QueryEngine } from '@cloneglobal/indexer';
// Initialize
const config = new ConfigLoader('.indexer.yml');
const indexer = new Indexer(config);
// Build index
const index = await indexer.buildIndex(process.cwd(), {
mode: 'full',
parallel: true
});
// Query index
const queryEngine = new QueryEngine(index);
const results = queryEngine.findByName('handleClick');
// Export
import { MarkdownExporter } from '@cloneglobal/indexer';
const exporter = new MarkdownExporter();
await exporter.export(index, 'output.md');
Plugin Development
import { Parser, ParserResult } from '@cloneglobal/indexer';
export class RustParser implements Parser {
language = 'Rust';
extensions = ['.rs'];
parse(content: string, filePath: string): ParserResult {
// Custom parsing logic
return {
imports: [],
exports: [],
functions: [],
classes: [],
constants: [],
dependencies: [],
errors: []
};
}
}
Clone Global Integration
Repository Structure
clone-global/
├── frontend/ # Next.js application
├── backend/ # Go microservices
├── skills/ # Python skills
├── data-ops/ # ETL pipeline
├── marketing/ # Astro.js website
├── indexer/ # This project
├── .indexerconfig.json
└── PROJECT_INDEX.json
Git Workflow
Cloning the Repository
# Clone with all submodules
git clone --recursive https://github.com/clone-global/clone-global.git
# Or if already cloned
git submodule update --init --recursive
Branch Strategy
# Create feature branch from develop
git checkout develop
git pull origin develop
git checkout -b feature/your-feature-name
# For indexer-specific work
git checkout -b indexer/your-feature-name
Commit Conventions
# Use conventional commits
git commit -m "feat(indexer): add support for Astro language"
git commit -m "fix(parser): handle edge case in TypeScript generics"
git commit -m "docs(readme): update installation instructions"
AI-Powered Analysis Quick Start
Step 1: Configure Claude SDK
# Set your API key
export ANTHROPIC_API_KEY="your-api-key"
# Optional: Configure MCP tools
export SLACK_BOT_TOKEN="xoxb-..."
export LINEAR_API_KEY="lin_api_..."
export GITHUB_TOKEN="ghp_..."
Step 2: Start the API Server
# Start with all features enabled
indexer api --port 4000
# Or use yarn script
yarn api
Step 3: Run AI Analysis
# Analyze entire codebase with AI
curl -X POST http://localhost:4000/api/ai/analyze \
-H "Content-Type: application/json" \
-d '{"useHybrid": true}'
# Analyze Python skills repository
curl -X POST http://localhost:4000/api/ai/analyze-skills \
-H "Content-Type: application/json" \
-d '{"path": "/path/to/skills"}'
# Predict bugs
curl -X POST http://localhost:4000/api/ai/predict-bugs \
-H "Content-Type: application/json"
# Generate tests
curl -X POST http://localhost:4000/api/ai/generate-tests \
-H "Content-Type: application/json" \
-d '{"function": "handleLogin", "file": "auth.ts"}'
Step 4: View Results
// Example AI analysis output
{
"bugPredictions": [
{
"file": "auth.service.ts",
"bugs": [
{
"type": "null_reference",
"line": 45,
"severity": "high",
"description": "Potential null reference when user is undefined",
"fix": "Add null check before accessing user.id"
}
],
"probability": 0.82
}
],
"securityIssues": [
{
"type": "sql_injection",
"cwe": "CWE-89",
"severity": "critical",
"file": "database.service.ts",
"mitigation": "Use parameterized queries"
}
],
"testSuggestions": [
{
"function": "handleLogin",
"coverage": 85,
"testCases": ["null user", "invalid password", "expired token"]
}
]
}
Advanced Features (v2.2)
Real-time Streaming Analysis
Get live updates as the AI analyzes your code:
// Connect to SSE endpoint for streaming
const eventSource = new EventSource('http://localhost:4000/api/stream/analyze');
eventSource.addEventListener('progress', (event) => {
const data = JSON.parse(event.data);
console.log(`Analysis ${data.progress}% complete`);
});
eventSource.addEventListener('security-update', (event) => {
const data = JSON.parse(event.data);
console.log('Security issue found:', data);
});
eventSource.addEventListener('complete', (event) => {
const summary = JSON.parse(event.data).summary;
console.log('Analysis complete:', summary);
});
Specialized Agents
Different agents for different analysis needs:
# Security-focused analysis with OWASP compliance
curl http://localhost:4000/api/agents/security
# Performance optimization opportunities
curl http://localhost:4000/api/agents/performance
# Architecture review and patterns
curl http://localhost:4000/api/agents/architecture
# Test coverage and quality
curl http://localhost:4000/api/agents/testing
MCP Tool Integration
Leverage 16+ external tools via Model Context Protocol:
| Tool | Purpose | Configuration |
|------|---------|---------------|
| Semgrep | Static analysis | Auto-configured |
| Snyk | Dependency scanning | SNYK_TOKEN |
| GitHub | Security alerts | GITHUB_TOKEN |
| Datadog | Performance metrics | DD_API_KEY |
| Linear | Issue tracking | LINEAR_API_KEY |
| Slack | Bug monitoring | SLACK_BOT_TOKEN |
Session Management
Continue conversations with the AI for deeper analysis:
// Start analysis and get session ID
const response = await fetch('http://localhost:4000/api/ai/analyze', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ useHybrid: true })
});
const { sessionId } = await response.json();
// Ask follow-up questions
const followUp = await fetch('http://localhost:4000/api/sessions/continue', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    sessionId,
    prompt: 'Can you explain the security issue in more detail?'
  })
});
What Does It Actually Do?
1. Scans Your Codebase
The indexer reads every source file and uses language-specific parsers to extract:
{
"files": {
"src/auth/service.ts": {
"functions": ["login", "logout", "validateToken"],
"classes": ["AuthService"],
"imports": ["bcrypt", "./user.model"],
"exports": ["AuthService", "login"],
"complexity": 8
}
}
}
2. Builds Dependency Graph
Maps how files connect to each other:
{
"dependencyGraph": {
"src/auth/service.ts": ["src/models/user.ts", "src/utils/crypto.ts"],
"src/api/auth.controller.ts": ["src/auth/service.ts"]
}
}
3. Provides Real-time Updates
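Real-time updates hinge on debouncing rapid file events so that a burst of saves triggers a single re-index. The sketch below is illustrative only (the `debounce` helper and `reindex` callback are hypothetical, not the indexer's actual implementation):

```javascript
// Minimal sketch of a 300ms debounce for file-change events.
// Several rapid saves collapse into a single re-index call.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

let reindexCount = 0;
const reindex = debounce((path) => {
  reindexCount++;
  console.log(`Re-indexed after change to ${path}`);
}, 300);

// Three quick saves trigger only one re-index after 300ms.
reindex('src/auth/service.ts');
reindex('src/auth/service.ts');
reindex('src/auth/service.ts');
```

Chokidar delivers the raw events; a wrapper like this is what turns them into the single "Updated: src/auth/service.ts" line shown below.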
Watch mode keeps the index current:
indexer watch
# ✓ Watching for changes...
# ✓ Updated: src/auth/service.ts (87ms)
# ✓ Index refreshed
4. Integrates with Your Workflow
For Development (Claude Code SDK):
# Claude can read PROJECT_INDEX.json to understand your entire codebase
indexer hook --install # Auto-updates before/after Claude edits files
For Bug Monitoring:
# Slack bot watches #bugs channel
indexer slack --start
# When someone posts "Login throws TypeError on null email"
# Bot creates Linear ticket with affected files and code context
For Team Insights:
# Datadog dashboard shows:
# - Indexing performance (files/second)
# - Memory usage patterns
# - Error rates by language
# - Code complexity trends
Enterprise Use Cases
1. AI-Assisted Development
// AI assistants can query the index to understand code context
const fs = require('fs');
const index = JSON.parse(fs.readFileSync('PROJECT_INDEX.json', 'utf8'));
// "Where is user authentication handled?"
// AI searches index.files for auth-related functions
2. Automated Bug Triage
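When a report like the one below arrives, the bot's lookup step amounts to a keyword search over the index plus a reverse-dependency pass. This sketch uses hypothetical data and a hypothetical `triage` helper to show the idea:

```javascript
// Sketch: given a bug keyword, find matching files and the files that import them.
// The index object here is a trimmed, made-up example of the real structure.
const index = {
  files: {
    'checkout.service.ts': { functions: ['processCheckout'] },
    'cart.validator.ts': { functions: ['validateCart'] },
    'app.ts': { functions: ['main'] }
  },
  dependencyGraph: {
    'app.ts': ['checkout.service.ts'],
    'checkout.service.ts': ['cart.validator.ts']
  }
};

function triage(index, keyword) {
  // Match on file names or function names.
  const affected = Object.keys(index.files).filter((f) =>
    f.includes(keyword) ||
    (index.files[f].functions || []).some((fn) => fn.toLowerCase().includes(keyword))
  );
  // Reverse lookup: which files depend on the affected ones?
  const dependents = Object.entries(index.dependencyGraph)
    .filter(([, deps]) => deps.some((d) => affected.includes(d)))
    .map(([file]) => file);
  return { affected, dependents };
}

console.log(triage(index, 'checkout'));
```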
# When bug reported in Slack:
"TypeError in checkout process when cart is empty"
# Bot automatically:
1. Searches index for "checkout" functions
2. Finds: checkout.service.ts, cart.validator.ts
3. Creates Linear ticket with:
- Stack trace
- Affected files
- Related functions
- Severity: High
3. Monorepo Intelligence
# Understand service dependencies
indexer export mermaid --output architecture.html
# Shows:
# frontend -> backend (23 API calls)
# backend -> skills (5 webhooks)
# skills -> shared-utils (15 imports)
4. Code Quality Monitoring
indexer health
# Statistics:
# Total Files: 1,408
# Avg Complexity: 4.2 (Good)
#
# High Complexity Files:
# payment.service.ts: 47
# checkout.flow.ts: 38
#
# Circular Dependencies: 2 detected
Complete Setup Guide
Basic Setup (Local Development)
- Clone and Build
git clone https://github.com/clone-global/indexer.git
cd indexer
yarn install
yarn build
yarn link # Makes 'indexer' command available globally
- Configure Your Project
Create .indexer.yml in your project root:
version: 2
name: my-project
include:
- "src/**/*.{js,ts,py,go}"
ignore:
- "**/node_modules/**"
- "**/test/**"
performance:
parallel: true
workers: 4
maxFileSize: 2MB
- Run Initial Index
cd /your/project
indexer init
# Creates: PROJECT_INDEX.json (main index)
# .indexer-output/ (visualizations)
Production Setup (With Monitoring)
- Setup Datadog Agent
# Install agent
brew install datadog-agent # macOS
# Configure
export DD_API_KEY="your-key"
export DD_SITE="datadoghq.com"
- Setup Slack Bot
# Create Slack app at https://api.slack.com/apps
# Add OAuth scopes: channels:history, chat:write, users:read
# Get tokens:
export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_SIGNING_SECRET="..."
export LINEAR_API_KEY="lin_api_..."
- Configure Integrations
.indexer.yml:
integrations:
datadog:
enabled: true
host: localhost
port: 8125
slack:
enabled: true
bugChannel: bugs
linear:
apiKey: ${LINEAR_API_KEY}
- Start Services
# Terminal 1: File watcher
indexer watch
# Terminal 2: Slack bot
indexer slack --start
# Metrics flow to Datadog automatically
What's in the Index?
PROJECT_INDEX.json Structure
{
"version": "2.0.0",
"timestamp": "2024-01-15T10:30:00Z",
"projectRoot": "/Users/dev/myproject",
"statistics": {
"totalFiles": 523,
"totalFunctions": 1847,
"totalClasses": 234,
"languages": {
"TypeScript": 312,
"Python": 89,
"Go": 122
}
},
"files": {
"src/auth/login.service.ts": {
"language": "TypeScript",
"size": 4521,
"complexity": 8,
"lastModified": "2024-01-15T09:15:00Z",
"functions": [
{
"name": "validateCredentials",
"startLine": 23,
"endLine": 45,
"params": ["email", "password"],
"isAsync": true
}
],
"classes": [
{
"name": "LoginService",
"methods": ["login", "logout", "refresh"],
"startLine": 10
}
],
"imports": [
{ "source": "bcrypt", "specifiers": ["hash", "compare"] },
{ "source": "./user.model", "specifiers": ["User"] }
],
"exports": ["LoginService", "validateCredentials"]
}
},
"dependencyGraph": {
"src/auth/login.service.ts": [
"src/models/user.model.ts",
"src/utils/crypto.ts"
]
},
"monorepo": {
"services": {
"frontend": { "files": 234, "dependencies": ["shared"] },
"backend": { "files": 289, "dependencies": ["shared", "database"] }
}
}
}
Query Examples
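Because PROJECT_INDEX.json is plain JSON, queries are ordinary data traversal. A minimal sketch using a trimmed, hypothetical index (the `findFunctions` helper is illustrative, not the shipped query engine):

```javascript
// Sketch: find functions by name pattern directly from the index JSON.
const index = {
  files: {
    'src/auth/service.ts': { functions: ['login', 'logout', 'validateToken'] },
    'src/api/user.controller.ts': { functions: ['getUser', 'updateUser'] }
  }
};

function findFunctions(index, pattern) {
  const re = new RegExp(pattern);
  const hits = [];
  for (const [file, info] of Object.entries(index.files)) {
    for (const fn of info.functions || []) {
      if (re.test(fn)) hits.push({ file, fn });
    }
  }
  return hits;
}

// All auth-related functions across the codebase.
console.log(findFunctions(index, 'login|logout|Token'));
```

The CLI and programmatic queries below apply the same idea with richer filters.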
CLI Queries
# Find all React hooks
indexer query "use[A-Z]" --type function
# Find all GraphQL resolvers
indexer query "resolver" --type class
# Find circular dependencies
indexer query "" --circular
# Find unused exports
indexer query "" --unused
Programmatic Queries
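The --circular check above boils down to cycle detection over the index's dependencyGraph. A self-contained sketch of one way to do it (an illustrative DFS, not the indexer's internal algorithm):

```javascript
// Sketch: detect a circular dependency in a dependencyGraph via DFS.
const graph = {
  'a.ts': ['b.ts'],
  'b.ts': ['c.ts'],
  'c.ts': ['a.ts'], // cycle: a -> b -> c -> a
  'd.ts': []
};

function findCycle(graph) {
  const visiting = new Set(); // nodes on the current DFS path
  const done = new Set();     // nodes fully explored
  const path = [];

  function dfs(node) {
    if (visiting.has(node)) return path.slice(path.indexOf(node)); // found a cycle
    if (done.has(node)) return null;
    visiting.add(node);
    path.push(node);
    for (const dep of graph[node] || []) {
      const cycle = dfs(dep);
      if (cycle) return cycle;
    }
    visiting.delete(node);
    done.add(node);
    path.pop();
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = dfs(node);
    if (cycle) return cycle;
  }
  return null;
}

console.log(findCycle(graph)); // ['a.ts', 'b.ts', 'c.ts']
```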
import fs from 'fs';
import { QueryEngine } from '@cloneglobal/indexer';
const index = JSON.parse(fs.readFileSync('PROJECT_INDEX.json', 'utf8'));
const query = new QueryEngine(index);
// Find all authentication functions
const authFunctions = query.findFunctions(func =>
func.name.includes('auth') ||
func.name.includes('login')
);
// Find files importing a specific module
const redisUsers = query.findImports('redis');
// Get dependency chain
const deps = query.getDependencyChain('src/api/user.controller.ts');
Visualizations
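Mermaid output is just text, so generating a diagram from the dependencyGraph is a simple transformation. A minimal sketch (the `toMermaid` helper is hypothetical, not the exporter's actual code):

```javascript
// Sketch: render a dependencyGraph object as Mermaid graph syntax.
const dependencyGraph = {
  'src/auth/service.ts': ['src/models/user.ts', 'src/utils/crypto.ts'],
  'src/api/auth.controller.ts': ['src/auth/service.ts']
};

function toMermaid(graph) {
  const lines = ['graph LR'];
  // Use the file's basename as the node label to keep the diagram readable.
  const label = (f) => f.split('/').pop();
  for (const [file, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      lines.push(`  ${label(file)} --> ${label(dep)}`);
    }
  }
  return lines.join('\n');
}

console.log(toMermaid(dependencyGraph));
```

Paste the resulting text into any Mermaid renderer (VS Code, Cursor, mermaid.live) to view the graph; the export commands below do the equivalent at full scale.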
Dependency Graphs
# Interactive HTML diagram
indexer export mermaid --render-html --output deps.html
# GraphViz for large codebases
indexer export graphviz --output deps.dot
dot -Tsvg deps.dot -o deps.svg
# ASCII for documentation
indexer export ascii --ascii-style tree --output structure.txt
Architecture Overview
graph LR
Frontend --> API
API --> AuthService
API --> PaymentService
AuthService --> Database
PaymentService --> Stripe
PaymentService --> Database
Troubleshooting
Index Not Updating
# Check watcher status
ps aux | grep indexer
# Manually rebuild
indexer scan --force
# Check for errors
cat .indexer-cache/errors.log
Memory Issues
# Reduce memory usage in .indexer.yml
cache:
maxSize: 104857600 # 100MB instead of 500MB
performance:
workers: 2 # Reduce parallel workers
maxFileSize: 1MB # Skip large files
Slack Bot Issues
# Test configuration
indexer slack --status
# Check tokens
echo $SLACK_BOT_TOKEN
echo $LINEAR_API_KEY
# View logs
indexer slack --start --debug
Performance Metrics
| Codebase Size | Files | Languages | Index Time | Memory | Index Size |
|---------------|-------|-----------|------------|--------|------------|
| Small Project | 100 | 2 | 2 sec | 50 MB | 500 KB |
| Medium Project | 1,000 | 4 | 15 sec | 100 MB | 5 MB |
| Large Monorepo | 5,000 | 7 | 45 sec | 200 MB | 25 MB |
| Enterprise | 10,000+ | 9+ | 90 sec | 300 MB | 50 MB |
Real Example: Clone Global
- 1,408 files across 5 services
- 7 languages (TS, Go, Python, GraphQL, SQL, YAML, Astro)
- Indexed in 13.2 seconds
- 6 MB index file
- <100ms incremental updates
Language Support
| Language | Parser Type | Quality | Features |
|----------|-------------|---------|----------|
| JavaScript/JSX | AST (Babel) | ⭐⭐⭐⭐⭐ | Full ES2024, JSX, imports/exports |
| TypeScript/TSX | AST (TS Compiler) | ⭐⭐⭐⭐⭐ | Types, interfaces, generics, decorators |
| Python | AST (tree-sitter) | ⭐⭐⭐⭐⭐ | Full Python 3 support: functions, classes, decorators, type hints, async/await, generators, metaclasses |
| Go | Regex | ⭐⭐⭐⭐ | Functions, types, interfaces, imports |
| SQL | Regex | ⭐⭐⭐ | Tables, views, procedures |
| GraphQL | AST (GraphQL.js) | ⭐⭐⭐⭐⭐ | Types, queries, mutations, subscriptions |
| YAML | AST (js-yaml) | ⭐⭐⭐⭐ | Full structure, anchors, references |
| Astro | Regex | ⭐⭐⭐ | Components, frontmatter, scripts |
Advanced Configuration
Memory Management
cache:
maxSize: 524288000 # 500MB limit
maxAge: 3600000 # 1 hour TTL
updateAgeOnGet: true # Refresh on access
performance:
workers: 4 # Parallel processing
maxFileSize: 2097152 # 2MB file limit
Troubleshooting
Common Issues
Index not updating
# Clear cache and rebuild
rm -rf .indexer-cache
indexer scan --force
Claude Code Hook Issues
-i flag not working
# Check if hooks are installed
cat ~/.claude/settings.json | grep hook
# Reinstall hooks
curl -fsSL https://raw.githubusercontent.com/clone-global/indexer/main/install.sh | bash
# Verify hook execution
ls -la ~/.claude/hooks/
@PROJECT_INDEX.json not found
# Create the index first
indexer init
# Verify it exists
ls -la PROJECT_INDEX.json
# Check path resolution
pwd # Should be in project root
Stop hook not regenerating
# Check Stop hook is installed
ls ~/.claude/hooks/stop-hook.js
# Enable auto-update in your index (merge into _meta; appending with >> would corrupt the JSON — requires jq)
jq '._meta.auto_update_enabled = true' PROJECT_INDEX.json > tmp.json && mv tmp.json PROJECT_INDEX.json
# Test manually
node ~/.claude/hooks/stop-hook.js
Clipboard export fails
# Linux: Install clipboard tool
sudo apt-get install xclip # or xsel
# macOS: Should work out of the box
# Windows: Uses PowerShell (may need admin rights)
# Fallback: Look for .clipboard_content.txt
cat .clipboard_content.txt
Missing files
# Check configuration
indexer validate
# Verify include patterns
cat .indexer.yml
Performance issues
# Reduce workers on low-memory systems
indexer scan --parallel 2
# Enable incremental mode
indexer watch --incremental
Hook issues
# Check hook status
indexer hook --status
# Reinstall hooks
indexer hook --uninstall && indexer hook --install
Development Setup
git clone https://github.com/clone-global/indexer.git
cd indexer
yarn install
yarn build
# Run tests
yarn test
yarn test:e2e
# Link for development
yarn link
Running Tests
yarn test # Unit tests
yarn test:e2e # End-to-end tests
yarn test:coverage # Coverage report
yarn lint # Code linting
Test File Conventions
When testing the indexer's file watcher or trigger functionality, please follow these conventions:
Naming Patterns
Test files created for indexer development should use these patterns:
- *.test-indexer.* - For indexer trigger testing
- *.test-watcher.* - For file watcher testing
- test-indexer-* - Prefixed test files
- test-watcher-* - Watcher test files
Important Notes
- DO NOT commit test artifact files - These patterns are gitignored
- DO NOT create test files in target repositories - Use proper test fixtures in the test directory
- DO clean up - Remove any test files created during development
Example of what NOT to do:
# Bad: Creating test files in target repos
echo "test" > ../frontend/test-indexer-trigger.js
echo "test" > ../skills/watcher-test.py
# Good: Use test fixtures
yarn test:watcher # Uses proper test harness
These test artifacts are automatically excluded via .gitignore to prevent repository pollution.
Monorepo Detection
monorepo:
enabled: true
patterns:
- packages/*
- services/*
- apps/*
serviceMap:
frontend: frontend/**
backend: backend/**
shared: packages/shared/**
Export Customization
export:
outputDirectory: .indexer-output
formats:
json:
pretty: true
includeStatistics: true
markdown:
includeComplexity: true
sortBy: complexity
mermaid:
maxNodes: 50
theme: dark
Integration Examples
GitHub Actions
name: Update Code Index
on: [push]
jobs:
index:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: yarn global add @cloneglobal/indexer
- run: indexer scan
- uses: actions/upload-artifact@v4
with:
name: code-index
path: PROJECT_INDEX.json
Pre-commit Hook
#!/bin/sh
# .git/hooks/pre-commit
indexer scan --incremental --quiet
if [ $? -ne 0 ]; then
echo "Index update failed"
exit 1
fi
Docker Integration
FROM node:18
RUN yarn global add @cloneglobal/indexer
WORKDIR /app
COPY . .
RUN indexer init
CMD ["indexer", "watch"]
API Reference
Core Classes
class Indexer {
buildIndex(root: string, options?: IndexerOptions): Promise<ProjectIndex>
updateFile(path: string): Promise<FileIndex>
clearCache(): void
}
class QueryEngine {
query(options: QueryOptions): QueryResult[]
findFunctions(predicate: (f: Function) => boolean): Function[]
findImports(module: string): FileInfo[]
getDependencyChain(file: string): string[]
}
class DatadogMetrics {
trackIndexing(metrics: IndexMetrics): void
trackError(error: Error, context: any): void
sendEvent(title: string, text: string): void
}
class SlackBugMonitor {
start(): Promise<void>
stop(): Promise<void>
processBugReport(report: BugReport): Promise<LinearTicket>
}
Architecture
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐
│ │ │ │ │ │
│ File System │────▶│ Indexer │────▶│ Output │
│ │ │ │ │ │
└─────────────────┘ └──────────────┘ └─────────────┘
│ │ │
│ │ │
File Events Parse & Analyze INDEX.json
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐
│ File Watcher │ │ Parsers │ │ Exporters │
│ (Chokidar) │ │ (9 langs) │ │ (5 formats)│
└─────────────────┘ └──────────────┘ └─────────────┘
│
▼
┌──────────────┐
│ Integrations │
├──────────────┤
│ Datadog │
│ Slack Bot │
│ Linear │
│ Claude │
└──────────────┘
Roadmap
Current Version (2.0.1) - Production Deployed
- Multi-language parsing (8 languages) - TypeScript (552), Go (592), Python (120), GraphQL (103) with React hook detection
- Organized output structure - All files stored in .indexer-output/ with structured subdirectories
- Enhanced GraphQL parser - Detects React hooks from GraphQL operations
- YAML configuration - Full .indexer.yml support with proper project root placement
- Monorepo support - Clone Global 5-service indexing verified
- Claude Code integration - Pre/post hooks active and monitored
- Real-time file watching - Background updates with 300ms debounce
- Advanced query system - Pattern matching across 1,408+ files
- Health monitoring - Live project metrics and analysis
Next Release (2.1.0)
- [ ] Plugin system for custom parsers
- [ ] Regular expression queries
- [ ] Environment variable support
- [ ] VS Code extension
Future Releases
- [ ] Language Server Protocol support
- [ ] Web dashboard
- [ ] Cloud sync service
- [ ] AI-powered insights
License
MIT © Clone Global
