magector
v1.4.2
Semantic code search for Magento 2 — index, search, MCP server
Magector
Semantic code search engine for Magento 2 and Adobe Commerce, powered by ONNX embeddings and HNSW vector search.
Magector indexes an entire Magento 2 or Adobe Commerce codebase and lets you search it with natural language. Instead of grepping for keywords, ask questions like "how are checkout totals calculated?" or "where is the product price determined?" and get ranked, relevant results in under 50ms.
Why Magector
Magento 2 and Adobe Commerce have 18,000+ source files across hundreds of modules. Finding the right code is slow:
| Approach | Finds semantic matches | Understands Magento patterns | Speed (18K files) |
|----------|:---------------------:|:---------------------------:|:-----------------:|
| grep / ripgrep | No | No | 100-500ms |
| IDE search | No | No | 200-1000ms |
| GitHub search | Partial | No | 500-2000ms |
| Magector | Yes | Yes | 10-45ms |
Magector understands that a query about "payment capture" should return Sales/Model/Order/Payment/Operations/CaptureOperation.php, not just files containing the word "capture".
Magector vs Built-in AI Search
Claude Code and Cursor both have built-in code search -- but they rely on keyword matching (grep/ripgrep) and file-tree heuristics. On a Magento 2 / Adobe Commerce codebase with 18,000+ files, that approach breaks down fast.
| Capability | Claude Code / Cursor (built-in) | Magector |
|---|---|---|
| Search method | Keyword grep / ripgrep | Semantic vector search (ONNX embeddings) |
| Understands intent | No -- literal string matching only | Yes -- "payment capture" finds CaptureOperation.php |
| Magento pattern awareness | None -- treats all PHP the same | Detects controllers, plugins, observers, blocks, resolvers, cron, and 20+ patterns |
| Query speed (36K vectors) | 200-1000ms per grep pass; multiple rounds needed | 10-45ms single pass |
| Context window cost | Reads many wrong files, burns tokens | Returns structured JSON with ranked results, methods, and snippets |
| Works offline | Yes | Yes -- local ONNX model, no API calls |
| Setup | Built-in | npx magector init (one command) |
What this means in practice
Without Magector, asking Claude Code or Cursor "how are checkout totals calculated?" triggers multiple grep searches, reads dozens of files, and still may miss the right ones. With Magector, the AI calls magento_search("checkout totals calculation") and gets the exact files ranked by relevance in one step -- saving tokens and time.
Magector doesn't replace your AI tool -- it gives it a better search engine.
Features
- Semantic search -- find code by meaning, not exact keywords
- 99.2/100 accuracy score -- validated with 101 E2E test queries across 16 tool categories, plus 557 Rust-level test cases
- Hybrid search -- combines semantic vector similarity with keyword re-ranking for best-of-both-worlds results
- Structured JSON output -- results include file path, class name, methods list, role badges, and content snippets for minimal round-trips
- Persistent serve mode -- keeps ONNX model and HNSW index resident in memory, eliminating cold-start latency
- Incremental re-indexing -- background file watcher detects changes and updates the index without restart (tombstone + compact strategy)
- ONNX embeddings -- native 384-dim transformer embeddings via ONNX Runtime
- 36K+ vectors -- indexes the complete Magento 2 / Adobe Commerce codebase including framework internals
- Magento-aware -- understands controllers, plugins, observers, blocks, resolvers, repositories, and 20+ Magento patterns
- Adobe Commerce compatible -- works with both Magento Open Source and Adobe Commerce (B2B, Staging, and all Commerce-specific modules)
- AST-powered -- tree-sitter parsing for PHP and JavaScript extracts classes, methods, namespaces, and inheritance
- Cross-tool discovery -- tool descriptions include keywords and "See also" references so AI clients find the right tool on the first try
- Diff analysis -- risk scoring and change classification for git commits and staged changes
- Complexity analysis -- cyclomatic complexity, function count, and hotspot detection across modules
- Fast -- 10-45ms queries via persistent serve process, batched ONNX embedding with adaptive thread scaling
- MCP server -- 20 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
- Clean architecture -- Rust core handles all indexing/search, Node.js MCP server delegates to it
Architecture
flowchart TD
subgraph rust ["Rust Core"]
A["AST Parser · PHP + JS"]
B["Pattern Detection · 20+"]
C["ONNX Embedder · 384d"]
D["HNSW + Reranking"]
A --> B --> C --> D
end
subgraph node ["Node.js Layer"]
E["MCP Server · 20 tools"]
F["Persistent Serve"]
G["CLI · init/index/search"]
E --> F
G --> F
end
node -->|stdin/stdout JSON| rust
style rust fill:#f4a460,color:#000
style node fill:#68b684,color:#000
Indexing Pipeline
flowchart TD
A[Source File] --> B[AST Parser]
B --> C[Pattern Detection]
C --> D[Text Enrichment]
D --> E[ONNX Embedding]
E --> F[(HNSW Index)]
A --> G[Metadata]
G --> F
Search Pipeline
flowchart TD
Q[Query] --> E1[Synonym Enrichment]
E1 --> E2[ONNX Embedding]
E2 --> H[HNSW Search]
H --> R[Hybrid Reranking]
R --> J[Structured JSON]
Components
| Component | Technology | Purpose |
|-----------|-----------|---------|
| Embeddings | ort (ONNX Runtime) | all-MiniLM-L6-v2, 384 dimensions |
| Vector search | hnsw_rs + hybrid reranking | Approximate nearest neighbor + keyword boosting |
| PHP parsing | tree-sitter-php | Class, method, namespace extraction |
| JS parsing | tree-sitter-javascript | AMD/ES6 module detection |
| Pattern detection | Custom Rust | 20+ Magento-specific patterns |
| CLI | clap | Command-line interface (index, search, serve, validate) |
| MCP server | @modelcontextprotocol/sdk | AI tool integration with structured JSON output |
Quick Start
Prerequisites
1. Initialize in Your Project
cd /path/to/your/magento2 # or Adobe Commerce project
npx magector init
This single command handles the entire setup:
flowchart TD
A["npx magector init"] --> B[Verify Project]
B --> C[Download Model]
C --> D[Index Codebase]
D --> E[Detect IDE]
E --> F[Write Config]
F --> G[Update .gitignore]
2. Search
npx magector search "product price calculation"
npx magector search "checkout totals collector" -l 20
3. Re-index After Changes
npx magector index
4. IDE Setup Only (Skip Indexing)
npx magector setup
CLI Reference
Rust Core CLI
magector-core <COMMAND>
Commands:
index Index a Magento codebase
search Search the index semantically
serve Start persistent server mode (stdin/stdout JSON protocol)
validate Run validation suite (downloads Magento if needed)
download Download Magento 2 Open Source
stats Show index statistics
embed Generate embedding for text
index
magector-core index [OPTIONS]
Options:
-m, --magento-root <PATH> Path to Magento root directory
-d, --database <PATH> Index database path [default: ./magector.db]
-c, --model-cache <PATH> Model cache directory [default: ./models]
-v, --verbose Enable verbose output
search
magector-core search <QUERY> [OPTIONS]
Options:
-d, --database <PATH> Index database path [default: ./magector.db]
-l, --limit <N> Number of results [default: 10]
-f, --format <FORMAT> Output format: text, json [default: text]
serve
magector-core serve [OPTIONS]
Options:
-d, --database <PATH> Index database path [default: ./magector.db]
-c, --model-cache <PATH> Model cache directory [default: ./models]
-m, --magento-root <PATH> Magento root (enables file watcher)
--watch-interval <SECS> File watcher poll interval [default: 60]
Starts a persistent process that reads JSON queries from stdin and writes JSON responses to stdout. Keeps the ONNX model and HNSW index resident in memory for fast repeated queries.
When --magento-root is provided, a background file watcher polls for changed files every --watch-interval seconds and incrementally re-indexes them without restart. Modified and deleted files are soft-deleted (tombstoned) in the HNSW index; new vectors are appended. When tombstoned entries exceed 20% of total vectors, the index is automatically compacted by rebuilding the HNSW graph.
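The polling step can be pictured as a diff between two snapshots of file modification times. The following is a minimal JavaScript sketch under that assumption; the function name diffSnapshots and the snapshot shape (a Map from path to mtime) are illustrative and are not taken from the Rust watcher.

```javascript
// Hypothetical sketch: compare two Map(path -> mtimeMs) snapshots taken one
// poll interval apart and report what a watcher would need to act on.
function diffSnapshots(previous, current) {
  const added = [];
  const modified = [];
  const deleted = [];

  for (const [path, mtime] of current) {
    if (!previous.has(path)) {
      added.push(path);
    } else if (previous.get(path) !== mtime) {
      modified.push(path);
    }
  }
  for (const path of previous.keys()) {
    if (!current.has(path)) deleted.push(path);
  }

  return {
    // Old vectors for these paths would be tombstoned.
    tombstone: [...modified, ...deleted],
    // These paths would be parsed, embedded, and appended.
    reindex: [...added, ...modified],
  };
}

const before = new Map([
  ["app/code/Acme/Foo/Model/Bar.php", 1000],
  ["app/code/Acme/Foo/etc/di.xml", 1000],
]);
const after = new Map([
  ["app/code/Acme/Foo/Model/Bar.php", 2000], // edited since last scan
  ["app/code/Acme/Foo/Plugin/BazPlugin.php", 2000], // newly created
]);
const changes = diffSnapshots(before, after);
console.log(changes);
```

Modified files appear in both lists because their old vectors are retired and fresh ones are appended.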
Protocol (one JSON object per line):
// Request:
{"command":"search","query":"product price","limit":10}
// Response:
{"ok":true,"data":[{"id":123,"score":0.85,"metadata":{...}}]}
// Stats request:
{"command":"stats"}
// Watcher status:
{"command":"watcher_status"}
// Response:
{"ok":true,"data":{"running":true,"tracked_files":18234,"last_scan_changes":3,"interval_secs":60}}
Node.js CLI
npx magector init [path] # Full setup: index + IDE config
npx magector index [path] # Index (or re-index) Magento codebase
npx magector search <query> # Search indexed code
npx magector stats # Show indexer statistics
npx magector setup [path] # IDE setup only (no indexing)
npx magector mcp # Start MCP server
npx magector help # Show help
Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| MAGENTO_ROOT | Path to Magento installation | Current directory |
| MAGECTOR_DB | Path to index database | ./magector.db |
| MAGECTOR_BIN | Path to magector-core binary | Auto-detected |
| MAGECTOR_MODELS | Path to ONNX model directory | ~/.magector/models/ |
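As an illustration of how these variables and their defaults fit together, here is a small JavaScript sketch. The function resolveConfig and the returned property names are hypothetical; only the variable names and default values come from the table above.

```javascript
// Sketch: resolve settings from the documented environment variables,
// falling back to the defaults listed in the table.
const os = require("node:os");
const path = require("node:path");

function resolveConfig(env = process.env, cwd = process.cwd()) {
  return {
    magentoRoot: env.MAGENTO_ROOT || cwd,
    database: env.MAGECTOR_DB || "./magector.db",
    // null stands in for "auto-detected"; the actual lookup is not shown here.
    binary: env.MAGECTOR_BIN || null,
    models: env.MAGECTOR_MODELS || path.join(os.homedir(), ".magector", "models"),
  };
}

// Only MAGECTOR_DB is set, so everything else falls back to its default.
const config = resolveConfig({ MAGECTOR_DB: "/tmp/index.db" }, "/var/www/magento2");
console.log(config);
```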
MCP Server Tools
The MCP server exposes 20 tools for AI-assisted Magento 2 and Adobe Commerce development. All search tools return structured JSON with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.
Output Format
All search tools return structured JSON:
{
"results": [
{
"rank": 1,
"score": 0.892,
"path": "vendor/magento/module-catalog/Model/ProductRepository.php",
"module": "Magento_Catalog",
"className": "ProductRepository",
"namespace": "Magento\\Catalog\\Model",
"methods": ["save", "getById", "getList", "delete", "deleteById"],
"magentoType": "repository",
"fileType": "php",
"badges": ["repository"],
"snippet": "class ProductRepository implements ProductRepositoryInterface..."
}
],
"count": 1
}
Key fields:
- methods -- list of method names in the class (avoids needing to read the file)
- badges -- role indicators: plugin, controller, observer, repository, graphql-resolver, model, block
- snippet -- first 300 characters of indexed content for quick assessment
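Because methods and badges are part of every result, a client can often decide which files matter without opening them. The sketch below assumes a response in the documented shape; the helper pickResults is hypothetical and not part of Magector.

```javascript
// Sample response in the documented shape (values taken from the example output).
const response = {
  results: [
    {
      rank: 1,
      score: 0.892,
      path: "vendor/magento/module-catalog/Model/ProductRepository.php",
      className: "ProductRepository",
      methods: ["save", "getById", "getList", "delete", "deleteById"],
      badges: ["repository"],
    },
  ],
  count: 1,
};

// Hypothetical helper: keep results that carry a given badge and expose a
// given method, then summarize them as "ClassName (path)".
function pickResults(res, { badge, method } = {}) {
  return res.results
    .filter((r) => !badge || r.badges.includes(badge))
    .filter((r) => !method || r.methods.includes(method))
    .map((r) => `${r.className} (${r.path})`);
}

const hits = pickResults(response, { badge: "repository", method: "getById" });
console.log(hits);
```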
Search Tools
| Tool | Description |
|------|-------------|
| magento_search | Semantic search -- find any PHP class, method, XML config, template, or GraphQL schema by natural language |
| magento_find_class | Find PHP class, interface, abstract class, or trait by name |
| magento_find_method | Find method implementations across the codebase |
Magento-Specific Finders
| Tool | Description |
|------|-------------|
| magento_find_config | Find XML configuration (di.xml, events.xml, routes.xml, system.xml, webapi.xml, module.xml, layout) |
| magento_find_template | Find PHTML template files for frontend or admin rendering |
| magento_find_plugin | Find interceptor plugins (before/after/around methods) and di.xml declarations |
| magento_find_observer | Find event observers and events.xml declarations |
| magento_find_preference | Find DI preference overrides -- which class implements an interface |
| magento_find_controller | Find MVC controllers by frontend or admin route path |
| magento_find_block | Find Block classes for view rendering |
| magento_find_graphql | Find GraphQL schema definitions, resolvers, types, queries, and mutations |
| magento_find_api | Find REST/SOAP API endpoints in webapi.xml |
| magento_find_cron | Find cron job definitions in crontab.xml |
| magento_find_db_schema | Find database table definitions in db_schema.xml (declarative schema) |
Flow Tracing
| Tool | Description |
|------|-------------|
| magento_trace_flow | Trace execution flow from an entry point (route, API, GraphQL, event, cron) -- maps controller → plugins → observers → templates in one call |
Auto-detects entry type from pattern (/V1/... → API, snake_case → event, camelCase → GraphQL, path/segments → route), or override with entryType. Use depth: "shallow" (entry + config + plugins) or depth: "deep" (adds observers, layout, templates, DI preferences).
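The detection heuristics can be approximated with a few regular expressions. This is a sketch only: the function detectEntryType, the rule order, the generalization from /V1/ to any /V<number>/ prefix, and the "unknown" fallback are assumptions rather than the tool's actual logic.

```javascript
// Hypothetical classifier following the documented heuristics.
function detectEntryType(entryPoint) {
  if (/^\/V\d+\//.test(entryPoint)) return "api"; // /V1/products
  if (entryPoint.includes("/")) return "route"; // checkout/cart/add
  if (/^[a-z0-9]+(_[a-z0-9]+)+$/.test(entryPoint)) return "event"; // sales_order_place_after
  if (/^[a-z]+([A-Z][a-z0-9]*)+$/.test(entryPoint)) return "graphql"; // placeOrder
  return "unknown"; // ambiguous input: pass entryType explicitly
}

for (const entry of ["/V1/products", "checkout/cart/add", "sales_order_place_after", "placeOrder"]) {
  console.log(entry, "->", detectEntryType(entry));
}
```

A single lowercase word such as "products" matches none of the rules, which is the kind of case where an explicit entryType is useful.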
Analysis Tools
| Tool | Description |
|------|-------------|
| magento_analyze_diff | Analyze git diffs for risk scoring and change classification |
| magento_complexity | Analyze cyclomatic complexity, function count, and line count |
Utility Tools
| Tool | Description |
|------|-------------|
| magento_module_structure | Show complete module structure -- controllers, models, blocks, plugins, observers, configs |
| magento_index | Trigger re-indexing of the codebase |
| magento_stats | View index statistics |
Tool Cross-References
Each tool description includes "See also" hints to help AI clients chain tools effectively:
graph TD
cls["find_class"] --> plg["find_plugin"]
cls --> prf["find_preference"]
cls --> mtd["find_method"]
cfg["find_config"] --> obs["find_observer"]
cfg --> prf
cfg --> api["find_api"]
plg --> cls
plg --> mtd
tpl["find_template"] --> blk["find_block"]
blk --> tpl
blk --> cfg
dbs["find_db_schema"] --> cls
gql["find_graphql"] --> cls
gql --> mtd
ctl["find_controller"] --> cfg
trc["trace_flow"] -.-> ctl
trc -.-> plg
trc -.-> obs
trc -.-> tpl
trc -.-> api
trc -.-> gql
style cls fill:#4a90d9,color:#fff
style mtd fill:#4a90d9,color:#fff
style cfg fill:#e8a838,color:#000
style plg fill:#d94a4a,color:#fff
style obs fill:#d94a4a,color:#fff
style prf fill:#e8a838,color:#000
style api fill:#e8a838,color:#000
style tpl fill:#68b684,color:#000
style blk fill:#68b684,color:#000
style dbs fill:#9b59b6,color:#fff
style gql fill:#9b59b6,color:#fff
style ctl fill:#4a90d9,color:#fff
style trc fill:#2ecc71,color:#000
Query Examples
magento_search("how are checkout totals calculated")
magento_search("product price with tier pricing and catalog rules")
magento_find_class("ProductRepositoryInterface")
magento_find_method("getById")
magento_find_config("di.xml plugin for ProductRepository")
magento_find_plugin({ targetClass: "Topmenu" })
magento_find_observer("sales_order_place_after")
magento_find_preference("StoreManagerInterface")
magento_find_api("/V1/orders")
magento_find_controller("catalog/product/view")
magento_find_graphql("placeOrder")
magento_find_db_schema("sales_order")
magento_find_cron("indexer")
magento_find_block("cart totals")
magento_find_template("minicart")
magento_analyze_diff({ commitHash: "abc123" })
magento_complexity({ module: "Magento_Catalog", threshold: 10 })
magento_trace_flow({ entryPoint: "checkout/cart/add", depth: "deep" })
magento_trace_flow({ entryPoint: "/V1/products" })
magento_trace_flow({ entryPoint: "placeOrder", entryType: "graphql" })
magento_trace_flow({ entryPoint: "sales_order_place_after" })
Supported Platforms
Pre-built binaries are provided for the following platforms:
| Platform | Architecture | Package |
|----------|-------------|---------|
| macOS | ARM64 (Apple Silicon) | @magector/cli-darwin-arm64 |
| Linux | x86_64 | @magector/cli-linux-x64 |
| Linux | ARM64 | @magector/cli-linux-arm64 |
| Windows | x86_64 | @magector/cli-win32-x64 |
Note: macOS Intel (x86_64) is not supported as a pre-built binary. Intel Mac users can build from source.
Validation
Magector is validated at two levels:
- E2E MCP accuracy tests -- 101 queries across 16 tool categories via stdio JSON-RPC
- Rust-level validation -- 557 test cases across 50+ categories against Magento 2.4.7
E2E Accuracy (MCP Tools)
---
config:
themeVariables:
pie1: "#4caf50"
pie2: "#f44336"
---
pie title Test Pass Rate (101 queries)
"Passed (101)" : 101
"Failed (0)" : 0
| Metric | Value |
|--------|-------|
| Grade | A+ (99.2/100) |
| Pass rate | 101/101 (100%) |
| Precision | 98.7% |
| MRR | 99.3% |
| NDCG@10 | 98.7% |
| Index size | 35,795 vectors |
| Query time | 10-45ms |
Per-Tool Performance
| Tool | Pass | Precision | MRR | NDCG |
|------|------|-----------|-----|------|
| find_class | 100% | 100% | 100% | 100% |
| find_method | 100% | 98% | 92% | 97% |
| find_controller | 100% | 100% | 100% | 100% |
| find_observer | 100% | 100% | 100% | 100% |
| find_plugin | 100% | 100% | 100% | 100% |
| find_preference | 100% | 100% | 100% | 100% |
| find_api | 100% | 100% | 100% | 100% |
| find_cron | 100% | 100% | 100% | 100% |
| find_db_schema | 100% | 100% | 100% | 100% |
| find_graphql | 100% | 100% | 100% | 100% |
| find_block | 100% | 100% | 100% | 100% |
| find_config | 100% | 100% | 100% | 100% |
| find_template | 100% | 100% | 100% | 100% |
| search | 100% | 100% | 100% | 100% |
| module_structure | 100% | 100% | 100% | 100% |
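For readers unfamiliar with the MRR and NDCG columns, the sketch below shows the textbook definitions of both metrics in JavaScript. It assumes each query is represented as an array of relevance labels in ranked order (1 for relevant, 0 otherwise); the project's own accuracy calculator may compute these differently.

```javascript
// Reciprocal rank for one query: 1 / position of the first relevant result, or 0.
function reciprocalRank(relevance) {
  const idx = relevance.findIndex((rel) => rel > 0);
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// Mean reciprocal rank across all queries.
function meanReciprocalRank(queries) {
  if (queries.length === 0) return 0;
  const total = queries.reduce((sum, rel) => sum + reciprocalRank(rel), 0);
  return total / queries.length;
}

// Discounted cumulative gain over the top k results.
function dcg(relevance, k) {
  return relevance
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// NDCG@k: DCG divided by the DCG of the best possible ordering of the
// same returned results (a simplification that ignores unreturned items).
function ndcg(relevance, k = 10) {
  const ideal = dcg([...relevance].sort((a, b) => b - a), k);
  return ideal === 0 ? 0 : dcg(relevance, k) / ideal;
}

// Two toy queries: the first hits at rank 1, the second at rank 2.
const mrr = meanReciprocalRank([[1, 0, 0], [0, 1, 0]]);
console.log(mrr, ndcg([1, 0, 0]), ndcg([0, 1]));
```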
Integration Tests
62 integration tests covering MCP protocol compliance, tool schemas, tool calls, analysis tools, and stdout JSON integrity.
Running Tests
# E2E accuracy tests (101 queries, requires indexed codebase)
npm run test:accuracy
npm run test:accuracy:verbose
# Integration tests (62 tests)
npm test
# Rust validation (557 test cases)
cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index
Project Structure
magector/
├── src/ # Node.js source
│ ├── cli.js # CLI entry point (npx magector <command>)
│ ├── mcp-server.js # MCP server (20 tools, structured JSON output)
│ ├── binary.js # Platform binary resolver
│ ├── model.js # ONNX model resolver/downloader
│ ├── init.js # Full init command (index + IDE config)
│ ├── magento-patterns.js # Magento pattern detection (JS)
│ ├── templates/ # IDE rules templates
│ │ ├── cursorrules.js # .cursorrules content
│ │ └── claude-md.js # CLAUDE.md content
│ └── validation/ # JS validation suite
│ ├── validator.js
│ ├── benchmark.js
│ ├── test-queries.js
│ ├── test-data-generator.js
│ └── accuracy-calculator.js
├── tests/ # Automated tests
│ ├── mcp-server.test.js # Integration tests (62 tests)
│ ├── mcp-accuracy.test.js # E2E accuracy tests (101 queries)
│ └── results/ # Test result artifacts
│ └── accuracy-report.json
├── platforms/ # Platform-specific binary packages
│ ├── darwin-arm64/ # macOS ARM (Apple Silicon)
│ ├── linux-x64/ # Linux x64
│ ├── linux-arm64/ # Linux ARM64
│ └── win32-x64/ # Windows x64
├── rust-core/ # Rust high-performance core
│ ├── Cargo.toml
│ ├── src/
│ │ ├── main.rs # Rust CLI (index, search, serve, validate)
│ │ ├── lib.rs # Library exports
│ │ ├── indexer.rs # Core indexing with progress output
│ │ ├── embedder.rs # ONNX embedding (MiniLM-L6-v2)
│ │ ├── vectordb.rs # HNSW vector database + hybrid search + tombstones
│ │ ├── watcher.rs # File watcher for incremental re-indexing
│ │ ├── ast.rs # Tree-sitter AST (PHP + JS)
│ │ ├── magento.rs # Magento pattern detection (Rust)
│ │ └── validation.rs # 557 test cases, validation framework
│ └── models/ # ONNX model files (auto-downloaded)
│ ├── all-MiniLM-L6-v2.onnx
│ └── tokenizer.json
├── .github/
│ └── workflows/
│ └── release.yml # Cross-compile + publish CI
├── scripts/
│ └── setup.sh # Claude Code MCP setup script
├── config/
│ └── mcp-config.json # MCP server configuration template
├── package.json
├── .gitignore
├── LICENSE
└── README.md
How It Works
1. Indexing
Magector scans every .php, .js, .xml, .phtml, and .graphqls file in a Magento 2 or Adobe Commerce codebase:
- AST parsing -- Tree-sitter extracts class names, namespaces, methods, inheritance, and interface implementations from PHP and JavaScript files
- Pattern detection -- Identifies Magento-specific patterns: controllers, models, repositories, plugins, observers, blocks, GraphQL resolvers, admin grids, cron jobs, and more
- Search text enrichment -- Combines AST metadata with Magento pattern keywords to create semantically rich text representations
- Embedding -- ONNX Runtime generates 384-dimensional vectors using all-MiniLM-L6-v2
- Indexing -- Vectors are stored in an HNSW index for sub-millisecond approximate nearest neighbor search
2. Searching
- Query text is enriched with pattern synonyms (e.g., "controller" adds "action execute http request dispatch")
- The enriched query is embedded into the same 384-dimensional vector space
- HNSW finds the nearest neighbors by cosine similarity
- Hybrid reranking boosts results with keyword matches in path and search text
- Results are returned as structured JSON with file path, class name, methods, role badges, and content snippet
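The search steps above can be sketched end to end in JavaScript. This is an illustration under stated assumptions, not the Rust implementation: the synonym table contains only the "controller" entry quoted above, the 0.05 keyword boost is an arbitrary value, and the candidate scores stand in for real HNSW output.

```javascript
// Hypothetical synonym table; only this entry is quoted in the text above.
const SYNONYMS = new Map([["controller", "action execute http request dispatch"]]);

// Step 1: append synonyms for any known pattern word in the query.
function enrichQuery(query) {
  const extras = query
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => SYNONYMS.has(word))
    .map((word) => SYNONYMS.get(word));
  return [query, ...extras].join(" ");
}

// Step 3: cosine similarity between two vectors of equal length.
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Step 4: add a small boost per query term found in the path or search text.
function rerank(candidates, query, boost = 0.05) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return candidates
    .map((c) => {
      const haystack = `${c.path} ${c.searchText}`.toLowerCase();
      const matches = terms.filter((t) => haystack.includes(t)).length;
      return { ...c, score: c.score + boost * matches };
    })
    .sort((x, y) => y.score - x.score);
}

// Made-up semantic scores: the keyword boost lifts the second candidate.
const ranked = rerank(
  [
    { path: "Model/Order/Invoice.php", searchText: "invoice model", score: 0.8 },
    {
      path: "Model/Order/Payment/Operations/CaptureOperation.php",
      searchText: "payment capture operation",
      score: 0.78,
    },
  ],
  "payment capture"
);
console.log(enrichQuery("admin controller"));
console.log(ranked.map((r) => r.path));
```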
3. Persistent Serve Mode
The MCP server spawns a persistent Rust process (magector-core serve) that keeps the ONNX model and HNSW index loaded in memory. Queries are sent as JSON over stdin and responses are returned via stdout, which eliminates the ~2.6s cold-start overhead of loading the model per query. If the serve process is unavailable, the MCP server falls back to single-shot execFileSync calls.
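A client of this line-delimited protocol needs two small pieces: an encoder that writes one JSON object per line, and a parser that reassembles complete lines from arbitrary stdout chunks. The JavaScript sketch below shows both; the helper names encodeRequest and createLineParser are hypothetical.

```javascript
// Encode one request as a single line, as the serve protocol expects.
function encodeRequest(command, params = {}) {
  return JSON.stringify({ command, ...params }) + "\n";
}

// Returns a function that accepts stdout chunks and yields every response
// that has been completed so far, buffering any trailing partial line.
function createLineParser() {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep the unfinished tail for the next chunk
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  };
}

const line = encodeRequest("search", { query: "product price", limit: 10 });
const feed = createLineParser();
const first = feed('{"ok":true,"da'); // partial chunk: nothing complete yet
const second = feed('ta":[{"id":123,"score":0.85}]}\n'); // completes the line
console.log(line.trim(), first, second);
```

In a real client the encoded line would be written to the child process's stdin and feed would be called from its stdout "data" events (for example, a process started with child_process.spawn).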
flowchart TD
subgraph startup ["Startup (once)"]
S1[Load Model] --> S2[Load Index]
S2 --> S3[Ready Signal]
end
subgraph query ["Per Query (10-45ms)"]
Q1[stdin JSON] --> Q2[Embed]
Q2 --> Q3[HNSW Search]
Q3 --> Q4[Rerank]
Q4 --> Q5[stdout JSON]
end
startup --> query
subgraph fallback ["Fallback"]
F1[execFileSync ~2.6s]
end
style startup fill:#e8f4e8,color:#000
style query fill:#e8e8f4,color:#000
style fallback fill:#f4e8e8,color:#000
4. File Watcher (Incremental Re-indexing)
When the serve process is started with --magento-root, a background thread polls the filesystem for changes every 60 seconds (configurable via --watch-interval). Changed files are incrementally re-indexed without restarting the server.
Since hnsw_rs does not support point deletion, Magector uses a tombstone strategy: old vectors for modified/deleted files are marked as tombstoned and filtered out of search results. New vectors are appended. When tombstoned entries exceed 20% of total vectors, the HNSW graph is automatically rebuilt (compacted) to reclaim memory and restore search performance.
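The tombstone strategy can be modeled with an append-only list plus a set of retired positions. The class below is a simplified JavaScript model written for illustration; the names TombstoneIndex, mark, and maybeCompact are hypothetical, and "compaction" here just drops retired entries instead of rebuilding an HNSW graph.

```javascript
// Hypothetical in-memory model of an append-only index with soft deletes.
class TombstoneIndex {
  constructor(compactThreshold = 0.2) {
    this.entries = []; // { path, vector }, append-only between compactions
    this.tombstoned = new Set(); // positions in this.entries
    this.compactThreshold = compactThreshold;
    this.compactions = 0;
  }

  // Soft-delete every entry that belongs to a path.
  mark(path) {
    this.entries.forEach((entry, i) => {
      if (entry.path === path) this.tombstoned.add(i);
    });
  }

  // Deleted file: tombstone only.
  delete(path) {
    this.mark(path);
    this.maybeCompact();
  }

  // Modified or new file: tombstone old vectors, append the new one.
  upsert(path, vector) {
    this.mark(path);
    this.entries.push({ path, vector });
    this.maybeCompact();
  }

  // Entries that a search would be allowed to return.
  live() {
    return this.entries.filter((_, i) => !this.tombstoned.has(i));
  }

  // Rebuild once tombstones exceed the threshold share of all vectors.
  maybeCompact() {
    if (this.entries.length === 0) return;
    if (this.tombstoned.size / this.entries.length > this.compactThreshold) {
      this.entries = this.live();
      this.tombstoned.clear();
      this.compactions += 1;
    }
  }
}

const index = new TombstoneIndex();
for (const name of ["a", "b", "c", "d", "e"]) index.upsert(`${name}.php`, [1]);
index.upsert("a.php", [2]); // 1 of 6 tombstoned (about 17%): no rebuild
const beforeCompact = { total: index.entries.length, live: index.live().length };
index.upsert("b.php", [2]); // 2 of 7 tombstoned (about 29%): rebuild
console.log(beforeCompact, index.entries.length, index.compactions);
```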
flowchart TD
W1[Sleep 60s] --> W2[Scan Filesystem]
W2 --> W3{Changes?}
W3 -->|No| W1
W3 -->|Yes| W4[Tombstone Old Vectors]
W4 --> W5[Parse + Embed New Files]
W5 --> W6[Append to HNSW]
W6 --> W7{Tombstone > 20%?}
W7 -->|Yes| W8[Compact / Rebuild HNSW]
W7 -->|No| W9[Save to Disk]
W8 --> W9
W9 --> W1
style W4 fill:#f4e8e8,color:#000
style W5 fill:#e8f4e8,color:#000
style W8 fill:#e8e8f4,color:#000
5. MCP Integration
The MCP server delegates all search/index operations to the Rust core binary. Analysis tools (diff, complexity) use ruvector JS modules directly.
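From the AI client's side, each tool invocation is a JSON-RPC 2.0 "tools/call" request, as defined by the Model Context Protocol. The sketch below builds such a request and unpacks a reply. The argument name "query" and the assumption that the tool returns its JSON as a text content item are illustrative guesses, not confirmed details of Magector's schema.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request in the shape MCP defines.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// MCP tool results carry a "content" array of typed items;
// text items look like { type: "text", text: "..." }.
function extractText(response) {
  if (response.error) throw new Error(response.error.message);
  return response.result.content
    .filter((item) => item.type === "text")
    .map((item) => item.text);
}

// "query" is an assumed argument name for magento_search.
const request = buildToolCall(1, "magento_search", {
  query: "checkout totals calculation",
});

// Hand-written reply used only to exercise extractText.
const reply = {
  jsonrpc: "2.0",
  id: 1,
  result: { content: [{ type: "text", text: '{"results":[],"count":0}' }] },
};
const payload = JSON.parse(extractText(reply)[0]);
console.log(JSON.stringify(request), payload);
```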
sequenceDiagram
participant Dev
participant AI
participant MCP
participant Rust
participant HNSW
Dev->>AI: "checkout totals?"
AI->>MCP: magento_search(...)
MCP->>Rust: JSON query
Rust->>HNSW: embed + search
HNSW-->>Rust: candidates
Rust-->>MCP: JSON results
MCP-->>AI: paths, methods, badges
AI-->>Dev: TotalsCollector.php
Magento Patterns Detected
mindmap
root((Patterns))
PHP
Controller
Model
Repository
Block
Helper
ViewModel
Interception
Plugin
Observer
Preference
XML
di.xml
events.xml
webapi.xml
routes.xml
crontab.xml
db_schema.xml
Frontend
Template
JavaScript
GraphQL
Magector understands these Magento 2 architectural patterns:
| Pattern | Detection Method | Example |
|---------|-----------------|---------|
| Controller | Path + execute() method | Controller/Adminhtml/Order/View.php |
| Model | Path + extends AbstractModel | Model/Product.php |
| Repository | Path + implements RepositoryInterface | Model/ProductRepository.php |
| Block | Path + extends AbstractBlock | Block/Product/View.php |
| Plugin | Path + before/after/around methods | Plugin/Product/SavePlugin.php |
| Observer | Path + implements ObserverInterface | Observer/ProductSaveObserver.php |
| GraphQL Resolver | Path + implements ResolverInterface | Model/Resolver/Products.php |
| Helper | Path under Helper/ | Helper/Data.php |
| Cron | Path under Cron/ | Cron/CleanExpiredQuotes.php |
| Console Command | Path + extends Command | Console/Command/IndexerReindex.php |
| Data Provider | Path + DataProvider | Ui/DataProvider/Product/Listing.php |
| ViewModel | Path + implements ArgumentInterface | ViewModel/Product/Breadcrumbs.php |
| Setup Patch | Path + Patch/Data or Patch/Schema | Setup/Patch/Data/AddAttribute.php |
| di.xml | Path matching | etc/di.xml, etc/frontend/di.xml |
| events.xml | Path matching | etc/events.xml |
| webapi.xml | Path matching | etc/webapi.xml |
| layout XML | Path under layout/ | view/frontend/layout/catalog_product_view.xml |
| Template | .phtml extension | view/frontend/templates/product/view.phtml |
| JavaScript | .js with AMD/ES6 detection | view/frontend/web/js/view/minicart.js |
| GraphQL Schema | .graphqls extension | etc/schema.graphqls |
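Several rows in this table depend on the file path alone, which makes them easy to sketch. The JavaScript below covers only those path-based rows; rules that also need AST information (such as "extends AbstractModel") are left out. The rule list, the type labels, and the function detectByPath are illustrative and do not mirror rust-core/src/magento.rs.

```javascript
// Path-only rules derived from the table. More specific rules come first.
const PATH_RULES = [
  { type: "di.xml", test: (p) => /(^|\/)etc\/([a-z_]+\/)?di\.xml$/.test(p) },
  { type: "events.xml", test: (p) => /(^|\/)etc\/([a-z_]+\/)?events\.xml$/.test(p) },
  { type: "webapi.xml", test: (p) => /(^|\/)etc\/webapi\.xml$/.test(p) },
  { type: "layout", test: (p) => /(^|\/)layout\/[^/]+\.xml$/.test(p) },
  { type: "template", test: (p) => p.endsWith(".phtml") },
  { type: "graphql-schema", test: (p) => p.endsWith(".graphqls") },
  { type: "helper", test: (p) => /(^|\/)Helper\/.+\.php$/.test(p) },
  { type: "cron", test: (p) => /(^|\/)Cron\/.+\.php$/.test(p) },
];

// Return the first matching type, or "other" when no path rule applies.
function detectByPath(path) {
  const rule = PATH_RULES.find((r) => r.test(path));
  return rule ? rule.type : "other";
}

for (const p of [
  "etc/frontend/di.xml",
  "view/frontend/layout/catalog_product_view.xml",
  "view/frontend/templates/product/view.phtml",
  "Cron/CleanExpiredQuotes.php",
  "Model/Product.php",
]) {
  console.log(p, "->", detectByPath(p));
}
```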
Configuration
Cursor IDE Rules
Copy .cursorrules to your Magento project root for optimized AI-assisted development. The rules instruct the AI to:
- Use Magector MCP tools before reading files manually
- Write effective semantic queries
- Follow Magento development patterns
- Interpret search results correctly
Model Configuration
The ONNX model (all-MiniLM-L6-v2) is automatically downloaded on first run to ~/.magector/models/. To use a different location:
magector-core index -m /path/to/magento -c /custom/model/path
Development
Building from Source
git clone https://github.com/krejcif/magector.git
cd magector
# Install Node.js dependencies
npm install
# Build the Rust core
cd rust-core
cargo build --release
cd ..
# The CLI will automatically find the dev binary at rust-core/target/release/magector-core
node src/cli.js help
Building
# Rust core
cd rust-core
cargo build --release
# Run unit tests
cargo test
# Run validation
cargo run --release -- validate
Testing
# Integration tests (62 tests, requires indexed codebase)
npm test
# E2E accuracy tests (101 queries)
npm run test:accuracy
npm run test:accuracy:verbose
# Run without index (unit + schema tests only)
npm run test:no-index
# Rust unit tests
cd rust-core && cargo test
# Rust validation (557 test cases)
cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index
Adding New Magento Patterns
- Add pattern detection in rust-core/src/magento.rs
- Add search text enrichment in rust-core/src/indexer.rs
- Add validation test cases in rust-core/src/validation.rs
- Add E2E accuracy test cases in tests/mcp-accuracy.test.js
- Rebuild and run validation to verify:
cargo build --release
./target/release/magector-core validate -m ./magento2 --skip-index
npm run test:accuracy
Adding MCP Tools
- Define the tool schema in src/mcp-server.js (ListToolsRequestSchema handler)
- Include keyword-rich descriptions and cross-tool "See also" references
- Implement the handler in the CallToolRequestSchema handler
- Return structured JSON via formatSearchResults()
- Add E2E test cases in tests/mcp-accuracy.test.js
- Test with Claude Code or the MCP inspector
Technical Details
Embedding Model
- Model: all-MiniLM-L6-v2
- Dimensions: 384
- Pooling: Mean pooling with attention mask
- Normalization: L2 normalized
- Runtime: ONNX Runtime (via the ort crate)
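Mean pooling with an attention mask followed by L2 normalization turns per-token model outputs into a single unit-length sentence vector. The JavaScript sketch below shows the arithmetic on a toy 2-dimensional example; the real model produces 384 dimensions, and the function names are illustrative.

```javascript
// Mean pooling with attention mask: average token vectors, ignoring padding.
function meanPool(tokenEmbeddings, attentionMask) {
  const dim = tokenEmbeddings[0].length;
  const sum = new Array(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((vec, t) => {
    if (attentionMask[t] === 0) return; // skip padding tokens
    count += 1;
    for (let d = 0; d < dim; d++) sum[d] += vec[d];
  });
  return sum.map((v) => v / Math.max(count, 1));
}

// L2 normalization: scale the vector to unit length.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((s, v) => s + v * v, 0));
  return norm === 0 ? vec.slice() : vec.map((v) => v / norm);
}

// Three tokens of a toy 2-dimensional model; the last token is padding.
const tokens = [[2, 0], [4, 0], [100, 100]];
const mask = [1, 1, 0];
const pooled = meanPool(tokens, mask);
const embedding = l2Normalize(pooled);
console.log(pooled, embedding);
```

Once vectors are unit length, cosine similarity reduces to a plain dot product.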
Vector Index
- Algorithm: HNSW (Hierarchical Navigable Small World)
- Library: hnsw_rs
- Parameters: M=32, max_layers=16, ef_construction=200
- Distance metric: Cosine similarity
- Hybrid search: Semantic nearest-neighbor + keyword reranking in path and search text
- Incremental updates: Tombstone soft-delete + periodic HNSW rebuild (compact)
- Persistence: Bincode V2 binary serialization (backward-compatible with V1)
Index Structure
Each indexed file produces a vector entry with metadata:
struct IndexMetadata {
path: String,
file_type: String, // php, xml, js, template, graphql
magento_type: String, // controller, model, block, plugin, ...
class_name: Option<String>,
namespace: Option<String>,
methods: Vec<String>, // extracted method names
search_text: String, // enriched searchable text
is_controller: bool,
is_plugin: bool,
is_observer: bool,
is_model: bool,
is_block: bool,
is_repository: bool,
is_resolver: bool,
// ... 20+ pattern flags
}
Performance Characteristics
| Operation | Time | Notes |
|-----------|------|-------|
| Full index (36K vectors) | ~1 min | Parallel parsing + batched ONNX embedding |
| Single query (warm) | 10-45ms | Persistent serve process, HNSW + rerank |
| Single query (cold) | ~2.6s | Includes ONNX model + index load |
| Embedding generation | ~2ms | ONNX Runtime with CoreML/CUDA |
| Batch embedding (32) | ~30ms | Batched ONNX inference |
| Model load | ~500ms | One-time at startup |
| Index save/load | <1s | Bincode binary serialization |
Performance Optimizations
- Persistent serve mode -- Rust process keeps ONNX model + HNSW index in memory via stdin/stdout JSON protocol
- Query cache -- LRU cache (200 entries) avoids re-embedding identical queries
- Hybrid reranking -- combines semantic similarity with keyword matching for better precision
- Batched ONNX embedding -- 32 texts per inference call (vs. 1-at-a-time), 3-5x faster embedding
- Dynamic thread scaling -- ONNX intra-op threads scale to CPU core count
- Thread-local AST parsers -- each rayon thread gets its own tree-sitter parser (no mutex contention)
- Bincode persistence -- binary serialization replaces JSON (3-5x faster save/load, ~5x smaller files)
- Adaptive HNSW capacity -- pre-sized to actual vector count
- Parallel HNSW insert -- batch insert uses hnsw_rs parallel insertion on load and index
- Tuned ef_search -- optimized search parameters for 36K vector index (ef_search=50 for search, 64 for hybrid)
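The query cache idea is straightforward to illustrate. The sketch below implements a small LRU cache on top of JavaScript's Map (which iterates keys in insertion order) and wraps an embedding function with it. It is a generic illustration of the technique; where and how Magector implements its 200-entry cache is not shown in this document, and the names LruCache and cachedEmbedder are hypothetical.

```javascript
// Small LRU cache: the first key in Map iteration order is the least recently used.
class LruCache {
  constructor(capacity = 200) {
    this.capacity = capacity;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark the key as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }
}

// Wrap an expensive embed function so identical queries reuse the vector.
function cachedEmbedder(embed, capacity = 200) {
  const cache = new LruCache(capacity);
  return (query) => {
    const hit = cache.get(query);
    if (hit !== undefined) return hit;
    const vector = embed(query);
    cache.set(query, vector);
    return vector;
  };
}

// Stand-in embedder that just counts how often it is invoked.
let calls = 0;
const embed = cachedEmbedder((q) => {
  calls += 1;
  return [q.length];
}, 2);
embed("product price");
embed("product price"); // served from cache
embed("checkout totals");
embed("payment capture"); // capacity 2 exceeded: evicts "product price"
embed("product price"); // recomputed after eviction
console.log(calls);
```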
Roadmap
gantt
title Roadmap
dateFormat YYYY-MM
axisFormat %b
section Done
Hybrid search :done, 2025-01, 30d
Serve mode :done, 2025-02, 30d
JSON output :done, 2025-03, 15d
Cross-tool hints :done, 2025-03, 15d
E2E tests :done, 2025-03, 15d
Adobe Commerce :done, 2025-03, 15d
section Next
Method chunking :active, 2025-04, 30d
Intent detection :2025-05, 30d
Type filtering :2025-06, 30d
Incremental index :done, 2025-04, 30d
section Future
VSCode extension :2025-08, 60d
Web UI :2025-10, 60d
- [x] Hybrid search (semantic + keyword re-ranking)
- [x] Persistent serve mode (eliminates cold-start latency)
- [x] Structured JSON output (methods, badges, snippets)
- [x] Cross-tool discovery hints for AI clients
- [x] E2E accuracy test suite (101 queries)
- [x] Adobe Commerce support (B2B, Staging, and all Commerce-specific modules)
- [ ] Method-level chunking (per-method vectors for direct method search)
- [ ] Query intent classification (auto-detect "give me XML" vs "give me PHP")
- [ ] Filtered search by file type at the vector level
- [x] Incremental indexing (background file watcher with tombstone + compact strategy)
- [ ] VSCode extension
- [ ] Web UI for browsing results
License
MIT License. See LICENSE for details.
Contributing
Contributions are welcome. Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Add tests for new functionality
- Run validation to ensure accuracy doesn't regress: npm run test:accuracy
- Submit a pull request
Built with Rust and Node.js for the Magento and Adobe Commerce community.
