@bvvvp009/semcode
v0.1.1
Semantic code search CLI - find code by meaning, not just text patterns
semcode - Semantic Code Search CLI
A semantic, grep-like search tool for code that understands natural language queries. Use grep for exact matches, semcode for semantic understanding.
About
semcode is a local-first semantic code search tool that helps you find code by meaning, not just text patterns. It intelligently routes queries:
- Simple/exact queries → Use grep (fast, instant results)
- Complex/semantic queries → Use semcode (token savings, better relevance)
Key Features:
- 🔍 Semantic search with natural language queries
- 💰 82% token savings vs grep on complex queries
- ⚡ Fast indexing with local embeddings
- 🎯 Intelligent tool selection (grep vs semcode)
- 🔒 Works entirely offline, no cloud required
Quick Start
1. Install
# Option 1: Install globally (when published)
npm install -g @bvvvp009/semcode
# Option 2: Build from source
cd /path/to/semcode
npm install
npm run build
2. Initialize in Your Project
cd /path/to/your-project
# One command: indexes files + sets up Cursor rules
semcode init
That's it! The init command:
- ✅ Indexes your workspace files
- ✅ Creates .cursor/rules/semcode-search.mdc with intelligent routing rules
- ✅ Configures Cursor to automatically route queries to grep (simple) or semcode (complex)
3. Restart Cursor
Close and reopen Cursor to load the new rules. Cursor agents will now automatically validate queries and route them to the correct tool:
- Simple/exact queries → grep (fast, low tokens)
- Complex/semantic queries → semcode (82% token savings, reduces cache reads)
In the benchmarks reported in this README, this routing cut token usage on complex queries by about 82%, which reduces cache reads and cost in large projects.
Commands
semcode init
Initialize the workspace: index files and set up Cursor rules.
semcode init # Index + set up rules
semcode init --clear # Clear existing index first
semcode init --skip-index # Only set up rules, skip indexing
semcode index or semcode index-local
Re-index your workspace (use when files change significantly).
semcode index # Re-index files (alias for index-local)
semcode index-local # Re-index files (local, no cloud)
semcode index --clear # Clear existing index first
semcode index-local --clear # Clear existing index first
semcode index --omit dist build # Exclude additional folders (node_modules excluded by default)
Default exclusions:
- node_modules/ - Dependencies (excluded by default)
- .git/ - Git metadata (excluded by default)
- dist/, build/ - Build outputs (excluded by default)
- Files matching: *.lock, *.bin, *.ipynb, *.pyc, *.pyo
Note: All subfolders are indexed recursively. Use --omit to exclude additional paths.
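The exclusion rules above can be sketched as a small path filter. This is a hypothetical illustration, not semcode's actual source: the names `shouldIndex`, `DEFAULT_EXCLUDED_DIRS`, and `EXCLUDED_EXTENSIONS` are made up here, and it assumes each --omit value is a folder name matched against every segment of a relative path.

```typescript
// Hypothetical sketch of the exclusion rules; not semcode's actual implementation.
const DEFAULT_EXCLUDED_DIRS: string[] = ["node_modules", ".git", "dist", "build"];
const EXCLUDED_EXTENSIONS: string[] = [".lock", ".bin", ".ipynb", ".pyc", ".pyo"];

// Returns true if a workspace-relative path (using "/" separators) should be indexed.
// `omit` stands in for the extra folder names passed via --omit (assumed semantics).
function shouldIndex(relativePath: string, omit: string[] = []): boolean {
  const segments = relativePath.split("/");
  const excludedDirs = DEFAULT_EXCLUDED_DIRS.concat(omit);

  // Skip the file if any folder on its path is excluded.
  const dirSegments = segments.slice(0, -1);
  if (dirSegments.some((dir) => excludedDirs.indexOf(dir) !== -1)) {
    return false;
  }

  // Skip files whose name ends with an excluded extension (the *.lock style patterns).
  const fileName = segments[segments.length - 1];
  return !EXCLUDED_EXTENSIONS.some(
    (ext) => fileName.length >= ext.length && fileName.slice(-ext.length) === ext
  );
}

console.log(shouldIndex("src/auth/login.ts"));                // true
console.log(shouldIndex("node_modules/lodash/index.js"));     // false
console.log(shouldIndex("yarn.lock"));                        // false
console.log(shouldIndex("coverage/report.ts", ["coverage"])); // false
```

Matching on path segments rather than substrings keeps a file such as src/rebuild/plan.ts from being excluded just because its path contains "build".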
semcode watch-local
Automatically watch for file changes and update the index in real-time.
semcode watch-local # Watch and auto-index file changes
semcode watch-local --omit dist # Watch with additional exclusions
Features:
- ✅ Automatically indexes files when they change
- ✅ Watches all subfolders recursively (except excluded paths)
- ✅ Real-time updates - no manual re-indexing needed
- ✅ Excludes node_modules/ by default
- ✅ Press Ctrl+C to stop watching
How It Works
After running semcode init, Cursor agents validate queries and automatically route to the correct tool:
Validation Process (Automatic)
Before each search, agents:
- Count words in query
- Check for question words (how, what, where, why, when, which)
- Identify query type (exact vs semantic)
- Route to appropriate tool
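The validation steps above can be sketched as a small function. This is a hypothetical illustration of the heuristic, not the generated Cursor rules or semcode's source: the name `routeQuery` and the exact tokenization are assumptions, while the 10-word threshold and the question words come from this README.

```typescript
// Hypothetical sketch of the routing heuristic; the real logic lives in the Cursor rules file.
type Tool = "grep" | "semcode";

const QUESTION_WORDS: string[] = ["how", "what", "where", "why", "when", "which"];

function routeQuery(query: string): Tool {
  // Step 1: count words by splitting on whitespace.
  const words = query.trim().split(/\s+/).filter((w) => w.length > 0);

  // Step 2: check for question words, ignoring case and punctuation.
  const hasQuestionWord = words.some(
    (w) => QUESTION_WORDS.indexOf(w.toLowerCase().replace(/[^a-z]/g, "")) !== -1
  );

  // Steps 3 and 4: long or question-style queries are semantic, everything else is exact.
  return words.length >= 10 || hasQuestionWord ? "semcode" : "grep";
}

console.log(routeQuery("find authenticateUser")); // grep
console.log(routeQuery("how is user authentication and authorization implemented?")); // semcode
```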
Simple Queries → grep
- Exact matches: authenticateUser, const API_KEY
- Short queries (< 10 words, no questions)
- Regex patterns
- Debugging exact strings
Example:
User: "find authenticateUser"
Agent: grep -r "authenticateUser" src/
Complex Queries → semcode
- Natural language: "how is authentication implemented?"
- Long queries (≥ 10 words)
- Question words: how, what, where, why
- Architecture/pattern exploration
Example:
User: "how is user authentication and authorization implemented?"
Agent: Uses semcode search internally (configured via rules)
Cache Reduction & Cost Savings
Without validation (wrong tool selection):
- Semantic queries with grep → 3,500+ tokens (multiple file reads, cache exhaustion)
- Cost: $1,054.75/month (500 sessions, 50 queries each)
With validation (correct tool selection):
- Semantic queries with semcode → 750 tokens (targeted results, minimal cache reads)
- Cost: $189.00/month (500 sessions, 50 queries each)
- Savings: $865.75/month (82% reduction)
The rules direct agents to use semcode for semantic queries instead of reading entire files, which substantially reduces cache reads in large projects.
Benchmarks
Token Savings: 82% Reduction
We tested 8 difficult semantic queries comparing grep vs semcode:
| Query Type | grep Tokens | semcode Tokens | Savings |
|------------|-------------|----------------|---------|
| Error Handling | ~4,500 | ~746 | -83.4% |
| Authentication | ~5,500 | ~760 | -86.2% |
| API Routes | ~8,750 | ~760 | -91.3% |
| State Management | ~3,000 | ~760 | -74.7% |
| File Processing | ~5,000 | ~760 | -84.8% |
| Performance | ~2,000 | ~760 | -62.0% |
| Configuration | ~2,750 | ~755 | -72.5% |
| Security | ~2,250 | ~750 | -66.7% |
| TOTAL | ~33,750 | ~6,051 | -82.1% |
Key Findings
Token Savings:
- grep average: ~4,219 tokens per complex query
- semcode average: ~756 tokens per complex query
- Savings per query: ~3,463 tokens (82%)
Performance:
- grep: ~57ms average (but needs multiple queries + filtering)
- semcode: ~940ms average (single semantic query)
- Trade-off: semcode is ~16x slower but returns 82% fewer, more relevant tokens
Relevance:
- grep: ~17% relevant results (many false positives, requires reading multiple files)
- semcode: ~80% relevant results (semantic understanding, targeted results)
- semcode returns roughly 4 to 5 times as many relevant results as grep (~80% vs ~17%)
Cache Reduction:
- grep (semantic): Reads 20+ files to find patterns → High cache usage
- semcode (semantic): Returns top 10 relevant results → Minimal cache usage
- 82% reduction in file reads and cache consumption in large projects
Real-World Cost Analysis
Scenario: AI Agent Session (50 semantic queries)
- grep: ~211,000 tokens = $2.11 (at $0.01/1K tokens)
- semcode: ~38,000 tokens = $0.38
- Savings: $1.73 per session (82% reduction)
Monthly Usage (500 sessions):
- grep: $1,054.75
- semcode: $189.00
- Savings: $865.75/month
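The monthly figures above follow from the per-query averages in Key Findings. The arithmetic can be reproduced with a short sketch; the function and constant names are illustrative, and the inputs (4,219 and 756 tokens per query, 50 queries per session, 500 sessions, $0.01 per 1K tokens) are the ones quoted in this README.

```typescript
// Reproduces the README's cost arithmetic; names here are illustrative only.
const QUERIES_PER_SESSION = 50;
const SESSIONS_PER_MONTH = 500;
const USD_PER_1K_TOKENS = 0.01;

function monthlyCostUsd(avgTokensPerQuery: number): number {
  const tokensPerMonth = avgTokensPerQuery * QUERIES_PER_SESSION * SESSIONS_PER_MONTH;
  return (tokensPerMonth / 1000) * USD_PER_1K_TOKENS;
}

const grepCost = monthlyCostUsd(4219);   // average grep tokens per complex query
const semcodeCost = monthlyCostUsd(756); // average semcode tokens per complex query
const savings = grepCost - semcodeCost;
const reductionPct = (savings / grepCost) * 100;

console.log(`grep:    $${grepCost.toFixed(2)}`);    // grep:    $1054.75
console.log(`semcode: $${semcodeCost.toFixed(2)}`); // semcode: $189.00
console.log(`savings: $${savings.toFixed(2)} (${reductionPct.toFixed(1)}%)`); // savings: $865.75 (82.1%)
```

Changing `USD_PER_1K_TOKENS` or the session counts shows how the absolute savings scale with your own pricing and usage, while the percentage reduction stays the same.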
When to Use Each Tool
Use grep when:
- ✅ You know the exact pattern/symbol
- ✅ You need speed (instant results)
- ✅ You want ALL matches (comprehensive search)
- ✅ Searching for exact strings, regex patterns
- ✅ Debugging specific issues (exact error messages)
Example:
# Perfect for grep - exact symbol
grep -r "authenticateUser" src/
# Perfect for grep - regex pattern
grep -rE "error.*code.*[0-9]{3}" src/
Use semcode when:
- ✅ You're exploring an unfamiliar codebase
- ✅ You need semantic understanding
- ✅ You want TOP relevant results (not all matches)
- ✅ Token usage matters (AI agents, API costs)
- ✅ You don't know exact naming conventions
Example:
# Perfect for semcode - semantic understanding
# Cursor agents automatically use semcode for queries like:
# "how is user authentication implemented"
# "how are API endpoints structured"
File Structure
After running semcode init:
your-project/
├── .cursor/
│   └── rules/
│       └── semcode-search.mdc   # Cursor rules (auto-created)
├── .semcode/
│   ├── local-index.json         # Search index
│   └── .lock                    # Lock file
└── ...
Keeping Index Updated
Re-index your workspace when files change significantly:
semcode index # Re-index files
semcode index --clear # Clear and re-index
Troubleshooting
semcode command not found
Solution:
- Use full path: /path/to/semcode/dist/index.js init
- Or install globally: npm install -g @bvvvp009/semcode
Index not found
Solution: Run semcode index, semcode index-local, or semcode init
Cursor not using semcode
Solution:
- Check .cursor/rules/semcode-search.mdc exists
- Restart Cursor
- Verify query complexity (should be ≥ 10 words or contain questions)
Index out of date
Solution: Run semcode index or semcode index-local to re-index
How It Works Under the Hood
- Indexing: Files are chunked and embedded using local Transformers.js models
- Search: Queries are embedded and matched against indexed chunks using cosine similarity
- Routing: Cursor rules analyze query complexity and route to appropriate tool
- Storage: Everything is stored locally in .semcode/local-index.json
No cloud required - all processing happens locally on your machine.
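The search step above can be illustrated with a minimal cosine-similarity ranking. This is a sketch under stated assumptions, not semcode's source: the `IndexedChunk` shape, the `search` function, and the three-dimensional vectors are made up for illustration, and real embeddings from a Transformers.js model would have hundreds of dimensions.

```typescript
// Hypothetical sketch of ranking indexed chunks by cosine similarity.
interface IndexedChunk {
  file: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every chunk against the query embedding and returns the topK best matches.
function search(queryEmbedding: number[], chunks: IndexedChunk[], topK: number): IndexedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((r) => r.chunk);
}

// Toy index with hand-written vectors standing in for real embeddings.
const toyIndex: IndexedChunk[] = [
  { file: "src/auth/login.ts", embedding: [0.9, 0.1, 0.0] },
  { file: "src/routes/api.ts", embedding: [0.1, 0.9, 0.2] },
  { file: "src/utils/format.ts", embedding: [0.0, 0.2, 0.9] },
];

const topResults = search([0.8, 0.2, 0.1], toyIndex, 2);
console.log(topResults.map((r) => r.file)); // [ 'src/auth/login.ts', 'src/routes/api.ts' ]
```

Returning only the top-ranked chunks, rather than every match, is what keeps the result size small compared with a grep over the whole tree.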
License
Apache 2.0
Contributing
Contributions welcome! Please open an issue or PR.
Built with ❤️ for developers who want smarter code search
