@bvvvp009/semcode

v0.1.1

Published

a month ago

Semantic code search CLI - find code by meaning, not just text patterns

bvvvp009

semantic code-search search grep semantic-search code-analysis local-search

semcode - Semantic Code Search CLI

A semantic, grep-like search tool for code that understands natural language queries. Use grep for exact matches, semcode for semantic understanding.

About

semcode is a local-first semantic code search tool that helps you find code by meaning, not just text patterns. When set up with Cursor, its rules route each query to the tool that suits it:

Simple/exact queries → Use grep (fast, instant results)
Complex/semantic queries → Use semcode (token savings, better relevance)

Key Features:

🔍 Semantic search with natural language queries
💰 82% token savings vs grep on complex queries
⚡ Fast indexing with local embeddings
🎯 Intelligent tool selection (grep vs semcode)
🔒 Works entirely offline, no cloud required

Quick Start

1. Install

# Option 1: Install globally from npm
npm install -g @bvvvp009/semcode

# Option 2: Build from source
cd /path/to/semcode
npm install
npm run build

2. Initialize in Your Project

cd /path/to/your-project

# One command: indexes files + sets up Cursor rules
semcode init

That's it! The init command:

✅ Indexes your workspace files
✅ Creates .cursor/rules/semcode-search.mdc with intelligent routing rules
✅ Configures Cursor to automatically route queries to grep (simple) or semcode (complex)

3. Restart Cursor

Close and reopen Cursor to load the new rules. Cursor agents will now automatically validate queries and route them to the correct tool:

Simple/exact queries → grep (fast, low tokens)
Complex/semantic queries → semcode (82% token savings, reduces cache reads)

In the benchmarks below, routing semantic queries to semcode cut token usage by about 82%, which reduces cache reads and cost in large projects.

Commands

`semcode init`

Initialize workspace: index files and setup Cursor rules.

semcode init              # Index + setup rules
semcode init --clear      # Clear existing index first
semcode init --skip-index # Only setup rules, skip indexing

`semcode index` or `semcode index-local`

Re-index your workspace (use when files change significantly).

semcode index                    # Re-index files (alias for index-local)
semcode index-local              # Re-index files (local, no cloud)
semcode index --clear            # Clear existing index first
semcode index-local --clear      # Clear existing index first
semcode index --omit dist build  # Exclude additional folders (node_modules excluded by default)

Default exclusions:

node_modules/ - Dependencies
.git/ - Git metadata
dist/, build/ - Build outputs
Files matching: *.lock, *.bin, *.ipynb, *.pyc, *.pyo

Note: All subfolders are indexed recursively. Use --omit to exclude additional paths.
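
The exclusion rules above can be illustrated with a small path filter. This is a minimal sketch, not semcode's actual implementation: the function name `isExcluded`, the constants, and the assumption that `--omit` values match whole folder names anywhere in the path are all hypothetical.

```typescript
// Hypothetical sketch of the default exclusions listed above.
// Names and matching rules are assumptions, not semcode's real code.
const DEFAULT_EXCLUDED_DIRS = ["node_modules", ".git", "dist", "build"];
const DEFAULT_EXCLUDED_EXTENSIONS = [".lock", ".bin", ".ipynb", ".pyc", ".pyo"];

function isExcluded(relativePath: string, extraOmit: string[] = []): boolean {
  // Normalize Windows separators so the check behaves the same on any platform.
  const segments = relativePath.replace(/\\/g, "/").split("/");
  const excludedNames = new Set([...DEFAULT_EXCLUDED_DIRS, ...extraOmit]);

  // Excluded if any path segment equals an excluded name.
  if (segments.some((segment) => excludedNames.has(segment))) {
    return true;
  }

  // Excluded if the file name ends with a default extension (e.g. *.lock).
  const fileName = segments[segments.length - 1];
  return DEFAULT_EXCLUDED_EXTENSIONS.some((ext) => fileName.endsWith(ext));
}

console.log(isExcluded("src/app.ts"));
console.log(isExcluded("packages/api/node_modules/x/index.js"));
console.log(isExcluded("tmp/scratch.ts", ["tmp"]));
```

Matching on whole path segments means a nested `node_modules/` is skipped at any depth, which fits the note that subfolders are indexed recursively.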

`semcode watch-local`

Automatically watch for file changes and update the index in real-time.

semcode watch-local              # Watch and auto-index file changes
semcode watch-local --omit dist  # Watch with additional exclusions

Features:

✅ Automatically indexes files when they change
✅ Watches all subfolders recursively (except excluded paths)
✅ Real-time updates - no manual re-indexing needed
✅ Excludes node_modules/ by default
✅ Press Ctrl+C to stop watching

How It Works

After running semcode init, Cursor agents validate queries and automatically route to the correct tool:

Validation Process (Automatic)

Before each search, agents:

Count words in query
Check for question words (how, what, where, why, when, which)
Identify query type (exact vs semantic)
Route to appropriate tool

Simple Queries → grep

Exact matches: authenticateUser, const API_KEY
Short queries (< 10 words, no questions)
Regex patterns
Debugging exact strings

Example:

User: "find authenticateUser"
Agent: grep -r "authenticateUser" src/

Complex Queries → semcode

Natural language: "how is authentication implemented?"
Long queries (≥ 10 words)
Question words: how, what, where, why, when, which
Architecture/pattern exploration

Example:

User: "how is user authentication and authorization implemented?"
Agent: Uses semcode search internally (configured via rules)
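
The validation steps and the simple/complex criteria above can be sketched as a small routing function. This is only an illustration: in semcode the routing is done by Cursor rules, not by code like this, and `routeQuery`, `QUESTION_WORDS`, and `LONG_QUERY_WORD_COUNT` are hypothetical names.

```typescript
// Hypothetical sketch of the routing heuristic described above.
// The real routing is performed by Cursor rules, not by this function.
type Tool = "grep" | "semcode";

const QUESTION_WORDS = new Set(["how", "what", "where", "why", "when", "which"]);
const LONG_QUERY_WORD_COUNT = 10;

function routeQuery(query: string): Tool {
  // Step 1: count words (split on whitespace, drop empty pieces).
  const words = query.trim().split(/\s+/).filter((w) => w.length > 0);

  // Step 2: check for question words, ignoring case and punctuation.
  const hasQuestionWord = words.some((w) =>
    QUESTION_WORDS.has(w.toLowerCase().replace(/[^a-z]/g, ""))
  );

  // Steps 3 and 4: classify the query and pick the tool.
  const isSemantic = hasQuestionWord || words.length >= LONG_QUERY_WORD_COUNT;
  return isSemantic ? "semcode" : "grep";
}

console.log(routeQuery("find authenticateUser"));
console.log(routeQuery("how is user authentication and authorization implemented?"));
```

A heuristic this simple will misroute some queries (for example, a short conceptual query with no question word goes to grep), which is one reason to treat the thresholds as tunable rather than fixed.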

Cache Reduction & Cost Savings

Without validation (wrong tool selection):

Semantic queries with grep → 3,500+ tokens (multiple file reads, cache exhaustion)
Cost: $1,054.75/month (500 sessions, 50 queries each)

With validation (correct tool selection):

Semantic queries with semcode → 750 tokens (targeted results, minimal cache reads)
Cost: $189.00/month (500 sessions, 50 queries each)
Savings: $865.75/month (82% reduction)

The rules direct agents to use semcode for semantic queries instead of reading entire files, which sharply reduces cache reads in large projects.

Benchmarks

Token Savings: 82% Reduction

We tested 8 difficult semantic queries comparing grep vs semcode:

| Query Type | grep Tokens | semcode Tokens | Savings |
|------------|-------------|----------------|---------|
| Error Handling | ~4,500 | ~746 | -83.4% |
| Authentication | ~5,500 | ~760 | -86.2% |
| API Routes | ~8,750 | ~760 | -91.3% |
| State Management | ~3,000 | ~760 | -74.7% |
| File Processing | ~5,000 | ~760 | -84.8% |
| Performance | ~2,000 | ~760 | -62.0% |
| Configuration | ~2,750 | ~755 | -72.5% |
| Security | ~2,250 | ~750 | -66.7% |
| TOTAL | ~33,750 | ~6,051 | -82.1% |

Key Findings

Token Savings:

grep average: ~4,219 tokens per complex query
semcode average: ~756 tokens per complex query
Savings per query: ~3,463 tokens (82%)

Performance:

grep: ~57ms average (but needs multiple queries + filtering)
semcode: ~940ms average (single semantic query)
Trade-off: semcode is ~16x slower per query, but it returns ~82% fewer tokens and the results are more relevant

Relevance:

grep: ~17% relevant results (many false positives, requires reading multiple files)
semcode: ~80% relevant results (semantic understanding, targeted results)
semcode returns more than 4x the share of relevant results (~80% vs. ~17%)

Cache Reduction:

grep (semantic): Reads 20+ files to find patterns → High cache usage
semcode (semantic): Returns top 10 relevant results → Minimal cache usage
~82% fewer tokens read, which lowers cache consumption in large projects
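
The summary figures in Key Findings follow directly from the benchmark table and can be re-derived with a few lines of arithmetic. The sketch below only recomputes numbers already reported above; the type and variable names are illustrative.

```typescript
// Re-derives the Key Findings from the approximate token counts in the benchmark table.
interface BenchmarkRow {
  queryType: string;
  grepTokens: number;
  semcodeTokens: number;
}

const rows: BenchmarkRow[] = [
  { queryType: "Error Handling", grepTokens: 4500, semcodeTokens: 746 },
  { queryType: "Authentication", grepTokens: 5500, semcodeTokens: 760 },
  { queryType: "API Routes", grepTokens: 8750, semcodeTokens: 760 },
  { queryType: "State Management", grepTokens: 3000, semcodeTokens: 760 },
  { queryType: "File Processing", grepTokens: 5000, semcodeTokens: 760 },
  { queryType: "Performance", grepTokens: 2000, semcodeTokens: 760 },
  { queryType: "Configuration", grepTokens: 2750, semcodeTokens: 755 },
  { queryType: "Security", grepTokens: 2250, semcodeTokens: 750 },
];

// Percentage of tokens saved by semcode relative to grep.
function savingsPercent(grepTokens: number, semcodeTokens: number): number {
  return (1 - semcodeTokens / grepTokens) * 100;
}

const totalGrep = rows.reduce((sum, r) => sum + r.grepTokens, 0);
const totalSemcode = rows.reduce((sum, r) => sum + r.semcodeTokens, 0);
const avgGrep = totalGrep / rows.length;
const avgSemcode = totalSemcode / rows.length;

// Latency and relevance ratios from the reported averages.
const latencyRatio = 940 / 57; // semcode ms / grep ms
const relevanceRatio = 80 / 17; // semcode % relevant / grep % relevant

console.log(`totals: ${totalGrep} vs ${totalSemcode}`);
console.log(`averages: ${avgGrep.toFixed(0)} vs ${avgSemcode.toFixed(0)}`);
console.log(`savings: ${savingsPercent(totalGrep, totalSemcode).toFixed(1)}%`);
console.log(`latency ratio: ${latencyRatio.toFixed(1)}x, relevance ratio: ${relevanceRatio.toFixed(1)}x`);
```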

Real-World Cost Analysis

Scenario: AI Agent Session (50 semantic queries)

grep: ~211,000 tokens = $2.11 (at $0.01/1K tokens)
semcode: ~38,000 tokens = $0.38
Savings: $1.73 per session (82% reduction)

Monthly Usage (500 sessions):

grep: $1,054.75
semcode: $189.00
Savings: $865.75/month
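
The monthly figures above can be reproduced from the per-query averages, the $0.01/1K token price, and the stated usage (50 queries per session, 500 sessions per month). This is a small worked calculation; the constant and function names are illustrative, and real pricing will vary by model and provider.

```typescript
// Reproduces the cost analysis from the reported per-query token averages.
const PRICE_PER_1K_TOKENS = 0.01; // USD, as assumed in the scenario above
const QUERIES_PER_SESSION = 50;
const SESSIONS_PER_MONTH = 500;

function monthlyCost(avgTokensPerQuery: number): number {
  const tokensPerSession = avgTokensPerQuery * QUERIES_PER_SESSION;
  const costPerSession = (tokensPerSession / 1000) * PRICE_PER_1K_TOKENS;
  return costPerSession * SESSIONS_PER_MONTH;
}

const grepMonthly = monthlyCost(4219); // grep average tokens per complex query
const semcodeMonthly = monthlyCost(756); // semcode average tokens per complex query
const monthlySavings = grepMonthly - semcodeMonthly;

console.log(`grep: $${grepMonthly.toFixed(2)}`);
console.log(`semcode: $${semcodeMonthly.toFixed(2)}`);
console.log(`savings: $${monthlySavings.toFixed(2)}`);
```

Changing any of the three constants shows how sensitive the savings are to session volume and token price.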

When to Use Each Tool

Use grep when:

✅ You know the exact pattern/symbol
✅ You need speed (instant results)
✅ You want ALL matches (comprehensive search)
✅ Searching for exact strings, regex patterns
✅ Debugging specific issues (exact error messages)

Example:

# Perfect for grep - exact symbol
grep -r "authenticateUser" src/

# Perfect for grep - regex pattern
grep -rE "error.*code.*[0-9]{3}" src/

Use semcode when:

✅ You're exploring unfamiliar codebase
✅ You need semantic understanding
✅ You want TOP relevant results (not all matches)
✅ Token usage matters (AI agents, API costs)
✅ You don't know exact naming conventions

Example:

# Perfect for semcode - semantic understanding
# Cursor agents automatically use semcode for queries like:
# "how is user authentication implemented"
# "how are API endpoints structured"

File Structure

After running semcode init:

your-project/
├── .cursor/
│   └── rules/
│       └── semcode-search.mdc # Cursor rules (auto-created)
├── .semcode/
│   ├── local-index.json       # Search index
│   └── .lock                  # Lock file
└── ...

Keeping Index Updated

Re-index your workspace when files change significantly:

semcode index           # Re-index files
semcode index --clear   # Clear and re-index

Troubleshooting

semcode command not found

Solution:

Use full path: node /path/to/semcode/dist/index.js init
Or install globally: npm install -g @bvvvp009/semcode

Index not found

Solution: Run semcode index, semcode index-local, or semcode init

Cursor not using semcode

Solution:

Check .cursor/rules/semcode-search.mdc exists
Restart Cursor
Verify query complexity (should be ≥ 10 words or contain a question word)

Index out of date

Solution: Run semcode index or semcode index-local to re-index

How It Works Under the Hood

Indexing: Files are chunked and embedded using local Transformers.js models
Search: Queries are embedded and matched against indexed chunks using cosine similarity
Routing: Cursor rules analyze query complexity and route to appropriate tool
Storage: Everything is stored locally in .semcode/local-index.json

No cloud required - all processing happens locally on your machine.
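
The search step can be sketched as a cosine-similarity ranking over stored chunk embeddings. This is a minimal illustration under stated assumptions: the `IndexedChunk` shape and the function names are hypothetical, the layout of `.semcode/local-index.json` is not documented here, and the 3-dimensional vectors are toy values standing in for real model embeddings.

```typescript
// Hypothetical sketch of cosine-similarity search over indexed chunks.
// The chunk shape and toy embeddings are assumptions, not semcode's real index format.
interface IndexedChunk {
  file: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query and keeps the k highest-scoring ones.
function searchTopK(
  queryEmbedding: number[],
  chunks: IndexedChunk[],
  k: number = 10
): Array<{ chunk: IndexedChunk; score: number }> {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: in practice the embeddings would come from the local model.
const chunks: IndexedChunk[] = [
  { file: "src/auth.ts", text: "verify password and issue session token", embedding: [0.9, 0.1, 0.0] },
  { file: "src/routes.ts", text: "register HTTP handlers", embedding: [0.1, 0.9, 0.0] },
  { file: "src/config.ts", text: "load environment settings", embedding: [0.0, 0.2, 0.9] },
];
const queryEmbedding = [1, 0, 0];
const results = searchTopK(queryEmbedding, chunks, 2);

for (const { chunk, score } of results) {
  console.log(`${chunk.file}: ${score.toFixed(3)}`);
}
```

A linear scan like this is simple and keeps everything in one local file, at the cost of scoring every chunk on each query.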

License

Apache 2.0

Contributing

Contributions welcome! Please open an issue or PR.

Built with ❤️ for developers who want smarter code search
