@howaboua/pi-semantic-grep
v0.1.3
Published
Semantic code and docs search for pi, backed by OpenAI-compatible embeddings and repo-local SQLite indexes.
Maintainers
Readme
pi-semantic-grep
Semantic search for pi. This extension gives the agent a semantic_grep tool that finds relevant code or documentation by meaning, not just by exact words. It indexes each repository into a local SQLite database under .pi/, embeds code chunks through your configured OpenAI-compatible embeddings endpoint, and incrementally refreshes the index at the start of each pi session.
Package: @howaboua/pi-semantic-grep
Repository: https://github.com/IgorWarzocha/pi-semantic-grep
What it is good for
Use it when a normal text search is too literal:
- “Where is auth/session state handled?”
- “Find the code that formats tool results.”
- “Where do we build prompts for the model?”
- “What files are involved in indexing documents?”
- “Find docs or examples for adding a custom renderer.”
The agent receives ranked matches with file paths, line ranges, scores, and snippets. By default the result renders compactly in pi and expands with the normal tool expand keybinding.
Requirements
- pi
- Node.js compatible with pi extensions
- An OpenAI-compatible embeddings endpoint
The endpoint must accept:
POST /v1/embeddings
Content-Type: application/json
{
"model": "your-embedding-model",
"input": "text to embed"
}and return an OpenAI-style response containing:
{
"data": [
{ "embedding": [0.1, 0.2, 0.3] }
]
}This has been designed for OpenAI-compatible local servers such as LM Studio, llama.cpp-style servers, or any service that exposes the same embeddings response shape. It should also work with hosted OpenAI-compatible embedding APIs when configured with the right URL/model/API key.
Installation
Install as a pi package:
pi install npm:@howaboua/pi-semantic-grepOr run directly from a local checkout:
git clone https://github.com/IgorWarzocha/pi-semantic-grep.git
cd pi-semantic-grep
npm install
pi -e ./src/index.tsConfiguration
On first load, the extension creates:
~/.pi/agent/semantic-grep.jsonDefault config:
{
"embeddings": {
"url": "http://127.0.0.1:1234/v1/embeddings",
"model": "text-embedding-embeddinggemma-300m-qat"
},
"indexing": {
"chunkLines": 80,
"chunkOverlap": 20,
"maxFileBytes": 512000,
"maxChunkChars": 12000,
"skipOversizedChunks": false,
"includeExtensions": [".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs", ".py", ".lua", ".rs", ".go", ".java", ".cs", ".cpp", ".c", ".h", ".hpp", ".md", ".json", ".yaml", ".yml", ".toml", ".css", ".scss", ".html", ".svelte", ".vue"],
"excludeDirs": [".git", ".pi", "node_modules", "dist", "build", "target", ".venv", "venv", "vendor", ".next", ".cache"]
},
"search": {
"defaultTopK": 8,
"maxTopK": 30
},
"autoIndex": {
"enabled": true,
"mode": "incremental"
},
"safety": {
"requireProjectMarker": true,
"projectMarkers": [".git", "package.json", "pyproject.toml", "Cargo.toml", "go.mod", "deno.json", "bun.lock", "pnpm-lock.yaml", "yarn.lock"],
"denyRootBasenames": ["Desktop", "Documents", "Downloads", "Pictures", "Music", "Movies", "Videos", "Public", "Templates", "Applications", "Library", "System", "Volumes", "Users", "Program Files", "Program Files (x86)", "ProgramData", "Windows", "PerfLogs", "AppData", "OneDrive", "Dropbox", "Google Drive", "iCloud Drive"],
"denyRootPaths": ["~", "/", "C:\\", "C:/"]
}
}For your example local endpoint:
curl http://127.0.0.1:1234/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-embeddinggemma-300m-qat",
"input": "Some text to embed"
}'use:
{
"embeddings": {
"url": "http://127.0.0.1:1234/v1/embeddings",
"model": "text-embedding-embeddinggemma-300m-qat"
}
}For an endpoint requiring a bearer token:
{
"embeddings": {
"url": "https://example.com/v1/embeddings",
"model": "my-embedding-model",
"apiKey": "YOUR_API_KEY"
}
}Indexing behavior
The index is stored per repository:
<repo>/.pi/semantic-grep.sqliteAt session start, the extension syncs the index automatically.
autoIndex.mode options:
incremental— default; only embed new or changed files and remove deleted filesmissing— build only if the SQLite index does not existalways— force a full rebuild at every session start
A full rebuild is also triggered when indexing settings change, such as embedding model, chunk size, included extensions, excluded directories, max chunk size, or schema version. Oversized chunks are split by default instead of failing the whole indexing run; tune indexing.maxChunkChars for your embedder.
Safety defaults
The extension intentionally avoids indexing broad system or user directories. By default it requires a project marker such as .git, package.json, pyproject.toml, Cargo.toml, or go.mod, and refuses protected roots like ~, /, C:\\, plus common Windows/macOS/Linux home folders such as Desktop, Documents, Downloads, Pictures, Applications, Library, Program Files, Windows, AppData, OneDrive, Dropbox, Google Drive, and iCloud Drive.
You can adjust these rules in ~/.pi/agent/semantic-grep.json under safety.
Tool available to the agent
semantic_grep({
query: string,
top_k?: number
})Examples:
semantic_grep({ query: "where are tool calls dispatched?" })
semantic_grep({ query: "code that formats markdown output", top_k: 5 })
semantic_grep({ query: "configuration loading and defaults", top_k: 10 })Tech stack
- TypeScript pi extension
better-sqlite3for repo-local storage- OpenAI-compatible
/v1/embeddingsHTTP API for vectors - Simple line-window chunking with overlap
- SHA256 per-file tracking for incremental indexing
- Brute-force cosine similarity over vectors loaded from SQLite
The implementation is intentionally simple and portable. It does not require Python, FAISS, a background service, or a separate vector database. For very large repositories, an approximate nearest-neighbor index may be added later.
