@howaboua/pi-semantic-grep

v0.1.3

Semantic code and docs search for pi, backed by OpenAI-compatible embeddings and repo-local SQLite indexes.

pi-semantic-grep

Semantic search for pi. This extension gives the agent a semantic_grep tool that finds relevant code or documentation by meaning, not just by exact words. It indexes each repository into a local SQLite database under .pi/, embeds code chunks through your configured OpenAI-compatible embeddings endpoint, and incrementally refreshes the index at the start of each pi session.

Package: @howaboua/pi-semantic-grep
Repository: https://github.com/IgorWarzocha/pi-semantic-grep

What it is good for

Use it when a normal text search is too literal:

  • “Where is auth/session state handled?”
  • “Find the code that formats tool results.”
  • “Where do we build prompts for the model?”
  • “What files are involved in indexing documents?”
  • “Find docs or examples for adding a custom renderer.”

The agent receives ranked matches with file paths, line ranges, scores, and snippets. By default the result renders compactly in pi and expands with the normal tool expand keybinding.
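Illustratively, a single ranked match can be modeled like this (the field names below are assumptions for this sketch, not the extension's actual output schema):

```typescript
// Illustrative shape of one semantic_grep match (assumed field names).
interface SemanticMatch {
  file: string;      // repo-relative file path
  startLine: number; // first line of the matched chunk
  endLine: number;   // last line of the matched chunk
  score: number;     // similarity score; higher means more relevant
  snippet: string;   // chunk text shown in the compact rendering
}

const example: SemanticMatch = {
  file: "src/auth/session.ts",
  startLine: 42,
  endLine: 61,
  score: 0.83,
  snippet: "export function restoreSession(...) { ... }",
};
```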

Requirements

  • pi
  • A Node.js runtime compatible with pi extensions
  • An OpenAI-compatible embeddings endpoint

The endpoint must accept:

POST /v1/embeddings
Content-Type: application/json

{
  "model": "your-embedding-model",
  "input": "text to embed"
}

and return an OpenAI-style response containing:

{
  "data": [
    { "embedding": [0.1, 0.2, 0.3] }
  ]
}

This has been designed for OpenAI-compatible local servers such as LM Studio, llama.cpp-style servers, or any service that exposes the same embeddings response shape. It should also work with hosted OpenAI-compatible embedding APIs when configured with the right URL/model/API key.
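As an illustrative sketch of that contract (the helper names here are assumptions, not the extension's internals; the global fetch requires Node 18+):

```typescript
// OpenAI-style embeddings response shape, as described above.
interface EmbeddingResponse {
  data: { embedding: number[] }[];
}

// Extract the vector from a single-input embeddings response.
function parseEmbeddingResponse(json: EmbeddingResponse): number[] {
  return json.data[0].embedding;
}

// Minimal sketch of a client for an OpenAI-compatible embeddings endpoint.
async function embedText(
  url: string,
  model: string,
  input: string,
  apiKey?: string,
): Promise<number[]> {
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      ...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {}),
    },
    body: JSON.stringify({ model, input }),
  });
  if (!res.ok) throw new Error(`embeddings request failed: ${res.status}`);
  return parseEmbeddingResponse((await res.json()) as EmbeddingResponse);
}
```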

Installation

Install as a pi package:

pi install npm:@howaboua/pi-semantic-grep

Or run directly from a local checkout:

git clone https://github.com/IgorWarzocha/pi-semantic-grep.git
cd pi-semantic-grep
npm install
pi -e ./src/index.ts

Configuration

On first load, the extension creates:

~/.pi/agent/semantic-grep.json

Default config:

{
  "embeddings": {
    "url": "http://127.0.0.1:1234/v1/embeddings",
    "model": "text-embedding-embeddinggemma-300m-qat"
  },
  "indexing": {
    "chunkLines": 80,
    "chunkOverlap": 20,
    "maxFileBytes": 512000,
    "maxChunkChars": 12000,
    "skipOversizedChunks": false,
    "includeExtensions": [".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs", ".py", ".lua", ".rs", ".go", ".java", ".cs", ".cpp", ".c", ".h", ".hpp", ".md", ".json", ".yaml", ".yml", ".toml", ".css", ".scss", ".html", ".svelte", ".vue"],
    "excludeDirs": [".git", ".pi", "node_modules", "dist", "build", "target", ".venv", "venv", "vendor", ".next", ".cache"]
  },
  "search": {
    "defaultTopK": 8,
    "maxTopK": 30
  },
  "autoIndex": {
    "enabled": true,
    "mode": "incremental"
  },
  "safety": {
    "requireProjectMarker": true,
    "projectMarkers": [".git", "package.json", "pyproject.toml", "Cargo.toml", "go.mod", "deno.json", "bun.lock", "pnpm-lock.yaml", "yarn.lock"],
    "denyRootBasenames": ["Desktop", "Documents", "Downloads", "Pictures", "Music", "Movies", "Videos", "Public", "Templates", "Applications", "Library", "System", "Volumes", "Users", "Program Files", "Program Files (x86)", "ProgramData", "Windows", "PerfLogs", "AppData", "OneDrive", "Dropbox", "Google Drive", "iCloud Drive"],
    "denyRootPaths": ["~", "/", "C:\\", "C:/"]
  }
}

For example, if your local endpoint answers this request:

curl http://127.0.0.1:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-embeddinggemma-300m-qat",
    "input": "Some text to embed"
  }'

then use:

{
  "embeddings": {
    "url": "http://127.0.0.1:1234/v1/embeddings",
    "model": "text-embedding-embeddinggemma-300m-qat"
  }
}

For an endpoint requiring a bearer token:

{
  "embeddings": {
    "url": "https://example.com/v1/embeddings",
    "model": "my-embedding-model",
    "apiKey": "YOUR_API_KEY"
  }
}

Indexing behavior

The index is stored per repository:

<repo>/.pi/semantic-grep.sqlite

At session start, the extension syncs the index automatically.

autoIndex.mode options:

  • incremental — default; only embed new or changed files and remove deleted files
  • missing — build only if the SQLite index does not exist
  • always — force a full rebuild at every session start

A full rebuild is also triggered when indexing settings change, such as embedding model, chunk size, included extensions, excluded directories, max chunk size, or schema version. Oversized chunks are split by default instead of failing the whole indexing run; tune indexing.maxChunkChars for your embedder.
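The line-window chunking with overlap can be sketched as follows (illustrative only, using the chunkLines/chunkOverlap semantics from the config; the extension's actual chunk boundaries may differ):

```typescript
// Split text into overlapping line windows (illustrative sketch, not the
// extension's actual implementation). With chunkLines = 80 and
// chunkOverlap = 20, each chunk starts 60 lines after the previous one.
function chunkLinesWithOverlap(
  text: string,
  chunkLines: number,
  chunkOverlap: number,
): { startLine: number; endLine: number; content: string }[] {
  const lines = text.split("\n");
  const step = chunkLines - chunkOverlap; // lines advanced per chunk
  const chunks: { startLine: number; endLine: number; content: string }[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + chunkLines, lines.length);
    chunks.push({
      startLine: start + 1, // 1-based, matching reported line ranges
      endLine: end,
      content: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break;
  }
  return chunks;
}
```

The overlap means a function straddling a chunk boundary still appears whole in at least one chunk, at the cost of embedding some lines twice.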

Safety defaults

The extension intentionally avoids indexing broad system or user directories. By default it requires a project marker such as .git, package.json, pyproject.toml, Cargo.toml, or go.mod. It also refuses protected roots like ~, /, and C:\\, plus common Windows/macOS/Linux home folders such as Desktop, Documents, Downloads, Pictures, Applications, Library, Program Files, Windows, AppData, OneDrive, Dropbox, Google Drive, and iCloud Drive.

You can adjust these rules in ~/.pi/agent/semantic-grep.json under safety.

Tool available to the agent

semantic_grep({
  query: string,
  top_k?: number
})

Examples:

semantic_grep({ query: "where are tool calls dispatched?" })
semantic_grep({ query: "code that formats markdown output", top_k: 5 })
semantic_grep({ query: "configuration loading and defaults", top_k: 10 })

Tech stack

  • TypeScript pi extension
  • better-sqlite3 for repo-local storage
  • OpenAI-compatible /v1/embeddings HTTP API for vectors
  • Simple line-window chunking with overlap
  • SHA256 per-file tracking for incremental indexing
  • Brute-force cosine similarity over vectors loaded from SQLite

The implementation is intentionally simple and portable. It does not require Python, FAISS, a background service, or a separate vector database. For very large repositories, an approximate nearest-neighbor index may be added later.
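The brute-force ranking step can be sketched like this (illustrative, not the extension's code; in the real flow the document vectors come from SQLite):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored vector against the query and keep the top K.
function rankBySimilarity(
  query: number[],
  docs: { id: string; vec: number[] }[],
  topK: number,
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Brute force is O(n) per query in the number of chunks, which is why an approximate nearest-neighbor index only becomes interesting for very large repositories.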