@lojban/semantic-search-mcp

v1.0.16

Local-first MCP server for semantic search using transformers.js and SQLite

Semantic Local MCP

A local-first MCP (Model Context Protocol) server for semantic search over your documents. Index text files (e.g. TSV, CSV, TXT) line-by-line, then search or filter by meaning using embeddings—all on your machine, no API keys required.

Use it in Cursor, Claude Code, or any IDE that supports MCP to search through dictionaries, glossaries, and corpora by semantic similarity.

Use cases

  • Lojban (or any) dictionary: Index a TSV where each line is a word/definition. Find entries similar to a phrase or concept, or discover gaps—word combinations or concepts your dictionary doesn't cover yet.
  • Glossaries & term bases: "Find entries that mean something like …" without exact keyword match.
  • Corpora & line-based data: Any file where each line is a record (TSV, CSV, one-sentence-per-line TXT). Index once, query by meaning.

How it works

  • Indexing: On startup, the server indexes content in the background. If SEMANTIC_SEARCH_INDEX_DIRS is set (comma-separated paths), it scans those directories. If it is not set, the server downloads the lojban/sampu_vlaste repository from GitHub and indexes that instead. In both cases, the server looks for .txt, .md, .tsv, .csv files. Each non-empty line gets a vector embedding (via Hugging Face Transformers.js, model Xenova/all-MiniLM-L6-v2) and is stored in a local SQLite database with @dao-xyz/sqlite3-vec (SQLite + sqlite-vec for Node and browser). Indexing runs asynchronously so the server stays responsive and uses bounded memory.
  • Search: You send a natural-language query; the server embeds it and returns the closest lines by cosine similarity.
  • Storage: Index is stored in your project's .semantic-search/data/ (or set SEMANTIC_SEARCH_DATA_DIR). No cloud, no API keys.
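The search step can be illustrated with a minimal TypeScript sketch of ranking by cosine similarity. It assumes embeddings are already available as plain number arrays. The `IndexedLine` type and the `rankBySimilarity` helper are hypothetical names used only for illustration, and the brute-force scan is not meant to reflect how the server queries its SQLite index.

```typescript
// Hypothetical shape of one indexed line (illustrative, not the server's schema).
interface IndexedLine {
  file: string;
  line: number;
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every line against the query vector and keep the `limit` closest ones.
function rankBySimilarity(query: number[], lines: IndexedLine[], limit = 10) {
  return lines
    .map((l) => ({
      file: l.file,
      line: l.line,
      content: l.content,
      score: cosineSimilarity(query, l.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy 3-dimensional vectors stand in for real model embeddings.
const lines: IndexedLine[] = [
  { file: "dict.tsv", line: 1, content: "hot", embedding: [1, 0, 0] },
  { file: "dict.tsv", line: 2, content: "cold", embedding: [0, 1, 0] },
  { file: "dict.tsv", line: 3, content: "warm", embedding: [0.8, 0.6, 0] },
];

const top = rankBySimilarity([1, 0, 0], lines, 2);
console.log(top.map((r) => `${r.file}:${r.line} ${r.score.toFixed(2)}`));
```

The default `limit` of 10 mirrors the default of the search tool; the rest of the helper is an assumption made for the example.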

Requirements

  • Node.js 18+ (20+ recommended)
  • npm or pnpm

The first run downloads the embedding model (~80MB) and caches it locally.

Use in Cursor IDE

There is no build step and no need to run npm install yourself. The server runs only via npx tsx (TypeScript is run directly). Add a single command to MCP; on first run, npx will download the package and its dependencies, and the server will download the embedding model (~80MB) when you first index or search.

The package is published as @lojban/semantic-search-mcp. (To run from source before/without publishing, see the From source setup in the Development section.)

  1. Add the MCP server in Cursor:

    • Open Settings → Cursor Settings → MCP (or edit ~/.cursor/mcp.json).
    • Add:
    {
      "mcpServers": {
        "semantic-search": {
          "command": "npx",
          "args": ["-y", "@lojban/semantic-search-mcp"]
        }
      }
    }

    No cwd needed: the server stores its index in your project directory (.semantic-search/data/), so open your project in Cursor and the index is per-workspace. To use a fixed data directory instead, add "env": { "SEMANTIC_SEARCH_DATA_DIR": "/path/to/data" }. To have the server index specific directories on startup, set "env": { "SEMANTIC_SEARCH_INDEX_DIRS": "./dictionary,./glossary" } (comma-separated paths). If you omit SEMANTIC_SEARCH_INDEX_DIRS, the server will download and index the lojban/sampu_vlaste repo automatically.

  2. Restart Cursor (or reload the window). Indexing starts automatically in the background: from your configured SEMANTIC_SEARCH_INDEX_DIRS, or from the downloaded sampu_vlaste repo if that env is not set.

  3. In chat or Composer, ask the AI to use the tools:

    • Search: "Use semantic-search tool: find combinations of words that can express the concept of …", "Use semantic-search tool: search the index for …" or "Use semantic-search tool: Find entries similar to …"
    • Stats: "use semantic-search mcp. run get_index_stats" — stats include progress and start time (locale-formatted) when indexing is in progress.

The AI will call search and get_index_stats for you.
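The optional settings from step 1 can be combined into a single server entry. The sketch below builds that entry in TypeScript and prints it as JSON that could be pasted into ~/.cursor/mcp.json. The `buildServerEntry` helper, the `ServerOptions` type, and the example paths are hypothetical; only the command, the package name, and the two environment variable names come from the text above.

```typescript
// Hypothetical options object; each field maps to one documented environment variable.
interface ServerOptions {
  dataDir?: string; // becomes SEMANTIC_SEARCH_DATA_DIR
  indexDirs?: string[]; // becomes SEMANTIC_SEARCH_INDEX_DIRS (comma-separated)
}

interface ServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Assemble the "semantic-search" entry described in step 1.
function buildServerEntry(options: ServerOptions = {}) {
  const env: Record<string, string> = {};
  if (options.dataDir) {
    env.SEMANTIC_SEARCH_DATA_DIR = options.dataDir;
  }
  if (options.indexDirs && options.indexDirs.length > 0) {
    env.SEMANTIC_SEARCH_INDEX_DIRS = options.indexDirs.join(",");
  }

  const entry: ServerEntry = {
    command: "npx",
    args: ["-y", "@lojban/semantic-search-mcp"],
  };
  // Only emit "env" when something is set, so the default entry stays minimal.
  if (Object.keys(env).length > 0) {
    entry.env = env;
  }
  return { mcpServers: { "semantic-search": entry } };
}

const config = buildServerEntry({ indexDirs: ["./dictionary", "./glossary"] });
console.log(JSON.stringify(config, null, 2));
```

Calling `buildServerEntry()` with no options yields the same minimal entry shown in step 1.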

Use in other AI IDEs (Claude Code, etc.)

Any environment that supports MCP over stdio can use this server. Run:

  • One-liner: npx -y @lojban/semantic-search-mcp — dependencies are installed on first run; index is stored in the current working directory's .semantic-search/data/. Set env SEMANTIC_SEARCH_INDEX_DIRS (comma-separated paths) to index those directories on startup; if unset, the server downloads and indexes lojban/sampu_vlaste from GitHub. Tools: search, get_index_stats.

From source: Clone the repo and run npm install once. Then configure either "command": "npx", "args": ["tsx", "src/index.ts"], "cwd": "/path/to/semantic-search-mcp", or "command": "node", "args": ["/path/to/semantic-search-mcp/run.mjs"] (no cwd needed with the latter). See MCP_SETUP.md for details.
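For clients that speak MCP over stdio directly, a tool invocation is a JSON-RPC 2.0 `tools/call` request. A minimal sketch of the message for the search tool follows. The `buildSearchRequest` helper is a hypothetical name, and a real client would first complete the MCP `initialize` handshake, usually through an MCP SDK instead of hand-built messages.

```typescript
// Shape of an MCP tool invocation as a JSON-RPC 2.0 request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: build a call to the "search" tool.
function buildSearchRequest(id: number, query: string, limit?: number): ToolCallRequest {
  const args: Record<string, unknown> = { query };
  if (limit !== undefined) {
    args.limit = limit; // when omitted, the tool's documented default of 10 applies
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "search", arguments: args },
  };
}

// The stdio transport carries one JSON message per line.
const request = buildSearchRequest(1, "to cause to become warm", 20);
const wire = JSON.stringify(request) + "\n";
console.log(wire);
```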

MCP tools

| Tool | Description |
|------|-------------|
| search | Semantic search: query (string), optional limit (default 10). Returns file path, line number, content, and similarity score. |
| get_index_stats | Returns total number of indexed files and lines. When indexing is running in the background, also returns progress: indexing.started_at (locale-formatted), lines_indexed_so_far, files_indexed_so_far, and in_progress. |
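As a reading aid, the tool payloads can be modeled as TypeScript types. Only `started_at`, `lines_indexed_so_far`, `files_indexed_so_far`, and `in_progress` are named in the table; every other field name here (`file`, `line`, `content`, `score`, `total_files`, `total_lines`) is an assumption, so inspect the actual responses before relying on them.

```typescript
// Assumed shape of one search result (field names are guesses based on the table).
interface SearchHit {
  file: string;
  line: number;
  content: string;
  score: number;
}

// Assumed shape of get_index_stats; the `indexing` field names come from the table.
interface IndexStats {
  total_files: number;
  total_lines: number;
  indexing?: {
    started_at: string;
    lines_indexed_so_far: number;
    files_indexed_so_far: number;
    in_progress: boolean;
  };
}

// Render one hit as "path:line [score] content".
function formatHit(hit: SearchHit): string {
  return `${hit.file}:${hit.line} [${hit.score.toFixed(3)}] ${hit.content}`;
}

// Summarize stats, adding progress details only while indexing is running.
function describeStats(stats: IndexStats): string {
  const base = `${stats.total_files} files, ${stats.total_lines} lines indexed`;
  const progress = stats.indexing;
  if (progress && progress.in_progress) {
    return (
      `${base}; indexing since ${progress.started_at}, ` +
      `${progress.lines_indexed_so_far} lines from ${progress.files_indexed_so_far} files so far`
    );
  }
  return base;
}

console.log(formatHit({ file: "dictionary/jbo-eng.tsv", line: 12, content: "example entry", score: 0.8123 }));
console.log(describeStats({ total_files: 2, total_lines: 100 }));
```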

Indexing on startup

  • With your own dirs: Set the environment variable SEMANTIC_SEARCH_INDEX_DIRS to a comma-separated list of directories to index. When the MCP server starts, it begins indexing those directories in the background (async).
  • Default (no env set): If SEMANTIC_SEARCH_INDEX_DIRS is not set, the server downloads the lojban/sampu_vlaste repository from GitHub (as a zip), extracts it under .semantic-search/sampu_vlaste/, and indexes that. The download is cached; subsequent starts reuse the cached copy.
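The choice between the two startup modes can be sketched as a small pure function. This is an illustration, not the server's code: the `resolveIndexSource` name and the `IndexSource` type are hypothetical, and trimming whitespace and treating an empty value as unset are assumptions about edge cases the text does not cover.

```typescript
// Hypothetical result type: either user-supplied directories or the default repository.
type IndexSource =
  | { kind: "directories"; dirs: string[] }
  | { kind: "default-repo"; repo: string };

// Decide what to index from the environment, following the two cases above.
function resolveIndexSource(env: Record<string, string | undefined>): IndexSource {
  const raw = env.SEMANTIC_SEARCH_INDEX_DIRS ?? "";
  // Assumption: entries are trimmed and empty entries are dropped.
  const dirs = raw
    .split(",")
    .map((dir) => dir.trim())
    .filter((dir) => dir.length > 0);

  if (dirs.length > 0) {
    return { kind: "directories", dirs };
  }
  // Variable not set (or, by assumption, empty): fall back to the documented default.
  return { kind: "default-repo", repo: "lojban/sampu_vlaste" };
}

console.log(resolveIndexSource({ SEMANTIC_SEARCH_INDEX_DIRS: "./dictionary,./glossary" }));
console.log(resolveIndexSource({}));
```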

The index is cleared and rebuilt each time the server starts. Use absolute paths or paths relative to the server's working directory when setting SEMANTIC_SEARCH_INDEX_DIRS. The server reads and indexes all supported .txt, .md, .tsv, .csv files under each directory recursively. Indexing uses bounded memory and yields to the event loop so the OS stays responsive.
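A rough sketch of the indexing loop described here: filter files by extension, split text into non-empty lines, and process them in fixed-size batches while yielding to the event loop. All function names are hypothetical. Case-insensitive extension matching, skipping whitespace-only lines, and the batch size are assumptions, not documented behavior.

```typescript
const SUPPORTED_EXTENSIONS = [".txt", ".md", ".tsv", ".csv"];

// Assumption: extensions are matched case-insensitively.
function isSupportedFile(name: string): boolean {
  const lower = name.toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Split file text into records with 1-based line numbers.
// Assumption: whitespace-only lines count as empty and are skipped.
function nonEmptyLines(text: string): { line: number; content: string }[] {
  return text
    .split(/\r?\n/)
    .map((content, index) => ({ line: index + 1, content }))
    .filter((record) => record.content.trim().length > 0);
}

// Handle items in fixed-size batches so only one batch is in flight at a time,
// pausing after each batch so other event-loop work can run.
async function processInBatches<T>(
  items: T[],
  batchSize: number,
  handle: (batch: T[]) => Promise<void> | void,
): Promise<number> {
  let batches = 0;
  for (let start = 0; start < items.length; start += batchSize) {
    await handle(items.slice(start, start + batchSize));
    batches++;
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
  return batches;
}

const records = nonEmptyLines("first entry\n\nsecond entry\n");
processInBatches(records, 1, (batch) => {
  console.log(batch.map((r) => `${r.line}: ${r.content}`).join(", "));
}).then((count) => console.log(`${count} batches processed`));
```

In a real indexer the `handle` callback would embed each batch and write it to the database; here it only logs the records.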

Example: Lojban dictionary gaps

  1. Put your dictionary TSV (e.g. jbo-eng.tsv) in a folder (e.g. ./dictionary).
  2. Set SEMANTIC_SEARCH_INDEX_DIRS=./dictionary in your MCP config (or in the environment). Restart the server; indexing runs in the background.
  3. In Cursor: "Search for entries similar to 'to cause to become warm' and limit 20."
  4. Or: "Search for 'emotional state of joy' and show me what we have; then suggest word combinations the dictionary might be missing."

The index is stored in .semantic-search/data/vectors.db, relative to your project root. Restart the server to re-index when you add or change files.
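In the workflow above the AI judges what counts as a gap. One hypothetical way to make that judgment explicit is to flag a query as a likely gap when even its best match scores below a threshold. The `isLikelyGap` helper, the `Hit` shape, and the 0.5 threshold are all assumptions for illustration, and the sketch assumes a higher score means a closer match.

```typescript
// Minimal view of a search result; field names are assumptions.
interface Hit {
  content: string;
  score: number;
}

// A query is treated as a likely gap when no hit reaches the threshold.
// An empty result list counts as a gap as well.
function isLikelyGap(hits: Hit[], threshold = 0.5): boolean {
  const best = hits.reduce((max, hit) => Math.max(max, hit.score), -Infinity);
  return best < threshold;
}

const covered: Hit[] = [
  { content: "close match", score: 0.82 },
  { content: "loose match", score: 0.41 },
];
const uncovered: Hit[] = [{ content: "loose match", score: 0.31 }];

console.log(isLikelyGap(covered)); // the best hit is above the threshold
console.log(isLikelyGap(uncovered)); // no hit reaches the threshold
```

A useful threshold depends on the embedding model and the data, so it would need tuning against entries known to be present or missing.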

Development

The server is not compiled to JavaScript; it runs directly via npx tsx src/index.ts or node run.mjs. There is no tsc build step and no dist/ output.

From source (e.g. before publishing to npm):

  1. Run npm install once in the repo.
  2. In MCP config use either:
    • "command": "npx", "args": ["tsx", "src/index.ts"], "cwd": "/path/to/semantic-search-mcp", or
    • "command": "node", "args": ["/path/to/semantic-search-mcp/run.mjs"] (run.mjs sets cwd automatically; see MCP_SETUP.md).

To run the server from the repo: npm run dev or npx tsx src/index.ts.

License

MIT