semsearch

v0.4.1

πŸ” AI-powered semantic search library with Azure OpenAI embeddings, intelligent reranking, and local SQLite storage. Find documents by meaning, not just keywords.

πŸ” File Semantic Search Library

A standalone file semantic search library with Azure OpenAI integration, featuring intelligent file indexing, vector similarity search, and AI-powered reranking.

✨ Features

  • Setup Wizard: Interactive guided setup with rich terminal interface
  • Azure OpenAI Integration: Text embeddings + GPT-4 reranking
  • SQLite Storage: Persistent vector database with efficient retrieval
  • Smart File Indexing: Automatic file discovery, summarization, and embedding
  • File Semantic Search: Intent-aware search beyond keyword matching across your documents
  • Flexible Auth: API key or Azure AD authentication
  • MCP Server: Model Context Protocol integration for AI assistants
  • Performance Optimized: sqlite-vec integration for fast vector search
  • Customizable: Configurable prompts, thresholds, and processing options
  • Comprehensive: Detailed status reporting and database statistics

πŸš€ Quick Start

1. Initialize (First Time Setup)

npx semsearch init

The setup wizard will guide you through:

  • Azure OpenAI endpoint configuration
  • Authentication method (API key vs Azure AD)
  • Model deployment selection
  • Database location setup
  • Connection testing

2. Index Your Document Files

npx semsearch index ./my-documents

3. Search Your Files

npx semsearch search "machine learning algorithms"

πŸ“‹ Commands

| Command | Description |
|---------|-------------|
| init | Run setup wizard (first time) |
| status | Show configuration and database status |
| index <path> | Index files for search |
| search <query> | Search indexed files |
| info <id> | Get detailed file information |
| reset | Reset configuration and/or database |
| prompts | Manage AI prompts for summarization and reranking |
| test-connection | Test Azure OpenAI connectivity |
| mcp | Start MCP server for AI assistant integration |

Command Options

Global Options

  • --endpoint <url> - Azure OpenAI endpoint URL
  • --api-key <key> - Azure OpenAI API key (if not using managed identity)
  • --embedding-model <model> - Azure OpenAI embedding deployment name (default: text-embedding-ada-002)
  • --llm-model <model> - Azure OpenAI LLM deployment name for reranking (default: gpt-4.1-mini)
  • -v, --verbose - Show detailed progress information

status [options]

  • --db <path> - SQLite database path

reset [options]

  • --force - Skip confirmation prompt
  • --db-only - Reset only the database, keep configuration

index <path> [options]

  • --db <path> - SQLite database path
  • --maxChars <n> - Max characters to read per file (default: 50000)
  • --force - Reprocess all files even if already indexed
  • -c, --concurrency <n> - Number of files to process in parallel (default: 3)
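
The --concurrency option simply bounds how many files are summarized and embedded at once. A minimal sketch of that pattern, assuming a generic worker function (illustrative only, not the library's actual implementation):

// Illustrative worker pool: at most `limit` files are in flight at a time,
// which is what "--concurrency 3" means conceptually.
async function processWithConcurrency<T>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<void>
): Promise<void> {
  const queue = [...items];
  const runners = Array.from({ length: limit }, async () => {
    // Each runner pulls the next pending item until the queue is drained.
    while (queue.length > 0) {
      const item = queue.shift()!;
      await worker(item);
    }
  });
  await Promise.all(runners);
}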

search <query> [options]

  • --db <path> - SQLite database path
  • --topK <n> - Maximum number of results to return (default: 5)
  • --min-similarity <n> - Minimum cosine similarity threshold 0.0-1.0, filters vector matches (default: 0.1)
  • --min-score <n> - Minimum final score threshold 0-100, filters reranked results (default: 0)
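
To make the two thresholds concrete: --min-similarity filters raw vector matches before reranking, while --min-score filters the reranked results. A rough sketch (the field names here are assumptions for illustration, not the library's actual types):

// Two independent filters: cosine similarity (0.0-1.0) from the vector stage,
// final score (0-100) from the reranking stage. Field names are illustrative.
interface RankedResult {
  file: string;
  similarity: number;  // vector-stage cosine similarity
  finalScore: number;  // reranker score
}

function applyThresholds(
  results: RankedResult[],
  minSimilarity = 0.1,
  minScore = 0
): RankedResult[] {
  return results
    .filter(r => r.similarity >= minSimilarity)
    .filter(r => r.finalScore >= minScore);
}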

info <id> [options]

  • --db <path> - SQLite database path

prompts [options]

  • --edit - Edit prompts interactively
  • --reset - Reset prompts to defaults

test-connection [options]

  • --endpoint <url> - Azure OpenAI endpoint URL
  • --api-key <key> - Azure OpenAI API key

Example Usage

# Basic commands
npx semsearch init                              # Interactive setup
npx semsearch index ./documents                 # Index documents
npx semsearch search "machine learning"         # Basic search

# Advanced indexing options
npx semsearch index ./docs --maxChars 100000 --concurrency 5 --force

# Advanced search with filtering
npx semsearch search "database design" --topK 10 --min-similarity 0.7 --min-score 80

# Use custom database location
npx semsearch search "query" --db ./custom/index.db

# Reset database only (keep configuration)
npx semsearch reset --db-only --force

# Edit AI prompts
npx semsearch prompts --edit

# Test connection with specific credentials
npx semsearch test-connection --endpoint https://your-endpoint.com --api-key your-key

πŸ€– MCP Server Integration

The MCP (Model Context Protocol) server allows AI assistants to access your semantic search functionality directly.

Start MCP Server

npx semsearch mcp
# OR use the direct binary
npx semsearch-mcp

This starts a JSON-RPC server using stdio transport that provides:

πŸ› οΈ Tools

  • search: Perform semantic search across indexed files
    • Parameters: query (string), topK (number, default: 5), minScore (number, default: 0)
  • get_stats: Get database statistics (file count, size, etc.)

πŸ“š Resources

  • file://<id>: Access individual indexed files by their UUID
  • Dynamic resource list based on your indexed files
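
For reference, calling the search tool is just a JSON-RPC 2.0 message written to the server's stdin, one message per line. The sketch below shows the request shape only; an MCP-capable assistant normally constructs this for you, and the exact response payload is defined by the server:

// A tools/call request for the "search" tool, serialized over stdio.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "search",
    arguments: { query: "database design", topK: 5, minScore: 0 }
  }
};
// e.g. mcpProcess.stdin.write(JSON.stringify(request) + "\n");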

AI Assistant Integration

Configure your AI assistant (Claude Desktop, VS Code Copilot, etc.) to connect to the MCP server:

{
  "mcpServers": {
    "semsearch": {
      "command": "npx",
      "args": ["semsearch@latest", "mcp"],
      "type": "stdio"
    }
  }
}
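
For Claude Desktop, this block typically goes under the mcpServers key in its claude_desktop_config.json; other MCP-capable clients keep server configuration in their own settings files, so check your assistant's documentation for the exact location.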

Note: The MCP server automatically uses your configured database and settings from the default locations. No environment variables needed!

The AI assistant can then:

  • Search your files semantically
  • Retrieve specific files by ID
  • Get database statistics
  • Access your knowledge base contextually

πŸ”§ Configuration

The setup wizard creates two configuration files:

.env (Environment Variables)

AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your_api_key_here
AZURE_OPENAI_EMBED_DEPLOYMENT=text-embedding-ada-002
AZURE_OPENAI_RERANK_DEPLOYMENT=gpt-4.1-mini
SEMSEARCH_DEFAULT_DB=./.data/index.db

.semsearch.json (Configuration Metadata)

{
  "version": "0.3.0",
  "setupDate": "2025-09-14T08:24:01.945Z",
  "authentication": "api-key",
  "endpoints": {
    "azure": "https://your-resource.openai.azure.com/"
  },
  "deployments": {
    "embedding": "text-embedding-ada-002",
    "rerank": "gpt-4.1-mini"
  },
  "storage": {
    "defaultDatabase": "./.data/index.db"
  }
}

🎨 CLI Examples

Status Check

$ npx semsearch status
πŸ“Š Semantic Search Status

Configuration:
  Endpoint: https://your-resource.openai.azure.com/
  Auth: API Key
  Embedding: text-embedding-ada-002
  Rerank: gpt-4.1-mini

Database:
  Path: ./.data/index.db
  Size: 60.0 KB
  Modified: 2025-09-14T08:25:13.398Z
  Documents: 5

Config Files:
  .env: βœ“
  .semsearch.json: βœ“

Search Results

$ npx semsearch search "database design"
πŸ” Searching for: database design

Found 5 results:

1. 100.0 database-design.md (eded72e8...)
   The text outlines fundamental database design principles, including 
   normalization, data integrity, performance optimization, and security...

2. 15.0 semantic-search.md (a9408b5e...)
   The text discusses semantic search technology, which leverages natural
   language processing and machine learning...

# Search with minimum similarity threshold
$ npx semsearch search "machine learning" --min-similarity 0.7
πŸ” Searching for: machine learning
   Minimum cosine similarity: 0.70

Found 2 results:

# Search with minimum final score threshold  
$ npx semsearch search "database" --min-score 80
πŸ” Searching for: database
   Minimum final score: 80

Found 1 result:

πŸ—οΈ Architecture

  • Azure OpenAI: Text embeddings (text-embedding-ada-002) + reranking (gpt-4.1-mini)
  • SQLite Database: Vector storage with sqlite-vec extension and WAL mode
  • TypeScript: Fully typed with strict compilation
  • CLI Framework: Commander.js with inquirer, chalk, ora, boxen for UX
  • Binary Entry Points:
    • semsearch - Main CLI interface
    • semsearch-mcp - Direct MCP server launcher
  • MCP Integration: JSON-RPC over stdio transport for AI assistant integration

πŸ” Authentication Options

API Key

  • Simple setup through init wizard
  • Stored securely in .env file
  • Immediate access without additional configuration

Azure AD

  • Uses DefaultAzureCredential
  • Supports managed identities
  • Enhanced security for enterprise environments

πŸ“Š Performance

  • Embedding Model: text-embedding-ada-002 (1536 dimensions)
  • Search Strategy: Vector similarity + AI reranking
  • Storage: SQLite with sqlite-vec extension for optimized vector operations
  • Retrieval: Sub-linear vector search with fallback to brute-force when needed
  • Optimization: Dynamic vector table creation with dimension detection
  • Concurrency: Configurable parallel processing for indexing operations
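
Since stored embeddings are normalized (see Notes below), cosine similarity reduces to a plain dot product. A minimal sketch of the brute-force comparison path, assuming normalized Float32Array vectors (illustrative, not the library's actual code):

// Cosine similarity of two already-normalized vectors is their dot product.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

// Brute-force ranking: score every stored embedding against the query and keep
// the top K indices. sqlite-vec performs the same comparison more efficiently.
function bruteForceTopK(query: Float32Array, docs: Float32Array[], k: number): number[] {
  return docs
    .map((d, i) => ({ i, score: cosineSimilarity(query, d) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(r => r.i);
}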

πŸ’» Library Usage

import { SemanticStore } from 'semsearch';
import { DefaultAzureCredential } from '@azure/identity';

const store = new SemanticStore({
  azure: {
    endpoint: process.env.AZURE_OPENAI_ENDPOINT || 'https://your-resource.openai.azure.com/',
    embeddingDeployment: process.env.AZURE_OPENAI_EMBED_DEPLOYMENT || 'text-embedding-ada-002',
    rerankDeployment: process.env.AZURE_OPENAI_RERANK_DEPLOYMENT || 'gpt-4.1-mini'
  },
  credential: new DefaultAzureCredential(),
  sqlite: { path: './.data/index.db' },
  summarizer: { maxChars: 50000 }
});

await store.indexPath('./docs');  // Index all files in ./docs directory
const results = await store.search('How to configure X?', { topK: 10 });
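
Because the example passes DefaultAzureCredential, credentials are resolved at runtime from the environment (environment variables, a managed identity, or an Azure CLI login), so the same snippet can run unchanged on a developer machine and inside Azure. If you configured API-key authentication through the init wizard instead, the key lives in .env as shown in the Configuration section above.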

πŸ› οΈ Development

Build from Source

npm install
npm run build

Project Structure

src/
β”œβ”€β”€ cli.ts                  # Command-line interface with all commands
β”œβ”€β”€ init-wizard.ts          # Setup wizard with interactive UX
β”œβ”€β”€ store.ts                # Main SemanticStore class
β”œβ”€β”€ azure.ts                # Azure OpenAI client management
β”œβ”€β”€ sqlite.ts               # SQLite vector database with sqlite-vec
β”œβ”€β”€ nlp.ts                  # NLP operations (embed, summarize, rerank)
β”œβ”€β”€ mcp-server-simple.ts    # MCP server implementation (JSON-RPC over stdio)
β”œβ”€β”€ config-manager.ts       # Configuration file management
β”œβ”€β”€ indexing-progress.ts    # Progress tracking for indexing operations
β”œβ”€β”€ prompts-manager.ts      # AI prompt management and customization
β”œβ”€β”€ types.ts                # TypeScript type definitions
└── utils.ts                # Utility functions

πŸ“ Notes

  • Embeddings are normalized Float32 arrays stored as SQLite BLOBs
  • Uses sqlite-vec extension for optimized vector search with brute-force fallback
  • Files already present in the database are skipped (no automatic change detection)
  • Use --force flag to reprocess all files regardless of database state
  • WAL mode enabled for better SQLite performance
  • Dual storage: main files table + virtual vec_files table for vector operations
  • Supports text files with automatic MIME type detection
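
As a rough illustration of the storage notes above (a sketch, not the actual code in src/sqlite.ts), a normalized embedding can round-trip to and from a SQLite BLOB like this:

// Serialize a Float32Array embedding to a Buffer for BLOB storage, and back.
function embeddingToBlob(vec: Float32Array): Buffer {
  return Buffer.from(vec.buffer, vec.byteOffset, vec.byteLength);
}

function blobToEmbedding(blob: Buffer): Float32Array {
  // Copy into a fresh buffer so the Float32Array view is correctly aligned.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer);
}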

πŸ“ License

MIT