@tan-yong-sheng/code-context-core

v1.0.0

Core indexing engine for Code Context

The core indexing engine for Code Context - a powerful tool for semantic search and analysis of codebases using vector embeddings and AI.

📖 New to Code Context? Check out the main project README for an overview and quick start guide.

Installation

npm install @tan-yong-sheng/code-context-core

Prepare Environment Variables

Option 1: SQLite-vec (Recommended - Zero Config)

No additional configuration needed! sqlite-vec uses local SQLite files for vector storage.

# Optional: Custom directory for vector databases (defaults to ~/.code-context/vectors)
VECTOR_DB_PATH=/custom/path/to/vectors
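
As a rough illustration of the default described in the comment above, the sketch below resolves the storage directory from `VECTOR_DB_PATH` and falls back to `~/.code-context/vectors`. The helper name `resolveVectorDbPath` is hypothetical and not part of the package; the package's own resolution logic may differ.

```typescript
import * as os from 'os';
import * as path from 'path';

// Hypothetical helper (not a package API): pick the vector storage directory
// from VECTOR_DB_PATH, falling back to the documented default location.
function resolveVectorDbPath(env: Record<string, string | undefined>): string {
  const custom = env.VECTOR_DB_PATH;
  if (custom && custom.trim() !== '') {
    return path.resolve(custom);
  }
  return path.join(os.homedir(), '.code-context', 'vectors');
}

console.log(resolveVectorDbPath(process.env));
```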

Option 2: OpenAI API key (for embeddings)

See the OpenAI documentation for details on obtaining an API key.

OPENAI_API_KEY=your-openai-api-key

💡 Tip: For easier configuration management across different usage scenarios, consider using global environment variables.

Quick Start

Option 1: SQLite-vec (Recommended - Zero Config)

The easiest way to get started with local vector storage using SQLite:

import {
  Context,
  OpenAIEmbedding,
  SqliteVecVectorDatabase
} from '@tan-yong-sheng/code-context-core';

// Initialize embedding provider
const embedding = new OpenAIEmbedding({
  apiKey: process.env.OPENAI_API_KEY || 'your-openai-api-key',
  model: 'text-embedding-3-small'
});

// Initialize sqlite-vec vector database (zero config!)
const vectorDatabase = new SqliteVecVectorDatabase();

// Create context instance
const context = new Context({
  embedding,
  vectorDatabase
});

// Index a codebase
const stats = await context.indexCodebase('./my-project', (progress) => {
  console.log(`${progress.phase} - ${progress.percentage}%`);
});

console.log(`Indexed ${stats.indexedFiles} files with ${stats.totalChunks} chunks`);

// Search the codebase
const results = await context.semanticSearch(
  './my-project',
  'function that handles user authentication',
  5
);

results.forEach(result => {
  console.log(`${result.relativePath}:${result.startLine}-${result.endLine}`);
  console.log(`Score: ${result.score}`);
  console.log(result.content);
});

Features

  • Multi-language Support: Index TypeScript, JavaScript, Python, Java, C++, and many other programming languages
  • Semantic Search: Find code using natural language queries powered by AI embeddings
  • Flexible Architecture: Pluggable embedding providers and vector databases
  • Smart Chunking: Intelligent code splitting that preserves context and structure
  • Batch Processing: Efficient processing of large codebases with progress tracking
  • Pattern Matching: Built-in ignore patterns for common build artifacts and dependencies
  • Incremental File Synchronization: Efficient change detection using Merkle trees to only re-index modified files

Embedding Providers

  • OpenAI Embeddings (text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002)
  • VoyageAI Embeddings - High-quality embeddings optimized for code (voyage-code-3, voyage-3.5, etc.)
  • Gemini Embeddings - Google's embedding models (gemini-embedding-001)
  • Ollama Embeddings - Local embedding models via Ollama

Vector Database Support

  • SQLite-vec - Zero-config local vector database using SQLite
    • Stores vectors in local SQLite files
    • No external dependencies or services
    • Hybrid search with FTS5 support
    • Cross-platform (Linux, macOS, Windows)

Code Splitters

  • AST Code Splitter - AST-based code splitting with automatic fallback (default)
  • LangChain Code Splitter - Character-based code chunking
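
To give a feel for what character-based chunking means, here is a deliberately simplified sketch. It is not the LangChain splitter and not this package's implementation; the `Chunk` shape and the function name are assumptions made for illustration. It groups whole lines into chunks that stay within a character budget and records 1-based line ranges.

```typescript
// Illustrative chunk shape (assumed, not the package's type).
interface Chunk {
  content: string;
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
}

// Group consecutive lines into chunks of at most maxChars characters.
// A single line longer than maxChars still becomes its own chunk.
function splitByCharacterBudget(source: string, maxChars: number): Chunk[] {
  const lines = source.split('\n');
  const chunks: Chunk[] = [];
  let buffer: string[] = [];
  let startLine = 1;
  let size = 0;

  lines.forEach((line, i) => {
    const lineSize = line.length + 1; // +1 accounts for the newline
    if (buffer.length > 0 && size + lineSize > maxChars) {
      // Close the current chunk; it ends on the previous line (1-based i).
      chunks.push({ content: buffer.join('\n'), startLine, endLine: i });
      buffer = [];
      startLine = i + 1;
      size = 0;
    }
    buffer.push(line);
    size += lineSize;
  });

  if (buffer.length > 0) {
    chunks.push({ content: buffer.join('\n'), startLine, endLine: lines.length });
  }
  return chunks;
}
```

An AST-based splitter would instead cut at syntactic boundaries such as functions and classes, which is presumably why it is the default.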

Configuration

ContextConfig

interface ContextConfig {
  embedding?: Embedding;           // Embedding provider
  vectorDatabase?: VectorDatabase; // Vector database instance (optional in the type, but required in practice)
  codeSplitter?: Splitter;        // Code splitting strategy
  supportedExtensions?: string[]; // File extensions to index
  ignorePatterns?: string[];      // Patterns to ignore
  customExtensions?: string[];    // Custom extensions from MCP
  customIgnorePatterns?: string[]; // Custom ignore patterns from MCP
}

Supported File Extensions (Default)

[
  // Programming languages
  '.ts', '.tsx', '.js', '.jsx', '.py', '.java', '.cpp', '.c', '.h', '.hpp',
  '.cs', '.go', '.rs', '.php', '.rb', '.swift', '.kt', '.scala', '.m', '.mm',
  // Text and markup files  
  '.md', '.markdown', '.ipynb'
]

Default Ignore Patterns

  • Build and dependency directories: node_modules/**, dist/**, build/**, out/**, target/**
  • Version control: .git/**, .svn/**, .hg/**
  • IDE files: .vscode/**, .idea/**, *.swp, *.swo
  • Cache directories: .cache/**, __pycache__/**, .pytest_cache/**, coverage/**
  • Minified files: *.min.js, *.min.css, *.bundle.js, *.map
  • Log and temp files: logs/**, tmp/**, temp/**, *.log
  • Environment files: .env, .env.*, *.local
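
The sketch below shows one way patterns like these could be evaluated against a relative path. The matching semantics are an assumption for illustration (`dir/**` ignores a directory of that name at any depth; everything else is matched against the file name with `*` as a wildcard), and `isIgnored` is a hypothetical helper, not a package API.

```typescript
// Convert a file-name pattern such as "*.min.js" or ".env.*" into a RegExp.
function basenamePatternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+^${}()|[\]\\?]/g, '\\$&') // escape regex metacharacters except '*'
    .replace(/\*/g, '.*');                  // '*' matches any run of characters
  return new RegExp('^' + escaped + '$');
}

// Assumed semantics: "dir/**" ignores everything below a directory named
// "dir"; any other pattern is tested against the file name only.
function isIgnored(relativePath: string, patterns: string[]): boolean {
  const segments = relativePath.split('/');
  const fileName = segments[segments.length - 1];
  const dirs = segments.slice(0, -1);
  for (let i = 0; i < patterns.length; i++) {
    const pattern = patterns[i];
    if (pattern.slice(-3) === '/**') {
      if (dirs.indexOf(pattern.slice(0, -3)) !== -1) return true;
    } else if (basenamePatternToRegExp(pattern).test(fileName)) {
      return true;
    }
  }
  return false;
}

const samplePatterns = ['node_modules/**', 'dist/**', '*.min.js', '.env', '.env.*'];
console.log(isIgnored('node_modules/lodash/index.js', samplePatterns));
console.log(isIgnored('src/index.ts', samplePatterns));
```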

API Reference

Context

Methods

  • indexCodebase(path, progressCallback?, forceReindex?) - Index an entire codebase
  • reindexByChange(path, progressCallback?) - Incrementally re-index only changed files
  • semanticSearch(path, query, topK?, threshold?, filterExpr?) - Search indexed code semantically
  • hasIndex(path) - Check if codebase is already indexed
  • clearIndex(path, progressCallback?) - Remove index for a codebase
  • updateIgnorePatterns(patterns) - Update ignore patterns
  • addCustomIgnorePatterns(patterns) - Add custom ignore patterns
  • addCustomExtensions(extensions) - Add custom file extensions
  • updateEmbedding(embedding) - Switch embedding provider
  • updateVectorDatabase(vectorDB) - Switch vector database
  • updateSplitter(splitter) - Switch code splitter
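
A common way to combine these methods is to run a full index only when none exists and an incremental update otherwise. The sketch below is written against a small structural interface rather than the real `Context` class; the return types in `IndexerLike` are assumptions, since the list above does not state them, and `ensureFreshIndex` is a hypothetical helper.

```typescript
// Structural type covering only the three methods used here.
// Return types are assumptions made for this sketch.
interface IndexerLike {
  hasIndex(path: string): boolean | Promise<boolean>;
  indexCodebase(path: string): Promise<unknown>;
  reindexByChange(path: string): Promise<unknown>;
}

// Run a full index the first time, and an incremental re-index afterwards.
async function ensureFreshIndex(
  ctx: IndexerLike,
  codebasePath: string
): Promise<'full' | 'incremental'> {
  if (await ctx.hasIndex(codebasePath)) {
    await ctx.reindexByChange(codebasePath);
    return 'incremental';
  }
  await ctx.indexCodebase(codebasePath);
  return 'full';
}
```

Assuming the `Context` instance from the Quick Start satisfies this shape, it could be passed directly as `ensureFreshIndex(context, './my-project')`.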

Search Results

interface SemanticSearchResult {
  content: string;      // Code content
  relativePath: string; // File path relative to codebase root
  startLine: number;    // Starting line number
  endLine: number;      // Ending line number
  language: string;     // Programming language
  score: number;        // Similarity score (0-1)
}
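
As a small usage sketch for this result shape, the helper below drops low-scoring hits and formats the rest as one line each. The interface is repeated so the snippet stands alone; `formatResults` and the `minScore` cut-off are illustrative and not part of the package.

```typescript
interface SemanticSearchResult {
  content: string;
  relativePath: string;
  startLine: number;
  endLine: number;
  language: string;
  score: number;
}

// Keep results at or above minScore, best first, formatted as
// "path:start-end [language] score".
function formatResults(results: SemanticSearchResult[], minScore: number): string[] {
  return results
    .filter((r) => r.score >= minScore) // filter() copies, so the input is not reordered
    .sort((a, b) => b.score - a.score)
    .map((r) => `${r.relativePath}:${r.startLine}-${r.endLine} [${r.language}] ${r.score.toFixed(2)}`);
}
```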

Examples

Using SQLite-vec with Local Embeddings (Ollama)

import {
  Context,
  SqliteVecVectorDatabase,
  OllamaEmbedding
} from '@tan-yong-sheng/code-context-core';

// Use Ollama for local embeddings (no API keys needed!)
const embedding = new OllamaEmbedding({
  model: 'nomic-embed-text',
  baseUrl: 'http://localhost:11434'
});

// sqlite-vec for local vector storage
const vectorDatabase = new SqliteVecVectorDatabase();

const context = new Context({
  embedding,
  vectorDatabase
});

// Index and search completely offline!
await context.indexCodebase('./my-project');
const results = await context.semanticSearch('./my-project', 'authentication');

Using VoyageAI Embeddings

import {
  Context,
  SqliteVecVectorDatabase,
  VoyageAIEmbedding
} from '@tan-yong-sheng/code-context-core';

// Initialize with VoyageAI embedding provider
const embedding = new VoyageAIEmbedding({
  apiKey: process.env.VOYAGEAI_API_KEY || 'your-voyageai-api-key',
  model: 'voyage-code-3'
});

// sqlite-vec for local vector storage (zero config!)
const vectorDatabase = new SqliteVecVectorDatabase();

const context = new Context({
  embedding,
  vectorDatabase
});

Custom File Filtering

import { Context, SqliteVecVectorDatabase, OpenAIEmbedding } from '@tan-yong-sheng/code-context-core';

const embedding = new OpenAIEmbedding({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small'
});

const vectorDatabase = new SqliteVecVectorDatabase();

const context = new Context({
  embedding,
  vectorDatabase,
  supportedExtensions: ['.ts', '.js', '.py', '.java'],
  ignorePatterns: [
    'node_modules/**',
    'dist/**',
    '*.spec.ts',
    '*.test.js'
  ]
});

File Synchronization Architecture

Code Context implements a file synchronization system that tracks and processes only the files that have changed since the last indexing operation, so re-indexing a large codebase does not repeat work for unchanged files.

How It Works

The file synchronization system uses a Merkle tree-based approach combined with SHA-256 file hashing to detect changes:

1. File Hashing

  • Each file in the codebase is hashed using SHA-256
  • File hashes are computed based on file content, not metadata
  • Hashes are stored with relative file paths for consistency across different environments
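
A minimal sketch of content-based hashing with Node's built-in `crypto` module follows. The function names are hypothetical, and the package's own hashing code may be organized differently; the point is only that the digest depends on file bytes and is keyed by relative path.

```typescript
import { createHash } from 'crypto';
import * as fs from 'fs';
import * as path from 'path';

// Hash raw content with SHA-256 and return the hex digest.
function hashContent(content: string | Buffer): string {
  return createHash('sha256').update(content).digest('hex');
}

// Hash each listed file, keyed by its path relative to the codebase root.
// Only file bytes are read; timestamps and permissions play no part.
function hashFiles(rootDir: string, relativePaths: string[]): Record<string, string> {
  const hashes: Record<string, string> = {};
  relativePaths.forEach((relativePath) => {
    const bytes = fs.readFileSync(path.join(rootDir, relativePath));
    hashes[relativePath] = hashContent(bytes);
  });
  return hashes;
}
```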

2. Merkle Tree Construction

  • All file hashes are organized into a Merkle tree structure
  • The tree provides a single root hash that represents the entire codebase state
  • Any change to any file will cause the root hash to change
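
The sketch below builds a root hash from a map of file hashes using a conventional binary Merkle tree. This is an assumption about the general technique, not a description of the package's internal tree layout; the leaf encoding (`path:hash`) and the rule of pairing an odd node with itself are choices made for this example.

```typescript
import { createHash } from 'crypto';

function sha256(data: string): string {
  return createHash('sha256').update(data).digest('hex');
}

// fileHashes maps relative path -> content hash.
function merkleRoot(fileHashes: Record<string, string>): string {
  // Sort by path so the root does not depend on directory traversal order.
  const paths = Object.keys(fileHashes).sort();
  if (paths.length === 0) return sha256('');

  // Leaves bind each hash to its path, so renames also change the root.
  let level = paths.map((p) => sha256(p + ':' + fileHashes[p]));

  // Hash adjacent pairs until a single root remains.
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      const right = i + 1 < level.length ? level[i + 1] : left; // odd node pairs with itself
      next.push(sha256(left + right));
    }
    level = next;
  }
  return level[0];
}
```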

3. Snapshot Management

  • File synchronization state is persisted to the ~/.context/merkle/ directory
  • Each codebase gets a unique snapshot file based on its absolute path hash
  • Snapshots contain both file hashes and serialized Merkle tree data
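
One way to derive a per-codebase snapshot file is sketched below. The directory comes from the text above; the use of SHA-256 for the path hash and the `.json` extension are assumptions, and `snapshotPathFor` is a hypothetical name.

```typescript
import { createHash } from 'crypto';
import * as os from 'os';
import * as path from 'path';

// Map a codebase location to a stable snapshot file under ~/.context/merkle/.
// Hash algorithm and file extension are assumptions for this sketch.
function snapshotPathFor(codebasePath: string): string {
  const absolute = path.resolve(codebasePath); // normalize relative paths first
  const id = createHash('sha256').update(absolute).digest('hex');
  return path.join(os.homedir(), '.context', 'merkle', id + '.json');
}

console.log(snapshotPathFor('./my-project'));
```

Resolving to an absolute path first means `./my-project` and its full path map to the same snapshot.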

4. Change Detection Process

  1. Quick Check: Compare current Merkle root hash with stored snapshot
  2. Detailed Analysis: If root hashes differ, perform file-by-file comparison
  3. Change Classification: Categorize changes into three types:
    • Added: New files that didn't exist before
    • Modified: Existing files with changed content
    • Removed: Files that were deleted from the codebase
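
The three steps above can be sketched as a comparison of two snapshots. The `Snapshot` and `ChangeSet` shapes are assumptions for illustration; the package's stored format may differ.

```typescript
type FileHashes = Record<string, string>; // relative path -> content hash

interface Snapshot {
  rootHash: string;
  fileHashes: FileHashes;
}

interface ChangeSet {
  added: string[];
  modified: string[];
  removed: string[];
}

function has(map: FileHashes, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(map, key);
}

function detectChanges(previous: Snapshot, current: Snapshot): ChangeSet {
  const changes: ChangeSet = { added: [], modified: [], removed: [] };

  // 1. Quick check: identical roots mean nothing changed.
  if (previous.rootHash === current.rootHash) return changes;

  // 2 and 3. File-by-file comparison and classification.
  Object.keys(current.fileHashes).sort().forEach((file) => {
    if (!has(previous.fileHashes, file)) {
      changes.added.push(file);
    } else if (previous.fileHashes[file] !== current.fileHashes[file]) {
      changes.modified.push(file);
    }
  });
  Object.keys(previous.fileHashes).sort().forEach((file) => {
    if (!has(current.fileHashes, file)) changes.removed.push(file);
  });
  return changes;
}
```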

5. Incremental Updates

  • Only process files that have actually changed
  • Update vector database entries only for modified chunks
  • Remove entries for deleted files
  • Add entries for new files
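
To make the update rules concrete, the sketch below applies a change set to a toy in-memory index. The `ChunkIndex` type stands in for the vector database and `chunkFile` for the split-and-embed step; both are hypothetical and far simpler than the real pipeline.

```typescript
interface ChangeSet {
  added: string[];
  modified: string[];
  removed: string[];
}

// Toy stand-in for the vector database: relative path -> stored chunks.
type ChunkIndex = Record<string, string[]>;

// Return a new index in which only changed files were reprocessed.
function applyChanges(
  index: ChunkIndex,
  changes: ChangeSet,
  chunkFile: (file: string) => string[]
): ChunkIndex {
  const next: ChunkIndex = {};
  Object.keys(index).forEach((file) => { next[file] = index[file]; });

  // Deleted files lose their entries.
  changes.removed.forEach((file) => { delete next[file]; });

  // New and modified files are (re)chunked; untouched files are left alone.
  changes.added.concat(changes.modified).forEach((file) => {
    next[file] = chunkFile(file);
  });
  return next;
}
```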

Contributing

This package is part of the Code Context monorepo. Please see:

Related Packages

License

MIT - See LICENSE for details