
rag-codebase-indexer

v0.0.1

Codebase indexer for RAG systems


Index, embed and store a codebase in a vector database, then search for semantic context on your prompts.

[!CAUTION] This is a beta release — help is appreciated on the issue tracker.

All LLM-oriented code documentation lives in the .docs directory.

Overview

The pipeline has three stages:

  1. Index — creates embeddings and stores state (in a JSON file by default)
    1. Chunk — parse source files into AST-aware chunks (Treesitter by default)
    2. Embed — generate vector embeddings for each chunk (Ollama / HuggingFace Transformers)
  2. Ingest — store chunks + embeddings in a vector database (ChromaDB)
  3. Search — hybrid search combining exact, semantic, and keyword matching

Two primary exports drive this: Indexer handles stages 1.1–1.2, VectorStore handles stages 2–3. Both extend EventEmitter and emit progress events during long-running operations.

Requirements

  • Node.js ≥ 20 (ESM)
  • Ollama running locally (default embedding provider)
  • ChromaDB running locally (default vector store)
  • Docker (for ChromaDB): docker-compose up -d

Installation

npm install rag-codebase-indexer

Module Usage

import { Indexer, VectorStore } from 'rag-codebase-indexer';

// ── Index ─────────────────────────────────────────────────

const indexer = await Indexer.create({
  provider: 'Ollama',
  model: 'nomic-embed-text',
  projectName: 'my-project',
});

indexer.on('progress', ({ phase, current, total, message }) => {
  process.stdout.write(`\r[${phase}] ${current}/${total} — ${message}`);
});

indexer.on('error', ({ phase, message, recoverable }) => {
  if (recoverable) console.warn(`\n⚠ [${phase}] ${message}`);
});

const result = await indexer.index({
  projectPath: '/home/user/my-project',
  cacheDir: '~/.cache/rag',
});

// ── Ingest ────────────────────────────────────────────────

const store = await VectorStore.connect({
  collection: 'my-project',
  dimensions: result.dimensions,
});

store.on('progress', ({ phase, current, total }) => {
  if (phase === 'ingest') {
    process.stdout.write(`\rIngesting: ${current}/${total}`);
  }
});

await store.ingest(result);

// ── Search ────────────────────────────────────────────────

const searchResult = await store.search('how does authentication work?', {
  topK: 10,
  filters: { source_file: 'src/auth/' },
});

console.log(searchResult.results);

// ── Cleanup ───────────────────────────────────────────────

await indexer.dispose();
await store.dispose();

Event System

Both Indexer and VectorStore emit typed events:

| Event | Payload | Description |
|-------|---------|-------------|
| progress | { phase, status, current, total, message } | Granular phase/status updates |
| error | { phase, message, error, recoverable } | Non-fatal errors (pipeline continues). Fatal errors reject the Promise. |
| done | IndexResult or IngestResult | Pipeline complete. Also returned by await. |

Phases for Indexer: scan, chunk, embed, cache. Phases for VectorStore: ingest, search.
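Because the real Indexer needs a running Ollama instance, the event flow can be sketched with a self-contained stand-in emitter (FakeIndexer is hypothetical; the payload shapes follow the table above):

```javascript
import { EventEmitter } from 'node:events';

// Stand-in that mimics the Indexer's event contract: one `progress`
// event per file per phase, and a `done` event whose payload is also
// the value returned by `await`.
class FakeIndexer extends EventEmitter {
  async index(files) {
    for (const phase of ['scan', 'chunk', 'embed', 'cache']) {
      files.forEach((file, i) => {
        this.emit('progress', {
          phase,
          status: 'running',
          current: i + 1,
          total: files.length,
          message: file,
        });
      });
    }
    const result = { chunks: files.length };
    this.emit('done', result); // same object the await resolves to
    return result;
  }
}

const indexer = new FakeIndexer();
const seen = [];
indexer.on('progress', ({ phase, current, total }) => {
  seen.push(`${phase} ${current}/${total}`);
});

const result = await indexer.index(['a.js', 'b.js']);
console.log(seen.length); // 8 progress events (4 phases × 2 files)
```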

Configuration

Adding a provider

Add the provider to config -> providers:

...
"providers": {
  "$providerName": {
    "opts": {},
    "models": {
      "$modelName": { ... }
    }
  },
  "Ollama": {
    ...
  },
  "Transformers": {
    ...
  }
}
...

Create the embedder logic for your new provider by adding lib/embedders/$providerName.js (camel case) that extends the base class in lib/embedders/Embedder.js:

import { Ollama as OllamaClient } from 'ollama';
import Embedder from './Embedder.js';

/**
 * Ollama-based embedder for local embedding generation using the official ollama-js library.
 * Requires Ollama service running locally or on a remote host.
 *
 * Recommended models:
 * - nomic-embed-text (768 dimensions) - Best for general use
 * - mxbai-embed-large (1024 dimensions) - Higher quality
 * - all-minilm (384 dimensions) - Faster, smaller
 *
 * @class Ollama
 * @extends {Embedder}
 * @see https://ollama.com/library for available models
 * @see https://github.com/ollama/ollama-js for library documentation
 */
class Ollama extends Embedder {
  ...
  async initialize() { ... }
  async embed(text) { ... }
  async embedBatch(texts) { ... }
  getDimensions() { ... }
  getModelName() { ... }
}
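For illustration, here is a self-contained sketch of a custom provider. A minimal stand-in for the Embedder base class is inlined so the example runs on its own; "Dummy" and its hash-based vectors are hypothetical, not a real provider:

```javascript
// Minimal stand-in for lib/embedders/Embedder.js, inlined so this
// sketch is runnable without the package installed.
class Embedder {
  async initialize() { throw new Error('not implemented'); }
  async embed(text) { throw new Error('not implemented'); }
  async embedBatch(texts) {
    // Default batch behavior: embed each text individually.
    return Promise.all(texts.map((t) => this.embed(t)));
  }
  getDimensions() { throw new Error('not implemented'); }
  getModelName() { throw new Error('not implemented'); }
}

// Hypothetical provider: produces deterministic hash-based vectors,
// standing in for a real model call.
class Dummy extends Embedder {
  async initialize() { /* open clients, verify the model is available */ }

  async embed(text) {
    const vec = new Array(this.getDimensions()).fill(0);
    for (let i = 0; i < text.length; i++) {
      vec[i % vec.length] += text.charCodeAt(i) / 255;
    }
    return vec;
  }

  getDimensions() { return 8; }
  getModelName() { return 'dummy-embed'; }
}

const embedder = new Dummy();
await embedder.initialize();
const [a, b] = await embedder.embedBatch(['auth middleware', 'vector search']);
console.log(a.length); // 8
```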

Adding a model

  • add the model to config -> providers -> [provider] -> models:

"providers": {
  "Ollama": {
    "host": "http://localhost:11434",
    "timeout": 30000, // 30 seconds
    "dimensions": null,
    "keepAlive": "5m",
    "options": {},
    "models": {
      ...
    }
  },
  "Transformers": {
    "models": {
      ...
    }
  }
}

CLI Usage

The CLI wraps the same Indexer and VectorStore APIs.

Embed a codebase

Scans files, chunks with Treesitter, generates embeddings, and writes a cache file.

node bin/cli.js embed ./src
node bin/cli.js embed ./src --provider Ollama --model nomic-embed-text
node bin/cli.js embed ./src --chunker Treesitter --cache-dir ~/.cache/rag --project my-project

| Option | Default | Description |
|--------|---------|-------------|
| --provider | Ollama | Embedding provider (Ollama or Transformers) |
| --model | nomic-embed-text | Model name |
| --chunker | Treesitter | Chunking strategy |
| --cache-dir | ./embeddings_cache | Where to write the embeddings JSON |
| --project | directory name | Project name (used in cache filename) |

Load into ChromaDB

Reads a cached embeddings file and bulk-inserts into a ChromaDB collection.

node bin/cli.js load ./embeddings_cache/embeddings_my-project.json --collection my-project

| Option | Default | Description |
|--------|---------|-------------|
| --collection | (required) | ChromaDB collection name |
| --url | http://localhost:8000 | ChromaDB URL |
| --batch-size | 200 | Insert batch size |
| --dimensions | (read from file) | Override embedding dimensions |
| --provider | Ollama | Embedding provider |
| --model | nomic-embed-text | Model name |

Search

Runs hybrid search (exact identifier matching + semantic vector similarity + keyword) with fusion ranking.

node bin/cli.js search "authentication middleware" --collection my-project
node bin/cli.js search "handleAuth" --collection my-project --file src/auth/
node bin/cli.js search --interactive --collection my-project

| Option | Default | Description |
|--------|---------|-------------|
| --collection | (required) | ChromaDB collection name |
| --url | http://localhost:8000 | ChromaDB URL |
| --provider | Ollama | Embedding provider for query embedding |
| --model | nomic-embed-text | Model name |
| --top-k | 15 | Max results |
| --file | — | Filter results by file path |
| --dimensions | 384 | Embedding dimensions |
| --interactive | — | Launch interactive search REPL |
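The fusion ranking can be illustrated with reciprocal rank fusion (an assumption for illustration; the package's actual formula may differ):

```javascript
// Reciprocal rank fusion: combine ranked lists from the exact, semantic,
// and keyword searches into a single ordering. Each document scores
// 1 / (k + rank) per list it appears in; documents ranked well across
// multiple lists rise to the top. (Illustrative only.)
function fuseRankings(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const fused = fuseRankings([
  ['auth.js', 'login.js', 'db.js'],  // exact identifier matches
  ['login.js', 'auth.js', 'jwt.js'], // semantic similarity
  ['jwt.js', 'auth.js'],             // keyword matches
]);
console.log(fused[0]); // 'auth.js' — present near the top of all three lists
```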

Analyze chunks

Reports chunk size distribution from an embeddings cache file.

node bin/cli.js analyze ./embeddings_cache/embeddings_my-project.json
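The report can be approximated in a few lines; the cache file's chunk schema ({ content }) is an assumption here:

```javascript
// Bucket chunk sizes (in characters) to see the kind of distribution
// `analyze` reports. The chunk shape { content } is assumed, not taken
// from the package's actual cache format.
function chunkSizeHistogram(chunks, bucketSize = 100) {
  const histogram = new Map();
  for (const { content } of chunks) {
    const bucket = Math.floor(content.length / bucketSize) * bucketSize;
    histogram.set(bucket, (histogram.get(bucket) ?? 0) + 1);
  }
  return histogram;
}

const hist = chunkSizeHistogram([
  { content: 'a'.repeat(50) },
  { content: 'b'.repeat(120) },
  { content: 'c'.repeat(180) },
]);
console.log(hist.get(100)); // 2 chunks fall in the 100–199 bucket
```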

Tests

npm run coverage

Roadmap

v0.1.0 — Pipeline Composition API

  • [x] Indexer async factory with EventEmitter progress
  • [x] VectorStore with ingest + hybrid search
  • [x] CLI rewrite using new API
  • [ ] Config module cleanup
  • [ ] Release beta

v0.2.0 — Extensibility

  • [ ] Pluggable vector database interface (don't hardcode ChromaDB)
  • [ ] Additional embedding providers
  • [ ] Additional language support (Go, Python, TypeScript)

Future

  • [ ] Chat memory system
  • [ ] AbortController support for cancellable pipelines