@vivantel/virage-core

Version v0.2.23, published a day ago

Core RAG pipeline tools: universal chunking, embedding, and vector store interfaces

Vulnerabilities: 0 high, 0 medium, 0 low
Maintainer: sergemso
Keywords: rag, embeddings, vector, git, pipeline

@vivantel/virage-core

Pipeline orchestrator, provider interfaces, and CLI for Git-aware RAG indexing.

Installation

npm install @vivantel/virage-core

Quick start

npx virage init    # generate virage.config.json interactively
npx virage         # run the pipeline

What it does

Four pipeline stages run in sequence:

1. GitTracker: finds files matching your chunker patterns and detects changes via commit hashes
2. ChunkProcessor: splits each file into Chunk[] using your configured strategy
3. EmbedderProcessor: embeds chunks incrementally, skipping unchanged content; detects model changes and automatically invalidates stale embeddings
4. Uploader: syncs the vector store by deleting stale documents and upserting new ones
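
The sync step in the last stage can be pictured as a diff between what the store already holds and what Git currently tracks. The sketch below is illustrative only and is not the package's Uploader. It assumes both states are maps from source file path to commit hash, and the names planSync, FileState, and SyncPlan are hypothetical.

```typescript
// Hypothetical sketch, not the actual Uploader implementation.
// Assumption: each state maps a source file path to a commit hash.
type FileState = Map<string, string>;

interface SyncPlan {
  toDelete: string[]; // files whose stored documents are stale or no longer tracked
  toUpsert: string[]; // files whose chunks need to be uploaded (again)
}

function planSync(stored: FileState, tracked: FileState): SyncPlan {
  const toDelete: string[] = [];
  const toUpsert: string[] = [];

  tracked.forEach((hash, file) => {
    const storedHash = stored.get(file);
    if (storedHash === undefined) {
      toUpsert.push(file); // new file: nothing to delete yet
    } else if (storedHash !== hash) {
      toDelete.push(file); // changed file: drop the old documents first
      toUpsert.push(file);
    }
    // identical hash: leave the stored documents untouched
  });

  stored.forEach((_hash, file) => {
    if (!tracked.has(file)) toDelete.push(file); // file left the repository
  });

  return { toDelete, toUpsert };
}

const plan = planSync(
  new Map([["a.md", "c1"], ["b.md", "c1"], ["old.md", "c0"]]),
  new Map([["a.md", "c1"], ["b.md", "c2"], ["new.md", "c2"]]),
);
console.log(plan);
```

In this sketch a.md is skipped because its hash is unchanged, b.md is deleted and re-upserted, new.md is upserted, and old.md is deleted.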

Provider interfaces

Implement these three interfaces to integrate any backend:

interface FileChunker {
  name: string;
  patterns: string[];
  chunk(filePath: string, commitHash: string): Promise<Chunk[]>;
}

interface EmbeddingProvider {
  name: string;
  dimensions: number;
  model?: string; // used for cache invalidation
  embed(text: string): Promise<number[]>;
  embedBatch?(texts: string[]): Promise<number[][]>;
}

interface VectorStore {
  name: string;
  initialize(): Promise<void>;
  upsert(docs: VectorDocument[]): Promise<void>;
  deleteBySourceFile(files: string[]): Promise<void>;
  getCurrentState(): Promise<Map<string, string>>;
  search(embedding: number[], topK: number): Promise<VectorSearchResult[]>;
}
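
As a minimal sketch of what implementing one of these interfaces might look like, here is a toy EmbeddingProvider that needs no network access. The interface is restated from above so the block is self-contained. The names hashEmbed and toyEmbedder and the model string "toy-hash-v1" are hypothetical, and the hashing scheme is a placeholder rather than a meaningful embedding.

```typescript
// Interface restated from the section above so the sketch is self-contained.
interface EmbeddingProvider {
  name: string;
  dimensions: number;
  model?: string; // used for cache invalidation
  embed(text: string): Promise<number[]>;
  embedBatch?(texts: string[]): Promise<number[][]>;
}

// Toy embedding: hash each lowercase word into a bucket, then L2-normalize.
// Illustrative only; the vectors carry no semantic meaning.
function hashEmbed(text: string, dimensions: number): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  for (const word of words) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
    }
    vector[hash % dimensions] += 1;
  }
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vector : vector.map((v) => v / norm);
}

const DIMENSIONS = 8;

// Hypothetical provider object satisfying the interface.
const toyEmbedder: EmbeddingProvider = {
  name: "toy-hash",
  dimensions: DIMENSIONS,
  model: "toy-hash-v1",
  embed: async (text) => hashEmbed(text, DIMENSIONS),
  embedBatch: async (texts) => texts.map((t) => hashEmbed(t, DIMENSIONS)),
};
```

A real provider would call an embedding API inside embed and, where the backend supports it, batch requests in embedBatch.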

createChunker helper

import { createChunker } from "@vivantel/virage-core";
import { markdownHeadersStrategy } from "@vivantel/virage-strategies";

// Strategy shorthand
createChunker({
  patterns: ["docs/**/*.md"],
  strategy: markdownHeadersStrategy(),
});

// Custom process function
createChunker({
  name: "custom",
  patterns: ["**/*.txt"],
  process: async (content, filePath, commitHash) => [
    { content, metadata: {}, sourceFile: filePath, commitHash },
  ],
});
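
A custom process function can do more than wrap the whole file in one chunk. The sketch below splits text on blank lines so that each paragraph becomes its own chunk. ChunkLike mirrors only the fields shown in the example above (the package's real Chunk type may differ), and toParagraphChunks, paragraphProcess, and the paragraphIndex metadata key are hypothetical names.

```typescript
// Chunk shape mirrored from the custom process example above; the Chunk type
// exported by the package may carry additional fields.
interface ChunkLike {
  content: string;
  metadata: Record<string, unknown>;
  sourceFile: string;
  commitHash: string;
}

// Split on one or more blank lines and drop empty paragraphs.
function toParagraphChunks(
  content: string,
  filePath: string,
  commitHash: string,
): ChunkLike[] {
  return content
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((paragraph, index) => ({
      content: paragraph,
      metadata: { paragraphIndex: index }, // hypothetical metadata key
      sourceFile: filePath,
      commitHash,
    }));
}

// Async wrapper with the same parameter order as the process callback above.
async function paragraphProcess(
  content: string,
  filePath: string,
  commitHash: string,
): Promise<ChunkLike[]> {
  return toParagraphChunks(content, filePath, commitHash);
}
```

Assuming createChunker accepts any function with this signature, as the custom example suggests, it could be passed as process: paragraphProcess.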

Embeddings cache invalidation

embeddings.json stores metadata about the last embedding run. If your provider's model or dimensions change, the cache is cleared automatically and all chunks are re-embedded. Switching providers (for example, from GitHub Models to OpenAI direct) while keeping the same model name does not invalidate the cache, because the same model produces identical vectors.
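
The rule described here comes down to comparing two fields and ignoring the provider name. The following is a minimal sketch of that comparison, not the package's implementation. The layout of embeddings.json is not documented in this README, so CacheMeta, ProviderInfo, isCacheStale, and the model string used in the examples are assumptions.

```typescript
// Hypothetical shape of the metadata kept from the previous embedding run.
interface CacheMeta {
  model?: string;
  dimensions: number;
}

// Subset of EmbeddingProvider fields relevant to invalidation.
interface ProviderInfo {
  name: string;
  model?: string;
  dimensions: number;
}

// The cache is stale when the model or the dimensions differ.
// The provider name is deliberately not compared.
function isCacheStale(cached: CacheMeta, provider: ProviderInfo): boolean {
  return (
    cached.model !== provider.model ||
    cached.dimensions !== provider.dimensions
  );
}

const cached: CacheMeta = { model: "example-embed-small", dimensions: 1536 };

// Same model served by a different provider: cache is kept.
const sameModel = isCacheStale(cached, {
  name: "other-provider",
  model: "example-embed-small",
  dimensions: 1536,
});

// Different model: cache is cleared.
const newModel = isCacheStale(cached, {
  name: "other-provider",
  model: "example-embed-large",
  dimensions: 1536,
});

// Same model, different dimensions: cache is cleared.
const newDims = isCacheStale(cached, {
  name: "other-provider",
  model: "example-embed-small",
  dimensions: 512,
});

console.log({ sameModel, newModel, newDims });
```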

License

MIT
