@vivantel/virage-core
v0.2.23
Core RAG pipeline tools - universal chunking, embedding, vector store interfaces
@vivantel/virage-core
Pipeline orchestrator, provider interfaces, and CLI for Git-aware RAG indexing.
Installation
npm install @vivantel/virage-core
Quick start
npx virage init # generate virage.config.json interactively
npx virage # run the pipeline
What it does
Four pipeline stages run in sequence:
- GitTracker: finds files matching your chunker patterns and detects changes via commit hashes
- ChunkProcessor: splits each file into Chunk[] using your configured strategy
- EmbedderProcessor: embeds chunks incrementally (skips unchanged content); detects model changes and auto-invalidates stale embeddings
- Uploader: syncs the vector store by deleting stale documents and upserting new ones
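The change detection and sync steps can be pictured as a diff between two snapshots. The following is only a minimal sketch, not the package's implementation: it assumes state is tracked as a plain object mapping each source file to the commit hash it was last indexed at, and `planSync` and `SyncPlan` are hypothetical names introduced for illustration.

```typescript
// Assumption: state maps a source file path to the commit hash of its last indexed version.
type FileState = Record<string, string>;

interface SyncPlan {
  toDelete: string[]; // files whose stored documents are stale or no longer tracked
  toUpsert: string[]; // files that are new or changed and need fresh documents
}

// Compare what the store holds against what the repository currently contains.
function planSync(stored: FileState, current: FileState): SyncPlan {
  const toDelete: string[] = [];
  const toUpsert: string[] = [];
  const has = (state: FileState, file: string): boolean =>
    Object.prototype.hasOwnProperty.call(state, file);

  for (const file of Object.keys(current)) {
    if (!has(stored, file)) {
      toUpsert.push(file); // new file: nothing to delete
    } else if (stored[file] !== current[file]) {
      toDelete.push(file); // changed file: drop the old documents...
      toUpsert.push(file); // ...and write the new ones
    }
    // Same commit hash: unchanged, so the file is skipped entirely.
  }

  for (const file of Object.keys(stored)) {
    if (!has(current, file)) {
      toDelete.push(file); // file no longer matches any pattern or was removed
    }
  }

  return { toDelete: toDelete.sort(), toUpsert: toUpsert.sort() };
}

const plan = planSync(
  { "a.md": "c1", "b.md": "c1", "old.md": "c1" },
  { "a.md": "c1", "b.md": "c2", "new.md": "c2" },
);
console.log(plan);
```

Here `a.md` is untouched, `b.md` is deleted and re-upserted, `old.md` is deleted, and `new.md` is upserted.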
Provider interfaces
Implement these three interfaces to integrate any backend:
interface FileChunker {
  name: string;
  patterns: string[];
  chunk(filePath: string, commitHash: string): Promise<Chunk[]>;
}
interface EmbeddingProvider {
  name: string;
  dimensions: number;
  model?: string; // used for cache invalidation
  embed(text: string): Promise<number[]>;
  embedBatch?(texts: string[]): Promise<number[][]>;
}
interface VectorStore {
  name: string;
  initialize(): Promise<void>;
  upsert(docs: VectorDocument[]): Promise<void>;
  deleteBySourceFile(files: string[]): Promise<void>;
  getCurrentState(): Promise<Map<string, string>>;
  search(embedding: number[], topK: number): Promise<VectorSearchResult[]>;
}
createChunker helper
import { createChunker } from "@vivantel/virage-core";
import { markdownHeadersStrategy } from "@vivantel/virage-strategies";
// Strategy shorthand
createChunker({
  patterns: ["docs/**/*.md"],
  strategy: markdownHeadersStrategy(),
});
// Custom process function
createChunker({
  name: "custom",
  patterns: ["**/*.txt"],
  process: async (content, filePath, commitHash) => [
    { content, metadata: {}, sourceFile: filePath, commitHash },
  ],
});
Embeddings cache invalidation
embeddings.json stores metadata about the last embedding run. If your provider's model or dimensions change, the cache is cleared automatically and all chunks are re-embedded. Switching providers (e.g., from GitHub Models to OpenAI direct) while keeping the same model name does not invalidate the cache, because the vectors are identical.
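The invalidation rule can be read as a comparison of two fields. The sketch below is an illustration under assumptions, not the package's code: the metadata shape `EmbeddingCacheMeta`, the helper `needsFullReembed`, and the provider names in the usage lines are all hypothetical, and the actual layout of embeddings.json may differ.

```typescript
// Assumption: the cache records the model and dimensions used for the last run.
interface EmbeddingCacheMeta {
  model?: string;
  dimensions: number;
}

// The subset of provider fields that the rule described above depends on.
interface ProviderInfo {
  name: string;
  dimensions: number;
  model?: string;
}

// Returns true when every chunk must be re-embedded.
// The provider name is deliberately ignored: only model and dimensions matter.
function needsFullReembed(
  cached: EmbeddingCacheMeta | undefined,
  provider: ProviderInfo,
): boolean {
  if (cached === undefined) {
    return true; // no previous run recorded, so nothing can be reused
  }
  return cached.model !== provider.model || cached.dimensions !== provider.dimensions;
}

const lastRun: EmbeddingCacheMeta = { model: "text-embedding-3-small", dimensions: 1536 };

// Same model served through a different provider: the cache is kept.
console.log(
  needsFullReembed(lastRun, { name: "openai", model: "text-embedding-3-small", dimensions: 1536 }),
);

// Different model: the cache is discarded.
console.log(
  needsFullReembed(lastRun, { name: "openai", model: "text-embedding-3-large", dimensions: 3072 }),
);
```

The first call logs `false` and the second logs `true`.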
License
MIT
