@ejiogbevoices/sovereign-rag v0.1.1
Sovereign RAG
Multi-vector RAG pipeline engine for cultural heritage audio.
Declarative pipeline orchestration with multi-vector fusion search, built in TypeScript for React, React Native, and Swift environments.
Built for Ejiogbe Voices — the Sovereign AI (Ancestral Intelligence) Platform.
What It Does
Two problems solved in one library:
Pipeline orchestration: Define RAG pipelines as declarative step sequences with loops, branches, and streaming. Tools are plain async functions running in-process.
Multi-vector fusion search: Search across multiple embedding spaces simultaneously (text + audio, text + image, any combination) and fuse results via Reciprocal Rank Fusion or weighted scoring. Backed by Supabase pgvector.
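The orchestration half reduces to a simple idea: a pipeline is an ordered list of tool names resolved against a registry of in-process async functions that share one mutable context. A minimal sketch of that idea (illustrative only — `runPipeline` and the toy tools below are not the library's actual internals):

```typescript
// Minimal sketch of declarative pipeline execution (not the library's real
// engine): each step names a tool, tools are plain async functions, and the
// runner threads a shared context through them in order.
type Ctx = { vars: Record<string, unknown> };
type Tool = (ctx: Ctx) => Promise<void>;

async function runPipeline(
  pipeline: string[],
  tools: Record<string, Tool>,
  input: Record<string, unknown>,
): Promise<Ctx> {
  const ctx: Ctx = { vars: { ...input } };
  for (const step of pipeline) {
    const tool = tools[step];
    if (!tool) throw new Error(`Unknown step: ${step}`);
    await tool(ctx); // in-process call: no subprocess, no protocol overhead
  }
  return ctx;
}

// Toy stand-ins for embed/search/generate:
const tools: Record<string, Tool> = {
  'embed.text': async (ctx) => { ctx.vars.embedding = [0.1, 0.2]; },
  'retriever.search': async (ctx) => { ctx.vars.passages = ['doc A', 'doc B']; },
  'generator.generate': async (ctx) => {
    ctx.vars.answer = `Answer from ${(ctx.vars.passages as string[]).length} passages`;
  },
};
```

Because tools only read and write the shared context, loops and branches are just control flow around the same dispatch step.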
Install
npm install @ejiogbevoices/sovereign-rag @supabase/supabase-js
Quick Start
import { createSovereignRAG, RrfReranker } from '@ejiogbevoices/sovereign-rag';
import { createClient } from '@supabase/supabase-js';
const rag = createSovereignRAG({
supabase: createClient(SUPABASE_URL, SUPABASE_KEY),
schemas: [{
name: 'audio_segments',
vectors: [
{ name: 'text_embedding', dimension: 3072 },
{ name: 'audio_embedding', dimension: 512 },
],
fields: [
{ name: 'transcript', dataType: 'STRING' },
{ name: 'language', dataType: 'STRING' },
{ name: 'tradition', dataType: 'STRING' },
],
}],
generate: async (prompt) => {
const res = await fetch('https://api.anthropic.com/v1/messages', { ... });
const data = await res.json();
return data.content[0].text;
},
textEmbed: async (text) => {
const res = await fetch('https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent', { ... });
const data = await res.json();
return data.embedding.values;
},
reranker: new RrfReranker({ topn: 10, rankConstant: 60 }),
});
// Run a pipeline
const ctx = await rag.engine.run({
pipeline: [
'embed.text',
'retriever.search',
'generator.generate',
],
}, { text: 'Yoruba chanting patterns similar to Gregorian chant' });
console.log(ctx.vars.answer);
Architecture
┌──────────────────────────────────────────────────┐
│                  PipelineEngine                  │
│  ┌───────┐     ┌───────┐     ┌──────────┐        │
│  │ Steps │  →  │ Loops │  →  │ Branches │        │
│  └───────┘     └───────┘     └──────────┘        │
│                    │                             │
│             ┌──────▼───────┐                     │
│             │ ToolRegistry │                     │
│             └──────┬───────┘                     │
│    ┌─────────┬─────┴──┬──────┬───────────┐       │
│    ▼         ▼        ▼      ▼           ▼       │
│  embed   retriever   gen   prompt   utils/custom │
└────┬─────────┬───────────────────────────────────┘
     │         │
     ▼         ▼
 ┌────────┐  ┌───────────────────────────┐
 │ Gemini │  │        Collection         │
 │ CLAP   │  │ ┌──────────┐ ┌──────────┐ │
 │ GLAP   │  │ │ text_emb │ │audio_emb │ │
 └────────┘  │ └────┬─────┘ └─────┬────┘ │
             │      └──────┬──────┘      │
             │      ┌──────▼───────┐     │
             │      │ RRF/Weighted │     │
             │      │   Reranker   │     │
             │      └──────┬───────┘     │
             └─────────────┼─────────────┘
                           ▼
                 ┌───────────────────┐
                 │ Supabase pgvector │
                 │ (or MemoryStore)  │
                 └───────────────────┘
Multi-Vector Search
Search multiple embedding spaces in parallel and fuse the results.
import { Collection, MemoryVectorStore, RrfReranker } from '@ejiogbevoices/sovereign-rag';
const store = new MemoryVectorStore();
const collection = new Collection({
store,
schema: {
name: 'audio_segments',
vectors: [
{ name: 'text_embedding', dimension: 3072 },
{ name: 'audio_embedding', dimension: 512 },
],
},
});
// Insert a document with both text and audio embeddings
await collection.insert({
id: 'seg_042',
fields: { transcript: 'Sacred drumming pattern', language: 'yo' },
vectors: {
text_embedding: textVec, // from Gemini text-embedding-004
audio_embedding: audioVec, // from CLAP or GLAP
},
});
// Fusion search: text meaning + acoustic similarity
const results = await collection.query({
vectors: [
{ fieldName: 'text_embedding', vector: queryTextVec },
{ fieldName: 'audio_embedding', vector: queryAudioVec },
],
topk: 10,
reranker: new RrfReranker({ topn: 10, rankConstant: 60 }),
});
Rerankers
RrfReranker (recommended default): Fuses by rank position across lists. Works well when mixing embeddings of different dimensions and scales (text 3072-dim + audio 512-dim). No score normalization needed.
WeightedReranker: Normalizes scores per field, multiplies by weights, sums. Use when you want explicit control: "70% text relevance, 30% acoustic similarity."
CustomReranker: Pass your own fusion function for domain-specific strategies (e.g., boost results matching the user's language preference).
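To see why RRF needs no score normalization, here is an illustrative sketch of the fusion rule (not the library's actual RrfReranker implementation): each list contributes 1 / (rankConstant + rank) per document, so only rank positions matter, never the raw, differently scaled similarity scores:

```typescript
// Illustrative Reciprocal Rank Fusion (not the library's real reranker):
// a document's fused score is the sum over lists of 1 / (rankConstant + rank).
type Hit = { id: string };

function rrfFuse(lists: Hit[][], topn: number, rankConstant = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach(({ id }, i) => {
      const rank = i + 1; // 1-based rank within this list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (rankConstant + rank));
    });
  }
  // Highest fused score first, truncated to topn
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topn)
    .map(([id]) => id);
}

// seg_2 sits near the top of both lists, so it outranks seg_1,
// which tops only the text list:
const textHits: Hit[]  = [{ id: 'seg_1' }, { id: 'seg_2' }, { id: 'seg_3' }];
const audioHits: Hit[] = [{ id: 'seg_2' }, { id: 'seg_3' }, { id: 'seg_4' }];
```

A rankConstant of 60 (the common default from the original RRF paper) damps the difference between adjacent top ranks, which is what makes the fusion robust across heterogeneous embedding spaces.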
Pipeline Engine
Declarative pipeline definitions with loops and branches.
Simple Pipeline
const ctx = await engine.run({
pipeline: [
'embed.text', // embed the query
'retriever.search', // search the collection
'prompt.build', // build the prompt from retrieved passages
'generator.generate', // generate the answer
],
}, { text: 'user query here' });
Loop (Iterative Refinement)
const ctx = await engine.run({
pipeline: [
'embed.text',
'retriever.search',
{
loop: {
times: 3,
steps: [
'prompt.generate_subqueries',
'generator.generate',
'retriever.search',
'utils.merge_passages',
],
},
},
'prompt.final_answer',
'generator.generate',
],
});
Branch (Conditional Routing)
const ctx = await engine.run({
pipeline: [
'embed.text',
'retriever.search',
{
branch: {
router: ['router.check_quality'],
branches: {
sufficient: ['generator.generate'],
insufficient: [
'prompt.generate_subqueries',
'retriever.search',
'generator.generate',
],
},
},
},
],
});
Custom Tools
const registry = new ToolRegistry();
registry.tool('custom', 'filter_by_tradition', {
handler: async (input, ctx) => {
const passages = ctx.vars.passages as Doc[];
const tradition = ctx.vars.tradition as string;
const filtered = passages.filter(p => p.fields?.tradition === tradition);
return { passages: filtered };
},
});
await engine.run({
pipeline: ['retriever.search', 'custom.filter_by_tradition', 'generator.generate'],
}, { tradition: 'Yoruba' });
Stream Events
const engine = new PipelineEngine({
registry,
onStream: (event) => {
switch (event.type) {
case 'step_start': console.log(`Starting: ${event.step}`); break;
case 'token': process.stdout.write(event.content); break;
case 'loop_iter': console.log(`Iteration ${event.iteration}`); break;
case 'branch': console.log(`Took branch: ${event.branch}`); break;
}
},
});
Supabase Setup
For each vector field you want to search, create an RPC function in Supabase:
create table audio_segments (
id text primary key,
transcript text,
language text,
tradition text,
text_embedding vector(3072),
audio_embedding vector(512)
);
-- pgvector's HNSW index supports at most 2,000 dimensions for the vector
-- type, so the 3,072-dim text embedding is indexed through a halfvec cast:
create index on audio_segments
using hnsw ((text_embedding::halfvec(3072)) halfvec_cosine_ops)
with (m = 16, ef_construction = 64);
create index on audio_segments
using hnsw (audio_embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);
create or replace function match_audio_segments_text_embedding(
query_embedding vector(3072),
match_count int default 10,
filter_expr text default null
)
returns table (id text, similarity float, transcript text, language text, tradition text)
language plpgsql as $$
begin
return query
select
a.id,
-- cast to halfvec so this expression matches the index above
1 - (a.text_embedding::halfvec(3072) <=> query_embedding::halfvec(3072)) as similarity,
a.transcript, a.language, a.tradition
from audio_segments a
order by a.text_embedding::halfvec(3072) <=> query_embedding::halfvec(3072)
limit match_count;
end;
$$;
create or replace function match_audio_segments_audio_embedding(
query_embedding vector(512),
match_count int default 10,
filter_expr text default null
)
returns table (id text, similarity float, transcript text, language text, tradition text)
language plpgsql as $$
begin
return query
select
a.id,
1 - (a.audio_embedding <=> query_embedding) as similarity,
a.transcript, a.language, a.tradition
from audio_segments a
order by a.audio_embedding <=> query_embedding
limit match_count;
end;
$$;
The naming convention for RPC functions is match_{table}_{column}. Override it per collection:
const store = new SupabaseVectorStore({
client: supabase,
collections: {
audio_segments: {
rpcMap: {
text_embedding: 'search_text',
audio_embedding: 'search_audio',
},
},
},
});
Ejiogbe Voices Integration
Cross-tradition sonic discovery pipeline:
const rag = createSovereignRAG({
supabase,
schemas: [{
name: 'audio_segments',
vectors: [
{ name: 'text_embedding', dimension: 3072 },
{ name: 'audio_embedding', dimension: 512 },
],
}],
textEmbed: geminiEmbed,
audioEmbed: clapEmbed,
generate: claudeGenerate,
reranker: new RrfReranker({ topn: 10 }),
});
// "Find recordings that sound like this Yoruba chant but from other traditions"
const ctx = await rag.engine.run({
pipeline: [
'embed.audio',
'embed.text',
'retriever.multi_search',
'prompt.build',
'generator.generate',
],
}, {
audio_data: referenceClipBuffer,
text: 'rhythmic call-and-response chanting patterns',
query_embeddings: {
text_embedding: textVec,
audio_embedding: audioVec,
},
});
Design Decisions
In-process tools. Tools are plain async functions grouped by namespace. No subprocess spawning, no protocol overhead. Works in React Native and other client-side environments.
Supabase pgvector as the vector backend. ANN search runs in PostgreSQL via HNSW indexes. The multi-vector fusion and reranking logic runs in TypeScript on the client.
Type-safe pipeline definitions. Pipeline steps, tools, and I/O mappings are fully typed. Your IDE catches wiring errors before runtime.
Portable across platforms. Works in Node.js, React Native, Deno, and (via API) Swift. No Python, no CUDA, no Docker required at the application layer.
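The type-safety point can be illustrated with a small sketch (hypothetical types — the library's real definitions may differ): template-literal types can derive the union of valid `namespace.tool` step names from the tool table itself, so a misspelled step fails at compile time rather than at runtime:

```typescript
// Hypothetical sketch of compile-time step-name checking (not the library's
// actual types): the set of valid 'namespace.tool' strings is derived from
// the registered tool table.
const toolTable = {
  embed: { text: async () => {} },
  retriever: { search: async () => {} },
  generator: { generate: async () => {} },
} as const;

type Tools = typeof toolTable;
// 'embed.text' | 'retriever.search' | 'generator.generate'
type StepName = {
  [NS in keyof Tools]: `${NS & string}.${keyof Tools[NS] & string}`;
}[keyof Tools];

function definePipeline(steps: StepName[]): StepName[] {
  return steps;
}

const pipeline = definePipeline(['embed.text', 'generator.generate']);
// definePipeline(['generator.genrate']); // <- rejected by the compiler
```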
License
Apache 2.0
