@ejiogbevoices/sovereign-rag v0.1.1
Sovereign RAG
Multi-vector RAG pipeline engine for cultural heritage audio.
Declarative pipeline orchestration with multi-vector fusion search, built in TypeScript for React, React Native, and Swift environments.
Built for Ejiogbe Voices — the Sovereign AI (Ancestral Intelligence) Platform.
What It Does
Two problems solved in one library:
Pipeline orchestration: Define RAG pipelines as declarative step sequences with loops, branches, and streaming. Tools are plain async functions running in-process.
Multi-vector fusion search: Search across multiple embedding spaces simultaneously (text + audio, text + image, any combination) and fuse results via Reciprocal Rank Fusion or weighted scoring. Backed by Supabase pgvector.
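The orchestration half reduces to a simple idea: a pipeline is an ordered list of tool names resolved against a registry of in-process async functions that share one mutable context. A minimal sketch of that idea (illustrative only — `runPipeline` and the toy tools below are not the library's actual internals):

```typescript
// Minimal sketch of declarative pipeline execution (not the library's real
// engine): each step names a tool, tools are plain async functions, and the
// runner threads a shared context through them in order.
type Ctx = { vars: Record<string, unknown> };
type Tool = (ctx: Ctx) => Promise<void>;

async function runPipeline(
  pipeline: string[],
  tools: Record<string, Tool>,
  input: Record<string, unknown>,
): Promise<Ctx> {
  const ctx: Ctx = { vars: { ...input } };
  for (const step of pipeline) {
    const tool = tools[step];
    if (!tool) throw new Error(`Unknown step: ${step}`);
    await tool(ctx); // in-process call: no subprocess, no protocol overhead
  }
  return ctx;
}

// Toy stand-ins for embed/search/generate:
const tools: Record<string, Tool> = {
  'embed.text': async (ctx) => { ctx.vars.embedding = [0.1, 0.2]; },
  'retriever.search': async (ctx) => { ctx.vars.passages = ['doc A', 'doc B']; },
  'generator.generate': async (ctx) => {
    ctx.vars.answer = `Answer from ${(ctx.vars.passages as string[]).length} passages`;
  },
};
```

Because tools only read and write the shared context, loops and branches are just control flow around the same dispatch step.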
Install
npm install @ejiogbevoices/sovereign-rag @supabase/supabase-js
Quick Start
import { createSovereignRAG, RrfReranker } from '@ejiogbevoices/sovereign-rag';
import { createClient } from '@supabase/supabase-js';
const rag = createSovereignRAG({
supabase: createClient(SUPABASE_URL, SUPABASE_KEY),
schemas: [{
name: 'audio_segments',
vectors: [
{ name: 'text_embedding', dimension: 3072 },
{ name: 'audio_embedding', dimension: 512 },
],
fields: [
{ name: 'transcript', dataType: 'STRING' },
{ name: 'language', dataType: 'STRING' },
{ name: 'tradition', dataType: 'STRING' },
],
}],
generate: async (prompt) => {
const res = await fetch('https://api.anthropic.com/v1/messages', { ... });
const data = await res.json();
return data.content[0].text;
},
textEmbed: async (text) => {
const res = await fetch('https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent', { ... });
const data = await res.json();
return data.embedding.values;
},
reranker: new RrfReranker({ topn: 10, rankConstant: 60 }),
});
// Run a pipeline
const ctx = await rag.engine.run({
pipeline: [
'embed.text',
'retriever.search',
'generator.generate',
],
}, { text: 'Yoruba chanting patterns similar to Gregorian chant' });
console.log(ctx.vars.answer);
Architecture
┌──────────────────────────────────────────────────┐
│                  PipelineEngine                  │
│  ┌───────┐     ┌───────┐     ┌──────────┐        │
│  │ Steps │  →  │ Loops │  →  │ Branches │        │
│  └───────┘     └───────┘     └──────────┘        │
│                    │                             │
│             ┌──────▼───────┐                     │
│             │ ToolRegistry │                     │
│             └──────┬───────┘                     │
│    ┌─────────┬─────┴──┬──────┬───────────┐       │
│    ▼         ▼        ▼      ▼           ▼       │
│  embed   retriever   gen   prompt   utils/custom │
└────┬─────────┬───────────────────────────────────┘
     │         │
     ▼         ▼
 ┌────────┐  ┌───────────────────────────┐
 │ Gemini │  │        Collection         │
 │ CLAP   │  │ ┌──────────┐ ┌──────────┐ │
 │ GLAP   │  │ │ text_emb │ │audio_emb │ │
 └────────┘  │ └────┬─────┘ └─────┬────┘ │
             │      └──────┬──────┘      │
             │      ┌──────▼───────┐     │
             │      │ RRF/Weighted │     │
             │      │   Reranker   │     │
             │      └──────┬───────┘     │
             └─────────────┼─────────────┘
                           ▼
                 ┌───────────────────┐
                 │ Supabase pgvector │
                 │ (or MemoryStore)  │
                 └───────────────────┘
Multi-Vector Search
Search multiple embedding spaces in parallel and fuse the results.
import { Collection, MemoryVectorStore, RrfReranker } from '@ejiogbevoices/sovereign-rag';
const store = new MemoryVectorStore();
const collection = new Collection({
store,
schema: {
name: 'audio_segments',
vectors: [
{ name: 'text_embedding', dimension: 3072 },
{ name: 'audio_embedding', dimension: 512 },
],
},
});
// Insert a document with both text and audio embeddings
await collection.insert({
id: 'seg_042',
fields: { transcript: 'Sacred drumming pattern', language: 'yo' },
vectors: {
text_embedding: textVec, // from Gemini text-embedding-004
audio_embedding: audioVec, // from CLAP or GLAP
},
});
// Fusion search: text meaning + acoustic similarity
const results = await collection.query({
vectors: [
{ fieldName: 'text_embedding', vector: queryTextVec },
{ fieldName: 'audio_embedding', vector: queryAudioVec },
],
topk: 10,
reranker: new RrfReranker({ topn: 10, rankConstant: 60 }),
});
Rerankers
RrfReranker (recommended default): Fuses by rank position across lists. Works well when mixing embeddings of different dimensions and scales (text 3072-dim + audio 512-dim). No score normalization needed.
WeightedReranker: Normalizes scores per field, multiplies by weights, sums. Use when you want explicit control: "70% text relevance, 30% acoustic similarity."
CustomReranker: Pass your own fusion function for domain-specific strategies (e.g., boost results matching the user's language preference).
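To see why RRF needs no score normalization, here is an illustrative sketch of the fusion rule (not the library's actual RrfReranker implementation): each list contributes 1 / (rankConstant + rank) per document, so only rank positions matter, never the raw, differently scaled similarity scores:

```typescript
// Illustrative Reciprocal Rank Fusion (not the library's real reranker):
// a document's fused score is the sum over lists of 1 / (rankConstant + rank).
type Hit = { id: string };

function rrfFuse(lists: Hit[][], topn: number, rankConstant = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach(({ id }, i) => {
      const rank = i + 1; // 1-based rank within this list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (rankConstant + rank));
    });
  }
  // Highest fused score first, truncated to topn
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topn)
    .map(([id]) => id);
}

// seg_2 sits near the top of both lists, so it outranks seg_1,
// which tops only the text list:
const textHits: Hit[]  = [{ id: 'seg_1' }, { id: 'seg_2' }, { id: 'seg_3' }];
const audioHits: Hit[] = [{ id: 'seg_2' }, { id: 'seg_3' }, { id: 'seg_4' }];
```

A rankConstant of 60 (the common default from the original RRF paper) damps the difference between adjacent top ranks, which is what makes the fusion robust across heterogeneous embedding spaces.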
Pipeline Engine
Declarative pipeline definitions with loops and branches.
Simple Pipeline
const ctx = await engine.run({
pipeline: [
'embed.text', // embed the query
'retriever.search', // search the collection
'prompt.build', // build the prompt from retrieved passages
'generator.generate', // generate the answer
],
}, { text: 'user query here' });
Loop (Iterative Refinement)
const ctx = await engine.run({
pipeline: [
'embed.text',
'retriever.search',
{
loop: {
times: 3,
steps: [
'prompt.generate_subqueries',
'generator.generate',
'retriever.search',
'utils.merge_passages',
],
},
},
'prompt.final_answer',
'generator.generate',
],
});
Branch (Conditional Routing)
const ctx = await engine.run({
pipeline: [
'embed.text',
'retriever.search',
{
branch: {
router: ['router.check_quality'],
branches: {
sufficient: ['generator.generate'],
insufficient: [
'prompt.generate_subqueries',
'retriever.search',
'generator.generate',
],
},
},
},
],
});
Custom Tools
const registry = new ToolRegistry();
registry.tool('custom', 'filter_by_tradition', {
handler: async (input, ctx) => {
const passages = ctx.vars.passages as Doc[];
const tradition = ctx.vars.tradition as string;
const filtered = passages.filter(p => p.fields?.tradition === tradition);
return { passages: filtered };
},
});
await engine.run({
pipeline: ['retriever.search', 'custom.filter_by_tradition', 'generator.generate'],
}, { tradition: 'Yoruba' });
Stream Events
const engine = new PipelineEngine({
registry,
onStream: (event) => {
switch (event.type) {
case 'step_start': console.log(`Starting: ${event.step}`); break;
case 'token': process.stdout.write(event.content); break;
case 'loop_iter': console.log(`Iteration ${event.iteration}`); break;
case 'branch': console.log(`Took branch: ${event.branch}`); break;
}
},
});
Supabase Setup
For each vector field you want to search, create an RPC function in Supabase:
create table audio_segments (
id text primary key,
transcript text,
language text,
tradition text,
text_embedding vector(3072),
audio_embedding vector(512)
);
-- pgvector's HNSW index supports at most 2,000 dimensions for the vector
-- type, so the 3,072-dim text embedding is indexed through a halfvec cast:
create index on audio_segments
using hnsw ((text_embedding::halfvec(3072)) halfvec_cosine_ops)
with (m = 16, ef_construction = 64);
create index on audio_segments
using hnsw (audio_embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);
create or replace function match_audio_segments_text_embedding(
query_embedding vector(3072),
match_count int default 10,
filter_expr text default null
)
returns table (id text, similarity float, transcript text, language text, tradition text)
language plpgsql as $$
begin
return query
select
a.id,
-- cast to halfvec so this expression matches the index above
1 - (a.text_embedding::halfvec(3072) <=> query_embedding::halfvec(3072)) as similarity,
a.transcript, a.language, a.tradition
from audio_segments a
order by a.text_embedding::halfvec(3072) <=> query_embedding::halfvec(3072)
limit match_count;
end;
$$;
create or replace function match_audio_segments_audio_embedding(
query_embedding vector(512),
match_count int default 10,
filter_expr text default null
)
returns table (id text, similarity float, transcript text, language text, tradition text)
language plpgsql as $$
begin
return query
select
a.id,
1 - (a.audio_embedding <=> query_embedding) as similarity,
a.transcript, a.language, a.tradition
from audio_segments a
order by a.audio_embedding <=> query_embedding
limit match_count;
end;
$$;
The naming convention for RPC functions is match_{table}_{column}. Override it per collection:
const store = new SupabaseVectorStore({
client: supabase,
collections: {
audio_segments: {
rpcMap: {
text_embedding: 'search_text',
audio_embedding: 'search_audio',
},
},
},
});
Ejiogbe Voices Integration
Cross-tradition sonic discovery pipeline:
const rag = createSovereignRAG({
supabase,
schemas: [{
name: 'audio_segments',
vectors: [
{ name: 'text_embedding', dimension: 3072 },
{ name: 'audio_embedding', dimension: 512 },
],
}],
textEmbed: geminiEmbed,
audioEmbed: clapEmbed,
generate: claudeGenerate,
reranker: new RrfReranker({ topn: 10 }),
});
// "Find recordings that sound like this Yoruba chant but from other traditions"
const ctx = await rag.engine.run({
pipeline: [
'embed.audio',
'embed.text',
'retriever.multi_search',
'prompt.build',
'generator.generate',
],
}, {
audio_data: referenceClipBuffer,
text: 'rhythmic call-and-response chanting patterns',
query_embeddings: {
text_embedding: textVec,
audio_embedding: audioVec,
},
});
Design Decisions
In-process tools. Tools are plain async functions grouped by namespace. No subprocess spawning, no protocol overhead. Works in React Native and other client-side environments.
Supabase pgvector as the vector backend. ANN search runs in PostgreSQL via HNSW indexes. The multi-vector fusion and reranking logic runs in TypeScript on the client.
Type-safe pipeline definitions. Pipeline steps, tools, and I/O mappings are fully typed. Your IDE catches wiring errors before runtime.
Portable across platforms. Works in Node.js, React Native, Deno, and (via API) Swift. No Python, no CUDA, no Docker required at the application layer.
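The type-safety point can be illustrated with a small sketch (hypothetical types — the library's real definitions may differ): template-literal types can derive the union of valid `namespace.tool` step names from the tool table itself, so a misspelled step fails at compile time rather than at runtime:

```typescript
// Hypothetical sketch of compile-time step-name checking (not the library's
// actual types): the set of valid 'namespace.tool' strings is derived from
// the registered tool table.
const toolTable = {
  embed: { text: async () => {} },
  retriever: { search: async () => {} },
  generator: { generate: async () => {} },
} as const;

type Tools = typeof toolTable;
// 'embed.text' | 'retriever.search' | 'generator.generate'
type StepName = {
  [NS in keyof Tools]: `${NS & string}.${keyof Tools[NS] & string}`;
}[keyof Tools];

function definePipeline(steps: StepName[]): StepName[] {
  return steps;
}

const pipeline = definePipeline(['embed.text', 'generator.generate']);
// definePipeline(['generator.genrate']); // <- rejected by the compiler
```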
License
Apache 2.0
