@chatman-media/kb

v1.14.1

Published

15 hours ago

Tenant-scoped Knowledge Base: hybrid retrieval (pgvector + BM25), ingest, answer pipeline, persona/skill composition. LLM I/O lives in @chatman-media/llm-router.

chatman-media

rag retrieval-augmented-generation pgvector bm25 hybrid-search llm chatbot sales-bot telegram-bot ollama openai openrouter

Tenant-scoped Knowledge Base for the Lead Engine platform. Provides a full retrieval-augmented generation (RAG) pipeline: hybrid search (pgvector cosine + BM25 with RRF fusion), multi-query expansion, cross-encoder reranking, MMR diversification, dynamic distance threshold, semantic caching, persona/skill composition, photo classification and passport OCR.

LLM I/O lives in @chatman-media/llm-router.

Key modules

| Module | What it does |
|---|---|
| answer.ts | Full RAG answer pipeline: retrieve → filter → diversify → rerank → generate |
| ingest.ts | Document ingest: parse → chunk → embed → upsert |
| hybrid-search.ts | pgvector cosine + BM25 keyword fusion via RRF |
| retrieval-utils.ts | rrfMerge, applyDynamicThreshold, mmrDiversify: post-retrieval transforms |
| multi-query.ts | expandQueries: LLM-generated query variants for parallel search |
| reranker.ts | JinaReranker, CohereReranker: cross-encoder second-pass reranking |
| rewrite-query.ts | Context-aware query rewriting (resolves pronouns / ellipsis via history) |
| semantic-cache.ts | Vector-similarity cache for identical/near-identical questions |
| vision.ts | classifyPhoto() + extractPassportIdentity(): passport OCR via vision LLM |
| ab-router.ts | A/B experiment allocation for styles/personas |
| grade-skills.ts | ELO-based skill grading via judge LLM |
| prompt.ts | composeSystemPrompt(): assemble sales persona + KB context + style |

RAG pipeline

Each call to answerWithRag / answerWithRagStream goes through these stages:

1. [opt] Query rewrite     — resolves pronouns, expands ellipsis (rewriteQueryBeforeRetrieval)
2. [opt] Multi-query       — generate N variants → embed all in parallel (multiQuery)
3. Vector / hybrid search  — pgvector cosine or RRF(vector+BM25)
4. [opt] RRF merge         — fuse N result lists if multi-query was used (rrfMerge)
5. [opt] Distance filter   — drop hits > threshold (autoTrimDistance)
6. [opt] MMR diversify     — reduce duplicate chunks (mmr)
7. [opt] Cross-encoder     — reranker.rerank(query, candidates, topK) (reranker)
8. Prompt composition      — style + persona + context + skills + hooks
9. LLM generation          — stream or single response
10. [opt] Fact-checker     — hallucination guard (reflect)

Stages marked [opt] are optional and are toggled per request via the AnswerInput field named in parentheses; search, prompt composition and generation always run.
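Stages 3 and 4 both rely on Reciprocal Rank Fusion (RRF). The sketch below shows the general technique only. It does not reproduce the package's `rrfMerge`; the `Hit` shape, the function name `rrfFuse`, and the constant `k = 60` are assumptions chosen for illustration.

```typescript
// Hypothetical hit shape for illustration; the package's KbSearchHit may differ.
interface Hit {
  id: string;
  text: string;
}

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank),
// where rank starts at 1. k = 60 is a commonly used default (assumption here).
function rrfFuse(lists: Hit[][], k = 60, topN = 10): Hit[] {
  const scores = new Map<string, { hit: Hit; score: number }>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const entry = scores.get(hit.id) ?? { hit, score: 0 };
      entry.score += 1 / (k + index + 1);
      scores.set(hit.id, entry);
    });
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((entry) => entry.hit);
}

// One ranked list from vector search, one from keyword search.
const vectorHits: Hit[] = [
  { id: "a", text: "chunk a" },
  { id: "b", text: "chunk b" },
  { id: "c", text: "chunk c" },
];
const bm25Hits: Hit[] = [
  { id: "b", text: "chunk b" },
  { id: "d", text: "chunk d" },
  { id: "a", text: "chunk a" },
];

const fusedIds = rrfFuse([vectorHits, bm25Hits]).map((hit) => hit.id);
console.log(fusedIds); // [ 'b', 'a', 'd', 'c' ]
```

Chunks that appear in both lists ("a" and "b") outrank chunks found by only one retriever, which is the property that makes RRF useful for combining vector and keyword results without calibrating their raw scores.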

API

`answerWithRag(input: AnswerInput): Promise<AnswerResult>`

```typescript
import { answerWithRag, JinaReranker } from "@chatman-media/kb";

const result = await answerWithRag({
  // "How much is an apartment in the Marina residential complex?"
  question: "сколько стоит квартира в ЖК Марина?",
  kb,           // IKbStore implementation
  embedder,     // EmbeddingClient
  chat,         // ChatClient

  // ── Retrieval tuning ──────────────────────────────
  topK: 5,                        // final chunks to pass to the LLM
  hybridSearch: true,             // vector + BM25 fusion
  rewriteQueryBeforeRetrieval: true, // resolve "там" ("there") / "он" ("he/it") via history

  // Multi-query expansion (generate 2 rephrasings; search all 3 queries,
  // the original included, in parallel)
  multiQuery: true,
  multiQueryCount: 2,

  // Drop chunks with cosine distance > 0.45 (reduce hallucinations)
  autoTrimDistance: true,
  autoTrimThreshold: 0.45,

  // Maximal Marginal Relevance: diversify results
  mmr: true,
  mmrLambda: 0.6,               // 1.0 = pure relevance, 0.0 = pure diversity

  // Cross-encoder reranker (retrieves topK×3 candidates, returns topK)
  reranker: new JinaReranker({ apiKey: process.env.JINA_API_KEY! }),

  // ── Generation ────────────────────────────────────
  history,        // ChatMessage[]: conversation context
  persona,        // Persona: bot identity
  style,          // Style: sales methodology (SPIN / NEPQ / AIDA)
  stage,          // FunnelStage: current funnel stage
  skills,         // SkillForPrompt[]: active persuasion skills
  reflect: true,  // hallucination guard (LLM judge)

  onTelemetry: (t) => console.log(t), // retrieval_ms, top_distances, path, ...
});

console.log(result.text);         // generated reply
console.log(result.hits);         // KbSearchHit[]: chunks used
console.log(result.usedChunkIds); // chunk IDs referenced in the reply
```
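To make the distance filter concrete, here is a minimal sketch of a cutoff with a "keep at least N" floor. It only illustrates the behaviour described by the `autoTrimDistance` comment above; the `ScoredHit` shape and the function name `trimByDistance` are hypothetical, and the package's own implementation may be more elaborate.

```typescript
// Hypothetical shape: cosine distance, where lower means closer to the query.
interface ScoredHit {
  id: string;
  distance: number;
}

// Keep hits whose distance is at or below the threshold. If fewer than
// minHits survive, fall back to the closest minHits so the LLM still
// receives some context.
function trimByDistance(hits: ScoredHit[], threshold: number, minHits = 1): ScoredHit[] {
  const sorted = [...hits].sort((a, b) => a.distance - b.distance);
  const kept = sorted.filter((hit) => hit.distance <= threshold);
  return kept.length >= minHits ? kept : sorted.slice(0, minHits);
}

const mixed: ScoredHit[] = [
  { id: "a", distance: 0.2 },
  { id: "b", distance: 0.5 },
  { id: "c", distance: 0.44 },
];
const allFar: ScoredHit[] = [
  { id: "x", distance: 0.7 },
  { id: "y", distance: 0.6 },
];

console.log(trimByDistance(mixed, 0.45).map((h) => h.id));  // [ 'a', 'c' ]
console.log(trimByDistance(allFar, 0.45).map((h) => h.id)); // [ 'y' ]
```

The floor matters for the second case: without it an off-topic question would reach the generation step with no context at all.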

Rerankers

```typescript
import { JinaReranker, CohereReranker } from "@chatman-media/kb";

// Jina: multilingual, good for Russian (jina-reranker-v2-base-multilingual)
const jinaReranker = new JinaReranker({
  apiKey: process.env.JINA_API_KEY!,
  model: "jina-reranker-v2-base-multilingual", // default
});

// Cohere: also multilingual (rerank-v3.5)
const cohereReranker = new CohereReranker({
  apiKey: process.env.COHERE_API_KEY!,
  model: "rerank-v3.5", // default
});
```
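The second-pass idea behind both rerankers can be sketched without any network call: over-fetch candidates, score each (query, passage) pair, and keep the best `topK`. Everything in the block is an assumption made for illustration. The token-overlap scorer is a crude stand-in for a cross-encoder, `Candidate` and `rerankTopK` are hypothetical names, and a real reranker call would be asynchronous.

```typescript
// Hypothetical candidate shape for illustration.
interface Candidate {
  id: string;
  text: string;
}

type PairScorer = (query: string, passage: string) => number;

// Stand-in for a cross-encoder: the fraction of query tokens found in the passage.
const overlapScorer: PairScorer = (query, passage) => {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  const haystack = passage.toLowerCase();
  const matched = tokens.filter((token) => haystack.includes(token)).length;
  return tokens.length === 0 ? 0 : matched / tokens.length;
};

// Score every candidate against the query and return the topK best.
function rerankTopK(
  query: string,
  candidates: Candidate[],
  topK: number,
  score: PairScorer,
): Candidate[] {
  return candidates
    .map((candidate) => ({ candidate, score: score(query, candidate.text) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.candidate);
}

// Four over-fetched candidates, reduced to the best two.
const candidates: Candidate[] = [
  { id: "c1", text: "Marina apartment layouts include studios" },
  { id: "c2", text: "Apartment price in Marina starts from 5 million" },
  { id: "c3", text: "Office hours are 9 to 18" },
  { id: "c4", text: "Price list for parking spaces" },
];

const reranked = rerankTopK("apartment price marina", candidates, 2, overlapScorer);
console.log(reranked.map((c) => c.id)); // [ 'c2', 'c1' ]
```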

Post-retrieval utilities

```typescript
import { rrfMerge, applyDynamicThreshold, mmrDiversify } from "@chatman-media/kb";

// Merge results from multiple queries via Reciprocal Rank Fusion
const merged = rrfMerge([hitsFromQuery1, hitsFromQuery2, hitsFromQuery3], { topN: 15 });

// Drop hits with cosine distance > 0.4 (keep at least 1)
const trimmed = applyDynamicThreshold(hits, { threshold: 0.4, minHits: 1 });

// Maximal Marginal Relevance: diversify, reduce duplicates
const diverse = mmrDiversify(hits, { lambda: 0.6, topK: 5 });
```
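For readers unfamiliar with Maximal Marginal Relevance, the following sketch shows the greedy selection loop in its textbook form. It is not the package's `mmrDiversify`; the `EmbeddedHit` shape (a relevance score plus an embedding on each hit) and the names `cosine` and `mmrSelect` are assumptions. The `lambda` convention matches the one documented above: 1.0 is pure relevance, 0.0 is pure diversity.

```typescript
// Hypothetical shape: each hit carries a relevance score and its embedding.
interface EmbeddedHit {
  id: string;
  relevance: number;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy MMR: repeatedly pick the candidate that maximises
// lambda * relevance - (1 - lambda) * (max similarity to anything already picked).
function mmrSelect(hits: EmbeddedHit[], lambda: number, topK: number): EmbeddedHit[] {
  const remaining = [...hits];
  const selected: EmbeddedHit[] = [];
  while (selected.length < topK && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidate, index) => {
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((s) => cosine(candidate.embedding, s.embedding)));
      const score = lambda * candidate.relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = index;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}

// "b" is a near-duplicate of "a"; "c" is less relevant but covers new ground.
const pool: EmbeddedHit[] = [
  { id: "a", relevance: 0.9, embedding: [1, 0] },
  { id: "b", relevance: 0.85, embedding: [0.99, 0.14] },
  { id: "c", relevance: 0.7, embedding: [0, 1] },
];

console.log(mmrSelect(pool, 0.6, 2).map((h) => h.id)); // [ 'a', 'c' ]
console.log(mmrSelect(pool, 1.0, 2).map((h) => h.id)); // [ 'a', 'b' ]
```

With `lambda = 0.6` the near-duplicate is skipped in favour of the chunk that adds new information; with `lambda = 1.0` the selection degenerates to plain relevance ordering.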

Golden RAG evals

Use evaluateRagGoldenCases when you need an end-to-end regression harness around retrieval, answer grounding, persona/stage expectations and ablations. It runs the normal answerWithRag pipeline and scores each JSONL case against expected sources, required facts, forbidden claims and telemetry path. By default the scorer is deterministic (substring/source matching), so it can run in CI without live LLM calls. For semantic groundedness checks, pass judge: makeRagGoldenLlmJudge({ chat: judgeChat }).

```typescript
import {
  defaultRagGoldenAblations,
  evaluateRagGoldenCases,
  formatRagGoldenFailures,
  parseRagGoldenJsonl,
} from "@chatman-media/kb";

const cases = parseRagGoldenJsonl(await Bun.file("packages/kb/evals/generic-rag.jsonl").text());

const report = await evaluateRagGoldenCases({
  cases,
  makeInput: (item) => ({
    question: item.question,
    kb,
    embedder,
    chat,
    hybridSearch: true,
    multiQuery: true,
    mmr: true,
  }),
  ablations: defaultRagGoldenAblations(),
});

const failures = formatRagGoldenFailures(report);
if (failures) throw new Error(failures);
```
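As a rough illustration of what deterministic substring/source matching can look like, the sketch below scores one answer against one golden case. The JSONL field names (`expectedSources`, `requiredFacts`, `forbiddenClaims`), the `Answer` shape, and the function `scoreCase` are all hypothetical; the package's actual case schema and scorer are not shown in this README and may differ.

```typescript
// Hypothetical golden-case schema, one JSON object per JSONL line.
interface GoldenCase {
  question: string;
  expectedSources: string[];
  requiredFacts: string[];
  forbiddenClaims: string[];
}

// Hypothetical answer shape: reply text plus the source IDs it was grounded on.
interface Answer {
  text: string;
  sourceIds: string[];
}

// Deterministic checks only: exact source match, case-insensitive substrings.
function scoreCase(item: GoldenCase, answer: Answer): string[] {
  const failures: string[] = [];
  const text = answer.text.toLowerCase();
  for (const source of item.expectedSources) {
    if (!answer.sourceIds.includes(source)) failures.push(`missing source: ${source}`);
  }
  for (const fact of item.requiredFacts) {
    if (!text.includes(fact.toLowerCase())) failures.push(`missing fact: ${fact}`);
  }
  for (const claim of item.forbiddenClaims) {
    if (text.includes(claim.toLowerCase())) failures.push(`forbidden claim: ${claim}`);
  }
  return failures;
}

// A one-line JSONL file, parsed the plain way.
const jsonl =
  '{"question":"What is the delivery time?","expectedSources":["shipping.md"],' +
  '"requiredFacts":["3 business days"],"forbiddenClaims":["free delivery"]}';
const goldenCases: GoldenCase[] = jsonl
  .split("\n")
  .filter((line) => line.trim().length > 0)
  .map((line) => JSON.parse(line) as GoldenCase);

const grounded: Answer = { text: "Delivery takes 3 business days.", sourceIds: ["shipping.md"] };
const ungrounded: Answer = { text: "We offer free delivery within a week.", sourceIds: ["faq.md"] };

console.log(scoreCase(goldenCases[0], grounded));   // []
console.log(scoreCase(goldenCases[0], ungrounded)); // three failures
```

Because nothing here calls an LLM, the same inputs always produce the same failures, which is what makes this style of check suitable for CI.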

For a deterministic local/CI smoke run without external LLM or DB calls:

```bash
bun run apps/api/scripts/eval-rag.ts --ablate
bun run apps/api/scripts/eval-rag.ts --vertical=exchange --baseline=rag-baseline.json
bun run apps/api/scripts/eval-rag.ts --update-baseline=rag-baseline.json --json
```

Vision

```typescript
import { classifyPhoto, extractPassportIdentity } from "@chatman-media/kb";

// Vision LLM settings, shared by both calls below
const model = "gpt-4o";
const apiKey = "sk-...";
const provider = "openai";

// res: a fetch Response holding the incoming photo
const bytes = await res.arrayBuffer();

const cls = await classifyPhoto({ bytes, model, apiKey, provider });
// cls → "passport" | "full_body" | "portrait" | "other"

if (cls === "passport") {
  const identity = await extractPassportIdentity({ bytes, model, apiKey, provider });
  // identity → { family_name?, given_name?, passport_number?, passport_expiry? }
}
```

In apps/api, photo classification is wired automatically via photo-processor.ts: when a tenant has a vision LLM config, every incoming photo is classified and passport data is merged into contact.attributes_json.

Install

```bash
bun add @chatman-media/kb     # Bun
npm install @chatman-media/kb # npm (with pnpm or yarn, use "add" instead of "install")
```

Part of the lead-engine monorepo — a multi-tenant SaaS platform for AI sales bots on Telegram / WhatsApp.

License

MIT — Alexander Kireev / chatman-media
