@pixygon/knowledge-server
v0.1.2
Storage + extraction + chunking + semantic search engine for any text knowledge. Used by @pixygon/chatbot-server for RAG; can also back a wiki/codex search layer.
```text
+--------------------+      +-------------------------+
|  Host Express app  |----->|      engine.router      |
|  (your auth here)  |      +-------------------------+
+--------------------+                   |
                                         v
   KnowledgeDocument · KnowledgeChunk · extractors · embedder · search
```

Used by:
- `@pixygon/chatbot-server`: calls `engine.search()` from the RAG pipeline.
- (planned) Codex / wiki: calls `engine.upsertExternal()` on entry save to keep an embedding index in sync with the rich domain model.
What it does
- Documents. Operator pastes text, uploads PDF/DOCX/XLSX/CSV/TXT/MD, points at a URL (scraped once + embedded), or points at a URL marked live (re-fetched at query time).
- Extraction. `pdf-parse`, `mammoth`, `xlsx`, and `@mozilla/readability` + `jsdom` cover the common ingest paths.
- Chunking. Paragraph-aware splitter producing ~2 KB chunks with 400-character overlap.
- Embedding. Whatever AI client the host passes — text in, vector out.
- Search. Cosine-similarity top-K over the embedding index, namespace-scoped.
- Namespaces. Multiple knowledge silos per tenant (`chatbot`, `codex`, `wiki`, `help-center`, …). Search is scoped to one namespace at a time by default; cross-namespace queries must be requested explicitly.
- External refs. A knowledge document can be linked back to a host-domain entity (a codex `LoreEntity`, an LMS `Lesson`, etc.). `engine.upsertExternal()` keeps the index in sync as the host model changes.
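The chunking, embedding, and search bullets above can be sketched end to end. This is a minimal illustration under stated assumptions, not the package's implementation: "~2 KB" is read as 2,000 characters, `toyEmbed` is a hashed bag-of-words stand-in for a real embedding model (wrapped in `toyAi` to mirror the `ai.embed` shape used in the Usage section), and every name in the block is hypothetical.

```typescript
// Hypothetical sketch of chunk -> embed -> search. Not the package's code:
// chunkText, toyEmbed, toyAi, cosine, topK, and Chunk are illustrative names.

const TARGET_CHARS = 2000; // "~2 KB" approximated as 2,000 characters (assumption)
const OVERLAP_CHARS = 400; // tail of the previous chunk repeated at the next chunk's start

// Paragraph-aware splitter: packs whole paragraphs into a chunk until the next
// one would push it past `target`, then starts a new chunk seeded with the last
// `overlap` characters of the previous one. A single paragraph longer than
// `target` is kept whole in this sketch.
function chunkText(text: string, target = TARGET_CHARS, overlap = OVERLAP_CHARS): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    const candidate = current ? `${current}\n\n${para}` : para;
    if (current === "" || candidate.length <= target) {
      current = candidate;
      continue;
    }
    chunks.push(current);
    const tail = overlap > 0 ? current.slice(-overlap) : "";
    current = tail ? `${tail}\n\n${para}` : para;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Stand-in embedder: hashes each lowercase word into one of DIM count buckets.
const DIM = 64;
function toyEmbed(text: string): number[] {
  const v = new Array<number>(DIM).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) >>> 0;
    v[h % DIM] += 1;
  }
  return v;
}

// Same shape as `ai.embed` in the Usage section; `tokens` is just a word count here.
const toyAi = {
  async embed(text: string) {
    return { embedding: toyEmbed(text), tokens: text.split(/\s+/).filter(Boolean).length };
  },
};

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

interface Chunk {
  namespace: string;
  text: string;
  embedding: number[];
}

// Namespace-scoped top-K: keep one namespace, score by cosine, return the k best.
function topK(index: Chunk[], query: number[], namespace: string, k: number) {
  return index
    .filter((c) => c.namespace === namespace)
    .map((c) => ({ text: c.text, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Ingest would run `chunkText` on each document and embed every chunk; a query is embedded the same way and passed to `topK` with the caller's namespace. The linear scan keeps the sketch short; a large index would more typically precompute norms or use a dedicated vector index.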
Install
```sh
npm install @pixygon/knowledge-server
```

Peer expectations:
- express ≥ 5
- mongoose ≥ 8
- Node ≥ 22
Usage
```typescript
import express from "express";
import mongoose from "mongoose";
import { createKnowledge } from "@pixygon/knowledge-server";

// Host-side placeholders, not exported by this package: myEmbeddingClient,
// myChatClient, tenantScopedPlugin, auditLogPlugin, verifyToken, tenantAccess,
// tenantId, loreEntity. Top-level `await` below requires an ES module.

const app = express();

// Any object matching { embed(text), chat?({ messages, system }) } works.
// `@pixygon/chatbot-server`'s `createAiClient` returns this shape.
const ai = {
  async embed(text: string) {
    const e = await myEmbeddingClient.embed(text);
    return { embedding: e.vector, tokens: e.tokens };
  },
  async chat({ messages, system }: any) {
    const r = await myChatClient.chat({ messages, system });
    return { content: r.text };
  },
};

const knowledge = createKnowledge({
  mongoose,
  ai,
  tenantField: "tenantId",
  tenantRefName: "Tenant",
  defaultNamespace: "default",
  plugins: [
    (schema, label) => schema.plugin(tenantScopedPlugin, { tenantField: "tenantId", label }),
    (schema, label) => schema.plugin(auditLogPlugin, { entityType: label }),
  ],
});

// Mount the default HTTP router under whatever path the host owns.
app.use("/v1/tenants/:tenantId", verifyToken, tenantAccess, knowledge.router);

// Programmatic search: used by RAG pipelines, codex search, etc.
const hits = await knowledge.search({
  tenantId,
  query: "fall protection rules",
  namespace: "chatbot",
  k: 5,
});

// Codex-style external sync. Idempotent: upserts by (namespace, externalRef).
await knowledge.upsertExternal({
  tenantId,
  namespace: "codex",
  externalModelName: "LoreEntity",
  externalId: loreEntity._id,
  title: loreEntity.name,
  content: loreEntity.description,
  source: `codex/${loreEntity.slug}`,
  tags: loreEntity.tags,
});
```

HTTP surface
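Once the router is mounted, its routes are plain HTTP. A hedged client-side sketch for the search route: it assumes the `/v1/tenants/:tenantId` mount path from the usage example and a bearer token checked by the host's `verifyToken` (both host choices, not package requirements). `buildSearchUrl` and `searchKnowledge` are hypothetical helpers, and the result is typed `unknown` because the response body isn't documented here.

```typescript
// Hypothetical client helpers; the mount path and auth header are assumptions.
interface SearchParams {
  q: string;
  namespace?: string;
  k?: number;
}

// Builds GET {base}/v1/tenants/:tenantId/knowledge/search?q=&namespace=&k=
function buildSearchUrl(baseUrl: string, tenantId: string, params: SearchParams): string {
  const url = new URL(`/v1/tenants/${encodeURIComponent(tenantId)}/knowledge/search`, baseUrl);
  url.searchParams.set("q", params.q);
  if (params.namespace) url.searchParams.set("namespace", params.namespace);
  if (params.k !== undefined) url.searchParams.set("k", String(params.k));
  return url.toString();
}

// Calls the route with Node's built-in fetch (Node >= 18) and a bearer token.
async function searchKnowledge(
  baseUrl: string,
  tenantId: string,
  token: string,
  params: SearchParams,
): Promise<unknown> {
  const res = await fetch(buildSearchUrl(baseUrl, tenantId, params), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`search failed: ${res.status}`);
  return res.json();
}
```

The full route list follows.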
Default router (`engine.router`):

```text
GET    /knowledge            ?namespace=&sourceType=
GET    /knowledge/search     ?q=&namespace=&k=
GET    /knowledge/:id
POST   /knowledge            text       { title, content, source?, namespace?, tags? }
POST   /knowledge/upload     multipart  file, title?, namespace?, tags?
POST   /knowledge/from-url   json       { url, title?, namespace?, extractInstruction?, isLive?, liveDescription?, tags? }
PUT    /knowledge/:id                   { title?, content?, source?, tags?, namespace? }
DELETE /knowledge/:id
```

Companion package
@pixygon/knowledge-react ships the operator UI (list, tabbed upload dialog, RTK Query hooks) — see its README for the React wire-up.
License
MIT.
