@contractspec/lib.knowledge
v3.8.0
RAG and knowledge base primitives
@contractspec/lib.knowledge provides the retrieval, ingestion, query, and access-control primitives used to turn documents and external content into searchable knowledge for agents and workflows.
Website: https://contractspec.io
Installation
bun add @contractspec/lib.knowledge
or
npm install @contractspec/lib.knowledge
What belongs here
This package currently owns five related concerns:
- Retrieval contracts and implementations: KnowledgeRetriever, StaticRetriever, and VectorRetriever.
- Ingestion and indexing pipeline pieces: DocumentProcessor, EmbeddingService, VectorIndexer, and ingestion adapters.
- Retrieval-augmented query flow: KnowledgeQueryService and KnowledgeQueryOptions.
- Runtime convenience for OSS consumers: KnowledgeRuntime and createKnowledgeRuntime.
- Access guardrails and localization: KnowledgeAccessGuard and the i18n surface.
Use this package when you need the knowledge-layer primitives inside ContractSpec. It is not the source of truth for knowledge-space specs, tenant bindings, provider SDKs, or background job orchestration.
Core workflows
Quickstart: one runtime for ingest + retrieve + answer
import { createKnowledgeRuntime } from "@contractspec/lib.knowledge";
// embeddingProvider, vectorStoreProvider, and llmProvider are provider instances you
// supply; their interfaces come from @contractspec/lib.contracts-integrations.
const knowledge = createKnowledgeRuntime({
  collection: "knowledge-support-faq",
  namespace: "tenant-acme",
  spaceKey: "support-faq",
  embeddings: embeddingProvider,
  vectorStore: vectorStoreProvider,
  llm: llmProvider,
});
// Ingest: process, embed, and index a plain-text document.
await knowledge.ingestDocument({
  id: "faq-rotate-key",
  mimeType: "text/plain",
  data: new TextEncoder().encode("Rotate API keys from Settings > API."),
  metadata: { locale: "en", category: "canonical" },
});
// Retrieve: raw snippets for one space and tenant, filtered by locale and category.
const snippets = await knowledge.retrieve("How do I rotate a key?", {
  spaceKey: "support-faq",
  tenantId: "tenant-acme",
  locale: "en",
  category: "canonical",
});
// Answer: an LLM response grounded in the indexed content.
const answer = await knowledge.query("How do I rotate a key?", {
  namespace: "tenant-acme",
  filter: { locale: "en" },
});
Ingest documents into a vector index
import {
  DocumentProcessor,
  EmbeddingService,
  StorageIngestionAdapter,
  VectorIndexer,
} from "@contractspec/lib.knowledge";
// Extract fragments, embed them in batches, and upsert them into one collection/namespace.
const processor = new DocumentProcessor();
const embeddings = new EmbeddingService(embeddingProvider);
const indexer = new VectorIndexer(vectorStoreProvider, {
  collection: "knowledge-docs",
  namespace: "tenant-acme",
});
// objectFromStorageProvider comes from your storage integration; the adapter fetches
// its content and runs the same process/index pipeline.
const adapter = new StorageIngestionAdapter(processor, embeddings, indexer);
await adapter.ingestObject(objectFromStorageProvider);
Run retrieval or RAG queries
import {
  KnowledgeQueryService,
  createVectorRetriever,
} from "@contractspec/lib.knowledge";
// Map each knowledge space key to the vector collection that backs it.
const retriever = createVectorRetriever({
  embeddings: embeddingProvider,
  vectorStore: vectorStoreProvider,
  spaceCollections: {
    "support-faq": "knowledge-support-faq",
  },
});
// Unmapped space keys return []; locale and category are merged into the provider filter.
const snippets = await retriever.retrieve("How do I rotate a key?", {
  spaceKey: "support-faq",
  topK: 3,
  locale: "en",
  category: "canonical",
});
// Service-level defaults; each query() call can override namespace, topK, filter, locale, and prompt.
const queryService = new KnowledgeQueryService(
  embeddingProvider,
  vectorStoreProvider,
  llmProvider,
  {
    collection: "knowledge-support-faq",
    namespace: "tenant-acme",
    topK: 5,
  }
);
const answer = await queryService.query("How do I rotate a key?", {
  namespace: "tenant-acme",
  topK: 3,
  filter: { locale: "en" },
});
Use static knowledge for tests or lightweight examples
import { createStaticRetriever } from "@contractspec/lib.knowledge";
// Each line of a space's text is a candidate snippet; matching is substring-based, not semantic.
const retriever = createStaticRetriever({
  "product-canon": [
    "Rotate API keys from Settings > API.",
    "Only cite reviewed canon content in answers.",
  ].join("\n"),
});
const snippets = await retriever.retrieve("rotate", {
  spaceKey: "product-canon",
  topK: 1,
});
Localize prompts and access messages
import { createKnowledgeI18n } from "@contractspec/lib.knowledge/i18n";
// Build a French i18n instance and look up a message key.
const i18n = createKnowledgeI18n("fr");
const noResults = i18n.t("query.noResults");
Typical flow:
- Extract fragments from raw content.
- Embed fragments and upsert them into a vector collection.
- Retrieve snippets through a retriever, or generate an answer through KnowledgeQueryService.
- Gate reads and writes with KnowledgeAccessGuard when operating against resolved knowledge bindings.
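The first three steps of the typical flow can be sketched end to end with a tiny in-memory stand-in. This is not the package API: extractFragments, toyEmbed, InMemoryCollection, ingest, and retrieve are hypothetical names, and the hashed bag-of-words vector only stands in for a real EmbeddingProvider so the extract -> embed -> upsert -> retrieve ordering is visible without external services.

```typescript
// Hypothetical in-memory stand-ins; none of these names are exports of @contractspec/lib.knowledge.
type Fragment = { id: string; documentId: string; text: string };
type Row = { id: string; vector: number[]; payload: { text: string; documentId: string } };

const DIM = 256;

// 1. Extract: one fragment per non-empty line.
function extractFragments(documentId: string, raw: string): Fragment[] {
  return raw
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((text, i) => ({ id: `${documentId}#${i}`, documentId, text }));
}

// 2a. Embed: a toy L2-normalised hashed bag of words standing in for an EmbeddingProvider.
function toyEmbed(text: string): number[] {
  const v: number[] = new Array(DIM).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) >>> 0;
    v[h % DIM] += 1;
  }
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0)) || 1;
  return v.map((x) => x / norm);
}

// 2b. Upsert into an in-memory "collection" that supports cosine search.
class InMemoryCollection {
  private rows = new Map<string, Row>();

  upsert(rows: Row[]): void {
    for (const row of rows) this.rows.set(row.id, row); // same id overwrites, like an upsert
  }

  search(vector: number[], topK: number): Array<{ score: number; payload: Row["payload"] }> {
    return Array.from(this.rows.values())
      .map((row) => ({
        // Vectors are unit length, so the dot product equals cosine similarity.
        score: row.vector.reduce((sum, x, i) => sum + x * vector[i], 0),
        payload: row.payload,
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

function ingest(collection: InMemoryCollection, documentId: string, raw: string): number {
  const fragments = extractFragments(documentId, raw);
  collection.upsert(
    fragments.map((f) => ({ id: f.id, vector: toyEmbed(f.text), payload: { text: f.text, documentId } }))
  );
  return fragments.length;
}

// 3. Retrieve: embed the question and return the best-matching fragment texts.
function retrieve(collection: InMemoryCollection, question: string, topK = 1): string[] {
  return collection.search(toyEmbed(question), topK).map((hit) => hit.payload.text);
}

const collection = new InMemoryCollection();
const indexed = ingest(
  collection,
  "faq",
  "Rotate API keys from Settings > API.\nInvite teammates from Settings > Team."
);
const top = retrieve(collection, "How do I rotate API keys?");
```

Storing the fragment text in the payload is what lets retrieval return readable snippets without a second lookup, which mirrors how VectorIndexer persists payload.text.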
API map
Retrieval
- KnowledgeRetriever: shared retrieval interface for semantic and static retrieval.
- RetrieverConfig: shared defaults for retriever implementations.
- StaticRetriever and createStaticRetriever: in-memory, line-oriented retrieval for simple spaces and tests.
- VectorRetriever and createVectorRetriever: embedding + vector-store backed retrieval.
- RetrievalOptions: query filters such as spaceKey, topK, minScore, tenantId, locale, category, and provider filter metadata.
- RetrievalResult: returned content, source, score, and optional metadata.
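The gotchas section says StaticRetriever does line-level substring matching, honors topK and minScore, and scores every match 1.0. The sketch below reproduces that described behavior with a hypothetical staticRetrieve function; it is not the package's source. Case-insensitive matching, the default topK of 5, and using the space key as source are assumptions.

```typescript
// Hypothetical re-implementation of the documented StaticRetriever behavior.
interface StaticOptions {
  spaceKey: string;
  topK?: number;
  minScore?: number;
}

interface StaticResult {
  content: string;
  source: string;
  score: number;
}

function staticRetrieve(
  spaces: Record<string, string>,
  query: string,
  { spaceKey, topK = 5, minScore = 0 }: StaticOptions // default topK is an assumption
): StaticResult[] {
  const text = spaces[spaceKey];
  if (text === undefined) return []; // unknown space: nothing to match (assumption)
  const needle = query.toLowerCase(); // case-insensitive matching is an assumption
  return text
    .split("\n")
    .filter((line) => line.toLowerCase().includes(needle)) // substring match, not semantic
    .map((line) => ({ content: line, source: spaceKey, score: 1.0 })) // every match scores 1.0
    .filter((result) => result.score >= minScore)
    .slice(0, topK);
}

const spaces = {
  "product-canon": [
    "Rotate API keys from Settings > API.",
    "Only cite reviewed canon content in answers.",
  ].join("\n"),
};

const hits = staticRetrieve(spaces, "rotate", { spaceKey: "product-canon", topK: 1 });
const filtered = staticRetrieve(spaces, "rotate", { spaceKey: "product-canon", minScore: 1.5 });
```

Because every score is 1.0, topK only truncates matches in line order, and any minScore above 1 filters everything out.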
Query
- KnowledgeQueryService: embed -> search -> prompt -> LLM chat flow for knowledge-backed answers.
- KnowledgeQueryConfig: collection, namespace, prompt, filter, topK, and locale defaults.
- KnowledgeQueryOptions: per-query overrides for namespace, topK, filter, locale, and prompt.
- KnowledgeAnswer: answer text, references, and optional token usage.
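KnowledgeQueryService is documented as embed -> search -> prompt -> LLM chat, reading payload.text first and falling back to legacy payload.content. The synchronous sketch below models only the context-selection and prompt-building steps. SearchHit, ChatMessage, hitText, and buildPrompt are hypothetical; the real service's prompt template, message shape, and reference format are not shown in this README and are assumptions here.

```typescript
// Hypothetical shapes; the real provider types come from @contractspec/lib.contracts-integrations.
interface SearchHit {
  id: string;
  score: number;
  payload: Record<string, unknown>;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Prefer the canonical payload.text; fall back to legacy payload.content.
function hitText(hit: SearchHit): string | undefined {
  const { text, content } = hit.payload;
  if (typeof text === "string" && text.length > 0) return text;
  if (typeof content === "string" && content.length > 0) return content;
  return undefined;
}

// Turn search hits into chat messages plus reference ids (the template wording is an assumption).
function buildPrompt(question: string, hits: SearchHit[]): { messages: ChatMessage[]; references: string[] } {
  const usable: Array<{ id: string; text: string }> = [];
  for (const hit of hits) {
    const text = hitText(hit);
    if (text !== undefined) usable.push({ id: hit.id, text }); // hits with no usable text are skipped
  }
  const context = usable.map((u, i) => `[${i + 1}] ${u.text}`).join("\n");
  return {
    messages: [
      { role: "system", content: "Answer only from the numbered context. Say so if it is insufficient." },
      { role: "user", content: `Context:\n${context}\n\nQuestion: ${question}` },
    ],
    references: usable.map((u) => u.id),
  };
}

// In the real flow, hits come from vector search and the messages go to LLMProvider.chat().
const { messages, references } = buildPrompt("How do I rotate a key?", [
  { id: "faq-rotate-key#0", score: 0.91, payload: { text: "Rotate API keys from Settings > API." } },
  { id: "legacy-doc#0", score: 0.74, payload: { content: "Legacy fragment indexed under content." } },
  { id: "empty#0", score: 0.5, payload: {} },
]);
```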
Ingestion
- RawDocument, DocumentFragment, and DocumentProcessor: extract raw document content into indexable fragments.
- EmbeddingService: batch fragment embeddings through an EmbeddingProvider.
- VectorIndexer and VectorIndexConfig: map embeddings to VectorStoreProvider.upsert() requests and persist canonical payload.text content for later retrieval and query use.
- GmailIngestionAdapter: convert email threads into plaintext documents and index them.
- StorageIngestionAdapter: fetch object content and run the same process/index pipeline.
- KnowledgeRuntime and createKnowledgeRuntime: convenience composition for the common ingest -> retrieve -> answer path.
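The EmbeddingService and VectorIndexer entries, together with the gotchas further down (default batch size 16; payload.text plus documentId; fragment metadata wins over config metadata), can be illustrated with a small sketch. chunk, toUpsertRecords, and the record shape are hypothetical, and placing text and documentId after the merged metadata is an assumption about precedence.

```typescript
// Hypothetical shapes for illustration only.
interface Fragment {
  id: string;
  documentId: string;
  text: string;
  metadata?: Record<string, string>;
}

interface UpsertRecord {
  id: string;
  values: number[];
  payload: Record<string, string>;
}

const DEFAULT_BATCH_SIZE = 16; // documented EmbeddingService default

// Split fragments into batches so each embedding call stays bounded.
function chunk<T>(items: T[], size = DEFAULT_BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// Config metadata first, then fragment metadata (so fragment keys win),
// then the canonical text and documentId (their precedence is an assumption).
function toUpsertRecords(
  fragments: Fragment[],
  vectors: number[][],
  configMetadata: Record<string, string> = {}
): UpsertRecord[] {
  return fragments.map((fragment, i) => ({
    id: fragment.id,
    values: vectors[i],
    payload: {
      ...configMetadata,
      ...fragment.metadata,
      text: fragment.text,
      documentId: fragment.documentId,
    },
  }));
}

const fragments: Fragment[] = Array.from({ length: 40 }, (_, i) => ({
  id: `doc-1#${i}`,
  documentId: "doc-1",
  text: `fragment ${i}`,
  metadata: { locale: "fr" },
}));
const batches = chunk(fragments); // batch sizes: 16, 16, 8
const records = toUpsertRecords(
  fragments,
  fragments.map(() => [0, 1]), // placeholder vectors
  { locale: "en", source: "drive" } // hypothetical config metadata
);
```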
Guardrails and i18n
- KnowledgeAccessGuard and KnowledgeAccessGuardOptions: policy-aware checks for read, write, and search operations.
- KnowledgeAccessContext and KnowledgeAccessResult: runtime input/output contracts for access checks.
- createKnowledgeI18n and getDefaultI18n: localization entrypoints for prompts, guard messages, and ingestion formatting.
- ./i18n, ./i18n/catalogs/*, ./i18n/keys, ./i18n/locale, and ./i18n/messages: public i18n subpaths.
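The gotchas section lists KnowledgeAccessGuard's defaults: writes to external and ephemeral categories are blocked, writes are blocked when automationWritable is false, workflow binding is required and agent binding is not, and a missing name is denied when an allow-list exists. The sketch below encodes only that decision logic. GuardInput, GuardOptions, checkBinding, checkAccess, and the option names are hypothetical rather than the real KnowledgeAccessContext / KnowledgeAccessResult types, and how the require flags interact with allow-lists is an assumption.

```typescript
// Hypothetical shapes; the real types are KnowledgeAccessContext / KnowledgeAccessResult.
type Operation = "read" | "write" | "search";

interface SpaceInfo {
  category: string; // e.g. "canonical", "external", "ephemeral"
  automationWritable: boolean;
  allowedWorkflows?: string[];
  allowedAgents?: string[];
}

interface GuardInput {
  operation: Operation;
  space: SpaceInfo;
  workflowName?: string;
  agentName?: string;
}

interface GuardOptions {
  disallowWriteCategories?: string[]; // hypothetical option names
  requireWorkflowBinding?: boolean;
  requireAgentBinding?: boolean;
}

type Decision = { allowed: true } | { allowed: false; reason: string };

// When a check is enabled and an allow-list exists, a missing or unlisted name is denied.
function checkBinding(enabled: boolean, allowList: string[] | undefined, name: string | undefined, kind: string): Decision {
  if (!enabled || allowList === undefined) return { allowed: true };
  if (name === undefined || !allowList.includes(name)) {
    return { allowed: false, reason: `${kind} is not bound to this space` };
  }
  return { allowed: true };
}

function checkAccess(input: GuardInput, options: GuardOptions = {}): Decision {
  const {
    disallowWriteCategories = ["external", "ephemeral"], // documented default
    requireWorkflowBinding = true, // documented default
    requireAgentBinding = false, // documented default
  } = options;
  const { operation, space } = input;
  if (operation === "write") {
    if (disallowWriteCategories.includes(space.category)) {
      return { allowed: false, reason: `writes to ${space.category} knowledge are blocked` };
    }
    if (!space.automationWritable) {
      return { allowed: false, reason: "space is not automation-writable" };
    }
  }
  const workflow = checkBinding(requireWorkflowBinding, space.allowedWorkflows, input.workflowName, "workflow");
  if (!workflow.allowed) return workflow;
  return checkBinding(requireAgentBinding, space.allowedAgents, input.agentName, "agent");
}

const canon: SpaceInfo = { category: "canonical", automationWritable: true, allowedWorkflows: ["support-triage"] };
const externalWrite = checkAccess({ operation: "write", space: { ...canon, category: "external" }, workflowName: "support-triage" });
const readWithoutWorkflow = checkAccess({ operation: "read", space: canon });
const boundRead = checkAccess({ operation: "read", space: canon, workflowName: "support-triage" });
const lockedWrite = checkAccess({ operation: "write", space: { ...canon, automationWritable: false }, workflowName: "support-triage" });
```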
Public entrypoints
The root barrel at src/index.ts re-exports public symbols from:
./access, ./ingestion, ./knowledge.feature, ./query, ./runtime, ./retriever, ./types, ./vector-payload
Published subpaths from package.json are grouped around:
- access: ./access, ./access/guard
- retrieval: ./retriever, ./retriever/interface, ./retriever/static-retriever, ./retriever/vector-retriever
- query: ./query, ./query/service
- ingestion: ./ingestion, ./ingestion/document-processor, ./ingestion/embedding-service, ./ingestion/gmail-adapter, ./ingestion/storage-adapter, ./ingestion/vector-indexer
- feature/runtime helpers: ./knowledge.feature, plus KnowledgeRuntime/createKnowledgeRuntime from the root entrypoint
- i18n: ./i18n, ./i18n/catalogs, ./i18n/catalogs/en, ./i18n/catalogs/es, ./i18n/catalogs/fr, ./i18n/keys, ./i18n/locale, ./i18n/messages
- shared types/helpers: ./types, ./vector-payload
Use package.json as the exhaustive source of truth for subpaths; the README calls out the clusters that matter most to consumers.
Operational semantics and gotchas
- DocumentProcessor only ships built-in extractors for text/plain and application/json.
- If no extractor matches, DocumentProcessor.process() throws unless you registered a fallback such as */*.
- If an extractor returns no fragments, DocumentProcessor.process() returns one empty fragment for the document.
- EmbeddingService batches fragments; the default batch size is 16.
- StaticRetriever.retrieve() does simple line-level substring matching. It is not semantic retrieval.
- StaticRetriever honors topK and minScore, but its scores are always 1.0 for matching lines.
- VectorRetriever.retrieve() returns [] when the requested spaceKey is not mapped to a collection.
- VectorRetriever uses tenantId as the vector-store namespace when provided and automatically merges typed locale/category filters into the provider filter payload.
- VectorIndexer.upsert() stores fragment text under payload.text, together with merged fragment/config metadata and documentId. Fragment metadata wins over config metadata when keys collide.
- KnowledgeQueryService always runs embed -> vector search -> prompt build -> LLMProvider.chat(), using indexed payload.text as the primary context source and falling back to legacy payload.content when needed. Per-query overrides let you change namespace, topK, filter, locale, and prompt without rebuilding the service.
- KnowledgeAccessGuard defaults to blocking writes to external and ephemeral knowledge categories.
- KnowledgeAccessGuard also blocks writes when ResolvedKnowledge.space.access.automationWritable is false.
- KnowledgeAccessGuard defaults to requiring workflow binding and not requiring agent binding. If a scoped workflow or agent allow-list exists, missing names are denied rather than silently bypassed.
- GmailIngestionAdapter converts threads into plaintext and strips HTML when text bodies are missing.
- This package localizes access messages, query prompts, and Gmail formatting through the i18n surface.
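The first three gotchas describe how DocumentProcessor picks an extractor. A sketch of that dispatch: an exact MIME match, then a registered */* fallback, otherwise an error; an extractor that yields nothing still produces one empty fragment. MiniProcessor, Extractor, and register are hypothetical names since the real registration API is not shown in this README, and the lookup order and the JSON extractor's output format are assumptions.

```typescript
// Hypothetical shapes and class; not the package's DocumentProcessor.
interface RawDoc {
  id: string;
  mimeType: string;
  data: Uint8Array;
}

interface Fragment {
  id: string;
  documentId: string;
  text: string;
}

type Extractor = (doc: RawDoc) => Fragment[];

const decode = (data: Uint8Array): string => new TextDecoder().decode(data);

class MiniProcessor {
  private extractors = new Map<string, Extractor>();

  constructor() {
    // Built-ins mirroring the documented text/plain and application/json support.
    this.register("text/plain", (doc) => [{ id: `${doc.id}:0`, documentId: doc.id, text: decode(doc.data) }]);
    // Re-serializing JSON is an assumption; the real extractor's output format is not documented here.
    this.register("application/json", (doc) => [
      { id: `${doc.id}:0`, documentId: doc.id, text: JSON.stringify(JSON.parse(decode(doc.data)), null, 2) },
    ]);
  }

  register(mimeType: string, extractor: Extractor): void {
    this.extractors.set(mimeType, extractor);
  }

  process(doc: RawDoc): Fragment[] {
    // Exact match first, then the */* fallback; otherwise fail loudly.
    const extractor = this.extractors.get(doc.mimeType) ?? this.extractors.get("*/*");
    if (!extractor) throw new Error(`No extractor registered for ${doc.mimeType}`);
    const fragments = extractor(doc);
    // An extractor that yields nothing still produces one empty fragment.
    return fragments.length > 0 ? fragments : [{ id: `${doc.id}:0`, documentId: doc.id, text: "" }];
  }
}

const processor = new MiniProcessor();
const enc = new TextEncoder();
const html: RawDoc = { id: "b", mimeType: "text/html", data: enc.encode("<p>hi</p>") };

const plain = processor.process({ id: "a", mimeType: "text/plain", data: enc.encode("hello") });
let threw = false;
try {
  processor.process(html); // no text/html extractor and no fallback yet
} catch {
  threw = true;
}
processor.register("*/*", () => []); // fallback that extracts nothing
const fallback = processor.process(html);
```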
When not to use this package
- Do not use it as a vector store implementation.
- Do not use it as an embedding-model implementation.
- Do not use it as a full sync scheduler or background job system.
- Do not use it as the source of truth for KnowledgeSpaceSpec, knowledge sources, or tenant bindings.
Related packages
- @contractspec/lib.contracts-integrations: provider interfaces used by vector, embedding, LLM, email, and storage integrations.
- @contractspec/lib.contracts-spec: source of knowledge-space specs, resolved bindings, and shared translation helpers.
- @contractspec/lib.ai-agent: major consumer of retrieval and query contracts.
- @contractspec/example.knowledge-canon: runnable package-level example of binding knowledge spaces and answering through lib.knowledge.
- Context-storage modules and bundles in the repo reuse the ingestion pipeline primitives for document indexing flows.
Local commands
bun run lint:check
bun run typecheck
bun run test
