@kognitivedev/documents
v0.2.29
Cloud document SDK for Kognitive parsing, extraction, and pipelines APIs
TypeScript SDK for the Kognitive Documents API.
Use it to upload files, create parse jobs, run schema-based extraction, and work with managed document pipelines.
Managed pipelines are the canonical knowledge-base resource in Kognitive. The dashboard "Knowledge bases" view, cloud pipeline APIs, and @kognitivedev/cloud-knowledge-base all use the same pipeline IDs and indexed artifacts.
Pipeline search is Qdrant hybrid by default: parsed artifacts are indexed with dense embeddings plus Qdrant BM25 sparse vectors, so semantic questions and exact keyword/ID lookups both work.
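Dense search and BM25 sparse search each produce their own ranking, and a hybrid query fuses them into one result list. The sketch below shows reciprocal rank fusion (RRF), one of the fusion methods Qdrant's Query API supports. Whether Kognitive pipelines use RRF is an assumption, as is the constant k = 60 (a conventional choice; Qdrant's own parameters may differ). `reciprocalRankFusion` is a hypothetical helper, not part of the SDK.

```typescript
// Illustrative sketch of reciprocal rank fusion (RRF) over ranked ID lists.
// Assumption: Kognitive fuses dense and sparse hits in a comparable way;
// this function is NOT exported by @kognitivedev/documents.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Dense search returns semantically similar chunks; BM25 matches the exact ID.
const dense = ["chunk-a", "chunk-b", "chunk-c"];
const sparse = ["chunk-invoice-INV-1042", "chunk-b"];
const fused = reciprocalRankFusion([dense, sparse]);
console.log(fused.map((r) => r.id));
```

A chunk that ranks well in either list stays in the fused results, so an exact identifier found only by BM25 still surfaces next to semantic matches, and a chunk found by both rankers rises to the top.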
Installation
bun add @kognitivedev/documents
Common Flow
Most applications start with:
- Upload a file
- Optionally parse it
- Optionally extract structured fields directly from the uploaded file
import { KognitiveDocumentsClient } from "@kognitivedev/documents";
const documents = new KognitiveDocumentsClient({
baseUrl: "http://localhost:3001",
apiKey: process.env.KOGNITIVE_API_KEY,
logLevel: "debug",
});
const file = await documents.files.upload({
filename: "invoice.pdf",
mimeType: "application/pdf",
data: await Bun.file("./invoice.pdf").arrayBuffer(),
});
const extractJob = await documents.extract.createJob({
fileId: file.id,
config: {
target: "per_doc",
tier: "cost_effective",
preset: "invoice",
schema: {
type: "object",
properties: {
invoiceNumber: { type: "string" },
total: { type: "string" },
status: { type: "string" },
},
},
citeSources: true,
confidenceScores: true,
},
});
const extraction = await documents.extract.waitForCompletion(extractJob.id);
console.log(extraction.payload);
If you also need the parsed text, read the completed extraction job and fetch its derived parse result:
const completedExtractJob = await documents.extract.getJob(extractJob.id);
const parsed = await documents.parsing.getResult(completedExtractJob.parsingJobId!);
Easiest Parse Options
Start with three options:
- tier
- preset
- targetPages
await documents.parsing.createJob({
fileId: file.id,
tier: "agentic",
preset: "scientific",
targetPages: [1, 2],
});
Presets
- invoice: invoices, receipts, bills
- scientific: papers and research PDFs
- technicalDocumentation: technical docs and manuals
- forms: forms and checklists
If you are unsure which tier to use, start with:
{ tier: "cost_effective" }
Common Methods
Files
- files.upload()
- files.list()
- files.get()
- files.download()
- files.delete()
Parsing
- parsing.createJob()
- parsing.waitForCompletion()
- parsing.getJob()
- parsing.getResult()
- parsing.getPage()
- parsing.getArtifacts()
Extraction
- extract.createConfig()
- extract.updateConfig()
- extract.createJob()
- extract.waitForCompletion()
Pipelines
- pipelines.create()
- pipelines.addFile()
- pipelines.sync(): returns a queued pipeline run (202 Accepted)
- pipelines.listRuns()
- pipelines.getRun()
- pipelines.search()
- pipelines.searchImages()
Pipeline defaults are intentionally zero-config:
- embeddings: text-embedding-3-small
- vector store: Qdrant
- retrieval: Qdrant hybrid, combining BM25 sparse vectors with dense semantic vectors
- retrieval modes: chunks, files_via_metadata, files_via_content, auto_routed
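If your application reads a retrieval mode from user input or configuration, validating it against the four documented names before calling the API keeps a typo from reaching the request. This is a sketch: `RETRIEVAL_MODES`, `PipelineRetrievalMode`, `isRetrievalMode`, and `parseRetrievalMode` are hypothetical local names, the SDK may already export equivalent types, and how the mode is passed to `pipelines.search()` is not documented here.

```typescript
// Hypothetical local helpers; these names are illustrative and are not
// taken from @kognitivedev/documents.
const RETRIEVAL_MODES = [
  "chunks",
  "files_via_metadata",
  "files_via_content",
  "auto_routed",
] as const;

// Union of the four documented mode strings, derived from the array above.
type PipelineRetrievalMode = (typeof RETRIEVAL_MODES)[number];

// Type guard: true only for one of the documented mode names.
function isRetrievalMode(value: unknown): value is PipelineRetrievalMode {
  return (
    typeof value === "string" &&
    (RETRIEVAL_MODES as readonly string[]).includes(value)
  );
}

// Narrow untrusted input to a known mode, else return the caller's fallback.
// The fallback is the caller's choice; this README does not name a default mode.
function parseRetrievalMode(
  value: string | undefined,
  fallback: PipelineRetrievalMode,
): PipelineRetrievalMode {
  return isRetrievalMode(value) ? value : fallback;
}

console.log(parseRetrievalMode("files_via_metadata", "chunks")); // files_via_metadata
console.log(parseRetrievalMode("bogus", "chunks")); // chunks
```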
Managed pipeline indexing requires REDIS_URL, QDRANT_URL, and OPENROUTER_API_KEY on the backend. QDRANT_API_KEY is optional for local deployments and required when your Qdrant instance uses API-key auth.
Sync is worker-backed and asynchronous: pipelines.sync() queues a run, so poll pipelines.getRun() or pipelines.getStatus() until the run completes before searching.
PgVectorStore remains available in @kognitivedev/rag for local or custom RAG, but it is not the managed knowledge-base default.
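Because a sync only queues a run, a search issued before the run finishes may not yet reflect newly added files. A minimal polling sketch follows. It accepts any function that returns a run object, so it runs without the SDK; in practice that function would wrap a call such as `pipelines.getRun()`. The name `waitForRun`, the status strings `"completed"` and `"failed"`, and the interval and timeout values are assumptions; check the run shape the API actually returns before relying on them.

```typescript
// Minimal shape this sketch relies on; the real run object likely has more fields.
interface RunLike {
  status: string;
}

interface WaitOptions<T> {
  intervalMs?: number;
  timeoutMs?: number;
  isDone?: (run: T) => boolean;
  isFailed?: (run: T) => boolean;
}

// Hypothetical helper: polls `getRun` until the run is done, failed, or timed out.
// The default status strings are assumptions, not documented values.
async function waitForRun<T extends RunLike>(
  getRun: () => Promise<T>,
  options: WaitOptions<T> = {},
): Promise<T> {
  const {
    intervalMs = 2000,
    timeoutMs = 300000,
    isDone = (run: T) => run.status === "completed",
    isFailed = (run: T) => run.status === "failed",
  } = options;
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const run = await getRun();
    if (isDone(run)) return run;
    if (isFailed(run)) {
      throw new Error(`Pipeline run failed with status "${run.status}"`);
    }
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for pipeline run`);
    }
    // Wait before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo with a fake run that reports "completed" on the third poll.
let polls = 0;
const fakeGetRun = async (): Promise<RunLike> => {
  polls += 1;
  return { status: polls < 3 ? "running" : "completed" };
};

waitForRun(fakeGetRun, { intervalMs: 10 }).then((run) => {
  console.log(`run ${run.status} after ${polls} polls`);
});
```

Passing `isDone` and `isFailed` as options keeps the helper usable if the real status values differ from the assumed ones.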
For production retrieval integrations, prefer:
- @kognitivedev/documents when you want the raw pipeline lifecycle and search APIs
- @kognitivedev/cloud-knowledge-base when you want normalized citations, agent context adapters, and workflow KB steps
Logging
Enable SDK request logging with:
const documents = new KognitiveDocumentsClient({
baseUrl: "http://localhost:3001",
apiKey: process.env.KOGNITIVE_API_KEY,
logLevel: "debug",
});
This logs SDK request and response activity in your browser or Node process.
