@kognitivedev/documents

v0.2.29, published 21 days ago

Cloud document SDK for Kognitive parsing, extraction, and pipelines APIs


Maintainer: vserifsaglam

Keywords: kognitive, documents, sdk, parse, extract, pipelines

@kognitivedev/documents

TypeScript SDK for the Kognitive Documents API.

Use it to upload files, create parse jobs, run schema-based extraction, and work with managed document pipelines.

Managed pipelines are the canonical knowledge-base resource in Kognitive. The dashboard "Knowledge bases" view, cloud pipeline APIs, and @kognitivedev/cloud-knowledge-base all use the same pipeline IDs and indexed artifacts.

Pipeline search uses Qdrant hybrid retrieval by default: parsed artifacts are indexed with both dense embeddings and Qdrant BM25 sparse vectors, so semantic questions and exact keyword or ID lookups both work.
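To make the idea of hybrid retrieval concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a dense ranking with a sparse BM25 ranking. This README does not say which fusion method the managed backend uses, so treat RRF, the RankedHit type, the k = 60 constant, and the chunk IDs as illustrative assumptions rather than a description of Kognitive internals.

```typescript
// Hypothetical shape for one ranked hit; this is not an SDK type.
interface RankedHit {
  id: string;
}

// Reciprocal rank fusion: every list contributes 1 / (k + rank) for each hit,
// with rank starting at 1. k = 60 is a conventional default, assumed here.
function reciprocalRankFusion(lists: RankedHit[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}

// Made-up rankings: one from a dense (semantic) query, one from a sparse (keyword) query.
const denseHits: RankedHit[] = [{ id: "chunk-a" }, { id: "chunk-b" }, { id: "chunk-c" }];
const sparseHits: RankedHit[] = [{ id: "chunk-c" }, { id: "chunk-a" }, { id: "chunk-d" }];

const fusedIds = reciprocalRankFusion([denseHits, sparseHits]);
console.log(fusedIds);
```

Hits that rank well in both lists (chunk-a and chunk-c here) rise above hits that appear in only one, which is why a hybrid index can serve both semantic questions and exact-match lookups.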

Installation

bun add @kognitivedev/documents

Common Flow

Most applications start with:

Upload a file
Optionally parse it
Optionally extract structured fields directly from the uploaded file

import { KognitiveDocumentsClient } from "@kognitivedev/documents";

const documents = new KognitiveDocumentsClient({
  baseUrl: "http://localhost:3001",
  apiKey: process.env.KOGNITIVE_API_KEY,
  logLevel: "debug",
});

const file = await documents.files.upload({
  filename: "invoice.pdf",
  mimeType: "application/pdf",
  data: await Bun.file("./invoice.pdf").arrayBuffer(),
});

const extractJob = await documents.extract.createJob({
  fileId: file.id,
  config: {
    target: "per_doc",
    tier: "cost_effective",
    preset: "invoice",
    schema: {
      type: "object",
      properties: {
        invoiceNumber: { type: "string" },
        total: { type: "string" },
        status: { type: "string" },
      },
    },
    citeSources: true,
    confidenceScores: true,
  },
});

const extraction = await documents.extract.waitForCompletion(extractJob.id);
console.log(extraction.payload);

If you also need the parsed text, read the completed extraction job and fetch its derived parse result:

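const completedExtractJob = await documents.extract.getJob(extractJob.id);
// Guard instead of asserting with `!`, in case no parse job was derived.
if (!completedExtractJob.parsingJobId) {
  throw new Error("Extraction job has no derived parsing job");
}
const parsed = await documents.parsing.getResult(completedExtractJob.parsingJobId);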
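If you extract several document types, the config object from the flow above can be generated rather than hand-written. The sketch below builds the same shape (target, tier, preset, schema, citeSources, confidenceScores) for a list of string fields. The helper buildStringExtractConfig and the ExtractConfigSketch type are hypothetical and not part of the SDK; only the field names and values are taken from the example above.

```typescript
// Hypothetical types for this sketch; the SDK may export its own config types.
type StringFieldSchema = {
  type: "object";
  properties: Record<string, { type: "string" }>;
};

interface ExtractConfigSketch {
  target: "per_doc";
  tier: string;
  preset: string;
  schema: StringFieldSchema;
  citeSources: boolean;
  confidenceScores: boolean;
}

// Build a config whose schema declares every listed field as a string.
function buildStringExtractConfig(preset: string, fields: string[]): ExtractConfigSketch {
  const properties: Record<string, { type: "string" }> = {};
  for (const field of fields) {
    properties[field] = { type: "string" };
  }
  return {
    target: "per_doc",
    tier: "cost_effective",
    preset,
    schema: { type: "object", properties },
    citeSources: true,
    confidenceScores: true,
  };
}

const invoiceConfig = buildStringExtractConfig("invoice", ["invoiceNumber", "total", "status"]);
console.log(JSON.stringify(invoiceConfig.schema, null, 2));
```

The resulting object could then be passed as the config value of documents.extract.createJob(), assuming the SDK accepts the same shape shown in the flow above.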

Easiest Parse Options

Start with:

tier
preset
targetPages

await documents.parsing.createJob({
  fileId: file.id, // ID returned by documents.files.upload()
  tier: "agentic",
  preset: "scientific",
  targetPages: [1, 2],
});

Presets

invoice: invoices, receipts, bills
scientific: papers and research PDFs
technicalDocumentation: technical docs and manuals
forms: forms and checklists

If you are unsure, start with:

{ tier: "cost_effective" }
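One way to apply the guidance above is a small helper that maps a rough document category to a preset and falls back to the bare cost_effective tier when the category is unknown. The DocumentKind categories, the ParseOptionsSketch type, and the helper itself are hypothetical. The preset names come from the list above, and pairing every preset with the cost_effective tier is an assumption made for brevity.

```typescript
// Preset names as listed in this README.
type Preset = "invoice" | "scientific" | "technicalDocumentation" | "forms";

// Hypothetical option shape covering only the three fields discussed above.
interface ParseOptionsSketch {
  tier: string;
  preset?: Preset;
  targetPages?: number[];
}

// Hypothetical document categories, used only for this sketch.
type DocumentKind = "invoice" | "receipt" | "paper" | "manual" | "checklist" | "unknown";

const presetByKind: Record<DocumentKind, Preset | undefined> = {
  invoice: "invoice",
  receipt: "invoice",
  paper: "scientific",
  manual: "technicalDocumentation",
  checklist: "forms",
  unknown: undefined,
};

function parseOptionsFor(kind: DocumentKind, targetPages?: number[]): ParseOptionsSketch {
  // Start from the "if you are unsure" default.
  const parseOptions: ParseOptionsSketch = { tier: "cost_effective" };
  const preset = presetByKind[kind];
  if (preset) parseOptions.preset = preset;
  if (targetPages && targetPages.length > 0) parseOptions.targetPages = targetPages;
  return parseOptions;
}

console.log(parseOptionsFor("paper", [1, 2]));
console.log(parseOptionsFor("unknown"));
```

The returned object can be spread into a parsing.createJob() call next to fileId, assuming the SDK takes these fields at the top level as in the example above.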

Common Methods

Files

files.upload()
files.list()
files.get()
files.download()
files.delete()

Parsing

parsing.createJob()
parsing.waitForCompletion()
parsing.getJob()
parsing.getResult()
parsing.getPage()
parsing.getArtifacts()

Extraction

extract.createConfig()
extract.updateConfig()
extract.createJob()
extract.waitForCompletion()

Pipelines

pipelines.create()
pipelines.addFile()
pipelines.sync(): returns a queued pipeline run (202 Accepted)
pipelines.listRuns()
pipelines.getRun()
pipelines.search()
pipelines.searchImages()

Pipeline defaults are intentionally zero-config:

embeddings: text-embedding-3-small
vector store: Qdrant
retrieval: Qdrant hybrid BM25 sparse-vector + dense semantic
retrieval modes: chunks, files_via_metadata, files_via_content, auto_routed

Managed pipeline indexing requires REDIS_URL, QDRANT_URL, and OPENROUTER_API_KEY on the backend. QDRANT_API_KEY is optional for local deployments and required when your Qdrant instance uses API-key auth. Sync is worker-backed and asynchronous: poll pipelines.getRun() or pipelines.getStatus() until the run completes before you search. PgVectorStore remains available in @kognitivedev/rag for local or custom RAG, but it is not the managed knowledge-base default.
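Because sync is asynchronous, callers need a polling loop before searching. The sketch below shows one possible loop with a fixed interval and an attempt cap. The RunSketch shape and its status values ("queued", "running", "completed", "failed") are assumptions, since this README does not list them, and the getRun argument is a stand-in for however you call pipelines.getRun() in your code.

```typescript
// Hypothetical run shape; the real status values are not documented here.
interface RunSketch {
  id: string;
  status: "queued" | "running" | "completed" | "failed";
}

interface WaitOptions {
  intervalMs?: number;
  maxAttempts?: number;
}

// Poll until the run completes, fails, or the attempt budget runs out.
async function waitForRun(
  getRun: (runId: string) => Promise<RunSketch>,
  runId: string,
  opts: WaitOptions = {}
): Promise<RunSketch> {
  const intervalMs = opts.intervalMs ?? 2000;
  const maxAttempts = opts.maxAttempts ?? 60;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const run = await getRun(runId);
    if (run.status === "completed") return run;
    if (run.status === "failed") throw new Error(`Pipeline run ${runId} failed`);
    // Wait before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Pipeline run ${runId} did not finish after ${maxAttempts} attempts`);
}

// Stub standing in for pipelines.getRun(); it reports "completed" on the third call.
let getRunCalls = 0;
const fakeGetRun = async (runId: string): Promise<RunSketch> => {
  getRunCalls += 1;
  return { id: runId, status: getRunCalls < 3 ? "running" : "completed" };
};

const finishedRun = waitForRun(fakeGetRun, "run-1", { intervalMs: 0 });
finishedRun.then((run) => console.log(run.status, "after", getRunCalls, "calls"));
```

Passing getRun as a function keeps the loop independent of the SDK's exact signature and makes it easy to test with a stub, as shown. In production you would likely raise the interval or add backoff.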

For production retrieval integrations, prefer:

@kognitivedev/documents when you want the raw pipeline lifecycle and search APIs
@kognitivedev/cloud-knowledge-base when you want normalized citations, agent context adapters, and workflow KB steps

Logging

Enable SDK request logging with:

const documents = new KognitiveDocumentsClient({
  baseUrl: "http://localhost:3001",
  apiKey: process.env.KOGNITIVE_API_KEY,
  logLevel: "debug",
});

This logs SDK request and response activity in your browser or Node process.
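Debug logging is usually something you want off by default. The sketch below builds the client options from environment variables and sets logLevel only when asked. KOGNITIVE_API_KEY is the variable used in the examples above; KOGNITIVE_BASE_URL and KOGNITIVE_DEBUG are hypothetical names chosen for this sketch, and "debug" is the only log level this README shows.

```typescript
// Hypothetical option shape mirroring the constructor fields used in this README.
interface ClientOptionsSketch {
  baseUrl: string;
  apiKey: string;
  logLevel?: "debug";
}

function clientOptionsFromEnv(env: Record<string, string | undefined>): ClientOptionsSketch {
  const apiKey = env["KOGNITIVE_API_KEY"];
  // Fail early rather than sending requests without a key.
  if (!apiKey) throw new Error("KOGNITIVE_API_KEY is not set");
  const clientOptions: ClientOptionsSketch = {
    // KOGNITIVE_BASE_URL is an assumed variable name; the fallback matches the examples above.
    baseUrl: env["KOGNITIVE_BASE_URL"] ?? "http://localhost:3001",
    apiKey,
  };
  // KOGNITIVE_DEBUG is an assumed opt-in switch for request logging.
  if (env["KOGNITIVE_DEBUG"] === "1") clientOptions.logLevel = "debug";
  return clientOptions;
}

const debugOptions = clientOptionsFromEnv({ KOGNITIVE_API_KEY: "test-key", KOGNITIVE_DEBUG: "1" });
const quietOptions = clientOptionsFromEnv({ KOGNITIVE_API_KEY: "test-key" });
console.log(debugOptions.logLevel, quietOptions.logLevel);
```

In an application you would pass process.env to the helper and hand the result to new KognitiveDocumentsClient(...), assuming the constructor accepts the fields shown in this README.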
