@docufin/sdk
v0.0.2
Docufin SDK is the workflow surface for building document pipelines.
Docufin SDK
Install
bun add @docufin/sdk
Quick start
import { pipeline } from "@docufin/sdk/workflow";

export const OcrExtractReview = pipeline("ocr-extract-review", {
  concurrency: 2,
})
  .each(async (doc, { steps }) => {
    const text = await steps.ocr(doc);
    const schema = await steps.readSchema("invoice_schema");
    return await steps.extract(text, { schema });
  })
  .review({
    title: "Approve extraction",
    timeout: "72h",
  });
Concepts
- Doc: A document reference (uri, optional mimeType, filename, meta).
- Item<T>: A value paired with its source doc ({ doc, value }). .review() runs on items.
- Streams: .each() starts with docs. Returning non-doc values switches the stream to items.
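The shapes above can be sketched in TypeScript. This is a minimal illustration inferred from the Concepts list, not the SDK's published types: the field types, the `isDoc` guard, and the `toStreamEntry` helper are all hypothetical names introduced here.

```typescript
// Hypothetical shapes inferred from the Concepts list; the SDK's real types may differ.
interface Doc {
  uri: string;
  mimeType?: string;
  filename?: string;
  meta?: Record<string, unknown>;
}

interface Item<T> {
  doc: Doc; // the source document the value was derived from
  value: T;
}

// Assumption: a Doc is recognized as any object carrying a string `uri`.
function isDoc(x: unknown): x is Doc {
  return (
    typeof x === "object" &&
    x !== null &&
    typeof (x as { uri?: unknown }).uri === "string"
  );
}

// Models the stream rule: a returned Doc stays a doc, and any other
// value is paired with its source doc and becomes an item.
function toStreamEntry<T>(source: Doc, result: Doc | T): Doc | Item<T> {
  return isDoc(result) ? result : { doc: source, value: result as T };
}
```

Under these assumptions, returning an extraction result such as `{ total: 42 }` from a handler would yield `{ doc, value: { total: 42 } }`, which is the form `.review()` operates on.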
Pipeline DSL
- .each(fn): Runs per doc. Return a Doc to keep a doc stream, or any other value to produce items.
- .all(fn): Runs once on the full set. Return a Doc/Doc[] to keep docs, or a value to finish with a scalar.
- .review(fn | config): Runs on items only. Pass a handler or a config with title/timeout.
Concurrency is applied only to .each() steps.
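To make the meaning of a concurrency cap concrete, here is a standalone model of running a per-doc handler with at most `limit` calls in flight. It is a conceptual sketch only: `mapWithConcurrency` is a hypothetical helper, not part of the SDK, and nothing here describes how the Docufin runtime actually schedules work. It operates on plain promises rather than runtime primitives, so it is not something to write inside a pipeline handler.

```typescript
// Hypothetical helper, not part of @docufin/sdk: run `fn` over `inputs`
// with at most `limit` calls in flight, preserving input order in the output.
async function mapWithConcurrency<T, R>(
  inputs: readonly T[],
  limit: number,
  fn: (input: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(inputs.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  const worker = async (): Promise<void> => {
    while (next < inputs.length) {
      const i = next++;
      results[i] = await fn(inputs[i] as T, i);
    }
  };

  const workerCount = Math.min(limit, inputs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `limit` set to 2, as in the quick start's `concurrency: 2`, no more than two handler calls overlap at any moment, while the results still line up with the input order.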
Steps API
Use steps inside handlers to access runtime integrations:
- OCR and ML: steps.ocr, steps.classify, steps.extract, steps.llm
- Schemas and prompts: steps.readSchema, steps.readPrompt
- Review: steps.review
- Artifacts: steps.readArtifact, steps.writeArtifact, steps.readText, steps.writeText, steps.readJson, steps.writeJson
- Docs: steps.mergePdf, steps.writeTextAsDoc
Example artifact usage:
import { pipeline } from "@docufin/sdk/workflow";

export const ArtifactExample = pipeline("artifact-example").each(
  async (doc, { steps }) => {
    const text = await steps.readText(doc);
    const { uri } = await steps.writeText({
      text,
      filename: "raw.txt",
    });
    return await steps.readJson(uri);
  }
);
Params schema
Pass a schema that exposes parse or safeParse (for example, a Zod schema) to validate job params.
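As a rough picture of what accepting either method could look like, the sketch below validates input against any object that has `parse` or `safeParse`. The `ParamsSchema` and `SafeParseResult` types and the `validateParams` function are hypothetical names for illustration; how the SDK performs validation internally is not documented here.

```typescript
// Hypothetical structural types: any object exposing `parse` or `safeParse`
// qualifies. The result shape mirrors the one Zod's `safeParse` returns.
type SafeParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: unknown };

type ParamsSchema<T> =
  | { parse: (input: unknown) => T }
  | { safeParse: (input: unknown) => SafeParseResult<T> };

// Assumption: prefer `safeParse` when present, otherwise let `parse` throw.
function validateParams<T>(schema: ParamsSchema<T>, input: unknown): T {
  if ("safeParse" in schema) {
    const result = schema.safeParse(input);
    if (!result.success) {
      throw new Error(`Invalid job params: ${String(result.error)}`);
    }
    return result.data;
  }
  return schema.parse(input);
}
```

A Zod schema fits this shape because it exposes both methods, but a hand-written object with a single `parse` function would work equally well under these assumptions.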
import { z } from "zod";
import { pipeline } from "@docufin/sdk/workflow";

const ParamsSchema = z.object({
  topic: z.string().optional(),
});

export const Haiku = pipeline("haiku", {
  schema: ParamsSchema,
}).all(async (_docs, { params, steps }) => {
  // `topic` is optional, so fall back to a generic subject
  // instead of interpolating `undefined` into the prompt.
  const topic = params.topic ?? "anything";
  const haiku = await steps.llm(`Write a haiku about ${topic}.`);
  return await steps.writeTextAsDoc(`${haiku.trimEnd()}\n`, {
    filename: "haiku.txt",
    mimeType: "text/plain",
  });
});
Runtime rules
- The runtime is the only source of durable tasks and waits.
- Do not wrap runtime primitives in Promise.all; use .each()/.all() fan-out instead.
- The pipeline runtime hides task handles and queue details from pipeline authors.
More examples
See .memory/sdk/cookbook.md for more recipes.
