extractia-sdk
v1.4.0
JavaScript SDK for the ExtractIA API — document extraction, OCR tools, AI summaries, templates & more
Extractia SDK
JavaScript SDK for the Extractia document-extraction API. Works in Node.js ≥ 18 and in modern browsers via the bundled IIFE build.
Requires an Extractia account and a valid API token.
Generate one at Settings → API Keys in the Extractia dashboard.
Table of Contents
- Installation
- Quick Start
- Authentication
- Error Handling
- API Reference
- TypeScript
- Rate Limits & Quotas
- Changelog
Installation
npm install extractia-sdk
Or using yarn / pnpm:
yarn add extractia-sdk
pnpm add extractia-sdk
Browser (IIFE build):
<script src="https://unpkg.com/extractia-sdk/dist/extractia-sdk.browser.js"></script>
<script>
ExtractiaSDK.default.setToken("YOUR_API_TOKEN");
</script>
Quick Start
import {
setToken,
suggestFields,
createTemplate,
processImage,
} from "extractia-sdk";
// 1. Authenticate
setToken("ext_YOUR_API_TOKEN_HERE");
// 2. Let AI suggest fields for your document type
const fields = await suggestFields(
"Invoice",
"header data plus all line items with product and quantity",
);
// 3. Create a template from the suggestions
const template = await createTemplate({ label: "Invoice", fields });
// 4. Process an image
import { readFileSync } from "fs";
const base64 = readFileSync("./invoice.png").toString("base64");
const doc = await processImage(template.id, base64);
console.log(JSON.parse(doc.rawJson));
// → { "Vendor": "Acme Corp", "Total": "1500.00", "Line Items": [...] }
Authentication
Every SDK method requires a valid API token. Call setToken once at
application startup — it is stored in module scope and attached automatically as
a Bearer header on every subsequent request.
import { setToken } from "extractia-sdk";
setToken(process.env.EXTRACTIA_TOKEN);
Security: Never hard-code your token in client-side code. Use environment variables or a secrets manager. Tokens can be rotated from the dashboard at any time.
Error Handling
The SDK maps every HTTP error to a typed exception. Catch the specific subclass
you need, or catch the base ExtractiaError for a generic fallback.
import {
processImage,
AuthError,
TierError,
RateLimitError,
NotFoundError,
ExtractiaError,
} from "extractia-sdk";
try {
const doc = await processImage(templateId, base64Image);
} catch (err) {
if (err instanceof AuthError) {
// 401 — token is missing, expired, or revoked
console.error("Authentication failed:", err.message);
} else if (err instanceof TierError) {
// 402/403 — monthly quota exhausted or plan doesn't allow this action
console.error("Upgrade your plan:", err.message);
} else if (err instanceof RateLimitError) {
// 429 — too many requests in a short window
console.warn("Rate limited. Waiting 60s before retrying...");
await new Promise((r) => setTimeout(r, 60_000));
// After the wait, call processImage again
} else if (err instanceof NotFoundError) {
// 404 — template or document does not exist
console.error("Not found:", err.message);
} else if (err instanceof ExtractiaError) {
// Fallback for any other API error
console.error(`API error [${err.status}]:`, err.message);
} else {
throw err; // Re-throw network / unexpected errors
}
}
Error class hierarchy
| Class | HTTP status | When thrown |
| ---------------- | ----------- | ------------------------------------------------- |
| ExtractiaError | any | Base class; fallback for unexpected codes |
| AuthError | 401 | Missing, expired, or revoked token |
| ForbiddenError | 403 | Unconfirmed account or sub-user permission denied |
| TierError | 402 / 403 | Monthly document quota exhausted |
| RateLimitError | 429 | Too many requests in time window |
| NotFoundError | 404 | Template or document not found |
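The 429 case above is the one error that is usually worth retrying automatically. The following is a minimal sketch of a retry wrapper with exponential backoff. `withRateLimitRetry` is a hypothetical helper, not part of the SDK, and the local `RateLimitError` class is only a stand-in so the snippet runs on its own; in real code the class would be imported from "extractia-sdk".

```javascript
// Stand-in for the SDK's RateLimitError so this sketch is self-contained.
class RateLimitError extends Error {}

// Hypothetical helper: calls `fn`, and when it throws a RateLimitError,
// waits and tries again, doubling the delay after each failed attempt.
async function withRateLimitRetry(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err instanceof RateLimitError;
      // Give up on non-rate-limit errors or once the retry budget is spent.
      if (!retryable || attempt >= retries) throw err;
      const delayMs = baseDelayMs * 2 ** attempt; // 1x, 2x, 4x, ... the base delay
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

With the real SDK this might be used as `await withRateLimitRetry(() => processImage(templateId, base64Image))`. Other error types propagate unchanged, so the `instanceof` chain shown earlier still applies.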
API Reference
Profile & Webhook
setToken(token)
Sets the Bearer token used for all requests. Must be called before any other method.
| Parameter | Type | Required | Description |
| --------- | -------- | -------- | ---------------------------------- |
| token | string | ✅ | Your Extractia API token (ext_…) |
setToken("ext_abc123");
getMyProfile()
Returns the profile and usage metrics of the authenticated user.
Returns: Promise<AppUserProfile>
const profile = await getMyProfile();
console.log(profile.email); // the account's email address
console.log(profile.formTemplatesCount); // 5
console.log(profile.documentsCount); // 42
updateWebhook(url)
Updates the webhook URL. After each successful extraction, Extractia
sends a POST to this URL with the document payload.
| Parameter | Type | Required | Description |
| --------- | -------- | -------- | ---------------------------------------------- |
| url | string | ✅ | Webhook URL (pass empty string "" to remove) |
Returns: Promise<void>
await updateWebhook("https://myapp.example.com/hooks/extractia");
await updateWebhook(""); // remove webhook
Webhook POST payload:
{
"documentId": "abc123",
"templateId": "tpl456",
"rawJson": "{ \"Total\": \"150.00\" }",
"createdAt": "2025-01-05T10:30:00Z"
}
Templates
A template defines the fields to extract from a document.
Supported field types: TEXT | NUMBER | PERCENTAGE | DATE | BOOLEAN | EMAIL | PHONE | ADDRESS | CURRENCY | LIST
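A typo in a field type is easy to catch before the request is sent. The sketch below checks an array of field definitions against the types listed above. `findFieldProblems` is a hypothetical client-side helper, not an SDK function, and the non-empty-label rule is an assumption rather than a documented API constraint.

```javascript
// Field types as listed in this README.
const FIELD_TYPES = new Set([
  "TEXT", "NUMBER", "PERCENTAGE", "DATE", "BOOLEAN",
  "EMAIL", "PHONE", "ADDRESS", "CURRENCY", "LIST",
]);

// Hypothetical pre-flight check: returns a list of human-readable problems,
// or an empty array when every field looks acceptable.
function findFieldProblems(fields) {
  const problems = [];
  fields.forEach((field, i) => {
    // Assumption: a usable label is a non-empty string.
    if (typeof field.label !== "string" || field.label.trim() === "") {
      problems.push(`fields[${i}]: label must be a non-empty string`);
    }
    if (!FIELD_TYPES.has(field.type)) {
      problems.push(`fields[${i}]: unknown type "${field.type}"`);
    }
  });
  return problems;
}
```

Calling it on the definitions before `createTemplate` gives an early, local error message; the API remains the authority on what is actually accepted.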
getTemplates()
Returns all templates owned by the authenticated user.
const templates = await getTemplates();
templates.forEach((t) => console.log(t.id, t.label));
getTemplateById(id)
Returns a single template by its ID. Throws NotFoundError if missing.
const template = await getTemplateById("tpl_abc123");
getTemplateByName(name)
Returns a template matched by its label name. Throws NotFoundError if missing.
const invoice = await getTemplateByName("Invoice");
suggestFields(templateName, extractionContext?)
Uses AI to suggest extraction field definitions for a given document type.
Results can be passed directly to createTemplate.
Consumes AI credits.
| Parameter | Type | Required | Description |
| ------------------- | -------- | -------- | ----------------------------------------------------------- |
| templateName | string | ✅ | Document type name (e.g. "Invoice", "Driver's License") |
| extractionContext | string | — | Natural-language hint about what to extract or detect |
// Basic usage
const receiptFields = await suggestFields("Receipt");
// With context: the AI returns only what you describe
const fields = await suggestFields(
"Purchase Order",
"I need the supplier name, PO number, and all ordered products with quantity and unit price",
);
const template = await createTemplate({ label: "Purchase Order", fields });
The returned array follows the FormField shape:
[
{ "label": "PO Number", "type": "TEXT", "required": true },
{ "label": "Supplier", "type": "TEXT", "required": true },
{
"label": "Order Items",
"type": "LIST",
"required": false,
"listLabel": "Product"
}
]
createTemplate(template)
Creates a new template.
const template = await createTemplate({
label: "Purchase Order",
fields: [
{ label: "PO Number", type: "TEXT", required: true },
{ label: "Vendor", type: "TEXT", required: true },
{ label: "Total Amount", type: "CURRENCY", required: true },
{ label: "Order Date", type: "DATE", required: false },
{
label: "Line Items",
type: "LIST",
required: false,
listLabel: "Product",
},
],
});
console.log(template.id); // 'tpl_newid'
updateTemplate(id, template)
Updates an existing template's label and/or fields.
const updated = await updateTemplate("tpl_abc123", {
fields: [
{ label: "Vendor", type: "TEXT", required: true },
{ label: "Total", type: "CURRENCY", required: true },
{ label: "Due Date", type: "DATE", required: false },
],
});
deleteTemplate(id)
Deletes a template. Returns a 409 Conflict error if the template has
associated documents — call deleteAllTemplateDocuments first.
await deleteAllTemplateDocuments("tpl_abc123");
await deleteTemplate("tpl_abc123");
deleteAllTemplateDocuments(id)
Deletes all documents associated with a template in one call.
await deleteAllTemplateDocuments("tpl_abc123");
Documents
getDocumentsByTemplateId(templateId, options?)
Returns a paginated list of documents for a given template (newest first by default).
| Option | Type | Default | Description |
| -------------- | --------- | ------- | ------------------------------------------------- |
| preconformed | boolean | — | Filter: reviewed (true) or unreviewed (false) |
| index | number | 0 | Zero-based page index (10 docs per page) |
| sort | string | "-1" | Sort direction: "1" = ASC, "-1" = DESC |
| includeImage | boolean | false | Include base64 source image in results |
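Because each call returns a single page of 10 documents, reading a whole template means advancing `index` until `totalPages` is reached. A minimal sketch of that loop follows. `collectAllPages` is a hypothetical helper, not part of the SDK; it only relies on the `content` and `totalPages` properties that this README's own example reads from the page object.

```javascript
// Hypothetical helper: `fetchPage(index)` must resolve to an object with
// `content` (array of documents) and `totalPages` (number).
async function collectAllPages(fetchPage) {
  const all = [];
  let index = 0;
  let totalPages = 1; // assume at least one page until the first response arrives
  while (index < totalPages) {
    const page = await fetchPage(index);
    all.push(...page.content);
    totalPages = page.totalPages;
    index++;
  }
  return all;
}
```

With the SDK this could be called as `collectAllPages((index) => getDocumentsByTemplateId("tpl_abc123", { index }))`. For large templates, processing each page as it arrives would keep memory use lower than collecting everything first.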
const page = await getDocumentsByTemplateId("tpl_abc123", {
preconformed: false, // only unreviewed
index: 0,
});
console.log(`Page 1 of ${page.totalPages}`);
for (const doc of page.content) {
const fields = JSON.parse(doc.rawJson);
console.log("Total:", fields["Total Amount"]);
}
getDocumentById(templateId, docId, options?)
Returns a single document by template and document ID.
| Option | Type | Default | Description |
| -------------- | --------- | ------- | ------------------------------- |
| includeImage | boolean | false | Include the base64 source image |
const doc = await getDocumentById("tpl_abc123", "doc_xyz789", {
includeImage: true,
});
console.log(doc.imageBase64); // the original scan
getRecentDocuments(size?)
Returns the N most-recent documents across all templates. Useful for dashboard feeds.
const recent = await getRecentDocuments(5);
recent.forEach((doc) => console.log(doc.createdAt, doc.formTemplateId));
deleteDocument(documentId)
Permanently deletes a single document.
await deleteDocument("doc_xyz789");
updateDocumentStatus(docId, status)
Updates the workflow status of a document. You define the status values — common
choices are "PENDING", "REVIEWED", "APPROVED", "REJECTED".
await updateDocumentStatus("doc_xyz789", "APPROVED");
updateDocumentNotes(docId, notes)
Saves reviewer annotations on a document. Pass "" to clear.
await updateDocumentNotes(
"doc_xyz789",
"Verified against original: amounts match.",
);
await updateDocumentNotes("doc_xyz789", ""); // clear notes
updateDocumentData(docId, data, options?)
Corrects or overwrites the extracted field data programmatically. Optionally marks the document as preconformed (reviewed) in the same call.
| Option | Type | Default | Description |
| -------------- | --------- | ------- | ----------------------------------- |
| preconformed | boolean | false | Mark document as reviewed/confirmed |
// Fix a wrong extraction and confirm
await updateDocumentData(
"doc_xyz789",
{ "Total Amount": 1250.0, "Invoice Number": "INV-1042" },
{ preconformed: true },
);
bulkPreconform(ids)
Marks multiple documents as reviewed/confirmed in a single API call. Returns the count of documents actually updated.
const { updated } = await bulkPreconform(["doc_1", "doc_2", "doc_3"]);
console.log(`${updated} documents confirmed`);
Processing Images
processImage(templateId, base64Image)
Processes a single image — ideal for one-page documents.
Max image size: 5 MB decoded. Supported formats: PNG, JPEG, WEBP, BMP.
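The size limit applies to the decoded bytes, which can be computed from the base64 string without decoding it. The sketch below does that. `decodedSize` and `fitsSizeLimit` are hypothetical helpers, and reading "5 MB" as 5 * 1024 * 1024 bytes is an assumption, since the exact byte count is not stated here.

```javascript
// Assumption: "5 MB" is interpreted as 5 * 1024 * 1024 bytes.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

// Decoded byte length of a standard, padded base64 string.
// Expects no whitespace and no "data:...;base64," prefix.
function decodedSize(base64) {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return (base64.length / 4) * 3 - padding;
}

// True when the decoded image is within the limit.
function fitsSizeLimit(base64, maxBytes = MAX_IMAGE_BYTES) {
  return decodedSize(base64) <= maxBytes;
}
```

Checking `fitsSizeLimit(base64)` before calling `processImage` avoids uploading an image that is likely to be rejected.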
// Node.js
import { readFileSync } from "fs";
const base64 = readFileSync("./invoice.jpg").toString("base64");
const doc = await processImage("tpl_invoice_id", base64);
const fields = JSON.parse(doc.rawJson);
console.log("Vendor:", fields["Vendor"]);
console.log("Total:", fields["Total Amount"]);
// Browser: convert a File input to base64
function fileToBase64(file) {
return new Promise((resolve, reject) => {
const reader = new FileReader();
reader.onload = () => resolve(reader.result.split(",")[1]);
reader.onerror = reject;
reader.readAsDataURL(file);
});
}
const file = document.querySelector("#fileInput").files[0];
const base64 = await fileToBase64(file);
const doc = await processImage(templateId, base64);
processImagesMultipage(templateId, base64ImagesArray)
Processes multiple images as a single multi-page document. All pages are merged into one result. Use this for PDFs split into images or multi-page scans.
import { readdirSync, readFileSync } from "fs";
import path from "path";
const pages = readdirSync("./scan-pages")
.sort()
.map((f) => readFileSync(path.join("./scan-pages", f)).toString("base64"));
const doc = await processImagesMultipage("tpl_contract_id", pages);
const result = JSON.parse(doc.rawJson);
console.log("Contract Date:", result["Contract Date"]);
AI Features
generateDocumentSummary(docId)
Asks the AI to generate a concise bullet-point summary of a document's extracted data. Returns natural language — not JSON.
Consumes AI credits.
const { summary } = await generateDocumentSummary("doc_xyz789");
console.log(summary);
// • Invoice #1042 was issued by Acme Corp on January 5 2025.
// • The total amount due is $1,250.00, payable by February 4 2025.
// • The order includes 3 line items for a total of 15 units.
Exports
Export all documents in a template to a file format for offline processing, spreadsheet import, or archiving.
exportDocumentsCsv(templateId, options?)
Returns all extracted data as a UTF-8 CSV string with BOM (Excel-compatible).
Each row is one document; columns are the extracted fields plus preconformed
and uploadedAt.
| Option | Type | Description |
| -------- | ---------- | ------------------------------------------------ |
| fields | string[] | Optional column subset. Dot-notation for nested. |
import { writeFileSync } from "fs";
// Export everything
const csv = await exportDocumentsCsv("tpl_abc123");
writeFileSync("invoices.csv", csv);
// Export a specific column subset
const subsetCsv = await exportDocumentsCsv("tpl_abc123", {
fields: ["Invoice Number", "Vendor", "Total Amount", "Issue Date"],
});
exportDocumentsJson(templateId)
Returns all documents as a plain JSON array. Each element is the extracted field map plus three metadata keys:
- _id: document ID
- _preconformed: whether the document was reviewed
- _uploadedAt: ISO timestamp
const records = await exportDocumentsJson("tpl_abc123");
records.forEach((doc) => {
console.log(doc._id, doc["Invoice Number"], doc["Total Amount"]);
});
// Save to file
import { writeFileSync } from "fs";
writeFileSync("invoices.json", JSON.stringify(records, null, 2));
OCR Tools
OCR Tools let you ask AI yes/no questions, classify documents into labels, or extract free-form text from any image — without a template.
Credit consumption: every OCR tool run deducts 1 document from the user's monthly (or add-on) document quota, plus AI credits based on the token count of the AI analysis, calculated as ceil(totalTokens / 1000) credits.
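As a worked example of the credit formula, the snippet below applies ceil(totalTokens / 1000) to a few token counts. `aiCreditsForTokens` is a hypothetical name used for illustration; the token count of a real run is determined by the service, not by the client.

```javascript
// AI credits charged for a run, per the formula ceil(totalTokens / 1000).
function aiCreditsForTokens(totalTokens) {
  return Math.ceil(totalTokens / 1000);
}

console.log(aiCreditsForTokens(1));    // 1
console.log(aiCreditsForTokens(1000)); // 1
console.log(aiCreditsForTokens(1001)); // 2
console.log(aiCreditsForTokens(2500)); // 3
```

Because of the rounding up, any run that uses at least one token costs at least one AI credit, in addition to the one document deducted from the quota.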
Dynamic parameters ({{?N}} syntax)
Prompts can include numbered placeholders of the form {{?1}}, {{?2}}, etc.
At run time you supply the actual values; the AI receives the substituted text.
// Tool with two dynamic parameters
const checker = await createOcrTool({
name: "Property Ownership Check",
prompt:
"Does this document appear to be a proof of ownership for the property at {{?1}} in the name of {{?2}}?",
outputType: "YES_NO",
parameterDefinitions: [
{
key: 1,
label: "Property address",
description: "Full street address",
maxChars: 200,
},
{
key: 2,
label: "Owner name",
description: "Full legal name of the owner",
maxChars: 150,
},
],
});
// Run with values
const result = await runOcrTool(checker.id, imageBase64, {
params: { 1: "123 Main St, Buenos Aires", 2: "Juan García" },
});
Each ParameterDefinition has:
| Field | Type | Required | Description |
| ------------- | -------- | -------- | ------------------------------------------------------- |
| key | number | ✅ | 1-based index matching {{?N}} in the prompt |
| label | string | ✅ | Human-friendly name shown in the UI and API errors |
| description | string | | Optional hint shown to the user when filling values |
| maxChars | number | | Character limit for this parameter (1–500, default 200) |
Validation rules:
- Every {{?N}} placeholder in the prompt must have a matching ParameterDefinition.
- A 400 Bad Request is returned if a required parameter is missing or exceeds its maxChars limit.
- The fully substituted prompt must not exceed 3 000 characters total; adjust individual maxChars values accordingly.
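The validation rules above can be mirrored on the client to catch mistakes before a run is charged. The following is a sketch only: `substitutePrompt` is a hypothetical helper, the real checks happen server-side, every placeholder is treated as required, and lengths are counted with JavaScript's `String.length`, which may differ from how the service counts characters.

```javascript
const MAX_PROMPT_CHARS = 3000;  // total limit after substitution
const DEFAULT_MAX_CHARS = 200;  // default maxChars per parameter

// Hypothetical pre-check: returns the substituted prompt, or throws an Error
// describing the first rule that is violated.
function substitutePrompt(prompt, parameterDefinitions, params) {
  const defsByKey = new Map(parameterDefinitions.map((d) => [String(d.key), d]));
  const result = prompt.replace(/\{\{\?(\d+)\}\}/g, (match, key) => {
    const def = defsByKey.get(key);
    if (!def) throw new Error(`No ParameterDefinition for ${match}`);
    const value = params[key];
    if (typeof value !== "string" || value === "") {
      throw new Error(`Missing value for "${def.label}"`);
    }
    const limit = def.maxChars ?? DEFAULT_MAX_CHARS;
    if (value.length > limit) {
      throw new Error(`"${def.label}" exceeds ${limit} characters`);
    }
    return value;
  });
  if (result.length > MAX_PROMPT_CHARS) {
    throw new Error(`Substituted prompt exceeds ${MAX_PROMPT_CHARS} characters`);
  }
  return result;
}
```

The function also shows what text the AI would receive for a given set of values, which can help when tuning individual maxChars limits against the 3 000-character total.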
getOcrTools()
Returns all OCR tool configurations owned by the authenticated user.
const tools = await getOcrTools();
tools.forEach((t) => console.log(t.id, t.name, t.outputType));
createOcrTool(config)
Creates a new OCR tool configuration.
| Field | Type | Required | Description |
| ---------------------- | ------------------------------- | -------- | --------------------------------------------------------- |
| name | string | ✅ | Human-friendly display name |
| prompt | string | ✅ | Natural-language instruction for the AI (max 3 000 chars) |
| outputType | "YES_NO" \| "LABEL" \| "TEXT" | ✅ | Expected output shape |
| labels | string[] | ⚠️ | Required when outputType === "LABEL" |
| parameterDefinitions | ParameterDefinition[] | | Dynamic parameter definitions for {{?N}} placeholders |
// YES/NO check
const checker = await createOcrTool({
name: "Proof of Residence Check",
prompt:
"Does this document appear to be a valid proof of residence? Look for an address, official stamp, and the person's name.",
outputType: "YES_NO",
});
// Document classifier
const classifier = await createOcrTool({
name: "Document Type Classifier",
prompt: "What type of document is this?",
outputType: "LABEL",
labels: [
"invoice",
"id_card",
"receipt",
"proof_of_residence",
"contract",
"other",
],
});
// Free-form extraction
const extractor = await createOcrTool({
name: "Serial Number Extractor",
prompt:
"Extract the serial number or product code printed on the label. Return only the code, nothing else.",
outputType: "TEXT",
});
updateOcrTool(id, config)
Updates an existing OCR tool configuration.
await updateOcrTool("tool_abc", {
prompt:
"Does this document show a current address dated within the last 3 months?",
});
deleteOcrTool(id)
Deletes an OCR tool configuration.
await deleteOcrTool("tool_abc");
runOcrTool(id, base64Image, options?)
Runs an OCR tool against a base64-encoded image. Max image size: 5 MB.
Consumes 1 document credit + AI credits (based on token usage).
The AI output language matches the language of the prompt / parameter values automatically.
| Option | Type | Description |
| -------- | ------------------------ | ------------------------------------------------ |
| params | Record<string, string> | Values for {{?N}} placeholders, keyed by "N" |
import { readFileSync } from "fs";
const image = readFileSync("./id-card.jpg").toString("base64");
const result = await runOcrTool("tool_residence_check", image);
console.log(result.answer); // "YES"
console.log(result.explanation); // "The document shows a full address, an official municipal stamp, and the applicant's name."
// Document classification
const type = await runOcrTool("tool_classifier", image);
console.log(type.answer); // "id_card"
// With dynamic parameters
const ownership = await runOcrTool("tool_ownership_check", image, {
params: { 1: "Av. Corrientes 1234, CABA", 2: "María López" },
});
console.log(ownership.answer); // "YES"
// Full workflow: classify first, then extract
const { answer: docType } = await runOcrTool("tool_classifier", invoiceBase64);
if (docType === "invoice") {
const doc = await processImage(invoiceTemplateId, invoiceBase64);
}
Error responses:
| HTTP | Condition |
| ---- | ------------------------------------------------------------------------------------------------------ |
| 400 | Missing required parameter, value exceeds maxChars, or prompt exceeds 3 000 chars after substitution |
| 402 | Document quota exhausted — upgrade plan or purchase add-on docs |
| 402 | AI credits exhausted — purchase add-on credits |
Credits & Analytics
getCreditsBalance()
Returns the current AI-credit balance for the authenticated user.
const balance = await getCreditsBalance();
console.log(`Monthly credits: ${balance.monthlyBalance}`);
console.log(`Add-on credits: ${balance.addonBalance}`);
console.log(`Total available: ${balance.totalBalance}`);
getCreditsHistory(options?)
Returns a paginated history of AI credit consumption events (newest first).
| Option | Type | Default | Description |
| ------ | -------- | ------- | --------------------- |
| page | number | 0 | Zero-based page index |
| size | number | 20 | Page size (max 100) |
const history = await getCreditsHistory({ page: 0, size: 10 });
history.content.forEach((entry) => {
console.log(entry.timestamp, entry.operation, entry.creditsConsumed);
});
Sub-Users
Requires a Pro or higher plan. The plan determines the maximum number of sub-users allowed.
Sub-users can log in to the web app and operate within the permissions you grant them.
Available permissions: "upload" · "view" · "template" · "settings" · "export" · "api"
Document History
import { getDocumentHistory } from "extractia-sdk";
const log = await getDocumentHistory({ page: 0, size: 20 });
log.content.forEach((entry) => {
console.log(entry.templateName, entry.status, entry.uploadDate);
if (entry.status === "FAILURE") console.error(entry.errorMessage);
});
List Sub-Users
import { getSubUsers } from "extractia-sdk";
const users = await getSubUsers();
// [{ username: 'agent_carlos', permissions: ['upload','view'], suspended: false }]
Create a Sub-User
import { createSubUser } from "extractia-sdk";
const sub = await createSubUser({
username: "agent_carlos",
password: "SecurePass1",
permissions: ["upload", "view"],
});
Error codes:
| Code | Reason |
|------|--------|
| 403 | Plan does not allow sub-users or limit reached |
| 409 | Username already in use |
| 400 | Missing fields or password matches the main account |
Update Permissions or Password
Only the fields you include are changed. Omit password to keep it unchanged.
import { updateSubUser } from "extractia-sdk";
await updateSubUser("agent_carlos", {
permissions: ["upload", "view", "export"],
// password: 'NewPass99', ← optional
});
Delete a Sub-User
import { deleteSubUser } from "extractia-sdk";
await deleteSubUser("agent_carlos");
Suspend / Reactivate
A suspended sub-user cannot log in. Calling the same method again reactivates them.
import { toggleSuspendSubUser } from "extractia-sdk";
const state = await toggleSuspendSubUser("agent_carlos");
console.log(state.suspended); // true | false
TypeScript
The SDK ships with a full index.d.ts declaration file — no @types package needed.
import {
setToken,
processImage,
runOcrTool,
suggestFields,
exportDocumentsJson,
UserDocument,
OcrRunResult,
FormField,
TierError,
RateLimitError,
} from "extractia-sdk";
setToken(process.env.EXTRACTIA_TOKEN!);
async function classifyAndExtract(
templateId: string,
ocrToolId: string,
imagePath: string,
): Promise<UserDocument | null> {
const { readFileSync } = await import("fs");
const base64 = readFileSync(imagePath).toString("base64");
// Classify first
const check: OcrRunResult = await runOcrTool(ocrToolId, base64);
if (check.answer !== "YES") {
console.log("Document rejected:", check.explanation);
return null;
}
// Extract with template
return processImage(templateId, base64);
}
Rate Limits & Quotas
| Limit              | Value                                                |
| ------------------ | ---------------------------------------------------- |
| Max image size     | 5 MB decoded                                         |
| Processing timeout | 60 seconds                                           |
| Monthly documents  | Depends on plan (Free / Pro / Business / Enterprise) |
| Active API keys    | 10 per account                                       |
When the monthly quota is exhausted, processing calls throw a TierError.
Purchase extra document packs or upgrade your plan from the dashboard to continue.
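Checking the remaining quota before starting a batch can avoid hitting a TierError partway through. The sketch below sums the two quota fields that the v1.2.0 changelog lists on the getMyProfile() response. `remainingDocuments` is a hypothetical helper, and treating both fields as numeric counts of documents still available is an assumption based on their names.

```javascript
// Hypothetical helper: documents that can still be processed, assuming both
// profile fields are numeric "documents remaining" counters.
function remainingDocuments(profile) {
  const monthly = Number(profile.documentsAvailableThisMonth) || 0;
  const extra = Number(profile.extraDocsAvailable) || 0;
  return monthly + extra;
}
```

A batch job might call `remainingDocuments(await getMyProfile())` and compare the result with the number of images it is about to submit. Catching TierError is still needed, since the quota can change between the check and the upload.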
Changelog
v1.2.0
- New: getDocumentHistory(opts?) - paginated log of all document processing events (successes and failures)
- New: getSubUsers() - list all sub-users under your account
- New: createSubUser({ username, password, permissions }) - create a sub-user (Pro+ plans)
- New: updateSubUser(username, updates) - change permissions or password of a sub-user
- New: deleteSubUser(username) - permanently remove a sub-user
- New: toggleSuspendSubUser(username) - suspend or reactivate a sub-user
- Updated: getMyProfile() response now includes documentsAvailableThisMonth and extraDocsAvailable quota fields
- Updated TypeScript declarations: AppUserProfile, DocumentAuditEntry, SubUser interfaces; all new function signatures
v1.1.0
- New: suggestFields(templateName, context?) - AI-powered field suggestions
- New: getDocumentById(templateId, docId) - fetch a single document
- New: getRecentDocuments(size?) - latest documents across all templates
- New: generateDocumentSummary(docId) - AI bullet-point summary of a document
- New: updateDocumentStatus(docId, status) - set workflow status
- New: updateDocumentNotes(docId, notes) - save reviewer annotations
- New: updateDocumentData(docId, data, opts?) - correct extracted data
- New: bulkPreconform(ids) - confirm multiple documents in one call
- New: exportDocumentsCsv(templateId, opts?) - export to CSV
- New: exportDocumentsJson(templateId) - export to JSON array
- New: getOcrTools() / createOcrTool() / updateOcrTool() / deleteOcrTool() / runOcrTool(id, image) - full OCR Tools API
- New: getCreditsBalance() / getCreditsHistory(opts?) - AI credits tracking
- Extended FieldType with BOOLEAN, EMAIL, PHONE, ADDRESS, CURRENCY
- Full TypeScript declarations updated for all new methods
v1.0.6
- Added processImagesMultipage for multi-page document support
- Added typed error classes (AuthError, TierError, RateLimitError, NotFoundError, ForbiddenError)
- Added TypeScript declaration file (index.d.ts)
- Added deleteAllTemplateDocuments helper
- Added browser IIFE build
v1.0.0
- Initial release: setToken, getMyProfile, updateWebhook, getTemplates, createTemplate, updateTemplate, deleteTemplate, getDocumentsByTemplateId, deleteDocument, processImage
License
MIT
