parsefy
v1.1.0
Published
Official TypeScript SDK for Parsefy - Financial Document Infrastructure for Developers
Turn financial PDFs (invoices, receipts, bills) into structured JSON with validation and risk signals.
Installation
npm install parsefy zod
Quick Start
import { Parsefy } from 'parsefy';
import * as z from 'zod';
const client = new Parsefy('pk_your_api_key');
const schema = z.object({
// REQUIRED - triggers fallback if below confidence threshold
invoice_number: z.string().describe('The invoice number'),
total: z.number().describe('Total amount including tax'),
// OPTIONAL - won't trigger fallback if missing or low confidence
vendor: z.string().optional().describe('Vendor name'),
due_date: z.string().optional().describe('Payment due date'),
});
const { object, metadata, verification, error } = await client.extract({
file: './invoice.pdf',
schema,
enableVerification: true, // Enable math verification
});
if (!error && object) {
console.log(object.invoice_number); // Fully typed!
// Access field-level confidence and evidence
console.log(`Overall confidence: ${metadata.confidence_score}`);
metadata.field_confidence.forEach((fc) => {
console.log(`${fc.field}: ${fc.score} (${fc.reason}) - "${fc.text}"`);
});
// Access verification results if enabled
if (verification) {
console.log(`Verification status: ${verification.status}`);
verification.checks_run.forEach((check) => {
console.log(`${check.type}: ${check.passed ? 'PASSED' : 'FAILED'}`);
});
}
}
⚠️ Required vs Optional Fields (Important for Billing)
All fields are required by default. This is critical to understand:
| User writes (SDK) | SDK converts to (JSON Schema) | API interprets as |
|-------------------|-------------------------------|-------------------|
| name: z.string() | required: ["name"] | Required |
| name: z.string().optional() | required: [] | Optional |
Why This Matters
If a required field returns null or falls below the confidenceThreshold, the API triggers the fallback model (Tier 2), which is more expensive.
To avoid unexpectedly high bills, mark fields that may be absent as optional:
const schema = z.object({
// REQUIRED - Always present on invoices, keep required
invoice_number: z.string(),
total: z.number(),
// OPTIONAL - May not appear on all documents, mark optional!
vendor: z.string().optional(), // Not all invoices have vendor name
tax_id: z.string().optional(), // Rarely present
notes: z.string().optional(), // Usually empty
due_date: z.string().optional(), // Sometimes missing
});
Rule of thumb: If a field might be missing in >20% of your documents, mark it .optional().
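The 20% rule of thumb can be checked against a sample of past extraction results. The following is a minimal sketch, assuming you keep extracted objects as plain records; `missingRates` and `suggestOptional` are hypothetical helpers and not part of the Parsefy SDK.

```typescript
// Hypothetical helpers (not part of the Parsefy SDK).
type Extracted = Record<string, unknown>;

// Fraction of samples in which each field is null or undefined.
function missingRates(samples: Extracted[], fields: string[]): Record<string, number> {
  const rates: Record<string, number> = {};
  for (const field of fields) {
    const missing = samples.filter(
      (s) => s[field] === null || s[field] === undefined,
    ).length;
    rates[field] = samples.length === 0 ? 0 : missing / samples.length;
  }
  return rates;
}

// Fields missing in more than `cutoff` of the samples are candidates for .optional().
function suggestOptional(samples: Extracted[], fields: string[], cutoff = 0.2): string[] {
  const rates = missingRates(samples, fields);
  return fields.filter((f) => (rates[f] ?? 0) > cutoff);
}

const samples: Extracted[] = [
  { invoice_number: 'A-1', total: 10, tax_id: null },
  { invoice_number: 'A-2', total: 20, tax_id: 'X1' },
  { invoice_number: 'A-3', total: 30 },
  { invoice_number: 'A-4', total: 40, tax_id: null },
];

// tax_id is missing in 3 of 4 samples, so it is the only suggestion.
console.log(suggestOptional(samples, ['invoice_number', 'total', 'tax_id']));
```

Running this over a few hundred representative documents before finalizing a schema gives a rough picture of which required fields are likely to trigger the fallback model.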
Confidence Threshold
Control when the fallback model is triggered:
const { object, metadata } = await client.extract({
file: './invoice.pdf',
schema,
confidenceThreshold: 0.85, // default
});
| Threshold | Behavior |
|-----------|----------|
| Lower (e.g., 0.70) | Faster – Accepts Tier 1 results more often |
| Higher (e.g., 0.95) | More accurate – Triggers Tier 2 fallback more often |
Default: 0.85
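To make the interaction between required fields and the threshold concrete, here is a rough sketch of the documented rule (a required field that is null or scores below the threshold triggers the fallback). The real decision is made server-side; `FieldResult` and `wouldTriggerFallback` are hypothetical names used only for illustration.

```typescript
// Illustrative only: the actual decision is made by the Parsefy API.
interface FieldResult {
  name: string;
  value: unknown;
  score: number; // confidence, 0.0 to 1.0
  required: boolean;
}

// True if any required field is missing or scores below the threshold.
function wouldTriggerFallback(fields: FieldResult[], confidenceThreshold = 0.85): boolean {
  return fields.some(
    (f) =>
      f.required &&
      (f.value === null || f.value === undefined || f.score < confidenceThreshold),
  );
}

const tier1Result: FieldResult[] = [
  { name: 'invoice_number', value: 'INV-1', score: 0.98, required: true },
  { name: 'total', value: 1250, score: 0.88, required: true },
  { name: 'vendor', value: null, score: 0, required: false }, // optional: ignored
];

console.log(wouldTriggerFallback(tier1Result));       // false
console.log(wouldTriggerFallback(tier1Result, 0.95)); // true
```

The same Tier 1 result is accepted at the default threshold but would go to Tier 2 at 0.95, because `total` scored 0.88.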
Math Verification
Enable automatic math verification to ensure extracted numeric data is mathematically consistent:
const { object, verification } = await client.extract({
file: './invoice.pdf',
schema,
enableVerification: true, // Enable math verification
});
if (verification) {
console.log(`Verification status: ${verification.status}`);
console.log(`Checks passed: ${verification.checks_passed}`);
console.log(`Checks failed: ${verification.checks_failed}`);
verification.checks_run.forEach((check) => {
console.log(`${check.type}: ${check.passed ? 'PASSED' : 'FAILED'}`);
console.log(` Fields: ${check.fields.join(', ')}`);
console.log(` Expected: ${check.expected}, Actual: ${check.actual}`);
console.log(` Delta: ${check.delta}`);
});
}
Verification Status Values
| Status | Description |
|--------|-------------|
| PASSED | All math checks passed |
| FAILED | One or more math checks failed |
| PARTIAL | Some checks passed, some failed or couldn't be verified |
| CANNOT_VERIFY | Required fields are missing (not a math error) |
| NO_RULES | No verifiable fields detected in schema |
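One way to act on these statuses is to map each to a downstream action. The sketch below is an example policy only; the `Action` type and the choices made for each status are assumptions about a typical review workflow, not behavior defined by Parsefy.

```typescript
type VerificationStatus = 'PASSED' | 'FAILED' | 'PARTIAL' | 'CANNOT_VERIFY' | 'NO_RULES';
type Action = 'accept' | 'review' | 'reject';

// Hypothetical routing policy; adjust to your own risk tolerance.
function actionFor(status: VerificationStatus): Action {
  switch (status) {
    case 'PASSED':
      return 'accept';
    case 'FAILED':
      return 'reject';
    case 'PARTIAL':
    case 'CANNOT_VERIFY':
      return 'review'; // math could not be fully confirmed
    case 'NO_RULES':
      return 'accept'; // nothing to verify in this schema
  }
}

console.log(actionFor('PARTIAL')); // review
```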
Supported Verification Rules
- HORIZONTAL_SUM: Verifies total = subtotal + tax
- VERTICAL_SUM: Verifies subtotal = sum(line_items)
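The two rules can be reproduced locally, for example to re-check a result after a manual correction. This is a sketch under stated assumptions: the one-cent tolerance, and the reading of `expected` as the computed sum and `actual` as the extracted value, are guesses and not confirmed API behavior.

```typescript
// Illustrative re-implementation of the two documented rules.
interface SumCheck {
  type: 'HORIZONTAL_SUM' | 'VERTICAL_SUM';
  passed: boolean;
  expected: number; // value computed from the supporting fields (assumed meaning)
  actual: number;   // value extracted from the document (assumed meaning)
  delta: number;    // absolute difference, rounded to cents
}

const TOLERANCE = 0.01; // assumed: allow one cent of rounding error

function compare(type: SumCheck['type'], expected: number, actual: number): SumCheck {
  const delta = Math.round(Math.abs(expected - actual) * 100) / 100;
  return { type, passed: delta <= TOLERANCE, expected, actual, delta };
}

// HORIZONTAL_SUM: total = subtotal + tax
function horizontalSum(subtotal: number, tax: number, total: number): SumCheck {
  return compare('HORIZONTAL_SUM', subtotal + tax, total);
}

// VERTICAL_SUM: subtotal = sum(line_items)
function verticalSum(lineAmounts: number[], subtotal: number): SumCheck {
  const sum = lineAmounts.reduce((acc, amount) => acc + amount, 0);
  return compare('VERTICAL_SUM', sum, subtotal);
}

console.log(horizontalSum(1000, 250, 1250).passed); // true
console.log(verticalSum([600, 400], 1010).delta);   // 10
```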
Shadow Extraction
When enableVerification: true and only a single verifiable field is requested (e.g., just total), Parsefy automatically extracts supporting fields in the background for verification, then removes them from the response.
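Conceptually, shadow extraction widens the set of extracted fields for the duration of the check and then narrows the result back to what was requested. The sketch below illustrates that idea only; the `SUPPORTING_FIELDS` map and both helper functions are hypothetical, and the real logic runs inside the Parsefy API.

```typescript
// Hypothetical illustration of shadow extraction (not SDK code).
const SUPPORTING_FIELDS: Record<string, string[]> = {
  total: ['subtotal', 'tax'], // needed to check total = subtotal + tax
};

// Requested fields plus any supporting fields needed for verification.
function fieldsToExtract(requested: string[]): string[] {
  const all = requested.slice();
  for (const field of requested) {
    for (const extra of SUPPORTING_FIELDS[field] ?? []) {
      if (all.indexOf(extra) === -1) all.push(extra);
    }
  }
  return all;
}

// Drop everything the caller did not ask for before returning the response.
function stripShadowFields(
  extracted: Record<string, unknown>,
  requested: string[],
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const key of requested) {
    if (key in extracted) result[key] = extracted[key];
  }
  return result;
}

console.log(fieldsToExtract(['total'])); // [ 'total', 'subtotal', 'tax' ]
console.log(stripShadowFields({ total: 1250, subtotal: 1000, tax: 250 }, ['total']));
```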
Response Format
interface ExtractResult<T> {
// Extracted data matching your schema, or null if extraction failed
object: T | null;
// Metadata about the extraction
metadata: {
processing_time_ms: number; // Processing time in milliseconds
credits: number; // Credits consumed (1 credit = 1 page)
fallback_triggered: boolean; // Whether fallback model was used
// 🆕 Field-level confidence and evidence
confidence_score: number; // Overall confidence (0.0 to 1.0)
field_confidence: Array<{
field: string; // JSON path (e.g., "$.invoice_number")
score: number; // Confidence score (0.0 to 1.0)
reason: string; // "Exact match", "Inferred from header", etc.
page: number; // Page number where found
text: string; // Source text evidence
}>;
issues: string[]; // Warnings or anomalies detected
};
// Math verification results (only present if enableVerification was true)
verification?: {
status: "PASSED" | "FAILED" | "PARTIAL" | "CANNOT_VERIFY" | "NO_RULES";
checks_passed: number;
checks_failed: number;
cannot_verify_count: number;
checks_run: Array<{
type: string; // e.g., "HORIZONTAL_SUM", "VERTICAL_SUM"
status: string;
fields: string[];
passed: boolean;
delta: number;
expected: number;
actual: number;
}>;
};
// Error details if extraction failed
error: {
code: string;
message: string;
} | null;
}
Example Response
const { object, metadata, verification } = await client.extract({
file,
schema,
enableVerification: true
});
// object:
{
invoice_number: "INV-2024-0042",
date: "2024-01-15",
total: 1250.00,
vendor: "Acme Corp"
}
// metadata.confidence_score: 0.94
// metadata.field_confidence:
[
{ field: "$.invoice_number", score: 0.98, reason: "Exact match", page: 1, text: "Invoice # INV-2024-0042" },
{ field: "$.date", score: 0.95, reason: "Exact match", page: 1, text: "Date: 01/15/2024" },
{ field: "$.total", score: 0.92, reason: "Formatting ambiguous", page: 1, text: "Total: $1,250.00" },
{ field: "$.vendor", score: 0.90, reason: "Inferred from header", page: 1, text: "Acme Corp" }
]
// metadata.issues: []
// verification (only present if enableVerification was true):
{
status: "PASSED",
checks_passed: 1,
checks_failed: 0,
cannot_verify_count: 0,
checks_run: [
{
type: "HORIZONTAL_SUM",
status: "PASSED",
fields: ["total", "subtotal", "tax"],
passed: true,
delta: 0.0,
expected: 1250.00,
actual: 1250.00
}
]
}
Configuration
API Key
// Option 1: Pass API key directly
const client = new Parsefy('pk_your_api_key');
// Option 2: Use environment variable
// Set PARSEFY_API_KEY in your environment
const client = new Parsefy();
// Option 3: Configuration object
const client = new Parsefy({
apiKey: 'pk_your_api_key',
timeout: 120000, // 2 minutes
});
Configuration Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| apiKey | string | process.env.PARSEFY_API_KEY | Your Parsefy API key |
| timeout | number | 60000 | Request timeout in ms |
Extract Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| file | File \| Blob \| Buffer \| string | required | Document to extract from |
| schema | z.ZodType | required | Zod schema defining extraction structure |
| confidenceThreshold | number | 0.85 | Minimum confidence before triggering fallback |
| enableVerification | boolean | false | Enable math verification (includes shadow extraction) |
Usage
File Input Options
The SDK supports multiple file input types. Files don't need to be on disk – you can work entirely in memory.
| Input Type | Usage | Environment |
|------------|-------|-------------|
| string | File path | Node.js only |
| Buffer | In-memory bytes | Node.js |
| File | From file input or FormData | Browser, Node.js 20+, Edge |
| Blob | Raw binary with MIME type | Universal |
// Node.js: File path
const result = await client.extract({
file: './invoice.pdf',
schema,
});
// Node.js: Buffer (in-memory)
import { readFileSync } from 'fs';
const result = await client.extract({
file: readFileSync('./invoice.pdf'),
schema,
});
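// Browser: File input
const fileInput = document.querySelector<HTMLInputElement>('input[type="file"]');
const selectedFile = fileInput?.files?.[0]; // undefined if nothing was selected
if (selectedFile) {
  const result = await client.extract({
    file: selectedFile,
    schema,
  });
}
Complex Schemas for Financial Documents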
Use .describe() to guide the AI extraction:
const invoiceSchema = z.object({
// REQUIRED - Core financial data
invoice_number: z.string().describe('The invoice or receipt number'),
date: z.string().describe('Invoice date in YYYY-MM-DD format'),
total: z.number().describe('Total amount due including tax'),
currency: z.string().describe('3-letter currency code (USD, EUR, etc.)'),
// REQUIRED - Line items (usually present)
line_items: z.array(z.object({
description: z.string().describe('Item description'),
quantity: z.number().describe('Number of units'),
unit_price: z.number().describe('Price per unit'),
amount: z.number().describe('Total amount for this line'),
})).describe('List of items on the invoice'),
// OPTIONAL - May not appear on all invoices
vendor: z.object({
name: z.string().describe('Company name of the vendor'),
address: z.string().optional().describe('Full address'),
tax_id: z.string().optional().describe('Tax ID or VAT number'),
}).optional(),
subtotal: z.number().optional().describe('Subtotal before tax'),
tax: z.number().optional().describe('Tax amount'),
due_date: z.string().optional().describe('Payment due date'),
payment_terms: z.string().optional().describe('e.g., Net 30'),
});
Server-Side / API Usage
Express with Multer:
import express from 'express';
import multer from 'multer';
import { Parsefy } from 'parsefy';
const app = express();
const upload = multer(); // Store in memory
const client = new Parsefy();
// `schema` is a Zod schema defined as in the examples above
app.post('/extract', upload.single('document'), async (req, res) => {
const { object, metadata, error } = await client.extract({
file: req.file.buffer,
schema,
confidenceThreshold: 0.80, // Adjust based on your needs
});
res.json({
data: object,
confidence: metadata.confidence_score,
fieldDetails: metadata.field_confidence,
error,
});
});
Hono / Cloudflare Workers:
import { Hono } from 'hono';
import { Parsefy } from 'parsefy';
const app = new Hono();
const client = new Parsefy();
app.post('/extract', async (c) => {
const formData = await c.req.formData();
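const file = formData.get('document');
// formData.get() may return a string or null, so reject anything that is not a File
if (!(file instanceof File)) {
  return c.json({ error: 'Expected a file in the "document" field' }, 400);
}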
const { object, metadata, error } = await client.extract({
file,
schema,
});
return c.json({
data: object,
confidence: metadata.confidence_score,
issues: metadata.issues,
error,
});
});
Error Handling
import { Parsefy, APIError, ValidationError, ParsefyError } from 'parsefy';
try {
const { object, error, metadata } = await client.extract({
file: './invoice.pdf',
schema,
});
// Extraction-level errors (request succeeded, but extraction failed)
if (error) {
console.error(`Extraction failed: [${error.code}] ${error.message}`);
console.log(`Fallback triggered: ${metadata.fallback_triggered}`);
console.log(`Issues: ${metadata.issues.join(', ')}`);
return;
}
// Check for low confidence fields
const lowConfidence = metadata.field_confidence.filter((fc) => fc.score < 0.80);
if (lowConfidence.length > 0) {
console.warn('Low confidence fields:', lowConfidence);
}
console.log('Success:', object);
} catch (err) {
// HTTP/Network errors
if (err instanceof APIError) {
console.error(`API Error ${err.statusCode}: ${err.message}`);
} else if (err instanceof ValidationError) {
console.error(`Validation Error: ${err.message}`);
} else if (err instanceof ParsefyError) {
console.error(`Parsefy Error: ${err.message}`);
}
}
Error Types
| Error Class | Description |
|-------------|-------------|
| ParsefyError | Base error class for all Parsefy errors |
| APIError | HTTP errors (4xx/5xx responses) |
| ExtractionError | Extraction failed (returned in response) |
| ValidationError | Client-side validation errors |
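Beyond thrown errors, a result can succeed at the HTTP level and still need attention. The following sketch combines the `error`, `issues`, and `field_confidence` values from the response format into a single triage decision. The `triage` function, its return shape, the 0.80 cutoff, and the sample error code are all hypothetical; the local interfaces mirror only the parts of the documented response that the function reads.

```typescript
// Local mirrors of the documented response shape (subset only).
interface FieldConfidence {
  field: string;
  score: number;
  reason: string;
  page: number;
  text: string;
}
interface ResultLike {
  object: Record<string, unknown> | null;
  metadata: { field_confidence: FieldConfidence[]; issues: string[] };
  error: { code: string; message: string } | null;
}
interface Triage {
  status: 'ok' | 'needs_review' | 'failed';
  reasons: string[];
}

// Hypothetical helper: decides whether a result can be used as-is.
function triage(result: ResultLike, minScore = 0.8): Triage {
  if (result.error || result.object === null) {
    const reason = result.error
      ? `[${result.error.code}] ${result.error.message}`
      : 'No object returned';
    return { status: 'failed', reasons: [reason] };
  }
  const reasons: string[] = [];
  for (const fc of result.metadata.field_confidence) {
    if (fc.score < minScore) reasons.push(`${fc.field} scored ${fc.score} (${fc.reason})`);
  }
  for (const issue of result.metadata.issues) reasons.push(issue);
  return { status: reasons.length > 0 ? 'needs_review' : 'ok', reasons };
}

const sample: ResultLike = {
  object: { invoice_number: 'INV-2024-0042', total: 1250 },
  metadata: {
    field_confidence: [
      { field: '$.invoice_number', score: 0.98, reason: 'Exact match', page: 1, text: 'Invoice # INV-2024-0042' },
      { field: '$.total', score: 0.72, reason: 'Formatting ambiguous', page: 1, text: 'Total: $1,250.00' },
    ],
    issues: [],
  },
  error: null,
};

console.log(triage(sample).status); // needs_review
```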
Supported File Types
- PDF (.pdf) – up to 10MB
- DOCX (.docx) – up to 10MB
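Checking the type and size before uploading avoids a wasted request. This is a minimal sketch; `checkUpload` is a hypothetical helper, and treating "10MB" as 10 × 1024 × 1024 bytes is an assumption, since the exact byte limit is not stated here.

```typescript
// Hypothetical pre-upload check (not part of the Parsefy SDK).
const SUPPORTED_EXTENSIONS = ['.pdf', '.docx'];
const MAX_BYTES = 10 * 1024 * 1024; // assumed interpretation of "10MB"

// Returns an error message, or null if the file looks acceptable.
function checkUpload(fileName: string, sizeBytes: number): string | null {
  const dot = fileName.lastIndexOf('.');
  const ext = dot === -1 ? '' : fileName.slice(dot).toLowerCase();
  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    return `Unsupported file type: ${ext || '(none)'}`;
  }
  if (sizeBytes > MAX_BYTES) {
    return `File too large: ${sizeBytes} bytes (limit ${MAX_BYTES})`;
  }
  return null;
}

console.log(checkUpload('invoice.PDF', 2048));  // null
console.log(checkUpload('receipt.png', 2048));  // Unsupported file type: .png
```

The extension check is only a convenience; the API remains the authority on what it accepts.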
Rate Limits
The API allows 1 request per second. The SDK automatically retries with exponential backoff on rate limit errors (HTTP 429).
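For readers unfamiliar with the pattern, exponential backoff doubles the wait after each rate-limited attempt. The sketch below shows the general technique only; the SDK's own retry parameters are not documented here, so the base delay, cap, and retry count are assumptions, and `withRetry` is a hypothetical wrapper that treats any thrown value with `statusCode === 429` as a rate limit.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical wrapper illustrating the pattern; not the SDK's implementation.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { statusCode?: number } | null)?.statusCode;
      // Rethrow anything that is not a rate limit, or once retries are exhausted.
      if (status !== 429 || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n))); // [ 1000, 2000, 4000, 8000 ]
```

Because the SDK already retries on HTTP 429, a wrapper like this is mainly useful for understanding the behavior or for pacing your own batch jobs around the 1 request per second limit.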
Requirements
- Node.js 18+ (for native fetch and FormData)
- Zod 3.x (peer dependency)
TypeScript Types
All types are exported for your convenience:
import type {
ParsefyConfig,
ExtractOptions,
ExtractResult,
ExtractionMetadata,
FieldConfidence,
Verification,
VerificationStatus,
VerificationCheck,
APIErrorResponse,
} from 'parsefy';
import { DEFAULT_CONFIDENCE_THRESHOLD } from 'parsefy'; // 0.85
License
MIT © Parsefy
