pdfvector

v2.3.4

Published

4 months ago

Official TypeScript/JavaScript SDK for PDF Vector API - Parse PDF/Word/Image/Excel documents to clean, structured markdown format and search academic publications across multiple databases

phuctm97

pdfvector pdf word markdown api sdk academic search pubmed semantic-scholar google-scholar arxiv eric research papers publications

PDF Vector TypeScript/JavaScript SDK

The official TypeScript/JavaScript SDK for the PDF Vector API. With it you can:

Convert PDF and Word documents to clean, structured markdown, with optional AI enhancement
Ask questions about documents using AI
Extract structured data from documents with JSON Schema
Search multiple academic databases through a unified API
Fetch specific publications by DOI, PubMed ID, ArXiv ID, and more
Find relevant academic citations for paragraphs of text

Installation

npm install pdfvector
# or
yarn add pdfvector
# or
pnpm add pdfvector
# or
bun add pdfvector

Quick Start

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

// Parse from document URL or data
const parseResult = await client.parse({
  url: "https://example.com/document.pdf",
  useLLM: "auto",
});

console.log(parseResult.markdown); // Clean markdown output
console.log(
  `Pages: ${parseResult.pageCount}, Credits: ${parseResult.creditCount}`,
);

// Ask questions about documents
const askResult = await client.ask({
  url: "https://example.com/research-paper.pdf",
  prompt: "What are the key findings and conclusions?",
});

console.log(askResult.markdown); // AI-generated answer in markdown format
console.log(`Pages: ${askResult.pageCount}, Credits: ${askResult.creditCount}`);

// Extract structured data using JSON Schema
const extractResult = await client.extract({
  url: "https://example.com/research-paper.pdf",
  prompt: "Extract the research information",
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      authors: { type: "array", items: { type: "string" } },
      abstract: { type: "string" },
      findings: { type: "array", items: { type: "string" } },
    },
    required: ["title", "abstract"],
    additionalProperties: false,
  },
});

console.log(extractResult.data); // Structured JSON output matching the schema

Authentication

Get your API key from the PDF Vector dashboard. The SDK requires a valid API key for all operations.

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
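
Hard-coding the key, as the snippets in this README do, is convenient for a quick test but easy to leak. Below is a minimal sketch of reading it from the environment instead. The variable name `PDFVECTOR_API_KEY` and the helper `requireApiKey` are assumptions made for illustration; neither is part of the SDK.

```typescript
// Hypothetical helper, not part of the pdfvector SDK.
type Env = Record<string, string | undefined>;

// Returns the trimmed key, or throws if it is missing or blank.
function requireApiKey(env: Env, name: string = "PDFVECTOR_API_KEY"): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing API key: set the ${name} environment variable`);
  }
  return value.trim();
}

// Usage (assumes a Node.js process):
// const client = new PDFVector({ apiKey: requireApiKey(process.env) });
```

Failing fast at startup tends to be easier to diagnose than an authentication error on the first API call.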

Usage Examples

Parse from URL

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.parse({
  url: "https://arxiv.org/pdf/2301.00001.pdf",
  useLLM: "auto",
});

console.log(result.markdown);

Parse from data

import { readFile } from "fs/promises";
import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.parse({
  data: await readFile("document.pdf"),
  contentType: "application/pdf",
  useLLM: "auto",
});

console.log(result.markdown);

Ask questions about documents

Ask questions about PDF and Word documents using AI and get natural language answers.

Ask from URL

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.ask({
  url: "https://arxiv.org/pdf/2301.00001.pdf",
  prompt: "What methodology was used in this research?",
});

console.log(result.markdown); // AI-generated answer in markdown format
console.log(`Document has ${result.pageCount} pages`);
console.log(`Cost: ${result.creditCount} credits`);

Ask from file data

import { readFile } from "fs/promises";
import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.ask({
  data: await readFile("research-paper.pdf"),
  contentType: "application/pdf",
  prompt: "Summarize the main findings and their implications",
});

console.log(result.markdown);

Extract structured data from documents

Extract structured data from PDF and Word documents using AI and JSON Schema.

Extract from URL

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.extract({
  url: "https://example.com/invoice.pdf",
  prompt: "Extract invoice details",
  schema: {
    type: "object",
    properties: {
      invoiceNumber: { type: "string" },
      date: { type: "string" },
      totalAmount: { type: "number" },
      items: {
        type: "array",
        items: {
          type: "object",
          properties: {
            description: { type: "string" },
            quantity: { type: "number" },
            price: { type: "number" },
          },
          additionalProperties: false,
        },
      },
    },
    required: ["invoiceNumber", "date", "totalAmount", "items"],
    additionalProperties: false,
  },
});

console.log(result.data); // Structured data matching the schema
console.log(`Cost: ${result.creditCount} credits`);

Extract from file data

import { readFile } from "fs/promises";
import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.extract({
  data: await readFile("research-paper.pdf"),
  contentType: "application/pdf",
  prompt: "Extract research paper metadata",
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      authors: { type: "array", items: { type: "string" } },
      abstract: { type: "string" },
      keywords: { type: "array", items: { type: "string" } },
      publicationDate: { type: "string" },
    },
    required: ["title", "authors", "abstract"],
    additionalProperties: false,
  },
});

console.log(result.data);

Invoice-Specific Operations

The SDK provides specialized methods for processing invoices with optimized parsing and extraction capabilities.

Parse Invoice

Parse invoices and convert them to clean markdown format optimized for invoice structure.

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.invoiceParse({
  url: "https://example.com/invoice.pdf",
});

console.log(result.markdown); // Structured invoice content in markdown
console.log(`Cost: ${result.creditCount} credits (${result.pageCount} pages)`);

Returns:

{
  markdown: string; // Parsed invoice in markdown format
  pageCount: number; // Number of pages processed
  creditCount: number; // Credits consumed (4 per page)
}

Ask Questions About Invoices

Ask natural language questions about invoice content and get AI-powered answers.

const result = await client.invoiceAsk({
  url: "https://example.com/invoice.pdf",
  prompt: "What is the total amount and due date for this invoice?",
});

console.log(result.markdown); // AI-generated answer about the invoice
console.log(`Cost: ${result.creditCount} credits (${result.pageCount} pages)`);

Returns:

{
  markdown: string; // AI-generated answer in markdown format
  pageCount: number; // Number of pages in the invoice
  creditCount: number; // Credits consumed (6 per page)
}

Extract Structured Invoice Data

Extract structured data from invoices using JSON Schema for consistent, reliable results.

const result = await client.invoiceExtract({
  url: "https://example.com/invoice.pdf",
  prompt: "Extract all invoice details including vendor, items, and totals",
  schema: {
    type: "object",
    properties: {
      invoiceNumber: { type: "string" },
      invoiceDate: { type: "string" },
      dueDate: { type: "string" },
      vendor: {
        type: "object",
        properties: {
          name: { type: "string" },
          address: { type: "string" },
          taxId: { type: "string" },
        },
        additionalProperties: false,
      },
      items: {
        type: "array",
        items: {
          type: "object",
          properties: {
            description: { type: "string" },
            quantity: { type: "number" },
            unitPrice: { type: "number" },
            totalPrice: { type: "number" },
          },
          additionalProperties: false,
        },
      },
      subtotal: { type: "number" },
      tax: { type: "number" },
      total: { type: "number" },
    },
    required: ["invoiceNumber", "total", "items"],
    additionalProperties: false,
  },
});

console.log(result.data); // Structured invoice data matching the schema
console.log(`Cost: ${result.creditCount} credits (${result.pageCount} pages)`);

Returns:

{
  data: object; // Structured invoice data matching your schema
  pageCount: number; // Number of pages in the invoice
  creditCount: number; // Credits consumed (6 per page)
}

Invoice Methods - Cost Summary

invoiceParse(): 4 credits per page
invoiceAsk(): 6 credits per page
invoiceExtract(): 6 credits per page
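
Since every invoice method is billed per page, the cost of a call can be estimated up front when the page count is known. The sketch below encodes the rates listed above; `estimateInvoiceCredits` is a hypothetical helper, not an SDK function, and the authoritative figure is always the `creditCount` returned by the API.

```typescript
// Per-page credit rates copied from the cost summary above.
const INVOICE_CREDITS_PER_PAGE = {
  invoiceParse: 4,
  invoiceAsk: 6,
  invoiceExtract: 6,
} as const;

type InvoiceMethod = keyof typeof INVOICE_CREDITS_PER_PAGE;

// Hypothetical helper: estimated credits for one call on a document of pageCount pages.
function estimateInvoiceCredits(method: InvoiceMethod, pageCount: number): number {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return INVOICE_CREDITS_PER_PAGE[method] * pageCount;
}
```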

ID Document Operations

The SDK provides specialized methods for processing ID documents (passports, driver's licenses, ID cards) with optimized parsing and extraction capabilities.

Parse ID Document

Parse ID documents and convert them to clean markdown format optimized for identity document structure.

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.idParse({
  url: "https://example.com/passport.pdf",
});

console.log(result.markdown); // Structured ID document content in markdown
console.log(`Cost: ${result.creditCount} credits (${result.pageCount} pages)`);

Returns:

{
  markdown: string; // Parsed ID document in markdown format
  pageCount: number; // Number of pages processed
  creditCount: number; // Credits consumed (4 per page)
}

Ask Questions About ID Documents

Ask natural language questions about ID document content and get AI-powered answers.

const result = await client.idAsk({
  url: "https://example.com/passport.pdf",
  prompt: "What is the full name and date of birth on this document?",
});

console.log(result.markdown); // AI-generated answer about the ID document
console.log(`Cost: ${result.creditCount} credits (${result.pageCount} pages)`);

Returns:

{
  markdown: string; // AI-generated answer in markdown format
  pageCount: number; // Number of pages in the ID document
  creditCount: number; // Credits consumed (6 per page)
}

Extract Structured ID Document Data

Extract structured data from ID documents using JSON Schema for consistent, reliable results.

const result = await client.idExtract({
  url: "https://example.com/passport.pdf",
  prompt: "Extract passport details from this document",
  schema: {
    type: "object",
    properties: {
      fullName: { type: "string" },
      dateOfBirth: { type: "string" },
      documentNumber: { type: "string" },
      nationality: { type: "string" },
      expirationDate: { type: "string" },
      issuingCountry: { type: "string" },
      gender: { type: "string" },
    },
    required: ["fullName", "documentNumber"],
    additionalProperties: false,
  },
});

console.log(result.data); // Structured ID document data matching the schema
console.log(`Cost: ${result.creditCount} credits (${result.pageCount} pages)`);

Returns:

{
  data: object; // Structured ID document data matching your schema
  pageCount: number; // Number of pages in the ID document
  creditCount: number; // Credits consumed (6 per page)
}
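
Because `data` is typed as a plain `object`, it can help to narrow it before use. The following is a minimal sketch of a type guard for the passport schema shown above. `PassportData` and `isPassportData` are illustrative names, not SDK exports, and the guard only checks field types, not their contents.

```typescript
// Mirrors the example passport schema: two required fields, the rest optional.
interface PassportData {
  fullName: string;
  documentNumber: string;
  dateOfBirth?: string;
  nationality?: string;
  expirationDate?: string;
  issuingCountry?: string;
  gender?: string;
}

const OPTIONAL_PASSPORT_FIELDS = [
  "dateOfBirth",
  "nationality",
  "expirationDate",
  "issuingCountry",
  "gender",
];

// Hypothetical type guard: true when value has the shape of PassportData.
function isPassportData(value: unknown): value is PassportData {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  if (typeof record["fullName"] !== "string") return false;
  if (typeof record["documentNumber"] !== "string") return false;
  // Optional fields may be absent, but must be strings when present.
  return OPTIONAL_PASSPORT_FIELDS.every(
    (key) => record[key] === undefined || typeof record[key] === "string",
  );
}

// Usage (sketch):
// if (isPassportData(result.data)) console.log(result.data.fullName);
```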

ID Document Methods - Cost Summary

idParse(): 4 credits per page
idAsk(): 6 credits per page
idExtract(): 6 credits per page

Search academic publications

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const searchResponse = await client.academicSearch({
  query: "quantum computing",
  providers: ["semantic-scholar", "arxiv", "pubmed"], // Search across multiple academic databases
  limit: 20,
  yearFrom: 2021,
  yearTo: 2024,
});

searchResponse.results.forEach((publication) => {
  console.log(`Title: ${publication.title}`);
  console.log(`Authors: ${publication.authors?.map((a) => a.name).join(", ")}`);
  console.log(`Year: ${publication.year}`);
  console.log(`Abstract: ${publication.abstract}`);
  console.log("---");
});

Search with Provider-Specific Data

const searchResponse = await client.academicSearch({
  query: "CRISPR gene editing",
  providers: ["semantic-scholar"],
  fields: ["title", "authors", "year", "provider", "providerData"], // providerData carries provider-specific metadata; provider is read below
});

searchResponse.results.forEach((pub) => {
  if (pub.provider === "semantic-scholar" && pub.providerData) {
    const data = pub.providerData;
    console.log(`Influential Citations: ${data.influentialCitationCount}`);
    console.log(`Fields of Study: ${data.fieldsOfStudy?.join(", ")}`);
  }
});

Fetch Academic Publications by ID

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const response = await client.academicFetch({
  ids: [
    "10.1038/nature12373", // DOI
    "12345678", // PubMed ID
    "2301.00001", // ArXiv ID
    "arXiv:2507.16298v1", // ArXiv with prefix
    "ED123456", // ERIC ID
    "0f40b1f08821e22e859c6050916cec3667778613", // Semantic Scholar ID
  ],
  fields: ["title", "authors", "year", "abstract", "doi"], // Optional: specify fields
});

// Handle successful results
response.results.forEach((pub) => {
  console.log(`Title: ${pub.title}`);
  console.log(`Provider: ${pub.detectedProvider}`);
  console.log(`Requested as: ${pub.id}`);
});

// Handle errors for IDs that couldn't be fetched
response.errors?.forEach((error) => {
  console.log(`Failed to fetch ${error.id}: ${error.error}`);
});

Find Citations for a Paragraph

Find relevant academic citations for each sentence in a paragraph using embedding-based cosine similarity ranking.

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.findCitations({
  paragraph:
    "Transformers have revolutionized natural language processing. Attention mechanisms allow models to focus on relevant parts of the input.",
  providers: ["semantic-scholar", "arxiv", "pubmed"],
});

console.log(
  `Found ${result.totalCitations} citations across ${result.sentenceCount} sentences`,
);

for (const item of result.results) {
  console.log(`\nSentence: ${item.sentence}`);
  for (const citation of item.citations) {
    console.log(`  [Score: ${citation.score}/10] ${citation.title}`);
    console.log(
      `    Authors: ${citation.authors?.map((a) => a.name).join(", ")}`,
    );
    console.log(`    Year: ${citation.year}`);
  }
}

Error Handling

import { PDFVector, PDFVectorError } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

try {
  const result = await client.parse({
    url: "https://example.com/document.pdf",
  });
  console.log(result.markdown);
} catch (error) {
  if (error instanceof PDFVectorError) {
    console.error(`API Error: ${error.message}`);
    console.error(`Status: ${error.status}`);
    console.error(`Code: ${error.code}`);
  } else {
    console.error("Unexpected Error:", error);
  }
}

API Reference

The client class for interacting with the PDF Vector API.

Constructor

new PDFVector(config: PDFVectorConfig)

Parameters:

config.apiKey (string): Your PDF Vector API key
config.baseUrl (string, optional): Custom base URL (defaults to https://www.pdfvector.com)

Methods

`parse(request)`

Parse a PDF or Word document and convert it to markdown.

Parameters:

For URL parsing:

{
  url: string;           // Direct URL to PDF/Word document
  useLLM?: 'auto' | 'always' | 'never'; // Default: 'auto'
}

For data parsing:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // Direct data of PDF/Word document
  contentType: string;   // MIME type (e.g., 'application/pdf')
  useLLM?: 'auto' | 'always' | 'never'; // Default: 'auto'
}

Returns:

{
  markdown: string; // Extracted content as markdown
  pageCount: number; // Number of pages processed
  creditCount: number; // Credits consumed (1-2 per page)
  usedLLM: boolean; // Whether AI enhancement was used
}

LLM Usage Options

auto (default): Automatically decide if AI enhancement is needed (1-2 credits per page)
never: Standard parsing without AI (1 credit per page)
always: Force AI enhancement (2 credits per page)

Note: Free plans are limited to useLLM: 'never'. Upgrade to a paid plan for AI enhancement.
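
The three modes translate into a simple cost range per document. As a rough sketch (the helper name `estimateParseCredits` is made up for this example and is not part of the SDK), the expected credits for a known page count could be computed like this:

```typescript
type UseLLM = "auto" | "always" | "never";

interface CreditRange {
  min: number;
  max: number;
}

// Hypothetical helper: credit range for parse(), using the per-page rates listed above.
function estimateParseCredits(pageCount: number, useLLM: UseLLM = "auto"): CreditRange {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  if (useLLM === "never") {
    return { min: pageCount, max: pageCount }; // 1 credit per page
  }
  if (useLLM === "always") {
    return { min: 2 * pageCount, max: 2 * pageCount }; // 2 credits per page
  }
  return { min: pageCount, max: 2 * pageCount }; // auto: 1-2 credits per page
}
```

With `auto`, the actual figure depends on whether AI enhancement was applied, which the response reports through `usedLLM` and `creditCount`.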

Supported File Types

PDF Documents

application/pdf
application/x-pdf
application/acrobat
application/vnd.pdf
text/pdf
text/x-pdf

Word Documents

application/msword (.doc)
application/vnd.openxmlformats-officedocument.wordprocessingml.document (.docx)
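
When parsing from data, `contentType` has to be supplied by the caller. One possible approach, sketched here under the assumption that the file extension is trustworthy, is to derive it from the file name. `contentTypeForFile` is a hypothetical helper and only covers the primary MIME type for each format listed above.

```typescript
// Extension to MIME type, using the primary types from the lists above.
const CONTENT_TYPES = new Map<string, string>([
  ["pdf", "application/pdf"],
  ["doc", "application/msword"],
  ["docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"],
]);

// Hypothetical helper: looks up the MIME type for a file name, case-insensitively.
function contentTypeForFile(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  const contentType = CONTENT_TYPES.get(extension);
  if (contentType === undefined) {
    throw new Error(`Unsupported file extension: "${extension}"`);
  }
  return contentType;
}

// Usage (sketch):
// await client.parse({ data, contentType: contentTypeForFile("report.docx") });
```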

Usage Limits

Processing timeout: 3 minutes per document
File size: No explicit limit, but larger files usually have more pages and consume more credits

Cost

Credits: Consumed per page (1-2 credits depending on LLM usage)

Common error codes:

url-not-found: Document URL not accessible
unsupported-content-type: File type not supported
timeout-error: Processing timeout (3 minutes max)
payment-required: Usage limit reached
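
These codes can be turned into actionable messages for end users. The sketch below assumes that `error.code` on a `PDFVectorError` carries the strings listed above, which the Error Handling example suggests but does not state outright. The hint texts and the `hintForErrorCode` helper are illustrative only.

```typescript
// Hint texts are suggestions written for this example, not messages from the API.
const PARSE_ERROR_HINTS = new Map<string, string>([
  ["url-not-found", "Check that the document URL is reachable."],
  ["unsupported-content-type", "Use one of the supported PDF or Word MIME types."],
  ["timeout-error", "Processing exceeded the 3 minute limit."],
  ["payment-required", "Usage limit reached; check your plan."],
]);

// Hypothetical helper: maps a documented error code to a short hint.
function hintForErrorCode(code: string | undefined): string {
  if (code === undefined) {
    return "No error code available.";
  }
  const hint = PARSE_ERROR_HINTS.get(code);
  return hint !== undefined ? hint : `Unrecognized error code: ${code}`;
}

// Usage inside a catch block (sketch):
// if (error instanceof PDFVectorError) console.error(hintForErrorCode(error.code));
```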

`ask(request)`

Ask questions about PDF or Word documents and get natural language answers.

Parameters:

For URL input:

{
  url: string; // Direct URL to PDF/Word document
  prompt: string; // The question you want to ask about the document
}

For data input:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // Document data
  contentType: string; // MIME type (e.g., 'application/pdf')
  prompt: string; // The question you want to ask about the document
}

Returns:

{
  markdown: string; // AI-generated answer in markdown format
  pageCount: number; // Number of pages in the document
  creditCount: number; // Credits consumed (3 per page)
}

Document Q&A Features

Natural language responses: AI provides answers in clear, readable markdown format
Contextual understanding: AI analyzes the entire document to provide relevant answers
Multiple formats: Supports both PDF and Word documents
Page-based pricing: 3 credits per page in the document

Cost

Credits: 3 credits per page in the document

Common error codes:

url-not-found: Document URL not accessible
unsupported-content-type: File type not supported
page-count-not-found: Unable to detect page count
timeout-error: Processing timeout
payment-required: Usage limit reached

`extract(request)`

Extract structured data from PDF or Word documents using AI and JSON Schema.

Parameters:

For URL input:

{
  url: string; // Direct URL to PDF/Word document
  prompt: string; // Instructions for extracting structured data
  schema: object; // JSON Schema defining the structure of expected output
}

For data input:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // Document data
  contentType: string; // MIME type (e.g., 'application/pdf')
  prompt: string; // Instructions for extracting structured data
  schema: object; // JSON Schema defining the structure of expected output
}

Returns:

{
  data: object; // Structured data matching the provided schema
  pageCount: number; // Number of pages in the document
  creditCount: number; // Credits consumed (3 per page)
}

JSON Schema Requirements

Must be a valid JSON Schema following the specification
Must include additionalProperties: false at the object level
Can define complex nested structures
Supports all standard JSON Schema features
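
A missing `additionalProperties: false` is easy to overlook in a nested schema. The sketch below walks a schema and reports every object-typed node that lacks it. Whether the API enforces the rule on nested objects is not stated here, so checking every level is the stricter reading. `findOpenObjects` is a hypothetical helper, and it only follows `properties` and single-schema `items`.

```typescript
type JsonSchema = { [key: string]: unknown };

function isSchemaNode(value: unknown): value is JsonSchema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: returns paths of object schemas missing additionalProperties: false.
function findOpenObjects(schema: JsonSchema, path: string = "#"): string[] {
  const problems: string[] = [];
  if (schema["type"] === "object" && schema["additionalProperties"] !== false) {
    problems.push(path);
  }
  // Recurse into each declared property.
  const properties = schema["properties"];
  if (isSchemaNode(properties)) {
    for (const key of Object.keys(properties)) {
      const child = properties[key];
      if (isSchemaNode(child)) {
        problems.push(...findOpenObjects(child, `${path}/properties/${key}`));
      }
    }
  }
  // Recurse into array item schemas.
  const items = schema["items"];
  if (isSchemaNode(items)) {
    problems.push(...findOpenObjects(items, `${path}/items`));
  }
  return problems;
}
```

Running this before calling `extract` gives a readable path for each offending node, which may be quicker to act on than an `invalid-schema` error.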

Extract Features

Schema validation: Ensures extracted data matches your exact requirements
Complex structures: Supports nested objects, arrays, and various data types
Reliable extraction: AI follows your schema strictly for consistent results
Multiple formats: Supports both PDF and Word documents

Cost

Credits: 3 credits per page in the document

Common error codes:

url-not-found: Document URL not accessible
unsupported-content-type: File type not supported
invalid-schema: JSON Schema is invalid or missing additionalProperties
timeout-error: Processing timeout
payment-required: Usage limit reached

`invoiceParse(request)`

Parse an invoice and convert it to markdown format optimized for invoice structure.

Parameters:

For URL input:

{
  url: string; // Direct URL to invoice PDF/Word document
}

For data input:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // Invoice data
  contentType: string; // MIME type (e.g., 'application/pdf')
}

Returns:

{
  markdown: string; // Parsed invoice in markdown format
  pageCount: number; // Number of pages in the invoice
  creditCount: number; // Credits consumed (4 per page)
}

Cost

Credits: 4 credits per page

`invoiceAsk(request)`

Ask questions about invoices and get AI-powered answers.

Parameters:

For URL input:

{
  url: string; // Direct URL to invoice PDF/Word document
  prompt: string; // The question you want to ask about the invoice
}

For data input:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // Invoice data
  contentType: string; // MIME type (e.g., 'application/pdf')
  prompt: string; // The question you want to ask about the invoice
}

Returns:

{
  markdown: string; // AI-generated answer in markdown format
  pageCount: number; // Number of pages in the invoice
  creditCount: number; // Credits consumed (6 per page)
}

Cost

Credits: 6 credits per page

`invoiceExtract(request)`

Extract structured data from invoices using AI and JSON Schema.

Parameters:

For URL input:

{
  url: string; // Direct URL to invoice PDF/Word document
  prompt: string; // Instructions for extracting structured invoice data
  schema: object; // JSON Schema defining the structure of expected output
}

For data input:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // Invoice data
  contentType: string; // MIME type (e.g., 'application/pdf')
  prompt: string; // Instructions for extracting structured invoice data
  schema: object; // JSON Schema defining the structure of expected output
}

Returns:

{
  data: object; // Structured invoice data matching the provided schema
  pageCount: number; // Number of pages in the invoice
  creditCount: number; // Credits consumed (6 per page)
}

Cost

Credits: 6 credits per page

`idParse(request)`

Parse an ID document (passport, driver's license, ID card) and convert it to markdown format.

Parameters:

For URL input:

{
  url: string; // Direct URL to ID document (PDF or image)
}

For data input:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // ID document data
  contentType: string; // MIME type (e.g., 'application/pdf', 'image/jpeg')
}

Returns:

{
  markdown: string; // Parsed ID document in markdown format
  pageCount: number; // Number of pages in the ID document
  creditCount: number; // Credits consumed (4 per page)
}

Cost

Credits: 4 credits per page

`idAsk(request)`

Ask questions about ID documents and get AI-powered answers.

Parameters:

For URL input:

{
  url: string; // Direct URL to ID document (PDF or image)
  prompt: string; // The question you want to ask about the ID document
}

For data input:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // ID document data
  contentType: string; // MIME type (e.g., 'application/pdf', 'image/jpeg')
  prompt: string; // The question you want to ask about the ID document
}

Returns:

{
  markdown: string; // AI-generated answer in markdown format
  pageCount: number; // Number of pages in the ID document
  creditCount: number; // Credits consumed (6 per page)
}

Cost

Credits: 6 credits per page

`idExtract(request)`

Extract structured data from ID documents using AI and JSON Schema.

Parameters:

For URL input:

{
  url: string; // Direct URL to ID document (PDF or image)
  prompt: string; // Instructions for extracting structured ID data
  schema: object; // JSON Schema defining the structure of expected output
}

For data input:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // ID document data
  contentType: string; // MIME type (e.g., 'application/pdf', 'image/jpeg')
  prompt: string; // Instructions for extracting structured ID data
  schema: object; // JSON Schema defining the structure of expected output
}

Returns:

{
  data: object; // Structured ID document data matching the provided schema
  pageCount: number; // Number of pages in the ID document
  creditCount: number; // Credits consumed (6 per page)
}

Cost

Credits: 6 credits per page

`academicSearch(request)`

Search academic publications across multiple databases.

Parameters:

{
  query: string;                              // Search query
  providers?: AcademicSearchProvider[];       // Databases to search (default: ["semantic-scholar"])
  offset?: number;                            // Pagination offset (default: 0)
  limit?: number;                             // Results per page, 1-100 (default: 20)
  yearFrom?: number;                          // Filter by publication year (from) (min: 1900)
  yearTo?: number;                            // Filter by publication year (to) (max: 2050)
  fields?: AcademicSearchPublicationField[];  // Fields to include in response
}

Supported Providers:

"semantic-scholar" - Semantic Scholar
"arxiv" - ArXiv
"pubmed" - PubMed
"google-scholar" - Google Scholar
"eric" - ERIC
"europe-pmc" - Europe PMC
"openalex" - OpenAlex

Available Fields:

Basic fields: "id", "doi", "title", "url", "providerURL", "authors", "date", "year", "totalCitations", "totalReferences", "abstract", "pdfURL", "provider"
Extended field: "providerData" - Provider-specific metadata

Returns:

{
  estimatedTotalResults: number;              // Total results available
  results: AcademicSearchPublication[];       // Array of publications
  errors?: AcademicSearchProviderError[];     // Any provider errors
}

Cost

Credits: 2 credits per search.
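
Since each search costs credits, it may be worth validating parameters locally before sending them. The sketch below checks the documented ranges (limit 1-100, yearFrom at least 1900, yearTo at most 2050). The non-empty query, non-negative offset, and yearFrom <= yearTo checks are assumptions added for this example, and `validateSearchParams` is not an SDK function.

```typescript
interface SearchParams {
  query: string;
  offset?: number;
  limit?: number;
  yearFrom?: number;
  yearTo?: number;
}

// Hypothetical helper: returns a list of problems; an empty list means the params look valid.
function validateSearchParams(params: SearchParams): string[] {
  const problems: string[] = [];
  const { query, offset, limit, yearFrom, yearTo } = params;
  if (query.trim() === "") {
    problems.push("query must not be empty"); // assumption
  }
  if (offset !== undefined && (!Number.isInteger(offset) || offset < 0)) {
    problems.push("offset must be a non-negative integer"); // assumption
  }
  if (limit !== undefined && (!Number.isInteger(limit) || limit < 1 || limit > 100)) {
    problems.push("limit must be an integer from 1 to 100");
  }
  if (yearFrom !== undefined && yearFrom < 1900) {
    problems.push("yearFrom must be 1900 or later");
  }
  if (yearTo !== undefined && yearTo > 2050) {
    problems.push("yearTo must be 2050 or earlier");
  }
  if (yearFrom !== undefined && yearTo !== undefined && yearFrom > yearTo) {
    problems.push("yearFrom must not exceed yearTo"); // assumption
  }
  return problems;
}
```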

`academicFetch(request)` / `fetch(request)`

Fetch specific academic publications by their IDs with automatic provider detection.

Parameters:

{
  ids: string[];                               // Array of publication IDs to fetch
  fields?: AcademicSearchPublicationField[];   // Fields to include in response
}

Supported ID Types:

DOI: e.g., "10.1038/nature12373"
PubMed ID: e.g., "12345678" (numeric ID)
ArXiv ID: e.g., "2301.00001" or "arXiv:2301.00001" or "math.GT/0309136"
Semantic Scholar ID: e.g., "0f40b1f08821e22e859c6050916cec3667778613"
ERIC ID: e.g., "ED123456"
Europe PMC ID: e.g., "PMC3004345" or "PPR123456"
OpenAlex ID: e.g., "W2741809807" or "https://openalex.org/W2741809807"

Returns:

{
  results: AcademicFetchResult[];    // Successfully fetched publications
  errors?: AcademicFetchError[];     // Errors for IDs that couldn't be fetched
}

Each result includes:

{
  id: string; // The ID that was used to fetch
  detectedProvider: string; // Provider that was used
  // ... all publication fields (title, authors, abstract, etc.)
}

Cost

Credits: 2 credits per fetch.
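
Provider detection happens on the API side, but a local sanity check can catch obviously malformed IDs before a request is made. The patterns below are approximations derived from the example IDs in this section, not the API's actual detection rules, and `guessIdKind` is a hypothetical helper.

```typescript
type IdKind =
  | "doi"
  | "arxiv"
  | "semantic-scholar"
  | "eric"
  | "europe-pmc"
  | "openalex"
  | "pubmed"
  | "unknown";

// Approximate patterns based on the documented examples; order matters.
const ID_PATTERNS: Array<[IdKind, RegExp]> = [
  ["doi", /^10\.\d{4,9}\/\S+$/],
  ["arxiv", /^(arXiv:)?\d{4}\.\d{4,5}(v\d+)?$/], // e.g. 2301.00001, arXiv:2507.16298v1
  ["arxiv", /^(arXiv:)?[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/], // e.g. math.GT/0309136
  ["semantic-scholar", /^[0-9a-f]{40}$/],
  ["eric", /^E[DJ]\d+$/], // EJ prefix is an assumption beyond the ED example
  ["europe-pmc", /^(PMC|PPR)\d+$/],
  ["openalex", /^(https:\/\/openalex\.org\/)?W\d+$/],
  ["pubmed", /^\d+$/], // checked last: purely numeric
];

// Hypothetical helper: best-effort guess of the ID type, or "unknown".
function guessIdKind(id: string): IdKind {
  const trimmed = id.trim();
  for (const [kind, pattern] of ID_PATTERNS) {
    if (pattern.test(trimmed)) return kind;
  }
  return "unknown";
}
```

An `unknown` result does not necessarily mean the API would reject the ID; it only means the ID matches none of the formats shown here.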

`findCitations(request)`

Find relevant academic citations for each sentence in a paragraph. Splits the paragraph into sentences, searches for papers across providers, and ranks results using embedding cosine similarity. Only sentences with citations scoring >= 5 are returned.

Parameters:

{
  paragraph: string;                        // Text to find citations for (max 5000 characters)
  providers?: AcademicSearchProvider[];     // Databases to search (default: ["semantic-scholar"])
}

Returns:

{
  results: FindCitationsSentenceCitation[]; // Sentences with their matching citations (only those with score >= 5)
  sentenceCount: number;                    // Total sentences extracted from the paragraph
  totalCitations: number;                   // Total citations found across all sentences
  errors: string[];                         // Any errors encountered during processing
}

Each sentence citation includes:

{
  sentence: string;                          // The original sentence
  citations: FindCitationsScoredPublication[]; // Papers sorted by score (highest first)
}

Each scored publication includes:

{
  score: number;     // Relevance score 0-10 (only >= 5 returned)
  title?: string;
  authors?: { name: string; url?: string }[];
  year?: number;
  abstract?: string;
  doi?: string;
  pdfURL?: string;
  provider?: string;
  // ... all base publication fields
}

Cost

Credits: 2 credits per find-citations call.
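
Because `paragraph` is capped at 5000 characters, longer text has to be split before it is sent. Here is a rough sketch that groups sentences into chunks under the limit. The punctuation-based sentence splitting is deliberately naive and unrelated to how the API splits sentences, and `splitForCitations` is a hypothetical helper. Note that each chunk would be a separate call and billed separately.

```typescript
// Hypothetical helper: splits text into chunks of at most maxLength characters,
// breaking at sentence-ending punctuation where possible.
function splitForCitations(text: string, maxLength: number = 5000): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Flush the current chunk if adding this sentence would exceed the limit.
    if (current.length > 0 && current.length + sentence.length > maxLength) {
      chunks.push(current.trim());
      current = "";
    }
    // A single sentence longer than the limit is cut at the limit.
    let rest = sentence;
    while (rest.length > maxLength) {
      chunks.push(rest.slice(0, maxLength));
      rest = rest.slice(maxLength);
    }
    current += rest;
  }
  if (current.trim().length > 0) {
    chunks.push(current.trim());
  }
  return chunks;
}

// Usage (sketch):
// for (const paragraph of splitForCitations(longText)) {
//   const result = await client.findCitations({ paragraph });
// }
```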

TypeScript Support

The SDK is written in TypeScript and includes full type definitions:

import type {
  // Core classes
  PDFVector,
  PDFVectorConfig,
  PDFVectorError,
  // Parse API types
  ParseURLRequest,
  ParseDataRequest,
  ParseResponse,
  // Ask API types
  AskURLRequest,
  AskDataRequest,
  AskResponse,
  // Extract API types
  ExtractURLRequest,
  ExtractDataRequest,
  ExtractResponse,
  // Invoice API types
  ParseInvoiceResponse,
  AskInvoiceResponse,
  ExtractInvoiceResponse,
  // ID Document API types
  ParseIdResponse,
  AskIdResponse,
  ExtractIdResponse,
  // Academic Search API types
  SearchRequest,
  AcademicSearchResponse,
  AcademicSearchPublication,
  AcademicSearchProvider,
  AcademicSearchAuthor,
  AcademicSearchPublicationField,
  // Academic Fetch API types
  FetchRequest,
  AcademicFetchResponse,
  AcademicFetchResult,
  AcademicFetchError,
  // Find Citations API types
  FindCitationsRequest,
  FindCitationsResponse,
  FindCitationsSentenceCitation,
  FindCitationsScoredPublication,
  // Provider-specific data types
  AcademicSearchSemanticScholarData,
  AcademicSearchGoogleScholarData,
  AcademicSearchPubMedData,
  AcademicSearchArxivData,
  AcademicSearchEricData,
} from "pdfvector";

// Constants
import {
  AcademicSearchProviderValues, // Array of valid providers
  AcademicSearchPublicationFieldValues, // Array of valid fields
} from "pdfvector";

Node.js Support

Node.js version: Node.js 20+
ESM: Supports ES modules (CommonJS is not supported)
Dependencies: Uses standard fetch API

Examples

Batch Processing

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const documents = [
  "https://example.com/doc1.pdf",
  "https://example.com/doc2.pdf",
];

const results = await Promise.all(
  documents.map((url) => client.parse({ url, useLLM: "auto" })),
);

results.forEach((result, index) => {
  console.log(`Document ${index + 1}:`);
  console.log(`Pages: ${result.pageCount}`);
  console.log(`Credits: ${result.creditCount}`);
});
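
`Promise.all` rejects as soon as any single document fails, which discards the results of the others. If partial results are acceptable, `Promise.allSettled` can be used instead. The sketch below sorts settled outcomes into successes and failures. The `Settled` type is declared locally so the example stands alone (it is structurally compatible with the built-in `PromiseSettledResult`), and `partitionSettled` is a hypothetical helper.

```typescript
// Local equivalent of PromiseSettledResult<T>, declared so the sketch is self-contained.
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

interface Partitioned<T> {
  succeeded: { input: string; value: T }[];
  failed: { input: string; reason: unknown }[];
}

// Hypothetical helper: pairs each outcome with its input and splits by status.
function partitionSettled<T>(inputs: string[], outcomes: Settled<T>[]): Partitioned<T> {
  const result: Partitioned<T> = { succeeded: [], failed: [] };
  outcomes.forEach((outcome, index) => {
    const input = inputs[index] ?? `#${index}`;
    if (outcome.status === "fulfilled") {
      result.succeeded.push({ input, value: outcome.value });
    } else {
      result.failed.push({ input, reason: outcome.reason });
    }
  });
  return result;
}

// Usage with the batch above (sketch):
// const outcomes = await Promise.allSettled(
//   documents.map((url) => client.parse({ url, useLLM: "auto" })),
// );
// const { succeeded, failed } = partitionSettled(documents, outcomes);
```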

Document Q&A and Data Extraction

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

// Ask multiple questions about the same document
const questions = [
  "What is the main hypothesis?",
  "What methodology was used?",
  "What are the key findings?",
  "What are the limitations mentioned?",
];

const documentUrl = "https://example.com/research-paper.pdf";

const answers = await Promise.all(
  questions.map((prompt) => client.ask({ url: documentUrl, prompt })),
);

answers.forEach((result, index) => {
  console.log(`\nQuestion: ${questions[index]}`);
  console.log(`Answer: ${result.markdown}`);
});

// Extract structured data using the extract endpoint
const structuredData = await client.extract({
  url: documentUrl,
  prompt: "Extract comprehensive research information from this paper",
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      authors: {
        type: "array",
        items: {
          type: "object",
          properties: {
            name: { type: "string" },
            affiliation: { type: "string" },
          },
          additionalProperties: false,
        },
      },
      abstract: { type: "string" },
      methodology: {
        type: "object",
        properties: {
          approach: { type: "string" },
          dataCollection: { type: "string" },
          sampleSize: { type: "number" },
        },
        additionalProperties: false,
      },
      findings: {
        type: "array",
        items: { type: "string" },
      },
      limitations: {
        type: "array",
        items: { type: "string" },
      },
      conclusions: { type: "string" },
    },
    required: ["title", "abstract", "findings"],
    additionalProperties: false,
  },
});

console.log(
  "Structured Research Data:",
  JSON.stringify(structuredData.data, null, 2),
);

// Note: Each operation consumes credits based on page count
const totalCredits =
  answers.reduce((sum, result) => sum + result.creditCount, 0) +
  structuredData.creditCount;
console.log(`\nTotal credits used: ${totalCredits}`);

Academic Search with Pagination

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

let offset = 0;
const limit = 50;
const allResults = [];

// Fetch first page
let response = await client.academicSearch({
  query: "climate change",
  providers: ["semantic-scholar", "arxiv"],
  offset,
  limit,
});

allResults.push(...response.results);

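// Fetch more pages as needed; stop early if a page comes back empty,
// since estimatedTotalResults is only an estimate
while (
  response.results.length > 0 &&
  allResults.length < response.estimatedTotalResults &&
  allResults.length < 200
) {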
  offset += limit;
  response = await client.academicSearch({
    query: "climate change",
    providers: ["semantic-scholar", "arxiv"],
    offset,
    limit,
  });
  allResults.push(...response.results);
}

console.log(`Fetched ${allResults.length} publications`);

Custom Base URL

// For development or custom deployments
const client = new PDFVector({
  apiKey: "pdfvector_api_key_here",
  baseUrl: "https://pdfvector.acme.com",
});

Support

API Reference: pdfvector.com/v1/api/scalar
Dashboard: pdfvector.com/dashboard

License

This SDK is licensed under the MIT License.
