bank-statement-parser
v2.0.0
Published
AI-powered bank statement PDF parser. Extracts bank name, account number, IFSC, transactions & balances from any bank's PDF statement.
🏦 bank-statement-parser
AI-powered npm package that extracts bank name, account number, IFSC code, transactions & balances from any bank's PDF statement — returns clean JSON.
Works with any bank — no templates or rules to maintain. The LLM adapts to each bank's unique format automatically.
Features
- 🏛️ Bank details — name, account number, IFSC, branch, account holder
- 💳 All transactions — date, description, reference, debit/credit, running balance
- 🏷️ Auto-categorization — UPI, NEFT, RTGS, ATM, Salary, EMI, etc.
- 📊 Summary — opening/closing balance, total debits/credits
- 🔌 Multiple integrations — Node.js, Express middleware, React hook, Next.js
- 🤖 Provider agnostic — works with Claude (Anthropic) or GPT (OpenAI)
Install
npm install bank-statement-parser pdf-parse
# Pick your LLM provider (one or both):
npm install anthropic # for Claude
npm install openai # for GPT
Quick Start
Node.js / TypeScript
import { BankStatementParser } from "bank-statement-parser";
const parser = new BankStatementParser({
provider: "anthropic", // or "openai"
apiKey: process.env.ANTHROPIC_API_KEY!,
});
// Parse from file path
const result = await parser.parseFile("./hdfc-statement.pdf");
// Parse from Buffer (multer, fs.readFileSync, etc.)
// const result = await parser.parseBuffer(buffer);
// Parse from base64 (browser upload → API)
// const result = await parser.parseBase64(base64String);
console.log(result);
Output (JSON)
{
"bankDetails": {
"bankName": "HDFC Bank",
"accountNumber": "50100XXXXXXX789",
"accountHolderName": "Rahul Sharma",
"ifscCode": "HDFC0001234",
"branch": "Koramangala, Bangalore",
"accountType": "Savings",
"statementPeriod": {
"from": "2024-01-01",
"to": "2024-03-31"
}
},
"summary": {
"openingBalance": 125000.50,
"closingBalance": 98450.75,
"totalDebits": 245000.00,
"totalCredits": 218450.25,
"transactionCount": 47,
"currency": "INR"
},
"transactions": [
{
"date": "2024-01-02",
"description": "UPI/SWIGGY/402312345678/Payment",
"reference": "402312345678",
"type": "debit",
"amount": 450.00,
"balance": 124550.50,
"category": "UPI"
},
{
"date": "2024-01-05",
"description": "NEFT/SALARY/JAN2024/ACME CORP",
"reference": "N0012345678",
"type": "credit",
"amount": 85000.00,
"balance": 209550.50,
"category": "Salary"
}
]
}
Express Backend
import express from "express";
import cors from "cors";
import { createParserMiddleware } from "bank-statement-parser/middleware";
const app = express();
app.use(cors());
// One line to create the endpoint
app.use("/api/parse-statement", createParserMiddleware({
provider: "anthropic",
apiKey: process.env.ANTHROPIC_API_KEY!,
}));
// Also install: npm install multer express
app.listen(3001);
# Test it
curl -X POST http://localhost:3001/api/parse-statement \
-F "statement=@hdfc-statement.pdf"
React Frontend
import { useBankStatementParser } from "bank-statement-parser/react";
function StatementUploader() {
const { parse, data, loading, error } = useBankStatementParser({
endpoint: "/api/parse-statement",
});
return (
<div>
<input
type="file"
accept=".pdf"
onChange={(e) => {
const file = e.target.files?.[0];
if (file) parse(file);
}}
/>
{loading && <p>Parsing statement...</p>}
{error && <p style={{ color: "red" }}>{error}</p>}
{data && (
<div>
<h3>{data.bankDetails.bankName}</h3>
<p>Account: {data.bankDetails.accountNumber}</p>
<p>IFSC: {data.bankDetails.ifscCode}</p>
<p>Balance: {data.summary.closingBalance}</p>
<table>
<thead>
<tr>
<th>Date</th>
<th>Description</th>
<th>Amount</th>
<th>Balance</th>
</tr>
</thead>
<tbody>
{data.transactions.map((txn, i) => (
<tr key={i}>
<td>{txn.date}</td>
<td>{txn.description}</td>
<td style={{ color: txn.type === "credit" ? "green" : "red" }}>
{txn.type === "credit" ? "+" : "-"}{txn.amount}
</td>
<td>{txn.balance}</td>
</tr>
))}
</tbody>
</table>
<button onClick={() => {
const blob = new Blob([JSON.stringify(data, null, 2)], { type: "application/json" });
const a = document.createElement("a");
a.href = URL.createObjectURL(blob);
a.download = "statement.json";
a.click();
}}>
Download JSON
</button>
</div>
)}
</div>
);
}
Next.js API Route
// app/api/parse-statement/route.ts
import { NextRequest, NextResponse } from "next/server";
import { BankStatementParser } from "bank-statement-parser";
const parser = new BankStatementParser({
provider: "anthropic",
apiKey: process.env.ANTHROPIC_API_KEY!,
});
export async function POST(request: NextRequest) {
const formData = await request.formData();
const file = formData.get("statement") as File;
const buffer = Buffer.from(await file.arrayBuffer());
const result = await parser.parseBuffer(buffer);
return NextResponse.json({ success: true, data: result });
}
API Reference
BankStatementParser
const parser = new BankStatementParser(config: ParserConfig);
| Config Option | Type | Default | Description |
|------------------|----------|-----------------------------|--------------------------------|
| provider | string | — | "anthropic" or "openai" |
| apiKey | string | — | API key for the provider |
| model | string | auto | Model name override |
| pagesPerChunk | number | 3 | Pages sent per LLM API call |
| maxPages | number | all | Max pages to process |
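Because pagesPerChunk is the number of pages sent per LLM API call, the number of calls for a statement can be estimated before parsing. The following is a rough sketch, assuming one call per chunk and assuming that maxPages caps the page count before chunking. `estimateApiCalls` and `ChunkOptions` are hypothetical names for illustration and are not part of the package.

```typescript
// Hypothetical helper (not part of bank-statement-parser): estimates how many
// LLM calls a statement needs, assuming one call per chunk of pages.
interface ChunkOptions {
  pagesPerChunk?: number; // defaults to 3, per the config table above
  maxPages?: number;      // defaults to processing every page
}

function estimateApiCalls(totalPages: number, options: ChunkOptions = {}): number {
  const pagesPerChunk = options.pagesPerChunk ?? 3;
  if (!Number.isInteger(pagesPerChunk) || pagesPerChunk < 1) {
    throw new RangeError("pagesPerChunk must be a positive integer");
  }
  // Assumption: maxPages truncates the document before it is split into chunks.
  const pages = Math.max(0, Math.min(totalPages, options.maxPages ?? totalPages));
  return Math.ceil(pages / pagesPerChunk);
}

console.log(estimateApiCalls(10));                       // 4
console.log(estimateApiCalls(10, { pagesPerChunk: 2 })); // 5
console.log(estimateApiCalls(10, { maxPages: 4 }));      // 2
```

Under these assumptions, lowering pagesPerChunk from 3 to 2 on a 10-page statement raises the call count from 4 to 5.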
Methods
| Method | Input | Description |
|------------------------------------------------|-----------------|-------------------------------|
| parseFile(path, options?) | File path | Parse from local file |
| parseBuffer(buffer, options?) | Buffer | Parse from Node.js Buffer |
| parseBase64(base64, options?) | Base64 string | Parse from base64-encoded PDF |
| parseArrayBuffer(arrayBuffer, options?) | ArrayBuffer | Parse from ArrayBuffer |
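The three in-memory inputs carry the same PDF bytes in different wrappers. The sketch below shows how they relate by normalizing each one to a Node.js Buffer and checking for the PDF signature before spending an API call. `toPdfBuffer` and `looksLikePdf` are hypothetical helpers, and whether the package converts inputs this way internally is an assumption.

```typescript
// Hypothetical helpers, not exported by bank-statement-parser.
type PdfInput = Buffer | ArrayBuffer | string; // a string is treated as base64

function toPdfBuffer(input: PdfInput): Buffer {
  if (typeof input === "string") {
    // Strip an optional data-URL prefix such as "data:application/pdf;base64,"
    const base64 = input.replace(/^data:[^,]*,/, "");
    return Buffer.from(base64, "base64");
  }
  if (Buffer.isBuffer(input)) return input;
  return Buffer.from(input); // ArrayBuffer: a view over the same memory, no copy
}

// PDF files normally begin with the ASCII signature "%PDF-".
function looksLikePdf(buffer: Buffer): boolean {
  return buffer.subarray(0, 5).toString("latin1") === "%PDF-";
}

// "JVBERi0xLjQ=" is the base64 encoding of "%PDF-1.4"
console.log(looksLikePdf(toPdfBuffer("JVBERi0xLjQ="))); // true
console.log(looksLikePdf(toPdfBuffer("aGVsbG8=")));     // false ("hello")
```

A check like this lets an upload handler reject non-PDF files early, for example before calling parseBuffer.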
Parse Options
{
pagesPerChunk?: number; // Override chunk size
maxPages?: number; // Override page limit
onProgress?: (current: number, total: number) => void; // Progress callback
}
Output: ParsedStatement
{
bankDetails: {
bankName: string | null;
accountNumber: string | null;
accountHolderName: string | null;
ifscCode: string | null;
branch: string | null;
accountType: string | null;
statementPeriod: { from: string | null; to: string | null };
};
summary: {
openingBalance: number | null;
closingBalance: number | null;
totalDebits: number;
totalCredits: number;
transactionCount: number;
currency: string;
};
transactions: Array<{
date: string;
description: string;
reference: string | null;
type: "debit" | "credit";
amount: number;
balance: number | null;
category: string;
}>;
}
Transaction Categories
The parser auto-categorizes transactions into:
| Category | Examples |
|-------------------|-----------------------------------------|
| Salary | Monthly salary credits |
| UPI | UPI/PhonePe/GPay payments |
| NEFT | NEFT transfers |
| RTGS | RTGS transfers |
| IMPS | IMPS transfers |
| Transfer | General fund transfers |
| ATM Withdrawal | Cash withdrawals from ATMs |
| POS/Card Payment | Debit/credit card swipes |
| EMI/Loan | EMI debits, loan repayments |
| Bill Payment | Utility bills, subscriptions |
| Cash Deposit | Cash deposits at branch/ATM |
| Interest | Interest earned |
| Charges/Fees | Bank charges, SMS fees, penalties |
| Refund | Refunds and reversals |
| Cheque | Cheque deposits/clearances |
| Other | Uncategorized |
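Once transactions carry a category, per-category totals are a short reduction over the transactions array. The sketch below is one way to do it. The `Transaction` interface is a local copy of the documented output shape, and `totalsByCategory` is a hypothetical helper, not something the package exports.

```typescript
// Local copy of the transaction shape documented under "Output: ParsedStatement".
interface Transaction {
  date: string;
  description: string;
  reference: string | null;
  type: "debit" | "credit";
  amount: number;
  balance: number | null;
  category: string;
}

interface CategoryTotal {
  debits: number;
  credits: number;
  count: number;
}

// Hypothetical helper: sums debits and credits separately for each category.
function totalsByCategory(transactions: Transaction[]): Map<string, CategoryTotal> {
  const totals = new Map<string, CategoryTotal>();
  for (const txn of transactions) {
    const entry = totals.get(txn.category) ?? { debits: 0, credits: 0, count: 0 };
    if (txn.type === "debit") entry.debits += txn.amount;
    else entry.credits += txn.amount;
    entry.count += 1;
    totals.set(txn.category, entry);
  }
  return totals;
}

// The two transactions from the sample output above.
const sample: Transaction[] = [
  { date: "2024-01-02", description: "UPI/SWIGGY/402312345678/Payment", reference: "402312345678",
    type: "debit", amount: 450, balance: 124550.5, category: "UPI" },
  { date: "2024-01-05", description: "NEFT/SALARY/JAN2024/ACME CORP", reference: "N0012345678",
    type: "credit", amount: 85000, balance: 209550.5, category: "Salary" },
];

const totals = totalsByCategory(sample);
console.log(totals.get("UPI")?.debits);     // 450
console.log(totals.get("Salary")?.credits); // 85000
```

Amounts are plain floating-point numbers, so long statements may need rounding to two decimals before display.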
Project Structure
bank-statement-parser/
├── src/
│ ├── index.ts # Main entry — exports everything
│ ├── core/
│ │ ├── types.ts # TypeScript interfaces
│ │ ├── parser.ts # BankStatementParser class
│ │ ├── llm.ts # Anthropic/OpenAI client
│ │ ├── prompt.ts # Extraction prompt
│ │ └── normalize.ts # JSON parsing & normalization
│ ├── middleware/
│ │ └── index.ts # Express middleware
│ └── react/
│ └── index.ts # React hook
├── examples/
│ ├── server.ts # Express server example
│ ├── BankStatementUpload.tsx # React component example
│ ├── node-usage.ts # Standalone Node.js example
│ └── nextjs-api-route.ts # Next.js API route example
├── package.json
├── tsconfig.json
└── README.md
Tips
- Accuracy: Use pagesPerChunk: 2 for dense statements. Smaller chunks give more reliable extraction at the cost of more API calls.
- Cost: A typical 10-page statement costs ~$0.01-0.05 with Claude Sonnet.
- Security: Never expose API keys in the browser. Always use a backend endpoint.
- Scanned PDFs: This package works with digital/text-selectable PDFs. For scanned images, run OCR first (e.g., with Tesseract) and pipe the text through the parser.
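In the same spirit as the accuracy tip, extracted values can be cross-checked arithmetically, because each running balance should follow from the previous balance and the transaction amount. The sketch below is a minimal consistency check. It assumes transactions are listed oldest first, and `findBalanceMismatches` is a hypothetical helper that is not part of the package.

```typescript
// Minimal subset of the documented transaction shape needed for the check.
interface BalanceRow {
  type: "debit" | "credit";
  amount: number;
  balance: number | null;
}

// Hypothetical helper: returns the indexes of rows whose running balance does
// not follow from the previous balance and the amount.
// Assumption: rows are in chronological order (oldest first).
function findBalanceMismatches(
  openingBalance: number | null,
  rows: BalanceRow[],
  tolerance = 0.005, // absorbs floating-point noise below half a minor unit
): number[] {
  const mismatches: number[] = [];
  let previous = openingBalance;
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    const signed = row.type === "credit" ? row.amount : -row.amount;
    if (previous !== null && row.balance !== null) {
      if (Math.abs(previous + signed - row.balance) > tolerance) mismatches.push(i);
    }
    // Continue from the stated balance when present, so one bad row is reported once.
    if (row.balance !== null) previous = row.balance;
    else if (previous !== null) previous = previous + signed;
  }
  return mismatches;
}

// Values from the sample output: opening balance 125000.50, then two rows.
const rows: BalanceRow[] = [
  { type: "debit", amount: 450, balance: 124550.5 },
  { type: "credit", amount: 85000, balance: 209550.5 },
];
console.log(findBalanceMismatches(125000.5, rows)); // []
```

A non-empty result suggests re-parsing the affected pages, for example with a smaller pagesPerChunk.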
License
MIT
