bank-statement-parser

v2.0.0

Published

3 months ago

AI-powered bank statement PDF parser. Extracts bank name, account number, IFSC, transactions & balances from any bank's PDF statement.


vanavarayan

bank statement parser pdf transactions extractor ai llm claude openai ifsc finance react express

🏦 bank-statement-parser

AI-powered npm package that extracts bank name, account number, IFSC code, transactions & balances from any bank's PDF statement — returns clean JSON.

Works with any bank — no templates or rules to maintain. The LLM adapts to each bank's unique format automatically.

Features

🏛️ Bank details — name, account number, IFSC, branch, account holder
💳 All transactions — date, description, reference, debit/credit, running balance
🏷️ Auto-categorization — UPI, NEFT, RTGS, ATM, Salary, EMI, etc.
📊 Summary — opening/closing balance, total debits/credits
🔌 Multiple integrations — Node.js, Express middleware, React hook, Next.js
🤖 Provider agnostic — works with Claude (Anthropic) or GPT (OpenAI)

Install

npm install bank-statement-parser pdf-parse

# Pick your LLM provider (one or both):
npm install anthropic    # for Claude
npm install openai       # for GPT

Quick Start

Node.js / TypeScript

import { BankStatementParser } from "bank-statement-parser";

const parser = new BankStatementParser({
  provider: "anthropic",           // or "openai"
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

// Parse from file path
const result = await parser.parseFile("./hdfc-statement.pdf");

// Parse from Buffer (multer, fs.readFileSync, etc.)
// const result = await parser.parseBuffer(buffer);

// Parse from base64 (browser upload → API)
// const result = await parser.parseBase64(base64String);

console.log(result);

Output (JSON)

{
  "bankDetails": {
    "bankName": "HDFC Bank",
    "accountNumber": "50100XXXXXXX789",
    "accountHolderName": "Rahul Sharma",
    "ifscCode": "HDFC0001234",
    "branch": "Koramangala, Bangalore",
    "accountType": "Savings",
    "statementPeriod": {
      "from": "2024-01-01",
      "to": "2024-03-31"
    }
  },
  "summary": {
    "openingBalance": 125000.50,
    "closingBalance": 98450.75,
    "totalDebits": 245000.00,
    "totalCredits": 218450.25,
    "transactionCount": 47,
    "currency": "INR"
  },
  "transactions": [
    {
      "date": "2024-01-02",
      "description": "UPI/SWIGGY/402312345678/Payment",
      "reference": "402312345678",
      "type": "debit",
      "amount": 450.00,
      "balance": 124550.50,
      "category": "UPI"
    },
    {
      "date": "2024-01-05",
      "description": "NEFT/SALARY/JAN2024/ACME CORP",
      "reference": "N0012345678",
      "type": "credit",
      "amount": 85000.00,
      "balance": 209550.50,
      "category": "Salary"
    }
  ]
}
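Because the values are extracted by an LLM, it can be worth checking that each running balance follows from the previous one before relying on the output. The sketch below is one way to do that. It assumes the field names shown in the sample output; `Txn`, `toMinor`, and `findBalanceMismatches` are hypothetical helpers and not part of the package.

```typescript
// Hypothetical helper, not part of bank-statement-parser.
// Field names mirror the sample output above.
interface Txn {
  type: "debit" | "credit";
  amount: number;
  balance: number | null;
}

// Work in integer minor units (paise/cents) so floating-point noise
// cannot cause false alarms.
const toMinor = (value: number): number => Math.round(value * 100);

// Returns the indices of transactions whose reported balance does not
// follow from the previous balance and the transaction amount.
function findBalanceMismatches(openingBalance: number, txns: Txn[]): number[] {
  const mismatches: number[] = [];
  let running = toMinor(openingBalance);
  txns.forEach((txn, i) => {
    const delta = toMinor(txn.amount);
    running += txn.type === "credit" ? delta : -delta;
    if (txn.balance === null) return; // nothing to compare against
    const reported = toMinor(txn.balance);
    if (reported !== running) {
      mismatches.push(i);
      running = reported; // resync so one bad row does not flag every later row
    }
  });
  return mismatches;
}

// The two transactions from the sample output, starting from its opening balance.
const sample: Txn[] = [
  { type: "debit", amount: 450.0, balance: 124550.5 },
  { type: "credit", amount: 85000.0, balance: 209550.5 },
];
const mismatches = findBalanceMismatches(125000.5, sample);
console.log(mismatches.length === 0 ? "balances reconcile" : mismatches);
```

A non-empty result does not necessarily mean the extraction is wrong (statements sometimes list rows out of order), but it is a cheap signal that a page may deserve a second look.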

Express Backend

import express from "express";
import cors from "cors";
import { createParserMiddleware } from "bank-statement-parser/middleware";

const app = express();
app.use(cors());

// One line to create the endpoint
app.use("/api/parse-statement", createParserMiddleware({
  provider: "anthropic",
  apiKey: process.env.ANTHROPIC_API_KEY!,
}));

// Also install: npm install express cors multer
app.listen(3001);

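# Test it (form field name "statement" assumed to match the Next.js route; point @ at your own PDF)
curl -X POST http://localhost:3001/api/parse-statement \
  -F "statement=@./hdfc-statement.pdf"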

React Frontend

import { useBankStatementParser } from "bank-statement-parser/react";

function StatementUploader() {
  const { parse, data, loading, error } = useBankStatementParser({
    endpoint: "/api/parse-statement",
  });

  return (
    <div>
      <input
        type="file"
        accept=".pdf"
        onChange={(e) => {
          const file = e.target.files?.[0];
          if (file) parse(file);
        }}
      />

      {loading && <p>Parsing statement...</p>}
      {error && <p style={{ color: "red" }}>{error}</p>}

      {data && (
        <div>
          <h3>{data.bankDetails.bankName}</h3>
          <p>Account: {data.bankDetails.accountNumber}</p>
          <p>IFSC: {data.bankDetails.ifscCode}</p>
          <p>Balance: {data.summary.closingBalance}</p>

          <table>
            <thead>
              <tr>
                <th>Date</th>
                <th>Description</th>
                <th>Amount</th>
                <th>Balance</th>
              </tr>
            </thead>
            <tbody>
              {data.transactions.map((txn, i) => (
                <tr key={i}>
                  <td>{txn.date}</td>
                  <td>{txn.description}</td>
                  <td style={{ color: txn.type === "credit" ? "green" : "red" }}>
                    {txn.type === "credit" ? "+" : "-"}{txn.amount}
                  </td>
                  <td>{txn.balance}</td>
                </tr>
              ))}
            </tbody>
          </table>

          <button onClick={() => {
            const blob = new Blob([JSON.stringify(data, null, 2)], { type: "application/json" });
            const a = document.createElement("a");
            a.href = URL.createObjectURL(blob);
            a.download = "statement.json";
            a.click();
          }}>
            Download JSON
          </button>
        </div>
      )}
    </div>
  );
}

Next.js API Route

// app/api/parse-statement/route.ts
import { NextRequest, NextResponse } from "next/server";
import { BankStatementParser } from "bank-statement-parser";

const parser = new BankStatementParser({
  provider: "anthropic",
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

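export async function POST(request: NextRequest) {
  const formData = await request.formData();
  const file = formData.get("statement");

  // formData.get() returns null (or a string) when no file was uploaded under this field
  if (!file || typeof file === "string") {
    return NextResponse.json(
      { success: false, error: 'Missing PDF in form field "statement"' },
      { status: 400 },
    );
  }

  const buffer = Buffer.from(await file.arrayBuffer());
  const result = await parser.parseBuffer(buffer);
  return NextResponse.json({ success: true, data: result });
}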

API Reference

`BankStatementParser`

const parser = new BankStatementParser(config: ParserConfig);

| Config Option | Type   | Default | Description                 |
|---------------|--------|---------|-----------------------------|
| provider      | string | none    | "anthropic" or "openai"     |
| apiKey        | string | none    | API key for the provider    |
| model         | string | auto    | Model name override         |
| pagesPerChunk | number | 3       | Pages sent per LLM API call |
| maxPages      | number | all     | Max pages to process        |

Methods

| Method                                  | Input         | Description                   |
|-----------------------------------------|---------------|-------------------------------|
| parseFile(path, options?)               | File path     | Parse from local file         |
| parseBuffer(buffer, options?)           | Buffer        | Parse from Node.js Buffer     |
| parseBase64(base64, options?)           | Base64 string | Parse from base64-encoded PDF |
| parseArrayBuffer(arrayBuffer, options?) | ArrayBuffer   | Parse from ArrayBuffer        |

Parse Options

{
  pagesPerChunk?: number;           // Override chunk size
  maxPages?: number;                // Override page limit
  onProgress?: (current, total) => void;  // Progress callback
}
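To get a feel for how pagesPerChunk and maxPages affect the number of LLM calls, here is a rough sketch. It assumes the parser splits pages into sequential, fixed-size chunks, which the README implies but does not spell out. `planChunks` and `ChunkOptions` are hypothetical names, not exports of the package.

```typescript
// Illustrative only: assumes pages are split into sequential, fixed-size chunks.
// planChunks is a hypothetical helper, not an export of bank-statement-parser.
interface ChunkOptions {
  pagesPerChunk?: number; // documented default: 3
  maxPages?: number;      // documented default: all pages
}

// Returns inclusive, 1-based [firstPage, lastPage] ranges, one per LLM call.
function planChunks(totalPages: number, options: ChunkOptions = {}): Array<[number, number]> {
  const pagesPerChunk = options.pagesPerChunk ?? 3;
  if (!Number.isInteger(pagesPerChunk) || pagesPerChunk < 1) {
    throw new RangeError("pagesPerChunk must be a positive integer");
  }
  // Never plan past the page limit or the end of the document.
  const limit = Math.min(totalPages, options.maxPages ?? totalPages);
  const chunks: Array<[number, number]> = [];
  for (let start = 1; start <= limit; start += pagesPerChunk) {
    chunks.push([start, Math.min(start + pagesPerChunk - 1, limit)]);
  }
  return chunks;
}

console.log(planChunks(10));                       // 4 chunks: [1,3] [4,6] [7,9] [10,10]
console.log(planChunks(10, { pagesPerChunk: 2 })); // 5 chunks of two pages each
```

Under this assumption, a 10-page statement needs four calls at the default chunk size and five with pagesPerChunk: 2, so the chunk count is also a reasonable `total` to expect in an onProgress callback.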

Output: `ParsedStatement`

{
  bankDetails: {
    bankName: string | null;
    accountNumber: string | null;
    accountHolderName: string | null;
    ifscCode: string | null;
    branch: string | null;
    accountType: string | null;
    statementPeriod: { from: string | null; to: string | null };
  };
  summary: {
    openingBalance: number | null;
    closingBalance: number | null;
    totalDebits: number;
    totalCredits: number;
    transactionCount: number;
    currency: string;
  };
  transactions: Array<{
    date: string;
    description: string;
    reference: string | null;
    type: "debit" | "credit";
    amount: number;
    balance: number | null;
    category: string;
  }>;
}
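Since `summary` and `transactions` are both produced by the model, one optional safeguard is to recompute the totals from the transaction list and compare them with the reported summary. This is a sketch under the assumption that the summary totals are meant to equal the sums over `transactions`; `TxnLike`, `Totals`, `recomputeTotals`, and `diffTotals` are hypothetical names.

```typescript
// Hypothetical cross-check, not part of bank-statement-parser.
interface TxnLike {
  type: "debit" | "credit";
  amount: number;
}

interface Totals {
  totalDebits: number;
  totalCredits: number;
  transactionCount: number;
}

// Sum in minor units (paise/cents) and convert back, to avoid float drift.
function recomputeTotals(txns: TxnLike[]): Totals {
  let debits = 0;
  let credits = 0;
  for (const txn of txns) {
    const minor = Math.round(txn.amount * 100);
    if (txn.type === "debit") debits += minor;
    else credits += minor;
  }
  return {
    totalDebits: debits / 100,
    totalCredits: credits / 100,
    transactionCount: txns.length,
  };
}

// Lists the summary fields that disagree with the recomputed values.
function diffTotals(reported: Totals, txns: TxnLike[]): Array<keyof Totals> {
  const actual = recomputeTotals(txns);
  const keys: Array<keyof Totals> = ["totalDebits", "totalCredits", "transactionCount"];
  return keys.filter((k) => Math.round(reported[k] * 100) !== Math.round(actual[k] * 100));
}

const twoTxns: TxnLike[] = [
  { type: "debit", amount: 450 },
  { type: "credit", amount: 85000 },
];
// The reported count is deliberately wrong here to show a mismatch.
console.log(diffTotals({ totalDebits: 450, totalCredits: 85000, transactionCount: 3 }, twoTxns));
// → [ 'transactionCount' ]
```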

Transaction Categories

The parser auto-categorizes transactions into:

| Category         | Examples                          |
|------------------|-----------------------------------|
| Salary           | Monthly salary credits            |
| UPI              | UPI/PhonePe/GPay payments         |
| NEFT             | NEFT transfers                    |
| RTGS             | RTGS transfers                    |
| IMPS             | IMPS transfers                    |
| Transfer         | General fund transfers            |
| ATM Withdrawal   | Cash withdrawals from ATMs        |
| POS/Card Payment | Debit/credit card swipes          |
| EMI/Loan         | EMI debits, loan repayments       |
| Bill Payment     | Utility bills, subscriptions      |
| Cash Deposit     | Cash deposits at branch/ATM       |
| Interest         | Interest earned                   |
| Charges/Fees     | Bank charges, SMS fees, penalties |
| Refund           | Refunds and reversals             |
| Cheque           | Cheque deposits/clearances        |
| Other            | Uncategorized                     |
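The category field makes it straightforward to build a spending breakdown. The following is a minimal sketch that groups transactions by category, assuming the transaction shape from the output type; `CategorizedTxn`, `CategoryTotals`, and `totalsByCategory` are hypothetical and not provided by the package.

```typescript
// Hypothetical helper, not part of bank-statement-parser.
interface CategorizedTxn {
  type: "debit" | "credit";
  amount: number;
  category: string;
}

interface CategoryTotals {
  debits: number;
  credits: number;
  count: number;
}

// Accumulates debit total, credit total, and row count per category.
function totalsByCategory(txns: CategorizedTxn[]): Map<string, CategoryTotals> {
  const totals = new Map<string, CategoryTotals>();
  for (const txn of txns) {
    const entry = totals.get(txn.category) ?? { debits: 0, credits: 0, count: 0 };
    if (txn.type === "debit") entry.debits += txn.amount;
    else entry.credits += txn.amount;
    entry.count += 1;
    totals.set(txn.category, entry);
  }
  return totals;
}

const byCategory = totalsByCategory([
  { type: "debit", amount: 450, category: "UPI" },
  { type: "debit", amount: 120, category: "UPI" },
  { type: "credit", amount: 85000, category: "Salary" },
]);
console.log(byCategory.get("UPI")); // { debits: 570, credits: 0, count: 2 }
```

A Map keeps categories in first-seen order and does not assume the category list is closed, so any label the model returns outside the table still gets its own bucket.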

Project Structure

bank-statement-parser/
├── src/
│   ├── index.ts              # Main entry — exports everything
│   ├── core/
│   │   ├── types.ts          # TypeScript interfaces
│   │   ├── parser.ts         # BankStatementParser class
│   │   ├── llm.ts            # Anthropic/OpenAI client
│   │   ├── prompt.ts         # Extraction prompt
│   │   └── normalize.ts      # JSON parsing & normalization
│   ├── middleware/
│   │   └── index.ts          # Express middleware
│   └── react/
│       └── index.ts          # React hook
├── examples/
│   ├── server.ts             # Express server example
│   ├── BankStatementUpload.tsx  # React component example
│   ├── node-usage.ts         # Standalone Node.js example
│   └── nextjs-api-route.ts   # Next.js API route example
├── package.json
├── tsconfig.json
└── README.md

Tips

Accuracy: Use pagesPerChunk: 2 for dense statements. Smaller chunks give more reliable extraction at the cost of more API calls.
Cost: A typical 10-page statement costs ~$0.01-0.05 with Claude Sonnet.
Security: Never expose API keys in the browser. Always use a backend endpoint.
Scanned PDFs: This package works with digital/text-selectable PDFs. For scanned images, run OCR first (e.g., with Tesseract) and pipe the text through the parser.
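
One rough way to decide whether a PDF needs OCR first is to look at how much text it yields per page once you have extracted it (for example with pdf-parse). The sketch below is a heuristic only. The helper name `looksScanned` and the 50-characters-per-page threshold are assumptions, and the package may or may not perform a similar check internally.

```typescript
// Heuristic sketch; the 50-characters-per-page default is an arbitrary assumption.
// looksScanned is a hypothetical helper, not part of bank-statement-parser.
function looksScanned(extractedText: string, pageCount: number, minCharsPerPage = 50): boolean {
  if (pageCount <= 0) return true; // nothing to parse
  // Ignore whitespace so layout padding does not count as content.
  const visibleChars = extractedText.replace(/\s+/g, "").length;
  return visibleChars / pageCount < minCharsPerPage;
}

console.log(looksScanned("", 10)); // true: no selectable text at all
```

If this returns true, running OCR before parsing is likely to save an API call that would otherwise return empty results.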

License

MIT