agentic-bookkeeper

v0.1.0

Agentic chart-of-accounts categorisation for bank transactions, powered by Claude. Production patterns: Zod-validated structured outputs, prompt caching, multi-provider abstraction, evaluation harness.

Production-grade agentic Chart-of-Accounts categorisation for South African bank transactions, powered by Claude.

A small, focused TypeScript library that takes an ambiguous bank transaction — "SHELL ULTRA CITY BLOEMFONTEIN, R3,950" — and returns a Zod-validated chart-of-accounts categorisation: account code, confidence score, VAT applicability, and a one-sentence rationale. The reasoning engine is Claude with forced tool-use against a 141-account SA Pty Ltd Chart of Accounts.

Extracted as a focused module from Axiomatics — a production agentic accounting platform — to demonstrate the patterns that distinguish a hobby LLM project from one a senior team would ship: structured outputs, prompt caching, retry classification, observability hooks, multi-provider abstraction, and an evaluation harness with measurable accuracy.

Why this exists

Bank-transaction categorisation looks like a string-matching problem and isn't. The same merchant description ("SHELL ULTRA CITY") can map to four different accounts depending on intent (employee fuel reimbursement vs. delivery vehicle vs. director's personal car claimed back), VAT status (standard-rated vs. exempt vs. non-VATable), and entity type (Pty Ltd shareholder loan vs. sole prop drawings). Production accounting systems can't just keyword-match — they need a model that reasons about IFRS rules, SARS classifications, and SA-specific bookkeeping conventions, then explains itself. This library is the agentic core of that reasoning, isolated and tested.

What it looks like

$ ANTHROPIC_API_KEY=sk-ant-... npm run example

agentic-bookkeeper — example
Company:   Karoo Coffee Roasters (Pty) Ltd
Categorising 20 transactions...

ID     │ Description                          │      Amount │ Code   │ Account                          │   Conf
────────────────────────────────────────────────────────────────────────────────
tx-01  │ ABSA INTERNET — RENT MARCH           │      -R8,500│ 6200   │ Rent Expense                     │   0.97
tx-02  │ MTN BUSINESS — FIBRE 100MBPS         │      -R1,245│ 6330   │ Internet & Telephone             │   0.96
tx-06  │ ETHIOPIA COFFEE IMPORTERS — BEANS    │     -R15,400│ 5300   │ Direct Materials                 │   0.88
tx-10  │ TRANSFER FROM A VAN HEERDEN — CAPITAL│     +R50,000│ 2260   │ Shareholder's Loan               │   0.94
tx-11  │ SHELL ULTRA CITY BLOEMFONTEIN        │      -R3,950│ 6920   │ Fuel & Oil                       │   0.92
tx-14  │ SARS VAT 201 PERIOD 02/2026          │      -R8,200│ 2150   │ VAT Output                       │   0.95
tx-18  │ SARS PENALTY — VAT 201 LATE FILING   │      -R1,200│ 6955   │ Penalties & Fines                │   0.96
tx-19  │ DONATION — KAROO HOSPICE NPO 18A     │      -R5,000│ 6965   │ Donations                        │   0.97
...
────────────────────────────────────────────────────────────────────────────────

Summary
  Categorised: 20/20 successful
  Tokens:      102,450 input / 1,840 output
  Cache:       97,800 read, 4,650 write       ← cache_read after first call
  Cost:        $0.0123 USD
  Avg / tx:    $0.0006 USD
  Wall-clock:  18.3s

The first transaction pays the cache_write premium for both system blocks (rules + accounts list). Every subsequent call reads those blocks at the cache_read rate (~10% of the full input rate). Over a 20-transaction run that approaches a 90% reduction in input cost.
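The arithmetic behind that saving can be sketched as follows. This is a minimal sketch, assuming cache writes are billed at 1.25x the base input rate and cache reads at 0.1x, and reading the sample summary above as having no uncached input tokens. `relativeInputCost` is a hypothetical helper, not part of the library.

```typescript
// Hypothetical helper: not part of agentic-bookkeeper's API.
// Assumed multipliers relative to the base input-token price.
const CACHE_WRITE_MULTIPLIER = 1.25; // assumption: ephemeral cache write premium
const CACHE_READ_MULTIPLIER = 0.1; // assumption: reads cost ~10% of base input

interface CacheUsage {
  cacheReadTokens: number;
  cacheWriteTokens: number;
  uncachedInputTokens: number;
}

/** Input cost with caching, as a fraction of the same tokens billed at the full rate. */
function relativeInputCost(usage: CacheUsage): number {
  const total =
    usage.cacheReadTokens + usage.cacheWriteTokens + usage.uncachedInputTokens;
  if (total === 0) return 1;
  const weighted =
    usage.cacheReadTokens * CACHE_READ_MULTIPLIER +
    usage.cacheWriteTokens * CACHE_WRITE_MULTIPLIER +
    usage.uncachedInputTokens;
  return weighted / total;
}

// Token counts taken from the sample summary above.
const sampleRun: CacheUsage = {
  cacheReadTokens: 97800,
  cacheWriteTokens: 4650,
  uncachedInputTokens: 0,
};

const fraction = relativeInputCost(sampleRun);
console.log(`input cost vs. uncached: ${(fraction * 100).toFixed(1)}%`); // 15.2%
```

Under these assumptions the sample run pays about 15% of the uncached input cost, a saving of roughly 85%. The figure moves closer to 90% as more calls share a single cache write.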

Quickstart

import { categorize, AnthropicProvider } from "agentic-bookkeeper";

const provider = new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY! });

const result = await categorize({
  transaction: { description: "SHELL ULTRA CITY BLOEMFONTEIN", amount: -3950 },
  provider,
});

if (result.ok) {
  console.log(result.value);
  // {
  //   accountCode: "6920",
  //   accountName: "Fuel & Oil",
  //   confidence: 0.92,
  //   vatApplicable: true,
  //   reasoning: "Shell forecourt — vehicle fuel; standard 15% input VAT applies."
  // }
} else {
  console.error(result.error.kind, result.error.message);
}

The function returns a discriminated Result<CategorizationResult, CategorizeError> — failures are typed values, not thrown exceptions, so callers can't forget to handle them.
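To make the narrowing concrete, here is a minimal sketch of how such a discriminated Result could be modelled. The type definitions, the `ok`/`err` helpers, and `summarize` are illustrative assumptions; the library's real definitions may differ, and only the error kinds named in this README are listed.

```typescript
// Illustrative types only: the library's real definitions may differ.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Only the error kinds named in this README; the real union may have more.
type ErrorKind = "PARSE_ERROR" | "PROVIDER_ERROR" | "MAX_RETRIES_EXCEEDED";

interface CategorizeError {
  kind: ErrorKind;
  message: string;
}

interface CategorizationResult {
  accountCode: string;
  accountName: string;
  confidence: number;
  vatApplicable: boolean;
  reasoning: string;
}

function summarize(result: Result<CategorizationResult, CategorizeError>): string {
  if (result.ok) {
    // Narrowed to the success branch: `value` exists, `error` does not.
    return `${result.value.accountCode} ${result.value.accountName}`;
  }
  // Narrowed to the failure branch: `error` exists, `value` does not.
  return `${result.error.kind}: ${result.error.message}`;
}

const success = summarize(
  ok<CategorizationResult>({
    accountCode: "6920",
    accountName: "Fuel & Oil",
    confidence: 0.92,
    vatApplicable: true,
    reasoning: "Shell forecourt",
  }),
);
const failure = summarize(
  err<CategorizeError>({ kind: "PARSE_ERROR", message: "tool output failed validation" }),
);

console.log(success); // 6920 Fuel & Oil
console.log(failure); // PARSE_ERROR: tool output failed validation
```

Because `value` is only reachable after checking `result.ok`, the compiler rejects code that reads the categorisation without first handling the failure branch.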

Architecture

sequenceDiagram
    participant Caller
    participant categorize
    participant Zod
    participant Prompt
    participant Provider as LLMProvider
    participant Retry

    Caller->>categorize: categorize({ transaction, provider, ... })
    categorize->>Zod: safeParse(transaction)
    Zod-->>categorize: validated input
    categorize->>Prompt: buildSystemBlocks + buildUserMessage
    Prompt-->>categorize: ProviderRequest

    loop attempt 1..maxAttempts
        categorize->>Provider: call(request)

        alt success
            Provider-->>categorize: ProviderResponse + usage
            categorize->>Zod: safeParse(toolInput)
            Zod-->>categorize: validated output
            categorize->>categorize: verify code ∈ CoA
            categorize-->>Caller: ok(result)
        else transient (429, 5xx, ECONN*)
            Provider-->>categorize: throws
            categorize->>Retry: backoffDelayMs + sleep
        else terminal (4xx auth/validation)
            Provider-->>categorize: throws
            categorize-->>Caller: err(PROVIDER_ERROR)
        end
    end

    categorize-->>Caller: err(MAX_RETRIES_EXCEEDED)

Full walkthrough in docs/ARCHITECTURE.md. Decision rationale in docs/adr/.
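The transient/terminal split and the backoff step in the diagram can be sketched as two small pure functions. The names `isRetryable` and `backoffDelayMs` come from this README, but the signatures, the `ProviderFailure` shape, and the default base and cap values below are assumptions for illustration.

```typescript
// Sketch only: signatures and defaults are assumptions, not the library's API.
const RETRYABLE_STATUS = new Set([408, 425, 429, 500, 502, 503, 504]);

interface ProviderFailure {
  status?: number; // HTTP status, when the provider returned one
  code?: string; // Node-style error code such as "ECONNRESET"
}

/** Transient failures are worth retrying; auth and validation failures are not. */
function isRetryable(failure: ProviderFailure): boolean {
  if (failure.status !== undefined && RETRYABLE_STATUS.has(failure.status)) {
    return true;
  }
  return failure.code !== undefined && failure.code.startsWith("ECONN");
}

/**
 * Exponential backoff with full jitter: a uniform random delay in
 * [0, min(capMs, baseMs * 2^(attempt - 1))). `attempt` is 1-based.
 * `random` is injectable so tests can make the delay deterministic.
 */
function backoffDelayMs(
  attempt: number,
  baseMs = 250,
  capMs = 8000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}

console.log(isRetryable({ status: 429 })); // true
console.log(isRetryable({ status: 401 })); // false
console.log(isRetryable({ code: "ECONNRESET" })); // true
console.log(backoffDelayMs(3, 250, 8000, () => 0.5)); // 500
```

Full jitter spreads retries across the whole window instead of clustering them at the exponential ceiling, which is what keeps many clients from retrying in lockstep.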

Key design decisions

| Decision | Why |
| --- | --- |
| ADR 0001: Zod for output validation | Compile-time types + runtime contract from one source. LLM output is untrusted by definition. |
| ADR 0002: LLMProvider interface | Test the agent loop without an API key. Swap Anthropic for OpenAI/Gemini without touching categorize(). |
| ADR 0003: Versioned prompt artifacts with cache_control | Prompts are engineering artifacts. Cache the static parts (~90% input-cost reduction); A/B test versions in evals. |
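How cached system blocks and a forced tool call might fit into one request can be sketched as a plain object. The field names (`system`, `cache_control`, `tools`, `input_schema`, `tool_choice`) follow Anthropic's Messages API. The `buildRequest` helper, the tool name, the model id, the token limit, and the schema fields are hypothetical placeholders rather than the library's actual values.

```typescript
// Sketch only: buildRequest, the tool name, the model id, and the prompt text
// are hypothetical placeholders, not the library's actual values.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>; // JSON Schema for the tool input
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  tools: ToolDefinition[];
  tool_choice: { type: "tool"; name: string };
  messages: { role: "user"; content: string }[];
}

const TOOL_NAME = "categorize_transaction"; // hypothetical

function buildRequest(rules: string, accounts: string, userMessage: string): MessagesRequest {
  return {
    model: "your-claude-model-id", // placeholder
    max_tokens: 512,
    // Static blocks are marked cacheable so warm calls read them from cache.
    system: [
      { type: "text", text: rules, cache_control: { type: "ephemeral" } },
      { type: "text", text: accounts, cache_control: { type: "ephemeral" } },
    ],
    tools: [
      {
        name: TOOL_NAME,
        description: "Record the chart-of-accounts categorisation for one transaction.",
        input_schema: {
          type: "object",
          properties: {
            accountCode: { type: "string" },
            confidence: { type: "number", minimum: 0, maximum: 1 },
            vatApplicable: { type: "boolean" },
            reasoning: { type: "string" },
          },
          required: ["accountCode", "confidence", "vatApplicable", "reasoning"],
        },
      },
    ],
    // Forcing this tool means the model must answer with a structured tool call.
    tool_choice: { type: "tool", name: TOOL_NAME },
    messages: [{ role: "user", content: userMessage }],
  };
}

const request = buildRequest("<rules>", "<accounts>", "SHELL ULTRA CITY BLOEMFONTEIN, -3950");
console.log(JSON.stringify(request.tool_choice));
```

Only the user message changes between transactions, so everything above it stays byte-identical and remains eligible for a cache read on the next call.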

Production patterns covered

The patterns reviewers look for when judging "production AI" vs "tutorial AI":

  • Structured outputs. Zod schemas for input + output. Compile-time and runtime contracts derived from one source. LLM tool-call results are validated before they ever reach the caller; invalid output becomes a typed PARSE_ERROR, not a runtime crash.
  • Forced tool-use. tool_choice: { type: "tool", name: ... } — the model cannot reply with prose. Either it emits a structured tool call matching the schema, or the call fails fast.
  • Prompt caching. Cacheable system blocks marked with cache_control: { type: "ephemeral" }. Cuts input cost ~90% on warm calls. Token usage breakdown (cache_read / cache_write) propagated through the cost layer so observers can see real-world savings.
  • Multi-provider abstraction. LLMProvider interface: categorize() is decoupled from any specific SDK. Ships with AnthropicProvider; adding OpenAIProvider or GeminiProvider is a 50-line file with no changes to the agent loop.
  • Retry classification. isRetryable() distinguishes transient failures (408/425/429/500/502/503/504, ECONN*) from terminal failures (auth, validation). Exponential backoff with full jitter to avoid thundering herd. No retry-forever loops on bad credentials.
  • Observability hooks. Observer interface with onTokens, onCost, onAttempt, onError. Hooks that throw don't propagate into the agent loop — telemetry never breaks production.
  • Evaluation harness. 25 labelled SA-specific transactions exercise the prompt's encoded reasoning rules (director loan vs share capital, Shell forecourts as fuel, Section 18A donations as NON-VAT, SARS penalties as non-deductible). Strict + tolerant accuracy reporting; CI-ready exit codes. See evals/README.md.
  • No API key needed for testing. All 70 unit + integration tests run in-process against a MockProvider. npm test is fast (~200ms), deterministic, and offline-friendly.
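The "hooks that throw don't propagate" contract from the observability bullet above can be sketched like this. The hook names come from this README; the payload shapes and the `safeInvoke` helper are assumptions for illustration.

```typescript
// Sketch only: hook payload shapes and safeInvoke are assumptions.
interface Observer {
  onTokens?(usage: { input: number; output: number }): void;
  onCost?(usd: number): void;
  onAttempt?(attempt: number): void;
  onError?(error: unknown): void;
}

/** Run a hook, swallowing anything it throws so telemetry cannot break the caller. */
function safeInvoke(hook: (() => void) | undefined): boolean {
  if (!hook) return true;
  try {
    hook();
    return true;
  } catch {
    return false; // a faulty observer is ignored, not rethrown
  }
}

const attempts: number[] = [];
const observer: Observer = {
  onAttempt: (attempt) => {
    attempts.push(attempt);
  },
  onCost: () => {
    throw new Error("metrics backend is down");
  },
};

const attemptOk = safeInvoke(() => observer.onAttempt?.(1));
const costOk = safeInvoke(() => observer.onCost?.(0.0006));
console.log(attemptOk, costOk, attempts); // true false [ 1 ]
```

Returning a boolean rather than rethrowing lets the agent loop continue while still giving the caller a way to notice that a hook misbehaved.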

Data & compliance posture

The library itself processes a single transaction in-memory and returns a result. It does not:

  • Persist user data anywhere
  • Send telemetry to any third party
  • Read environment variables at module load
  • Make network calls except via the explicit provider.call() you injected

All sample data in examples/ and evals/ is fully synthetic — Karoo Coffee Roasters is fictional, references and amounts are fabricated, no real persons are named. This satisfies POPIA and GDPR de-identification principles for portfolio code: there is no personal data anywhere in the repository.

When integrating into a system that does process real customer data, treat categorize() as a stateless function that adds no PII boundary of its own: the only data leaving your process is what you pass in (transaction description + amount). Governing the Anthropic API call under your DPA with Anthropic is your responsibility; this library doesn't add any new data flows on top of theirs.

Security posture

  • No secrets in code or git history. Verified with gitleaks on every commit (pre-commit hook + CI). GitHub Secret Scanning + Push Protection enabled at the repo level.
  • Single secret to manage. ANTHROPIC_API_KEY via .env (gitignored) or process env. No service accounts, no admin SDKs, no encryption keys.
  • Dependabot. Weekly minor/patch dependency updates auto-PR'd. Major bumps require manual review (peer-dep coordination).
  • CodeQL. GitHub's static-analysis scanner runs security-and-quality queries on every push.
  • Lock-file integrity. npm ci in CI ensures reproducible installs from package-lock.json.
  • Vulnerability disclosure. SECURITY.md — private GitHub Security Advisories preferred.

npm audit reports 0 vulnerabilities at time of writing.

Development

npm install
npm run typecheck     # tsc --noEmit (strict mode)
npm run lint          # ESLint flat config + typescript-eslint
npm run format        # Prettier 3
npm test              # vitest run (70 tests, ~200ms)
npm run test:coverage # +v8 coverage, enforces 90/85/85/90 thresholds
npm run build         # emits dist/ — npm publish-ready
npm run secrets:scan  # gitleaks against the working tree
npm run example       # CLI demo (needs ANTHROPIC_API_KEY)
npm run eval          # accuracy harness (needs ANTHROPIC_API_KEY)

Pre-commit hooks (Husky 9) run lint-staged (eslint --fix + prettier --write on staged files) plus gitleaks protect --staged. Commit messages are validated against Conventional Commits by commitlint.

Testing

70 tests, 6 files, 95% line coverage. The test suite exercises:

| Area | File |
| --- | --- |
| Zod schemas, Result helpers, error class | tests/schemas.test.ts |
| Retry classification + backoff distribution | tests/retry.test.ts |
| Token-cost math + cache discount/premium | tests/cost.test.ts |
| Observer safe-invoke contract | tests/observability.test.ts |
| Anthropic SDK request/response shape | tests/providers/anthropic.test.ts |
| Full agent flow (happy path + 5 error kinds + retry + hooks) | tests/categorize.test.ts |

Coverage thresholds (lines 90, branches 85, functions 85, statements 90) are enforced in CI — adding code without tests fails the build.

License

MIT — see LICENSE. Use it, fork it, learn from it, ship it.

Author

Arno van Heerden — AI Engineer & Automation Specialist behind Axiomatics, a production agentic accounting platform built solo on TypeScript, Next.js 16, Firebase, and Anthropic Claude as the primary LLM. The platform's multi-provider gateway also routes through OpenAI and Google Gemini; this library ships the Anthropic provider with the abstraction in place to add the others without touching the agent loop.

Try the live demo: axiomatics.co.za/demo.