@forgrit/llm-cost

v0.1.1

Published

17 days ago

Credit ledger primitives + cost estimators for AI applications — the 7 LLM cost design rules every AI startup hits, encoded as TypeScript.

forgrit

Zero runtime dependencies. Edge-runtime safe. Tree-shake friendly. Pure TypeScript.

Why this package exists

Every team building on top of LLMs re-discovers the same 7 problems:

Cost truth source vs estimate
thinkingTokens is optional and provider-specific
Quality evaluation needs its own task type
Per-call model override, not env mutation
Cache field semantics need a fixed vocabulary
Config changes ship in per-task commits
Evaluation infra ships before migration

This package bundles the production-tested credit-ledger primitives + cost estimators from ForGrit, plus the README below — which encodes the 7 rules in detail so your team doesn't have to re-derive them.

If your application charges users for compute, you're going to need vocabulary, estimators, and chokepoints. Start with these.

Install

npm install @forgrit/llm-cost
# or
pnpm add @forgrit/llm-cost
# or
yarn add @forgrit/llm-cost

Requires Node 20+.

Quick start

import { CREDIT_COST, estimatePreview, canAfford, isLedgerCategory } from '@forgrit/llm-cost';

// 1. Estimate cost of a generation
const est = estimatePreview({ screens: 6, variants: 3, viewports: 2 });
console.log(est.estimatedCredits); // 128

// 2. Check affordability
const userBalance = 200;
const { canAfford: ok, shortfall } = canAfford(userBalance, est);
if (!ok) throw new Error(`Insufficient credits: short by ${shortfall}`);

// 3. Use typed ledger vocabulary
function recordCharge(category: string, amount: number) {
  if (!isLedgerCategory(category)) {
    throw new Error(`Unknown ledger category: ${category}`);
  }
  // ... category is now typed as LedgerCategory
}

API reference

Pricing

`CREDIT_COST` — constants

{
  PREVIEW_BASE: 20,
  PREVIEW_PER_SCREEN_VARIANT_VIEWPORT: 3,
  REGEN_BASE: 10,
  CODEGEN_BASE: 50,
  CODEGEN_STRICT_PER_PAGE: 25,
  CODEGEN_GUIDED_PER_PAGE: 10,
  MIN_THRESHOLD: 10,
}

These are ForGrit's internal pricing constants — exposed so you can either use them directly or use them as reference points when designing your own. Credits-to-dollars ratio: $1 = 100 credits in ForGrit. Your conversion may differ.

`estimatePreview(params)` / `estimateRegen(params)` / `estimateCodegen(params)`

Pure functions. Take an object with operation-specific parameters; return a CostEstimate:

interface CostEstimate {
  estimatedCredits: number;
  breakdown: {
    base: number;
    units: number;
    unitRate: number;
    notes: string[];
  };
}
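
The estimators' internals are not shown in this README. As a rough illustration only, the sketch below assumes the preview cost is the base plus the per-unit rate times screens × variants × viewports. That assumption is consistent with the 128 credits in the quick start, but it is not confirmed by the package source. `sketchEstimatePreview` and `PreviewParams` are hypothetical local names, not package exports.

```typescript
// Hypothetical re-implementation for illustration; the real estimatePreview may differ.
interface CostEstimate {
  estimatedCredits: number;
  breakdown: { base: number; units: number; unitRate: number; notes: string[] };
}

// Assumed parameter shape, taken from the quick-start call.
interface PreviewParams {
  screens: number;
  variants: number;
  viewports: number;
}

// Values copied from the CREDIT_COST table above.
const PREVIEW_BASE = 20;
const PREVIEW_PER_SCREEN_VARIANT_VIEWPORT = 3;

function sketchEstimatePreview(p: PreviewParams): CostEstimate {
  // Assumed formula: one billable unit per screen/variant/viewport combination.
  const units = p.screens * p.variants * p.viewports;
  const unitRate = PREVIEW_PER_SCREEN_VARIANT_VIEWPORT;
  return {
    estimatedCredits: PREVIEW_BASE + units * unitRate,
    breakdown: {
      base: PREVIEW_BASE,
      units,
      unitRate,
      notes: [`${p.screens} screens x ${p.variants} variants x ${p.viewports} viewports`],
    },
  };
}

const sketchEstimate = sketchEstimatePreview({ screens: 6, variants: 3, viewports: 2 });
console.log(sketchEstimate.estimatedCredits); // 128 (20 + 36 * 3)
```

At ForGrit's stated ratio of $1 = 100 credits, that estimate corresponds to $1.28.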

`canAfford(balance, estimate)`

Pure check. Returns { canAfford: boolean; shortfall: number }.
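
The exact meaning of shortfall is not spelled out above. The sketch below assumes it is the number of missing credits, clamped at zero when the balance is sufficient. `sketchCanAfford` is a hypothetical stand-in, not the package's implementation.

```typescript
// Hypothetical stand-in for canAfford; the shortfall semantics are an assumption.
interface AffordabilityCheck {
  canAfford: boolean;
  shortfall: number;
}

function sketchCanAfford(
  balance: number,
  estimate: { estimatedCredits: number },
): AffordabilityCheck {
  // Assumed: shortfall is how many credits are missing, never negative.
  const shortfall = Math.max(0, estimate.estimatedCredits - balance);
  return { canAfford: shortfall === 0, shortfall };
}

const enough = sketchCanAfford(200, { estimatedCredits: 128 });
const notEnough = sketchCanAfford(100, { estimatedCredits: 128 });
console.log(enough);    // { canAfford: true, shortfall: 0 }
console.log(notEnough); // { canAfford: false, shortfall: 28 }
```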

Ledger vocabulary

`LEDGER_TYPES` / `LedgerType`

The accounting direction of a row. Distinct from category.

['DEDUCTION', 'ADDITION', 'REFUND'];

`LEDGER_CATEGORIES` / `LedgerCategory`

The spend bucket — what bucket of cost this row represents.

['LLM', 'SANDBOX', 'STORAGE', 'DEMO', 'DEPLOY', 'BILLING', 'ADJUSTMENT', 'RAG_MODIFICATION'];

A single ledger row has both: e.g. type='DEDUCTION' + category='LLM' = user charged for an LLM call. type='REFUND' + category='ADJUSTMENT' = refund from a manual adjustment.

`LEDGER_STATUSES` / `LedgerStatus`

['estimated', 'confirmed', 'confirmed_freetier', 'disputed'];

estimated rows are written at request time from token estimates. A row becomes confirmed once the provider invoice resolves and the real billed amount is known (rule #1).

`FAILURE_POLICY_TAGS` / `FailurePolicyTag`

['completed', 'partial', 'failed', 'refunded', 'disputed'];

`CONFIRMATION_SOURCES` / `ConfirmationSource`

['gcp', 'freetier', 'internal_recompute'];

`isLedgerCategory(v: unknown): v is LedgerCategory`

Runtime guard. Use to validate user/API input before treating it as a typed category.
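
One common way to build such a guard is to derive both the union type and the runtime check from a single `as const` array. The sketch below uses a local copy of the category list and a hypothetical `sketchIsLedgerCategory`; whether the package builds its guard this way is an assumption.

```typescript
// Local copy of the category list shown above; in real code, import it from the package.
const LEDGER_CATEGORIES = [
  'LLM', 'SANDBOX', 'STORAGE', 'DEMO', 'DEPLOY', 'BILLING', 'ADJUSTMENT', 'RAG_MODIFICATION',
] as const;

// Union type derived from the array, so the list and the type cannot drift apart.
type LedgerCategory = (typeof LEDGER_CATEGORIES)[number];

// Hypothetical guard with the same signature as isLedgerCategory.
function sketchIsLedgerCategory(v: unknown): v is LedgerCategory {
  return typeof v === 'string' && (LEDGER_CATEGORIES as readonly string[]).includes(v);
}

// Typical use: narrow untrusted input (for example a request body field) before use.
function parseCategory(input: unknown): LedgerCategory {
  if (!sketchIsLedgerCategory(input)) {
    throw new Error(`Unknown ledger category: ${String(input)}`);
  }
  return input; // narrowed to LedgerCategory here
}

console.log(parseCategory('LLM'));          // LLM
console.log(sketchIsLedgerCategory('llm')); // false (the check is case-sensitive)
```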

The 7 LLM cost design rules

These are the rules ForGrit derived from production. Each maps to a real bug we hit. Read them as a checklist for your own AI application's cost system.

Rule 1 — Cost truth source

Persisted per-call creditsCharged is the primary cost source. Token-derived pricing math is an estimate.

Every LLM call writes one row to a log table (in ForGrit: LlmCallLog) with a creditsCharged column set at the moment of the call. That row is the source of truth. Token-derived pricing math (multiplying input/output tokens by a per-model rate) is useful for estimating what a call will cost, but it must be labeled as an estimate in dashboards. It is never the source of truth.

Why this matters:

Provider billing diverges from token estimates due to thinking tokens, cached prompts, regional pricing tiers, and free-tier crediting. Vertex console / OpenAI usage page is the real billing truth.
An estimate-as-truth dashboard will silently disagree with the invoice. By the time someone notices, weeks of decisions have been made on bad data.
Confirmation flow: write the row at status='estimated', then a reconciler updates status='confirmed' + the creditsCharged value when the provider invoice resolves.

The LEDGER_STATUSES vocabulary in this package encodes that distinction. Use it.
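
The confirmation flow can be sketched as two pure steps over an in-memory row. The `CallLogRow` shape and the `writeEstimatedRow` and `confirmRow` helpers are hypothetical; ForGrit's actual LlmCallLog columns and reconciler are not shown in this README.

```typescript
type LedgerStatus = 'estimated' | 'confirmed' | 'confirmed_freetier' | 'disputed';
type ConfirmationSource = 'gcp' | 'freetier' | 'internal_recompute';

// Hypothetical row shape; only creditsCharged and status are named in the text above.
interface CallLogRow {
  id: string;
  creditsCharged: number;
  status: LedgerStatus;
  confirmationSource: ConfirmationSource | null;
}

// Request time: persist the estimate immediately so every call has a row.
function writeEstimatedRow(id: string, estimatedCredits: number): CallLogRow {
  return { id, creditsCharged: estimatedCredits, status: 'estimated', confirmationSource: null };
}

// Reconciliation time: replace the estimate with the billed amount and record its source.
function confirmRow(
  row: CallLogRow,
  billedCredits: number,
  source: ConfirmationSource,
): CallLogRow {
  if (row.status !== 'estimated') {
    throw new Error(`Row ${row.id} is already ${row.status}`);
  }
  return { ...row, creditsCharged: billedCredits, status: 'confirmed', confirmationSource: source };
}

const pending = writeEstimatedRow('call_1', 12);
const settled = confirmRow(pending, 15, 'gcp');
console.log(settled.status, settled.creditsCharged); // confirmed 15
```

Refusing to confirm a row twice is a design choice of this sketch, meant to keep a reconciler rerun from silently overwriting an already confirmed amount.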

Rule 2 — `thinkingTokens` is optional and provider-specific

Absence is not zero. Store a missing value as null, count it as 0 only when summing totals, and do not assume zero thinking. Report thinkingTokensAvailability % separately.

Some providers (Anthropic with extended thinking, OpenAI o-series) emit a separate thinkingTokens field; others don't. Even within a provider, certain models or modes don't expose it.

If you persist thinkingTokens as a non-nullable column with default 0, your dashboards will silently misreport: rows where the provider didn't tell you thinking tokens look identical to rows where the model didn't think. Those are very different facts.

Fix: persist thinkingTokens as nullable. When rolling up to dashboards, surface a thinkingTokensAvailability percentage — "for what fraction of calls did we receive a thinkingTokens value at all?" — separate from the actual token total.
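
A minimal sketch of that rollup, assuming a hypothetical `ThinkingUsageRow` shape in which null means the provider reported nothing:

```typescript
// Hypothetical log-row shape: null means "the provider did not report a value".
interface ThinkingUsageRow {
  thinkingTokens: number | null;
}

interface ThinkingRollup {
  totalThinkingTokens: number;        // sum over rows that reported a value
  thinkingTokensAvailability: number; // percentage of rows that reported a value
}

function rollupThinking(rows: ThinkingUsageRow[]): ThinkingRollup {
  const reported = rows.filter((r) => r.thinkingTokens !== null);
  const total = reported.reduce((sum, r) => sum + (r.thinkingTokens ?? 0), 0);
  // An empty input is reported as 0% here; that convention is a choice of this sketch.
  const availability = rows.length === 0 ? 0 : (reported.length / rows.length) * 100;
  return { totalThinkingTokens: total, thinkingTokensAvailability: availability };
}

const rollup = rollupThinking([
  { thinkingTokens: 400 },
  { thinkingTokens: 0 },    // the provider reported zero thinking tokens
  { thinkingTokens: null }, // the provider reported nothing
  { thinkingTokens: null },
]);
console.log(rollup); // { totalThinkingTokens: 400, thinkingTokensAvailability: 50 }
```

The explicit 0 row counts toward availability while the null rows do not, which is the distinction a non-nullable column with default 0 would erase.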

Rule 3 — Dedicated judge path

Quality evaluation uses a dedicated LLMTask.QUALITY_JUDGE. Never reuse a business task like EXPERT_REVIEW.

When you build an evaluation harness for your LLM outputs, the temptation is to call the same business task (whatever does "expert review" or "second-pass refinement" in production) and treat its output as a judge score. This is wrong.

Reasons:

Business tasks are tuned for the business outcome, not for evaluation rigor. Their prompts include domain context that biases judgment.
Pricing-wise, business tasks may use a cheaper tier (Flash, Haiku) and produce noisy judge scores. Judges should be pinned to the strongest tier (Pro, Opus).
Coupling the judge to a business task means every change to the business prompt invalidates the entire eval history.

Fix: create a dedicated LLMTask.QUALITY_JUDGE enum entry with its own pinned model, its own system prompt, its own router entry. Treat it as its own observability surface.
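
As an illustration, a registry with a separate judge entry might look like the sketch below. A const object stands in for the LLMTask enum, and the model ids, prompts, `TaskConfig` shape, and `configFor` helper are all placeholders rather than ForGrit's real configuration.

```typescript
// Const object standing in for the LLMTask enum named above.
const LLMTask = {
  EXPERT_REVIEW: 'EXPERT_REVIEW',
  QUALITY_JUDGE: 'QUALITY_JUDGE',
} as const;
type LLMTaskName = (typeof LLMTask)[keyof typeof LLMTask];

// Hypothetical config shape: each task pins its own model and system prompt.
interface TaskConfig {
  model: string;
  systemPrompt: string;
}

// Placeholder model ids and prompts; substitute your own.
const taskRegistry: Record<LLMTaskName, TaskConfig> = {
  EXPERT_REVIEW: {
    model: 'placeholder-fast-model',
    systemPrompt: 'You are a domain expert reviewing a design for the customer.',
  },
  QUALITY_JUDGE: {
    model: 'placeholder-strong-model',
    systemPrompt: 'Score the output against the rubric. Return only a number from 1 to 5.',
  },
};

function configFor(task: LLMTaskName): TaskConfig {
  return taskRegistry[task];
}

// Editing the business prompt leaves the judge entry, and the eval history, untouched.
taskRegistry.EXPERT_REVIEW.systemPrompt = 'Revised business prompt.';
console.log(configFor(LLMTask.QUALITY_JUDGE).model); // placeholder-strong-model
```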

Rule 4 — Per-call `modelOverride`, not env mutation

Mutating process.env.LLM_MODEL_OVERRIDE in a running process is brittle. Use a per-call modelOverride option on LLMExecuteOptions instead.

A common shortcut: "I want to run task X with model Y for one call, so I'll mutate process.env.LLM_MODEL_OVERRIDE = 'Y', call, then restore." This breaks under concurrency (another call lands during the mutation window) and under retry (the restore fires before the retry executes).

Fix: add modelOverride to the per-call options object. The router applies it with highest priority before its own model-selection logic.

interface LLMExecuteOptions {
  modelOverride?: string; // e.g., 'gemini-2.0-pro' — overrides router's default for this call
  // ... other options
}
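
A minimal sketch of how a router could apply the override, assuming a hypothetical `resolveModel` function and placeholder task names and model ids (ForGrit's router is not part of this package):

```typescript
interface LLMExecuteOptions {
  modelOverride?: string;
}

// Hypothetical router defaults; task names and model ids are placeholders.
const defaultModelByTask: Record<string, string> = {
  preview: 'placeholder-fast-model',
  codegen: 'placeholder-strong-model',
};

// The override is read from the call's own options, so concurrent calls cannot
// observe each other's choice the way they can with a shared process.env value.
function resolveModel(task: string, options: LLMExecuteOptions = {}): string {
  if (options.modelOverride) return options.modelOverride;
  const model = defaultModelByTask[task];
  if (model === undefined) throw new Error(`No default model for task: ${task}`);
  return model;
}

console.log(resolveModel('preview'));                                       // placeholder-fast-model
console.log(resolveModel('preview', { modelOverride: 'placeholder-eval' })); // placeholder-eval
console.log(resolveModel('preview'));                                       // placeholder-fast-model
```

Because nothing global is mutated, there is no restore step to get wrong under retry.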

Rule 5 — Cache field semantics — fixed vocabulary

cached: boolean (full response from cache), cacheLayer: 'L1' | 'L2' | null, promptCached: boolean (prompt cache reused). cached + promptCached are mutually exclusive.

Cache observability gets confusing fast because there are at least three distinct concepts:

"Did we return a cached response without calling the model?" → cached: true
"Did the model reuse a cached system prompt but still generate fresh output?" → promptCached: true (Vertex CachedContent, Anthropic prompt caching)
"Which cache layer hit?" → cacheLayer: 'L1' | 'L2' | null (in-memory vs persistent)

Without fixed vocabulary, every dashboard derives its own definition and they all disagree. Lock the names early.

Invariant: cached and promptCached are mutually exclusive. If the full response came from cache, no prompt-cache decision was made because no call was made.
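
The vocabulary and its invariant can be checked at write time. In the sketch below, `CacheFields` and `checkCacheFields` are hypothetical names, and the second check (a layer is recorded exactly when the full response was cached) is an assumption that goes beyond the rule as stated.

```typescript
type CacheLayer = 'L1' | 'L2' | null;

interface CacheFields {
  cached: boolean;       // full response served from cache, no model call
  cacheLayer: CacheLayer; // which layer served it, or null
  promptCached: boolean; // model was called and reused a cached prompt
}

// Returns a list of violations; an empty list means the fields are consistent.
function checkCacheFields(f: CacheFields): string[] {
  const problems: string[] = [];
  if (f.cached && f.promptCached) {
    problems.push('cached and promptCached are mutually exclusive');
  }
  // Assumption, not stated in the rule: cacheLayer is non-null exactly when cached is true.
  if (f.cached !== (f.cacheLayer !== null)) {
    problems.push('cacheLayer must be set if and only if cached is true');
  }
  return problems;
}

console.log(checkCacheFields({ cached: true, cacheLayer: 'L1', promptCached: false }));  // []
console.log(checkCacheFields({ cached: false, cacheLayer: null, promptCached: true }));  // []
console.log(checkCacheFields({ cached: true, cacheLayer: 'L2', promptCached: true }).length); // 1
```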

Rule 6 — Config changes are per-task commits

Each task config change is its own commit. Rollback granularity matches task granularity.

If you have 12 task types (preview, regen, codegen, judge, expert-review, summarizer, ...) and you change the model for 4 of them in one commit, when something regresses in production you can't isolate which task's change caused it — you have to revert all 4 together.

Fix: one task config change = one commit. The diff is taskRegistry[TASK_NAME].model = 'gemini-2.0-flash'. Rollback is git revert <sha> and you affect only that task.

The ESLint chokepoint pattern (below) applies the same principle of narrow, single-purpose scope: every credit-ledger write goes through one service, every config change is scoped to one task, and every rollback is one revert.

Rule 7 — Evaluation infra before migration

Split work into infra-first phases (SLA, golden set, runner, evaluator, dashboards), then migration phases (changing task → model). Infra ships + gets trusted before model change rides on it.

The wrong order: "let me migrate task X from Pro to Flash and use the migration to drive the eval infra build." Result: the eval infra ships under deadline pressure, you don't trust its numbers, the migration is gated on infra you don't trust, and the whole thing slides.

The right order: ship the eval infra in its own milestone — SLA definitions, golden set, runner, evaluator (dedicated judge per rule #3), dashboards. Let it bake. Confirm numbers are stable. Then migrate a single task using the infra. Then migrate the next. Each migration is small and fully observable.

ESLint chokepoint pattern (recommended)

ForGrit enforces "all credit ledger writes go through one service" via an ESLint rule. The rule below is schema-coupled (it hardcodes the Prisma model name creditTransaction), so we recommend you copy it and adapt to your schema rather than depending on our copy.

// tools/eslint-rules/no-direct-credit-transaction-create.js
module.exports = {
  meta: {
    type: 'problem',
    docs: {
      description: 'Disallow direct prisma.creditTransaction.create outside CreditsService',
    },
    schema: [],
    messages: {
      noDirect:
        'Use CreditsService.deductCredits / addCredits instead of writing directly to credit_transactions.',
    },
  },
  create(context) {
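    // context.filename is the current API; getFilename() is the deprecated fallback for older ESLint.
    const filename = (context.filename ?? context.getFilename()).replace(/\\/g, '/');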
    const isAllowed =
      filename.includes('/credits/credits.service.ts') || /\/credits\/.*\.spec\.ts$/.test(filename);
    if (isAllowed) return {};

    return {
      MemberExpression(node) {
        if (
          node.property &&
          (node.property.name === 'create' || node.property.name === 'createMany')
        ) {
          const obj = node.object;
          if (
            obj &&
            obj.type === 'MemberExpression' &&
            obj.property &&
            obj.property.name === 'creditTransaction'
          ) {
            context.report({ node, messageId: 'noDirect' });
          }
        }
      },
    };
  },
};

Adapt:

Swap creditTransaction for your Prisma model name (creditLedger, usageRecord, etc.).
Swap the allowed-file list for your chokepoint service path.
Add the rule to your .eslintrc.cjs or eslint.config.mjs.

This enforces the write discipline behind rule #1 at the linter level: anyone who tries to bypass the chokepoint gets a lint error in CI before merge. It also mirrors the granularity of rule #6 (one concern, one place to change), although the rule itself does not check commits.

What this package is not

Not a billing engine. It does not talk to Stripe, GCP billing, or any provider. It's primitives.
Not a Prisma schema generator. You design your own table; this gives you the vocabulary to type its columns.
Not a NestJS module. Pure TypeScript. Use however you like.
Not a model router. A future @forgrit/prompt-engine package may ship router primitives.

Versioning

0.1.x is early-access. The public API may evolve before 1.0.0 locks semver. After 1.0.0, breaking changes require an RFC + major-version bump.

License

MIT — see LICENSE.
