@forgrit/llm-cost

v0.1.1

Credit ledger primitives + cost estimators for AI applications — the 7 LLM cost design rules every AI startup hits, encoded as TypeScript.

Zero runtime dependencies. Edge-runtime safe. Tree-shake friendly. Pure TypeScript.


Why this package exists

Every team building on top of LLMs re-discovers the same 7 problems:

  1. Cost truth source vs estimate
  2. thinkingTokens is optional and provider-specific
  3. Quality evaluation needs its own task type
  4. Per-call model override, not env mutation
  5. Cache field semantics need a fixed vocabulary
  6. Config changes ship in per-task commits
  7. Evaluation infra ships before migration

This package bundles the production-tested credit-ledger primitives + cost estimators from ForGrit, plus the README below — which encodes the 7 rules in detail so your team doesn't have to re-derive them.

If your application charges users for compute, you're going to need vocabulary, estimators, and chokepoints. Start with these.


Install

npm install @forgrit/llm-cost
# or
pnpm add @forgrit/llm-cost
# or
yarn add @forgrit/llm-cost

Requires Node 20+.


Quick start

import { CREDIT_COST, estimatePreview, canAfford, isLedgerCategory } from '@forgrit/llm-cost';

// 1. Estimate cost of a generation
const est = estimatePreview({ screens: 6, variants: 3, viewports: 2 });
console.log(est.estimatedCredits); // 128

// 2. Check affordability
const userBalance = 200;
const { canAfford: ok, shortfall } = canAfford(userBalance, est);
if (!ok) throw new Error(`Insufficient credits: short by ${shortfall}`);

// 3. Use typed ledger vocabulary
function recordCharge(category: string, amount: number) {
  if (!isLedgerCategory(category)) {
    throw new Error(`Unknown ledger category: ${category}`);
  }
  // ... category is now typed as LedgerCategory
}

API reference

Pricing

CREDIT_COST — constants

{
  PREVIEW_BASE: 20,
  PREVIEW_PER_SCREEN_VARIANT_VIEWPORT: 3,
  REGEN_BASE: 10,
  CODEGEN_BASE: 50,
  CODEGEN_STRICT_PER_PAGE: 25,
  CODEGEN_GUIDED_PER_PAGE: 10,
  MIN_THRESHOLD: 10,
}

These are ForGrit's internal pricing constants — exposed so you can either use them directly or use them as reference points when designing your own. Credits-to-dollars ratio: $1 = 100 credits in ForGrit. Your conversion may differ.
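
As a small illustration of the conversion above, here is a sketch that assumes ForGrit's stated ratio of 100 credits per dollar. CREDITS_PER_USD, creditsToUsd, and usdToCredits are hypothetical names for this example, not package exports.

```typescript
// Hypothetical helpers; not exported by @forgrit/llm-cost.
// Assumes ForGrit's stated ratio of $1 = 100 credits. Swap in your own ratio.
const CREDITS_PER_USD = 100;

function creditsToUsd(credits: number): number {
  return credits / CREDITS_PER_USD;
}

function usdToCredits(usd: number): number {
  // Round to whole credits; floating-point products like 0.07 * 100 are not exact.
  return Math.round(usd * CREDITS_PER_USD);
}

const previewUsd = creditsToUsd(128); // dollar value of a 128-credit estimate
const topUpCredits = usdToCredits(5); // credits granted for a $5 top-up
```

Keeping the ratio in one constant means a pricing change touches a single line rather than every call site.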

estimatePreview(params) / estimateRegen(params) / estimateCodegen(params)

Pure functions. Take an object with operation-specific parameters; return a CostEstimate:

interface CostEstimate {
  estimatedCredits: number;
  breakdown: {
    base: number;
    units: number;
    unitRate: number;
    notes: string[];
  };
}
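
The quick-start figure of 128 credits is consistent with base + units × unitRate using the constants above: 20 + (6 × 3 × 2) × 3 = 128. The sketch below reproduces that arithmetic under the assumption that this is how the preview estimate is composed; the package's actual implementation may differ. sketchEstimatePreview is a hypothetical local function, and the two constants are copied in so the block is self-contained.

```typescript
interface CostEstimate {
  estimatedCredits: number;
  breakdown: { base: number; units: number; unitRate: number; notes: string[] };
}

// Values copied from CREDIT_COST above.
const PREVIEW_BASE = 20;
const PREVIEW_PER_SCREEN_VARIANT_VIEWPORT = 3;

// Hypothetical re-derivation, not the package's source.
// Assumption: units = screens * variants * viewports, billed at a flat unit rate.
function sketchEstimatePreview(p: {
  screens: number;
  variants: number;
  viewports: number;
}): CostEstimate {
  const units = p.screens * p.variants * p.viewports;
  const unitRate = PREVIEW_PER_SCREEN_VARIANT_VIEWPORT;
  return {
    estimatedCredits: PREVIEW_BASE + units * unitRate,
    breakdown: {
      base: PREVIEW_BASE,
      units,
      unitRate,
      notes: [`${p.screens} screens x ${p.variants} variants x ${p.viewports} viewports`],
    },
  };
}

const sketch = sketchEstimatePreview({ screens: 6, variants: 3, viewports: 2 });
console.log(sketch.estimatedCredits); // 128
```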

canAfford(balance, estimate)

Pure check. Returns { canAfford: boolean; shortfall: number }.
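
A minimal sketch of what such a check might look like, assuming shortfall is the number of credits still missing and is 0 when the balance covers the estimate. sketchCanAfford is a hypothetical stand-in, not the package's implementation.

```typescript
interface AffordabilityResult {
  canAfford: boolean;
  shortfall: number;
}

// Hypothetical stand-in for canAfford.
// Assumption: shortfall = credits still missing, clamped to 0 when affordable.
function sketchCanAfford(
  balance: number,
  estimate: { estimatedCredits: number },
): AffordabilityResult {
  const shortfall = Math.max(0, estimate.estimatedCredits - balance);
  return { canAfford: shortfall === 0, shortfall };
}

const enough = sketchCanAfford(200, { estimatedCredits: 128 });
const short = sketchCanAfford(100, { estimatedCredits: 128 });
console.log(enough, short);
```

Returning the shortfall alongside the boolean lets the caller build a specific error message or top-up prompt without recomputing the difference.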

Ledger vocabulary

LEDGER_TYPES / LedgerType

The accounting direction of a row. Distinct from category.

['DEDUCTION', 'ADDITION', 'REFUND'];

LEDGER_CATEGORIES / LedgerCategory

The spend bucket: which kind of cost this row represents.

['LLM', 'SANDBOX', 'STORAGE', 'DEMO', 'DEPLOY', 'BILLING', 'ADJUSTMENT', 'RAG_MODIFICATION'];

A single ledger row has both: e.g. type='DEDUCTION' + category='LLM' = user charged for an LLM call. type='REFUND' + category='ADJUSTMENT' = refund from a manual adjustment.
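
To make the type/category split concrete, here is a sketch of a row shape and a balance rollup. The LedgerRow interface, signedDelta, and balanceOf are hypothetical; the vocabulary arrays are copied locally so the block runs on its own. It assumes DEDUCTION lowers the balance while ADDITION and REFUND raise it.

```typescript
// Local copies of the vocabulary so the sketch is self-contained.
const LEDGER_TYPES = ['DEDUCTION', 'ADDITION', 'REFUND'] as const;
type LedgerType = (typeof LEDGER_TYPES)[number];

const LEDGER_CATEGORIES = [
  'LLM', 'SANDBOX', 'STORAGE', 'DEMO', 'DEPLOY', 'BILLING', 'ADJUSTMENT', 'RAG_MODIFICATION',
] as const;
type LedgerCategory = (typeof LEDGER_CATEGORIES)[number];

// Hypothetical row shape; design your own table columns.
interface LedgerRow {
  type: LedgerType; // accounting direction
  category: LedgerCategory; // spend bucket
  credits: number; // positive magnitude; the sign comes from `type`
}

// Assumption: DEDUCTION lowers the balance, ADDITION and REFUND raise it.
function signedDelta(row: LedgerRow): number {
  return row.type === 'DEDUCTION' ? -row.credits : row.credits;
}

function balanceOf(rows: LedgerRow[]): number {
  return rows.reduce((sum, row) => sum + signedDelta(row), 0);
}

const rows: LedgerRow[] = [
  { type: 'ADDITION', category: 'BILLING', credits: 500 },
  { type: 'DEDUCTION', category: 'LLM', credits: 128 },
  { type: 'REFUND', category: 'ADJUSTMENT', credits: 28 },
];
const balance = balanceOf(rows);
console.log(balance); // 400
```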

LEDGER_STATUSES / LedgerStatus

['estimated', 'confirmed', 'confirmed_freetier', 'disputed'];

estimated rows are written at request time from token estimates. They become confirmed once the provider invoice resolves with real billed amounts (rule #1).

FAILURE_POLICY_TAGS / FailurePolicyTag

['completed', 'partial', 'failed', 'refunded', 'disputed'];

CONFIRMATION_SOURCES / ConfirmationSource

['gcp', 'freetier', 'internal_recompute'];

isLedgerCategory(v: unknown): v is LedgerCategory

Runtime guard. Use to validate user/API input before treating it as a typed category.
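
A guard like this can be derived from the same const array that defines the type, so the runtime check and the compile-time type cannot drift apart. The sketch below uses a local copy of the category list and a hypothetical isLedgerCategorySketch; the package's own implementation may differ.

```typescript
// Local copy of the category list so the sketch is self-contained.
const LEDGER_CATEGORIES = [
  'LLM', 'SANDBOX', 'STORAGE', 'DEMO', 'DEPLOY', 'BILLING', 'ADJUSTMENT', 'RAG_MODIFICATION',
] as const;
type LedgerCategory = (typeof LEDGER_CATEGORIES)[number];

// Hypothetical stand-in for isLedgerCategory: a type predicate over the array.
function isLedgerCategorySketch(v: unknown): v is LedgerCategory {
  return typeof v === 'string' && LEDGER_CATEGORIES.some((c) => c === v);
}

// Example: validating an untyped request payload before using it.
function parseCategory(body: { category?: unknown }): LedgerCategory {
  if (!isLedgerCategorySketch(body.category)) {
    throw new Error(`Unknown ledger category: ${String(body.category)}`);
  }
  return body.category; // narrowed to LedgerCategory here
}

console.log(parseCategory({ category: 'LLM' })); // LLM
```

The match is case-sensitive, so 'llm' is rejected; normalize input before the guard if your API accepts mixed case.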


The 7 LLM cost design rules

These are the rules ForGrit derived from production. Each maps to a real bug we hit. Read them as a checklist for your own AI application's cost system.

Rule 1 — Cost truth source

Persisted per-call creditsCharged is the primary cost source. Token-derived pricing math is an estimate.

Every LLM call writes one row to a log table (in ForGrit: LlmCallLog) with a creditsCharged column set at the moment of the call. That row is the source of truth. Token-derived pricing math (multiplying input/output tokens by a per-model rate) is useful for estimating what a call will cost, but it must be labeled as an estimate in dashboards. It is never the source of truth.

Why this matters:

  • Provider billing diverges from token estimates due to thinking tokens, cached prompts, regional pricing tiers, and free-tier crediting. The provider's own billing view (Vertex console, OpenAI usage page) is the real billing truth.
  • An estimate-as-truth dashboard will silently disagree with the invoice. By the time someone notices, weeks of decisions have been made on bad data.
  • Confirmation flow: write the row at status='estimated', then a reconciler updates status='confirmed' + the creditsCharged value when the provider invoice resolves.

The LEDGER_STATUSES vocabulary in this package encodes that distinction. Use it.
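
The confirmation flow described above can be sketched as a pure state transition. CallLogRow, writeEstimatedRow, and confirmRow are hypothetical names; the row shape is only loosely modeled on the LlmCallLog description, and mapping the 'freetier' source to the 'confirmed_freetier' status is an assumption.

```typescript
type LedgerStatus = 'estimated' | 'confirmed' | 'confirmed_freetier' | 'disputed';
type ConfirmationSource = 'gcp' | 'freetier' | 'internal_recompute';

// Hypothetical log-row shape.
interface CallLogRow {
  id: string;
  creditsCharged: number;
  status: LedgerStatus;
  confirmationSource: ConfirmationSource | null;
}

// Written at request time from a token-derived estimate.
function writeEstimatedRow(id: string, estimatedCredits: number): CallLogRow {
  return { id, creditsCharged: estimatedCredits, status: 'estimated', confirmationSource: null };
}

// Reconciler step, run once the provider invoice resolves.
// Assumption: a 'freetier' source maps to the 'confirmed_freetier' status.
function confirmRow(
  row: CallLogRow,
  billedCredits: number,
  source: ConfirmationSource,
): CallLogRow {
  // Only estimated rows are reconciled; anything else is returned untouched.
  if (row.status !== 'estimated') return row;
  return {
    ...row,
    creditsCharged: billedCredits,
    status: source === 'freetier' ? 'confirmed_freetier' : 'confirmed',
    confirmationSource: source,
  };
}

const pending = writeEstimatedRow('call_1', 12);
const settled = confirmRow(pending, 15, 'gcp');
const replayed = confirmRow(settled, 99, 'gcp'); // second run changes nothing
console.log(settled.status, replayed.creditsCharged);
```

Making the transition a no-op on already-confirmed rows keeps the reconciler safe to re-run after a partial failure.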

Rule 2 — thinkingTokens is optional and provider-specific

Persist absence as null, not as 0, and do not read a missing value as zero thinking. Report thinkingTokensAvailability % separately.

Some providers (Anthropic with extended thinking, OpenAI o-series) emit a separate thinkingTokens field; others don't. Even within a provider, certain models or modes don't expose it.

If you persist thinkingTokens as a non-nullable column with default 0, your dashboards will silently misreport: rows where the provider reported no thinking-token count look identical to rows where the model did no thinking. Those are very different facts.

Fix: persist thinkingTokens as nullable. When rolling up to dashboards, surface a thinkingTokensAvailability percentage — "for what fraction of calls did we receive a thinkingTokens value at all?" — separate from the actual token total.
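
A sketch of that rollup, assuming each logged call carries a nullable thinkingTokens value. CallUsage and rollupThinking are hypothetical names for this example.

```typescript
// Hypothetical per-call usage record: null means "provider did not report it".
interface CallUsage {
  thinkingTokens: number | null;
}

interface ThinkingRollup {
  totalThinkingTokens: number; // sum over calls that reported a value
  thinkingTokensAvailability: number; // percentage of calls that reported a value
}

function rollupThinking(calls: CallUsage[]): ThinkingRollup {
  const reported = calls.filter((c) => c.thinkingTokens !== null);
  const totalThinkingTokens = reported.reduce((sum, c) => sum + (c.thinkingTokens ?? 0), 0);
  const thinkingTokensAvailability =
    calls.length === 0 ? 0 : (reported.length / calls.length) * 100;
  return { totalThinkingTokens, thinkingTokensAvailability };
}

const rollup = rollupThinking([
  { thinkingTokens: 1200 }, // provider reported thinking
  { thinkingTokens: 0 }, // provider reported: no thinking happened
  { thinkingTokens: null }, // provider reported nothing
  { thinkingTokens: null },
]);
console.log(rollup); // { totalThinkingTokens: 1200, thinkingTokensAvailability: 50 }
```

Here the explicit 0 counts toward availability while the two null rows do not, which is exactly the distinction a default-0 column would erase.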

Rule 3 — Dedicated judge path

Quality evaluation uses a dedicated LLMTask.QUALITY_JUDGE. Never reuse a business task like EXPERT_REVIEW.

When you build an evaluation harness for your LLM outputs, the temptation is to call the same business task (whatever does "expert review" or "second-pass refinement" in production) and treat its output as a judge score. This is wrong.

Reasons:

  • Business tasks are tuned for the business outcome, not for evaluation rigor. Their prompts include domain context that biases judgment.
  • Pricing-wise, business tasks may use a cheaper tier (Flash, Haiku) and produce noisy judge scores. Judges should be pinned to the strongest tier (Pro, Opus).
  • Coupling the judge to a business task means every change to the business prompt invalidates the entire eval history.

Fix: create a dedicated LLMTask.QUALITY_JUDGE enum entry with its own pinned model, its own system prompt, its own router entry. Treat it as its own observability surface.
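
One way this might look in a task registry is sketched below. The registry shape, configFor, the prompts, and the model names are all placeholders for illustration; a const object stands in for the enum so the sketch also runs under type-stripping runtimes.

```typescript
// A const object standing in for the LLMTask enum.
const LLMTask = {
  EXPERT_REVIEW: 'EXPERT_REVIEW',
  QUALITY_JUDGE: 'QUALITY_JUDGE',
} as const;
type LLMTask = (typeof LLMTask)[keyof typeof LLMTask];

// Hypothetical per-task config.
interface TaskConfig {
  model: string;
  systemPrompt: string;
}

// Model names are placeholders, not real model IDs.
const taskRegistry: Record<LLMTask, TaskConfig> = {
  // Business task: tuned for the product outcome, may use a cheaper tier.
  EXPERT_REVIEW: {
    model: 'cheap-tier-model',
    systemPrompt: 'You are a domain expert. Improve the draft for the customer.',
  },
  // Judge: pinned to the strongest tier, with a prompt free of business context.
  QUALITY_JUDGE: {
    model: 'strongest-tier-model',
    systemPrompt: 'Score the output from 1 to 5 against the rubric. Return only the score.',
  },
};

function configFor(task: LLMTask): TaskConfig {
  return taskRegistry[task];
}

const judge = configFor(LLMTask.QUALITY_JUDGE);
const review = configFor(LLMTask.EXPERT_REVIEW);
console.log(judge.model, review.model);
```

Because the judge has its own entry, editing the EXPERT_REVIEW prompt leaves the judge's model and prompt, and therefore the eval history, unchanged.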

Rule 4 — Per-call modelOverride, not env mutation

Mutating process.env.LLM_MODEL_OVERRIDE in a running process is brittle. Use a per-call modelOverride option on LLMExecuteOptions.

A common shortcut: "I want to run task X with model Y for one call, so I'll mutate process.env.LLM_MODEL_OVERRIDE = 'Y', call, then restore." This breaks under concurrency (another call lands during the mutation window) and under retry (the restore fires before the retry executes).

Fix: add modelOverride to the per-call options object. The router applies it with highest priority before its own model-selection logic.

interface LLMExecuteOptions {
  modelOverride?: string; // e.g., 'gemini-2.0-pro' — overrides router's default for this call
  // ... other options
}
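
A sketch of how a router might apply the override, assuming a simple task-to-model lookup. defaultModelByTask, resolveModel, and the model names are hypothetical placeholders; only the modelOverride field comes from the interface above.

```typescript
interface LLMExecuteOptions {
  modelOverride?: string;
}

// Hypothetical router defaults; model names are placeholders.
const defaultModelByTask: Record<string, string> = {
  preview: 'default-model-a',
  judge: 'default-model-b',
};

// The override wins over the router's own selection, and only for this call.
function resolveModel(task: string, options: LLMExecuteOptions = {}): string {
  if (options.modelOverride) return options.modelOverride;
  const model = defaultModelByTask[task];
  if (!model) throw new Error(`No model configured for task: ${task}`);
  return model;
}

const overridden = resolveModel('preview', { modelOverride: 'override-model-x' });
const untouched = resolveModel('preview'); // a concurrent call sees the default
console.log(overridden, untouched);
```

Since the override travels with the call's own arguments, no shared state is modified, so concurrent calls and retries cannot observe each other's model choice.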

Rule 5 — Cache field semantics — fixed vocabulary

cached: boolean (full response from cache), cacheLayer: 'L1' | 'L2' | null, promptCached: boolean (prompt cache reused). cached + promptCached are mutually exclusive.

Cache observability gets confusing fast because there are at least three distinct concepts:

  • "Did we return a cached response without calling the model?" → cached: true
  • "Did the model reuse a cached system prompt but still generate fresh output?" → promptCached: true (Vertex CachedContent, Anthropic prompt caching)
  • "Which cache layer hit?" → cacheLayer: 'L1' | 'L2' | null (in-memory vs persistent)

Without fixed vocabulary, every dashboard derives its own definition and they all disagree. Lock the names early.

Invariant: cached and promptCached are mutually exclusive. If the full response came from cache, no prompt-cache decision was made because no call was made.
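
The vocabulary and its invariant can be written down once and checked wherever a log row is built. CacheFields, isValidCacheFields, and assertCacheInvariant are hypothetical names; the sketch checks only the invariant stated above.

```typescript
type CacheLayer = 'L1' | 'L2' | null;

// Hypothetical shape for the three cache fields on a call-log row.
interface CacheFields {
  cached: boolean; // full response served from cache, no model call
  cacheLayer: CacheLayer; // which layer hit, or null
  promptCached: boolean; // model call made, cached prompt reused
}

// Invariant from the text: cached and promptCached are mutually exclusive.
function isValidCacheFields(f: CacheFields): boolean {
  return !(f.cached && f.promptCached);
}

function assertCacheInvariant(f: CacheFields): void {
  if (!isValidCacheFields(f)) {
    throw new Error('cached and promptCached cannot both be true');
  }
}

const fullHit: CacheFields = { cached: true, cacheLayer: 'L1', promptCached: false };
const promptHit: CacheFields = { cached: false, cacheLayer: null, promptCached: true };
const contradictory: CacheFields = { cached: true, cacheLayer: 'L2', promptCached: true };

assertCacheInvariant(fullHit);
assertCacheInvariant(promptHit);
console.log(isValidCacheFields(contradictory)); // false
```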

Rule 6 — Config changes are per-task commits

Each task config change is its own commit. Rollback granularity matches task granularity.

If you have 12 task types (preview, regen, codegen, judge, expert-review, summarizer, ...) and you change the model for 4 of them in one commit, when something regresses in production you can't isolate which task's change caused it — you have to revert all 4 together.

Fix: one task config change = one commit. The diff is taskRegistry[TASK_NAME].model = 'gemini-2.0-flash'. Rollback is git revert <sha> and you affect only that task.

The ESLint chokepoint pattern (below) applies the same principle to ledger writes: every credit-ledger write goes through one service, just as every config change is scoped to one task and every rollback is one revert.

Rule 7 — Evaluation infra before migration

Split work into infra-first phases (SLA, golden set, runner, evaluator, dashboards), then migration phases (changing task → model). Infra ships + gets trusted before model change rides on it.

The wrong order: "let me migrate task X from Pro to Flash and use the migration to drive the eval infra build." Result: the eval infra ships under deadline pressure, you don't trust its numbers, the migration is gated on infra you don't trust, and the whole thing slides.

The right order: ship the eval infra in its own milestone — SLA definitions, golden set, runner, evaluator (dedicated judge per rule #3), dashboards. Let it bake. Confirm numbers are stable. Then migrate a single task using the infra. Then migrate the next. Each migration is small and fully observable.


ESLint chokepoint pattern (recommended)

ForGrit enforces "all credit ledger writes go through one service" via an ESLint rule. The rule below is schema-coupled (it hardcodes the Prisma model name creditTransaction), so we recommend you copy it and adapt to your schema rather than depending on our copy.

// tools/eslint-rules/no-direct-credit-transaction-create.js
module.exports = {
  meta: {
    type: 'problem',
    docs: {
      description: 'Disallow direct prisma.creditTransaction.create outside CreditsService',
    },
    schema: [],
    messages: {
      noDirect:
        'Use CreditsService.deductCredits / addCredits instead of writing directly to credit_transactions.',
    },
  },
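  create(context) {
    // context.filename is the current API (ESLint 8.40+); getFilename() is the deprecated fallback.
    const rawFilename = context.filename ?? context.getFilename();
    // Normalize Windows separators so the path checks below work on every OS.
    const filename = rawFilename.replace(/\\/g, '/');
    // Only the chokepoint service and its specs may write to the ledger directly.
    const isAllowed =
      filename.includes('/credits/credits.service.ts') || /\/credits\/.*\.spec\.ts$/.test(filename);
    if (isAllowed) return {};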

    return {
      MemberExpression(node) {
        if (
          node.property &&
          (node.property.name === 'create' || node.property.name === 'createMany')
        ) {
          const obj = node.object;
          if (
            obj &&
            obj.type === 'MemberExpression' &&
            obj.property &&
            obj.property.name === 'creditTransaction'
          ) {
            context.report({ node, messageId: 'noDirect' });
          }
        }
      },
    };
  },
};

Adapt:

  • Swap creditTransaction for your Prisma model name (creditLedger, usageRecord, etc.).
  • Swap the allowed-file list for your chokepoint service path.
  • Add the rule to your .eslintrc.cjs or eslint.config.mjs.

This enforces the write discipline behind rule #1 at the linter level and mirrors the granularity of rule #6: anyone who tries to bypass the chokepoint gets a CI error before merge.


What this package is not

  • Not a billing engine. It does not talk to Stripe, GCP billing, or any provider. It's primitives.
  • Not a Prisma schema generator. You design your own table; this gives you the vocabulary to type its columns.
  • Not a NestJS module. Pure TypeScript. Use however you like.
  • Not a model router. A future @forgrit/prompt-engine package may ship router primitives.

Versioning

0.1.x is early-access. The public API may evolve before 1.0.0 locks semver. After 1.0.0, breaking changes require an RFC + major-version bump.

License

MIT — see LICENSE.

Links

  • npm: https://www.npmjs.com/package/@forgrit/llm-cost
  • Source: https://github.com/forgrit-ai/forgrit/tree/main/lifecycle/0-foundry/llm-cost
  • Issues: https://github.com/forgrit-ai/forgrit/issues
  • ForGrit: https://forgrit.ai