ai-contract

v0.1.1

Published

9 days ago

Unit tests for LLM outputs.

froxeis

ai llm evals evaluation testing prompt json structured-output cli

ai-contract is a small TypeScript library and CLI for checking that AI responses still match the shape, budget, and rules your app expects. It is provider-agnostic: OpenAI, Anthropic, Gemini, local models, custom gateways, or any async function that returns text.

Created by frox.

Why

LLM features often depend on promises like:

"This prompt returns valid JSON."
"This field is always a number."
"This answer never includes banned claims."
"This output stays small enough for the next step."

Those promises break during prompt edits, model swaps, SDK changes, and provider migrations. ai-contract lets you catch those regressions locally and in CI.

Features

Fluent TypeScript API for defining LLM output contracts.
CLI runner for local checks and CI.
Simple JSON shape validation.
String enum validation for structured outputs.
Estimated token budget checks.
Required and banned phrase checks.
Custom validators for project-specific rules.
Provider-agnostic execution.
JSON reporter for CI and automation.

Install

npm install -D ai-contract

Requires Node.js 20 or newer.

Quick Start

Create ai-contract.config.ts:

import { defineConfig } from "ai-contract";

export default defineConfig({
  contracts: "contracts/**/*.contract.ts",
  tokenRatio: 4
});

Create contracts/support.contract.ts:

import { contract } from "ai-contract";

// Stub standing in for a real model call; it returns a fixed JSON string.
const callModel = async () => {
  return JSON.stringify({
    answer: "Please contact support and we can help with a refund.",
    confidence: 0.92,
    next_action: "refund"
  });
};

export default [
  contract("support-reply")
    .expectsJson({
      answer: "string",
      confidence: "number",
      next_action: ["refund", "escalate", "none"]
    })
    .maxTokens(800)
    .mustInclude(["refund"])
    .mustNotInclude(["guaranteed"])
    .test(callModel)
];

Run:

npx ai-contract test

Example output:

PASS support-reply
  ok expectsJson: Passed.
  ok maxTokens: Passed.
  ok mustInclude: Passed.
  ok mustNotInclude: Passed.

Summary: 1 passed, 0 failed, 1 total
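The stubbed callModel above returns a fixed string; in a real contract it would call your provider. The following is a minimal sketch, assuming a hypothetical HTTP gateway that accepts `{ prompt }` and replies with `{ text }`. The `makeCallModel` helper, the `FetchLike` type, and the payload shape are illustrative and not part of ai-contract; the only thing relied on from the library is that `.test()` takes an async function that returns text.

```typescript
// Minimal fetch-like signature so the sketch does not depend on a specific SDK.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: builds an async function that resolves to the model's text.
// The gateway URL and the { prompt } -> { text } payload shape are assumptions.
function makeCallModel(fetchImpl: FetchLike, url: string, prompt: string) {
  return async (): Promise<string> => {
    const response = await fetchImpl(url, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ prompt })
    });
    if (!response.ok) {
      throw new Error(`Gateway responded with status ${response.status}`);
    }
    const data = (await response.json()) as { text?: unknown };
    if (typeof data.text !== "string") {
      throw new Error("Gateway response did not include a text field.");
    }
    return data.text;
  };
}
```

On Node.js 20 the built-in fetch should fit this signature, so something like `.test(makeCallModel(fetch, url, prompt))` would be one way to wire it in. Injecting the fetch function also makes the call easy to fake in tests.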

Library Usage

Use runContracts when you want to run checks inside a test file, script, or custom pipeline:

import { contract, runContracts } from "ai-contract";

// callMyLLM and prompt are placeholders for your own provider call and input.
const results = await runContracts([
  contract("support-reply")
    .expectsJson({
      answer: "string",
      confidence: "number",
      next_action: ["refund", "escalate", "none"]
    })
    .maxTokens(800)
    // The listed terms must appear in the output (case-insensitive match).
    .mustInclude(["polite", "actionable"])
    .mustNotInclude(["guaranteed"])
    .test(async () => {
      return await callMyLLM(prompt);
    })
]);

// Signal failure to the calling process if any contract failed.
if (results.some((result) => !result.passed)) {
  process.exitCode = 1;
}
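The exit-code pattern above relies only on each result's `passed` flag. As a small sketch in the same spirit, the helpers below count passes and failures and format a summary line like the one in the CLI example output. `ContractResultLike`, `summarize`, and `exitCodeFor` are hypothetical names; the real result objects may carry more fields.

```typescript
// Only the `passed` field is confirmed by the example above; this type is an assumption.
type ContractResultLike = { passed: boolean };

// Hypothetical helper: counts passes and failures and formats a summary line.
function summarize(results: ContractResultLike[]): string {
  const passed = results.filter((result) => result.passed).length;
  const failed = results.length - passed;
  return `Summary: ${passed} passed, ${failed} failed, ${results.length} total`;
}

// Mirrors the exit-code rule: 1 if any contract failed, otherwise 0.
function exitCodeFor(results: ContractResultLike[]): number {
  return results.some((result) => !result.passed) ? 1 : 0;
}

console.log(summarize([{ passed: true }])); // Summary: 1 passed, 0 failed, 1 total
```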

CLI

npx ai-contract test

Use machine-readable output:

npx ai-contract test --json

Default behavior:

Looks for ai-contract.config.ts.
Discovers contracts/**/*.contract.ts.
Runs every exported contract.
Exits with code 0 when all contracts pass.
Exits with code 1 when any contract fails.

Checks

`expectsJson(shape)`

Validates that the output is a JSON object with expected field types.

contract("classification")
  .expectsJson({
    label: ["billing", "support", "sales"],
    confidence: "number",
    explanation: "string"
  })
  .test(callModel);

Supported primitive types:

string
number
boolean
object
array

An array of strings in the shape is treated as an enum: the field must equal one of the listed values.
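To make the rules above concrete, here is a rough sketch of what a shallow shape check of this kind could look like. It is not the library's implementation: `checkShape`, `FieldSpec`, and the error messages are hypothetical, and details such as how missing fields are reported are assumptions.

```typescript
type FieldSpec = "string" | "number" | "boolean" | "object" | "array" | string[];
type Shape = Record<string, FieldSpec>;

// Returns a list of problems; an empty list means the output matches the shape.
function checkShape(output: string, shape: Shape): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(output);
  } catch {
    return ["Output is not valid JSON."];
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return ["Output is not a JSON object."];
  }
  const record = parsed as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of Object.keys(shape)) {
    const spec = shape[field];
    const value = record[field];
    if (Array.isArray(spec)) {
      // A string array is an enum: the value must be one of the listed strings.
      if (typeof value !== "string" || spec.indexOf(value) === -1) {
        problems.push(`${field}: expected one of ${spec.join(", ")}`);
      }
    } else if (spec === "array") {
      if (!Array.isArray(value)) problems.push(`${field}: expected array`);
    } else if (spec === "object") {
      // typeof reports "object" for null and arrays too, so exclude both.
      if (typeof value !== "object" || value === null || Array.isArray(value)) {
        problems.push(`${field}: expected object`);
      }
    } else if (typeof value !== spec) {
      // Missing fields land here as well, because typeof undefined is "undefined".
      problems.push(`${field}: expected ${spec}`);
    }
  }
  return problems;
}
```

Only top-level fields are inspected in this sketch, which matches the "simple and shallow" validation described under Limitations.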

`maxTokens(limit)`

Estimates output size as Math.ceil(output.length / tokenRatio), where tokenRatio is the assumed number of characters per token (default 4). The estimate is intentionally approximate: use it for regression checks, not exact billing.

contract("short-summary")
  .maxTokens(120)
  .test(callModel);
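As a worked example of the estimate, the sketch below applies the documented formula to the sample reply from the Quick Start. `estimateTokens` and `withinBudget` are hypothetical helper names, and treating the limit as inclusive (`<=`) is an assumption rather than confirmed library behavior.

```typescript
// Estimate tokens from character count: ceil(length / tokenRatio).
function estimateTokens(output: string, tokenRatio: number = 4): number {
  return Math.ceil(output.length / tokenRatio);
}

// Hypothetical budget check; the inclusive comparison is an assumption.
function withinBudget(output: string, limit: number, tokenRatio: number = 4): boolean {
  return estimateTokens(output, tokenRatio) <= limit;
}

const reply = "Please contact support and we can help with a refund.";
console.log(reply.length);             // 53 characters
console.log(estimateTokens(reply));    // 14, because ceil(53 / 4) = ceil(13.25)
console.log(withinBudget(reply, 800)); // true
```

A smaller tokenRatio makes the estimate more conservative: the same 53-character reply counts as 18 estimated tokens with a ratio of 3.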

`mustInclude(terms)`

Checks required words or phrases case-insensitively.

contract("refund-reply")
  .mustInclude(["refund policy"])
  .test(callModel);

`mustNotInclude(terms)`

Checks banned words or phrases case-insensitively.

contract("medical-disclaimer")
  .mustNotInclude(["guaranteed cure"])
  .test(callModel);
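Both phrase checks come down to a case-insensitive search. The sketch below shows one way this could work, assuming plain substring matching; the source only states that matching is case-insensitive, so word-boundary handling is an open question. `missingTerms` and `presentTerms` are hypothetical names.

```typescript
// Returns the required terms that do NOT appear in the output.
function missingTerms(output: string, terms: string[]): string[] {
  const haystack = output.toLowerCase();
  return terms.filter((term) => haystack.indexOf(term.toLowerCase()) === -1);
}

// Returns the banned terms that DO appear in the output.
function presentTerms(output: string, terms: string[]): string[] {
  const haystack = output.toLowerCase();
  return terms.filter((term) => haystack.indexOf(term.toLowerCase()) !== -1);
}

console.log(missingTerms("Our Refund Policy applies.", ["refund policy"])); // []
console.log(presentTerms("A GUARANTEED cure!", ["guaranteed cure"]));      // ["guaranteed cure"]
```

With substring matching, a banned term such as "guaranteed" would also match inside a longer word like "unguaranteed", so multi-word phrases tend to be safer than single short words.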

`custom(name, fn)`

Adds your own validator.

contract("signed-answer")
  .custom("ends-with-signature", ({ output }) => ({
    passed: output.endsWith("The Support Team"),
    message: "Expected output to end with the support signature."
  }))
  .test(callModel);
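Custom validators are also a natural place for rules the built-in checks do not cover, such as value ranges. The sketch below is a standalone validator that requires a numeric `confidence` between 0 and 1. `confidenceInRange` and `CheckResult` are hypothetical names; the `{ output }` argument and the `{ passed, message }` return value follow the example above.

```typescript
// Result shape taken from the example above: a pass flag plus a message.
type CheckResult = { passed: boolean; message: string };

// Hypothetical validator: requires a numeric `confidence` between 0 and 1.
function confidenceInRange({ output }: { output: string }): CheckResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(output);
  } catch {
    return { passed: false, message: "Output is not valid JSON." };
  }
  // Read the field only when the parsed value is a non-null object.
  const value =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { confidence?: unknown }).confidence
      : undefined;
  return {
    passed: typeof value === "number" && value >= 0 && value <= 1,
    message: "Expected confidence to be a number between 0 and 1."
  };
}

// Intended usage (not executed here): .custom("confidence-in-range", confidenceInRange)
```

Keeping the validator as a named function makes it easy to unit test on its own and to reuse across several contracts.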

Configuration

import { defineConfig } from "ai-contract";

export default defineConfig({
  contracts: "contracts/**/*.contract.ts",
  tokenRatio: 4
});

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| contracts | string | contracts/**/*.contract.ts | Glob pattern for contract files. |
| tokenRatio | number | 4 | Approximate characters-per-token ratio. |

CI Options

This repository includes .circleci/config.yml for hosted CI outside GitHub Actions. If GitHub Actions is unavailable, use that CircleCI config or the local release gate documented in docs/CI.md.

For local release checks, run:

npm run ci
npm run package:check

Examples

See examples/basic for a minimal contract file and config.

Run the example from this repository:

npm install
npm run build
cd examples/basic
node ../../dist/cli/index.mjs test

Limitations

JSON validation is intentionally simple and shallow in the first release; nested schema support is on the roadmap.
Token checks are estimates, not provider-exact tokenization.
There is no hosted dashboard, tracing backend, or prompt management UI.
Contract files execute your code, so treat them like tests and keep secrets out of logs.

Roadmap

Nested JSON schema support.
Better diff output for JSON failures.
JUnit reporter for CI systems.
Optional exact tokenizers for popular model families.
Contract examples for OpenAI, Anthropic, Gemini, and local models.

Development

npm install
npm run typecheck
npm test
npm run build

Before publishing:

npm run ci
npm pack --dry-run

Publishing runs the same safety checks through prepublishOnly.

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.
