katt
v0.0.9
CLI tool that tests the output of agentic AI tools
Katt
Katt is a lightweight testing framework for running AI Evals, inspired by Jest.
Table of Contents
- Overview
- API Documentation
- Articles
- Hello World - Example
- Main Features
- Installation
- Basic Usage
- Specifying AI Models
- Development
- How It Works
- Requirements
- License
- Contributing
Overview
✨ Run your own benchmarks and evaluations ✨
Katt is designed to evaluate and validate the behavior of AI agents like Claude Code, GitHub Copilot, OpenAI Codex and more. It provides a simple, intuitive API for writing tests that interact with AI models and assert their responses.
API Documentation
For a complete list of features and usage examples, see docs/api-documentation.md.
Articles
Hello World - Example
import { expect, prompt } from "katt";
const result = await prompt("If you read this just say 'hello world'");
expect(result).toContain("hello world");
It also supports the familiar describe and it syntax for organizing tests:
import { describe, expect, it, prompt } from "katt";
describe("Greeting agent", () => {
  it("should say hello world", async () => {
    const result = await prompt("If you read this just say 'hello world'");
    expect(result).toContain("hello world");
  });
});
Main Features
- Simple Testing API: Familiar describe and it syntax for organizing tests
- AI Interaction and Verification: Built-in prompt(), promptFile() and promptCheck() functions for running and analyzing prompts to AI agents
- Classification Matcher: Built-in toBeClassifiedAs() matcher to grade a response against a target label on a 1-5 scale
- Concurrent Execution: Runs eval files concurrently for faster test execution
- Model Selection: Support for specifying custom AI models
- Runtime Selection: Run prompts through GitHub Copilot (default) or Codex
- Configurable Timeouts: Override prompt wait time per test or via katt.json
Usage
Installation
npm install -g katt
Basic Usage
- Create a file with the .eval.ts or .eval.js extension and write your tests.
import { expect, prompt } from "katt";
const result = await prompt("If you read this just say 'hello world'");
expect(result).toContain("hello world");
- Run Katt from your project directory:
npx katt
Using promptFile
Load prompts from external files:
// test.eval.js
import { describe, expect, it, promptFile } from "katt";
describe("Working with files", () => {
  it("should load the file and respond", async () => {
    const result = await promptFile("./myPrompt.md");
    expect(result).toContain("expected response");
  });
});
Specifying AI Models
You can specify a custom model for your prompts:
import { describe, expect, it, prompt } from "katt";
describe("Model selection", () => {
  it("should use a specific model", async () => {
    const promptString = "You are a helpful agent. Say hi and ask what you could help the user with.";
    const result = await prompt(promptString, { model: "gpt-5.2" });
    expect(result).promptCheck("It should be friendly and helpful");
  });
});
You can also set runtime defaults in katt.json.
Copilot (default runtime):
{
  "agent": "gh-copilot",
  "agentOptions": {
    "model": "gpt-5-mini"
  },
  "prompt": {
    "timeoutMs": 240000
  }
}
Codex:
{
  "agent": "codex",
  "agentOptions": {
    "model": "gpt-5-codex",
    "profile": "default",
    "sandbox": "workspace-write"
  },
  "prompt": {
    "timeoutMs": 240000
  }
}
When this file exists:
- Supported agents are:
  - gh-copilot (default when agent is missing or unsupported)
  - codex
- prompt("...") and promptFile("...") merge agentOptions with call-time options
- prompt("...", { model: "..." }) overrides the model from config
- prompt.timeoutMs sets the default wait timeout for long-running prompts
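The resolution rules above can be sketched in TypeScript. This is a minimal illustration, not Katt's actual implementation: the KattConfig type and the resolveAgent and resolveOptions helpers are hypothetical names, and the field shapes are inferred only from the katt.json examples shown earlier.

```typescript
// Hypothetical types mirroring the katt.json examples; not Katt's real source.
type AgentName = "gh-copilot" | "codex";

interface KattConfig {
  agent?: string;
  agentOptions?: Record<string, unknown>;
  prompt?: { timeoutMs?: number };
}

const SUPPORTED_AGENTS: readonly AgentName[] = ["gh-copilot", "codex"];

// Fall back to gh-copilot when `agent` is missing or unsupported.
function resolveAgent(config: KattConfig): AgentName {
  const match = SUPPORTED_AGENTS.find((name) => name === config.agent);
  return match ?? "gh-copilot";
}

// Merge config-level agentOptions with call-time options; call-time keys win.
function resolveOptions(
  config: KattConfig,
  callOptions: Record<string, unknown> = {},
): Record<string, unknown> {
  return { ...(config.agentOptions ?? {}), ...callOptions };
}

// Example: a Codex config whose model is overridden for a single call.
const config: KattConfig = {
  agent: "codex",
  agentOptions: { model: "gpt-5-codex", profile: "default" },
};

const agent = resolveAgent(config);
const merged = resolveOptions(config, { model: "gpt-5.2" });
console.log(agent, merged);
```

Spreading the config options first and the call-time options second is what lets a per-call model replace the configured one, while untouched keys such as profile carry over from the file.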
Development
Setup
npm install
Available Scripts
- npm run dev - Run the CLI in development mode
- npm run build - Build the project
- npm run test - Run tests
- npm run typecheck - Run TypeScript type checking
- npm run format - Format code using Biome
- npm run lint - Lint code using Biome
- npm run test:build - Test the built CLI
Verification Process
After making changes, run the following sequence:
- npm run format
- npm run typecheck
- npm run test
- npm run build
- npm run test:build
Project Structure
katt/
├── src/ # Source code
│ ├── cli/ # CLI implementation
│ ├── lib/ # Core libraries (describe, it, expect, prompt)
│ └── types/ # TypeScript type definitions
├── examples/ # Example eval files
├── specs/ # Markdown specifications
├── package.json # Package configuration
└── tsconfig.json # TypeScript configuration
How It Works
- Katt searches the current directory recursively for *.eval.js and *.eval.ts files
- It skips .git and node_modules directories
- Found eval files are imported and executed concurrently
- Tests registered with describe() and it() are collected and run
- Each test duration is printed after execution
- A summary is displayed showing passed/failed tests and total duration
- Katt exits with code 0 on success or 1 on failure
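The discovery rules in the first two steps can be illustrated with a small path filter. This is a sketch under assumptions rather than Katt's actual code: isEvalFile is a hypothetical helper, and it checks a list of relative paths (with forward-slash separators) instead of walking the file system.

```typescript
// Hypothetical helper illustrating the matching rules; not Katt's real source.
const SKIPPED_DIRS = new Set([".git", "node_modules"]);

function isEvalFile(relativePath: string): boolean {
  // Assumes forward-slash separators in the relative path.
  const segments = relativePath.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  const dirs = segments.slice(0, -1);

  // Anything below .git or node_modules is ignored.
  if (dirs.some((dir) => SKIPPED_DIRS.has(dir))) return false;

  // Only *.eval.js and *.eval.ts files count as eval files.
  return /\.eval\.(js|ts)$/.test(fileName);
}

const candidates = [
  "greeting.eval.ts",
  "examples/files.eval.js",
  "src/lib/prompt.ts",
  "node_modules/katt/demo.eval.js",
  ".git/hooks/check.eval.ts",
];

// Keeps the first two paths: the rest are either not eval files or sit in skipped directories.
const evalFiles = candidates.filter(isEvalFile);
console.log(evalFiles);
```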
Requirements
- Node.js
- For gh-copilot runtime: access to GitHub Copilot with a logged-in user
- For codex runtime: Codex CLI installed and authenticated (codex login)
License
MIT
Contributing
We welcome contributions from the community! Please see our CONTRIBUTING.md guide for detailed information on how to contribute to Katt.
Quick start:
- Fork the repository
- Create a feature branch
- Make your changes
- Run the verification process
- Submit a pull request
For detailed guidelines, development setup, coding standards, and more, check out our contribution guide.
