@kindlm/core
v0.2.1
Published
Core engine for KindLM — behavioral regression testing for AI agents
Maintainers
Readme
KindLM
Behavioral regression testing for AI agents. Test what your agents do — not just what they say.
Why KindLM?
LLM evals measure text quality. KindLM tests behavior — the tool calls your agent makes, the decisions it takes, and whether it leaks PII or violates compliance rules. It runs in CI so regressions never ship.
Features
- Tool call assertions — verify agents call the right tools with the right arguments, in the right order
- Schema validation — structured output checked against JSON Schema (AJV)
- PII detection — catch leaked SSNs, credit cards, emails, phone numbers, IBANs
- LLM-as-judge — score responses against natural-language criteria (0.0–1.0)
- Drift detection — semantic + field-level comparison against saved baselines
- Keyword guards — require or forbid specific phrases in output
- Latency & cost budgets — fail tests that exceed time or token-cost thresholds
- EU AI Act compliance — generate Annex IV documentation from test results
- CI-native — exit code 0/1, JUnit XML reporter, GitHub Actions ready
Supported Providers
| Provider | Example config |
|----------|---------------|
| OpenAI | openai:gpt-4o |
| Anthropic | anthropic:claude-sonnet-4-5-20250929 |
| Ollama | ollama:llama3 |
| Google Gemini | google:gemini-2.0-flash |
| AWS Bedrock | bedrock:anthropic.claude-sonnet-4-5-20250929-v1:0 |
| Azure OpenAI | azure:my-gpt4o-deployment |
Quick Start
npm install -g @kindlm/cli
kindlm initEdit the generated kindlm.yaml:
version: "1"
defaults:
provider: openai:gpt-4o
temperature: 0
runs: 3
suites:
- name: refund-agent
system_prompt: "You are a refund support agent."
tests:
- name: looks-up-order
input: "I want to return order #12345"
assert:
- type: tool_called
value: lookup_order
- type: no_pii
- type: judge
criteria: "Response is empathetic and professional"
threshold: 0.8Run your tests:
kindlm testOutput:
refund-agent
✓ looks-up-order (3/3 runs passed)
✓ tool_called: lookup_order
✓ no_pii
✓ judge: 0.92 ≥ 0.8
1 suite, 1 test, 3 assertions — all passedCI Integration
# .github/workflows/test.yml
- run: npm install -g @kindlm/cli
- run: kindlm test --reporter junit --output results.xmlRepository Layout
packages/
core/ @kindlm/core — Business logic, zero I/O dependencies
cli/ @kindlm/cli — CLI entry point
cloud/ @kindlm/cloud — Cloudflare Workers API + D1 database
docs/ Technical specs and documentation
site/ Documentation website (Next.js)Documentation
Full docs: kindlm.dev | Source: docs/
License
MIT (core + CLI) | AGPL (cloud)
