@tracepact/core
v0.5.0
Published
Core library for TracePact — matchers, trace model, parser, sandbox, drivers
Downloads
1,004
Readme
@tracepact/core
Core library for TracePact — the behavioral testing framework for AI agents.
This package provides the foundational building blocks: LLM drivers, tool sandboxing, trace collection, multi-tier assertion matchers, caching, cassette recording/replay, and more. All other TracePact packages depend on this one.
Installation
npm install @tracepact/coreKey Concepts
Skills
A skill is a YAML-frontmatter markdown file (SKILL.md) that defines an agent's system prompt, available tools, and behavioral constraints.
import { parseSkill } from '@tracepact/core';
const skill = await parseSkill('SKILL.md');
// skill.frontmatter — {name, description, triggers, excludes, tools}
// skill.body — system prompt text
// skill.hash — SHA256 content hashDrivers
Drivers execute prompts against LLM providers. Built-in support for OpenAI and Anthropic, with a registry for multi-provider setups.
import { DriverRegistry, defineConfig } from '@tracepact/core';
const config = defineConfig({
model: 'openai/gpt-4o',
roles: {
agent: 'openai/gpt-4o',
judge: 'anthropic/claude-haiku-4-5-20251001',
embedding: 'openai/text-embedding-3-small',
},
});
const registry = new DriverRegistry(config);
const driver = registry.get('openai');Tool Sandbox
Mock tool environments let you test agent behavior without real side effects.
import { createMockTools, mockReadFile, mockBash, captureWrites, denyAll, passthrough } from '@tracepact/core';
const sandbox = createMockTools({
read_file: mockReadFile({ 'src/index.ts': 'export const x = 1;' }),
write_file: captureWrites(),
bash: mockBash({ 'npm test': { stdout: 'PASS' } }),
dangerous_tool: denyAll(),
other: passthrough(),
});
const result = await sandbox.executeTool('read_file', { path: 'src/index.ts' });
const trace = sandbox.getTrace(); // full call history
const writes = sandbox.getWrites(); // captured file writesTool Definitions
Type-safe tool schemas using Zod:
import { defineTools } from '@tracepact/core';
import { z } from 'zod';
const tools = defineTools({
read_file: z.object({ path: z.string() }),
write_file: z.object({ path: z.string(), content: z.string() }),
});Prompt Execution
The main orchestration function that ties everything together:
import { executePrompt } from '@tracepact/core';
const result = await executePrompt('SKILL.md', {
prompt: 'Review the code for security issues',
sandbox,
tools,
config: { temperature: 0.7, maxTokens: 2048 },
});
// result.output — agent's final text response
// result.trace — complete ToolTrace
// result.usage — token counts
// result.duration — execution time in msAssertion Matchers
TracePact provides a 5-tier matcher system, from fast deterministic checks to LLM-based evaluation:
Tier 0 — Tool Assertions
import { toHaveCalledTool, toHaveCalledToolsInOrder, toHaveToolCallCount } from '@tracepact/core';
toHaveCalledTool(trace, 'read_file', { path: /\.ts$/ });
toHaveCalledToolsInOrder(trace, ['read_file', 'write_file']);
toHaveToolCallCount(trace, 'read_file', 2);
toHaveFirstCalledTool(trace, 'read_file');
toHaveLastCalledTool(trace, 'write_file');
toNotHaveCalledTool(trace, 'bash');Tier 1 — Structural Assertions
toHaveMarkdownStructure(ctx, { headings: ['## Summary'] });
toMatchJsonSchema(ctx, schema);
toHaveLineCount(ctx, { min: 5, max: 50 });
toHaveFileWritten(trace, 'output.md');Tier 2 — Content Assertions
toContain(ctx, 'security vulnerability');
toNotContain(ctx, 'TODO');
toMention(ctx, 'eval');
toContainAll(ctx, ['injection', 'sanitize']);
toContainAny(ctx, ['warning', 'error']);Tier 3 — Semantic Assertions (async, requires embeddings)
await toBeSemanticallySimilar(ctx, 'security analysis report', { threshold: 0.8 });
await toHaveSemanticOverlap(ctx, ['security', 'code review']);Tier 4 — Judge Assertions (async, requires LLM)
await toPassJudge(ctx, 'Does the review identify the eval() vulnerability?');
await toMatchTrajectory(ctx, expectedTrajectory);Conditional Matchers
import { when, calledTool, calledToolWith } from '@tracepact/core';
when(trace, calledTool('bash'), toHaveCalledTool(trace, 'read_file'));
when(trace, calledToolWith('read_file', { path: /test/ }), toNotHaveCalledTool(trace, 'bash'));RAG Assertions
toHaveRetrievedDocument(ctx, 'doc-id');
toHaveCitedSources(ctx, ['source-1', 'source-2']);
toNotHaveHallucinated(ctx, groundTruthDocs);MCP Assertions
toHaveCalledMcpTool(trace, 'server-name', 'tool-name');
toHaveCalledMcpServer(trace, 'server-name');
toHaveCalledMcpToolsInOrder(trace, [['server', 'tool1'], ['server', 'tool2']]);Additional Features
Cassette Recording & Replay
Record live LLM interactions and replay them for deterministic CI:
// Record
const result = await executePrompt('SKILL.md', {
prompt: 'test',
record: 'cassettes/test.json',
});
// Replay
const replayed = await executePrompt('SKILL.md', {
prompt: 'test',
replay: 'cassettes/test.json',
});Caching
import { CacheStore } from '@tracepact/core';
const cache = new CacheStore({ dir: '.cache', ttlSeconds: 3600 });Redaction
Strip sensitive data before publishing results:
import { RedactionPipeline } from '@tracepact/core';
const pipeline = new RedactionPipeline({
rules: [{ pattern: /sk-[a-zA-Z0-9]+/g, replacement: '[REDACTED]' }],
redactEnvValues: ['OPENAI_API_KEY'],
});
const cleaned = pipeline.redact(text);Audit
Static analysis of skill files:
import { AuditEngine, BUILTIN_RULES } from '@tracepact/core';
const engine = new AuditEngine(BUILTIN_RULES);
const report = engine.audit(parsedSkill);
// report.findings, report.riskLevel, report.passModel Registry
import { listProviders, listModels, getRecommended } from '@tracepact/core';
const providers = listProviders();
const models = listModels('openai');
const recommended = getRecommended('agent');Dependencies
openai— OpenAI API clientyaml— YAML parsingstemmer— Text stemming for content matching
Optional peer dependencies:
@anthropic-ai/sdk— For Anthropic providerzod— For typed tool definitions
License
MIT
