@reaatech/agent-eval-harness-tool-use

v0.1.0

Published

13 days ago

Tool-use validation (selection, schema compliance, result verification) for agent-eval-harness

Downloads

158

0High
0Medium
0Low

reaatech

@reaatech/agent-eval-harness-tool-use

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Tool-call validation and result verification for agent trajectories. Validates tool selection against schemas, checks argument compliance, detects hallucinated results, and verifies proper result integration into agent responses.

Installation

npm install @reaatech/agent-eval-harness-tool-use

Feature Overview

Tool selection validation — checks that the agent picked the right tool for the task
Schema compliance — validates tool arguments against JSON Schema or custom ToolSchema definitions
Result verification — detects hallucinated results that don't match actual tool output
Integration checking — verifies tool results are properly used in agent responses
13 issue types — structured categorization of tool-use problems from critical (missing tool name) to low (result unused)
Trajectory-wide summarization — aggregate result verification across all tool calls

Quick Start

import { validateToolCall, createToolSchema, verifyResult } from '@reaatech/agent-eval-harness-tool-use';
import type { ToolCall, Turn } from '@reaatech/agent-eval-harness-types';

const schema = createToolSchema('send_email', {
  properties: { to: { type: 'string', format: 'email' }, subject: { type: 'string' } },
  required: ['to']
});

const call: ToolCall = { name: 'send_email', arguments: { to: '[email protected]', subject: 'Hi' }, result: { status: 'sent' } };
const turn: Turn = { turn_id: 2, role: 'agent', content: 'Email sent!', timestamp: '2026-04-15T00:00:00Z', tool_calls: [call] };

const validation = validateToolCall(call, schema);
console.log(`Valid: ${validation.valid}, Score: ${validation.score}`);

const verification = verifyResult(call, turn);
console.log(`Hallucinated: ${verification.hallucinated}, Integrated: ${verification.integrated}`);

API Reference

Validation Functions

| Export | Signature | Description | |--------|-----------|-------------| | validateTrajectory | (trajectory: Trajectory, toolSchemas?: Record<string, ToolSchema>, options?: ValidateOptions) => ValidationResult[] | Validates all tool calls across every agent turn in a trajectory. Returns one ValidationResult per agent turn with tool calls. | | validateTurn | (turn: Turn, toolSchemas?: Record<string, ToolSchema>, options?: ValidateOptions) => ValidationResult | Validates all tool calls in a single turn. Handles missing_tool_name, unknown_tool, deprecated_tool, missing_arguments, missing_result, schema violations, and hallucination detection. | | validateToolCall | (toolCall: ToolCall, schema?: ToolSchema, options?: ValidateOptions) => ValidationResult | Validates a single tool call against an optional schema. Convenience wrapper that creates a synthetic turn internally. |

Schema Functions

| Export | Signature | Description | |--------|-----------|-------------| | validateSchema | (toolCall: ToolCall, schema: ToolSchema) => SchemaValidationResult | Deep schema validation of tool arguments against a ToolSchema. Checks required fields, types, enums, formats (email, uri, date, date-time), and nested object/array properties. | | createToolSchema | (name: string, jsonSchema: Record<string, unknown>, description?: string) => ToolSchema | Creates a ToolSchema from a JSON Schema-like definition. Converts properties and required arrays into the internal ToolSchema parameter structure. |

Result Verification Functions

| Export | Signature | Description | |--------|-----------|-------------| | verifyResult | (toolCall: ToolCall, turn: Turn, trajectory?: Trajectory, options?: VerifyOptions) => ResultVerificationResult | Verifies a single tool call's result against the agent's response. Checks for hallucination, result integration, contradictions, and missing/empty/error results. Accepts optional full trajectory for cross-turn usage detection. | | verifyTurnResults | (turn: Turn, trajectory?: Trajectory, options?: VerifyOptions) => ResultVerificationResult[] | Runs verifyResult on every tool call in a turn. Returns an array of verification results. | | summarizeResultVerification | (trajectory: Trajectory, options?: VerifyOptions) => { totalTools, validResults, hallucinatedResults, integratedResults, averageScore, issues } | Aggregates result verification across an entire trajectory. Returns counts for total tools, valid results, hallucinated results, integrated results, average score, and all issues. |

Types

ToolSchema

interface ToolSchema {
  name: string;
  description?: string;
  parameters: {
    type: 'object';
    properties: Record<string, ParameterSchema>;
    required?: string[];
  };
  deprecated?: boolean;
  replacedBy?: string;
}

interface ParameterSchema {
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description?: string;
  enum?: unknown[];
  format?: string;
  items?: ParameterSchema;
  properties?: Record<string, ParameterSchema>;
}

ValidationResult

interface ValidationResult {
  valid: boolean;          // true if no critical issues
  issues: ToolUseIssue[];  // all detected issues
  suggestions: string[];   // remediation suggestions (e.g., deprecated tool replacement)
  score: number;           // 0.0–1.0 weighted by issue severity
}

interface ToolUseIssue {
  type: ToolUseIssueType;
  severity: 'low' | 'medium' | 'high' | 'critical';
  description: string;
  turnId?: number;
  toolName?: string;
  details?: Record<string, unknown>;
}

ValidateOptions

interface ValidateOptions {
  allowUnknownTools?: boolean;   // default: false — set true to skip unknown tool errors
  validateSchemas?: boolean;     // default: true — enable parameter-level schema checks
  checkResultUsage?: boolean;    // default: true — check for unused tool results
  detectHallucination?: boolean; // default: true — check for fabricated result usage
  strict?: boolean;              // default: false — when true, score drops to 0.0 if any high/critical issue
}

SchemaValidationResult

interface SchemaValidationResult {
  valid: boolean;
  issues: SchemaIssue[];
  score: number;
}

interface SchemaIssue {
  type: string;         // e.g., 'missing_arguments', 'type_error', 'invalid_format', 'required_field_missing'
  severity: 'low' | 'medium' | 'high' | 'critical';
  path: string;         // dot-notation path to the problematic parameter
  message: string;
  expected?: unknown;
  actual?: unknown;
}

ResultVerificationResult

interface ResultVerificationResult {
  valid: boolean;
  issues: ResultIssue[];
  score: number;
  hallucinated: boolean;  // true if hallucination score exceeds threshold
  integrated: boolean;    // true if result values appear in the agent response
}

interface ResultIssue {
  type: ResultIssueType;
  severity: 'low' | 'medium' | 'high' | 'critical';
  description: string;
  turnId?: number;
  toolName?: string;
  details?: Record<string, unknown>;
}

VerifyOptions

interface VerifyOptions {
  checkUsage?: boolean;             // default: true — verify result usage in response
  detectHallucination?: boolean;    // default: true — detect fabricated result content
  checkContradictions?: boolean;    // default: true — catch result/response contradictions
  hallucinationThreshold?: number;  // default: 0.3 — score above this triggers hallucinated flag
}

Enums

ToolUseIssueType (13 values)

| Value | Severity | Description | |-------|----------|-------------| | missing_tool_name | critical | Tool call has no name field | | missing_arguments | high | Tool call has no arguments field | | invalid_arguments | medium | Argument value not in allowed enum | | tool_not_found | high | Tool name not in provided schemas | | tool_misuse | medium | Tool used incorrectly for the context | | missing_result | medium | Tool was called but no result returned | | result_unused | low | Tool result fields not found in agent response | | hallucinated_result | high | Agent response references data not in the actual tool result | | schema_violation | high | Arguments fail schema-level validation | | type_mismatch | high | Argument type does not match schema (e.g., string for number) | | missing_required_param | high | Required parameter missing from arguments | | unknown_tool | high/medium | Tool name not recognized; severity depends on strict mode | | deprecated_tool | medium | Tool is marked as deprecated; suggestion includes replacement |

ResultIssueType (8 values)

| Value | Severity | Description | |-------|----------|-------------| | missing_result | medium | Tool call has no result object | | empty_result | low | Tool returned an empty result ({}) | | error_result | high | Result status is 'error' | | hallucinated_content | high | Response contains fabricated data not in the result | | unused_result | medium | Result values not referenced in agent response | | contradicts_response | high | Result indicates success but response says failure (or vice versa) | | incomplete_integration | medium | Only partial result data used in response | | malformed_result | high | Result structure is unexpected or invalid |

Related Packages

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@reaatech/agent-eval-harness-tool-use

Installation

Feature Overview

Quick Start

API Reference

Validation Functions

Schema Functions

Result Verification Functions

Types

ToolSchema

ValidationResult

ValidateOptions

SchemaValidationResult

ResultVerificationResult

VerifyOptions

Enums

ToolUseIssueType (13 values)

ResultIssueType (8 values)

Related Packages

License