@reaatech/agent-eval-harness-tool-use

v0.1.0

Tool-use validation (selection, schema compliance, result verification) for agent-eval-harness

Downloads

158

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Tool-call validation and result verification for agent trajectories. Validates tool selection against schemas, checks argument compliance, detects hallucinated results, and verifies proper result integration into agent responses.

Installation

npm install @reaatech/agent-eval-harness-tool-use

Feature Overview

  • Tool selection validation — checks that the agent picked the right tool for the task
  • Schema compliance — validates tool arguments against JSON Schema or custom ToolSchema definitions
  • Result verification — detects hallucinated results that don't match actual tool output
  • Integration checking — verifies tool results are properly used in agent responses
  • 13 issue types — structured categorization of tool-use problems from critical (missing tool name) to low (result unused)
  • Trajectory-wide summarization — aggregate result verification across all tool calls

Quick Start

import { validateToolCall, createToolSchema, verifyResult } from '@reaatech/agent-eval-harness-tool-use';
import type { ToolCall, Turn } from '@reaatech/agent-eval-harness-types';

const schema = createToolSchema('send_email', {
  properties: { to: { type: 'string', format: 'email' }, subject: { type: 'string' } },
  required: ['to']
});

const call: ToolCall = { name: 'send_email', arguments: { to: 'user@example.com', subject: 'Hi' }, result: { status: 'sent' } };
const turn: Turn = { turn_id: 2, role: 'agent', content: 'Email sent!', timestamp: '2026-04-15T00:00:00Z', tool_calls: [call] };

const validation = validateToolCall(call, schema);
console.log(`Valid: ${validation.valid}, Score: ${validation.score}`);

const verification = verifyResult(call, turn);
console.log(`Hallucinated: ${verification.hallucinated}, Integrated: ${verification.integrated}`);

API Reference

Validation Functions

| Export | Signature | Description |
|--------|-----------|-------------|
| validateTrajectory | (trajectory: Trajectory, toolSchemas?: Record<string, ToolSchema>, options?: ValidateOptions) => ValidationResult[] | Validates all tool calls across every agent turn in a trajectory. Returns one ValidationResult per agent turn with tool calls. |
| validateTurn | (turn: Turn, toolSchemas?: Record<string, ToolSchema>, options?: ValidateOptions) => ValidationResult | Validates all tool calls in a single turn. Handles missing_tool_name, unknown_tool, deprecated_tool, missing_arguments, missing_result, schema violations, and hallucination detection. |
| validateToolCall | (toolCall: ToolCall, schema?: ToolSchema, options?: ValidateOptions) => ValidationResult | Validates a single tool call against an optional schema. Convenience wrapper that creates a synthetic turn internally. |

Schema Functions

| Export | Signature | Description |
|--------|-----------|-------------|
| validateSchema | (toolCall: ToolCall, schema: ToolSchema) => SchemaValidationResult | Deep schema validation of tool arguments against a ToolSchema. Checks required fields, types, enums, formats (email, uri, date, date-time), and nested object/array properties. |
| createToolSchema | (name: string, jsonSchema: Record<string, unknown>, description?: string) => ToolSchema | Creates a ToolSchema from a JSON Schema-like definition. Converts properties and required arrays into the internal ToolSchema parameter structure. |
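
The conversion attributed to createToolSchema can be pictured roughly as follows. This is only a sketch under stated assumptions, not the package's implementation: `toToolSchemaSketch` is a hypothetical stand-in, the interfaces are copied from the Types section so the block runs on its own, and the handling of a missing `properties` key is a guess.

```typescript
// Local copies of the interfaces from the Types section.
interface ParameterSchema {
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description?: string;
  enum?: unknown[];
  format?: string;
  items?: ParameterSchema;
  properties?: Record<string, ParameterSchema>;
}

interface ToolSchema {
  name: string;
  description?: string;
  parameters: {
    type: 'object';
    properties: Record<string, ParameterSchema>;
    required?: string[];
  };
  deprecated?: boolean;
  replacedBy?: string;
}

// Hypothetical stand-in for createToolSchema (not the package's code).
function toToolSchemaSketch(
  name: string,
  jsonSchema: Record<string, unknown>,
  description?: string,
): ToolSchema {
  // Assumption: a missing `properties` key means "no parameters".
  const properties = (jsonSchema['properties'] ?? {}) as Record<string, ParameterSchema>;
  const schema: ToolSchema = { name, parameters: { type: 'object', properties } };
  if (description !== undefined) schema.description = description;
  // Only copy `required` when it is actually an array.
  const required = jsonSchema['required'];
  if (Array.isArray(required)) schema.parameters.required = required as string[];
  return schema;
}

const emailSchema = toToolSchemaSketch('send_email', {
  properties: { to: { type: 'string', format: 'email' }, subject: { type: 'string' } },
  required: ['to'],
});
```

Here `emailSchema.parameters` ends up with `type: 'object'`, the two properties, and `required: ['to']`, which is the shape the ToolSchema interface expects.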

Result Verification Functions

| Export | Signature | Description |
|--------|-----------|-------------|
| verifyResult | (toolCall: ToolCall, turn: Turn, trajectory?: Trajectory, options?: VerifyOptions) => ResultVerificationResult | Verifies a single tool call's result against the agent's response. Checks for hallucination, result integration, contradictions, and missing/empty/error results. Accepts optional full trajectory for cross-turn usage detection. |
| verifyTurnResults | (turn: Turn, trajectory?: Trajectory, options?: VerifyOptions) => ResultVerificationResult[] | Runs verifyResult on every tool call in a turn. Returns an array of verification results. |
| summarizeResultVerification | (trajectory: Trajectory, options?: VerifyOptions) => { totalTools, validResults, hallucinatedResults, integratedResults, averageScore, issues } | Aggregates result verification across an entire trajectory. Returns counts for total tools, valid results, hallucinated results, integrated results, average score, and all issues. |
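
To make the aggregate fields concrete, here is a minimal sketch of how a list of per-call verification results could be folded into the summary shape described for summarizeResultVerification. `summarizeSketch` and the `*Like` interfaces are hypothetical local stand-ins, and returning 0 as the average for an empty input is an assumption, not documented behavior.

```typescript
// Simplified local stand-ins for ResultIssue and ResultVerificationResult.
interface ResultIssueLike {
  type: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  description: string;
}

interface VerificationLike {
  valid: boolean;
  issues: ResultIssueLike[];
  score: number;
  hallucinated: boolean;
  integrated: boolean;
}

// Hypothetical aggregation over already-computed verification results.
function summarizeSketch(results: VerificationLike[]) {
  const totalTools = results.length;
  const scoreSum = results.reduce((sum, r) => sum + r.score, 0);
  return {
    totalTools,
    validResults: results.filter((r) => r.valid).length,
    hallucinatedResults: results.filter((r) => r.hallucinated).length,
    integratedResults: results.filter((r) => r.integrated).length,
    // Assumption: an empty input averages to 0 rather than NaN.
    averageScore: totalTools === 0 ? 0 : scoreSum / totalTools,
    // Flatten every per-call issue list into one array.
    issues: results.reduce<ResultIssueLike[]>((all, r) => all.concat(r.issues), []),
  };
}

const summary = summarizeSketch([
  { valid: true, issues: [], score: 1, hallucinated: false, integrated: true },
  {
    valid: false,
    issues: [{ type: 'hallucinated_content', severity: 'high', description: 'Response cites a value absent from the result' }],
    score: 0.5,
    hallucinated: true,
    integrated: false,
  },
  {
    valid: true,
    issues: [{ type: 'incomplete_integration', severity: 'medium', description: 'Only part of the result was used' }],
    score: 0.75,
    hallucinated: false,
    integrated: true,
  },
]);
```

For this input the sketch reports three tools, two valid results, one hallucinated result, two integrated results, and an average score of 0.75.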

Types

ToolSchema

interface ToolSchema {
  name: string;
  description?: string;
  parameters: {
    type: 'object';
    properties: Record<string, ParameterSchema>;
    required?: string[];
  };
  deprecated?: boolean;
  replacedBy?: string;
}

interface ParameterSchema {
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description?: string;
  enum?: unknown[];
  format?: string;
  items?: ParameterSchema;
  properties?: Record<string, ParameterSchema>;
}
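
As an illustration of how the two interfaces compose, the hand-written schema below uses a nested object, an array with `items`, an enum, and the deprecation fields. The tool names `search_orders` and `query_orders` and every parameter name are made up for the example, and the hint string is just one way a caller might use `replacedBy`.

```typescript
// Local copies of the interfaces above so the example runs on its own.
interface ParameterSchema {
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description?: string;
  enum?: unknown[];
  format?: string;
  items?: ParameterSchema;
  properties?: Record<string, ParameterSchema>;
}

interface ToolSchema {
  name: string;
  description?: string;
  parameters: {
    type: 'object';
    properties: Record<string, ParameterSchema>;
    required?: string[];
  };
  deprecated?: boolean;
  replacedBy?: string;
}

// Hypothetical tool definition, written by hand rather than via createToolSchema.
const searchOrders: ToolSchema = {
  name: 'search_orders',
  description: 'Look up orders by customer and status',
  parameters: {
    type: 'object',
    properties: {
      // Nested object parameter
      customer: {
        type: 'object',
        properties: { email: { type: 'string', format: 'email' } },
      },
      // Array parameter whose items are restricted to an enum
      statuses: {
        type: 'array',
        items: { type: 'string', enum: ['open', 'shipped', 'cancelled'] },
      },
      limit: { type: 'number', description: 'Maximum number of orders to return' },
    },
    required: ['customer'],
  },
  deprecated: true,
  replacedBy: 'query_orders',
};

// One possible way to turn the deprecation fields into a readable hint.
const hint =
  searchOrders.deprecated && searchOrders.replacedBy
    ? `Use ${searchOrders.replacedBy} instead of ${searchOrders.name}`
    : undefined;
```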

ValidationResult

interface ValidationResult {
  valid: boolean;          // true if no critical issues
  issues: ToolUseIssue[];  // all detected issues
  suggestions: string[];   // remediation suggestions (e.g., deprecated tool replacement)
  score: number;           // 0.0–1.0 weighted by issue severity
}

interface ToolUseIssue {
  type: ToolUseIssueType;
  severity: 'low' | 'medium' | 'high' | 'critical';
  description: string;
  turnId?: number;
  toolName?: string;
  details?: Record<string, unknown>;
}

ValidateOptions

interface ValidateOptions {
  allowUnknownTools?: boolean;   // default: false — set true to skip unknown tool errors
  validateSchemas?: boolean;     // default: true — enable parameter-level schema checks
  checkResultUsage?: boolean;    // default: true — check for unused tool results
  detectHallucination?: boolean; // default: true — check for fabricated result usage
  strict?: boolean;              // default: false — when true, score drops to 0.0 if any high/critical issue
}
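
The documented defaults and the effect of `strict` can be sketched as below. The default values come from the comments above; everything else is an assumption for illustration: `resolveOptions` and `scoreSketch` are hypothetical helpers, and the per-severity penalties are invented, since the README does not state the real weights.

```typescript
interface ValidateOptions {
  allowUnknownTools?: boolean;
  validateSchemas?: boolean;
  checkResultUsage?: boolean;
  detectHallucination?: boolean;
  strict?: boolean;
}

type Severity = 'low' | 'medium' | 'high' | 'critical';

// Defaults as listed in the interface comments.
const DEFAULT_OPTIONS: Required<ValidateOptions> = {
  allowUnknownTools: false,
  validateSchemas: true,
  checkResultUsage: true,
  detectHallucination: true,
  strict: false,
};

// Hypothetical helper: caller-supplied fields override the defaults.
function resolveOptions(options: ValidateOptions = {}): Required<ValidateOptions> {
  return { ...DEFAULT_OPTIONS, ...options };
}

// Invented penalties; the package's actual weighting is not documented here.
const PENALTY: Record<Severity, number> = { low: 0.05, medium: 0.15, high: 0.3, critical: 0.6 };

// Hypothetical scoring: subtract penalties from 1.0, clamp at 0.0,
// and drop straight to 0.0 in strict mode when a high/critical issue exists.
function scoreSketch(severities: Severity[], options?: ValidateOptions): number {
  const { strict } = resolveOptions(options);
  const hasSevere = severities.some((s) => s === 'high' || s === 'critical');
  if (strict && hasSevere) return 0;
  const total = severities.reduce((sum, s) => sum + PENALTY[s], 0);
  return Math.max(0, 1 - total);
}

const lenient = scoreSketch(['high']);                 // above 0 with these penalties
const strictScore = scoreSketch(['high'], { strict: true }); // 0 because strict mode short-circuits
```

Under these assumptions a single high-severity issue only lowers the score in the default mode, while the same issue zeroes it when `strict` is set.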

SchemaValidationResult

interface SchemaValidationResult {
  valid: boolean;
  issues: SchemaIssue[];
  score: number;
}

interface SchemaIssue {
  type: string;         // e.g., 'missing_arguments', 'type_error', 'invalid_format', 'required_field_missing'
  severity: 'low' | 'medium' | 'high' | 'critical';
  path: string;         // dot-notation path to the problematic parameter
  message: string;
  expected?: unknown;
  actual?: unknown;
}
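
To show what a dot-notation `path` might look like for nested arguments, here is a small recursive type check that emits SchemaIssue-shaped objects. It is a sketch, not the package's validateSchema: `checkValue`, `kindOf`, and `joinPath` are hypothetical helpers, it only checks types, and rendering array indices as `tags.1` is an assumption about the path format.

```typescript
interface ParameterSchema {
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  items?: ParameterSchema;
  properties?: Record<string, ParameterSchema>;
}

interface SchemaIssue {
  type: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  path: string;
  message: string;
  expected?: unknown;
  actual?: unknown;
}

// Maps a runtime value onto the ParameterSchema type vocabulary.
function kindOf(value: unknown): string {
  if (Array.isArray(value)) return 'array';
  if (value === null) return 'null';
  return typeof value;
}

// Builds "parent.child" paths, leaving the root path empty.
function joinPath(parent: string, key: string | number): string {
  return parent === '' ? String(key) : `${parent}.${key}`;
}

// Hypothetical type-only check that records one issue per mismatching value.
function checkValue(value: unknown, schema: ParameterSchema, path: string, issues: SchemaIssue[]): void {
  const actual = kindOf(value);
  if (actual !== schema.type) {
    issues.push({
      type: 'type_error',
      severity: 'high',
      path,
      message: `Expected ${schema.type} at ${path || '(root)'}, got ${actual}`,
      expected: schema.type,
      actual,
    });
    return; // do not descend into a value of the wrong type
  }
  if (schema.type === 'object' && schema.properties) {
    const obj = value as Record<string, unknown>;
    for (const key in schema.properties) {
      const child = schema.properties[key];
      if (child && key in obj) checkValue(obj[key], child, joinPath(path, key), issues);
    }
  }
  const items = schema.items;
  if (schema.type === 'array' && items) {
    (value as unknown[]).forEach((item, i) => checkValue(item, items, joinPath(path, i), issues));
  }
}

const found: SchemaIssue[] = [];
checkValue(
  { customer: { email: 42 }, tags: ['a', 7] },
  {
    type: 'object',
    properties: {
      customer: { type: 'object', properties: { email: { type: 'string' } } },
      tags: { type: 'array', items: { type: 'string' } },
    },
  },
  '',
  found,
);
```

With this input, `found` holds two issues whose paths are `customer.email` and `tags.1`.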

ResultVerificationResult

interface ResultVerificationResult {
  valid: boolean;
  issues: ResultIssue[];
  score: number;
  hallucinated: boolean;  // true if hallucination score exceeds threshold
  integrated: boolean;    // true if result values appear in the agent response
}

interface ResultIssue {
  type: ResultIssueType;
  severity: 'low' | 'medium' | 'high' | 'critical';
  description: string;
  turnId?: number;
  toolName?: string;
  details?: Record<string, unknown>;
}

VerifyOptions

interface VerifyOptions {
  checkUsage?: boolean;             // default: true — verify result usage in response
  detectHallucination?: boolean;    // default: true — detect fabricated result content
  checkContradictions?: boolean;    // default: true — catch result/response contradictions
  hallucinationThreshold?: number;  // default: 0.3 — score above this triggers hallucinated flag
}
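
The threshold comment above suggests a simple comparison, sketched here. `isHallucinatedSketch` is a hypothetical helper, the hallucination score is taken as an input because this README does not describe how it is computed, and treating `detectHallucination: false` as "never flagged" is an assumption.

```typescript
interface VerifyOptions {
  checkUsage?: boolean;
  detectHallucination?: boolean;
  checkContradictions?: boolean;
  hallucinationThreshold?: number;
}

// Defaults as listed in the interface comments.
const VERIFY_DEFAULTS: Required<VerifyOptions> = {
  checkUsage: true,
  detectHallucination: true,
  checkContradictions: true,
  hallucinationThreshold: 0.3,
};

// Hypothetical helper: flags a call when its score is strictly above the threshold.
function isHallucinatedSketch(hallucinationScore: number, options: VerifyOptions = {}): boolean {
  const resolved = { ...VERIFY_DEFAULTS, ...options };
  // Assumption: disabling detection means the flag is never raised.
  if (!resolved.detectHallucination) return false;
  return hallucinationScore > resolved.hallucinationThreshold;
}

const atThreshold = isHallucinatedSketch(0.3);                                   // false: not above 0.3
const aboveThreshold = isHallucinatedSketch(0.31);                               // true
const relaxed = isHallucinatedSketch(0.5, { hallucinationThreshold: 0.6 });      // false
```

Raising `hallucinationThreshold` makes the flag less sensitive, which may suit agents that paraphrase tool output heavily.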

Enums

ToolUseIssueType (13 values)

| Value | Severity | Description |
|-------|----------|-------------|
| missing_tool_name | critical | Tool call has no name field |
| missing_arguments | high | Tool call has no arguments field |
| invalid_arguments | medium | Argument value not in allowed enum |
| tool_not_found | high | Tool name not in provided schemas |
| tool_misuse | medium | Tool used incorrectly for the context |
| missing_result | medium | Tool was called but no result returned |
| result_unused | low | Tool result fields not found in agent response |
| hallucinated_result | high | Agent response references data not in the actual tool result |
| schema_violation | high | Arguments fail schema-level validation |
| type_mismatch | high | Argument type does not match schema (e.g., string for number) |
| missing_required_param | high | Required parameter missing from arguments |
| unknown_tool | high/medium | Tool name not recognized; severity depends on strict mode |
| deprecated_tool | medium | Tool is marked as deprecated; suggestion includes replacement |
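
Because every issue carries one of four severities, callers can triage a result by rank. The helper below is a hypothetical sketch (`atLeast`, `SEVERITY_RANK`, and the sample issues are not part of the package); it keeps issues at or above a minimum severity and orders them most severe first.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';

// Minimal stand-in for ToolUseIssue with only the fields used here.
interface IssueLike {
  type: string;
  severity: Severity;
  description: string;
}

// Numeric rank so severities can be compared and sorted.
const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Hypothetical triage helper: filter by minimum severity, most severe first.
// filter() returns a new array, so the input is left untouched by sort().
function atLeast(issues: IssueLike[], min: Severity): IssueLike[] {
  return issues
    .filter((issue) => SEVERITY_RANK[issue.severity] >= SEVERITY_RANK[min])
    .sort((a, b) => SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity]);
}

// Sample issues using types and severities from the table above.
const sampleIssues: IssueLike[] = [
  { type: 'result_unused', severity: 'low', description: 'Tool result fields not found in agent response' },
  { type: 'type_mismatch', severity: 'high', description: 'Argument type does not match schema' },
  { type: 'deprecated_tool', severity: 'medium', description: 'Tool is marked as deprecated' },
  { type: 'missing_tool_name', severity: 'critical', description: 'Tool call has no name field' },
];

const urgent = atLeast(sampleIssues, 'high');
```

Here `urgent` contains `missing_tool_name` followed by `type_mismatch`, which could be used, for example, to fail a CI check only on high or critical findings.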

ResultIssueType (8 values)

| Value | Severity | Description |
|-------|----------|-------------|
| missing_result | medium | Tool call has no result object |
| empty_result | low | Tool returned an empty result ({}) |
| error_result | high | Result status is 'error' |
| hallucinated_content | high | Response contains fabricated data not in the result |
| unused_result | medium | Result values not referenced in agent response |
| contradicts_response | high | Result indicates success but response says failure (or vice versa) |
| incomplete_integration | medium | Only partial result data used in response |
| malformed_result | high | Result structure is unexpected or invalid |

Related Packages

| Package | Description |
|---------|-------------|
| @reaatech/agent-eval-harness-types | Shared domain types and schemas |
| @reaatech/agent-eval-harness-trajectory | Trajectory evaluation |
| @reaatech/agent-eval-harness-tool-use | Tool-use validation |
| @reaatech/agent-eval-harness-cost | Cost tracking |
| @reaatech/agent-eval-harness-latency | Latency monitoring |
| @reaatech/agent-eval-harness-judge | LLM-as-judge |
| @reaatech/agent-eval-harness-golden | Golden trajectories |
| @reaatech/agent-eval-harness-suite | Suite runner |
| @reaatech/agent-eval-harness-gate | CI gates |
| @reaatech/agent-eval-harness-mcp-server | MCP server |
| @reaatech/agent-eval-harness-cli | CLI |
| @reaatech/agent-eval-harness-observability | Observability |

License

MIT