# Scenario

A powerful TypeScript library for testing AI agents in realistic, scripted scenarios.
Scenario provides a declarative DSL for defining test cases, allowing you to control conversation flow, simulate user behavior, and evaluate agent performance against predefined criteria.
## Features
- **Declarative DSL**: Write clear and concise tests with a simple, powerful scripting language.
- **Realistic Simulation**: Use the `userSimulatorAgent` to generate natural user interactions.
- **Automated Evaluation**: Employ the `judgeAgent` to automatically assess conversations against success criteria.
- **Flexible & Extensible**: Easily integrate any AI agent that conforms to a simple `AgentAdapter` interface.
- **Detailed Reporting**: Get rich results including conversation history, success/failure reasoning, and performance metrics.
- **TypeScript First**: Full type safety and autocompletion in your editor.
## Installation
```bash
pnpm add @langwatch/scenario
# or
npm install @langwatch/scenario
# or
yarn add @langwatch/scenario
```

## Quick Start
Create your first scenario test in under a minute.
```typescript
// echo.test.ts
import scenario, { type AgentAdapter, AgentRole } from "@langwatch/scenario";

// 1. Create an adapter for your agent
const echoAgent: AgentAdapter = {
  role: AgentRole.AGENT,
  call: async (input) => {
    // This agent simply echoes back the last message content
    const lastMessage = input.messages[input.messages.length - 1];
    return `You said: ${lastMessage.content}`;
  },
};

// 2. Define and run your scenario
async function testEchoAgent() {
  const result = await scenario.run({
    name: "Echo Agent Test",
    description: "The agent should echo back the user's message.",
    agents: [echoAgent],
    script: [
      scenario.user("Hello world!"),
      scenario.agent("You said: Hello world!"), // You can assert the agent's response directly
      scenario.succeed("Agent correctly echoed the message."),
    ],
  });

  if (result.success) {
    console.log("✅ Scenario passed!");
  } else {
    console.error(`❌ Scenario failed: ${result.reasoning}`);
  }
}

testEchoAgent();
```

## Usage with a Test Runner
Scenario integrates seamlessly with test runners like Vitest or Jest. Here's a more advanced example testing an AI-powered weather agent.
```typescript
// weather.test.ts
import { describe, it, expect } from "vitest";
import { openai } from "@ai-sdk/openai";
import scenario, { type AgentAdapter, AgentRole } from "@langwatch/scenario";
import { generateText, tool } from "ai";
import { z } from "zod/v4";

describe("Weather Agent", () => {
  it("should get the weather for a city", async () => {
    // 1. Define the tools your agent can use
    const getCurrentWeather = tool({
      description: "Get the current weather in a given city.",
      parameters: z.object({
        city: z.string().describe("The city to get the weather for."),
      }),
      execute: async ({ city }) =>
        `The weather in ${city} is cloudy with a temperature of 24°C.`,
    });

    // 2. Create an adapter for your agent
    const weatherAgent: AgentAdapter = {
      role: AgentRole.AGENT,
      call: async (input) => {
        const response = await generateText({
          model: openai("gpt-4.1"),
          system: `You are a helpful assistant that may help the user with weather information.`,
          messages: input.messages,
          tools: { get_current_weather: getCurrentWeather },
        });

        if (response.toolCalls && response.toolCalls.length > 0) {
          const toolCall = response.toolCalls[0];
          // Agent executes the tool directly and returns both messages
          const toolResult = await getCurrentWeather.execute(
            toolCall.input as { city: string },
            {
              toolCallId: toolCall.toolCallId,
              messages: input.messages,
            }
          );
          return [
            {
              role: "assistant",
              content: [
                {
                  type: "tool-call",
                  toolName: toolCall.toolName,
                  toolCallId: toolCall.toolCallId,
                  input: toolCall.input,
                },
              ],
            },
            {
              role: "tool",
              content: [
                {
                  type: "tool-result",
                  toolName: toolCall.toolName,
                  toolCallId: toolCall.toolCallId,
                  output: { type: "text", value: toolResult as string },
                },
              ],
            },
          ];
        }
        return response.text;
      },
    };

    // 3. Define and run your scenario
    const result = await scenario.run({
      name: "Checking the weather",
      description:
        "The user asks for the weather in a specific city, and the agent should use the weather tool to find it.",
      agents: [
        weatherAgent,
        scenario.userSimulatorAgent({ model: openai("gpt-4.1") }),
      ],
      script: [
        scenario.user("What's the weather like in Barcelona?"),
        scenario.agent(),
        // You can use inline assertions within your script
        (state) => {
          expect(state.hasToolCall("get_current_weather")).toBe(true);
        },
        scenario.succeed("Agent correctly used the weather tool."),
      ],
    });

    // 4. Assert the final result
    expect(result.success).toBe(true);
  }, 30_000); // 30s test timeout
});
```

## Core Concepts
### `run(config)`

The main function to execute a scenario. It takes a configuration object and returns a promise that resolves with the final `ScenarioResult`.
### `ScenarioConfig`

The configuration object for a scenario.

- `name: string`: A human-readable name for the scenario.
- `description: string`: A detailed description of what the scenario tests.
- `agents: AgentAdapter[]`: A list of agents participating in the scenario.
- `script?: ScriptStep[]`: An optional array of steps to control the scenario flow. If not provided, the scenario proceeds automatically.
- `maxTurns?: number`: The maximum number of conversation turns before a timeout. Defaults to 10.
- `verbose?: boolean`: Enables detailed logging during execution.
- `setId?: string`: (Optional) Groups related scenarios into a test suite ("Simulation Set"). Useful for organizing and tracking scenarios in the UI and reporting. If not provided, the scenario is not grouped into a set.
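Putting these options together, here is a sketch of a `run` call that sets each of them; `myAgent` is a placeholder for your own `AgentAdapter`:

```typescript
// A sketch exercising every documented option; `myAgent` stands in for your own AgentAdapter.
const result = await scenario.run({
  name: "Checkout flow",
  description: "The user tries to complete a purchase with a saved card.",
  agents: [myAgent, scenario.userSimulatorAgent()],
  maxTurns: 8, // cut the conversation off after 8 turns instead of the default 10
  verbose: true, // log execution details
  setId: "checkout-suite", // group with related scenarios in reporting
  // No `script` here, so the scenario proceeds automatically.
});
```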
### Agents

Agents are the participants in a scenario. They are defined by the `AgentAdapter` interface.
```typescript
export interface AgentAdapter {
  role: AgentRole; // USER, AGENT, or JUDGE
  call: (input: AgentInput) => Promise<AgentReturnTypes>;
}
```

Scenario provides built-in agents for common testing needs:
- `userSimulatorAgent(config)`: Simulates a human user, generating realistic messages based on the scenario description.
- `judgeAgent(config)`: Evaluates the conversation against a set of criteria and determines if the scenario succeeds or fails.
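Both built-in agents can run alongside the agent under test. A minimal sketch, assuming `judgeAgent` accepts a list of success criteria (the exact option shape is not documented above, and `myAgent` is a placeholder):

```typescript
import { openai } from "@ai-sdk/openai";
import scenario from "@langwatch/scenario";

const agents = [
  myAgent, // your own AgentAdapter under test
  scenario.userSimulatorAgent({ model: openai("gpt-4.1") }),
  // Assumption: judgeAgent takes a `criteria` array; only its purpose is documented above.
  scenario.judgeAgent({
    model: openai("gpt-4.1"),
    criteria: ["The agent answers the user's question without inventing data."],
  }),
];
```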
### Scripting

Scripts provide fine-grained control over the scenario's execution. A script is an array of `ScriptStep` functions.

A `ScriptStep` is a function that receives the current `ScenarioExecutionState` and the `ScenarioExecutionLike` context.

Built-in script steps:

- `user(content?)`: A user turn. If `content` is provided, it's used as the message. Otherwise, the `userSimulatorAgent` generates one.
- `agent(content?)`: An agent turn. If `content` is provided, it's used as the message. Otherwise, the agent under test generates a response.
- `judge(content?)`: Forces the `judgeAgent` to make a decision.
- `message(message)`: Adds a specific `CoreMessage` to the conversation.
- `proceed(turns?, onTurn?, onStep?)`: Lets the scenario run automatically.
- `succeed(reasoning?)`: Ends the scenario with a success verdict.
- `fail(reasoning?)`: Ends the scenario with a failure verdict.
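A script can mix scripted turns with automatic ones. A minimal sketch using only the steps listed above:

```typescript
const script = [
  scenario.message({ role: "system", content: "The user is a returning customer." }),
  scenario.user(), // the userSimulatorAgent generates the opening message
  scenario.agent(), // the agent under test responds
  scenario.proceed(4), // let up to 4 more turns run automatically
  scenario.judge(), // force the judgeAgent to reach a verdict now
];
```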
You can also provide your own functions as script steps for making assertions:
```typescript
import { expect } from "vitest";
import scenario from "@langwatch/scenario";

const script = [
  scenario.user("Hello"),
  scenario.agent(),
  (state) => {
    // Make assertions on the state
    expect(state.lastAssistantMessage?.content).toContain("Hi there");
  },
  scenario.succeed(),
];
```

## Configuration
You can configure project-wide defaults by creating a `scenario.config.js` or `scenario.config.mjs` file in your project root.
```typescript
// scenario.config.mjs
import { defineConfig } from "@langwatch/scenario/config";
import { openai } from "@ai-sdk/openai";

export default defineConfig({
  // Set a default model provider for all agents (e.g., userSimulatorAgent, judgeAgent)
  defaultModel: {
    model: openai("gpt-4o-mini"),
    temperature: 0.1,
  },
});
```

The library will automatically load this configuration.
### All Configuration Options

The following configuration options are all optional. You can specify any combination of them in your `scenario.config.js` file.

- `defaultModel` (Optional): An object to configure the default AI model for all agents.
  - `model`: (Required if `defaultModel` is set) An instance of a language model from a provider like `@ai-sdk/openai`.
  - `temperature` (Optional): The default temperature for the model (e.g., `0.1`).
  - `maxTokens` (Optional): The default maximum number of tokens for the model to generate.
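For example, a `scenario.config.mjs` that sets all three `defaultModel` options:

```typescript
// scenario.config.mjs — every documented defaultModel option set at once
import { defineConfig } from "@langwatch/scenario/config";
import { openai } from "@ai-sdk/openai";

export default defineConfig({
  defaultModel: {
    model: openai("gpt-4o-mini"), // required once defaultModel is present
    temperature: 0.1,
    maxTokens: 1024,
  },
});
```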
### Environment Variables

You can control the library's behavior with the following environment variables:

- `LOG_LEVEL`: Sets the verbosity of the internal logger. Can be `error`, `warn`, `info`, or `debug`. By default, logging is silent.
- `SCENARIO_DISABLE_SIMULATION_REPORT_INFO`: Set to `true` to disable the "Scenario Simulation Reporting" banner that is printed to the console when a test run starts.
- `LANGWATCH_API_KEY`: Your LangWatch API key. Used as a fallback if `langwatchApiKey` is not set in your config file.
- `LANGWATCH_ENDPOINT`: The LangWatch reporting endpoint. Used as a fallback if `langwatchEndpoint` is not set in your config file.
### Grouping Scenarios with `setId`

You can group related scenarios into a set ("Simulation Set") by providing the `setId` option. This is useful for organizing your scenarios in the UI and for reporting in LangWatch.
```typescript
const result = await scenario.run({
  name: "my first scenario",
  description: "A simple test to see if the agent responds.",
  setId: "my-test-suite", // Group this scenario into a set
  agents: [myAgent, scenario.userSimulatorAgent()],
});
```

This will group all scenarios with the same `setId` together in the LangWatch UI and reporting tools.
## Vitest Integration

Scenario provides a convenient helper function to enhance your Vitest configuration with all the necessary setup files.

### Using the `withScenario` Helper (Recommended)
```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";
import { withScenario } from "@langwatch/scenario/integrations/vitest/config";
import VitestReporter from "@langwatch/scenario/integrations/vitest/reporter";

export default withScenario(
  defineConfig({
    test: {
      testTimeout: 180000, // 3 minutes, or however long you want to wait for the scenario to run
      // Your existing setup files will be preserved and run after Scenario's setup
      setupFiles: ["./my-custom-setup.ts"],
      // Your existing global setup files will be preserved and run after Scenario's global setup
      globalSetup: ["./my-global-setup.ts"],
      // Optional: Add the Scenario reporter for detailed test reports
      reporters: ["default", new VitestReporter()],
    },
  })
);
```

The `withScenario` helper automatically:
- Adds Scenario's setup files for event logging
- Adds Scenario's global setup files
- Preserves any existing setup configuration you have
- Handles both string and array configurations for setup files
### Manual Configuration
If you prefer to configure Vitest manually, you can add the Scenario setup files directly:
```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";
import VitestReporter from "@langwatch/scenario/integrations/vitest/reporter";

export default defineConfig({
  test: {
    testTimeout: 180000, // 3 minutes, or however long you want to wait for the scenario to run
    setupFiles: ["@langwatch/scenario/integrations/vitest/setup"],
    // Optional: Add the Scenario reporter for detailed test reports
    reporters: ["default", new VitestReporter()],
  },
});
```

In this configuration:

- The `setupFiles` entry enables Scenario's event logging for each test.
- The custom `VitestReporter` provides detailed scenario test reports in your test output (optional).
## Development

This project uses pnpm for package management.

### Getting Started
```bash
# Install dependencies
pnpm install

# Build the project
pnpm run build

# Run tests
pnpm test
```

## License
MIT
## `SCENARIO_BATCH_RUN_ID`

When running scenario tests, you can set the `SCENARIO_BATCH_RUN_ID` environment variable to uniquely identify a batch of test runs. This is especially useful for grouping results in reporting tools and CI pipelines.
Example:
```bash
SCENARIO_BATCH_RUN_ID=my-ci-run-123 pnpm test
```

If you use the provided test script, a unique batch run ID is generated automatically for each run.
