@goyamegh/agent-health

v0.5.23


Agent Evaluation and Observability Framework



What is Agent Health?

Agent Health is an evaluation and observability framework for AI agents, built on OpenSearch. It helps you measure agent performance through "Golden Path" trajectory comparison — where an LLM judge evaluates agent actions against expected outcomes — and provides deep observability into agent execution via OpenTelemetry traces.
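
For intuition about what comparing a trajectory against a "Golden Path" involves, here is a minimal sketch. It is a deliberately simplified, deterministic stand-in and not Agent Health's LLM judge: it only measures how far into the expected step sequence the agent got, tolerating extra steps in between. The `Step` type, the tool names, and `goldenPathCoverage` are all hypothetical.

```typescript
// Hypothetical type: Agent Health's real trajectory schema may differ.
interface Step {
  tool: string; // name of the tool or action the agent invoked
}

// Walks the actual trajectory once and advances through the expected steps
// in order. Returns the fraction of expected steps reached before the first
// expected step that never shows up (extra actual steps are ignored).
function goldenPathCoverage(expected: Step[], actual: Step[]): number {
  if (expected.length === 0) return 1;
  let matched = 0;
  for (const step of actual) {
    const next = expected[matched];
    if (next !== undefined && step.tool === next.tool) {
      matched++;
    }
  }
  return matched / expected.length;
}

const expectedPath: Step[] = [
  { tool: "search_logs" },
  { tool: "query_metrics" },
  { tool: "summarize" },
];
const actualPath: Step[] = [
  { tool: "search_logs" },
  { tool: "list_indices" },
  { tool: "summarize" },
];

// The agent never called query_metrics, so only the first expected step counts.
console.log(goldenPathCoverage(expectedPath, actualPath).toFixed(2)); // 0.33
```

A score like this can flag obvious deviations cheaply, but it cannot tell whether an alternative step was an acceptable substitute, which is the kind of judgement an LLM judge is meant to supply.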

Who uses Agent Health:

  • AI teams building autonomous agents (RCA, customer support, data analysis, retrieval/discovery)
  • Teams comparing coding agents and multi-agent workflows across models, prompts, and context strategies
  • QA engineers testing agent behavior across scenarios
  • Platform teams monitoring agent performance in production
  • Developers using AI coding agents who want visibility into usage, costs, and productivity

See it in action: Watch the demo video on YouTube


AI Agent Skills

Agent Health ships with built-in skill files for Claude Code and Kiro that teach your AI coding agent how to work with this project effectively. Copy the relevant directory into your workspace to unlock project-aware assistance:

| Skill | Claude Code | Kiro | What it does |
|-------|-------------|------|--------------|
| Add Connector | .claude/skills/add-connector/SKILL.md | .kiro/steering/add-connector.md | Guides creation of custom agent connectors |
| Write Test | .claude/skills/write-test/SKILL.md | .kiro/steering/write-test.md | Project test conventions, mocking patterns, coverage thresholds |
| Create PR | .claude/skills/create-pr/SKILL.md | .kiro/steering/create-pr.md | PR workflow with DCO signoff and CI compliance |
| Config & Auth | .claude/skills/config-auth/SKILL.md | — | Config loading, AWS auth, multi-profile setup |
| Instrument with OTel | .claude/skills/instrument-otel/SKILL.md | — | OpenTelemetry GenAI span structure + config for Agent Health |
| Agent Health | .claude/skills/agent-health/SKILL.md | .kiro/steering/agent-health.md | Evaluate, benchmark & improve agents with the agent-health CLI/APIs |

To use these skills:

  • Claude Code — Skills in .claude/skills/ are auto-discovered when the directory exists in your workspace root. No extra setup needed.
  • Kiro — Copy .kiro/steering/ to your workspace root. Kiro loads steering files automatically.

Installation

Get Agent Health running in minutes. Choose the option that best suits your needs:

Option 1: NPX (Fastest — No Setup)

# Start Agent Health with demo data (no configuration needed)
npx @opensearch-project/agent-health

Opens http://localhost:4001 with pre-loaded sample data for exploration. If port 4001 is already in use, the server automatically tries the next available port (4002, 4003, etc., up to 10 attempts).
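
The port fallback described above can be pictured roughly as follows. This is a sketch of the general technique, not the server's actual code: `pickPort` and the `PortProbe` callback are hypothetical, the probe is synchronous to keep the example short (a real probe would try to bind the port asynchronously), and it assumes the first port counts as one of the ten attempts.

```typescript
// Hypothetical probe: returns true when the given port can be bound.
type PortProbe = (port: number) => boolean;

// Try basePort, basePort + 1, ... and return the first free one.
// Throws once maxAttempts ports have been tried without success.
function pickPort(basePort: number, isFree: PortProbe, maxAttempts = 10): number {
  for (let offset = 0; offset < maxAttempts; offset++) {
    const port = basePort + offset;
    if (isFree(port)) return port;
  }
  const lastPort = basePort + maxAttempts - 1;
  throw new Error(`No free port between ${basePort} and ${lastPort}`);
}

// Simulate ports 4001 and 4002 already being in use.
const busyPorts = new Set([4001, 4002]);
console.log(pickPort(4001, (port) => !busyPorts.has(port))); // 4003
```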

Option 2: Docker Compose

For the full observability stack with OpenSearch, OpenTelemetry Collector, and Data Prepper for trace ingestion:

Quick start (one command):

curl -fsSL https://raw.githubusercontent.com/opensearch-project/agent-health/main/scripts/install.sh | bash

This clones the repo, starts the Docker stack, waits for OpenSearch, auto-configures agent-health.config.json, and launches Agent Health.

Manual setup:

# Clone the repository
git clone https://github.com/opensearch-project/agent-health.git
cd agent-health

# Start the OpenSearch observability stack
docker compose up -d

# Copy Docker environment configuration
cp .env.docker .env

# Start Agent Health (connects to local OpenSearch automatically)
npx @opensearch-project/agent-health

This brings up:

  • OpenSearch — Stores traces, test cases, benchmarks, and evaluation results
  • OpenTelemetry Collector — Receives telemetry data via OTLP (ports 4317/4318)
  • Data Prepper — Transforms and enriches traces before OpenSearch ingestion

Prerequisites: Docker Desktop with 4GB+ memory allocated. See docker-compose.yml for configuration options.
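
To make the OTLP ingestion path more concrete, the sketch below builds a minimal OTLP/HTTP JSON trace body containing one span. The envelope (`resourceSpans`, `scopeSpans`, hex-encoded IDs, nanosecond timestamps as strings) follows the OTLP JSON encoding, and the `gen_ai.*` attribute keys come from the OpenTelemetry GenAI semantic conventions. Which attributes Agent Health actually reads is not covered here, so treat the attribute set, the service name, and the helper names `randomHex` and `buildOtlpTraceBody` as assumptions.

```typescript
// Random lowercase hex string. OTLP uses 32 hex chars for trace IDs
// and 16 hex chars for span IDs.
function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Build an OTLP/HTTP JSON body that carries a single client span.
function buildOtlpTraceBody(serviceName: string, spanName: string, model: string) {
  const endMs = Date.now();
  const startMs = endMs - 250; // pretend the call took 250 ms
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: "service.name", value: { stringValue: serviceName } }],
        },
        scopeSpans: [
          {
            scope: { name: "manual-example" },
            spans: [
              {
                traceId: randomHex(32),
                spanId: randomHex(16),
                name: spanName,
                kind: 3, // SPAN_KIND_CLIENT
                // Millisecond timestamps padded to nanoseconds, sent as strings.
                startTimeUnixNano: `${startMs}000000`,
                endTimeUnixNano: `${endMs}000000`,
                attributes: [
                  { key: "gen_ai.operation.name", value: { stringValue: "chat" } },
                  { key: "gen_ai.request.model", value: { stringValue: model } },
                ],
              },
            ],
          },
        ],
      },
    ],
  };
}

const traceBody = buildOtlpTraceBody("my-agent", "chat claude-sonnet-4", "claude-sonnet-4");
console.log(JSON.stringify(traceBody, null, 2));
```

Assuming the collector's OTLP/HTTP receiver is listening on its standard path, this body could be sent as `application/json` in a POST to http://localhost:4318/v1/traces. In practice an OpenTelemetry SDK exporter would normally produce and send this payload for you.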

Option 3: AWS CloudFormation (Managed OpenSearch)

Deploy a fully managed observability backend using the included CloudFormation template:

aws cloudformation create-stack \
  --stack-name AgentHealthObservability \
  --template-body file://deployment/cloudformation/agent-health-observability.yaml \
  --capabilities CAPABILITY_NAMED_IAM

This deploys:

  • Amazon OpenSearch Service domain or OpenSearch Serverless collection for trace storage
  • OpenSearch Ingestion (OSIS) pipeline for OTLP data collection
  • IAM roles for pipeline execution and agent telemetry ingestion

Both Amazon OpenSearch Service domains and OpenSearch Serverless collections are supported. Set OPENSEARCH_STORAGE_AWS_SERVICE=es for managed domains or OPENSEARCH_STORAGE_AWS_SERVICE=aoss for Serverless collections. Both use SigV4 authentication (OPENSEARCH_STORAGE_AUTH_TYPE=sigv4). See docs/CONFIGURATION.md for details.
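
As an illustration of how the two settings fit together, the sketch below checks an environment map for a valid combination. Only the variable names and the values `es`, `aoss`, and `sigv4` come from the text above; the `resolveAwsStorage` helper and its error messages are hypothetical and do not reflect how Agent Health itself loads configuration.

```typescript
type AwsService = "es" | "aoss";
type Env = Record<string, string | undefined>;

// Validate the AWS storage settings and return them in typed form.
function resolveAwsStorage(env: Env): { service: AwsService; authType: "sigv4" } {
  const service = env["OPENSEARCH_STORAGE_AWS_SERVICE"];
  if (service !== "es" && service !== "aoss") {
    throw new Error(
      `OPENSEARCH_STORAGE_AWS_SERVICE must be "es" or "aoss", got ${String(service)}`
    );
  }
  const authType = env["OPENSEARCH_STORAGE_AUTH_TYPE"];
  if (authType !== "sigv4") {
    throw new Error(
      `OPENSEARCH_STORAGE_AUTH_TYPE must be "sigv4" for AWS, got ${String(authType)}`
    );
  }
  return { service, authType: "sigv4" };
}

// Example: settings for an OpenSearch Serverless collection.
const serverlessEnv: Env = {
  OPENSEARCH_STORAGE_AWS_SERVICE: "aoss",
  OPENSEARCH_STORAGE_AUTH_TYPE: "sigv4",
};
console.log(resolveAwsStorage(serverlessEnv).service); // aoss
```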

After deployment, connect it to Agent Health:

npx @opensearch-project/agent-health configure --from-stack AgentHealthObservability

Or manually copy the AgentHealthConfigJSON stack output into your agent-health.config.json. See deployment/cloudformation/ for details and regional Launch Stack URLs.


Features

Agent Evaluation & Observability

| Feature | Description |
|---------|-------------|
| Evals | Real-time agent evaluation with trajectory streaming |
| Experiments | Batch evaluation runs with configurable parameters |
| Compare | Side-by-side trace comparison with aligned and merged views |
| Agent Traces | Table-based trace view with latency histogram, filtering, and detailed flyout |
| Live Traces | Real-time trace monitoring with auto-refresh and filtering |
| Trace Views | Timeline and Flow visualizations for debugging |
| Reports | Evaluation reports with LLM judge reasoning |
| Connectors | Pluggable protocol adapters (AG-UI SSE, REST, CLI, Claude Code) |

Coding Agent Analytics

A unified dashboard for monitoring AI coding agent usage across Claude Code, Kiro, and Codex CLI. Zero configuration — just run agent-health and it auto-detects installed agents.

  • Multi-agent dashboard: Session history, cost estimation, tool usage, activity patterns, and efficiency metrics
  • 9 analytics tabs: Overview, Sessions, Projects, Costs, Activity, Efficiency, Tools, Advanced, and Workspace management
  • Interactive drill-downs: Click any chart, card, or metric to drill into filtered session views
  • Workspace management: View and edit Claude Code memory files, plans, tasks; browse Kiro MCP servers, agents, and extensions
  • Privacy-first: All data stays local — reads directly from ~/.claude/, ~/.kiro/, ~/.codex/

Full Coding Agent Analytics documentation

Supported Connectors

| Connector | Protocol | Description |
|-----------|----------|-------------|
| agui-streaming | AG-UI SSE | ML-Commons agents (default) |
| rest | HTTP POST | Non-streaming REST APIs |
| openai-compatible | OpenAI Chat | LiteLLM, Ollama, vLLM |
| strands | Bedrock Agent Runtime | Amazon Strands agents (server-only) |
| langgraph | LangGraph REST | Non-AG-UI LangGraph instances |
| subprocess | CLI | Command-line tools |
| claude-code | Claude CLI | Claude Code agent comparison |
| kiro | Kiro CLI | Kiro coding agent |
| pi | Pi CLI | Pi coding agent |
| mock | In-memory | Demo and testing |

For creating custom connectors, see docs/CONNECTORS.md.
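
The general idea of pluggable connectors, where a connector type name selects the adapter that talks to the agent, might look something like the sketch below. Every name here (`AgentRequest`, `Connector`, `ConnectorRegistry`, `echoConnector`) is hypothetical and chosen for illustration only; the real connector interface is the one documented in docs/CONNECTORS.md.

```typescript
// Hypothetical shapes for illustration; not the package's actual interface.
interface AgentRequest {
  endpoint: string;
  prompt: string;
}

interface Connector {
  type: string; // e.g. "rest", "subprocess", "mock"
  send(request: AgentRequest): Promise<string>;
}

// Maps a connector type name to the adapter that handles it.
class ConnectorRegistry {
  private connectors = new Map<string, Connector>();

  register(connector: Connector): void {
    this.connectors.set(connector.type, connector);
  }

  get(type: string): Connector {
    const connector = this.connectors.get(type);
    if (!connector) throw new Error(`Unknown connector type: ${type}`);
    return connector;
  }
}

// In-memory connector, loosely analogous to the "mock" row in the table above.
const echoConnector: Connector = {
  type: "mock",
  async send(request) {
    return `echo: ${request.prompt}`;
  },
};

const connectorRegistry = new ConnectorRegistry();
connectorRegistry.register(echoConnector);
connectorRegistry
  .get("mock")
  .send({ endpoint: "", prompt: "ping" })
  .then((reply) => console.log(reply)); // echo: ping
```

Keeping the lookup behind a registry means the caller only needs the type string from the agent's configuration, and new protocols can be added without touching the evaluation code.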

Observio Sample Agent

Agent Health includes Observio, a reference ReAct agent you can use as a practice target for evaluating and improving agent performance:

cd observio-sample-agent && npm install && npm run start:ag-ui
npx @opensearch-project/agent-health run -t demo-otel-001 -a observio

See the Observio README for details.


Architecture

Agent Health uses a client-server architecture where all clients (UI, CLI) access OpenSearch through a unified HTTP API. The server handles agent communication via pluggable connectors and proxies LLM judge calls to AWS Bedrock.

For detailed architecture documentation, see docs/ARCHITECTURE.md.


Quick Configuration

Agent Health works out-of-the-box with demo data. Configure when you're ready to connect your own agent:

# Generate a config file with examples
npx @opensearch-project/agent-health init

// agent-health.config.ts
export default {
  agents: [
    {
      key: "my-agent",
      name: "My Agent",
      endpoint: "http://localhost:8000/agent",
      connectorType: "rest",  // or "agui-streaming", "langgraph", "strands", "subprocess"
      models: ["claude-sonnet-4"],
      useTraces: true,        // Enable OpenTelemetry trace collection (default: false)
    }
  ],
};
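
If you keep the config in TypeScript, a local type can catch typos before the server starts. The sketch below is inferred purely from the fields in the example above: the `AgentEntry` type, the `ConnectorType` union (limited to the values named in the example's comment), and the `checkAgents` helper are hypothetical and are not the package's published typings. The assumption that agent keys must be unique is also mine.

```typescript
// Hypothetical types inferred from the example config; the package may
// accept more connector types and more fields than are listed here.
type ConnectorType = "rest" | "agui-streaming" | "langgraph" | "strands" | "subprocess";

interface AgentEntry {
  key: string;
  name: string;
  endpoint: string;
  connectorType: ConnectorType;
  models: string[];
  useTraces?: boolean; // the example notes this defaults to false
}

// Return a list of human-readable problems; an empty list means none found.
function checkAgents(agents: AgentEntry[]): string[] {
  const problems: string[] = [];
  const seenKeys = new Set<string>();
  for (const agent of agents) {
    if (seenKeys.has(agent.key)) problems.push(`duplicate key: ${agent.key}`);
    seenKeys.add(agent.key);
    if (agent.models.length === 0) problems.push(`${agent.key}: no models listed`);
  }
  return problems;
}

const exampleAgents: AgentEntry[] = [
  {
    key: "my-agent",
    name: "My Agent",
    endpoint: "http://localhost:8000/agent",
    connectorType: "rest",
    models: ["claude-sonnet-4"],
    useTraces: true,
  },
];

console.log(checkAgents(exampleAgents)); // no problems for this entry
```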

Tip: Run npx @opensearch-project/agent-health doctor to verify your configuration is loaded correctly.

For full configuration options including authentication hooks and environment variables, see CONFIGURATION.md.


Star History

If you find Agent Health useful, please consider giving us a star! Your support helps us grow our community and continue improving the project.



Contributing

We welcome contributions!

Development Quick Start

git clone https://github.com/opensearch-project/agent-health.git
cd agent-health
npm install
npm run dev          # Frontend on port 4000
npm run dev:server   # Backend on port 4001

Port conflicts: If port 4001 is already in use, the backend server automatically tries 4002, 4003, etc. (up to 10 attempts). The actual port is displayed in the console output.

All commits require DCO signoff (git commit -s) and all PRs must pass CI checks.

For detailed development setup, testing, CI pipeline, debugging, and troubleshooting, see the Developer Guide. For full contribution guidelines, see CONTRIBUTING.md.


Documentation

| Guide | Description |
|-------|-------------|
| Getting Started | Step-by-step walkthrough from install to first evaluation |
| Configuration | Connect your agent and configure the environment |
| CLI Reference | Command-line interface documentation |
| Code-Based SDK | Write evaluations as .eval.js / .eval.ts test files (experimental) |
| Skill Evaluator | A/B-benchmark and improve a SKILL.md |
| Instrument with OTel | OpenTelemetry instrumentation for Agent Health |
| Coding Agent Analytics | Multi-agent dashboard and remote server monitoring |
| Observio Sample Agent | Reference agent for practicing evaluations |
| Developer Guide | Development setup, testing, CI, debugging |
| Connectors Guide | Create custom connectors for your agent type |
| Architecture | System design and patterns |
| ML-Commons Setup | OpenSearch ML-Commons integration |