@agent-assistant/telemetry

v0.4.35

Usage, cost, and response telemetry primitives for Agent Assistant

Agent Assistant Telemetry

Human Eval Helpers

@agent-assistant/telemetry/evals provides reusable helpers for product eval systems in which test cases stay human-authored while deterministic checks and run artifacts are shared.

A product can usually keep just a small wrapper script plus its own evals/suites/*/cases.md files. The shared package provides:

  • Parsing of Markdown cases.md files and compilation into generated cases.jsonl files.
  • JSONL suite loading and filtering by suite, case, or tag.
  • Deterministic checks for content, regexes, tool calls, routing metadata, stop reasons, and question counts.
  • Human-review tracking via Must and Must Not bullets and a Human Review: true flag.
  • Run artifact writing: result.json, summary.md, and human-review.md.
  • A generic CLI run loop with pluggable product executors.
  • Provider executor helpers for local OpenCode one-shot runs and deeper Agent Relay handoffs.
  • CI summary rendering that fails the run when any case fails or is skipped, while listing needs-human cases for review.
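To make the idea of deterministic checks concrete, here is a minimal, self-contained sketch of how content, regex, tool-call, and stop-reason checks could be evaluated against the normalized executor result { content, toolCalls, status } described in the runner's executor comment. The runDeterministicChecks function and the spec fields contains, matches, toolCalls, and status are hypothetical illustrations, not this package's API, and the sketch covers only a subset of the check types listed above.

```javascript
// Hypothetical sketch: evaluate deterministic checks against a normalized
// executor result { content, toolCalls, status }. The `spec` field names are
// illustrative assumptions, not the @agent-assistant/telemetry API.
function runDeterministicChecks(result, spec) {
  const failures = [];
  const content = result.content ?? '';
  const toolNames = (result.toolCalls ?? []).map((call) => call.name);

  // Content checks: every required substring must appear.
  for (const needle of spec.contains ?? []) {
    if (!content.includes(needle)) failures.push(`missing text: ${needle}`);
  }
  // Regex checks: every pattern must match somewhere in the content.
  for (const pattern of spec.matches ?? []) {
    if (!new RegExp(pattern).test(content)) failures.push(`no match: ${pattern}`);
  }
  // Tool-call checks: every expected tool must have been called at least once.
  for (const name of spec.toolCalls ?? []) {
    if (!toolNames.includes(name)) failures.push(`tool not called: ${name}`);
  }
  // Stop-reason check: compare against the optional status field.
  if (spec.status !== undefined && result.status !== spec.status) {
    failures.push(`status ${result.status} !== ${spec.status}`);
  }
  return { passed: failures.length === 0, failures };
}
```

Collecting every failure instead of stopping at the first one keeps the output useful when a case is written up for review.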

Minimal product runner:

import {
  compileHumanEvalSuitesFromMarkdown,
  runHumanEvalCli,
  summarizeLatestHumanEvalRunForCi,
} from '@agent-assistant/telemetry/evals';
import path from 'node:path';

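// This script lives in scripts/evals/, so the repository root is two levels up.
// import.meta.dirname requires Node.js 20.11 or later.
const rootDir = path.resolve(import.meta.dirname, '../..');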

compileHumanEvalSuitesFromMarkdown({
  suitesDir: path.join(rootDir, 'evals', 'suites'),
});

const exitCode = await runHumanEvalCli({
  argv: process.argv.slice(2),
  rootDir,
  runsDir: path.join(rootDir, '.nightcto', 'evals', 'runs'),
  productName: 'NightCTO Evals',
  executors: {
    async nightcto(testCase, context) {
      // Invoke the product here and normalize to:
      // { content: string, toolCalls: Array<{ name: string }>, status?: string }
      return { content: String(testCase.input.message ?? ''), toolCalls: [] };
    },
  },
});

if (process.env.GITHUB_STEP_SUMMARY) {
  summarizeLatestHumanEvalRunForCi({
    rootDir,
    runsDir: path.join(rootDir, '.nightcto', 'evals', 'runs'),
    githubStepSummaryPath: process.env.GITHUB_STEP_SUMMARY,
    title: 'NightCTO Eval CI Summary',
  });
}

process.exit(exitCode);
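The comment inside the nightcto executor asks each product to normalize its output to { content, toolCalls, status? }. A minimal sketch of that normalization step follows, assuming a hypothetical raw product response with text, actions, and finishReason fields; those field names and the normalizeProductResponse helper are invented for illustration.

```javascript
// Hypothetical sketch: map a product-specific response onto the normalized
// shape { content: string, toolCalls: Array<{ name: string }>, status?: string }.
// The raw field names (text, actions, finishReason) are invented for illustration.
function normalizeProductResponse(raw) {
  const content = typeof raw?.text === 'string' ? raw.text : '';
  // Keep only entries that carry a tool name; drop anything malformed.
  const toolCalls = (Array.isArray(raw?.actions) ? raw.actions : [])
    .filter((action) => typeof action?.tool === 'string')
    .map((action) => ({ name: action.tool }));
  const normalized = { content, toolCalls };
  // status is optional, so only set it when the product reported one.
  if (typeof raw?.finishReason === 'string') normalized.status = raw.finishReason;
  return normalized;
}
```

Inside an executor, this would be used as return normalizeProductResponse(await callProduct(testCase.input.message)), where callProduct stands for whatever hypothetical function invokes the product.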

Provider-Backed Runs

Provider-backed runs are opt-in, so offline deterministic checks stay cheap. Pass --provider to allow an executor to call a model or broker, and --executor <name> to run existing manual cases through a provider without rewriting each case.

node scripts/evals/run-product-evals.mjs --provider --executor opencode --suite workflow-authoring
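The opt-in rule can be pictured as a small gate in front of executor selection. This sketch hand-parses the --provider, --executor, --suite, and --case flags that appear in the commands in this README and refuses to select a provider-backed executor unless --provider is present. The parseEvalArgs and selectExecutor functions, the set of provider-backed names, and the 'manual' default are assumptions for illustration; runHumanEvalCli may work differently.

```javascript
// Hypothetical sketch of the opt-in rule: provider-backed executors are only
// selectable when --provider is passed. Not the runHumanEvalCli implementation.
const PROVIDER_EXECUTORS = new Set(['opencode', 'relay']); // assumed names

function parseEvalArgs(argv) {
  const options = { provider: false, executor: undefined, suite: undefined, case: undefined };
  for (let i = 0; i < argv.length; i += 1) {
    const arg = argv[i];
    if (arg === '--provider') {
      options.provider = true;
    } else if (arg === '--executor' || arg === '--suite' || arg === '--case') {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith('--')) {
        throw new Error(`${arg} requires a value`);
      }
      options[arg.slice(2)] = value; // e.g. '--suite' -> options.suite
      i += 1; // skip the consumed value
    } else {
      throw new Error(`unknown argument: ${arg}`);
    }
  }
  return options;
}

// 'manual' is a hypothetical default name for the offline executor.
function selectExecutor(options, defaultExecutor = 'manual') {
  const name = options.executor ?? defaultExecutor;
  if (PROVIDER_EXECUTORS.has(name) && !options.provider) {
    throw new Error(`executor "${name}" calls a model or broker; rerun with --provider`);
  }
  return name;
}
```

Failing loudly, rather than silently falling back to the offline executor, makes an accidental paid or networked run impossible without the explicit flag.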

The OpenCode helper wraps the Agent Assistant harness CLI runner, so products can use free or local OpenCode models without OpenRouter credentials:

import {
  createOpenCodeHumanEvalExecutor,
  runHumanEvalCli,
} from '@agent-assistant/telemetry/evals';
import path from 'node:path';

const rootDir = path.resolve(import.meta.dirname, '../..');

const exitCode = await runHumanEvalCli({
  argv: process.argv.slice(2),
  rootDir,
  productName: 'Ricky Evals',
  executors: {
    // Selected on the command line with --executor opencode.
    opencode: createOpenCodeHumanEvalExecutor({
      productName: 'Ricky',
      model: 'opencode/minimax-m2.5-free',
      instructions: [
        'Follow Ricky workflow standards.',
        'Prefer deterministic verification, review artifacts, and honest blocker reporting.',
      ],
    }),
  },
});

process.exit(exitCode);

Use the direct OpenCode path for quick local quality sweeps where the candidate answer is a single assistant response. The result is usually still needs-human: the model output is captured in human-review.md so a person can grade it against the case's Must and Must Not bullets.

Agent Relay For Complex Evals

Use Agent Relay when the eval needs real execution topology rather than a single model answer: worker spawning, tool-mediated work, channel/broker behavior, multi-agent coordination, or a product path that depends on Relay metadata. The createAgentRelayHumanEvalExecutor() helper imports the Node-only Relay adapter dynamically, and only when this executor actually runs, so products that never select it do not load the adapter.

import {
  createAgentRelayHumanEvalExecutor,
  runHumanEvalCli,
} from '@agent-assistant/telemetry/evals';
import path from 'node:path';

const rootDir = path.resolve(import.meta.dirname, '../..');

const exitCode = await runHumanEvalCli({
  argv: process.argv.slice(2),
  rootDir,
  productName: 'Ricky Evals',
  executors: {
    // Selected on the command line with --executor relay.
    relay: createAgentRelayHumanEvalExecutor({
      productName: 'Ricky',
      channelId: 'agent-assistant-evals',
      workerName: 'ricky-eval-worker',
      timeoutMs: 300_000, // 5 minutes
      spawnWorker: {
        enabled: true,
        cli: 'opencode',
        model: 'opencode/minimax-m2.5-free',
        includeWorkflowConventions: true,
      },
      instructions: 'Exercise the real worker path and return an Agent Assistant ExecutionResult.',
    }),
  },
});

process.exit(exitCode);
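The lazy loading described for createAgentRelayHumanEvalExecutor() follows a common Node pattern: defer a dynamic import() until the first run, so constructing the executor never loads the module. This is a generic sketch of that pattern, not the package's code; createLazyExecutor is a hypothetical name, and node:os merely stands in for the Node-only Relay adapter.

```javascript
// Generic sketch of lazy loading: the module is imported on first run, not at
// construction time. 'node:os' is a stand-in for a Node-only adapter module.
function createLazyExecutor(loadModule, run) {
  let modulePromise; // cached so concurrent and repeated runs share one import
  return async function execute(testCase) {
    modulePromise ??= loadModule();
    const mod = await modulePromise;
    return run(mod, testCase);
  };
}

const executor = createLazyExecutor(
  () => import('node:os'),
  (os, testCase) => ({
    content: `${testCase.id} ran on ${os.platform()}`,
    toolCalls: [],
  }),
);

// Nothing has been imported yet; the first call triggers the import.
executor({ id: 'demo' }).then((result) => console.log(result.content));
```

Caching the promise rather than the resolved module means two cases started at the same moment still trigger only one import.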

Run Relay-backed cases with a command such as:

node scripts/evals/run-product-evals.mjs --provider --executor relay --case workflow-authoring.multi-agent-repair
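Selecting a single case like this amounts to filtering the parsed lines of a generated cases.jsonl file. A minimal sketch, assuming each JSONL line carries hypothetical id, suite, and tags fields (the real generated schema may differ) and using illustrative sample data:

```javascript
// Hypothetical sketch: load JSONL cases and keep those matching an optional
// suite, case id, or tag. The field names (id, suite, tags) are assumptions.
function loadCases(jsonlText) {
  return jsonlText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines
    .map((line) => JSON.parse(line));
}

function filterCases(cases, { suite, caseId, tag } = {}) {
  return cases.filter(
    (c) =>
      (suite === undefined || c.suite === suite) &&
      (caseId === undefined || c.id === caseId) &&
      (tag === undefined || (c.tags ?? []).includes(tag)),
  );
}

// Illustrative sample data; the second case id is invented.
const jsonl = [
  '{"id":"workflow-authoring.multi-agent-repair","suite":"workflow-authoring","tags":["relay"]}',
  '{"id":"workflow-authoring.single-step","suite":"workflow-authoring","tags":[]}',
].join('\n');

const selected = filterCases(loadCases(jsonl), { caseId: 'workflow-authoring.multi-agent-repair' });
```

Unset filters match everything, so the same function serves --suite, --case, and tag selection.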

For long-running evals, run one case or one suite at a time and give the executor a larger timeoutMs. Relay eval outputs are written to the normal human-eval run artifacts and include normalized content, structured tool calls (when present), Relay metadata, and trace data.
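Giving an executor a larger timeout amounts to racing its work against a timer. A minimal sketch, using a hypothetical withTimeout helper; the Relay helper's own timeoutMs handling may differ:

```javascript
// Hypothetical sketch: reject if `work` does not settle within `timeoutMs`.
// The timer is always cleared so a finished eval does not keep Node alive.
function withTimeout(work, timeoutMs, label = 'eval') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

A call might look like withTimeout(runCase(testCase), 300_000, testCase.id), where runCase is hypothetical. Note that this only stops waiting: the underlying work keeps running unless it is also given a way to cancel.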