npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

claude-code-testbed

v0.5.2

Published

Programmatic test harness for Claude Code plugins — spawns real Claude Code sessions under tmux

Readme

claude-code-testbed

CI npm version npm downloads License: MIT Node ≥20

When you build a Claude Code plugin, the agent writing your code is Claude Code. claude-code-testbed closes that loop: it lets Claude Code spin up a fresh, fully authenticated Claude Code session, drive it with messages and slash commands, and read back the complete JSONL transcript — so the agent can test its own plugin end-to-end, with no human in the loop.

Write a plugin, run the testbed, assert on the transcript, iterate — all in the same session, without ever asking a human to open a terminal and check the output.

Under the hood it manages tmux sessions and the Claude Code JSONL transcript format, and exposes a small JS API for driving sessions from Vitest (or any test runner).

Prerequisites

  • Node ≥ 20
  • tmuxbrew install tmux (macOS) or sudo apt-get install tmux (Debian/Ubuntu)
  • Claude Code CLI, installed and authenticated — claude must be on your PATH and already logged in

That's it. No API key required. Spawned sessions reuse whatever login your claude CLI already has — your Pro/Max subscription or an API key, whichever you normally use. See Authentication for details.

Install

npm install -D claude-code-testbed

Install as a Claude Code plugin

For live, in-conversation debugging of a plugin you're developing — instead of (or alongside) the npm library in tests:

/plugin marketplace add roiperlman/claude-code-testbed
/plugin install claude-code-testbed@claude-code-testbed

Restart Claude Code, then run /claude-code-testbed:setup to verify prerequisites. The plugin registers an MCP server with tools (start, slash, wait_idle, events, kill, …) for driving live testbed sessions from inside Claude Code. The MCP server itself is fetched from npm via npx on first use — no manual npm install in the plugin directory is required.

Quick start

import { start, send, kill } from 'claude-code-testbed';
import { expect, test, afterEach } from 'vitest';

let session;
afterEach(() => session && kill(session.id));

test('agent edits foo.mjs', async () => {
  session = await start();
  await send(session.id, 'edit foo.mjs to add a header comment');

  await expect(session).toHaveCalledTool('Edit', { input: { file_path: /foo\.mjs/ } });
  await expect(session).toHaveTouchedFile('foo.mjs', { content: /header/ });
  await expect(session).toHaveReachedIdle({ within: 30_000 });
});

Authentication

A testbed session is a real claude process. By default it inherits your normal Claude Code login — the same OAuth/subscription credentials (or API key) the claude CLI uses interactively. If you can run claude in a terminal, the testbed works. No extra configuration.

The one exception is bare: true (see start()), which runs claude --bare: a minimal mode that skips hooks, plugin sync, and keychain reads, and accepts only ANTHROPIC_API_KEY. Use it for hermetic, CI-style runs where you want to pin auth to an explicit key — otherwise leave it off.

API

import { start, send, slash, waitIdle, events, pane, tail, kill, list } from 'claude-code-testbed';

start(opts?){ id, tmuxName, jsonlPath }

Spawn a fresh Claude Code session under tmux and wait for the input prompt.

| Option | Default | Description | |--------|---------|-------------| | projectDir | process.cwd() | Directory Claude Code opens | | pluginDir | projectDir | Directory containing .claude-plugin/ | | model | "haiku" | Host model (haiku is cheapest, fastest) | | bare | false | false inherits your Claude Code login; true runs claude --bare (API-key-only auth) | | name | null | Human label shown in list() and tmux ls |

send(id, text)Promise<void>

Send a user message and press Enter.

slash(id, cmd)Promise<void>

Send a slash command (must start with /).

waitIdle(id, opts?)Promise<void>

Block until the session is idle — last event is a turn-completion type and no new events have arrived for idleMs.

| Option | Default | Description | |--------|---------|-------------| | timeoutMs | 60000 | Hard timeout (throws on expiry) | | idleMs | 2000 | Quiet period before declaring idle |

events(id, opts?)Promise<object[]>

Return all JSONL events recorded so far. Pass { since: byteOffset } to read only new events since a previous call.

pane(id, opts?)Promise<string>

Capture the tmux pane scrollback. { lines: N } (default 100).

tail(id, opts?)AsyncIterable<object>

Stream events as they arrive. Pass { signal: AbortSignal } to stop.

kill(id)Promise<void>

Kill the tmux session and remove it from the registry. Idempotent.

list()Promise<SessionInfo[]>

List sessions from the registry, pruning any whose tmux session no longer exists.

Assertions

Available since [email protected].

A slim set of 10 Vitest custom matchers for asserting on a recorded session. Register them once in your Vitest config:

// vitest.config.mjs
export default {
  test: {
    setupFiles: ['claude-code-testbed/matchers'],
  },
};

Then in any test:

import { start, send, kill } from 'claude-code-testbed';
import { expect, test, afterEach } from 'vitest';

let session;
afterEach(() => session && kill(session.id));

test('agent edits foo.mjs', async () => {
  session = await start();
  await send(session.id, 'edit foo.mjs to add a header');

  await expect(session).toHaveCalledTool('Edit', { input: { file_path: /foo\.mjs/ } });
  await expect(session).toHaveTouchedFile('foo.mjs', { content: /header/ });
  await expect(session).toHaveReachedIdle({ within: 30_000 });
});

All matchers auto-wait for the session to reach idle before reading; pass { wait: false } to opt out. All support .not.

| Matcher | Purpose | |---------|---------| | toHaveCalledTool(name, inputMatcher?, { times?, wait?, timeoutMs? }) | Tool was invoked (optionally N times, with matching input) | | toHaveToolResult(name, resultMatcher, opts?) | Tool returned matching content | | toHaveAssistantText(matcher, opts?) | Agent text matches | | toHaveUserMessage(matcher, opts?) | User input (or slash command) matches | | toHaveTouchedFile(path, { created?, content?, ... }) | Disk file matches (reads disk, not events) | | toHaveReachedIdle({ within?, ... }) | Session became idle (optionally within N ms) | | toHaveErrored(matcher?, opts?) | Any error event present | | toHavePaneText(matcher, { lines?, ... }) | Tmux pane scrollback matches | | toHaveEvent(predicate, opts?) | Escape hatch — any JSONL event matches | | toMatchTranscriptSnapshot({ normalize? }) | Semantic snapshot (timestamps/ids/text stripped) |

Matchers accept either the { id, ... } object from start() or a bare id string. Input matchers accept a string (strict equality), RegExp (test), object (deep partial), or function (value) => boolean.

CLI

# Start a session
npx claude-code-testbed start --project /path/to/project --model haiku

# Send a message
npx claude-code-testbed send <id> "hello"

# Send a slash command
npx claude-code-testbed slash <id> "/help"

# Wait until idle
npx claude-code-testbed wait-idle <id> --timeout 120000

# Read events
npx claude-code-testbed events <id> --format pretty

# Stream events (Ctrl-C to stop)
npx claude-code-testbed tail <id>

# Print tmux pane
npx claude-code-testbed pane <id> --lines 200

# Kill a session
npx claude-code-testbed kill <id>

# List active sessions
npx claude-code-testbed list

Pass --bare to start for API-key-only auth (see Authentication).

Usage in tests

import { start, slash, waitIdle, events, kill } from 'claude-code-testbed';
import { afterEach, it, expect } from 'vitest';

let session;

afterEach(async () => {
  if (session) await kill(session.id);
});

it('my plugin responds to /my-command', async () => {
  session = await start({
    pluginDir: new URL('..', import.meta.url).pathname,
    model: 'haiku',
    name: 'my-plugin-e2e',
  });

  await slash(session.id, '/my-command');
  await waitIdle(session.id, { timeoutMs: 120_000, idleMs: 3000 });

  const evs = await events(session.id);
  const toolResult = evs.find((e) => e.type === 'tool_result');
  expect(toolResult?.is_error).not.toBe(true);
});

Two rules that save you grief:

  • Always waitIdle before reading events — the JSONL is written asynchronously, so reading mid-turn gives a partial transcript.
  • Always kill in afterEach — dangling sessions pile up in tmux ls as testbed-<id>.

When to use vs. unit tests

Use the testbed when the question is "what does the host (Claude Code) do with this?" — plugin discovery, slash-command routing, tool-call rendering, JSONL transcript shape, MCP notification rendering. The testbed is the only programmable way to answer those questions.

Don't use it when the question is "what does my adapter / parser / server logic do on a given input?" — those have cheaper answers (pure unit tests, direct MCP client calls). Each testbed scenario costs a real model call and 5–60s; reach for it only after the cheaper layers pass.

Gate testbed tests behind an environment variable so they don't run in CI by default:

it.skipIf(!process.env.TESTBED_E2E)('e2e: plugin responds correctly', async () => {
  // ...
});

Canonical consumer

cursed — the Claude Code plugin this was extracted from — uses claude-code-testbed for its integration test suite. See test/integration/ there for worked examples.

License

MIT