@lythos/skill-arena

v0.17.0


Skill Arena — benchmark skill effectiveness with controlled-variable comparison


Controlled-variable benchmark for AI agent skills. Test single decks or compare A/B — agent-orchestrated by default, cross-player when you need it.

Modes at a Glance

| Mode | How | When |
|------|-----|------|
| Agent-Orchestrated (DEFAULT) | Agent tool spawns subagents, parallel dispatch, native judge | Single deck test, cross-deck A/B comparison |
| Cross-Player (OPT-IN) | CLI runner spawns different agent binaries via Bun.spawn | Comparing kimi vs codex vs claude |

95% of arena use is agent-orchestrated. The Agent tool can spawn parallel subagents with isolated workdirs and different decks, with no CLI involved. Cross-player mode is needed only when comparing different agent CLIs, because the Agent tool can only spawn agents of its own type.
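The rule above can be sketched as a small pure function. This is only an illustration: `Side`, `ArenaMode`, and `chooseMode` are hypothetical names, not part of the package's API, and treating a side without an explicit player as the host agent is an assumption.

```typescript
// Hypothetical types and helper, for illustration only (not the package's API).
type ArenaMode = "agent-orchestrated" | "cross-player";

interface Side {
  name: string;
  deck: string;     // path to the deck file for this side
  player?: string;  // agent CLI name such as "kimi" or "codex"; omitted = host agent (assumption)
}

// Cross-player is only required when the sides name more than one distinct
// player, since the Agent tool can only spawn agents of its own type.
function chooseMode(sides: Side[]): ArenaMode {
  const players = sides.map((s) => s.player ?? "host");
  const distinct = players.filter((p, i) => players.indexOf(p) === i);
  return distinct.length > 1 ? "cross-player" : "agent-orchestrated";
}

// Two decks, same agent type: stays agent-orchestrated.
console.log(chooseMode([
  { name: "side-a", deck: "./a.toml" },
  { name: "side-b", deck: "./b.toml" },
])); // → "agent-orchestrated"

// Same deck, different agent CLIs: needs cross-player.
console.log(chooseMode([
  { name: "side-a", deck: "./deck.toml", player: "kimi" },
  { name: "side-b", deck: "./deck.toml", player: "codex" },
])); // → "cross-player"
```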

Install

bun add -d @lythos/skill-arena
# or use directly
bunx @lythos/skill-arena <command>

Quick Start

# single — test one deck (most common)
bunx @lythos/skill-arena single \
  --deck ./examples/decks/scout.toml \
  --brief "Generate auth flow diagram" \
  --out ./output

# single with explicit player
bunx @lythos/skill-arena single \
  --deck ./examples/decks/scout.toml \
  --brief "Generate auth flow diagram" \
  --player kimi \
  --out ./output

# cross-deck vs — compare two decks (agent-orchestrated)
# Create arena.toml declaring sides with different decks, then:
bunx @lythos/skill-arena vs --config ./arena.toml

# cross-player vs — compare kimi vs codex (CLI only)
bunx @lythos/skill-arena vs --config ./arena.toml --player kimi

What happens: the agent creates an isolated /tmp workdir per side, runs deck link to link the skills, spawns parallel subagents, and collects the artifacts; the judge then scores the outputs. The parent deck is restored afterward.

Commands

single — one deck, one task

bunx @lythos/skill-arena single \
  --deck ./deck.toml \
  --brief "Produce a .docx report with radar chart" \
  --timeout 600000 \
  --out ./output

# with explicit player
bunx @lythos/skill-arena single \
  --deck ./deck.toml \
  --brief "Produce a .docx report with radar chart" \
  --player kimi \
  --out ./output

vs — multi-deck comparison

bunx @lythos/skill-arena vs --config ./arena.toml
bunx @lythos/skill-arena vs --config ./arena.toml --dry-run

prepare-workdir — isolate + link skills (agent-orchestrated)

bunx @lythos/skill-arena prepare-workdir \
  --deck ./skill-deck.toml \
  --out /tmp/arena-side-a \
  --brief "task description"

Creates an isolated workdir under /tmp with the deck copied, AGENTS.md written, and deck link run. --dry-run prints the plan (skills, workdir path, whether a link is needed) without creating anything.

archive — collect agent outputs (agent-orchestrated)

bunx @lythos/skill-arena archive \
  --from /tmp/arena-side-a \
  --to ./playground/output \
  --sides side-a

Copies agent artifacts from workdir(s) to output, skipping internal files (.claude, skill-deck.toml, skill-deck.lock, AGENTS.md). Single-side archives fall back to workdir root when the named side subdirectory doesn't exist. --dry-run shows the per-side plan before copying.
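The skip-and-fallback behavior described above might look roughly like the following. This is a sketch under assumptions, not the package's implementation: `isArtifact` and `resolveSideDir` are hypothetical helpers, only the top-level path segment is checked against the skip list, and raising an error for a missing side in a multi-side archive is a guess.

```typescript
// Internal files that `archive` skips, as listed in this README.
const INTERNAL_FILES: string[] = [".claude", "skill-deck.toml", "skill-deck.lock", "AGENTS.md"];

// Hypothetical helper: keep a workdir-relative path unless its first
// segment is one of the internal files (top-level check is an assumption).
function isArtifact(relPath: string): boolean {
  const first = relPath.split("/")[0];
  return first !== undefined && first !== "" && INTERNAL_FILES.indexOf(first) === -1;
}

// Hypothetical helper: choose the source directory for one side.
// A single-side archive falls back to the workdir root when the named
// side subdirectory does not exist.
function resolveSideDir(
  workdir: string,
  side: string,
  sideCount: number,
  existingDirs: string[],
): string {
  const candidate = `${workdir}/${side}`;
  if (existingDirs.indexOf(candidate) !== -1) return candidate;
  if (sideCount === 1) return workdir;
  // Assumption: a multi-side archive with a missing side is an error.
  throw new Error(`side directory not found: ${candidate}`);
}

const sampleFiles = ["report.md", ".claude/settings.json", "AGENTS.md", "out/chart.png", "skill-deck.toml"];
console.log(sampleFiles.filter(isArtifact)); // → ["report.md", "out/chart.png"]
console.log(resolveSideDir("/tmp/arena-side-a", "side-a", 1, [])); // → "/tmp/arena-side-a"
```

Keeping both helpers free of filesystem calls means the same logic can back a --dry-run plan and the real copy.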

viz — render results (WIP — HTML report generation pending)

bunx @lythos/skill-arena viz runs/arena-<id>/

Parameters

| Flag | Command | Description |
|------|---------|-------------|
| --brief "<text>" | single | Inline task brief |
| --deck <path\|url> | single | Deck file (URL auto-fetched) |
| --player <name> | single, vs | Agent player: kimi\|codex\|deepseek\|claude |
| --timeout <ms> | single | Subagent timeout (300000–600000 for complex tasks) |
| --from <dir> | archive | Source workdir |
| --to <dir> | archive | Output directory |
| --sides <names> | archive | Comma-separated side names (default: .) |
| --out <dir> | single, vs, prepare-workdir | Output / workdir directory |
| --config <path> | vs | arena.toml |
| --dry-run | vs, prepare-workdir, archive | Print plan without execution |

Prerequisites (cross-player only)

For cross-player mode, install at least one agent CLI:

uv tool install kimi-cli           # kimi (recommended default)
npm i -g @openai/codex             # codex
# deepseek: bundled with desktop app or pip install deepseek-cli
# claude: set ANTHROPIC_API_KEY (SDK, no CLI binary needed)

Skill Documentation

The agent-visible skill layer: skill/SKILL.md

Architecture

arena.toml  →  ArenaToml (Zod)  →  ExecutionPlan (pure)  →  per-cell agent spawn (IO)
                                   ↓
               aggregateAllStats (pure)  ←  verdicts[]
                                   ↓
               runComparativeJudge (IO)  →  report.md + Pareto frontier
  • Intent: arena.toml declarative config
  • Plan: buildExecutionPlan(), aggregateSideStats(), computePareto() — pure functions
  • Execute: Agent tool spawn (agent-orchestrated) or AgentAdapter (cross-player)
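
To make the pure middle of this pipeline concrete, here is a minimal sketch of aggregating verdicts per side and then picking a Pareto frontier. The README names aggregateSideStats() and computePareto() but does not show their signatures, so the `Verdict` and `SideStats` shapes, the `score` and `cost` fields, and the function names `aggregateBySide` and `paretoFrontier` are all assumptions made for illustration.

```typescript
// Illustrative shapes only; the package's real types are not shown in this README.
interface Verdict {
  side: string;   // which side (deck) produced the output
  score: number;  // judge score, higher is better (assumed)
  cost: number;   // e.g. tokens or seconds, lower is better (assumed)
}

interface SideStats {
  side: string;
  meanScore: number;
  meanCost: number;
  runs: number;
}

// Pure: group verdicts by side and average score and cost.
function aggregateBySide(verdicts: Verdict[]): SideStats[] {
  const groups: Record<string, Verdict[]> = {};
  for (const v of verdicts) {
    (groups[v.side] = groups[v.side] || []).push(v);
  }
  return Object.keys(groups).map((side) => {
    const vs = groups[side];
    return {
      side,
      meanScore: vs.reduce((sum, v) => sum + v.score, 0) / vs.length,
      meanCost: vs.reduce((sum, v) => sum + v.cost, 0) / vs.length,
      runs: vs.length,
    };
  });
}

// Pure: a side is on the frontier if no other side is at least as good
// on both axes and strictly better on one.
function paretoFrontier(stats: SideStats[]): SideStats[] {
  const dominates = (a: SideStats, b: SideStats): boolean =>
    a.meanScore >= b.meanScore &&
    a.meanCost <= b.meanCost &&
    (a.meanScore > b.meanScore || a.meanCost < b.meanCost);
  return stats.filter((s) => !stats.some((other) => dominates(other, s)));
}

const stats = aggregateBySide([
  { side: "side-a", score: 8, cost: 100 },
  { side: "side-a", score: 6, cost: 200 },
  { side: "side-b", score: 9, cost: 400 },
  { side: "side-c", score: 5, cost: 300 },
]);
console.log(paretoFrontier(stats).map((s) => s.side)); // → ["side-a", "side-b"]
```

Here side-c drops out because side-a scores higher at lower cost, while side-a and side-b both stay: neither beats the other on both axes.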

License

MIT