em-metrics

v0.1.1

Observability and eval framework for Claude Code and Coco sessions

em -- AI Coding Agent DevTools CLI

em is a command-line tool that provides observability and evaluation capabilities for Claude Code and Coco sessions. It extracts structured metrics from raw session logs, detects behavioral patterns, and runs reproducible eval suites with CI integration.

Version: 0.1.0

Installation

npm install -g em-metrics

From Source

git clone <repo>
cd eval-metrics
npm install
npm run build && npm link

Quick Start

# Watch a live Claude Code session
em watch

# Review the last completed session
em show --last

# Aggregate metrics from the past week
em analyze --since 2026-04-04

# Run an eval task
em eval run --task tasks/fix-bug.yaml

Commands

Observability

| Command | Description |
|---------|-------------|
| watch | Real-time monitor of an active Claude Code session |
| show | Show metrics for a completed session |
| analyze | Aggregate metrics across multiple sessions (supports --json, --csv, --output) |
| trace | Behavioral trace: tool call sequence and signals |
| profile | Agent capability profile by codebase module |
| patterns | Detect behavioral patterns across sessions |

Evaluation

All eval commands live under the em eval subcommand group:

| Command | Description |
|---------|-------------|
| eval run | Run a single eval task from YAML |
| eval batch | Run a suite of eval tasks |
| eval compare | Compare two eval result files |
| eval check | CI regression check against a baseline |
| eval trend | Historical pass rate and cost trends |

Command Examples

show

em show --last                        # Last session, summary view
em show --session abc123 --detail     # Full detail for a specific session
em show --last --json --source coco   # JSON output from Coco session

analyze

em analyze --since 2026-04-04               # Past 7 days
em analyze --since 2026-03-12 --until 2026-04-04  # Range
em analyze --since 2026-04-04 --json        # Machine-readable output
em analyze --source coco                    # Analyze Coco sessions only
em analyze --since 2026-04-04 --csv --output metrics.csv  # CSV export
em analyze --since 2026-04-04 --json --output data.json   # JSON export to file

trace

em trace --last                       # Trace the last session
em trace --session abc123 --verbose   # Detailed trace with all signals
em trace --last --json --source coco

profile

em profile --since 2026-03-28 --depth 2      # Module-level capability profile
em profile --since 2026-03-12 --json

patterns

em patterns --since 2026-03-28 --min-occurrences 3
em patterns --since 2026-03-12 --json --source claude

eval run / eval batch / eval compare / eval check / eval trend

# Run a single eval task with 5 trials
em eval run --task tasks/fix-bug.yaml --model claude-4-opus --trials 5

# Run a full eval suite
em eval batch --suite suites/core.yaml --model claude-4-opus --trials 3 --output results/

# Compare two result files
em eval compare results/v1.json results/v2.json --label-a baseline --label-b candidate

# CI regression check (exits 1 on regression)
em eval check --baseline results/v1.json --current results/v2.json --threshold 0.05

# View historical trends
em eval trend --history results/ --task fix-bug

Key Metrics

Token Efficiency -- tokens_per_loc, exploration_ratio, edit_precision

Interaction Quality -- correction_count, files_re_edited, abandonment

Cache Health -- overall_cache_hit_rate, cache_hit_rate_trend, compact_count

Cost -- cost_usd (calculated per model pricing)

Eval -- pass_rate, pass@k, flakiness

Health Ranges

| Metric | Healthy | Warning | Danger |
|--------|---------|---------|--------|
| cost_usd (single) | < $0.50 | $0.50 -- $2.00 | > $2.00 |
| tool_success_rate | > 95% | 90 -- 95% | < 90% |
| exploration_ratio | 30 -- 60% | 20 -- 30% or 60 -- 80% | < 20% or > 80% |
| edit_precision | > 70% | 50 -- 70% | < 50% |
| tokens_per_loc | < 50 | 50 -- 100 | > 100 |
| overall_cache_hit_rate | > 70% | 50 -- 70% | < 50% |
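
The ranges above can be turned into a small classifier. The following is a minimal sketch, not em's own code: the `classify` helper and the `RULES` map are hypothetical names, percentages are assumed to be passed as numbers from 0 to 100, and values that sit exactly on a boundary (which the table leaves ambiguous for exploration_ratio) are resolved in favor of the healthy range first.

```typescript
type Health = "healthy" | "warning" | "danger";

// A rule describes the healthy and danger regions; everything else is a warning.
interface Rule {
  healthy: (v: number) => boolean;
  danger: (v: number) => boolean;
}

// Thresholds copied from the Health Ranges table. Percent metrics are assumed
// to be expressed on a 0-100 scale (an assumption, not documented behavior).
const RULES: Record<string, Rule> = {
  cost_usd: { healthy: (v) => v < 0.5, danger: (v) => v > 2.0 },
  tool_success_rate: { healthy: (v) => v > 95, danger: (v) => v < 90 },
  exploration_ratio: {
    healthy: (v) => v >= 30 && v <= 60,
    danger: (v) => v < 20 || v > 80,
  },
  edit_precision: { healthy: (v) => v > 70, danger: (v) => v < 50 },
  tokens_per_loc: { healthy: (v) => v < 50, danger: (v) => v > 100 },
  overall_cache_hit_rate: { healthy: (v) => v > 70, danger: (v) => v < 50 },
};

// Hypothetical helper: map a metric value to its health band.
function classify(metric: string, value: number): Health {
  const rule = RULES[metric];
  if (!rule) throw new Error(`Unknown metric: ${metric}`);
  if (rule.healthy(value)) return "healthy";
  if (rule.danger(value)) return "danger";
  return "warning";
}

console.log(classify("cost_usd", 0.25), classify("tool_success_rate", 92));
```

Encoding only the healthy and danger regions keeps the warning band as the fallback, so a rule never has to list the two split warning intervals that exploration_ratio has.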

Eval Framework

Define eval tasks in YAML:

id: fix-null-check
name: Fix null pointer bug
prompt: "Fix the null pointer exception in src/parser.ts"
cwd: ./fixtures/null-check
setup: "git checkout buggy-branch"
teardown: "git checkout main"
graders:
  - type: code
    check: file_contains
    path: src/parser.ts
    pattern: "!= null"
  - type: llm
    prompt: "Does the fix handle all edge cases?"

Grader types:

  • code -- file_exists, file_contains, command (exit code check)
  • llm -- LLM-based scoring with a custom prompt
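
To make the code grader checks concrete, here is a rough sketch of how file_exists and file_contains could be evaluated. It is an illustration only: `CodeGrader` and `runCodeGrader` are hypothetical names, the type covers just the fields shown in the YAML example, and `pattern` is assumed to be a plain substring (the README does not say whether em treats it as a regular expression).

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Fields taken from the YAML example; anything beyond these is undocumented.
interface CodeGrader {
  type: "code";
  check: "file_exists" | "file_contains";
  path: string;
  pattern?: string;
}

// Returns true when the grader passes for files under `cwd`.
// Assumption: `pattern` is matched as a literal substring.
function runCodeGrader(grader: CodeGrader, cwd: string): boolean {
  const target = join(cwd, grader.path);
  if (!existsSync(target)) return false;
  if (grader.check === "file_exists") return true;
  const text = readFileSync(target, "utf8");
  return grader.pattern !== undefined && text.includes(grader.pattern);
}

// Usage: grade a throwaway fixture directory.
const fixture = mkdtempSync(join(tmpdir(), "em-grader-"));
writeFileSync(join(fixture, "parser.ts"), "if (node != null) { visit(node); }\n");

const passed = runCodeGrader(
  { type: "code", check: "file_contains", path: "parser.ts", pattern: "!= null" },
  fixture,
);
console.log(passed);
```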

Multi-trial metrics: pass@k (at least one of k trials passes), pass^k (all k trials pass), flakiness (variance of outcomes across trials).
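
One common way to compute these from n recorded trials is sketched below. The formulas are assumptions rather than em's documented behavior: pass@k uses the widely used estimator 1 - C(n-c, k) / C(n, k), pass^k uses C(c, k) / C(n, k), and flakiness is taken as the variance p(1 - p) of the pass/fail outcomes, where c is the number of passing trials and p = c / n. The function names are hypothetical.

```typescript
// Binomial coefficient C(n, k); returns 0 when k is out of range.
function choose(n: number, k: number): number {
  if (k < 0 || k > n) return 0;
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

interface MultiTrialMetrics {
  passRate: number;  // fraction of trials that passed
  passAtK: number;   // chance that at least one of k sampled trials passes
  passHatK: number;  // chance that all k sampled trials pass (pass^k)
  flakiness: number; // variance of the 0/1 outcomes
}

// Hypothetical helper: summarize boolean trial outcomes for a given k.
function multiTrialMetrics(trials: boolean[], k: number): MultiTrialMetrics {
  const n = trials.length;
  if (k < 1 || k > n) {
    throw new RangeError("k must be between 1 and the number of trials");
  }
  const c = trials.filter(Boolean).length;
  const p = c / n;
  return {
    passRate: p,
    passAtK: 1 - choose(n - c, k) / choose(n, k),
    passHatK: choose(c, k) / choose(n, k),
    flakiness: p * (1 - p),
  };
}

// Five trials, three passes, evaluated at k = 2.
const summary = multiTrialMetrics([true, true, false, true, false], 2);
console.log(summary);
```

Under these definitions a task that always passes or always fails has flakiness 0, and flakiness peaks at 0.25 when half the trials pass.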

The eval check command is meant for CI pipelines. It compares current results against a baseline and exits with code 1 when the regression exceeds the threshold set with --threshold.
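
The regression rule can be sketched as a comparison of per-task pass rates. This is an assumption-laden illustration: the result file format is not documented here, so `PassRates` is a hypothetical simplified shape (task id mapped to pass rate), and "regression" is assumed to mean a drop in pass rate larger than the threshold.

```typescript
// Hypothetical, simplified result shape: task id -> pass rate in [0, 1].
type PassRates = Record<string, number>;

interface Regression {
  task: string;
  baseline: number;
  current: number;
  drop: number;
}

// Lists every task whose pass rate fell by more than `threshold`.
function findRegressions(
  baseline: PassRates,
  current: PassRates,
  threshold: number,
): Regression[] {
  const regressions: Regression[] = [];
  for (const [task, base] of Object.entries(baseline)) {
    const cur = current[task];
    if (cur === undefined) continue; // tasks missing from the current run are skipped in this sketch
    const drop = base - cur;
    if (drop > threshold) {
      regressions.push({ task, baseline: base, current: cur, drop });
    }
  }
  return regressions;
}

// Exit code a CI wrapper would use: 1 on any regression, 0 otherwise.
function exitCodeFor(regressions: Regression[]): number {
  return regressions.length > 0 ? 1 : 0;
}

const found = findRegressions(
  { "fix-bug": 0.9, "add-test": 0.8 },
  { "fix-bug": 0.7, "add-test": 0.78 },
  0.05,
);
console.log(found.map((r) => r.task), exitCodeFor(found));
```

A real CI step would pass the returned code to `process.exit`; it is kept as a plain return value here so the comparison logic can be tested on its own.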

Data Sources

em reads session data from two sources, selected with --source:

| Source | Flag | Location |
|--------|------|----------|
| Claude Code | --source claude | ~/.claude/projects/<hash>/<id>.jsonl |
| Coco | --source coco | ~/Library/Caches/coco/sessions/<id>/ (session.json + events.jsonl) |

When --source is omitted, Claude Code is used by default.
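
For scripts that want to read the same files directly, the layout in the table can be sketched as follows. The helper names are hypothetical, `projectHash` and `sessionId` stand in for the `<hash>` and `<id>` placeholders, and the JSONL parser makes no assumption about the event schema, which this README does not describe.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

type Source = "claude" | "coco";

// Builds the session location from the table above.
// Claude Code sessions are a single .jsonl file; Coco sessions are a directory.
function sessionLocation(
  source: Source,
  sessionId: string,
  projectHash?: string,
): string {
  if (source === "claude") {
    if (!projectHash) {
      throw new Error("Claude Code sessions need a project hash");
    }
    return join(homedir(), ".claude", "projects", projectHash, `${sessionId}.jsonl`);
  }
  return join(homedir(), "Library", "Caches", "coco", "sessions", sessionId);
}

// Parses JSON Lines text into one value per non-empty line.
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

console.log(sessionLocation("coco", "abc123"));
console.log(parseJsonl('{"type":"start"}\n\n{"type":"end"}\n').length);
```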

Development

npm run dev       # Run CLI via tsx (no build step)
npm run build     # Compile TypeScript
npm test          # Run tests with vitest

Runtime deps: chalk, commander, js-yaml | Dev deps: vitest, tsx, typescript