
redharness

v1.0.1

Published

Agent evaluation, regression, & safe red-team security harness with app-specific QA packs


🔴 RedHarness

Test, probe, and prove your AI agents — in one harness.


npm i -g redharness · npx redharness · Agent eval · Red-team · Pentest · MCP




Why RedHarness?

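Most testing tools do one thing. RedHarness puts agent evaluation, browser regression, red-teaming, and safe pentesting behind one CLI, so a single harness can gate your agents on the way to production.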

| You want to… | Other tools make you… | RedHarness |
|---|---|---|
| Run regression on an AI app | Wire up Playwright + a test runner + custom reporters | redharness pro-regression-smoke my-app --turns 5 |
| Pentest a web app safely | Use Burp Suite or custom scripts | redharness blackbox-pentest my-app --url https://example.com |
| Red-team your LLM | Build a prompt-injection framework from scratch | redharness redteam my-app --dataset owasp-top10 |
| Evaluate an agent | Stitch together LangSmith + custom graders + traces | redharness run my-suite --scenario agent-eval |
| Compare model versions | Maintain manual spreadsheets | redharness experiment compare --baseline v1 --candidate v2 |
| Expose results to AI tools | Write yet another API | Built-in MCP server |

One CLI. 19 registered suites. 719 tests. Zero destructive payloads.
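The experiment compare row above implies a baseline-vs-candidate regression gate. The sketch below shows one way such a gate could work; the TrialResult shape, the maxPassRateDrop threshold, and the function names are hypothetical illustrations, not RedHarness's actual experiment API.

```typescript
// Hypothetical result shape: one entry per test case in a run.
interface TrialResult {
  caseId: string;
  passed: boolean;
}

interface GateDecision {
  baselinePassRate: number;
  candidatePassRate: number;
  newlyFailing: string[]; // cases that passed in the baseline but fail in the candidate
  ok: boolean;
}

function passRate(results: TrialResult[]): number {
  if (results.length === 0) return 0;
  return results.filter((r) => r.passed).length / results.length;
}

// The gate fails if the pass rate drops by more than maxPassRateDrop,
// or if any case that passed in the baseline now fails.
function regressionGate(
  baseline: TrialResult[],
  candidate: TrialResult[],
  maxPassRateDrop = 0,
): GateDecision {
  const baselinePassed = new Set(baseline.filter((r) => r.passed).map((r) => r.caseId));
  const newlyFailing = candidate
    .filter((r) => !r.passed && baselinePassed.has(r.caseId))
    .map((r) => r.caseId);
  const baselinePassRate = passRate(baseline);
  const candidatePassRate = passRate(candidate);
  const ok =
    baselinePassRate - candidatePassRate <= maxPassRateDrop && newlyFailing.length === 0;
  return { baselinePassRate, candidatePassRate, newlyFailing, ok };
}
```

Tracking newly failing cases by ID, not just the aggregate rate, catches a candidate that fixes one case while silently breaking another.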


Quickstart

# Install
npm install -g redharness

# List available QA packs
redharness list

# Run a public smoke check
redharness smoke pocket-socrates

# Security smoke (headers, cookies, exposed files, auth gates)
redharness security-smoke pocket-socrates --write-findings

# Run everything
redharness all-smoke pocket-socrates --ci

No auth? No problem: the public smoke, security-smoke, and pentest commands work without credentials.


Features

🧪 Agent Evaluation

Run agents against versioned datasets. Grade on trajectory, state, rubric, rules, pairwise — or drop in a human reviewer.

redharness run agent-eval --scenario read-file --dataset fixture-v1
  • 8 grader types: deterministic, state, trajectory, rule, rubric, pairwise, composite, human
  • Bounded agent runtime: policies, budgets, approvals, checkpoints, cancellations
  • Trace spans with OTel export
  • Durable checkpoints — pick up where you left off
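The grader list above includes a composite type. Here is a minimal sketch of how a deterministic grader and a rule grader might be combined into a weighted composite. The Transcript shape, the Grader type, the tool names read_file and delete_file, and the weighting scheme are all assumptions for illustration, not RedHarness's grader interface.

```typescript
// Hypothetical agent transcript: the final answer plus the tools it called.
interface Transcript {
  finalAnswer: string;
  toolCalls: string[];
}

// A grader maps a transcript to a score in [0, 1].
type Grader = (t: Transcript) => number;

// Deterministic grader: exact match against an expected answer.
const exactAnswer =
  (expected: string): Grader =>
  (t) =>
    t.finalAnswer.trim() === expected ? 1 : 0;

// Rule grader: score 0 if the agent called any forbidden tool.
const forbidTools =
  (forbidden: string[]): Grader =>
  (t) =>
    t.toolCalls.some((c) => forbidden.includes(c)) ? 0 : 1;

// Composite grader: weighted average of its child graders.
function composite(parts: Array<{ grader: Grader; weight: number }>): Grader {
  const total = parts.reduce((sum, p) => sum + p.weight, 0);
  return (t) => parts.reduce((sum, p) => sum + p.weight * p.grader(t), 0) / total;
}

const grade = composite([
  { grader: exactAnswer("42"), weight: 3 },
  { grader: forbidTools(["delete_file"]), weight: 1 },
]);
```

With these weights, a correct answer reached through a forbidden tool scores 0.75 rather than 1, so the rule violation stays visible without erasing the correctness signal.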

🛡️ Red-Team Security

OWASP-aligned adversarial scenarios. Safe, non-destructive, evidence-first.

redharness redteam fixture-agent --dataset owasp-injection-2026
  • OWASP Top 10 for Agentic Apps 2026: goal hijack, tool misuse, memory poisoning, rogue agents
  • Seeded trials: deterministic reproduction across runs
  • Benign controls: distinguish real failures from false positives
  • Finding packets: Notion-ready reports with replay scripts
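Seeded trials make adversarial runs reproducible because the same seed selects the same attack mutations every time. Benign controls guard the other side: if the failure oracle also trips on a harmless input, the result is noise, not a finding. The sketch below assumes a hypothetical mutation list and uses mulberry32 as the seeded PRNG. The README does not say which generator RedHarness uses.

```typescript
// Small, well-known 32-bit seeded PRNG (mulberry32); returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical mutation names, for illustration only.
const MUTATIONS: string[] = ["base64-wrap", "role-play", "homoglyph", "suffix-injection"];

// Same seed and count always produce the same mutation sequence.
function pickMutations(seed: number, count: number): string[] {
  const rand = mulberry32(seed);
  const picks: string[] = [];
  for (let i = 0; i < count; i++) {
    picks.push(MUTATIONS[Math.floor(rand() * MUTATIONS.length)]);
  }
  return picks;
}

// attackTripped: the oracle saw a violation on the adversarial input.
// controlTripped: the oracle saw a violation on the matching benign control.
function classify(
  attackTripped: boolean,
  controlTripped: boolean,
): "finding" | "false-positive" | "pass" {
  if (controlTripped) return "false-positive"; // oracle fires even on harmless input
  return attackTripped ? "finding" : "pass";
}
```

Recording the seed alongside each finding is what allows a later run to regenerate the exact mutation sequence.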

🔍 Safe Pentest

Blackbox and whitebox route discovery with confirmed replay — no repro, no finding.

redharness blackbox-pentest pocket-socrates --url https://pocketsoc.me --confirm-runs 2
  • Security headers, cookie flags, exposed files, auth-gate bypass
  • Sourcemap scanning, public bundle secrets
  • Wire-level replay: exact HTTP request, curl, Playwright script
  • Finding packets with finding.md, replay.pw.ts, replay.curl.sh
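The header and cookie-flag checks listed above amount to comparing a response against an expected set of attributes. Here is a minimal sketch that works on a plain header map, not on RedHarness's internal types. The header names and cookie attributes are standard HTTP, but the required list is an illustrative choice; real policies vary by app.

```typescript
// Illustrative baseline of security headers; adjust per application.
const REQUIRED_HEADERS = [
  "strict-transport-security",
  "content-security-policy",
  "x-content-type-options",
  "referrer-policy",
];

// Header names are case-insensitive, so normalize before comparing.
function missingSecurityHeaders(headers: Record<string, string>): string[] {
  const present = new Set(Object.keys(headers).map((h) => h.toLowerCase()));
  return REQUIRED_HEADERS.filter((h) => !present.has(h));
}

// Report which of Secure, HttpOnly, and SameSite are absent from one Set-Cookie value.
function weakCookieFlags(setCookie: string): string[] {
  const attrs = setCookie
    .split(";")
    .slice(1) // skip the leading name=value pair
    .map((a) => a.trim().split("=")[0].toLowerCase());
  return ["secure", "httponly", "samesite"].filter((f) => !attrs.includes(f));
}
```

Both checks only read a response that has already been fetched, which keeps them within the non-destructive rules described under Safe Security / Pentest.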

📊 Regression & Smoke

Browser-level QA suites for authenticated and public surfaces.

redharness pro-regression-smoke my-app --turns 5
redharness long-thread-smoke my-app --turns 12 --refresh-every 4
redharness chaos-smoke my-app
  • Suites for dashboard, mobile viewport, billing, language, and workshop surfaces
  • Chaos probes: double-send, mid-generation refresh, rapid tab switching
  • Console/network/5xx capture on every check
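Capturing console output, network failures, and 5xx responses on every check can be modeled as a filter over recorded browser events. In a Playwright-based suite those events would come from page listeners such as page.on("console"), page.on("response"), and page.on("requestfailed"). That wiring is omitted here, and the CapturedEvent shape is a hypothetical stand-in.

```typescript
// Hypothetical, simplified record of events captured during one check.
type CapturedEvent =
  | { kind: "console"; level: "log" | "warning" | "error"; text: string }
  | { kind: "response"; url: string; status: number }
  | { kind: "requestfailed"; url: string; reason: string };

// Collect the events that should fail a smoke check.
function checkFailures(events: CapturedEvent[]): string[] {
  const failures: string[] = [];
  for (const e of events) {
    if (e.kind === "console" && e.level === "error") {
      failures.push(`console error: ${e.text}`);
    } else if (e.kind === "response" && e.status >= 500) {
      failures.push(`${e.status} from ${e.url}`);
    } else if (e.kind === "requestfailed") {
      failures.push(`request failed: ${e.url} (${e.reason})`);
    }
  }
  return failures;
}
```

Keeping capture and judgment separate means the same recorded events can feed both the pass/fail decision and the evidence attached to a report.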

🧠 MCP Server

Expose everything to AI agents via Model Context Protocol.

redharness mcp

Your AI assistant can list packs, start runs, poll status, cancel, compare, and inspect findings — governed by the same policy engine.
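MCP messages are JSON-RPC 2.0, and a client invokes a server tool with the tools/call method. The sketch below builds such requests. The tool names list_packs and start_run and their arguments are hypothetical, because the README does not list RedHarness's MCP tools; a client would discover the real names with a tools/list request.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Build an MCP tools/call request for a named tool with its arguments.
function toolCall(name: string, args: Record<string, unknown> = {}): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical tool names and arguments, for illustration only.
const listPacks = toolCall("list_packs");
const startRun = toolCall("start_run", { pack: "pocket-socrates", suite: "smoke" });
console.log(JSON.stringify(startRun));
```

Each request carries a unique id so that the client can match asynchronous responses, such as a run's status, to the call that started it.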


Packs

RedHarness is pack-driven. Packs define routes, checks, graders, and issue types for any application.

# packs/my-app/pack.yaml
id: my-app
name: My App
type: web
baseUrl: https://my-app.com

| Pack | Type | Status |
|------|------|--------|
| fixture-agent | Agent fixture | ✅ Deterministic CI |
| fixture-web | Web fixture | ✅ Release gating |
| pocket-socrates | AI reflection app | ✅ Live smoke |
| scholars-xp | Web app | ✅ Smoke ready |
| gorilla-moverz | Web app | ✅ Smoke ready |

Create your own: packs/<app>/pack.yaml — then run any command against it.
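A loader for pack.yaml would typically validate the parsed object before running any command against it. Here is a minimal sketch, assuming the file has already been parsed into a plain object (for example with a YAML library) and that the four fields in the example pack are required. The PackConfig type and the allowed type values ("web" from the example, plus a hypothetical "agent") are assumptions.

```typescript
interface PackConfig {
  id: string;
  name: string;
  type: "web" | "agent"; // assumption: "web" appears in the example; "agent" is hypothetical
  baseUrl: string;
}

// Check the parsed pack.yaml fields and collect every problem before failing.
function validatePack(raw: Record<string, unknown>): PackConfig {
  const errors: string[] = [];
  for (const key of ["id", "name", "type", "baseUrl"]) {
    if (typeof raw[key] !== "string" || raw[key] === "") {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  if (raw.type !== "web" && raw.type !== "agent") {
    errors.push(`unsupported type: ${String(raw.type)}`);
  }
  if (typeof raw.baseUrl === "string" && !/^https?:\/\/\S+$/.test(raw.baseUrl)) {
    errors.push(`baseUrl must be an http(s) URL: ${raw.baseUrl}`);
  }
  if (errors.length > 0) throw new Error(`Invalid pack: ${errors.join("; ")}`);
  return raw as unknown as PackConfig;
}
```

Reporting all problems in one error, instead of stopping at the first, saves a round trip when someone writes a new pack by hand.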


Safe Security / Pentest

Safety first. RedHarness is intentionally non-destructive:

  • No brute force, no credential stuffing, no spam
  • No payment abuse, no destructive mutations
  • Suspicious findings must be replayed --confirm-runs times before becoming confirmed
  • All finding packets are draft-only — nothing auto-submits
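The --confirm-runs rule can be read as a simple gate: a suspicious finding becomes confirmed only if every replay reproduces it. Here is a sketch under that reading. The replay callback, the status names, and the choice to require all runs rather than a majority are assumptions, not RedHarness's implementation.

```typescript
type FindingStatus = "suspected" | "confirmed" | "not-reproduced";

// `replay` re-sends the original request and resolves to true if the issue reproduced.
async function confirmFinding(
  replay: () => Promise<boolean>,
  confirmRuns: number,
): Promise<FindingStatus> {
  if (confirmRuns < 1) return "suspected"; // nothing was replayed, so nothing is confirmed
  for (let i = 0; i < confirmRuns; i++) {
    // Replays run one at a time to avoid hammering the target,
    // and the loop stops at the first miss.
    if (!(await replay())) return "not-reproduced";
  }
  return "confirmed";
}
```

Stopping at the first miss keeps traffic to the target minimal, in line with the no-brute-force rule above.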

What RedHarness found in the real world:

| Finding | Tool |
|---------|------|
| 🔴 Unauthenticated /en/account renders settings UI | security-smoke, blackbox-pentest |
| 🟡 Blank invite-code submit has no validation | browser-smoke |


CLI

redharness <command> [pack] [options]

| Command | What it does |
|---------|-------------|
| smoke | Public HTTP smoke (status, title, text) |
| public-nav-smoke | Public browser navigation checks |
| browser-smoke | TOS/early-access gate checks |
| auth-smoke | Authenticated dashboard smoke |
| crucible-smoke | AI/Crucible interaction smoke |
| pro-regression-smoke | Pro/Solo regression (turns, persistence, export) |
| long-thread-smoke | Long-thread timeout/stage checks |
| completion-smoke | Session completion/Landing checks |
| mobile-auth-smoke | Mobile viewport + drawer smoke |
| billing-smoke | Safe billing/account surface check |
| language-smoke | Locale/language switching smoke |
| workshop-smoke | Roots/Echoes/Workshop surface check |
| record-export-smoke | Document/export empty-state check |
| targeted-changelog-smoke | Selected changelog verification |
| chaos-smoke | Aggressive exploratory UI probes |
| security-smoke | Headers, cookies, exposed files, auth gates |
| blackbox-pentest | URL-only safe pentest with confirmed replay |
| whitebox-pentest | Repo-aware route discovery + live probes |
| redteam | OWASP-aligned adversarial agent scenarios |
| run | Execute a registered suite against a pack |
| experiment | Compare baselines vs candidates |
| mcp | Start MCP server for AI agent access |
| list | List packs, suites, scenarios, datasets |
| scan | Scan text against pack style rules |
| report | Validate and render report YAML |
| checklist | Print a pack track checklist |
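The invocation grammar redharness <command> [pack] [options] splits into a command, an optional pack, and --flag value options. The hand-rolled parser below is only a sketch of that grammar. It assumes every flag takes a value except --ci and --write-findings (the two valueless flags used in this README), and it does not handle subcommands such as experiment compare. It is not RedHarness's actual CLI parser.

```typescript
interface ParsedArgs {
  command: string;
  pack?: string;
  options: Record<string, string | boolean>;
}

// Valueless flags seen in this README; all other flags are assumed to take a value.
const BOOLEAN_FLAGS = new Set(["ci", "write-findings"]);

function parseCli(argv: string[]): ParsedArgs {
  const [command, ...rest] = argv;
  if (!command) throw new Error("usage: redharness <command> [pack] [options]");
  const result: ParsedArgs = { command, options: {} };
  for (let i = 0; i < rest.length; i++) {
    const token = rest[i];
    if (token.startsWith("--")) {
      const name = token.slice(2);
      if (BOOLEAN_FLAGS.has(name)) {
        result.options[name] = true;
      } else {
        const value = rest[i + 1];
        if (value === undefined || value.startsWith("--")) {
          throw new Error(`--${name} needs a value`);
        }
        result.options[name] = value;
        i++; // skip the value we just consumed
      }
    } else if (result.pack === undefined) {
      result.pack = token; // first positional after the command is the pack
    } else {
      throw new Error(`unexpected argument: ${token}`);
    }
  }
  return result;
}
```

For example, long-thread-smoke my-app --turns 12 --refresh-every 4 yields the pack my-app with options turns "12" and refresh-every "4". Values stay strings, and numeric conversion is left to each command.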


Architecture

src/
├── cli.ts                    # CLI entrypoint
├── agent/                    # Bounded agent runtime (26 files)
│   ├── runtime.ts            # Policy-controlled agent executor
│   ├── policyEngine.ts       # Budgets, approvals, stop conditions
│   ├── checkpoints.ts        # Durable checkpoint/resume
│   └── browser/              # Governed browser tools
├── redteam/                  # OWASP-aligned security scenarios (13 files)
│   ├── attackRegistry.ts     # Attack mutation library
│   ├── datasetLoader.ts      # Versioned dataset loading
│   └── findingWriter.ts      # Notion-ready draft packets
├── scenarios/                # Scenario engine + dataset schemas
├── graders/                  # 8 grader types (deterministic → human)
├── experiments/              # Comparison, regression gates
├── core/                     # Run coordination, suite registry, status
├── mcp/                      # MCP server (AI agent access)
├── exporters/                # OTel, JUnit, SARIF, GitHub reporters
├── service/                  # Governed service API
└── reporters/                # Report renderers
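The policyEngine.ts entry describes budgets, approvals, and stop conditions. Here is a minimal sketch of how a policy check might gate each agent step. The Budget and StepRequest fields, the PolicyEngine class, and the rule that listed tools need explicit approval are illustrative assumptions, not the contents of that file.

```typescript
interface Budget {
  maxSteps: number;
  maxTokens: number;
}

interface StepRequest {
  tool: string;
  estimatedTokens: number;
}

type Verdict = { allow: true } | { allow: false; reason: string };

class PolicyEngine {
  private steps = 0;
  private tokens = 0;
  private readonly budget: Budget;
  private readonly needsApproval: Set<string>; // tools that require a human approval

  constructor(budget: Budget, needsApproval: string[]) {
    this.budget = budget;
    this.needsApproval = new Set(needsApproval);
  }

  // Decide whether the next step may run; its cost is recorded only if allowed.
  check(step: StepRequest, approved = false): Verdict {
    if (this.steps + 1 > this.budget.maxSteps) {
      return { allow: false, reason: "step budget exhausted" };
    }
    if (this.tokens + step.estimatedTokens > this.budget.maxTokens) {
      return { allow: false, reason: "token budget exhausted" };
    }
    if (this.needsApproval.has(step.tool) && !approved) {
      return { allow: false, reason: `approval required for ${step.tool}` };
    }
    this.steps += 1;
    this.tokens += step.estimatedTokens;
    return { allow: true };
  }
}
```

Because a denied step does not consume budget, the runtime can pause for approval and retry the same step without being penalized for the wait.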

Documentation

Full PRD and spec docs: docs/prd/

| Doc | What it covers |
|-----|---------------|
| Run Contract & Suite Registry | Truthful execution, suite registry |
| Trace, Evidence & Replay | Unified trace spans, artifact store |
| Scenarios, Datasets & Graders | Dataset schemas, 8 grader types |
| Agent Runtime & Safety | Bounded agent, policy engine, budgets |
| Agentic Security & Red-Team | OWASP 2026, attack mutations, findings |
| Experiments, CI & MCP | Experiments, gates, OTel, MCP |
| Security Platform | Pentest, finding packets, replay |


Release Status

TypeScript: 138 source files, 0 errors
Tests:      84 files, 719 tests
Suites:     19 registered
License:    MIT

✅ Delivered

  • Truthful execution, suite registry, run coordination, retries, cancel, resume
  • 8 grader types (deterministic, state, trajectory, rubric, pairwise, composite, rules, human)
  • Bounded agent runtime with policies, budgets, approvals, checkpoints
  • OWASP 2026 red-team engine with seeded trials, benign controls, findings
  • Safe blackbox/whitebox pentest with confirmed replay
  • JUnit, SARIF, GitHub, OTel reporters
  • SQLite catalog, baselines, findings, retention, scheduled workflows
  • Policy-governed MCP server
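Of the reporters listed above, JUnit XML is the format most CI systems ingest directly. Here is a minimal sketch of rendering run results as JUnit XML. The CaseResult shape is an assumption, and the element and attribute names (testsuite, testcase, failure, tests, failures, time) follow the common JUnit XML convention, not RedHarness's exporter.

```typescript
interface CaseResult {
  name: string;
  durationMs: number;
  failure?: string; // present only when the case failed
}

// Escape the five XML special characters for use in attribute values.
function xmlEscape(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function toJUnit(suite: string, cases: CaseResult[]): string {
  const failures = cases.filter((c) => c.failure !== undefined).length;
  const body = cases
    .map((c) => {
      // JUnit reports time in seconds.
      const open = `  <testcase name="${xmlEscape(c.name)}" time="${(c.durationMs / 1000).toFixed(3)}"`;
      return c.failure === undefined
        ? `${open}/>`
        : `${open}>\n    <failure message="${xmlEscape(c.failure)}"/>\n  </testcase>`;
    })
    .join("\n");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<testsuite name="${xmlEscape(suite)}" tests="${cases.length}" failures="${failures}">\n` +
    `${body}\n</testsuite>`
  );
}
```

Escaping matters here because check names and failure messages often contain URLs, HTML, or quotes captured from the app under test.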

🔄 In Progress

  • Screen recording per finding (video.webm)
  • AI red-team mode (prompt-injection, system-prompt leak, data-leak probes)
  • Fix-as-PR mode for owned repos
  • Compliance mapping (SOC 2 / HIPAA / PCI / ISO)
  • Scheduled recurring runs / continuous QA

Contributing

PRs welcome! The project needs help with:

  • New QA packs — add your app to packs/
  • Graders — new evaluation strategies
  • Attack scenarios — OWASP-aligned or novel
  • Docs & examples — make it easier for others to get started
git clone https://github.com/AIWhispererDev/redharness.git
cd redharness
npm install
npm test

🔴 RedHarness — Test, probe, and prove your AI agents.
