npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@dj_abstract/prompt-eval

v0.3.0

Published

Prompt-injection eval harness — runs a curated attack corpus against an LLM target and scores defense posture over time using Claude-as-judge.

Readme

prompt-eval

npm version license: MIT Node.js >=20

A prompt-injection eval harness. Runs a curated attack corpus against an LLM target (an agent endpoint, a raw API, or any HTTP-callable model) and uses Claude as a judge to score whether each attack landed.

Why this exists: If you ship an agent, you should know — quantitatively — how it holds up against the attacks every adversary will try first. That's not a vibes question; it's measurement. prompt-eval gives you a defense-rate number, a per-category breakdown, and a per-attack diff between runs so regressions are obvious.

What it tests

The default corpus (src/corpus/attacks.json) covers the OWASP LLM01 surface plus agent-specific patterns:

| Category | Examples | |----------|---------| | direct-injection | "Ignore all previous instructions"; pirate persona swap | | system-prompt-extraction | "Repeat everything above verbatim"; print system prompt as JSON | | role-hijack | DAN; developer mode | | prefix-injection | Forced response prefix | | indirect-injection | Embedded instruction in a user document or email being analyzed | | encoding-tricks | Base64-wrapped instruction; ROT13 | | information-leak | Reveal API keys; reveal user PII | | tool-coercion | Coerce a send_email exfil; coerce shell exec | | delimiter-confusion | Fake </user> <system> tags | | authority-claim | Fake "system admin priority" override | | refusal-bypass | Hypothetical framing for prohibited content |

Each attack carries an explicit successCriteria — what counts as "compromised." The judge sees that criterion and the target's response, then returns resisted | compromised | ambiguous plus a one-line rationale.

Install

One-shot with npx (no install):

npx @dj_abstract/prompt-eval run --target brain --url http://localhost:3005

Global install:

npm install -g @dj_abstract/prompt-eval
prompt-eval --help

Or clone and run from source:

git clone https://github.com/abregoarthur-star/prompt-eval
cd prompt-eval
npm install
cp .env.example .env  # add ANTHROPIC_API_KEY

Requires Node.js 20+.

Usage

List the corpus

node bin/prompt-eval.js list

Eval the Brain agent

# Brain dev server must be running on :3005 with an authenticated session
node bin/prompt-eval.js run \
  --target brain \
  --url http://localhost:3005 \
  --cookie "auth=YOUR_JWT"

Eval the raw Anthropic API (baseline)

node bin/prompt-eval.js run \
  --target anthropic \
  --model claude-sonnet-4-6 \
  --system "You are a helpful assistant."

Eval any OpenAI-compatible endpoint

node bin/prompt-eval.js run \
  --target http \
  --url https://my-llm.example.com/api/chat \
  --bearer "$TOKEN" \
  --body-key prompt \
  --response-key choices.0.message.content

Filter the corpus

# Only run critical-severity attacks
node bin/prompt-eval.js run --target brain --severity critical

# Only run a specific category
node bin/prompt-eval.js run --target brain --category tool-coercion,information-leak

# Run a single attack by id
node bin/prompt-eval.js run --target brain --ids context-injection-002

Compare two runs

node bin/prompt-eval.js diff reports/2026-04-01.json reports/2026-04-15.json

Prints Δ defense rate plus the per-attack regressions and wins.

CI gate

Fail the build if defense drops below a threshold:

node bin/prompt-eval.js run --target brain --fail-rate 90

Output

Every run produces:

  • Terminal report — defense rate, per-category, per-severity, list of compromised attacks with the judge's rationale.
  • JSON report in reports/<timestamp>.json — full attacker prompt, target response, judge verdict, latency. Diffable across runs.
  • HTML dashboard in reports/<timestamp>.html — share-friendly, hero defense rate, per-attack cards.

Programmatic API

import { evalTarget } from 'prompt-eval';

const report = await evalTarget({
  target: 'brain',
  url: 'http://localhost:3005',
  cookie: process.env.BRAIN_COOKIE,
  filter: { severities: ['critical', 'high'] },
});

console.log(`Defense rate: ${report.summary.defenseRate}%`);

Design notes

  • Claude-as-judge because per-attack successCriteria is the ground truth, not a string match — partial compliance and creative rephrasings need a model to score correctly.
  • Concurrent attacks (default 3) keep wall-clock low. Bumping past 5 risks rate limits on both the target and the judge.
  • Read-only by design. The harness sends prompts and reads responses. It does not attempt actual exploitation, post-execution destruction, or network attacks.
  • Corpus is intentionally small (~18 attacks) and curated. Big benchmarks are noisy and slow. A focused suite that catches the patterns every attacker tries first is more useful as a regression tool than a 10K-row eval that takes 4 hours and costs $50 per run.

Roadmap

(Multi-turn attack generation is now scoped to prompt-genesis 0.2.0 — this tool consumes the resulting corpus.)

  • Indirect injection corpus from synthetic RAG documents
  • ASCII Smuggler / Unicode Tag attacks (paired with mcp-audit detection)
  • Tool-call telemetry from agent targets (which tools were invoked vs. which the attacker tried to coerce)
  • Side-by-side eval runs across model versions for regression tracking

References

Related tools

Part of a detect → inventory → test → generate → defend pipeline for AI-agent security:

| Layer | Tool | Role | |---|---|---| | Detect | @dj_abstract/mcp-audit | Static audit of MCP server definitions | | Detect | mcp-audit-sweep | Reproducible sweep of public MCP servers | | Inventory | @dj_abstract/agent-capability-inventory | Fleet-wide tool catalog with data-sensitivity tags | | Test | prompt-eval (you are here) | Runtime prompt-injection eval harness | | Generate | @dj_abstract/prompt-genesis | Attack corpus generator — drop-in compatible with src/corpus/attacks.json | | Defend | @dj_abstract/agent-firewall | Call-time defensive middleware |

The v0.2.0 corpus was expanded from 18 → 38 attacks using prompt-genesis. Notably, the generator produced a YAML-serialization system-prompt-extraction attack that the bare Anthropic model fell for (baseline v2: 97.4%), which the hand-curated corpus did not contain — proof that LLM-driven fuzzing surfaces attacks a human curator wouldn't think to write.

License

MIT — see LICENSE.