@agentic-evals/mcp

MCP server for running deterministic and non-deterministic UI/UX evaluations using multi-agent swarms. Combines automated tooling (Lighthouse, axe-core, ESLint) with LLM-judged rubric evaluations, synthesizes findings through a deliberation protocol, and outputs prioritized GitHub issues.

Quick Start

npm install
npm run build
npm start          # starts MCP server on stdio

Use with VS Code / Copilot

Add to .vscode/mcp.json:

{
  "servers": {
    "agentic-evals": {
      "type": "stdio",
      "command": "node",
      "args": ["dist/mcp/server.js"]
    }
  }
}

Then ask Copilot: "Run a quick review of http://localhost:3000"

Architecture

User / Copilot
  │  MCP tool call (run_swarm, run_eval, …)
  ▼
MCP Server (17 tools, 2 resources, 3 prompts)
  │
  ▼
SwarmOrchestrator.execute()
  │
  ├─ Phase 1: FAN OUT (parallel batches)
  │   ├─ Deterministic agents (Lighthouse, axe, ESLint, Stylelint, Prettier)
  │   └─ Non-deterministic agents (LLM + rubric + evidence)
  │
  ├─ Phase 1.5: RE-EVALUATION (confidence-gated)
  │   └─ High-severity + low-confidence findings → targeted evidence
  │       (zoom, hover, focus states, element-level capture)
  │
  ├─ Phase 1.75: CROSS-RUN MEMORY
  │   └─ Annotate findings: new | persistent | regression | resolved
  │
  └─ Phase 2: DELIBERATION (LLM synthesis)
      ├─ Merge overlapping findings
      ├─ Challenge across agents
      ├─ Prioritize (severity × effort → P0–P3)
      └─ Actionize into concrete fixes
  │
  ▼
GitHub Issue Creator → [P0] [Domain] Title with evidence & labels
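
The orchestration above can be pictured as a single async pipeline. Below is a minimal TypeScript sketch of the phase ordering; Agent, Finding, and every field name here are illustrative assumptions, not the package's actual API.

// Minimal sketch of the phase ordering in the diagram above. All types
// and names are illustrative placeholders, not the library's real API.
interface Finding {
  severity: "high" | "medium" | "low";
  confidence: number; // 0..1, assigned by the emitting agent
  status?: "new" | "persistent" | "regression" | "resolved";
}

interface Agent {
  run(url: string): Promise<Finding[]>;
}

async function executeSwarm(agents: Agent[], url: string): Promise<Finding[]> {
  // Phase 1: fan out, running all agents in parallel
  const findings = (await Promise.all(agents.map((a) => a.run(url)))).flat();

  // Phase 1.5: re-evaluate findings that are high severity but low
  // confidence, using targeted evidence (zoom, hover, focus states)
  const suspect = findings.filter(
    (f) => f.severity === "high" && f.confidence < 0.5
  );
  console.log(`${suspect.length} findings flagged for re-evaluation`);

  // Phase 1.75: cross-run memory annotates each finding as
  // new | persistent | regression | resolved against the prior run

  // Phase 2: deliberation, where an LLM merges overlapping findings,
  // challenges them across agents, prioritizes (P0..P3), and actionizes
  return findings;
}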

Wizard Flow

The wizard provides a session-based pipeline that guides the evaluation from start to finish. Each step auto-populates the next — the agent never needs to figure out what to call next.

📋 Plan  →  ✅ Approve  →  🔍 Run  →  🤔 Deliberate  →  🎯 Act  →  📝 Confirm  →  🎉 Done

| Step | What Happens | User Action |
|------|--------------|-------------|
| Plan | wizard_start assembles an eval team based on project signals | Review team, add/remove evals |
| Approve | wizard_advance(approve) locks the team and launches the swarm | Confirm the roster |
| Run | Swarm executes in parallel, returns phased narrative | — (automatic) |
| Deliberate | Agent follows deliberation protocol to merge/prioritize findings | Review prioritized findings |
| Act | wizard_advance(submit_findings) formats findings as issues | Choose: issues, actionize, or fix |
| Confirm | wizard_advance(choose_action) shows dry-run preview | Confirm to create |
| Done | wizard_advance(confirm) creates GitHub issues/PRs | — |

Session state persists across tool calls — findings, issues, and evidence are never lost between steps.
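
The whole flow can also be driven programmatically from any MCP client. The sketch below uses the MCP TypeScript SDK; the tool names and actions come from the table above, while the argument shapes (url, sessionId) are assumptions for illustration.

// Hypothetical wizard session via the MCP TypeScript SDK. Tool names
// and actions are from this README; argument shapes are assumptions.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "wizard-demo", version: "0.0.1" });
await client.connect(
  new StdioClientTransport({ command: "node", args: ["dist/mcp/server.js"] })
);

// Plan: open a session and let the wizard assemble the eval team
const plan = await client.callTool({
  name: "wizard_start",
  arguments: { url: "http://localhost:3000" },
});
console.log(plan); // includes the session id used below

// Approve → Run/Deliberate → Act → Confirm, one advance per phase
for (const action of ["approve", "submit_findings", "choose_action", "confirm"]) {
  await client.callTool({
    name: "wizard_advance",
    arguments: { sessionId: "<id-from-plan>", action },
  });
}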

MCP Tools

Wizard Flow (Recommended)

| Tool | Description |
|------|-------------|
| wizard_start | Start a guided evaluation wizard — creates a session, auto-assembles team, walks through each phase |
| wizard_advance | Advance the wizard to the next phase (approve → run → deliberate → act → confirm) |
| wizard_status | Check wizard session status or list all active sessions |

Individual Tools

| Tool | Description |
|------|-------------|
| list_evals | List available evaluation plugins |
| scan_project | Discover project tooling (linters, tests, formatters) and register as plugins |
| run_eval | Run a single evaluation plugin against a URL or project |
| run_suite | Run multiple evaluations as a suite |
| run_swarm | Execute a full multi-agent swarm evaluation |
| list_presets | List swarm presets (quick-scan, full-review, deep-dive, a11y-focus) |
| capture_evidence | Capture screenshots, DOM, and styles at multiple viewports |
| capture_user_flow | Execute a user flow (click, fill, navigate, hover) and capture evidence at each step |
| reevaluate_findings | Re-evaluate high-severity/low-confidence findings with targeted evidence (hover, focus, zoom) |
| get_run_history | View evaluation run history, trend data, regressions, and resolutions |
| get_rubric | Load a rubric with knowledge context |
| list_rubrics | List all available rubrics |
| create_issues | Convert findings into GitHub issues |
| review_page | One-shot page review (evidence + eval + report) |
| register_plugin | Register a custom eval plugin at runtime |
| scaffold_deterministic_eval | Create a custom deterministic eval in .evals/ (plugin + command + config) |
| scaffold_non_deterministic_eval | Create a custom non-deterministic eval in .evals/ (rubric + plugin + config) |

Swarm Presets

| Preset | Agents | Rounds | Use Case |
|--------|--------|--------|----------|
| quick-scan | 3 | 1 | Fast smoke test |
| full-review | 6 | 2 | Standard review |
| deep-dive | 8 | 3 | Comprehensive audit |
| a11y-focus | 4 | 2 | Accessibility-focused |
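
Outside the wizard, a preset can be passed straight to run_swarm. A hypothetical call, reusing the client from the wizard sketch above (the preset names are real per the table; the argument shape is an assumption):

// Hypothetical one-shot swarm run selecting a preset; the argument
// shape is an assumption for illustration.
await client.callTool({
  name: "run_swarm",
  arguments: { url: "http://localhost:3000", preset: "quick-scan" },
});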

Knowledge & Rubrics

Non-deterministic evals are powered by rubrics in knowledge/rubrics/. Override per-project by placing rubrics in .evals/rubrics/ in the target project.

Available rubrics: accessibility, content/copy, information architecture, performance perception, responsive design, UX heuristics, visual design.

Supporting knowledge: knowledge/principles/ (Gestalt, Fitts' law, cognitive load, color theory, typography) and knowledge/standards/ (WCAG 2.2, design systems).

See knowledge/README.md for details on the rubric resolution system.

Per-project .evals/ structure

Run scan_project or call initProject() to scaffold a fully customizable .evals/ directory:

.evals/
├── config.json                      # enable/disable evals, adjust settings
├── commands.json                    # custom CLI-based evals
├── deliberation-protocol.md         # override deliberation prompts
├── issue-template.md                # override GitHub issue format
├── rubrics/                         # override or extend rubric scoring
│   ├── visual-design.md
│   └── ...
├── knowledge/
│   ├── principles/                  # override or add design principles
│   │   ├── gestalt.md
│   │   ├── brand-guidelines.md      # (your own)
│   │   └── ...
│   └── standards/                   # override or add design standards
│       ├── wcag-2.2.md
│       ├── your-design-system.md    # (your own)
│       └── ...
└── plugins/                         # override or extend eval plugins
    ├── deterministic/
    ├── non-deterministic/
    └── README.md

Resolution order: project .evals/ overrides → library defaults. Project files always win on name collision, and you can add new files that don't exist in the library.
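
As a sketch, that lookup reduces to a two-step filesystem check, assuming the library ships its defaults under a known root; the real resolver (see knowledge/README.md) may differ.

// Sketch of the override-then-default lookup described above. The
// library's actual resolver and directory layout are assumptions here.
import { existsSync } from "node:fs";
import { join } from "node:path";

function resolveAsset(relPath: string, projectRoot: string, libRoot: string): string {
  const override = join(projectRoot, ".evals", relPath);
  if (existsSync(override)) return override; // project file wins on collision
  return join(libRoot, relPath);             // otherwise the library default
}

// resolveAsset("rubrics/visual-design.md", cwd, lib) returns the
// project override when one exists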

Custom Evals

Command-based (deterministic)

Add .evals/commands.json to your project:

{
  "commands": [
    {
      "name": "storybook-build",
      "command": "npm run build-storybook",
      "kind": "build",
      "failOn": "exit-code"
    }
  ]
}
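
Each entry is a shell command whose result is mapped to a pass/fail finding. A minimal sketch of how "failOn": "exit-code" could be interpreted, mirroring the config above (the library's actual runner may differ):

// Minimal sketch: with failOn "exit-code", a non-zero exit marks the
// eval as failed. The real runner's behavior is an assumption here.
import { execSync } from "node:child_process";

function runCommandEval(entry: { name: string; command: string }) {
  try {
    execSync(entry.command, { stdio: "pipe" }); // throws on non-zero exit
    return { name: entry.name, passed: true };
  } catch {
    return { name: entry.name, passed: false }; // surfaces as a finding
  }
}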

Plugin-based (non-deterministic)

See examples/example-plugin.ts for a template.
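
As a rough guess at the shape such a plugin takes (every name below is an illustrative assumption; examples/example-plugin.ts is authoritative):

// Illustrative plugin skeleton only; defer to examples/example-plugin.ts
// for the package's actual interface.
interface EvalPlugin {
  name: string;
  kind: "non-deterministic";
  rubric: string; // rubric the LLM judge scores against
  run(target: { url: string }): Promise<{ findings: unknown[] }>;
}

const brandCompliance: EvalPlugin = {
  name: "brand-compliance",
  kind: "non-deterministic",
  rubric: "rubrics/brand-compliance.md",
  async run(target) {
    // capture evidence for target.url, score it against the rubric,
    // and return structured findings
    return { findings: [] };
  },
};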

Guided creation (via Copilot)

Use the create-eval prompt — Copilot will interview you, then scaffold everything:

"Create a non-deterministic eval for brand compliance"

Or call the tools directly: scaffold_deterministic_eval / scaffold_non_deterministic_eval.

Development

npm run dev        # watch mode
npm run lint
npm test           # vitest

License

MIT