aeokit v0.3.0

Agentic Engine Optimization toolkit: audit and test how well AI agents can find, parse, and use your website.

AEOkit is an Agentic Engine Optimization toolkit — it measures how well an AI agent can actually use your website, by running one. It combines a deterministic static audit (Lab) with a real agent session driving a real browser (Field) and emits a Lighthouse-style report with a grade, six scoring dimensions, and an honest trace of what the agent tried, saw, and missed.

AEOkit score report

Status: alpha (v0.3.0 on npm). The engine is solid; the product packaging around it isn't finished yet. Read Known limits before you pitch this internally.


Why it exists

The web is being re-consumed by AI agents — search summarisers, shopping agents, coding copilots, computer-use models. "Does my site render?" is no longer the question. The questions are:

  • Can an agent discover what my site offers without a human guiding it?
  • Can it read the page cheaply, or does it burn 80 K tokens on a header carousel?
  • Can it complete a task end-to-end, or does a cookie wall / bot challenge / SPA hydration race stop it on step 1?

AEOkit answers those by running the agent.


What you get

Lab — static audit (no API key)

Runs in seconds against any URL. Checks what an agent would see if it used fetch or curl:

  • llms.txt, robots.txt, sitemap.xml discovery
  • Per-crawler access matrix (GPTBot, Google-Extended, ClaudeBot, PerplexityBot, CCBot…)
  • Token-budget measurement on the landing page + key docs (real gpt-tokenizer counts, not len/4 estimates)
  • Two-tier fetcher: plain HTTP first, stealth-Chromium fallback if the plain tier is blocked — so we can tell you which tier your site accepts
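
To make the per-crawler access matrix concrete, here is a minimal sketch of the idea: evaluate robots.txt once per crawler user agent and record allow/deny. The function names (`parseRobots`, `isPathAllowed`, `accessMatrix`) are hypothetical and not AEOkit's API. The parser is deliberately simplified: it handles only User-agent / Allow / Disallow, matches agents by exact name, and uses prefix matching without wildcards.

```typescript
// Illustrative sketch only; not AEOkit's implementation.
// Simplified robots.txt rules: longest matching path prefix wins,
// Allow wins ties, and a crawler with no group falls back to "*".

type Rule = { allow: boolean; path: string };

function parseRobots(robotsTxt: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let lastWasAgent = false;

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one rule group.
      if (!lastWasAgent) agents = [];
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if ((field === "allow" || field === "disallow") && value !== "") {
        for (const agent of agents) {
          groups.get(agent)!.push({ allow: field === "allow", path: value });
        }
      }
    }
  }
  return groups;
}

function isPathAllowed(robotsTxt: string, crawler: string, path: string): boolean {
  const groups = parseRobots(robotsTxt);
  const rules = groups.get(crawler.toLowerCase()) ?? groups.get("*") ?? [];
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieAllow = !!best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best ? best.allow : true; // no matching rule means allowed
}

function accessMatrix(
  robotsTxt: string,
  crawlers: string[],
  path = "/",
): Record<string, boolean> {
  const matrix: Record<string, boolean> = {};
  for (const crawler of crawlers) {
    matrix[crawler] = isPathAllowed(robotsTxt, crawler, path);
  }
  return matrix;
}
```

Calling `accessMatrix(robotsTxt, ["GPTBot", "ClaudeBot", "PerplexityBot"])` gives one boolean per crawler for the landing page. A real audit would also need wildcard rules and HTTP-level blocking, which this sketch ignores.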

Field — agent session (needs an API key)

Launches a real headless Chromium, hands it to an LLM (Claude / OpenAI / Gemini) via a 10-tool browser harness (screenshot, click, type, scroll, select_option, press_key, navigate, get_page_info, wait, go_back), and gives it a plain-English task. Emits a full trace — every step, every tool call, every token.

  • Smart observation: viewport-scoped, semantically pruned a11y trees with list summarisation (stable ~18–20 K tokens/step instead of ballooning)
  • Pre-flight DOM weight analysis — auto-switches the observation mode on heavy / SPA-hydrating pages
  • Domain guardrails: agent can't wander off to accounts.google.com
  • Auto-dismisses cookie / privacy banners via DuckDuckGo's autoconsent rules (~200 CMPs) before the agent sees the page
  • Optional video recording (--record) — watch what the agent actually saw
  • Statistical runs (--runs N) with pass-rate and per-dimension aggregates
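
The pass-rate and per-dimension aggregates from --runs N can be pictured as a simple reduction over the individual run results. A minimal sketch, assuming a hypothetical `RunResult` shape (AEOkit's real report schema may differ) and assuming every run reports the same set of dimensions:

```typescript
// Hypothetical shapes for illustration; not AEOkit's report schema.
type RunResult = {
  passed: boolean;                    // did all assertions pass?
  dimensions: Record<string, number>; // per-dimension score, 0-100
};

type Aggregate = {
  passRate: number;              // fraction of runs that passed, 0-1
  means: Record<string, number>; // mean score per dimension
};

function aggregateRuns(runs: RunResult[]): Aggregate {
  if (runs.length === 0) return { passRate: 0, means: {} };

  const sums: Record<string, number> = {};
  let passes = 0;
  for (const run of runs) {
    if (run.passed) passes += 1;
    for (const [name, score] of Object.entries(run.dimensions)) {
      sums[name] = (sums[name] ?? 0) + score;
    }
  }

  // Assumes each run reports every dimension, so dividing by the
  // run count yields the mean.
  const means: Record<string, number> = {};
  for (const [name, total] of Object.entries(sums)) {
    means[name] = total / runs.length;
  }
  return { passRate: passes / runs.length, means };
}
```

Averaging over several runs matters because a single agent session is non-deterministic; one pass or one failure says little on its own.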

Combined score

aeokit score <url> --scenario task.yaml runs both and produces a single composite grade (Lab × 40% + Field × 60%) with an HTML report that shows them side by side.
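
As a worked example of the 40/60 weighting, here is a small sketch. The function name `compositeScore` is hypothetical, and how AEOkit maps the resulting number to a letter grade is not shown because the README does not state the thresholds.

```typescript
// Sketch of the composite weighting: Lab x 40% + Field x 60%.
// Integer weights avoid floating-point noise for whole-number inputs.
function compositeScore(lab: number, field: number): number {
  return (40 * lab + 60 * field) / 100;
}

// Example: a Lab score of 90 and a Field score of 70
// give (3600 + 4200) / 100 = 78.
console.log(compositeScore(90, 70)); // 78
```

The heavier Field weight means a site that audits cleanly but blocks the agent still ends up with a low composite.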


Quickstart

Requirements: Node 20+, macOS/Linux/Windows. For Field runs, an API key from at least one of: Anthropic, OpenAI, or Google AI Studio.

# 1. Install
npm install -g aeokit
npx playwright install chromium     # one-time — downloads headless browser

# 2. Configure your API key (only needed for Field runs)
#    Either set an env var — ANTHROPIC_API_KEY / OPENAI_API_KEY / GOOGLE_API_KEY —
#    or drop an aeokit.config.yaml in the directory you run from (see below).

Prefer not to install globally? Every command below works with npx aeokit ….

Verify it's working:

# Lab only — no API key needed
aeokit audit https://modelcontextprotocol.io

# Field run against your own YAML scenario (see "Writing a scenario" below)
aeokit run my-task.yaml --record

# Combined Lab + Field with a composite grade + video recording
aeokit score https://modelcontextprotocol.io --scenario my-task.yaml --record

# Override the model per run (without editing aeokit.config.yaml)
aeokit run my-task.yaml --model claude-opus-4-7

Reports land in ./audit-reports/, ./aeokit-results/, or ./score-reports/ (--output overrides). Open the .html file in any browser.

Prebuilt scenarios live in examples/scenarios/ in the GitHub repo — copy any of them into your project as a starting point.


Configuration

AEOkit reads aeokit.config.yaml from the current working directory. Env vars work as a fallback when a provider block is missing — set ANTHROPIC_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY / GEMINI_API_KEY and you can skip the YAML entirely.

To keep settings in a file, drop this in aeokit.config.yaml (git-ignore it, since it holds secrets):

providers:
  claude:
    apiKey: sk-ant-...
    # model: claude-sonnet-4-6   # optional override
  openai:
    apiKey: sk-proj-...
    # model: gpt-4o
  gemini:
    apiKey: ...
    # model: gemini-2.5-flash

The full example lives at aeokit.config.example.yaml in the GitHub repo.
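
The lookup order described in this section (provider block first, env vars as a fallback) might look roughly like the sketch below. The env var names come from this README; the types, the `resolveApiKey` name, and the order in which the two Gemini variables are tried are assumptions for illustration.

```typescript
// Illustrative sketch; not AEOkit's loader.
type ProviderName = "claude" | "openai" | "gemini";
type ProviderBlock = { apiKey?: string; model?: string };
type Config = { providers?: Partial<Record<ProviderName, ProviderBlock>> };

// Env var names are from the README; the Gemini ordering is assumed.
const ENV_VARS: Record<ProviderName, string[]> = {
  claude: ["ANTHROPIC_API_KEY"],
  openai: ["OPENAI_API_KEY"],
  gemini: ["GOOGLE_API_KEY", "GEMINI_API_KEY"],
};

function resolveApiKey(
  provider: ProviderName,
  config: Config,
  env: Record<string, string | undefined>,
): string | undefined {
  // 1. A key in the provider block of aeokit.config.yaml wins.
  const fromConfig = config.providers?.[provider]?.apiKey;
  if (fromConfig) return fromConfig;

  // 2. Otherwise fall back to the matching environment variable.
  for (const name of ENV_VARS[provider]) {
    const value = env[name];
    if (value) return value;
  }
  return undefined; // no key found: Field runs cannot start
}
```

In Node you would pass `process.env` as the third argument; taking it as a parameter keeps the function easy to test.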


How it works

┌────────────────── aeokit score <url> ──────────────────────┐
│                                                            │
│   LAB (deterministic, ~10 s, no LLM)                       │
│   ├─ Discovery: llms.txt, sitemap, robots                  │
│   ├─ Access:    UA matrix across GPTBot/Claude/Perplexity… │
│   └─ Tokens:    budget + heatmap via gpt-tokenizer         │
│                                                            │
│                          ┌──── composite ────┐             │
│                          │ Lab 40 + Field 60 │             │
│                          └───────────────────┘             │
│                                                            │
│   FIELD (empirical, ~30 s–2 min, needs LLM)                │
│   ├─ launch Chromium → attach autoconsent                  │
│   ├─ goto + preflight DOM weight → pick observation mode   │
│   ├─ agent loop:  observe → plan → act → trace → repeat    │
│   └─ assertions:  element_visible · text_contains ·        │
│                   tool_called · url_matches · llm_judge ·  │
│                   custom_eval                              │
│                                                            │
└────────────────────────────────────────────────────────────┘
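
The agent loop in the diagram (observe → plan → act → trace → repeat) can be sketched as below. This is an assumption-laden illustration, not AEOkit's internals: the `Agent` interface, the `Action` and `Step` types, and `runLoop` are all hypothetical names, and the two budgets mirror the maxSteps / maxTokens scenario settings.

```typescript
// Illustrative sketch of a budgeted observe/plan/act loop.
type Action =
  | { tool: string; args: Record<string, unknown> }
  | { done: true };

type Step = { observation: string; action: Action; tokens: number };

interface Agent {
  observe(): Promise<string>; // e.g. a pruned a11y tree
  plan(observation: string): Promise<{ action: Action; tokens: number }>;
  act(action: Action): Promise<void>; // e.g. click, type, navigate
}

async function runLoop(
  agent: Agent,
  maxSteps: number,
  maxTokens: number,
): Promise<Step[]> {
  const trace: Step[] = [];
  let tokensUsed = 0;

  // Stop when either budget is exhausted or the agent says it is done.
  while (trace.length < maxSteps && tokensUsed < maxTokens) {
    const observation = await agent.observe();
    const { action, tokens } = await agent.plan(observation);
    tokensUsed += tokens;
    trace.push({ observation, action, tokens }); // record every step
    if ("done" in action) break;
    await agent.act(action);
  }
  return trace;
}
```

Recording the step before acting means the trace still shows the last planned action even if executing it throws.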

The same 6-dimension model scores both sides:

| Dimension | Weight | What it measures |
|---|---|---|
| Task Completion | 30% | Assertion pass rate + natural completion |
| Step Efficiency | 15% | Steps per successful action (absolute: <1.5 is excellent) |
| Token Economy | 15% | Tokens per action (absolute: <3 K is excellent) |
| Error Resilience | 15% | Tool-call success rate + recovery detection |
| Navigation Clarity | 15% | Observation-only step ratio (low = page is readable) |
| Interaction Directness | 10% | Action vs. observation tool ratio |

Crashed / zero-activity runs don't get vacuous credit: the five non-completion dimensions return 0 when the agent never ran, and the task-completion multiplier drops sharply on fatal errors. A "pre-assertion passed" site that blocked the agent will score F, not C.
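
A minimal sketch of how the six weights and the zero-activity rule could combine into one number. The weights are the ones listed in this section; the property names, the `agentRan` flag, and the `overallScore` function are hypothetical, and the task-completion multiplier for fatal errors is left out because its exact form isn't given here.

```typescript
// Illustrative sketch; not AEOkit's scoring code.
// Weights are percentages and sum to 100.
const WEIGHTS = {
  taskCompletion: 30,
  stepEfficiency: 15,
  tokenEconomy: 15,
  errorResilience: 15,
  navigationClarity: 15,
  interactionDirectness: 10,
} as const;

type Dimension = keyof typeof WEIGHTS;
type Scores = Record<Dimension, number>; // each dimension scored 0-100

function overallScore(scores: Scores, agentRan: boolean): number {
  let total = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    // Honest zeros: if the agent never ran, the five non-completion
    // dimensions contribute nothing, whatever their raw value.
    const value = !agentRan && dim !== "taskCompletion" ? 0 : scores[dim];
    total += WEIGHTS[dim] * value;
  }
  return total / 100;
}
```

Without the guard, a run with zero steps and zero tokens would look maximally efficient, which is exactly the vacuous credit the paragraph above rules out.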


Commands

| Command | Purpose |
|---|---|
| aeokit audit <url> | Static audit only. No API key needed. |
| aeokit run <scenarios…> | Empirical agent runs against a task YAML. |
| aeokit score <url> | Audit + (optional) empirical, with a composite grade. |
| aeokit inspect <url> | Probe the page for WebMCP tools (Phase 5 preview) + DOM stats. |

Useful flags:

  • --provider claude|openai|gemini — pick the LLM (defaults: claude-sonnet-4-6, gpt-4o, gemini-2.5-flash)
  • -m, --model <id> — override the provider's model for a single run (e.g. --model claude-opus-4-7, --model gpt-4o-mini). Persistent defaults live in providers.<name>.model in aeokit.config.yaml.
  • --runs N — repeat the scenario N times, report pass-rate + aggregates
  • --record — save a .webm of the agent session
  • --headed — see the browser as the agent drives it
  • --format json,html,sarif,md — pick report formats
  • --min-score N — CI exit code if the composite score drops below N
  • --diff baseline.json --fail-on-regression — compare audits and fail on drops (CI)
  • --no-render — skip the stealth-Chromium fallback in the audit fetcher
  • --user-agent "…" — override the UA for audit fetches
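
The two CI gates in the list above (--min-score and --diff … --fail-on-regression) boil down to a comparison that decides the process exit code. A rough sketch of that logic, where the `ciExitCode` name, the options shape, and the use of exit code 1 for failure are all assumptions rather than documented behaviour:

```typescript
// Illustrative sketch; not AEOkit's CLI code.
type GateOptions = {
  minScore?: number; // threshold from --min-score N
  baseline?: number; // score read from a baseline report (hypothetical)
};

function ciExitCode(current: number, opts: GateOptions): number {
  // Fail when the score is below the absolute threshold.
  if (opts.minScore !== undefined && current < opts.minScore) return 1;
  // Fail when the score dropped relative to the baseline.
  if (opts.baseline !== undefined && current < opts.baseline) return 1;
  return 0;
}
```

An absolute threshold catches sites that were never good; the baseline comparison catches a good site getting worse.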

Writing a scenario

# examples/scenarios/real-world/hackernews-browse.yaml
name: "Hacker News - Read top stories"
url: "https://news.ycombinator.com"
mode: general
intent: |
  You are on Hacker News. Read the homepage and report the titles of
  the top 3 stories along with their points and comment counts.
assertions:
  - type: url_matches
    pattern: "news.ycombinator.com"
  - type: tool_called
    tool: get_page_info
  - type: llm_judge
    question: "Did the agent report the titles of at least 3 actual HN stories?"
    expectedAnswer: "yes"
config:
  maxSteps: 10
  maxTokens: 40000
  observationMode: a11y
  handleConsent: true    # default — set false if you're testing consent UI

Assertion types:

  • element_visible — CSS selector is present and visible
  • text_contains — element text contains a string
  • tool_called — agent invoked a specific tool (optionally with args)
  • url_matches — final URL matches a pattern
  • llm_judge — semantic yes/no judged by the LLM from the trace
  • custom_eval — arbitrary JS returning a value compared to expected
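
To show how assertions like these can be checked against a finished run, here is a small sketch covering two of the six types. The `pattern` and `tool` fields match the scenario YAML above; the `Trace` shape and `checkAssertion` are hypothetical, and treating `pattern` as a plain substring is an assumption (the real matcher may use regular expressions).

```typescript
// Illustrative sketch; not AEOkit's assertion engine.
type Assertion =
  | { type: "url_matches"; pattern: string }
  | { type: "tool_called"; tool: string };

// Hypothetical summary of a finished agent session.
type Trace = { finalUrl: string; toolCalls: string[] };

function checkAssertion(assertion: Assertion, trace: Trace): boolean {
  switch (assertion.type) {
    case "url_matches":
      // Assumption: substring match on the final URL.
      return trace.finalUrl.includes(assertion.pattern);
    case "tool_called":
      return trace.toolCalls.includes(assertion.tool);
  }
}

// Fraction of assertions that passed, 0-1.
function assertionPassRate(assertions: Assertion[], trace: Trace): number {
  if (assertions.length === 0) return 0;
  const passed = assertions.filter((a) => checkAssertion(a, trace)).length;
  return passed / assertions.length;
}
```

The discriminated union on `type` lets the compiler flag any assertion type the switch forgets to handle.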

Prebuilt scenarios live in examples/scenarios/ — Hacker News, Wikipedia, GitHub, Stripe docs, MCP docs, Claude docs, NYT, TodoMVC, Reddit.


Reports

Every run produces:

  • JSON — full trace, assertions, metrics, scored dimensions, insights. Schema-versioned so you can diff in CI.
  • HTML — self-contained, no network. Dimension strip, "What to fix this week" insights, pre-flight analysis, consent outcome, assertion rows, dimension deep-dives, collapsible step-by-step trace.
  • SARIF + Markdown — on aeokit audit when --format sarif,md is set. Drop-in for GitHub code-scanning and sticky PR comments (see examples/ci/github-action.yml).
  • WebM — if --record was set, a video of the agent session.

Programmatic API

import { runScenario, createProvider, loadScenario, loadConfig } from "aeokit";

const config   = await loadConfig();
const provider = createProvider("claude", config);
const scenario = await loadScenario("./task.yaml");

const result = await runScenario({
  scenario,
  provider,
  browserOptions: { headless: true },
});

console.log(result.totalSteps, result.assertions, result.consent);

Everything the CLI does is exported from the root. The types in dist/index.d.ts are stable within a minor version.


Known limits

In the spirit of not shipping bullshit:

  • Three providers wired, none calibrated against each other. Claude / OpenAI / Gemini all run end-to-end, but scoring thresholds were tuned on Claude traces only, so don't read a gpt-4o run's 82 as meaning the same thing as a claude-sonnet-4-6 run's 82 until we publish cross-model normalisation data.
  • No Playwright auto-install. First run needs npx playwright install chromium — we detect the missing binary and point you at the command, but we don't fetch it for you.
  • Sites with aggressive bot defences will fail. Reddit, LinkedIn, and CF-protected banking sites typically return JS-challenge pages. The Field browser runs a stealth preset (STEALTH_INIT_SCRIPT + realistic UA) but some sites still detect headless Chromium and the report scores them an honest F.
  • Error reporting is raw. A bot-challenge shows up as page.title: Execution context was destroyed instead of a clean BLOCKED_BY_BOT_CHALLENGE signal. Taxonomy is on the roadmap.
  • Scoring thresholds are principled, not calibrated. They come from first-principles reasoning about tokens/step/actions; they haven't been fitted against a labelled benchmark yet. The composite weights (40/60) are reasonable, not sacred.
  • WebMCP mode is a stub. mode: webmcp in a scenario throws. Discovery via navigator.modelContext is planned for Phase 5.

Roadmap

| Done | Next | Later |
|---|---|---|
| Lab audit (discovery / access / tokens / parsability / capability) | blockedBy error taxonomy | WebMCP mode (Phase 5) |
| Field agent loop + smart observation | Multi-LLM comparison report | Calibrated cross-model scoring |
| 6-dimension scoring with honest zeros | Landing page + docs site | 100-site public benchmark |
| Combined score command | - | - |
| Claude / OpenAI / Gemini providers | - | - |
| Consent banner auto-dismiss (200 CMPs) | - | - |
| CI primitives: GitHub Action, SARIF, PR comment, --diff | - | - |
| Published on npm (aeokit) via Changesets | - | - |


Contributing

Bug reports, PRs, and new scenario examples are welcome. Dev setup, the test/lint/build loop, release flow, and commit style all live in CONTRIBUTING.md. The internal architecture notes sit in CLAUDE.md.


License

MIT.