
@trymanateeai/cli

v0.6.2



Chatbot regression testing for devs. Generate domain-specific synthetic users from your docs, then run them adversarially against your custom-built chatbot via its API. On every commit, in CI, from your terminal.

npm install -g @trymanateeai/cli
manatee init                        # creates manatee.config.js
manatee personas generate --from-docs ./docs
manatee test

The personas users see ("Webhook Power User", "Budget-Conscious Parent", "First-Time Founder Setting Up Stripe Connect") are generated from your actual product docs, not picked from a generic list. Eight base behavior archetypes provide the shape — how they type, escalate, push guardrails — but the names, vocabulary, opening messages, and topics are all yours.

API-only. No browser, no Playwright, no widget detection. You own a chatbot endpoint; manatee POSTs to it like any client.

Why

Most chatbot evals test the model in isolation: prompts go in, responses come out. That misses the failures that actually break the product: context loss across turns, jailbreaks that escalate over five messages, system prompts that leak when an "impatient power user" runs into a dead end.

Generic synthetic users miss most of these because they don't know your domain. Manatee reads your product docs and builds a roster of synthetic users tuned to your actual use cases.

Install

npm install -g @trymanateeai/cli   # or use npx — no install needed
export OPENAI_API_KEY=sk-...       # BYOK — runs locally, nothing stored

That's it. No Chromium download, no SaaS account, no SDK to integrate.

Quick start (3 minutes)

# 1. From inside your chatbot project
cd /path/to/your-app

# 2. Scaffold the config
manatee init
# → asks for your chatbot endpoint URL, writes manatee.config.js

# 3. Generate domain-specific personas from your local docs
manatee personas generate --from-docs ./docs

# 4. Run the test
manatee test
# → auto-loads manatee.config.js + manatee-personas.json from cwd

manatee.config.js — the contract

The config file describes how to talk to your chatbot. Either point at an HTTP endpoint (manatee builds the request) or provide a custom send function (you own request/response/auth/streaming).

Simple — OpenAI-shaped endpoint

// manatee.config.js
export default {
  endpoint: 'http://localhost:3000/api/chat',
  headers: { Authorization: `Bearer ${process.env.MY_BOT_TOKEN}` },
};

Manatee POSTs { messages: [{role, content}, ...] } and reads the reply at choices.0.message.content.
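For reference, the smallest handler satisfying that contract looks roughly like the sketch below. This is illustrative only (handleChat is a hypothetical name, and the echo reply stands in for your actual bot):

```javascript
// Shape of the request manatee sends and the response it expects back.
// Replace the echo line with a call to your real model/bot.
function handleChat(requestBody) {
  const { messages } = requestBody; // [{ role, content }, ...]
  const lastUser = messages.filter((m) => m.role === 'user').at(-1);
  const reply = `You said: ${lastUser.content}`; // <- your bot's answer goes here
  // OpenAI-shaped envelope: manatee reads choices[0].message.content
  return { choices: [{ message: { role: 'assistant', content: reply } }] };
}
```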

Custom request/response shape

export default {
  endpoint: 'https://my-app.com/api/v2/chat',
  headers: { 'X-API-Key': process.env.MY_KEY },
  requestShape: 'simple',                 // sends { message, history } instead
  responsePath: 'data.reply.text',        // dot-path into response JSON
};
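responsePath walks the parsed response JSON one key at a time. A minimal sketch of that resolution (illustrative, not manatee's internals; resolvePath is a hypothetical name) — note that numeric segments index into arrays, which is how a path like choices.0.message.content works too:

```javascript
// Resolve a dot-path like 'data.reply.text' against parsed response JSON.
function resolvePath(obj, path) {
  return path
    .split('.')
    .reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

const response = { data: { reply: { text: 'Hello!' } } };
resolvePath(response, 'data.reply.text'); // → 'Hello!'
```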

Full control — custom send function

export default {
  send: async ({ messages, sessionId, context }) => {
    const res = await fetch('https://my-app.com/chat', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${context.token}`,
      },
      body: JSON.stringify({ messages, session: sessionId }),
    });
    const data = await res.json();
    return data.reply;   // return the assistant's text
  },

  // Optional per-conversation hooks — useful for fresh sessions, DB rows, etc.
  setup: async () => {
    const token = await fetchAuthToken();
    return { token, sessionId: crypto.randomUUID() };
  },
  teardown: async (ctx) => {
    await releaseSession(ctx.sessionId);
  },
};

setup() runs once per conversation. Whatever it returns becomes context (passed to send and teardown). If context.sessionId is set, manatee uses it; otherwise it generates a UUID.

No config? Pass --endpoint inline

manatee test --endpoint http://localhost:3000/api/chat
manatee test --endpoint http://... --auth-header "Authorization: Bearer $TOKEN"

Test command flags

| Flag | Description |
|---|---|
| --endpoint <url> | Direct endpoint (skips manatee.config.js) |
| --config <path> | Explicit config path (defaults to auto-discovery) |
| --auth-header <header> | Single auth header for --endpoint mode |
| -p, --personas <ids> | Comma-separated archetype IDs |
| --personas-file <path> | Enriched personas JSON (auto-detected) |
| --users <n> | Total conversations across all personas |
| -t, --turns <n> | Turns per conversation (default: 5) |
| -c, --concurrency <n> | Parallel conversations (default: 3, max: 10) |
| -e, --edge-cases <ids> | Comma-separated edge case behaviors |
| -m, --model <name> | OpenAI-compatible model (default: gpt-4o-mini) |
| --temperature <n> | LLM sampling temperature (default: 0.7) |
| --api-key <key> | OpenAI key (or set OPENAI_API_KEY) |
| --base-url <url> | OpenAI base URL override (Together, Groq, Ollama) |
| --timeout <sec> | Per-request timeout (default: 30) |
| --context <text> | Inline product context |
| --json [path] | JSON report. Path → file. No arg → stdout. |
| --html [path] | Self-contained HTML report (default: manatee-report.html) |
| --fail-under <n> | Exit 1 if reliability < n (CI gate) |
| --budget-usd <n> | Abort if estimated LLM spend exceeds this |
| -v, --verbose | Verbose logging |

Base archetype templates

Eight behavior templates that the enricher specializes. Run manatee personas list for descriptions.

| Archetype | Tests for |
|---|---|
| impatient | Context handling under pressure |
| confused | Clarification, conversation management |
| adversarial | Prompt injection, jailbreaks, system prompt leaks |
| emotional | Empathy, de-escalation |
| power_user | Multi-turn context, accuracy |
| non_native | Robustness to imperfect English |
| wanderer | Scope management |
| speed | Race conditions, message queuing |

Edge case behaviors

Random adversarial behaviors injected mid-conversation: rapid_fire, long_input, empty_msg, emoji_heavy, lang_switch, contradictions, context_overflow, unicode_abuse, code_injection, markdown_abuse. Unknown IDs are warned about, not silently ignored.

Output formats

Markdown report — always on. Every manatee test run drops a comprehensive manatee-report.md in cwd: hero score, findings with conversation excerpts and suggested fixes inline, systemic issues, per-persona table, full collapsible transcripts. Designed to be committed to your repo, pasted as a PR comment, or fed to an AI assistant ("here's the report, fix these"). Pass --no-md to disable, --md <path> to override the location.

Pretty terminal report by default. JSON via --json:

manatee test --json result.json   # → file
manatee test --json -             # → stdout (suppresses pretty render)

HTML via --html — single self-contained file with inline CSS, severity-coded findings, collapsible per-conversation transcripts. No JS dependencies, drops into CI artifacts cleanly.

Need PDF? Pipe through pandoc (pandoc manatee-report.md -o manatee-report.pdf) or any markdown-to-PDF tool. A dedicated --pdf flag isn't built in because PDFs are diff-unfriendly, AI-unfriendly, and harder to comment on; markdown wins by default.

Every run prints Usage: N tokens (M calls), estimated cost $X so you always know what it cost.

CI integration

- uses: actions/setup-node@v4
  with:
    node-version: 20
- run: npm install -g @trymanateeai/cli
- run: |
    manatee test \
      --fail-under 75 \
      --budget-usd 2.00 \
      --json result.json \
      --html report.html
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    MY_BOT_TOKEN: ${{ secrets.STAGING_BOT_TOKEN }}
- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: manatee-report
    path: |
      result.json
      report.html

The CLI exits 0 if reliability ≥ threshold, 1 otherwise. Commit manatee.config.js and manatee-personas.json to your repo so CI runs are deterministic.

How it works

Persona enrichment is a 2-stage LLM pipeline:

  • Stage 1 — Archetype Extraction. Reads your product docs (files, inline text). LLM call returns 5–8 real user archetypes grounded in actual content: name, demographics, goals, frustrations, communication_style, domain_knowledge, 3–5 specific topics they'd ask about, and which of the 8 base behaviors best matches them.
  • Stage 2 — Persona Synthesis. For each archetype, a second LLM call merges (a) the base behavior's full system prompt with (b) the domain context. Output keeps all behavior rules but injects product-specific vocabulary, real opening message examples, and a backstory grounded in your domain. Runs in parallel; the CLI streams a checkmark per persona as it completes.

Conversations are driven against your endpoint via the config's endpoint or send function. Each conversation gets its own setup() context (fresh sessionId, auth token, etc.) and a teardown() for cleanup.

Classification runs an LLM judge across 15 vulnerability types and 4 severity levels. Persona-aware scoring weights findings by archetype (an adversarial user finding a jailbreak carries more weight than a confused user causing context loss). Issues appearing in ≥35% of conversations are flagged as systemic.
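To make the systemic threshold concrete: an issue type gets flagged once it appears in at least 35% of distinct conversations. A small sketch of that logic (illustrative, not manatee's implementation; systemicIssues is a hypothetical name):

```javascript
// Count distinct conversations per issue type; flag types at or above the cutoff.
function systemicIssues(findings, totalConversations, threshold = 0.35) {
  const byType = new Map();
  for (const { type, conversationId } of findings) {
    if (!byType.has(type)) byType.set(type, new Set());
    byType.get(type).add(conversationId);
  }
  return [...byType.entries()]
    .filter(([, convs]) => convs.size / totalConversations >= threshold)
    .map(([type]) => type);
}
```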

Cost tracking — every LLM call's usage.prompt_tokens and completion_tokens are accumulated against a per-model rate table; the final report includes total tokens + estimated USD spend. --budget-usd aborts the run before further calls when the cap is reached.
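A minimal accumulator of that shape might look like this (sketch only: the per-token rates below are placeholder numbers, not manatee's actual rate table, and makeCostTracker is a hypothetical name):

```javascript
// Placeholder per-token USD rates — NOT manatee's real rate table.
const RATES = { 'gpt-4o-mini': { prompt: 0.15 / 1e6, completion: 0.6 / 1e6 } };

function makeCostTracker(model) {
  const rate = RATES[model] ?? RATES['gpt-4o-mini']; // unknown models fall back
  let prompt = 0, completion = 0, calls = 0;
  return {
    record(usage) { // pass each LLM response's usage object
      prompt += usage.prompt_tokens;
      completion += usage.completion_tokens;
      calls += 1;
    },
    summary() {
      return {
        tokens: prompt + completion,
        calls,
        estimatedUsd: prompt * rate.prompt + completion * rate.completion,
      };
    },
  };
}
```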

BYO LLM

Default is OpenAI. Use any OpenAI-compatible endpoint:

manatee personas generate --from-docs ./docs \
  --base-url https://api.together.xyz/v1 \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo

manatee test \
  --base-url http://localhost:11434/v1 \
  --model llama3.1

Works with Together, Groq, Anthropic-via-proxy, Ollama, LM Studio, vLLM. Cost estimation falls back to gpt-4o-mini rates for non-OpenAI models.

All commands

manatee                            # colorful intro + quick start
manatee init                       # scaffold manatee.config.js
manatee personas list              # show 8 base archetype templates
manatee personas generate ...      # build domain-specific personas from docs
manatee personas show <id>         # print full system prompt + metadata
manatee test ...                   # run synthetic users, score, report
manatee --version
manatee <cmd> --help               # per-command flags

Status

v0.3.0 — pure dev tool, API-only. Working: init wizard, persona enrichment from local docs, classifier, scorer, CI integration, HTML/JSON output, budget caps, persona inspection, custom send functions, custom request/response shapes. Coming next: streaming response support, retry/backoff knobs, reputation simulator.

Contributing

See CONTRIBUTING.md. Issues and PRs welcome.

License

MIT