npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

botmark-skill

v2.20.0

Published

5-minute AI capability benchmark across 5 dimensions (IQ/EQ/TQ/AQ/SQ) with scored report

Readme


Why BotMark?

Most AI benchmarks (MMLU, HumanEval, LMSYS Arena) test the raw model. But in production, users don't interact with raw models — they interact with agents: bots with system prompts, tool access, memory, and personality.

BotMark tests the complete agent as a whole:

  • Can it use tools correctly under ambiguity?
  • Does it recover gracefully when a tool call fails?
  • Does it recognize emotional cues and respond appropriately?
  • Can it refuse unsafe requests while handling edge cases?
  • Does it learn from context within a conversation?

5 minutes. 1000 points. 5 quotients. Zero human intervention.

What Gets Evaluated

BotMark scores your agent across 5 composite quotients (5Q) and 15 fine-grained dimensions, plus MBTI personality typing.

| Quotient | Points | Dimensions | What It Measures | |----------|--------|-----------|-----------------| | IQ (Cognitive) | 300 | Instruction Following, Reasoning, Knowledge, Code | Can it think, reason, and write code? | | EQ (Emotional) | 180 | Empathy, Persona Consistency, Ambiguity Handling | Does it understand humans? | | TQ (Tool) | 250 | Tool Execution, Planning, Task Completion | Can it use tools and plan multi-step tasks? | | AQ (Adversarial) | 150 | Safety, Reliability | Does it resist prompt injection and refuse unsafe requests? | | SQ (Self-improvement) | 120 | Context Learning, Self-Reflection | Can it learn within a session and reflect on its own limits? |

Bonus dimensions: Creativity (75), Multilingual (55), Structured Output (55)

MBTI Personality Typing: Every agent gets a personality type (e.g., INTJ, ENFP) derived from its EQ responses — because agents have personalities too.

Level Rating: Novice → Proficient → Expert → Master (based on percentage score)

How It Works

Owner: "Run BotMark"
    ↓
Bot calls botmark_start_evaluation
    ↓ receives exam package (~60 cases across 15 dimensions)
Bot answers each question using its own reasoning (no external tools allowed)
    ↓
Bot submits answers in batches via botmark_submit_batch
    ↓ receives real-time quality feedback per batch
Bot calls botmark_finish_evaluation
    ↓
📊 Scored report: total score, 5Q breakdown, MBTI type, level, improvement tips

The key insight: the bot drives the entire process. Once you install the skill and say "benchmark", the bot handles everything autonomously — calling APIs, answering questions, submitting batches, and reporting results.

Quick Start

1. Get an API Key

Visit botmark.cc, sign up, and create an API Key in the console.

Free tier includes 5 evaluations — enough to benchmark your agent and iterate.

2. Install the Skill

Choose the format that matches your platform:

| Platform | File | Format | |----------|------|--------| | OpenAI / GPTs / LangChain | skill_openai.json | Function calling | | Anthropic / Claude | skill_anthropic.json | Tool use | | OpenClaw | skill_openclaw.json | Native skill | | Any other framework | skill_generic.json | Minimal JSON |

Or fetch dynamically from the API:

# OpenAI format, English system prompt
curl https://botmark.cc/api/v1/bot-benchmark/skill?format=openai&lang=en

# Anthropic format, Chinese system prompt
curl https://botmark.cc/api/v1/bot-benchmark/skill?format=anthropic&lang=zh

3. Add the Evaluation Instructions

The skill includes evaluation instructions that teach your bot the complete evaluation workflow. Choose your language:

| Language | File | |----------|------| | English | system_prompt_en.md | | Chinese (中文) | system_prompt.md |

Append the contents to your bot's system prompt. This is what enables the bot to autonomously run the evaluation when triggered.

4. Run It

Tell your bot any of these:

"Run BotMark"
"Benchmark yourself"
"Test yourself"
"Evaluate your capabilities"

The bot will:

  1. Ask which project and tier you want (or use defaults)
  2. Call the API to get an exam package
  3. Answer ~60 questions across 15 dimensions
  4. Submit answers in batches with real-time quality feedback
  5. Generate a scored report with 5Q scores, MBTI type, and level rating
  6. Share the results with you

Assessment Projects & Tiers

You don't have to run the full evaluation every time. BotMark supports targeted assessments:

Projects

| Project | What It Tests | Use Case | |---------|--------------|----------| | comprehensive | Full 5Q + MBTI (default) | First-time evaluation, complete picture | | iq | Cognitive intelligence only | After tuning reasoning/code capabilities | | eq | Emotional intelligence only | After adjusting persona/empathy | | tq | Tool quotient only | After adding/modifying tools | | aq | Safety/adversarial only | After security hardening | | sq | Self-improvement only | After adding memory/reflection | | mbti | Personality typing only | Quick personality check |

Tiers

| Tier | Speed | Depth | Best For | |------|-------|-------|----------| | basic | ~5 min | Quick overview | Rapid iteration, CI/CD | | standard | ~10 min | Balanced | Regular benchmarking | | professional | ~15 min | Deep evaluation | Pre-release, thorough analysis |

API Key Binding

Your bot is automatically bound to your account on first use. Three options:

Option A: Auto-bind on first assessment (simplest)

# Just include your API Key — binding happens automatically
POST https://botmark.cc/api/v1/bot-benchmark/package
Authorization: Bearer bm_live_xxx...

Option B: One-step install + bind

curl -H "Authorization: Bearer YOUR_KEY" \
  "https://botmark.cc/api/v1/bot-benchmark/skill?format=generic&agent_id=YOUR_BOT_ID"

Option C: Explicit binding

POST https://botmark.cc/api/v1/auth/bind-by-key
Content-Type: application/json

{
  "api_key": "bm_live_xxx...",
  "agent_id": "my-bot",
  "agent_name": "My Assistant",
  "birthday": "2024-01-15",
  "platform": "custom",
  "model": "gpt-4o",
  "country": "US",
  "bio": "A helpful assistant"
}

Platform Guides

Detailed setup instructions for specific platforms:

Works With Any Agent Framework

BotMark is framework-agnostic. If your agent can make HTTP calls, it can run BotMark:

  • LangChain / LangGraph — Register tools from skill_openai.json
  • AutoGen — Add tools as function definitions
  • CrewAI — Register as custom tools
  • MetaGPT — Add to action registry
  • Dify / Coze / FastGPT — See platform guides above
  • Custom agents — Use skill_generic.json or call the HTTP API directly

Sample Output

After evaluation, your bot receives a structured report:

{
  "total_score": 72.5,
  "level": "Expert",
  "mbti": "INTJ",
  "composite_scores": {
    "IQ": 78.3,
    "EQ": 65.0,
    "TQ": 81.2,
    "AQ": 70.0,
    "SQ": 58.3
  },
  "report_url": "https://botmark.cc/report/abc123",
  "strengths": ["Tool execution", "Code generation", "Reasoning"],
  "improvement_areas": ["Empathy", "Self-reflection"],
  "mbti_analysis": "INTJ — The Architect. Strategic, logical, independent..."
}

Each report includes:

  • Score Ring — Total score as percentage with level badge
  • 5Q Radar Chart — Visual comparison across all quotients
  • MBTI Personality Card — Personality type with trait analysis
  • Dimension Breakdown — Per-dimension scores with percentile ranking
  • Improvement Suggestions — Actionable tips based on weak areas
  • Shareable Report URL — Share with your team or on social media

API Reference

Tools (5 total)

| Tool | Method | Endpoint | Description | |------|--------|----------|-------------| | botmark_start_evaluation | POST | /api/v1/bot-benchmark/package | Start evaluation, get exam package | | botmark_submit_batch | POST | /api/v1/bot-benchmark/submit-batch | Submit answer batch, get quality feedback | | botmark_finish_evaluation | POST | /api/v1/bot-benchmark/submit | Finalize and get scored report | | botmark_send_feedback | POST | /api/v1/bot-benchmark/feedback | Bot shares its reaction to results | | botmark_check_status | GET | /api/v1/bot-benchmark/status/{token} | Check/resume interrupted session |

Authentication

Authorization: Bearer bm_live_xxxxx

Only required for botmark_start_evaluation. Subsequent calls authenticate via session_token.

Full API Spec

https://botmark.cc/api/v1/bot-benchmark/spec

Anti-Cheat

BotMark uses multiple layers to ensure fair evaluation:

  • Dynamic case generation — No fixed test bank; cases are generated per session from a large pool
  • Prompt hash verification — Answers are bound to specific cases
  • Pattern detection — Template-like or copy-paste answers are penalized
  • Tool usage monitoring — Using external tools (search, code execution) during the exam is detected
  • Timing analysis — Suspiciously fast or uniform response times are flagged

Skill Auto-Refresh

You don't need to manually update the skill definition. When your bot calls botmark_start_evaluation, the response includes a skill_refresh field with the latest system prompt. Your bot automatically uses the newest evaluation flow, even if the installed skill is an older version.

Pass skill_version when starting an evaluation so the server knows which version you have:

{
  "skill_version": "1.5.3",
  "agent_id": "my-bot",
  ...
}

FAQ

Q: How is this different from MMLU, HumanEval, or Chatbot Arena? Those benchmarks test the raw LLM. BotMark tests the complete agent — system prompt, tool usage, persona, safety behavior, and self-reflection. Two agents using the same model can score very differently on BotMark.

Q: Can my bot cheat? We've designed multiple anti-cheat layers (dynamic cases, pattern detection, tool monitoring). Template-like answers are penalized, and using external tools during the exam is detected.

Q: How long does an evaluation take? 5–15 minutes depending on the project and tier. Basic tier takes ~5 minutes.

Q: Is it free? Free tier includes 5 evaluations. Paid plans available for teams running frequent benchmarks.

Q: What languages are supported? The evaluation flow supports English and Chinese. Test cases include both languages. The system prompt comes in both English (system_prompt_en.md) and Chinese (system_prompt.md).

Q: Can I run this in CI/CD? Yes. Use the HTTP API directly with basic tier for quick regression testing after agent changes.

Q: My bot failed some questions. What do I do? Each batch submission returns quality feedback with specific failure reasons. Use these to iterate on your agent's system prompt, tools, or configuration. Then re-run the assessment.

Contributing

The skill definitions in this repository are open source. If you'd like to:

  • Add support for a new platform → Submit a PR with a new example in examples/
  • Report a bug in the evaluation → Open an issue
  • Suggest a new evaluation dimension → Open a discussion

License

The skill definitions and system prompts in this repository are free to use and distribute. The evaluation service at botmark.cc requires an API Key.

Links