# @swarmtools/evals

```
🐝 EVAL SUITE 🐝
━━━━━━━━━━━━━━━━━━━━━━━━━━
Swarm Intelligence QA
```

Evaluation suite for swarm-tools multi-agent coordination. Uses Evalite to measure coordinator behavior, decomposition quality, and compaction correctness.
## Purpose

This package contains the evaluation framework for the swarm-tools ecosystem. Extracting evals into a separate package ensures:

- **Clean Dependencies** - The main plugin doesn't need evalite/vitest in production
- **Faster Installs** - Eval dependencies are only needed for development/CI
- **Isolated Testing** - The eval suite can evolve independently of the plugin
## What Gets Evaluated

- **Coordinator Protocol** - Does the coordinator spawn workers instead of doing the work itself?
- **Coordinator Behavior** - LLM behavior after compaction (stays in the coordinator role)
- **Compaction Resumption** - Context-injection correctness after compaction
- **Compaction Prompt Quality** - Quality of the continuation prompts generated
- **Task Decomposition** - Quality of task splitting and file-conflict detection
- **Strategy Selection** - Correct strategy choice for the task's characteristics
- **Decision Quality** - Strategy-selection quality and precedent relevance
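As an illustration, the coordinator-protocol check above amounts to scoring a session transcript for delegation. The `SessionEvent` shape and `scoreCoordinatorProtocol` helper below are hypothetical sketches, not this package's actual API:

```typescript
// Hypothetical event shape; the real session format may differ.
interface SessionEvent {
  role: "coordinator" | "worker";
  action: "spawn_worker" | "edit_file" | "message";
}

// Score 1 when the coordinator delegates via spawn_worker and never
// edits files itself; score 0 otherwise.
function scoreCoordinatorProtocol(events: SessionEvent[]): number {
  const coordinatorEvents = events.filter((e) => e.role === "coordinator");
  const didDirectWork = coordinatorEvents.some((e) => e.action === "edit_file");
  const didDelegate = coordinatorEvents.some((e) => e.action === "spawn_worker");
  return didDelegate && !didDirectWork ? 1 : 0;
}

console.log(
  scoreCoordinatorProtocol([
    { role: "coordinator", action: "spawn_worker" },
    { role: "worker", action: "edit_file" },
  ]),
); // 1
```

A scorer in this style plugs naturally into an Evalite suite, where each eval maps captured sessions to a numeric score.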
## Usage

```bash
# Run all evals
bun run test

# Build for publishing
bun run build

# Type check
bun run typecheck
```

## Package Structure
This package is part of the swarm-tools monorepo:

- `opencode-swarm-plugin` - Main plugin (peer dependency)
- `swarm-mail` - Event sourcing primitives (peer dependency)
- `@swarmtools/evals` - This package
## Development

Evals use real coordinator sessions captured to `~/.config/swarm-tools/sessions/*.jsonl`. See `docs/README.md` in this package for details on session capture and eval architecture.
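Since the captures are newline-delimited JSON (one event per line), a minimal loader might look like the sketch below. The `parseSessionLog` helper and the event shape are illustrative assumptions, not this package's real schema:

```typescript
// Parse a *.jsonl session capture: one JSON event per non-empty line.
// The event contents here are illustrative, not the package's real schema.
function parseSessionLog(raw: string): Array<Record<string, unknown>> {
  return raw
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
}

const sample = '{"type":"spawn"}\n{"type":"compact"}\n';
console.log(parseSessionLog(sample).length); // 2
```

Parsing line-by-line keeps the loader streaming-friendly: a long session never needs to be held as a single JSON document.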
## License

MIT
