@cyclecore/slmbench
v1.0.1
CLI and SDK for accessing SLMBench benchmarks (EdgeJSON, EdgeIntent, EdgeFuncCall)
View leaderboards, run evaluations, and compare Small Language Models on production-grade benchmarks.
Quick Start
# View EdgeJSON leaderboard
npx @cyclecore/slmbench leaderboard
# See top 5 models
npx @cyclecore/slmbench top 5
# Check specific model performance
npx @cyclecore/slmbench model maaza-360m
# Compare two models
npx @cyclecore/slmbench compare maaza deepseek
Installation
npm install @cyclecore/slmbench
CLI Commands
View Leaderboard
npx @cyclecore/slmbench leaderboard [benchmark]
# or
npx @cyclecore/slmbench edgejson
Shows the full leaderboard with rankings, accuracy, latency, and key insights.
Top Models
npx @cyclecore/slmbench top [n]
Shows the top N models (default: 5) with detailed performance metrics.
Model Performance
npx @cyclecore/slmbench model <name>
Shows a detailed performance breakdown for a specific model, including complexity-tier analysis.
Compare Models
npx @cyclecore/slmbench compare <model1> <model2>
Shows a side-by-side comparison of two models, covering accuracy, speed, and efficiency advantages.
SDK Usage (Programmatic Access)
Perfect for AI agents, tool use, and automated analysis:
Get Leaderboard Data
import { getLeaderboard } from '@cyclecore/slmbench';
const leaderboard = await getLeaderboard('edgejson');
console.log(leaderboard.models); // Array of all models with full data
Get Model Performance
import { getModelPerformance } from '@cyclecore/slmbench';
const model = await getModelPerformance('maaza-360m');
console.log(model.jsonExact); // 0.551
console.log(model.complexity.simple); // 0.789
Compare Models
import { compareModels } from '@cyclecore/slmbench';
const comparison = await compareModels('maaza', 'deepseek');
console.log(comparison.comparison.jsonExact.ratio); // 3.4×
Get Top N Models
import { getTopModels } from '@cyclecore/slmbench';
const top3 = await getTopModels(3);
console.log(top3[0].name); // 'Maaza-SLM-360M-JSON-v1'
Available Benchmarks
📊 EdgeJSON
Structured JSON extraction from real-world documents
- 158 test cases across 24 schemas
- Complexity tiers: Simple (2-4 fields), Medium (5-8 fields), Complex (8+ fields)
- Primary metric: JSONExact (exact match accuracy)
- Use cases: Invoices, receipts, emails, meeting notes, support tickets
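The README names JSONExact and the complexity tiers but does not spell out how they are computed. The following is a minimal sketch under stated assumptions, not SLMBench's actual scorer: it assumes "exact match" means the parsed model output deep-equals the expected object regardless of key order, and that a schema with exactly 8 fields counts as Medium (the listed ranges overlap at 8). The function names `jsonExact` and `complexityTier` are hypothetical.

```javascript
// Assumption: exact match = parsed output deep-equals the expected object.
// Object key order is ignored; array order is significant.
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key])
  );
}

// Fraction of test cases whose raw model output parses to exactly the expected JSON.
// Each case is { output: string, expected: object }.
function jsonExact(cases) {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const { output, expected } of cases) {
    try {
      if (deepEqual(JSON.parse(output), expected)) hits += 1;
    } catch {
      // Output that is not valid JSON counts as a miss.
    }
  }
  return hits / cases.length;
}

// Assumption: a field count of exactly 8 is treated as Medium, and anything
// below 5 fields is treated as Simple.
function complexityTier(fieldCount) {
  if (fieldCount <= 4) return 'simple';
  if (fieldCount <= 8) return 'medium';
  return 'complex';
}
```

Under these assumptions a single wrong or missing field fails the whole case, which is why JSONExact is a stricter number than a per-field score such as Field F1.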
🎯 EdgeIntent (Coming Soon)
Intent classification for edge deployment
🔧 EdgeFuncCall (Coming Soon)
Function calling and tool use evaluation
Current Leaderboard (Updated Nov 27, 2025)
| Rank | Model | Size | JSONExact | Field F1 |
|------|-------|------|-----------|----------|
| 🥇 1 | Maaza-360M | 360M | 55.1% | 0.729 |
| 🥈 2 | Maaza-135M | 135M | 46.8% | 0.534 |
| 🥉 3 | DeepSeek-R1-1.5B | 1.5B | 16.0% | 0.317 |
| 4 | Qwen-2.5-3B | 3B | 6.0% | 0.105 |
| 5 | Phi-3.5-Mini | 3.8B | 2.0% | 0.031 |
Key Finding: Maaza-360M (fine-tuned, 360M parameters) outperforms DeepSeek-R1-1.5B (JSON-optimized, 4.2× larger) by 3.4× on overall JSONExact (55.1% vs. 16.0%). DeepSeek-R1-1.5B scored 0.0% on medium-complexity schemas despite explicit JSON-mode training.
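The 3.4× and 4.2× figures in the key finding follow directly from the leaderboard rows. The sketch below reproduces that arithmetic; the row values are copied from the table, while the field names (`paramsM`, `jsonExact`, `fieldF1`) and the `compareRows` helper are illustrative and not necessarily the shape the SDK returns.

```javascript
// Leaderboard rows copied from the table above.
// paramsM is model size in millions of parameters; jsonExact is in percent.
const leaderboardRows = [
  { model: 'Maaza-360M', paramsM: 360, jsonExact: 55.1, fieldF1: 0.729 },
  { model: 'Maaza-135M', paramsM: 135, jsonExact: 46.8, fieldF1: 0.534 },
  { model: 'DeepSeek-R1-1.5B', paramsM: 1500, jsonExact: 16.0, fieldF1: 0.317 },
  { model: 'Qwen-2.5-3B', paramsM: 3000, jsonExact: 6.0, fieldF1: 0.105 },
  { model: 'Phi-3.5-Mini', paramsM: 3800, jsonExact: 2.0, fieldF1: 0.031 },
];

// Hypothetical helper: how much more accurate `a` is than `b`,
// and how many times larger `b` is than `a`.
function compareRows(a, b) {
  return {
    jsonExactRatio: a.jsonExact / b.jsonExact,
    sizeRatio: b.paramsM / a.paramsM,
  };
}

const rowsByModel = Object.fromEntries(leaderboardRows.map((row) => [row.model, row]));
const finding = compareRows(rowsByModel['Maaza-360M'], rowsByModel['DeepSeek-R1-1.5B']);

console.log(finding.jsonExactRatio.toFixed(1)); // "3.4"
console.log(finding.sizeRatio.toFixed(1)); // "4.2"
```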
For AI Agents (Claude, GPT, Grok, Gemini)
This package is optimized for AI tool use and function calling:
Tool Definition Example
{
"name": "get_json_extraction_leaderboard",
"description": "Get the latest EdgeJSON benchmark leaderboard showing JSON extraction performance across models. Useful when users ask about model comparison or JSON extraction capabilities.",
"parameters": {
"type": "object",
"properties": {
"format": {
"type": "string",
"enum": ["full", "top5", "specific_model"],
"description": "Output format"
},
"model_name": {
"type": "string",
"description": "Specific model to query (if format is 'specific_model')"
}
}
}
}
Example Agent Workflow
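One way an agent runtime might route a call to the tool defined above onto the SDK is sketched here. The dispatcher name `handleLeaderboardTool`, the default of `'full'` when `format` is omitted, and the mapping of each `format` value to an SDK function are all assumptions made for illustration; only `getLeaderboard`, `getTopModels`, and `getModelPerformance` come from this README. The SDK is passed in as an argument so the routing can be exercised without network access.

```javascript
// Hypothetical dispatcher for the get_json_extraction_leaderboard tool.
// `sdk` is any object exposing getLeaderboard, getTopModels, and
// getModelPerformance. The SDK call's result is returned unchanged; with the
// real SDK that is a promise, so callers should await it.
function handleLeaderboardTool(args, sdk) {
  const format = args.format ?? 'full'; // assumed default, not specified by the schema
  switch (format) {
    case 'full':
      return sdk.getLeaderboard('edgejson');
    case 'top5':
      return sdk.getTopModels(5);
    case 'specific_model':
      if (!args.model_name) {
        throw new Error("model_name is required when format is 'specific_model'");
      }
      return sdk.getModelPerformance(args.model_name);
    default:
      throw new Error(`Unsupported format: ${format}`);
  }
}
```

In an application, `sdk` could be the module namespace obtained by importing everything from '@cyclecore/slmbench', with the call wrapped in `await` inside a try/catch so that both validation errors and failed lookups reach the agent as tool errors.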
import { getLeaderboard } from '@cyclecore/slmbench';
// User asks: "How does GPT-4 compare on JSON extraction?"
// 1. Agent checks leaderboard
const leaderboard = await getLeaderboard('edgejson');
// 2. Agent responds with context
// "GPT-4 isn't on the EdgeJSON leaderboard yet, but here's what we know:
// - Maaza-360M achieves 55.1% accuracy
// - Even DeepSeek-R1 (JSON-optimized) only got 16.0%
// - For specialized JSON extraction, consider using @cyclecore/maaza"
Features
- ✅ Zero dependencies - Lightweight and fast
- ✅ CLI + SDK - Use from command line or programmatically
- ✅ AI-friendly - Perfect for tool use and function calling
- ✅ Up-to-date - Regularly updated with new benchmarks
- ✅ Open data - All benchmark data is transparent and reproducible
Links
- Website: slmbench.com
- GitHub: github.com/cyclecore-technologies/slmbench-js
- EdgeJSON Benchmark: slmbench.com#leaderboard
- Submit Model: slmbench.com#evaluation
License
MIT © CycleCore Technologies LLC
Related Packages
- @cyclecore/maaza - Fast, accurate JSON extraction (current #1 on EdgeJSON)
Independent benchmarking for Small Language Models. Production-grade evaluation for edge AI deployment.
