@cyclecore/slmbench

v1.0.1

Published

8 months ago

CLI and SDK for accessing SLMBench benchmarks (EdgeJSON, EdgeIntent, EdgeFuncCall). View leaderboards, run evaluations, and compare Small Language Models.


cyclecoreai

slm benchmark json-extraction edgejson leaderboard model-evaluation small-language-models llm-benchmark

View leaderboards, run evaluations, and compare Small Language Models on production-grade benchmarks.

Quick Start

# View EdgeJSON leaderboard
npx @cyclecore/slmbench leaderboard

# See top 5 models
npx @cyclecore/slmbench top 5

# Check specific model performance
npx @cyclecore/slmbench model maaza-360m

# Compare two models
npx @cyclecore/slmbench compare maaza deepseek

Installation

npm install @cyclecore/slmbench

CLI Commands

View Leaderboard

npx @cyclecore/slmbench leaderboard [benchmark]
# or
npx @cyclecore/slmbench edgejson

Shows full leaderboard with rankings, accuracy, latency, and key insights.

Top Models

npx @cyclecore/slmbench top [n]

Show top N models (default: 5) with detailed performance metrics.

Model Performance

npx @cyclecore/slmbench model <name>

Get detailed performance breakdown for a specific model including complexity tier analysis.

Compare Models

npx @cyclecore/slmbench compare <model1> <model2>

Side-by-side comparison of two models showing accuracy, speed, and efficiency advantages.

SDK Usage (Programmatic Access)

Use the SDK to query benchmark data from AI agents, tool-calling pipelines, and automated analysis scripts:

Get Leaderboard Data

import { getLeaderboard } from '@cyclecore/slmbench';

const leaderboard = await getLeaderboard('edgejson');
console.log(leaderboard.models); // Array of all models with full data

Get Model Performance

import { getModelPerformance } from '@cyclecore/slmbench';

const model = await getModelPerformance('maaza-360m');
console.log(model.jsonExact); // 0.551
console.log(model.complexity.simple); // 0.789

Compare Models

import { compareModels } from '@cyclecore/slmbench';

const comparison = await compareModels('maaza', 'deepseek');
console.log(comparison.comparison.jsonExact.ratio); // 3.4×

Get Top N Models

import { getTopModels } from '@cyclecore/slmbench';

const top3 = await getTopModels(3);
console.log(top3[0].name); // 'Maaza-SLM-360M-JSON-v1'

Available Benchmarks

📊 EdgeJSON

Structured JSON extraction from real-world documents

158 test cases across 24 schemas
Complexity tiers: Simple (2-4 fields), Medium (5-8 fields), Complex (8+ fields)
Primary metric: JSONExact (exact match accuracy)
Use cases: Invoices, receipts, emails, meeting notes, support tickets
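As a rough illustration of what an exact-match metric such as JSONExact measures, the sketch below counts a prediction as correct only when the model output parses as JSON and is deeply equal to the expected object. The names `canonicalize` and `jsonExact`, the input shape, and the choice to ignore key order are all assumptions made for this example; the benchmark's actual scoring code may differ.

```javascript
// Hypothetical sketch of an exact-match metric; not the benchmark's actual scorer.

// Recursively sort object keys so that key order does not affect the comparison.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Fraction of cases whose raw model output parses to exactly the expected object.
// `cases` is an array of { output: string, expected: object } (assumed shape).
function jsonExact(cases) {
  if (cases.length === 0) return 0;
  let correct = 0;
  for (const { output, expected } of cases) {
    try {
      const parsed = JSON.parse(output);
      const got = JSON.stringify(canonicalize(parsed));
      const want = JSON.stringify(canonicalize(expected));
      if (got === want) correct += 1;
    } catch {
      // Output that is not valid JSON counts as a miss.
    }
  }
  return correct / cases.length;
}
```

Under this definition a single wrong or missing field fails the whole case, which is why an exact-match score can sit well below a per-field score such as Field F1.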

🎯 EdgeIntent (Coming Soon)

Intent classification for edge deployment

🔧 EdgeFuncCall (Coming Soon)

Function calling and tool use evaluation

Current Leaderboard (Updated Nov 27, 2025)

| Rank | Model | Size | JSONExact | Field F1 |
|------|-------|------|-----------|----------|
| 🥇 1 | Maaza-360M | 360M | 55.1% | 0.729 |
| 🥈 2 | Maaza-135M | 135M | 46.8% | 0.534 |
| 🥉 3 | DeepSeek-R1-1.5B | 1.5B | 16.0% | 0.317 |
| 4 | Qwen-2.5-3B | 3B | 6.0% | 0.105 |
| 5 | Phi-3.5-Mini | 3.8B | 2.0% | 0.031 |

Key Finding: Maaza-360M (fine-tuned, 360M parameters) scores 3.4× higher on overall JSONExact than DeepSeek-R1-1.5B (JSON-optimized, 4.2× larger): 55.1% vs. 16.0%. DeepSeek-R1-1.5B achieved 0.0% on medium schemas despite explicit JSON mode training.
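The two multipliers in the key finding can be reproduced from the leaderboard figures. The short sketch below only restates that arithmetic; the variable names are illustrative and not part of the package.

```javascript
// Figures copied from the leaderboard table above.
const maaza360m = { params: 360e6, jsonExact: 0.551 };
const deepseekR1 = { params: 1.5e9, jsonExact: 0.160 };

// How many times larger DeepSeek-R1-1.5B is, by parameter count.
const sizeRatio = deepseekR1.params / maaza360m.params;

// How many times higher Maaza-360M scores on JSONExact.
const accuracyRatio = maaza360m.jsonExact / deepseekR1.jsonExact;

console.log(sizeRatio.toFixed(1));     // "4.2"
console.log(accuracyRatio.toFixed(1)); // "3.4"
```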

For AI Agents (Claude, GPT, Grok, Gemini)

This package is optimized for AI tool use and function calling:

Tool Definition Example

{
  "name": "get_json_extraction_leaderboard",
  "description": "Get the latest EdgeJSON benchmark leaderboard showing JSON extraction performance across models. Useful when users ask about model comparison or JSON extraction capabilities.",
  "parameters": {
    "type": "object",
    "properties": {
      "format": {
        "type": "string",
        "enum": ["full", "top5", "specific_model"],
        "description": "Output format"
      },
      "model_name": {
        "type": "string",
        "description": "Specific model to query (if format is 'specific_model')"
      }
    }
  }
}
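One way an agent runtime might route calls to this tool onto the SDK functions shown earlier is sketched below. `handleLeaderboardTool` and its `sdk` parameter are hypothetical names introduced for this example, the fallback to `'full'` when `format` is omitted is an assumption (the schema does not declare a default), and the return values are passed through unchanged because their exact shapes are not documented here.

```javascript
// Hypothetical handler for the tool definition above; not part of the package.
// `sdk` is expected to expose getLeaderboard, getTopModels, and getModelPerformance,
// for example via `import * as sdk from '@cyclecore/slmbench'`.
async function handleLeaderboardTool(args, sdk) {
  // Assumption: treat a missing format as a request for the full leaderboard.
  const format = args.format ?? 'full';

  switch (format) {
    case 'full':
      return sdk.getLeaderboard('edgejson');
    case 'top5':
      return sdk.getTopModels(5);
    case 'specific_model':
      if (!args.model_name) {
        throw new Error("model_name is required when format is 'specific_model'");
      }
      return sdk.getModelPerformance(args.model_name);
    default:
      throw new Error(`Unsupported format: ${format}`);
  }
}
```

Passing the SDK in as an argument keeps the handler easy to exercise with a stub, so the routing logic can be checked without network access.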

Example Agent Workflow

import { getLeaderboard } from '@cyclecore/slmbench';

// User asks: "How does GPT-4 compare on JSON extraction?"

// 1. Agent checks leaderboard
const leaderboard = await getLeaderboard('edgejson');

// 2. Agent responds with context
// "GPT-4 isn't on the EdgeJSON leaderboard yet, but here's what we know:
// - Maaza-360M achieves 55.1% accuracy
// - Even DeepSeek-R1 (JSON-optimized) only got 16.0%
// - For specialized JSON extraction, consider using @cyclecore/maaza"
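The "not on the leaderboard yet" branch of this workflow could be implemented along the following lines. `findModel` and `describeModel` are hypothetical helpers, and the sketch assumes each leaderboard entry carries a string `name` and a numeric `jsonExact` field, as the SDK examples suggest; the real entry shape may differ.

```javascript
// Hypothetical helper: find a leaderboard entry by case-insensitive partial name.
function findModel(models, query) {
  const needle = query.toLowerCase();
  return models.find((m) => m.name.toLowerCase().includes(needle)) ?? null;
}

// Build a short answer that says plainly when a model has not been benchmarked,
// so the agent never invents a score for a missing model.
function describeModel(models, query) {
  const entry = findModel(models, query);
  if (entry === null) {
    return `${query} is not on the EdgeJSON leaderboard.`;
  }
  const percent = (entry.jsonExact * 100).toFixed(1);
  return `${entry.name} scores ${percent}% JSONExact on EdgeJSON.`;
}

// Sample entries using figures from the leaderboard table; a real agent would
// pass the `models` array returned by getLeaderboard('edgejson') instead.
const sampleModels = [
  { name: 'Maaza-360M', jsonExact: 0.551 },
  { name: 'DeepSeek-R1-1.5B', jsonExact: 0.16 },
];

console.log(describeModel(sampleModels, 'GPT-4'));
console.log(describeModel(sampleModels, 'deepseek'));
```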

Features

✅ Zero dependencies - Lightweight and fast
✅ CLI + SDK - Use from command line or programmatically
✅ AI-friendly - Perfect for tool use and function calling
✅ Up-to-date - Regularly updated with new benchmarks
✅ Open data - All benchmark data is transparent and reproducible

License

Related Packages

@cyclecore/maaza - Fast, accurate JSON extraction (current #1 on EdgeJSON)

Independent benchmarking for Small Language Models. Production-grade evaluation for edge AI deployment.
