slimclaw
v0.5.0
Inference optimization middleware — windowing, routing, caching
SlimClaw 🦞
Advanced inference optimization plugin for OpenClaw. Provides intelligent model routing, cross-provider pricing, dynamic cost tracking, latency monitoring, prompt caching optimization, and multi-provider embedding service across 12+ models on Anthropic and OpenRouter.
0 vulnerabilities · 855 tests passing · Node 22+
Features
🎯 Intelligent Routing
- ClawRouter Integration — 15-dimension complexity scoring as primary classifier, with heuristic fallback
- Cross-Provider Support — Route across Anthropic and OpenRouter providers seamlessly
- Tier-Based Classification — Automatic model selection based on request complexity (simple/mid/complex/reasoning)
- Provider Resolution — Glob-pattern matching for intelligent provider selection (openai/* → openrouter)
- Shadow Mode — Observe routing decisions without mutation (active mode blocked on OpenClaw hooks)
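As a rough illustration of the glob-based provider resolution listed above, the sketch below tests a model ID against each pattern in insertion order and returns the first match. The helper names `globToRegExp` and `matchProvider`, the `fallback` argument, and its default value are hypothetical; this is not SlimClaw's `resolveProvider` implementation, and only the `*` wildcard is handled.

```typescript
// Hypothetical helper, not SlimClaw's implementation.
type TierProviders = Record<string, string>;

// Convert a glob such as "openai/*" into an anchored regular expression.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Return the provider of the first matching pattern, or the fallback.
function matchProvider(
  model: string,
  tierProviders: TierProviders,
  fallback = "anthropic", // assumed default, for illustration only
): { provider: string; matchedPattern: string | null } {
  for (const [pattern, provider] of Object.entries(tierProviders)) {
    if (globToRegExp(pattern).test(model)) {
      return { provider, matchedPattern: pattern };
    }
  }
  return { provider: fallback, matchedPattern: null };
}

const resolved = matchProvider("openai/gpt-4-turbo", {
  "openai/*": "openrouter",
  "anthropic/*": "anthropic",
});
console.log(resolved); // { provider: 'openrouter', matchedPattern: 'openai/*' }
```

First-match-wins keeps the behavior predictable, but it means more specific patterns need to be listed before broader ones.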
📊 Metrics & Analytics
- Request Tracking — Input/output tokens, cache reads/writes, estimated savings per request
- Dynamic Pricing — Live pricing data from OpenRouter API with 6-hour TTL caching
- Latency Monitoring — Per-model P50/P95/avg latency tracking with circular buffer (100 samples default)
- Dashboard — Real-time web UI at http://localhost:3333 with dark theme
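To make the circular-buffer latency tracking more concrete, here is a minimal sketch of a fixed-size sample window with nearest-rank percentiles. The class name `LatencyWindow` and its methods are assumptions for illustration and do not reflect the internals of SlimClaw's `LatencyTracker`.

```typescript
// Illustrative sketch only: a ring buffer of latency samples.
class LatencyWindow {
  private readonly windowSize: number;
  private samples: number[] = [];
  private next = 0; // index of the slot to write next

  constructor(windowSize = 100) {
    this.windowSize = windowSize;
  }

  record(ms: number): void {
    if (this.samples.length < this.windowSize) {
      this.samples.push(ms);
    } else {
      this.samples[this.next] = ms; // overwrite the oldest sample
    }
    this.next = (this.next + 1) % this.windowSize;
  }

  stats(): { p50: number; p95: number; avg: number; count: number } | null {
    if (this.samples.length === 0) return null;
    const sorted = [...this.samples].sort((a, b) => a - b);
    // Nearest-rank percentile: the smallest value covering p% of samples.
    const rank = (p: number) =>
      sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)];
    const avg = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
    return { p50: rank(50), p95: rank(95), avg, count: sorted.length };
  }
}

const tracker = new LatencyWindow(3);
[10, 20, 30, 40].forEach((ms) => tracker.record(ms)); // 10 is evicted
console.log(tracker.stats()); // { p50: 30, p95: 40, avg: 30, count: 3 }
```

A bounded window keeps memory constant per model and lets the statistics follow recent behavior rather than the whole process lifetime.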
💾 Caching Optimization
- Cache Breakpoint Injection — Optimizes Anthropic's prompt caching for maximum efficiency
- Smart Windowing — Maintains conversation context while minimizing redundant tokens
- 90% Cache Savings — On Anthropic, cache reads are billed at about 10% of the regular input-token price
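For readers unfamiliar with cache breakpoints, the following sketch shows the general idea: mark the end of the stable prompt prefix so the provider can reuse it on later requests. The `cache_control: { type: "ephemeral" }` marker is Anthropic's documented field; the function `injectCacheBreakpoint`, the `TextBlock` type, and the choice to mark the last system block are assumptions, since this README does not say where SlimClaw places its breakpoints.

```typescript
// Sketch under assumptions: mark the last system block as a cache breakpoint.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Returns a new array; the input blocks are left untouched.
function injectCacheBreakpoint(system: TextBlock[]): TextBlock[] {
  if (system.length === 0) return system;
  const last = system.length - 1;
  return system.map((block, i): TextBlock =>
    i === last ? { ...block, cache_control: { type: "ephemeral" } } : block,
  );
}

const system = injectCacheBreakpoint([
  { type: "text", text: "You are a helpful assistant." },
  { type: "text", text: "Long, rarely changing project context..." },
]);
console.log(JSON.stringify(system, null, 2));
```

Placing the marker after the longest unchanging prefix tends to give the most reuse, because everything before the breakpoint must match exactly for a cache read.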
🔮 Shadow Mode
- Risk-Free Operation — All routing runs in observe-only mode, no request mutation
- Comprehensive Logging — Detailed routing decisions with cost projections and provider recommendations
- Full Pipeline Simulation — Complete shadow execution of classify → resolve → recommend → log
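The observe-only behavior described above can be sketched as a function that reads the request, computes a projected saving, and returns a log record without touching the request. All names here (`Pricing`, `ShadowLog`, `projectSavings`, `shadowRoute`) and the prices are hypothetical and chosen only for illustration.

```typescript
// Hypothetical shapes for illustration; SlimClaw's real types may differ.
interface Pricing {
  inputPer1k: number;
  outputPer1k: number;
}

interface ShadowLog {
  actualModel: string;
  recommendedModel: string;
  tier: string;
  savingsPercent: number;
}

// Projected savings (in percent) of the recommended model over the actual one.
function projectSavings(
  actual: Pricing,
  recommended: Pricing,
  inputTokens: number,
  outputTokens: number,
): number {
  const cost = (p: Pricing) =>
    (inputTokens / 1000) * p.inputPer1k + (outputTokens / 1000) * p.outputPer1k;
  const actualCost = cost(actual);
  if (actualCost === 0) return 0;
  return Math.round(((actualCost - cost(recommended)) / actualCost) * 100);
}

// Observe only: read the request, return a log record, mutate nothing.
function shadowRoute(
  request: Readonly<{ model: string }>,
  recommendedModel: string,
  tier: string,
  savingsPercent: number,
): ShadowLog {
  return { actualModel: request.model, recommendedModel, tier, savingsPercent };
}

// Prices below are made-up numbers, not real provider pricing.
const incoming = { model: "model-a" };
const savings = projectSavings(
  { inputPer1k: 0.015, outputPer1k: 0.075 },
  { inputPer1k: 0.003, outputPer1k: 0.015 },
  2000,
  500,
);
const logEntry = shadowRoute(incoming, "model-b", "simple", savings);
console.log(logEntry.savingsPercent); // 80
```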
Installation
Method 1: OpenClaw CLI (Recommended)
openclaw plugins install slimclaw
This automatically downloads, installs, and registers the plugin. Skip to Configuration.
Method 2: npm install (Manual Setup)
# Install to OpenClaw plugins directory
mkdir -p ~/.openclaw/plugins/slimclaw
cd ~/.openclaw/plugins/slimclaw
npm init -y
npm install slimclaw
Then register the plugin in ~/.openclaw/openclaw.json:
{
"plugins": {
"load": {
"paths": ["~/.openclaw/plugins/slimclaw"]
},
"entries": {
"slimclaw": { "enabled": true }
}
}
}
Method 3: From Source
git clone https://github.com/evansantos/slimclaw ~/.openclaw/plugins/slimclaw
cd ~/.openclaw/plugins/slimclaw
npm install && npm run build
Then register the plugin in ~/.openclaw/openclaw.json using the same configuration as Method 2.
Configuration
Create a configuration file at ~/.openclaw/plugins/slimclaw/slimclaw.config.json.
Minimal Configuration
Start with this basic setup to enable shadow mode:
{
"enabled": true,
"mode": "shadow"
}
Shadow Mode (Safe Default)
Shadow mode observes and logs routing decisions without modifying requests:
{
"enabled": true,
"mode": "shadow",
"routing": {
"enabled": true,
"shadowLogging": true
},
"dashboard": {
"enabled": true,
"port": 3333
}
}
With Routing Tiers
Configure which models to use for different complexity levels:
{
"enabled": true,
"mode": "shadow",
"routing": {
"enabled": true,
"shadowLogging": true,
"tiers": {
"simple": "anthropic/claude-3-haiku-20240307",
"mid": "anthropic/claude-sonnet-4-20250514",
"complex": "anthropic/claude-opus-4-20250514",
"reasoning": "anthropic/claude-opus-4-20250514"
}
},
"dashboard": {
"enabled": true,
"port": 3333
}
}
With OpenRouter Cross-Provider Routing
Route different model families to their optimal providers:
{
"enabled": true,
"mode": "shadow",
"routing": {
"enabled": true,
"shadowLogging": true,
"tiers": {
"simple": "anthropic/claude-3-haiku-20240307",
"mid": "openai/gpt-4-turbo",
"complex": "anthropic/claude-opus-4-20250514",
"reasoning": "openai/o1-preview"
},
"tierProviders": {
"openai/*": "openrouter",
"anthropic/*": "anthropic",
"google/*": "openrouter"
},
"openRouterHeaders": {
"HTTP-Referer": "https://slimclaw.dev",
"X-Title": "SlimClaw"
}
},
"dashboard": {
"enabled": true,
"port": 3333
}
}
With Budget Enforcement
Set daily/weekly spending limits with automatic tier downgrading:
{
"enabled": true,
"mode": "shadow",
"routing": {
"enabled": true,
"shadowLogging": true,
"reasoningBudget": 10000,
"allowDowngrade": true,
"tiers": {
"simple": "anthropic/claude-3-haiku-20240307",
"mid": "anthropic/claude-sonnet-4-20250514",
"complex": "anthropic/claude-opus-4-20250514",
"reasoning": "anthropic/claude-opus-4-20250514"
}
},
"dashboard": {
"enabled": true,
"port": 3333
}
}
With Dynamic Pricing
Automatically fetch live pricing from OpenRouter API:
{
"enabled": true,
"mode": "shadow",
"routing": {
"enabled": true,
"shadowLogging": true,
"tiers": {
"simple": "anthropic/claude-3-haiku-20240307",
"mid": "openai/gpt-4-turbo",
"complex": "anthropic/claude-opus-4-20250514",
"reasoning": "openai/o1-preview"
},
"tierProviders": {
"openai/*": "openrouter",
"anthropic/*": "anthropic"
},
"dynamicPricing": {
"enabled": true,
"ttlMs": 21600000,
"refreshIntervalMs": 21600000,
"timeoutMs": 10000,
"apiUrl": "https://openrouter.ai/api/v1/models"
}
},
"dashboard": {
"enabled": true,
"port": 3333
}
}
Verification
After configuration, restart the OpenClaw gateway to load the plugin:
# Restart gateway to load plugin
openclaw gateway restart
# Check it loaded successfully
grep SlimClaw ~/.openclaw/logs/gateway.log | tail -5
# Check the dashboard is running
curl http://localhost:3333/api/stats
You should see log entries indicating that SlimClaw loaded successfully, and the dashboard should respond with current metrics. Access the web UI at http://localhost:3333 to monitor routing decisions in real time.
Usage
Command
/slimclaw
Shows current metrics: requests, tokens, cache hits, savings, and routing statistics.
Dashboard
When dashboard.enabled: true, access the web UI at:
http://localhost:3333
Or from your network: http://<your-ip>:3333
API Endpoints
- GET /metrics/optimizer — Current metrics summary
- GET /metrics/history?period=hour&limit=24 — Historical data
- GET /metrics/raw — Raw metrics for debugging
- GET /api/routing-stats — Routing decision statistics
- GET /health — Health check
Architecture
SlimClaw implements a comprehensive routing pipeline that operates in shadow mode:
Routing Pipeline
- Classification → ClawRouter analyzes request complexity across 15 dimensions, assigns tier (simple/mid/complex/reasoning). Falls back to built-in heuristic classifier on failure (circuit breaker pattern).
- Provider Resolution → Map model patterns to providers using glob matching (openai/* → openrouter)
- Shadow Recommendation → Generate complete routing decision without mutation
- Structured Logging → Output detailed recommendations with cost analysis
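The classifier fallback in the first step follows a circuit-breaker pattern, which could look roughly like the sketch below. It is synchronous for brevity, and `withCircuitBreaker`, `maxFailures`, `cooldownMs`, and the length-based `heuristic` are all assumptions rather than SlimClaw's or ClawRouter's actual API.

```typescript
type Tier = "simple" | "mid" | "complex" | "reasoning";
type Classifier = (prompt: string) => Tier;

// Wrap a primary classifier. After `maxFailures` consecutive errors the
// breaker opens and only the fallback runs until `cooldownMs` has elapsed.
function withCircuitBreaker(
  primary: Classifier,
  fallback: Classifier,
  maxFailures = 3,
  cooldownMs = 30_000,
  now: () => number = Date.now,
): Classifier {
  let failures = 0;
  let openedAt = 0;
  return (prompt) => {
    const open = failures >= maxFailures && now() - openedAt < cooldownMs;
    if (open) return fallback(prompt);
    try {
      const tier = primary(prompt);
      failures = 0; // a success closes the breaker
      return tier;
    } catch {
      failures += 1;
      if (failures >= maxFailures) openedAt = now();
      return fallback(prompt);
    }
  };
}

// Made-up heuristic standing in for the built-in fallback classifier.
const heuristic: Classifier = (prompt) =>
  prompt.length < 200 ? "simple" : prompt.length < 2000 ? "mid" : "complex";

let primaryCalls = 0;
const flaky: Classifier = () => {
  primaryCalls += 1;
  throw new Error("classifier unavailable");
};

const classify = withCircuitBreaker(flaky, heuristic);
const tiers = Array.from({ length: 5 }, () => classify("What is 2 + 2?"));
console.log(primaryCalls, tiers[4]); // 3 simple
```

Once the breaker is open, requests skip the failing classifier entirely, so a degraded dependency does not add latency to every request.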
Shadow Mode
SlimClaw v0.2.0 operates exclusively in shadow mode due to OpenClaw's current hook limitations:
- Observes all requests and generates routing recommendations
- Logs what routing decisions would be made with cost projections
- Tracks dynamic pricing and latency for routing intelligence
- Cannot mutate requests until OpenClaw supports historyMessages mutation
Example Shadow Log:
[SlimClaw] 🔮 Shadow route: opus-4-6 → o4-mini (via openrouter)
Tier: reasoning (0.92) | Savings: 78% | $0.045/1k → $0.003/1k
Model Tier Mapping
| Complexity | Default Model | Use Cases |
| ----------- | --------------- | ----------------------------------------- |
| simple | Claude 3 Haiku | Basic Q&A, simple tasks, casual chat |
| mid | Claude Sonnet 4 | General development, analysis, writing |
| complex | Claude Opus 4 | Architecture, complex debugging, research |
| reasoning | Claude Opus 4 | Multi-step logic, planning, deep analysis |
API Reference
Core Routing
import {
resolveModel,
buildShadowRecommendation,
makeRoutingDecision,
type ShadowRecommendation,
} from './routing/index.js';
// Generate routing decision
const decision = resolveModel(classification, routingConfig, context);
// Build complete shadow recommendation
const recommendation = buildShadowRecommendation(
runId,
actualModel,
decision,
tierProviders,
pricing,
);
Dynamic Pricing
import { DynamicPricingCache, DEFAULT_DYNAMIC_PRICING_CONFIG } from './routing/dynamic-pricing.js';
const pricing = new DynamicPricingCache({
enabled: true,
openRouterApiUrl: 'https://openrouter.ai/api/v1/models',
cacheTtlMs: 21600000, // 6 hours
fetchTimeoutMs: 5000,
fallbackToHardcoded: true,
});
// Get live pricing (sync — falls back to hardcoded if cache miss)
const cost = pricing.getPricing('openai/gpt-4-turbo');
// { inputPer1k: 0.01, outputPer1k: 0.03 }
// Refresh cache from OpenRouter API (async)
await pricing.refresh();
Latency Tracking
import {
LatencyTracker,
DEFAULT_LATENCY_TRACKER_CONFIG,
type LatencyStats,
} from './routing/latency-tracker.js';
const tracker = new LatencyTracker({
enabled: true,
windowSize: 100, // circular buffer per model
outlierThresholdMs: 60000,
});
// Record request latency (tokenCount optional — enables throughput calc)
tracker.recordLatency('anthropic/claude-sonnet-4-20250514', 2500, 150);
// Get statistics
const stats: LatencyStats | null = tracker.getLatencyStats('anthropic/claude-sonnet-4-20250514');
// { p50: 2100, p95: 4500, avg: 2300, min: 1800, max: 5200, count: 87, tokensPerSecond: 45.2 }
Provider Resolution
import { resolveProvider, type ProviderResolution } from './routing/provider-resolver.js';
const resolution = resolveProvider('openai/gpt-4-turbo', {
'openai/*': 'openrouter',
'anthropic/*': 'anthropic',
});
// { provider: 'openrouter', matchedPattern: 'openai/*', confidence: 1.0 }
Status
Version 0.2.0
- 618 tests passing across 43 test files
- 3 major phases complete:
- Phase 1: Cross-provider pricing (PR #24)
- Phase 2a: Shadow routing (PR #25)
- Phase 3a: Dynamic pricing + latency tracking (PR #29)
- 12+ models supported across Anthropic and OpenRouter providers
- Shadow mode only — active routing blocked on OpenClaw hook mutation support
- 0 vulnerabilities — clean security audit
Cross-Provider Coverage
| Provider | Models | Pricing | Status |
| ---------- | --------------------------------- | ---------------------- | ---------- |
| Anthropic | Claude 3/4 series | ✅ Dynamic + Hardcoded | Production |
| OpenRouter | 12+ models (OpenAI, Google, etc.) | ✅ Live API pricing | Production |
Embeddings Router 🧬
SlimClaw includes a production-ready embedding router for multi-provider vector generation with intelligent caching, cost tracking, and configurable complexity-based model selection.
Features
- Intelligent Routing — Classify text complexity (simple/mid/complex) and select optimal embedding model
- Smart Caching — SQLite-backed persistent cache with TTL and duplicate detection
- Cost Tracking — Per-model pricing with real-time metrics and savings calculation
- Retry Logic — Exponential backoff with configurable retries and timeout protection
- Configurable Thresholds — Tune complexity detection via config (default: 200/1000 chars)
- Dashboard Integration — Real-time metrics at /api/embeddings/metrics
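The configurable thresholds mentioned above (200 and 1000 characters by default) suggest a length-based tier check along the following lines. This is a minimal sketch: the names `classifyText`, `simpleMaxChars`, and `midMaxChars` are hypothetical, the boundaries are assumed to be inclusive, and the real classifier may use more signals than length.

```typescript
type EmbeddingTier = "simple" | "mid" | "complex";

// Hypothetical config shape; key names are not taken from SlimClaw.
interface Thresholds {
  simpleMaxChars: number;
  midMaxChars: number;
}

const DEFAULT_THRESHOLDS: Thresholds = { simpleMaxChars: 200, midMaxChars: 1000 };

// Pick a tier from text length alone (assumed inclusive boundaries).
function classifyText(
  text: string,
  thresholds: Thresholds = DEFAULT_THRESHOLDS,
): EmbeddingTier {
  if (text.length <= thresholds.simpleMaxChars) return "simple";
  if (text.length <= thresholds.midMaxChars) return "mid";
  return "complex";
}

console.log(classifyText("Hello, world!")); // simple
console.log(classifyText("a".repeat(500))); // mid
console.log(classifyText("a".repeat(5000))); // complex
```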
Quick Start
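// AnthropicProvider and OpenRouterProvider are assumed to be exported from 'slimclaw'
import { EmbeddingRouter, AnthropicProvider, OpenRouterProvider } from 'slimclaw';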
const router = new EmbeddingRouter({
providers: [
new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY }),
new OpenRouterProvider({ apiKey: process.env.OPENROUTER_API_KEY }),
],
config: {
tiers: {
simple: 'openai/text-embedding-3-small',
mid: 'openai/text-embedding-3-large',
complex: 'cohere/cohere-embed-english-v3.0',
},
retry: {
maxRetries: 3,
baseDelayMs: 1000,
timeoutMs: 30000,
},
},
});
const result = await router.embed('Hello, world!');
console.log(result.embedding); // [0.123, 0.456, ...]
Configuration
1. Set up API keys — Create .env from .env.example:
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENROUTER_API_KEY=your-openrouter-key-here
2. Enable in config — Add to slimclaw.config.json or OpenClaw's openclaw.json under plugins.slimclaw:
{
"embeddings": {
"enabled": true,
"routing": {
"tiers": {
"simple": "voyage-3-lite",
"mid": "voyage-3",
"complex": "voyage-3-large"
},
"tierProviders": {
"voyage-3-lite": "openrouter",
"voyage-3": "openrouter",
"voyage-3-large": "openrouter"
}
},
"caching": {
"enabled": true,
"ttlMs": 3600000,
"maxSize": 1000
},
"metrics": {
"enabled": true,
"trackCosts": true
}
}
}
Config Schema:
- enabled (boolean, default: false) — Enable embedding router
- routing.tiers (object) — Map complexity tiers to models
- routing.tierProviders (object) — Map models/patterns to providers (anthropic|openrouter)
- caching.enabled (boolean, default: true) — Enable SQLite cache
- caching.ttlMs (number, default: 3600000) — Cache TTL in milliseconds (1 hour)
- caching.maxSize (number, default: 1000) — Max cache entries
- metrics.enabled (boolean, default: true) — Track metrics
- metrics.trackCosts (boolean, default: true) — Calculate per-model costs
Dashboard Metrics
Access embedding metrics at: GET /api/embeddings/metrics
{
"totalRequests": 42,
"cacheHits": 38,
"cacheMisses": 4,
"totalCost": 0.15,
"averageDurationMs": 245,
"requestsByTier": {
"simple": 20,
"mid": 15,
"complex": 7
}
}
Retry Strategy
- Exponential Backoff: 1s → 2s → 4s delays between attempts
- Smart Error Handling: Retries 5xx/timeouts, skips 4xx (client errors)
- Timeout Protection: Per-request 30s timeout prevents hanging
- Error Context: Logs include provider name, attempt count, original error
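The retry strategy above can be approximated with a small wrapper like the one below. It is a sketch under stated assumptions: `retry`, `withTimeout`, `HttpError`, and the `status` property on errors are hypothetical names, not SlimClaw's API, while the option names mirror the `retry` block in the Quick Start config.

```typescript
interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
  timeoutMs: number;
}

// Hypothetical error type carrying an HTTP status code.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Reject if `work` does not settle within `timeoutMs`.
async function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Retry 5xx errors and timeouts with exponential backoff; give up on 4xx.
async function retry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await withTimeout(fn(), opts.timeoutMs);
    } catch (err) {
      lastError = err;
      const status = (err as { status?: number }).status;
      const isClientError =
        status !== undefined && status >= 400 && status < 500;
      if (isClientError || attempt === opts.maxRetries) break;
      // With baseDelayMs = 1000 the waits are 1s, 2s, 4s.
      await sleep(opts.baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```

Client errors are not retried because repeating an invalid request will not change the outcome, whereas server errors and timeouts are often transient.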
Testing
Embedding tests are included in the main test suite:
npm test -- src/embeddings/
Coverage: 57+ tests covering routing, caching, retry logic, and metrics.
Roadmap
- Active Mode — Waiting on OpenClaw hook mutation support (#20416)
- Budget Enforcement — Per-session/daily spend limits with automatic tier capping
- A/B Testing — Route percentages across providers to compare quality/cost tradeoffs
- Multi-Factor Routing — Combine cost + latency + quality signals for optimal model selection
- Embedding Pooling — Batch multiple embedding requests for efficiency
Dependencies
Note: @blockrun/clawrouter pulls viem as a transitive dependency (~50 packages). This doesn't affect functionality but adds to node_modules size. See BlockRunAI/clawrouter#1 for tracking.
Provider Proxy Mode (Phase 1)
SlimClaw can operate as an active provider proxy, intercepting model requests and applying intelligent routing.
Quick Start
- Enable proxy in slimclaw.config.json:
{
"enabled": true,
"proxy": {
"enabled": true,
"port": 3334
},
"routing": {
"enabled": true,
"tiers": {
"simple": "anthropic/claude-3-haiku-20240307",
"mid": "anthropic/claude-sonnet-4-20250514",
"complex": "anthropic/claude-opus-4-20250514"
}
}
}
- Set your OpenClaw model to slimclaw/auto:
{
"defaultModel": "slimclaw/auto"
}
How It Works
- OpenClaw sends request to slimclaw/auto
- SlimClaw classifies prompt complexity
- Routing pipeline selects optimal model for the tier
- Request forwards to real provider (OpenRouter)
- Streaming response pipes back to OpenClaw
Supported (Phase 1)
| Feature | Status |
| ----------------------- | ---------- |
| slimclaw/auto model | ✅ |
| OpenRouter forwarding | ✅ |
| Streaming responses | ✅ |
| Budget enforcement | ✅ |
| A/B testing | ✅ |
| Direct Anthropic API | ⏳ Phase 2 |
| Multiple virtual models | ⏳ Phase 2 |
Contributing
PRs welcome! The project uses branch protection with CI + GitSniff review.
git clone https://github.com/evansantos/slimclaw
cd slimclaw
npm install
npm test
Development
- TypeScript — Full type safety with strict checking
- Testing — 618 tests with full coverage across routing pipeline
- ESLint — Enforced code style and quality
- Branch Protection — All changes require PR review
License
MIT
