@supposedev/suppose-cli
v0.3.1
CLI for Suppose -- model AI system architectures and measure latency, cost, and accuracy before writing any code.
Install
npm install -g @supposedev/suppose-cli
Authenticate
suppose login # Opens browser for Cognito sign-in
suppose usage # Verify login, view quota
Quick start
# Generate a starter topology
suppose scaffold rag -o pipeline.yaml
# Validate syntax
suppose validate pipeline.yaml
# Fast algebraic evaluation (deterministic, single-pass)
suppose eval pipeline.yaml
# Check feasibility against SLA constraints
suppose feasibility pipeline.yaml \
--constraint "response.latency_ms <= 500"
# Discrete-event simulation (stochastic, tail latencies)
suppose sim pipeline.yaml
# Multi-objective optimization
suppose optimize pipeline.yaml \
--vary llm.latency_ms.mean=50..300 \
--objective "minimize response.latency_ms.mean" \
--constraint "response.cost_usd <= 0.005"
# Open in web playground
suppose open pipeline.yaml
Topology format
Topologies are YAML files with four required sections:
inputs: [request]    # Named injection points
outputs: [response]  # Named collection points
nodes:
  embedder:
    description: "Text embedding"
    kind:
      type: transform
      profile:
        distributions:
          latency_ms: { type: normal, mean: 20.0, std_dev: 3.0 }
          cost_usd: { type: point, value: 0.0001 }
  cache:
    description: "Semantic cache"
    kind:
      type: branch
      condition:
        type: probability
        distribution: { type: point, value: 0.6 }
      true_profile:
        distributions:
          latency_ms: { type: point, value: 2.0 }
      false_profile:
        distributions:
          latency_ms: { type: point, value: 1.0 }
  llm:
    description: "Claude Sonnet"
    kind:
      type: transform
      profile:
        distributions:
          latency_ms: { type: log_normal, mu: 6.2, sigma: 0.5 }
          cost_usd:
            per_token: { type: point, value: 0.000003 }
          accuracy: { type: normal, mean: 0.94, std_dev: 0.02 }
edges:
  - { from: _in.request, to: embedder.input }
  - { from: embedder.output, to: cache.input }
  - { from: cache.on_true, to: _out.response }  # cache hit
  - { from: cache.on_false, to: llm.input }     # cache miss
  - { from: llm.output, to: _out.response }
Optional scenario and workload blocks control simulation parameters:
scenario:
  rps: 500            # Requests per second
  duration_secs: 60   # Simulation duration
  warm_up_secs: 5     # Exclude initial signals from stats
  replications: 3     # Independent runs for confidence
workload:
  token_count: { type: normal, mean: 500, std_dev: 200 }
  payload_size_bytes: { type: uniform, low: 1024, high: 65536 }
Node types
| Type | Ports | Description |
|------|-------|-------------|
| transform | input -> output | Applies latency + cost + accuracy |
| branch | input -> on_true, on_false | Routes by probability or metric threshold |
| split | input -> out.0, out.1, ... | Clone (broadcast) or weighted routing |
| join | input (multi) -> output | Collects branches: latency=max, cost=sum, accuracy=min |
| loop | input -> stopped, exhausted | Iterative processing with early stopping |
| gate | input, release -> accept, reject | Rate limiting, queues, concurrency control |
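The join aggregation rules above (latency = max, cost = sum, accuracy = min) can be sketched in a few lines of Python. This is an illustration of the documented semantics only, not the CLI's actual implementation:

```python
# Illustrative sketch of the documented join semantics: when branches
# reconvene, latency is the max (slowest branch gates completion),
# cost is the sum (every branch actually ran), and accuracy is the
# min (weakest link).

def join_metrics(branches):
    """branches: list of dicts with latency_ms, cost_usd, accuracy."""
    return {
        "latency_ms": max(b["latency_ms"] for b in branches),
        "cost_usd": sum(b["cost_usd"] for b in branches),
        "accuracy": min(b["accuracy"] for b in branches),
    }

merged = join_metrics([
    {"latency_ms": 120.0, "cost_usd": 0.002, "accuracy": 0.95},
    {"latency_ms": 80.0,  "cost_usd": 0.001, "accuracy": 0.98},
])
print(merged)  # latency 120.0, cost ~0.003, accuracy 0.95
```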
Distribution types
{ type: point, value: 42.0 } # Deterministic
{ type: normal, mean: 100.0, std_dev: 10.0 } # Gaussian
{ type: log_normal, mu: 3.0, sigma: 0.5 } # Heavy tail
{ type: uniform, low: 10.0, high: 50.0 } # Uniform range
Scaled costs
cost_usd:
  per_token: { type: point, value: 0.000003 }   # x signal.token_count
egress:
  per_byte: { type: point, value: 0.00000001 }  # x signal.payload_size_bytes
Requires a workload block with token_count or payload_size_bytes.
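The scaled-cost arithmetic is simple: each per-unit rate is multiplied by the matching workload field on the signal. A minimal sketch, using the rates from the example above and assumed mean workload values (not the CLI's code):

```python
# Illustrative only: scaled costs multiply a per-unit rate by the
# matching workload field carried on each signal.

per_token_rate = 0.000003    # cost_usd.per_token from the example
per_byte_rate = 0.00000001   # egress.per_byte from the example

token_count = 500            # token_count mean from the workload block
payload_size_bytes = 33280   # midpoint of uniform(1024, 65536)

cost_usd = per_token_rate * token_count
egress_usd = per_byte_rate * payload_size_bytes
print(cost_usd, egress_usd)  # ~0.0015 and ~0.00033 per signal
```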
Commands
suppose eval <file>
Fast algebraic evaluation. Deterministic, single-pass. Best for linear pipelines. Returns null for probabilistic branch paths not taken.
suppose eval pipeline.yaml # Table output
suppose eval pipeline.yaml --json # JSON output
suppose eval pipeline.yaml --seed 42 # Reproducible
suppose eval - # Read from stdin
suppose sim <file>
Monte Carlo discrete-event simulation. Required for branches, loops, gates, distribution tails (p95/p99), and workload-shaped traffic.
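Why tails need simulation: p95/p99 are order statistics of sampled latencies, which single-pass algebra over means cannot capture. A rough sketch of the idea, sampling the log-normal parameters (mu=6.2, sigma=0.5) from the example llm node:

```python
import random

# Illustrative only: draw latencies from a log-normal (heavy tail,
# as in the example llm node) and read off the empirical p95.
rng = random.Random(42)
samples = sorted(rng.lognormvariate(6.2, 0.5) for _ in range(10_000))
p95 = samples[int(0.95 * len(samples)) - 1]
mean = sum(samples) / len(samples)
print(f"mean={mean:.0f}ms p95={p95:.0f}ms")  # p95 sits far above the mean
```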
suppose sim pipeline.yaml # Uses scenario block defaults
suppose sim pipeline.yaml -n 1000 # Override signal count
suppose sim pipeline.yaml --json # JSON with signals + summary
suppose sim pipeline.yaml --seed 42 # Reproducible
suppose feasibility <file>
Check if a topology can meet SLA constraints. Very fast. Supports forward checking ("does this topology meet my SLA?") and backward solving ("what parameter values make the SLA achievable?").
# Forward check: fixed topology against SLAs
suppose feasibility pipeline.yaml \
--constraint "response.latency_ms <= 500" \
--constraint "response.accuracy >= 0.9"
# Backward solve: find parameters that satisfy SLAs
suppose feasibility pipeline.yaml \
--vary llm.latency_ms.mean=50..500 \
--constraint "response.latency_ms <= 200"
Constraint format: port.metric[.stat] operator value
Operators: <=, >=, <, >
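A constraint string splits into a metric path, one of the four operators, and a numeric bound. A minimal parser sketch to illustrate the format (hypothetical helper names, not the CLI's internals):

```python
import operator

# Hypothetical sketch of parsing "port.metric[.stat] operator value".
OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}

def parse_constraint(text):
    for symbol in ("<=", ">=", "<", ">"):  # try two-char operators first
        if symbol in text:
            path, value = text.split(symbol)
            return path.strip(), symbol, float(value)
    raise ValueError(f"no operator in {text!r}")

def satisfied(text, metrics):
    path, symbol, bound = parse_constraint(text)
    return OPS[symbol](metrics[path], bound)

print(satisfied("response.latency_ms <= 500", {"response.latency_ms": 420.0}))
```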
suppose optimize <file>
Multi-objective optimization using NSGA-II. Finds Pareto-optimal configurations by varying distribution parameters.
suppose optimize pipeline.yaml \
--vary llm.latency_ms.mean=50..500 \
--vary llm.cost_usd.value=0.001..0.05 \
--objective "minimize response.latency_ms.mean" \
--objective "minimize response.cost_usd.mean" \
--constraint "response.accuracy >= 0.85" \
--max-evaluations 200 \
-n 30 \
--json
Parameter path format: node.metric.field (e.g. llm.latency_ms.mean, cache.true_profile.latency_ms.value)
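Pareto optimality, which NSGA-II searches for, means no other configuration is at least as good on every objective and strictly better on one. A toy dominance filter for two minimize objectives (all numbers invented; NSGA-II itself adds ranking and crowding distance on top):

```python
# Toy Pareto filter over (latency, cost) points, both minimized.
# Illustrative only, not the optimizer's implementation.

def dominates(a, b):
    """a dominates b if a is <= everywhere and < somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

points = [(120.0, 0.004), (90.0, 0.006), (150.0, 0.003), (130.0, 0.005)]
front = [p for p in points if not any(dominates(q, p) for q in points)]
print(front)  # (130.0, 0.005) drops out: (120.0, 0.004) beats it on both axes
```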
suppose check [file]
CI/CD gate. Run assertions against simulation results or saved baselines.
# Inline assertion against a topology
suppose check pipeline.yaml \
--assert "response.p95_latency_ms < 200" \
--assert "response.mean_metrics.accuracy > 0.9"
# Run a check file
suppose check pipeline.suppose-check.yaml
# Discover and run all check files in the project
suppose check --all
# Only check topologies modified in git
suppose check --changed
# Generate markdown table for PR comments
suppose check --all --comment
# Update baselines when checks pass
suppose check --all --update-baseline --commit
suppose init <file>
Initialize a check file and baseline for a topology:
suppose init pipeline.suppose.yaml
Creates:
- pipeline.suppose-check.yaml with default assertions
- .suppose/baselines/pipeline.suppose.json with current metrics
Check file format
topology: pipeline.suppose.yaml
seed: 42
signals: 200
checks:
  - name: Latency SLO
    assert: response.p95_latency_ms < 200
  - name: Cost budget
    assert: response.mean_metrics.cost_usd < baseline * 1.2
  - name: Accuracy floor
    assert: response.mean_metrics.accuracy > 0.95
Use baseline or baseline * factor for regression detection.
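The baseline * factor form compares the current metric against a saved baseline value, so a check like `cost < baseline * 1.2` tolerates up to 20% drift before failing. A sketch of the comparison (hypothetical helper, invented numbers, not the CLI's code):

```python
# Illustrative regression check: current metric vs. saved baseline,
# with a multiplicative tolerance factor.

def within_budget(current, baseline, factor=1.2):
    return current < baseline * factor

baseline_cost = 0.0040  # invented saved baseline value

assert within_budget(0.0045, baseline_cost)      # 12.5% over -> passes
assert not within_budget(0.0050, baseline_cost)  # 25% over -> fails
```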
suppose scaffold <pattern>
Generate a starter topology:
suppose scaffold rag -o my-pipeline.yaml
suppose patterns # List available patterns
suppose patterns --json # JSON with descriptions
Patterns: llm, rag, rag_with_cache, agentic_loop, fan_out_fan_in, load_balanced, queue_with_processing, quality_gate_retry, circuit_breaker
suppose render <file>
Render topology as a diagram:
suppose render pipeline.yaml # ASCII art
suppose render pipeline.yaml -f mermaid # Mermaid flowchart
suppose render pipeline.yaml -f mermaid -d left_right
suppose open <file>
Open topology in the web playground:
suppose open pipeline.yaml # Opens browser
suppose open pipeline.yaml --print # Print URL only
echo "..." | suppose open - --print # From stdin
suppose skill
Generate an AI agent skill file:
suppose skill # Raw markdown to stdout
suppose skill -f claude # Write to .claude/commands/suppose.md
suppose skill -f cursor # Write to .cursor/rules/suppose.md
suppose skill -f codex # Write to .codex/instructions/suppose.md
suppose skill -f gemini # Write to .gemini/instructions/suppose.md
suppose skill -o custom.md # Custom output path
suppose usage
Show current month API usage and quotas:
suppose usage # Table output
suppose usage --json # JSON with quotas
Other commands
suppose health # Check gateway + compute backend
suppose health --json # JSON health status
suppose schema # Print topology JSON Schema
suppose validate <file> # Validate YAML syntax
suppose validate <file> --json # Structured diagnostics
suppose login # Cognito PKCE login (opens browser)
suppose logout # Clear stored credentials
Global options
--api-url <url> Override API base URL (default: https://api.suppose.dev)
--json Output as JSON (available on most commands)
-h, --help Show help for any command
-V, --version Show version
MCP server
Suppose exposes an MCP server at mcp.suppose.dev for AI coding assistants:
# Generate Claude Code config
suppose skill -f claude
# Or add to .claude/settings.json manually:
# { "mcpServers": { "suppose": { "type": "url", "url": "https://mcp.suppose.dev/mcp" } } }
Tools: evaluate, simulate, simulate_batch, optimize, feasibility, validate, scaffold, health
Links
- suppose.dev -- landing page
- app.suppose.dev -- web playground
- mcp.suppose.dev -- MCP server
License
Copyright (c) 2026 Barak Bercovitz. All Rights Reserved.
