
@peakinfer/cli

v1.0.133

LLM inference performance analysis CLI

PeakInfer

Run AI inference at peak performance.

PeakInfer scans your code. Finds every LLM call. Shows you exactly what's holding back your latency, throughput, and reliability.

30 seconds. Zero config. Real numbers.

npm install -g @kalmantic/peakinfer
peakinfer analyze .

The Problem

Your code says streaming: true. Your runtime shows 0% streams.

That's drift—and it's killing your latency.

| What You Think | What's Actually Happening |
|----------------|---------------------------|
| Streaming enabled | Blocking calls |
| Fast responses | p95 latency 5x slower than benchmarks |
| Retry logic works | Never triggered |
| Fallbacks ready | Never tested |

Static analysis sees code. Monitoring sees requests. Neither sees the gap.

PeakInfer sees both.
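
For example, "streaming configured but not consumed" often looks like this in application code. A minimal TypeScript sketch, assuming the OpenAI Node SDK; the helper name is illustrative:

import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical helper: the request *declares* streaming...
async function answer(prompt: string): Promise<string> {
  const stream = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  // ...but the stream is buffered to completion before anything is returned,
  // so callers still wait for the full generation. Static analysis sees
  // `stream: true`; runtime traces see a blocking call. That gap is drift.
  let full = "";
  for await (const chunk of stream) {
    full += chunk.choices[0]?.delta?.content ?? "";
  }
  return full;
}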


What Is Peak Inference Performance?

Peak Inference Performance means improving latency, throughput, reliability, and cost without changing evaluated behavior.

No other tool correlates these four views of your inference stack:

CODE                 RUNTIME              BENCHMARKS           EVALS
────                 ───────              ──────────           ─────

What you             What actually        The upper bound      Your quality
declared             happened             of possible          gate

streaming: true      0% streaming         InferenceMAX:        "extraction" 94%
model: gpt-4o        p95: 2400ms          gpt-4o p95: 1200ms   accuracy

        └───────────────────┴────────────────────┴───────────────────┘
                                     │
                                     ▼
                                PEAKINFER
                               (correlation)

The Four Dimensions

PeakInfer analyzes every inference point across four dimensions:

| Dimension | What We Find | Typical Improvement |
|-----------|--------------|---------------------|
| Latency | Missing streaming, blocking calls, p95 gaps | 50-80% faster |
| Throughput | Sequential loops, no batching | 10-50x improvement |
| Reliability | No retry, no fallback, no timeout | 99%+ uptime |
| Cost | Wrong model for the job | 60-90% reduction |


How It Works

1. Scan Your Code

peakinfer analyze ./src

Finds every inference point. OpenAI, Anthropic, Azure, Bedrock, self-hosted. All of them.
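
An "inference point" is any place in the codebase where a request is sent to an LLM provider. A sketch of two such points in TypeScript, assuming the official OpenAI and Anthropic Node SDKs; function names and prompts are illustrative:

import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

const openai = new OpenAI();
const anthropic = new Anthropic();

// Inference point 1: an OpenAI chat completion.
export async function classify(text: string) {
  return openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: `Classify this ticket: ${text}` }],
  });
}

// Inference point 2: an Anthropic message. Both calls would be picked up by the scan.
export async function draft(text: string) {
  return anthropic.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 1024,
    messages: [{ role: "user", content: `Draft a reply to: ${text}` }],
  });
}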

2. See What's Holding You Back

7 inference points found
39 issues detected

LATENCY:
- Streaming configured but not consumed (p95: 2400ms, should be 400ms)
- Blocking calls in hot path (6x latency penalty)

THROUGHPUT:
- Sequential batch processing (50x throughput opportunity)

RELIABILITY:
- Zero error handling across all LLM calls
- No fallback on critical inference path

QUICK WINS:
- Enable streaming consumption: -80% latency
- Add retry logic: +99% reliability
- Parallelize batch: 50x throughput
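
The "parallelize batch" quick win usually amounts to replacing a sequential await-in-a-loop with a fan-out. A minimal TypeScript sketch, assuming the Anthropic Node SDK; summarize and the model name are illustrative, and a real fix should cap concurrency to respect provider rate limits:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Hypothetical per-document inference call.
async function summarize(doc: string): Promise<string> {
  const msg = await client.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 256,
    messages: [{ role: "user", content: `Summarize: ${doc}` }],
  });
  const block = msg.content[0];
  return block.type === "text" ? block.text : "";
}

// Before: sequential loop, total latency ~ N x per-call latency.
async function summarizeSequential(docs: string[]): Promise<string[]> {
  const out: string[] = [];
  for (const doc of docs) {
    out.push(await summarize(doc));
  }
  return out;
}

// After: parallel fan-out, total latency ~ the slowest single call.
async function summarizeParallel(docs: string[]): Promise<string[]> {
  return Promise.all(docs.map((doc) => summarize(doc)));
}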

3. Catch Drift Before Production

Add to every PR:

- uses: kalmantic/peakinfer-action@v1
  with:
    path: ./src
    token: ${{ secrets.PEAKINFER_TOKEN }}

Installation

npm install -g @kalmantic/peakinfer

Requires Node.js 18+.


First-Time Setup

PeakInfer uses Claude for semantic analysis. You provide your own Anthropic API key (BYOK mode).

Step 1: Get an Anthropic API Key

  1. Go to console.anthropic.com
  2. Create an account or sign in
  3. Navigate to API Keys and create a new key
  4. Copy the key (starts with sk-ant-)

Step 2: Configure Your API Key

Option A: Environment File (Recommended)

# .env
ANTHROPIC_API_KEY=sk-ant-your-key-here

Option B: Shell Export

export ANTHROPIC_API_KEY=sk-ant-your-key-here

Step 3: Verify Setup

peakinfer analyze . --verbose

BYOK Mode: Your API key, your costs, full transparency. Analysis runs locally. No data sent to PeakInfer servers.


Commands

# Basic scan
peakinfer analyze .

# With code fix suggestions
peakinfer analyze . --fixes

# With HTML report
peakinfer analyze . --html --open

# Compare to InferenceMAX benchmarks
peakinfer analyze . --benchmark

# With runtime correlation (drift detection)
peakinfer analyze . --events production.jsonl

# Fetch runtime from observability platforms
peakinfer analyze . --runtime helicone --runtime-key $HELICONE_KEY

# Full analysis
peakinfer analyze . --fixes --benchmark --html --open

CLI Options

| Flag | Description |
|------|-------------|
| Output | |
| --fixes | Show code fix suggestions for each issue |
| --html | Generate HTML report |
| --pdf | Generate PDF report |
| --open | Auto-open report in browser/viewer |
| --output <format> | Output format: text, json, or inference-map |
| --verbose | Show detailed analysis logs |
| Runtime Data | |
| --events <file> | Path to runtime events file (JSONL) |
| --events-url <url> | URL to fetch runtime events |
| --runtime <source> | Fetch from: helicone, langsmith |
| --runtime-key <key> | API key for runtime source |
| --runtime-days <n> | Days of runtime data (default: 7) |
| Comparison | |
| --compare [runId] | Compare with previous analysis run |
| --benchmark | Compare to InferenceMAX benchmarks |
| --predict | Generate deploy-time latency predictions |
| --target-p95 <ms> | Target p95 latency for budget calculation |
| Cost Control | |
| --estimate | Show cost estimate before analysis |
| --yes | Auto-proceed without confirmation |
| --max-cost <dollars> | Skip if estimated cost exceeds threshold |
| --cached | View previous analysis (offline) |


GitHub Action

Every PR. Every merge. Automatic.

name: PeakInfer
on: [pull_request]

permissions:
  contents: read
  pull-requests: write

jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: kalmantic/peakinfer-action@v1
        with:
          token: ${{ secrets.PEAKINFER_TOKEN }}
          github-token: ${{ github.token }}

See peakinfer-action for full documentation.


Runtime Drift Detection

PeakInfer's real power: correlating code with runtime behavior.

# From file
peakinfer analyze ./src --events events.jsonl

# From Helicone
peakinfer analyze ./src --runtime helicone --runtime-key $HELICONE_KEY

# From LangSmith
peakinfer analyze ./src --runtime langsmith --runtime-key $LANGSMITH_KEY

Supported formats: JSONL, JSON, CSV, OpenTelemetry, Jaeger, Zipkin, LangSmith, LiteLLM, Helicone.
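
For the JSONL case, each line is one JSON object describing a single LLM request. The field names below are illustrative only, not PeakInfer's actual event schema; check the tool's documentation for the exact format it expects:

{"timestamp": "2025-06-01T12:00:00Z", "provider": "openai", "model": "gpt-4o", "latency_ms": 2400, "streamed": false, "input_tokens": 1800, "output_tokens": 250, "status": "ok"}
{"timestamp": "2025-06-01T12:00:03Z", "provider": "openai", "model": "gpt-4o", "latency_ms": 2150, "streamed": false, "input_tokens": 1650, "output_tokens": 310, "status": "error"}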


Supported Providers

| Provider | Status |
|----------|--------|
| OpenAI | Full support |
| Anthropic | Full support |
| Azure OpenAI | Full support |
| AWS Bedrock | Full support |
| Google Vertex | Full support |
| vLLM / TensorRT-LLM | HTTP detection |
| LangChain / LlamaIndex | Framework support |


Community Templates

43 templates across two categories:

Insight Templates (12)

Detect issues: streaming drift, overpowered model, context accumulation, token underutilization, retry explosion, untested fallback, dead code, and more.

Optimization Templates (31)

Actionable fixes: model routing, batch utilization, prompt caching, vLLM high-throughput, GPTQ quantization, TensorRT-LLM, multi-provider fallback, auto-scaling, and more.
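
As a point of reference, the multi-provider fallback pattern those templates target looks roughly like this. A TypeScript sketch assuming the OpenAI and Anthropic Node SDKs; model names and error handling are deliberately minimal:

import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

const openai = new OpenAI();
const anthropic = new Anthropic();

// Try the primary provider; on failure, route the same prompt to a secondary one.
async function completeWithFallback(prompt: string): Promise<string> {
  try {
    const res = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    });
    return res.choices[0]?.message?.content ?? "";
  } catch {
    // This branch is the "untested fallback" the insight templates look for:
    // it only runs when the primary call throws.
    const msg = await anthropic.messages.create({
      model: "claude-3-5-sonnet-latest",
      max_tokens: 512,
      messages: [{ role: "user", content: prompt }],
    });
    const block = msg.content[0];
    return block.type === "text" ? block.text : "";
  }
}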


Pricing

CLI: Free forever. BYOK — you provide your Anthropic API key.

GitHub Action:

  • Free: 50 credits one-time (6-month expiry)
  • Starter: $19 for 200 credits
  • Growth: $49 for 600 credits
  • Scale: $149 for 2,000 credits
  • Mega: $499 for 10,000 credits

No subscriptions. No per-seat pricing. Team pooling.

View pricing →


What's Included

| Feature | Status |
|---------|--------|
| Unified Prompt-Based Analysis | ✅ |
| GitHub Action with PR Comments | ✅ |
| Code Fix Suggestions | ✅ |
| Runtime Drift Detection | ✅ |
| InferenceMAX Benchmark Comparison | ✅ |
| 43 Optimization Templates | ✅ |
| Run History & Comparison | ✅ |
| BYOK Mode (CLI) | ✅ |


Built by Kalmantic. Apache-2.0 license.