PlayingPack
Chrome DevTools for AI Agents — A local reverse proxy and debugger for intercepting, recording, and replaying LLM API calls.
Point your AI agent at PlayingPack instead of your LLM provider, and get a real-time dashboard to watch requests, pause on tool calls, inject mock responses, and replay cached responses with zero latency and zero cost.
Works with any OpenAI API-compatible provider: OpenAI, Ollama, Azure OpenAI, LiteLLM, vLLM, and more.
Why PlayingPack?
Building AI agents is painful:
- Expensive iteration — Every test run burns API credits. Debugging a single edge case can cost dollars.
- Non-deterministic behavior — LLMs return different responses each time, making tests flaky and debugging a guessing game.
- Blind debugging — You can't see what tool calls the agent made or why it chose a particular action.
- Slow feedback loops — Waiting seconds for API responses on every iteration kills productivity.
- CI/CD nightmares — You can't run reliable automated tests against a non-deterministic, rate-limited API.
PlayingPack solves these problems:
| Problem | Solution |
|---------|----------|
| Expensive iteration | Cache Mode — Record once, replay forever. Zero API costs after first run. |
| Non-deterministic tests | Cache playback — Same request always returns same response. Deterministic by design. |
| Blind debugging | Intervene Mode — Pause before and after LLM calls. Inspect, edit, or mock at any point. |
| Slow feedback | Instant replay — Cached responses return in milliseconds, not seconds. |
| CI/CD reliability | Read-only cache — Run tests against cached responses. Fast, free, deterministic. |
Features
Cache Mode
Record API responses and replay them deterministically. First request hits the real API and saves the response to cache. Subsequent identical requests replay from cache with original timing preserved.
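For example, two identical requests sent through the proxy behave like this. This is a minimal sketch using the openai SDK, assuming PlayingPack is running on the default port in the default read-write cache mode; the model and prompt are placeholders.

```typescript
import OpenAI from 'openai';

// Point the SDK at PlayingPack instead of the provider (default port 4747).
const client = new OpenAI({
  baseURL: 'http://localhost:4747/v1',
  apiKey: process.env.OPENAI_API_KEY,
});

const ask = () =>
  client.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Summarize SSE in one sentence.' }],
  });

async function main() {
  // First call: forwarded upstream and recorded to .playingpack/cache/.
  const first = await ask();

  // Identical second call: replayed from the cache, no upstream request.
  const second = await ask();

  // Deterministic by design: the replayed content matches the recording.
  console.log(first.choices[0].message.content === second.choices[0].message.content);
}

main();
```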
Real-time Dashboard
Browser-based UI showing live request streaming, status updates, request/response inspection with syntax highlighting, and full history.
Intervene Mode
Pause requests at two points in the lifecycle:
- Before LLM call — Inspect the request, edit it, use a cached response, or mock without calling the LLM
- After LLM response — Inspect the response before it reaches your agent, modify or mock as needed
Full control over request/response flow with the ability to inject mock responses at any point.
SSE Streaming
Full OpenAI-compatible streaming with proper chunk handling. Parses tool calls in real-time. Works exactly like the real API.
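For example, a streaming request through the proxy uses the same client code you would use against the provider directly. A minimal sketch with the openai SDK; the model and prompt are placeholders.

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:4747/v1', // PlayingPack instead of the provider
  apiKey: process.env.OPENAI_API_KEY,
});

async function main() {
  // PlayingPack forwards the SSE stream chunk by chunk, so the usual
  // streaming consumption pattern is unchanged.
  const stream = await client.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Write a haiku about proxies.' }],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}

main();
```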
Multi-Provider Support
Drop-in replacement for any OpenAI API-compatible endpoint:
- OpenAI
- Ollama (local LLMs)
- Azure OpenAI
- LiteLLM
- vLLM
- Any compatible endpoint
Requirements
- Node.js 20+
Installation
# npm
npm install -g playingpack
# pnpm
pnpm add -g playingpack
# yarn
yarn global add playingpack
# Or run directly with npx (no install)
npx playingpack start
Quick Start
1. Start the proxy
npx playingpack start
2. Point your agent at PlayingPack
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:4747/v1",
api_key="your-api-key" # Still needed for upstream
)
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Hello!"}]
)
TypeScript/JavaScript
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'http://localhost:4747/v1',
apiKey: process.env.OPENAI_API_KEY,
});
const response = await client.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: 'Hello!' }],
});
cURL
curl http://localhost:4747/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
3. Open the dashboard
Navigate to http://localhost:4747 in your browser.
Use Cases
Deterministic Testing
Record your agent's API interactions once, then replay them in tests forever:
# First run: records responses to .playingpack/cache/
npx playingpack start &
pytest tests/
# Subsequent runs: replays from cache (instant, free)
npx playingpack start --cache read &
pytest tests/
Your tests become (see the example test after this list):
- Fast — Milliseconds instead of seconds per request
- Free — Zero API costs after initial recording
- Deterministic — Same input always produces same output
- Offline-capable — No network required
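A test that runs through the proxy needs nothing beyond the changed base URL. Below is a minimal sketch using Node's built-in test runner; the test name and assertion are illustrative, and a real API key is only required when a request actually reaches the upstream API (i.e., the initial recording run).

```typescript
import { test } from 'node:test';
import assert from 'node:assert/strict';
import OpenAI from 'openai';

// Target PlayingPack; the key is forwarded upstream on recording runs.
const client = new OpenAI({
  baseURL: 'http://localhost:4747/v1',
  apiKey: process.env.OPENAI_API_KEY ?? 'placeholder-for-cached-runs',
});

test('agent greets the user', async () => {
  const response = await client.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }],
  });

  // With cached playback this sees the exact recorded response every run,
  // so the assertion never flakes.
  assert.ok(response.choices[0].message.content);
});
```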
Debugging Agent Behavior
Enable intervene mode to pause requests and inspect what your agent is doing:
1. Start PlayingPack (intervention mode is enabled by default)
2. Run your agent
3. At Point 1 (before LLM call), choose:
   - Allow — Send the original request to the LLM
   - Use Cache — Replay from cache if available (saves API costs)
   - Mock — Return a mock response without calling the LLM
4. At Point 2 (after LLM response), choose:
   - Return — Send the response to your agent as-is
   - Modify — Edit the response before sending to your agent
CI/CD Integration
Run your test suite against cached responses in CI:
# In your CI pipeline
npx playingpack start --no-ui --no-intervene --cache read &
sleep 2 # Wait for server
npm test
If a cached response is missing, the request fails immediately — no surprise API calls in CI.
# Example GitHub Actions step
- name: Run tests with PlayingPack
  run: |
    npx playingpack start --no-ui --no-intervene --cache read &
    sleep 2
    npm test
Local Development with Ollama
Proxy to a local LLM for free, fast development:
# Start Ollama
ollama serve
# Point PlayingPack at Ollama
npx playingpack start --upstream http://localhost:11434/v1
Now your agent talks to your local LLM through PlayingPack, and you still get recording, replay, and debugging.
Cost Reduction
During development, avoid burning through API credits:
- Record a representative set of interactions
- Iterate on your agent logic using cached responses
- Only hit the real API when you need fresh recordings
Typical savings: 90%+ reduction in API costs during development.
Configuration
Create playingpack.config.ts (or .js, .mjs) in your project root:
import { defineConfig } from 'playingpack';
export default defineConfig({
// Upstream API endpoint (default: https://api.openai.com)
upstream: process.env.LLM_API_URL ?? 'https://api.openai.com',
// Cache mode: 'off' | 'read' | 'read-write' (default: read-write)
// - off: Always hit upstream, never cache
// - read: Only read from cache, fail if missing
// - read-write: Read from cache if available, write new responses
cache: process.env.CI ? 'read' : 'read-write',
// Intervene mode: pause for human inspection (default: true)
intervene: true,
// Directory for cache storage (default: .playingpack/cache)
cachePath: '.playingpack/cache',
// Directory for logs (default: .playingpack/logs)
logPath: '.playingpack/logs',
// Server port (default: 4747)
port: 4747,
// Server host (default: 0.0.0.0)
host: '0.0.0.0',
// Run without UI in CI environments (default: false)
headless: !!process.env.CI,
});
Environment-Aware Configuration
Using a JS/TS config file allows dynamic configuration based on environment:
import { defineConfig } from 'playingpack';
export default defineConfig({
// Use different upstream for local vs CI
upstream: process.env.CI
? 'https://api.openai.com'
: 'http://localhost:11434/v1',
// CI: read-only (fast, deterministic), Local: read-write (record on miss)
cache: process.env.CI ? 'read' : 'read-write',
// No UI needed in CI
headless: !!process.env.CI,
});
Supported Config Files
Config files are loaded in this order (first found wins):
- playingpack.config.ts (recommended)
- playingpack.config.mts
- playingpack.config.js
- playingpack.config.mjs
- playingpack.config.jsonc (legacy)
- playingpack.config.json (legacy)
CLI flags override config file values.
CLI Reference
npx playingpack start [options]

| Option | Description | Default |
|--------|-------------|---------|
| -p, --port <port> | Port to listen on | 4747 |
| -h, --host <host> | Host to bind to | 0.0.0.0 |
| --no-ui | Run without UI (headless mode) | false |
| --upstream <url> | Upstream API URL | https://api.openai.com |
| --cache-path <path> | Directory for cache storage | .playingpack/cache |
| --cache <mode> | Cache mode (off, read, read-write) | read-write |
| --no-intervene | Disable human intervention mode | false |
Examples
# Proxy to a local LLM (Ollama)
npx playingpack start --upstream http://localhost:11434/v1
# CI mode: read-only cache, no UI, no intervention
npx playingpack start --no-ui --no-intervene --cache read
# Custom port and cache directory
npx playingpack start --port 8080 --cache-path ./test/fixtures/cache
# Disable intervention mode for CI/CD
npx playingpack start --no-intervene
How It Works
Architecture
Your Agent → PlayingPack (localhost:4747) → Upstream API
                          ↓
                    Dashboard UI
- View requests in real-time
- Pause & inspect tool calls
- Mock responses
- Replay from cache
Request Flow
1. Request arrives at POST /v1/chat/completions
2. Cache lookup — Request body is normalized and hashed (SHA-256)
3. Intervention Point 1 — If intervene is enabled, wait for user action (allow/cache/mock)
4. Get response — From cache (if available) or upstream LLM
5. Intervention Point 2 — If intervene is enabled, wait for user action (return/modify)
6. Response complete — Save to cache (if enabled), notify dashboard
Simple Mental Model
┌─────────────────────────────────────────────────────────────────────┐
│ PlayingPack │
│ │
│ Cache: System remembers responses (read/write/off) │
│ Intervene: Human can inspect/modify at two points │
│ │
│ Request → [Point 1: Before LLM] → Response → [Point 2: After] → │
│ │
└─────────────────────────────────────────────────────────────────────┘
Cache Format
Cached responses are stored as JSON files named by request hash:
{
"meta": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"hash": "a1b2c3d4e5f6...",
"timestamp": "2025-01-13T10:30:00.000Z",
"model": "gpt-4",
"endpoint": "/v1/chat/completions"
},
"request": {
"body": { "model": "gpt-4", "messages": [...] }
},
"response": {
"status": 200,
"chunks": [
{ "c": "data: {\"id\":\"chatcmpl-...\"}\n\n", "d": 50 },
{ "c": "data: {\"id\":\"chatcmpl-...\"}\n\n", "d": 30 },
{ "c": "data: [DONE]\n\n", "d": 10 }
]
}
}
- c = chunk content (SSE data)
- d = delay in milliseconds since previous chunk
Request Hashing
Requests are normalized before hashing to ensure deterministic matching (a sketch of the idea follows this list):
- Keys are sorted alphabetically
- stream parameter is ignored (streaming and non-streaming match)
- Timestamps and request IDs are removed
- Result: SHA-256 hash used as cache filename
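The sketch below illustrates the idea with Node's crypto module. The exact normalization PlayingPack applies may differ in detail; treat this as an approximation of the rules listed above, not the actual implementation.

```typescript
import { createHash } from 'node:crypto';

// Recursively sort object keys so key order never affects the hash.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([k, v]) => [k, sortKeys(v)]),
    );
  }
  return value;
}

// Approximate cache key: drop the stream flag (the real normalization also
// strips timestamps and request IDs), sort keys, then hash the result.
function cacheKey(body: Record<string, unknown>): string {
  const rest = { ...body };
  delete rest.stream;
  return createHash('sha256').update(JSON.stringify(sortKeys(rest))).digest('hex');
}

// Identical requests (modulo key order and the stream flag) share one hash,
// and therefore one cache file.
console.log(cacheKey({ model: 'gpt-4', messages: [{ role: 'user', content: 'Hi' }], stream: true }));
console.log(cacheKey({ stream: false, messages: [{ role: 'user', content: 'Hi' }], model: 'gpt-4' }));
```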
API Endpoints
| Endpoint | Description |
|----------|-------------|
| POST /v1/chat/completions | OpenAI-compatible chat endpoint (proxied) |
| GET /v1/* | Other OpenAI endpoints (passthrough) |
| GET /ws | WebSocket for real-time dashboard updates |
| ALL /api/trpc/* | TRPC API for dashboard |
| GET /health | Health check |
| GET / | Dashboard UI |
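For example, a CI script can poll the health check instead of sleeping for a fixed interval before running tests. This is a minimal sketch assuming the default port and that /health responds with a 2xx status once the proxy is ready.

```typescript
// Poll GET /health until the proxy is ready (Node 20+ has a global fetch).
async function waitForProxy(url = 'http://localhost:4747/health', attempts = 40): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(url);
      if (res.ok) return; // proxy is up
    } catch {
      // server not listening yet; fall through and retry
    }
    await new Promise((resolve) => setTimeout(resolve, 250));
  }
  throw new Error('PlayingPack did not become ready in time');
}

waitForProxy().then(() => console.log('proxy is ready'));
```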
Development
See CONTRIBUTING.md for full details.
# Clone and install
git clone https://github.com/geoptly/playingpack.git
cd playingpack
pnpm install
# Run in development mode (hot reload)
pnpm dev
# Run tests
pnpm test
# Type check
pnpm typecheck
# Lint and format
pnpm lint
pnpm format
# Build for production
pnpm run build:all
Project Structure
playingpack/
├── packages/
│ ├── shared/ # TypeScript types & Zod schemas
│ ├── cli/ # Fastify proxy server + CLI
│ │ ├── proxy/ # HTTP routing, upstream client, SSE parsing
│ │ ├── cache/ # Response caching & playback
│ │ ├── session/ # Session state management
│ │ ├── mock/ # Synthetic response generation
│ │ ├── trpc/ # API procedures
│ │ └── websocket/ # Real-time events
│ └── web/ # React dashboard
│ ├── components/
│ ├── stores/ # Zustand state
│       └── lib/          # TRPC & WebSocket clients
FAQ
Q: Does PlayingPack modify my requests?
A: No. Request bodies are forwarded upstream unchanged; the only addition is proxy headers used for debugging.
Q: Can I use this in production?
A: PlayingPack is designed for development and testing. For production, point your agents directly at your LLM provider.
Q: How do I update cached responses when my prompts change?
A: Delete the relevant files from .playingpack/cache/ and run your tests again. New responses will be cached automatically.
Q: Does it work with function calling / tool use?
A: Yes. PlayingPack fully supports OpenAI's function calling and tool use.
Q: Can I share cached responses with my team?
A: Yes. Commit your .playingpack/cache/ directory to version control. Everyone on the team gets the same deterministic behavior.
License
BUSL-1.1 (Business Source License)
Copyright 2025 Geoptly Intelligence Inc.
The Licensed Work is provided for non-production use. For production use, please contact us for a commercial license.
