kontex-proxy

v0.1.1

Published

2 months ago

Local HTTP proxy + dashboard for AI agent developers. Intercept, inspect, replay, and fork every LLM call — no cloud required.

pankaj.agrawalla

llm ai-agents openai anthropic proxy developer-tools debugging context-window replay multi-agent typescript sqlite

Kontex CLI

Local HTTP proxy + dashboard for AI agent developers.
One command intercepts every LLM API call, saves a full snapshot locally, and opens a "Control Room" dashboard — no cloud, no config, no data leaves your machine.

Why Kontex?

When you're building AI agents, you need to answer questions like:

Which LLM call caused the bad output?
What was the exact context when the agent went off-track?
Can I replay this run with a different response at step 3?

Kontex intercepts every call at the proxy layer, so you get full observability without touching your agent logic: the only change is pointing your base URL at localhost:8080.

What it does

Your agent  →  localhost:8080  →  OpenAI / Anthropic / Ollama / any LLM API
                    │
                    ├── Saves raw prompt + response to .kontex.db (SQLite)
                    ├── Optionally trims context (lossless, toggleable)
                    └── Serves dashboard at GET /
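
To make the forwarding step concrete, here is a minimal sketch of how an incoming request path might map onto the upstream URL. It assumes the proxy keeps the path unchanged and swaps only the origin, which is what the base URL examples in this README imply; `upstreamUrlFor` is a hypothetical helper, not part of Kontex.

```typescript
// Hypothetical helper: maps a path received by the proxy onto the upstream API.
// Assumes the path (and any query string) is forwarded unchanged.
function upstreamUrlFor(upstreamBase: string, incomingPath: string): string {
  // An absolute path is resolved against the base, so only the origin is swapped.
  return new URL(incomingPath, upstreamBase).toString();
}

const openAiTarget = upstreamUrlFor("https://api.openai.com", "/v1/chat/completions");
const ollamaTarget = upstreamUrlFor("http://localhost:11434", "/api/chat");

console.log(openAiTarget); // https://api.openai.com/v1/chat/completions
console.log(ollamaTarget); // http://localhost:11434/api/chat
```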

Key features

| Feature | Description |
|---|---|
| Proxy | Intercepts every POST /* call and forwards to your upstream LLM |
| Snapshots | Saves the full untrimmed prompt and response to SQLite, so nothing is lost |
| Context trimmer | Structurally lossless trimming applied before the upstream call; toggleable from the dashboard |
| Session grouping | Groups related agent runs into sessions via a request header |
| Multi-agent graph | Swim-lane view showing every agent's trajectory and cross-agent links |
| Live pause | Pause a request mid-flight, inspect it, then resume with edited messages |
| Fork & replay | Branch from any snapshot with a human-edited response; downstream calls replay deterministically |
| Branch chain | Create a new agent task from any snapshot, staying in the same session |

Requirements

Node.js 18+
npm 9+

Installation

Option A — global install (recommended)

npm install -g kontex-proxy
kontex start

Option B — clone and build

git clone https://github.com/pankaj-agrawalla/kontex-cli.git
cd kontex-cli
npm install
cd web && npm install && cd ..
npm run build

Configuration

Copy .env.example and edit as needed:

cp .env.example .env

# .env
KONTEX_PORT=8080           # Port for the proxy + dashboard (default: 8080)
UPSTREAM_URL=https://api.openai.com   # LLM API to forward requests to

To use with Ollama locally:

UPSTREAM_URL=http://localhost:11434

To use with Anthropic:

UPSTREAM_URL=https://api.anthropic.com
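
As a rough illustration of how these two variables could be resolved at startup, the sketch below reads them from an env-like object and validates them. `resolveConfig` and `KontexConfig` are hypothetical names, and the fallback to the OpenAI URL is an assumption taken from the sample .env above; only the 8080 default is documented.

```typescript
interface KontexConfig {
  port: number;
  upstreamUrl: string;
}

// Hypothetical resolver: reads the two documented variables from an env-like object.
function resolveConfig(env: Record<string, string | undefined>): KontexConfig {
  const port = Number(env.KONTEX_PORT ?? "8080"); // 8080 is the documented default
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid KONTEX_PORT: ${env.KONTEX_PORT}`);
  }
  // Assumption: fall back to the OpenAI URL used in the sample .env.
  const upstreamUrl = env.UPSTREAM_URL ?? "https://api.openai.com";
  new URL(upstreamUrl); // throws a TypeError on a malformed URL
  return { port, upstreamUrl };
}

const ollamaConfig = resolveConfig({ UPSTREAM_URL: "http://localhost:11434" });
console.log(ollamaConfig.port, ollamaConfig.upstreamUrl);
```

Passing `process.env` instead of a literal object would apply the same checks to a real environment.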

Usage

Start the server

kontex start

The browser opens automatically at http://localhost:8080.

To use a custom port:

kontex start --port 9000

Point your agents at Kontex

Change your agent's base URL from the LLM provider to the Kontex proxy:

http://localhost:8080

No other code changes are required. All requests are transparently proxied.

Example — OpenAI SDK:

import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "http://localhost:8080/v1",   // ← point at Kontex
})

Example — LangChain:

import { ChatOpenAI } from "@langchain/openai"

const llm = new ChatOpenAI({
  openAIApiKey: process.env.OPENAI_API_KEY,
  configuration: {
    baseURL: "http://localhost:8080/v1",  // ← point at Kontex
  },
})

Example — raw fetch:

await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json", "Authorization": `Bearer ${apiKey}` },
  body: JSON.stringify({ model: "gpt-4o", messages }),
})

Optional request headers

These headers unlock richer dashboard views. They are stripped before forwarding upstream — your LLM never sees them.

| Header | Purpose |
|---|---|
| X-Kontex-Task-Id | Groups snapshots into a named agent task (swim lane in the graph). Defaults to "default" if omitted. |
| X-Kontex-Session-Id | Groups all tasks from one run into a single session entry in the sidebar. |
| X-Kontex-Parent-Task-Id | Records a cross-agent link (draws an amber dashed edge). Send on the first turn only of a child agent. |
| X-Kontex-Fork-Id | Enables deterministic replay. Set to the task ID you forked from. |

Without any headers, everything still works — all snapshots land under the "default" task and appear in the dashboard.
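
The stripping behavior described above can be pictured with a small sketch. This is not Kontex's actual code: `splitKontexHeaders` is a hypothetical function that separates X-Kontex-* headers from the ones forwarded upstream, and it assumes every header with that prefix is removed.

```typescript
const KONTEX_PREFIX = "x-kontex-";

// Splits incoming headers into Kontex metadata and headers to forward upstream.
// Header names are compared case-insensitively, as HTTP requires.
function splitKontexHeaders(incoming: Record<string, string>) {
  const kontex: Record<string, string> = {};
  const forward: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase();
    if (lower.startsWith(KONTEX_PREFIX)) kontex[lower] = value;
    else forward[name] = value;
  }
  // The task ID falls back to "default", matching the documented behavior.
  const taskId = kontex["x-kontex-task-id"] ?? "default";
  return { taskId, kontex, forward };
}

const split = splitKontexHeaders({
  "Content-Type": "application/json",
  "X-Kontex-Session-Id": "run-2024-001",
});
console.log(split.taskId, Object.keys(split.forward));
```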

With headers (recommended for multi-agent workflows):

const headers = {
  "X-Kontex-Task-Id": "coder-agent",
  "X-Kontex-Session-Id": "run-2024-001",
  // first turn of a child agent only:
  "X-Kontex-Parent-Task-Id": "planner-agent",
}
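
If several agents share the same client code, a small helper can keep the headers consistent. The sketch below is one possible shape; `kontexHeaders` and its option names are hypothetical, and only the header names come from the table above.

```typescript
interface KontexHeaderOptions {
  taskId: string;
  sessionId?: string;
  parentTaskId?: string; // only meaningful on a child agent's first turn
  isFirstTurn?: boolean;
}

// Hypothetical helper that builds the optional X-Kontex-* headers for one request.
function kontexHeaders(opts: KontexHeaderOptions): Record<string, string> {
  const headers: Record<string, string> = { "X-Kontex-Task-Id": opts.taskId };
  if (opts.sessionId) headers["X-Kontex-Session-Id"] = opts.sessionId;
  // Send the parent link once, on the first turn, as the header table advises.
  if (opts.parentTaskId && opts.isFirstTurn) {
    headers["X-Kontex-Parent-Task-Id"] = opts.parentTaskId;
  }
  return headers;
}

const base = { taskId: "coder-agent", sessionId: "run-2024-001", parentTaskId: "planner-agent" };
const firstTurn = kontexHeaders({ ...base, isFirstTurn: true });
const laterTurn = kontexHeaders({ ...base, isFirstTurn: false });
console.log(firstTurn, laterTurn);
```

The result can be spread into the `headers` object of a fetch call alongside `Content-Type` and `Authorization`.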

The Dashboard

Open http://localhost:8080 in your browser.

Sidebar (left)

Lists all sessions ordered newest-first
Each entry shows the session ID, timestamp, agent count, and snapshot count
Click a session to load its graph
Context trimmer toggle at the bottom — turn trimming on or off in real time

Graph (center)

One swim-lane column per agent task
Nodes = individual LLM calls (snapshots)
Gray edges = within the same agent
Amber dashed animated edges = cross-agent links (parent → child)
Amber-bordered nodes = human-edited snapshots
Click any node to open the snapshot drawer

Snapshot drawer (right)

Opens when you click a node. Shows:

The full conversation messages sent to the LLM
Live Pause — pauses the next request from this task mid-flight so you can inspect and edit messages before they reach the LLM
Fork & Edit — save a human-edited version of the messages; the next replay of this prompt hash will return your edited version instead of calling the LLM
Branch chain here — create a new agent task (in the same session) branching from this point, with an editable LLM response
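
The Fork & Edit replay described above can be pictured as a lookup keyed by prompt hash. The sketch below is an illustration under assumptions, not Kontex's implementation: it hashes the JSON-serialized messages array with MD5 and keeps edited payloads in an in-memory map, whereas the real store is SQLite. The names `promptHash`, `saveFork`, and `replayLookup` are hypothetical.

```typescript
import { createHash } from "node:crypto";

type Message = { role: string; content: string };

// Assumption: the hash covers the JSON-serialized messages array.
function promptHash(messages: Message[]): string {
  return createHash("md5").update(JSON.stringify(messages)).digest("hex");
}

// In-memory stand-in for the snapshot store: prompt hash -> human-edited payload.
const editedPayloads = new Map<string, string>();

function saveFork(messages: Message[], edited: string): void {
  editedPayloads.set(promptHash(messages), edited);
}

// Returns the edited payload when a fork exists; otherwise undefined,
// meaning the request should go to the upstream LLM.
function replayLookup(messages: Message[]): string | undefined {
  return editedPayloads.get(promptHash(messages));
}

const step3: Message[] = [{ role: "user", content: "Plan the migration" }];
saveFork(step3, "Edited plan: migrate in two phases");

const replayed = replayLookup(step3);
const missed = replayLookup([{ role: "user", content: "Something else" }]);
console.log(replayed, missed);
```

Because the key is derived from the messages alone, any change to an earlier turn produces a different hash and falls through to the upstream call.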

Context trimmer

The trimmer applies three structurally lossless passes before forwarding to the upstream LLM:

Tool result truncation — long tool/function responses are sliced to prevent runaway context growth
Middle-turn compression — older assistant turns in the middle of a long conversation are shortened
System prompt deduplication — repeated system content across turns is reduced

The raw untrimmed payload is always saved to the database — trimming only affects what is forwarded upstream.

Toggle it on/off live from the sidebar without restarting the server.
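
To give a feel for the first pass, here is a minimal sketch of tool result truncation. It is an illustration only: the `maxChars` limit, the marker text, and the assumption that tool results use the `tool` role are hypothetical, and the actual trimmer may work differently.

```typescript
type ChatMessage = { role: string; content: string };

// Illustrative pass: caps the length of tool results, leaving other roles untouched.
// maxChars and the marker text are hypothetical, not Kontex's actual values.
function truncateToolResults(messages: ChatMessage[], maxChars = 2000): ChatMessage[] {
  return messages.map((msg) => {
    if (msg.role !== "tool" || msg.content.length <= maxChars) return msg;
    const omitted = msg.content.length - maxChars;
    return { ...msg, content: `${msg.content.slice(0, maxChars)}\n[... ${omitted} chars trimmed]` };
  });
}

const original: ChatMessage[] = [
  { role: "user", content: "List the files" },
  { role: "tool", content: "x".repeat(5000) },
];
const forwarded = truncateToolResults(original, 100);

// The input array is not mutated, so the untrimmed payload can still be stored.
const originalLengths = original.map((m) => m.content.length);
const forwardedLengths = forwarded.map((m) => m.content.length);
console.log(originalLengths, forwardedLengths);
```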

Multi-agent workflow example

const SESSION_ID = `run-${Date.now()}`

// Agent 1 — Planner
const plannerResponse = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
    "X-Kontex-Task-Id": "planner",
    "X-Kontex-Session-Id": SESSION_ID,
  },
  body: JSON.stringify({ model: "gpt-4o", messages: plannerMessages }),
})

// Agent 2 — Coder (links back to planner)
const coderResponse = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
    "X-Kontex-Task-Id": "coder",
    "X-Kontex-Session-Id": SESSION_ID,
    "X-Kontex-Parent-Task-Id": "planner",   // ← first turn only
  },
  body: JSON.stringify({ model: "gpt-4o", messages: coderMessages }),
})

This produces a dashboard with two swim lanes and an amber edge from Planner → Coder, grouped under one session.

Database

All data is stored in .kontex.db (SQLite) in the project root. The file is created automatically on first run.

To start completely fresh:

rm .kontex.db
kontex start

Schema

CREATE TABLE Snapshots (
  id                 TEXT PRIMARY KEY,   -- cuid
  task_id            TEXT NOT NULL,      -- from X-Kontex-Task-Id header
  parent_id          TEXT,               -- previous snapshot in the same task
  parent_task_id     TEXT,               -- from X-Kontex-Parent-Task-Id header
  session_id         TEXT,               -- from X-Kontex-Session-Id header
  prompt_hash        TEXT NOT NULL,      -- MD5 of messages array (for replay lookup)
  raw_prompt_payload TEXT NOT NULL,      -- original untrimmed JSON body
  llm_response       TEXT,              -- raw response from upstream
  is_human_edited    INTEGER DEFAULT 0, -- 1 if created via fork
  created_at         INTEGER NOT NULL   -- Unix ms
);
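
When reading the database from your own tooling, a typed row and a helper for following `parent_id` links can be handy. The sketch below mirrors the columns of the Snapshots table; the `SnapshotRow` interface and the `chainTo` function are hypothetical and operate on rows you have already loaded.

```typescript
// Row shape mirroring the Snapshots table above.
interface SnapshotRow {
  id: string;
  task_id: string;
  parent_id: string | null;
  parent_task_id: string | null;
  session_id: string | null;
  prompt_hash: string;
  raw_prompt_payload: string;
  llm_response: string | null;
  is_human_edited: 0 | 1;
  created_at: number;
}

// Walks parent_id links from a snapshot back to the first call in its task,
// returning the chain oldest-first. Stops on a missing row or a cycle.
function chainTo(rows: SnapshotRow[], id: string): SnapshotRow[] {
  const byId = new Map<string, SnapshotRow>();
  for (const row of rows) byId.set(row.id, row);
  const chain: SnapshotRow[] = [];
  const seen = new Set<string>();
  let current = byId.get(id);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.unshift(current);
    current = current.parent_id ? byId.get(current.parent_id) : undefined;
  }
  return chain;
}

// Sample rows with placeholder values for the columns the walk does not use.
const row = (id: string, parent_id: string | null): SnapshotRow => ({
  id, task_id: "planner", parent_id, parent_task_id: null, session_id: "run-1",
  prompt_hash: "", raw_prompt_payload: "{}", llm_response: null,
  is_human_edited: 0, created_at: 0,
});

const chainIds = chainTo([row("a", null), row("b", "a"), row("c", "b")], "c").map((r) => r.id);
console.log(chainIds); // [ 'a', 'b', 'c' ]
```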

Internal API

These endpoints power the dashboard. You can also call them directly.

| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /api/sessions | List all sessions |
| GET | /api/tasks | List all task IDs |
| GET | /api/graph?session=<id> | Combined graph (nodes + edges) for a session |
| GET | /api/tasks/:id/graph | Graph for a single task |
| GET | /api/snapshots/:id | Full snapshot detail |
| POST | /api/snapshots/:id/pause | Pause the next request on this snapshot |
| POST | /api/snapshots/:id/resolve | Resume a paused request with edited messages |
| POST | /api/snapshots/:id/fork | Create a human-edited snapshot (same task) |
| POST | /api/snapshots/:id/fork-chain | Create a new task branching from this snapshot |
| GET | /api/trimmer | Get trimmer state { enabled: boolean } |
| POST | /api/trimmer/toggle | Toggle trimmer on/off |
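
For scripting against these endpoints, a thin client along the following lines may be enough. It is a sketch under assumptions: response bodies are typed as `unknown` because only the trimmer state shape is documented, and the pause, resolve, and fork endpoints are left out because their request bodies are not described here. `createKontexClient` and `FetchLike` are hypothetical names; the fetch function is injected so the example can run against a stub.

```typescript
// Minimal fetch signature so a stub can be injected in place of the real fetch.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function createKontexClient(fetchImpl: FetchLike, baseUrl = "http://localhost:8080") {
  async function request(method: "GET" | "POST", path: string): Promise<unknown> {
    const res = await fetchImpl(`${baseUrl}${path}`, { method });
    if (!res.ok) throw new Error(`${method} ${path} failed with status ${res.status}`);
    return res.json();
  }
  return {
    health: () => request("GET", "/health"),
    sessions: () => request("GET", "/api/sessions"),
    tasks: () => request("GET", "/api/tasks"),
    graph: (sessionId: string) =>
      request("GET", `/api/graph?session=${encodeURIComponent(sessionId)}`),
    snapshot: (id: string) => request("GET", `/api/snapshots/${encodeURIComponent(id)}`),
    trimmerState: () => request("GET", "/api/trimmer"),
    toggleTrimmer: () => request("POST", "/api/trimmer/toggle"),
  };
}

// Stub that records each request instead of touching the network.
const requested: string[] = [];
const stubFetch: FetchLike = async (url, init) => {
  requested.push(`${init?.method ?? "GET"} ${url}`);
  return { ok: true, status: 200, json: async () => ({ enabled: true }) };
};

const client = createKontexClient(stubFetch);
void client.graph("run 2024/001");
void client.toggleTrimmer();
console.log(requested);
```

On Node.js 18+, passing the global `fetch` as `fetchImpl` should work against a running Kontex instance.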

Development

Run the backend and frontend separately with hot reload:

# Terminal 1 — backend
npm run dev

# Terminal 2 — frontend
cd web && npm run dev

The Vite dev server runs on port 5173 and proxies /api to localhost:8080.

E2E test

Requires Ollama running locally with llama3.2:1b:

ollama pull llama3.2:1b
npm run build
npm run e2e

The test simulates a 3-agent pipeline (Planner → Coder → Reviewer) and verifies snapshots, cross-agent edges, session grouping, fork/replay, and edge cases. It exits with code 0 when every check passes.

Contributing

Issues and PRs are welcome. Please open an issue first for significant changes.

License

MIT — see LICENSE.