stitcher-proxy · v1.0.0
# 🧵 Stitcher Proxy

> LLMs have amnesia. Stitcher is the cure.

A transparent proxy that gives any LLM infinite memory. Zero dependencies. One command.
## Install

```shell
npm install -g stitcher-proxy
```

Or run without installing:

```shell
npx stitcher-proxy
```

Or as a one-liner:

```shell
curl -fsSL https://raw.githubusercontent.com/Djsand/stitcher-proxy/main/install.sh | bash
```

## How It Works
You point your LLM client at Stitcher instead of OpenAI/Anthropic directly. Stitcher intercepts every request, stitches in the full conversation history from local storage, and forwards it upstream. Your LLM gets maximum context every time. Transparently.
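Concretely, the transformation looks like this (illustrative request shapes, not captured traffic — the stored history shown here is hypothetical):

```python
# What your app sends to Stitcher: only the new message.
incoming = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "And then?"}],
}

# What Stitcher forwards upstream: stored history prepended,
# with the new message kept last in chronological order.
forwarded = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Tell me about JSONL."},
        {"role": "assistant", "content": "JSONL stores one JSON object per line."},
        {"role": "user", "content": "And then?"},
    ],
}
```

The response comes back unchanged, so your app never needs to know the proxy is there.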
```
┌─────────┐      POST /v1/chat/completions     ┌──────────────────┐
│  Your   │ ─────────────────────────────────▶ │  Stitcher Proxy  │
│  App    │        (only new messages)         │                  │
└─────────┘                                    │ 1. Save to JSONL │
     ▲                                         │ 2. Stitch history│
     │                                         │ 3. Dedup         │
     │        Response (unchanged)             │ 4. Token budget  │
     └──────────────────────────────────────── │ 5. Forward       │
                                               └────────┬─────────┘
                                                        │
                                                        ▼
                                               ┌──────────────────┐
                                               │ OpenAI/Anthropic │
                                               │  (full context)  │
                                               └──────────────────┘
```

## Quick Start
```shell
# Setup (pick provider, set API key, configure token budget)
stitcher-proxy init

# Start the proxy
stitcher-proxy

# Auto-configure Claude Code, Codex, and all OpenAI clients
stitcher-proxy integrate all
```

## Usage
Change your `base_url`. That's it.
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8081/v1",
    api_key="your-real-key",
    default_headers={"X-Stitcher-Session": "user-123"},
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What did we talk about yesterday?"}],
)
# Stitcher injected the full history. The model remembers.
```

Or with curl:

```shell
curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "X-Stitcher-Session: my-session" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Continue where we left off."}]}'
```

## Works With
Claude Code · Codex · Cursor · OpenClaw · LangChain · Vercel AI SDK · Ollama · vLLM · any OpenAI-compatible client
```shell
stitcher-proxy integrate all          # Configure everything
stitcher-proxy integrate claude-code  # Just Claude Code
stitcher-proxy integrate codex        # Just Codex
```

## CLI Reference
```
stitcher-proxy                          Start the proxy
stitcher-proxy init                     Interactive setup wizard
stitcher-proxy start [--port N]         Start with options
stitcher-proxy status                   Config + session count
stitcher-proxy sessions                 List sessions
stitcher-proxy sessions purge <name>    Delete a session
stitcher-proxy config                   Show all settings
stitcher-proxy config edit              Interactive config editor
stitcher-proxy config set <key> <val>   Quick-set a value
stitcher-proxy integrate [target]       Auto-configure integrations
```

## Configuration
All settings are tunable via `stitcher-proxy config edit`:
| Setting | Default | Description |
|---------|---------|-------------|
| port | 8081 | Proxy port |
| upstream_url | https://api.openai.com | Upstream LLM API |
| max_tokens | 128000 | Token budget for stitched context |
| dedup_threshold | 0.6 | Similarity cutoff for dedup (0-1) |
| condense_threshold | 0.35 | Similarity cutoff for condensing |
| chars_per_token | 4 | Token estimation ratio |
| roll_size_bytes | 5242880 | File roll threshold (5MB) |
Config priority: CLI flags → env vars → `~/.stitcher/config.json` → defaults
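A resulting `~/.stitcher/config.json` might look like this (keys inferred from the settings table above; the actual file layout may differ):

```json
{
  "port": 8081,
  "upstream_url": "https://api.openai.com",
  "max_tokens": 128000,
  "dedup_threshold": 0.6,
  "condense_threshold": 0.35,
  "chars_per_token": 4,
  "roll_size_bytes": 5242880
}
```

Note that with `chars_per_token` at 4, a 128,000-token budget corresponds to roughly 512,000 characters of stitched history.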
## Under The Hood
Stitcher stores every message as a line in JSONL files. When context is needed:
- Read — scan backward through archived files (newest → oldest)
- Dedup — drop near-identical assistant messages via trigram similarity
- Budget — stop when token limit is reached
- Condense — replace older similar messages with placeholders
- Reverse — restore chronological order
- Forward — send the full context to upstream
Sessions auto-roll to numbered archives when files exceed the configured size.
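The read → dedup → budget → reverse steps can be sketched in Python. This is a simplified illustration, not Stitcher's actual code: it uses trigram Jaccard similarity, estimates tokens as characters divided by `chars_per_token`, and omits the condense step.

```python
def trigrams(text: str) -> set:
    """Set of 3-character substrings, lowercased."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity over trigram sets, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def stitch(history, budget_tokens=128000, dedup_threshold=0.6, chars_per_token=4):
    """Walk history newest → oldest, dropping near-duplicate assistant
    messages, stopping at the token budget, then restoring order."""
    kept, spent, assistant_seen = [], 0, []
    for msg in reversed(history):  # newest first
        if msg["role"] == "assistant":
            if any(similarity(msg["content"], s) >= dedup_threshold
                   for s in assistant_seen):
                continue  # near-duplicate of something already kept
            assistant_seen.append(msg["content"])
        cost = len(msg["content"]) // chars_per_token
        if spent + cost > budget_tokens:
            break  # budget exhausted
        kept.append(msg)
        spent += cost
    kept.reverse()  # restore chronological order
    return kept
```

For example, two assistant replies that differ only in punctuation score well above 0.6 and collapse to one, while distinct messages survive.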
## License
MIT
