@jc4649/pify
v0.1.0
Lightweight unix-style local proxy that hooks the calls between closed-source AI harnesses (Claude Code, Codex, Cursor) and their LLM providers.
pify
A unix-style wire tap for closed-source AI harnesses.
pify is a tiny local proxy that sits on the one seam every harness exposes — the HTTPS call to its LLM provider — and hands that call to you. Inject a system prompt, reroute to a cheaper model, redact a tool result, retry an overloaded provider, mutate a tool call before it runs, or pipe the whole exchange to your own script in any language. Then forward it on.
Why it exists. Harnesses like Claude Code, Codex, and Cursor are closed — you can't load an extension into them. But you can point them at a different base URL. pify is that base URL. It gives you the customization an in-process plugin would, for the slice that crosses the wire: context, routing, resilience, tool I/O, and observability — without a CA, a certificate, or a single change to the harness beyond one env var.
What it is not. Not a framework, not a plugin host, not a transcoder. It's one small tool that does one thing — let your code see and reshape the request and the response — and composes with everything else (point it at LiteLLM if you need cross-provider translation). Two files, zero dependencies, ~750 lines.
The contract throughout: a little declarative sugar for the common cases, and an envelope hook (your script, or an HTTP service) as the escape hatch for everything else. If a hook breaks, pify forwards the call unchanged — fail-open, never a man-in-the-middle outage.
Install
npm i -g @jc4649/pify # or just: npx @jc4649/pify
Zero runtime dependencies. Needs Node ≥ 18.
Use
One pify instance is one pipe: one port → one default target → one hook set.
pify --target https://api.anthropic.com --port 8787
Point a harness at it (no harness change beyond one env var):
ANTHROPIC_BASE_URL=http://localhost:8787 # Claude Code
OPENAI_BASE_URL=http://localhost:8787 # Codex / Cursor / most
The harness sends its own API key through pify; pify forwards it untouched.
Every request/response is logged to ~/.pify/logs/pify-YYYY-MM-DD.log.
Note: those logs are local and contain full prompt and response bodies — which may include source code, secrets, or API keys. They're never sent anywhere, but treat the log dir like any other sensitive local data.
Multiple harnesses? Run multiple instances — the port is the caller's identity. Want different hooks for Cursor vs Claude Code? Two instances, two configs, two ports.
pify --target https://api.anthropic.com --port 8787 --config work.json
pify --target http://localhost:11434 --port 8788 --config cheap.json
Configure
pify reads ./pify.json, then ~/.pify/pify.json, else built-in defaults.
{
  "port": 8787,
  "target": "https://api.anthropic.com", // default upstream; a hook can override per request
  "protocol": "anthropic", // optional pin; else auto-detected
  "logDir": "~/.pify/logs",
  "rules": [
    { "match": { "client": "claude-code" },
      "onRequest": [{ "injectSystem": "You write clean, typed code." }] },
    { "match": { "model": "gpt-4*" },
      "onRequest": [{ "setTarget": "http://localhost:11434" }, { "setModel": "qwen2.5-coder:7b" }] }
  ]
}
Match conditions
match holds selectors only: it decides which rule applies and is not a content query. Content decisions go in an exec/http hook.
- client: claude-code | cursor | codex | user-agent token
- protocol: anthropic | openai
- model: glob, e.g. gpt-4*
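To make the selector semantics concrete, here is a minimal Python sketch of how a rule's match could be evaluated. It is not pify's actual code. It assumes that every selector present must agree (AND), that an empty match accepts everything, that client and protocol have already been reduced to plain strings, and that the model glob is case-sensitive. `rule_matches` and the shape of `request` are hypothetical names used only for illustration.

```python
from fnmatch import fnmatchcase


def rule_matches(match: dict, request: dict) -> bool:
    """Return True when every selector present in `match` accepts the request.

    Assumption: selectors combine with AND, and a missing selector matches anything.
    """
    for key in ("client", "protocol"):
        if key in match and match[key] != request.get(key):
            return False
    if "model" in match:
        # Glob comparison, e.g. "gpt-4*" accepts "gpt-4o" and "gpt-4-turbo".
        if not fnmatchcase(str(request.get("model", "")), match["model"]):
            return False
    return True
```

Under these assumptions `{ "match": {} }` is a catch-all rule, which fits how the empty match is used in the envelope-hook example.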
The hooks
Most seams take only an envelope hook (your code). That's the deliberate unix-tool line: declarative sugar survives only where it earns its place (routing, denial, resilience, or where no subprocess can reach, like per-token stream edits). Content transforms are your script's job.
Each rule may carry any of these stages alongside its match:
| Stage | Declarative sugar | Envelope hook fires on |
|---|---|---|
| onRequest | injectSystem, setModel, setTarget, block | the request, before forward |
| onToolResult | — | tool-result blocks in the request, before the model sees them |
| onResponse | — | full (buffered) response |
| onToolCall | — | tool-call blocks in the response, before the harness executes them (buffers the stream) |
| onStreamChunk | replace: {from,to} | per SSE event, streaming through (the one declarative content op — no hook can run per token) |
| onError | retry, fallback (URL), response | upstream connection fails |
| onUpstreamError | retryOn: [429,529], max, backoffMs, fallback (URL), response | upstream returns 4xx/5xx (honors Retry-After) |
| onComplete | — | async, after the exchange — non-blocking tap of {request, response, status, timingMs} |
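The onUpstreamError sugar (retryOn, max, backoffMs, Retry-After) can be read as a small decision function. The Python sketch below is one plausible interpretation, not pify's implementation. It assumes the delay doubles per attempt (pify may use a fixed delay instead) and only handles the delta-seconds form of Retry-After, not the HTTP-date form. `should_retry` and `retry_delay_ms` are hypothetical names.

```python
from typing import Optional, Sequence


def should_retry(status: int, attempt: int, retry_on: Sequence[int], max_retries: int) -> bool:
    """Retry only listed statuses, and only while attempts remain (attempt is 0-based)."""
    return status in retry_on and attempt < max_retries


def retry_delay_ms(attempt: int, backoff_ms: int, retry_after: Optional[str] = None) -> int:
    """Pick the wait before the next attempt, preferring the server's Retry-After."""
    if retry_after is not None:
        try:
            # Delta-seconds form, e.g. "3". The HTTP-date form falls through.
            return max(0, int(float(retry_after) * 1000))
        except (ValueError, OverflowError):
            pass
    # Assumption: exponential backoff starting at backoff_ms.
    return backoff_ms * (2 ** attempt)
```

With `"retryOn": [429,529], "max": 4, "backoffMs": 500` and no Retry-After header, this sketch waits 500, 1000, 2000, then 4000 ms before giving up or moving to the fallback.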
setTarget is code-free routing; onError.fallback is a URL. Example:
{ "match": { "model": "gpt-4*" },
"onRequest": [{ "setTarget": "http://localhost:11434" }, { "setModel": "qwen2.5-coder:7b" }],
"onError": { "retry": 1, "fallback": "https://api.openai.com" } }Your own code — the envelope hook
When config can't express it, hand the whole request/response to your code and take it back.
pify pipes the envelope { target, method, url, headers, body } (request) or
{ status, headers, body } (response); you return the modified one. Two transports, same contract:
| Op | Transport | Use |
|---|---|---|
| { "exec": "./hook.py" } | subprocess (spawn per call) | local, zero-setup, any language |
| { "http": "http://localhost:9000/hook" } | POST to a long-lived service | warm state, shared, remote, real streaming |
Every seam takes an envelope hook:
- onRequest: envelope { target, method, url, headers, body }; return the modified one (or block).
- onResponse: envelope { status, headers, body }; return the modified one (bufferStream: true for SSE).
- onError: envelope { error, attempt, target, request } when the upstream connection fails; return { retry: "url" } to re-forward or { status, headers, body } to synthesize a reply. The retry/fallback/response ops are the no-code sugar on this seam.
- onUpstreamError: same envelope plus status when the upstream returns a 4xx/5xx (the connection was fine). retryOn/backoffMs/fallback are the sugar; the hook can { retry } or synthesize { status, headers, body }. Honors Retry-After automatically.
- onToolCall: a normalized view { protocol, calls: [{ name, input }] } of the tool calls in the response; return modified calls (pify writes them back into the anthropic/openai shape). Rewrite a bash command, normalize a path, neuter a destructive call before the harness runs it.
- onToolResult: a normalized view { protocol, results: [{ id, content }] } of the tool results in the request; return modified results. Trim bulky output, redact secrets, enrich errors before the model sees them.
- onComplete: fired async after the exchange with { client, protocol, status, timingMs, request, response }; return value ignored. A non-blocking tap for telemetry/cost/capture that never adds latency.
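As an illustration of the onToolResult seam, the following Python sketch trims bulky output and redacts key-like strings in the normalized view. It is a sketch under assumptions, not a shipped example. It assumes content is usually a string (other shapes pass through untouched) and that the hook returns the whole view with results replaced. `MAX_CHARS`, `SECRET`, and `transform_results` are hypothetical names, and the secret pattern is deliberately simplistic.

```python
import re

MAX_CHARS = 2000  # hypothetical cap on a single tool result
SECRET = re.compile(r"sk-[A-Za-z0-9_-]{16,}")  # hypothetical key-like pattern


def transform_results(view: dict) -> dict:
    """Redact and trim each tool result; return a new view, leaving the input untouched."""
    results = []
    for result in view.get("results", []):
        content = result.get("content")
        if isinstance(content, str):
            content = SECRET.sub("[REDACTED]", content)
            if len(content) > MAX_CHARS:
                dropped = len(content) - MAX_CHARS
                content = content[:MAX_CHARS] + f"\n[... {dropped} chars trimmed]"
        results.append({**result, "content": content})
    return {**view, "results": results}
```

Redacting before trimming means a secret that straddles the cut-off is still caught.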
{ "match": {},
  "onRequest": [{ "exec": "./examples/cheap-route.py" }], // route/rewrite/block in your code
  "onResponse": [{ "http": "http://localhost:9000/redact", "bufferStream": true }] }
Contract: a non-zero exit or non-2xx response blocks the call (stderr or the body is the message); empty output, a timeout, or bad JSON fails open (the call is forwarded unchanged). Routing is just envelope.target in the returned object.
bufferStream: true runs a response hook over a streamed reply by buffering it (trades streaming for full rewrite).
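A rough Python sketch of what an exec hook for onRequest might look like under this contract. It is not the shipped examples/cheap-route.py. It assumes the envelope arrives as JSON on stdin and the modified envelope is written as JSON to stdout (the README says only that pify "pipes" it). `route`, `run`, `LOCAL_TARGET`, and `SMALL_REQUEST_CHARS` are hypothetical names, and the size threshold is an arbitrary illustration.

```python
import json
from typing import IO

LOCAL_TARGET = "http://localhost:11434"  # the local upstream used in the README's examples
SMALL_REQUEST_CHARS = 4000  # hypothetical threshold for "small enough to go local"


def route(envelope: dict) -> dict:
    """Send small requests to the local target; leave everything else unchanged."""
    body = envelope.get("body")
    size = len(body) if isinstance(body, str) else len(json.dumps(body))
    if size < SMALL_REQUEST_CHARS:
        return {**envelope, "target": LOCAL_TARGET}
    return envelope


def run(stdin: IO[str], stdout: IO[str]) -> int:
    """Read one envelope, write the routed one, and return the process exit code."""
    try:
        envelope = json.load(stdin)
    except ValueError:
        return 0  # write nothing: empty output means pify forwards unchanged
    if not isinstance(envelope, dict):
        return 0
    json.dump(route(envelope), stdout)
    return 0
```

A real script would end with `sys.exit(run(sys.stdin, sys.stdout))`. To block a request instead, it would write a message to stderr and exit non-zero.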
Examples
Runnable configs and hook scripts live in examples/ (with a
--selftest on each hook and a full e2e). Two categories:
Common use cases
| Want to… | Hook | How |
|---|---|---|
| Inject a house style / system prompt | onRequest | { "injectSystem": "You write clean, typed code." } |
| Route a model to a cheaper / local provider | onRequest | { "setTarget": "http://localhost:11434" }, { "setModel": "qwen2.5-coder:7b" } |
| Route by content (your code decides) | onRequest | { "exec": "./examples/cheap-route.py" } |
| Redact secrets from responses | onResponse | { "exec": "node examples/redact-secrets.js" } |
| Retry an overloaded provider (429/529) | onUpstreamError | { "retryOn": [429,529], "max": 4, "backoffMs": 500, "fallback": "https://api.openai.com" } |
| Tap every exchange for cost/telemetry | onComplete | { "http": "http://localhost:9000/usage" } |
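For the cost/telemetry row, a service behind the onComplete hook might tally tokens per client. The Python sketch below shows only the accounting, not the HTTP server. It assumes the event's response.body is a single non-streamed JSON document, either parsed or still a string; SSE bodies simply count as zero. The usage field names follow the public Anthropic and OpenAI response formats, and `tokens_used` and `tally` are hypothetical names.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tokens_used(event: dict) -> Tuple[int, int]:
    """Return (input, output) tokens for one onComplete event, or (0, 0) if unknown."""
    body = (event.get("response") or {}).get("body")
    if isinstance(body, str):
        try:
            body = json.loads(body)
        except ValueError:
            body = None  # e.g. a raw SSE stream
    usage = body.get("usage") if isinstance(body, dict) else None
    if not isinstance(usage, dict):
        return (0, 0)
    if event.get("protocol") == "anthropic":
        return (usage.get("input_tokens", 0), usage.get("output_tokens", 0))
    return (usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0))


def tally(events: Iterable[dict]) -> Dict[str, Tuple[int, int]]:
    """Sum (input, output) tokens per client across events."""
    totals = defaultdict(lambda: [0, 0])
    for event in events:
        tokens_in, tokens_out = tokens_used(event)
        bucket = totals[event.get("client", "unknown")]
        bucket[0] += tokens_in
        bucket[1] += tokens_out
    return {client: (t[0], t[1]) for client, t in totals.items()}
```

Because onComplete ignores the return value, a tap like this can be slow or fail without affecting the harness.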
Extension use cases — port a pi extension onto a closed harness
A pi/pich extension whose job is a pure transform of the request or response
ports onto a closed-source harness through pify, with zero harness changes. The
worked example ports @jc4649/pi-context-collapse
from pich — it trims bulky tool output before
each LLM call — from an in-process pi.on("context") hook to a pify onToolResult
hook:
node bin/pify.js --config examples/context-collapse.json # run it
node examples/test-context-collapse.js # e2e proof through pify
Context engineering (collapse, memory injection, usage telemetry) ports this way;
extensions that register local tools, drive the agent loop, or paint UI do not —
those live inside the harness. See examples/README.md for
the full mapping.
Scope
Same-provider proxy + logging + the hooks above + envelope hooks (exec/http) ship today.
Cross-provider transcoding (Anthropic ⇄ OpenAI wire formats) and a TLS-MITM tier for
harnesses without a base-URL knob are designed but not built — see llm-hook-spec.md.
Develop
node test.js # pure-logic self-check (no network)