@jc4649/pify

v0.1.0


Lightweight unix-style local proxy that hooks the calls between closed-source AI harnesses (Claude Code, Codex, Cursor) and their LLM providers.


jimmyc4649

llm proxy claude-code codex cursor anthropic openai hooks agent harness

pify

A unix-style wire tap for closed-source AI harnesses.

pify is a tiny local proxy that sits on the one seam every harness exposes — the HTTPS call to its LLM provider — and hands that call to you. Inject a system prompt, reroute to a cheaper model, redact a tool result, retry an overloaded provider, mutate a tool call before it runs, or pipe the whole exchange to your own script in any language. Then forward it on.

Why it exists. Harnesses like Claude Code, Codex, and Cursor are closed — you can't load an extension into them. But you can point them at a different base URL. pify is that base URL. It gives you the customization an in-process plugin would, for the slice that crosses the wire: context, routing, resilience, tool I/O, and observability — without a CA, a certificate, or a single change to the harness beyond one env var.

What it is not. Not a framework, not a plugin host, not a transcoder. It's one small tool that does one thing — let your code see and reshape the request and the response — and composes with everything else (point it at LiteLLM if you need cross-provider translation). Two files, zero dependencies, ~750 lines.

The contract throughout: a little declarative sugar for the common cases, and an envelope hook (your script, or an HTTP service) as the escape hatch for everything else. If a hook breaks, pify forwards the call unchanged — fail-open, never a man-in-the-middle outage.

Install

npm i -g @jc4649/pify     # or just: npx @jc4649/pify

Zero runtime dependencies. Needs Node ≥ 18.

Use

One pify instance is one pipe: one port → one default target → one hook set.

pify --target https://api.anthropic.com --port 8787

Point a harness at it (no harness change beyond one env var):

ANTHROPIC_BASE_URL=http://localhost:8787   # Claude Code
OPENAI_BASE_URL=http://localhost:8787      # Codex / Cursor / most

The harness sends its own API key through pify; pify forwards it untouched. Every request/response is logged to ~/.pify/logs/pify-YYYY-MM-DD.log.
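Forwarding mostly means joining the target with the incoming path and copying headers. The block below is a minimal sketch under stated assumptions, not pify's source. buildUpstream and the dropped-header list are hypothetical. It assumes pify appends the incoming path and query to the target, and that it passes auth headers (x-api-key, Authorization) through untouched, as described above.

```javascript
// Hypothetical sketch of the forwarding step; not pify's actual code.
// Assumes the incoming path + query is appended to the configured target.

// Connection-specific headers a proxy should not pass along. "host" is dropped
// so the upstream sees its own hostname; "content-length" is dropped because a
// hook may rewrite the body, so it must be recomputed before sending.
const DROPPED_HEADERS = new Set([
  "connection", "keep-alive", "proxy-connection", "transfer-encoding",
  "te", "trailer", "upgrade", "host", "content-length",
]);

function buildUpstream(target, incomingUrl, incomingHeaders) {
  const base = new URL(target);                            // e.g. https://api.anthropic.com
  const rel = new URL(incomingUrl, "http://placeholder");  // e.g. /v1/messages?beta=true
  const basePath = base.pathname.replace(/\/+$/, "");      // "/" -> "", "/openai/" -> "/openai"
  const url = new URL(basePath + rel.pathname + rel.search, base.origin);

  const headers = {};
  for (const [name, value] of Object.entries(incomingHeaders)) {
    const key = name.toLowerCase();
    if (DROPPED_HEADERS.has(key)) continue;
    headers[key] = value; // x-api-key / authorization pass through untouched
  }
  return { url: url.toString(), headers };
}
```

A fuller version would also drop any extra header names listed inside the incoming Connection header. The sketch only handles the fixed set.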

Note: those logs are local and contain full prompt and response bodies — which may include source code, secrets, or API keys. They're never sent anywhere, but treat the log dir like any other sensitive local data.

Multiple harnesses? Run multiple instances — the port is the caller's identity. Want different hooks for Cursor vs Claude Code? Two instances, two configs, two ports.

pify --target https://api.anthropic.com --port 8787 --config work.json
pify --target http://localhost:11434     --port 8788 --config cheap.json

Configure

pify reads ./pify.json, then ~/.pify/pify.json, else built-in defaults.

{
  "port": 8787,
  "target": "https://api.anthropic.com",  // default upstream; a hook can override per request
  "protocol": "anthropic",                 // optional pin; else auto-detected
  "logDir": "~/.pify/logs",
  "rules": [
    { "match": { "client": "claude-code" },
      "onRequest": [{ "injectSystem": "You write clean, typed code." }] },

    { "match": { "model": "gpt-4*" },
      "onRequest": [{ "setTarget": "http://localhost:11434" }, { "setModel": "qwen2.5-coder:7b" }] }
  ]
}
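The lookup order above can be sketched as a pure function. This is a minimal sketch, not pify's loader, and it rests on several assumptions. First, the first file found wins and is shallow-merged over the defaults. Second, the DEFAULTS values are placeholders copied from the example, not pify's documented built-ins. Third, a leading ~ in logDir expands to the home directory. resolveConfig, expandHome, and the injected exists/readText functions are hypothetical. The sketch uses plain JSON.parse, which rejects // comments; whether pify's own loader accepts them is not stated here.

```javascript
// Hypothetical sketch of the config lookup; names and merge behavior are assumptions.
// Placeholder defaults, copied from the example config above.
const DEFAULTS = {
  port: 8787,
  target: "https://api.anthropic.com",
  logDir: "~/.pify/logs",
  rules: [],
};

// "~" or "~/x" -> home or home + "/x"; anything else is returned unchanged.
function expandHome(p, home) {
  if (p === "~") return home;
  if (p.startsWith("~/")) return home + p.slice(1);
  return p;
}

// First existing candidate wins; its keys override the defaults.
function resolveConfig({ cwd, home, exists, readText }) {
  const candidates = [`${cwd}/pify.json`, `${home}/.pify/pify.json`];
  const found = candidates.find((file) => exists(file));
  const fromFile = found ? JSON.parse(readText(found)) : {};
  const merged = { ...DEFAULTS, ...fromFile };
  merged.logDir = expandHome(merged.logDir, home);
  return { source: found ?? "defaults", config: merged };
}
```

In a real process the injected pieces would be Node's fs.existsSync, (f) => fs.readFileSync(f, "utf8"), process.cwd() and os.homedir(). Injecting them keeps the lookup order testable without touching the disk.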

Match conditions

match holds selectors only: it picks which rule applies and is not a content query. Content decisions go in an exec/http hook.

client — claude-code | cursor | codex | user-agent token
protocol — anthropic | openai
model — glob, e.g. gpt-4*
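Selector matching can be sketched in a few lines. This is a hypothetical sketch, not pify's source. globToRegExp, detectProtocol, ruleMatches, and selectRules are illustrative names. The protocol heuristics (the Anthropic Messages API path /v1/messages or an anthropic-version header) are a guess at what "auto-detected" might mean. The sketch also assumes that an empty match {} selects every request.

```javascript
// Hypothetical sketch of rule selection; not pify's source.
// Assumptions: an empty match {} selects every request, all present selectors
// must hold, and header names are already lower-cased.

// "gpt-4*" -> /^gpt-4.*$/ ("*" = any run of characters, "?" = one character).
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// Guessed auto-detection: Anthropic's Messages API is served at /v1/messages and
// clients send an anthropic-version header; anything else is treated as OpenAI-style.
function detectProtocol(path, headers) {
  if (path.startsWith("/v1/messages") || "anthropic-version" in headers) return "anthropic";
  return "openai";
}

// ctx.client is assumed to be resolved elsewhere (claude-code, cursor, codex);
// any other selector value is looked up as a token in the User-Agent.
function clientMatches(selector, ctx) {
  if (ctx.client === selector) return true;
  return (ctx.userAgent ?? "").toLowerCase().includes(selector.toLowerCase());
}

function ruleMatches(match, ctx) {
  if (match.client !== undefined && !clientMatches(match.client, ctx)) return false;
  if (match.protocol !== undefined && match.protocol !== ctx.protocol) return false;
  if (match.model !== undefined && !globToRegExp(match.model).test(ctx.model ?? "")) return false;
  return true;
}

// Returns every rule whose selectors all hold, in config order.
function selectRules(rules, ctx) {
  return rules.filter((rule) => ruleMatches(rule.match ?? {}, ctx));
}
```

The README does not say whether pify applies every matching rule or stops at the first one. The sketch returns them all so the caller can choose.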

The hooks

Each rule may carry any of these stages alongside its match:

| Stage | Declarative sugar | Envelope hook fires on |
|---|---|---|
| onRequest | injectSystem, setModel, setTarget, block | the request, before forward |
| onToolResult | — | tool-result blocks in the request, before the model sees them |
| onResponse | — | full (buffered) response |
| onToolCall | — | tool-call blocks in the response, before the harness executes them (buffers the stream) |
| onStreamChunk | replace: {from,to} | per SSE event, streaming through (the one declarative content op; no hook can run per token) |
| onError | retry, fallback (URL), response | upstream connection fails |
| onUpstreamError | retryOn: [429,529], max, backoffMs, fallback (URL), response | upstream returns 4xx/5xx (honors Retry-After) |
| onComplete | — | async, after the exchange: a non-blocking tap of {request, response, status, timingMs} |

Four of the eight seams take only an envelope hook (your code). That is the deliberate unix-tool line: declarative sugar survives only where it earns its place (routing, denial, resilience, or where no subprocess can reach, like per-token stream edits). Content transforms are your script's job.
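The declarative onRequest ops are small edits to the request envelope. This is a minimal sketch, not pify's source. It assumes ops run in array order and that injectSystem appends to, rather than replaces, any system prompt the harness already sends. applyRequestOps and injectSystem are hypothetical names. block is left out because its reply shape is not specified here. The Anthropic branch uses the Messages API system field, which is either a string or an array of text blocks. The OpenAI branch uses Chat Completions messages with role "system".

```javascript
// Hypothetical sketch of the declarative onRequest ops; not pify's source.
// Assumptions: ops run in order, injectSystem appends after the existing
// system prompt, and OpenAI requests use the Chat Completions "messages" shape.

function injectSystem(protocol, body, text) {
  if (protocol === "anthropic") {
    // Anthropic Messages API: "system" is a string or an array of text blocks.
    if (Array.isArray(body.system)) body.system = [...body.system, { type: "text", text }];
    else body.system = body.system ? `${body.system}\n\n${text}` : text;
    return;
  }
  // OpenAI Chat Completions: insert after any leading system/developer messages.
  const messages = Array.isArray(body.messages) ? [...body.messages] : [];
  let i = 0;
  while (i < messages.length && ["system", "developer"].includes(messages[i].role)) i++;
  messages.splice(i, 0, { role: "system", content: text });
  body.messages = messages;
}

function applyRequestOps(envelope, protocol, ops) {
  // Work on a deep copy so the caller can still forward the original (fail-open).
  const env = structuredClone(envelope);
  for (const op of ops) {
    if ("injectSystem" in op) injectSystem(protocol, env.body, op.injectSystem);
    if ("setModel" in op) env.body.model = op.setModel;  // e.g. "qwen2.5-coder:7b"
    if ("setTarget" in op) env.target = op.setTarget;    // e.g. "http://localhost:11434"
  }
  return env;
}
```

Appending rather than prepending keeps the harness's own system text at the front of the prompt. That placement is friendlier to prefix-based prompt caching.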

setTarget gives code-free routing, and onError.fallback is a plain URL. Example:

{ "match": { "model": "gpt-4*" },
  "onRequest": [{ "setTarget": "http://localhost:11434" }, { "setModel": "qwen2.5-coder:7b" }],
  "onError": { "retry": 1, "fallback": "https://api.openai.com" } }
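With this rule, a refused connection to the local server is retried and then sent to the fallback. The sketch below spells out that sequence. It assumes "retry": N means N extra attempts at the same target, followed by a single attempt at the fallback. attemptPlan, forwardWithFallback, and the injected send function are hypothetical.

```javascript
// Hypothetical sketch of onError retry + fallback; not pify's source.
// Assumption: "retry": N means N extra attempts at the same target, then
// one attempt at "fallback". Only connection errors (rejections) trigger it.

function attemptPlan(target, { retry = 0, fallback } = {}) {
  const plan = Array(1 + retry).fill(target);
  if (fallback) plan.push(fallback);
  return plan;
}

async function forwardWithFallback(send, target, policy) {
  // send(url) resolves with a response, or rejects when the connection fails.
  let lastError;
  for (const url of attemptPlan(target, policy)) {
    try {
      return await send(url);
    } catch (err) {
      lastError = err; // e.g. ECONNREFUSED when the local model server is down
    }
  }
  throw lastError; // every attempt failed; a configured onError.response would apply here
}
```

In a proxy, send could wrap Node 18's global fetch. fetch rejects only when the connection itself fails, so only connection-level failures reach the catch. HTTP error statuses resolve normally and belong to onUpstreamError.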

Your own code — the envelope hook

When config can't express it, hand the whole request/response to your code and take it back. pify pipes the envelope { target, method, url, headers, body } (request) or { status, headers, body } (response); you return the modified one. Two transports, same contract:

| Op | Transport | Use |
|---|---|---|
| { "exec": "./hook.py" } | subprocess (spawn per call) | local, zero-setup, any language |
| { "http": "http://localhost:9000/hook" } | POST to a long-lived service | warm state, shared, remote, real streaming |

Every seam except onStreamChunk takes an envelope hook:

onRequest — envelope { target, method, url, headers, body }; return the modified one (or block).
onResponse — envelope { status, headers, body }; return the modified one (bufferStream: true for SSE).
onError — envelope { error, attempt, target, request } when the upstream connection fails; return { retry: "url" } to re-forward or { status, headers, body } to synthesize a reply. The retry/fallback/response ops are the no-code sugar on this seam.
onUpstreamError — same envelope plus status when the upstream returns a 4xx/5xx (the connection was fine). retryOn/backoffMs/fallback are the sugar; the hook can { retry } or synthesize { status, headers, body }. Honors Retry-After automatically.
onToolCall — a normalized view { protocol, calls: [{ name, input }] } of the tool calls in the response; return modified calls (pify writes them back into the anthropic/openai shape). Rewrite a bash command, normalize a path, neuter a destructive call before the harness runs it.
onToolResult — a normalized view { protocol, results: [{ id, content }] } of the tool results in the request; return modified results. Trim bulky output, redact secrets, enrich errors before the model sees them.
onComplete — fired async after the exchange with { client, protocol, status, timingMs, request, response }; return value ignored. A non-blocking tap for telemetry/cost/capture — never adds latency.
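The normalized onToolCall view hides two different wire shapes. Anthropic responses carry tool_use content blocks with a parsed input object. OpenAI Chat Completions responses carry message.tool_calls, whose function.arguments is a JSON string. Below is a minimal sketch of that round trip for buffered, non-streaming bodies, plus a hook that neuters recursive deletes. extractCalls, writeBackCalls, and neuterDestructive are hypothetical. The tool name "Bash" with a "command" input is an assumption about the harness's tool schema.

```javascript
// Hypothetical sketch of the onToolCall round trip; not pify's implementation.
// Handles buffered (non-streaming) bodies only.

function extractCalls(protocol, body) {
  if (protocol === "anthropic") {
    return (body.content ?? [])
      .filter((block) => block.type === "tool_use")
      .map((block) => ({ name: block.name, input: block.input }));
  }
  // OpenAI Chat Completions: arguments arrive as a JSON-encoded string.
  return (body.choices?.[0]?.message?.tool_calls ?? []).map((call) => ({
    name: call.function.name,
    input: JSON.parse(call.function.arguments || "{}"),
  }));
}

function writeBackCalls(protocol, body, calls) {
  // Fail open if the hook returned a different number of calls.
  if (calls.length !== extractCalls(protocol, body).length) return body;
  const out = structuredClone(body); // ids and every other field are preserved
  if (protocol === "anthropic") {
    const blocks = (out.content ?? []).filter((block) => block.type === "tool_use");
    blocks.forEach((block, i) => { block.name = calls[i].name; block.input = calls[i].input; });
  } else {
    (out.choices?.[0]?.message?.tool_calls ?? []).forEach((call, i) => {
      call.function.name = calls[i].name;
      call.function.arguments = JSON.stringify(calls[i].input);
    });
  }
  return out;
}

// Example hook logic: swap a recursive rm for a harmless echo before the harness runs it.
function neuterDestructive(view) {
  const dangerous = /\brm\s+-\w*r/; // matches "rm -r", "rm -rf", "rm -fr"
  return {
    ...view,
    calls: view.calls.map((call) =>
      call.name === "Bash" && dangerous.test(call.input?.command ?? "")
        ? { ...call, input: { ...call.input, command: "echo 'blocked by onToolCall hook'" } }
        : call),
  };
}
```

Writing back by position preserves each call's id, so the harness can still pair the later tool result with the call it issued.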

{ "match": {},
  "onRequest":  [{ "exec": "./examples/cheap-route.py" }],   // route/rewrite/block in your code
  "onResponse": [{ "http": "http://localhost:9000/redact", "bufferStream": true }] }
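A hook script only has to read an envelope and print one back. Below is a minimal sketch of a content-based router in JavaScript. It follows the spirit of the cheap-route example but is not that script's code. LOCAL_TARGET, LOCAL_MODEL, MAX_CHARS, and handle are hypothetical. The sketch assumes the envelope arrives on stdin and the result is read from stdout. pify forwards without transcoding, so the sketch reroutes only OpenAI-style /chat/completions requests.

```javascript
// Hypothetical exec hook (not examples/cheap-route.py): send small
// OpenAI-style requests to a local model, leave everything else alone.
const LOCAL_TARGET = "http://localhost:11434"; // illustrative values
const LOCAL_MODEL = "qwen2.5-coder:7b";
const MAX_CHARS = 2000;

// Total text across messages; content may be a string or an array of parts.
function textLength(messages) {
  let n = 0;
  for (const m of messages ?? []) {
    if (typeof m.content === "string") n += m.content.length;
    else if (Array.isArray(m.content)) for (const part of m.content) n += (part.text ?? "").length;
  }
  return n;
}

function routeByContent(envelope) {
  // No transcoding: only OpenAI-style chat requests can move to the local server.
  if (!String(envelope.url ?? "").includes("/chat/completions")) return envelope;
  if (textLength(envelope.body?.messages) > MAX_CHARS) return envelope;
  return { ...envelope, target: LOCAL_TARGET, body: { ...envelope.body, model: LOCAL_MODEL } };
}

// stdin text in, stdout text out. "" (empty output) means fail open.
function handle(raw) {
  let envelope;
  try {
    envelope = JSON.parse(raw);
  } catch {
    return "";
  }
  if (envelope === null || typeof envelope !== "object") return "";
  return JSON.stringify(routeByContent(envelope));
}
```

To run this as { "exec": "node route.js" } (route.js is a hypothetical file name), end the script with process.stdout.write(handle(require("node:fs").readFileSync(0, "utf8"))). For unparseable input, the script prints an empty string, which lets the fail-open rule take over instead of blocking the call.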

Contract: a non-zero exit (exec) or a non-2xx status (http) blocks the call, with stderr or the response body as the message; empty output, a timeout, or bad JSON fails open (the call is forwarded unchanged). Routing is just envelope.target in the returned object. bufferStream: true runs a response hook over a streamed reply by buffering it first, trading streaming for a full rewrite.
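On pify's side, the contract reduces to a small decision table. The following is a minimal sketch for the exec transport. It assumes the hook's outcome has already been collected into { code, stdout, stderr, timedOut }; those field names and interpretHook are hypothetical. Treating JSON that is not an object as bad output is also an assumption.

```javascript
// Hypothetical sketch of reading an exec hook's outcome under the contract above.
// result: { code, stdout, stderr, timedOut }; original: the envelope sent to the hook.
function interpretHook(result, original) {
  const failOpen = { action: "forward", envelope: original };
  if (result.timedOut) return failOpen; // timeout: forward unchanged
  if (result.code !== 0) {
    // Non-zero exit: block, using stderr as the message.
    return { action: "block", message: result.stderr.trim() || `hook exited with ${result.code}` };
  }
  const out = result.stdout.trim();
  if (out === "") return failOpen; // empty output: forward unchanged
  try {
    const parsed = JSON.parse(out);
    const isObject = parsed !== null && typeof parsed === "object" && !Array.isArray(parsed);
    return isObject ? { action: "forward", envelope: parsed } : failOpen;
  } catch {
    return failOpen; // bad JSON: forward unchanged
  }
}
```

The http transport fits the same table if a non-2xx status plays the role of a non-zero exit and the response body plays the role of stderr.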

Examples

Runnable configs and hook scripts live in examples/, with a --selftest on each hook and a full end-to-end test. Two categories:

Common use cases

| Want to… | Hook | How |
|---|---|---|
| Inject a house style / system prompt | onRequest | { "injectSystem": "You write clean, typed code." } |
| Route a model to a cheaper / local provider | onRequest | { "setTarget": "http://localhost:11434" }, { "setModel": "qwen2.5-coder:7b" } |
| Route by content (your code decides) | onRequest | { "exec": "./examples/cheap-route.py" } |
| Redact secrets from responses | onResponse | { "exec": "node examples/redact-secrets.js" } |
| Retry an overloaded provider (429/529) | onUpstreamError | { "retryOn": [429,529], "max": 4, "backoffMs": 500, "fallback": "https://api.openai.com" } |
| Tap every exchange for cost/telemetry | onComplete | { "http": "http://localhost:9000/usage" } |
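The retry row combines three inputs: the upstream status, the number of retries already made, and the upstream's Retry-After header. Below is a minimal sketch of that decision. It assumes the delay doubles from backoffMs on each attempt, since the growth curve is not stated here. It also assumes Retry-After, in either of its HTTP forms (delay-seconds or an HTTP-date), overrides the computed delay. nextRetry and retryAfterMs are hypothetical, and the default values simply mirror the table row.

```javascript
// Hypothetical sketch of the onUpstreamError retry decision; not pify's source.
// Assumptions: attempt counts retries already made (from 0), backoff doubles each
// attempt, and a Retry-After header (seconds or HTTP-date) overrides the delay.

function retryAfterMs(value, now = Date.now()) {
  if (value == null) return null;
  const trimmed = String(value).trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // delay-seconds
  const at = Date.parse(trimmed);                           // HTTP-date form
  return Number.isNaN(at) ? null : Math.max(0, at - now);
}

function nextRetry(status, attempt, headers, policy, now = Date.now()) {
  // Defaults mirror the example row above, not documented pify defaults.
  const { retryOn = [429, 529], max = 4, backoffMs = 500 } = policy;
  if (!retryOn.includes(status) || attempt >= max) return { retry: false };
  const fromHeader = retryAfterMs(headers["retry-after"], now);
  return { retry: true, delayMs: fromHeader ?? backoffMs * 2 ** attempt };
}
```

When nextRetry answers { retry: false } for a retryable status, a caller would move on to the configured fallback URL or synthesize the configured response.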

Extension use cases — port a pi extension onto a closed harness

Any pi/pich extension whose job is a pure transform of the request or response can be ported onto a closed-source harness through pify, with zero harness changes. The worked example ports @jc4649/pi-context-collapse from pich (it trims bulky tool output before each LLM call), moving it from an in-process pi.on("context") hook to a pify onToolResult hook:

node bin/pify.js --config examples/context-collapse.json     # run it
node examples/test-context-collapse.js                       # e2e proof through pify

Context engineering (collapse, memory injection, usage telemetry) ports this way; extensions that register local tools, drive the agent loop, or paint UI do not — those live inside the harness. See examples/README.md for the full mapping.
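The port works because collapse is a pure function over tool results. Below is a minimal sketch of an onToolResult hook body in that spirit; it is not the pi-context-collapse code. It keeps the head and tail of any oversized text result and replaces the middle with a marker. KEEP_HEAD, KEEP_TAIL, collapseText, and collapseResults are hypothetical. The content shapes (a string, or an array of { type: "text", text } blocks) are an assumption about the normalized view.

```javascript
// Hypothetical onToolResult logic in the spirit of context collapse;
// not the @jc4649/pi-context-collapse implementation.
const KEEP_HEAD = 1500; // illustrative limits, in characters
const KEEP_TAIL = 500;

function collapseText(text) {
  if (text.length <= KEEP_HEAD + KEEP_TAIL) return text;
  const dropped = text.length - KEEP_HEAD - KEEP_TAIL;
  return `${text.slice(0, KEEP_HEAD)}\n[... ${dropped} chars collapsed ...]\n${text.slice(-KEEP_TAIL)}`;
}

// view: { protocol, results: [{ id, content }] }; non-text blocks pass through.
function collapseResults(view) {
  return {
    ...view,
    results: view.results.map(({ id, content }) => ({
      id,
      content: typeof content === "string"
        ? collapseText(content)
        : content.map((block) => (block.type === "text" ? { ...block, text: collapseText(block.text) } : block)),
    })),
  };
}
```

The sketch keeps the tail as well as the head, which preserves the end of long command output, where errors and summaries tend to appear.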

Scope

Same-provider proxy + logging + the hooks above + envelope hooks (exec/http) ship today. Cross-provider transcoding (Anthropic ⇄ OpenAI wire formats) and a TLS-MITM tier for harnesses without a base-URL knob are designed but not built — see llm-hook-spec.md.

Develop

node test.js     # pure-logic self-check (no network)