provenby-proxy

v0.1.0

Published

14 days ago

Transparent LLM API proxy that captures skill metadata for ProvenBy

inferlane

provenby llm proxy openai anthropic skills

ProvenBy-proxy

Transparent LLM API proxy that captures skill metadata for ProvenBy. Change one env var, keep all your existing code.

Quick start

# Start the proxy
npx provenby-proxy --port 4000 --candidate-id YOUR_ID --api-key sig_YOUR_KEY

# Point your SDK at it (one env var change)
export OPENAI_BASE_URL=http://localhost:4000/v1

# All existing code works unchanged
python -c "import openai; openai.chat.completions.create(model='gpt-4o', messages=[...])"

How it works

1. Your code sends API requests to localhost:4000 instead of api.openai.com.
2. The proxy forwards each request to the real provider unchanged, with all your headers and auth.
3. The provider's response streams back to your code, identical to what the provider sent.
4. In the background, the proxy extracts skill patterns (languages, frameworks, domains) from the request.
5. Only anonymized metadata is sent to ProvenBy, never raw text.

Supported providers

Routes are auto-detected from request paths:

| Path | Provider |
|---|---|
| /v1/chat/completions | OpenAI (also xAI, DeepSeek, any OpenAI-compatible) |
| /v1/messages | Anthropic |
| /v1/chat/complete | Mistral |

Override with path prefixes: /openai/v1/..., /anthropic/v1/..., /deepseek/v1/...
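The routing rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the proxy's actual source (which runs on Node.js). The function name `detect_provider` and the handling of unrecognized paths are assumptions.

```python
from typing import Optional, Tuple

# Auto-detected routes, copied from the table above.
PATH_ROUTES = {
    "/v1/chat/completions": "openai",
    "/v1/messages": "anthropic",
    "/v1/chat/complete": "mistral",
}

# Only the prefixes this README lists explicitly; others may exist.
PREFIXES = ("openai", "anthropic", "deepseek")


def detect_provider(path: str) -> Tuple[Optional[str], str]:
    """Return (provider, upstream_path); provider is None if unrecognized."""
    # Ignore any query string when matching.
    clean = path.split("?", 1)[0]
    # An explicit prefix such as /anthropic/v1/messages wins over auto-detection.
    for name in PREFIXES:
        prefix = f"/{name}/"
        if clean.startswith(prefix):
            # Strip the provider segment but keep the leading slash.
            return name, path[len(prefix) - 1:]
    # Assumption: unknown paths return None rather than a default provider.
    return PATH_ROUTES.get(clean), path
```

For example, `detect_provider("/deepseek/v1/chat/completions")` yields `("deepseek", "/v1/chat/completions")`, which is how a prefix lets a DeepSeek request avoid being treated as OpenAI.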

Configuration

CLI flags

--port, -p          Port to listen on (default: 4000)
--candidate-id, -c  ProvenBy candidate ID
--api-key, -k       ProvenBy API key
--server-url, -s    ProvenBy server URL
--config            Path to config file (default: ~/.provenby/proxy.json)
--debug             Log extraction activity to stderr

Config file (`~/.provenby/proxy.json`)

{
  "port": 4000,
  "candidateId": "xxx",
  "apiKey": "sig_xxx",
  "serverUrl": "https://provenby.dev",
  "providers": {
    "openai": "https://api.openai.com",
    "anthropic": "https://api.anthropic.com",
    "xai": "https://api.x.ai",
    "deepseek": "https://api.deepseek.com",
    "mistral": "https://api.mistral.ai"
  }
}

Environment variables

PROVENBY_PROXY_PORT=4000
PROVENBY_CANDIDATE_ID=xxx
PROVENBY_API_KEY=sig_xxx
PROVENBY_SERVER_URL=https://provenby.dev

Priority: CLI flags > config file > env vars.
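To make the precedence concrete, here is a minimal Python sketch of how the three sources could be merged. The function `resolve_settings` is hypothetical, and the rule that an unset CLI flag (`None`) does not override lower-priority values is an assumption; only the key names, env var names, default port, and ordering come from this README.

```python
# Maps config-file keys to the environment variables documented above.
ENV_KEYS = {
    "port": "PROVENBY_PROXY_PORT",
    "candidateId": "PROVENBY_CANDIDATE_ID",
    "apiKey": "PROVENBY_API_KEY",
    "serverUrl": "PROVENBY_SERVER_URL",
}


def resolve_settings(cli_flags: dict, file_values: dict, env: dict) -> dict:
    """Merge settings so that CLI flags > config file > env vars."""
    settings = {"port": 4000}  # documented default port

    # Lowest priority: environment variables.
    for key, var in ENV_KEYS.items():
        if var in env:
            settings[key] = env[var]

    # Middle priority: values parsed from the config file.
    settings.update({k: v for k, v in file_values.items() if k in ENV_KEYS})

    # Highest priority: CLI flags that were actually passed (assumed non-None).
    settings.update({k: v for k, v in cli_flags.items() if v is not None})

    # Env vars arrive as strings, so normalize the port to an integer.
    settings["port"] = int(settings["port"])
    return settings
```

With `PROVENBY_PROXY_PORT=5000` in the environment, `"port": 4100` in the config file, and `--port 4200` on the command line, this resolves to port 4200.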

Privacy model

The proxy necessarily sees raw text, because it has to forward it. However, it:

- Extracts skills from the request messages locally
- Runs PII stripping on the extraction (not on the forwarded request)
- Sends only extraction metadata to ProvenBy (languages, frameworks, domains, complexity)
- Does not log, store, or cache the raw request or response body
- Forwards the request directly to the provider, unchanged

What IS sent to ProvenBy: { model: "gpt-4o", languages: ["Python"], frameworks: ["FastAPI"], domain: "backend", complexity: "moderate" }

What is NOT sent: Your prompts, code, API keys, or any raw text.
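The idea of sending metadata instead of text can be illustrated with a small Python sketch. The keyword tables and the `extract_metadata` function below are hypothetical, since the proxy's real extraction rules are not described here. The sketch also omits the `domain` and `complexity` fields for brevity.

```python
import re

# Hypothetical hint tables; the real extractor's rules are not documented.
LANGUAGE_HINTS = {
    "Python": r"\bpython\b|\bdef \w+\(",
    "TypeScript": r"\btypescript\b|\binterface \w+",
}
FRAMEWORK_HINTS = {
    "FastAPI": r"\bfastapi\b",
    "React": r"\breact\b",
}


def extract_metadata(request_body: dict) -> dict:
    """Reduce a chat request to skill labels; no message text is returned."""
    # Join the plain-string message contents for local matching only.
    text = " ".join(
        m["content"]
        for m in request_body.get("messages", [])
        if isinstance(m.get("content"), str)
    )

    def matches(hints: dict) -> list:
        # Keep only the label names whose pattern appears in the text.
        return sorted(
            name
            for name, pattern in hints.items()
            if re.search(pattern, text, re.IGNORECASE)
        )

    return {
        "model": request_body.get("model"),
        "languages": matches(LANGUAGE_HINTS),
        "frameworks": matches(FRAMEWORK_HINTS),
    }
```

Because the result holds only labels drawn from fixed tables, nothing a user typed can appear in it, whatever the prompt contained.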

For maximum privacy, use ProvenBy-sdk instead (extraction happens inside your process with no proxy).

Streaming

The proxy fully supports SSE streaming (stream: true). Streaming responses are piped directly from the provider to your code without buffering. Skill extraction runs on the request body only, so it never sits in the path of the streamed response.

Zero dependencies

Built entirely on Node.js built-in modules (http, https, url, path, fs, crypto). No node_modules needed.

Health check

curl http://localhost:4000/health
# {"status":"ok","version":"0.1.0"}
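The same check can be done from a script, for instance before running a test suite that depends on the proxy. The sketch below uses only the Python standard library. The helper names `parse_health` and `proxy_is_up` are illustrative, and treating any connection error as "down" is an assumption.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Return True if the body is a JSON object with status "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def proxy_is_up(base_url: str = "http://localhost:4000", timeout: float = 2.0) -> bool:
    """Query /health and report whether the proxy answered as healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return parse_health(resp.read().decode("utf-8"))
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors.
        return False
```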
