provenby-proxy
v0.1.0
Published
Transparent LLM API proxy that captures skill metadata for ProvenBy
Maintainers
Readme
ProvenBy-proxy
Transparent LLM API proxy that captures skill metadata for ProvenBy. Change one env var, keep all your existing code.
Quick start
# Start the proxy
npx ProvenBy-proxy --port 4000 --candidate-id YOUR_ID --api-key sig_YOUR_KEY
# Point your SDK at it (one env var change)
export OPENAI_BASE_URL=http://localhost:4000/v1
# All existing code works unchanged
python -c "import openai; openai.chat.completions.create(model='gpt-4o', messages=[...])"How it works
- Your code sends API requests to
localhost:4000instead ofapi.openai.com - The proxy forwards the request to the real provider -- unchanged, with all your headers and auth
- The provider's response streams back to your code -- identical, transparent
- In the background, the proxy extracts skill patterns (languages, frameworks, domains) from the request
- Only anonymized metadata is sent to ProvenBy -- never raw text
Supported providers
Routes are auto-detected from request paths:
| Path | Provider |
|---|---|
| /v1/chat/completions | OpenAI (also xAI, DeepSeek, any OpenAI-compatible) |
| /v1/messages | Anthropic |
| /v1/chat/complete | Mistral |
Override with path prefixes: /openai/v1/..., /anthropic/v1/..., /deepseek/v1/...
Configuration
CLI flags
--port, -p Port to listen on (default: 4000)
--candidate-id, -c ProvenBy candidate ID
--api-key, -k ProvenBy API key
--server-url, -s ProvenBy server URL
--config Path to config file (default: ~/.provenby/proxy.json)
--debug Log extraction activity to stderrConfig file (~/.provenby/proxy.json)
{
"port": 4000,
"candidateId": "xxx",
"apiKey": "sig_xxx",
"serverUrl": "https://provenby.dev",
"providers": {
"openai": "https://api.openai.com",
"anthropic": "https://api.anthropic.com",
"xai": "https://api.x.ai",
"deepseek": "https://api.deepseek.com",
"mistral": "https://api.mistral.ai"
}
}Environment variables
PROVENBY_PROXY_PORT=4000
PROVENBY_CANDIDATE_ID=xxx
PROVENBY_API_KEY=sig_xxx
PROVENBY_SERVER_URL=https://provenby.devPriority: CLI flags > config file > env vars.
Privacy model
The proxy sees raw text (it has to, to forward it). But it:
- Extracts skills from the request messages locally
- Runs PII stripping on the extraction (not the forwarded request)
- Sends ONLY extraction metadata to ProvenBy (languages, frameworks, domains, complexity)
- Does NOT log, store, or cache the raw request/response body
- The forwarded request goes directly to the provider -- unchanged
What IS sent to ProvenBy: { model: "gpt-4o", languages: ["Python"], frameworks: ["FastAPI"], domain: "backend", complexity: "moderate" }
What is NOT sent: Your prompts, code, API keys, or any raw text.
For maximum privacy, use ProvenBy-sdk instead (extraction happens inside your process with no proxy).
Streaming
The proxy fully supports SSE streaming (stream: true). Streaming responses are piped directly from the provider to your code without buffering. Skill extraction runs on the request body only, so streaming adds zero latency.
Zero dependencies
Built entirely on Node.js built-in modules (http, https, url, path, fs, crypto). No node_modules needed.
Health check
curl http://localhost:4000/health
# {"status":"ok","version":"0.1.0"}