@evolvconsulting/cc-sf

v0.3.4

Published

2 months ago

Local proxy that lets Claude Code use Claude models hosted in Snowflake Cortex.

jeremy-newhouse

claude-code snowflake cortex anthropic proxy

cc-sf

Local proxy that lets Claude Code use Claude models hosted in Snowflake Cortex, authenticated with the existing key pair from your Snowflake connections.toml.

Cortex exposes an Anthropic‑Messages‑API‑compatible endpoint. Claude Code already honors ANTHROPIC_BASE_URL. cc-sf fills three gaps Claude Code can't handle on its own: minting & rotating the Snowflake RS256 keypair JWT, sending the X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT header, and remapping model IDs (claude-opus-4-7, dated suffixes) to what Cortex currently serves (claude-opus-4-6, claude-haiku-4-5, …).

Install

npm i -g @evolvconsulting/cc-sf

Requires Node ≥ 20 and the claude CLI on PATH.

Use

cc-sf                         # start bridge, drop into an interactive `claude` session
cc-sf --coco                  # same, but route traffic through Cortex Code's agent:run
                              # endpoint so usage bills as `cortex_code_cli`
cc-sf --override              # same, but force opus/sonnet/haiku picks to latest
                              # live Cortex version for this session; revert on exit
cc-sf --bridge-only           # run proxy alone, no claude (for debugging or manual curl)
cc-sf --list-models           # print models available to your account, then exit
cc-sf --refresh-models        # re-probe Cortex and refresh the model cache
cc-sf --jwt                   # mint a JWT and print to stdout
cc-sf --decode-jwt            # inspect header + payload of a minted JWT
cc-sf --help

Picking a model

cc-sf does not have its own --model flag. Anything after -- is forwarded straight to the claude binary, so use claude's mechanisms — the --model flag at launch, the ANTHROPIC_MODEL env var, or /model mid-session.

cc-sf                                          # interactive, claude's default
cc-sf -- --model claude-opus-4-7               # interactive, opus 4.7
cc-sf -- --model claude-4-sonnet               # Cortex-only ID (older family,
                                               # not in claude's built-in picker)
cc-sf -- -p "summarize README.md"              # one-shot print mode (claude's -p)
cc-sf -- -p "ping" --model claude-haiku-4-5    # one-shot with explicit model
ANTHROPIC_MODEL=claude-opus-4-7 cc-sf          # set the default for the session
cc-sf --override                               # any opus/sonnet/haiku pick
                                               # auto-resolves to latest live

Inside an interactive session, /model opens claude's picker. The picker only lists Claude Code's built-in aliases (Opus / Sonnet / Haiku); for Cortex-only IDs like claude-4-sonnet or claude-3-7-sonnet, use --model or ANTHROPIC_MODEL. Run cc-sf --list-models to see what your account can serve.

Passing other claude flags

Any claude flag works after --. cc-sf only consumes its own options; everything past -- is argv for the spawned claude process. Common examples:

cc-sf -- --dangerously-skip-permissions          # skip permission prompts
cc-sf -- --continue                              # continue the most recent session
cc-sf -- --resume                                # show the resume picker
cc-sf -- --debug                                 # claude debug logging
cc-sf -- --add-dir /path/to/extra/dir            # extend the working dir set
cc-sf -- --chrome                                # any flag your claude build supports
cc-sf -- --dangerously-skip-permissions --model claude-opus-4-7 -p "do the thing"

The -- is often optional: cc-sf parses its own flags first and stops at the first token it does not recognize, so cc-sf -p "x" works too. When in doubt, use --; it's never wrong.

To combine cc-sf flags with claude flags:

cc-sf --override -- --dangerously-skip-permissions --model claude-opus-4-7
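The splitting rule described in this section can be sketched in TypeScript as follows. This is only an illustration under stated assumptions: `splitArgs` and the two flag sets are hypothetical names built from the flags this README documents, and cc-sf's real parser may behave differently in edge cases.

```typescript
// Hypothetical sketch: split argv into cc-sf's own options and claude's argv.
// The flag sets mirror the flags listed in this README, not cc-sf's source.
const BOOLEAN_FLAGS = new Set([
  "--coco", "--override", "--bridge-only", "--list-models",
  "--refresh-models", "--jwt", "--decode-jwt", "--help",
]);
const VALUE_FLAGS = new Set(["--config", "--connection"]);

function splitArgs(argv: string[]): { own: string[]; forwarded: string[] } {
  const own: string[] = [];
  const rest = [...argv];
  for (let token = rest.shift(); token !== undefined; token = rest.shift()) {
    if (token === "--") {
      // Explicit separator: everything after it belongs to claude.
      return { own, forwarded: rest };
    }
    if (BOOLEAN_FLAGS.has(token)) {
      own.push(token);
      continue;
    }
    if (VALUE_FLAGS.has(token)) {
      // Value flags consume the following token as their argument.
      const value = rest.shift();
      own.push(...(value === undefined ? [token] : [token, value]));
      continue;
    }
    // First unknown token: stop parsing and forward it with the remainder.
    return { own, forwarded: [token, ...rest] };
  }
  return { own, forwarded: [] };
}
```

With this sketch, `splitArgs(["--override", "--", "--model", "claude-opus-4-7"])` keeps `--override` for cc-sf and forwards the rest, while `splitArgs(["-p", "x"])` forwards everything.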

Prereqs

A Snowflake connections.toml. cc-sf discovers the file using the same precedence Snowflake CLI / the Python connector use — --config > $SF_CONNECTIONS_FILE (cc-sf override) > $SNOWFLAKE_HOME/connections.toml > ~/.snowflake/connections.toml > OS default (Windows %USERPROFILE%\AppData\Local\snowflake\, macOS ~/Library/Application Support/snowflake/, Linux $XDG_CONFIG_HOME/snowflake/). default_connection_name is also read from a sibling config.toml when it's not set inline. cc-sf supports two authenticator values:

SNOWFLAKE_JWT (default) — cc-sf mints and auto-rotates a keypair JWT:

default_connection_name = "ennovate"

[ennovate]
account = "XLB91549"
user = "you@example.com"
authenticator = "SNOWFLAKE_JWT"
private_key_file = "/Users/you/.snowflake/rsa_key.p8"
role = "SYSADMIN"

(private_key_path is also accepted as an alias — the two field names are interchangeable.)

You also need the matching public key uploaded to your Snowflake user (ALTER USER … SET RSA_PUBLIC_KEY='…'). SNOWFLAKE.CORTEX_USER is granted to PUBLIC by default, so most roles inherit it.
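For readers curious what a Snowflake key-pair JWT looks like, here is a minimal sketch using only `node:crypto`. It follows Snowflake's documented claim layout (`iss` = `ACCOUNT.USER.SHA256:<public key fingerprint>`, `sub` = `ACCOUNT.USER`, signed with RS256). The function names, the one-hour lifetime, and the throwaway demo key are assumptions for illustration, not cc-sf's actual code.

```typescript
import { createHash, createPublicKey, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

// Fingerprint = base64 SHA-256 of the DER-encoded (SPKI) public key.
function publicKeyFingerprint(privateKey: KeyObject): string {
  const spkiDer = createPublicKey(privateKey).export({ type: "spki", format: "der" });
  return "SHA256:" + createHash("sha256").update(spkiDer).digest("base64");
}

// Hypothetical helper: lifetimeSec defaults to one hour (an assumption).
function mintKeypairJwt(
  account: string,
  user: string,
  privateKey: KeyObject,
  nowSec: number,
  lifetimeSec = 3600,
): string {
  const qualifiedUser = `${account.toUpperCase()}.${user.toUpperCase()}`;
  const header = { alg: "RS256", typ: "JWT" };
  const payload = {
    iss: `${qualifiedUser}.${publicKeyFingerprint(privateKey)}`,
    sub: qualifiedUser,
    iat: nowSec,
    exp: nowSec + lifetimeSec,
  };
  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(payload))}`;
  // "RSA-SHA256" with an RSA key is RSASSA-PKCS1-v1_5, i.e. RS256.
  const signature = sign("RSA-SHA256", Buffer.from(signingInput, "utf8"), privateKey);
  return `${signingInput}.${signature.toString("base64url")}`;
}

// Demo with a throwaway key; a real run would load the key from private_key_file.
const demoKeys = generateKeyPairSync("rsa", { modulusLength: 2048 });
const demoToken = mintKeypairJwt("xlb91549", "me", demoKeys.privateKey, Math.floor(Date.now() / 1000));
const demoParts = demoToken.split(".");
const demoValid = verify(
  "RSA-SHA256",
  Buffer.from(`${demoParts[0] ?? ""}.${demoParts[1] ?? ""}`, "utf8"),
  demoKeys.publicKey,
  Buffer.from(demoParts[2] ?? "", "base64url"),
);
```

The fingerprint must match the key registered with ALTER USER, which is why it is derived from the private key rather than configured separately.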

OAUTH — bring an external-OAuth bearer token yourself:

[ennovate-oauth]
account = "XLB91549"
authenticator = "OAUTH"
token_file_path = "/Users/you/.snowflake/oauth-access-token"
role = "SYSADMIN"   # optional

Your IdP tooling is responsible for keeping the token file current. cc-sf reads the file lazily, caches it in memory, and re-reads it on HTTP 401 — so after your refresh script writes a new token to the same path, the bridge picks it up automatically on the next request. cc-sf does not itself speak the OAuth refresh protocol.
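The read-lazily, cache, and re-read-on-401 behavior described above can be sketched as follows. `CachedToken` and `sendWithRetry` are hypothetical names, and the reader is injected so the sketch stays independent of the file system; in practice it would be something like `() => readFileSync(tokenFilePath, "utf8")`.

```typescript
// Hypothetical sketch of a lazily-read credential cached in memory.
class CachedToken {
  private value: string | undefined;
  private readonly read: () => string;

  constructor(read: () => string) {
    this.read = read;
  }

  // Reads the source only on first use (or after invalidate()).
  get(): string {
    if (this.value === undefined) {
      this.value = this.read();
    }
    return this.value;
  }

  // Drops the cached value so the next get() re-reads the source.
  invalidate(): void {
    this.value = undefined;
  }
}

// Sends once; on HTTP 401, invalidates the cache and retries exactly once.
async function sendWithRetry<R extends { status: number }>(
  token: CachedToken,
  send: (bearer: string) => Promise<R>,
): Promise<R> {
  const first = await send(token.get());
  if (first.status !== 401) {
    return first;
  }
  token.invalidate();
  return send(token.get());
}
```

Retrying only once keeps a genuinely bad token from turning into a request loop.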

PAT (PROGRAMMATIC_ACCESS_TOKEN) is not supported yet; it requires a per-user network policy (an account-admin operation) and has not been tested end-to-end against Cortex.

Flags & environment

| Flag | Env | Default | Purpose |
| --- | --- | --- | --- |
| --config <path> | SF_CONNECTIONS_FILE | Snowflake discovery ladder (see Prereqs) | TOML file to read |
| — | SNOWFLAKE_HOME | — | Snowflake-standard override for the config directory; read as $SNOWFLAKE_HOME/connections.toml |
| --connection <name> | SF_CONNECTION | default_connection_name in connections.toml, then sibling config.toml | Which block to use |
| — | SF_BRIDGE_PORT | 8787 | Local bind port |
| — | SNOWFLAKE_PRIVATE_KEY_PASSPHRASE | — | Passphrase for encrypted .p8 keys |
| — | ANTHROPIC_MODEL | claude-sonnet-4-6 | Model Claude Code targets |
| — | ANTHROPIC_SMALL_FAST_MODEL | claude-haiku-4-5 | Small/fast model |

Nothing is hardcoded: account, user, key path, role, and JWT fingerprint all derive from the selected connection at runtime. If the file or a named connection is missing, cc-sf prints a clear error (including available connection names when the named one isn't found).

Platform support

macOS, Linux, and Windows. TOML discovery follows the Snowflake CLI / Python connector ladder on every OS, including %USERPROFILE%\AppData\Local\snowflake\connections.toml on Windows when ~/.snowflake/ isn't present. cc-sf spawns the claude shim with shell: true on Windows so that npm's claude.cmd resolves correctly.
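As a rough illustration of the discovery ladder, the sketch below builds the ordered candidate list and returns the first path that exists. All names here are hypothetical. The "first existing file wins" rule and the `~/.config` fallback when `XDG_CONFIG_HOME` is unset are assumptions, since this README does not spell them out.

```typescript
interface DiscoveryInput {
  configFlag?: string;                      // value of --config, if given
  env: Record<string, string | undefined>;  // e.g. process.env
  home: string;                             // e.g. os.homedir() (%USERPROFILE% on Windows)
  platform: "win32" | "darwin" | "linux";
  exists: (path: string) => boolean;        // e.g. fs.existsSync
}

// Ordered candidates, highest precedence first.
function candidatePaths(input: DiscoveryInput): string[] {
  const sep = input.platform === "win32" ? "\\" : "/";
  const join = (...parts: string[]): string => parts.join(sep);
  const { env, home } = input;
  const snowflakeHome = env["SNOWFLAKE_HOME"];
  const osDefaultDir =
    input.platform === "win32"
      ? join(home, "AppData", "Local", "snowflake")
      : input.platform === "darwin"
        ? join(home, "Library", "Application Support", "snowflake")
        : join(env["XDG_CONFIG_HOME"] ?? join(home, ".config"), "snowflake"); // fallback is an assumption
  const candidates: Array<string | undefined> = [
    input.configFlag,
    env["SF_CONNECTIONS_FILE"],
    snowflakeHome === undefined ? undefined : join(snowflakeHome, "connections.toml"),
    join(home, ".snowflake", "connections.toml"),
    join(osDefaultDir, "connections.toml"),
  ];
  return candidates.filter((p): p is string => p !== undefined && p !== "");
}

// Assumption: the first candidate that exists on disk is used.
function resolveConnectionsFile(input: DiscoveryInput): string | undefined {
  return candidatePaths(input).find((p) => input.exists(p));
}
```

Passing `env`, `home`, and `exists` as parameters keeps the function pure, so the same ladder can be exercised for all three platforms from one machine.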

How it works

cc-sf loads your connection via smol-toml and builds an authenticator for it. For SNOWFLAKE_JWT it mints an RS256 JWT (cached with 55‑min TTL, refreshed 5 min before expiry). For OAUTH it reads token_file_path on demand and caches the bytes. Then it stands up a Hono HTTP server on 127.0.0.1:${SF_BRIDGE_PORT:-8787}.
POST /v1/messages and /v1/messages/count_tokens are forwarded to https://<account>.snowflakecomputing.com/api/v2/cortex/v1/messages… with Authorization: Bearer <token>. JWT connections also attach X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT; OAuth connections omit that header so Cortex auto-detects the token type.
On startup the bridge probes a candidate list (claude-opus-4-7, -4-6, -4-5, -sonnet-*, -haiku-4-5, -3-7-sonnet, …) with max_tokens:1 requests in parallel. Results are cached at ~/.cache/cc-sf/models.json with a 24 h TTL; force with --refresh-models. (Cortex has no listing endpoint, so probing is the only reliable source of truth.)
model in the request body is remapped against the live set: passthrough if available, else strip [1m] and dated -YYYYMMDD suffix, else downshift minor/major versions until a match (e.g. claude-haiku-4-5-20251001 → claude-haiku-4-5; claude-sonnet-4-7 → claude-sonnet-4-6).
Response body (SSE or JSON) streams back untouched.
On upstream 401 the authenticator invalidates its credential cache and the request retries once. For JWT this triggers a remint; for OAuth it re-reads token_file_path so a freshly-refreshed token written by your IdP tooling is picked up automatically.
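The model remapping step above can be sketched like this. `remapModel` is a hypothetical name, the live set is passed in rather than probed, and the exact downshift order (decrement the minor version first, then try lower majors from minor 9 down) is an assumption; the README only states that versions are downshifted until a match is found.

```typescript
// Hypothetical sketch of the remap: passthrough, strip suffixes, then downshift.
function remapModel(requested: string, live: ReadonlySet<string>): string | undefined {
  // 1. Passthrough when Cortex serves the requested ID as-is.
  if (live.has(requested)) return requested;

  // 2. Strip a trailing [1m] marker, then a dated -YYYYMMDD suffix.
  const base = requested.replace(/\[1m\]$/, "").replace(/-\d{8}$/, "");
  if (live.has(base)) return base;

  // 3. Downshift versions for IDs shaped like claude-<family>-<major>-<minor>.
  const m = /^claude-([a-z]+)-(\d+)-(\d+)$/.exec(base);
  if (!m) return undefined;
  const family = m[1];
  const major = Number(m[2]);
  const minor = Number(m[3]);
  for (let candidateMajor = major; candidateMajor >= 1; candidateMajor--) {
    // Within the requested major start just below the requested minor;
    // for lower majors start from 9 (an assumed upper bound).
    const startMinor = candidateMajor === major ? minor - 1 : 9;
    for (let candidateMinor = startMinor; candidateMinor >= 0; candidateMinor--) {
      const candidate = `claude-${family}-${candidateMajor}-${candidateMinor}`;
      if (live.has(candidate)) return candidate;
    }
  }
  return undefined;
}

const liveModels = new Set(["claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5"]);
const remappedHaiku = remapModel("claude-haiku-4-5-20251001", liveModels); // "claude-haiku-4-5"
const remappedSonnet = remapModel("claude-sonnet-4-7", liveModels);        // "claude-sonnet-4-6"
```

IDs that do not follow the `claude-<family>-<major>-<minor>` shape (for example claude-4-sonnet) are only ever matched by passthrough in this sketch.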

Then cc-sf sets ANTHROPIC_BASE_URL=http://127.0.0.1:<port>, unsets ANTHROPIC_API_KEY, and execs claude "$@". SIGINT/SIGTERM forward to the child.
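The environment handed to the child process can be sketched as a small pure function. `buildChildEnv` is a hypothetical name; the only behavior taken from this README is setting ANTHROPIC_BASE_URL to the local bridge and removing ANTHROPIC_API_KEY.

```typescript
// Hypothetical sketch: derive the child's environment from the parent's.
function buildChildEnv(
  parent: Record<string, string | undefined>,
  port: number,
): Record<string, string | undefined> {
  const child: Record<string, string | undefined> = {
    ...parent,
    // Point Claude Code at the local bridge instead of api.anthropic.com.
    ANTHROPIC_BASE_URL: `http://127.0.0.1:${port}`,
  };
  // Remove the key entirely so the child never sees an Anthropic API key.
  delete child["ANTHROPIC_API_KEY"];
  return child;
}
```

The result could then be passed as the `env` option of Node's `child_process.spawn`, together with `stdio: "inherit"` for an interactive session.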

Model availability

Availability is discovered at runtime per account; run cc-sf --list-models to see yours. (As of 2026‑04 a typical account sees claude-opus-4-7, claude-opus-4-6, claude-sonnet-4-6, claude-opus-4-5, claude-sonnet-4-5, claude-haiku-4-5, claude-4-sonnet, claude-3-7-sonnet — but this drifts, which is why the bridge probes rather than hardcoding.)

--coco (Cortex Code mode)

By default cc-sf proxies to Snowflake's public Anthropic-compatible endpoint (/api/v2/cortex/v1/messages). With --coco, it instead routes traffic through the undocumented /api/v2/cortex/agent:run endpoint that Snowflake's own cortex CLI uses, so usage is attributed to Cortex Code (cortex_code_cli) for billing.

cc-sf --coco                                   # interactive, coco mode
cc-sf --coco -- --model claude-opus-4-7        # pick a model, coco mode

What this changes under the hood:

Login via /session/v1/login-request with CLIENT_ENVIRONMENT.APPLICATION = cortex_code_cli and a matching QUERY_TAG, yielding a short-lived session token (re-login when it expires).
Upstream calls use Authorization: Snowflake Token="…", User-Agent: cortex_code_cli/1.0.0, body fields origin_application: coding_agent and experimental.CodingAgent.OriginApplication: snova.
Anthropic Messages request/SSE is translated in both directions (Anthropic ↔ Cortex agent:run). Claude Code sees a normal SSE stream; Cortex sees a native-looking agent:run call.

Model discovery (separate from the default /v1/messages catalog):

On startup, cc-sf --coco probes the full candidate list against agent:run and caches the result at ~/.cache/cc-sf/coco-models.json (24 h TTL, per-account). --refresh-models refreshes both caches.
agent:run's catalog is a subset of what /v1/messages serves — as of Apr 2026 it lacks -4-7 and Haiku. Any Claude Code request for a model outside the coco catalog is downshifted within the same family first (e.g. claude-opus-4-7 → claude-opus-4-6). If no in-family match exists (e.g. claude-haiku-4-5, claude-3-7-sonnet), the model is rewritten to "auto" so Cortex picks a served option instead of erroring.
If a request still gets rejected at runtime (e.g. catalog drift between probe and request), the bridge parses the Available models: ... list from the error, writes it to the cache, and auto-retries once with a freshly-remapped model. The user sees one coherent response.
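The in-family downshift with an "auto" fallback might look like the sketch below. `remapForCoco` and `parseModelId` are hypothetical names, the catalog is passed in rather than probed, and the sketch assumes any [1m] or dated suffix has already been stripped from the requested ID.

```typescript
interface ParsedModel {
  family: string;
  major: number;
  minor: number;
}

// Parses IDs shaped like claude-<family>-<major>-<minor>; anything else is undefined.
function parseModelId(id: string): ParsedModel | undefined {
  const m = /^claude-([a-z]+)-(\d+)-(\d+)$/.exec(id);
  return m ? { family: m[1] ?? "", major: Number(m[2]), minor: Number(m[3]) } : undefined;
}

// Picks the newest catalog entry in the same family that is not newer than
// the request; falls back to "auto" when the family has no usable entry.
function remapForCoco(requested: string, cocoCatalog: readonly string[]): string {
  if (cocoCatalog.includes(requested)) return requested;
  const want = parseModelId(requested);
  if (!want) return "auto";

  let best: { id: string; model: ParsedModel } | undefined;
  for (const id of cocoCatalog) {
    const have = parseModelId(id);
    if (!have || have.family !== want.family) continue;
    const notNewer =
      have.major < want.major || (have.major === want.major && have.minor <= want.minor);
    if (!notNewer) continue;
    const better =
      !best ||
      have.major > best.model.major ||
      (have.major === best.model.major && have.minor > best.model.minor);
    if (better) best = { id, model: have };
  }
  return best ? best.id : "auto";
}
```

With a catalog that lacks `-4-7` and Haiku, `claude-opus-4-7` resolves to `claude-opus-4-6`, while `claude-haiku-4-5` and `claude-3-7-sonnet` both resolve to "auto".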

Requirements & caveats:

SNOWFLAKE_JWT connections only (OAUTH is rejected — session login needs the keypair).
agent:run is undocumented; Snowflake can change the shape without notice.
Client-side tools are passed through as client_mcp tool specs (generic pass-through), not mapped to Cortex's built-in bash/read/grep/… catalog. Tool use works; per-tool billing attribution may differ from the real Cortex Code client.
cache_control, thinking, and other Anthropic-specific fields are not guaranteed to round-trip.
/v1/messages/count_tokens still uses the public endpoint (no agent:run equivalent).
Set CC_SF_DEBUG_SSE=1 to tee raw upstream SSE to stderr — useful when agent:run's shape drifts.

--override

Claude Code's /model picker shows its canonical entries (Opus/Sonnet/Haiku). If you'd rather have every pick resolve to the latest live version for that family on Cortex (e.g. so selecting any "Sonnet" uses claude-sonnet-4-6 even if Claude Code lists it as 4.5), pass --override:

cc-sf --override

On startup cc-sf reads ~/.claude/settings.json (or $CLAUDE_CONFIG_DIR/settings.json), merges its computed modelOverrides (old‑version IDs → latest of that family), and writes a sidecar backup at settings.json.cc-sf.bak. On exit (normal, SIGINT, SIGTERM), the original file is restored byte‑for‑byte. If cc-sf crashes before cleanup, the next invocation detects the sidecar and restores automatically. Existing keys outside modelOverrides are preserved; any pre‑existing entries in modelOverrides that don't collide are left in place.
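The merge step can be illustrated with a small pure function. `mergeModelOverrides` and the `Settings` type are hypothetical, and letting the computed entry win on a collision is an assumption inferred from the statement that only non-colliding entries are left in place. Backup and restore of the file itself are not shown.

```typescript
interface Settings {
  modelOverrides?: Record<string, string>;
  [key: string]: unknown; // every other settings.json key is carried through untouched
}

// Returns a new settings object; the input is not mutated.
function mergeModelOverrides(settings: Settings, computed: Record<string, string>): Settings {
  return {
    ...settings,
    modelOverrides: {
      ...(settings.modelOverrides ?? {}),
      ...computed, // assumption: computed entries replace colliding ones
    },
  };
}
```

Because the function returns a fresh object, the original settings can still be written back unchanged when the session ends.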

Known gaps

Not yet exercised (may work, not verified): cache_control, thinking blocks, /v1/messages/count_tokens.

--coco mode specifically has a narrower feature surface than the public /v1/messages shim — see the caveats under --coco.

License

MIT