@auth-craft/aws-cf-stack
v1.6.4
Self-contained, versioned distribution of the Auth Craft AWS (DynamoDB + Lambda) + Cloudflare gateway stack. Bundles prebuilt Lambda/worker artifacts + CDK app so consumers deploy without cloning auth-craft.
Self-contained, one-command deploy for the Auth Craft Lambda + Cloudflare gateway stack. The package bundles prebuilt artifacts (the Lambda bundle, the 3-worker gateway bundle, the CDK app) and the super-admin CLI, so a consumer can deploy without cloning or building the auth-craft repo — just pass env and run.
"Self-contained" = no auth-craft repo, no private packages; it does not mean zero-dependency. All auth-craft code is prebuilt into
assets/ (lambda / gateway worker / admin CLI), so none of the internal/private workspace packages are needed at install time. The only runtime dependencies are the public aws-cdk-lib + constructs (the bundled cdk/ app is TypeScript that imports them at synth/deploy time; they're regular npm packages that npm i pulls in automatically). The aws-cdk + wrangler CLIs are optional peers (the toolchain you already have on the deploying machine). devDependencies don't ship.
What it deploys
- DynamoDB + Lambda (CDK): the auth backend. Emits a Function URL + a base path.
- Super-admin (idempotent): bootstraps the __system__ tenant owner.
- Gateway worker(s) on Cloudflare, via wrangler. Default multi = 3 workers (system / tenant / customer); CF_GATEWAY_MODE=shared = 1 worker. See Deploy model.
The gateway's BACKEND_URL is computed as FunctionURL + ApiBasePath read back from the
live CDK outputs of the same deploy — so the worker config can never drift from the
Lambda. The same LAMBDA_GATEWAY_SECRET / LAMBDA_JWT_PUBLIC_KEY feed both sides from one
process env.
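The BACKEND_URL derivation described above could be sketched as follows. This is a minimal illustration, not the package's actual code: the function name joinBackendUrl and the slash normalization are assumptions, and only the output key names LambdaFunctionUrl and ApiBasePath come from this README.

```typescript
// Hypothetical sketch: derive the gateway's BACKEND_URL from the api-stack outputs.
// Only the two output key names come from this README; the rest is illustrative.
interface ApiStackOutputs {
  LambdaFunctionUrl: string; // e.g. "https://abc123.lambda-url.us-east-1.on.aws/"
  ApiBasePath: string;       // e.g. "/auth-x1"
}

function joinBackendUrl(outputs: ApiStackOutputs): string {
  // Drop trailing slashes from the Function URL and surrounding slashes from
  // the base path, so the join never produces "//" or a dangling "/".
  const base = outputs.LambdaFunctionUrl.replace(/\/+$/, "");
  const path = outputs.ApiBasePath.replace(/^\/+|\/+$/g, "");
  return path === "" ? base : `${base}/${path}`;
}
```

Because both inputs are read from the same live deploy, the worker config and the Lambda cannot point at different base paths.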
Naming convention
Everything is named from --stage (dev|staging|prod) and --project (default
default). With the default project, app = auth-craft; otherwise app = auth-craft-<project>.
| Resource | Name |
|----------|------|
| CDK stacks | <app>-<stage>-data, <app>-<stage>-api |
| Lambda fn + DynamoDB table | <app>-<stage> |
| Gateway workers (multi) | <app>-<stage>-system / -tenant / -customer |
| Gateway worker (shared) | <app>-<stage> |
The outputs and gateway commands therefore look up the api stack <app>-<stage>-api. The CDK output keys are
LambdaFunctionUrl, ApiBasePath, AuthSystemName, DynamoDBTableName.
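The naming rules in the table can be expressed as a small function. This is a sketch for orchestrators that need to predict resource names; resourceNames and the ResourceNames shape are hypothetical, while the name patterns themselves are taken from the table above.

```typescript
type Stage = "dev" | "staging" | "prod";
type GatewayMode = "multi" | "shared";

interface ResourceNames {
  app: string;
  dataStack: string;
  apiStack: string;
  lambdaAndTable: string; // Lambda function and DynamoDB table share one name
  workers: string[];
}

// Hypothetical helper mirroring the naming table; not part of the package API.
function resourceNames(stage: Stage, project = "default", mode: GatewayMode = "multi"): ResourceNames {
  const app = project === "default" ? "auth-craft" : `auth-craft-${project}`;
  const base = `${app}-${stage}`;
  const workers =
    mode === "shared"
      ? [base]
      : ["system", "tenant", "customer"].map((scope) => `${base}-${scope}`);
  return { app, dataStack: `${base}-data`, apiStack: `${base}-api`, lambdaAndTable: base, workers };
}
```

For example, resourceNames("dev", "shop").apiStack evaluates to auth-craft-shop-dev-api.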
Deploy model (multi vs shared)
CF_GATEWAY_MODE picks how the gateway is deployed:
- multi (default): 3 workers, one scope each. Each worker has a fixed AUTH_SCOPE, so the scope is enforced server-side: a customer worker can never act as system, whatever path the client sends. Strongest isolation; 3 deploys + 3 Service Bindings.
- shared: 1 worker. Scope is derived from the path's first segment (/system/... → system), so the worker by itself does not enforce scope (a client could send /system/...). Simpler: 1 deploy, 1 binding, unioned CORS. Pair it with a scope-locked Pages proxy (createPagesAuthProxy({ scope: 'customer' }) from @auth-craft/pages-auth-proxy) so the proxy prepends the scope and the client never picks it; that restores the "customer can't reach system" guarantee at the edge.
In shared mode CORS unions all per-scope origins/suffixes, and the dynamic-CORS KV uses
CF_CORS_KV_ID_SHARED (falling back to CF_CORS_KV_ID_CUSTOMER). outputs.workerNames.*
all point at the single worker.
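To see why shared mode needs a scope-locked proxy, here is a minimal sketch of the two behaviours described above. The helpers scopeFromPath and lockPath are hypothetical names used only for illustration; the real worker and proxy may implement this differently.

```typescript
type Scope = "system" | "tenant" | "customer";
const SCOPES: Scope[] = ["system", "tenant", "customer"];

// Shared mode: the scope is whatever the first path segment says,
// so the client effectively chooses it.
function scopeFromPath(pathname: string): Scope | null {
  const first = pathname.split("/").filter((segment) => segment.length > 0)[0];
  for (const scope of SCOPES) {
    if (scope === first) return scope;
  }
  return null;
}

// Scope-locked proxy: the proxy prepends its own fixed scope,
// so the client-supplied path can never select a different one.
function lockPath(scope: Scope, clientPath: string): string {
  const rest = clientPath.charAt(0) === "/" ? clientPath : `/${clientPath}`;
  return `/${scope}${rest}`;
}
```

With a customer-locked proxy, a client that sends /system/auth/login ends up at /customer/system/auth/login, which still resolves to the customer scope.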
Commands
| Command | Does |
|---------|------|
| lambda | Deploy the DynamoDB + Lambda CDK stacks. --outputs-file <p> writes the raw CDK outputs JSON. Runs a best-effort /health check. |
| admin | Create the super-admin (idempotent; skips if LAMBDA_SUPER_ADMIN_EMAIL/LAMBDA_SUPER_ADMIN_PASSWORD unset; "already exists" = success). |
| gateway | Deploy the gateway worker(s) — 3 (multi, default) or 1 (CF_GATEWAY_MODE=shared). Re-reads the Lambda outputs from CloudFormation if run as a separate invocation. |
| all | lambda → admin → gateway, one run. |
| outputs | Print the live api-stack outputs as JSON (no deploy): backendUrl, apiBasePath, dynamodbTable, and the auth keys read from the live Lambda config (jwtPublicKey, jwtIssuer, jwtAlgorithm, serviceJwtPublicKey) + gatewayUrls/routePrefixes. Self-contained — works standalone (no env), so an orchestrator can auto-fill its *_AUTH_* vars. |
Requirements (on the deploying machine)
- aws CLI authenticated (or AWS_* creds in env), cdk (or npx cdk), wrangler, node ≥ 20, jq, curl.
- AWS + Cloudflare credentials.
Usage
```sh
# everything, in order
LAMBDA_JWT_PRIVATE_KEY=... LAMBDA_JWT_PUBLIC_KEY=... LAMBDA_GATEWAY_SECRET=... \
CLOUDFLARE_API_TOKEN=... CLOUDFLARE_ACCOUNT_ID=... CF_EDGE_GATE_SECRET=... \
  npx auth-craft-stack all --stage dev --region us-east-1

# or step by step
npx auth-craft-stack lambda  --stage dev
npx auth-craft-stack admin   --stage dev
npx auth-craft-stack gateway --stage dev
```
Inputs come from flags or environment variables (flag wins, then env, then
--env-file, then default). See .env.example for the full contract.
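The precedence order above (flag, then env, then --env-file, then default) amounts to a first-match lookup. The sketch below assumes every source has already been normalized to the same key names and treats an empty string as unset; both are assumptions, and resolveInput is a hypothetical name.

```typescript
type Inputs = Record<string, string | undefined>;

// Hypothetical resolver: return the first non-empty value in precedence order.
function resolveInput(
  name: string,
  flags: Inputs,
  env: Inputs,
  envFile: Inputs,
  defaults: Inputs,
): string | undefined {
  for (const source of [flags, env, envFile, defaults]) {
    const value = source[name];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}
```

A value exported in the shell therefore overrides the same key in an env file, and an explicit flag overrides both.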
Minimum required inputs
| Stage | Required |
|-------|----------|
| lambda | LAMBDA_JWT_PRIVATE_KEY, LAMBDA_JWT_PUBLIC_KEY, and LAMBDA_GATEWAY_SECRET (required for staging/prod; optional in dev). AWS creds in env. |
| admin | LAMBDA_SUPER_ADMIN_EMAIL, LAMBDA_SUPER_ADMIN_PASSWORD (else skipped). AWS creds. |
| gateway | Hard-required: CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID. Needed in practice (pushed as worker secrets; empty ones are skipped but the gateway then rejects clients / can't reach the backend): CF_EDGE_GATE_SECRET, LAMBDA_GATEWAY_SECRET, LAMBDA_JWT_PUBLIC_KEY. |
Set LAMBDA_APP_BASE_PATH to a stable value — if unset, a random base path is generated
every deploy, which breaks any client that has the path baked in. Set
CF_WORKERS_SUBDOMAIN so the gateway URLs can be printed (otherwise workers.dev URLs aren't
known to the script). Everything else has a sensible default — see .env.example.
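An orchestrator may want to check these inputs before invoking a stage. The following pre-flight sketch encodes only the hard requirements from the table above; missingInputs is a hypothetical helper, not something the package exports.

```typescript
type Command = "lambda" | "admin" | "gateway";
type Env = Record<string, string | undefined>;

// Hypothetical pre-flight check: list hard-required inputs that are unset or empty.
function missingInputs(command: Command, stage: string, env: Env): string[] {
  let required: string[] = [];
  if (command === "lambda") {
    required = ["LAMBDA_JWT_PRIVATE_KEY", "LAMBDA_JWT_PUBLIC_KEY"];
    // The gateway secret is optional in dev but required for staging/prod.
    if (stage !== "dev") required.push("LAMBDA_GATEWAY_SECRET");
  } else if (command === "gateway") {
    required = ["CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_ID"];
  }
  // "admin" has no hard requirements: it is skipped when its inputs are unset.
  return required.filter((name) => !env[name]);
}
```

The "needed in practice" gateway secrets are deliberately left out here, since the deploy itself only skips them rather than failing.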
Driving the deploy yourself (orchestrator controls the steps)
You don't have to use all. The sub-commands are independent, so an orchestrator can run
each stage, insert its own work in between, and read back the live outputs — without
re-implementing any of the wiring (BACKEND_URL = FunctionUrl+ApiBasePath, the 3-worker
fan-out, secrets) yourself. Each stage is idempotent and gateway re-reads the Lambda
outputs from CloudFormation when run as a separate invocation.
```sh
# 1) Deploy just the backend; capture its outputs as JSON for your own logic.
npx auth-craft-stack lambda --stage "$STAGE" --outputs-file /tmp/cdk.json

# 2) Read what you need (no re-deploy). `outputs` prints machine-readable JSON,
#    including the derived backendUrl you'd otherwise have to compute by hand.
eval "$(npx auth-craft-stack outputs --stage "$STAGE" \
  | jq -r '"AUTH_BACKEND_URL=\(.backendUrl) AUTH_TABLE=\(.dynamodbTable)"')"
# → now AUTH_BACKEND_URL / AUTH_TABLE are yours to use in your own steps.

# 3) Do your own thing here (deploy a sibling stack, run migrations, gate on a check…).

# 4) Then finish the auth stages whenever you want.
npx auth-craft-stack admin   --stage "$STAGE"
npx auth-craft-stack gateway --stage "$STAGE"
```
If you need even more control (e.g. deploy the CDK app inside your own pipeline, or use
Terraform instead of wrangler for the workers), the prebuilt artifacts are addressable
directly inside the installed package — assets/lambda/index.mjs,
assets/gateway/worker.js, and the self-contained cdk/ app — so you can point your own
tooling at them. The sub-commands above are the supported, drift-safe path; reach for the
raw artifacts only when you deliberately want to own the wiring.
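Step 2 above reads the outputs JSON with jq. An orchestrator written in TypeScript could do the same mapping in-process, roughly as sketched here. The field names come from the outputs command description; the toAuthEnv helper and the <PREFIX>_AUTH_* variable names are hypothetical.

```typescript
// Subset of the `outputs` JSON used by this sketch.
interface StackOutputs {
  backendUrl: string;
  apiBasePath: string;
  dynamodbTable: string;
  jwtPublicKey: string;
  jwtIssuer: string;
}

// Hypothetical mapper: turn the outputs JSON into the orchestrator's own env names.
function toAuthEnv(outputsJson: string, prefix: string): Record<string, string> {
  const out = JSON.parse(outputsJson) as StackOutputs;
  return {
    [`${prefix}_AUTH_BACKEND_URL`]: out.backendUrl,
    [`${prefix}_AUTH_BASE_PATH`]: out.apiBasePath,
    [`${prefix}_AUTH_TABLE`]: out.dynamodbTable,
    [`${prefix}_AUTH_JWT_PUBLIC_KEY`]: out.jwtPublicKey,
    [`${prefix}_AUTH_JWT_ISSUER`]: out.jwtIssuer,
  };
}
```

The JSON string would come from the standard output of npx auth-craft-stack outputs --stage "$STAGE".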
Calling from an orchestrator (e.g. snapshot-commerce)
The orchestrator reads its own env file under any names it wants and maps them into the contract when invoking — auth-craft never sees the orchestrator's env names:
```sh
LAMBDA_JWT_PUBLIC_KEY="$SNAP_JWT_PUB" \
LAMBDA_GATEWAY_SECRET="$SNAP_GW_SECRET" \
CF_EDGE_GATE_SECRET="$SNAP_EDGE" CLOUDFLARE_ACCOUNT_ID="$SNAP_CF_ACCT" \
  npx auth-craft-stack all --stage "$STAGE"
```
CORS (per scope + dynamic allowlist)
CORS is resolved per scope (system / tenant / customer) — the 3 workers no longer
share one origin list. CF_ALLOWED_ORIGINS is the shared fallback; override a single
scope with CF_ALLOWED_ORIGINS_<SCOPE> (e.g. CF_ALLOWED_ORIGINS_SYSTEM=https://admin.example.com).
For customer (snapshot-commerce style — stores added continuously, static lists aren't viable), two dynamic mechanisms can be enabled independently. Neither requires a redeploy to add a store:
- Suffix match (static, deploy-time): allow any subdomain of a domain you own. Set CF_ALLOWED_ORIGIN_SUFFIXES_CUSTOMER=.snapshot.app → every *.snapshot.app origin is allowed. Zero lookup, zero latency. Best when stores are subdomains.
- KV allowlist (runtime, per-store custom domains): set the KV namespace id with CF_CORS_KV_ID_CUSTOMER. The worker then owns CRUD on the allowlist via a service route (Service JWT, typ=3). Your backend calls it when a store is created / removed, so no Cloudflare credentials ever leave the worker:
  ```
  PUT    /internal/cors-origins   { "origin": "https://shop.brand.com" }   # allow
  DELETE /internal/cors-origins   { "origin": "https://shop.brand.com" }   # revoke
  GET    /internal/cors-origins                                            # list
  ```
  Create the namespace once: wrangler kv namespace create CORS_KV → put its id in CF_CORS_KV_ID_CUSTOMER. A KV binding can't be set via --var, so the deploy appends a [[kv_namespaces]] block to a temp wrangler config for that scope only.
If neither is set, the customer worker falls back to its static ALLOWED_ORIGINS
(dynamic CORS off). System/tenant normally use only static origins.
Cookies: an allowed origin (static, suffix, or KV) is reflected with
Access-Control-Allow-Credentials: true, so credentialed requests work. A wildcard * never is (browsers forbid credentials with *); use explicit origins / suffix / KV when you need cookies. Cross-site cookies (store on a different registrable domain than the gateway) additionally need the backend to set SameSite=None; Secure on the cookie.
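The CORS behaviour described in this section can be summarized in one decision function. This is a simplified sketch under stated assumptions: the KV allowlist is modelled as an in-memory array, the check order is a guess, and corsHeaders and hostOf are hypothetical names rather than the worker's real code.

```typescript
interface CorsConfig {
  staticOrigins: string[]; // ALLOWED_ORIGINS for the scope; may contain "*"
  suffixes: string[];      // e.g. [".snapshot.app"]
  kvOrigins: string[];     // stand-in for the KV allowlist lookup
}

// Strip scheme and port so suffixes are matched against the host only.
function hostOf(origin: string): string {
  return origin.replace(/^https?:\/\//, "").replace(/:\d+$/, "");
}

function corsHeaders(origin: string, cfg: CorsConfig): Record<string, string> {
  const host = hostOf(origin);
  const explicitlyAllowed =
    cfg.staticOrigins.indexOf(origin) !== -1 ||
    cfg.kvOrigins.indexOf(origin) !== -1 ||
    cfg.suffixes.some((suffix) => suffix !== "" && host.slice(-suffix.length) === suffix);

  if (explicitlyAllowed) {
    // Explicit match: reflect the origin and allow credentials (cookies).
    return {
      "Access-Control-Allow-Origin": origin,
      "Access-Control-Allow-Credentials": "true",
    };
  }
  if (cfg.staticOrigins.indexOf("*") !== -1) {
    // Wildcard: never combined with credentials.
    return { "Access-Control-Allow-Origin": "*" };
  }
  return {}; // not on any allowlist: no CORS headers
}
```

Keeping the leading dot in the suffix matters: .snapshot.app matches store1.snapshot.app but not evilsnapshot.app.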
Access models & token transport
There are two ways a frontend reaches auth, and they can be mixed per scope against the same gateway + backend:
Model 1 — Direct to gateway (cross-origin) + token in body
The app calls the gateway worker on a different origin. Refresh token is returned in
the response body (client stores it; @auth-craft/client createBodyProvider). No
auth cookie → no cross-site cookie problem, immune to Safari ITP, and the customer
worker can run with ALLOWED_ORIGINS=* (Bearer needs no credentials → no dynamic CORS,
no KV). Trade-off: the refresh token lives in client storage (XSS-reachable) — mitigated
by rotation + short TTL. Good default for customer (many stores, many domains).
Model 2 — Cloudflare Pages reverse-proxy (same-origin) + cookie
If the app runs on Cloudflare Pages, mount @auth-craft/pages-auth-proxy so /{scope}/auth/*
is served on the app's own origin and forwarded to the gateway. The refresh cookie is
then first-party (HttpOnly; Secure; SameSite=Strict) — XSS can't read it, there's no
cross-site CORS, and it's immune to ITP. Best security; requires the app to be on Pages.
Good for system/tenant (your own admin apps).
Wiring (in the consumer's Pages project — not this stack). The Service Binding lives in
your Pages app's wrangler.toml and points at the gateway worker this stack deployed. The
worker names + a ready-to-paste snippet are in auth-craft-stack outputs:
```jsonc
// auth-craft-stack outputs →
{
  "workerNames": { "system": "auth-craft-prod-system", "tenant": "...", "customer": "auth-craft-prod-customer" },
  "pagesProxy": { "serviceBindingExample": "[[services]]\nbinding = \"AUTH_GATEWAY\"\nservice = \"auth-craft-prod-customer\"\n" }
}
```
In the Pages app:
```toml
# wrangler.toml (your Pages project): bind the gateway worker for the scope this app serves
[[services]]
binding = "AUTH_GATEWAY"
service = "auth-craft-prod-customer"   # from outputs.workerNames.<scope>
```
```ts
// functions/customer/[[path]].ts
import { createPagesAuthProxy } from '@auth-craft/pages-auth-proxy/pages';
export const onRequest = createPagesAuthProxy({ prefix: '/customer' });
```
Set EDGE_GATE_SECRET as a Pages secret (same value as the stack's CF_EDGE_GATE_SECRET).
With a Service Binding the worker→worker hop stays in-network (lowest latency; typically not
billed as a second request). If you can't use a binding, set AUTH_GATEWAY_URL to the
gateway URL from outputs.gatewayUrls.<scope> instead — that path is a normal billable
request. See @auth-craft/pages-auth-proxy.
Disable the public worker URL. With a Service Binding the worker is reached in-network,
so its public workers.dev URL only widens the attack surface (and bypasses the Pages
same-origin path). Set CF_WORKER_DISABLE_PUBLIC_URL=true (or per scope, e.g.
CF_WORKER_DISABLE_PUBLIC_URL_SYSTEM=true) and the deploy ships the worker with
workers_dev = false and preview_urls = false — turning off both the production
workers.dev subdomain and the per-version preview URLs (Cloudflare does not drop
preview URLs just because workers.dev is off, so both must be set; the deploy does this
for you). The worker then stays reachable only via its Service Binding (and any custom
domain). This applies only when no CF_WORKER_CUSTOM_DOMAIN_<SCOPE> is set for the scope;
if you're using AUTH_GATEWAY_URL (no binding) leave it off, since that path needs the
public URL.
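One way to read the rule above is as a per-scope decision that yields the two wrangler settings. In this sketch the per-scope variable taking precedence over the global one is an assumption, as is reporting both settings as true when the flag is off; exposureFor is a hypothetical name.

```typescript
type Scope = "system" | "tenant" | "customer";
type Env = Record<string, string | undefined>;

interface ExposureConfig {
  workers_dev: boolean;  // production workers.dev subdomain
  preview_urls: boolean; // per-version preview URLs
}

// Hypothetical sketch of the decision; the deploy script may differ in detail.
function exposureFor(scope: Scope, env: Env): ExposureConfig {
  const suffix = scope.toUpperCase();
  const flag = env[`CF_WORKER_DISABLE_PUBLIC_URL_${suffix}`] ?? env["CF_WORKER_DISABLE_PUBLIC_URL"];
  const hasCustomDomain = Boolean(env[`CF_WORKER_CUSTOM_DOMAIN_${suffix}`]);
  // The setting applies only when no custom domain is configured for the scope.
  const disable = flag === "true" && !hasCustomDomain;
  // Both URLs are switched together: turning off workers.dev alone leaves previews up.
  return { workers_dev: !disable, preview_urls: !disable };
}
```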
Skip the edge gate (optional). The X-Edge-Gate secret exists to stop the public
internet from hitting the worker. Once the public URL is disabled and the worker is
reached only via a Service Binding, that gate is redundant — set
CF_WORKER_SKIP_EDGE_GATE=true (or per scope) to drop it; the Pages proxy then needs no
EDGE_GATE_SECRET. For safety it's only honored when the public URL is also disabled
for that scope (otherwise ignored with a warning). It does not touch the
Worker → Backend secret: X-Gateway-Secret stays required because the backend (Lambda
Function URL) is still on the public internet — without it, anyone could call the Lambda
directly.
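The safety rule for skipping the edge gate can be sketched as a small guard. The decideEdgeGate name, the result shape, and the warning text are hypothetical; only the rule itself (honored only when the public URL is disabled, otherwise ignored with a warning) comes from the text above.

```typescript
interface EdgeGateDecision {
  skipEdgeGate: boolean;
  warning?: string;
}

// Hypothetical guard: skipping X-Edge-Gate is honored only when the
// scope's public URL is disabled as well.
function decideEdgeGate(skipRequested: boolean, publicUrlDisabled: boolean): EdgeGateDecision {
  if (!skipRequested) return { skipEdgeGate: false };
  if (!publicUrlDisabled) {
    return {
      skipEdgeGate: false,
      warning: "CF_WORKER_SKIP_EDGE_GATE ignored: the public URL is still enabled for this scope",
    };
  }
  // Note: this never affects X-Gateway-Secret on the worker-to-backend hop.
  return { skipEdgeGate: true };
}
```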
Per-scope token transport
REFRESH_TOKEN_STRATEGY sets the transport for every scope. To mix, use the per-scope
vars (any one of them switches to per-scope mode; unset scopes fall back to the global
value, else body):
```sh
REFRESH_TOKEN_STRATEGY_CUSTOMER=body     # storefront, many domains → body
REFRESH_TOKEN_STRATEGY_TENANT=cookie     # your tenant admin app    → cookie
REFRESH_TOKEN_STRATEGY_SYSTEM=cookie     # your system admin app    → cookie
```
For any scope using cookie, set trustedOrigins (the app's CORS origins) so the
backend's CSRF Origin check is active — SameSite is the primary defence, this is
defense-in-depth. See CORS for the origin config.
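The per-scope fallback for the token transport (per-scope variable, then the global value, then body) can be written as a short lookup. The strategyFor helper is hypothetical, and treating any unrecognized value as body is an assumption of this sketch.

```typescript
type Scope = "system" | "tenant" | "customer";
type Strategy = "body" | "cookie";
type Env = Record<string, string | undefined>;

// Hypothetical resolver: per-scope variable, then the global one, then "body".
function strategyFor(scope: Scope, env: Env): Strategy {
  const raw =
    env[`REFRESH_TOKEN_STRATEGY_${scope.toUpperCase()}`] ??
    env["REFRESH_TOKEN_STRATEGY"] ??
    "body";
  return raw === "cookie" ? "cookie" : "body";
}
```

With only REFRESH_TOKEN_STRATEGY_TENANT=cookie set, the tenant scope uses cookies while system and customer fall back to body.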
Cookie isolation does not come from CORS. It comes from first-party
SameSite + HttpOnly + the backend CSRF Origin check. Removing CORS (same-origin via Pages proxy, or Bearer in body) does not weaken access control.
Versioning
Pin @auth-craft/aws-cf-stack in your package.json like any dependency. The package version
is the auth-craft version you deploy (the Lambda/worker bundles are baked in at publish
time). Bump the version to deploy a newer auth-craft.
Troubleshooting
- "This CDK CLI is not compatible with the CDK library" / schema version mismatch: the aws-cdk CLI is older than aws-cdk-lib's cloud-assembly schema. Install a CLI that satisfies the package's aws-cdk peer range (≥ 2.1126.0 for aws-cdk-lib 2.258).
- Login through the gateway returns a basic JWT (no tid/perms): the worker isn't stamping X-Auth-Context. Each worker sets it from AUTH_SCOPE (this CLI sets that per scope), so check the deploy used the right worker for the scope you're calling.
- Backend returns a stealth 404: the worker's BACKEND_GATEWAY_SECRET doesn't match the Lambda's GATEWAY_SECRET. Both come from LAMBDA_GATEWAY_SECRET; deploy lambda and gateway from the same value.
- gateway can't read Lambda outputs: deploy the lambda stage first (it reads the live <app>-<stage>-api stack).
- Customer CORS rejects a store origin (no Access-Control-Allow-Origin): the origin isn't on any allowlist. Either it doesn't match CF_ALLOWED_ORIGIN_SUFFIXES_CUSTOMER, or it wasn't PUT into KV (/internal/cors-origins), or CF_CORS_KV_ID_CUSTOMER is unset so KV lookup is off. See CORS.
- Cookies not sent from a store on another domain: CORS being correct isn't enough for cross-site cookies; the backend must set SameSite=None; Secure. Same-site (store is a subdomain via suffix match) works with SameSite=Lax.
- Queue email silently not sent: with LAMBDA_EMAIL_MODE=queue, the SQS queue must exist before the lambda stage: pass LAMBDA_EMAIL_QUEUE_ARN / LAMBDA_EMAIL_QUEUE_URL of an already-deployed queue. The stack grants the Lambda sqs:SendMessage on that ARN, but it can't create a cross-stack queue. Deploy your email/queue stack first, then this one.
Email delivery modes
LAMBDA_EMAIL_MODE = console (default) | smtp | queue.
- queue (e.g. an existing SQS-backed email service): set LAMBDA_EMAIL_QUEUE_ARN + LAMBDA_EMAIL_QUEUE_URL (+ region/envelope, see .env.example). Ordering matters: the queue is a cross-stack resource this package does not create; deploy it first. The lambda stage then grants the auth Lambda sqs:SendMessage on that ARN automatically.
- smtp: set LAMBDA_SMTP_*.
- console: logs only (dev).
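Since a missing queue makes email fail silently, it can help to validate the queue settings before running the lambda stage. This sketch checks only the two queue variables named above; emailConfigErrors is a hypothetical helper, and the smtp variables are not checked because their exact names are not listed here.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical pre-deploy check for LAMBDA_EMAIL_MODE=queue.
function emailConfigErrors(env: Env): string[] {
  const mode = env["LAMBDA_EMAIL_MODE"] ?? "console"; // console is the default
  const errors: string[] = [];
  if (mode === "queue") {
    for (const name of ["LAMBDA_EMAIL_QUEUE_ARN", "LAMBDA_EMAIL_QUEUE_URL"]) {
      if (!env[name]) errors.push(`${name} must point at an already-deployed queue`);
    }
  }
  return errors;
}
```

An empty result says only that the variables are set; it cannot confirm that the queue itself has been deployed.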
How it's built (maintainers)
pnpm build (a.k.a. prepublishOnly) runs scripts/prebuild-assets.mjs, which fills assets/:
- Lambda: invokes the app's own build (pnpm --filter @auth-craft/auth-hono-dynamodb-lambda build) and copies its dist/index.mjs → assets/lambda/index.mjs. Using the app's build (not a re-implemented esbuild call) keeps the bundle identical to what the app ships.
- Gateway: esbuilds apps/cloudflare-auth-gateway/src/index.ts → assets/gateway/worker.js (ESM, node:* external, provided by the Workers runtime), the same way wrangler bundles internally. Uses the esbuild binary in the app's node_modules (the root .bin/esbuild symlink can be stale because esbuild is in pnpm ignoredBuiltDependencies).
- Admin CLI: esbuilds packages/auth-admin-cli/src/index.ts → assets/admin/cli.mjs, fully inlined (auth-core + plugins + pg + aws-sdk). It is bundled, not a dependency, because @auth-craft/auth-admin-cli is private and pulls in ~7 workspace packages; depending on it would break npm i for public consumers. The bundle runs with only node.
The bundled cdk/lib/lambda-stack.ts deploys the Lambda via
lambda.Code.fromAsset('../../assets/lambda') + handler index.handler (not NodejsFunction),
so no source build happens at the consumer's deploy time. It is a copy of the app's
infrastructure/lib/lambda-stack.ts — keep cdk/lib/*.ts in sync until a future refactor
unifies them.
