@runtime-judgement/vercel-ai
v0.1.0
Vercel AI SDK → Runtime Judgement bridge — one-line model wrapper that auto-instruments AI calls and sends traces for attribution. Zero breaking changes to your existing AI SDK usage.
Vercel AI SDK wrapper for Runtime Judgement — one line of code to instrument your AI calls and send traces for failure attribution and regression gating.
Installation
npm install @runtime-judgement/vercel-ai ai
# or
pnpm add @runtime-judgement/vercel-ai ai
ai (the Vercel AI SDK) is a peer dependency, so you likely already have it installed.
Quick start
import { rj } from "@runtime-judgement/vercel-ai"
import { openai } from "@ai-sdk/openai"
import { generateText } from "ai"
// Wrap once at startup — drop-in replacement for any LanguageModelV1
const model = await rj(openai("gpt-4o"), {
  apiKey: process.env.RJ_API_KEY!,
})
// Use exactly like a normal model — no other changes needed
const result = await generateText({ model, prompt: "Summarise this article..." })
Synchronous variant
If you prefer not to await the wrapper (e.g. in module initialisation code), use rjSync and pass in wrapLanguageModel from ai directly:
import { rjSync } from "@runtime-judgement/vercel-ai"
import { wrapLanguageModel } from "ai"
import { openai } from "@ai-sdk/openai"
const model = rjSync(openai("gpt-4o"), { apiKey: process.env.RJ_API_KEY! }, wrapLanguageModel)
Linking to a snapshot suite
Pass suiteId to automatically link every trace to a Runtime Judgement snapshot suite. When a trace arrives, RJ can auto-run the suite and flag regressions:
const model = await rj(openai("gpt-4o"), {
  apiKey: process.env.RJ_API_KEY!,
  suiteId: "01HZSUITE_ID_FROM_RJ_UI",
})
Configuration options
| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | required | Runtime Judgement API key. Generate at /app/settings/api-keys. |
| endpoint | string | https://runtime-judgement.app/api/ingest/otlp/v1/traces | OTLP ingest endpoint. Override for self-hosted deployments. |
| suiteId | string | — | Snapshot suite ID to link traces to. Enables auto-regression gating. |
| disabled | boolean | false | Set true to skip all trace sending (e.g. in unit tests or local dev). The model still works normally. |
| maxPromptChars | number | 1000 | Maximum prompt characters captured per call. Longer prompts are truncated. |
| maxCompletionChars | number | 2000 | Maximum completion characters captured per call. Longer completions are truncated. |
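The defaults in the table can be read as a small resolution step applied to whatever the caller passes. The following is a minimal sketch of that idea, not the package's actual source: the names RjOptions, ResolvedRjOptions, and resolveOptions are hypothetical, and only the option names and default values come from the table above.

```typescript
// Hypothetical type names; option names and defaults are taken from the table.
interface RjOptions {
  apiKey: string
  endpoint?: string
  suiteId?: string
  disabled?: boolean
  maxPromptChars?: number
  maxCompletionChars?: number
}

type ResolvedRjOptions = Required<Omit<RjOptions, "suiteId">> & { suiteId?: string }

const DEFAULT_ENDPOINT = "https://runtime-judgement.app/api/ingest/otlp/v1/traces"

// Fill in the documented default for every option the caller left out.
// `??` is used instead of `||` so that an explicit 0 or false is preserved.
function resolveOptions(options: RjOptions): ResolvedRjOptions {
  return {
    apiKey: options.apiKey,
    endpoint: options.endpoint ?? DEFAULT_ENDPOINT,
    suiteId: options.suiteId,
    disabled: options.disabled ?? false,
    maxPromptChars: options.maxPromptChars ?? 1000,
    maxCompletionChars: options.maxCompletionChars ?? 2000,
  }
}

const resolved = resolveOptions({ apiKey: "rj_example_key", maxPromptChars: 0 })
console.log(resolved.maxPromptChars, resolved.maxCompletionChars, resolved.disabled)
// prints: 0 2000 false
```

The nullish-coalescing operator matters here because maxPromptChars: 0 is a meaningful setting (capture no prompt text) and must not be replaced by the default.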
What data is captured
Each doGenerate (non-streaming) call sends one OTLP span with:
- gen_ai.system: inferred from the model name (e.g. "openai" for GPT models)
- gen_ai.request.model: the model ID string (e.g. "gpt-4o")
- gen_ai.usage.prompt_tokens: token count from the SDK response
- gen_ai.usage.completion_tokens: token count from the SDK response
- gen_ai.response.finish_reasons: e.g. ["stop"], ["length"]
- gen_ai.prompt: first maxPromptChars chars of the prompt (default 1000)
- gen_ai.output.value: first maxCompletionChars chars of the completion (default 2000)
- Start/end timestamps in nanosecond UNIX epoch format
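To make the attribute list concrete, here is a rough sketch of how one call could be laid out in the standard OTLP/JSON trace structure (resourceSpans, scopeSpans, spans). It is an illustration under assumptions, not the payload the package is confirmed to emit: the CallRecord type, the helper names, the span name "gen_ai.generate", and the scope name are all hypothetical. Only the attribute keys come from the list above.

```typescript
// Minimal OTLP/JSON attribute shapes (only the value kinds used here).
type AnyValue =
  | { stringValue: string }
  | { intValue: string }
  | { arrayValue: { values: AnyValue[] } }

interface KeyValue {
  key: string
  value: AnyValue
}

// Hypothetical record of one completed doGenerate call.
interface CallRecord {
  system: string
  model: string
  promptTokens: number
  completionTokens: number
  finishReason: string
  prompt: string
  completion: string
  startMs: number
  endMs: number
}

const str = (key: string, value: string): KeyValue => ({ key, value: { stringValue: value } })
const int = (key: string, value: number): KeyValue => ({ key, value: { intValue: String(value) } })

// Milliseconds to a nanosecond string. A string is used because the
// nanosecond value exceeds Number.MAX_SAFE_INTEGER.
const msToNanos = (ms: number): string => `${Math.trunc(ms)}000000`

// Random hex ID; OTLP uses 16-byte trace IDs and 8-byte span IDs.
function randomHex(byteCount: number): string {
  let out = ""
  for (let i = 0; i < byteCount; i++) {
    out += ("0" + Math.floor(Math.random() * 256).toString(16)).slice(-2)
  }
  return out
}

function buildOtlpPayload(call: CallRecord) {
  const attributes: KeyValue[] = [
    str("gen_ai.system", call.system),
    str("gen_ai.request.model", call.model),
    int("gen_ai.usage.prompt_tokens", call.promptTokens),
    int("gen_ai.usage.completion_tokens", call.completionTokens),
    {
      key: "gen_ai.response.finish_reasons",
      value: { arrayValue: { values: [{ stringValue: call.finishReason }] } },
    },
    str("gen_ai.prompt", call.prompt),
    str("gen_ai.output.value", call.completion),
  ]
  return {
    resourceSpans: [
      {
        scopeSpans: [
          {
            scope: { name: "example-scope" }, // hypothetical scope name
            spans: [
              {
                traceId: randomHex(16),
                spanId: randomHex(8),
                name: "gen_ai.generate", // hypothetical span name
                kind: 3, // SPAN_KIND_CLIENT
                startTimeUnixNano: msToNanos(call.startMs),
                endTimeUnixNano: msToNanos(call.endMs),
                attributes,
              },
            ],
          },
        ],
      },
    ],
  }
}
```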
What is NOT captured
- Tool call arguments: these are not currently captured, so any secrets or PII they contain are not sent. A redactFields option for fine-grained field redaction is planned for a follow-up sprint.
- Streaming calls: only doGenerate (non-streaming) is currently instrumented. doStream support is planned.
- Full prompts longer than 1000 chars: truncated to maxPromptChars (configurable).
- Full completions longer than 2000 chars: truncated to maxCompletionChars (configurable).
Privacy note
Prompt and completion text is captured and transmitted to the Runtime Judgement ingest endpoint. By default:
- Prompts are truncated to 1000 characters
- Completions are truncated to 2000 characters
You can reduce these limits further (maxPromptChars: 0 captures no prompt text at all), or set disabled: true in environments where you do not want any data transmitted.
Data is transmitted over HTTPS with Bearer token authentication. Runtime Judgement does not sell or share trace data. See our privacy policy for details.
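The truncation limits described in this section can be pictured as a single helper applied to both prompt and completion text before anything is sent. This is a minimal sketch: truncateForCapture is a hypothetical name, and counting "characters" as UTF-16 code units (what String.prototype.slice does) is an assumption about the package's behaviour.

```typescript
// Keep at most maxChars characters; a limit of 0 (or less) captures nothing.
// Note: slice counts UTF-16 code units, which is an assumption here.
function truncateForCapture(text: string, maxChars: number): string {
  if (maxChars <= 0) return ""
  return text.length > maxChars ? text.slice(0, maxChars) : text
}

console.log(truncateForCapture("Summarise this article", 9)) // prints: Summarise
console.log(truncateForCapture("Summarise this article", 0).length) // prints: 0
```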
How it works
rj() wraps your model using the Vercel AI SDK's wrapLanguageModel API with a custom middleware. The middleware:
- Calls the real model and returns the result immediately (no latency added to your application)
- Captures timing, prompt, completion, and token usage from the result
- Builds an OTLP JSON payload following the gen_ai.* semantic conventions
- POSTs the payload to the Runtime Judgement ingest endpoint in the background (fire-and-forget)
- Swallows any errors from the POST so telemetry failures never interrupt your agent
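The steps above can be sketched as a fire-and-forget wrapper around the model call. This is an illustrative outline, not the package's real middleware: the wrapGenerate name mirrors the hook of the same name in the AI SDK's middleware interface, but GenerateResult, DoGenerate, Send, and makeWrapGenerate are simplified local stand-ins, and the send function is injected so the sketch runs without any network access.

```typescript
// Simplified stand-ins for the SDK's result and call types (assumptions).
interface GenerateResult {
  text?: string
  finishReason: string
  usage: { promptTokens: number; completionTokens: number }
}

type DoGenerate = () => Promise<GenerateResult>
type Send = (payload: unknown) => Promise<void>

function makeWrapGenerate(send: Send) {
  return async function wrapGenerate({ doGenerate }: { doGenerate: DoGenerate }): Promise<GenerateResult> {
    const startMs = Date.now()
    const result = await doGenerate() // call the real model
    const endMs = Date.now()

    // Fire-and-forget: the send is not awaited, and any failure
    // (thrown synchronously or rejected later) is swallowed.
    void Promise.resolve()
      .then(() => send({ startMs, endMs, finishReason: result.finishReason, usage: result.usage }))
      .catch(() => {})

    return result // returned without waiting for the telemetry POST
  }
}

// Usage: the caller still gets its result even though sending fails.
const wrapWithBrokenSend = makeWrapGenerate(async () => {
  throw new Error("ingest endpoint unreachable")
})

wrapWithBrokenSend({
  doGenerate: async () => ({
    text: "hi",
    finishReason: "stop",
    usage: { promptTokens: 3, completionTokens: 1 },
  }),
}).then((r) => console.log(r.text)) // prints: hi
```

Wrapping the send in Promise.resolve().then(...) means that even a send function that throws synchronously is routed into the catch handler rather than into the caller.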
TypeScript support
This package ships full TypeScript definitions. The rj() wrapper preserves the generic type of the original model, so the returned model is typed identically to the input.
Testing your instrumentation
Set disabled: true in your test configuration to prevent calls to the Runtime Judgement endpoint:
const model = await rj(openai("gpt-4o"), {
  apiKey: "test-key",
  disabled: process.env.NODE_ENV === "test",
})
Or mock fetch in your test suite to capture the OTLP payloads.
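One way to mock fetch is to swap the global for a capturing stub for the duration of a test and restore it afterwards. The sketch below is framework-agnostic and makes several assumptions: the package is assumed to call the global fetch with a string URL and a JSON string body, and withCapturedFetch, CapturedCall, and fakeInstrumentedCall are hypothetical names. fakeInstrumentedCall stands in for a real generateText call made with the wrapped model.

```typescript
// Narrow view of fetch, assuming a string URL and a JSON string body.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean }>

interface CapturedCall {
  url: string
  headers: Record<string, string>
  body: unknown
}

const g = globalThis as unknown as { fetch: FetchLike }

// Replace global fetch with a capturing stub while `run` executes,
// then restore the original even if `run` throws.
async function withCapturedFetch<T>(run: () => Promise<T>): Promise<{ result: T; calls: CapturedCall[] }> {
  const original = g.fetch
  const calls: CapturedCall[] = []
  g.fetch = async (url, init) => {
    calls.push({ url, headers: init.headers, body: JSON.parse(init.body) })
    return { ok: true }
  }
  try {
    return { result: await run(), calls }
  } finally {
    g.fetch = original
  }
}

// Stand-in for an instrumented model call (hypothetical); a real test
// would call generateText with the wrapped model here instead.
async function fakeInstrumentedCall(): Promise<string> {
  await g.fetch("https://runtime-judgement.app/api/ingest/otlp/v1/traces", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: "Bearer test-key" },
    body: JSON.stringify({ resourceSpans: [] }),
  })
  return "done"
}
```

A test would then call withCapturedFetch(fakeInstrumentedCall) and assert on the returned calls array. Because the real POST is described as fire-and-forget, a test against the actual wrapper may need to wait briefly (for example, one timer tick) inside run before the payload shows up in calls.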
