stillrunning-vercel-ai-sdk
v0.1.0
One-line monitoring for the Vercel AI SDK. Auto-pings StillRunning on every generateText / streamText / generateObject run with duration, tokens, cost, model, and tool-call counts.
Monitoring for the Vercel AI SDK, in one line.
Wrap your generateText / streamText / generateObject calls and every run reports its
duration, token usage, estimated cost, model, and tool-call count to a
StillRunning workflow. Get alerted the moment an agent fails, runs too
long, or costs too much, without writing any ping plumbing.
```sh
npm install stillrunning-vercel-ai-sdk
```

30-second quickstart
1. Create a workflow at stillrunning.ai/app/new and copy its ping token.
2. Set it as an env var:

   ```sh
   STILLRUNNING_TOKEN=your_token_here
   ```

3. Swap your ai import for the StillRunning client:

   ```ts
   import { stillrunning } from 'stillrunning-vercel-ai-sdk'
   import { openai } from '@ai-sdk/openai'

   const { generateText } = stillrunning() // reads STILLRUNNING_TOKEN

   const { text } = await generateText({
     model: openai('gpt-4o'),
     prompt: 'Summarize today’s standup notes.',
   })
   ```

That’s it. Every call now shows up in StillRunning with cost, tokens, and timing, and you get an alert if a run fails, stalls, or spikes in cost.
What gets captured
On each run the SDK sends a ping with:
| Field | Source |
| ------------ | ----------------------------------------------------------------- |
| durationMs | wall-clock time of the call |
| tokensIn | result.totalUsage.inputTokens (aggregated across all steps) |
| tokensOut | result.totalUsage.outputTokens |
| costUsd | estimated from a built-in pricing table (override-able) |
| model | result.response.modelId |
| toolCalls | total tool calls across every step |
| traceId | groups one logical run (auto-generated, or set via withTrace) |
| metadata | { finishReason, steps } |
A failed call sends a fail ping with the error message, then rethrows the original error
unchanged. Monitoring never alters your control flow, and a ping that fails to send never throws
into your code.
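
The behavior described above (report the outcome, rethrow the original error, never let a failed ping escape) can be sketched as a small wrapper. This is a minimal illustration, not the package's implementation: `monitored`, `Ping`, and `SendPing` are hypothetical names, and the ping carries only a subset of the fields in the table.

```typescript
// Hypothetical shapes for illustration; not the package's actual types.
type Ping =
  | { status: 'success'; durationMs: number }
  | { status: 'fail'; durationMs: number; error: string }

type SendPing = (ping: Ping) => Promise<void>

// Wraps any async call so it reports its outcome without changing it.
async function monitored<T>(call: () => Promise<T>, sendPing: SendPing): Promise<T> {
  const startedAt = Date.now()

  // Ping delivery is best-effort: a failed send is swallowed, never rethrown.
  const safePing = async (ping: Ping): Promise<void> => {
    try {
      await sendPing(ping)
    } catch {
      // ignore delivery failures so monitoring cannot break the caller
    }
  }

  try {
    const result = await call()
    await safePing({ status: 'success', durationMs: Date.now() - startedAt })
    return result
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err)
    await safePing({ status: 'fail', durationMs: Date.now() - startedAt, error: message })
    throw err // the original error object, unchanged
  }
}
```

Because `safePing` never throws, the outer `catch` only ever sees errors from the wrapped call itself, which is what keeps the caller's control flow untouched.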
Streaming
streamText is handled too. The success ping fires when the stream finishes, and your own
onFinish / onError callbacks are preserved:
```ts
import { stillrunning } from 'stillrunning-vercel-ai-sdk'
import { openai } from '@ai-sdk/openai'

const { streamText } = stillrunning()

const result = streamText({
  model: openai('gpt-4o'),
  prompt: 'Write a haiku about uptime.',
  onFinish: ({ text }) => console.log('done:', text), // still called
})

for await (const chunk of result.textStream) process.stdout.write(chunk)
```

Grouping multi-step agent runs with withTrace
By default each call is its own run (one traceId). When an agent makes several model calls that
are really one logical execution, wrap them so they share a trace, and StillRunning stitches them
into a single outcome chain:
```ts
import { stillrunning, withTrace } from 'stillrunning-vercel-ai-sdk'
import { openai } from '@ai-sdk/openai'

const sr = stillrunning()
const model = openai('gpt-4o')

await withTrace(async () => {
  await sr.generateText({ model, prompt: 'plan the task' })
  await sr.generateText({ model, prompt: 'execute step 1' })
  await sr.generateText({ model, prompt: 'execute step 2' })
}) // all three pings share one traceId
```

You can pass an explicit traceId / parentRunId for nested agents: withTrace(fn, { traceId, parentRunId }).
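
For intuition, trace sharing of this kind is commonly built on Node's `AsyncLocalStorage`, which the requirements list as a runtime dependency. The sketch below is an assumption about how such grouping could work, not the package's source; `withTraceSketch`, `currentTraceId`, and `TraceContext` are hypothetical names.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'
import { randomUUID } from 'node:crypto'

// Hypothetical context shape; the real package may track more than this.
interface TraceContext {
  traceId: string
  parentRunId?: string
}

const traceStore = new AsyncLocalStorage<TraceContext>()

// Runs fn with a shared trace context visible to everything it calls or awaits.
function withTraceSketch<T>(fn: () => T, opts: Partial<TraceContext> = {}): T {
  const ctx: TraceContext = { ...opts, traceId: opts.traceId ?? randomUUID() }
  return traceStore.run(ctx, fn)
}

// What a wrapped call could consult when building its ping:
// inside a trace it reuses the shared id, otherwise it mints a fresh one.
function currentTraceId(): string {
  return traceStore.getStore()?.traceId ?? randomUUID()
}
```

Every call made inside the callback, including ones reached after an `await`, reads the same stored context, so no trace id has to be threaded through function arguments.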
Cost estimation
Cost is estimated from token counts and a built-in pricing table covering current Claude, GPT, and Gemini models. The estimate is intentionally approximate: it powers relative cost-anomaly detection (a 5x spike is a 5x spike regardless of the exact rate) and gives a ballpark spend figure. For exact accounting:
```ts
// Full control:
const sr = stillrunning({
  computeCost: ({ model, inputTokens, outputTokens }) =>
    myExactPricing(model, inputTokens, outputTokens),
})

// Or extend / override the built-in table:
import { registerModelPricing } from 'stillrunning-vercel-ai-sdk'

registerModelPricing([[/my-custom-model/, { input: 1.5, output: 6 }]]) // USD per 1M tokens
```

Unknown models simply send no cost rather than a wrong one.
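
To make the arithmetic concrete, here is a minimal sketch of a per-1M-token estimate with a pattern-matched table. It assumes the lookup works roughly this way; `pricing` and `estimateCostUsd` are hypothetical names, and the single table entry reuses the illustrative rates from the snippet above rather than any real model's prices.

```typescript
// Rates are USD per 1M tokens, the same unit registerModelPricing uses.
interface Rate {
  input: number
  output: number
}

// Hypothetical table for illustration only.
const pricing: Array<[RegExp, Rate]> = [
  [/my-custom-model/, { input: 1.5, output: 6 }],
]

function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number | undefined {
  const match = pricing.find(([pattern]) => pattern.test(model))
  if (!match) return undefined // unknown model: report no cost rather than a wrong one
  const [, rate] = match
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000
}

// 1,000,000 input tokens at $1.50 plus 500,000 output tokens at $6.00 per 1M
console.log(estimateCostUsd('my-custom-model', 1_000_000, 500_000)) // 4.5
console.log(estimateCostUsd('some-other-model', 10, 10)) // undefined
```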
Configuration
```ts
stillrunning({
  token,         // ping token; defaults to process.env.STILLRUNNING_TOKEN
  baseUrl,       // defaults to https://stillrunning.ai
  computeCost,   // (input) => number | undefined; overrides cost estimation
  awaitPing,     // default true; set false for lowest latency (fire-and-forget)
  pingTimeoutMs, // default 3000
  onError,       // (err) => void; observes ping delivery failures
  fetch,         // custom fetch (testing / non-global-fetch runtimes)
})
```

By default the ping is awaited so it delivers reliably on serverless, adding the ping's round-trip
(a single small POST, hard-bounded by pingTimeoutMs) to a non-streaming call's return. A slow or
down StillRunning can therefore add up to pingTimeoutMs to a call but never hangs your agent; set
awaitPing: false for zero added latency (fire-and-forget). For streamText, your own onFinish
always runs before the ping, so streaming consumers are never gated on StillRunning.
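
The latency bound described above can be pictured as a race between the ping and a timer. The following is a sketch under assumptions, not the package's code: `boundedPing` is a hypothetical helper, and `send` stands in for whatever actually performs the POST.

```typescript
// Hypothetical helper: resolves when the ping settles or the timeout elapses,
// whichever comes first, and never rejects.
async function boundedPing(
  send: (signal: AbortSignal) => Promise<void>,
  timeoutMs: number,
  onError?: (err: unknown) => void,
): Promise<void> {
  const controller = new AbortController()
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`ping timed out after ${timeoutMs}ms`)),
      timeoutMs,
    )
  })
  try {
    await Promise.race([send(controller.signal), timeout])
  } catch (err) {
    onError?.(err) // observe the failure, but never rethrow it
  } finally {
    clearTimeout(timer)
    controller.abort() // cancels the request if it is still in flight
  }
}
```

Awaiting `boundedPing(...)` corresponds to the default behavior, where the caller waits at most `timeoutMs`. Calling it as `void boundedPing(...)` without awaiting corresponds to fire-and-forget, which adds no latency but may be cut short if a serverless function is frozen right after returning.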
Requirements
- Node 18+ (or any runtime with fetch and AsyncLocalStorage)
- ai (Vercel AI SDK) v5 or later, as a peer dependency
License
MIT
