@sarfarajey/vertex-client
v1.0.0
Streaming Vertex AI / Gemini client for edge runtimes — SSE token + tool-call events, retry, context caching. No google-cloud SDK.
Vertex AI / Gemini client built for edge runtimes. No @google-cloud/aiplatform SDK — pure fetch + Web Crypto. Runs in Cloudflare Workers, Vercel Edge, Deno, Bun, and Node 18+.
Includes:
- Streaming via SSE (token + tool-call events)
- Function calling
- Server-side context caching
- 429 backoff retry on the non-streaming path
Install
npm install @sarfarajey/vertex-client
# pulls @sarfarajey/gcp-edge-auth as a transitive dependency

Single-shot call
import { callVertexAI } from '@sarfarajey/vertex-client';

const text = await callVertexAI({
  serviceAccount: env.SERVICE_ACCOUNT_JSON,
  projectId: 'my-gcp-project',
  systemPrompt: 'You are a concise assistant.',
  userPrompt: 'In 3 bullets, what is RSVP?',
});

Defaults: model: 'gemini-2.5-flash', location: 'us-central1', maxOutputTokens: 1024, temperature: 0.4. Override any of them.
Throws on non-2xx, on MAX_TOKENS truncation, and on empty response.
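Because the call throws instead of returning partial output, callers may want to turn a throw into a value they can branch on. The following is a minimal sketch: `attempt` and `Result` are hypothetical names, not part of the package, and nothing is assumed about the shape of the thrown error beyond it being catchable.

```typescript
// Hypothetical helper, not part of @sarfarajey/vertex-client.
// Converts a thrown error from any promise-returning call into a tagged result.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

async function attempt<T>(call: () => Promise<T>): Promise<Result<T>> {
  try {
    return { ok: true, value: await call() };
  } catch (err) {
    // Normalize non-Error throws so callers can always read `.message`.
    const error = err instanceof Error ? err : new Error(String(err));
    return { ok: false, error };
  }
}

// Usage (with the options object from the example above):
//   const r = await attempt(() => callVertexAI(options));
//   if (!r.ok) console.error('Vertex call failed:', r.error.message);
```

Returning a tagged result keeps the non-2xx, truncation, and empty-response cases on one code path without a try/catch at every call site.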
Streaming
import { callVertexAIStream } from '@sarfarajey/vertex-client';

const events = callVertexAIStream({
  serviceAccount: env.SERVICE_ACCOUNT_JSON,
  projectId: 'my-gcp-project',
  contents: [
    { role: 'user', parts: [{ text: 'Tell me a joke.' }] },
  ],
});

for await (const e of events) {
  if (e.type === 'token') process.stdout.write(e.text);
  if (e.type === 'tool_call') console.log('\nTOOL:', e.name, e.args);
  if (e.type === 'done') break;
}

Event shapes:
{ type: 'token', text: string }
{ type: 'tool_call', name: string, args: object }
{ type: 'done' }

Function calling
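A tool_call event carries a tool name and its arguments, so one way to act on it is a lookup table of local handlers. The sketch below is illustrative only: `handlers`, `runToolCall`, and the stubbed weather result are hypothetical, and the `functionCall` / `functionResponse` part layout follows the Gemini REST format on the assumption that the package forwards `contents` unchanged.

```typescript
// Event shapes as documented in this README.
type VertexEvent =
  | { type: 'token'; text: string }
  | { type: 'tool_call'; name: string; args: Record<string, unknown> }
  | { type: 'done' };

// Hypothetical local tool registry; names and handlers are illustrative.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

const handlers: Record<string, ToolHandler> = {
  // Stubbed result; a real handler would call a weather service.
  get_weather: async (args) => ({ city: args.city, tempC: 21 }),
};

// Runs the handler for one tool_call event and returns the two turns to
// append to `contents` before calling the model again (assumed layout).
async function runToolCall(e: Extract<VertexEvent, { type: 'tool_call' }>) {
  const handler = handlers[e.name];
  if (!handler) throw new Error(`No handler registered for tool "${e.name}"`);
  const result = await handler(e.args);
  return [
    { role: 'model', parts: [{ functionCall: { name: e.name, args: e.args } }] },
    { role: 'user', parts: [{ functionResponse: { name: e.name, response: { result } } }] },
  ];
}
```

Throwing on an unknown tool name surfaces a mismatch between the declared tools and the local registry early, rather than sending an empty response back to the model.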
const tools = [{
  functionDeclarations: [{
    name: 'get_weather',
    description: 'Get current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  }],
}];

for await (const e of callVertexAIStream({
  serviceAccount, projectId,
  contents: [{ role: 'user', parts: [{ text: 'Weather in Tokyo?' }] }],
  tools,
})) {
  if (e.type === 'tool_call') {
    // run the tool, append its result to `contents`, call again…
  }
}

Context caching
For a long system prompt that is reused across many messages, create a cache and pass its resource name to subsequent calls — Gemini bills cached tokens at a discount.
import { createVertexCache, callVertexAIStream } from '@sarfarajey/vertex-client';

const cacheName = await createVertexCache({
  serviceAccount, projectId,
  systemPrompt: longSystemPromptHere,
  ttlSeconds: 600,
});

for await (const e of callVertexAIStream({
  serviceAccount, projectId,
  contents,
  cachedContent: cacheName, // takes precedence over systemPrompt
})) {
  // ...
}

createVertexCache returns null on failure, so check the result and fall back to an inline systemPrompt when no cache was created.
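The fallback can be kept in one place with a small helper that decides which prompt option to send. This is a sketch under the assumption that callVertexAIStream accepts either `cachedContent` or `systemPrompt`, as the precedence comment above suggests; `promptOptions` is a hypothetical name, not part of the package.

```typescript
// Hypothetical helper, not part of the package: picks the cached prompt when
// cache creation succeeded and the inline prompt when it returned null.
type PromptOptions = { cachedContent: string } | { systemPrompt: string };

function promptOptions(cacheName: string | null, systemPrompt: string): PromptOptions {
  return cacheName !== null ? { cachedContent: cacheName } : { systemPrompt };
}

// Usage:
//   callVertexAIStream({
//     serviceAccount, projectId, contents,
//     ...promptOptions(cacheName, longSystemPromptHere),
//   });
```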
Thinking budget
callVertexAIStream({
  serviceAccount, projectId,
  contents,
  thinkingBudget: 1024, // tokens
});

Auth
Uses @sarfarajey/gcp-edge-auth under the hood. Supply your service account JSON (string or parsed object). Access tokens are cached for the process lifetime, keyed by client_email + scope.
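To illustrate what caching by client_email + scope can look like, here is a minimal in-memory sketch. It is not the actual @sarfarajey/gcp-edge-auth implementation: `cachedToken` and the `fetchToken` callback are hypothetical, and token expiry is deliberately left out, which a real implementation would need to handle.

```typescript
// Illustrative in-memory cache; NOT the actual gcp-edge-auth implementation.
// Storing the promise (not the resolved token) lets concurrent callers share
// a single in-flight token request.
const tokenCache = new Map<string, Promise<string>>();

function cachedToken(
  clientEmail: string,
  scope: string,
  fetchToken: () => Promise<string>, // hypothetical token-minting callback
): Promise<string> {
  const key = `${clientEmail}|${scope}`;
  let pending = tokenCache.get(key);
  if (!pending) {
    pending = fetchToken();
    tokenCache.set(key, pending);
    // Drop failed lookups so the next call retries instead of reusing the error.
    pending.catch(() => tokenCache.delete(key));
  }
  return pending;
}
```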
License
MIT
