@sarfarajey/vertex-client

v1.0.0

Published 2 months ago

Streaming Vertex AI / Gemini client for edge runtimes — SSE token + tool-call events, retry, context caching. No google-cloud SDK.

Vulnerabilities: 0 High, 0 Medium, 0 Low

Maintainer: sarfarajey

Keywords: vertex-ai, gemini, google-ai, llm, streaming, sse, tool-calling, function-calling, context-cache, cloudflare-workers, edge

Vertex AI / Gemini client built for edge runtimes. No @google-cloud/aiplatform SDK — pure fetch + Web Crypto. Runs in Cloudflare Workers, Vercel Edge, Deno, Bun, and Node 18+.

Includes:

Streaming via SSE (token + tool-call events)
Function calling
Server-side context caching
429 backoff retry on the non-streaming path
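
The retry item in the list above covers the non-streaming path only. As an illustration of what a 429 backoff loop generally looks like, here is a minimal sketch. The function name `fetchWithBackoff`, the attempt count, and the delays are assumptions made for this example; they are not taken from the package's internals.

```javascript
// Hypothetical helper: retries a request while the server answers HTTP 429.
// `doFetch` is any function returning a promise of an object with a `status` field
// (for example, a closure around fetch). `sleep` is injectable so the loop can be tested.
async function fetchWithBackoff(doFetch, {
    maxAttempts = 3,          // assumed value, not the package's setting
    baseDelayMs = 500,        // assumed value, not the package's setting
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
    for (let attempt = 1; ; attempt++) {
        const res = await doFetch();
        // Return on anything other than 429, or once the attempts are used up.
        if (res.status !== 429 || attempt >= maxAttempts) return res;
        // Exponential backoff: 500 ms, 1000 ms, 2000 ms, ...
        await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
}
```

A similar wrapper could be placed around the start of a streaming call if rate limits are a concern there, since the built-in retry is described as non-streaming only.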

Install

npm install @sarfarajey/vertex-client
# pulls @sarfarajey/gcp-edge-auth as a transitive dependency

Single-shot call

import { callVertexAI } from '@sarfarajey/vertex-client';

const text = await callVertexAI({
    serviceAccount: env.SERVICE_ACCOUNT_JSON,
    projectId:      'my-gcp-project',
    systemPrompt:   'You are a concise assistant.',
    userPrompt:     'In 3 bullets, what is RSVP?',
});

Defaults: model: 'gemini-2.5-flash', location: 'us-central1', maxOutputTokens: 1024, temperature: 0.4. Override any of them.

Throws on a non-2xx HTTP response, when the output is truncated at MAX_TOKENS, and when the response is empty.
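
Because all three failure modes surface as exceptions, callers will usually want a try/catch around the call. The sketch below is one way to do that. `callOrFallback` is a hypothetical helper, and `callFn` stands in for `callVertexAI`, passed as an argument so the example stays self-contained. The README does not document error classes or messages, so the sketch reads only the generic `message` property.

```javascript
// Hypothetical wrapper: turns a throwing call into a result object.
// `callFn` is expected to behave like callVertexAI (resolve to a string or throw).
async function callOrFallback(callFn, params, fallbackText = '') {
    try {
        const text = await callFn(params);
        return { ok: true, text };
    } catch (err) {
        // Non-2xx, MAX_TOKENS truncation, and empty responses all land here.
        const reason = err instanceof Error ? err.message : String(err);
        return { ok: false, text: fallbackText, reason };
    }
}
```

With the real package this would be called as `callOrFallback(callVertexAI, { serviceAccount, projectId, systemPrompt, userPrompt }, 'Sorry, try again.')`.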

Streaming

import { callVertexAIStream } from '@sarfarajey/vertex-client';

const events = callVertexAIStream({
    serviceAccount: env.SERVICE_ACCOUNT_JSON,
    projectId:      'my-gcp-project',
    contents: [
        { role: 'user', parts: [{ text: 'Tell me a joke.' }] },
    ],
});

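for await (const e of events) {
    // process.stdout exists in Node and Bun; on runtimes without it, write to your own sink instead.
    if (e.type === 'token')     process.stdout.write(e.text);
    if (e.type === 'tool_call') console.log('\nTOOL:', e.name, e.args);
    if (e.type === 'done')      break;   // end of stream
}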

Event shapes:

{ type: 'token',     text: string }
{ type: 'tool_call', name: string, args: object }
{ type: 'done' }
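
Given the three event shapes above, a caller that does not need incremental output can fold the stream into a single result. The following is a minimal sketch. `collectStream` is a hypothetical helper name, and it accepts any async iterable that yields these events, so it has no dependency on the package.

```javascript
// Hypothetical helper: drains an event stream into the full text plus any tool calls.
async function collectStream(events) {
    let text = '';
    const toolCalls = [];
    for await (const e of events) {
        if (e.type === 'token') {
            text += e.text;                       // append each token in order
        } else if (e.type === 'tool_call') {
            toolCalls.push({ name: e.name, args: e.args });
        } else if (e.type === 'done') {
            break;                                // stop reading once the stream signals completion
        }
    }
    return { text, toolCalls };
}
```

With the real package this would be used as `const { text, toolCalls } = await collectStream(callVertexAIStream({ ... }))`. It also sidesteps `process.stdout`, which may not be available on every edge runtime.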

Function calling

const tools = [{
    functionDeclarations: [{
        name: 'get_weather',
        description: 'Get current weather for a city',
        parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city'],
        },
    }],
}];

for await (const e of callVertexAIStream({
    serviceAccount, projectId,
    contents: [{ role: 'user', parts: [{ text: 'Weather in Tokyo?' }] }],
    tools,
})) {
    if (e.type === 'tool_call') {
        // run the tool, append its result to `contents`, call again…
    }
}
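
The "run the tool, append its result, call again" step in the loop above can be sketched as follows. `runToolLoop`, `handlers`, and `maxRounds` are hypothetical names for this example, and `streamFn` stands in for `callVertexAIStream`. The appended turns use the Gemini `functionCall` / `functionResponse` part format. It is an assumption that this client forwards `contents` to the API unchanged, so check that against the package before relying on it.

```javascript
// Hypothetical driver for a multi-round tool-calling conversation.
// `streamFn(params)` must return an async iterable of token / tool_call / done events.
// `handlers` maps a tool name to an async function that receives the call's args.
async function runToolLoop(streamFn, baseParams, handlers, maxRounds = 5) {
    const contents = [...baseParams.contents];   // copy so the caller's array is untouched
    for (let round = 0; round < maxRounds; round++) {
        let text = '';
        const calls = [];
        for await (const e of streamFn({ ...baseParams, contents })) {
            if (e.type === 'token') text += e.text;
            else if (e.type === 'tool_call') calls.push(e);
            else if (e.type === 'done') break;
        }
        // No tool calls means the model produced its final answer.
        if (calls.length === 0) return text;

        // Record the model's tool requests as a model turn...
        contents.push({
            role: 'model',
            parts: calls.map((c) => ({ functionCall: { name: c.name, args: c.args } })),
        });
        // ...then run each tool and send the results back as the next turn.
        const responses = [];
        for (const c of calls) {
            const handler = handlers[c.name];
            const result = handler ? await handler(c.args) : { error: `unknown tool: ${c.name}` };
            responses.push({ functionResponse: { name: c.name, response: result } });
        }
        contents.push({ role: 'user', parts: responses });
    }
    throw new Error(`tool loop did not finish within ${maxRounds} rounds`);
}
```

The round limit is a design choice that keeps a model from requesting tools indefinitely. Pass `tools` inside `baseParams` so every round sees the same declarations.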

Context caching

For a long system prompt that is reused across many messages, create a cache and pass its resource name to subsequent calls — Gemini bills cached tokens at a discount.

import { createVertexCache, callVertexAIStream } from '@sarfarajey/vertex-client';

const cacheName = await createVertexCache({
    serviceAccount, projectId,
    systemPrompt: longSystemPromptHere,
    ttlSeconds:   600,
});

for await (const e of callVertexAIStream({
    serviceAccount, projectId,
    contents,
    cachedContent: cacheName,  // takes precedence over systemPrompt
})) {
    // ...
}

createVertexCache returns null on failure. Check the result and fall back to an inline systemPrompt when it is null.
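
The null check and fallback can be sketched as below. `buildPromptParams` is a hypothetical helper, and `createCacheFn` stands in for `createVertexCache`, passed as an argument so the example stays self-contained. It returns the parameter object to hand to `callVertexAIStream`, merged with `contents` and any other options.

```javascript
// Hypothetical helper: prefer a server-side cache, fall back to an inline system prompt.
// `createCacheFn` is expected to behave like createVertexCache (resolve to a name or null).
async function buildPromptParams(createCacheFn, base, systemPrompt, ttlSeconds = 600) {
    const cacheName = await createCacheFn({ ...base, systemPrompt, ttlSeconds });
    return cacheName
        ? { ...base, cachedContent: cacheName }   // cache created: reference it by resource name
        : { ...base, systemPrompt };              // cache failed: send the prompt inline instead
}
```

Typical use would look like `callVertexAIStream({ ...(await buildPromptParams(createVertexCache, { serviceAccount, projectId }, longSystemPromptHere)), contents })`.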

Thinking budget

callVertexAIStream({
    serviceAccount, projectId,
    contents,
    thinkingBudget: 1024, // tokens
});

Auth

Uses @sarfarajey/gcp-edge-auth under the hood. Supply your service account JSON as a string or a parsed object. Access tokens are cached for the lifetime of the process, keyed by client_email and scope.
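
To make the caching behaviour concrete, here is a minimal sketch of a token cache keyed by client_email and scope. This illustrates the idea only and is not the code in @sarfarajey/gcp-edge-auth. `getCachedToken`, `mintToken`, and the 60-second expiry margin are all assumptions made for this example.

```javascript
// Module-level map: lives as long as the process (or isolate) does.
const tokenCache = new Map();

// Hypothetical helper. `mintToken(sa, scope)` is assumed to resolve to
// { token, expiresInSeconds }; `now` is injectable so expiry can be tested.
async function getCachedToken(serviceAccount, scope, mintToken, now = Date.now) {
    // Accept either a JSON string or an already parsed object.
    const sa = typeof serviceAccount === 'string' ? JSON.parse(serviceAccount) : serviceAccount;
    const key = `${sa.client_email}|${scope}`;

    const hit = tokenCache.get(key);
    // Reuse the token unless it is within 60 s of expiring (assumed safety margin).
    if (hit && hit.expiresAt - 60_000 > now()) return hit.token;

    const { token, expiresInSeconds } = await mintToken(sa, scope);
    tokenCache.set(key, { token, expiresAt: now() + expiresInSeconds * 1000 });
    return token;
}
```

One practical consequence of per-process caching is that each new process or isolate mints its own token on first use.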

License

MIT
