@orgn/gateway
v2.0.1
OLLM Provider for the AI SDK
OLLM is an enterprise router that aggregates high-security, zero-knowledge LLM providers behind a single OpenAI-compatible API. Every upstream model runs inside a Trusted Execution Environment (TEE) for end-to-end confidential computing, so prompts and completions are encrypted at every layer of the stack.
This package is the Vercel AI SDK provider for OLLM. It gives you:
- Verifiable privacy — every model runs under confidential computing
- One API key, hundreds of models — Claude, GPT, Gemini, Llama, GLM, Kimi, DeepSeek, Qwen, Mistral, Whisper, and more
- Dynamic model discovery — fetch the live catalog at runtime; no hardcoded IDs to go stale
- Multimodal input — text, images, PDFs (and other documents), plus speech-to-text via Whisper
- OpenAI-compatible wire format — drop-in for tools that already speak OpenAI
Learn more at the [OLLM website](https://ollm.com/).
Installation
```sh
pnpm add @orgn/gateway   # or: npm install / yarn add / bun add
```

Then grab an API key from the OLLM Dashboard and export it:

```sh
export OLLM_API_KEY="sk-ollm-..."
```

Provider instance
```typescript
import { createOLLM } from '@orgn/gateway';

const ollm = createOLLM({
  apiKey: process.env.OLLM_API_KEY,
  // baseURL: 'https://api.ollm.com/v1', // optional, defaults to this
  // headers: { 'X-Request-Id': '...' }, // optional extra headers
  // fetch: customFetch,                 // optional fetch override (testing, proxies)
});
```

`OLLM_API_KEY` is also picked up automatically if you omit `apiKey`. There's also a default `ollm` export if you don't need custom settings:

```typescript
import { ollm } from '@orgn/gateway';
```

Discovering available models
OLLM's catalog changes frequently — new providers and model versions land all the time. Instead of hardcoding IDs, ask the gateway directly:
```typescript
// All active models
const all = await ollm.listModels();

// Filter by modality
const chat = await ollm.listModels({ inputModality: 'text', outputModality: 'text' });
const embeddings = await ollm.listModels({ outputModality: 'embedding' });
const vision = await ollm.listModels({ inputModality: 'image' });
const audio = await ollm.listModels({ inputModality: 'audio' });

// Include inactive models in the result
const everything = await ollm.listModels({ activeOnly: false });
```

Each entry includes:
```typescript
{
  id: 'near_glm_5_1',
  display_name: 'GLM 5.1',
  owned_by: 'zai',
  is_active: true,
  input_modalities: ['text'],
  output_modalities: ['text'],
  max_input_tokens: 202752,
  max_output_tokens: 131072,
  input_cost_per_token: 8.5e-7,
  output_cost_per_token: 3.3e-6,
  model_info: { TEE: true },
  // ...
}
```

Use the `id` field directly as the argument to `chatModel()`, `embeddingModel()`, or `transcriptionModel()`.
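Because the cost fields are per-token prices, the cost of one call is `inputTokens × input_cost_per_token + outputTokens × output_cost_per_token`, and the `max_*_tokens` fields tell you whether a request fits at all. The sketch below estimates a call's cost and picks the cheapest active text-to-text model that fits. It assumes entries carry exactly the fields shown above; `CatalogEntry`, `estimateCost` and `cheapestFit` are hypothetical local names (not the package's exported `OLLMModel` type), and the README does not state the currency of the prices.

```typescript
// Hypothetical local type mirroring a subset of the catalog fields shown above.
interface CatalogEntry {
  id: string;
  is_active: boolean;
  input_modalities: string[];
  output_modalities: string[];
  max_input_tokens: number;
  max_output_tokens: number;
  input_cost_per_token: number;
  output_cost_per_token: number;
}

// Estimated price of one call: tokens in × input price + tokens out × output price.
function estimateCost(m: CatalogEntry, inputTokens: number, outputTokens: number): number {
  return inputTokens * m.input_cost_per_token + outputTokens * m.output_cost_per_token;
}

// Cheapest active text→text model whose token limits fit the request, or undefined.
function cheapestFit(
  models: CatalogEntry[],
  inputTokens: number,
  outputTokens: number,
): CatalogEntry | undefined {
  const candidates = models.filter(
    (m) =>
      m.is_active &&
      m.input_modalities.includes('text') &&
      m.output_modalities.includes('text') &&
      inputTokens <= m.max_input_tokens &&
      outputTokens <= m.max_output_tokens,
  );
  // filter() returned a fresh array, so sorting it in place is safe.
  candidates.sort(
    (a, b) => estimateCost(a, inputTokens, outputTokens) - estimateCost(b, inputTokens, outputTokens),
  );
  return candidates[0];
}

// Example using the GLM 5.1 figures from the entry above.
const glm: CatalogEntry = {
  id: 'near_glm_5_1',
  is_active: true,
  input_modalities: ['text'],
  output_modalities: ['text'],
  max_input_tokens: 202752,
  max_output_tokens: 131072,
  input_cost_per_token: 8.5e-7,
  output_cost_per_token: 3.3e-6,
};

console.log(estimateCost(glm, 1_000, 500).toFixed(6)); // 0.002500
```

In an app you would pass the entries returned by `ollm.listModels()` (assuming they carry these fields) and hand the winning `id` to `chatModel()`.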
Chat models
generateText
```typescript
import { generateText } from 'ai';
import { ollm } from '@orgn/gateway';

const { text, usage } = await generateText({
  model: ollm.chatModel('near_glm_5_1'),
  prompt: 'What is OLLM?',
});
```

streamText
```typescript
import { streamText } from 'ai';
import { ollm } from '@orgn/gateway';

const result = streamText({
  model: ollm.chatModel('vercel_claude_sonnet_4_6'),
  prompt: 'Write a short story about secure AI.',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```

System messages
```typescript
const { text } = await generateText({
  model: ollm.chatModel('near_glm_5_1'),
  system: 'You are a helpful assistant that responds concisely.',
  prompt: 'What is TypeScript in one sentence?',
});
```

Multi-turn conversations
Use `messages` instead of `prompt` when you need to send the conversation history (earlier user and assistant turns):

```typescript
const { text } = await generateText({
  model: ollm.chatModel('near_glm_5_1'),
  messages: [
    { role: 'user', content: 'What is a TEE?' },
    { role: 'assistant', content: '…' },
    { role: 'user', content: 'How does that protect my prompt?' },
  ],
});
```

Provider options
Chat calls accept OLLM-specific options under `providerOptions.ollm`:

```typescript
const { text } = await generateText({
  model: ollm.chatModel('vercel_gpt_5'),
  prompt: 'Walk me through the proof.',
  providerOptions: {
    ollm: {
      reasoningEffort: 'high', // 'low' | 'medium' | 'high', for reasoning models (o1, o3, gpt-5, …)
      user: 'user-1234',       // optional end-user identifier for abuse monitoring
    },
  },
});
```

Multimodal input
Images
Pass image bytes or a URL as a `file` content part with an `image/*` media type. The provider serializes them as OpenAI-canonical `image_url` parts.

```typescript
import { readFile } from 'node:fs/promises';
import { generateText } from 'ai';
import { ollm } from '@orgn/gateway';

const image = await readFile('photo.jpg');

const { text } = await generateText({
  model: ollm.chatModel('vercel_claude_sonnet_4_6'),
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Describe this image.' },
      { type: 'file', data: image, mediaType: 'image/jpeg' },
    ],
  }],
});
```

Supported media types include `image/jpeg`, `image/png`, `image/webp`, `image/gif`, and any other `image/*` value the underlying model accepts.
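To make the serialization step concrete, here is a sketch of the OpenAI chat-completions `image_url` content part that an image `file` part corresponds to: raw bytes become a base64 `data:` URL, while a URL is forwarded unchanged. This illustrates the wire format under that assumption; it is not the provider's source, `toImageUrlPart` is a hypothetical helper, and you don't need it when calling the provider.

```typescript
import { Buffer } from 'node:buffer';

// Shape of an OpenAI chat-completions image content part.
interface ImageUrlPart {
  type: 'image_url';
  image_url: { url: string };
}

// Hypothetical helper: bytes become a base64 data URL; URLs pass through as-is.
function toImageUrlPart(data: Uint8Array | URL, mediaType: string): ImageUrlPart {
  if (!mediaType.startsWith('image/')) {
    throw new Error(`expected an image/* media type, got ${mediaType}`);
  }
  const url =
    data instanceof URL
      ? data.toString()
      : `data:${mediaType};base64,${Buffer.from(data).toString('base64')}`;
  return { type: 'image_url', image_url: { url } };
}

const part = toImageUrlPart(new Uint8Array([0x68, 0x69]), 'image/png');
console.log(part.image_url.url); // data:image/png;base64,aGk=
```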
PDFs and documents
Non-image file parts are passed through as inline base64, so no S3 upload or Files API is required. The provider rewrites them into OpenAI's canonical `type: "file"` shape (`file.file_data`) before sending, so they work with any AI SDK call that takes `messages`:

```typescript
import { readFile } from 'node:fs/promises';
import { generateText } from 'ai';
import { ollm } from '@orgn/gateway';

const pdf = await readFile('report.pdf');

const { text } = await generateText({
  model: ollm.chatModel('vercel_claude_sonnet_4_6'),
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Summarize this report in three bullets.' },
      { type: 'file', data: pdf, mediaType: 'application/pdf', filename: 'report.pdf' },
    ],
  }],
});
```

Common document types: `application/pdf`, `text/plain`, `text/markdown`, `text/csv`, `application/json`. Pick a model whose `input_modalities` (from `listModels()`) include `'pdf'` or `'text'` (Claude 4.x, Gemini 2.5/3, GPT-4.1+, Kimi K2.6, etc.).
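For reference, this is roughly what the inline `file.file_data` shape looks like, assuming the base64 `data:` URL encoding that OpenAI's chat-completions API documents for that field. It is an illustration of the wire format, not this package's code; `toFilePart` is a hypothetical helper.

```typescript
import { Buffer } from 'node:buffer';

// OpenAI chat-completions file content part (inline base64, no Files API upload).
interface FilePart {
  type: 'file';
  file: { filename?: string; file_data: string };
}

// Hypothetical helper: wraps document bytes as a data URL inside `file.file_data`.
function toFilePart(data: Uint8Array, mediaType: string, filename?: string): FilePart {
  const fileData = `data:${mediaType};base64,${Buffer.from(data).toString('base64')}`;
  return { type: 'file', file: { filename, file_data: fileData } };
}

const pdfPart = toFilePart(new TextEncoder().encode('hi'), 'application/pdf', 'report.pdf');
console.log(JSON.stringify(pdfPart));
// {"type":"file","file":{"filename":"report.pdf","file_data":"data:application/pdf;base64,aGk="}}
```

Since base64 encodes every 3 bytes as 4 characters, an inline document is about a third larger on the wire than the file on disk, which is worth keeping in mind for large PDFs.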
Embeddings
```typescript
import { embed, embedMany } from 'ai';
import { ollm } from '@orgn/gateway';

// Single vector
const { embedding } = await embed({
  model: ollm.embeddingModel('near_qwen3_embedding_0_6b'),
  value: 'OLLM routes confidential LLM traffic.',
});

// Batch
const { embeddings } = await embedMany({
  model: ollm.embeddingModel('vercel_text_embedding_3_small'),
  values: [
    'Confidential computing protects data in use.',
    'TEEs use hardware-level encryption.',
  ],
});
```

Discover available embedding models with `ollm.listModels({ outputModality: 'embedding' })`.
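Embedding vectors are usually compared with cosine similarity, for example to rank the `values` you embedded against a query's `embedding`. The AI SDK ships a `cosineSimilarity` helper in `ai`; the dependency-free sketch below just makes the math explicit. `cosine` and `rank` are illustrative names, and the vectors are plain `number[]` arrays like the ones `embed`/`embedMany` return.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same dimension');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat similarity with a zero vector as 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank document vectors by similarity to a query vector, most similar first.
function rank(query: number[], docs: number[][]): { index: number; score: number }[] {
  return docs
    .map((d, index) => ({ index, score: cosine(query, d) }))
    .sort((x, y) => y.score - x.score);
}

console.log(rank([1, 0], [[0, 1], [1, 1], [2, 0]]).map((r) => r.index)); // [ 2, 1, 0 ]
```

With real data you would call `rank(embedding, embeddings)` and map the winning indices back to the original `values`.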
Audio transcription (speech → text)
OLLM exposes Whisper-style speech-to-text through `transcriptionModel()`, compatible with the AI SDK's `experimental_transcribe()` helper:

```typescript
import { experimental_transcribe as transcribe } from 'ai';
import { readFile } from 'node:fs/promises';
import { ollm } from '@orgn/gateway';

const audio = await readFile('meeting.mp3');

const result = await transcribe({
  model: ollm.transcriptionModel('near_whisper_large_v3'),
  audio,
});

console.log(result.text);              // full transcript
console.log(result.language);          // e.g. "english"
console.log(result.durationInSeconds); // total audio length

for (const seg of result.segments) {
  console.log(`[${seg.startSecond}s – ${seg.endSecond}s] ${seg.text}`);
}
```

You can pass Whisper-specific options through `providerOptions.ollm`:
```typescript
await transcribe({
  model: ollm.transcriptionModel('near_whisper_large_v3'),
  audio,
  providerOptions: {
    ollm: {
      language: 'en',
      temperature: 0,
      prompt: 'OLLM, Whisper, TEE', // hint for tricky vocabulary
    },
  },
});
```

Supported audio types: `audio/mpeg`, `audio/wav`, `audio/mp4` (m4a), `audio/webm`, `audio/flac`, `audio/ogg`.
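The `segments` array (text plus `startSecond`/`endSecond`) maps directly onto subtitle formats. A sketch that renders segments as SubRip (`.srt`) cues, assuming the segment times are in seconds as the field names suggest; `Segment`, `srtTime` and `toSrt` are hypothetical helpers, not part of the provider.

```typescript
interface Segment {
  text: string;
  startSecond: number;
  endSecond: number;
}

// Seconds -> "HH:MM:SS,mmm", the SubRip timestamp format.
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60_000) % 60;
  const h = Math.floor(totalMs / 3_600_000);
  const pad = (n: number, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// One numbered cue per segment, cues separated by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${srtTime(seg.startSecond)} --> ${srtTime(seg.endSecond)}\n${seg.text.trim()}\n`,
    )
    .join('\n');
}

console.log(
  toSrt([
    { text: ' Hello there.', startSecond: 0, endSecond: 1.5 },
    { text: ' Welcome to the meeting.', startSecond: 1.5, endSecond: 3.25 },
  ]),
);
```

With a real transcript you could write `toSrt(result.segments)` to a file with `writeFile` from `node:fs/promises`.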
What is not supported
| Capability | Status | Why |
|---|---|---|
| `imageModel()` (text → image) | ❌ Throws `NoSuchModelError` | The OLLM gateway doesn't expose image-generation models through the AI SDK provider interface yet. Image-output models from `listModels()` (`vercel_flux_*`, `vercel_seedream_*`, etc.) are reachable via raw HTTP, but are not wired up here. |
| `completionModel()` (legacy `/v1/completions`) | ❌ Removed | The OLLM gateway only serves `/v1/chat/completions`. Use `chatModel()` instead; every "completion" task can be expressed as a chat call. |
| Speech generation (TTS) | ❌ Not available | The OLLM catalog has no TTS models (no model lists `audio` in `output_modalities`). |
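The table implies a rough rule for which catalog entries this provider can serve. A heuristic sketch, assuming modality strings such as `'text'`, `'audio'` and `'embedding'` as used in the `listModels()` filters above: embedding-output models go to `embeddingModel()`, audio-only-input models with text output to `transcriptionModel()`, other text-output models to `chatModel()`, and image-only or audio-only output models are skipped. `providerMethodFor` is hypothetical, and the transcription rule in particular is a guess about how speech-to-text models are labeled; check a model's entry before relying on it.

```typescript
type Method = 'chatModel' | 'embeddingModel' | 'transcriptionModel';

interface ModalityInfo {
  input_modalities: string[];
  output_modalities: string[];
}

// Heuristic: which provider method (if any) fits a catalog entry.
function providerMethodFor(m: ModalityInfo): Method | null {
  const inp = m.input_modalities;
  const out = m.output_modalities;
  if (out.includes('embedding')) return 'embeddingModel';
  const audioOnlyInput = inp.length > 0 && inp.every((x) => x === 'audio');
  if (audioOnlyInput && out.includes('text')) return 'transcriptionModel'; // assumed Whisper-style labeling
  if (out.includes('text')) return 'chatModel';
  return null; // image-only or audio-only output: imageModel() throws and there is no TTS
}

console.log(providerMethodFor({ input_modalities: ['text'], output_modalities: ['image'] })); // null
```

Filtering the result of `listModels()` with this check (assuming the entries carry these two fields) keeps image-generation models out of `chatModel()` calls.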
Advanced features
OLLM provides a number of capabilities you don't have to wire up yourself:
- Zero Data Retention (ZDR) — prompts and completions are never stored or logged by upstream providers.
- Confidential Computing — hardware-level encryption via TEE on every model.
- Verifiable Privacy — cryptographic proofs that processing happened securely.
- Model Flexibility — swap models with a one-line change; no per-provider SDKs.
- Cost Management — real-time per-model usage and cost on the dashboard.
- Enterprise Support — custom SLAs for high-volume users.
- Tool Integrations — Cursor, Windsurf, VS Code, Cline, Roo Code, Replit.
Examples
Runnable scripts live in `example/`. Each file demonstrates exactly one provider method and writes its output to `example/output/<name>.log`.
| Script | Provider method |
|---|---|
| list-models.ts | provider.listModels() |
| chat-model.ts | provider.chatModel() + generateText / streamText |
| chat-pdf.ts | provider.chatModel() with a PDF attachment |
| embedding-model.ts | provider.embeddingModel() + embed / embedMany |
| transcription-model.ts | provider.transcriptionModel() + experimental_transcribe |
| image-model.ts | provider.imageModel() (demonstrates the NoSuchModelError) |
Run any of them with:
```sh
OLLM_API_KEY=... bunx tsx example/<name>.ts
# or: npx tsx example/<name>.ts
```

API reference
```typescript
import {
  createOLLM,
  ollm,
  type OLLMProvider,
  type OLLMProviderSettings,
  type OLLMChatModelId,              // alias for string
  type OLLMChatProviderOptions,      // providerOptions.ollm shape for chat
  type OLLMEmbeddingModelId,         // alias for string
  type OLLMEmbeddingProviderOptions,
  type OLLMTranscriptionModelId,     // alias for string
  type OLLMModel,                    // shape returned by listModels()
  type ListModelsOptions,
  type OLLMErrorData,
  VERSION,
} from '@orgn/gateway';
```

`OLLMProvider` exposes:

- `(modelId)`: shorthand for `chatModel(modelId)`
- `chatModel(modelId)`, `languageModel(modelId)`: chat / text generation (V3)
- `embeddingModel(modelId)`, `textEmbeddingModel(modelId)` (deprecated alias): embeddings (V3)
- `transcriptionModel(modelId)`: speech-to-text (V3)
- `imageModel(modelId)`: currently throws `NoSuchModelError`
- `listModels(options?)`: fetch the live `/v1/models` catalog with optional modality/active filters
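AI SDK providers are typically callable functions that also carry methods, which is how `ollm(modelId)` and `ollm.chatModel(modelId)` can coexist. A toy sketch of that pattern using `Object.assign`; `ModelRef` and `createToyProvider` are hypothetical and return plain descriptors instead of real model instances, so this only shows the shape of the API listed above, not this package's implementation.

```typescript
// Hypothetical descriptor standing in for a real model instance.
interface ModelRef {
  kind: 'chat' | 'embedding' | 'transcription';
  modelId: string;
}

function createToyProvider() {
  const chatModel = (modelId: string): ModelRef => ({ kind: 'chat', modelId });
  const embeddingModel = (modelId: string): ModelRef => ({ kind: 'embedding', modelId });
  const transcriptionModel = (modelId: string): ModelRef => ({ kind: 'transcription', modelId });

  // Calling the provider directly is shorthand for chatModel().
  const provider = (modelId: string): ModelRef => chatModel(modelId);

  return Object.assign(provider, {
    chatModel,
    languageModel: chatModel,           // alias
    embeddingModel,
    textEmbeddingModel: embeddingModel, // deprecated alias
    transcriptionModel,
    imageModel(modelId: string): never {
      // The real provider throws NoSuchModelError; a plain Error keeps this sketch dependency-free.
      throw new Error(`No image model available: ${modelId}`);
    },
  });
}

const toy = createToyProvider();
console.log(toy('near_glm_5_1')); // { kind: 'chat', modelId: 'near_glm_5_1' }
```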
