@orgn/gateway

v2.0.1

Published

a month ago

oxyz-official

ai ollm llm openai-compatible tee confidential-computing vercel-ai-sdk

OLLM Provider for the AI SDK

OLLM is an enterprise router that aggregates high-security, zero-knowledge LLM providers behind a single OpenAI-compatible API. Every upstream model runs inside a Trusted Execution Environment (TEE) for end-to-end confidential computing, so prompts and completions are encrypted at every layer of the stack.

This package is the Vercel AI SDK provider for OLLM. It gives you:

Verifiable privacy — every model runs under confidential computing
One API key, hundreds of models — Claude, GPT, Gemini, Llama, GLM, Kimi, DeepSeek, Qwen, Mistral, Whisper, and more
Dynamic model discovery — fetch the live catalog at runtime; no hardcoded IDs to go stale
Multimodal input — text, images, PDFs (and other documents), plus speech-to-text via Whisper
OpenAI-compatible wire format — drop-in for tools that already speak OpenAI

Learn more at the [OLLM website](https://ollm.com/).

Installation

pnpm add @orgn/gateway    # or: npm install / yarn add / bun add

Then grab an API key from the OLLM Dashboard and export it:

export OLLM_API_KEY="sk-ollm-..."

Provider instance

import { createOLLM } from '@orgn/gateway';

const ollm = createOLLM({
  apiKey: process.env.OLLM_API_KEY,
  // baseURL: 'https://api.ollm.com/v1',  // optional, defaults to this
  // headers: { 'X-Request-Id': '...' },  // optional extra headers
  // fetch: customFetch,                  // optional fetch override (testing, proxies)
});

If you omit apiKey, the provider reads OLLM_API_KEY from the environment. A default ollm export is available if you don't need custom settings:

import { ollm } from '@orgn/gateway';
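The fetch setting shown in the createOLLM example can carry cross-cutting behaviour such as request tagging. The following is a minimal sketch, not part of the package: withHeader is a hypothetical helper, and it assumes the fetch setting accepts a function with the standard fetch signature.

```typescript
// Hypothetical helper (not exported by @orgn/gateway): wraps a fetch
// implementation so that every outgoing request carries one extra header.
type FetchLike = (input: string | URL | Request, init?: RequestInit) => Promise<Response>;

function withHeader(base: FetchLike, name: string, makeValue: () => string): FetchLike {
  return async (input, init) => {
    // Keep headers already set on init, or on a Request input when init has none.
    const existing = init?.headers ?? (input instanceof Request ? input.headers : undefined);
    const headers = new Headers(existing);
    headers.set(name, makeValue());
    return base(input, { ...init, headers });
  };
}

// Assumed usage with the `fetch` setting (not executed here):
//   const ollm = createOLLM({
//     fetch: withHeader(fetch, 'X-Request-Id', () => crypto.randomUUID()),
//   });
```

Because the wrapper takes the base fetch as an argument, the same helper can wrap a stub in tests or a proxy-aware fetch in production.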

Discovering available models

OLLM's catalog changes frequently — new providers and model versions land all the time. Instead of hardcoding IDs, ask the gateway directly:

// All active models
const all = await ollm.listModels();

// Filter by modality
const chat       = await ollm.listModels({ inputModality: 'text',  outputModality: 'text' });
const embeddings = await ollm.listModels({ outputModality: 'embedding' });
const vision     = await ollm.listModels({ inputModality: 'image' });
const audio      = await ollm.listModels({ inputModality: 'audio' });

// Include inactive models in the result
const everything = await ollm.listModels({ activeOnly: false });

Each entry includes:

{
  id: 'near_glm_5_1',
  display_name: 'GLM 5.1',
  owned_by: 'zai',
  is_active: true,
  input_modalities:  ['text'],
  output_modalities: ['text'],
  max_input_tokens:  202752,
  max_output_tokens: 131072,
  input_cost_per_token:  8.5e-7,
  output_cost_per_token: 3.3e-6,
  model_info: { TEE: true },
  // ...
}

Use the id field directly as the argument to chatModel(), embeddingModel(), or transcriptionModel().
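Since each entry carries per-token prices and context limits, model selection can be done in code. The sketch below is illustrative only: CatalogEntry is a local mirror of a few fields from the entry shown above (the real OLLMModel type may differ), estimateCost and cheapestFor are hypothetical helpers, and the second catalog entry is made up.

```typescript
// Local mirror of a few catalog fields; not the package's OLLMModel type.
interface CatalogEntry {
  id: string;
  is_active: boolean;
  max_input_tokens: number;
  input_cost_per_token: number;
  output_cost_per_token: number;
}

// Cost of one call, in whatever currency unit the catalog prices use.
function estimateCost(model: CatalogEntry, inputTokens: number, outputTokens: number): number {
  return inputTokens * model.input_cost_per_token + outputTokens * model.output_cost_per_token;
}

// Cheapest active model (for the given token counts) whose input limit fits the prompt.
function cheapestFor(
  catalog: readonly CatalogEntry[],
  inputTokens: number,
  outputTokens: number,
): CatalogEntry | undefined {
  return catalog
    .filter((m) => m.is_active && m.max_input_tokens >= inputTokens)
    .sort((a, b) => estimateCost(a, inputTokens, outputTokens) - estimateCost(b, inputTokens, outputTokens))[0];
}

// First entry copies the values shown above; the second is invented for the example.
const glm: CatalogEntry = {
  id: 'near_glm_5_1', is_active: true, max_input_tokens: 202752,
  input_cost_per_token: 8.5e-7, output_cost_per_token: 3.3e-6,
};
const small: CatalogEntry = {
  id: 'hypothetical_small_model', is_active: true, max_input_tokens: 8192,
  input_cost_per_token: 1e-7, output_cost_per_token: 4e-7,
};

console.log(estimateCost(glm, 1000, 500));                 // roughly 0.0025
console.log(cheapestFor([glm, small], 50_000, 1_000)?.id); // prompt is too large for the small model
```

In real use the catalog array would come from ollm.listModels(), and the chosen id would be passed to chatModel().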

Chat models

`generateText`

import { generateText } from 'ai';

const { text, usage } = await generateText({
  model: ollm.chatModel('near_glm_5_1'),
  prompt: 'What is OLLM?',
});

`streamText`

import { streamText } from 'ai';

const result = streamText({
  model: ollm.chatModel('vercel_claude_sonnet_4_6'),
  prompt: 'Write a short story about secure AI.',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

System messages

const { text } = await generateText({
  model: ollm.chatModel('near_glm_5_1'),
  system: 'You are a helpful assistant that responds concisely.',
  prompt: 'What is TypeScript in one sentence?',
});

Multi-turn conversations

Use messages instead of prompt whenever you have a real conversation:

const { text } = await generateText({
  model: ollm.chatModel('near_glm_5_1'),
  messages: [
    { role: 'user',      content: 'What is a TEE?' },
    { role: 'assistant', content: '…' },
    { role: 'user',      content: 'How does that protect my prompt?' },
  ],
});
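For an ongoing conversation, the messages array has to grow with every turn. The sketch below shows one way to keep that history; askFollowUp and the Generate type are hypothetical names, and the model call is injected so the helper stays independent of any particular provider.

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string };

// Any function that turns a message list into a reply,
// for example a thin wrapper around generateText.
type Generate = (messages: ChatMessage[]) => Promise<string>;

// Sends the question together with the full history and
// returns a new history that includes the question and the reply.
async function askFollowUp(
  history: readonly ChatMessage[],
  question: string,
  generate: Generate,
): Promise<ChatMessage[]> {
  const outgoing: ChatMessage[] = [...history, { role: 'user', content: question }];
  const reply = await generate(outgoing);
  const updated: ChatMessage[] = [...outgoing, { role: 'assistant', content: reply }];
  return updated;
}

// Assumed wiring to the provider (not executed here):
//   const generate: Generate = async (messages) =>
//     (await generateText({ model: ollm.chatModel('near_glm_5_1'), messages })).text;
```

The helper never mutates the array it receives, so earlier snapshots of the conversation remain valid for retries or branching.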

Provider options

Chat calls accept OLLM-specific options under providerOptions.ollm:

const { text } = await generateText({
  model: ollm.chatModel('vercel_gpt_5'),
  prompt: 'Walk me through the proof.',
  providerOptions: {
    ollm: {
      reasoningEffort: 'high',  // 'low' | 'medium' | 'high' — for reasoning models (o1, o3, gpt-5, …)
      user: 'user-1234',        // optional end-user identifier for abuse monitoring
    },
  },
});

Multimodal input

Images

Pass image bytes or a URL as a file content part with an image/* media type. The provider serializes them as OpenAI-canonical image_url parts.

import { readFile } from 'node:fs/promises';
import { generateText } from 'ai';

const image = await readFile('photo.jpg');

const { text } = await generateText({
  model: ollm.chatModel('vercel_claude_sonnet_4_6'),
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Describe this image.' },
      { type: 'file', data: image, mediaType: 'image/jpeg' },
    ],
  }],
});

Supported media types include image/jpeg, image/png, image/webp, image/gif, and any other image/* value the underlying model accepts.
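To make the wire format concrete, here is a rough sketch of what an OpenAI-style image_url content part looks like. It illustrates the shape only and is not the provider's actual serialization code: toImageUrlPart is a hypothetical function, and encoding raw bytes as a base64 data URL is an assumption about how inline images are sent.

```typescript
import { Buffer } from 'node:buffer';

// Shape of an image content part in the OpenAI chat completions format.
type ImageUrlPart = { type: 'image_url'; image_url: { url: string } };

// Hypothetical converter: URLs pass through unchanged, raw bytes become a data URL.
function toImageUrlPart(data: Uint8Array | URL, mediaType: string): ImageUrlPart {
  const url =
    data instanceof URL
      ? data.toString()
      : `data:${mediaType};base64,${Buffer.from(data).toString('base64')}`;
  return { type: 'image_url', image_url: { url } };
}

console.log(toImageUrlPart(new Uint8Array([1, 2, 3]), 'image/png').image_url.url);
// data:image/png;base64,AQID
```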

PDFs and documents

Non-image file parts are passed through as inline base64, so no S3 or Files API is required. The provider rewrites them into OpenAI's canonical type: "file" shape (file.file_data) before sending, so they work with any AI SDK call that takes messages:

import { readFile } from 'node:fs/promises';
import { generateText } from 'ai';

const pdf = await readFile('report.pdf');

const { text } = await generateText({
  model: ollm.chatModel('vercel_claude_sonnet_4_6'),
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Summarize this report in three bullets.' },
      { type: 'file', data: pdf, mediaType: 'application/pdf', filename: 'report.pdf' },
    ],
  }],
});

Common document types: application/pdf, text/plain, text/markdown, text/csv, application/json. Pick a model whose input_modalities from listModels() includes 'pdf' or 'text' (Claude 4.x, Gemini 2.5/3, GPT-4.1+, Kimi K2.6, etc.).
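The modality check described above can be automated against the catalog. The following sketch is one possible reading, not documented behaviour: requiredModality is a hypothetical mapping from media type to modality label (it assumes PDFs need 'pdf' and other documents need 'text'), and the catalog entries are invented for the example.

```typescript
type ModalityInfo = { id: string; input_modalities: string[] };

// Assumed mapping from a file's media type to the catalog's modality label.
function requiredModality(mediaType: string): string {
  if (mediaType === 'application/pdf') return 'pdf';
  if (mediaType.startsWith('image/')) return 'image';
  return 'text'; // text/plain, text/markdown, text/csv, application/json, ...
}

// IDs of the models whose input_modalities cover the given media type.
function modelsAccepting(catalog: readonly ModalityInfo[], mediaType: string): string[] {
  const needed = requiredModality(mediaType);
  return catalog.filter((m) => m.input_modalities.includes(needed)).map((m) => m.id);
}

// Invented entries; real ones would come from ollm.listModels().
const exampleCatalog: ModalityInfo[] = [
  { id: 'example_text_only', input_modalities: ['text'] },
  { id: 'example_pdf_capable', input_modalities: ['text', 'image', 'pdf'] },
];

console.log(modelsAccepting(exampleCatalog, 'application/pdf')); // [ 'example_pdf_capable' ]
```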

Embeddings

import { embed, embedMany } from 'ai';

// Single vector
const { embedding } = await embed({
  model: ollm.embeddingModel('near_qwen3_embedding_0_6b'),
  value: 'OLLM routes confidential LLM traffic.',
});

// Batch
const { embeddings } = await embedMany({
  model: ollm.embeddingModel('vercel_text_embedding_3_small'),
  values: [
    'Confidential computing protects data in use.',
    'TEEs use hardware-level encryption.',
  ],
});

Discover available embedding models with ollm.listModels({ outputModality: 'embedding' }).
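Embedding vectors are usually compared with cosine similarity. The AI SDK exports a cosineSimilarity helper for this. The standalone version below is a minimal sketch of the same calculation, with a hypothetical function name, so the arithmetic is visible.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero length.
function cosine(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

console.log(cosine([1, 0], [1, 0])); // 1 (same direction)
console.log(cosine([1, 0], [0, 1])); // 0 (orthogonal)
```

With the batch example above, cosine(embeddings[0], embeddings[1]) would score how closely the two sentences are related.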

Audio transcription (speech → text)

OLLM exposes Whisper-style speech-to-text through transcriptionModel(), compatible with the AI SDK's experimental_transcribe() helper:

import { experimental_transcribe as transcribe } from 'ai';
import { readFile } from 'node:fs/promises';

const audio = await readFile('meeting.mp3');

const result = await transcribe({
  model: ollm.transcriptionModel('near_whisper_large_v3'),
  audio,
});

console.log(result.text);                // full transcript
console.log(result.language);            // e.g. "english"
console.log(result.durationInSeconds);   // total audio length
for (const seg of result.segments) {
  console.log(`[${seg.startSecond}s – ${seg.endSecond}s] ${seg.text}`);
}
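Raw second offsets are hard to scan in longer recordings. As a small sketch, the hypothetical helpers below turn segments of the shape used in the loop above into mm:ss.mmm lines; the Segment type is a local stand-in for the fields that loop reads.

```typescript
type Segment = { text: string; startSecond: number; endSecond: number };

// Formats a second offset as mm:ss.mmm (minutes are not capped at 59).
function clock(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60);
  const pad = (n: number, width: number) => String(n).padStart(width, '0');
  return `${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// One line per segment, with surrounding whitespace trimmed from the text.
function formatSegments(segments: readonly Segment[]): string {
  return segments
    .map((seg) => `[${clock(seg.startSecond)} - ${clock(seg.endSecond)}] ${seg.text.trim()}`)
    .join('\n');
}

console.log(clock(75.5)); // 01:15.500
```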

You can pass Whisper-specific options through providerOptions.ollm:

await transcribe({
  model: ollm.transcriptionModel('near_whisper_large_v3'),
  audio,
  providerOptions: {
    ollm: {
      language: 'en',
      temperature: 0,
      prompt: 'OLLM, Whisper, TEE',  // hint for tricky vocabulary
    },
  },
});

Supported audio types: audio/mpeg, audio/wav, audio/mp4 (m4a), audio/webm, audio/flac, audio/ogg.
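When audio files arrive from users, it can help to reject unsupported formats before uploading. The sketch below maps file extensions to the media types listed above; audioMediaType is a hypothetical helper, and the extension table is an assumption based on common file naming rather than anything the provider defines.

```typescript
import { extname } from 'node:path';

// Assumed extension-to-media-type table for the supported audio types.
const AUDIO_MEDIA_TYPES: Record<string, string> = {
  '.mp3': 'audio/mpeg',
  '.wav': 'audio/wav',
  '.m4a': 'audio/mp4',
  '.webm': 'audio/webm',
  '.flac': 'audio/flac',
  '.ogg': 'audio/ogg',
};

// Returns the media type for a supported audio file, or undefined otherwise.
function audioMediaType(filename: string): string | undefined {
  return AUDIO_MEDIA_TYPES[extname(filename).toLowerCase()];
}

console.log(audioMediaType('meeting.mp3')); // audio/mpeg
console.log(audioMediaType('notes.txt'));   // undefined
```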

What is not supported

| Capability | Status | Why |
|---|---|---|
| imageModel() (text → image) | ❌ Throws NoSuchModelError | The OLLM gateway doesn't expose image-generation models through the AI SDK provider interface yet. Image-output models from listModels() (vercel_flux_*, vercel_seedream_*, etc.) are reachable via raw HTTP, but not wired up here. |
| completionModel() (legacy /v1/completions) | ❌ Removed | The OLLM gateway only serves /v1/chat/completions. Use chatModel(); every "completion" task can be expressed as a chat call. |
| Speech generation (TTS) | ❌ Not available | No TTS models in the OLLM catalog (zero models with audio in output_modalities). |
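Code that is shared across several AI SDK providers may want to detect the missing image support instead of crashing. The sketch below is one defensive pattern, with hypothetical names throughout: it accepts any object with an imageModel method and treats any thrown error as "unsupported", which is coarser than checking specifically for the NoSuchModelError named in the table.

```typescript
// Minimal structural type; a real provider object has more methods.
type ImageCapable = { imageModel(modelId: string): unknown };

// Returns true if the provider hands back an image model, false if it throws.
function canGenerateImages(provider: ImageCapable, modelId: string): boolean {
  try {
    provider.imageModel(modelId);
    return true;
  } catch {
    // With this provider, the table above says the error is a NoSuchModelError.
    return false;
  }
}

// Stand-in that mimics the documented behaviour of throwing on imageModel().
const throwingProvider: ImageCapable = {
  imageModel(modelId: string): never {
    throw new Error(`No such model: ${modelId}`);
  },
};

console.log(canGenerateImages(throwingProvider, 'some_image_model_id')); // false
```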

Advanced features

OLLM provides a number of capabilities you don't have to wire up yourself:

Zero Data Retention (ZDR) — prompts and completions are never stored or logged by upstream providers.
Confidential Computing — hardware-level encryption via TEE on every model.
Verifiable Privacy — cryptographic proofs that processing happened securely.
Model Flexibility — swap models with a one-line change; no per-provider SDKs.
Cost Management — real-time per-model usage and cost on the dashboard.
Enterprise Support — custom SLAs for high-volume users.
Tool Integrations — Cursor, Windsurf, VS Code, Cline, Roo Code, Replit.

Examples

Runnable scripts live in example/. Each file demos exactly one provider method and writes its output to example/output/<name>.log.

| Script | Provider method |
|---|---|
| list-models.ts | provider.listModels() |
| chat-model.ts | provider.chatModel() + generateText / streamText |
| chat-pdf.ts | provider.chatModel() with a PDF attachment |
| embedding-model.ts | provider.embeddingModel() + embed / embedMany |
| transcription-model.ts | provider.transcriptionModel() + experimental_transcribe |
| image-model.ts | provider.imageModel() (demonstrates the NoSuchModelError) |

Run any of them with:

OLLM_API_KEY=... bunx tsx example/<name>.ts
# or: npx tsx example/<name>.ts

API reference

import {
  createOLLM,
  ollm,
  type OLLMProvider,
  type OLLMProviderSettings,
  type OLLMChatModelId,             // alias for string
  type OLLMChatProviderOptions,     // providerOptions.ollm shape for chat
  type OLLMEmbeddingModelId,        // alias for string
  type OLLMEmbeddingProviderOptions,
  type OLLMTranscriptionModelId,    // alias for string
  type OLLMModel,                   // shape returned by listModels()
  type ListModelsOptions,
  type OLLMErrorData,
  VERSION,
} from '@orgn/gateway';

OLLMProvider exposes:

(modelId) — shorthand for chatModel(modelId)
chatModel(modelId), languageModel(modelId) — chat / text generation (V3)
embeddingModel(modelId), textEmbeddingModel(modelId) (deprecated alias) — embeddings (V3)
transcriptionModel(modelId) — speech-to-text (V3)
imageModel(modelId) — currently throws NoSuchModelError
listModels(options?) — fetch the live /v1/models catalog with optional modality/active filters