@salesforce/llm-gateway-sdk

v0.14.0

Published

a day ago

Salesforce LLM Gateway SDK

Typed SDK for the Salesforce LLM Gateway API. Provides an HTTP client with JWT management, model abstraction, streaming (SSE) support, and request/response tracing.

Quick Start

Closed source. This package is published to npm under the Salesforce Public Code License and is for use by Salesforce only.

import { createJWT, createLLMGatewayClient } from '@salesforce/llm-gateway-sdk';

// 1. Create a JWT and client
const jwt = await createJWT({
  accessToken: 'your-access-token',
  instanceUrl: 'https://your-instance.salesforce.com',
});
const client = createLLMGatewayClient({ jwt });

// 2. Send a non-streaming chat request
const response = await client.chat({
  messages: [{ role: 'user', content: 'Hello!' }],
  generation_settings: { max_tokens: 100 },
});
console.log(response.data.generatedText);

// 3. Or stream the response
const stream = await client.chatStream({
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  generation_settings: { max_tokens: 500 },
});

for await (const chunk of stream.data) {
  if (chunk.generatedText) process.stdout.write(chunk.generatedText);
  if (chunk.done) console.log('\nUsage:', chunk.usage);
}

API Reference

`createLLMGatewayClient(options): LLMGatewayClient`

Creates a ready-to-use client. This is the recommended entry point.

import { createJWT, createLLMGatewayClient, Models, ModelName, SfApiEnv } from '@salesforce/llm-gateway-sdk';

const jwt = await createJWT({
  accessToken: 'your-access-token',
  instanceUrl: 'https://your-instance.salesforce.com',
});

const client = createLLMGatewayClient({ jwt, env: SfApiEnv.Prod });
client.setModel(Models.getByName(ModelName.GPT_5));

Accepts GatewayClientOptions:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| jwt | JSONWebToken | - | JWT for authentication (required). |
| env? | SfApiEnv | prod | API environment. |
| basePath? | string | LLMG_BASE_PATH env var or '/einstein/gpt/code/v1.1' | Override the API base path. See route selection rationale. |
| clientFeatureIdOverride? | string | - | Override x-client-feature-id header instead of using the JWT minting feature ID. |
| retry? | RetryOptions \| false | See Retries | Retry policy for transient failures. Pass false to disable retries entirely. |

`LLMGatewayClient` (interface)

The core contract consumers program against. All factory functions return this type.

interface LLMGatewayClient {
  getModel(): Model;
  setModel(model: Model): void;

  chat(request: ChatRequest, options?: ChatOptions): Promise<ChatResponse>;
  chatStream(request: ChatRequest, options?: ChatOptions): Promise<ChatStreamResponse>;

  onTrace(callback: (event: TraceEvent) => void): Unsubscribe;
  onTelemetry(callback: TelemetryEventCallback): Unsubscribe;
  onLog(callback: (record: LogRecord) => void): Unsubscribe;

  startTracing(filePath: string): void;
  stopTracing(): void;

  dispose(): void;
}

`ChatOptions`

type ChatOptions = {
  abortSignal?: AbortSignal;
};

Models

abstract class Model {
  abstract readonly name: ModelName;
  abstract readonly displayId: string;
  abstract readonly maxInputTokens: number;
  abstract readonly maxOutputTokens: number;
  abstract readonly contextWindow: number;
  abstract readonly supportsPromptCache: boolean;
  // The file formats this model accepts, with per-format caps. `[]` = text-only.
  abstract readonly supportedFormats: readonly SupportedFileFormat[];
}

type SupportedFileFormat = {
  name: 'png' | 'jpeg' | 'pdf';
  mimeType: MimeType; // 'image/png' | 'image/jpeg' | 'application/pdf'
  maxBytesPerFile?: number; // per-format byte cap; unset = only the global 15 MiB applies
  maxFilesPerRequest?: number; // per-format count cap; unset = only the global 10-file cap applies
};

enum ModelName {
  GPT_5 = 'llmgateway__OpenAIGPT5',
  GPT_5_4 = 'llmgateway__OpenAIGPT54',
  GPT_5_5 = 'llmgateway__OpenAIGPT55',
  CLAUDE_SONNET_4_5 = 'llmgateway__BedrockAnthropicClaude45Sonnet',
  CLAUDE_SONNET_4_6 = 'llmgateway__BedrockAnthropicClaude46Sonnet',
  CLAUDE_OPUS_4_5 = 'llmgateway__BedrockAnthropicClaude45Opus',
  CLAUDE_OPUS_4_6 = 'llmgateway__BedrockAnthropicClaude46Opus',
  CLAUDE_OPUS_4_7 = 'llmgateway__BedrockAnthropicClaude47Opus',
}

A model's supportedFormats array is the single source of truth for multimodal capability and caps. A file is accepted only if its mimeType matches a declared format; an empty array means the model is text-only.

Built-in Models

| Model | ModelName | Context Window | Max Output | Images | PDFs | Prompt Cache |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-5 | GPT_5 | 272K | 128K | Yes | Yes | No |
| GPT-5.4 | GPT_5_4 | 1.05M | 128K | Yes | Yes | No |
| GPT-5.5 (BETA) | GPT_5_5 | 1.05M | 128K | Yes | Yes | No |
| Claude Sonnet 4.5 | CLAUDE_SONNET_4_5 | 200K | 8192 | Yes | Yes | Yes |
| Claude Sonnet 4.6 | CLAUDE_SONNET_4_6 | 200K | 16384 | Yes | Yes | Yes |
| Claude Opus 4.5 | CLAUDE_OPUS_4_5 | 200K | 64K | Yes | Yes | Yes |
| Claude Opus 4.6 | CLAUDE_OPUS_4_6 | 1M | 128K | Yes | Yes | Yes |
| Claude Opus 4.7 | CLAUDE_OPUS_4_7 | 1M | 128K | Yes | Yes | Yes |

GPT-5.5 is gated behind the AIModelBetaEnabled org preference; orgs without that gate enabled will get a gateway error at request time.

Per-format multimodal caps for Claude 4.x on Bedrock: 3.75 MiB/PNG, 3.75 MiB/JPEG, 4.5 MiB/PDF, and a 5-PDF per-request cap. GPT-5 / GPT-5.4 / GPT-5.5 declare the same three formats but no per-format caps — only the global 15 MiB / 10-file caps bind, since no per-file rejection has been observed on GPT-5 below the gateway's raw-payload wall.

`Models` Registry

const Models = {
  getDefault(): Model;              // ClaudeSonnet46
  getByName(name: ModelName): Model;
};

`createClaudeModel` (forward-compat escape hatch)

When the LLM Gateway publishes a new Bedrock-Anthropic Claude variant before this SDK has been updated, consumers can opt into it by name without waiting for a release. The factory builds a Model instance using the shared AnthropicClaudeResponseProcessor (correct for every Bedrock-Claude variant the gateway hosts) with conservative Sonnet/Opus 4.5-baseline caps. Newer flagship variants ship with larger windows — pass overrides to dial caps in for the specific model.

function createClaudeModel(gatewayId: string, overrides?: ClaudeModelOverrides): Model;

type ClaudeModelOverrides = {
  displayId?: string;
  maxInputTokens?: number; // default: 200_000 (Sonnet/Opus 4.5 baseline)
  maxOutputTokens?: number; // default: 64_000 (Sonnet/Opus 4.5 baseline)
  contextWindow?: number; // default: 200_000 (Sonnet/Opus 4.5 baseline)
  supportsPromptCache?: boolean; // default: true
  supportedFormats?: readonly SupportedFileFormat[]; // default: PNG/JPEG/PDF Claude caps
  permittedParameters?: string[]; // e.g. ['temperature', 'top_p']
  customHeaders?: Record<string, string>; // e.g. { 'anthropic-beta': '...' }
};

import { createClaudeModel } from '@salesforce/llm-gateway-sdk';

const model = createClaudeModel('llmgateway__BedrockAnthropicClaude48Opus', {
  displayId: 'Claude Opus 4.8',
  maxInputTokens: 1_000_000,
  contextWindow: 1_000_000,
  maxOutputTokens: 128_000,
});
client.setModel(model);

The factory is intentionally Claude-only. Prefer the Models.getByName(...) registry for any model the SDK already ships, since the registry instance is the canonical source for caps and stays in sync with releases.

Request Types

type ChatRequest = {
  messages: ChatMessageIn[];
  generation_settings: GenerationSettings;
  reasoning_settings?: ReasoningSettings;
  tools?: ChatCompletionFunctionTool[];
  tool_config?: ToolConfig;
};

type ChatMessageIn = {
  role: 'user' | 'assistant' | 'system' | 'tool';
  content: string;
  files?: ChatMessageFile[];
  tool_call_id?: string;
  tool_call_name?: string;
  tool_invocations?: ToolInvocationIn[];
};

type GenerationSettings = {
  max_tokens?: number;
  temperature?: number;
  stop_sequences?: string[];
  frequency_penalty?: number;
  presence_penalty?: number;
  parameters?: object;
};

type ToolConfig = {
  mode: 'auto' | 'none' | 'tool' | 'any';
  allowed_tools?: { type: 'function'; name: string }[];
};

type ChatCompletionFunctionTool = {
  type?: 'function';
  function: {
    name: string;
    description?: string;
    parameters?: {
      type: 'object';
      properties?: Record<string, LlmgPropertyDefinition>;
      required?: string[];
    };
    strict?: boolean;
  };
};

Response Types

type ChatResponse = {
  status: number;
  data: ChatResponseData;
};

type ChatResponseData = {
  generatedText?: string;
  finishReason?: FinishReason;
  error?: LLMGError;
  toolInvocations?: ToolInvocation[];
  usage?: TokenUsage;
};

type ChatStreamResponse = {
  status: number;
  data: AsyncGenerator<ChatStreamChunk>;
};

type ChatStreamChunk = {
  generatedText: string;
  done: boolean;
  finishReason?: FinishReason;
  error?: LLMGError;
  toolInvocations?: ToolInvocation[];
  usage?: TokenUsage;
};

type ToolInvocation = {
  id: string;
  // `arguments` is always a JSON-parseable string. Tool calls with no arguments
  // surface as `arguments: "{}"` (the stringified empty object).
  function: { name: string; arguments: string };
};

type TokenUsage = {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  reasoningTokens?: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
};

type FinishReason = 'stop' | 'length' | 'tool_calls' | 'end_turn' | 'max_tokens' | 'tool_use';

Multimodal Input

Attach images and PDFs to any ChatMessageIn via the optional files array. The same files[] shape works on both chat() and chatStream().

import { createLLMGatewayClient, MimeType, Models, ModelName } from '@salesforce/llm-gateway-sdk';
import { readFile } from 'node:fs/promises';
import { randomUUID } from 'node:crypto';

// `jwt` is a JSONWebToken obtained via createJWT(), as in Quick Start.
const client = createLLMGatewayClient({ jwt });
client.setModel(Models.getByName(ModelName.CLAUDE_SONNET_4_6));

const imageBytes = await readFile('./screenshot.png');

const response = await client.chat({
  messages: [
    {
      role: 'user',
      content: 'Describe what you see.',
      files: [
        {
          fileId: randomUUID(),
          mimeType: MimeType.Png,
          dataType: 'base64',
          data: imageBytes.toString('base64'),
          fileName: 'screenshot.png',
        },
      ],
    },
  ],
  generation_settings: { max_tokens: 200 },
});

`ChatMessageFile`

type ChatMessageFile = {
  fileId: string;
  mimeType: MimeType; // 'image/png' | 'image/jpeg' | 'application/pdf'
  dataType: 'base64';
  data: string;
  fileName?: string;
};

Casing is camelCase by design. Even though the surrounding ChatRequest envelope is snake_case (generation_settings, max_tokens, …), the inner FileData object uses camelCase per the gateway's OpenAPI schema.

Supported MIME types

MimeType.Png (image/png), MimeType.Jpeg (image/jpeg), MimeType.Pdf (application/pdf). A file is accepted only if its mimeType matches one of the configured model's supportedFormats; anything else is rejected with LLMGClientError code MODEL_DOES_NOT_SUPPORT_FORMAT. Import the constant to avoid hard-coding the wire string: import { MimeType } from '@salesforce/llm-gateway-sdk';.

Transport

Only dataType: 'base64' is supported in v1. The uri and sfDrive transports are deferred.

Limits

| Layer | Limit |
| --- | --- |
| Files per request (global) | 10 |
| Total decoded bytes (global) | 15 MiB |
| Per-PNG/JPEG bytes (Claude 4.x) | 3.75 MiB (from the png/jpeg supportedFormats[].maxBytesPerFile) |
| Per-PDF bytes (Claude 4.x) | 4.5 MiB (from the pdf supportedFormats[].maxBytesPerFile) |
| PDFs per request (Claude 4.x) | 5 (from the pdf supportedFormats[].maxFilesPerRequest) |
| GPT-5 / 5.4 / 5.5 caps | Declare png/jpeg/pdf with no per-format caps; global limits only |

The SDK approximates each file's decoded size as Math.floor(data.length * 0.75). For padded base64 this is within 2 bytes of the exact decoded length: the estimate overshoots by the number of trailing = padding characters.
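To make the size estimate concrete, here is a minimal, self-contained sketch that compares it with the exact decoded length of padded base64. `approxDecodedBytes` and `exactDecodedBytes` are hypothetical helper names for illustration only; they are not SDK exports.

```typescript
// The approximation described above: 3 decoded bytes per 4 base64 characters.
function approxDecodedBytes(data: string): number {
  return Math.floor(data.length * 0.75);
}

// Exact decoded length for standard, padded base64 (comparison helper, not part of the SDK).
function exactDecodedBytes(data: string): number {
  if (data.length === 0) return 0;
  const padding = data.endsWith('==') ? 2 : data.endsWith('=') ? 1 : 0;
  return (data.length / 4) * 3 - padding;
}

// 'QQ==' encodes "A", 'QUI=' encodes "AB", 'QUJD' encodes "ABC".
for (const sample of ['QQ==', 'QUI=', 'QUJD']) {
  console.log(sample, approxDecodedBytes(sample), exactDecodedBytes(sample));
}
```

For the three samples the estimate is 3 bytes each, while the exact lengths are 1, 2, and 3 bytes, so the overshoot never exceeds 2 bytes.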

Internal limits are noted here: https://docs.internal.salesforce.com/ai/einstein/gateway/models-and-providers/

Externally documented LLM limits: https://help.salesforce.com/s/articleView?id=ai.generative_ai_llm_multimodal_support.htm&type=5

Validation errors

| Code | Cause |
| --- | --- |
| TOO_MANY_FILES | Combined files[] count across all messages > 10 (global cap). |
| FILES_TOO_LARGE | Total decoded bytes across all files > 15 MiB (global cap). |
| MODEL_DOES_NOT_SUPPORT_FORMAT | A file's mimeType is not in the configured model's supportedFormats. |
| FILE_TOO_LARGE_FOR_FORMAT | A single file exceeds its format's maxBytesPerFile cap for the configured model. |
| TOO_MANY_FILES_FOR_FORMAT | The count of files of one format exceeds its maxFilesPerRequest cap for the model. |
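The rules in this table can be sketched as a single pre-flight function. This is an illustration under stated assumptions, not the SDK's implementation: `checkFiles`, `FormatCaps`, and `FileLike` are hypothetical names, the order in which checks run is a guess, and the function returns a code rather than throwing LLMGClientError so that it stays self-contained.

```typescript
type FormatCaps = {
  mimeType: 'image/png' | 'image/jpeg' | 'application/pdf';
  maxBytesPerFile?: number;
  maxFilesPerRequest?: number;
};

type FileLike = { mimeType: string; data: string };

type ValidationCode =
  | 'TOO_MANY_FILES'
  | 'FILES_TOO_LARGE'
  | 'MODEL_DOES_NOT_SUPPORT_FORMAT'
  | 'FILE_TOO_LARGE_FOR_FORMAT'
  | 'TOO_MANY_FILES_FOR_FORMAT';

const MIB = 1024 * 1024;
const GLOBAL_MAX_FILES = 10;
const GLOBAL_MAX_BYTES = 15 * MIB;

// Same size estimate the Limits section describes.
const approxBytes = (data: string): number => Math.floor(data.length * 0.75);

// Returns the first violated code, or null when every check passes.
// The check order is an assumption of this sketch.
function checkFiles(files: FileLike[], formats: readonly FormatCaps[]): ValidationCode | null {
  if (files.length > GLOBAL_MAX_FILES) return 'TOO_MANY_FILES';

  let totalBytes = 0;
  const countByMime = new Map<string, number>();
  for (const file of files) {
    const format = formats.find((f) => f.mimeType === file.mimeType);
    if (!format) return 'MODEL_DOES_NOT_SUPPORT_FORMAT';

    const bytes = approxBytes(file.data);
    if (format.maxBytesPerFile !== undefined && bytes > format.maxBytesPerFile) {
      return 'FILE_TOO_LARGE_FOR_FORMAT';
    }

    const count = (countByMime.get(file.mimeType) ?? 0) + 1;
    countByMime.set(file.mimeType, count);
    if (format.maxFilesPerRequest !== undefined && count > format.maxFilesPerRequest) {
      return 'TOO_MANY_FILES_FOR_FORMAT';
    }
    totalBytes += bytes;
  }
  return totalBytes > GLOBAL_MAX_BYTES ? 'FILES_TOO_LARGE' : null;
}

// Claude 4.x caps as listed in the Limits table.
const claudeFormats: FormatCaps[] = [
  { mimeType: 'image/png', maxBytesPerFile: 3.75 * MIB },
  { mimeType: 'image/jpeg', maxBytesPerFile: 3.75 * MIB },
  { mimeType: 'application/pdf', maxBytesPerFile: 4.5 * MIB, maxFilesPerRequest: 5 },
];

console.log(checkFiles([{ mimeType: 'image/gif', data: 'QUJD' }], claudeFormats));
```

Passing an empty formats array models a text-only model: every file is then rejected with MODEL_DOES_NOT_SUPPORT_FORMAT.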

Trust Layer. Files sent as binary base64 bypass Salesforce Trust Layer protections (toxicity scoring, data masking). If those protections matter for your use case, extract the content client-side and send it as text.

Reusing validation outside the client

chat() / chatStream() run this validation automatically. Integrations that build requests through a different transport (e.g. an agent harness speaking a provider-native protocol) can run the identical pre-flight check via the exported validateMultimodalFiles(files, model) — it throws the same LLMGClientError codes as the client. files only needs { mimeType, data } (the MultimodalFile type), so callers holding partial file shapes don't need to construct a full ChatMessageFile.

import { Models, ModelName, validateMultimodalFiles } from '@salesforce/llm-gateway-sdk';

const model = Models.getByName(ModelName.CLAUDE_SONNET_4_6);
validateMultimodalFiles([{ mimeType: 'image/png', data: base64Png }], model); // throws LLMGClientError on a cap/format violation

Tracing

Subscribe to per-request tracing via onTrace(), or use the convenience startTracing()/stopTracing() methods to write trace events to a Markdown file.

// Callback-based tracing
const offTrace = client.onTrace((event) => {
  if (event.type === TraceType.Response) {
    console.log(`[${event.status}] ${event.totalDurationMs}ms`);
  }
});
offTrace(); // unsubscribe

// File-based tracing
client.startTracing('./trace.md');
// ... make requests ...
client.stopTracing();

type TraceEvent = TraceRequestEvent | TraceResponseEvent;

type TraceRequestEvent = {
  type: TraceType.Request;
  timestamp: Date;
  url: string;
  method: string;
  model: string;
  body: Record<string, unknown>;
};

type TraceResponseEvent = {
  type: TraceType.Response;
  timestamp: Date;
  model: string;
  status: number;
  error?: Error;
  timeToFirstTokenMs?: number;
  usage?: TokenUsage;
  totalDurationMs?: number;
  responseText?: string;
  xClientTraceId?: string;
};

enum TraceType {
  Request = 'request',
  Response = 'response',
  Info = 'info',
}

`TraceFileWriter`

Lower-level class for manual trace file management:

const writer = new TraceFileWriter(client, { filePath: './trace.md' });
// ... make requests ...
writer.detach();

Telemetry & Structured Logs

LLMGatewayClient exposes two more typed event streams. Consumers subscribe via onTelemetry() and onLog(), both of which return an Unsubscribe closure.

type TelemetryEvent =
  | RequestStartedEvent
  | RequestCompletedEvent
  | RequestFailedEvent
  | RequestRetryEvent
  | JwtRefreshedEvent
  | JwtRefreshFailedEvent;

type RequestStartedEvent = {
  type: 'request-started';
  timestamp: Date;
  url: string;
  method: string;
  model: string;
};

type RequestCompletedEvent = {
  type: 'request-completed';
  timestamp: Date;
  model: string;
  status: number;
  durationMs: number;
  timeToFirstTokenMs?: number;
  usage?: TokenUsage;
};

type RequestFailedEvent = {
  type: 'request-failed';
  timestamp: Date;
  model: string;
  status?: number;
  durationMs: number;
  error: Error;
};

type RequestRetryEvent = {
  type: 'request-retry';
  timestamp: Date;
  model: string;
  attempt: number; // 1-indexed: attempt 1 is emitted before the first retry
  delayMs: number; // jittered or Retry-After-driven wait
  status?: number; // set when the retry was triggered by an HTTP response (429/5xx)
  error?: Error; // set when the retry was triggered by a transport-layer failure
};

type JwtRefreshedEvent = {
  type: 'jwt-refreshed';
  timestamp: Date;
  durationMs: number;
};

type JwtRefreshFailedEvent = {
  type: 'jwt-refresh-failed';
  timestamp: Date;
  durationMs: number;
  error: Error;
};

type LogRecord = {
  level: 'debug' | 'info' | 'warn' | 'error';
  message: string;
  timestamp: Date;
  context?: Record<string, unknown>;
  error?: Error;
};

const offTelemetry = client.onTelemetry((event) => {
  if (event.type === 'request-completed') {
    histogram('llm_request_ms', event.durationMs, { model: event.model });
  }
});

const offLog = client.onLog((record) => {
  logger[record.level](record.context, record.message);
});
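
Because every telemetry event carries a literal `type` field, events can also be aggregated per model with a plain switch statement. The following is a minimal, self-contained sketch: the event shapes are trimmed local copies of the types above, and `createStatsCollector` and `ModelStats` are hypothetical names that are not part of the SDK.

```typescript
// Trimmed local copies of the telemetry event shapes documented above.
type TelemetryLike =
  | { type: 'request-started'; model: string }
  | { type: 'request-completed'; model: string; durationMs: number }
  | { type: 'request-failed'; model: string; durationMs: number }
  | { type: 'request-retry'; model: string; attempt: number }
  | { type: 'jwt-refreshed'; durationMs: number }
  | { type: 'jwt-refresh-failed'; durationMs: number };

type ModelStats = { completed: number; failed: number; retries: number; totalDurationMs: number };

function createStatsCollector() {
  const stats = new Map<string, ModelStats>();

  const record = (event: TelemetryLike): void => {
    // JWT events carry no model, so this sketch skips them.
    if (event.type === 'jwt-refreshed' || event.type === 'jwt-refresh-failed') return;

    const entry = stats.get(event.model) ?? { completed: 0, failed: 0, retries: 0, totalDurationMs: 0 };
    switch (event.type) {
      case 'request-completed':
        entry.completed += 1;
        entry.totalDurationMs += event.durationMs;
        break;
      case 'request-failed':
        entry.failed += 1;
        entry.totalDurationMs += event.durationMs;
        break;
      case 'request-retry':
        entry.retries += 1;
        break;
      default:
        break; // 'request-started' does not change the counters
    }
    stats.set(event.model, entry);
  };

  return { record, stats };
}

const collector = createStatsCollector();
collector.record({ type: 'request-retry', model: 'example-model', attempt: 1 });
collector.record({ type: 'request-completed', model: 'example-model', durationMs: 120 });
console.log(collector.stats.get('example-model'));
```

With a real client this would presumably be wired up as client.onTelemetry(collector.record), assuming the SDK's TelemetryEvent is assignable to the trimmed shapes used here.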

Log messages emitted by the SDK

Each LogRecord carries a context object; the model key is populated on every record whenever a model is set.

| Level | Message | Notable context keys |
| --- | --- | --- |
| debug | Model set to {name} | model |
| debug | Refreshing expired JWT | - |
| info | JWT refreshed | durationMs |
| warn | Rate limit hit | model, reset, remaining, limit |
| warn | Unexpected status {code} | model, status |
| warn | SSE parse error | model, chunk, reason |
| warn | Retrying transient request failure | model, attempt, delayMs, status, errorCode |
| warn | Retries exhausted | model, status |

Disposal

dispose() releases event resources: it detaches any active trace writer and clears the trace, telemetry, and log buses. After disposal every consumer-facing method throws LLMGClientError. dispose() itself is idempotent.

In-flight streams returned by a prior chatStream() call are not aborted by dispose(); the async generator keeps yielding until it naturally completes. Use the caller-supplied AbortSignal if you need to cancel an active stream.

`SfApiEnv`

enum SfApiEnv {
  Dev = 'dev',
  Perf = 'perf',
  Prod = 'prod',
  Stage = 'stage',
  Test = 'test',
}

Retries

By default the client retries transient failures with exponential backoff and full jitter. Retries cover only connection establishment and the initial HTTP response — once the response body starts streaming, errors propagate to the caller.

The SDK retries on:

- HTTP 429, 500, 502, 503, 504
- Socket / network errors: ECONNRESET, ECONNREFUSED, ENOTFOUND, ENETDOWN, ENETUNREACH, EHOSTDOWN, ETIMEDOUT, UND_ERR_SOCKET, UND_ERR_CONNECT_TIMEOUT, UND_ERR_HEADERS_TIMEOUT, UND_ERR_BODY_TIMEOUT
- HTTP/2 GOAWAY (surfaces as UND_ERR_SOCKET)

Retry-After and Salesforce LLMG's x-ratelimit-reset headers are honored when present, used verbatim without jitter or further clamping (the server already specified the wait). The hint is bounded only by maxRetryAfterMs — if the server asks for longer, the SDK does not retry and instead surfaces the response so the caller sees the structured error (rather than the SDK retrying too soon and cascading more 429s). Computed exponential-backoff retries are bounded separately by maxDelayMs; that cap does not apply to server-driven hints.

maxTotalElapsedMs provides an optional wall-clock ceiling on a single call (default: no deadline). When the next sleep would exceed it, the loop stops retrying and surfaces the most recent error / response, even if maxAttempts hasn't been reached.

JWT-bearing headers are recomputed on every attempt, so a long Retry-After wait never replays a stale token.
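
The delay rules above can be sketched as two small functions. This is a minimal illustration, not the SDK's implementation: the exact jitter formula is an assumption (the common "full jitter" scheme, a uniform draw between 0 and the capped exponential delay), `computedDelayMs` and `hintedDelayMs` are hypothetical names, and the default values are the ones documented for RetryOptions.

```typescript
type BackoffOptions = {
  initialDelayMs: number;
  maxDelayMs: number;
  maxRetryAfterMs: number;
  backoffFactor: number;
};

// Defaults as documented for RetryOptions.
const DEFAULTS: BackoffOptions = {
  initialDelayMs: 100,
  maxDelayMs: 2000,
  maxRetryAfterMs: 60000,
  backoffFactor: 2,
};

// Computed backoff with full jitter (assumed formula).
// `retryNumber` is 1 for the first retry; `random` is injectable for deterministic tests.
function computedDelayMs(
  retryNumber: number,
  opts: BackoffOptions = DEFAULTS,
  random: () => number = Math.random,
): number {
  const exponential = opts.initialDelayMs * Math.pow(opts.backoffFactor, retryNumber - 1);
  return random() * Math.min(opts.maxDelayMs, exponential);
}

// Server-driven hint: used verbatim, or null (do not retry) when it exceeds maxRetryAfterMs.
// maxDelayMs deliberately plays no role here.
function hintedDelayMs(hintMs: number, opts: BackoffOptions = DEFAULTS): number | null {
  return hintMs > opts.maxRetryAfterMs ? null : hintMs;
}

// Upper bounds of the jittered delay for the first six retries.
for (let retry = 1; retry <= 6; retry++) {
  console.log(retry, computedDelayMs(retry, DEFAULTS, () => 1));
}
```

With the defaults, the upper bound doubles from 100 ms and is capped at 2000 ms from the sixth retry onward, while a 30-second Retry-After hint is accepted unchanged and a 61-second hint stops the retry loop.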

Retries do not fire on:

- 4xx that aren't 429 (400, 401, 403, 404, 422, …)
- Caller-initiated AbortSignal cancellation
- Errors after the response body has begun streaming
- Server hints that exceed maxRetryAfterMs (the response is surfaced instead)

type RetryOptions = {
  /** Total attempts including the first. Default: 3. Must be >= 1. To disable retries, pass `retry: false`. */
  maxAttempts?: number;
  /** Initial delay before the first retry, in ms. Default: 100. */
  initialDelayMs?: number;
  /** Maximum delay between *computed* exponential-backoff retries, in ms. Default: 2000. */
  maxDelayMs?: number;
  /** Maximum delay accepted from a server-driven hint (`Retry-After` / `x-ratelimit-reset`), in ms. Default: 60000. */
  maxRetryAfterMs?: number;
  /** Exponential backoff multiplier. Default: 2. */
  backoffFactor?: number;
  /**
   * Hard ceiling on cumulative wall-clock time a single call may spend across all attempts and
   * inter-attempt sleeps, in ms. When the next sleep would push past this deadline the loop bails
   * and surfaces the most recent error / response. Default: `Infinity` (no deadline).
   */
  maxTotalElapsedMs?: number;
};

// Customize the policy: more attempts, shorter initial delay
const client = createLLMGatewayClient({ jwt, retry: { maxAttempts: 5, initialDelayMs: 50 } });

// Disable retries entirely
const noRetry = createLLMGatewayClient({ jwt, retry: false });

// Observe each retry attempt
client.onTelemetry((event) => {
  if (event.type === 'request-retry') {
    // `error` is only set for transport-layer failures, so guard the access.
    console.warn(`retry #${event.attempt} after ${event.delayMs}ms`, event.status, event.error?.message);
  }
});

When retries are exhausted, the underlying response is surfaced as a structured LLMGClientError — the messageCode, cause (parsed LLMGError), and formatted rate-limit message are preserved exactly as on the non-retry path.

Error Handling

The client throws LLMGClientError for API errors (4xx/5xx). Rate-limit responses (429) include formatted reset time information.

import { createLLMGatewayClient, LLMGClientError, LLMGClientErrorCode } from '@salesforce/llm-gateway-sdk';

try {
  await client.chat(request);
} catch (err) {
  if (err instanceof LLMGClientError) {
    if (err.code === LLMGClientErrorCode.TooManyFiles) {
      // narrow on a stable client-raised code …
    }
    console.error(`LLMG error [${err.code}]: ${err.message}`);
  }
}

LLMGClientErrorCode enumerates the stable codes the client raises itself (lifecycle + pre-flight multimodal validation). Gateway-side errors (4xx/5xx) carry the upstream errorData.messageCode verbatim instead — those values are an opaque string and are not part of this enum.

Function Calling Example

const response = await client.chat({
  messages: [{ role: 'user', content: 'Get the weather for San Francisco' }],
  generation_settings: { max_tokens: 200 },
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather for a location',
        parameters: {
          type: 'object',
          properties: { location: { type: 'string' } },
          required: ['location'],
        },
      },
    },
  ],
  tool_config: { mode: 'auto' },
});

if (response.data.toolInvocations) {
  for (const invocation of response.data.toolInvocations) {
    console.log(`Tool: ${invocation.function.name}, Args: ${invocation.function.arguments}`);
  }
}
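
Since `arguments` arrives as a JSON string, a consumer typically parses it before calling its own tool implementation. The sketch below shows one way to do that. It is self-contained and makes several assumptions: `handlers`, `runTool`, and `toToolMessage` are hypothetical names, and returning the result as a `role: 'tool'` message keyed by `tool_call_id` is inferred from the ChatMessageIn field names rather than confirmed by this README.

```typescript
// Local copy of the ToolInvocation shape from Response Types.
type ToolInvocation = {
  id: string;
  function: { name: string; arguments: string };
};

// Hypothetical handler registry; names and return values are illustrative only.
type ToolHandler = (args: Record<string, unknown>) => string;

const handlers: Record<string, ToolHandler> = {
  get_weather: (args) => `Weather requested for ${String(args.location)}`,
};

function runTool(invocation: ToolInvocation): string {
  const handler = handlers[invocation.function.name];
  if (!handler) return `Unknown tool: ${invocation.function.name}`;
  // `arguments` is always a JSON-parseable string; "{}" parses to an empty object.
  const args = JSON.parse(invocation.function.arguments) as Record<string, unknown>;
  return handler(args);
}

// Assumed shape of the follow-up message, based on the ChatMessageIn fields.
function toToolMessage(invocation: ToolInvocation, result: string) {
  return {
    role: 'tool' as const,
    content: result,
    tool_call_id: invocation.id,
    tool_call_name: invocation.function.name,
  };
}

const invocation: ToolInvocation = {
  id: 'call-1',
  function: { name: 'get_weather', arguments: '{"location":"San Francisco"}' },
};
console.log(toToolMessage(invocation, runTool(invocation)));
```

Keeping the registry keyed by tool name means an unrecognized invocation produces a readable message instead of an exception, which the caller can still forward to the model.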

Advanced: `createJWT`

Creates an auto-refreshing JWT from raw credentials. Establishes a connection, mints an initial token (fail-fast on invalid credentials), and returns a self-refreshing JSONWebToken:

import { createJWT } from '@salesforce/llm-gateway-sdk';

const jwt = await createJWT({
  accessToken: 'your-access-token',
  instanceUrl: 'https://your-instance.salesforce.com',
  featureId: 'MyFeature',
});

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| accessToken | string | - | Salesforce access token. |
| instanceUrl | string | - | Salesforce instance URL. |
| mintingPath? | string | '/ide/auth' | Token minting endpoint path. |
| featureId? | string | LLMG_FEATURE_ID env var or 'VibesService' | Feature identifier sent with requests. |

Advanced: `createJWTFromConnection`

Creates an auto-refreshing JWT from an existing OrgConnection. Use this when you already have a validated connection and want to avoid the extra network call that createJWT performs to establish one:

import { createJWTFromConnection } from '@salesforce/llm-gateway-sdk';
import type { OrgConnection } from '@salesforce/agentic-common';

const jwt = await createJWTFromConnection(orgConnection, { featureId: 'MyFeature' });

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| orgConnection | OrgConnection | - | An already-authenticated org connection. |
| mintingPath? | string | '/ide/auth' | Token minting endpoint path. |
| featureId? | string | LLMG_FEATURE_ID env var or 'VibesService' | Feature identifier sent with requests. |

Advanced: `DefaultLLMGatewayClientFactory`

For dependency-injection scenarios where a factory interface is needed:

interface LLMGatewayClientFactory {
  create(jwt: JSONWebToken, options?: { env?: SfApiEnv }): LLMGatewayClient;
}

Development

See DEVELOPING.md for build-from-source setup, scripts, E2E testing, and packaging.

See ARCHITECTURE.md for implementation details on the request pipeline, SSE parsing, response processors, and JWT lifecycle.
