@inferencesh/sdk
v0.6.10
Official JavaScript/TypeScript SDK for inference.sh - Run AI models with a simple API
@inferencesh/sdk — ai inference api for javascript & typescript
official javascript/typescript sdk for inference.sh — the ai agent runtime for serverless ai inference.
run ai models, build ai agents, and deploy generative ai applications with a simple api. access 250+ models including flux, stable diffusion, llms (claude, gpt, gemini), video generation (veo, seedance), and more.
Installation
npm install @inferencesh/sdk
# or
yarn add @inferencesh/sdk
# or
pnpm add @inferencesh/sdk
Getting an API Key
Get your API key from the inference.sh dashboard.
Quick Start
import { inference, TaskStatusCompleted } from '@inferencesh/sdk';
const client = inference({ apiKey: 'your-api-key' });
// Run a task and wait for the result
const result = await client.tasks.run({
app: 'your-app',
input: {
prompt: 'Hello, world!'
}
});
if (result.status === TaskStatusCompleted) {
console.log(result.output);
}
Usage
Basic Usage
import { inference, TaskStatusCompleted } from '@inferencesh/sdk';
const client = inference({ apiKey: 'your-api-key' });
// Wait for result (default behavior)
const result = await client.tasks.run({
app: 'my-app',
input: { prompt: 'Generate something amazing' }
});
if (result.status === TaskStatusCompleted) {
console.log('Output:', result.output);
}
With Setup Parameters
Setup parameters configure the app instance (e.g., model selection). Workers with matching setup are "warm" and skip setup:
const result = await client.tasks.run({
app: 'my-app',
setup: { model: 'schnell' }, // Setup parameters
input: { prompt: 'hello' }
});
Fire and Forget
// Get task info immediately without waiting
const task = await client.tasks.run(
{ app: 'my-app', input: { prompt: 'hello' } },
{ wait: false }
);
console.log('Task ID:', task.id);
console.log('Status:', task.status);
Real-time Status Updates
By default, the client streams task progress over NDJSON (/tasks/{id}/stream) and invokes onUpdate as the task changes. Use onPartialUpdate when you only need specific fields from a partial stream payload:
const result = await client.run(
{ app: 'my-app', input: { prompt: 'hello' } },
{
onUpdate: (update) => {
console.log('Status:', update.status);
console.log('Progress:', update.logs);
},
onPartialUpdate: (update, fields) => {
console.log('Changed fields:', fields, update.status);
},
}
);
Streaming vs Polling
SSE/NDJSON streaming is the default. For edge runtimes that cannot keep long-lived connections open (Convex actions, Cloudflare Workers, etc.), disable streaming and use lightweight status polling instead:
const client = inference({
apiKey: 'your-api-key',
stream: false, // poll /tasks/{id}/status instead of streaming
pollIntervalMs: 2000, // default: 2000
});
// Per-call override
const result = await client.run(
{ app: 'my-app', input: { prompt: 'hello' } },
{ stream: false, onUpdate: (u) => console.log(u.status) }
);
In polling mode, the SDK checks /tasks/{id}/status and fetches the full task when the status changes. If that fetch fails after a status transition, run() rejects with the underlying error.
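The polling behaviour described above can be pictured as a small loop. This is only a sketch of the general pattern, not the SDK's internal code: pollUntilTerminal, fetchStatus, fetchTask, and isTerminal are hypothetical names introduced for illustration, and the caller supplies the functions that would talk to the API.

```typescript
// Sketch of status polling. Every name below is illustrative, not an SDK API.
interface PollDeps<T> {
  fetchStatus: () => Promise<string>;      // cheap check, stands in for /tasks/{id}/status
  fetchTask: () => Promise<T>;             // full task fetch, done only after a status change
  isTerminal: (status: string) => boolean; // e.g. completed, failed, cancelled
  onUpdate?: (task: T) => void;
  pollIntervalMs?: number;                 // mirrors the documented default of 2000
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntilTerminal<T>(deps: PollDeps<T>): Promise<T> {
  const interval = deps.pollIntervalMs ?? 2000;
  let lastStatus: string | undefined;
  for (;;) {
    const status = await deps.fetchStatus();
    if (status !== lastStatus) {
      lastStatus = status;
      // If this fetch throws, the error propagates to the caller,
      // which matches the "rejects with the underlying error" behaviour.
      const task = await deps.fetchTask();
      deps.onUpdate?.(task);
      if (deps.isTerminal(status)) return task;
    }
    await sleep(interval);
  }
}
```

The point of the two-step design is that the frequent request stays small, while the heavier full-task request happens only when something actually changed.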
Batch Processing
async function processImages(images: string[]) {
const results = [];
for (const image of images) {
const result = await client.tasks.run({
app: 'image-processor',
input: { image }
}, {
onUpdate: (update) => console.log(`Processing: ${update.status}`)
});
results.push(result);
}
return results;
}
File Upload
// Upload from base64
const file = await client.files.upload('data:image/png;base64,...', {
filename: 'image.png',
contentType: 'image/png'
});
// Use the uploaded file in a task
const result = await client.tasks.run({
app: 'image-app',
input: { image: file.uri }
});
Cancel a Task
const task = await client.tasks.run(
{ app: 'long-running-app', input: {} },
{ wait: false }
);
// Cancel if needed
await client.tasks.cancel(task.id);
Sessions (Stateful Execution)
Sessions allow you to maintain state across multiple task invocations. The worker stays warm between calls, preserving loaded models and in-memory state.
// Start a new session
const result = await client.tasks.run({
app: 'my-stateful-app',
input: { prompt: 'hello' },
session: 'new'
});
const sessionId = result.session_id;
console.log('Session ID:', sessionId);
// Continue the session with another call
const result2 = await client.tasks.run({
app: 'my-stateful-app',
input: { prompt: 'remember what I said?' },
session: sessionId
});
Custom Session Timeout
By default, sessions expire after 60 seconds of inactivity. You can customize this with session_timeout (1-3600 seconds):
// Create a session with 5-minute idle timeout
const result = await client.tasks.run({
app: 'my-stateful-app',
input: { prompt: 'hello' },
session: 'new',
session_timeout: 300 // 5 minutes
});
// Session stays alive for 5 minutes after each call
Notes:
- session_timeout is only valid when session: 'new'
- Minimum timeout: 1 second
- Maximum timeout: 3600 seconds (1 hour)
- Each successful call resets the idle timer
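The rules in these notes can be checked on the client before a request is sent. The helper below is a hypothetical sketch (checkSessionTimeout and SessionParams are not part of the SDK) that encodes only the constraints listed above; the API is still the source of truth for validation.

```typescript
// Illustrative pre-flight check; not an SDK function.
interface SessionParams {
  session?: string;
  session_timeout?: number;
}

// Returns a human-readable problem, or null when the parameters look valid.
function checkSessionTimeout(params: SessionParams): string | null {
  const { session, session_timeout } = params;
  if (session_timeout === undefined) return null; // nothing to validate
  if (session !== 'new') {
    return "session_timeout is only valid when session is 'new'";
  }
  if (
    !Number.isFinite(session_timeout) ||
    session_timeout < 1 ||
    session_timeout > 3600
  ) {
    return 'session_timeout must be between 1 and 3600 seconds';
  }
  return null;
}
```

For example, checkSessionTimeout({ session: 'new', session_timeout: 300 }) returns null, while passing an existing session ID together with session_timeout returns the first message.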
Session management API
Manage sessions directly without running a task:
// Inspect a session
const info = await client.sessions.get(sessionId);
console.log(info.status, info.expires_at, info.call_count);
// List active sessions
const sessions = await client.sessions.list();
// Extend idle timeout without a task call (sliding window)
await client.sessions.keepalive(sessionId);
// Release the worker immediately
await client.sessions.end(sessionId);
Session errors
import {
SessionNotFoundError,
SessionExpiredError,
SessionEndedError,
} from '@inferencesh/sdk';
try {
await client.tasks.run({
app: 'my-stateful-app',
input: { prompt: 'hello' },
session: sessionId,
});
} catch (error) {
if (
error instanceof SessionNotFoundError ||
error instanceof SessionExpiredError ||
error instanceof SessionEndedError
) {
// Start a new session and retry
const result = await client.tasks.run({
app: 'my-stateful-app',
input: { prompt: 'hello' },
session: 'new',
});
} else {
throw error;
}
}
For complete session documentation including error handling, best practices, and advanced patterns, see the Sessions Developer Guide.
Agent Chat
Chat with AI agents using client.agents.create().
Using a Template Agent
Use an existing agent from your workspace by its namespace/name@shortid:
import { inference } from '@inferencesh/sdk';
const client = inference({ apiKey: 'your-api-key' });
// Create agent from template
const agent = client.agents.create('my-org/assistant@abc123');
// Send a message with streaming
await agent.sendMessage('Hello!', {
onMessage: (msg) => {
if (msg.content) {
for (const c of msg.content) {
if (c.type === 'text' && c.text) {
process.stdout.write(c.text);
}
}
}
}
});
// Clean up
agent.disconnect();
Creating an Ad-Hoc Agent
Create agents on-the-fly without saving to your workspace:
import { inference, tool, string } from '@inferencesh/sdk';
const client = inference({ apiKey: 'your-api-key' });
// Create ad-hoc agent (config uses API field names: core_app, system_prompt)
const weatherTool = tool('get_weather')
.describe('Get current weather')
.param('city', string('City name'))
.build();
const agent = client.agents.create({
core_app: { ref: 'infsh/claude-sonnet-4@abc123' },
system_prompt: 'You are a helpful assistant.',
tools: [weatherTool], // only schemas are sent to the API; handlers stay client-side
});
await agent.sendMessage('What is the weather in Paris?', {
onMessage: (msg) => console.log(msg),
onToolCall: async (call) => {
const result = await runMyClientTool(call.name, call.args);
await agent.submitToolResult(call.id, result);
},
});
For multi-turn chats, the SDK opens the chat stream before sending the next message so updates are not missed. Use stopChat() to cancel in-flight generation (POST /chats/{id}/stop), and reset() to clear the current chat and start fresh.
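One way stopChat() might be used is to put a deadline on a reply. The sketch below is an assumption-laden illustration, not SDK code: sendWithDeadline and StoppableAgent are hypothetical names, and it assumes that sendMessage() returns a promise that settles once the chat is idle and that stopChat() returns a promise.

```typescript
// Minimal shape assumed for this sketch; the real agent object has more methods.
interface StoppableAgent {
  sendMessage(text: string): Promise<unknown>;
  stopChat(): Promise<unknown>;
}

// Sends a message and stops generation if no reply completes within deadlineMs.
async function sendWithDeadline(
  agent: StoppableAgent,
  text: string,
  deadlineMs: number,
): Promise<'finished' | 'stopped'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<'stopped'>((resolve) => {
    timer = setTimeout(() => resolve('stopped'), deadlineMs);
  });
  const sent = agent.sendMessage(text).then(() => 'finished' as const);
  try {
    const outcome = await Promise.race([sent, deadline]);
    if (outcome === 'stopped') {
      await agent.stopChat(); // ask the server to cancel in-flight generation
    }
    return outcome;
  } finally {
    clearTimeout(timer); // no stray timer when the reply finishes first
  }
}
```

After a 'stopped' outcome the chat itself still exists, so a caller could either send a follow-up message or call reset() to start over.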
Tool builder
Use the fluent builders to define AgentTool schemas. Client tools (tool) run in your app via onToolCall; server-side tools run on inference.sh.
| Builder | Runs on | Description |
|---------|---------|-------------|
| tool(name) | Client | Local handler; only the schema is sent to the API |
| appTool(name, appRef) | Server | Invoke another inference app |
| agentTool(name, agentRef) | Server | Delegate to a sub-agent |
| httpTool(name, url) / callTool(name, url) | Server | HTTP request with credential injection (preferred over webhookTool) |
| webhookTool(name, url) | Server | Unsigned webhook (legacy; use httpTool for new tools) |
| mcpTool(name, integrationId, toolName) | Server | Call a tool on a connected MCP integration |
| internalTools() | Server | Built-in plan, memory, and widget tools |
import {
inference,
tool,
appTool,
httpTool,
mcpTool,
internalTools,
string,
IntegrationProviderGoogle,
} from '@inferencesh/sdk';
const clientTool = tool('get_weather')
.describe('Get current weather')
.param('city', string('City name'))
.build();
// HTTP tool with OAuth integration credentials (injected server-side)
const gmailSend = httpTool('gmail_send', 'https://gmail.googleapis.com/gmail/v1/users/me/messages/send')
.describe('Send an email via Gmail')
.method('POST')
.auth({ integration: IntegrationProviderGoogle, integrationId: 'your-integration-id' })
.build();
// API key or bearer auth
const fetchData = httpTool('fetch', 'https://api.example.com/data')
.method('GET')
.auth({ apiKey: 'YOUR_KEY', header: 'X-API-Key' }) // default header: X-API-Key
.header('Accept', 'application/json')
.build();
const bearerFetch = httpTool('bearer_fetch', 'https://api.example.com')
.auth({ bearer: 'YOUR_TOKEN' })
.build();
const imageGen = appTool('generate_image', 'infsh/flux-schnell@abc123')
.param('prompt', string('Image description'))
.requireApproval()
.build();
const mcpSearch = mcpTool('notion_search', 'your-mcp-integration-id', 'search')
.describe('Search Notion pages')
.param('query', string('Search query'))
.build();
const client = inference({ apiKey: 'your-api-key' });
const agent = client.agents.create({
core_app: { ref: 'infsh/claude-sonnet-4@latest' },
system_prompt: 'You are helpful.',
tools: [clientTool, gmailSend, imageGen, mcpSearch],
internal_tools: internalTools().memory().build(),
});
callTool is an alias for httpTool. Run npx tsx examples/tool-builder.ts for more schema examples (no API key required).
File attachments
Pass files in sendMessage options. Blob values are uploaded first; objects with a uri (already uploaded via client.files.upload) are attached as-is:
const uploaded = await client.files.upload(imageBlob, {
filename: 'photo.png',
contentType: 'image/png',
});
await agent.sendMessage('Describe this image', {
files: [imageBlob, uploaded], // Blob uploads; FileDTO reuses uri
onMessage: (msg) => console.log(msg),
});
Structured Output
Use output_schema to get structured JSON responses:
const agent = client.agents.create({
core_app: { ref: 'infsh/claude-sonnet-4@latest' },
output_schema: {
type: 'object',
properties: {
summary: { type: 'string' },
sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
confidence: { type: 'number' },
},
required: ['summary', 'sentiment', 'confidence'],
},
internal_tools: { finish: true },
});
const output = await agent.run('Analyze: Great product!');
agent.run() sends a message with polling (no SSE), waits until the chat is idle, and returns chat.output (parsed finish-tool result, or null if none).
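Because the result may be null, it can help to narrow it before use. The type guard below is a hypothetical sketch (Analysis and isAnalysis are not SDK exports) that mirrors the output_schema from the example above; it makes no assumption about the static type agent.run() declares, and treats the value as unknown.

```typescript
// Shape matching the output_schema in the example above.
interface Analysis {
  summary: string;
  sentiment: 'positive' | 'negative' | 'neutral';
  confidence: number;
}

// Runtime check that an unknown value has the expected structure.
function isAnalysis(value: unknown): value is Analysis {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.summary === 'string' &&
    (v.sentiment === 'positive' ||
      v.sentiment === 'negative' ||
      v.sentiment === 'neutral') &&
    typeof v.confidence === 'number'
  );
}
```

A caller could then write if (isAnalysis(output)) { ... } and handle the null or malformed case in the else branch.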
Agent Methods
| Method | Description |
|--------|-------------|
| sendMessage(text, options?) | Send a message; streams or polls until the chat is idle when callbacks are provided or stream: false is set |
| run(text, options?) | Send and return structured chat.output (always uses polling) |
| getChat(chatId?) | Get the current or specified chat (chat_messages on the returned chat) |
| stopChat() | Stop generation for the current chat (no-op if no active chat) |
| submitToolResult(toolId, resultOrAction) | Submit result for a client tool (string or {action, form_data}) |
| startStreaming(options?) | Manually attach to /chats/{id}/stream for the current chat |
| disconnect() | Stop active stream/poll connections |
| reset() | Disconnect and clear chat state so the next message starts a new chat |
API Reference
inference(config)
Creates a new inference client.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| config.apiKey | string | Yes | Your inference.sh API key |
| config.baseUrl | string | No | Custom API URL (default: https://api.inference.sh) |
| config.stream | boolean | No | Use NDJSON streaming (true, default) or status polling (false) |
| config.pollIntervalMs | number | No | Poll interval when stream: false (default: 2000) |
| config.proxyUrl | string | No | Proxy base URL for frontend apps (keeps API keys server-side) |
client.tasks.run(params, options?)
Runs a task on inference.sh.
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| params.app | string | Yes | App identifier (e.g., 'username/app-name') |
| params.input | object | Yes | Input parameters for the app |
| params.setup | object | No | Setup parameters (affects worker warmth/scheduling) |
| params.infra | string | No | Infrastructure: 'cloud' or 'private' |
| params.variant | string | No | App variant to use |
| params.session | string | No | Session ID or 'new' to start a new session |
| params.session_timeout | number | No | Session timeout in seconds (1-3600, only with session: 'new') |
Options:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| wait | boolean | true | Wait for task completion |
| stream | boolean | client default | Use NDJSON streaming or status polling |
| pollIntervalMs | number | client default | Poll interval when stream: false |
| onUpdate | function | - | Callback for task updates (full fetch on status change when polling) |
| onPartialUpdate | function | - | Callback for partial NDJSON stream updates (task, fields) |
| maxReconnects | number | 5 | Max poll retries when stream: false |
client.tasks.get(taskId)
Gets a task by ID.
client.tasks.cancel(taskId)
Cancels a running task.
client.files.upload(data, options?)
Uploads a file to inference.sh.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| data | string \| Blob | Base64 string, data URI, or Blob |
| options.filename | string | Filename |
| options.contentType | string | MIME type |
| options.public | boolean | Make file publicly accessible |
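When the bytes are already in memory, they can be turned into a data URI, which is one of the accepted forms of data. The helper below is a small Node.js-only sketch: toDataUri is a hypothetical name, not an SDK function, and it relies on Node's global Buffer.

```typescript
// Node.js-only sketch: Buffer is a Node global, not available in all browsers.
function toDataUri(bytes: Uint8Array, contentType: string): string {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${contentType};base64,${base64}`;
}

// Example: the two bytes of the ASCII string "hi".
const example = toDataUri(new Uint8Array([104, 105]), 'text/plain');
// example === 'data:text/plain;base64,aGk='
```

The resulting string could then be passed as the first argument to client.files.upload, as in the File Upload example.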
client.agents.create(templateOrConfig) / client.agent(...)
Creates an agent instance from a template or ad-hoc configuration. client.agent(...) is an alias for client.agents.create(...).
sendMessage options: onMessage, onChat, onToolCall, files, stream, pollIntervalMs. Client tools with status awaiting_input are dispatched once per invocation ID via onToolCall.
Template mode:
const agent = client.agents.create('namespace/name@version');
Ad-hoc mode:
const agent = client.agents.create({
core_app: { ref: 'infsh/claude-sonnet-4@abc123' },
system_prompt: 'You are helpful.',
tools: [...],
});
client.sessions
| Method | HTTP | Description |
|--------|------|-------------|
| get(sessionId) | GET /sessions/{id} | Session metadata (status, expires_at, call_count, …) |
| list() | GET /sessions | All sessions (empty array if none) |
| keepalive(sessionId) | POST /sessions/{id}/keepalive | Reset idle expiration |
| end(sessionId) | DELETE /sessions/{id} | End session and release worker |
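If a session has to survive gaps between task calls that are longer than its idle timeout, keepalive can be called on a timer. The following is a sketch under stated assumptions: startKeepalive and KeepaliveCapable are hypothetical names, and the only SDK behaviour relied on is that keepalive(sessionId) returns a promise.

```typescript
// Only the keepalive method is needed for this sketch.
interface KeepaliveCapable {
  keepalive(sessionId: string): Promise<unknown>;
}

// Calls keepalive every intervalMs and returns a function that stops the timer.
function startKeepalive(
  sessions: KeepaliveCapable,
  sessionId: string,
  intervalMs: number,
  onError: (error: unknown) => void = () => {},
): () => void {
  const timer = setInterval(() => {
    // Errors (for example an expired session) go to onError instead of
    // surfacing as unhandled rejections.
    sessions.keepalive(sessionId).catch(onError);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Usage might look like const stop = startKeepalive(client.sessions, sessionId, 30000); followed later by stop() and client.sessions.end(sessionId). The interval should be comfortably shorter than the session's idle timeout.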
Task Status Constants
import {
TaskStatusQueued,
TaskStatusRunning,
TaskStatusCompleted,
TaskStatusFailed,
TaskStatusCancelled
} from '@inferencesh/sdk';
if (task.status === TaskStatusCompleted) {
console.log('Done!');
}
Integration Constants
IntegrationDTO fields (provider, type, auth, status) use typed string unions exported as constants:
import type { IntegrationDTO } from '@inferencesh/sdk';
import {
IntegrationProviderGoogle,
IntegrationAuthTypeOAuth,
IntegrationStatusConnected,
IntegrationStatusDisconnected,
IntegrationStatusExpired,
IntegrationStatusError,
isRequirementsNotMetException,
} from '@inferencesh/sdk';
function isGoogleConnected(integration: IntegrationDTO): boolean {
return (
integration.provider === IntegrationProviderGoogle &&
integration.status === IntegrationStatusConnected
);
}
// HTTP 412 when an app requires a missing secret, integration, or scope
try {
await client.run({ app: 'my-app', input: {} });
} catch (error) {
if (isRequirementsNotMetException(error)) {
for (const req of error.errors) {
if (req.type === 'integration' && req.action?.provider === IntegrationProviderGoogle) {
// User must connect Google — see https://inference.sh/docs/extend/integrations
}
}
}
}
| Constant group | Values |
|----------------|--------|
| IntegrationProvider* | google, slack, notion, github, x, microsoft, salesforce, discord, gcp, mcp, reddit |
| IntegrationAuthType* | service_account, oauth, api_key, wif, mcp |
| IntegrationStatus* | connected, disconnected, expired, error |
Instance Status Constants
Use when working with engine instance APIs (InstanceDTO.status):
import {
InstanceStatusCreating,
InstanceStatusPendingProvider,
InstanceStatusPending,
InstanceStatusActive,
InstanceStatusError,
InstanceStatusDeleting,
InstanceStatusDeleted,
} from '@inferencesh/sdk';
Tool Parameter Types
When building AgentTool schemas manually (outside the tool builder), use ToolParamType* for JSON Schema type fields:
import {
ToolParamTypeObject,
ToolParamTypeString,
ToolParamTypeInteger,
ToolParamTypeNumber,
ToolParamTypeBoolean,
ToolParamTypeArray,
ToolParamTypeNull,
} from '@inferencesh/sdk';
const schema = {
type: ToolParamTypeObject,
properties: {
city: { type: ToolParamTypeString, description: 'City name' },
},
required: ['city'],
};
The fluent tool builder (string(), number(), object(), …) infers these types automatically.
TypeScript Support
This SDK is written in TypeScript and includes full type definitions. All types are exported:
import type {
Task,
ApiAppRunRequest,
RunOptions,
IntegrationDTO,
AgentTool,
} from '@inferencesh/sdk';
Requirements
- Node.js 18.0.0 or higher
- Modern browsers with fetch support
resources
- documentation — getting started guides and api reference
- blog — tutorials on ai agents, image generation, and more
- app store — browse 250+ ai models
- discord — community support
- github — open source projects
license
MIT © inference.sh
