responses-to-completions
v0.2.1
SDK that exposes the OpenAI Responses API on top of any /v1/chat/completions backend (vLLM, Ollama, llama.cpp, TGI, etc.). Handles conversation state, MCP tool execution, and streaming event translation.
responses-to-completions
An SDK that exposes the OpenAI Responses API on top of any
OpenAI-compatible /v1/chat/completions backend (vLLM, Ollama, llama.cpp,
TGI, LiteLLM, Together, Groq, OpenRouter, …).
Construct a ResponsesClient, give it a backend, and call
client.responses.create(...) — the client handles conversation state, MCP
tool execution, and streaming-event translation internally.
Install
npm i responses-to-completions
Quick start
import {
  ResponsesClient,
  OpenAICompatAdapter,
  LocalFileStore,
} from "responses-to-completions";

const client = new ResponsesClient({
  backend: new OpenAICompatAdapter({
    baseUrl: "https://api.openai.com/v1",
    apiKey: process.env.OPENAI_API_KEY,
  }),
  // optional; required for conversations and responses.{get,del}
  store: new LocalFileStore("./.data"),
});

const resp = await client.responses.create({
  model: "gpt-4o-mini",
  input: "Write a haiku about TypeScript.",
});
console.log(resp.output_text);
Streaming
responses.create({ stream: true }) returns a StreamResponse — an
async-iterable of StreamEvents plus a finalResponse() promise.
const stream = await client.responses.create({
  model: "gpt-4o-mini",
  input: "Tell me a story.",
  stream: true,
});

for await (const ev of stream) {
  if (ev.type === "response.output_text.delta") {
    process.stdout.write(ev.delta);
  }
}

const finalResp = await stream.finalResponse();
console.log("\nfinal id:", finalResp.id);
Backends
The constructor takes any BackendAdapter. Three are built in.
OpenAICompatAdapter
Works with any server implementing POST /v1/chat/completions — OpenAI,
vLLM, llama.cpp server, TGI, LiteLLM, Together, Groq, and Ollama's own
OpenAI-compat endpoint on port 11434.
new OpenAICompatAdapter({
  baseUrl: "http://localhost:8000/v1",
  apiKey: "optional-bearer",
  forceModel: "qwen2.5-coder:32b", // optional override
});
OllamaAdapter
Native Ollama /api/chat. Normalizes NDJSON streaming, object-shaped tool
arguments, and the missing developer role.
new OllamaAdapter({ host: "http://localhost:11434" });
OpenRouterAdapter
OpenRouter with provider-routing preferences.
new OpenRouterAdapter({
  apiKey: process.env.OPENROUTER_API_KEY!,
  provider: { order: ["Anthropic", "Google"], allow_fallbacks: true },
});
Custom backends
Implement the BackendAdapter interface (complete + stream).
Embeddings
All three built-in adapters expose an optional embeddings() method that
returns the OpenAI-shaped EmbeddingsResponse regardless of provider.
Call it directly on the adapter — embeddings are independent of the
ResponsesClient surface.
import {
  OpenAICompatAdapter,
  OllamaAdapter,
  OpenRouterAdapter,
} from "responses-to-completions";

// OpenAI / vLLM / any /v1/embeddings-compatible server
const openai = new OpenAICompatAdapter({
  baseUrl: "https://api.openai.com/v1",
  apiKey: process.env.OPENAI_API_KEY,
});
const r1 = await openai.embeddings({
  model: "text-embedding-3-small",
  input: "hello world",
});

// Ollama: POST /api/embed, translated to the OpenAI shape
const ollama = new OllamaAdapter({ host: "http://localhost:11434" });
const r2 = await ollama.embeddings({
  model: "nomic-embed-text",
  input: ["a", "b", "c"],
});

// OpenRouter: routes to whichever provider exposes the model's embeddings
const openrouter = new OpenRouterAdapter({
  apiKey: process.env.OPENROUTER_API_KEY!,
});
const r3 = await openrouter.embeddings({
  model: "openai/text-embedding-3-large",
  input: "hi",
});

// All three return the same shape:
// { object: "list", data: [{ object: "embedding", index, embedding }], model, usage }
Request shape
interface EmbeddingsRequest {
  model: string; // honors `forceModel` on the adapter
  input: string | string[]; // batch by passing an array
  encoding_format?: "float" | "base64"; // default "float"; Ollama ignores
  dimensions?: number; // text-embedding-3-* only; others ignore
  user?: string;
}
Adapter-specific notes
- OpenAICompatAdapter: pass-through to POST {baseUrl}/embeddings. Reuses the same auth/headers/forceModel as complete().
- OpenRouterAdapter: same path, plus OpenRouter's provider routing preferences are forwarded. OpenRouter's embeddings coverage is narrower than its chat coverage; only models exposing an embeddings endpoint will work.
- OllamaAdapter: calls native POST /api/embed and translates the response to the OpenAI shape. encoding_format, dimensions, and user are silently dropped (Ollama doesn't honor them).
Non-2xx responses throw BackendError (same as complete()/respond()).
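Because every adapter returns the same response shape, downstream code can stay provider-agnostic. The sketch below is not part of the package: it declares a local `EmbeddingsResponseLike` type copied from the shape shown above (the `usage` fields are an assumption) and compares two vectors by cosine similarity. It assumes `encoding_format: "float"`, so each `embedding` is a plain number array.

```typescript
// Local stand-in for the documented response shape; `usage` fields are assumed.
interface EmbeddingsResponseLike {
  object: "list";
  data: { object: "embedding"; index: number; embedding: number[] }[];
  model: string;
  usage?: { prompt_tokens: number; total_tokens: number };
}

// Return the vectors ordered by `index`, i.e. in the order of the request's `input`.
function vectorsInInputOrder(resp: EmbeddingsResponseLike): number[][] {
  return [...resp.data]
    .sort((a, b) => a.index - b.index)
    .map((d) => d.embedding);
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

With a batched request such as the Ollama example above, `vectorsInInputOrder(r2)` would give one vector per input string, and any pair can be passed to `cosineSimilarity`.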
Stores
Persistence is optional. Without a store the client still runs requests
but conversations.* and responses.{get,del} throw, and
responses.create skips writing.
LocalFileStore (local development and tests)
new LocalFileStore("./.data");
Writes one JSON file per artifact:
<root>/conversations/<id>.json
<root>/responses/<id>.json
S3Store (production)
new S3Store({
  bucket: "my-bucket",
  prefix: "prod/responses",
  clientConfig: { region: "us-east-1" },
});
Custom stores
Implement the Store interface — see src/store/store.ts.
Conversations
Conversation state is owned by the store. The client offers a thin facade:
const conv = await client.conversations.create({ metadata: { user: "u_42" } });

await client.responses.create({
  model: "gpt-4o-mini",
  conversation: conv.id,
  input: "Remember the secret word 'banana'.",
});

await client.responses.create({
  model: "gpt-4o-mini",
  conversation: conv.id,
  input: "What was the secret word?",
});

const { data } = await client.conversations.items.list(conv.id);
Full surface:
| Method | Notes |
| --- | --- |
| conversations.create({ id?, items?, metadata? }) | id is optional; client-generated when omitted |
| conversations.get(id) | Returns metadata only (items via items.list) |
| conversations.update(id, { metadata }) | |
| conversations.del(id) | |
| conversations.items.list(id, { limit?, after?, order? }) | Paginated |
| conversations.items.append(id, items) | Returns the full item list after append |
| conversations.items.get(id, itemId) | |
| conversations.items.del(id, itemId) | |
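The table does not spell out how `limit`, `after`, and `order` interact, so the following is only a sketch of common cursor-pagination semantics, not the SDK's implementation. It assumes `after` is the id of the last item already seen, that `order` is `"asc"` or `"desc"`, and that a page reports a `has_more` flag; the default values and the `listPage` / `collectAll` names are hypothetical.

```typescript
// Hypothetical shapes for illustration; the SDK's real types may differ.
interface ListItem { id: string }
interface ListOptions { limit?: number; after?: string; order?: "asc" | "desc" }
interface Page<T> { data: T[]; has_more: boolean }

// Return one page of `items`, starting just after the `after` cursor.
function listPage<T extends ListItem>(items: T[], opts: ListOptions = {}): Page<T> {
  const { limit = 20, after, order = "asc" } = opts; // defaults are assumptions
  const ordered = order === "asc" ? [...items] : [...items].reverse();
  let start = 0;
  if (after !== undefined) {
    const idx = ordered.findIndex((it) => it.id === after);
    if (idx === -1) throw new Error(`unknown cursor: ${after}`);
    start = idx + 1;
  }
  const data = ordered.slice(start, start + limit);
  return { data, has_more: start + data.length < ordered.length };
}

// Walk every page in ascending order by feeding the last id back in as the cursor.
function collectAll<T extends ListItem>(items: T[], limit: number): T[] {
  const all: T[] = [];
  let after: string | undefined;
  for (;;) {
    const page = listPage(items, { limit, after });
    all.push(...page.data);
    if (!page.has_more || page.data.length === 0) return all;
    after = page.data[page.data.length - 1]?.id;
  }
}
```

Against the real client, the same loop shape would call `client.conversations.items.list(conv.id, { limit, after })` on each iteration, provided the returned page exposes enough information to know when to stop.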
Items are stored canonically as typed Responses-API items (message,
function_call, function_call_output, mcp_list_tools, mcp_call,
mcp_approval_request, reasoning) and rehydrated into a chat-completions
messages[] view before each upstream call.
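To make the rehydration step concrete, here is a simplified sketch of mapping typed items to chat-completions messages. It is not the package's `itemsToMessages`: it handles only three item types, treats message content as a plain string (real Responses-API items carry arrays of content parts), and uses locally declared types. The `tool_calls` / `tool_call_id` field names follow the public chat-completions format.

```typescript
// Simplified local types; only three of the stored item kinds are covered.
type ResponseItem =
  | { type: "message"; role: "system" | "developer" | "user" | "assistant"; content: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string }
  | { type: "function_call_output"; call_id: string; output: string };

interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: "system" | "developer" | "user" | "assistant" | "tool";
  content: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

function toChatMessages(items: ResponseItem[]): ChatMessage[] {
  const messages: ChatMessage[] = [];
  for (const item of items) {
    switch (item.type) {
      case "message":
        messages.push({ role: item.role, content: item.content });
        break;
      case "function_call": {
        const call: ToolCall = {
          id: item.call_id,
          type: "function",
          function: { name: item.name, arguments: item.arguments },
        };
        // Consecutive function_call items become one assistant message
        // carrying several tool_calls (parallel tool calling).
        const last = messages[messages.length - 1];
        if (last && last.role === "assistant" && last.tool_calls) {
          last.tool_calls.push(call);
        } else {
          messages.push({ role: "assistant", content: null, tool_calls: [call] });
        }
        break;
      }
      case "function_call_output":
        // Tool results are linked back to their call through tool_call_id.
        messages.push({ role: "tool", tool_call_id: item.call_id, content: item.output });
        break;
    }
  }
  return messages;
}
```

The key design point this illustrates is that the item list stays the canonical record, while the `messages[]` array is a derived view rebuilt for each upstream call.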
You can also continue a conversation by previous_response_id, same as
OpenAI.
MCP tools
Remote MCP servers are supported via the standard Responses-API tools
entry (type: "mcp"). The client connects to each MCP server at request
time, lists its tools, exposes them to the backend model as function tools,
and executes any tool calls server-side.
await client.responses.create({
  model: "gpt-4o-mini",
  tools: [
    {
      type: "mcp",
      server_label: "docs",
      server_url: "https://docs.example.com/mcp",
      authorization: process.env.DOCS_MCP_TOKEN,
      allowed_tools: ["search_docs", "read_page"],
      require_approval: "never",
    },
  ],
  input: "Find the section on rate limits.",
});
The response surface includes:
- mcp_list_tools items: the tools discovered on each server.
- mcp_call items: each executed tool call with its output.
- mcp_approval_request items: emitted when require_approval demands one.
To approve a paused call, send a follow-up responses.create with an
mcp_approval_response input item and previous_response_id.
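As a sketch of that follow-up, the helper below scans a response's output for `mcp_approval_request` items and builds the matching `mcp_approval_response` inputs. The item fields (`approval_request_id`, `approve`) follow the public OpenAI Responses API shape; the `ResponseLike` type and the `buildApprovalFollowUp` helper are hypothetical and declared locally, and whether this SDK needs `tools` repeated on the follow-up is not stated here, so the caller is left to add `model` and `tools`.

```typescript
// Minimal local view of a response; only the fields used below are declared.
interface OutputItemLike {
  type: string;
  id?: string;
  name?: string;
  server_label?: string;
}
interface ResponseLike {
  id: string;
  output: OutputItemLike[];
}
interface McpApprovalResponse {
  type: "mcp_approval_response";
  approval_request_id: string;
  approve: boolean;
}

// Build the follow-up payload: one approval response per pending request.
// `decide` receives the tool name and returns true to approve the call.
function buildApprovalFollowUp(
  resp: ResponseLike,
  decide: (toolName: string) => boolean,
): { previous_response_id: string; input: McpApprovalResponse[] } {
  const input: McpApprovalResponse[] = [];
  for (const item of resp.output) {
    if (item.type !== "mcp_approval_request" || item.id === undefined) continue;
    input.push({
      type: "mcp_approval_response",
      approval_request_id: item.id,
      approve: decide(item.name ?? ""),
    });
  }
  return { previous_response_id: resp.id, input };
}
```

The result would then be spread into a call such as `client.responses.create({ model, tools, ...followUp })`.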
require_approval
Supports the full OpenAI shape:
- "never": execute all tools immediately.
- "always": emit an mcp_approval_request for every call.
- { always: { tool_names: [...] }, never: { tool_names: [...] } }: per-tool policy.
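The three forms above can be resolved with a small decision function. This is a sketch rather than the SDK's code: the `needsApproval` name is hypothetical, and two behaviours are assumptions, namely that an omitted policy means approval is required and that a tool listed in neither `always` nor `never` also requires approval.

```typescript
type RequireApproval =
  | "always"
  | "never"
  | { always?: { tool_names?: string[] }; never?: { tool_names?: string[] } };

// Decide whether a given tool call must pause for an mcp_approval_request.
function needsApproval(policy: RequireApproval | undefined, toolName: string): boolean {
  // Assumption: no policy behaves like "always".
  if (policy === undefined || policy === "always") return true;
  if (policy === "never") return false;
  if (policy.always?.tool_names?.includes(toolName)) return true;
  if (policy.never?.tool_names?.includes(toolName)) return false;
  // Assumption: tools named in neither list fall back to requiring approval.
  return true;
}
```

Checking `always` before `never` means a tool listed in both is treated conservatively and still requires approval.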
Not yet supported
- OpenAI connectors (connector_id): only raw server_url MCP servers are supported.
- Built-in tools web_search, file_search, code_interpreter, computer_use, image_generation. Requests that include them fail at the backend because they are forwarded as-is.
Configuration reference
All configuration is via constructor options — the library reads no environment variables.
new ResponsesClient(options):
| Option | Default | Description |
| --- | --- | --- |
| backend | — (required) | A BackendAdapter instance |
| store | undefined | A Store instance — required for conversations and responses.{get,del} |
| maxIterations | 10 | Hard cap on backend round-trips per request |
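The role of `maxIterations` can be pictured as a bounded loop around backend round-trips. The sketch below is a synchronous toy model, not the library's `AgentLoop`: `runBounded`, `StepResult`, and the `"incomplete"` status are hypothetical names, and what the real client returns or throws when the cap is reached is not documented here.

```typescript
// One round-trip either finishes the request or asks for another iteration
// (for example because the model requested more tool calls).
type StepResult = { done: true; output: string } | { done: false };

interface LoopResult {
  status: "completed" | "incomplete";
  iterations: number;
  output?: string;
}

// Call `step` until it reports completion or the iteration cap is reached.
function runBounded(
  step: (iteration: number) => StepResult,
  maxIterations = 10,
): LoopResult {
  for (let i = 0; i < maxIterations; i++) {
    const result = step(i);
    if (result.done) {
      return { status: "completed", iterations: i + 1, output: result.output };
    }
  }
  // Cap reached without a final answer; the real SDK's behaviour here is unknown.
  return { status: "incomplete", iterations: maxIterations };
}
```

A cap like this guards against a model that keeps requesting tool calls indefinitely, at the cost of cutting off legitimately long tool chains, so the value is worth tuning per workload.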
Advanced usage
You can bypass the client and drive AgentLoop directly:
import { AgentLoop, OpenAICompatAdapter } from "responses-to-completions";

const agent = new AgentLoop({
  backend: new OpenAICompatAdapter({ baseUrl: "..." }),
});
const result = await agent.run({
  request: { model: "gpt-4o-mini", input: "hi" },
  history: [],
});
The translators (itemsToMessages, completionToOutputItems,
translateChunkStream) and the resolveHistory helper are also exported
for custom orchestration.
Running the SDK smoke test
tsx examples/sdk-test.ts \
  --backend openai-compat \
  --base-url https://api.openai.com/v1 \
  --api-key "$OPENAI_API_KEY" \
  --model gpt-4o-mini \
  --store-local ./.test-data
Without --store-local, conversation/persistence tests are skipped.
Design notes
- Items are the source of truth. Conversations are persisted as an ordered array of typed items. Each request rehydrates items into chat-completions messages, runs the loop, and appends the new items.
- Agent loop owns MCP execution. Non-MCP function tools are passed through to the client (function_call items), same as OpenAI.
- Streaming is synthesized. Each chat.completion.chunk is mapped into the correct Responses-API event sequence; multi-turn tool loops serialize per iteration (stream deltas → execute tools → stream next).
- No hidden writes. Setting store: false on a request skips persistence; constructing without a store skips it for every request.
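The "streaming is synthesized" note can be illustrated with a heavily reduced translator for text-only output. This is not the package's `translateChunkStream`: it works on an in-memory array instead of a live stream, emits only four event types, and omits the ids, sequence numbers, and intermediate events (output item and content part events) that a full Responses-API stream carries. The `sketchTranslate` name and local types are hypothetical.

```typescript
// Reduced view of a chat.completion.chunk: only text deltas are considered.
interface ChatChunk {
  choices: { delta: { content?: string | null }; finish_reason?: string | null }[];
}

type SketchEvent =
  | { type: "response.created" }
  | { type: "response.output_text.delta"; delta: string }
  | { type: "response.output_text.done"; text: string }
  | { type: "response.completed" };

// Map a finished list of chunks to a simplified Responses-style event sequence.
function sketchTranslate(chunks: ChatChunk[]): SketchEvent[] {
  const events: SketchEvent[] = [{ type: "response.created" }];
  let text = "";
  for (const chunk of chunks) {
    const delta = chunk.choices[0]?.delta.content;
    if (delta) {
      // Each non-empty content delta becomes one output_text.delta event.
      text += delta;
      events.push({ type: "response.output_text.delta", delta });
    }
  }
  // The accumulated text is reported once, followed by the terminal event.
  events.push({ type: "response.output_text.done", text });
  events.push({ type: "response.completed" });
  return events;
}
```

The point to notice is that the chat-completions stream has no framing events of its own, so the opening and closing events have to be generated around the deltas by the translator.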
License
MIT
