responses-to-completions

v0.2.1

An SDK that exposes the OpenAI Responses API on top of any OpenAI-compatible /v1/chat/completions backend (vLLM, Ollama, llama.cpp, TGI, LiteLLM, Together, Groq, OpenRouter, …).

Construct a ResponsesClient, give it a backend, and call client.responses.create(...) — the client handles conversation state, MCP tool execution, and streaming-event translation internally.

Install

npm i responses-to-completions

Quick start

import {
  ResponsesClient,
  OpenAICompatAdapter,
  LocalFileStore,
} from "responses-to-completions";

const client = new ResponsesClient({
  backend: new OpenAICompatAdapter({
    baseUrl: "https://api.openai.com/v1",
    apiKey: process.env.OPENAI_API_KEY,
  }),
  // optional — required for conversations and responses.{get,del}
  store: new LocalFileStore("./.data"),
});

const resp = await client.responses.create({
  model: "gpt-4o-mini",
  input: "Write a haiku about TypeScript.",
});
console.log(resp.output_text);

Streaming

responses.create({ stream: true }) returns a StreamResponse — an async-iterable of StreamEvents plus a finalResponse() promise.

const stream = await client.responses.create({
  model: "gpt-4o-mini",
  input: "Tell me a story.",
  stream: true,
});

for await (const ev of stream) {
  if (ev.type === "response.output_text.delta") {
    process.stdout.write(ev.delta);
  }
}

const finalResp = await stream.finalResponse();
console.log("\nfinal id:", finalResp.id);

Backends

The constructor takes any BackendAdapter. Three are built in.

OpenAICompatAdapter

Works with any server implementing POST /v1/chat/completions — OpenAI, vLLM, llama.cpp server, TGI, LiteLLM, Together, Groq, and Ollama's own OpenAI-compat endpoint on port 11434.

new OpenAICompatAdapter({
  baseUrl: "http://localhost:8000/v1",
  apiKey: "optional-bearer",
  forceModel: "qwen2.5-coder:32b", // optional override
});

OllamaAdapter

Native Ollama /api/chat. Normalizes NDJSON streaming, object-shaped tool arguments, and the missing developer role.

new OllamaAdapter({ host: "http://localhost:11434" });
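The three normalizations named above can be sketched roughly as follows. This is an illustration under stated assumptions, not the adapter's actual code: every type and function name here is hypothetical, the `call_${index}` id scheme is invented for the example, and folding the `developer` role into `system` is a guess at how the missing role is handled.

```typescript
// Hypothetical, simplified views of the two wire formats.
type ChatRole = "system" | "developer" | "user" | "assistant" | "tool";

interface OllamaToolCall {
  function: { name: string; arguments: Record<string, unknown> };
}

interface OpenAIToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Assumption: the "developer" role is folded into "system" for Ollama.
function toOllamaRole(role: ChatRole): Exclude<ChatRole, "developer"> {
  return role === "developer" ? "system" : role;
}

// Ollama returns tool arguments as an object; chat completions uses a JSON string.
function toOpenAIToolCall(call: OllamaToolCall, index: number): OpenAIToolCall {
  return {
    id: `call_${index}`, // hypothetical id scheme for this sketch
    type: "function",
    function: {
      name: call.function.name,
      arguments: JSON.stringify(call.function.arguments),
    },
  };
}

// NDJSON: one JSON document per line. The trailing partial line, if any,
// is returned as `rest` so it can be prepended to the next network read.
function splitNdjson(buffered: string): { docs: unknown[]; rest: string } {
  const lines = buffered.split("\n");
  const rest = lines.pop() ?? "";
  const docs = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
  return { docs, rest };
}
```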

OpenRouterAdapter

OpenRouter with provider-routing preferences.

new OpenRouterAdapter({
  apiKey: process.env.OPENROUTER_API_KEY!,
  provider: { order: ["Anthropic", "Google"], allow_fallbacks: true },
});

Custom backends

Implement the BackendAdapter interface (complete + stream).

Embeddings

All three built-in adapters expose an optional embeddings() method that returns the OpenAI-shaped EmbeddingsResponse regardless of provider. Call it directly on the adapter — embeddings are independent of the ResponsesClient surface.

import {
  OpenAICompatAdapter,
  OllamaAdapter,
  OpenRouterAdapter,
} from "responses-to-completions";

// OpenAI / vLLM / any /v1/embeddings-compatible server
const openai = new OpenAICompatAdapter({
  baseUrl: "https://api.openai.com/v1",
  apiKey: process.env.OPENAI_API_KEY,
});
const r1 = await openai.embeddings({
  model: "text-embedding-3-small",
  input: "hello world",
});

// Ollama — POST /api/embed, translated to the OpenAI shape
const ollama = new OllamaAdapter({ host: "http://localhost:11434" });
const r2 = await ollama.embeddings({
  model: "nomic-embed-text",
  input: ["a", "b", "c"],
});

// OpenRouter — routes to whichever provider exposes the model's embeddings
const openrouter = new OpenRouterAdapter({
  apiKey: process.env.OPENROUTER_API_KEY!,
});
const r3 = await openrouter.embeddings({
  model: "openai/text-embedding-3-large",
  input: "hi",
});

// All three return the same shape:
//   { object: "list", data: [{ object: "embedding", index, embedding }], model, usage }

Request shape

interface EmbeddingsRequest {
  model: string;                        // honors `forceModel` on the adapter
  input: string | string[];             // batch by passing an array
  encoding_format?: "float" | "base64"; // default "float"; Ollama ignores
  dimensions?: number;                  // text-embedding-3-* only; others ignore
  user?: string;
}

Adapter-specific notes

  • OpenAICompatAdapter — pass-through to POST {baseUrl}/embeddings. Reuses the same auth/headers/forceModel as complete().
  • OpenRouterAdapter — same path, plus OpenRouter's provider routing preferences are forwarded. Note that OpenRouter's embeddings coverage is narrower than its chat coverage; only models exposing an embeddings endpoint will work.
  • OllamaAdapter — calls native POST /api/embed and translates the response to the OpenAI shape. encoding_format, dimensions, and user are silently dropped (Ollama doesn't honor them).

Non-2xx responses throw BackendError (same as complete()/respond()).
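For the Ollama case, the translation to the OpenAI shape might look like the sketch below. It assumes the `/api/embed` response carries `model`, `embeddings`, and an optional `prompt_eval_count`; the function name `toOpenAIEmbeddings` and the choice to report 0 tokens when the count is absent are assumptions made for this example, not the adapter's documented behavior.

```typescript
// Fields of Ollama's POST /api/embed response that this sketch relies on.
interface OllamaEmbedResponse {
  model: string;
  embeddings: number[][];
  prompt_eval_count?: number;
}

// The OpenAI-shaped result described in the comment above.
interface EmbeddingsResponse {
  object: "list";
  data: { object: "embedding"; index: number; embedding: number[] }[];
  model: string;
  usage: { prompt_tokens: number; total_tokens: number };
}

function toOpenAIEmbeddings(res: OllamaEmbedResponse): EmbeddingsResponse {
  // Assumption: fall back to 0 when Ollama does not report a token count.
  const tokens = res.prompt_eval_count ?? 0;
  return {
    object: "list",
    data: res.embeddings.map((embedding, index) => ({
      object: "embedding" as const,
      index, // position of the corresponding input in the batch
      embedding,
    })),
    model: res.model,
    usage: { prompt_tokens: tokens, total_tokens: tokens },
  };
}
```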

Stores

Persistence is optional. Without a store, the client still runs requests, but conversations.* and responses.{get,del} throw, and responses.create skips persistence.

LocalFileStore — for local development / tests

new LocalFileStore("./.data");

Writes one JSON file per artifact:

<root>/conversations/<id>.json
<root>/responses/<id>.json
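To make the layout concrete, here is a minimal sketch of a file-backed store that follows the same path scheme. `TinyFileStore` is a hypothetical stand-in and not the library's LocalFileStore; it uses synchronous Node fs calls for brevity, and its method names are invented for the example.

```typescript
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

type ArtifactKind = "conversations" | "responses";

// Hypothetical store: one JSON file per artifact under <root>/<kind>/<id>.json.
class TinyFileStore {
  private readonly root: string;

  constructor(root: string) {
    this.root = root;
  }

  pathFor(kind: ArtifactKind, id: string): string {
    return join(this.root, kind, `${id}.json`);
  }

  write(kind: ArtifactKind, id: string, value: unknown): void {
    const file = this.pathFor(kind, id);
    mkdirSync(dirname(file), { recursive: true }); // create <root>/<kind> on first write
    writeFileSync(file, JSON.stringify(value, null, 2), "utf8");
  }

  read<T>(kind: ArtifactKind, id: string): T {
    return JSON.parse(readFileSync(this.pathFor(kind, id), "utf8")) as T;
  }
}
```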

S3Store — for production

new S3Store({
  bucket: "my-bucket",
  prefix: "prod/responses",
  clientConfig: { region: "us-east-1" },
});

Custom stores

Implement the Store interface — see src/store/store.ts.

Conversations

Conversation state is owned by the store. The client offers a thin facade:

const conv = await client.conversations.create({ metadata: { user: "u_42" } });

await client.responses.create({
  model: "gpt-4o-mini",
  conversation: conv.id,
  input: "Remember the secret word 'banana'.",
});
await client.responses.create({
  model: "gpt-4o-mini",
  conversation: conv.id,
  input: "What was the secret word?",
});

const { data } = await client.conversations.items.list(conv.id);

Full surface:

| Method | Notes |
| --- | --- |
| conversations.create({ id?, items?, metadata? }) | id is optional; client-generated when omitted |
| conversations.get(id) | Returns metadata only (items via items.list) |
| conversations.update(id, { metadata }) | |
| conversations.del(id) | |
| conversations.items.list(id, { limit?, after?, order? }) | Paginated |
| conversations.items.append(id, items) | Returns the full item list after append |
| conversations.items.get(id, itemId) | |
| conversations.items.del(id, itemId) | |

Items are stored canonically as typed Responses-API items (message, function_call, function_call_output, mcp_list_tools, mcp_call, mcp_approval_request, reasoning) and rehydrated into a chat-completions messages[] view before each upstream call.
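The rehydration step can be pictured with the sketch below. It covers only three of the item types, treats message content as a plain string, and uses hypothetical names (`Item`, `ChatMessage`, `rehydrate`); the library's own `itemsToMessages` translator may differ in signature and detail. Merging consecutive `function_call` items into one assistant message is an assumption based on how chat completions represents parallel tool calls.

```typescript
// Simplified item shapes; real Responses-API items carry more fields.
type Item =
  | { type: "message"; role: "system" | "developer" | "user" | "assistant"; content: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string }
  | { type: "function_call_output"; call_id: string; output: string };

interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: "system" | "developer" | "user" | "assistant" | "tool";
  content: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

function rehydrate(items: Item[]): ChatMessage[] {
  const messages: ChatMessage[] = [];
  for (const item of items) {
    if (item.type === "message") {
      messages.push({ role: item.role, content: item.content });
    } else if (item.type === "function_call") {
      const call: ToolCall = {
        id: item.call_id,
        type: "function",
        function: { name: item.name, arguments: item.arguments },
      };
      const last = messages[messages.length - 1];
      if (last && last.role === "assistant" && last.tool_calls) {
        last.tool_calls.push(call); // group parallel calls into one assistant turn
      } else {
        messages.push({ role: "assistant", content: null, tool_calls: [call] });
      }
    } else {
      // function_call_output becomes a tool message keyed by the call id
      messages.push({ role: "tool", tool_call_id: item.call_id, content: item.output });
    }
  }
  return messages;
}
```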

You can also continue a conversation by passing previous_response_id, as with the OpenAI API.

MCP tools

Remote MCP servers are supported via the standard Responses-API tools entry (type: "mcp"). The client connects to each MCP server at request time, lists its tools, exposes them to the backend model as function tools, and executes any tool calls server-side.

await client.responses.create({
  model: "gpt-4o-mini",
  tools: [
    {
      type: "mcp",
      server_label: "docs",
      server_url: "https://docs.example.com/mcp",
      authorization: process.env.DOCS_MCP_TOKEN,
      allowed_tools: ["search_docs", "read_page"],
      require_approval: "never",
    },
  ],
  input: "Find the section on rate limits.",
});
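The "expose them to the backend model as function tools" step might look roughly like this. The sketch assumes MCP tool descriptors with `name`, `description`, and `inputSchema` fields and maps them onto chat-completions function tools, filtered by `allowed_tools`. `toFunctionTools` is a hypothetical helper, and any name prefixing the library applies per server is not modelled here.

```typescript
// Tool descriptor as listed by an MCP server (fields used in this sketch).
interface McpTool {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

// Chat-completions function tool.
interface FunctionTool {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters: Record<string, unknown>;
  };
}

function toFunctionTools(tools: McpTool[], allowedTools?: string[]): FunctionTool[] {
  return tools
    // When allowed_tools is omitted, every discovered tool is exposed.
    .filter((tool) => !allowedTools || allowedTools.includes(tool.name))
    .map((tool) => ({
      type: "function" as const,
      function: {
        name: tool.name,
        description: tool.description,
        parameters: tool.inputSchema, // JSON Schema passes through unchanged
      },
    }));
}
```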

The response surface includes:

  • mcp_list_tools items — the tools discovered on each server.
  • mcp_call items — each executed tool call with its output.
  • mcp_approval_request items — when require_approval demands one.

To approve a paused call, send a follow-up responses.create with an mcp_approval_response input item and previous_response_id.

require_approval

Supports the full OpenAI shape:

  • "never" — execute all tools immediately.
  • "always" — emit an mcp_approval_request for every call.
  • { always: { tool_names: [...] }, never: { tool_names: [...] } } — per-tool.
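One way to read the three shapes is as a single decision per tool call. The sketch below is hypothetical (`needsApproval` is not a library export) and makes two assumptions: a tool listed under both keys is treated as requiring approval, and a tool listed under neither also requires approval.

```typescript
type RequireApproval =
  | "never"
  | "always"
  | { always?: { tool_names: string[] }; never?: { tool_names: string[] } };

// Returns true when a call should pause with an mcp_approval_request.
function needsApproval(policy: RequireApproval, toolName: string): boolean {
  if (policy === "never") return false;
  if (policy === "always") return true;
  if (policy.always?.tool_names.includes(toolName)) return true;
  if (policy.never?.tool_names.includes(toolName)) return false;
  return true; // assumption: unlisted tools fall back to requiring approval
}
```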

Not yet supported

  • OpenAI connectors (connector_id) — only raw server_url MCP servers.
  • Built-in tools web_search, file_search, code_interpreter, computer_use, image_generation. Requests including them will fail at the backend since we forward them as-is.

Configuration reference

All configuration is via constructor options — the library reads no environment variables.

new ResponsesClient(options):

| Option | Default | Description |
| --- | --- | --- |
| backend | (required) | A BackendAdapter instance |
| store | undefined | A Store instance; required for conversations and responses.{get,del} |
| maxIterations | 10 | Hard cap on backend round-trips per request |
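To illustrate what a cap on backend round-trips means in practice, here is a minimal bounded loop. Everything in it is hypothetical: `runLoop`, the `CallBackend` and `RunTool` callbacks, and the string transcript are stand-ins, and throwing when the cap is reached is an assumption, since the behavior at the limit is not described here.

```typescript
interface ToolCall {
  name: string;
  arguments: string;
}

interface Turn {
  text: string;
  toolCalls: ToolCall[];
}

// Hypothetical callbacks standing in for the backend and the tool executor.
type CallBackend = (transcript: string[]) => Promise<Turn>;
type RunTool = (call: ToolCall) => Promise<string>;

async function runLoop(
  callBackend: CallBackend,
  runTool: RunTool,
  maxIterations = 10,
): Promise<{ text: string; iterations: number }> {
  const transcript: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    const turn = await callBackend(transcript); // one backend round-trip
    if (turn.toolCalls.length === 0) {
      return { text: turn.text, iterations: i }; // model answered without tools
    }
    for (const call of turn.toolCalls) {
      transcript.push(`${call.name} -> ${await runTool(call)}`);
    }
  }
  // Assumption: hitting the cap is reported as an error.
  throw new Error(`stopped after ${maxIterations} backend round-trips`);
}
```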

Advanced usage

You can bypass the client and drive AgentLoop directly:

import { AgentLoop, OpenAICompatAdapter } from "responses-to-completions";

const agent = new AgentLoop({
  backend: new OpenAICompatAdapter({ baseUrl: "..." }),
});
const result = await agent.run({
  request: { model: "gpt-4o-mini", input: "hi" },
  history: [],
});

The translators (itemsToMessages, completionToOutputItems, translateChunkStream) and the resolveHistory helper are also exported for custom orchestration.

Running the SDK smoke test

tsx examples/sdk-test.ts \
  --backend openai-compat \
  --base-url https://api.openai.com/v1 \
  --api-key "$OPENAI_API_KEY" \
  --model gpt-4o-mini \
  --store-local ./.test-data

Without --store-local, conversation/persistence tests are skipped.

Design notes

  • Items are the source of truth. Conversations are persisted as an ordered array of typed items. Each request rehydrates items → chat-completions messages, runs the loop, appends new items.
  • Agent loop owns MCP execution. Non-MCP function tools are passed through to the client (function_call items), same as OpenAI.
  • Streaming is synthesized. Each chat.completion.chunk is mapped into the correct Responses-API event sequence; multi-turn tool loops serialize per iteration (stream deltas → execute tools → stream next).
  • No hidden writes. Setting store: false on a request skips persistence; constructing without a store skips it for every request.
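As a rough picture of the "streaming is synthesized" note, the sketch below maps text deltas from chat.completion.chunk objects onto a reduced Responses-style event sequence. It is deliberately simplified: it is synchronous, it ignores tool calls, and it emits only four event types with minimal fields. `sketchTranslate` is a hypothetical name, not the exported `translateChunkStream`.

```typescript
// Minimal slice of a chat.completion.chunk: text delta and finish reason only.
interface Chunk {
  choices: { delta: { content?: string | null }; finish_reason?: string | null }[];
}

// Reduced event set; the real Responses-API stream has more events and fields.
type StreamEvent =
  | { type: "response.created" }
  | { type: "response.output_text.delta"; delta: string }
  | { type: "response.output_text.done"; text: string }
  | { type: "response.completed" };

function* sketchTranslate(chunks: Iterable<Chunk>): Generator<StreamEvent> {
  yield { type: "response.created" };
  let text = "";
  for (const chunk of chunks) {
    const delta = chunk.choices[0]?.delta.content;
    if (delta) {
      text += delta; // accumulate so the final "done" event carries the full text
      yield { type: "response.output_text.delta", delta };
    }
  }
  yield { type: "response.output_text.done", text };
  yield { type: "response.completed" };
}
```

A real translator would consume an async stream and interleave tool-call events between iterations, as the note on multi-turn tool loops describes.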

License

MIT