# @flink-app/openai-adapter
OpenAI adapter for the Flink AI framework using the Responses API - OpenAI's modern API that provides step-aware reasoning, explicit tool invocation, and better performance.
## Why Responses API?
This adapter uses OpenAI's Responses API instead of the older Chat Completions API, providing:
- 🔧 Step-aware reasoning: Model returns multiple tool calls as explicit, typed items in a single response
- ⚡ Better performance: 3% improvement on SWE-bench, 5% on TAUBench vs Chat Completions
- 💰 Lower costs: 40-80% better cache utilization via response persistence
- 📦 First-class tool steps: Tool calls and results are structured items, not message hacks
- 🎯 Future-proof: All new OpenAI features will land in Responses API first
## Important: Understanding the Agent Loop
The Responses API does NOT run your agent loop.
What it provides:
- Multiple tool calls in one response (you still execute them)
- Structured step types (message, function_call, function_call_output)
- Optional response persistence for caching
What Flink handles (via AgentRunner):
- Tool execution
- Multi-turn loops (API → execute tools → API → execute tools → done)
- Deciding when to stop
- Managing conversation state
This separation is intentional - it gives you full control over agent behavior while leveraging better API primitives.
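For orientation, this is roughly the loop AgentRunner runs on your behalf. A minimal sketch: `callResponsesAPI` and `executeTool` are illustrative placeholders, not Flink or OpenAI APIs.

```typescript
// Conceptual sketch only - Flink's AgentRunner implements this loop for you.
type Item =
  | { type: "message"; role: string; content: string }
  | { type: "function_call"; name: string; arguments: string; call_id: string }
  | { type: "function_call_output"; call_id: string; output: string };

declare function callResponsesAPI(input: Item[]): Promise<{ output: Item[] }>;
declare function executeTool(name: string, args: string): Promise<unknown>;

async function agentLoop(input: Item[]): Promise<string> {
  for (;;) {
    const { output } = await callResponsesAPI(input); // one API turn
    input.push(...output); // keep the full conversation history

    const calls = output.filter(
      (i): i is Extract<Item, { type: "function_call" }> =>
        i.type === "function_call"
    );
    if (calls.length === 0) {
      // No tool calls: the message item is the final answer, so stop the loop
      const msg = output.find((i) => i.type === "message");
      return msg?.type === "message" ? msg.content : "";
    }
    for (const call of calls) {
      // You (via AgentRunner) execute each requested tool...
      const result = await executeTool(call.name, call.arguments);
      // ...and feed the result back as a function_call_output item
      input.push({
        type: "function_call_output",
        call_id: call.call_id,
        output: JSON.stringify(result),
      });
    }
  }
}
```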
## Installation
```bash
npm install @flink-app/openai-adapter
# or
pnpm add @flink-app/openai-adapter
```

The `openai` package is included as a dependency, so you don't need to install it separately.
## Usage
### Basic Setup
```typescript
import { OpenAIAdapter } from "@flink-app/openai-adapter";
import { FlinkApp } from "@flink-app/flink";

const app = new FlinkApp({
  ai: {
    llms: {
      default: new OpenAIAdapter({
        apiKey: process.env.OPENAI_API_KEY!,
        model: "gpt-5",
      }),
    },
  },
});

await app.start();
```

Legacy API (still supported):
```typescript
// Backward-compatible constructor
new OpenAIAdapter(process.env.OPENAI_API_KEY!, "gpt-5");
```

### Agent Instructions
Define your agent's behavior using the `instructions` property:
```typescript
// src/agents/support_agent.ts
import { FlinkAgentProps } from "@flink-app/flink";

export const Agent: FlinkAgentProps = {
  name: "support_agent",
  instructions: "You are a helpful customer support agent.",
  tools: ["get_order_status"],
  model: { adapterId: "default" },
};
```

How it works:
- Instructions are prepended as a system message to every conversation
- Follows Vercel AI SDK pattern for consistency
- Provides stable agent behavior across all interactions
### Dynamic Context with System Messages
For per-request context, add system messages to the conversation:
```typescript
const result = await ctx.agents.myAgent.execute({
  message: [
    { role: "system", content: "Current user tier: Premium" },
    { role: "user", content: "What can I do?" },
  ],
});
```

Order of messages sent to OpenAI:
1. Agent `instructions` (as a system message)
2. User-provided system messages (if any)
3. Conversation messages
This gives you both static agent behavior and dynamic per-request context.
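For example, combining the agent defined earlier with the request above, the assembled conversation would look like this (illustrative only):

```typescript
// Effective message order the adapter sends (illustrative)
const effectiveConversation = [
  { role: "system", content: "You are a helpful customer support agent." }, // 1. agent instructions
  { role: "system", content: "Current user tier: Premium" },                // 2. per-request system message
  { role: "user", content: "What can I do?" },                              // 3. conversation messages
];
```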
## Structured Outputs
Enable structured outputs with a JSON schema to guarantee the output format:
```typescript
import { OpenAIAdapter } from "@flink-app/openai-adapter";

const adapter = new OpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-5",
  structuredOutput: {
    type: "json_schema",
    name: "car_analysis",
    description: "Analysis of car specifications",
    schema: {
      type: "object",
      properties: {
        brand: { type: "string" },
        model: { type: "string" },
        year: { type: "number" },
        features: {
          type: "array",
          items: { type: "string" },
        },
      },
      required: ["brand", "model", "year"],
      additionalProperties: false,
    },
    strict: true, // Enforces 100% schema adherence
  },
});
```

Benefits of Structured Outputs:
- 100% reliability (vs ~95% with JSON mode)
- No need for retry logic or manual validation
- Automatic schema validation during generation
- Supported on all modern models: `gpt-4.1`, `gpt-5`, `o4-mini`, `o3`
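With `strict: true`, the generated text is guaranteed to match the schema, so a caller can parse it directly. A sketch, assuming a hypothetical `carAnalyst` agent and that the agent result exposes the generated text as `result.text` (the exact field on Flink's result object may differ):

```typescript
// The schema above guarantees well-formed JSON, so no retry or validation logic is needed.
interface CarAnalysis {
  brand: string;
  model: string;
  year: number;
  features?: string[];
}

const result = await ctx.agents.carAnalyst.execute({
  message: "Summarize the specs of the 2021 Honda Civic",
});

// `result.text` is assumed here; check Flink's LLMResponse shape in your version.
const analysis: CarAnalysis = JSON.parse(result.text);
console.log(analysis.brand, analysis.year);
```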
## Zero Data Retention (ZDR)
For organizations with compliance or data retention requirements:
```typescript
const adapter = new OpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-5",
  persistResponse: false, // Don't store responses on OpenAI servers
});
```

What `persistResponse` does:
- `true` (default): OpenAI stores the response for caching and retrieval via `response_id`
- `false`: No data stored on OpenAI servers (ZDR compliance)
What it does NOT do:
- It does NOT automatically manage conversation state
- You still need to pass full conversation history in messages
- It's purely about server-side response persistence
Note: OpenAI automatically enforces `persistResponse: false` for Zero Data Retention organizations.
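In other words, each call carries its own history regardless of the persistence setting. A sketch using the message-array form shown earlier (agent name illustrative):

```typescript
// You supply the full conversation every turn - nothing is replayed server-side.
const result = await ctx.agents.supportAgent.execute({
  message: [
    { role: "user", content: "Where is my order?" },
    { role: "assistant", content: "Could you share your order number?" },
    { role: "user", content: "It's 12345." }, // latest turn; prior turns included by you
  ],
});
```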
## Multiple Adapters
You can register multiple OpenAI adapters with different configurations:
```typescript
import { OpenAIAdapter } from "@flink-app/openai-adapter";
import { FlinkApp } from "@flink-app/flink";

const app = new FlinkApp({
  ai: {
    llms: {
      // Default GPT-5 - best for general tasks
      default: new OpenAIAdapter({
        apiKey: process.env.OPENAI_API_KEY!,
        model: "gpt-5",
      }),
      // Fast reasoning model - cost-efficient
      fast: new OpenAIAdapter({
        apiKey: process.env.OPENAI_API_KEY!,
        model: "o4-mini",
      }),
      // Maximum intelligence for complex reasoning
      smart: new OpenAIAdapter({
        apiKey: process.env.OPENAI_API_KEY!,
        model: "o3",
      }),
      // With structured output
      structured: new OpenAIAdapter({
        apiKey: process.env.OPENAI_API_KEY!,
        model: "gpt-5",
        structuredOutput: {
          type: "json_schema",
          name: "response",
          schema: { /* your schema */ },
          strict: true,
        },
      }),
    },
  },
});
```

## Using in Agents
Reference the adapter by its registered ID in your agent configuration:
```typescript
// src/agents/support_agent.ts
import { FlinkAgentProps } from "@flink-app/flink";

export const Agent: FlinkAgentProps = {
  name: "support_agent",
  description: "Customer support assistant",
  instructions: "You are a helpful customer support agent.",
  tools: ["get_order_status", "search_knowledge_base"],
  model: {
    adapterId: "default", // Uses the "default" adapter
    maxTokens: 2000,
    temperature: 0.7,
  },
};
```

## Supported Models
This adapter works with all OpenAI models available via the Responses API. The latest models (as of 2026) offer significant improvements:
### GPT-5 Series (Recommended)

- GPT-5: `gpt-5`
  - Latest and most capable model
  - Best for general-purpose applications
  - Excellent at coding, reasoning, and agentic tasks

### GPT-4.1 Series

- GPT-4.1: `gpt-4.1`
  - Smartest non-reasoning model
  - Excellent at coding tasks
  - Strong at precise instruction following
  - Best for web development and technical tasks
- GPT-4.1 mini: `gpt-4.1-mini`
  - Smaller, faster, more cost-efficient
  - Good balance of capability and cost
- GPT-4.1 nano: `gpt-4.1-nano`
  - Ultra-fast and cost-efficient
  - Best for simple, high-volume tasks

### O-Series Reasoning Models

- o4-mini: `o4-mini` (recommended for reasoning tasks)
  - Fast, cost-efficient reasoning model
  - Best-performing on AIME 2024 and 2025 benchmarks
  - Optimized for mathematical and logical reasoning
- o3: `o3`
  - Advanced reasoning model for complex tasks
  - State-of-the-art performance on coding, math, and science
  - Excellent at Codeforces, SWE-bench, and MMMU
- o3-pro: `o3-pro`
  - Premium reasoning model (Pro users only)
  - Designed to think longer and provide the most reliable responses

### Legacy Models

For backwards compatibility:

- GPT-4 Turbo: `gpt-4-turbo`
- GPT-4: `gpt-4`
- GPT-3.5 Turbo: `gpt-3.5-turbo`
Note: Some older models (GPT-4o, early GPT-4.1 variants) are being retired in 2026. Migrate to the latest models for continued support.
## Model Selection Guide
| Use Case | Recommended Model | Why |
|----------|------------------|-----|
| General development | gpt-5 | Latest and most capable |
| Coding & technical | gpt-4.1 | Best instruction following |
| High-volume tasks | gpt-4.1-mini | Cost-efficient with good performance |
| Mathematical reasoning | o4-mini | Optimized for math, fast and cost-efficient |
| Complex problem-solving | o3 | State-of-the-art reasoning |
| Mission-critical | o3-pro | Maximum reliability (Pro users) |
## Features
- ✅ Step-based tool calling - Multiple tool calls in one response as typed items
- ✅ Event-based streaming - Proper event taxonomy (not just token streaming)
- ✅ Structured outputs with JSON schema (100% reliability)
- ✅ Response persistence for better caching (optional)
- ✅ First-class tool steps - `function_call` and `function_call_output` as explicit types
- ✅ Zero Data Retention mode for compliance
- ✅ Support for all OpenAI models
- ✅ 40-80% cost savings via better caching
- ✅ 3-5% performance improvement over Chat Completions
## What Makes Responses API Different
The key difference is how tool calls are represented, not who executes them:
Chat Completions (old):

```
Response: {
  message: {
    content: "...",
    tool_calls: [...] // Tool calls embedded in message
  }
}
```

You: Extract tool calls, execute them, create new messages, call the API again.

Responses API (new):
```
Response: {
  output: [
    { type: "message", content: "..." },
    { type: "function_call", name: "...", call_id: "..." },
    { type: "function_call", name: "...", call_id: "..." } // Multiple calls!
  ]
}
```

You: Extract tool calls, execute them, create `function_call_output` items, call the API again.

Key improvements:
- Multiple tool calls per response: Model can request several tools at once
- Explicit step types: No more message role gymnastics
- Better structure: `function_call_output` items instead of cramming results into user messages
- Clearer semantics: Steps are first-class, not message metadata
Still your responsibility:
- Executing the tools
- Deciding when to stop the loop
- Managing conversation history
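To make the shape difference concrete, here is a sketch of extracting tool calls from each response; the types are simplified stand-ins for the shapes shown above, not the official SDK types:

```typescript
// Simplified response shapes for illustration
declare const chatResponse: {
  choices: { message: { content: string | null; tool_calls?: unknown[] } }[];
};
declare const responsesResponse: {
  output: { type: string; name?: string; call_id?: string }[];
};

// Chat Completions: tool calls ride inside the assistant message
const chatToolCalls = chatResponse.choices[0].message.tool_calls ?? [];

// Responses API: tool calls are top-level typed items, so several can
// appear in one response without any message-role workarounds
const responseToolCalls = responsesResponse.output.filter(
  (item) => item.type === "function_call"
);
```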
## API

### OpenAIAdapter
```typescript
interface OpenAIAdapterOptions {
  apiKey: string;
  model: string;
  structuredOutput?: {
    type: "json_schema";
    name: string;
    description?: string;
    schema: Record<string, any>;
    strict?: boolean;
  };
  persistResponse?: boolean; // Default: true
}

class OpenAIAdapter implements LLMAdapter {
  constructor(options: OpenAIAdapterOptions);
  constructor(apiKey: string, model: string); // Legacy
}
```

### Parameters
- `apiKey`: Your OpenAI API key
- `model`: The OpenAI model to use (e.g., `"gpt-5"`, `"o4-mini"`)
- `structuredOutput`: Optional JSON schema for structured outputs
- `persistResponse`: Whether to persist responses on OpenAI servers for caching (default: `true`)
## Architecture Notes

### Responses API vs Chat Completions
This adapter uses OpenAI's Responses API, which differs from Chat Completions in several ways:
Request Format:

- Chat Completions: `messages` array with system/user/assistant roles
- Responses API: `input` array with typed items (messages, function_call_outputs, etc.) + separate `instructions` field

Response Format:

- Chat Completions: `choices[0].message.content`
- Responses API: `output` array of items with type `message`, `function_call`, etc.

Tool/Function Format:

- Chat Completions: Externally-tagged `{ type: "function", function: {...} }`
- Responses API: Internally-tagged `{ type: "function", name: "...", ... }` (strict by default)

Structured Outputs:

- Chat Completions: `response_format: { type: "json_schema", json_schema: {...} }`
- Responses API: `text: { format: { type: "json_schema", ... } }`

State Management:

- Chat Completions: Manual conversation state management
- Responses API: Optional response persistence with `persistResponse: true` (for caching, not automatic state replay)
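Side by side, the two request bodies differ roughly as follows (abridged sketch; `store` is the Responses API persistence flag that `persistResponse` is assumed to map to):

```typescript
// Abridged request bodies for comparison (not complete API payloads)

// Chat Completions: one role-tagged `messages` array carries everything
const chatRequest = {
  model: "gpt-5",
  messages: [
    { role: "system", content: "You are a helpful customer support agent." },
    { role: "user", content: "Where is my order?" },
  ],
};

// Responses API: `instructions` is a separate field and `input` holds typed items
const responsesRequest = {
  model: "gpt-5",
  instructions: "You are a helpful customer support agent.",
  input: [
    { role: "user", content: "Where is my order?" },
    // ...later turns also carry function_call / function_call_output items
  ],
  store: true, // response persistence; assumed target of persistResponse
};
```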
### Flink Integration
The adapter integrates with Flink's `LLMAdapter` interface:

- Flink's `instructions` → Responses API `instructions`
- Flink's `messages` → Converted to Responses API `input` items (typed steps)
- Flink's tool schema → Converted to Responses API function format (internally-tagged, strict by default)
- Responses API `output` → Extracted to Flink's `LLMResponse` format
Each API call is one turn:

1. Flink calls `adapter.execute()` → One Responses API request
2. The response may contain multiple tool calls (as separate items)
3. Flink's AgentRunner executes those tools
4. Flink calls `adapter.execute()` again with tool results → Another Responses API request
5. Repeat until no more tool calls

This is the standard agent-loop architecture used by modern frameworks (LangGraph, Vercel AI SDK, etc.).
## Migration from Chat Completions
If you're coming from the Chat Completions API, the good news is: no code changes needed!
The adapter handles all the differences internally:
```typescript
// This works the same way with both APIs
const app = new FlinkApp({
  ai: {
    llms: {
      default: new OpenAIAdapter({
        apiKey: process.env.OPENAI_API_KEY!,
        model: "gpt-5", // Just update the model
      }),
    },
  },
});
```

Benefits of upgrading:
- ✅ Step-aware reasoning (multiple tool calls per response)
- ✅ Better performance (3-5% improvement on benchmarks)
- ✅ Lower costs (40-80% better caching via response persistence)
- ✅ First-class tool steps (cleaner than message role hacks)
- ✅ Future-proof (new features land here first)
## Requirements
- Node.js >= 18
- @flink-app/flink >= 1.0.0
- openai >= 4.77.0 (with Responses API support)
## License
MIT
