@flink-app/openai-adapter
v2.0.0-alpha.86
@flink-app/openai-adapter
OpenAI adapter for the Flink AI framework using the Responses API - OpenAI's modern API that provides step-aware reasoning, explicit tool invocation, and better performance.
Why Responses API?
This adapter uses OpenAI's Responses API instead of the older Chat Completions API, providing:
- 🔧 Step-aware reasoning: Model returns multiple tool calls as explicit, typed items in a single response
- ⚡ Better performance: 3% improvement on SWE-bench, 5% on TAUBench vs Chat Completions
- 📦 First-class tool steps: Tool calls and results are structured items, not message hacks
- 🎯 Future-proof: All new OpenAI features will land in Responses API first
Important: Understanding the Agent Loop
The Responses API does NOT run your agent loop.
What it provides:
- Multiple tool calls in one response (you still execute them)
- Structured step types (message, function_call, function_call_output)
What Flink handles (via AgentRunner):
- Tool execution
- Multi-turn loops (API → execute tools → API → execute tools → done)
- Deciding when to stop
- Managing conversation state
This separation is intentional - it gives you full control over agent behavior while leveraging better API primitives.
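The loop described above can be sketched as follows. This is an illustrative reimplementation, not Flink's actual AgentRunner; the `callApi` and `tools` signatures are assumptions made to keep the sketch self-contained:

```typescript
// Illustrative sketch of the loop AgentRunner runs for you; names and
// signatures here are assumptions, not Flink's real internals.
type Step =
  | { type: "message"; content: string }
  | { type: "function_call"; name: string; call_id: string; args: unknown };

type CallApi = (history: unknown[]) => Step[];
type Tools = Record<string, (args: unknown) => string>;

function runAgentLoop(callApi: CallApi, tools: Tools, maxTurns = 8): string {
  const history: unknown[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const output = callApi(history); // one Responses API request per iteration
    const calls = output.filter(
      (s): s is Extract<Step, { type: "function_call" }> => s.type === "function_call"
    );
    if (calls.length === 0) {
      // No tool calls left: the final assistant message ends the loop.
      const msg = output.find((s) => s.type === "message");
      return msg && msg.type === "message" ? msg.content : "";
    }
    for (const call of calls) {
      // Execute each tool and feed the result back as a function_call_output item.
      history.push({
        type: "function_call_output",
        call_id: call.call_id,
        output: tools[call.name](call.args),
      });
    }
  }
  throw new Error("Agent loop did not terminate within maxTurns");
}
```

The API only reports which tools it wants; every `function_call_output` in the loop above is produced by your code.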
Installation
npm install @flink-app/openai-adapter
# or
pnpm add @flink-app/openai-adapter

The `openai` package is included as a dependency, so you don't need to install it separately.
Usage
Basic Setup
import { OpenAIAdapter } from "@flink-app/openai-adapter";
import { FlinkApp } from "@flink-app/flink";
const app = new FlinkApp({
ai: {
llms: {
default: new OpenAIAdapter({
apiKey: process.env.OPENAI_API_KEY!,
model: "gpt-5"
}),
},
},
});
await app.start();

Legacy API (still supported):
// Backward-compatible constructor
new OpenAIAdapter(process.env.OPENAI_API_KEY!, "gpt-5")

Agent Instructions
Define your agent's behavior using the instructions property:
// src/agents/support_agent.ts
export const Agent: FlinkAgentProps = {
name: "support_agent",
instructions: "You are a helpful customer support agent.",
tools: ["get_order_status"],
model: { adapterId: "default" },
};

How it works:
- Instructions are prepended as a system message to every conversation
- Follows Vercel AI SDK pattern for consistency
- Provides stable agent behavior across all interactions
Dynamic Context with System Messages
For per-request context, add system messages to the conversation:
const result = await ctx.agents.myAgent.execute({
message: [
{ role: "system", content: "Current user tier: Premium" },
{ role: "user", content: "What can I do?" }
]
});

Order of messages sent to OpenAI:
1. Agent `instructions` (as system message)
2. User-provided system messages (if any)
3. Conversation messages
This gives you both static agent behavior and dynamic per-request context.
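Concretely, the ordering rule can be sketched like this (an illustration of the documented order, not the adapter's actual code):

```typescript
// Sketch of the ordering rule: agent instructions first, then the caller's
// messages in the order provided. Not the adapter's actual implementation.
type Msg = { role: "system" | "user" | "assistant"; content: string };

function assembleMessages(instructions: string, conversation: Msg[]): Msg[] {
  return [{ role: "system", content: instructions }, ...conversation];
}

const sent = assembleMessages("You are a helpful assistant.", [
  { role: "system", content: "Current user tier: Premium" },
  { role: "user", content: "What can I do?" },
]);
// sent[0] = agent instructions, sent[1] = per-request system message, sent[2] = user message
```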
Structured Outputs
Enable structured outputs with JSON schema for 100% reliability in output format:
import { OpenAIAdapter } from "@flink-app/openai-adapter";
const adapter = new OpenAIAdapter({
apiKey: process.env.OPENAI_API_KEY!,
model: "gpt-5",
structuredOutput: {
type: "json_schema",
name: "car_analysis",
description: "Analysis of car specifications",
schema: {
type: "object",
properties: {
brand: { type: "string" },
model: { type: "string" },
year: { type: "number" },
features: {
type: "array",
items: { type: "string" }
}
},
required: ["brand", "model", "year"],
additionalProperties: false
},
strict: true // Enforces 100% schema adherence
}
});

Benefits of Structured Outputs:
- 100% reliability (vs ~95% with JSON mode)
- No need for retry logic or manual validation
- Automatic schema validation during generation
- Supported on all modern models: `gpt-4.1`, `gpt-5`, `o4-mini`, `o3`
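Consuming the structured result might look like the sketch below. It assumes the model's JSON arrives as a string on the agent result; the exact field name depends on your Flink version:

```typescript
// Sketch: parsing a structured-output response into a typed object.
// With strict: true the output is constrained to the schema during generation,
// so a parse failure indicates a refusal or transport error, not a malformed answer.
interface CarAnalysis {
  brand: string;
  model: string;
  year: number;
  features?: string[];
}

function parseCarAnalysis(content: string): CarAnalysis {
  return JSON.parse(content) as CarAnalysis;
}

const analysis = parseCarAnalysis(
  '{"brand":"Volvo","model":"XC90","year":2024,"features":["AWD"]}'
);
// analysis.brand === "Volvo", analysis.year === 2024
```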
Conversation Persistence (Recommended)
For multi-turn conversations, use ConversationAgent to automatically save conversation history:
// src/agents/ChatAgent.ts
import { ConversationAgent, ConversationData } from "@flink-app/flink";
export default class ChatAgent extends ConversationAgent<AppContext> {
id = "chat-agent";
description = "Chat assistant with automatic conversation persistence";
instructions = "You are a helpful assistant...";
tools = ["search-knowledge"];
// Load conversation from your storage backend
protected async loadConversation(conversationId: string) {
const conv = await this.ctx.repos.conversationRepo.getById(conversationId);
return conv ? {
messages: conv.messages,
providerMetadata: conv.providerMetadata // ← Preserved for tracking
} : null;
}
// Save conversation with updated metadata
protected async saveConversation(conversationId: string, data: ConversationData) {
await this.ctx.repos.conversationRepo.upsert({
_id: conversationId,
messages: data.messages,
providerMetadata: data.providerMetadata // ← Saved automatically
});
}
}

What this enables:
- Automatic History Management: Framework handles full conversation history
- Metadata Tracking: Provider metadata preserved for debugging and future use
- Storage Flexibility: Works with MongoDB, Redis, in-memory, or any backend
- Type Safety: Abstract methods enforce consistent implementation
Example Storage Backends:
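For instance, a minimal in-memory backend could look like the sketch below. The `ConversationData` shape is declared locally so the snippet is self-contained; in a real agent you would use the type imported from `@flink-app/flink`:

```typescript
// Map-backed store sketch; swap these two functions for MongoDB, Redis, etc.
// ConversationData is declared locally to keep the example self-contained.
type ConversationData = {
  messages: unknown[];
  providerMetadata?: Record<string, unknown>;
};

const store = new Map<string, ConversationData>();

async function loadConversation(conversationId: string): Promise<ConversationData | null> {
  return store.get(conversationId) ?? null;
}

async function saveConversation(conversationId: string, data: ConversationData): Promise<void> {
  store.set(conversationId, data);
}
```

The same two-function shape maps directly onto the abstract methods of `ConversationAgent` shown above.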
How Provider Metadata Works:
1. First turn: agent executes, adapter may return provider-specific metadata
2. `saveConversation()` stores messages + metadata
3. Second turn: `loadConversation()` returns messages + metadata
4. Framework populates `input.providerMetadata` for tracking/debugging
5. Full conversation history is always sent for reliability
Note: Provider metadata is namespaced by adapter (e.g., { openai: {...}, anthropic: {...} }) for multi-provider support.
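A hypothetical helper showing the namespacing; the `mergeProviderMetadata` name and the `lastResponseId` field are illustrative only:

```typescript
// Each adapter writes under its own key, so one conversation record can carry
// metadata for several providers side by side. Helper name is illustrative.
type ProviderMetadata = Record<string, Record<string, unknown>>;

function mergeProviderMetadata(
  existing: ProviderMetadata,
  adapterId: string,
  update: Record<string, unknown>
): ProviderMetadata {
  return { ...existing, [adapterId]: { ...existing[adapterId], ...update } };
}

const meta = mergeProviderMetadata(
  { anthropic: { model: "claude" } },
  "openai",
  { lastResponseId: "resp_abc" }
);
// meta now holds both namespaces: { anthropic: {...}, openai: {...} }
```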
Debug Logging
The adapter includes component-specific logging to help debug API requests and responses:
const adapter = new OpenAIAdapter({
apiKey: process.env.OPENAI_API_KEY!,
model: "gpt-5",
debug: true // Enable debug logging for this adapter
});

What gets logged:
When debug: true:
- ✅ Initialization confirmation
- ✅ Request metadata (model, message count, tool count, token limits)
- ✅ Full request parameters sent to OpenAI
- ✅ Tool call decisions made by the LLM
- ✅ Token usage and performance metrics
- ✅ Response IDs for conversation continuation
- ✅ Error details with context
When debug: false (default):
- ✅ Request completion with token usage (info level)
- ✅ Errors with context
Log Output Example:
2026-02-15 [OpenAI] [INFO] OpenAI adapter initialized with debug logging enabled
2026-02-15 [OpenAI] [DEBUG] Starting OpenAI stream request {"model":"gpt-5","messageCount":2,"toolCount":3}
2026-02-15 [OpenAI] [DEBUG] Stream connection established
2026-02-15 [OpenAI] [DEBUG] LLM decided to call tool {"name":"get_weather","call_id":"call_abc123"}
2026-02-15 [OpenAI] [INFO] OpenAI request completed {"inputTokens":150,"outputTokens":80,"totalTokens":230}

Environment-Based Configuration:
You can also enable debug logging via environment variables without code changes:
# Enable debug for OpenAI adapter specifically
DEBUG=OpenAI npm start
# Enable debug for multiple components
DEBUG=OpenAI,Database npm start
# Set global log level
LOG_LEVEL=debug npm start

Programmatic Control:
For dynamic control at runtime:
import { FlinkLogFactory } from "@flink-app/flink";
// Enable debug for OpenAI adapter
FlinkLogFactory.setComponentLevel("OpenAI", "debug");
// Later, disable it
FlinkLogFactory.setComponentLevel("OpenAI", "info");

Benefits:
- 🔍 Debug API requests without flooding other logs
- 📊 Track token usage per request
- 🐛 Investigate tool calling issues
- ⚡ Monitor performance and latency
- 🎯 Component-specific logging (OpenAI logs don't affect Database, Cache, etc.)
See Flink Logging Guide for more details on component-specific logging.
Cache Metrics & Cost Optimization
The OpenAI adapter automatically captures detailed cache metrics to help you optimize costs and understand performance:
const result = await ctx.agents.myAgent.execute({
message: "What's the weather?"
});
console.log(result.usage);
// {
// inputTokens: 2006,
// outputTokens: 300,
// cachedInputTokens: 1920 // 95% cache hit!
// }
// Calculate actual billed input tokens
const billedInputTokens = result.usage.inputTokens - (result.usage.cachedInputTokens || 0);
console.log(`Only ${billedInputTokens} tokens charged at full rate`);
// "Only 86 tokens charged at full rate"
// Estimate cost savings (cached tokens are ~10% of regular cost)
const regularCost = billedInputTokens * 1.0;
const cachedCost = (result.usage.cachedInputTokens || 0) * 0.1;
const totalCost = regularCost + cachedCost;
console.log(`Estimated cost factor: ${totalCost} (vs ${result.usage.inputTokens} without caching)`);
// "Estimated cost factor: 278 (vs 2006 without caching)" - 86% savings!

Cache Metrics:
- `inputTokens`: Total input tokens (including cached)
- `outputTokens`: Total output tokens
- `cachedInputTokens`: Input tokens served from cache (10% cost)
OpenAI Caching:
- Cached tokens cost ~10% of regular tokens
- Automatic caching for repeated context
- No configuration needed - OpenAI handles it automatically
When Caching Helps Most:
- Long system prompts used across requests
- Repeated context in multi-turn conversations
- Few-shot examples in prompts
- Large knowledge bases in context
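The arithmetic from the example above can be wrapped in a small helper. The ~10% cached-token rate is OpenAI's published discount; treat the factor as an estimate, not a bill:

```typescript
// Estimates a relative input-cost factor: uncached tokens at full rate,
// cached tokens at ~10% of the regular rate.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens?: number;
}

function inputCostFactor(usage: Usage, cachedRate = 0.1): number {
  const cached = usage.cachedInputTokens ?? 0;
  const fullRateTokens = usage.inputTokens - cached;
  return fullRateTokens + cached * cachedRate;
}

inputCostFactor({ inputTokens: 2006, outputTokens: 300, cachedInputTokens: 1920 });
// → 278, versus 2006 with no cache hits (~86% saved)
```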
Logging: When debug mode is enabled, cache metrics are automatically logged:
2026-02-23 [flink.ai.openai] [INFO] OpenAI request completed {
"inputTokens": 2006,
"outputTokens": 300,
"cachedInputTokens": 1920,
"totalTokens": 2306,
"status": "completed",
"model": "gpt-5"
}

Multiple Adapters
You can register multiple OpenAI adapters with different configurations:
import { OpenAIAdapter } from "@flink-app/openai-adapter";
import { FlinkApp } from "@flink-app/flink";
const app = new FlinkApp({
ai: {
llms: {
// Default GPT-5 - best for general tasks
default: new OpenAIAdapter({
apiKey: process.env.OPENAI_API_KEY!,
model: "gpt-5"
}),
// Fast reasoning model - cost-efficient
fast: new OpenAIAdapter({
apiKey: process.env.OPENAI_API_KEY!,
model: "o4-mini"
}),
// Maximum intelligence for complex reasoning
smart: new OpenAIAdapter({
apiKey: process.env.OPENAI_API_KEY!,
model: "o3"
}),
// With structured output
structured: new OpenAIAdapter({
apiKey: process.env.OPENAI_API_KEY!,
model: "gpt-5",
structuredOutput: {
type: "json_schema",
name: "response",
schema: { /* your schema */ },
strict: true
}
}),
},
},
});

Using in Agents
Reference the adapter by its registered ID in your agent configuration:
// src/agents/support_agent.ts
import { FlinkAgentProps } from "@flink-app/flink";
export const Agent: FlinkAgentProps = {
name: "support_agent",
description: "Customer support assistant",
instructions: "You are a helpful customer support agent.",
tools: ["get_order_status", "search_knowledge_base"],
model: {
adapterId: "default", // Uses the "default" adapter
maxTokens: 2000,
temperature: 0.7,
},
};

Supported Models
This adapter works with all OpenAI models available via the Responses API. The latest models (as of 2026) offer significant improvements:
GPT-5 Series (Recommended)
- GPT-5: `gpt-5`
  - Latest and most capable model
  - Best for general-purpose applications
  - Excellent at coding, reasoning, and agentic tasks
GPT-4.1 Series
- GPT-4.1: `gpt-4.1`
  - Smartest non-reasoning model
  - Excellent at coding tasks
  - Strong at precise instruction following
  - Best for web development and technical tasks
- GPT-4.1 mini: `gpt-4.1-mini`
  - Smaller, faster, more cost-efficient
  - Good balance of capability and cost
- GPT-4.1 nano: `gpt-4.1-nano`
  - Ultra-fast and cost-efficient
  - Best for simple, high-volume tasks
O-Series Reasoning Models
- o4-mini: `o4-mini` (recommended for reasoning tasks)
  - Fast, cost-efficient reasoning model
  - Best-performing on AIME 2024 and 2025 benchmarks
  - Optimized for mathematical and logical reasoning
- o3: `o3`
  - Advanced reasoning model for complex tasks
  - State-of-the-art performance on coding, math, and science
  - Excellent at Codeforces, SWE-bench, and MMMU
- o3-pro: `o3-pro`
  - Premium reasoning model (Pro users only)
  - Designed to think longer and provide the most reliable responses
Legacy Models
For backwards compatibility:
- GPT-4 Turbo: `gpt-4-turbo`
- GPT-4: `gpt-4`
- GPT-3.5 Turbo: `gpt-3.5-turbo`
Note: Some older models (GPT-4o, early GPT-4.1 variants) are being retired in 2026. Migrate to the latest models for continued support.
Model Selection Guide
| Use Case | Recommended Model | Why |
|----------|------------------|-----|
| General development | gpt-5 | Latest and most capable |
| Coding & technical | gpt-4.1 | Best instruction following |
| High-volume tasks | gpt-4.1-mini | Cost-efficient with good performance |
| Mathematical reasoning | o4-mini | Optimized for math, fast and cost-efficient |
| Complex problem-solving | o3 | State-of-the-art reasoning |
| Mission-critical | o3-pro | Maximum reliability (Pro users) |
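One way to wire the table into code is a small lookup keyed to the adapter IDs registered earlier. The IDs ("default", "fast", "smart") come from the Multiple Adapters example; the `Task` union is an assumption for illustration:

```typescript
// Hypothetical mapping from the selection table to adapter IDs registered in
// the "Multiple Adapters" example. Task names are illustrative.
type Task = "general" | "math" | "complex";

function adapterFor(task: Task): string {
  switch (task) {
    case "general": return "default"; // gpt-5
    case "math": return "fast";       // o4-mini
    case "complex": return "smart";   // o3
  }
}

// e.g. model: { adapterId: adapterFor("math") } in an agent definition
```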
Features
- ✅ Step-based tool calling - Multiple tool calls in one response as typed items
- ✅ Event-based streaming - Proper event taxonomy (not just token streaming)
- ✅ Structured outputs with JSON schema (100% reliability)
- ✅ First-class tool steps - function_call and function_call_output as explicit types
- ✅ Support for all OpenAI models
- ✅ 3-5% performance improvement over Chat Completions
What Makes Responses API Different
The key difference is how tool calls are represented, not who executes them:
Chat Completions (old):
Response: {
message: {
content: "...",
tool_calls: [...] // Tool calls embedded in message
}
}
You: Extract tool calls, execute them, create new messages, call API again

Responses API (new):
Response: {
output: [
{ type: "message", content: "..." },
{ type: "function_call", name: "...", call_id: "..." },
{ type: "function_call", name: "...", call_id: "..." } // Multiple calls!
]
}
You: Extract tool calls, execute them, create function_call_output items, call API again

Key improvements:
- Multiple tool calls per response: Model can request several tools at once
- Explicit step types: No more message role gymnastics
- Better structured: function_call_output vs cramming into user messages
- Clearer semantics: Steps are first-class, not message metadata
Still your responsibility:
- Executing the tools
- Deciding when to stop the loop
- Managing conversation history
API
OpenAIAdapter
interface OpenAIAdapterOptions {
apiKey: string;
model: string;
structuredOutput?: {
type: "json_schema";
name: string;
description?: string;
schema: Record<string, any>;
strict?: boolean;
};
}
class OpenAIAdapter implements LLMAdapter {
constructor(options: OpenAIAdapterOptions);
constructor(apiKey: string, model: string); // Legacy
}

Parameters
- `apiKey`: Your OpenAI API key
- `model`: The OpenAI model to use (e.g., "gpt-5", "o4-mini")
- `structuredOutput`: Optional JSON schema for structured outputs
Architecture Notes
Responses API vs Chat Completions
This adapter uses OpenAI's Responses API, which differs from Chat Completions in several ways:
Request Format:
- Chat Completions: `messages` array with system/user/assistant roles
- Responses API: `input` array with typed items (messages, function_call_outputs, etc.) + separate `instructions` field

Response Format:
- Chat Completions: `choices[0].message.content`
- Responses API: `output` array of items with type `message`, `function_call`, etc.

Tool/Function Format:
- Chat Completions: Externally-tagged `{ type: "function", function: {...} }`
- Responses API: Internally-tagged `{ type: "function", name: "...", ... }` (strict by default)

Structured Outputs:
- Chat Completions: `response_format: { type: "json_schema", json_schema: {...} }`
- Responses API: `text: { format: { type: "json_schema", ... } }`
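Side by side, the two request shapes look roughly like this (heavily abbreviated; see OpenAI's API reference for the full parameter sets):

```typescript
// Abbreviated request shapes for the same prompt under each API.
const chatCompletionsRequest = {
  model: "gpt-5",
  messages: [
    { role: "system", content: "You are helpful." },
    { role: "user", content: "Hi" },
  ],
};

const responsesRequest = {
  model: "gpt-5",
  instructions: "You are helpful.", // system prompt is its own field
  input: [{ role: "user", content: "Hi" }],
};
```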
Flink Integration
The adapter seamlessly integrates with Flink's LLMAdapter interface:
- Flink's `instructions` → Responses API `instructions`
- Flink's `messages` → converted to Responses API `input` items (typed steps)
- Flink's tool schema → converted to Responses API function format (internally-tagged, strict by default)
- Responses `output` → extracted to Flink's `LLMResponse` format

Each API call is one turn:
1. Flink calls `adapter.execute()` → one Responses API request
2. Response may contain multiple tool calls (as separate items)
3. Flink's AgentRunner executes those tools
4. Flink calls `adapter.execute()` again with tool results → another Responses API request
5. Repeat until no more tool calls
This is the standard agent loop architecture used by modern frameworks (LangGraph, Vercel AI SDK, etc.).
Migration from Chat Completions
If you're coming from the Chat Completions API, the good news is: no code changes needed!
The adapter handles all the differences internally:
// This works the same way with both APIs
const app = new FlinkApp({
ai: {
llms: {
default: new OpenAIAdapter({
apiKey: process.env.OPENAI_API_KEY!,
model: "gpt-5" // Just update the model
}),
},
},
});

Benefits of upgrading:
- ✅ Step-aware reasoning (multiple tool calls per response)
- ✅ Better performance (3-5% improvement on benchmarks)
- ✅ First-class tool steps (cleaner than message role hacks)
- ✅ Future-proof (new features land here first)
Requirements
- Node.js >= 18
- @flink-app/flink >= 1.0.0
- openai >= 4.77.0 (with Responses API support)
License
MIT
