@pixelcop/openai-compatible-queuing
v0.2.0
AI SDK - OpenAI Compatible Queuing Provider
This package wraps @ai-sdk/openai-compatible and adds provider-local request queueing.
It is intended for OpenAI-compatible backends that enforce very low parallel request limits, such as
a LiteLLM deployment configured with max_parallel_requests=1. Instead of immediately sending
concurrent requests and relying on retries after 429 responses, this provider waits for an
available slot and dispatches requests in FIFO order.
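The difference between waiting for a slot and retrying after a 429 can be illustrated with a small FIFO gate. This is a hypothetical sketch, not the package's implementation: the `FifoGate` class and its `run` method are invented names, and the sketch leaves out the wait timeout and cooldown.

```typescript
// Hypothetical FIFO slot gate; not the package's source.
// Callers wait for a free slot instead of sending immediately and retrying on HTTP 429.
class FifoGate {
  private inFlight = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(private readonly concurrency: number) {}

  // Runs `task` once a slot is free; parked tasks start in arrival order.
  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.inFlight >= this.concurrency) {
      // No free slot: park until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.inFlight++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot straight to the oldest waiter (inFlight unchanged)
      } else {
        this.inFlight--;
      }
    }
  }
}
```

With `concurrency` set to 1, calls submitted at the same moment run one after another in submission order, so the backend never sees more than one request at a time.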
Setup
```sh
npm i @pixelcop/openai-compatible-queuing
```
Usage
```typescript
import { createOpenAICompatibleQueueing } from '@pixelcop/openai-compatible-queuing';
import { generateText } from 'ai';

const provider = createOpenAICompatibleQueueing({
  baseURL: 'https://litellm.example.com/v1',
  name: 'litellm',
  apiKey: process.env.LITELLM_API_KEY,
  queue: {
    concurrency: 1, // at most one request in flight at a time
    maxQueueWaitMs: 120_000, // reject a request that waits longer than 2 minutes for a slot
    cooldownMs: 250, // pause 250 ms after a request completes before starting the next one
  },
});

const { text } = await generateText({
  model: provider.chatModel('gpt-4.1-mini'),
  prompt: 'Summarize the latest release notes.',
});
```
Opencode Configuration
All @ai-sdk/openai-compatible options are still supported, and the queue options are added alongside them. In an opencode config, register the package under the `provider` key:
```json
{
  "provider": {
    "myprovider": {
      "npm": "@pixelcop/openai-compatible-queuing",
      "name": "My AI Provider Display Name",
      "options": {
        "baseURL": "https://api.myprovider.com/v1",
        "queue": {
          "concurrency": 1,
          "maxQueueWaitMs": 120000,
          "cooldownMs": 250
        }
      },
      "models": {
        "my-model-name": {
          "name": "My Model Display Name"
        }
      }
    }
  }
}
```
Queue Behavior
- Queueing is shared across all requests created from one provider instance.
- Requests are dispatched in FIFO order.
- `queue.concurrency` controls the maximum number of in-flight requests.
- `queue.maxQueueWaitMs` limits how long a request may wait for a slot before it is rejected.
- Streaming requests hold a queue slot until the response body is consumed or canceled.
- `queue.cooldownMs` adds an extra delay after a request completes before the next queued request starts.
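The rules above can be pictured with a standalone limiter. This is a hypothetical sketch, not the package's source. The names `RequestQueue`, `runQueued`, and `holdSlotWhileStreaming` are invented for illustration, and the sketch models a streaming body as an async iterable rather than a `fetch` `Response`.

```typescript
// Hypothetical sketch of the queue rules above; not the package's implementation.
interface QueueOptions {
  concurrency: number; // max in-flight requests
  maxQueueWaitMs: number; // max time a request may wait for a slot
  cooldownMs: number; // extra delay after a request completes
}

type Release = () => void;

class RequestQueue {
  private active = 0;
  private waiters: Array<{ grant: (release: Release) => void }> = [];

  constructor(private readonly opts: QueueOptions) {}

  // Resolves with a release function once a slot is free; rejects after maxQueueWaitMs.
  acquire(): Promise<Release> {
    if (this.active < this.opts.concurrency && this.waiters.length === 0) {
      this.active++;
      return Promise.resolve(this.makeRelease());
    }
    return new Promise<Release>((resolve, reject) => {
      const waiter = {
        grant: (release: Release) => {
          clearTimeout(timer);
          resolve(release);
        },
      };
      const timer = setTimeout(() => {
        // Waited too long: drop out of the FIFO and fail this request.
        this.waiters = this.waiters.filter((w) => w !== waiter);
        reject(new Error(`queue wait exceeded ${this.opts.maxQueueWaitMs} ms`));
      }, this.opts.maxQueueWaitMs);
      this.waiters.push(waiter);
    });
  }

  private makeRelease(): Release {
    let released = false;
    return () => {
      if (released) return; // a double release must not free two slots
      released = true;
      // The slot stays occupied during the cooldown, then moves on.
      setTimeout(() => this.handOff(), this.opts.cooldownMs);
    };
  }

  private handOff(): void {
    const next = this.waiters.shift();
    if (next) {
      next.grant(this.makeRelease()); // oldest waiter gets the slot directly
    } else {
      this.active--;
    }
  }
}

// Non-streaming request: hold a slot for the duration of the call.
async function runQueued<T>(queue: RequestQueue, task: () => Promise<T>): Promise<T> {
  const release = await queue.acquire();
  try {
    return await task();
  } finally {
    release();
  }
}

// Streaming request: keep the slot until the body is fully read or the reader stops early.
async function* holdSlotWhileStreaming<T>(release: Release, body: AsyncIterable<T>): AsyncGenerator<T> {
  try {
    yield* body;
  } finally {
    release(); // runs on completion, error, or early return (cancel)
  }
}
```

Handing a freed slot directly to the oldest waiter, rather than decrementing the counter and letting any caller claim it, is what keeps dispatch FIFO. In this sketch a stream that is never iterated never reaches its `finally` block, so it would hold its slot indefinitely.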
