@kyyn/llm-stream
v1.0.0
Smart buffer and pagination engine for piping LLM streams (OpenAI, Anthropic, LangChain…) into Discord messages. Handles rate limits, the 2000-char cap, markdown continuity, and graceful error recovery.
Smart buffer and pagination engine for streaming LLM responses into Discord messages.
Pipe any async LLM stream — OpenAI, Anthropic, LangChain, local models — directly into a Discord message. The library handles every Discord constraint automatically, so you write zero boilerplate.
Problems it solves
| Discord constraint | How @kyyn/llm-stream handles it |
|---|---|
| ~5 edits per 5 s rate limit | Interval-based flush (default 1 500 ms) — never edits per-token |
| 2 000-character message cap | Auto-paginates: spawns followUp / channel.send seamlessly |
| Broken markdown on split | Closes ``` on message A, reopens with same language on message B |
| Provider lock-in | Accepts any AsyncIterable<string> or Node.js ReadableStream |
| Mid-stream message deletion | Catches DiscordAPIError 10008 and aborts gracefully — no crash |
| Timer leaks | All setInterval handles are cleared in a finally block |
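The mid-stream deletion row can be illustrated with a small guard around each edit. This is a minimal sketch and not the library's code: `safeEdit` and `isUnknownMessageError` are hypothetical names, and the check reads the numeric `code` property that discord.js sets on a `DiscordAPIError` instead of importing the class, so the snippet runs without discord.js installed.

```typescript
// Discord's JSON error code for "Unknown Message" (the message was deleted).
const UNKNOWN_MESSAGE = 10008;

// Hypothetical helper: true when the error carries Discord's 10008 code.
function isUnknownMessageError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === UNKNOWN_MESSAGE
  );
}

// Hypothetical helper: runs one edit and reports whether streaming may continue.
// Returns false when the target message no longer exists; rethrows anything else.
async function safeEdit(edit: () => Promise<void>): Promise<boolean> {
  try {
    await edit();
    return true;
  } catch (err) {
    if (isUnknownMessageError(err)) return false; // abort quietly, no crash
    throw err; // unrelated failures should still surface
  }
}
```

A caller would stop consuming the LLM stream as soon as `safeEdit` returns `false`.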
Installation
npm install @kyyn/llm-stream
# discord.js is a peer dependency — install it if you haven't already
npm install discord.js
Quick start
import { Client, GatewayIntentBits } from 'discord.js';
import OpenAI from 'openai';
import { DiscordLLMStreamer } from '@kyyn/llm-stream';
// Assumed setup, not part of the original snippet; log the client in elsewhere.
const client = new Client({ intents: [GatewayIntentBits.Guilds] });
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
// ── Slash command handler ─────────────────────────────────────────────────────
client.on('interactionCreate', async (interaction) => {
  if (!interaction.isChatInputCommand()) return;
  // 1. Defer so Discord gives us 15 minutes to respond
  await interaction.deferReply();
  // 2. Start your LLM call (OpenAI SDK v4 example)
  const openaiStream = await openai.chat.completions.create({
    model: 'gpt-4o',
    stream: true,
    messages: [{ role: 'user', content: interaction.options.getString('prompt')! }],
  });
  // 3. Wrap the SDK stream in a plain AsyncGenerator<string>
  const textStream = (async function* () {
    for await (const chunk of openaiStream) {
      yield chunk.choices[0]?.delta?.content ?? '';
    }
  })();
  // 4. Hand it to the streamer, and that's it
  const streamer = new DiscordLLMStreamer(interaction, {
    editIntervalMs: 1500, // optional, this is the default
    maxLength: 1950, // optional, this is the default
  });
  await streamer.stream(textStream);
});
API
new DiscordLLMStreamer(target, options?)
| Parameter | Type | Description |
|---|---|---|
| target | CommandInteraction \| Message | A deferred interaction or any message to reply to |
| options.editIntervalMs | number | Milliseconds between Discord edits. Default: 1500 |
| options.maxLength | number | Characters before a new message is spawned. Default: 1950 |
streamer.stream(source)
| Parameter | Type | Description |
|---|---|---|
| source | AsyncIterable<string> \| NodeJS.ReadableStream | Any stream of text tokens |
Returns Promise<void> that resolves once the stream is fully consumed and the final edit has been made.
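Because `source` is a plain stream of strings, SDK streams that yield structured chunks need a thin adapter first. The following is a minimal sketch of a generic adapter; `toTextStream` is a hypothetical helper name and is not exported by @kyyn/llm-stream.

```typescript
// Hypothetical helper: maps any async stream of provider chunks to the
// AsyncIterable<string> shape that streamer.stream() accepts.
async function* toTextStream<T>(
  source: AsyncIterable<T>,
  extract: (chunk: T) => string | null | undefined,
): AsyncGenerator<string> {
  for await (const chunk of source) {
    const text = extract(chunk);
    if (text) yield text; // skip chunks that carry no text
  }
}
```

With the OpenAI SDK, for example, the extractor would be `(chunk) => chunk.choices[0]?.delta?.content`, and the result can be passed straight to `streamer.stream(...)`.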
Provider examples
OpenAI
const raw = await openai.chat.completions.create({ model: 'gpt-4o', stream: true, messages });
await streamer.stream(
  (async function* () {
    for await (const chunk of raw) {
      yield chunk.choices[0]?.delta?.content ?? '';
    }
  })(),
);
Anthropic
const raw = await anthropic.messages.create({ model: 'claude-3-5-sonnet-latest', stream: true, ... });
await streamer.stream(
  (async function* () {
    for await (const event of raw) {
      if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
        yield event.delta.text;
      }
    }
  })(),
);
LangChain
const chain = prompt.pipe(llm);
await streamer.stream(
  (async function* () {
    for await (const chunk of await chain.stream({ question })) {
      yield chunk.content as string;
    }
  })(),
);
Node.js ReadableStream
import { Readable } from 'node:stream';
const readable = Readable.from(['Hello', ' ', 'world']);
await streamer.stream(readable);
Markdown continuity
When a response contains a code block that happens to straddle the 1 950-character boundary, the library automatically preserves formatting:
Message A (finalised at the split):
Here is the implementation:
```javascript
function greet(name) {
  console.log(`Hello, ${name
```
Message B (continuation):
```javascript
}!`);
}
```
The closing ``` is appended to message A, and the opening ```javascript is prepended to message B. Users see seamlessly formatted code across both messages.
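The split described above can be sketched as a pure function. This is an illustration under stated assumptions and not the library's implementation: `openFenceLanguage` and `splitWithFenceContinuity` are hypothetical names, fences are assumed to sit at the start of a line, and `maxLength` is assumed to be larger than 4. A cut that lands inside a fence line itself is not handled.

```typescript
interface SplitResult {
  head: string; // content that stays in the current message
  tail: string; // content carried over to the next message
}

const FENCE = "```";

// Returns the language tag of the fence left open at the end of `text`,
// "" for an open fence without a language, or null if no fence is open.
function openFenceLanguage(text: string): string | null {
  let open: string | null = null;
  for (const line of text.split("\n")) {
    const trimmed = line.trimStart();
    if (!trimmed.startsWith(FENCE)) continue;
    // Each fence line toggles state: opening records the language, closing clears it.
    open = open === null ? trimmed.slice(FENCE.length).trim() : null;
  }
  return open;
}

function splitWithFenceContinuity(text: string, maxLength: number): SplitResult {
  if (text.length <= maxLength) return { head: text, tail: "" };
  // Reserve room for the "\n```" that may have to be appended to the head.
  const cut = maxLength - (FENCE.length + 1);
  let head = text.slice(0, cut);
  let tail = text.slice(cut);
  const lang = openFenceLanguage(head);
  if (lang !== null) {
    head = `${head}\n${FENCE}`; // close the block on message A
    tail = `${FENCE}${lang}\n${tail}`; // reopen it with the same language on message B
  }
  return { head, tail };
}
```

Reserving the four characters up front keeps the head within `maxLength` even after the closing fence is appended.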
How the rate limiter works
Discord allows approximately 5 message edits per 5 seconds per message. A streaming LLM can produce 20–100 tokens per second, far more than that budget, so editing the message once per token is not feasible.
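The arithmetic behind that claim can be worked through in a few lines. This sketch takes the budget of roughly 5 edits per 5 seconds from the text as an assumption; the function names are illustrative only.

```typescript
// Assumed budget from the text: roughly 5 edits per 5 seconds.
const EDIT_BUDGET_PER_SECOND = 5 / 5;

// Edits issued per second when flushing once every `editIntervalMs`.
function editsPerSecond(editIntervalMs: number): number {
  return 1000 / editIntervalMs;
}

// Average number of tokens batched into a single edit.
function tokensPerEdit(tokensPerSecond: number, editIntervalMs: number): number {
  return tokensPerSecond * (editIntervalMs / 1000);
}

// Whether a given flush interval stays within the assumed budget.
function fitsBudget(editIntervalMs: number): boolean {
  return editsPerSecond(editIntervalMs) <= EDIT_BUDGET_PER_SECOND;
}
```

At 50 tokens per second, editing per token corresponds to a 20 ms interval, which `fitsBudget` rejects, while a 1500 ms interval passes and batches about 75 tokens into each edit.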
DiscordLLMStreamer solves this by buffering tokens and flushing them on a timer:
LLM stream ──tokens──▶ pendingBuffer ──every 1500ms──▶ message.edit()
                            │
                    (if length ≥ 1950)
                            │
                            ▼
                       paginate() ──▶ followUp() / channel.send()
- Token ingestion: every yielded string is appended to an in-memory pendingBuffer. No Discord API calls happen here.
- Interval flush: every editIntervalMs the buffer is drained and a single message.edit() is made. 1 500 ms ≈ 0.67 edits/s, well within Discord's limits.
- Pagination trigger: if currentContent.length + pendingBuffer.length ≥ maxLength, pagination runs immediately (bypassing the next interval tick).
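The ingestion and interval-flush steps above can be sketched as follows. This is an assumption-laden illustration, not the library's source: `streamWithIntervalFlush` is a hypothetical name, the `edit` callback stands in for `message.edit()`, and pagination is left out to keep the sketch short.

```typescript
// Stand-in for message.edit(); receives the full message content so far.
type Edit = (content: string) => Promise<void>;

async function streamWithIntervalFlush(
  source: AsyncIterable<string>,
  edit: Edit,
  editIntervalMs = 1500,
): Promise<string> {
  let currentContent = "";
  let pendingBuffer = "";
  // Edits are chained so that two flushes never overlap.
  let inFlight: Promise<void> = Promise.resolve();

  const flush = (): Promise<void> => {
    inFlight = inFlight.then(async () => {
      if (pendingBuffer.length === 0) return; // nothing new to send
      currentContent += pendingBuffer;
      pendingBuffer = "";
      await edit(currentContent);
    });
    return inFlight;
  };

  // Timer-driven flushes swallow errors here; a failed edit leaves the chain
  // rejected, so the final flush below rethrows it to the caller.
  const timer = setInterval(() => {
    flush().catch(() => {});
  }, editIntervalMs);

  try {
    for await (const token of source) {
      pendingBuffer += token; // ingestion only: no API call per token
    }
  } finally {
    clearInterval(timer); // always release the timer, even if the stream throws
  }

  await flush(); // final edit with whatever is still buffered
  return currentContent;
}
```

Chaining flushes on a single promise is one way to keep a slow edit from racing the next tick; a real implementation might instead skip ticks while an edit is in flight.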
Building from source
git clone https://github.com/kyyn/llm-stream
cd llm-stream
npm install
npm run build # outputs to dist/
npm run typecheck # tsc --noEmit
npm test # jest
License
MIT © kyyn
