@kyyn/llm-stream

v1.0.0

Published 18 days ago

Smart buffer and pagination engine for piping LLM streams (OpenAI, Anthropic, LangChain…) into Discord messages. Handles rate limits, the 2000-char cap, markdown continuity, and graceful error recovery.

Vulnerabilities: 0 high, 0 medium, 0 low


Keywords: discord, discord.js, llm, stream, openai, anthropic, langchain, buffer, pagination, bot

@kyyn/llm-stream

Smart buffer and pagination engine for streaming LLM responses into Discord messages.

Pipe any async LLM stream (OpenAI, Anthropic, LangChain, local models) directly into a Discord message. The library deals with Discord's edit rate limit, the 2000-character message cap and markdown broken across messages, so you don't have to write that boilerplate yourself.

Problems it solves

| Discord constraint | How @kyyn/llm-stream handles it |
|---|---|
| ~5 edits per 5 s rate limit | Interval-based flush (default 1500 ms); never edits per token |
| 2000-character message cap | Auto-paginates, continuing in a new message via `followUp()` or `channel.send()` |
| Broken markdown on split | Closes an open code fence at the end of message A and reopens it with the same language tag at the start of message B |
| Provider lock-in | Accepts any `AsyncIterable<string>` or Node.js `ReadableStream` |
| Mid-stream message deletion | Catches `DiscordAPIError` code 10008 (Unknown Message) and stops gracefully instead of crashing |
| Timer leaks | All `setInterval` handles are cleared in a `finally` block |
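As a rough illustration of the deletion row, here is a sketch of how a streamer could tell a deleted message apart from other failures. It is not taken from @kyyn/llm-stream: `isUnknownMessageError` and `safeEdit` are hypothetical helpers, and the check assumes only that the thrown error exposes Discord's numeric error code on a `code` property, as discord.js's `DiscordAPIError` does.

```typescript
// Discord JSON error code 10008: "Unknown Message" (e.g. the message was deleted).
const UNKNOWN_MESSAGE = 10008;

// Structural check, so this sketch does not need to import discord.js:
// DiscordAPIError carries Discord's numeric error code on `.code`.
function isUnknownMessageError(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    'code' in err &&
    (err as { code: unknown }).code === UNKNOWN_MESSAGE
  );
}

// Hypothetical wrapper around one edit call. Returns true if streaming can
// continue, false if the message is gone; any other error is rethrown.
async function safeEdit(edit: () => Promise<unknown>): Promise<boolean> {
  try {
    await edit();
    return true;
  } catch (err) {
    if (isUnknownMessageError(err)) return false; // stop quietly, no crash
    throw err; // permissions, network, etc. still surface
  }
}
```

Returning `false` instead of throwing lets the caller leave its stream loop cleanly, while unexpected errors still propagate.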

Installation

npm install @kyyn/llm-stream
# discord.js is a peer dependency — install it if you haven't already
npm install discord.js

Quick start

import { Client, GatewayIntentBits } from 'discord.js';
import OpenAI from 'openai';
import { DiscordLLMStreamer } from '@kyyn/llm-stream';

const client = new Client({ intents: [GatewayIntentBits.Guilds] });
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// ── Slash command handler ─────────────────────────────────────────────────────
client.on('interactionCreate', async (interaction) => {
  if (!interaction.isChatInputCommand()) return;

  // 1. Defer so Discord gives us 15 minutes to respond
  await interaction.deferReply();

  // 2. Start your LLM call (OpenAI SDK v4 example)
  const openaiStream = await openai.chat.completions.create({
    model: 'gpt-4o',
    stream: true,
    // `true` makes discord.js throw if the option is missing, so the result is typed as string
    messages: [{ role: 'user', content: interaction.options.getString('prompt', true) }],
  });

  // 3. Wrap the SDK stream in a plain AsyncGenerator<string>
  const textStream = (async function* () {
    for await (const chunk of openaiStream) {
      yield chunk.choices[0]?.delta?.content ?? '';
    }
  })();

  // 4. Hand it to the streamer; that's it
  const streamer = new DiscordLLMStreamer(interaction, {
    editIntervalMs: 1500, // optional, this is the default
    maxLength: 1950,      // optional, this is the default
  });

  await streamer.stream(textStream);
});

client.login(process.env.DISCORD_TOKEN);

API

`new DiscordLLMStreamer(target, options?)`

| Parameter | Type | Description |
|---|---|---|
| `target` | `CommandInteraction \| Message` | A deferred interaction, or any message to reply to |
| `options.editIntervalMs` | `number` | Milliseconds between Discord edits. Default: `1500` |
| `options.maxLength` | `number` | Characters before a new message is spawned. Default: `1950` |
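The documented defaults can be expressed as a small option resolver. `resolveOptions` and `StreamerOptions` are hypothetical names, and the range checks are assumptions rather than documented behaviour. The sketch mainly shows that the default `maxLength` of 1950 leaves 50 characters of headroom under Discord's 2000-character cap, presumably room for the closing fence appended when a code block is split.

```typescript
// Hypothetical option shape mirroring the documented constructor options.
interface StreamerOptions {
  editIntervalMs?: number;
  maxLength?: number;
}

// Discord's limit on message content length, in characters.
const DISCORD_MAX_CHARS = 2000;

// Apply the documented defaults and reject values Discord could never accept.
// The range checks are assumptions, not documented library behaviour.
function resolveOptions(opts: StreamerOptions = {}): Required<StreamerOptions> {
  const editIntervalMs = opts.editIntervalMs ?? 1500;
  const maxLength = opts.maxLength ?? 1950;

  if (!Number.isFinite(editIntervalMs) || editIntervalMs <= 0) {
    throw new RangeError('editIntervalMs must be a positive number of milliseconds');
  }
  if (!Number.isInteger(maxLength) || maxLength < 1 || maxLength > DISCORD_MAX_CHARS) {
    throw new RangeError(`maxLength must be an integer between 1 and ${DISCORD_MAX_CHARS}`);
  }
  return { editIntervalMs, maxLength };
}
```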

`streamer.stream(source)`

| Parameter | Type | Description |
|---|---|---|
| `source` | `AsyncIterable<string> \| NodeJS.ReadableStream` | Any stream of text tokens |

Returns a `Promise<void>` that resolves once the stream has been fully consumed and the final edit has been made.
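Because `stream()` accepts both `AsyncIterable<string>` and `NodeJS.ReadableStream`, raw chunks have to be turned into text somewhere. A Node `Readable` without an encoding yields `Buffer` chunks, and a multi-byte UTF-8 character can be split across two of them. The sketch below uses a hypothetical `toTextChunks` helper (not the library's code) that handles this with a streaming `TextDecoder`.

```typescript
// Hypothetical normaliser: accepts either documented source type (both are
// async-iterable) and yields plain strings. Readables without an encoding
// emit Buffer chunks (Buffer is a Uint8Array), which are decoded here.
async function* toTextChunks(
  source: AsyncIterable<string | Uint8Array>,
): AsyncGenerator<string> {
  // `stream: true` makes the decoder hold back an incomplete multi-byte
  // sequence until the next chunk instead of emitting U+FFFD.
  const decoder = new TextDecoder('utf-8');
  for await (const chunk of source) {
    const text = typeof chunk === 'string' ? chunk : decoder.decode(chunk, { stream: true });
    if (text.length > 0) yield text; // skip empty deltas
  }
  const tail = decoder.decode(); // flush any bytes still held back
  if (tail.length > 0) yield tail;
}
```

String sources pass through unchanged; byte sources such as file streams are decoded without corrupting characters that straddle chunk boundaries.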

Provider examples

OpenAI

const raw = await openai.chat.completions.create({ model: 'gpt-4o', stream: true, messages });

await streamer.stream(
  (async function* () {
    for await (const chunk of raw) {
      yield chunk.choices[0]?.delta?.content ?? '';
    }
  })(),
);

Anthropic

const raw = await anthropic.messages.create({ model: 'claude-3-5-sonnet-latest', stream: true, ... });

await streamer.stream(
  (async function* () {
    for await (const event of raw) {
      if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
        yield event.delta.text;
      }
    }
  })(),
);

LangChain

const chain = prompt.pipe(llm);

await streamer.stream(
  (async function* () {
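    for await (const chunk of await chain.stream({ question })) {
      // Chat-model chunks can carry non-text content parts; forward only strings
      if (typeof chunk.content === 'string') yield chunk.content;
    }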
  })(),
);

Node.js ReadableStream

import { Readable } from 'node:stream';

const readable = Readable.from(['Hello', ' ', 'world']);
await streamer.stream(readable);
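The OpenAI, Anthropic and LangChain adapters above share one shape: iterate the SDK stream, extract the text from each event, ignore everything else. A hypothetical generic helper, `pickText`, makes that explicit. The `OpenAIChunk` and `AnthropicEvent` types are minimal local stand-ins for the SDK types (an assumption, covering only the fields read here), not the SDKs' own exports.

```typescript
// Hypothetical helper: turn any SDK stream into the AsyncIterable<string>
// that streamer.stream() accepts, given a function that extracts the text
// (if any) from one SDK event.
async function* pickText<T>(
  source: AsyncIterable<T>,
  pick: (event: T) => string | null | undefined,
): AsyncGenerator<string> {
  for await (const event of source) {
    const text = pick(event);
    if (text) yield text; // drop null, undefined and '' (role-only or tool deltas)
  }
}

// Minimal local stand-ins for the SDK event types, covering only the fields read below.
type OpenAIChunk = { choices: { delta?: { content?: string | null } }[] };
type AnthropicEvent = { type: string; delta?: { type?: string; text?: string } };

// Extractors matching the OpenAI and Anthropic examples above.
const openaiText = (c: OpenAIChunk) => c.choices[0]?.delta?.content;
const anthropicText = (e: AnthropicEvent) =>
  e.type === 'content_block_delta' && e.delta?.type === 'text_delta' ? e.delta.text : undefined;
```

With it, the OpenAI example could be written as `await streamer.stream(pickText(raw, openaiText));`. Skipping empty strings is a choice, not a requirement, since appending `''` to a buffer is harmless.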

Markdown continuity

When a response contains a code block that straddles the 1950-character boundary (the default `maxLength`), the library automatically preserves formatting:

Message A (finalised at the split):

Here is the implementation:

```javascript
function greet(name) {
  console.log(`Hello, ${name
```

Message B (continuation):

```javascript
}!`);
}
```

The closing ``` is appended to message A, and the opening ```javascript is prepended to message B. Users see seamlessly formatted code across both messages.
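The fence-balancing step can be sketched in a few lines. `splitWithFences` is a hypothetical helper, not the library's implementation, and it assumes that a fence is any line whose trimmed text starts with three backticks, with the rest of that line taken as the language tag. Given the text kept in message A and the overflow for message B, it closes and reopens the fence only when A ends inside a code block.

```typescript
// Three backticks, the Markdown code-fence marker.
const FENCE = '```';

// Hypothetical helper: `head` is the text kept in message A, `tail` the text
// that spills into message B. Returns both, with an open fence balanced.
function splitWithFences(head: string, tail: string): [string, string] {
  let openLang: string | null = null; // null means "not inside a code block"
  for (const line of head.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith(FENCE)) continue;
    // An opening fence records its language tag; the next fence closes it.
    openLang = openLang === null ? trimmed.slice(FENCE.length).trim() : null;
  }
  if (openLang === null) return [head, tail]; // split fell outside any code block
  return [`${head}\n${FENCE}`, `${FENCE}${openLang}\n${tail}`];
}
```

Applied to the example above, message A gains a trailing fence and message B starts with the javascript fence. The closing fence adds four characters (a newline plus three backticks), which fits in the headroom between the default `maxLength` and Discord's 2000-character cap.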

How the rate limiter works

Discord allows approximately 5 message edits per 5 seconds per message. Streaming an LLM can produce 20–100 tokens per second, making per-token edits impossible.

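DiscordLLMStreamer solves this by decoupling token ingestion from Discord API calls: tokens are buffered in memory and flushed to Discord on a fixed interval.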

LLM stream  ──tokens──▶  pendingBuffer  ──every 1500ms──▶  message.edit()
                               │
                    (if length ≥ 1950)
                               │
                               ▼
                         paginate()  ──▶  followUp() / channel.send()

1. Token ingestion: every yielded string is appended to an in-memory `pendingBuffer`. No Discord API calls happen here.
2. Interval flush: every `editIntervalMs` the buffer is drained and a single `message.edit()` is made. At the default 1500 ms that is about 0.67 edits/s, below the roughly 1 edit/s that 5 edits per 5 s allows.
3. Pagination trigger: if `currentContent.length + pendingBuffer.length ≥ maxLength`, pagination runs immediately instead of waiting for the next interval tick.
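The three steps above can be sketched as a small class. This is not the library's source: `MessageSink`, `BufferedStreamer` and their methods are hypothetical, the Discord side is abstracted behind two callbacks (in discord.js these would wrap `editReply()` or `message.edit()`, and `followUp()` or `channel.send()`), and pages are cut at a fixed character count without the code-fence handling described under Markdown continuity.

```typescript
// Hypothetical stand-in for the Discord side (not discord.js types):
// `edit` replaces the content of the latest message, `send` posts a new
// message that becomes the latest (followUp() / channel.send() in practice).
interface MessageSink {
  edit(content: string): Promise<void>;
  send(content: string): Promise<void>;
}

class BufferedStreamer {
  private current = ''; // content of the latest Discord message
  private pending = ''; // tokens received since the last flush
  private readonly sink: MessageSink;
  private readonly editIntervalMs: number;
  private readonly maxLength: number;

  constructor(sink: MessageSink, editIntervalMs = 1500, maxLength = 1950) {
    this.sink = sink;
    this.editIntervalMs = editIntervalMs;
    this.maxLength = maxLength;
  }

  async stream(source: AsyncIterable<string>): Promise<void> {
    // Flushes are chained so a timer tick never overlaps an explicit flush.
    let chain: Promise<void> = Promise.resolve();
    const flush = (): Promise<void> => (chain = chain.then(() => this.drain()));
    // 2. Interval flush. A failed timer flush leaves `chain` rejected, so the
    // error resurfaces at the next awaited flush; the catch only prevents an
    // unhandled rejection.
    const timer = setInterval(() => {
      flush().catch(() => {});
    }, this.editIntervalMs);
    try {
      for await (const token of source) {
        this.pending += token; // 1. ingestion: no API call here
        if (this.current.length + this.pending.length >= this.maxLength) {
          await flush(); // 3. pagination trigger: don't wait for the timer
        }
      }
      await flush(); // final edit with whatever is left
    } finally {
      clearInterval(timer); // never leak the interval, even on error
    }
  }

  private async drain(): Promise<void> {
    if (this.pending.length === 0) return; // nothing new: skip the API call
    let text = this.current + this.pending;
    this.pending = ''; // tokens arriving during the awaits below wait for the next flush
    let newMessage = false;
    for (;;) {
      const page = text.slice(0, this.maxLength);
      text = text.slice(this.maxLength);
      if (newMessage) await this.sink.send(page);
      else if (page !== this.current) await this.sink.edit(page);
      this.current = page;
      if (text.length === 0) return;
      newMessage = true; // the remainder overflows into a fresh message
    }
  }
}
```

Chaining every flush through one promise keeps edits strictly ordered, which matters because a timer tick and a pagination trigger can otherwise fire while a previous edit is still in flight.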

Building from source

git clone https://github.com/kyyn/llm-stream
cd llm-stream
npm install
npm run build      # outputs to dist/
npm run typecheck  # tsc --noEmit
npm test           # jest
