glim-llm
Lightweight multi-provider LLM wrapper for Node/Edge (ESM + TypeScript).
Features:
- Initial support for OpenAI, Groq (OpenAI-compatible), and Gemini (Google)
- Unified generate API
- Optional caching (LRU in-memory)
- Rate limiting (concurrency + window request limiting)
- Automatic retries with exponential backoff (configurable / disable)
- Prompt sanitization helper (pluggable)
- Simple streaming interface placeholder (upgrade later to real streaming)
- ESM + CommonJS builds, type declarations
- Zero heavy dependencies; only focused utilities
Install
```bash
npm install glim-llm
# plus the provider SDKs you want (peer deps). Example:
npm install openai @google/generative-ai
```
Quick Start
```ts
import { createLLMClient, SUPPORTED_PROVIDERS } from 'glim-llm';

const openaiClient = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  rateLimit: { concurrency: 2, requestsPerInterval: 60, intervalMs: 60_000 },
  cache: { ttlMs: 300_000, max: 1000 },
  retry: { retries: 3 },
  sanitize: true,
});

const result = await openaiClient.generate({ prompt: 'Explain edge computing in 2 sentences.' });
console.log(result.text);
console.log('Providers available:', SUPPORTED_PROVIDERS);
```
API
createLLMClient(options)
Returns an object with:
- name: the provider name
- generate(params): returns a Promise resolving to the generation result
- stream(params): returns an AsyncGenerator (currently one-shot; future: true streaming)
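In TypeScript terms the returned client looks roughly like the sketch below; the type names (GenerateParams, GenerateResult, LLMClient) are illustrative and are not names exported by the package.

```ts
// Illustrative shapes only; the package's actual exported types may differ.
type GenerateParams = { prompt: string; systemPrompt?: string; [extra: string]: unknown };

interface GenerateResult {
  text: string; // generated completion text, as used in the Quick Start
}

interface LLMClient {
  name: string;                                             // provider name, e.g. 'openai'
  generate(params: GenerateParams): Promise<GenerateResult>;
  stream(params: GenerateParams): AsyncGenerator<string>;   // currently one-shot
}
```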
Options
- provider: 'openai' | 'groq' | 'gemini'
- config: { apiKey, model, maxOutputTokens?, temperature?, extra? }
- rateLimit (optional): { concurrency?, requestsPerInterval?, intervalMs?, throwOnLimit? }
- cache (optional | false): { ttlMs?, max?, namespace? }
- retry (optional | false): { retries?, factor?, minTimeoutMs?, maxTimeoutMs? }
- sanitize (boolean | function): enables basic prompt cleaning.
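For example, a client with every option group spelled out; the values and the Groq model name are illustrative placeholders, not recommendations.

```ts
import { createLLMClient } from 'glim-llm';

// All option groups filled in; numbers are examples, not tuned defaults.
const client = createLLMClient({
  provider: 'groq',
  config: {
    apiKey: process.env.GROQ_KEY!,
    model: 'llama-3.1-8b-instant',   // any model your provider account supports
    maxOutputTokens: 512,
    temperature: 0.7,
  },
  rateLimit: { concurrency: 4, requestsPerInterval: 30, intervalMs: 60_000, throwOnLimit: false },
  cache: { ttlMs: 120_000, max: 500, namespace: 'groq' },
  retry: { retries: 2, factor: 2, minTimeoutMs: 500, maxTimeoutMs: 8_000 },
  sanitize: true,
});
```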
Generate Params
- prompt (string)
- systemPrompt?
- model? (per-call override)
- temperature?
- maxOutputTokens?
- streaming? (future use)
- signal? (AbortSignal; future wiring)
- cacheKey? (custom key, or false to bypass the cache)
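A call exercising most of these params might look like this, reusing openaiClient from the Quick Start; streaming and signal are future wiring, so they are omitted.

```ts
const res = await openaiClient.generate({
  prompt: 'Summarize the trade-offs of LRU caching.',
  systemPrompt: 'You are a terse technical assistant.',
  model: 'gpt-4o',             // per-call override of the configured model
  temperature: 0.2,
  maxOutputTokens: 200,
  cacheKey: false,             // bypass the response cache for this call
});
console.log(res.text);
```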
Caching
Caching uses an in-memory LRU and suits a single runtime instance. An external cache (Redis, KV) can be layered in by replacing the ResponseCache logic (PRs welcome).
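Per-call cache behavior is controlled through the cacheKey param described under Generate Params, for example (client is the fully configured client from the Options example):

```ts
// Pin a key so semantically identical requests share one cache entry,
// or skip the cache entirely with cacheKey: false.
const cached = await client.generate({ prompt: 'Define CDN.', cacheKey: 'define-cdn-v1' });
const fresh  = await client.generate({ prompt: 'Define CDN.', cacheKey: false });
```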
Rate Limiting
Two layers: concurrency (maximum parallel requests) and requestsPerInterval within a sliding window of intervalMs. Set throwOnLimit: true to throw instead of waiting when the limit is hit.
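With throwOnLimit: true, a burst over the window limit rejects instead of queueing; the exact error shape is not documented here, so the sketch below just logs whatever is thrown.

```ts
const limited = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  rateLimit: { concurrency: 1, requestsPerInterval: 5, intervalMs: 10_000, throwOnLimit: true },
});

try {
  // Ten calls against a 5-per-10s window: the overflow rejects instead of waiting.
  await Promise.all(
    Array.from({ length: 10 }, (_, i) => limited.generate({ prompt: `Question #${i}` })),
  );
} catch (err) {
  console.warn('Rate limit exceeded:', err); // with throwOnLimit: false these calls would queue instead
}
```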
Retries
Uses exponential backoff. Disable with retry: false.
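Backoff is tuned through the retry options listed under Options, or switched off entirely; the timings below are arbitrary examples.

```ts
// Tuned backoff: up to 5 retries, doubling the delay from 250 ms, capped at 10 s.
const resilient = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  retry: { retries: 5, factor: 2, minTimeoutMs: 250, maxTimeoutMs: 10_000 },
});

// No retries at all: failures surface immediately.
const fragile = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  retry: false,
});
```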
Sanitization
Naive removal of control characters and common prompt-injection phrases; override it by passing a custom function.
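sanitize also accepts a function; the (prompt: string) => string signature below is an assumption based on the pluggable note, so check the type declarations before relying on it.

```ts
const strict = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  // Assumed signature: receives the raw prompt, returns the cleaned prompt.
  sanitize: (prompt: string) =>
    prompt
      .replace(/[\u0000-\u001f]/g, ' ')                       // strip control characters
      .replace(/ignore (all )?previous instructions/gi, ''),  // drop one common injection phrase
});
```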
Edge / Serverless
All dependencies are ESM-friendly. Avoid Node-specific APIs when targeting strict edge runtimes (replace the Node crypto hash with Web Crypto; TODO: auto-detect this later).
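If you want to pre-compute your own cacheKey on a strict edge runtime, Web Crypto's SHA-256 works there as well as in Node 18+; this helper is not part of glim-llm.

```ts
// Hash a prompt into a hex cache key using Web Crypto (global in Node 18+, workers, and edge runtimes).
async function promptCacheKey(prompt: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(prompt));
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}

const key = await promptCacheKey('Explain edge computing in 2 sentences.');
const out = await openaiClient.generate({ prompt: 'Explain edge computing in 2 sentences.', cacheKey: key });
```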
Roadmap
- True streaming per provider
- Tool / function calling abstraction
- Token usage normalization
- Pluggable logging hooks
- Web Crypto fallback
- Middleware pipeline
License
MIT
