glim-llm
Lightweight multi-provider LLM wrapper for Node/Edge (ESM + TypeScript).
Features:
- Initial support for OpenAI, Groq (OpenAI-compatible), and Gemini (Google)
- Unified generate API
- Optional caching (LRU in-memory)
- Rate limiting (concurrency + window request limiting)
- Automatic retries with exponential backoff (configurable / disable)
- Prompt sanitization helper (pluggable)
- Simple streaming interface placeholder (upgrade later to real streaming)
- ESM + CommonJS builds, type declarations
- Zero heavy dependencies; only focused utilities
Install
```bash
npm install glim-llm
# plus the provider SDKs you want (peer deps). Example:
npm install openai @google/generative-ai
```
Quick Start
```ts
import { createLLMClient, SUPPORTED_PROVIDERS } from 'glim-llm';

const openaiClient = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  rateLimit: { concurrency: 2, requestsPerInterval: 60, intervalMs: 60_000 },
  cache: { ttlMs: 300_000, max: 1000 },
  retry: { retries: 3 },
  sanitize: true,
});

const result = await openaiClient.generate({ prompt: 'Explain edge computing in 2 sentences.' });
console.log(result.text);
console.log('Providers available:', SUPPORTED_PROVIDERS);
```
API
createLLMClient(options)
Returns an object with:
- name: the provider name
- generate(params): returns a Promise resolving to the generation result
- stream(params): returns an AsyncGenerator (currently one-shot; future: true streaming)
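In TypeScript terms the returned client looks roughly like the sketch below; the type names (GenerateParams, GenerateResult, LLMClient) are illustrative and are not names exported by the package.

```ts
// Illustrative shapes only; the package's actual exported types may differ.
type GenerateParams = { prompt: string; systemPrompt?: string; [extra: string]: unknown };

interface GenerateResult {
  text: string; // generated completion text, as used in the Quick Start
}

interface LLMClient {
  name: string;                                             // provider name, e.g. 'openai'
  generate(params: GenerateParams): Promise<GenerateResult>;
  stream(params: GenerateParams): AsyncGenerator<string>;   // currently one-shot
}
```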
Options
- provider: 'openai' | 'groq' | 'gemini'
- config: { apiKey, model, maxOutputTokens?, temperature?, extra? }
- rateLimit (optional): { concurrency?, requestsPerInterval?, intervalMs?, throwOnLimit? }
- cache (optional | false): { ttlMs?, max?, namespace? }
- retry (optional | false): { retries?, factor?, minTimeoutMs?, maxTimeoutMs? }
- sanitize (boolean | function): enables basic prompt cleaning.
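For example, a client with every option group spelled out; the values and the Groq model name are illustrative placeholders, not recommendations.

```ts
import { createLLMClient } from 'glim-llm';

// All option groups filled in; numbers are examples, not tuned defaults.
const client = createLLMClient({
  provider: 'groq',
  config: {
    apiKey: process.env.GROQ_KEY!,
    model: 'llama-3.1-8b-instant',   // any model your provider account supports
    maxOutputTokens: 512,
    temperature: 0.7,
  },
  rateLimit: { concurrency: 4, requestsPerInterval: 30, intervalMs: 60_000, throwOnLimit: false },
  cache: { ttlMs: 120_000, max: 500, namespace: 'groq' },
  retry: { retries: 2, factor: 2, minTimeoutMs: 500, maxTimeoutMs: 8_000 },
  sanitize: true,
});
```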
Generate Params
- prompt (string)
- systemPrompt?
- model? (per-call override)
- temperature?
- maxOutputTokens?
- streaming? (future use)
- signal? (AbortSignal; future wiring)
- cacheKey? (custom key, or false to bypass the cache)
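A call exercising most of these params might look like this, reusing openaiClient from the Quick Start; streaming and signal are future wiring, so they are omitted.

```ts
const res = await openaiClient.generate({
  prompt: 'Summarize the trade-offs of LRU caching.',
  systemPrompt: 'You are a terse technical assistant.',
  model: 'gpt-4o',             // per-call override of the configured model
  temperature: 0.2,
  maxOutputTokens: 200,
  cacheKey: false,             // bypass the response cache for this call
});
console.log(res.text);
```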
Caching
Caching uses an in-memory LRU and suits a single runtime instance. An external cache (Redis, KV) can be layered in by replacing the ResponseCache logic (PRs welcome).
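Per-call cache behavior is controlled through the cacheKey param described under Generate Params, for example (client is the fully configured client from the Options example):

```ts
// Pin a key so semantically identical requests share one cache entry,
// or skip the cache entirely with cacheKey: false.
const cached = await client.generate({ prompt: 'Define CDN.', cacheKey: 'define-cdn-v1' });
const fresh  = await client.generate({ prompt: 'Define CDN.', cacheKey: false });
```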
Rate Limiting
Two layers: concurrency (maximum parallel requests) and requestsPerInterval within a sliding window of intervalMs. Set throwOnLimit: true to throw instead of waiting when the limit is hit.
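With throwOnLimit: true, a burst over the window limit rejects instead of queueing; the exact error shape is not documented here, so the sketch below just logs whatever is thrown.

```ts
const limited = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  rateLimit: { concurrency: 1, requestsPerInterval: 5, intervalMs: 10_000, throwOnLimit: true },
});

try {
  // Ten calls against a 5-per-10s window: the overflow rejects instead of waiting.
  await Promise.all(
    Array.from({ length: 10 }, (_, i) => limited.generate({ prompt: `Question #${i}` })),
  );
} catch (err) {
  console.warn('Rate limit exceeded:', err); // with throwOnLimit: false these calls would queue instead
}
```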
Retries
Uses exponential backoff. Disable with retry: false.
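Backoff is tuned through the retry options listed under Options, or switched off entirely; the timings below are arbitrary examples.

```ts
// Tuned backoff: up to 5 retries, doubling the delay from 250 ms, capped at 10 s.
const resilient = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  retry: { retries: 5, factor: 2, minTimeoutMs: 250, maxTimeoutMs: 10_000 },
});

// No retries at all: failures surface immediately.
const fragile = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  retry: false,
});
```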
Sanitization
Naive removal of control characters and common prompt-injection phrases; override it by passing a custom function.
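sanitize also accepts a function; the (prompt: string) => string signature below is an assumption based on the pluggable note, so check the type declarations before relying on it.

```ts
const strict = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  // Assumed signature: receives the raw prompt, returns the cleaned prompt.
  sanitize: (prompt: string) =>
    prompt
      .replace(/[\u0000-\u001f]/g, ' ')                       // strip control characters
      .replace(/ignore (all )?previous instructions/gi, ''),  // drop one common injection phrase
});
```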
Edge / Serverless
All dependencies are ESM-friendly. Avoid Node-specific APIs when targeting strict edge runtimes (replace the Node crypto hash with Web Crypto; TODO: auto-detect this later).
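If you want to pre-compute your own cacheKey on a strict edge runtime, Web Crypto's SHA-256 works there as well as in Node 18+; this helper is not part of glim-llm.

```ts
// Hash a prompt into a hex cache key using Web Crypto (global in Node 18+, workers, and edge runtimes).
async function promptCacheKey(prompt: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(prompt));
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}

const key = await promptCacheKey('Explain edge computing in 2 sentences.');
const out = await openaiClient.generate({ prompt: 'Explain edge computing in 2 sentences.', cacheKey: key });
```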
Roadmap
- True streaming per provider
- Tool / function calling abstraction
- Token usage normalization
- Pluggable logging hooks
- Web Crypto fallback
- Middleware pipeline
License
MIT
