LLM SDK
A flexible LLM SDK with streaming, context management, multi-provider support, and a choice between an integrated or a custom flow.
Installation
```bash
npm install @vishnu-vashishth/llm-sdk zod
```
Two Approaches
1. Integrated Flow (ChatEngine)
A full-featured engine that coordinates models, providers, context, and message storage for you:
```ts
import {
ChatEngine,
ModelRegistry,
ProviderRegistry,
ContextManager,
TokenizerService,
ProviderType,
} from '@vishnu-vashishth/llm-sdk';
// Setup
const models = new ModelRegistry();
models.registerModel({
modelName: 'gpt-4',
providerType: ProviderType.OPENAI,
contextWindow: 8192,
providerConfig: { apiKey: process.env.OPENAI_API_KEY },
defaultSettings: { temperature: 0.7 },
});
const engine = new ChatEngine({
models,
providers: new ProviderRegistry(),
context: new ContextManager(TokenizerService.getInstance()),
messages: yourMessageStore, // your MessageStore implementation (see Implementing Stores below)
defaultModel: 'gpt-4',
});
// Use
const response = await engine.chat({
conversationId: 'user-123',
content: 'Hello!',
});
// Stream
for await (const delta of engine.stream({ conversationId: 'user-123', content: 'Hi' })) {
process.stdout.write(delta.delta.content ?? '');
}
```
2. Custom Flow (Individual Components)
Build your own pipeline:
```ts
import {
OpenAIProvider,
ContextManager,
TokenizerService,
ModelRegistry,
TruncationStrategy,
} from '@vishnu-vashishth/llm-sdk';
// 1. Get messages from your database
const messages = await db.getMessages(userId);
// 2. Truncate to fit context window
const ctx = new ContextManager(TokenizerService.getInstance());
const { messages: truncated } = await ctx.truncateMessagesWithMetadata(messages, {
maxTokens: 8192,
bufferTokens: 100,
completionTokens: 1000,
truncationStrategy: TruncationStrategy.SLIDING_WINDOW,
});
// 3. Send to provider
const provider = new OpenAIProvider({ apiKey: process.env.OPENAI_API_KEY });
const response = await provider.generateChat({
model: 'gpt-4',
messages: truncated,
temperature: 0.7,
});
// 4. Stream tokens
for await (const delta of provider.generateChatStream({ model: 'gpt-4', messages: truncated })) {
process.stdout.write(delta.delta.content ?? '');
}
```
Core Components
| Component | Purpose |
|-----------|---------|
| ChatEngine | Integrated flow coordinator |
| ModelRegistry | Model configurations |
| ProviderRegistry | Provider instances |
| ContextManager | Token counting, truncation, caching |
| OpenAIProvider | OpenAI-compatible API calls |
| PromptService | Template rendering with Zod validation |
Streaming
```ts
// Async generator
for await (const delta of provider.generateChatStream(payload)) {
process.stdout.write(delta.delta.content ?? '');
}
// Callbacks
await provider.generateChatStreamWithCallbacks(payload, {
onToken: (token) => process.stdout.write(token),
onComplete: (response) => console.log('Done'),
});
```
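If you also need the complete reply after streaming, you can accumulate the deltas yourself. A minimal sketch built on the async-generator API above; `db` and `userId` are placeholders for your own storage, as in the custom-flow example:
```ts
import { OpenAIProvider } from '@vishnu-vashishth/llm-sdk';

const provider = new OpenAIProvider({ apiKey: process.env.OPENAI_API_KEY });
const messages = await db.getMessages(userId); // your own storage, as in the custom flow

// Stream to stdout while also accumulating the full reply.
let full = '';
for await (const delta of provider.generateChatStream({ model: 'gpt-4', messages })) {
  const chunk = delta.delta.content ?? '';
  full += chunk; // keep the complete text for saving or logging
  process.stdout.write(chunk);
}
// `full` now holds the complete assistant reply.
```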
Context Management
```ts
const ctx = new ContextManager(TokenizerService.getInstance(), { cacheSize: 5000 });
// Token counts are cached
const tokens = await ctx.countMessagesTokens(messages);
// Incremental session management
ctx.createSession('chat-1', { maxTokens: 8192 });
await ctx.appendToSession('chat-1', newMessages);
// Cache stats
console.log(ctx.getCacheStats()); // { hits, misses, hitRate }
```
Error Handling
```ts
import { RateLimitError, TimeoutError, isRecoverableError } from '@vishnu-vashishth/llm-sdk';
try {
await provider.generateChat(payload);
} catch (error) {
if (error instanceof RateLimitError) {
await delay(error.retryAfter); // delay() is your own sleep helper
} else if (isRecoverableError(error)) {
// Retry (a fuller retry loop is sketched below)
}
}
```
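A fuller retry loop can be built from the same exports. A minimal sketch; the `delay` helper, attempt limit, and backoff policy are our own choices here, not part of the SDK:
```ts
import { OpenAIProvider, RateLimitError, isRecoverableError } from '@vishnu-vashishth/llm-sdk';

// Simple sleep helper (not provided by the SDK).
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

type ChatPayload = Parameters<OpenAIProvider['generateChat']>[0];

// Retry a chat call a few times, backing off on recoverable errors.
async function chatWithRetry(provider: OpenAIProvider, payload: ChatPayload, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await provider.generateChat(payload);
    } catch (error) {
      if (attempt >= maxAttempts) throw error;
      if (error instanceof RateLimitError) {
        await delay(error.retryAfter); // wait as long as the provider asked for
      } else if (isRecoverableError(error)) {
        await delay(1000 * attempt); // crude linear backoff
      } else {
        throw error; // not recoverable, surface immediately
      }
    }
  }
}
```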
Implementing Stores
```ts
// MessageStore interface
interface MessageStore {
getMessages(conversationId: string, limit?: number): Promise<Message[]>;
saveMessage(conversationId: string, message: Message): Promise<void>;
}
// MemoryStore interface (optional)
interface MemoryStore {
getRelevant(query: string, limit?: number): Promise<Memory[]>;
}
```
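As an example, here is a minimal in-memory MessageStore, fine for tests and prototypes; it assumes the `Message` type and `MessageStore` interface are exported by the package (otherwise mirror the shapes above):
```ts
import type { Message, MessageStore } from '@vishnu-vashishth/llm-sdk';

// In-memory MessageStore: one array of messages per conversation.
class InMemoryMessageStore implements MessageStore {
  private conversations = new Map<string, Message[]>();

  async getMessages(conversationId: string, limit?: number): Promise<Message[]> {
    const all = this.conversations.get(conversationId) ?? [];
    return limit ? all.slice(-limit) : all; // most recent `limit` messages
  }

  async saveMessage(conversationId: string, message: Message): Promise<void> {
    const all = this.conversations.get(conversationId) ?? [];
    all.push(message);
    this.conversations.set(conversationId, all);
  }
}

// Then pass it to the engine:
// const engine = new ChatEngine({ /* ...as above... */ messages: new InMemoryMessageStore() });
```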
Documentation
License
ISC
