@epicdm/flowstate-agents-llm-client

v1.0.7

Published

25 days ago

Unified LLM API client for Epic Flow with multi-provider support

spencer.epic

llm ai anthropic openai lmstudio vercel-ai epic-flow

@epic-flow/llm-client

A unified, production-ready LLM client for Epic Flow with multi-provider support, streaming capabilities, automatic retries, and seamless integration with Agent Memory Server (AMS) and Knowledge Store.

Features

Multi-Provider Support - Work with Anthropic Claude, OpenAI GPT, and LM Studio with a unified API
Streaming Responses - Stream text generation for real-time user experiences
Automatic Retry Logic - Exponential backoff with configurable retry strategies
Cost Tracking - Automatic cost calculation for all providers
Agent Memory Server (AMS) Integration - Maintain conversation context across sessions
Knowledge Store Integration - Enhance prompts with relevant knowledge automatically
TypeScript First - Full type safety with comprehensive type definitions
Environment Agnostic - Works in Node.js and browser environments
Event Hooks - Monitor and control request lifecycle with callbacks
Conversation History - Automatic conversation management
Budget Controls - Set spending limits and token caps

Installation

yarn add @epic-flow/llm-client

Peer Dependencies

The package has optional peer dependencies for advanced features:

# Optional: For AMS integration
yarn add @epic-flow/flowstate-agents-memory-client

# Optional: For Knowledge Store integration
yarn add @epic-flow/flowstate-agents-knowledge-store

Quick Start

Basic Query

import { LLMClient } from '@epic-flow/llm-client';

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const result = await client.query({
  prompt: 'What is the capital of France?',
});

console.log(result.content); // "The capital of France is Paris."
console.log(result.cost.totalCostUSD); // 0.0012

Streaming Response

const client = new LLMClient({
  provider: 'openai',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
});

for await (const chunk of client.stream({
  prompt: 'Write a short poem about coding',
})) {
  if (chunk.type === 'text-delta') {
    process.stdout.write(chunk.textDelta);
  } else if (chunk.type === 'finish') {
    console.log('\n\nCost:', chunk.cost.totalCostUSD);
  }
}

Provider Configuration

Anthropic Claude

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  temperature: 0.7,
  maxTokens: 2000,
});

Available Models:

claude-opus-4 - Most powerful, best for complex tasks
claude-sonnet-4 - Balanced performance and cost
claude-haiku-4 - Fast and cost-effective

OpenAI GPT

const client = new LLMClient({
  provider: 'openai',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
});

Available Models:

gpt-4o - GPT-4o ("omni") multimodal model
gpt-4-turbo - High performance GPT-4
gpt-3.5-turbo - Cost-effective option

LM Studio (Local Models)

const client = new LLMClient({
  provider: 'lmstudio',
  model: 'your-local-model',
  baseUrl: 'http://localhost:1234/v1',
});

Advanced Features

Retry Configuration

Automatic retry with exponential backoff for transient failures:

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  retry: {
    maxAttempts: 3,
    initialDelayMs: 1000,
    maxDelayMs: 10000,
    backoffMultiplier: 2,
    onRetry: (attempt, error) => {
      console.log(`Retry attempt ${attempt}:`, error.message);
    },
  },
});
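
The delay schedule these options imply can be worked out by hand. The sketch below is not the client's implementation: it assumes the common formula min(initialDelayMs * backoffMultiplier^(attempt - 1), maxDelayMs) with no jitter, and `backoffDelayMs` is a hypothetical helper name. Five retries are shown only to make the cap visible.

```typescript
// Hypothetical helper; not exported by @epic-flow/llm-client.
interface BackoffOptions {
  initialDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
}

// Delay before retry number `attempt` (1-based), assuming plain
// exponential growth capped at maxDelayMs and no random jitter.
function backoffDelayMs(attempt: number, opts: BackoffOptions): number {
  const uncapped = opts.initialDelayMs * Math.pow(opts.backoffMultiplier, attempt - 1);
  return Math.min(uncapped, opts.maxDelayMs);
}

const opts: BackoffOptions = { initialDelayMs: 1000, maxDelayMs: 10000, backoffMultiplier: 2 };
const schedule = [1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt, opts));
console.log(schedule); // [ 1000, 2000, 4000, 8000, 10000 ]
```

With the values from the configuration above, the fifth delay would be 16000 ms uncapped, so maxDelayMs clamps it to 10000 ms.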

Agent Memory Server (AMS) Integration

Maintain conversation context across sessions:

import { createAMSClient } from '@epic-flow/flowstate-agents-memory-client';

const amsClient = createAMSClient({
  serverUrl: 'http://localhost:7001',
  apiKey: process.env.AMS_API_KEY,
});

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  ams: {
    client: amsClient,
    sessionId: 'user-session-123',
    namespace: 'chat',
    contextWindowMax: 10,
    autoSave: true,
    includeContextByDefault: true,
  },
});

// AMS will automatically summarize and store conversation history
const result = await client.query({
  prompt: 'What did we discuss earlier?',
  includeAMSContext: true,
  saveToAMS: true,
});

Knowledge Store Integration

Enhance prompts with relevant knowledge automatically:

import { createKnowledgeStoreClient } from '@epic-flow/flowstate-agents-knowledge-store';

const knowledgeClient = createKnowledgeStoreClient({
  serverUrl: 'http://localhost:7002',
});

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  knowledgeStore: {
    client: knowledgeClient,
    agentId: 'support-agent',
    autoEnhancePrompt: true,
    knowledgeTypes: ['policies', 'procedures'],
    maxKnowledgeItems: 5,
    minImportance: 0.7,
  },
});

// Knowledge Store will automatically retrieve and inject relevant context
const result = await client.query({
  prompt: 'What is our refund policy?',
});

Event Hooks

Monitor and control the request lifecycle:

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  onRequest: (request) => {
    console.log('Request started:', request.timestamp);
  },
  onResponse: (response) => {
    console.log('Response received:', {
      latency: response.latency,
      tokens: response.tokens,
      cost: response.cost,
    });
  },
  onError: (error) => {
    console.error('Request failed:', error);
  },
  onStreamStart: (event) => {
    console.log('Stream started:', event);
  },
  onStreamEnd: (event) => {
    console.log('Stream ended:', event);
  },
});

Budget Controls

Set spending limits and token caps:

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  budget: {
    maxCostPerRequest: 0.10, // USD
    maxTokensPerRequest: 4000,
    dailyBudget: 10.00, // USD
    onBudgetExceeded: (spent, limit) => {
      console.warn(`Budget exceeded: $${spent} / $${limit}`);
    },
  },
});
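
How the two cost limits interact is easiest to see in a small model. The following is a sketch only. It assumes the per-request limit is compared against each request's cost and the daily limit against a running total; the client's actual enforcement may differ. `BudgetTracker` and its methods are hypothetical names, and the limits are smaller than in the configuration above so that both are reached quickly.

```typescript
// Hypothetical in-memory model of the budget options; not the client's code.
interface BudgetLimits {
  maxCostPerRequest: number; // USD
  dailyBudget: number; // USD
}

type BudgetOutcome = 'ok' | 'request-limit' | 'daily-limit';

class BudgetTracker {
  private spentTodayUSD = 0;
  private readonly limits: BudgetLimits;

  constructor(limits: BudgetLimits) {
    this.limits = limits;
  }

  // Adds a request's cost to the running total and reports which limit,
  // if any, that request exceeded.
  record(costUSD: number): BudgetOutcome {
    this.spentTodayUSD += costUSD;
    if (costUSD > this.limits.maxCostPerRequest) return 'request-limit';
    if (this.spentTodayUSD > this.limits.dailyBudget) return 'daily-limit';
    return 'ok';
  }
}

const tracker = new BudgetTracker({ maxCostPerRequest: 0.1, dailyBudget: 0.25 });
const outcomes = [0.08, 0.12, 0.09].map((cost) => tracker.record(cost));
console.log(outcomes); // [ 'ok', 'request-limit', 'daily-limit' ]
```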

Conversation History Management

Access and manage conversation history:

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// First message
await client.query({ prompt: 'Hello!' });

// Second message - context is maintained
await client.query({ prompt: 'What did I just say?' });

// Get conversation history
const history = client.getHistory();
console.log(history);
// [
//   { role: 'user', content: 'Hello!' },
//   { role: 'assistant', content: 'Hello! How can I help you today?' },
//   { role: 'user', content: 'What did I just say?' },
//   { role: 'assistant', content: 'You said "Hello!"' }
// ]

// Clear history for new conversation
client.clearHistory();

Logging Configuration

Control logging behavior:

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  logging: {
    level: 'debug',
    logRequests: true,
    logResponses: true,
    logTokenUsage: true,
    redactApiKey: true,
    logger: {
      debug: (msg, ...args) => console.debug(msg, ...args),
      info: (msg, ...args) => console.info(msg, ...args),
      warn: (msg, ...args) => console.warn(msg, ...args),
      error: (msg, ...args) => console.error(msg, ...args),
    },
  },
});

API Documentation

LLMClient

Constructor

new LLMClient(config: LLMClientConfig)

Creates a new LLM client instance with the specified configuration.

Methods

`query(params: QueryParams): Promise<LLMResult>`

Execute a single query and get a complete response.

Parameters:

prompt (string, required) - The user prompt
systemPrompt (string, optional) - System instructions
temperature (number, optional) - Override default temperature (0-1)
maxTokens (number, optional) - Override default max tokens
tools (Record<string, ToolDefinition>, optional) - Available tools
toolChoice ('auto' | 'required' | 'none', optional) - Tool usage strategy
maxToolRoundtrips (number, optional) - Max tool execution rounds
includeAMSContext (boolean, optional) - Include AMS context
saveToAMS (boolean, optional) - Save to AMS after query
responseFormat ('text' | 'json', optional) - Response format
schema (ZodSchema, optional) - Validation schema for JSON responses

Returns: Promise<LLMResult>

content - The generated text
usage - Token usage statistics
cost - Cost breakdown in USD
toolCalls - Tool calls made (if any)
metadata - Request metadata (provider, model, latency, etc.)

`stream(params: QueryParams): AsyncGenerator<StreamChunk>`

Stream text generation for real-time responses.

Parameters: Same as query()

Yields: StreamChunk

Text chunks: { type: 'text-delta', textDelta: string }
Finish chunk: { type: 'finish', finishReason: string, usage: TokenUsage, cost: CostBreakdown }
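
The two chunk shapes can be consumed with a discriminated union on `type`. The sketch below runs without a provider: `fakeStream` is a hypothetical stand-in for `client.stream()`, `collectStream` is a hypothetical helper, and the finish chunk's `usage` and `cost` are reduced to the single field used here.

```typescript
// Chunk shapes as listed above, simplified to the fields this example reads.
type StreamChunk =
  | { type: 'text-delta'; textDelta: string }
  | { type: 'finish'; finishReason: string; cost: { totalCostUSD: number } };

// Stand-in for client.stream(); yields fixed chunks so the example is runnable.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { type: 'text-delta', textDelta: 'Hello, ' };
  yield { type: 'text-delta', textDelta: 'world!' };
  yield { type: 'finish', finishReason: 'stop', cost: { totalCostUSD: 0.0004 } };
}

// Concatenates text deltas and keeps the cost reported by the finish chunk.
async function collectStream(
  chunks: AsyncIterable<StreamChunk>,
): Promise<{ text: string; totalCostUSD: number | undefined }> {
  let text = '';
  let totalCostUSD: number | undefined;
  for await (const chunk of chunks) {
    if (chunk.type === 'text-delta') {
      text += chunk.textDelta;
    } else {
      totalCostUSD = chunk.cost.totalCostUSD;
    }
  }
  return { text, totalCostUSD };
}

const collected = collectStream(fakeStream());
collected.then(({ text, totalCostUSD }) => console.log(text, totalCostUSD));
```

Collecting the whole stream gives up the latency benefit of streaming; it is mainly useful in tests or when the same code path must serve both streaming and non-streaming callers.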

`getHistory(): Array<{ role: string; content: string }>`

Get the current conversation history.

`clearHistory(): void`

Clear the conversation history.

Types

LLMClientConfig

interface LLMClientConfig {
  provider: 'anthropic' | 'openai' | 'lmstudio';
  model: string;
  apiKey?: string;
  baseUrl?: string;
  environment?: 'node' | 'browser';
  proxyUrl?: string;
  temperature?: number;
  maxTokens?: number;
  retry?: RetryConfig;
  logging?: LoggingConfig;
  budget?: BudgetConfig;
  onRequest?: (request: LLMRequest) => void;
  onResponse?: (response: LLMResponse) => void;
  onError?: (error: Error) => void;
  onStreamStart?: (event: StreamStartEvent) => void;
  onStreamEnd?: (event: StreamEndEvent) => void;
  ams?: AMSConfig;
  knowledgeStore?: KnowledgeStoreConfig;
}

LLMResult

interface LLMResult {
  content: string;
  usage: TokenUsage;
  cost: CostBreakdown;
  toolCalls?: ToolCall[];
  metadata: ResultMetadata;
}

TokenUsage

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheCreationTokens: number;
  cacheReadTokens: number;
  totalTokens: number;
}

CostBreakdown

interface CostBreakdown {
  inputCostUSD: number;
  outputCostUSD: number;
  cacheCostUSD: number;
  totalCostUSD: number;
  model: string;
  currency: 'USD';
}
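
The fields of CostBreakdown can be related to TokenUsage with a little arithmetic. The sketch below is an assumption about how such a calculation might look, not the client's pricing logic: it assumes prices are quoted per million tokens, that cacheCostUSD is the sum of cache write and cache read costs, and that totalCostUSD is the sum of the other three. `ModelPricing`, `computeCost`, and the prices are hypothetical placeholders, not real provider rates.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheCreationTokens: number;
  cacheReadTokens: number;
  totalTokens: number;
}

interface CostBreakdown {
  inputCostUSD: number;
  outputCostUSD: number;
  cacheCostUSD: number;
  totalCostUSD: number;
  model: string;
  currency: 'USD';
}

// Hypothetical price table entry, in USD per million tokens.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheWritePerMTok: number;
  cacheReadPerMTok: number;
}

function computeCost(model: string, usage: TokenUsage, pricing: ModelPricing): CostBreakdown {
  // Converts a token count and a per-million-token price into USD.
  const usd = (tokens: number, pricePerMTok: number): number => (tokens * pricePerMTok) / 1e6;
  const inputCostUSD = usd(usage.inputTokens, pricing.inputPerMTok);
  const outputCostUSD = usd(usage.outputTokens, pricing.outputPerMTok);
  const cacheCostUSD =
    usd(usage.cacheCreationTokens, pricing.cacheWritePerMTok) +
    usd(usage.cacheReadTokens, pricing.cacheReadPerMTok);
  const totalCostUSD = inputCostUSD + outputCostUSD + cacheCostUSD;
  return { inputCostUSD, outputCostUSD, cacheCostUSD, totalCostUSD, model, currency: 'USD' };
}

const breakdown = computeCost(
  'example-model',
  { inputTokens: 1000, outputTokens: 500, cacheCreationTokens: 0, cacheReadTokens: 0, totalTokens: 1500 },
  { inputPerMTok: 2, outputPerMTok: 10, cacheWritePerMTok: 2.5, cacheReadPerMTok: 0.2 }, // placeholder prices
);
console.log(breakdown.totalCostUSD.toFixed(4)); // 0.0070
```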

Error Handling

The client throws LLMError instances with detailed error information:

import { LLMError, LLMErrorCode } from '@epic-flow/llm-client';

try {
  const result = await client.query({ prompt: 'Test' });
} catch (error) {
  if (error instanceof LLMError) {
    console.error('Error code:', error.code);
    console.error('Provider:', error.provider);
    console.error('Retryable:', error.retryable);
    console.error('Original error:', error.originalError);
  }
}

Error Codes:

RATE_LIMIT - Rate limit exceeded
INVALID_API_KEY - API key is invalid
MODEL_NOT_FOUND - Model doesn't exist
CONTEXT_LENGTH_EXCEEDED - Input too long
NETWORK_ERROR - Network connection failed
TIMEOUT - Request timed out
TOOL_EXECUTION_ERROR - Tool execution failed
BUDGET_EXCEEDED - Budget limit reached
UNKNOWN - Unclassified error
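
One way to act on these codes is an exhaustive switch that maps each code to a next step. This is a sketch: the string union below is a local stand-in for the exported `LLMErrorCode` (whose actual representation is not shown here), `suggestAction` is a hypothetical helper, and the grouping of codes is an assumption. In real code, prefer the `error.retryable` flag when deciding whether to retry.

```typescript
// Local stand-in for the error codes listed above.
type LLMErrorCode =
  | 'RATE_LIMIT'
  | 'INVALID_API_KEY'
  | 'MODEL_NOT_FOUND'
  | 'CONTEXT_LENGTH_EXCEEDED'
  | 'NETWORK_ERROR'
  | 'TIMEOUT'
  | 'TOOL_EXECUTION_ERROR'
  | 'BUDGET_EXCEEDED'
  | 'UNKNOWN';

type Action = 'retry-later' | 'fix-configuration' | 'shorten-input' | 'stop' | 'report';

// Assumed grouping: transient failures are retried, configuration
// problems need a code or settings change, everything else is surfaced.
function suggestAction(code: LLMErrorCode): Action {
  switch (code) {
    case 'RATE_LIMIT':
    case 'NETWORK_ERROR':
    case 'TIMEOUT':
      return 'retry-later';
    case 'INVALID_API_KEY':
    case 'MODEL_NOT_FOUND':
      return 'fix-configuration';
    case 'CONTEXT_LENGTH_EXCEEDED':
      return 'shorten-input';
    case 'BUDGET_EXCEEDED':
      return 'stop';
    case 'TOOL_EXECUTION_ERROR':
    case 'UNKNOWN':
      return 'report';
    default: {
      // Compile-time check that every code is handled above.
      const unreachable: never = code;
      return unreachable;
    }
  }
}

console.log(suggestAction('RATE_LIMIT')); // retry-later
console.log(suggestAction('CONTEXT_LENGTH_EXCEEDED')); // shorten-input
```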

Environment Detection

The client automatically detects whether it's running in Node.js or a browser environment and adjusts its behavior accordingly:

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  // environment is auto-detected, but can be overridden
  environment: 'node', // or 'browser'
});

console.log(client.environment); // 'node' or 'browser'
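
Runtime detection of this kind is usually a check for browser globals. The sketch below shows one common heuristic, the presence of `window.document`, with an explicit override taking precedence. It is an assumption about how detection could work, not the client's actual logic, and `detectEnvironment` is a hypothetical name.

```typescript
type RuntimeEnvironment = 'node' | 'browser';

// Returns the override if given; otherwise guesses from browser globals.
function detectEnvironment(override?: RuntimeEnvironment): RuntimeEnvironment {
  if (override !== undefined) return override;
  // Read `window` through globalThis so this compiles without DOM typings.
  const g = globalThis as unknown as { window?: { document?: unknown } };
  const hasDom = typeof g.window !== 'undefined' && typeof g.window.document !== 'undefined';
  return hasDom ? 'browser' : 'node';
}

console.log(detectEnvironment());
console.log(detectEnvironment('browser')); // browser
```

Heuristics like this can misfire in hybrid runtimes (for example, test runners that emulate a DOM), which is one reason an explicit `environment` option is useful.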

Cost Tracking

The client automatically tracks costs for all providers based on current pricing:

const result = await client.query({ prompt: 'Test' });

console.log('Cost Breakdown:');
console.log('  Input:  $', result.cost.inputCostUSD);
console.log('  Output: $', result.cost.outputCostUSD);
console.log('  Cache:  $', result.cost.cacheCostUSD);
console.log('  Total:  $', result.cost.totalCostUSD);
console.log('  Model: ', result.cost.model);

Best Practices

1. Use Environment Variables for API Keys

// Good
const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Bad - hardcoded API key
const insecureClient = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: 'sk-ant-...',
});

2. Configure Retry Logic for Production

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  retry: {
    maxAttempts: 3,
    initialDelayMs: 1000,
    maxDelayMs: 10000,
  },
});

3. Set Budget Limits

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  budget: {
    maxCostPerRequest: 0.50,
    dailyBudget: 100.00,
  },
});

4. Use Event Hooks for Monitoring

const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY,
  onResponse: (response) => {
    // Log to your monitoring system (`metrics` is your own client, not part of this package)
    metrics.recordLatency(response.latency);
    metrics.recordCost(response.cost);
  },
});
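
The `metrics` object in the hook above is not provided by the package. A minimal in-memory stand-in might look like the following sketch, which assumes `response.latency` is a number in milliseconds and `response.cost` has a `totalCostUSD` field like CostBreakdown. All names here are hypothetical.

```typescript
// Hypothetical in-memory recorder matching the calls made in the hook above.
const metrics = {
  latenciesMs: [] as number[],
  totalCostUSD: 0,

  recordLatency(latencyMs: number): void {
    this.latenciesMs.push(latencyMs);
  },

  recordCost(cost: { totalCostUSD: number }): void {
    this.totalCostUSD += cost.totalCostUSD;
  },

  // Mean latency over all recorded responses, or 0 if none were recorded.
  averageLatencyMs(): number {
    if (this.latenciesMs.length === 0) return 0;
    const sum = this.latenciesMs.reduce((acc, ms) => acc + ms, 0);
    return sum / this.latenciesMs.length;
  },
};

// Simulate two responses arriving through onResponse.
metrics.recordLatency(120);
metrics.recordLatency(80);
metrics.recordCost({ totalCostUSD: 0.002 });
console.log(metrics.averageLatencyMs()); // 100
```

In production this object would typically forward to a metrics backend rather than accumulate values in memory.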

5. Clear History for New Conversations

// Starting a new conversation topic
client.clearHistory();
await client.query({ prompt: 'New topic...' });

Contributing

Contributions are welcome! Please see the Epic Flow contribution guidelines.

Development

# Install dependencies
yarn install

# Run tests
yarn test

# Run tests in watch mode
yarn test:watch

# Build the package
yarn build

# Type checking
yarn typecheck

# Linting
yarn lint

License

MIT

Support

For issues, questions, or contributions, please refer to the Epic Flow project documentation.