Burt Logger (TypeScript/JavaScript)
A lightweight, production-ready TypeScript/JavaScript package for collecting LLM training data. Automatically pipe LLM request/response data to your backend for model fine-tuning and dataset creation.
Features
✨ Non-blocking & Asynchronous - Uses async/await and timers so logging has negligible impact on your application's performance
🔄 Intelligent Batching - Automatically batches logs by size or time interval for optimal network efficiency
🛡️ Production-Ready - Graceful error handling, automatic retry with exponential backoff
🚀 Zero Dependencies - Relies only on Node.js built-ins (fetch, timers, process)
⚙️ Highly Configurable - Customize batch sizes, flush intervals, queue sizes, retry logic, and more
🔌 Provider Agnostic - Works with OpenAI, Anthropic, or any LLM provider
📦 TypeScript Native - Full TypeScript support with type definitions included
Installation
npm install burt-logger
Or with yarn:
yarn add burt-logger
Or with pnpm:
pnpm add burt-logger
Quick Start
TypeScript
import { LLMLogger } from 'burt-logger';
// Initialize the logger
const logger = new LLMLogger({
endpoint: 'https://your-api.com/logs',
apiKey: 'your-api-key',
});
// Log your LLM requests and responses
const response = await openai.chat.completions.create(...);
logger.log(
'gpt-4',
{
messages: [...],
temperature: 0.7,
},
{
content: response.choices[0].message.content,
usage: {
prompt_tokens: response.usage?.prompt_tokens || 0,
completion_tokens: response.usage?.completion_tokens || 0,
total_tokens: response.usage?.total_tokens || 0,
},
}
);
// Gracefully shutdown (flushes remaining logs)
await logger.shutdown();
JavaScript (CommonJS)
const { LLMLogger } = require('burt-logger');
const logger = new LLMLogger({
endpoint: 'https://your-api.com/logs',
apiKey: 'your-api-key',
});
// Your code here
logger.log('gpt-3.5-turbo', request, response);
// Cleanup
await logger.shutdown();
That's it! The logger handles everything asynchronously in the background.
Configuration
The LLMLogger constructor accepts a configuration object with the following options:
| Parameter | Type | Default | Description |
| ------------------- | ------- | ------------ | ------------------------------------------------ |
| endpoint | string | Required | Backend API endpoint to send logs to |
| apiKey | string | Required | API key for authentication |
| batchSize | number | 10 | Number of logs to batch before sending |
| flushInterval | number | 5000 | Milliseconds to wait before flushing incomplete batch |
| maxQueueSize | number | 10000 | Maximum number of logs to queue |
| maxRetries | number | 3 | Maximum number of retry attempts |
| initialRetryDelay | number | 1000 | Initial delay for exponential backoff (ms) |
| maxRetryDelay | number | 60000 | Maximum retry delay (ms) |
| timeout | number | 10000 | HTTP request timeout (ms) |
| debug | boolean | false | Enable debug logging |
Example with Custom Configuration
const logger = new LLMLogger({
endpoint: 'https://your-api.com/logs',
apiKey: 'your-api-key',
batchSize: 20, // Send in batches of 20
flushInterval: 10000, // Or every 10 seconds
maxQueueSize: 50000, // Large queue for high-volume apps
maxRetries: 5, // More retries for flaky networks
debug: true, // See what's happening
});
API Reference
log(targetModel, request, response, metadata?)
Log an LLM request/response pair.
Parameters:
- targetModel (string): The target model being used (e.g., "gpt-4", "claude-3")
- request (object): The LLM request data (prompt, model, parameters, etc.)
- response (object): The LLM response data (completion, tokens, etc.)
- metadata (object, optional): Additional metadata (user_id, session_id, etc.)
Returns:
boolean: true if the log was queued successfully, false if the queue is full
Example:
const success = logger.log(
'gpt-4',
{ model: 'gpt-4', prompt: '...' },
{ completion: '...', tokens: 150 },
{ user_id: '123', environment: 'production' }
);
flush(timeoutMs?)
Flush all queued logs and wait for them to be sent.
Parameters:
timeoutMs (number, optional): Maximum time to wait in milliseconds
Example:
await logger.flush(5000); // Wait up to 5 seconds
shutdown(timeoutMs?)
Gracefully shutdown the logger, flushing all remaining logs.
Parameters:
timeoutMs (number): Maximum time to wait for shutdown in milliseconds (default: 10000)
Example:
await logger.shutdown(10000);
getStats()
Get statistics about logger performance.
Returns:
LoggerStats: Object containing statistics
Example:
const stats = logger.getStats();
console.log(stats);
// {
// logsQueued: 150,
// logsSent: 145,
// logsFailed: 5,
// batchesSent: 15,
// batchesFailed: 1
// }
How It Works
1. Queueing: When you call log(), the entry is immediately added to an in-memory queue and the method returns instantly (non-blocking)
2. Batching: Logs are batched based on:
   - Batch size (e.g., 10 logs)
   - Time interval (e.g., every 5 seconds)
3. Sending: Batches are sent to your backend API via HTTP POST with proper authentication headers
4. Retry Logic: If sending fails (a sketch of the retry delay schedule follows this list):
   - 5xx errors: Retries with exponential backoff
   - 429 (rate limit): Retries with exponential backoff
   - 4xx errors: No retry (client error)
   - Network errors: Retries with exponential backoff
5. Shutdown: On program exit or explicit shutdown, all remaining logs are flushed
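As a rough sketch only (not the package's actual internals), the documented defaults imply a backoff schedule along these lines, where each attempt doubles the wait up to maxRetryDelay:
// Illustrative only: approximates the documented exponential backoff defaults
// (initialRetryDelay = 1000 ms, maxRetryDelay = 60000 ms, maxRetries = 3).
// The real implementation may add jitter or differ in detail.
function retryDelayMs(attempt: number, initialRetryDelay = 1000, maxRetryDelay = 60000): number {
  return Math.min(initialRetryDelay * 2 ** attempt, maxRetryDelay);
}

// attempt 0 -> 1000 ms, attempt 1 -> 2000 ms, attempt 2 -> 4000 ms
for (let attempt = 0; attempt < 3; attempt++) {
  console.log(`retry ${attempt + 1} after ${retryDelayMs(attempt)} ms`);
}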
Backend API Expected Format
Your backend should expect POST requests with the following format:
Headers:
Content-Type: application/json
Authorization: Bearer <api_key>
Payload:
{
"logs": [
{
"target_model": "gpt-4",
"request": { /* your request data */ },
"response": { /* your response data */ },
"metadata": { /* optional metadata */ },
"timestamp": 1234567890.123
},
...
]
}
Expected Response:
- Success: HTTP 200, 201, or 202
- Server Error: HTTP 5xx (will retry)
- Client Error: HTTP 4xx (will not retry)
- Rate Limited: HTTP 429 (will retry)
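For illustration, here is a minimal backend sketch that accepts this payload and returns the status codes above. It is not part of the package: Express, the /logs path, the BURT_API_KEY environment variable, and the saveLogs helper are assumptions; any stack that honors these status codes works.
import express from 'express';

const app = express();
app.use(express.json({ limit: '5mb' })); // a batch can contain many log entries

app.post('/logs', async (req, res) => {
  // Reject requests without the expected bearer token
  const auth = req.header('authorization');
  if (auth !== `Bearer ${process.env.BURT_API_KEY}`) {
    return res.status(401).json({ error: 'invalid api key' }); // 4xx: logger will not retry
  }

  const { logs } = req.body;
  if (!Array.isArray(logs)) {
    return res.status(400).json({ error: 'expected a "logs" array' });
  }

  try {
    await saveLogs(logs); // placeholder for your own persistence layer
    return res.status(202).json({ accepted: logs.length }); // 200/201/202 all count as success
  } catch (err) {
    return res.status(500).json({ error: 'storage failure' }); // 5xx: logger retries with backoff
  }
});

// Placeholder: replace with a database or object-store write
async function saveLogs(logs: unknown[]): Promise<void> {
  console.log(`received ${logs.length} log entries`);
}

app.listen(3000);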
Error Handling
The logger is designed to be resilient and never crash your application:
- Queue Full: If the queue is full, log() returns false and the log is dropped (see the sketch after this list)
- Network Errors: Automatic retry with exponential backoff
- Backend Down: Retries up to maxRetries times, then drops the batch
- Process Exit: Automatically attempts to flush remaining logs
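As a small sketch of reacting to dropped logs, using only the documented boolean return of log() (the counter and safeLog wrapper below are placeholders, not part of the package):
import { LLMLogger } from 'burt-logger';

const logger = new LLMLogger({
  endpoint: process.env.BURT_ENDPOINT!,
  apiKey: process.env.BURT_API_KEY!,
});

let droppedLogs = 0; // simple in-process counter; swap in your metrics client

function safeLog(model: string, request: object, response: object) {
  // log() returns false when the queue is full and the entry was dropped
  const queued = logger.log(model, request, response);
  if (!queued) {
    droppedLogs += 1;
    console.warn(`burt-logger queue full; ${droppedLogs} logs dropped so far`);
  }
}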
All errors are logged to console. Enable debug mode to see detailed logs:
const logger = new LLMLogger({ ..., debug: true });
Examples
OpenAI Integration
import OpenAI from 'openai';
import { LLMLogger } from 'burt-logger';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const logger = new LLMLogger({
endpoint: process.env.BURT_ENDPOINT!,
apiKey: process.env.BURT_API_KEY!,
});
async function chat(userMessage: string) {
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: userMessage }],
});
// Log the interaction
logger.log(
'gpt-4',
{ messages: [{ role: 'user', content: userMessage }] },
{
content: response.choices[0].message.content,
usage: response.usage,
},
{ user_id: 'example-user' }
);
return response.choices[0].message.content;
}
Anthropic Claude Integration
import Anthropic from '@anthropic-ai/sdk';
import { LLMLogger } from 'burt-logger';
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const logger = new LLMLogger({
endpoint: process.env.BURT_ENDPOINT!,
apiKey: process.env.BURT_API_KEY!,
});
async function chat(userMessage: string) {
const response = await anthropic.messages.create({
model: 'claude-3-opus-20240229',
max_tokens: 1024,
messages: [{ role: 'user', content: userMessage }],
});
// Log the interaction
logger.log(
'claude-3-opus-20240229',
{ messages: [{ role: 'user', content: userMessage }] },
{
content: response.content[0].text,
usage: {
prompt_tokens: response.usage.input_tokens,
completion_tokens: response.usage.output_tokens,
total_tokens: response.usage.input_tokens + response.usage.output_tokens,
},
}
);
return response.content[0].text;
}
Express.js Middleware
import express from 'express';
import { LLMLogger } from 'burt-logger';
const app = express();
const logger = new LLMLogger({
endpoint: process.env.BURT_ENDPOINT!,
apiKey: process.env.BURT_API_KEY!,
});
// Graceful shutdown
process.on('SIGTERM', async () => {
await logger.shutdown();
process.exit(0);
});
app.post('/api/chat', async (req, res) => {
const { message } = req.body;
// Your LLM call
const llmResponse = await callLLM(message);
// Log the interaction
logger.log(
'gpt-4',
{ message },
llmResponse,
{ user_id: req.user?.id, session_id: req.sessionID }
);
res.json(llmResponse);
});
Performance Considerations
- Non-blocking: log() calls take ~0.001ms (just queue insertion); a measurement sketch follows this list
- Memory: Each log entry is ~1-5KB. Default max queue size is 10,000 logs = ~10-50MB
- Network: Batching reduces network overhead. At 1000 logs/second with batchSize=10, that is 100 batches/second
- Timers: Uses a single timer for periodic flushing
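If you want to sanity-check the overhead in your own environment, here is a minimal measurement sketch. The endpoint is a dummy value (no successful delivery is needed to time queue insertion), and the exact numbers are illustrative, not guaranteed:
// Rough micro-benchmark of log() queue-insertion cost; results vary by machine.
import { performance } from 'perf_hooks';
import { LLMLogger } from 'burt-logger';

const logger = new LLMLogger({ endpoint: 'https://example.com/logs', apiKey: 'test-key' });

const iterations = 1000;
const start = performance.now();
for (let i = 0; i < iterations; i++) {
  logger.log('gpt-4', { prompt: 'hi' }, { completion: 'hello', tokens: 2 });
}
const elapsedMs = performance.now() - start;
console.log(`average log() time: ${(elapsedMs / iterations).toFixed(4)} ms`);

// Sends to the dummy endpoint will fail and be dropped after retries; that's fine here.
await logger.shutdown(2000);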
Production Recommendations
Set an appropriate batchSize: Larger batches are more efficient but increase memory usage
const logger = new LLMLogger({ ..., batchSize: 50 }); // For high-volume apps
Monitor queue size: If logs are being dropped, increase maxQueueSize or reduce traffic
const stats = logger.getStats();
if (stats.logsFailed > 0) {
  // Handle appropriately
}
Use metadata: Add user_id, session_id, etc. for better data analysis
logger.log(..., { user_id: userId, env: 'prod' });
Graceful shutdown: Always call shutdown() before exiting
process.on('SIGTERM', async () => {
  await logger.shutdown();
  process.exit(0);
});
Requirements
- Node.js >= 14.0.0 (native fetch is built in from Node.js 18; earlier versions need a fetch polyfill)
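A minimal sketch of providing that polyfill on Node.js < 18, assuming the logger uses the global fetch (as the zero-dependency note above implies); node-fetch@2 is one option, not a package requirement:
// Run once at startup, before constructing the LLMLogger.
// Only needed on Node.js < 18, where there is no global fetch.
import fetch from 'node-fetch'; // npm install node-fetch@2

if (typeof globalThis.fetch === 'undefined') {
  (globalThis as any).fetch = fetch;
}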
License
MIT License - see LICENSE file for details
Support
- Issues: https://github.com/trainburt/burt-logger-ts/issues
- Email: [email protected]
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Changelog
0.1.0 (Initial Release)
- ✅ Non-blocking asynchronous logging
- ✅ Intelligent batching (by size and time)
- ✅ Retry with exponential backoff
- ✅ Graceful shutdown and cleanup
- ✅ Statistics tracking
- ✅ Full TypeScript support
Built with ❤️ for the LLM training data collection community
