ml-cache

v1.2.0

SDK to collect and store business/product events for future ML training. Store now, train later when AI becomes affordable.

Store your business data today. Train your AI models tomorrow.


The Problem

Machine learning is transforming every industry, but there's a catch: you need massive amounts of quality data to train effective models. Companies that start collecting data today will have a significant competitive advantage when:

  • ML training costs continue to drop exponentially
  • Your business grows and you need personalized AI features
  • You want to build recommendation engines, fraud detection, or predictive analytics
  • Custom models become essential for differentiation

The data you're generating right now is invaluable for future AI/ML applications. Don't let it slip away.

The Solution

ml-cache is a lightweight TypeScript SDK that captures your business events and stores them in Amazon S3 Glacier — the most cost-effective cold storage solution available. It's designed with a simple philosophy:

Collect everything now. Pay almost nothing. Train models when ready.

Why Cold Storage?

| Storage Type            | Cost per TB/month | Retrieval        |
| ----------------------- | ----------------- | ---------------- |
| S3 Standard             | ~$23              | Instant          |
| S3 Glacier              | ~$4               | Minutes to hours |
| S3 Glacier Deep Archive | ~$1               | 12-48 hours      |

For ML training data that you'll access months or years from now, Glacier is roughly 6x cheaper than S3 Standard, and Deep Archive is more than 20x cheaper.
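To make the savings concrete, here is a small sketch that turns the per-TB prices from the table above into annual costs. The prices are the approximate figures quoted above, not official AWS rates:

```typescript
// Approximate prices from the table above (USD per TB per month) — not official AWS rates.
const PRICE_PER_TB_MONTH = { STANDARD: 23, GLACIER: 4, DEEP_ARCHIVE: 1 } as const;

function annualCostUSD(tb: number, cls: keyof typeof PRICE_PER_TB_MONTH): number {
  return tb * PRICE_PER_TB_MONTH[cls] * 12;
}

// Keeping 5 TB of event data for a year:
console.log(annualCostUSD(5, 'STANDARD'));     // 1380
console.log(annualCostUSD(5, 'DEEP_ARCHIVE')); // 60
```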


Features

  • Simple API — One method to cache all your data: cache()
  • Automatic Batching — Efficiently groups events to minimize API calls
  • Smart Retry Logic — Exponential backoff minimizes the risk of data loss
  • Type-Safe — Full TypeScript support with comprehensive type definitions
  • Flexible Storage — S3 Standard, Glacier, or Glacier Deep Archive
  • Rich Context — Capture user, device, page, and campaign data
  • Zero Dependencies on Analytics — Direct AWS integration, no middlemen
  • Production Ready — Battle-tested error handling and graceful shutdown
  • Backend Only — Designed for Node.js server-side applications
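The retry behavior described above can be sketched as follows. This is an illustration of exponential backoff with a cap, not the SDK's actual internals; it mirrors the `maxRetries`, `initialDelayMs`, and `maxDelayMs` options shown in the configuration section:

```typescript
// Illustrative sketch (not the SDK's internals): exponential backoff with a cap.
function backoffDelayMs(attempt: number, initialDelayMs = 1000, maxDelayMs = 30000): number {
  // attempt 0 -> 1s, attempt 1 -> 2s, attempt 2 -> 4s, ... capped at maxDelayMs
  return Math.min(initialDelayMs * 2 ** attempt, maxDelayMs);
}

async function withRetries<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries: surface the error
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```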

Platform Support

Important: This SDK is designed for backend/server-side use only (Node.js 18+).

It is not compatible with:

  • Browser environments
  • Edge runtimes (Cloudflare Workers, Vercel Edge)
  • React Native or mobile apps

The SDK requires Node.js APIs (crypto, Buffer) and direct AWS SDK access, which are not available in browser or edge environments.


Installation

npm install ml-cache
yarn add ml-cache
pnpm add ml-cache

Quick Start

import { MLCacheClient } from 'ml-cache';

// Initialize the client
const mlCache = new MLCacheClient({
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
  },
  s3: {
    bucket: 'my-ml-data-lake',
    region: 'us-east-1',
    storageClass: 'GLACIER', // Cost-effective cold storage
  },
  storageMode: 'S3',
  sourceApp: 'my-webapp',
  environment: 'production',
});

// Cache business data
await mlCache.cache({
  data: {
    productId: 'SKU-12345',
    productName: 'Premium Widget',
    price: 99.99,
    currency: 'USD',
    quantity: 2,
  },
  context: {
    user: {
      userId: 'user-789',
      traits: {
        plan: 'premium',
        signupDate: '2024-01-15',
      },
    },
  },
});

// Graceful shutdown (flushes remaining events)
await mlCache.shutdown();

Configuration

Full Configuration Options

import { MLCacheClient, type MLCacheConfig } from 'ml-cache';

const config: MLCacheConfig = {
  // Required: AWS Credentials
  credentials: {
    accessKeyId: 'AKIA...',
    secretAccessKey: '...',
    sessionToken: '...', // Optional: for temporary credentials
  },

  // S3 Configuration (required for S3 or S3_TO_GLACIER mode)
  s3: {
    bucket: 'my-ml-data-bucket',
    region: 'us-east-1',
    prefix: 'events/', // Optional: folder prefix for objects
    storageClass: 'GLACIER', // STANDARD, GLACIER, DEEP_ARCHIVE, etc.
  },

  // Glacier Configuration (required for GLACIER mode)
  glacier: {
    vaultName: 'my-ml-vault',
    region: 'us-east-1',
    accountId: '-', // Optional: defaults to current account
  },

  // Storage mode
  storageMode: 'S3', // 'S3' | 'GLACIER' | 'S3_TO_GLACIER'

  // Batching configuration
  batch: {
    enabled: true, // Enable event batching
    maxSize: 100, // Max events per batch
    maxWaitMs: 30000, // Flush every 30 seconds
  },

  // Retry configuration
  retry: {
    maxRetries: 3,
    initialDelayMs: 1000,
    maxDelayMs: 30000,
    exponentialBackoff: true,
  },

  // Logging configuration
  log: {
    level: 'info', // 'debug' | 'info' | 'warn' | 'error' | 'silent'
    enabled: true,
    customLogger: (level, message, data) => {
      // Your custom logging logic
    },
  },

  // Metadata
  sourceApp: 'my-application',
  environment: 'production',
  debug: false,
};

const client = new MLCacheClient(config);

Storage Classes

Choose the right storage class for your needs:

| Storage Class | Use Case                     | Retrieval Time |
| ------------- | ---------------------------- | -------------- |
| STANDARD      | Frequent access, testing     | Instant        |
| STANDARD_IA   | Infrequent access            | Instant        |
| GLACIER       | Recommended for ML data      | 1-5 minutes    |
| DEEP_ARCHIVE  | Rarely accessed, lowest cost | 12-48 hours    |


Caching Data

Basic Usage

Cache any business data with rich context:

await mlCache.cache({
  data: {
    orderId: 'ORD-123456',
    total: 299.99,
    items: [
      { sku: 'WIDGET-A', quantity: 2, price: 49.99 },
      { sku: 'WIDGET-B', quantity: 1, price: 199.99 },
    ],
    paymentMethod: 'credit_card',
    shippingMethod: 'express',
  },
  context: {
    user: { userId: 'user-123' },
    campaign: {
      source: 'google',
      medium: 'cpc',
      name: 'summer_sale',
    },
  },
});

Event Context

Enrich events with contextual data:

await mlCache.cache({
  data: {
    action: 'feature_used',
    feature: 'dark_mode',
  },
  context: {
    // User context
    user: {
      userId: 'user-123',
      anonymousId: 'anon-456',
      traits: {
        plan: 'pro',
        role: 'admin',
      },
    },

    // Device context
    device: {
      userAgent: 'Mozilla/5.0...',
      deviceType: 'desktop',
      os: 'macOS',
      browser: 'Chrome',
      screenResolution: '1920x1080',
      locale: 'en-US',
      timezone: 'America/New_York',
    },

    // Page context
    page: {
      url: 'https://example.com/settings',
      path: '/settings',
      title: 'Settings',
      referrer: 'https://example.com/home',
    },

    // Campaign/UTM context
    campaign: {
      source: 'newsletter',
      medium: 'email',
      name: 'weekly_digest',
      content: 'cta_button',
    },

    // App context
    app: {
      name: 'MyApp',
      version: '2.1.0',
      build: '456',
    },

    // Custom context
    custom: {
      experimentId: 'exp-123',
      variant: 'B',
    },
  },
});

Callbacks & Monitoring

// Monitor all cached events
mlCache.onEvent((event) => {
  console.log('Event cached:', event.eventId);
});

// Handle errors
mlCache.onError((error, event) => {
  console.error('Failed to store event:', error.message);
  // Optionally: send to error tracking service
});

// Monitor flushes
mlCache.onFlush((result) => {
  console.log(`Flushed ${result.eventCount} events`);
  if (result.failedEventIds.length > 0) {
    console.warn('Failed events:', result.failedEventIds);
  }
});

// Health check
const health = await mlCache.getHealth();
console.log('SDK Health:', health);
// {
//   healthy: true,
//   s3Connected: true,
//   glacierConnected: false,
//   queueSize: 5,
//   lastFlush: '2024-01-15T10:30:00.000Z',
// }

Data Format

Events are stored in NDJSON (Newline Delimited JSON) format, perfect for:

  • Apache Spark — Native NDJSON support
  • AWS Athena — Query directly with SQL
  • Pandas — pd.read_json(file, lines=True)
  • Any ML pipeline — Simple line-by-line parsing
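The line-by-line parsing mentioned above is straightforward. Here is a minimal sketch, assuming a batch file has already been retrieved from S3 to local disk (the function name `readNdjson` is illustrative, not part of the SDK):

```typescript
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

// Illustrative helper (not part of ml-cache): parse a downloaded NDJSON batch file,
// one JSON event per line, skipping blank lines.
async function readNdjson(path: string): Promise<unknown[]> {
  const events: unknown[] = [];
  const rl = createInterface({ input: createReadStream(path), crlfDelay: Infinity });
  for await (const line of rl) {
    if (line.trim().length > 0) events.push(JSON.parse(line));
  }
  return events;
}
```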

S3 Object Structure

s3://my-bucket/ml-cache-events/
├── 2024/
│   ├── 01/
│   │   ├── 15/
│   │   │   ├── 10/
│   │   │   │   ├── batch_1705312200_a1b2c3d4.ndjson
│   │   │   │   └── batch_1705312500_e5f6g7h8.ndjson
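The date-partitioned layout above (year/month/day/hour) can be reproduced with a small key builder. This is a sketch inferred from the tree, not the SDK's actual key logic, and the batch-name format is an assumption:

```typescript
// Illustrative sketch of the partitioned key layout shown above (UTC-based).
// The batch id format (epoch seconds + random suffix) is an assumption.
function batchKey(prefix: string, now: Date, batchId: string): string {
  const pad = (n: number) => String(n).padStart(2, '0');
  return [
    prefix.replace(/\/$/, ''),
    now.getUTCFullYear(),
    pad(now.getUTCMonth() + 1),
    pad(now.getUTCDate()),
    pad(now.getUTCHours()),
    `${batchId}.ndjson`,
  ].join('/');
}
```

Partitioning by date like this keeps objects small and lets query engines such as Athena prune by time range.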

Event Schema

{
  "eventId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "data": {
    "productId": "SKU-123",
    "amount": 99.99
  },
  "context": {
    "user": { "userId": "user-456" }
  },
  "metadata": {
    "sdkVersion": "1.0.0",
    "sourceApp": "my-app",
    "environment": "production",
    "batchId": "batch_1705312200_a1b2c3d4"
  }
}

AWS Setup

IAM Policy

Create an IAM policy with minimal required permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetBucketLocation"],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}

For Glacier mode, add:

{
  "Effect": "Allow",
  "Action": ["glacier:UploadArchive", "glacier:DescribeVault"],
  "Resource": "arn:aws:glacier:*:*:vaults/your-vault-name"
}

S3 Lifecycle Policy (Optional)

Automatically transition data to deeper cold storage:

{
  "Rules": [
    {
      "ID": "MLDataLifecycle",
      "Status": "Enabled",
      "Prefix": "ml-cache-events/",
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ]
    }
  ]
}

Best Practices

1. Capture Rich Context

The more context you capture now, the better your models will be:

// Good: Rich context for future ML
await mlCache.cache({
  data: {
    action: 'product_viewed',
    productId: 'SKU-123',
    category: 'electronics',
    price: 299.99,
    inStock: true,
    viewDuration: 45,
    scrollDepth: 0.8,
  },
  context: {
    user: { userId: 'user-456', traits: { segment: 'high-value' } },
    page: { referrer: 'google.com' },
    device: { deviceType: 'mobile', os: 'iOS' },
    custom: { searchQuery: 'best headphones' },
  },
});

2. Graceful Shutdown

Always flush events before application exit:

process.on('SIGTERM', async () => {
  await mlCache.shutdown();
  process.exit(0);
});

3. Monitor Queue Size

Prevent memory issues in high-traffic scenarios:

setInterval(() => {
  const queueSize = mlCache.getQueueSize();
  if (queueSize > 5000) {
    console.warn(`Queue size high: ${queueSize}`);
  }
}, 60000);

Future ML Use Cases

The data you collect today can power tomorrow's AI features:

| Data Type            | Future ML Application                      |
| -------------------- | ------------------------------------------ |
| Purchase data        | Recommendation engine, demand forecasting  |
| Page views           | Content personalization, A/B test analysis |
| Search queries       | Search ranking, query understanding        |
| Support interactions | Automated responses, sentiment analysis    |
| User behavior        | Churn prediction, engagement scoring       |
| Product interactions | Dynamic pricing, inventory optimization    |


API Reference

MLCacheClient

| Method            | Description                     |
| ----------------- | ------------------------------- |
| cache(event)      | Cache data for ML training      |
| flush()           | Manually flush the event queue  |
| getHealth()       | Get SDK health status           |
| getQueueSize()    | Get current queue size          |
| getVersion()      | Get SDK version                 |
| shutdown()        | Gracefully shut down the client |
| onEvent(callback) | Register event callback         |
| onError(callback) | Register error callback         |
| onFlush(callback) | Register flush callback         |

Event Structure

interface MLCacheEvent {
  // Auto-generated if not provided
  eventId?: string;
  timestamp?: string;

  // Your business data
  data?: Record<string, unknown>;

  // Rich context
  context?: {
    user?: { userId?: string; anonymousId?: string; traits?: Record<string, unknown> };
    device?: { userAgent?: string; deviceType?: string; os?: string; /* ... */ };
    page?: { url?: string; path?: string; title?: string; referrer?: string };
    campaign?: { source?: string; medium?: string; name?: string; /* ... */ };
    app?: { name?: string; version?: string; build?: string };
    custom?: Record<string, unknown>;
  };
}

License

MIT


Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests to the GitHub repository.