rehydra

v0.3.3

On-device PII anonymization module for high-privacy AI workflows

Downloads

927

Rehydra

On-device PII anonymization module for high-privacy AI workflows. Detects and replaces Personally Identifiable Information (PII) with placeholder tags while maintaining an encrypted mapping for later rehydration.

npm install rehydra

Works in Node.js, Bun, and browsers

Features

  • Structured PII Detection: Regex-based detection for emails, phones, IBANs, credit cards, IPs, URLs
  • Soft PII Detection: ONNX-powered NER model for names, organizations, locations (auto-downloads on first use if enabled)
  • Semantic Enrichment: AI/MT-friendly tags with gender/location attributes for better translations
  • Secure PII Mapping: AES-256-GCM encrypted storage of original PII values
  • Cross-Platform: Works identically in Node.js, Bun, and browsers
  • Configurable Policies: Customizable detection rules, thresholds, and allowlists
  • Validation & Leak Scanning: Built-in validation and optional leak detection

Installation

Node.js / Bun

npm install rehydra

Browser (with bundler)

npm install rehydra onnxruntime-web

When using Vite, webpack, or other bundlers, the browser-safe entry point is automatically selected via conditional exports. This entry point excludes Node.js-specific modules like SQLite storage.

Browser (without bundler)

<script type="module">
  // Import directly from your dist folder or CDN
  import { createAnonymizer } from './node_modules/rehydra/dist/index.js';
  
  // onnxruntime-web is automatically loaded from CDN when needed
</script>

Quick Start

Regex-Only Mode (No Downloads Required)

For structured PII like emails, phones, IBANs, credit cards:

import { anonymizeRegexOnly } from 'rehydra';

const result = await anonymizeRegexOnly(
  'Contact john@example.com or call +49 30 123456. IBAN: DE89370400440532013000'
);

console.log(result.anonymizedText);
// "Contact <PII type="EMAIL" id="1"/> or call <PII type="PHONE" id="2"/>. IBAN: <PII type="IBAN" id="3"/>"
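Conceptually, the regex-only path scans the text with pattern recognizers and swaps each match for a numbered tag. The block below is a minimal, hypothetical sketch of that idea, not rehydra's implementation. `PATTERNS`, `detect`, and `tagPII` are illustrative names, and the patterns are deliberately simplified (no checksum validation). It assumes IDs are assigned in order of appearance, as in the sample output above.

```typescript
// Hypothetical sketch, NOT rehydra's implementation: simplified patterns, no checksums.
interface Detection {
  type: string;
  start: number;
  end: number;
  value: string;
}

const PATTERNS: Array<{ type: string; regex: RegExp }> = [
  { type: 'EMAIL', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { type: 'PHONE', regex: /\+\d{1,3}(?:[ -]?\d{2,4}){2,4}/g },
  { type: 'IBAN', regex: /\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b/g },
];

// Collect matches from every pattern, then keep non-overlapping ones in text order.
function detect(text: string): Detection[] {
  const found: Detection[] = [];
  for (const { type, regex } of PATTERNS) {
    for (const m of text.matchAll(regex)) {
      const start = m.index ?? 0;
      found.push({ type, start, end: start + m[0].length, value: m[0] });
    }
  }
  found.sort((a, b) => a.start - b.start || b.end - a.end);
  const kept: Detection[] = [];
  let lastEnd = 0;
  for (const d of found) {
    if (d.start >= lastEnd) {
      kept.push(d);
      lastEnd = d.end;
    }
  }
  return kept;
}

// Replace each detection with a numbered tag and record "TYPE:id" -> original value.
function tagPII(text: string): { anonymizedText: string; piiMap: Map<string, string> } {
  const piiMap = new Map<string, string>();
  let anonymizedText = '';
  let cursor = 0;
  let nextId = 1;
  for (const d of detect(text)) {
    const id = nextId++;
    piiMap.set(`${d.type}:${id}`, d.value);
    anonymizedText += text.slice(cursor, d.start) + `<PII type="${d.type}" id="${id}"/>`;
    cursor = d.end;
  }
  return { anonymizedText: anonymizedText + text.slice(cursor), piiMap };
}
```

When matches overlap, the sketch keeps the earliest one. This is a simplification, and the library's own recognizers and conflict rules may behave differently.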

Full Mode with NER (Detects Names, Organizations, Locations)

The NER model is automatically downloaded on first use (~280 MB for quantized):

import { createAnonymizer } from 'rehydra';

const anonymizer = createAnonymizer({
  ner: { 
    mode: 'quantized',  // or 'standard' for full model (~1.1 GB)
    onStatus: (status) => console.log(status),
  }
});

await anonymizer.initialize();  // Downloads model if needed

const result = await anonymizer.anonymize(
  'Hello John Smith from Acme Corp in Berlin!'
);

console.log(result.anonymizedText);
// "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"

// Clean up when done
await anonymizer.dispose();

With Semantic Enrichment

Add gender and location scope for better machine translation:

import { createAnonymizer } from 'rehydra';

const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' },
  semantic: { 
    enabled: true,  // Downloads ~12 MB of semantic data on first use
    onStatus: (status) => console.log(status),
  }
});

await anonymizer.initialize();

const result = await anonymizer.anonymize(
  'Hello Maria Schmidt from Berlin!'
);

console.log(result.anonymizedText);
// "Hello <PII type="PERSON" gender="female" id="1"/> from <PII type="LOCATION" scope="city" id="2"/>!"

Example: Translation Workflow (Anonymize → Translate → Rehydrate)

The full workflow for privacy-preserving translation:

import { 
  createAnonymizer, 
  decryptPIIMap, 
  rehydrate,
  InMemoryKeyProvider 
} from 'rehydra';

// 1. Create a key provider (required to decrypt later)
const keyProvider = new InMemoryKeyProvider();

// 2. Create anonymizer with key provider
const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' },
  keyProvider: keyProvider
});

await anonymizer.initialize();

// 3. Anonymize before translation
const original = 'Hello John Smith from Acme Corp in Berlin!';
const result = await anonymizer.anonymize(original);

console.log(result.anonymizedText);
// "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"

// 4. Translate (or do other AI workloads that preserve placeholders)
const translated = await yourAIWorkflow(result.anonymizedText, { from: 'en', to: 'de' });
// "Hallo <PII type="PERSON" id="1"/> von <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"

// 5. Decrypt the PII map using the same key
const encryptionKey = await keyProvider.getKey();
const piiMap = await decryptPIIMap(result.piiMap, encryptionKey);

// 6. Rehydrate - replace placeholders with original values
const rehydrated = rehydrate(translated, piiMap);

console.log(rehydrated);
// "Hallo John Smith von Acme Corp in Berlin!"

// 7. Clean up
await anonymizer.dispose();

Key Points

  • Save the encryption key - You need the same key to decrypt the PII map
  • Placeholders are XML-like - Most translation services preserve them automatically
  • PII stays local - Original values never leave your system during translation

API Reference

Configuration Options

import { createAnonymizer, InMemoryKeyProvider } from 'rehydra';

const anonymizer = createAnonymizer({
  // NER configuration
  ner: {
    mode: 'quantized',              // 'standard' | 'quantized' | 'disabled' | 'custom'
    backend: 'local',               // 'local' (default) | 'inference-server'
    autoDownload: true,             // Auto-download model if not present
    onStatus: (status) => {},       // Status messages callback
    onDownloadProgress: (progress) => {
      console.log(`${progress.file}: ${progress.percent}%`);
    },
    
    // For 'inference-server' backend:
    inferenceServerUrl: 'http://localhost:8080',
    
    // For 'custom' mode only:
    modelPath: './my-model.onnx',
    vocabPath: './vocab.txt',
  },
  
  // Semantic enrichment (adds gender/scope attributes)
  semantic: {
    enabled: true,                  // Enable MT-friendly attributes
    autoDownload: true,             // Auto-download semantic data (~12 MB)
    onStatus: (status) => {},
    onDownloadProgress: (progress) => {},
  },
  
  // Encryption key provider
  keyProvider: new InMemoryKeyProvider(),
  
  // Custom policy (optional)
  defaultPolicy: { /* see Policy section */ },
});

await anonymizer.initialize();

NER Modes

| Mode | Description | Size | Auto-Download |
|------|-------------|------|---------------|
| 'disabled' | No NER, regex only | 0 | N/A |
| 'quantized' | Smaller model, ~95% accuracy | ~280 MB | Yes |
| 'standard' | Full model, best accuracy | ~1.1 GB | Yes |
| 'custom' | Your own ONNX model | Varies | No |

ONNX Session Options

Fine-tune ONNX Runtime performance with session options:

const anonymizer = createAnonymizer({
  ner: {
    mode: 'quantized',
    sessionOptions: {
      // Graph optimization level: 'disabled' | 'basic' | 'extended' | 'all'
      graphOptimizationLevel: 'all',  // default
      
      // Threading (Node.js only)
      intraOpNumThreads: 4,   // threads within operators
      interOpNumThreads: 1,   // threads between operators
      
      // Memory optimization
      enableCpuMemArena: true,
      enableMemPattern: true,
    }
  }
});

Execution Providers

By default, Rehydra uses:

  • Node.js: CPU (fastest for quantized models)
  • Browsers: WebGPU with WASM fallback

To enable CoreML on macOS (for non-quantized models):

const anonymizer = createAnonymizer({
  ner: {
    mode: 'standard',  // CoreML works better with FP32 models
    sessionOptions: {
      executionProviders: ['coreml', 'cpu'],
    }
  }
});

Note: CoreML provides minimal speedup for quantized (INT8) models since they're already optimized for CPU. Use CoreML with the standard FP32 model for best results.

Available execution providers (local inference):

| Provider | Platform | Best For |
|----------|----------|----------|
| 'cpu' | All | Quantized models (default) |
| 'coreml' | macOS | Standard (FP32) models on Apple Silicon |
| 'webgpu' | Browsers | GPU acceleration in Chrome 113+ |
| 'wasm' | Browsers | Fallback for all browsers |

Note: For NVIDIA GPU acceleration with CUDA/TensorRT, use the inference server backend (see GPU Acceleration).
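The defaults and recommendations above can be condensed into a small decision helper. This is a hypothetical sketch: `chooseExecutionProviders` and `RuntimeInfo` are not part of rehydra. The helper only encodes the guidance in this section: CPU in Node.js, CoreML for the FP32 `standard` model on macOS, and WebGPU with a WASM fallback in browsers.

```typescript
// Hypothetical helper; rehydra picks providers itself unless you pass sessionOptions.
type ExecutionProvider = 'cpu' | 'coreml' | 'webgpu' | 'wasm';

interface RuntimeInfo {
  isBrowser: boolean;
  hasWebGPU: boolean; // e.g. 'gpu' in navigator
  isMacOS: boolean;
  model: 'quantized' | 'standard';
}

function chooseExecutionProviders(env: RuntimeInfo): ExecutionProvider[] {
  if (env.isBrowser) {
    // Browsers: WebGPU first when available, WASM as the universal fallback.
    return env.hasWebGPU ? ['webgpu', 'wasm'] : ['wasm'];
  }
  // CoreML only pays off for the FP32 'standard' model on macOS.
  if (env.isMacOS && env.model === 'standard') {
    return ['coreml', 'cpu'];
  }
  // Quantized (INT8) models run best on the CPU provider.
  return ['cpu'];
}
```

You could then pass the returned array as `sessionOptions.executionProviders`, in the same position as `['coreml', 'cpu']` in the CoreML example above.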

GPU Acceleration (Enterprise)

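For high-throughput production deployments, Rehydra supports GPU-accelerated inference via a dedicated inference server. The server is aimed at throughput. In the benchmarks below, network overhead makes single requests slower than local CPU inference, so the gains come from batch processing and large documents.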

const anonymizer = createAnonymizer({
  ner: {
    backend: 'inference-server',
    inferenceServerUrl: 'http://localhost:8080',
  }
});

await anonymizer.initialize();

Performance Comparison:

| Text Size | CPU (local) | GPU (server) | Winner |
|-----------|-------------|--------------|--------|
| Short (~40 chars) | 4.3 ms | 62 ms | CPU 14× faster |
| Medium (~500 chars) | 26 ms | 73 ms | CPU 2.8× faster |
| Long (~2000 chars) | 93 ms | 117 ms | CPU 1.3× faster |
| Entity-dense | 13 ms | 68 ms | CPU 5× faster |

Local CPU inference is faster for most use cases because it avoids network overhead. GPU inference is beneficial for batch processing and large documents.

Backend Options:

| Backend | Description | Latency (~2K chars) |
|---------|-------------|---------------------|
| 'local' | CPU inference (default) | ~93 ms |
| 'inference-server' | GPU server (enterprise) | ~117 ms |

Note: The GPU inference server is available as part of Rehydra Enterprise. Contact us for deployment options including Docker containers and Kubernetes helm charts.

Main Functions

createAnonymizer(config?)

Creates a reusable anonymizer instance:

const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' }
});

await anonymizer.initialize();
const result = await anonymizer.anonymize('text');
await anonymizer.dispose();

anonymize(text, locale?, policy?)

One-off anonymization (regex-only by default):

import { anonymize } from 'rehydra';

const result = await anonymize('Contact john@example.com');

anonymizeWithNER(text, nerConfig, policy?)

One-off anonymization with NER:

import { anonymizeWithNER } from 'rehydra';

const result = await anonymizeWithNER(
  'Hello John Smith',
  { mode: 'quantized' }
);

anonymizeRegexOnly(text, policy?)

Fast regex-only anonymization:

import { anonymizeRegexOnly } from 'rehydra';

const result = await anonymizeRegexOnly('Card: 4111111111111111');

Rehydration Functions

decryptPIIMap(encryptedMap, key)

Decrypts the PII map for rehydration:

import { decryptPIIMap } from 'rehydra';

const piiMap = await decryptPIIMap(result.piiMap, encryptionKey);
// Returns Map<string, string> where key is "PERSON:1" and value is "John Smith"

rehydrate(text, piiMap)

Replaces placeholders with original values:

import { rehydrate } from 'rehydra';

const original = rehydrate(translatedText, piiMap);
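Under the hood, rehydration amounts to finding each placeholder tag and looking up its value. Here is a minimal sketch, assuming map keys of the form `TYPE:id` as described for `decryptPIIMap` above. `rehydrateSketch` is a hypothetical name, and the real `rehydrate` may treat unknown or malformed tags differently.

```typescript
// Hypothetical sketch of placeholder replacement; not rehydra's actual rehydrate().
// Matches tags like <PII type="PERSON" gender="female" id="1"/>.
const PLACEHOLDER = /<PII\s+([^>]*?)\s*\/>/g;

function rehydrateSketch(text: string, piiMap: Map<string, string>): string {
  return text.replace(PLACEHOLDER, (tag: string, attrs: string) => {
    const type = /\btype="([^"]+)"/.exec(attrs)?.[1];
    const id = /\bid="([^"]+)"/.exec(attrs)?.[1];
    if (type === undefined || id === undefined) return tag;
    // Leave unknown placeholders untouched instead of dropping them.
    return piiMap.get(`${type}:${id}`) ?? tag;
  });
}
```

The sketch reads attributes individually rather than matching an exact string. This keeps it tolerant of extra attributes such as `gender` or `scope`, and of whitespace a translation engine might insert before `/>`.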

Result Structure

interface AnonymizationResult {
  // Text with PII replaced by placeholder tags
  anonymizedText: string;
  
  // Detected entities (without original text for safety)
  entities: Array<{
    type: PIIType;
    id: number;
    start: number;
    end: number;
    confidence: number;
    source: 'REGEX' | 'NER';
  }>;
  
  // Encrypted PII mapping (for later rehydration)
  piiMap: {
    ciphertext: string;  // Base64
    iv: string;          // Base64
    authTag: string;     // Base64
  };
  
  // Processing statistics
  stats: {
    countsByType: Record<PIIType, number>;
    totalEntities: number;
    processingTimeMs: number;
    modelVersion: string;
    leakScanPassed?: boolean;
  };
}

Supported PII Types

| Type | Description | Detection | Semantic Attributes |
|------|-------------|-----------|---------------------|
| EMAIL | Email addresses | Regex | - |
| PHONE | Phone numbers (international) | Regex | - |
| IBAN | International Bank Account Numbers | Regex + Checksum | - |
| BIC_SWIFT | Bank Identifier Codes | Regex | - |
| CREDIT_CARD | Credit card numbers | Regex + Luhn | - |
| IP_ADDRESS | IPv4 and IPv6 addresses | Regex | - |
| URL | Web URLs | Regex | - |
| CASE_ID | Case/ticket numbers | Regex (configurable) | - |
| CUSTOMER_ID | Customer identifiers | Regex (configurable) | - |
| PERSON | Person names | NER | gender (male/female/neutral) |
| ORG | Organization names | NER | - |
| LOCATION | Location/place names | NER | scope (city/country/region) |
| ADDRESS | Physical addresses | NER | - |
| DATE_OF_BIRTH | Dates of birth | NER | - |
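The "Regex + Luhn" and "Regex + Checksum" entries refer to validating a regex match before accepting it. Validation filters out digit runs that merely look like card numbers or IBANs. The sketch below shows the two standard checks: Luhn for card numbers and ISO 7064 mod 97-10 for IBANs. The function names are hypothetical, and it is an assumption that rehydra applies these checks exactly this way.

```typescript
// Luhn checksum: from the right, double every second digit (subtract 9 if > 9).
function passesLuhn(digits: string): boolean {
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// IBAN check (ISO 7064 mod 97-10): move the first 4 chars to the end,
// map letters to numbers (A=10 ... Z=35), and require remainder 1.
function passesIbanChecksum(iban: string): boolean {
  const compact = iban.replace(/\s+/g, '').toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(compact)) return false;
  const rearranged = compact.slice(4) + compact.slice(0, 4);
  let remainder = 0;
  for (const ch of rearranged) {
    const value = ch >= 'A' && ch <= 'Z' ? String(ch.charCodeAt(0) - 55) : ch;
    // Process digit by digit so the running number never overflows.
    for (const digit of value) {
      remainder = (remainder * 10 + Number(digit)) % 97;
    }
  }
  return remainder === 1;
}
```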

Configuration

Anonymization Policy

import { createAnonymizer, PIIType } from 'rehydra';

const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' },
  defaultPolicy: {
    // Which PII types to detect
    enabledTypes: new Set([PIIType.EMAIL, PIIType.PHONE, PIIType.PERSON]),
    
    // Confidence thresholds per type (0.0 - 1.0)
    confidenceThresholds: new Map([
      [PIIType.PERSON, 0.8],
      [PIIType.EMAIL, 0.5],
    ]),
    
    // Terms to never treat as PII
    allowlistTerms: new Set(['Customer Service', 'Help Desk']),
    
    // Enable semantic enrichment (gender/scope)
    enableSemanticMasking: true,
    
    // Enable leak scanning on output
    enableLeakScan: true,
  },
});
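One plausible reading of how these policy fields combine is shown below. A detection survives only if its type is enabled, its text is not allowlisted, and its confidence meets the type's threshold. This is a hypothetical sketch. `Candidate`, `PolicySketch`, `defaultThreshold`, the exact-match allowlist, and the `>=` comparison are assumptions, not rehydra's documented behavior.

```typescript
// Hypothetical shapes; rehydra's AnonymizationPolicy and entity types may differ.
interface Candidate {
  type: string;
  text: string;
  confidence: number; // 0.0 - 1.0
}

interface PolicySketch {
  enabledTypes: Set<string>;
  confidenceThresholds: Map<string, number>;
  allowlistTerms: Set<string>;
  defaultThreshold: number; // assumption: applies to types without an explicit threshold
}

function applyPolicy(candidates: Candidate[], policy: PolicySketch): Candidate[] {
  return candidates.filter((c) => {
    if (!policy.enabledTypes.has(c.type)) return false;  // type switched off
    if (policy.allowlistTerms.has(c.text)) return false; // exact, case-sensitive allowlist match
    const threshold = policy.confidenceThresholds.get(c.type) ?? policy.defaultThreshold;
    return c.confidence >= threshold;                     // keep only confident detections
  });
}
```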

Custom Recognizers

Add domain-specific patterns:

import { createCustomIdRecognizer, PIIType, createAnonymizer } from 'rehydra';

const customRecognizer = createCustomIdRecognizer([
  {
    name: 'Order Number',
    pattern: /\bORD-[A-Z0-9]{8}\b/g,
    type: PIIType.CASE_ID,
  },
]);

const anonymizer = createAnonymizer();
anonymizer.getRegistry().register(customRecognizer);

Data & Model Storage

Models and semantic data are cached locally for offline use.

Node.js Cache Locations

| Data | macOS | Linux | Windows |
|------|-------|-------|---------|
| NER Models | ~/Library/Caches/rehydra/models/ | ~/.cache/rehydra/models/ | %LOCALAPPDATA%/rehydra/models/ |
| Semantic Data | ~/Library/Caches/rehydra/semantic-data/ | ~/.cache/rehydra/semantic-data/ | %LOCALAPPDATA%/rehydra/semantic-data/ |
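If you need to locate these directories yourself, for example to inspect or pre-seed the cache, the table maps onto a small helper. `cacheDir` is hypothetical. It takes the platform, home directory, and `LOCALAPPDATA` value as parameters so it stays free of Node.js imports. Treating every platform other than macOS and Windows like Linux is an assumption.

```typescript
// Hypothetical helper mirroring the cache table; rehydra resolves these paths itself.
type CacheKind = 'models' | 'semantic-data';

function cacheDir(
  platform: string,                  // e.g. process.platform: 'darwin' | 'linux' | 'win32'
  home: string,                      // e.g. os.homedir()
  localAppData: string | undefined,  // e.g. process.env.LOCALAPPDATA
  kind: CacheKind,
): string {
  switch (platform) {
    case 'darwin':
      return `${home}/Library/Caches/rehydra/${kind}/`;
    case 'win32':
      if (!localAppData) throw new Error('LOCALAPPDATA is not set');
      return `${localAppData}/rehydra/${kind}/`;
    default:
      // Linux (and, by assumption, other Unix-like systems).
      return `${home}/.cache/rehydra/${kind}/`;
  }
}
```

In Node.js you would typically call it as `cacheDir(process.platform, os.homedir(), process.env.LOCALAPPDATA, 'models')`.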

Browser Cache

In browsers, data is stored using:

  • IndexedDB: For semantic data and smaller files
  • Origin Private File System (OPFS): For large model files (~280 MB)

Data persists across page reloads and browser sessions.

Manual Data Management

import { 
  // Model management
  isModelDownloaded, 
  downloadModel, 
  clearModelCache,
  listDownloadedModels,
  
  // Semantic data management
  isSemanticDataDownloaded,
  downloadSemanticData,
  clearSemanticDataCache,
} from 'rehydra';

// Check if model is downloaded
const hasModel = await isModelDownloaded('quantized');

// Manually download model with progress
await downloadModel('quantized', (progress) => {
  console.log(`${progress.file}: ${progress.percent}%`);
});

// Check semantic data
const hasSemanticData = await isSemanticDataDownloaded();

// List downloaded models
const models = await listDownloadedModels();

// Clear caches
await clearModelCache('quantized');  // or clearModelCache() for all
await clearSemanticDataCache();

Encryption & Security

The PII map is encrypted using AES-256-GCM via the Web Crypto API (works in both Node.js and browsers).
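To make the result structure concrete, here is a minimal sketch of AES-256-GCM encryption of a PII map with the Web Crypto API. It assumes a runtime that exposes `crypto.subtle` globally, such as browsers, Bun, or recent Node.js. `encryptMap`, `decryptMap`, and the JSON-of-entries serialization are hypothetical, and rehydra's own format may differ. Web Crypto returns the 16-byte authentication tag appended to the ciphertext, so the sketch splits it off to match the separate `ciphertext`, `iv`, and `authTag` fields.

```typescript
// Minimal sketch, not rehydra's implementation. Field names mirror result.piiMap.
interface EncryptedMapSketch {
  ciphertext: string; // Base64
  iv: string;         // Base64
  authTag: string;    // Base64
}

const TAG_BYTES = 16; // AES-GCM with a 128-bit tag (the Web Crypto default)

function toBase64(bytes: Uint8Array): string {
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

function fromBase64(b64: string): Uint8Array {
  return Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
}

async function importAesKey(raw: Uint8Array): Promise<CryptoKey> {
  if (raw.length !== 32) throw new Error('AES-256 needs a 32-byte key');
  return crypto.subtle.importKey('raw', raw, { name: 'AES-GCM' }, false, ['encrypt', 'decrypt']);
}

async function encryptMap(map: Map<string, string>, rawKey: Uint8Array): Promise<EncryptedMapSketch> {
  const key = await importAesKey(rawKey);
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh 96-bit IV per encryption
  const plaintext = new TextEncoder().encode(JSON.stringify([...map]));
  // Web Crypto returns ciphertext || tag; split the tag off.
  const sealed = new Uint8Array(await crypto.subtle.encrypt({ name: 'AES-GCM', iv }, key, plaintext));
  return {
    ciphertext: toBase64(sealed.slice(0, sealed.length - TAG_BYTES)),
    iv: toBase64(iv),
    authTag: toBase64(sealed.slice(sealed.length - TAG_BYTES)),
  };
}

async function decryptMap(enc: EncryptedMapSketch, rawKey: Uint8Array): Promise<Map<string, string>> {
  const key = await importAesKey(rawKey);
  const ct = fromBase64(enc.ciphertext);
  const tag = fromBase64(enc.authTag);
  const sealed = new Uint8Array(ct.length + tag.length);
  sealed.set(ct);
  sealed.set(tag, ct.length);
  // Rejects if the key is wrong or the data was tampered with.
  const plain = await crypto.subtle.decrypt({ name: 'AES-GCM', iv: fromBase64(enc.iv) }, key, sealed);
  return new Map(JSON.parse(new TextDecoder().decode(plain)) as Array<[string, string]>);
}
```

A fresh random 12-byte IV is generated for every encryption. Reusing an IV with the same key breaks GCM's security guarantees.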

Key Providers

import { 
  InMemoryKeyProvider,    // For development/testing
  ConfigKeyProvider,      // For production with pre-configured key
  KeyProvider,            // Interface for custom implementations
  generateKey,
} from 'rehydra';

// Development: In-memory key (random key, lost when the page reloads or the process exits)
const devKeyProvider = new InMemoryKeyProvider();

// Production: Pre-configured key
// Generate key: openssl rand -base64 32
const keyBase64 = process.env.PII_ENCRYPTION_KEY;  // or read from config
if (!keyBase64) throw new Error('PII_ENCRYPTION_KEY is not set');
const prodKeyProvider = new ConfigKeyProvider(keyBase64);

// Custom: Implement KeyProvider interface
class SecureKeyProvider implements KeyProvider {
  async getKey(): Promise<Uint8Array> {
    // Retrieve from secure storage, HSM, keychain, etc.
    return await getKeyFromSecureStorage();
  }
}
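A custom provider usually has to turn stored key material into raw bytes. The hypothetical helper below decodes a base64 key, such as the output of `openssl rand -base64 32`. It rejects anything that is not exactly 32 bytes (256 bits), the key size AES-256 requires. It assumes `getKey()` returns a `Uint8Array`, as in the interface shown above.

```typescript
// Hypothetical helper: base64 key string -> 32 raw bytes, or a descriptive error.
function decodeAes256Key(keyBase64: string | undefined): Uint8Array {
  if (!keyBase64) throw new Error('Encryption key is missing');
  // atob throws on characters outside the base64 alphabet.
  const bytes = Uint8Array.from(atob(keyBase64.trim()), (c) => c.charCodeAt(0));
  if (bytes.length !== 32) {
    throw new Error(`Expected a 32-byte key, got ${bytes.length} bytes`);
  }
  return bytes;
}
```

Inside a custom `getKey()`, you would return the decoded bytes of whatever base64 secret your keystore holds.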

Security Best Practices

  • Never log the raw PII map - Always use encrypted storage
  • Persist the encryption key securely - Use platform keystores (iOS Keychain, Android Keystore, etc.)
  • Rotate keys - Implement key rotation for long-running applications
  • Enable leak scanning - Catch any missed PII in output

PII Map Storage

For applications that need to persist encrypted PII maps (e.g., chat applications where you need to rehydrate later), use sessions with built-in storage providers.

Storage Providers

| Provider | Environment | Persistence | Use Case |
|----------|-------------|-------------|----------|
| InMemoryPIIStorageProvider | All | None (lost on restart) | Development, testing |
| SQLitePIIStorageProvider | Node.js, Bun only* | File-based | Server-side applications |
| IndexedDBPIIStorageProvider | Browser | Browser storage | Client-side applications |

*Not available in browser builds. Use IndexedDBPIIStorageProvider for browser applications.

Important: Storage Only Works with Sessions

Note: The piiStorageProvider is only used when you call anonymizer.session(). Calling anonymizer.anonymize() directly does NOT save to storage - the encrypted PII map is only returned in the result for you to handle manually.

// ❌ Storage NOT used - you must handle the PII map yourself
const result = await anonymizer.anonymize('Hello John!');
// result.piiMap is returned but NOT saved to storage

// ✅ Storage IS used - auto-saves and auto-loads
const session = anonymizer.session('conversation-123');
const sessionResult = await session.anonymize('Hello John!');
// sessionResult.piiMap is automatically saved to storage

Example: Without Storage (Simple One-Off Usage)

For simple use cases where you don't need persistence:

import { createAnonymizer, decryptPIIMap, rehydrate, InMemoryKeyProvider } from 'rehydra';

const keyProvider = new InMemoryKeyProvider();
const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' },
  keyProvider,
});
await anonymizer.initialize();

// Anonymize
const result = await anonymizer.anonymize('Hello John Smith!');

// Translate (or other processing)
const translated = await translateAPI(result.anonymizedText);

// Rehydrate manually using the returned PII map
const key = await keyProvider.getKey();
const piiMap = await decryptPIIMap(result.piiMap, key);
const original = rehydrate(translated, piiMap);

Example: With Storage (Persistent Sessions)

For applications that need to persist PII maps across requests/restarts:

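import { 
  createAnonymizer,
  ConfigKeyProvider,
  SQLitePIIStorageProvider,
} from 'rehydra';

// 1. Setup storage (once at app start)
const storage = new SQLitePIIStorageProvider('./pii-maps.db');
await storage.initialize();

// 2. Create anonymizer with storage and a persistent key provider.
// The key must survive restarts, otherwise stored PII maps cannot be decrypted.
// Generate key: openssl rand -base64 32
const keyBase64 = process.env.PII_ENCRYPTION_KEY;
if (!keyBase64) throw new Error('PII_ENCRYPTION_KEY is not set');

const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' },
  keyProvider: new ConfigKeyProvider(keyBase64),
  piiStorageProvider: storage,
});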
await anonymizer.initialize();

// 3. Create a session for each conversation
const session = anonymizer.session('conversation-123');

// 4. Anonymize - auto-saves to storage
const result = await session.anonymize('Hello John Smith from Acme Corp!');
console.log(result.anonymizedText);
// "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="2"/>!"

// 5. Later (even after app restart): rehydrate - auto-loads and decrypts
const translated = await translateAPI(result.anonymizedText);
const original = await session.rehydrate(translated);
console.log(original);
// "Hello John Smith from Acme Corp!"

// 6. Optional: check existence or delete
await session.exists();  // true
await session.delete();  // removes from storage

Example: Multiple Conversations

Each session ID maps to a separate stored PII map:

// Different chat sessions
const chat1 = anonymizer.session('user-alice-chat');
const chat2 = anonymizer.session('user-bob-chat');

await chat1.anonymize('Alice: Contact me at alice@example.com');
await chat2.anonymize('Bob: My number is +49 30 123456');

// Each session has independent storage
await chat1.rehydrate(translatedText1);  // Uses Alice's PII map
await chat2.rehydrate(translatedText2);  // Uses Bob's PII map

Multi-Message Conversations

Within a session, entity IDs are consistent across multiple anonymize() calls:

const session = anonymizer.session('chat-123');

// Message 1: User provides contact info
const msg1 = await session.anonymize('Contact me at user@example.com');
// → "Contact me at <PII type="EMAIL" id="1"/>"

// Message 2: References same email + new one
const msg2 = await session.anonymize('CC: user@example.com and other@example.com');
// → "CC: <PII type="EMAIL" id="1"/> and <PII type="EMAIL" id="2"/>"
//        ↑ Same ID (reused)                ↑ New ID

// Message 3: No PII
await session.anonymize('Please translate to German');
// Previous PII preserved

// All messages can be rehydrated correctly
await session.rehydrate(msg1.anonymizedText); // ✓
await session.rehydrate(msg2.anonymizedText); // ✓

This ensures that follow-up messages referencing the same PII produce consistent placeholders, and rehydration works correctly across the entire conversation.
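The consistent-ID behavior can be modeled with a per-session registry that remembers every (type, value) pair it has seen. This is a hypothetical sketch, not rehydra's code. It assumes a single counter per session, which reproduces the IDs in the multi-message example above.

```typescript
// Hypothetical sketch of per-session placeholder IDs.
class SessionIdRegistry {
  private nextId = 1;
  private readonly ids = new Map<string, number>(); // "TYPE\u0000value" -> id
  readonly piiMap = new Map<string, string>();      // "TYPE:id" -> original value

  // Reuse the ID for a known (type, value) pair; otherwise allocate the next one.
  idFor(type: string, value: string): number {
    const key = `${type}\u0000${value}`;
    let id = this.ids.get(key);
    if (id === undefined) {
      id = this.nextId++;
      this.ids.set(key, id);
      this.piiMap.set(`${type}:${id}`, value);
    }
    return id;
  }

  placeholder(type: string, value: string): string {
    return `<PII type="${type}" id="${this.idFor(type, value)}"/>`;
  }
}
```

Because the registry's `piiMap` only grows, placeholders from earlier messages stay resolvable when later messages are rehydrated.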

SQLite Provider (Node.js + Bun only)

The SQLite provider works on both Node.js and Bun with automatic runtime detection.

Note: SQLitePIIStorageProvider is not available in browser builds. When bundling for browser with Vite/webpack, use IndexedDBPIIStorageProvider instead. The browser-safe build automatically excludes SQLite to avoid bundling Node.js dependencies.

// Node.js / Bun only
import { SQLitePIIStorageProvider } from 'rehydra';
// Or explicitly: import { SQLitePIIStorageProvider } from 'rehydra/storage/sqlite';

// File-based database
const storage = new SQLitePIIStorageProvider('./data/pii-maps.db');
await storage.initialize();

// Or in-memory for testing
const testStorage = new SQLitePIIStorageProvider(':memory:');
await testStorage.initialize();

Dependencies:

  • Bun: Uses built-in bun:sqlite (no additional install needed)
  • Node.js: Requires better-sqlite3:
npm install better-sqlite3

IndexedDB Provider (Browser)

import { 
  createAnonymizer,
  InMemoryKeyProvider,
  IndexedDBPIIStorageProvider,
} from 'rehydra';

// Custom database name (defaults to 'rehydra-pii-storage')
const storage = new IndexedDBPIIStorageProvider('my-app-pii');

const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' },
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
});
await anonymizer.initialize();

// Use sessions as usual
const session = anonymizer.session('browser-chat-123');
const result = await session.anonymize('Hello John!');
const original = await session.rehydrate(result.anonymizedText);

Session Interface

The session object provides these methods:

interface AnonymizerSession {
  readonly sessionId: string;
  anonymize(text: string, locale?: string, policy?: Partial<AnonymizationPolicy>): Promise<AnonymizationResult>;
  rehydrate(text: string): Promise<string>;
  load(): Promise<StoredPIIMap | null>;
  delete(): Promise<boolean>;
  exists(): Promise<boolean>;
}

Data Retention

Entries are kept until you delete them. Use cleanup() on the storage provider to remove old entries:

// Delete entries older than 7 days
const count = await storage.cleanup(new Date(Date.now() - 7 * 24 * 60 * 60 * 1000));

// Or delete specific sessions
await session.delete();

// List all stored sessions
const sessionIds = await storage.list();
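For reference, the retention semantics above can be sketched with a small in-memory store. `RetentionStoreSketch` is hypothetical. It assumes `cleanup(olderThan)` removes entries last written before the cutoff and returns how many it removed. The real providers may define this differently, for example by creation time.

```typescript
// Hypothetical in-memory store illustrating cleanup(olderThan), list(), and delete.
interface StoredEntry {
  encryptedMap: { ciphertext: string; iv: string; authTag: string };
  updatedAt: Date;
}

class RetentionStoreSketch {
  private readonly entries = new Map<string, StoredEntry>();

  save(sessionId: string, encryptedMap: StoredEntry['encryptedMap'], now = new Date()): void {
    this.entries.set(sessionId, { encryptedMap, updatedAt: now });
  }

  async list(): Promise<string[]> {
    return [...this.entries.keys()];
  }

  async delete(sessionId: string): Promise<boolean> {
    return this.entries.delete(sessionId);
  }

  // Removes entries last written before `olderThan`; returns how many were removed.
  async cleanup(olderThan: Date): Promise<number> {
    let removed = 0;
    for (const [id, entry] of this.entries) {
      if (entry.updatedAt < olderThan) {
        this.entries.delete(id); // deleting during Map iteration is safe in JS
        removed++;
      }
    }
    return removed;
  }
}
```

In a long-running server you might call `cleanup` on a timer, for example once a day.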

Browser Usage

The library works seamlessly in browsers without any special configuration.

Basic Browser Example

<!DOCTYPE html>
<html>
<head>
  <title>PII Anonymization</title>
</head>
<body>
  <script type="module">
    import { 
      createAnonymizer, 
      InMemoryKeyProvider,
      decryptPIIMap,
      rehydrate
    } from './node_modules/rehydra/dist/index.js';
    
    async function demo() {
      // Create anonymizer
      const keyProvider = new InMemoryKeyProvider();
      const anonymizer = createAnonymizer({
        ner: { 
          mode: 'quantized',
          onStatus: (s) => console.log('NER:', s),
          onDownloadProgress: (p) => console.log(`Download: ${p.percent}%`)
        },
        semantic: { enabled: true },
        keyProvider
      });
      
      // Initialize (downloads models on first use)
      await anonymizer.initialize();
      
      // Anonymize
      const result = await anonymizer.anonymize(
        'Contact Maria Schmidt at maria@example.com in Berlin.'
      );
      
      console.log('Anonymized:', result.anonymizedText);
      // "Contact <PII type="PERSON" gender="female" id="1"/> at <PII type="EMAIL" id="2"/> in <PII type="LOCATION" scope="city" id="3"/>."
      
      // Rehydrate
      const key = await keyProvider.getKey();
      const piiMap = await decryptPIIMap(result.piiMap, key);
      const original = rehydrate(result.anonymizedText, piiMap);
      
      console.log('Rehydrated:', original);
      
      await anonymizer.dispose();
    }
    
    demo().catch(console.error);
  </script>
</body>
</html>

Browser Notes

  • First-use downloads: NER model (~280 MB) and semantic data (~12 MB) are downloaded on first use
  • ONNX runtime: Automatically loaded from CDN if not bundled
  • Offline support: After initial download, everything works offline
  • Storage: Uses IndexedDB and OPFS - data persists across sessions

Bundler Support (Vite, webpack, esbuild)

The package uses conditional exports to automatically provide a browser-safe build when bundling for the web. This means:

  • Automatic: Vite, webpack, esbuild, and other modern bundlers will automatically use dist/browser.js
  • No Node.js modules: The browser build excludes SQLitePIIStorageProvider and other Node.js-specific code
  • Tree-shakable: Only the code you use is included in your bundle

// package.json exports (simplified)
{
  "exports": {
    ".": {
      "browser": "./dist/browser.js",
      "node": "./dist/index.js",
      "default": "./dist/index.js"
    }
  }
}

Explicit imports (if needed):

// Browser-only build (excludes SQLite, Node.js fs, etc.)
import { createAnonymizer } from 'rehydra/browser';

// Node.js build (includes everything)
import { createAnonymizer, SQLitePIIStorageProvider } from 'rehydra/node';

// SQLite storage only (Node.js only)
import { SQLitePIIStorageProvider } from 'rehydra/storage/sqlite';

Browser build excludes:

  • SQLitePIIStorageProvider (use IndexedDBPIIStorageProvider instead)
  • Node.js fs, path, os modules

Browser build includes:

  • All recognizers (email, phone, IBAN, etc.)
  • NER model support (with onnxruntime-web)
  • Semantic enrichment
  • InMemoryPIIStorageProvider
  • IndexedDBPIIStorageProvider
  • All crypto utilities

Bun Support

This library works with Bun. Since onnxruntime-node is a native Node.js addon, Bun uses onnxruntime-web:

bun add rehydra onnxruntime-web

Usage is identical - the library auto-detects the runtime.

Performance

Benchmarks on Apple M-series (CPU) and NVIDIA T4 (GPU). Run npm run benchmark:compare to measure on your hardware.

Backend Comparison

| Backend | Short (~40 chars) | Medium (~500 chars) | Long (~2K chars) | Entity-dense |
|---------|-------------------|---------------------|------------------|--------------|
| Regex-only | 0.38 ms | 0.50 ms | 0.91 ms | 0.35 ms |
| NER CPU | 4.3 ms | 26 ms | 93 ms | 13 ms |
| NER GPU | 62 ms | 73 ms | 117 ms | 68 ms |

Local CPU inference is faster than GPU for typical workloads due to network overhead. GPU servers are beneficial for high-throughput batch processing where many requests can be parallelized.

Throughput (ops/sec)

| Backend | Short | Medium | Long |
|---------|-------|--------|------|
| Regex-only | ~2,640 | ~2,017 | ~1,096 |
| NER CPU | ~234 | ~38 | ~11 |
| NER GPU | ~16 | ~14 | ~9 |
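The throughput figures are roughly the reciprocal of the latencies in the previous table: 1000 / 26 ms ≈ 38 ops/sec for medium texts on NER CPU. Assuming requests are processed one at a time, two hypothetical helpers make that arithmetic explicit.

```typescript
// Hypothetical back-of-envelope helpers; both assume strictly sequential processing.
function opsPerSecond(latencyMs: number): number {
  return 1000 / latencyMs;
}

function sequentialSeconds(docCount: number, latencyMs: number): number {
  return (docCount * latencyMs) / 1000;
}
```

At ~93 ms per long document, 1,000 documents take about 93 s sequentially on CPU. Whether the GPU server beats that depends on how many requests it can process in parallel, which single-request numbers like these do not capture.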

Model Downloads

| Model | Size | First-Use Download |
|-------|------|--------------------|
| Quantized NER | ~265 MB | ~30 s on a fast connection |
| Standard NER | ~1.1 GB | ~2 min on a fast connection |
| Semantic Data | ~12 MB | ~5 s on a fast connection |

Recommendations

| Use Case | Recommended Backend |
|----------|---------------------|
| Structured PII only (email, phone, IBAN) | Regex-only |
| General use with name/org/location detection | NER CPU (default) |
| High-throughput batch processing (1000s of docs) | NER GPU |
| Privacy-sensitive / zero-knowledge required | NER CPU (data never leaves device) |

Note: Local CPU inference outperforms the GPU server for most use cases because it has no network round trip. The trie-based tokenizer performs vocabulary lookups in O(token_length) time instead of O(vocab_size), which keeps local inference practical for production use.

Requirements

| Environment | Version | Notes |
|-------------|---------|-------|
| Node.js | >= 18.0.0 | Uses native onnxruntime-node |
| Bun | >= 1.0.0 | Requires onnxruntime-web |
| Browsers | Chrome 86+, Firefox 89+, Safari 15.4+, Edge 86+ | Uses OPFS for model storage |

Development

# Install dependencies
npm install

# Run tests
npm test

# Build
npm run build

# Lint
npm run lint

Building Custom Models

For development or custom models:

# Requires Python 3.8+
npm run setup:ner              # Standard model
npm run setup:ner:quantized    # Quantized model

License

MIT