client-llm-preprocessor

v0.1.0

Published

2 months ago

Privacy-first client-side LLM preprocessing SDK (rule-based + optional WebGPU LLM)

Client-Side LLM Preprocessor 🛡️

Client-Side LLM Preprocessor is a privacy-first JavaScript SDK that enables powerful text preprocessing entirely within the user's browser. It combines high-speed rule-based cleaning with optional high-reasoning LLM-based extraction and semantic cleaning.

🌟 Key Features

🕵️ Privacy-First: All data stay on the user's local machine. No API keys, no server-side processing.
💰 Cost Efficient: Clean and extract data locally to drastically reduce token usage before sending to paid APIs.
⚡ Hybrid Processing: High-speed rules for noise removal, LLM for semantic intelligence.
🏗️ Structured Extraction: Extract structured data (JSON) directly from messy text.
🧩 Flexible Chunking: Intelligent text splitting by length, sentence, or word.
🛡️ Hardened & Tested: 60+ tests covering extreme inputs, garbage text, and lifecycle chaos.
🔌 Easy Integration: Built-in WebGPU detection and standardized error handling.

⚠️ Experimental Project

This is a proof-of-concept / experiment. While the API is stable enough for testing, the performance and reliability are still evolving. Please do not rely on this for critical production workloads yet.

Future Ideas (Roadmap):

🙈 PII Scrubbing: Automatically detect and remove personal details (names, phones, emails) client-side before data ever leaves the device.
⚡ Optimized WebGPU: Better support for lower-end devices.

📑 Table of Contents

🚀 Quick Start

1. Verify Environment

Always check for WebGPU support before attempting to load LLM models:

import { Preprocessor } from 'client-llm-preprocessor';

const preprocessor = new Preprocessor();
const isSupported = await preprocessor.checkWebGPU();

if (!isSupported) {
    console.warn("WebGPU not supported. Falling back to rule-based cleaning only.");
}

2. Fast Rule-Based Cleaning (No Model Needed)

Clean text instantly without any downloads:

const text = "<html><body>Contact: [email protected]   -  Visit https://site.com</body></html>";
const cleaned = preprocessor.chunk(text, {
    removeHtml: true,
    removeUrls: true,
    removeExtraWhitespace: true
});
// Result: "Contact: [email protected] -"

3. Smart LLM Extraction (Model Required)

Load a local model to extract structured data:

await preprocessor.loadModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');

const resume = "John Doe, Email: [email protected], Phone: 123-456-7890...";
const data = await preprocessor.extract(resume, {
    format: 'json',
    fields: ['name', 'email', 'phone']
});

📦 Installation

npm install client-llm-preprocessor

📂 Project Structure

The project follows a modular and well-documented structure:

local_processing_llm/
├── .github/               # GitHub-specific workflows and templates
├── docs/                  # In-depth technical guides & architecture
├── examples/              # Ready-to-run demo pages
├── src/                   # Source code
│   ├── preprocess/        # Core logic (clean, chunk, extract)
│   ├── utils/             # Helpers (logger, validation, errors)
│   ├── engine.js          # WebLLM wrapper
│   └── index.js           # Package entry point
├── tests/                 # 60+ automated tests
│   ├── unit/              # Pure logic tests
│   ├── integration/       # Workflow & lifecycle tests
│   └── helpers/           # Test utilities & mocks
├── dist/                  # Compiled production build (ESM + Types)
├── package.json           # Meta-data & dependencies
└── README.md              # You are here

📊 Performance

| Input Size | Rule-Based | LLM-Based | | :--- | :--- | :--- | | 10 KB | < 1ms | 1-3 seconds | | 1 MB | 12ms | (Requires Chunking) | | 10 MB | 180ms | (Sequential Processing) |

[!TIP] For a full breakdown of memory usage and speed benchmarks, see BENCHMARKS.md.

🌐 Browser Requirements

Local Processing: Any modern browser (Chrome, Firefox, Safari, Edge).
LLM Features: Requires WebGPU support.
- ✅ Chrome 113+ (Windows, macOS, Linux)
- ✅ Edge 113+
- ⚠️ Safari (Experimental/Partial)
- ❌ Firefox (In progress by Mozilla)

📖 Useful Documents

Architecture Overview: How the engine works.
API Documentation: Full method signatures and options.
Contributing Guide: How to help improve the project.
Security Policy: Reporting vulnerabilities.
Troubleshooting: Solutions for common issues.

⚖️ License

Distributed under the MIT License. See LICENSE for more information.