client-llm-preprocessor
v0.1.0
Privacy-first client-side LLM preprocessing SDK (rule-based + optional WebGPU LLM)
Client-Side LLM Preprocessor 🛡️
Client-Side LLM Preprocessor is a privacy-first JavaScript SDK that preprocesses text entirely within the user's browser. It combines fast rule-based cleaning with optional LLM-based extraction and semantic cleaning.
🌟 Key Features
- 🕵️ Privacy-First: All data stays on the user's local machine. No API keys, no server-side processing.
- 💰 Cost Efficient: Clean and extract data locally to drastically reduce token usage before sending to paid APIs.
- ⚡ Hybrid Processing: High-speed rules for noise removal, LLM for semantic intelligence.
- 🏗️ Structured Extraction: Extract structured data (JSON) directly from messy text.
- 🧩 Flexible Chunking: Intelligent text splitting by length, sentence, or word.
- 🛡️ Hardened & Tested: 60+ tests covering extreme inputs, garbage text, and lifecycle chaos.
- 🔌 Easy Integration: Built-in WebGPU detection and standardized error handling.
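To make the rule-based half of the hybrid approach concrete, here is a minimal sketch of how such cleaning could work with plain regular expressions. The `cleanText` function is hypothetical and is not the SDK's actual implementation; only the option names (`removeHtml`, `removeUrls`, `removeExtraWhitespace`) are borrowed from the Quick Start.

```javascript
// Hypothetical helper for illustration; not the SDK's real code.
function cleanText(text, options = {}) {
  const {
    removeHtml = false,
    removeUrls = false,
    removeExtraWhitespace = false,
  } = options;

  let out = String(text);

  if (removeHtml) {
    // Replace each tag with a space so neighbouring words do not fuse.
    out = out.replace(/<[^>]*>/g, ' ');
  }
  if (removeUrls) {
    // Drop http(s) URLs up to the next whitespace character.
    out = out.replace(/https?:\/\/\S+/g, ' ');
  }
  if (removeExtraWhitespace) {
    // Collapse runs of whitespace and trim both ends.
    out = out.replace(/\s+/g, ' ').trim();
  }
  return out;
}
```

A regex pass like this is fast because it never loads a model, but it is a noise filter rather than an HTML sanitizer, so it should not be relied on for security.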
⚠️ Experimental Project
This is a proof-of-concept / experiment. While the API is stable enough for testing, the performance and reliability are still evolving. Please do not rely on this for critical production workloads yet.
Future Ideas (Roadmap):
- 🙈 PII Scrubbing: Automatically detect and remove personal details (names, phones, emails) client-side before data ever leaves the device.
- ⚡ Optimized WebGPU: Better support for lower-end devices.
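Since PII scrubbing is only a roadmap idea, the following is a rough sketch of what a rule-based first pass might look like, not a description of anything the SDK ships. The `scrubPII` function, the pattern list, and the placeholder labels are all assumptions made for illustration.

```javascript
// Hypothetical sketch: the SDK does not provide this feature yet.
const PII_PATTERNS = [
  // Simple email shape: local-part@domain.tld
  { label: '[EMAIL]', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // Ten-digit phone numbers written as 123-456-7890, 123.456.7890 or 123 456 7890
  { label: '[PHONE]', regex: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g },
];

function scrubPII(text) {
  // Apply each pattern in turn, replacing every match with its label.
  return PII_PATTERNS.reduce(
    (out, { label, regex }) => out.replace(regex, label),
    String(text)
  );
}
```

Patterns like these only catch well-formed emails and one phone format. Names have no reliable regex shape, so detecting them would likely need the LLM path or a dedicated entity recognizer.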
📑 Table of Contents
- Quick Start
- Installation
- Core Concepts
- API Reference
- Project Structure
- Performance
- Browser Requirements
- Contributing
- License
🚀 Quick Start
1. Verify Environment
Always check for WebGPU support before attempting to load LLM models:
import { Preprocessor } from 'client-llm-preprocessor';

const preprocessor = new Preprocessor();

// Check for WebGPU before trying to load any model.
const isSupported = await preprocessor.checkWebGPU();
if (!isSupported) {
  console.warn("WebGPU not supported. Falling back to rule-based cleaning only.");
}

2. Fast Rule-Based Cleaning (No Model Needed)
Clean text instantly without any downloads:
const text = "<html><body>Contact: [email protected] - Visit https://site.com</body></html>";
const cleaned = preprocessor.chunk(text, {
  removeHtml: true,
  removeUrls: true,
  removeExtraWhitespace: true
});
// Result: "Contact: [email protected] -"

3. Smart LLM Extraction (Model Required)
Load a local model to extract structured data:
await preprocessor.loadModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
const resume = "John Doe, Email: [email protected], Phone: 123-456-7890...";
const data = await preprocessor.extract(resume, {
  format: 'json',
  fields: ['name', 'email', 'phone']
});

📦 Installation
npm install client-llm-preprocessor

📂 Project Structure
The project follows a modular and well-documented structure:
local_processing_llm/
├── .github/ # GitHub-specific workflows and templates
├── docs/ # In-depth technical guides & architecture
├── examples/ # Ready-to-run demo pages
├── src/ # Source code
│ ├── preprocess/ # Core logic (clean, chunk, extract)
│ ├── utils/ # Helpers (logger, validation, errors)
│ ├── engine.js # WebLLM wrapper
│ └── index.js # Package entry point
├── tests/ # 60+ automated tests
│ ├── unit/ # Pure logic tests
│ ├── integration/ # Workflow & lifecycle tests
│ └── helpers/ # Test utilities & mocks
├── dist/ # Compiled production build (ESM + Types)
├── package.json # Meta-data & dependencies
└── README.md # You are here

📊 Performance
| Input Size | Rule-Based | LLM-Based |
| :--- | :--- | :--- |
| 10 KB | < 1 ms | 1-3 seconds |
| 1 MB | 12 ms | (Requires Chunking) |
| 10 MB | 180 ms | (Sequential Processing) |
Tip: For a full breakdown of memory usage and speed benchmarks, see BENCHMARKS.md.
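The table notes that larger inputs require chunking and sequential processing on the LLM path. The sketch below shows one way that pattern could look: split by length, then process one chunk at a time. Both helpers (`chunkByLength`, `processSequentially`) are hypothetical names and do not reflect the SDK's own chunker.

```javascript
// Hypothetical helpers for illustration; the SDK's chunker may work differently.
function chunkByLength(text, maxLength) {
  if (!Number.isInteger(maxLength) || maxLength <= 0) {
    throw new RangeError('maxLength must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < text.length; i += maxLength) {
    chunks.push(text.slice(i, i + maxLength));
  }
  return chunks;
}

async function processSequentially(chunks, processChunk) {
  const results = [];
  for (const chunk of chunks) {
    // Await each chunk before starting the next, so only one
    // inference call is in flight at any time.
    results.push(await processChunk(chunk));
  }
  return results;
}
```

Here `processChunk` stands in for any async per-chunk step, such as an extraction call. Splitting purely by length can cut a sentence (or a surrogate pair) in half, which is one reason sentence or word boundaries are often a better fit for LLM input.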
🌐 Browser Requirements
- Local Processing: Any modern browser (Chrome, Firefox, Safari, Edge).
- LLM Features: Requires WebGPU support.
  - ✅ Chrome 113+ (Windows, macOS, Linux)
  - ✅ Edge 113+
  - ⚠️ Safari (Experimental/Partial)
  - ❌ Firefox (In progress by Mozilla)
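For readers curious how WebGPU support can be detected in general, here is a minimal sketch built on the standard `navigator.gpu.requestAdapter()` entry point. The `detectWebGPU` function is a hypothetical name and is not necessarily how the SDK's `checkWebGPU()` works internally.

```javascript
// Sketch based on the standard WebGPU API; not the SDK's own code.
// `nav` is injectable so the function can be exercised outside a browser.
async function detectWebGPU(nav = globalThis.navigator) {
  // Browsers without WebGPU do not expose navigator.gpu at all.
  if (!nav || !nav.gpu) return false;
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return Boolean(adapter);
  } catch {
    return false;
  }
}
```

Checking for an adapter as well as the `gpu` property matters because a browser can expose the API on hardware or drivers it cannot actually use. When the check returns `false`, an app can stay on the rule-based path.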
📖 Useful Documents
- Architecture Overview: How the engine works.
- API Documentation: Full method signatures and options.
- Contributing Guide: How to help improve the project.
- Security Policy: Reporting vulnerabilities.
- Troubleshooting: Solutions for common issues.
⚖️ License
Distributed under the MIT License. See LICENSE for more information.
