# CloakLLM

v0.2.2

PII cloaking and tamper-evident audit logs for LLM API calls.
CloakLLM intercepts your LLM API calls, detects and cloaks PII before it reaches the provider, and logs every event to a tamper-evident audit chain designed for EU AI Act Article 12 compliance.
Also available for Python: `pip install cloakllm` (includes spaCy NER for name/org/location detection). See CloakLLM Python.
## Install

```shell
npm install cloakllm
```

## Quick Start
### With OpenAI SDK (one line)

```js
const cloakllm = require('cloakllm');
const OpenAI = require('openai');

const client = new OpenAI();
cloakllm.enable(client); // That's it. All calls are now cloaked.

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{
    role: 'user',
    content: 'Write a reminder for alice@example.com about the Q3 audit'
  }]
});

// Provider never saw "alice@example.com"
// Response has the real email restored automatically
```

### With Vercel AI SDK
```js
const { createCloakLLMMiddleware } = require('cloakllm');
const { generateText, wrapLanguageModel } = require('ai');
const { openai } = require('@ai-sdk/openai');

const middleware = createCloakLLMMiddleware();
const model = wrapLanguageModel({ model: openai('gpt-4o'), middleware });

const { text } = await generateText({
  model,
  prompt: 'Write a reminder for alice@example.com about the Q3 audit'
});

// Provider never saw "alice@example.com"
// Response has the real email restored automatically
```

Works with any AI SDK provider (OpenAI, Anthropic, Google, Mistral, etc.) and supports both `generateText` and `streamText`.
## Standalone

```js
const { Shield } = require('cloakllm');

const shield = new Shield();
const [sanitized, tokenMap] = shield.sanitize(
  'Send report to alice@example.com, SSN 123-45-6789'
);
// sanitized: "Send report to [EMAIL_0], SSN [SSN_0]"

// ... send sanitized text to any LLM ...

const restored = shield.desanitize(llmResponse, tokenMap);
// Original values restored
```

## Redaction Mode (irreversible)
```js
const { Shield, ShieldConfig } = require('cloakllm');

const shield = new Shield(new ShieldConfig({ mode: 'redact' }));
const [redacted] = shield.sanitize('Email alice@example.com about Sarah Johnson');
// redacted: "Email [EMAIL_REDACTED] about [PERSON_REDACTED]"
// No token map stored — cannot be reversed
```

## Entity Details (compliance metadata)
```js
const { Shield } = require('cloakllm');

const shield = new Shield();
const [sanitized, tokenMap] = shield.sanitize('Email a@example.com, SSN 123-45-6789');

// Per-entity metadata (no original text — PII-safe)
console.log(tokenMap.entityDetails);
// [
//   { category: 'EMAIL', start: 6, end: 19, length: 13, confidence: 0.95, source: 'regex', token: '[EMAIL_0]' },
//   { category: 'SSN', start: 25, end: 36, length: 11, confidence: 0.95, source: 'regex', token: '[SSN_0]' }
// ]

// Full report for dashboards
console.log(tokenMap.toReport());
```

## What It Detects
| Category | Examples | Method |
|----------|----------|--------|
| EMAIL | alice@example.com | Regex |
| SSN | 123-45-6789 | Regex |
| CREDIT_CARD | 4111111111111111 | Regex |
| PHONE | +1-555-0142 | Regex |
| IP_ADDRESS | 192.168.1.1 | Regex |
| API_KEY | sk_live_abc123... | Regex |
| AWS_KEY | AKIAIOSFODNN7EXAMPLE | Regex |
| JWT | eyJhbG... | Regex |
| IBAN | DE89370400440532013000 | Regex |
| PERSON | John Smith | LLM (Local) |
| ORG | Acme Corp, Google | LLM (Local) |
| GPE | New York, Israel | LLM (Local) |
| ADDRESS | 742 Evergreen Terrace | LLM (Local) |
| DATE_OF_BIRTH | 1990-01-15 | LLM (Local) |
| MEDICAL | diabetes mellitus | LLM (Local) |
| FINANCIAL | account 4521-XXX | LLM (Local) |
| NATIONAL_ID | TZ 12345678 | LLM (Local) |
| BIOMETRIC | fingerprint hash | LLM (Local) |
| USERNAME | @johndoe42 | LLM (Local) |
| PASSWORD | P@ssw0rd123 | LLM (Local) |
| VEHICLE | plate ABC-1234 | LLM (Local) |
LLM categories require opt-in (`llmDetection: true`) and a local Ollama instance. Data never leaves your machine.
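The structured categories in the table are found with regular expressions. The following is an illustrative sketch of that approach (these are simplified patterns, not cloakllm's actual expressions):

```js
// Simplified, illustrative patterns — real-world detectors are more robust.
const PATTERNS = {
  EMAIL: /[\w.+-]+@[\w-]+\.[\w.]+/g,
  SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
};

// Scan text and return entities in the same shape as entityDetails.
function scan(text) {
  const entities = [];
  for (const [category, pattern] of Object.entries(PATTERNS)) {
    for (const match of text.matchAll(pattern)) {
      entities.push({
        category,
        start: match.index,
        end: match.index + match[0].length,
        length: match[0].length,
      });
    }
  }
  return entities;
}

const found = scan('Email alice@example.com, SSN 123-45-6789');
console.log(found.map(e => e.category)); // [ 'EMAIL', 'SSN' ]
```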
## How It Works

```text
Your app:       "Email alice@example.com about Project Falcon"
Provider sees:  "Email [EMAIL_0] about Project Falcon"
You receive:    Original email restored in the response
```

1. **Detect** — regex patterns find structured PII (emails, SSNs, credit cards, etc.)
2. **Cloak** — replace with deterministic tokens: `[EMAIL_0]`, `[SSN_0]`
3. **Log** — write to a hash-chained audit trail (each entry includes the previous entry's SHA-256 hash)
4. **Uncloak** — restore original values in the LLM response
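The cloak/uncloak steps above can be sketched in a few lines of plain Node.js (a minimal illustration of the technique, not cloakllm's implementation):

```js
// Replace each detected value with a deterministic token, keep a map,
// and restore the originals after the (simulated) LLM call.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;

function cloak(text) {
  const tokenMap = {};
  let i = 0;
  const sanitized = text.replace(EMAIL_RE, (value) => {
    const token = `[EMAIL_${i++}]`;
    tokenMap[token] = value;
    return token;
  });
  return [sanitized, tokenMap];
}

function uncloak(text, tokenMap) {
  let out = text;
  for (const [token, value] of Object.entries(tokenMap)) {
    out = out.split(token).join(value);
  }
  return out;
}

const [sanitized, tokenMap] = cloak('Email alice@example.com about Project Falcon');
console.log(sanitized); // "Email [EMAIL_0] about Project Falcon"

const llmResponse = 'Reminder sent to [EMAIL_0].'; // pretend the LLM echoed the token
console.log(uncloak(llmResponse, tokenMap)); // "Reminder sent to alice@example.com."
```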
## Tamper-Evident Audit Chain

Every event is logged to JSONL files with hash chaining:

```json
{
  "seq": 42,
  "event_type": "sanitize",
  "entity_count": 3,
  "categories": {"EMAIL": 1, "SSN": 1, "PHONE": 1},
  "prompt_hash": "sha256:9f86d0...",
  "entity_details": [
    {"category": "EMAIL", "start": 0, "end": 13, "length": 13, "confidence": 0.95, "source": "regex", "token": "[EMAIL_0]"},
    {"category": "SSN", "start": 15, "end": 26, "length": 11, "confidence": 0.95, "source": "regex", "token": "[SSN_0]"},
    {"category": "PHONE", "start": 28, "end": 40, "length": 12, "confidence": 0.95, "source": "regex", "token": "[PHONE_0]"}
  ],
  "prev_hash": "sha256:7c4d2e...",
  "entry_hash": "sha256:b5e8f3..."
}
```

Modify any entry and every subsequent hash breaks. Verify with:

```shell
npx cloakllm verify ./cloakllm_audit/
```

## CLI
```shell
# Scan text for PII
npx cloakllm scan "Email alice@example.com, SSN 123-45-6789"

# Verify audit chain integrity
npx cloakllm verify ./cloakllm_audit/

# Show audit statistics
npx cloakllm stats ./cloakllm_audit/
```

## Configuration
```js
const { Shield, ShieldConfig } = require('cloakllm');

const shield = new Shield(new ShieldConfig({
  detectEmails: true,          // default: true
  detectPhones: true,
  detectSsns: true,
  detectCreditCards: true,
  detectApiKeys: true,
  detectIpAddresses: true,
  detectIban: true,

  logDir: './my-audit-logs',   // default: ./cloakllm_audit
  auditEnabled: true,          // default: true
  skipModels: ['ollama/'],     // skip local models

  customPatterns: [
    { name: 'EMPLOYEE_ID', pattern: 'EMP-\\d{6}' }
  ],

  // LLM detection (opt-in, requires Ollama)
  llmDetection: true,                      // enable LLM-based detection
  llmModel: 'llama3.2',                    // Ollama model
  llmOllamaUrl: 'http://localhost:11434',  // Ollama endpoint
  llmTimeout: 10000,                       // timeout in ms
  llmConfidence: 0.85,                     // confidence score for LLM detections
}));
```

## EU AI Act Compliance
Article 12 of the EU AI Act requires automatic event logging (record-keeping) for high-risk AI systems, with logs that can demonstrate integrity. Enforcement begins August 2, 2026. CloakLLM provides:

- Hash-chained logs — cryptographically linked; any modification breaks the chain
- O(n) verification — `cloakllm verify` audits the entire chain
- No PII in logs — only hashes and token counts are logged (original values never stored)
- Event-level detail — every sanitize/desanitize event is recorded
## Roadmap
- [x] NER-based detection (names, orgs, locations) via local LLM
- [x] Local LLM detection (opt-in, via Ollama)
- [x] Streaming response support
- [x] Vercel AI SDK middleware
- [x] Redaction / scrubbing mode
- [x] Field-level PII metadata (entityDetails)
- [ ] LangChain.js integration
- [ ] OpenTelemetry span emission
- [ ] RFC 3161 trusted timestamping
## License
MIT — See LICENSE.
## See Also
- CloakLLM Python — Python version with spaCy NER + LiteLLM integration
