llmshield
v0.1.4
Zero-config LLM cost shield. One line. Cut your OpenAI / Azure / Anthropic bill by 30-60%.
Enterprise Edition available — self-hosted gateway, real-time dashboard, multi-tenant policy engine, GDPR/HIPAA compliance, SLA support. Contact [email protected] for licensing and enterprise inquiries.
How it works
Before every LLM call, LLMShield:
- Deduplicates: removes repeated sentences across the conversation history
- Compresses: strips filler phrases, verbose openers, redundant adverbs (EN, FR, ES, IT, DE)
- Condenses: rewrites wordy constructions ("degrees Celsius" → "°C", "white blood cell count" → "WBC", …)
- Trims: enforces a token budget, keeping the system prompt and most-recent messages
Structured content (bullet points, numbered lists, measurements like 38.2°C, 120/80 mmHg) is never touched.
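To make the deduplicate and trim steps concrete, here is a minimal sketch. It is not LLMShield's actual implementation: the function names (`estimateTokens`, `dedupeSentences`, `trimToBudget`) are hypothetical, and tokens are approximated as whitespace-separated words, which a real tokenizer would count differently.

```javascript
// Illustrative sketch only: NOT llmshield's real engine.
// Assumption: one whitespace-separated word counts as one token.
function estimateTokens(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Drop sentences that already appeared earlier in the conversation.
// Splitting only on punctuation followed by whitespace keeps values
// such as "38.2°C" in one piece.
function dedupeSentences(messages) {
  const seen = new Set();
  return messages.map((msg) => {
    const sentences = msg.content.split(/(?<=[.!?])\s+/);
    const kept = sentences.filter((sentence) => {
      const key = sentence.trim().toLowerCase();
      if (key === '' || seen.has(key)) return false;
      seen.add(key);
      return true;
    });
    return { ...msg, content: kept.join(' ') };
  });
}

// Keep every system message, then add the newest other messages
// until the budget would be exceeded.
function trimToBudget(messages, maxTokens) {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  let used = system.reduce((n, m) => n + estimateTokens(m.content), 0);
  const kept = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.unshift(rest[i]);
  }
  return [...system, ...kept];
}

const history = [
  { role: 'system', content: 'Be brief.' },
  { role: 'user', content: 'My name is Ana. I need help.' },
  { role: 'user', content: 'My name is Ana. What is DNS?' },
];
const deduped = dedupeSentences(history);
console.log(deduped[2].content); // What is DNS?
console.log(trimToBudget(deduped, 6).length); // 2
```

In this sketch the second "My name is Ana." is dropped because the same sentence appeared in an earlier message, and the 6-token budget leaves room only for the system prompt and the newest user message.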
Install
npm install llmshield
Integration: which files to change?
If you use the openai or @anthropic-ai/sdk npm packages
One change only — app.js (or your entry point):
// app.js ← very first line, before anything else
require('llmshield/auto');
That's it. Every openai.chat.completions.create() call is automatically optimized. No other files need to change.
If you call the LLM via raw fetch / axios / Azure REST API
The auto-patch cannot intercept raw HTTP calls. You need two small changes; a third step adds optional logging:
1. In your chat controller (e.g. chatController.js) — add at the top:
// ✅ REQUIRED — safe import, app works normally even if llmshield is not installed
let _optimizeMessages;
try { ({ optimizeMessages: _optimizeMessages } = require('llmshield')); } catch { _optimizeMessages = null; }
2. Just before your LLM fetch call, add the optimize block:
// ✅ REQUIRED — optimize messages before sending
if (_optimizeMessages) {
  const result = _optimizeMessages(outgoing.messages);
  if (Array.isArray(result?.messages)) {
    outgoing.messages = result.messages;
  }
}
// optional: log what was saved ↓
3. Log what was saved (optional):
if (_optimizeMessages) {
  const result = _optimizeMessages(outgoing.messages);
  if (Array.isArray(result?.messages)) {
    outgoing.messages = result.messages;
    // ⬇ remove this line if you don't want console output
    console.log(`[llmshield] ${result.savedPercent}% saved (${result.tokensBefore} → ${result.tokensAfter} tokens)`);
  }
}
Then send the request as usual:
const resp = await fetch(url, {
  method: 'POST',
  headers: { 'api-key': apiKey, 'Content-Type': 'application/json' },
  body: JSON.stringify(outgoing), // outgoing.messages is now optimized
});
When to use this: Azure OpenAI REST API, AWS Bedrock, Google Vertex, any custom LLM proxy.
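If you prefer to keep the optimize-and-log logic in one place, the steps above could be folded into a small helper. This is a sketch under assumptions: `applyOptimizer` is a hypothetical name, not part of llmshield, and it relies only on the `messages`, `savedPercent`, `tokensBefore`, and `tokensAfter` fields shown in the snippets above. Unlike those snippets, it returns a copy instead of mutating `outgoing`, and it falls back to the original payload if the optimizer throws.

```javascript
// Hypothetical helper (not part of llmshield).
// optimize: the optimizeMessages function, or null if llmshield is not installed.
// log: optional logger, e.g. console.log; defaults to a no-op.
function applyOptimizer(outgoing, optimize, log = () => {}) {
  if (typeof optimize !== 'function') return outgoing;
  let result;
  try {
    result = optimize(outgoing.messages);
  } catch (err) {
    // Never let an optimization failure block the real request.
    log(`[llmshield] optimization skipped: ${err.message}`);
    return outgoing;
  }
  if (!Array.isArray(result?.messages)) return outgoing;
  log(`[llmshield] ${result.savedPercent}% saved (${result.tokensBefore} → ${result.tokensAfter} tokens)`);
  return { ...outgoing, messages: result.messages };
}
```

With the safe import from step 1 in place, the call site would then read `const body = JSON.stringify(applyOptimizer(outgoing, _optimizeMessages, console.log));`.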
Usage — other options
Auto-patch (openai / anthropic SDK only)
// app.js — first line
require('llmshield/auto');
Explicit wrap
const OpenAI = require('openai'); // the snippet constructs an OpenAI client, so import it
const { wrap } = require('llmshield');
const openai = wrap(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }));
await openai.chat.completions.create({ model: 'gpt-4o', messages });
Manual
const { optimizeMessages } = require('llmshield');
const { messages, savedPercent, tokensBefore, tokensAfter } = optimizeMessages(rawMessages);
// use optimized messages in your LLM call
Configuration
All options can be set via environment variables (auto mode) or passed as an options object (manual/wrap mode).
| Env var | Default | Description |
|---------|---------|-------------|
| LLMSHIELD_DEBUG=true | false | Log savings per request to stdout |
| LLMSHIELD_MAX_TOKENS | 8192 | Hard token budget for trimming |
| LLMSHIELD_CONTEXT_WINDOW | 8192 | Context window for dynamic limit calculation |
| LLMSHIELD_DEDUP=false | true | Disable deduplication |
| LLMSHIELD_COMPRESS=false | true | Disable compression |
| LLMSHIELD_OUTPUT_CONSTRAINT=false | true | Disable injecting a concise-output system hint |
| LLMSHIELD_DYNAMIC_LIMIT=false | true | Disable dynamic max_tokens calculation |
| LLMSHIELD_KEY | — | API key to send savings stats to the cloud dashboard |
| LLMSHIELD_URL | — | Self-hosted reporting endpoint (must be https://) |
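The table lists each boolean variable with the value that changes its default, which can be easy to misread. The sketch below restates the defaults as code. It is an assumption-laden illustration: `resolveShieldSettings` and the returned property names are hypothetical, and how llmshield itself parses these variables is not documented here.

```javascript
// Hypothetical helper: mirrors the defaults from the table above.
// It does NOT reflect llmshield's internal parsing or option names.
function resolveShieldSettings(env = process.env) {
  // 'true' / 'false' override the default; anything else keeps it.
  const flag = (name, fallback) =>
    env[name] === 'true' ? true : env[name] === 'false' ? false : fallback;
  // Positive integers override the default; anything else keeps it.
  const int = (name, fallback) => {
    const n = Number.parseInt(env[name], 10);
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };
  return {
    debug: flag('LLMSHIELD_DEBUG', false),
    maxTokens: int('LLMSHIELD_MAX_TOKENS', 8192),
    contextWindow: int('LLMSHIELD_CONTEXT_WINDOW', 8192),
    dedup: flag('LLMSHIELD_DEDUP', true),
    compress: flag('LLMSHIELD_COMPRESS', true),
    outputConstraint: flag('LLMSHIELD_OUTPUT_CONSTRAINT', true),
    dynamicLimit: flag('LLMSHIELD_DYNAMIC_LIMIT', true),
  };
}

// With nothing set, every optimization is on and debug logging is off.
console.log(resolveShieldSettings({}).dedup); // true
console.log(resolveShieldSettings({ LLMSHIELD_DEDUP: 'false' }).dedup); // false
```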
GDPR / HIPAA — PII Redaction
The scrubber runs before any content leaves your process:
LLMSHIELD_GDPR=true # redact emails, phones, credit cards, SSNs
LLMSHIELD_HIPAA=true # also redact MRNs, NPIs, dates of birth, IPs
Only user messages are scrubbed. System and assistant messages are never altered.
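As a rough picture of what scrubbing only user messages means, here is a minimal sketch. The patterns, placeholder labels, and the `scrubUserMessages` name are all assumptions made for illustration. They are not llmshield's actual rules and are far too simple for real compliance work.

```javascript
// Illustrative patterns only: NOT llmshield's real scrubber, and not exhaustive.
const PII_PATTERNS = [
  { label: '[EMAIL]', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: '[SSN]', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// Redact matches in user messages; leave system and assistant messages as-is.
function scrubUserMessages(messages) {
  return messages.map((msg) => {
    if (msg.role !== 'user' || typeof msg.content !== 'string') return msg;
    const content = PII_PATTERNS.reduce(
      (text, { label, regex }) => text.replace(regex, label),
      msg.content,
    );
    return { ...msg, content };
  });
}

const scrubbed = scrubUserMessages([
  { role: 'system', content: 'Escalate to admin@example.com.' },
  { role: 'user', content: 'Reach me at ana@example.com, SSN 123-45-6789.' },
]);
console.log(scrubbed[0].content); // Escalate to admin@example.com.
console.log(scrubbed[1].content); // Reach me at [EMAIL], SSN [SSN].
```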
Benchmarks
Tested across real-world prompt types (gpt-4o pricing: $0.0025 / 1K input tokens).
| Prompt type | Tokens before | Tokens after | Savings |
|-------------|---------------|--------------|---------|
| Verbose medical | 283 | 158 | 44% |
| Verbose chat | 108 | 62 | 43% |
| Coding question | 69 | 41 | 41% |
| CRISPR explanation | 100 | 63 | 37% |
| French medical | 50 | 27 | 46% |
| Medical w/ measurements | 99 | 87 | 12% |
| Repetitive prompt | 52 | 45 | 13% |
| Already concise | 29 | 29 | 0% (intentionally skipped) |
| Short prompt | 13 | 13 | 0% (intentionally skipped) |
Key points:
- Verbose, conversational, and medical prompts: 37–46% savings
- Already-concise and very short prompts: skipped automatically (no degradation)
- All medical measurements (38.2°C, 120/80 mmHg, WBC) preserved across all tests
- 0 critical grammar artifacts across all outputs
- Languages tested: English, French (ES, IT, DE patterns also covered)
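To translate the token counts above into money, the arithmetic can be sketched as follows. It uses the gpt-4o input price quoted above ($0.0025 per 1K input tokens) and covers input tokens only. The function names and the request volume of 1,000,000 are hypothetical, chosen purely for illustration.

```javascript
// Price from the benchmark note: $0.0025 per 1K input tokens (gpt-4o).
const PRICE_PER_1K_INPUT_TOKENS = 0.0025;

// Percentage of input tokens removed, rounded as in the table.
function percentSaved(tokensBefore, tokensAfter) {
  return Math.round(((tokensBefore - tokensAfter) / tokensBefore) * 100);
}

// Estimated input-cost saving in USD for a given number of requests.
function estimateSavingsUsd(tokensBefore, tokensAfter, requests = 1) {
  const savedTokens = (tokensBefore - tokensAfter) * requests;
  return (savedTokens / 1000) * PRICE_PER_1K_INPUT_TOKENS;
}

// "Verbose medical" row: 283 → 158 tokens.
console.log(percentSaved(283, 158)); // 44
// Hypothetical volume of 1,000,000 such requests.
console.log(estimateSavingsUsd(283, 158, 1_000_000).toFixed(2)); // 312.50
```

Per request the saving is a fraction of a cent (125 tokens × $0.0025 / 1000), so the benefit depends heavily on request volume and on how verbose your prompts are.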
Supported SDKs
| SDK | Versions |
|-----|----------|
| openai | ≥ 4.0.0 |
| @anthropic-ai/sdk | ≥ 0.20.0 |
| Azure OpenAI (AzureOpenAI) | ≥ 4.0.0 |
Enterprise Edition
The open-source package covers client-side optimization.
The LLMShield Enterprise platform adds:
| Feature | Description |
|---------|-------------|
| Gateway server | Drop-in OpenAI-compatible proxy — just change base_url |
| Real-time dashboard | Token savings, cost trends, per-model and per-team breakdown |
| Multi-tenant policy engine | Token budgets, model allow-lists, rate limiting per team / API key |
| Audit log | Full request history with retention controls |
| GDPR / HIPAA compliance | PII redaction, no-body-logging mode, configurable data retention |
| SSO / RBAC | Single sign-on, role-based access control |
| SLA + Priority support | Dedicated support, uptime guarantee, custom deployment |
Enterprise is delivered under a commercial license and can be deployed on-premises or in your own cloud.
Contact: [email protected]
Disclaimer
This package is provided as-is, under the MIT License, without warranty of any kind — express or implied — including but not limited to warranties of merchantability, fitness for a particular purpose, or non-infringement.
Token optimization inherently modifies message content. While the engine is designed to preserve semantic meaning, LLMShield does not guarantee that optimized prompts will produce identical LLM responses.
Use in production is at your own risk. You are responsible for validating the output quality for your specific use case, particularly in regulated domains (medical, legal, financial). For compliance-critical deployments, evaluate the Enterprise Edition with its audit and policy controls.
The authors shall not be held liable for any damages, direct or indirect, arising from the use of this software.
License
Free for personal and non-commercial use. See LICENSE for full terms.
Commercial use (in a paid product, SaaS, or revenue-generating service) requires a commercial license.
Enterprise Edition with gateway, dashboard, and SLA support is available under a separate commercial agreement.
Contact: [email protected]
