compress-lightreach
v1.0.6
Compress Light Reach
AI cost management SDK with intelligent model routing, prompt compression, and real-time token tracking
Compress Light Reach is a Node.js/TypeScript SDK that provides intelligent model routing and prompt compression for LLM applications, reducing token usage and costs while maintaining quality.
Features
- Intelligent Model Routing: Automatically selects optimal model based on quality requirements (HLE) and available provider keys
- Token-aware Compression: Replaces repeated substrings with shorter placeholders using a fast greedy algorithm
- Lossless: Perfect decompression guaranteed
- Output Compression: Optional model output compression support
- Cloud API: Uses Light Reach's cloud service for compression and routing
- Multi-provider Support: OpenAI, Anthropic, Google, DeepSeek, Moonshot
- TypeScript: Full TypeScript support with type definitions
- BYOK: Provider API keys managed securely in dashboard (never passed through SDK)
Installation
npm install compress-lightreach
or
yarn add compress-lightreach
Quick Start
The SDK uses intelligent model routing and targets POST /api/v2/complete.
- Authenticate with your LightReach API key (env var PCOMPRESLR_API_KEY or LIGHTREACH_API_KEY)
- Manage provider keys (OpenAI/Anthropic/Google/etc.) in the dashboard (BYOK)
- System automatically selects optimal model based on your requirements
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
const result = await client.complete({
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Explain quantum computing in simple terms.' },
],
desired_hle: 30, // Quality ceiling (0-100). Current SOTA is ~40%.
});
console.log(result.decompressed_response);
console.log(`Selected: ${result.routing_info?.selected_model}`);
console.log(`Token savings: ${result.compression_stats.token_savings}`);
OpenAI-compatible API (Cursor / OpenAI SDKs)
LightReach also exposes a strict OpenAI-compatible surface (including streaming SSE) so you can use standard OpenAI tooling without changing your app.
- Cursor base URL: https://compress.lightreach.io/v1/cursor
- Generic OpenAI-compatible base URL: https://compress.lightreach.io/v1
- Endpoints: GET /models, POST /chat/completions
- Model id: lightreach
Example (cURL):
curl -sS https://compress.lightreach.io/v1/chat/completions \
-H "Authorization: Bearer lr_your_lightreach_key" \
-H "Content-Type: application/json" \
-d '{
"model": "lightreach",
"messages": [{"role":"user","content":"Say hello"}],
"stream": true
}'
With Output Compression
const result = await client.complete({
messages: [{ role: 'user', content: 'Generate a long report...' }],
desired_hle: 25,
compress_output: true,
});
console.log(result.decompressed_response);
Intelligent Model Routing
The system automatically selects the optimal model based on quality requirements and your available provider keys:
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
// Cross-provider optimization: system picks cheapest model meeting your quality bar
const result = await client.complete({
messages: [{ role: 'user', content: 'Explain quantum computing' }],
desired_hle: 30, // Quality ceiling (0-100). Current SOTA is ~40%.
});
// Check what was selected
console.log(result.routing_info?.selected_model); // e.g., "gpt-4o-mini"
console.log(result.routing_info?.selected_provider); // e.g., "openai"
console.log(result.routing_info?.model_hle); // e.g., 32.5
console.log(result.routing_info?.model_price_per_million); // e.g., 0.15
Provider-Constrained Routing
Optionally constrain to a specific provider:
// Only use OpenAI models, but pick the cheapest one meeting HLE 35
const result = await client.complete({
messages: [{ role: 'user', content: 'Write a poem' }],
llm_provider: 'openai', // Optional: constrain to one provider
desired_hle: 35,
});
HLE Cascading with Admin Controls
Admins can set quality ceilings via the dashboard (global or per-tag) to control costs. Your desired_hle is a preference: if it exceeds an admin-set ceiling, the value is silently clamped to the ceiling and the request proceeds.
// Admin set global HLE ceiling to 30%
// Requesting above the ceiling will be clamped to 30 (no error)
const result = await client.complete({
messages: [{ role: 'user', content: 'Process payment' }],
desired_hle: 35, // Will be clamped down to 30
tags: { env: 'production' },
});
// Correct usage: request within ceiling
const result = await client.complete({
messages: [{ role: 'user', content: 'Process payment' }],
desired_hle: 25, // OK: below ceiling of 30
tags: { env: 'production' },
});
// Check if your HLE was lowered by an admin ceiling
if (result.routing_info?.hle_clamped) {
console.log(`HLE lowered from ${result.routing_info.requested_hle} ` +
`to ${result.routing_info.effective_hle} ` +
`by ${result.routing_info.hle_source}-level ceiling`);
}
With Compression Config
Configure per-role compression settings:
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
const result = await client.complete({
messages: [{ role: 'user', content: 'Hello!' }],
desired_hle: 30,
compress: true,
compress_output: false,
compression_config: {
compress_system: false,
compress_user: true,
compress_assistant: false,
compress_only_last_n_user: 1,
},
temperature: 0.7,
max_tokens: 1000,
tags: { env: 'production' },
});
console.log(result.decompressed_response);
console.log(`Model used: ${result.routing_info?.selected_model}`);
Compression Only (No LLM Call)
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
// Compress text without making an LLM call
const compressed = await client.compress(
"Your text with repeated content here...",
"gpt-4", // Model for tokenization
{ env: 'dev' } // Optional tags
);
console.log(compressed.llm_format);
console.log(`Compression ratio: ${compressed.compression_ratio}`);
// Decompress later
const decompressed = await client.decompress(compressed.llm_format);
console.log(decompressed.decompressed);
Command Line Interface
# Set your API key
export PCOMPRESLR_API_KEY=your-api-key
# Compress a prompt
npx pcompresslr "Your prompt with repeated text here..."
API Reference
PcompresslrAPIClient
Main API client for intelligent model routing and compression.
Constructor
new PcompresslrAPIClient(apiKey?: string, apiUrl?: string, timeout?: number)
Parameters:
- apiKey (string, optional): LightReach API key. Falls back to LIGHTREACH_API_KEY or PCOMPRESLR_API_KEY env vars.
- apiUrl (string, optional): Override base API URL. Falls back to PCOMPRESLR_API_URL env var. Default: https://api.compress.lightreach.io
- timeout (number, optional): Request timeout in milliseconds. Default: 900000 (15 minutes)
Methods
complete(request: CompleteV2Request): Promise<CompleteResponse>
Messages-first completion with intelligent routing (POST /api/v2/complete).
Request Parameters (CompleteV2Request):
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| messages | Message[] | required | Conversation history with role and content |
| llm_provider | 'openai' \| 'anthropic' \| 'google' \| 'deepseek' \| 'moonshot' | — | Optional provider constraint. Omit for cross-provider optimization |
| desired_hle | number | — | Quality ceiling (0-100). If above an admin ceiling, it is clamped down |
| compress | boolean | true | Whether to compress messages |
| compress_output | boolean | false | Whether to request compressed output from LLM |
| compression_config | object | — | Per-role compression settings (see below) |
| temperature | number | — | LLM temperature parameter |
| max_tokens | number | — | Maximum tokens to generate |
| tags | Record<string, string> | — | Tags for cost attribution and tag-level HLE ceilings |
| max_history_messages | number | — | Limit conversation history length |
compression_config options:
{
compress_system?: boolean; // default: false
compress_user?: boolean; // default: true
compress_assistant?: boolean; // default: false
compress_only_last_n_user?: number | null; // default: 1
}
Response (CompleteResponse):
{
decompressed_response: string; // Final decompressed LLM response
compression_stats: {
compression_enabled: boolean;
original_tokens: number;
compressed_tokens: number;
token_savings: number;
compression_ratio: number;
token_count_exact?: boolean;
token_count_source?: string;
token_accounting_note?: string;
processing_time_ms?: number;
};
llm_stats: {
provider?: string;
model?: string;
input_tokens: number;
output_tokens: number;
total_tokens: number;
finish_reason?: string | null;
};
routing_info?: {
selected_model: string; // Model chosen by system
selected_provider: string; // Provider chosen by system
selected_model_id: string;
model_hle: number; // HLE score of selected model
model_price_per_million: number;
requested_hle: number | null;
effective_hle: number | null; // Effective HLE after admin ceilings
hle_source: 'request' | 'tag' | 'global' | 'none';
hle_clamped: boolean; // true if admin ceiling lowered your desired_hle
};
warnings?: string[];
// Convenience aliases
text?: string; // Alias for decompressed_response
tokens_saved?: number; // Alias for compression_stats.token_savings
tokens_used?: number; // Alias for llm_stats.total_tokens
compression_ratio?: number; // Alias for compression_stats.compression_ratio
}
compress(prompt, model?, tags?): Promise<CompressResponse>
Compression-only (POST /api/v1/compress).
Also supports a legacy call shape: compress(prompt, model, algorithm, tags?) (only "greedy" is supported).
Parameters:
- prompt (string, required): Text to compress
- model (string, optional): Model for tokenization. Default: 'gpt-4'
- algorithm ("greedy", optional): Legacy-only parameter. Only "greedy" is supported.
- tags (Record<string, string>, optional): Tags for attribution
Response (CompressResponse):
{
compressed: string;
dictionary: Record<string, string>;
llm_format: string;
compression_ratio: number;
original_size: number;
compressed_size: number;
processing_time_ms: number;
algorithm: string;
}
decompress(llmFormat): Promise<DecompressResponse>
Decompress an LLM-formatted compressed prompt (POST /api/v1/decompress).
Parameters:
- llmFormat (string, required): The llm_format string from a compress response
Response (DecompressResponse):
{
decompressed: string;
processing_time_ms: number;
}
healthCheck(): Promise<HealthCheckResponse>
Check API health status (GET /health).
Response:
{
status: string;
version?: string;
}
Message Types
type MessageRole = 'system' | 'developer' | 'user' | 'assistant';
interface Message {
role: MessageRole;
content: string;
}
Environment Variables
| Variable | Description |
|----------|-------------|
| PCOMPRESLR_API_KEY | Your LightReach API key (primary) |
| LIGHTREACH_API_KEY | Your LightReach API key (alternative) |
| PCOMPRESLR_API_URL | Override the API base URL (advanced/testing) |
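The precedence in the table can be sketched as follows. Note that resolveConfig is a hypothetical helper written for illustration, not the SDK's actual internals; it mirrors the documented fallback order (explicit argument, then PCOMPRESLR_API_KEY, then LIGHTREACH_API_KEY, with PCOMPRESLR_API_URL overriding the default base URL):

```typescript
// Minimal declaration so the sketch is self-contained without @types/node.
declare const process: { env: Record<string, string | undefined> };

// Hypothetical helper mirroring the documented credential precedence.
function resolveConfig(apiKey?: string, apiUrl?: string): { key: string; url: string } {
  const key =
    apiKey ??
    process.env.PCOMPRESLR_API_KEY ??
    process.env.LIGHTREACH_API_KEY;
  if (!key) {
    throw new Error(
      "Missing API key: pass one explicitly or set PCOMPRESLR_API_KEY / LIGHTREACH_API_KEY"
    );
  }
  const url =
    apiUrl ??
    process.env.PCOMPRESLR_API_URL ??
    "https://api.compress.lightreach.io";
  return { key, url };
}
```

An explicit constructor argument always wins over the environment, which is what lets the same code run unchanged in CI (env var) and in tests (explicit key).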
Exceptions
| Exception | Description |
|-----------|-------------|
| PcompresslrAPIError | Base exception class |
| APIKeyError | Invalid or missing API key |
| RateLimitError | Rate limit exceeded |
| APIRequestError | General API errors (including routing failures) |
import { APIKeyError, RateLimitError, APIRequestError } from 'compress-lightreach';
try {
const result = await client.complete({ messages: [...] });
} catch (error) {
if (error instanceof APIKeyError) {
console.error('Invalid API key');
} else if (error instanceof RateLimitError) {
console.error('Rate limited, please retry later');
} else if (error instanceof APIRequestError) {
console.error('API error:', error.message);
}
}
How It Works
Compress Light Reach identifies repeated substrings in your prompts and replaces them with shorter placeholders.
The library:
- Identifies repeated substrings using efficient suffix array algorithms
- Calculates token savings for each potential replacement
- Selects optimal replacements that reduce total token count
- Intelligently routes to the best model based on your quality requirements
- Formats the result for easy LLM consumption
- Provides perfect decompression
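The replacement idea above can be sketched as follows. This is a simplified, character-based illustration, not the production algorithm: the real service is token-aware and uses suffix arrays, whereas this naive scan is O(n^3), and the §1/§2 placeholder syntax is invented here purely for the sketch.

```typescript
type Compressed = { text: string; dictionary: Record<string, string> };

// Find the repeated substring whose replacement saves the most characters.
function findBestRepeat(text: string, minLen = 8): string | null {
  let best: string | null = null;
  let bestSavings = 0;
  for (let len = minLen; len <= Math.floor(text.length / 2); len++) {
    const seen: Record<string, number> = {};
    for (let i = 0; i + len <= text.length; i++) {
      const sub = text.slice(i, i + len);
      seen[sub] = (seen[sub] ?? 0) + 1;
    }
    for (const sub of Object.keys(seen)) {
      if (seen[sub] < 2) continue;
      // Rough savings: each occurrence shrinks to a ~3-char placeholder,
      // minus the cost of storing the substring once in the dictionary.
      const savings = seen[sub] * (sub.length - 3) - sub.length;
      if (savings > bestSavings) {
        bestSavings = savings;
        best = sub;
      }
    }
  }
  return best;
}

// Greedily replace the best repeat until no profitable repeat remains.
function compress(input: string): Compressed {
  const dictionary: Record<string, string> = {};
  let text = input;
  let n = 1;
  let repeat: string | null;
  while ((repeat = findBestRepeat(text)) !== null) {
    const key = `§${n++}`;
    dictionary[key] = repeat;
    text = text.split(repeat).join(key); // replace every occurrence
  }
  return { text, dictionary };
}

function decompress({ text, dictionary }: Compressed): string {
  // Expand in reverse insertion order: a later entry's value may itself
  // contain an earlier placeholder.
  let out = text;
  for (const key of Object.keys(dictionary).reverse()) {
    out = out.split(key).join(dictionary[key]);
  }
  return out;
}

const original =
  "Write a story about a cat. The cat is very friendly.\n" +
  "Write a story about a dog. The dog is very friendly.\n";
const packed = compress(original);
console.log(packed.text.length < original.length); // compressed is shorter
console.log(decompress(packed) === original); // lossless round trip
```

Because every placeholder substitution is a reversible string replacement recorded in the dictionary, decompression reconstructs the input exactly — the "lossless" guarantee above.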
Examples
Example 1: Complete with Compression
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
const prompt = `
Write a story about a cat. The cat is very friendly.
Write a story about a dog. The dog is very friendly.
Write a story about a bird. The bird is very friendly.
`;
const result = await client.complete({
messages: [{ role: "user", content: prompt }],
desired_hle: 30,
});
console.log(result.decompressed_response);
console.log(`Model used: ${result.routing_info?.selected_model}`);
console.log(`Token savings: ${result.compression_stats.token_savings} tokens`);
console.log(`Compression ratio: ${(result.compression_stats.compression_ratio * 100).toFixed(2)}%`);
Example 2: Output Compression
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
const result = await client.complete({
messages: [{ role: "user", content: "Generate a long report with repeated sections..." }],
desired_hle: 35,
compress_output: true,
});
console.log(result.decompressed_response);
Example 3: Multi-turn Conversation
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
const result = await client.complete({
messages: [
{ role: "system", content: "You are a helpful coding assistant." },
{ role: "user", content: "How do I read a file in Python?" },
{ role: "assistant", content: "You can use open() with a context manager..." },
{ role: "user", content: "How about writing to a file?" },
],
desired_hle: 30,
compression_config: {
compress_system: false,
compress_user: true,
compress_assistant: false,
compress_only_last_n_user: 2, // Only compress last 2 user messages
},
});
Getting an API Key
To use Compress Light Reach, you need an API key from compress.lightreach.io.
- Visit compress.lightreach.io
- Sign up for an account
- Get your API key from the dashboard
- Set it as an environment variable:
export PCOMPRESLR_API_KEY=your-key
Security & Privacy
BYOK model: Provider keys (OpenAI/Anthropic/Google/etc.) are managed in the dashboard and never passed through this SDK. The SDK only uses your LightReach API key for authentication with the service.
Requirements
- Node.js 14.0.0 or higher
- TypeScript 5.3.0+ (for TypeScript projects)
License
MIT License - see LICENSE file for details.
Support
- Documentation: compress.lightreach.io/docs
- Issues: GitHub Issues
- Email: [email protected]
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
