compress-lightreach
v1.0.6
Compress Light Reach
AI cost management SDK with intelligent model routing, prompt compression, and real-time token tracking
Compress Light Reach is a Node.js/TypeScript SDK that provides intelligent model routing and prompt compression for LLM applications, reducing token usage and costs while maintaining quality.
Features
- Intelligent Model Routing: Automatically selects optimal model based on quality requirements (HLE) and available provider keys
- Token-aware Compression: Replaces repeated substrings with shorter placeholders using a fast greedy algorithm
- Lossless: Perfect decompression guaranteed
- Output Compression: Optional model output compression support
- Cloud API: Uses Light Reach's cloud service for compression and routing
- Multi-provider Support: OpenAI, Anthropic, Google, DeepSeek, Moonshot
- TypeScript: Full TypeScript support with type definitions
- BYOK: Provider API keys managed securely in dashboard (never passed through SDK)
Installation
npm install compress-lightreach
or
yarn add compress-lightreach
Quick Start
The SDK uses intelligent model routing and targets POST /api/v2/complete.
- Authenticate with your LightReach API key (env var PCOMPRESLR_API_KEY or LIGHTREACH_API_KEY)
- Manage provider keys (OpenAI/Anthropic/Google/etc.) in the dashboard (BYOK)
- System automatically selects optimal model based on your requirements
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
const result = await client.complete({
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Explain quantum computing in simple terms.' },
],
desired_hle: 30, // Quality ceiling (0-100). Current SOTA is ~40%.
});
console.log(result.decompressed_response);
console.log(`Selected: ${result.routing_info?.selected_model}`);
console.log(`Token savings: ${result.compression_stats.token_savings}`);
OpenAI-compatible API (Cursor / OpenAI SDKs)
LightReach also exposes a strict OpenAI-compatible surface (including streaming SSE) so you can use standard OpenAI tooling without changing your app.
- Cursor base URL: https://compress.lightreach.io/v1/cursor
- Generic OpenAI-compatible base URL: https://compress.lightreach.io/v1
- Endpoints: GET /models, POST /chat/completions
- Model id: lightreach
Example (cURL):
curl -sS https://compress.lightreach.io/v1/chat/completions \
-H "Authorization: Bearer lr_your_lightreach_key" \
-H "Content-Type: application/json" \
-d '{
"model": "lightreach",
"messages": [{"role":"user","content":"Say hello"}],
"stream": true
}'
With Output Compression
const result = await client.complete({
messages: [{ role: 'user', content: 'Generate a long report...' }],
desired_hle: 25,
compress_output: true,
});
console.log(result.decompressed_response);
Intelligent Model Routing
The system automatically selects the optimal model based on quality requirements and your available provider keys:
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
// Cross-provider optimization: system picks cheapest model meeting your quality bar
const result = await client.complete({
messages: [{ role: 'user', content: 'Explain quantum computing' }],
desired_hle: 30, // Quality ceiling (0-100). Current SOTA is ~40%.
});
// Check what was selected
console.log(result.routing_info?.selected_model); // e.g., "gpt-4o-mini"
console.log(result.routing_info?.selected_provider); // e.g., "openai"
console.log(result.routing_info?.model_hle); // e.g., 32.5
console.log(result.routing_info?.model_price_per_million); // e.g., 0.15
Provider-Constrained Routing
Optionally constrain to a specific provider:
// Only use OpenAI models, but pick the cheapest one meeting HLE 35
const result = await client.complete({
messages: [{ role: 'user', content: 'Write a poem' }],
llm_provider: 'openai', // Optional: constrain to one provider
desired_hle: 35,
});
HLE Cascading with Admin Controls
Admins can set quality ceilings via the dashboard (global or per-tag) to control costs. Your desired_hle is a preference: if it exceeds an admin-set ceiling, the value is silently clamped to the ceiling and the request proceeds.
// Admin set global HLE ceiling to 30%
// Requesting above the ceiling will be clamped to 30 (no error)
const result = await client.complete({
messages: [{ role: 'user', content: 'Process payment' }],
desired_hle: 35, // Will be clamped down to 30
tags: { env: 'production' },
});
// Correct usage: request within ceiling
const result = await client.complete({
messages: [{ role: 'user', content: 'Process payment' }],
desired_hle: 25, // OK: below ceiling of 30
tags: { env: 'production' },
});
// Check if your HLE was lowered by an admin ceiling
if (result.routing_info?.hle_clamped) {
console.log(`HLE lowered from ${result.routing_info.requested_hle} ` +
`to ${result.routing_info.effective_hle} ` +
`by ${result.routing_info.hle_source}-level ceiling`);
}
With Compression Config
Configure per-role compression settings:
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
const result = await client.complete({
messages: [{ role: 'user', content: 'Hello!' }],
desired_hle: 30,
compress: true,
compress_output: false,
compression_config: {
compress_system: false,
compress_user: true,
compress_assistant: false,
compress_only_last_n_user: 1,
},
temperature: 0.7,
max_tokens: 1000,
tags: { env: 'production' },
});
console.log(result.decompressed_response);
console.log(`Model used: ${result.routing_info?.selected_model}`);
Compression Only (No LLM Call)
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
// Compress text without making an LLM call
const compressed = await client.compress(
"Your text with repeated content here...",
"gpt-4", // Model for tokenization
{ env: 'dev' } // Optional tags
);
console.log(compressed.llm_format);
console.log(`Compression ratio: ${compressed.compression_ratio}`);
// Decompress later
const decompressed = await client.decompress(compressed.llm_format);
console.log(decompressed.decompressed);
Command Line Interface
# Set your API key
export PCOMPRESLR_API_KEY=your-api-key
# Compress a prompt
npx pcompresslr "Your prompt with repeated text here..."
API Reference
PcompresslrAPIClient
Main API client for intelligent model routing and compression.
Constructor
new PcompresslrAPIClient(apiKey?: string, apiUrl?: string, timeout?: number)
Parameters:
- apiKey (string, optional): LightReach API key. Falls back to LIGHTREACH_API_KEY or PCOMPRESLR_API_KEY env vars.
- apiUrl (string, optional): Override base API URL. Falls back to PCOMPRESLR_API_URL env var. Default: https://api.compress.lightreach.io
- timeout (number, optional): Request timeout in milliseconds. Default: 900000 (15 minutes)
Methods
complete(request: CompleteV2Request): Promise<CompleteResponse>
Messages-first completion with intelligent routing (POST /api/v2/complete).
Request Parameters (CompleteV2Request):
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| messages | Message[] | required | Conversation history with role and content |
| llm_provider | 'openai' \| 'anthropic' \| 'google' \| 'deepseek' \| 'moonshot' | — | Optional provider constraint. Omit for cross-provider optimization |
| desired_hle | number | — | Quality ceiling (0-100). If above an admin ceiling, it is clamped down |
| compress | boolean | true | Whether to compress messages |
| compress_output | boolean | false | Whether to request compressed output from LLM |
| compression_config | object | — | Per-role compression settings (see below) |
| temperature | number | — | LLM temperature parameter |
| max_tokens | number | — | Maximum tokens to generate |
| tags | Record<string, string> | — | Tags for cost attribution and tag-level HLE ceilings |
| max_history_messages | number | — | Limit conversation history length |
compression_config options:
{
compress_system?: boolean; // default: false
compress_user?: boolean; // default: true
compress_assistant?: boolean; // default: false
compress_only_last_n_user?: number | null; // default: 1
}
Response (CompleteResponse):
{
decompressed_response: string; // Final decompressed LLM response
compression_stats: {
compression_enabled: boolean;
original_tokens: number;
compressed_tokens: number;
token_savings: number;
compression_ratio: number;
token_count_exact?: boolean;
token_count_source?: string;
token_accounting_note?: string;
processing_time_ms?: number;
};
llm_stats: {
provider?: string;
model?: string;
input_tokens: number;
output_tokens: number;
total_tokens: number;
finish_reason?: string | null;
};
routing_info?: {
selected_model: string; // Model chosen by system
selected_provider: string; // Provider chosen by system
selected_model_id: string;
model_hle: number; // HLE score of selected model
model_price_per_million: number;
requested_hle: number | null;
effective_hle: number | null; // Effective HLE after admin ceilings
hle_source: 'request' | 'tag' | 'global' | 'none';
hle_clamped: boolean; // true if admin ceiling lowered your desired_hle
};
warnings?: string[];
// Convenience aliases
text?: string; // Alias for decompressed_response
tokens_saved?: number; // Alias for compression_stats.token_savings
tokens_used?: number; // Alias for llm_stats.total_tokens
compression_ratio?: number; // Alias for compression_stats.compression_ratio
}
compress(prompt, model?, tags?): Promise<CompressResponse>
Compression-only (POST /api/v1/compress).
Also supports a legacy call shape: compress(prompt, model, algorithm, tags?) (only "greedy" is supported).
Parameters:
- prompt (string, required): Text to compress
- model (string, optional): Model for tokenization. Default: 'gpt-4'
- algorithm ("greedy", optional): Legacy-only parameter. Only "greedy" is supported.
- tags (Record<string, string>, optional): Tags for attribution
Response (CompressResponse):
{
compressed: string;
dictionary: Record<string, string>;
llm_format: string;
compression_ratio: number;
original_size: number;
compressed_size: number;
processing_time_ms: number;
algorithm: string;
}
decompress(llmFormat): Promise<DecompressResponse>
Decompress an LLM-formatted compressed prompt (POST /api/v1/decompress).
Parameters:
- llmFormat (string, required): The llm_format string from a compress response
Response (DecompressResponse):
{
decompressed: string;
processing_time_ms: number;
}
healthCheck(): Promise<HealthCheckResponse>
Check API health status (GET /health).
Response:
{
status: string;
version?: string;
}
Message Types
type MessageRole = 'system' | 'developer' | 'user' | 'assistant';
interface Message {
role: MessageRole;
content: string;
}
Environment Variables
| Variable | Description |
|----------|-------------|
| PCOMPRESLR_API_KEY | Your LightReach API key (primary) |
| LIGHTREACH_API_KEY | Your LightReach API key (alternative) |
| PCOMPRESLR_API_URL | Override the API base URL (advanced/testing) |
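The precedence in the table can be sketched as follows. Note that resolveConfig is a hypothetical helper written for illustration, not the SDK's actual internals; it mirrors the documented fallback order (explicit argument, then PCOMPRESLR_API_KEY, then LIGHTREACH_API_KEY, with PCOMPRESLR_API_URL overriding the default base URL):

```typescript
// Minimal declaration so the sketch is self-contained without @types/node.
declare const process: { env: Record<string, string | undefined> };

// Hypothetical helper mirroring the documented credential precedence.
function resolveConfig(apiKey?: string, apiUrl?: string): { key: string; url: string } {
  const key =
    apiKey ??
    process.env.PCOMPRESLR_API_KEY ??
    process.env.LIGHTREACH_API_KEY;
  if (!key) {
    throw new Error(
      "Missing API key: pass one explicitly or set PCOMPRESLR_API_KEY / LIGHTREACH_API_KEY"
    );
  }
  const url =
    apiUrl ??
    process.env.PCOMPRESLR_API_URL ??
    "https://api.compress.lightreach.io";
  return { key, url };
}
```

An explicit constructor argument always wins over the environment, which is what lets the same code run unchanged in CI (env var) and in tests (explicit key).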
Exceptions
| Exception | Description |
|-----------|-------------|
| PcompresslrAPIError | Base exception class |
| APIKeyError | Invalid or missing API key |
| RateLimitError | Rate limit exceeded |
| APIRequestError | General API errors (including routing failures) |
import { APIKeyError, RateLimitError, APIRequestError } from 'compress-lightreach';
try {
const result = await client.complete({ messages: [...] });
} catch (error) {
if (error instanceof APIKeyError) {
console.error('Invalid API key');
} else if (error instanceof RateLimitError) {
console.error('Rate limited, please retry later');
} else if (error instanceof APIRequestError) {
console.error('API error:', error.message);
}
}
How It Works
Compress Light Reach identifies repeated substrings in your prompts and replaces them with shorter placeholders.
The library:
- Identifies repeated substrings using efficient suffix array algorithms
- Calculates token savings for each potential replacement
- Selects optimal replacements that reduce total token count
- Intelligently routes to the best model based on your quality requirements
- Formats the result for easy LLM consumption
- Provides perfect decompression
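The replacement idea above can be sketched as follows. This is a simplified, character-based illustration, not the production algorithm: the real service is token-aware and uses suffix arrays, whereas this naive scan is O(n^3), and the §1/§2 placeholder syntax is invented here purely for the sketch.

```typescript
type Compressed = { text: string; dictionary: Record<string, string> };

// Find the repeated substring whose replacement saves the most characters.
function findBestRepeat(text: string, minLen = 8): string | null {
  let best: string | null = null;
  let bestSavings = 0;
  for (let len = minLen; len <= Math.floor(text.length / 2); len++) {
    const seen: Record<string, number> = {};
    for (let i = 0; i + len <= text.length; i++) {
      const sub = text.slice(i, i + len);
      seen[sub] = (seen[sub] ?? 0) + 1;
    }
    for (const sub of Object.keys(seen)) {
      if (seen[sub] < 2) continue;
      // Rough savings: each occurrence shrinks to a ~3-char placeholder,
      // minus the cost of storing the substring once in the dictionary.
      const savings = seen[sub] * (sub.length - 3) - sub.length;
      if (savings > bestSavings) {
        bestSavings = savings;
        best = sub;
      }
    }
  }
  return best;
}

// Greedily replace the best repeat until no profitable repeat remains.
function compress(input: string): Compressed {
  const dictionary: Record<string, string> = {};
  let text = input;
  let n = 1;
  let repeat: string | null;
  while ((repeat = findBestRepeat(text)) !== null) {
    const key = `§${n++}`;
    dictionary[key] = repeat;
    text = text.split(repeat).join(key); // replace every occurrence
  }
  return { text, dictionary };
}

function decompress({ text, dictionary }: Compressed): string {
  // Expand in reverse insertion order: a later entry's value may itself
  // contain an earlier placeholder.
  let out = text;
  for (const key of Object.keys(dictionary).reverse()) {
    out = out.split(key).join(dictionary[key]);
  }
  return out;
}

const original =
  "Write a story about a cat. The cat is very friendly.\n" +
  "Write a story about a dog. The dog is very friendly.\n";
const packed = compress(original);
console.log(packed.text.length < original.length); // compressed is shorter
console.log(decompress(packed) === original); // lossless round trip
```

Because every placeholder substitution is a reversible string replacement recorded in the dictionary, decompression reconstructs the input exactly — the "lossless" guarantee above.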
Examples
Example 1: Complete with Compression
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
const prompt = `
Write a story about a cat. The cat is very friendly.
Write a story about a dog. The dog is very friendly.
Write a story about a bird. The bird is very friendly.
`;
const result = await client.complete({
messages: [{ role: "user", content: prompt }],
desired_hle: 30,
});
console.log(result.decompressed_response);
console.log(`Model used: ${result.routing_info?.selected_model}`);
console.log(`Token savings: ${result.compression_stats.token_savings} tokens`);
console.log(`Compression ratio: ${(result.compression_stats.compression_ratio * 100).toFixed(2)}%`);
Example 2: Output Compression
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
const result = await client.complete({
messages: [{ role: "user", content: "Generate a long report with repeated sections..." }],
desired_hle: 35,
compress_output: true,
});
console.log(result.decompressed_response);
Example 3: Multi-turn Conversation
import { PcompresslrAPIClient } from 'compress-lightreach';
const client = new PcompresslrAPIClient("your-lightreach-api-key");
const result = await client.complete({
messages: [
{ role: "system", content: "You are a helpful coding assistant." },
{ role: "user", content: "How do I read a file in Python?" },
{ role: "assistant", content: "You can use open() with a context manager..." },
{ role: "user", content: "How about writing to a file?" },
],
desired_hle: 30,
compression_config: {
compress_system: false,
compress_user: true,
compress_assistant: false,
compress_only_last_n_user: 2, // Only compress last 2 user messages
},
});
Getting an API Key
To use Compress Light Reach, you need an API key from compress.lightreach.io.
- Visit compress.lightreach.io
- Sign up for an account
- Get your API key from the dashboard
- Set it as an environment variable:
export PCOMPRESLR_API_KEY=your-key
Security & Privacy
BYOK model: Provider keys (OpenAI/Anthropic/Google/etc.) are managed in the dashboard and never passed through this SDK. The SDK only uses your LightReach API key for authentication with the service.
Requirements
- Node.js 14.0.0 or higher
- TypeScript 5.3.0+ (for TypeScript projects)
License
MIT License - see LICENSE file for details.
Support
- Documentation: compress.lightreach.io/docs
- Issues: GitHub Issues
- Email: [email protected]
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
