carnot-sdk

v1.0.0

Published a month ago

Zero-latency edge routing for LLMs using Shannon entropy.

fleury99

ai llm edge-computing shannon-entropy optimization cost-reduction green-ai open-source

The Problem

Sending every user prompt to massive cloud models is a thermodynamic waste. The industry treats all prompts as equal, burning compute and dollars on simple queries that could be handled locally.

The Solution

Carnot is an 11 kB stateless routing protocol that evaluates the "cognitive weight" of a prompt before a single token is sent over the network.

It runs entirely on the edge (client-side) and decides in under 5 ms which of three tiers to route to:

Local Execution (Regex/Cache)
Edge Inference (On-device small models)
Cloud Relocation (Frontier massive-parameter models)

How it Works (Under the Hood)

We don't use AI to route AI. We use deterministic mathematics. Carnot relies on three O(N) heuristics:

Shannon Entropy Engine: Calculates the character unpredictability of the prompt.
Contextual Razor: Uses DJB2 hashing to detect and eliminate redundant system prompts in conversational loops.
Academic/Jargon Detection: Identifies high lexical density to prevent routing complex philosophy to small models.
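
The three heuristics above can be sketched in a few lines each. This is a minimal illustration, not Carnot's actual source: the function names (`shannonEntropy`, `djb2`, `dropRepeatedSystemPrompts`, `lexicalDensity`), the `ChatMessage` shape, and the small function-word list are all assumptions made for the example. Lexical density is approximated here as the share of tokens that are not common function words.

```typescript
// Shannon entropy in bits per character: H = -sum(p * log2(p)).
// Counts UTF-16 code units in one pass, then sums over distinct characters.
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  const keys = Object.keys(counts);
  for (let i = 0; i < keys.length; i++) {
    const p = counts[keys[i]] / text.length;
    entropy -= p * (Math.log(p) / Math.LN2); // log base 2
  }
  return entropy;
}

// DJB2 string hash: hash = hash * 33 + charCode, kept as an unsigned 32-bit value.
function djb2(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Assumed message shape, for illustration only.
interface ChatMessage {
  role: string;
  content: string;
}

// Hypothetical helper: keeps the first copy of each system prompt and
// drops later copies whose DJB2 hash has already been seen.
function dropRepeatedSystemPrompts(messages: ChatMessage[]): ChatMessage[] {
  const seen: Record<number, boolean> = {};
  const kept: ChatMessage[] = [];
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i];
    if (msg.role === "system") {
      const h = djb2(msg.content);
      if (seen[h]) continue;
      seen[h] = true;
    }
    kept.push(msg);
  }
  return kept;
}

// Assumed, deliberately tiny list of English function words.
const FUNCTION_WORDS = ["the", "a", "an", "of", "to", "in", "is", "and", "or", "it", "that", "for", "on", "with"];

// Rough lexical density: fraction of tokens that are not function words.
function lexicalDensity(text: string): number {
  const tokens = text.toLowerCase().split(/[^a-z]+/).filter((t) => t.length > 0);
  if (tokens.length === 0) return 0;
  let contentWords = 0;
  for (let i = 0; i < tokens.length; i++) {
    if (FUNCTION_WORDS.indexOf(tokens[i]) === -1) contentWords++;
  }
  return contentWords / tokens.length;
}
```

Each function makes a single pass over its input, which is consistent with the O(N) claim. Note that a 32-bit hash can collide, so a production deduplicator would likely compare the actual strings when two hashes match.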

Installation

npm install carnot-sdk

Usage

import { CarnotAgent, ComputeTier } from 'carnot-sdk';

const agent = new CarnotAgent();

async function handleInference(rawPrompt: string) {
    // Analyze the prompt in < 5ms
    const verdict = agent.execute(rawPrompt);

    if (verdict.tier === ComputeTier.CLOUD_GOD_TIER) {
        return await expensiveCloudLLM.fetch(rawPrompt);
    }

    // Route to local or edge...
}

Run the Benchmark

Want to see the routing logic in action? Clone the repo and run:

node examples/enterprise-demo.js

Why Open Source?

The compute crisis in AI is an infrastructure problem that requires a community standard. By open-sourcing the routing engine, we ensure it becomes the universal optimization layer for any developer building with LLMs, regardless of their cloud provider.

Concrete Integration Scenarios

Carnot is a routing engine. It does not execute the AI; it tells your app where to execute it. Here is how to implement the three tiers in a real-world application:

1. Local Execution (Regex / Cache / App State)

When to use: Simple factual queries, greetings, or UI interactions.

// Example: User asks for app settings
if (verdict.tier === ComputeTier.LOCAL_LOGIC) {
    return getAppSettingsFromLocalDatabase(); // Takes 0.001ms. $0 cost.
}

2. Edge Inference (On-Device Small Models)

When to use: Summarization, basic translation, or formatting. Tasks too complex for regex, but not requiring frontier logic.

// Example: User wants a text summarized
if (verdict.tier === ComputeTier.EDGE_LLM) {
    // Route to an on-device model like Llama-3-8B (using react-native-llama, for instance)
    const result = await LocalDeviceModel.generate(rawPrompt); // Takes ~2s. $0 cost.
    return result;
}

3. Cloud Relocation (Frontier Massive-Parameter Models)

When to use: Highly complex reasoning, deep philosophical analysis, or code generation that small models cannot handle.

// Example: Complex philosophical prompt
if (verdict.tier === ComputeTier.CLOUD_GOD_TIER) {
    // Safely send to OpenAI/Anthropic, knowing you avoided paying for the simple queries
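    const response = await fetch('https://api.openai.com/v1/chat/completions', {
        method: 'POST', // fetch defaults to GET, which cannot carry a body
        headers: {
            'Content-Type': 'application/json',
            // OPENAI_API_KEY is a placeholder; load your key from secure configuration
            'Authorization': `Bearer ${OPENAI_API_KEY}`
        },
        body: JSON.stringify({ model: "gpt-4o", messages: [{ role: "user", content: rawPrompt }] })
    });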
    return response;
}
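
The three branches above can also be folded into one dispatcher, so that adding or swapping a backend only touches a handler table. The sketch below is an assumption-laden illustration: it declares a local stand-in for `ComputeTier` (member names taken from this README, string values assumed) so that it compiles without the SDK, and `TierHandlers`, `dispatch`, and `demoHandlers` are hypothetical names that are not part of carnot-sdk.

```typescript
// Local stand-in for the enum exported by carnot-sdk (values are assumed).
enum ComputeTier {
  LOCAL_LOGIC = "LOCAL_LOGIC",
  EDGE_LLM = "EDGE_LLM",
  CLOUD_GOD_TIER = "CLOUD_GOD_TIER",
}

// One handler per tier. T is whatever the handlers return,
// for example Promise<string> in a real app.
type TierHandlers<T> = Record<ComputeTier, (prompt: string) => T>;

// Looks up the handler for the given tier and runs it on the prompt.
function dispatch<T>(tier: ComputeTier, prompt: string, handlers: TierHandlers<T>): T {
  const handler = handlers[tier];
  return handler(prompt);
}

// Demo handlers that only label the prompt, standing in for a local lookup,
// an on-device model, and a cloud API call respectively.
const demoHandlers: TierHandlers<string> = {
  [ComputeTier.LOCAL_LOGIC]: (prompt) => "local:" + prompt,
  [ComputeTier.EDGE_LLM]: (prompt) => "edge:" + prompt,
  [ComputeTier.CLOUD_GOD_TIER]: (prompt) => "cloud:" + prompt,
};
```

In an application, the tier would come from `agent.execute(rawPrompt).tier`, and each handler would wrap the corresponding call shown in the scenarios above. Because `TierHandlers` is a `Record` over the whole enum, the compiler flags a handler table that omits a tier.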

The Result: Your users get the same experience, while your infrastructure costs drop because simple tasks never touch the network.

License

MIT