@nxuss/lemma
v0.4.8
Published
Intelligent AI Gateway for IDEs & Agents — Semantic cache, Privacy Firewall, and Autonomous Cost-Optimization.
Maintainers
Readme
Lemma v0.4.7
The Intelligent AI Gateway — Privacy, Performance, and Precision for the Agentic Era.
Lemma is a high-performance orchestration layer that sits between your development environment and LLM providers. It transforms the way you build with AI by providing Shared Semantic Memory, Autonomous Cost Optimization, and Privacy Guardrails.
⚡ Killer Features
✂️ Codebase Context Squeezer (AST Tree-Shaker)
Unbelievable Prompt Compaction. Stop wasting Claude Pro context limits and throttled Cursor speeds on repetitive codebase payloads. Lemma automatically tree-shakes outgoing prompts, recursively pruning heavy, irrelevant implementations while keeping structural declarations and symbols intact. Get ultra-fast answers under 1 second and enjoy 10x longer chats without ever hitting Claude's 5-hour usage caps.
🛸 Local Cache-Augmented Response Synthesis (CARS)
Zero-Cost Response Generation. When you ask a query semantically similar to a previous question, Lemma avoids the cloud completely. Instead, it dynamically harnesses your local computational environment to adjust and synthesize the historical answer to fit your new requirements in less than a second. 0 Cloud Tokens, 0 Cloud API costs, and seamless offline fallback protection.
🛡️ Privacy Firewall (Semantic Scrubber)
Zero-Trust Prompts. Stop leaking sensitive data. Lemma automatically detects API keys, PII, and credentials in your prompts, masking them with secure tokens before they reach the cloud. Responses are seamlessly reconstructed locally.
🚦 Complexity Router (Cost-Optimizer)
Intelligence Where it Matters. Lemma analyzes the semantic complexity of every request. It autonomously routes lightweight tasks to hyper-efficient models like gpt-4o-mini, reserving premium models for high-reasoning challenges. Save up to 90% on simple tasks.
🧠 Telepathic Context Injector (Runtime Sync)
Bridge the Gap Between Code and Execution. Lemma synchronizes your application's live runtime state and exceptions directly with your IDE’s consciousness. Your AI assistant gains immediate "situational awareness" of crashes.
⚡ Shared Semantic Cache
Stop Paying for the Same Thought Twice. Lemma understands meaning. It recognizes similar prompts and returns instant (3ms) responses, saving 40-70% on total API expenditure.
🚀 Smart CLI (Zero-Config)
Lemma v0.4.7 introduces the Smart CLI, making it easier than ever to get started:
# 1. Install
npm install -g @nxuss/lemma
# 2. Initialize (Auto-configures .env and .lemma/)
lemma init
# 3. Start with Intelligence Report
lemma start🧠 Intelligence Report
On startup, Lemma performs a System Check to detect dependencies like Ollama and ChromaDB, providing a real-time report of active features and optimizations.
💎 Tier Comparison
| Feature | 🆓 Free (Standard) | 💎 Pro ($12/mo) | | :--- | :--- | :--- | | Privacy Firewall | ✅ Included | ✅ Advanced Masking | | Complexity Router | ✅ Included | ✅ Custom Routing Policies | | Caching | Exact Match | Semantic Memory | | Hive Mind | Local Only | Cloud Sync (Team Memory) | | Telepathy | Basic Sync | Advanced State Injection | | Limits | 300 requests/mo | Unlimited Agentic Power |
🛠️ Integration: Power Up Your Favorite Tools
Lemma is compatible with any tool that allows you to configure a custom OpenAI Base URL. This means you can add Lemma's intelligence to your existing workflow in seconds.
💬 Use it with AI Chats & IDEs
You don't need to change your habits. Just point your tool's "Base URL" to Lemma:
- Cursor: Go to
Settings > Models > OpenAI API > Override Base URLand set it tohttp://localhost:8081/v1. - VS Code (Continue): Update your
config.jsonto usehttp://localhost:8081/v1as theapiBase. - AutoGPT / BabyAGI: Set the
OPENAI_API_BASEenvironment variable. - Custom Apps: Replace
https://api.openai.com/v1withhttp://localhost:8081/v1in your SDK initialization.
⚡ Why use Lemma for Chat?
- Privacy: Your IDE won't leak your secrets to the cloud.
- Context: Lemma syncs your runtime crashes directly to your chat window.
- Speed: Instant responses for similar questions via Semantic Cache.
MIT © Nxus Studio | Upgrade to Lemma Pro
