awren-context-engine
v0.1.0
Awren Context Efficiency Engine
A local AI context optimization layer that reduces LLM token usage by 50%–90% without modifying developer workflows.
🚀 What is this?
Awren Context Efficiency Engine (ACEE) is a local AI gateway that sits between your application and any LLM provider (OpenAI, Anthropic, etc.) and automatically optimizes the context sent to the model.
Instead of sending full conversation history every time, Awren:
- Converts conversations into structured semantic state
- Tracks changes using delta-based updates
- Removes redundant context automatically
- Reconstructs minimal prompts for LLM inference
Result: comparable output quality with significantly fewer tokens.
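The state and delta steps listed above can be sketched roughly as follows. This is an illustrative sketch, not Awren's actual implementation: `SessionState`, `StateDelta`, `computeDelta`, and `applyDelta` are hypothetical names, and the state is assumed to be a flat map of string fields like the `intent` / `domain` / `stage` object shown in the sample response.

```typescript
// Hypothetical flat state; field names mirror the "state" object in the
// sample /chat response (intent, domain, stage).
type SessionState = Record<string, string>;

// A delta carries only the fields that were added or changed, plus removed keys.
interface StateDelta {
  changed: Record<string, string>;
  removed: string[];
}

// Compare two snapshots and keep only what differs.
function computeDelta(prev: SessionState, next: SessionState): StateDelta {
  const changed: Record<string, string> = {};
  for (const [key, value] of Object.entries(next)) {
    if (prev[key] !== value) changed[key] = value;
  }
  const removed = Object.keys(prev).filter((key) => !(key in next));
  return { changed, removed };
}

// Rebuild the full state from the previous snapshot and a delta.
function applyDelta(prev: SessionState, delta: StateDelta): SessionState {
  const next: SessionState = { ...prev, ...delta.changed };
  for (const key of delta.removed) delete next[key];
  return next;
}

const before: SessionState = { intent: "development", domain: "software", stage: "planning" };
const after: SessionState = { intent: "development", domain: "software", stage: "implementation" };

console.log(JSON.stringify(computeDelta(before, after)));
// {"changed":{"stage":"implementation"},"removed":[]}
```

Only the `stage` field changed between the two snapshots, so only that field would need to be carried forward instead of the whole state.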
🧠 Why it exists
Modern LLM applications suffer from a structural inefficiency:
They resend too much context, too often.
Even with long-context models and RAG systems, most production workloads still include:
- Repeated instructions
- Duplicated history
- Irrelevant conversation data
- Inflated prompts
This leads to unnecessary cost, higher latency, and poor scalability. Awren fixes this at the infrastructure layer, not the application layer.
⚙️ How it works
Your App / Claude Code / Codex
↓
http://localhost:8765
↓
Awren Context Engine (Gateway)
↓
┌──────────────────────────────┐
│ 1. Context Compression │
│ 2. State Extraction │
│ 3. Delta Tracking │
│ 4. Prompt Optimization │
└──────────────────────────────┘
↓
LLM Provider
📦 Installation
npx awren init
Or install globally:
npm install -g awren-context-engine
awren init
🔑 Setup
You will be prompted to configure:
- LLM provider (OpenAI / Claude)
- API key
- Local port configuration
Then Awren starts a local gateway:
http://localhost:8765
💡 Usage
Point your LLM client to:
http://localhost:8765/chat
That's it. No code changes required.
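For a client that does not already expose a configurable endpoint, a call to the gateway might look like the sketch below. It assumes the `session_id` / `messages` request shape from the API section and Node.js 18+ (for the global `fetch`). The helper names `buildChatRequest` and `sendChat` are hypothetical, and the `system` and `assistant` roles are an assumption, since the README only shows `user`.

```typescript
// Assumed request shape, taken from the POST /chat example in the API section.
interface ChatMessage {
  role: "system" | "user" | "assistant"; // only "user" appears in the README
  content: string;
}

interface ChatRequest {
  session_id: string;
  messages: ChatMessage[];
}

interface FetchInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

const GATEWAY_URL = "http://localhost:8765/chat";

// Builds the fetch() arguments without sending anything, so the payload
// can be inspected or tested offline.
function buildChatRequest(sessionId: string, content: string): { url: string; init: FetchInit } {
  const body: ChatRequest = {
    session_id: sessionId,
    messages: [{ role: "user", content }],
  };
  return {
    url: GATEWAY_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request to a running gateway and returns the parsed JSON body.
async function sendChat(sessionId: string, content: string): Promise<unknown> {
  const { url, init } = buildChatRequest(sessionId, content);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Gateway returned HTTP ${res.status}`);
  return res.json();
}
```

Reusing the same `session_id` across calls is presumably what lets the gateway associate requests with stored session state.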
API
POST /chat
{
  "session_id": "abc123",
  "messages": [
    { "role": "user", "content": "create a login system" }
  ]
}
Response
{
  "response": "...",
  "session_id": "abc123",
  "usage": {
    "original_tokens": 1200,
    "compressed_tokens": 350,
    "final_prompt_tokens": 420,
    "completion_tokens": 150,
    "total_tokens": 570,
    "compression_ratio": 0.71,
    "tokens_saved": 850
  },
  "state": {
    "intent": "development",
    "domain": "software",
    "stage": "implementation"
  }
}
📊 What changes?
Before Awren:
- Full conversation history sent every request
- Linear token growth
- High cost on long sessions
With Awren:
- Structured state representation
- Delta-based updates
- Minimal context transmission
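To make the difference concrete, here is a back-of-the-envelope model. It is an idealized sketch with made-up numbers, not a measurement of Awren: it assumes every turn adds the same number of tokens and that the state summary has a fixed size. When the full history is resent, each request grows linearly with the turn count, so the cumulative input over a session grows quadratically.

```typescript
// Cumulative input tokens when the full history is resent on every turn.
// Request n carries all n turns so far, so the session total is
// tokensPerTurn * (1 + 2 + ... + turns).
function fullHistoryTokens(turns: number, tokensPerTurn: number): number {
  let total = 0;
  for (let n = 1; n <= turns; n++) total += n * tokensPerTurn;
  return total;
}

// Cumulative input tokens when each request carries a fixed-size state
// summary plus only the newest turn (an idealization).
function stateDeltaTokens(turns: number, tokensPerTurn: number, stateTokens: number): number {
  return turns * (stateTokens + tokensPerTurn);
}

// Hypothetical session: 20 turns of 100 tokens each, 150-token state summary.
console.log(fullHistoryTokens(20, 100)); // 21000
console.log(stateDeltaTokens(20, 100, 150)); // 5000
```

Under these assumptions the gap widens as sessions get longer, which fits the note that long, multi-turn sessions benefit most.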
📉 Benchmark
npm run benchmark
| Metric | Without Awren | With Awren |
| ------------------ | ------------- | ---------- |
| Input tokens | 100% | 30%–60% |
| Cost | High | Reduced |
| Context redundancy | High | Low |
| Output quality | Baseline | Comparable |
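The same kind of figure can be derived per request from the `usage` object in the `/chat` response. The sketch below is a hedged reading of those fields: the relationships between them are inferred from the sample numbers in the API section only, not from a documented contract, and `checkUsage` and `inputPercentSent` are hypothetical helper names.

```typescript
// Field names copied from the "usage" object in the sample response.
interface Usage {
  original_tokens: number;
  compressed_tokens: number;
  final_prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  compression_ratio: number;
  tokens_saved: number;
}

// Assumed relationships, inferred from the sample numbers only:
//   tokens_saved      = original_tokens - compressed_tokens
//   compression_ratio = tokens_saved / original_tokens (2 decimals)
//   total_tokens      = final_prompt_tokens + completion_tokens
function checkUsage(u: Usage): boolean {
  const saved = u.original_tokens - u.compressed_tokens;
  const ratio = Math.round((saved / u.original_tokens) * 100) / 100;
  return (
    u.tokens_saved === saved &&
    u.compression_ratio === ratio &&
    u.total_tokens === u.final_prompt_tokens + u.completion_tokens
  );
}

// Share of the original input that is sent to the provider, as a percentage
// (assumes final_prompt_tokens is what the provider actually receives).
function inputPercentSent(u: Usage): number {
  return Math.round((u.final_prompt_tokens / u.original_tokens) * 100);
}

const sample: Usage = {
  original_tokens: 1200,
  compressed_tokens: 350,
  final_prompt_tokens: 420,
  completion_tokens: 150,
  total_tokens: 570,
  compression_ratio: 0.71,
  tokens_saved: 850,
};

console.log(checkUsage(sample)); // true
console.log(inputPercentSent(sample)); // 35
```

For the sample response, 35% of the original input is sent, which falls inside the 30%–60% range in the table above.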
🧱 Architecture
awren-context-engine/
├── cli/ # Command-line interface
├── server/ # Fastify HTTP gateway
├── core/ # Engine (compression, state, delta)
├── providers/ # LLM provider abstraction
├── benchmark/ # Performance measurement
├── config/ # Configuration
└── utils/ # Utilities
Pipeline
User → /chat → Context Gate → State Engine → Delta Engine
→ Prompt Builder → LLM API → Response
- Node.js + TypeScript: type-safe, fast, widely adopted
- Fastify: low-overhead HTTP server
- JSON Storage: simple file-based session persistence (MVP)
- OpenAI / Anthropic: multi-provider support
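One way to picture the Context Gate → State Engine → Delta Engine → Prompt Builder pipeline is as a chain of functions over a shared context object. The sketch below is only a guess at the shape: `PipelineContext`, `Stage`, and `runPipeline` are hypothetical names, and every stage body is a deliberately simple placeholder heuristic, not the engine's real logic.

```typescript
interface Message {
  role: string;
  content: string;
}

// Hypothetical context object handed from one stage to the next.
interface PipelineContext {
  messages: Message[];
  previousState: Record<string, string>;
  state: Record<string, string>;
  delta: Record<string, string>;
  prompt: string;
}

type Stage = (ctx: PipelineContext) => PipelineContext;

// Context Gate (placeholder): drop exact repeats of earlier messages.
const contextGate: Stage = (ctx) => {
  const seen = new Set<string>();
  const messages = ctx.messages.filter((m) => {
    const key = `${m.role}:${m.content.trim()}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
  return { ...ctx, messages };
};

// State Engine (placeholder): track message count and latest user request.
const stateEngine: Stage = (ctx) => {
  const lastUser = [...ctx.messages].reverse().find((m) => m.role === "user");
  const state = {
    message_count: String(ctx.messages.length),
    last_request: lastUser ? lastUser.content : "",
  };
  return { ...ctx, state };
};

// Delta Engine (placeholder): keep fields that differ from the previous state.
const deltaEngine: Stage = (ctx) => {
  const delta: Record<string, string> = {};
  for (const [key, value] of Object.entries(ctx.state)) {
    if (ctx.previousState[key] !== value) delta[key] = value;
  }
  return { ...ctx, delta };
};

// Prompt Builder (placeholder): serialize the delta plus the newest message.
const promptBuilder: Stage = (ctx) => {
  const newest = ctx.messages[ctx.messages.length - 1];
  const latest = newest ? newest.content : "";
  return { ...ctx, prompt: `STATE_DELTA: ${JSON.stringify(ctx.delta)}\nLATEST: ${latest}` };
};

// Runs the stages in order, threading the context through each one.
function runPipeline(
  stages: Stage[],
  messages: Message[],
  previousState: Record<string, string> = {},
): PipelineContext {
  const initial: PipelineContext = { messages, previousState, state: {}, delta: {}, prompt: "" };
  return stages.reduce((ctx, stage) => stage(ctx), initial);
}

const result = runPipeline([contextGate, stateEngine, deltaEngine, promptBuilder], [
  { role: "user", content: "create a login system" },
  { role: "user", content: "create a login system" },
  { role: "user", content: "add password reset" },
]);

console.log(result.messages.length); // 2
console.log(result.prompt);
```

Keeping each stage a pure function of the context makes it straightforward to swap one placeholder for a smarter implementation later, such as the LLM-based compression planned for v0.2.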
⚠️ Limitations
- Not a replacement for RAG systems
- Compression quality depends on task structure
- Best suited for agentic / multi-turn workflows
- MVP uses heuristic compression (LLM-based coming in v0.2)
🧭 Roadmap
- [x] MVP: Heuristic compression + state engine + delta tracking
- [ ] LLM-based semantic compression (v0.2)
- [ ] Learned compression policies
- [ ] Advanced semantic memory graph
- [ ] Multi-agent optimization layer
- [ ] Cloud-hosted enterprise version
📌 Philosophy
The next bottleneck in AI is not intelligence — it's context efficiency.
🏢 Built by
Awren Labs — AI infrastructure & intelligence systems
