awren-context-engine

v0.1.0

Published

17 days ago

A local AI context optimization layer that reduces LLM token usage by 50%–90% without modifying developer workflows.

awrenlabs

llm context optimization ai-gateway token-efficiency

Awren Context Efficiency Engine

🚀 What is this?

Awren Context Efficiency Engine (ACEE) is a local AI gateway that sits between your application and any LLM provider (OpenAI, Anthropic, etc.) and automatically optimizes the context sent to the model.

Instead of sending full conversation history every time, Awren:

Converts conversations into structured semantic state
Tracks changes using delta-based updates
Removes redundant context automatically
Reconstructs minimal prompts for LLM inference

Result: comparable output quality with significantly fewer tokens.
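To make "delta-based updates" concrete, here is a minimal sketch of diffing two state snapshots. It is not ACEE's actual implementation: `SemanticState`, `StateDelta`, `computeDelta`, and `applyDelta` are hypothetical names, and the state fields simply mirror the `state` object shown in the sample response.

```typescript
// Hypothetical state shape: a flat map of string fields.
// The real ACEE state may be richer than this.
type SemanticState = Record<string, string>;

interface StateDelta {
  set: Record<string, string>; // keys that were added or changed
  removed: string[];           // keys that disappeared
}

// Compare two snapshots and keep only what changed.
function computeDelta(prev: SemanticState, next: SemanticState): StateDelta {
  const delta: StateDelta = { set: {}, removed: [] };
  for (const [key, value] of Object.entries(next)) {
    if (prev[key] !== value) delta.set[key] = value;
  }
  for (const key of Object.keys(prev)) {
    if (!(key in next)) delta.removed.push(key);
  }
  return delta;
}

// Rebuild the latest snapshot from the previous one plus a delta.
function applyDelta(prev: SemanticState, delta: StateDelta): SemanticState {
  const next: SemanticState = { ...prev, ...delta.set };
  for (const key of delta.removed) delete next[key];
  return next;
}

const prevState: SemanticState = { intent: "development", domain: "software", stage: "design" };
const nextState: SemanticState = { intent: "development", domain: "software", stage: "implementation" };

// Only the "stage" field changed, so only that field is carried in the delta.
const stateDelta = computeDelta(prevState, nextState);
```

Under these assumptions, only `stateDelta` needs to be stored or transmitted between turns, and `applyDelta(prevState, stateDelta)` recovers the full current state.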

🧠 Why it exists

Modern LLM applications suffer from a structural inefficiency:

They resend too much context, too often.

Even with long-context models and RAG systems, most production workloads still include:

Repeated instructions
Duplicated history
Irrelevant conversation data
Inflated prompts

This leads to unnecessary cost, higher latency, and poor scalability. Awren fixes this at the infrastructure layer, not the application layer.

⚙️ How it works

Your App / Claude Code / Codex
          ↓
  http://localhost:8765
          ↓
  Awren Context Engine (Gateway)
          ↓
  ┌──────────────────────────────┐
  │ 1. Context Compression        │
  │ 2. State Extraction           │
  │ 3. Delta Tracking             │
  │ 4. Prompt Optimization        │
  └──────────────────────────────┘
          ↓
      LLM Provider
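As an illustration of what a heuristic first stage could look like, the sketch below drops messages that repeat earlier ones. This is an assumed heuristic, not the engine's documented algorithm; `Message`, `fingerprint`, and `dropRepeatedMessages` are hypothetical names.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Normalize whitespace and case so trivially different repeats still match.
function fingerprint(message: Message): string {
  const text = message.content.trim().replace(/\s+/g, " ").toLowerCase();
  return `${message.role}:${text}`;
}

// Keep the first copy of each message and drop later repeats, preserving order.
function dropRepeatedMessages(messages: Message[]): Message[] {
  const seen = new Set<string>();
  const kept: Message[] = [];
  for (const message of messages) {
    const key = fingerprint(message);
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(message);
  }
  return kept;
}

const history: Message[] = [
  { role: "system", content: "Answer in English." },
  { role: "user", content: "Create a login system" },
  { role: "system", content: "answer in  english." }, // repeated instruction
  { role: "user", content: "Add password reset" },
];

const compactHistory = dropRepeatedMessages(history);
```

Exact-match deduplication only catches literal repeats; paraphrased instructions would need a semantic comparison, which is presumably where the later stages come in.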

📦 Installation

npx awren init

Or install globally:

npm install -g awren-context-engine
awren init

🔑 Setup

You will be prompted to configure:

LLM provider (OpenAI / Claude)
API key
Local port configuration

Then Awren starts a local gateway:

http://localhost:8765

💡 Usage

Point your LLM client to:

http://localhost:8765/chat

That's it. Beyond changing the endpoint, no code changes are required.

API

POST /chat
{
  "session_id": "abc123",
  "messages": [
    { "role": "user", "content": "create a login system" }
  ]
}
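A client for this endpoint might be sketched as follows. The request fields come from the example above; `ChatRequest`, `PostJson`, `buildChatRequest`, and `sendChat` are hypothetical names, the `system` and `assistant` roles are assumed (the example only shows `user`), and the HTTP transport is injected so the sketch does not depend on any particular library.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant"; // only "user" appears in the documented example
  content: string;
}

interface ChatRequest {
  session_id: string;
  messages: ChatMessage[];
}

// The caller supplies the transport: given a URL and a JSON body, return the response text.
type PostJson = (url: string, body: string) => Promise<string>;

// Default gateway address from the setup step.
const GATEWAY_URL = "http://localhost:8765/chat";

// Build the request body for a single user message in a given session.
function buildChatRequest(sessionId: string, userText: string): ChatRequest {
  return { session_id: sessionId, messages: [{ role: "user", content: userText }] };
}

// Send the request through the injected transport and parse the JSON reply.
async function sendChat(post: PostJson, request: ChatRequest): Promise<unknown> {
  const raw = await post(GATEWAY_URL, JSON.stringify(request));
  return JSON.parse(raw);
}

const chatRequest = buildChatRequest("abc123", "create a login system");
```

On Node.js 18 or later, `post` could wrap the built-in `fetch`, for example `(url, body) => fetch(url, { method: "POST", headers: { "Content-Type": "application/json" }, body }).then((r) => r.text())`, assuming the gateway expects a JSON content type.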

Response

{
  "response": "...",
  "session_id": "abc123",
  "usage": {
    "original_tokens": 1200,
    "compressed_tokens": 350,
    "final_prompt_tokens": 420,
    "completion_tokens": 150,
    "total_tokens": 570,
    "compression_ratio": 0.71,
    "tokens_saved": 850
  },
  "state": {
    "intent": "development",
    "domain": "software",
    "stage": "implementation"
  }
}
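The numbers in the sample `usage` object are consistent with three simple relationships: `tokens_saved = original_tokens - compressed_tokens`, `total_tokens = final_prompt_tokens + completion_tokens`, and `compression_ratio ≈ 1 - compressed_tokens / original_tokens`. These are inferred from the sample, not from a published specification. The sketch below checks them; `Usage` and `checkUsage` are hypothetical names.

```typescript
interface Usage {
  original_tokens: number;
  compressed_tokens: number;
  final_prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  compression_ratio: number;
  tokens_saved: number;
}

// Return the names of any fields that disagree with the inferred relationships.
function checkUsage(u: Usage): string[] {
  const problems: string[] = [];
  if (u.tokens_saved !== u.original_tokens - u.compressed_tokens) {
    problems.push("tokens_saved");
  }
  if (u.total_tokens !== u.final_prompt_tokens + u.completion_tokens) {
    problems.push("total_tokens");
  }
  // The sample ratio is rounded to two decimals, so allow half a unit in the last place.
  const ratio = 1 - u.compressed_tokens / u.original_tokens;
  if (Math.abs(ratio - u.compression_ratio) > 0.005) {
    problems.push("compression_ratio");
  }
  return problems;
}

// Values copied from the sample response above.
const sampleUsage: Usage = {
  original_tokens: 1200,
  compressed_tokens: 350,
  final_prompt_tokens: 420,
  completion_tokens: 150,
  total_tokens: 570,
  compression_ratio: 0.71,
  tokens_saved: 850,
};

const usageProblems = checkUsage(sampleUsage); // empty when all three relationships hold
```

Note that `final_prompt_tokens` (420) is larger than `compressed_tokens` (350) in the sample, which suggests the final prompt adds something on top of the compressed context; the README does not say what.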

📊 What changes?

Before Awren:

Full conversation history sent every request
Linear token growth
High cost on long sessions

With Awren:

Structured state representation
Delta-based updates
Minimal context transmission
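The difference between the two modes can be seen in a toy model. The sketch below assumes every turn adds the same number of tokens and that the serialized state has a fixed size; both assumptions and all numbers are illustrative, not measurements, and the function names are hypothetical.

```typescript
// Tokens sent on each request when the whole history is resent every time.
function fullHistoryPromptSizes(turnTokens: number[]): number[] {
  let running = 0;
  return turnTokens.map((t) => {
    running += t;
    return running;
  });
}

// Tokens sent on each request when only a state summary plus the new turn is sent.
// stateTokens is a hypothetical fixed size for the serialized state.
function stateAndDeltaPromptSizes(turnTokens: number[], stateTokens: number): number[] {
  return turnTokens.map((t) => stateTokens + t);
}

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

const turns = [200, 200, 200, 200, 200]; // illustrative per-turn token counts

const fullSizes = fullHistoryPromptSizes(turns);           // [200, 400, 600, 800, 1000]
const compactSizes = stateAndDeltaPromptSizes(turns, 100); // [300, 300, 300, 300, 300]

const fullTotal = sum(fullSizes);       // 3000
const compactTotal = sum(compactSizes); // 1500
```

In this model the state-based approach costs more on the first turn (300 versus 200 tokens) and pulls ahead as the session grows, which matches the note under Limitations that the engine is best suited to multi-turn workflows.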

📉 Benchmark

npm run benchmark

| Metric | Without Awren | With Awren |
| ------------------ | ------------- | ---------- |
| Input tokens | 100% | 30%–60% |
| Cost | High | Reduced |
| Context redundancy | High | Low |
| Output quality | Baseline | Comparable |

🧱 Architecture

awren-context-engine/
├── cli/          # Command-line interface
├── server/       # Fastify HTTP gateway
├── core/         # Engine (compression, state, delta)
├── providers/    # LLM provider abstraction
├── benchmark/    # Performance measurement
├── config/       # Configuration
└── utils/        # Utilities

Pipeline

User → /chat → Context Gate → State Engine → Delta Engine
  → Prompt Builder → LLM API → Response
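One way to read the pipeline above is as a chain of functions over a shared context object. The sketch below takes only the stage names from the diagram; the stage bodies, the `PipelineContext` shape, and `runPipeline` are placeholders chosen for illustration, not the engine's real logic.

```typescript
interface PipelineContext {
  messages: string[];                    // incoming user messages
  previousState: Record<string, string>; // state stored for the session
  state: Record<string, string>;         // state after this turn
  changedKeys: string[];                 // keys that differ from previousState
  prompt: string;                        // text that would be sent to the provider
}

type Stage = (ctx: PipelineContext) => PipelineContext;

// Placeholder: trim messages and drop empty ones.
const contextGate: Stage = (ctx) => ({
  ...ctx,
  messages: ctx.messages.map((m) => m.trim()).filter((m) => m.length > 0),
});

// Placeholder: record the latest request as a state field.
const stateEngine: Stage = (ctx) => ({
  ...ctx,
  state: { ...ctx.previousState, last_request: ctx.messages[ctx.messages.length - 1] ?? "" },
});

// Placeholder: list the state keys whose values changed this turn.
const deltaEngine: Stage = (ctx) => ({
  ...ctx,
  changedKeys: Object.keys(ctx.state).filter((k) => ctx.state[k] !== ctx.previousState[k]),
});

// Placeholder: build the prompt from the changed fields only.
const promptBuilder: Stage = (ctx) => ({
  ...ctx,
  prompt: ctx.changedKeys.map((k) => `${k}: ${ctx.state[k]}`).join("\n"),
});

// Run the stages in order, feeding each stage's output into the next.
function runPipeline(stages: Stage[], input: PipelineContext): PipelineContext {
  return stages.reduce((ctx, stage) => stage(ctx), input);
}

const pipelineResult = runPipeline([contextGate, stateEngine, deltaEngine, promptBuilder], {
  messages: ["  ", "add password reset "],
  previousState: { intent: "development" },
  state: {},
  changedKeys: [],
  prompt: "",
});
```

Keeping each stage a pure function of the context makes it straightforward to swap the heuristic stages for LLM-based ones later, as the roadmap describes.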

Tech stack

Node.js + TypeScript: type-safe, fast, widely adopted
Fastify: low-overhead HTTP server
JSON storage: simple file-based session persistence (MVP)
OpenAI / Anthropic: multi-provider support

⚠️ Limitations

Not a replacement for RAG systems
Compression quality depends on task structure
Best suited for agentic / multi-turn workflows
The MVP uses heuristic compression (LLM-based compression is planned for v0.2)

🧭 Roadmap

[x] MVP: Heuristic compression + state engine + delta tracking
[ ] LLM-based semantic compression (v0.2)
[ ] Learned compression policies
[ ] Advanced semantic memory graph
[ ] Multi-agent optimization layer
[ ] Cloud-hosted enterprise version

📌 Philosophy

The next bottleneck in AI is not intelligence — it's context efficiency.

🏢 Built by

Awren Labs — AI infrastructure & intelligence systems