prepia

v1.0.4

Published

2 months ago

AI middleware that reduces LLM quota usage by 80-95% through smart caching, task decomposition, and context optimization


adewale0o


Prepia

AI Middleware that reduces LLM quota usage by 80-95%

Prepia is a smart middleware layer that sits between AI agents and LLMs. Instead of making many LLM calls for complex tasks, Prepia handles the heavy lifting (searching, scraping, processing, caching) and sends ONE optimized call to the LLM.
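To make the headline figure concrete, here is a back-of-the-envelope sketch. It assumes quota is counted in LLM calls and that every call costs about the same, neither of which is stated above. callReduction is a hypothetical helper, not part of Prepia.

```javascript
// Hypothetical helper: percentage of LLM calls avoided when a task that
// would take `baselineCalls` calls is served with `optimizedCalls` calls.
function callReduction(baselineCalls, optimizedCalls = 1) {
  if (baselineCalls <= 0 || optimizedCalls < 0) {
    throw new RangeError('baselineCalls must be > 0 and optimizedCalls >= 0');
  }
  return Math.round(100 * (1 - optimizedCalls / baselineCalls));
}

console.log(callReduction(5));    // 80
console.log(callReduction(20));   // 95
console.log(callReduction(3, 0)); // 100 (answered locally, no LLM call)
```

Under these assumptions, the 80-95% range corresponds to tasks that would otherwise need roughly 5 to 20 separate LLM calls.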

Architecture

                    ┌─────────────────────────────────────┐
                    │           AI Agent / Client          │
                    └──────────────┬──────────────────────┘
                                   │
                    ┌──────────────▼──────────────────────┐
                    │          Prepia API Server           │
                    │      (REST / CLI / Programmatic)     │
                    └──────────────┬──────────────────────┘
                                   │
                    ┌──────────────▼──────────────────────┐
                    │           Core Engine                │
                    │  ┌─────────┬──────────┬───────────┐ │
                    │  │ Task    │ Context  │ PrepiShot │ │
                    │  │Decompose│ Packager │ Optimizer │ │
                    │  └────┬────┴────┬─────┴─────┬─────┘ │
                    └───────┼─────────┼───────────┼───────┘
                            │         │           │
         ┌──────────────────┼─────────┼───────────┼──────────────────┐
         │                  │         │           │                  │
    ┌────▼────┐      ┌─────▼────┐ ┌──▼───┐ ┌────▼─────┐   ┌───────▼──────┐
    │  Tools  │      │  Cache   │ │Models│ │  Chain   │   │   Security   │
    │Orchestr.│      │ Manager  │ │Router│ │Scheduler │   │  Sanitizer   │
    └────┬────┘      └──────────┘ └──────┘ └──────────┘   └──────────────┘
         │
    ┌────┼────┬────────┬──────────┐
    │    │    │        │          │
  Web  Web  File   Calc    HTTP
 Search Scraper Ops  ulator  Client

Installation

# Clone or copy the prepia directory
cd prepia

# No npm install needed - uses only Node.js built-in modules!
# Requires Node.js >= 22.0.0

# Make CLI executable
chmod +x bin/prepia.mjs

Quick Start

🤖 For AI Agents (Most Important)

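Route each request through Prepia before calling your LLM. A single engine.process() call replaces the multi-call workflow, which is where the claimed 80-95% quota reduction comes from.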

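// Assumes the package resolves as 'prepia' (installed or linked);
// from a cloned checkout, import from './src/index.mjs' instead.
import { PrepiaEngine } from 'prepia';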

const engine = new PrepiaEngine();

// Simple queries → 0 LLM calls
const r1 = await engine.process("What is 15% of 200?", { mode: 'flash' });
// Answer: "30", usedLLM: false, duration: 1ms

// Complex queries → 1 optimized LLM call (instead of many)
const r2 = await engine.process("Research quantum computing trends", { mode: 'shot' });
// Prepia searches, scrapes, compresses, makes ONE call

Agent decision flow:

User question → Prepia flash? → Yes → Answer (0 LLM calls)
                → No → Prepia shot → Search + scrape + 1 LLM call → Answer
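The flow above could be wired into an agent roughly as follows. This is a sketch, not Prepia's own logic: it assumes that a flash result carries an empty or missing answer when the query cannot be handled locally, which the examples above do not confirm. answerWithFallback and stubEngine are hypothetical names, and the stub stands in for a real PrepiaEngine so the snippet runs on its own.

```javascript
// Sketch of the flash-then-shot flow. `engine` is any object exposing the
// process(query, { mode }) method shown above.
// ASSUMPTION: when flash mode cannot answer locally, the result has an
// empty or missing `answer`; the real engine may signal this differently.
async function answerWithFallback(engine, query) {
  const flash = await engine.process(query, { mode: 'flash' });
  if (flash && flash.answer) {
    return flash; // answered locally, 0 LLM calls
  }
  // Fall back to a single optimized LLM call.
  return engine.process(query, { mode: 'shot' });
}

// Stub engine for illustration only; it is not Prepia's implementation.
const stubEngine = {
  async process(query, { mode }) {
    if (mode === 'flash') {
      const known = query === 'What is 15% of 200?';
      return { answer: known ? '30' : null, mode, usedLLM: false };
    }
    return { answer: `LLM answer for: ${query}`, mode, usedLLM: true };
  },
};

answerWithFallback(stubEngine, 'What is 15% of 200?')
  .then((r) => console.log(r.mode, r.usedLLM)); // flash false
```

Trying flash first costs one extra local lookup on complex queries, in exchange for skipping the LLM entirely on simple ones.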

CLI

# Process a simple query (flash mode - instant, no LLM)
node bin/prepia.mjs process "What is 15% of 200?" --mode flash

# Process with one LLM call (shot mode - default)
node bin/prepia.mjs process "Explain quantum computing"

# Start the API server
node bin/prepia.mjs serve --port 3000

# View analytics
node bin/prepia.mjs analytics

Programmatic

import { PrepiaEngine } from './src/index.mjs';

const engine = new PrepiaEngine();

// Flash mode - handles locally, no LLM needed
const result = await engine.process('Calculate 15% of 200', { mode: 'flash' });
console.log(result.answer); // "30"

// Shot mode - one optimized LLM call
const result2 = await engine.process('Summarize the latest AI trends', { mode: 'shot' });
console.log(result2.answer);

// Get analytics
const analytics = engine.getAnalytics();
console.log(analytics.metrics);

API

# Start server
node bin/prepia.mjs serve

# Submit a task
curl -X POST http://localhost:3000/task \
  -H "Content-Type: application/json" \
  -d '{"query": "What is machine learning?", "mode": "shot"}'

# Health check
curl http://localhost:3000/status

# Analytics
curl http://localhost:3000/analytics

API Reference

POST /task

Submit a task for processing.

Body:

{
  "query": "Your question or task",
  "mode": "flash|shot|stream",
  "options": {}
}

Response:

{
  "answer": "The response text",
  "mode": "shot",
  "usedLLM": true,
  "taskId": "task_123_abc",
  "provider": "openai",
  "tokens": { "prompt": 100, "completion": 50, "total": 150 },
  "quality": { "overall": 0.85 },
  "duration": 1234
}
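From JavaScript, the same request can be assembled before handing it to fetch(). The sketch below only builds and validates the request; buildTaskRequest is a hypothetical helper, and the base URL assumes the server is listening on port 3000 as in the curl examples.

```javascript
// Modes listed in the request body above.
const MODES = ['flash', 'shot', 'stream'];

// Hypothetical helper: builds the URL and fetch() options for POST /task.
// ASSUMPTION: the server is reachable at http://localhost:3000.
function buildTaskRequest(query, mode = 'shot', baseUrl = 'http://localhost:3000') {
  if (typeof query !== 'string' || query.trim() === '') {
    throw new TypeError('query must be a non-empty string');
  }
  if (!MODES.includes(mode)) {
    throw new RangeError(`mode must be one of: ${MODES.join(', ')}`);
  }
  return {
    url: `${baseUrl}/task`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, mode, options: {} }),
    },
  };
}

const { url, init } = buildTaskRequest('What is machine learning?');
console.log(url); // http://localhost:3000/task
// With the server running: const res = await fetch(url, init);
```

Validating the mode on the client side avoids a round trip for a request the server would presumably reject.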

GET /status

Health check endpoint.

GET /analytics

Usage statistics and cost savings.

POST /cache/clear

Clear all caches.

GET /plugins

List registered plugins.

POST /config

Update engine configuration.

Processing Modes

| Mode | Description | LLM Calls | Best For |
|------|-------------|-----------|----------|
| flash | Instant local answers | 0 | Math, greetings, time queries |
| shot | One optimized LLM call | 1 | Most tasks (default) |
| stream | Progressive updates | 1 | Long-running tasks |

Configuration

const engine = new PrepiaEngine({
  cache: {
    memoryMaxSize: 1000,      // Max memory cache entries
    memoryTTL: 300000,        // Memory TTL (5 min)
    diskTTL: 3600000,         // Disk TTL (1 hour)
    enableDisk: true,         // Enable disk persistence
    cacheDir: '.prepia/cache' // Cache directory
  },
  config: {
    maxContextTokens: 4000,   // Max context for LLM
    defaultMode: 'shot',      // Default processing mode
    enableLocalModel: true,   // Use local pattern matching
    enableCache: true,        // Enable caching
    enableQualityCheck: true  // Check output quality
  },
  rate: {
    providers: {
      openai: { requestsPerMinute: 60, tokensPerMinute: 100000 }
    }
  }
});
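To illustrate what memoryMaxSize and memoryTTL plausibly control, here is a minimal LRU cache with a per-entry TTL. It is a sketch of the general technique, not Prepia's memory store; the class name LruTtlCache and its options are hypothetical.

```javascript
// Minimal LRU cache with per-entry TTL. A Map keeps insertion order, so the
// first key is always the least recently used one.
class LruTtlCache {
  constructor({ maxSize = 1000, ttl = 300000, now = Date.now } = {}) {
    this.maxSize = maxSize;
    this.ttl = ttl;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttl });
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used key.
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}

let clock = 0;
const cache = new LruTtlCache({ maxSize: 2, ttl: 100, now: () => clock });
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');    // touch "a" so "b" becomes least recently used
cache.set('c', 3); // evicts "b"
console.log(cache.get('b')); // undefined
clock = 100;
console.log(cache.get('a')); // undefined (expired)
```

A short memory TTL paired with a longer disk TTL, as in the configuration above, keeps hot entries fast while letting older results survive a restart.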

Plugin Development

Plugin Structure

my-plugin/
├── manifest.json    # Plugin metadata
└── index.mjs        # Plugin code

manifest.json

{
  "name": "my-plugin",
  "version": "1.0.0",
  "description": "My custom plugin",
  "main": "index.mjs",
  "dependencies": []
}

Plugin Interface

// index.mjs
export async function init(context) {
  // Called when plugin is initialized
}

export async function execute(params) {
  // Called when plugin is executed
  return { result: 'done' };
}

export async function cleanup() {
  // Called when plugin is cleaned up
}
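A host could drive these three hooks roughly like this. The sketch assumes init and cleanup are optional and that cleanup should run even when execute throws; how Prepia's own loader, registry, and sandbox call plugins is not documented here, and runPlugin is a hypothetical name.

```javascript
// Hypothetical runner: calls init, then execute, and always calls cleanup
// once init has succeeded.
async function runPlugin(plugin, context, params) {
  if (typeof plugin.execute !== 'function') {
    throw new TypeError('plugin must export an execute() function');
  }
  if (typeof plugin.init === 'function') await plugin.init(context);
  try {
    return await plugin.execute(params);
  } finally {
    if (typeof plugin.cleanup === 'function') await plugin.cleanup();
  }
}

// Example plugin object with the same shape as the exports above.
const calls = [];
const echoPlugin = {
  async init() { calls.push('init'); },
  async execute(params) { calls.push('execute'); return { result: params.text }; },
  async cleanup() { calls.push('cleanup'); },
};

runPlugin(echoPlugin, {}, { text: 'done' }).then((out) => {
  console.log(out.result, calls.join(' > ')); // done init > execute > cleanup
});
```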

PrepiScript

Custom task definition language:

TASK "Research and Summarize"
SEARCH "quantum computing breakthroughs 2024"
EXTRACT key_findings
FILTER relevance > 0.7
FORMAT markdown
DELIVER output
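The script above reads as one command per line followed by an argument. A minimal parser under that assumption might look like the sketch below; the actual PrepiScript grammar is not specified here, and parsePrepiScript is a hypothetical name, not the script/parser module.

```javascript
// Hypothetical line-based parser: the first word is the command, the rest
// is its argument. A quoted argument has its surrounding quotes removed.
function parsePrepiScript(source) {
  return source
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '')
    .map((line) => {
      const space = line.indexOf(' ');
      const command = space === -1 ? line : line.slice(0, space);
      const rest = space === -1 ? '' : line.slice(space + 1).trim();
      const quoted = rest.length >= 2 && rest.startsWith('"') && rest.endsWith('"');
      return { command, arg: quoted ? rest.slice(1, -1) : rest };
    });
}

const steps = parsePrepiScript(`
TASK "Research and Summarize"
SEARCH "quantum computing breakthroughs 2024"
FILTER relevance > 0.7
`);
console.log(steps[0]);     // { command: 'TASK', arg: 'Research and Summarize' }
console.log(steps[2].arg); // relevance > 0.7
```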

Testing

# Run all tests (the quotes let Node expand the glob instead of the shell)
node --test "tests/**/*.test.mjs"

# Run unit tests only
node --test tests/core/*.test.mjs tests/tools/*.test.mjs tests/cache/*.test.mjs

# Run integration tests
node --test tests/integration/*.test.mjs

# Run API tests
node --test tests/api/*.test.mjs

Modules

| Module | Description |
|--------|-------------|
| core/engine | Main orchestrator |
| core/task-decomposer | Breaks tasks into sub-tasks |
| core/context-packager | Compresses context for LLM |
| core/prepimshot | One-shot prompt optimization |
| tools/orchestrator | Tool routing and execution |
| tools/web-search | Web search (DuckDuckGo, Wikipedia) |
| tools/web-scraper | Content extraction |
| tools/calculator | Safe math evaluation |
| cache/manager | Cache orchestration |
| cache/memory-store | LRU memory cache |
| cache/disk-store | Persistent disk cache |
| models/router | Multi-LLM routing |
| models/local-model | Local pattern matching |
| models/provider | LLM provider abstraction |
| rate/shield | Rate limit protection |
| rate/limiter | Token bucket / sliding window |
| chain/dag | Task dependency graph |
| chain/scheduler | Parallel task scheduler |
| chain/executor | Task execution engine |
| vault/knowledge-base | Persistent knowledge store |
| vault/pattern-learner | Query pattern learning |
| plugins/loader | Plugin discovery |
| plugins/registry | Plugin lifecycle management |
| plugins/sandbox | Plugin execution sandbox |
| guard/checker | Output quality verification |
| guard/fact-checker | Cross-reference verification |
| guard/hallucination | Hallucination detection |
| persona/detector | Context-aware persona selection |
| analytics/tracker | Usage tracking |
| analytics/dashboard | Cost/efficiency reporting |
| stream/handler | Real-time progress updates |
| security/sanitizer | Input/output sanitization |
| security/privacy | PII detection and redaction |
| shadow/daemon | Background task daemon |
| edge/lite | Lightweight edge mode |
| morph/optimizer | Workflow optimization |
| network/p2p | Distributed task sharing |
| script/parser | PrepiScript parser |
| script/executor | PrepiScript executor |
| api/server | HTTP API server |
| api/routes | API route handlers |
| api/middleware | Request/response middleware |
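The rate/limiter entry names a token bucket, and the configuration section sets a requestsPerMinute value per provider. As a rough illustration of how those two fit together, here is a minimal token bucket. It is a generic sketch of the technique, not the module's code, and the TokenBucket class and its methods are hypothetical.

```javascript
// Minimal token bucket: `perMinute` tokens refill evenly over one minute.
class TokenBucket {
  constructor(perMinute, now = Date.now) {
    this.capacity = perMinute;
    this.tokens = perMinute; // start full
    this.now = now;          // injectable clock, handy for tests
    this.last = now();
  }

  // Returns true and spends `cost` tokens if enough are available.
  tryTake(cost = 1) {
    const t = this.now();
    const refill = ((t - this.last) * this.capacity) / 60000;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.last = t;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

let ms = 0;
const bucket = new TokenBucket(60, () => ms); // 60 requests per minute
for (let i = 0; i < 60; i++) bucket.tryTake();
console.log(bucket.tryTake()); // false, bucket is empty
ms = 1000;                     // one second later: one token refilled
console.log(bucket.tryTake()); // true
```

The same class could track tokensPerMinute by passing the token count of each request as the cost.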

Contributing

1. Fork the repository
2. Create a feature branch
3. Write tests for new functionality
4. Ensure all tests pass: node --test "tests/**/*.test.mjs"
5. Submit a pull request

License

MIT