prepia
v1.0.4
AI middleware that reduces LLM quota usage by 80-95% through smart caching, task decomposition, and context optimization
Prepia
AI Middleware that reduces LLM quota usage by 80-95%
Prepia is a smart middleware layer that sits between AI agents and LLMs. Instead of making many LLM calls for complex tasks, Prepia handles the heavy lifting (searching, scraping, processing, caching) and sends ONE optimized call to the LLM.
Architecture
                        ┌─────────────────────────────────────┐
                        │          AI Agent / Client          │
                        └──────────────┬──────────────────────┘
                                       │
                        ┌──────────────▼──────────────────────┐
                        │          Prepia API Server          │
                        │     (REST / CLI / Programmatic)     │
                        └──────────────┬──────────────────────┘
                                       │
                        ┌──────────────▼──────────────────────┐
                        │             Core Engine             │
                        │  ┌─────────┬──────────┬───────────┐ │
                        │  │  Task   │ Context  │ PrepiShot │ │
                        │  │Decompose│ Packager │ Optimizer │ │
                        │  └────┬────┴────┬─────┴─────┬─────┘ │
                        └───────┼─────────┼───────────┼───────┘
                                │         │           │
             ┌──────────────────┼─────────┼───────────┼──────────────────┐
             │                  │         │           │                  │
        ┌────▼────┐       ┌─────▼────┐ ┌──▼───┐  ┌────▼─────┐    ┌───────▼──────┐
        │  Tools  │       │  Cache   │ │Models│  │  Chain   │    │   Security   │
        │Orchestr.│       │ Manager  │ │Router│  │Scheduler │    │  Sanitizer   │
        └────┬────┘       └──────────┘ └──────┘  └──────────┘    └──────────────┘
             │
        ┌────┼────┬────────┬──────────┐
        │    │    │        │          │
       Web  Web   File    Calc       HTTP
    Search Scraper Ops    ulator    Client
Installation
# Clone or copy the prepia directory
cd prepia
# No npm install needed - uses only Node.js built-in modules!
# Requires Node.js >= 22.0.0
# Make CLI executable
chmod +x bin/prepia.mjs
Quick Start
🤖 For AI Agents (Most Important)
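Route each request through Prepia before calling your LLM. A single engine.process() call answers simple queries locally and collapses complex ones into one LLM call, which is where the claimed 80-95% quota saving comes from.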
import { PrepiaEngine } from 'prepia';
const engine = new PrepiaEngine();
// Simple queries → 0 LLM calls
const r1 = await engine.process("What is 15% of 200?", { mode: 'flash' });
// Answer: "30", usedLLM: false, duration: 1ms
// Complex queries → 1 optimized LLM call (instead of many)
const r2 = await engine.process("Research quantum computing trends", { mode: 'shot' });
// Prepia searches, scrapes, compresses, makes ONE call
Agent decision flow:
User question → Prepia flash? → Yes → Answer (0 LLM calls)
                              → No → Prepia shot → Search + scrape + 1 LLM call → Answer
CLI
# Process a simple query (flash mode - instant, no LLM)
node bin/prepia.mjs process "What is 15% of 200?" --mode flash
# Process with one LLM call (shot mode - default)
node bin/prepia.mjs process "Explain quantum computing"
# Start the API server
node bin/prepia.mjs serve --port 3000
# View analytics
node bin/prepia.mjs analytics
Programmatic
import { PrepiaEngine } from './src/index.mjs';
const engine = new PrepiaEngine();
// Flash mode - handles locally, no LLM needed
const result = await engine.process('Calculate 15% of 200', { mode: 'flash' });
console.log(result.answer); // "30"
// Shot mode - one optimized LLM call
const result2 = await engine.process('Summarize the latest AI trends', { mode: 'shot' });
console.log(result2.answer);
// Get analytics
const analytics = engine.getAnalytics();
console.log(analytics.metrics);
API
# Start server
node bin/prepia.mjs serve
# Submit a task
curl -X POST http://localhost:3000/task \
-H "Content-Type: application/json" \
-d '{"query": "What is machine learning?", "mode": "shot"}'
# Health check
curl http://localhost:3000/status
# Analytics
curl http://localhost:3000/analytics
API Reference
POST /task
Submit a task for processing.
Body:
{
  "query": "Your question or task",
  "mode": "flash|shot|stream",
  "options": {}
}
Response:
{
  "answer": "The response text",
  "mode": "shot",
  "usedLLM": true,
  "taskId": "task_123_abc",
  "provider": "openai",
  "tokens": { "prompt": 100, "completion": 50, "total": 150 },
  "quality": { "overall": 0.85 },
  "duration": 1234
}
GET /status
Health check endpoint.
GET /analytics
Usage statistics and cost savings.
POST /cache/clear
Clear all caches.
GET /plugins
List registered plugins.
POST /config
Update engine configuration.
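The POST /task request and response shapes above can also be exercised from Node.js. The following is a minimal sketch, assuming the server is running on port 3000 as in the curl examples; `buildTaskRequest` and `submitTask` are hypothetical helper names introduced here for illustration and are not part of Prepia.

```javascript
// Assumption: the server was started with `node bin/prepia.mjs serve --port 3000`.
const PREPIA_URL = 'http://localhost:3000';

// Build the fetch() arguments for POST /task.
// Hypothetical helper, not part of the Prepia API.
function buildTaskRequest(query, mode = 'shot', options = {}) {
  const allowed = ['flash', 'shot', 'stream'];
  if (!allowed.includes(mode)) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return {
    url: `${PREPIA_URL}/task`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, mode, options }),
    },
  };
}

// Send the task and return the parsed JSON response.
// Uses the global fetch that ships with Node.js 22.
async function submitTask(query, mode) {
  const { url, init } = buildTaskRequest(query, mode);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Prepia returned HTTP ${res.status}`);
  }
  return res.json();
}
```

With the server running, `submitTask('What is machine learning?', 'shot')` should resolve to an object shaped like the response documented above, so fields such as `answer` and `usedLLM` can be read directly.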
Processing Modes
| Mode | Description | LLM Calls | Best For |
|------|-------------|-----------|----------|
| flash | Instant local answers | 0 | Math, greetings, time queries |
| shot | One optimized LLM call | 1 | Most tasks (default) |
| stream | Progressive updates | 1 | Long-running tasks |
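The flash and shot modes in the table can be combined into the flash-first decision flow shown in Quick Start. The sketch below is one possible way to write that fallback, not the engine's own logic. It assumes that a flash miss surfaces either as a thrown error or as a result without an `answer`, which the README does not specify; `answerWithFallback` and `stubEngine` are hypothetical names used only for illustration.

```javascript
// Try flash mode first; fall back to shot mode if no local answer is produced.
// Assumption: a flash miss is either a thrown error or a result with no `answer`.
// Verify the real engine's behaviour before relying on this.
async function answerWithFallback(engine, query) {
  try {
    const flash = await engine.process(query, { mode: 'flash' });
    if (flash && flash.answer != null && flash.answer !== '') {
      return flash; // answered locally, 0 LLM calls
    }
  } catch (err) {
    // Treat a flash failure as a miss and continue to shot mode.
  }
  return engine.process(query, { mode: 'shot' }); // 1 LLM call
}

// Stub engine for illustration only; replace with `new PrepiaEngine()`.
const stubEngine = {
  async process(query, { mode }) {
    if (mode === 'flash') {
      // Only handles "What is N% of M?" locally.
      const m = /^What is (\d+)% of (\d+)\?$/.exec(query);
      return m
        ? { answer: String((Number(m[1]) * Number(m[2])) / 100), mode, usedLLM: false }
        : { answer: null, mode, usedLLM: false };
    }
    return { answer: `LLM answer for: ${query}`, mode, usedLLM: true };
  },
};
```

Passing the engine in as a parameter keeps the fallback testable without a configured LLM provider.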
Configuration
const engine = new PrepiaEngine({
  cache: {
    memoryMaxSize: 1000,       // Max memory cache entries
    memoryTTL: 300000,         // Memory TTL (5 min)
    diskTTL: 3600000,          // Disk TTL (1 hour)
    enableDisk: true,          // Enable disk persistence
    cacheDir: '.prepia/cache'  // Cache directory
  },
  config: {
    maxContextTokens: 4000,    // Max context for LLM
    defaultMode: 'shot',       // Default processing mode
    enableLocalModel: true,    // Use local pattern matching
    enableCache: true,         // Enable caching
    enableQualityCheck: true   // Check output quality
  },
  rate: {
    providers: {
      openai: { requestsPerMinute: 60, tokensPerMinute: 100000 }
    }
  }
});
Plugin Development
Plugin Structure
my-plugin/
├── manifest.json  # Plugin metadata
└── index.mjs      # Plugin code
manifest.json
{
  "name": "my-plugin",
  "version": "1.0.0",
  "description": "My custom plugin",
  "main": "index.mjs",
  "dependencies": []
}
Plugin Interface
// index.mjs
export async function init(context) {
  // Called when plugin is initialized
}
export async function execute(params) {
  // Called when plugin is executed
  return { result: 'done' };
}
export async function cleanup() {
  // Called when plugin is cleaned up
}
PrepiScript
Custom task definition language:
TASK "Research and Summarize"
SEARCH "quantum computing breakthroughs 2024"
EXTRACT key_findings
FILTER relevance > 0.7
FORMAT markdown
DELIVER output
Testing
# Run all tests
node --test tests/**/*.test.mjs
# Run unit tests only
node --test tests/core/*.test.mjs tests/tools/*.test.mjs tests/cache/*.test.mjs
# Run integration tests
node --test tests/integration/*.test.mjs
# Run API tests
node --test tests/api/*.test.mjs
Modules
| Module | Description |
|--------|-------------|
| core/engine | Main orchestrator |
| core/task-decomposer | Breaks tasks into sub-tasks |
| core/context-packager | Compresses context for LLM |
| core/prepishot | One-shot prompt optimization |
| tools/orchestrator | Tool routing and execution |
| tools/web-search | Web search (DuckDuckGo, Wikipedia) |
| tools/web-scraper | Content extraction |
| tools/calculator | Safe math evaluation |
| cache/manager | Cache orchestration |
| cache/memory-store | LRU memory cache |
| cache/disk-store | Persistent disk cache |
| models/router | Multi-LLM routing |
| models/local-model | Local pattern matching |
| models/provider | LLM provider abstraction |
| rate/shield | Rate limit protection |
| rate/limiter | Token bucket / sliding window |
| chain/dag | Task dependency graph |
| chain/scheduler | Parallel task scheduler |
| chain/executor | Task execution engine |
| vault/knowledge-base | Persistent knowledge store |
| vault/pattern-learner | Query pattern learning |
| plugins/loader | Plugin discovery |
| plugins/registry | Plugin lifecycle management |
| plugins/sandbox | Plugin execution sandbox |
| guard/checker | Output quality verification |
| guard/fact-checker | Cross-reference verification |
| guard/hallucination | Hallucination detection |
| persona/detector | Context-aware persona selection |
| analytics/tracker | Usage tracking |
| analytics/dashboard | Cost/efficiency reporting |
| stream/handler | Real-time progress updates |
| security/sanitizer | Input/output sanitization |
| security/privacy | PII detection and redaction |
| shadow/daemon | Background task daemon |
| edge/lite | Lightweight edge mode |
| morph/optimizer | Workflow optimization |
| network/p2p | Distributed task sharing |
| script/parser | PrepiScript parser |
| script/executor | PrepiScript executor |
| api/server | HTTP API server |
| api/routes | API route handlers |
| api/middleware | Request/response middleware |
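The cache/memory-store module is described as an LRU memory cache, and the Configuration section exposes memoryMaxSize and memoryTTL for it. To make those two settings concrete, here is a minimal sketch of an LRU cache with per-entry expiry. It is an illustration under stated assumptions, not Prepia's actual implementation; the class name `LruTtlCache` and its `now` option are hypothetical.

```javascript
// Minimal LRU cache with per-entry TTL, built on Map's insertion order.
// Illustrative only; not the code in cache/memory-store.
class LruTtlCache {
  constructor({ maxSize = 1000, ttl = 300000, now = Date.now } = {}) {
    this.maxSize = maxSize; // plays the role of memoryMaxSize
    this.ttl = ttl;         // plays the role of memoryTTL, in milliseconds
    this.now = now;         // injectable clock, useful for tests
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // entry has expired
      return undefined;
    }
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttl });
  }
}
```

A Map keeps keys in insertion order, so deleting and re-inserting a key on every read is enough to track recency without a separate linked list.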
Contributing
- Fork the repository
- Create a feature branch
- Write tests for new functionality
- Ensure all tests pass:
  node --test tests/**/*.test.mjs
- Submit a pull request
License
MIT
