# Stepper: Production-Grade AI Inference Orchestrator

Package: `ai-inference-stepper` v1.0.0 (production-grade AI inference stepper with multi-provider fallback)
Stepper is a resilient, multi-provider AI inference engine designed for high-load production environments. It handles provider fallbacks, intelligent caching, job queuing, and circuit breaking out of the box.
- Back to root: ../../README.md
- CommitDiary packages: ../api/README.md • ../web-dashboard/README.md • ../extension/README.md • ../core/README.md
## ✅ Standalone Setup (Local)
Stepper is open source and can run independently or inside this monorepo.
### Prerequisites
- Node.js 18+
- pnpm
- Redis (required)
### Install

```bash
cd packages/stepper
pnpm install
```

### Configure

```bash
cp .env.example .env
```

Add at least one provider key and Redis config in `.env`.
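For reference, a minimal `.env` might look like the following. The variable names here are illustrative assumptions; check `.env.example` for the exact keys the package reads.

```bash
# Illustrative values only -- confirm variable names against .env.example
REDIS_URL=redis://localhost:6379
GEMINI_API_KEY=your-gemini-key
```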
### Run

```bash
# Start Redis (required)
docker run -d -p 6379:6379 redis:alpine

# Start Stepper in dev mode
pnpm dev
```

## ✅ Why This Setup Works
- Redis backs cache and queue state for resilient processing
- Provider adapters allow fallback across multiple AI vendors
- Callbacks let CommitDiary save reports and notify users reliably
## 🏗️ Architecture First
Understanding how Stepper handles your requests is key to using its full power.

### The Core Flow
1. **Request Capture**: Received via HTTP or internal library call.
2. **Smart Caching**: Checks Redis. Supports stale-while-revalidate (returns stale data while refreshing in the background).
3. **Job Queueing**: If not cached, the request is enqueued via BullMQ to prevent overloading providers.
4. **Resilient Orchestration**:
   - **Priority Fallback**: Tries Gemini → Cohere → HF Space in sequence.
   - **Circuit Breakers**: Stops calling failing providers to allow them to recover.
   - **Rate Limiting**: Per-provider bottlenecking to respect API quotas.
5. **Finalize**: The result is cached, and `onSuccess` callbacks are triggered.
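The orchestration step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Stepper's actual internals: the `Provider` shape, the `generate` method, and the breaker handling are all hypothetical.

```ts
// Minimal fallback sketch: try each provider in priority order, skipping any
// whose circuit breaker is open, and record per-provider errors.
type Provider = {
  name: string;
  generate: (input: string) => Promise<string>;
  circuitOpen: boolean; // a real breaker would reset after a cool-down window
};

async function runWithFallback(providers: Provider[], input: string): Promise<string> {
  const errors: Record<string, string> = {};
  for (const p of providers) {
    if (p.circuitOpen) {
      errors[p.name] = "circuit open";
      continue; // skip providers that are currently failing
    }
    try {
      return await p.generate(input); // first healthy provider wins
    } catch (err) {
      errors[p.name] = String(err);
      p.circuitOpen = true; // trip the breaker so later requests skip this provider
    }
  }
  throw new Error(`All providers failed: ${JSON.stringify(errors)}`);
}
```

The key design point the real system shares with this sketch: failures fall through to the next provider instead of surfacing to the caller, and a tripped breaker short-circuits future calls until the provider recovers.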
> [!TIP]
> For a deep dive into the system design, see ARCHITECTURE.md.
## 🔗 CommitDiary Integration Flow
```mermaid
flowchart LR
    A[API Server] --> B[Stepper enqueueReport]
    B --> C[Queue + Providers]
    C --> D[Callback to API]
    D --> E[Report saved + webhooks]
```

### How CommitDiary Uses Stepper
- The API calls Stepper to generate commit reports
- Stepper returns a `jobId` or a cached result
- Stepper posts back to the API callbacks for delivery and persistence
See API docs for endpoints and callbacks: ../api/README.md
## 🧩 Component Deep Dives
Stepper is modular. Explore each subsystem's technical documentation:
| Component | Purpose | Technical Details |
| :-------- | :------ | :---------------- |
| ⚡ Cache | Intelligent Redis strategies | Cache Guide |
| 🤖 Providers | Adapter logic for different LLMs | Provider Specs |
| 📥 Queue | Background processing & retries | Queue System |
| 📊 Metrics | Prometheus & Observability | Metrics Docs |
| 🛡️ Alerts | Discord & error notifications | Alerts System |
| ✅ Validation | Zod-based strict output parsing | Validation |
## 🌟 Provider-Specific Optimizations
### Google Gemini (Gemini 3 Models)
Stepper includes specialized optimizations for Google's Gemini 3 models based on official Google prompting strategies.
**Why Gemini is different:**

- **XML-Structured Prompts**: Uses `<role>`, `<instructions>`, `<context>`, and `<task>` tags for better model understanding.
- **Query-Parameter Authentication**: The API key is passed in the URL (`?key=YOUR_KEY`) instead of headers.
- **Locked Temperature**: Must use `temperature: 1.0` (a Google requirement for optimal Gemini 3 performance).
- **Increased Token Limit**: 4096 tokens for detailed analysis.
**Conditional implementation:**

```ts
if (provider === 'gemini') {
  // Use XML-structured prompt
  prompt = buildGeminiPrompt(input);
  // Append API key to URL
  endpoint = `${endpoint}?key=${apiKey}`;
}
```

This pattern allows each provider to have unique optimizations while maintaining clean code separation. See the Provider Documentation for details.
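The `buildGeminiPrompt` helper referenced above is not shown in this README. A hypothetical version, using the XML tags listed earlier, might look like this; the tag contents and the input shape are illustrative assumptions, not the package's actual implementation:

```ts
// Hypothetical sketch of buildGeminiPrompt: wraps the input in the XML tags
// recommended for Gemini 3 (<role>, <instructions>, <context>, <task>).
function buildGeminiPrompt(input: { message: string; files?: string[] }): string {
  return [
    "<role>You are a senior engineer writing commit reports.</role>",
    "<instructions>Summarize the change clearly and concisely.</instructions>",
    `<context>Files touched: ${(input.files ?? []).join(", ") || "unknown"}</context>`,
    `<task>Describe this commit: ${input.message}</task>`,
  ].join("\n");
}
```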
## ⚡ Quick Start (3 Minutes)
### 1. Install Dependencies

```bash
pnpm install
```

### 2. Configure Environment

Copy the example and add your API keys:

```bash
cp .env.example .env
```

### 3. Spin Up Redis & Stepper

```bash
# Start Redis (required)
docker run -d -p 6379:6379 redis:alpine

# Start in dev mode
pnpm dev
```

## 🧭 Monorepo Notes
- The API expects Stepper at `STEPPER_URL` (default `http://localhost:3005`)
- If running inside the monorepo, keep both the API and Stepper dev servers up
- See the root setup guide: ../../README.md
## 🛠️ Usage Modes
### Mode A: As a Library (Direct Integration)
Best for monorepos or when you want to avoid network overhead.
```ts
import { enqueueReport, registerCallbacks, initStepper } from "ai-inference-stepper";

// Optional: programmatic config overrides (no env file required)
initStepper({
  config: {
    redis: { url: "redis://localhost:6379" },
  },
  providers: [
    {
      name: "gemini",
      enabled: true,
      apiKey: process.env.GEMINI_API_KEY,
      baseUrl: "https://generativelanguage.googleapis.com/v1",
      modelName: "gemini-pro",
      concurrency: 2,
      rateLimitRPM: 5,
    },
  ],
});

// 1. Set up notification logic
registerCallbacks({
  onSuccess: (id, provider, data) => console.log(`✅ Success via ${provider}`),
  onFailure: (id, errors) => console.error("❌ Failed:", errors),
});

// 2. Trigger a request (returns immediately if queued or cached)
const result = await enqueueReport({
  commitSha: "abc123",
  message: "Fix bug",
  // ...other input
});
```

### Mode B: As an HTTP Service
Best for microservices or remote deployments (Render/Railway).
```bash
# Send a report generation request
curl -X POST http://localhost:3001/v1/reports \
  -H "Content-Type: application/json" \
  -d '{ "message": "Refactor API", "files": ["src/app.ts"] }'
```
### CLI (npm)

```bash
# One-off run
npx ai-inference-stepper

# Or install globally and run
npm i -g ai-inference-stepper
ai-inference-stepper
```

### Environment Setup (Service Mode)

Stepper reads its config from environment variables. Use `.env` for local runs:

```bash
cp .env.example .env
```

If you install Stepper as a library, you can either:

- Provide env variables in your host app process (recommended for deployments), or
- Call `initStepper({ config, providers })` programmatically to override defaults.
The response gives you a `jobId` to poll:

```json
{ "status": "queued", "jobId": "...", "statusUrl": "..." }
```
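In service mode you can poll `statusUrl` until the job leaves the queue. Below is a small generic helper for that pattern; the `"queued"`/`"processing"` status values are assumptions based on the response shape above, so verify them against the actual API.

```ts
// Generic polling helper: keeps fetching job status until it is no longer pending.
type JobStatus = { status: string; result?: unknown };

async function pollJob(
  fetchStatus: () => Promise<JobStatus>,
  { intervalMs = 1000, maxAttempts = 30 } = {}
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchStatus();
    if (job.status !== "queued" && job.status !== "processing") {
      return job; // completed or failed -- either way, polling is done
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Job did not complete within the polling window");
}

// Usage with the statusUrl from the enqueue response:
// const final = await pollJob(() => fetch(statusUrl).then((r) => r.json()));
```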
---
## 🤝 Contributing & Community
We love contributors! Whether it's a bug report or a new provider adapter:
- **Issues**: Found a bug? [Raise an issue](https://github.com/samuel-adedigba/ai-inference-stepper/issues).
- **Pull Requests**: Have a fix? [Open a PR](https://github.com/samuel-adedigba/ai-inference-stepper/pulls).
If contributing inside the CommitDiary monorepo, start at [../../README.md](../../README.md) for the full workflow.
---
## 📜 License
**Custom Attribution License**
You are free to use, modify, and distribute this software for personal or commercial projects, provided that:
1. **Credit is given**: You must attribute the original work to **Samuel Adedigba (@samuel-adedigba)**.
2. **Pull Requests**: Contributions and improvements are encouraged back to this core repository.
_For full details, see the [LICENSE](./LICENSE) file (MIT-based with attribution)._