@sumitchn12/voice-ai-backend-sdk
v1.1.0
🧠 Voice AI Backend SDK
The Intelligent Logic Layer for Voice-Driven Web Applications.
The Voice AI Backend SDK is a powerful Node.js/Express service that transforms raw user speech and page context into actionable UI commands. It acts as the "Prefrontal Cortex" for the @voice-ai/frontend-sdk.
🔥 Features
🛡️ IntentGuard™ Technology
Built for high-reliability systems. IntentGuard acts as a safety net that detects whether the user is asking for information or requesting an action.
- If the user only asked a question (e.g., "What is the status?"), it strips any hallucinated "Click" actions from the LLM response.
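The README doesn't specify how IntentGuard tells questions from commands. As a rough illustration only, the sketch below assumes the LLM returns `{ reply, actions }` with each action shaped `{ type, targetId }`, and uses a hypothetical keyword heuristic (`isInformationQuery`, `applyIntentGuard`, and both word lists are invented for this example) to drop click actions when the user only asked a question. The SDK's real logic may differ.

```javascript
// Hypothetical IntentGuard-style filter; not the SDK's actual implementation.
// Assumed LLM response shape: { reply: string, actions: [{ type, targetId }] }.

// Single words that usually signal a command (English + Hinglish; assumed list).
const ACTION_VERBS = new Set(["click", "open", "press", "submit", "select", "tap", "kholo", "dabao"]);

// Words that usually open an information request (assumed list).
const QUESTION_WORDS = new Set(["what", "why", "how", "when", "where", "who", "which", "is", "are", "kya", "kab", "kaise"]);

// Lowercase the transcript and split it into words, ignoring basic punctuation.
function tokenize(transcript) {
  return transcript.toLowerCase().replace(/[?!.,]/g, " ").split(/\s+/).filter(Boolean);
}

// True when the user seems to be asking for information rather than an action.
function isInformationQuery(transcript) {
  const words = tokenize(transcript);
  // Any explicit action verb means the user wants something done.
  if (words.some((w) => ACTION_VERBS.has(w))) return false;
  // Otherwise a trailing "?" or a leading question word marks a question.
  return transcript.trim().endsWith("?") || QUESTION_WORDS.has(words[0]);
}

// Strip "click" actions the model added when the user only asked a question.
function applyIntentGuard(transcript, llmResponse) {
  const actions = Array.isArray(llmResponse.actions) ? llmResponse.actions : [];
  if (!isInformationQuery(transcript)) {
    return { ...llmResponse, actions };
  }
  return { ...llmResponse, actions: actions.filter((a) => a.type !== "click") };
}
```

A keyword list is brittle; a sturdier guard might ask the model to classify intent in a separate step and only keep actions when both signals agree.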
🤖 LLM Agnostic (Gemini & Local)
Choose your provider based on your privacy and latency needs:
- Google Gemini Pro: hosted model with strong reasoning and fast responses.
- Local LLMs (Llama 3.2, Qwen): run via Ollama or LM Studio for fully local, private processing.
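A minimal sketch of how a provider switch could read the .env variables listed in the Quick Start. The helper names `resolveProviderConfig` and `buildLocalRequest`, the fallback defaults, and the error messages are hypothetical. The local branch targets an OpenAI-compatible `/v1/chat/completions` endpoint, which both Ollama and LM Studio (default port 1234) expose.

```javascript
// Hypothetical helper: resolve provider settings from this README's env vars.
// Defaults and error messages are assumptions, not documented SDK behavior.
function resolveProviderConfig(env) {
  const provider = (env.LLM_PROVIDER || "gemini").toLowerCase();
  if (provider === "gemini") {
    if (!env.GEMINI_API_KEY) {
      throw new Error("GEMINI_API_KEY is required when LLM_PROVIDER=gemini");
    }
    return { provider, apiKey: env.GEMINI_API_KEY };
  }
  if (provider === "local") {
    return {
      provider,
      // Ollama's OpenAI-compatible endpoint; LM Studio defaults to port 1234 instead.
      url: env.LOCAL_LLM_URL || "http://localhost:11434/v1/chat/completions",
      model: env.LOCAL_LLM_MODEL || "llama3.2",
    };
  }
  throw new Error(`Unknown LLM_PROVIDER: ${provider}`);
}

// Build fetch() arguments for an OpenAI-compatible local chat completions server.
function buildLocalRequest(config, messages) {
  return {
    url: config.url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: config.model, messages, stream: false }),
    },
  };
}
```

On Node 18+, `fetch(req.url, req.init)` sends the request with the built-in fetch.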
🎯 Optimized Prompt Engineering
Uses a few-shot prompting strategy with strict JSON schemas, so that even smaller 3B or 7B models can target specific DOM IDs with 99% accuracy.
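The exact prompt the SDK uses isn't published here. The sketch below illustrates the general technique under assumed names (`buildMessages`, `parseModelOutput`, and the `{ reply, actions }` schema are hypothetical): a system message that pins a JSON shape, two few-shot examples (one action, one pure question), and a parser that rejects any output that breaks the schema.

```javascript
// Hypothetical few-shot prompt with a strict JSON contract (illustration only).
const SYSTEM_PROMPT = [
  "You control a web page. Reply ONLY with JSON of the form",
  '{"reply": string, "actions": [{"type": "click" | "type", "targetId": string, "value"?: string}]}',
  "Use only targetId values that appear in the provided context.",
].join("\n");

// Few-shot examples: one action request, one pure question with no actions.
const FEW_SHOT = [
  { role: "user", content: 'Context: [{"id":"voice-ai-el-9n8t","text":"Submit"}]\nUser: submit the form' },
  { role: "assistant", content: '{"reply":"Submitting the form.","actions":[{"type":"click","targetId":"voice-ai-el-9n8t"}]}' },
  { role: "user", content: 'Context: [{"id":"voice-ai-el-a1b2","text":"Status: Shipped"}]\nUser: what is the status?' },
  { role: "assistant", content: '{"reply":"Your order has shipped.","actions":[]}' },
];

// Chat messages: system rules, then examples, then the real request.
function buildMessages(context, transcript) {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    ...FEW_SHOT,
    { role: "user", content: `Context: ${JSON.stringify(context)}\nUser: ${transcript}` },
  ];
}

// Parse model output and throw if it does not match the assumed schema.
function parseModelOutput(raw) {
  // Small models sometimes wrap JSON in prose; keep the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end < start) throw new Error("No JSON object in model output");
  const data = JSON.parse(raw.slice(start, end + 1));
  if (typeof data.reply !== "string" || !Array.isArray(data.actions)) {
    throw new Error("Model output does not match the schema");
  }
  for (const action of data.actions) {
    if (!["click", "type"].includes(action.type) || typeof action.targetId !== "string") {
      throw new Error(`Invalid action: ${JSON.stringify(action)}`);
    }
  }
  return data;
}
```

The negative example (a question answered with an empty actions array) matters as much as the positive one: it shows small models that "no action" is a valid answer.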
🌍 Multi-Lingual Support
Native support for Hindi, English, and Hinglish. The engine automatically detects the user's language and replies in the same language.
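The README doesn't describe the detection method. One simple approach, sketched here purely as an assumption: treat any Devanagari characters (U+0900 to U+097F) as Hindi, treat Latin-script text containing common romanized Hindi words as Hinglish, and otherwise fall back to English. `detectLanguage`, `replyLanguageInstruction`, and the marker word list are hypothetical.

```javascript
// Hypothetical script-based language heuristic; not the SDK's actual detector.
// Romanized Hindi words that suggest Hinglish (assumed, deliberately short list).
const HINGLISH_MARKERS = new Set(["kya", "hai", "kaise", "karo", "mera", "meri", "nahi", "kholo", "dikhao", "batao"]);

function detectLanguage(transcript) {
  // Any character in the Devanagari block means Hindi.
  if (/[\u0900-\u097F]/.test(transcript)) return "hi";
  // Otherwise look for romanized Hindi words among the Latin-script tokens.
  const words = transcript.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  const hits = words.filter((w) => HINGLISH_MARKERS.has(w)).length;
  return hits > 0 ? "hinglish" : "en";
}

// Instruction appended to the system prompt so the model answers in kind.
function replyLanguageInstruction(lang) {
  const names = {
    hi: "Hindi (Devanagari script)",
    hinglish: "Hinglish (Hindi in Latin script)",
    en: "English",
  };
  return `Reply in ${names[lang]}.`;
}
```

A larger model can often detect the language on its own if the system prompt simply says "reply in the user's language"; an explicit check like this mainly helps smaller local models stay consistent.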
🛠️ Installation
```bash
npm install @voice-ai/backend-sdk
```
📖 Quick Start
1. Configure .env
```
LLM_PROVIDER=gemini # or 'local'
GEMINI_API_KEY=your_key
LOCAL_LLM_URL=http://localhost:11434/v1/chat/completions # For Ollama
LOCAL_LLM_MODEL=llama3.2
```
2. Integration
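```javascript
// ES module syntax: needs "type": "module" in package.json (or a .mjs file).
// If the SDK doesn't load .env itself, load it first, e.g. `node --env-file=.env server.js` (Node 20.6+) or the dotenv package.
import express from 'express';
import { voiceController } from '@voice-ai/backend-sdk';

const app = express();
app.use(express.json()); // Parse JSON bodies sent by the frontend SDK

// Main voice endpoint
app.post('/voice', voiceController);

app.listen(5000, () => console.log('Voice AI Brain Online on 5000'));
```
🔄 How It Works (The Loop)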
- Capture: The frontend captures speech and scrapes "fingerprinted" UI context.
- Process: The backend receives the context and the speech.
- Reason: The LLM identifies the specific element ID to target (e.g., voice-ai-el-9n8t).
- Guard: IntentGuard verifies that the command isn't a hallucination.
- Execute: The frontend receives the JSON response and simulates human-like input events.
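The steps above can be sketched as a single backend handler. This is an illustration under assumptions, not the SDK's voiceController: the payload shape `{ transcript, context: [{ id, text }] }` and the injected `callLLM` function are hypothetical, and the Guard step here is one possible check (dropping any action whose `targetId` is absent from the context the frontend sent).

```javascript
// Hypothetical sketch of the loop on the backend side (not the SDK's voiceController).
// Assumed payload: { transcript: string, context: [{ id, text }] }.
const fallback = () => ({ reply: "Sorry, I couldn't handle that request.", actions: [] });

async function handleVoiceRequest(payload, callLLM) {
  // Process: check what the frontend sent.
  const { transcript, context } = payload || {};
  if (typeof transcript !== "string" || !Array.isArray(context)) return fallback();

  // Reason: ask the model which element to target; callLLM resolves to parsed JSON.
  let result;
  try {
    result = await callLLM(transcript, context);
  } catch {
    return fallback();
  }
  if (!result || typeof result.reply !== "string") return fallback();

  // Guard: keep only actions whose targetId exists in the fingerprinted context.
  const knownIds = new Set(context.map((el) => el.id));
  const proposed = Array.isArray(result.actions) ? result.actions : [];
  const actions = proposed.filter((a) => knownIds.has(a.targetId));

  // Execute: happens on the frontend; the backend only returns JSON.
  return { reply: result.reply, actions };
}
```

Injecting callLLM keeps the handler testable with a stub. In Express it could be wired as `app.post('/voice', async (req, res) => res.json(await handleVoiceRequest(req.body, callLLM)))`.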
♿ Inclusion Benefits
- Accessible Logic: Simplifies complex UI trees into simple vocal questions.
- Context Awareness: Can explain what is happening on screen to users who cannot see it.
- Safety: Robust error handling is designed to keep the AI from performing dangerous actions by mistake.
Empowering developers to build the next generation of voice-first interfaces.
