@sumitchn12/voice-ai-backend-sdk

v1.1.0

Published a month ago

Voice AI Backend SDK

voice-ai-backend gemini-pro llm-logic intent-guard natural-language-processing ollama-support voice-automation ai-controller

🧠 Voice AI Backend SDK

The Intelligent Logic Layer for Voice-Driven Web Applications.

The Voice AI Backend SDK is a powerful Node.js/Express service that transforms raw user speech and page context into actionable UI commands. It acts as the "Prefrontal Cortex" for the @voice-ai/frontend-sdk.

🔥 Features

🛡️ IntentGuard™ Technology

Built for high-reliability systems. IntentGuard acts as a safety net that detects when a user is asking for Information vs. requesting an Action.

It strips hallucinated "Click" actions from the LLM response when the user only asked a question (e.g., "What is the status?").
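The guard idea can be illustrated with a small sketch. This is not the SDK's actual implementation: the function names, the keyword patterns, and the response shape (`reply` plus an `actions` array) are all assumptions made for illustration.

```javascript
// Hypothetical sketch of an intent guard; none of these names come from the SDK.
const QUESTION_PATTERN = /^(what|who|when|where|why|how|which|is|are|does|do|can)\b/i;
const ACTION_PATTERN = /\b(click|open|press|submit|select|go to|navigate|type|fill|scroll)\b/i;

// Treat an utterance as informational when it looks like a question
// and contains no explicit action verb.
function isInformational(transcript) {
  const text = transcript.trim();
  if (ACTION_PATTERN.test(text)) return false;
  return text.endsWith('?') || QUESTION_PATTERN.test(text);
}

// Drop every proposed action when the user only asked for information.
// Assumed response shape: { reply: string, actions: Array<{ type, targetId }> }.
function guardIntent(transcript, llmResponse) {
  if (!isInformational(transcript)) return llmResponse;
  return { ...llmResponse, actions: [] };
}
```

With this sketch, `guardIntent("What is the status?", response)` keeps the spoken reply but returns an empty `actions` array, while `guardIntent("Click the submit button", response)` passes the response through unchanged. A keyword heuristic like this is only a starting point; a real guard would likely need a more robust classifier.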

🤖 LLM Agnostic (Gemini & Local)

Choose your provider based on your privacy and latency needs:

Google Gemini Pro: High-end reasoning and world-class speed.
Local LLMs (Llama 3.2, Qwen): Run via Ollama or LM Studio for 100% private, local processing.

🎯 Optimized Prompt Engineering

Uses a few-shot prompting strategy with strict JSON schemas. This helps even smaller 3B or 7B models target specific DOM IDs, with a stated accuracy of 99%.
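A minimal sketch of few-shot prompting with strict output validation might look like the following. The example pairs, the message format, the `reply`/`action`/`targetId` schema, and both function names are assumptions for illustration, not the SDK's real prompt or schema.

```javascript
// Hypothetical few-shot examples: each pairs an input with the exact JSON we expect back.
const FEW_SHOT_EXAMPLES = [
  {
    context: [{ id: 'voice-ai-el-1a2b', label: 'Submit order', tag: 'button' }],
    speech: 'Submit my order',
    output: { reply: 'Submitting your order.', action: 'click', targetId: 'voice-ai-el-1a2b' },
  },
  {
    context: [{ id: 'voice-ai-el-3c4d', label: 'Order status: shipped', tag: 'span' }],
    speech: 'What is the status?',
    output: { reply: 'Your order has shipped.', action: 'none', targetId: null },
  },
];

const ALLOWED_ACTIONS = ['click', 'type', 'none'];

// Build a chat-style message list: system rules, worked examples, then the real input.
function buildMessages(context, speech) {
  const messages = [{
    role: 'system',
    content: [
      'Reply with exactly one JSON object and nothing else.',
      'Schema: {"reply": string, "action": "click"|"type"|"none", "targetId": string|null}.',
      'targetId must be an id that appears in the provided context.',
    ].join(' '),
  }];
  for (const ex of FEW_SHOT_EXAMPLES) {
    messages.push({ role: 'user', content: JSON.stringify({ context: ex.context, speech: ex.speech }) });
    messages.push({ role: 'assistant', content: JSON.stringify(ex.output) });
  }
  messages.push({ role: 'user', content: JSON.stringify({ context, speech }) });
  return messages;
}

// Validate the raw model output against the schema; return null on any violation.
function parseCommand(raw, context) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) return null;
  if (typeof data.reply !== 'string') return null;
  if (!ALLOWED_ACTIONS.includes(data.action)) return null;
  if (data.action === 'none') return { reply: data.reply, action: 'none', targetId: null };
  const knownIds = context.map((el) => el.id);
  if (!knownIds.includes(data.targetId)) return null;
  return { reply: data.reply, action: data.action, targetId: data.targetId };
}
```

Rejecting any output whose `targetId` is not present in the page context is what keeps a small model from acting on an ID it invented.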

🌍 Multi-Lingual Support

Native support for Hindi, English, and Hinglish. The engine automatically detects the user's language and replies in the same language and tone.

🛠️ Installation

npm install @voice-ai/backend-sdk

📖 Quick Start

1. Configure `.env`

LLM_PROVIDER=gemini # or 'local'
GEMINI_API_KEY=your_key
LOCAL_LLM_URL=http://localhost:11434/v1/chat/completions # For Ollama
LOCAL_LLM_MODEL=llama3.2
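As an illustration of how these variables could be consumed, here is a small sketch that validates them and returns a provider config. The function name `resolveProvider`, the fallback to `gemini` when `LLM_PROVIDER` is unset, and the default model are assumptions; the SDK may handle configuration differently.

```javascript
// Hypothetical helper: turn environment variables into a validated provider config.
function resolveProvider(env) {
  // Assumption: fall back to 'gemini' when LLM_PROVIDER is not set.
  const provider = (env.LLM_PROVIDER || 'gemini').trim().toLowerCase();

  if (provider === 'gemini') {
    if (!env.GEMINI_API_KEY) {
      throw new Error('GEMINI_API_KEY is required when LLM_PROVIDER=gemini');
    }
    return { provider, apiKey: env.GEMINI_API_KEY };
  }

  if (provider === 'local') {
    if (!env.LOCAL_LLM_URL) {
      throw new Error('LOCAL_LLM_URL is required when LLM_PROVIDER=local');
    }
    // Assumption: default to the model named in the example above.
    return { provider, url: env.LOCAL_LLM_URL, model: env.LOCAL_LLM_MODEL || 'llama3.2' };
  }

  throw new Error(`Unsupported LLM_PROVIDER: ${provider}`);
}
```

Calling `resolveProvider(process.env)` once at startup surfaces a missing key or URL immediately, rather than on the first voice request.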

2. Integration

import { voiceController } from '@voice-ai/backend-sdk';
import express from 'express';

const app = express();
app.use(express.json());

// Main Voice Endpoint
app.post('/voice', voiceController);

app.listen(5000, () => console.log("Voice AI Brain Online on 5000"));
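To try the endpoint, a client needs to POST JSON to `/voice`. The sketch below builds such a request. The payload field names `speech` and `context`, and the helper name `buildVoiceRequest`, are assumptions: the request schema is not documented here, so check what the frontend SDK actually sends.

```javascript
// Hypothetical payload: the field names `speech` and `context` are assumptions,
// not the SDK's documented request schema.
function buildVoiceRequest(speech, context, baseUrl = 'http://localhost:5000') {
  return {
    url: `${baseUrl}/voice`,
    options: {
      method: 'POST',
      // express.json() only parses bodies sent with a JSON content type.
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ speech, context }),
    },
  };
}
```

The result can be passed straight to `fetch(url, options)` in a browser or in Node.js 18 and later.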

🔄 How It Works (The Loop)

1. Capture: Frontend captures speech and scrapes "Fingerprinted" UI context.
2. Process: Backend receives context and speech.
3. Reason: LLM identifies the specific ID (e.g., voice-ai-el-9n8t) to target.
4. Guard: IntentGuard verifies the command isn't a hallucination.
5. Execute: Frontend receives JSON and simulates human-like input events.
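The backend half of this loop (Process, Reason, Guard) can be sketched as a single function with the LLM call and the guard injected as dependencies. Everything here is hypothetical: the name `handleVoiceTurn`, the request fields, and the `reply`/`actions` response shape are assumptions, not the SDK's internals.

```javascript
// Hypothetical orchestration of the backend steps; `reason` and `guard` are injected
// so the flow can be exercised without a real LLM.
async function handleVoiceTurn(request, { reason, guard }) {
  // Process: validate the incoming speech and UI context.
  const { speech, context } = request;
  if (typeof speech !== 'string' || !Array.isArray(context)) {
    return { reply: 'Invalid request.', actions: [] };
  }

  // Reason: ask the model which element IDs to target.
  const proposal = await reason(speech, context);

  // Guard, part 1: discard actions aimed at IDs that are not on the page.
  const knownIds = new Set(context.map((el) => el.id));
  const grounded = proposal.actions.filter((a) => knownIds.has(a.targetId));

  // Guard, part 2: let the injected guard veto actions for informational requests.
  return guard(speech, { ...proposal, actions: grounded });
}
```

The returned object is what the frontend would receive for the Execute step. Injecting `reason` makes it easy to swap Gemini for a local model, or for a stub in tests.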

♿ Inclusion Benefits

Accessible Logic: Simplifies complex UI trees into simple vocal questions.
Context Awareness: Can explain what is happening on screen to users who cannot see it.
Safety: Robust error handling is designed to keep the AI from performing dangerous actions by mistake.

Empowering developers to build the next generation of voice-first interfaces.
