mbz-voice-sdk

v1.0.21

Published

a year ago

🎙️ MBZ Voice SDK: Easily add voice recognition, Gemini-based AI replies, and TTS to any web app.


mbz1415

voice ai sdk gemini speech fastapi tts chatbot mbz mbz-voice-sdk

🎙️ MBZ Voice SDK

Speak. Think. Respond. Seamlessly.

MBZ-Voice-SDK is a powerful developer tool that enables you to integrate voice input, AI understanding (via Gemini), and spoken responses into any modern web app. Whether you're building a chatbot, AI assistant, or a voice-powered UI — this SDK makes it plug-and-play.

🔥 Features

✅ Voice Input: Capture user speech via browser microphone using Web Speech API
✅ AI Processing: Gemini-powered AI backend built with FastAPI
✅ Voice Response: Convert AI text responses to spoken words using Web Speech TTS
✅ Audio Controls: Easily toggle mute/unmute functionality
✅ Conversation Memory: Store the last 3 Q&A exchanges using localStorage
✅ Framework Agnostic: Seamlessly integrate with plain JavaScript, React, Vue, or any modern frontend framework
✅ Customizable: Configure language, voice type, and response behavior
✅ Lightweight: Minimal dependencies for optimal performance
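
The conversation memory feature can be pictured as a small bounded list persisted to storage. The sketch below is an illustration only, not the SDK's internal code: the storage key `mbz-voice-history`, the `{ question, answer }` entry shape, and all function names are assumptions.

```javascript
// Hypothetical sketch of a bounded Q&A history. The key name, entry shape,
// and function names are assumptions, not the SDK's actual internals.
const HISTORY_KEY = "mbz-voice-history";

// Minimal in-memory stand-in with the same getItem/setItem shape as
// localStorage, so the sketch also runs outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
    removeItem: (key) => { data.delete(key); },
  };
}

// Read the stored history; fall back to an empty list on missing or corrupt data.
function loadHistory(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(HISTORY_KEY));
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Append one exchange and keep only the newest `maxHistory` entries.
function rememberExchange(storage, question, answer, maxHistory = 3) {
  const all = [...loadHistory(storage), { question, answer }];
  const history = maxHistory > 0 ? all.slice(-maxHistory) : [];
  storage.setItem(HISTORY_KEY, JSON.stringify(history));
  return history;
}
```

In a browser, pass `window.localStorage` as `storage`. Once a fourth exchange is remembered, the oldest one is dropped.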

💻 Requirements

- Modern web browser with support for:
  - Web Speech API (SpeechRecognition)
  - Web Speech API (SpeechSynthesis)
  - localStorage
- Node.js 14+ (for development)
- Python 3.8+ (for backend)
- Gemini API key from Google AI Studio
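
To check the browser requirements at runtime, a feature-detection helper along these lines may be useful. It is a sketch, not part of the SDK: the function name `detectVoiceSupport` and its return shape are hypothetical.

```javascript
// Report which browser capabilities from the requirements list are present.
// `root` is the global object to inspect (window in a browser).
function detectVoiceSupport(root = globalThis) {
  let hasStorage = false;
  try {
    // Reading localStorage can throw when the browser blocks storage access.
    hasStorage = typeof root.localStorage === "object" && root.localStorage !== null;
  } catch {
    hasStorage = false;
  }
  return {
    // Chrome and Safari expose recognition under the webkit prefix.
    recognition: Boolean(root.SpeechRecognition || root.webkitSpeechRecognition),
    synthesis: "speechSynthesis" in root && "SpeechSynthesisUtterance" in root,
    storage: hasStorage,
  };
}
```

Calling `detectVoiceSupport(window)` before creating the agent lets you show a fallback message instead of a button that does nothing.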

📦 Install the SDK

NPM Installation

After publishing on npm:

npx mbz-voice-sdk init

Yarn Installation

yarn add mbz-voice-sdk

Local Installation (if cloned)

cd mbz-voice-sdk/sdk
npm install

CDN Usage

<script src="https://unpkg.com/mbz-voice-sdk@latest/dist/mbz-voice-sdk.min.js"></script>

⚙️ Backend Setup Guide

This SDK requires a backend API endpoint connected to Gemini (Google AI). We've provided a ready-to-use FastAPI backend in the /backend folder.

1️⃣ Navigate to the backend directory

cd ../backend

2️⃣ Install Python dependencies

pip install -r requirements.txt

3️⃣ Add Your Gemini API Key

Create a .env file in the backend folder and paste your Gemini API key:

GEMINI_API_KEY=your_google_gemini_api_key_here

👉 Get your key from: https://makersuite.google.com/app/apikey

4️⃣ Run the server

uvicorn main:app --reload

Now your backend is live at:

http://localhost:8000/ask
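
To smoke-test the endpoint without the SDK, a manual request could look like the sketch below. This README does not document the payload format, so the `question` field name is an assumption, as are the helper names `buildAskRequest` and `askBackend`. Check `main.py` in the backend folder for the real field names.

```javascript
// Build the arguments for a fetch() call to the backend.
// ASSUMPTION: the endpoint accepts POST with a JSON body of the form
// { "question": "..." }. The real field names may differ.
function buildAskRequest(apiUrl, text) {
  const question = String(text).trim();
  if (!question) {
    throw new Error("Cannot send an empty question");
  }
  return {
    url: apiUrl,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question }),
    },
  };
}

// Send the request and return the parsed JSON, surfacing HTTP errors explicitly.
async function askBackend(apiUrl, text, fetchImpl = fetch) {
  const { url, init } = buildAskRequest(apiUrl, text);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`Backend returned HTTP ${res.status}`);
  }
  return res.json();
}
```

For example, `askBackend("http://localhost:8000/ask", "Hello").then(console.log)` prints whatever JSON the backend returns, which also reveals the actual response shape.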

🧠 SDK Usage Example

Basic Usage

import { MBZVoiceAgent } from "mbz-voice-sdk";

const agent = new MBZVoiceAgent({
  apiUrl: "http://localhost:8000/ask",
  lang: "en-US",
  speak: true
});

agent.onTranscript((text) => {
  console.log("User said:", text);
});

agent.onResponse((reply) => {
  console.log("AI replied:", reply);
});

document.getElementById("start-btn").onclick = () => agent.listen();

React Integration

import React, { useEffect, useState } from 'react';
import { MBZVoiceAgent } from 'mbz-voice-sdk';

function VoiceAssistant() {
  const [transcript, setTranscript] = useState('');
  const [response, setResponse] = useState('');
  const [isListening, setIsListening] = useState(false);
  const [agent, setAgent] = useState(null);

  useEffect(() => {
    // Initialize the agent
    const voiceAgent = new MBZVoiceAgent({
      apiUrl: "http://localhost:8000/ask",
      lang: "en-US",
      speak: true
    });

    // Set up event handlers
    voiceAgent.onTranscript((text) => {
      setTranscript(text);
    });

    voiceAgent.onResponse((reply) => {
      setResponse(reply);
    });

    voiceAgent.onListeningChange((listening) => {
      setIsListening(listening);
    });

    setAgent(voiceAgent);

    // Cleanup on unmount
    return () => {
      voiceAgent.cleanup();
    };
  }, []);

  const handleListen = () => {
    if (agent) {
      agent.listen();
    }
  };

  return (
    <div className="voice-assistant">
      <button 
        onClick={handleListen}
        className={isListening ? 'listening' : ''}
      >
        {isListening ? '🔴 Listening...' : '🎙️ Start Talking'}
      </button>
      
      {transcript && (
        <div className="transcript">
          <h3>You said:</h3>
          <p>{transcript}</p>
        </div>
      )}
      
      {response && (
        <div className="response">
          <h3>AI response:</h3>
          <p>{response}</p>
        </div>
      )}
    </div>
  );
}

export default VoiceAssistant;

🧪 HTML Quick Test

<button id="start-btn">🎙️ Start Talking</button>
<div id="transcript"></div>
<div id="response"></div>

<script type="module">
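  // Note: browsers only resolve a bare specifier like 'mbz-voice-sdk'
  // through a bundler or an import map.
  import { MBZVoiceAgent } from 'mbz-voice-sdk';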

  const agent = new MBZVoiceAgent({ 
    apiUrl: 'http://localhost:8000/ask',
    speak: true
  });

  const transcriptEl = document.getElementById('transcript');
  const responseEl = document.getElementById('response');

  agent.onTranscript(text => {
    console.log("🎤", text);
    transcriptEl.textContent = `You said: ${text}`;
  });
  
  agent.onResponse(reply => {
    console.log("🤖", reply);
    responseEl.textContent = `AI says: ${reply}`;
  });

  document.getElementById("start-btn").onclick = () => agent.listen();
</script>

📚 API Documentation

`MBZVoiceAgent` Class

The main class for interacting with the SDK.

Constructor

const agent = new MBZVoiceAgent(options);

Options

| Option | Type | Default | Description |
|-----|-----|-----|-----|
| apiUrl | String | Required | The URL of your backend API endpoint |
| lang | String | 'en-US' | The language for speech recognition |
| speak | Boolean | true | Whether to speak the AI's response |
| voiceIndex | Number | 0 | Index of the voice to use for speech synthesis |
| pitch | Number | 1.0 | The pitch of the voice (0.1 to 2.0) |
| rate | Number | 1.0 | The speed of the voice (0.1 to 10.0) |
| volume | Number | 1.0 | The volume of the voice (0.0 to 1.0) |
| maxHistory | Number | 3 | Maximum number of Q&A pairs to store in history |
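
The defaults and ranges listed for the options can be applied to user input before constructing the agent. The helper below is a sketch under that assumption. Whether the SDK itself clamps out-of-range values is not documented here, and the name `normalizeOptions` is hypothetical.

```javascript
// Defaults and ranges copied from the options documentation; the clamping
// itself is an illustration, not confirmed SDK behavior.
const DEFAULTS = {
  lang: "en-US", speak: true, voiceIndex: 0,
  pitch: 1.0, rate: 1.0, volume: 1.0, maxHistory: 3,
};
const RANGES = { pitch: [0.1, 2.0], rate: [0.1, 10.0], volume: [0.0, 1.0] };

// Limit a number to the inclusive range [min, max].
function clamp(value, [min, max]) {
  return Math.min(max, Math.max(min, value));
}

// Merge user options over the defaults and clamp the numeric voice settings.
function normalizeOptions(options = {}) {
  if (!options.apiUrl) {
    throw new Error("apiUrl is required");
  }
  const merged = { ...DEFAULTS, ...options };
  for (const [key, range] of Object.entries(RANGES)) {
    const value = Number(merged[key]);
    // Non-numeric input falls back to the documented default.
    merged[key] = Number.isFinite(value) ? clamp(value, range) : DEFAULTS[key];
  }
  return merged;
}
```

The result can be passed straight to the constructor: `new MBZVoiceAgent(normalizeOptions({ apiUrl, pitch: 5 }))` would use a pitch of 2.0.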

Methods

| Method | Parameters | Description |
|-----|-----|-----|
| listen() | None | Start listening for voice input |
| stop() | None | Stop listening for voice input |
| mute() | None | Mute the voice response |
| unmute() | None | Unmute the voice response |
| cleanup() | None | Clean up resources and event listeners |
| onTranscript(callback) | Function | Set callback for transcript events |
| onResponse(callback) | Function | Set callback for AI response events |
| onListeningChange(callback) | Function | Set callback for listening state changes |
| onError(callback) | Function | Set callback for error events |
| getHistory() | None | Get the conversation history |
| clearHistory() | None | Clear the conversation history |
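
The `mute()` and `unmute()` methods can be combined into a single toggle for a UI button. This is a minimal sketch: `createMuteToggle` is a hypothetical helper, and it assumes the agent starts unmuted.

```javascript
// Wrap an agent's mute()/unmute() methods in a single toggle function.
// `agent` is any object exposing those two methods.
function createMuteToggle(agent) {
  let muted = false; // assumption: the agent starts unmuted
  return function toggle() {
    muted = !muted;
    if (muted) {
      agent.mute();
    } else {
      agent.unmute();
    }
    return muted; // the new state, useful for updating a button label
  };
}
```

Typical wiring would be `const toggle = createMuteToggle(agent);` followed by `muteButton.onclick = () => { muteButton.textContent = toggle() ? "Unmute" : "Mute"; };`, where `muteButton` is your own element.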

🔧 Troubleshooting

Microphone Not Working

Ensure your browser has permission to access the microphone
Check if your microphone is properly connected and working
Try using a different browser (Chrome and Edge have the best support)

Speech Recognition Not Starting

Make sure you're using a supported browser (Chrome, Edge, Safari)
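Check your internet connection (Chrome's speech recognition typically relies on a remote service, so it may not work offline)
Verify that your site is served over HTTPS; browsers only grant microphone access in secure contexts, and localhost counts as secure during development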
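
The Web Speech API reports recognition failures as short error codes such as `not-allowed` or `network`. The sketch below maps those codes to the checks above. Whether the SDK's `onError` callback passes these codes through unchanged is an assumption, and `explainRecognitionError` is a hypothetical helper.

```javascript
// Error codes defined by the Web Speech API for SpeechRecognition errors,
// mapped to a troubleshooting hint.
const RECOGNITION_HINTS = {
  "not-allowed": "Microphone permission was denied; allow access in the browser's site settings.",
  "service-not-allowed": "The browser blocked the speech service; try Chrome or Edge.",
  "audio-capture": "No microphone was found; check that one is connected and enabled.",
  "network": "The speech service could not be reached; check your internet connection.",
  "no-speech": "No speech was detected; speak closer to the microphone and try again.",
  "language-not-supported": "The configured lang value is not supported by this browser.",
  "aborted": "Recognition was stopped before it finished.",
};

// Return a hint for a known code, or a generic message otherwise.
function explainRecognitionError(code) {
  if (Object.prototype.hasOwnProperty.call(RECOGNITION_HINTS, code)) {
    return RECOGNITION_HINTS[code];
  }
  return `Unrecognized speech recognition error: ${code}`;
}
```

If the error object exposes the code, something like `agent.onError((err) => console.warn(explainRecognitionError(err.error ?? err)))` would surface the hint.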

Backend Connection Issues

Confirm your backend server is running
Check for CORS issues (the backend should allow requests from your frontend)
Verify your API URL is correct in the SDK initialization

Voice Response Not Working

Check if your device's volume is turned on
Make sure the speak option is set to true
Try using a different voice by changing the voiceIndex
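
To pick a `voiceIndex`, it helps to see which voices the browser offers. The sketch below lists them by position. It assumes `voiceIndex` refers to the position in the array returned by `speechSynthesis.getVoices()`, which this README does not confirm. `describeVoices` is a hypothetical helper.

```javascript
// Return one line per available voice, prefixed with its index.
// `synth` is a SpeechSynthesis-like object (window.speechSynthesis in a browser).
function describeVoices(synth) {
  return synth.getVoices().map((voice, index) => {
    const marker = voice.default ? " [default]" : "";
    return `${index}: ${voice.name} (${voice.lang})${marker}`;
  });
}
```

In a browser, run `console.log(describeVoices(window.speechSynthesis).join("\n"))`. Some browsers load voices asynchronously, so the list can be empty until the `voiceschanged` event has fired.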

🤝 Contributing

Contributions are welcome! Here's how you can help:

Fork the repository
Create a feature branch:

git checkout -b feature/amazing-feature

Commit your changes:

git commit -m 'Add some amazing feature'

Push to the branch:

git push origin feature/amazing-feature

Open a Pull Request

Development Setup

# Clone the repository
git clone https://github.com/ProMBZ/mbz-voice-sdk.git

# Install dependencies
cd mbz-voice-sdk
npm install

# Run development server
npm run dev

# Build for production
npm run build

🔐 Security Notice

This SDK does not use any built-in Gemini key.

🔐 You are responsible for adding your own Gemini key to the backend.

Never include your Gemini key in frontend code.

🧰 Tools Used

Frontend:
JavaScript (SpeechRecognition + TTS APIs)
localStorage for conversation persistence
Rollup for bundling
Backend:
FastAPI (Python)
Google Generative AI SDK (Gemini 1.5 Flash)
Python-dotenv for environment variables

📄 License

💬 Support

If you have questions, suggestions, or want to collaborate:

📧 Email: [email protected]
🌍 Portfolio: https://kzml8bqhnxp4cn0duf08.lite.vusercontent.net/

Made with ❤️ by Muhammad

