mbz-voice-sdk
v1.0.21
MBZ Voice SDK: Easily add voice recognition, Gemini-based AI replies, and TTS to any web app.
MBZ Voice SDK
Speak. Think. Respond. Seamlessly.
MBZ-Voice-SDK lets you integrate voice input, AI understanding (via Gemini), and spoken responses into any modern web app. Whether you're building a chatbot, an AI assistant, or a voice-powered UI, the SDK is designed to be plug-and-play.
Table of Contents
- Features
- Requirements
- Installation
- Backend Setup
- Usage Examples
- API Documentation
- Troubleshooting
- Contributing
- Security Notice
- Tools Used
- License
- Support
Features
- Voice Input: Capture user speech via the browser microphone using the Web Speech API
- AI Processing: Gemini-powered AI backend built with FastAPI
- Voice Response: Convert AI text responses to spoken words using Web Speech TTS
- Audio Controls: Easily toggle mute/unmute functionality
- Conversation Memory: Store the last 3 Q&A exchanges using localStorage
- Framework Agnostic: Integrate with plain JavaScript, React, Vue, or any modern frontend framework
- Customizable: Configure language, voice type, and response behavior
- Lightweight: Minimal dependencies for optimal performance
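The conversation memory feature can be pictured as a small bounded list persisted in localStorage. The sketch below is only an illustration of that idea, not the SDK's actual implementation: `createHistoryStore`, `createMemoryStorage`, and the storage key `mbz-voice-history` are hypothetical names chosen for this example.

```javascript
// Hypothetical helper, not part of the SDK's documented API.
// `storage` is anything with getItem/setItem/removeItem (localStorage in a browser).
// Assumes maxHistory >= 1.
function createHistoryStore(storage, key = "mbz-voice-history", maxHistory = 3) {
  function read() {
    try {
      const parsed = JSON.parse(storage.getItem(key) ?? "[]");
      return Array.isArray(parsed) ? parsed : [];
    } catch {
      return []; // corrupted data: start fresh
    }
  }
  return {
    add(question, answer) {
      // Append the new exchange, then keep only the newest `maxHistory` entries
      const next = [...read(), { question, answer }].slice(-maxHistory);
      storage.setItem(key, JSON.stringify(next));
      return next;
    },
    get: read,
    clear() {
      storage.removeItem(key);
    },
  };
}

// In-memory stand-in for localStorage so the sketch also runs outside a browser
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (k) => (data.has(k) ? data.get(k) : null),
    setItem: (k, v) => void data.set(k, String(v)),
    removeItem: (k) => void data.delete(k),
  };
}
```

In a browser, `createHistoryStore(window.localStorage)` would persist the same list across page reloads.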
Requirements
- Modern web browser with support for:
  - Web Speech API (SpeechRecognition)
  - Web Speech API (SpeechSynthesis)
  - localStorage
- Node.js 14+ (for development)
- Python 3.8+ (for backend)
- Gemini API key from Google AI Studio
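The browser requirements above can be checked at runtime before creating an agent. This is a minimal sketch, assuming the standard global names (`SpeechRecognition`, the prefixed `webkitSpeechRecognition`, `speechSynthesis`, `localStorage`); `detectVoiceSupport` is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper, not part of the SDK.
// `scope` is the global object to inspect (window in a browser).
function detectVoiceSupport(scope) {
  let storage = false;
  try {
    // Reading localStorage can throw when storage access is blocked
    storage = typeof scope.localStorage === "object" && scope.localStorage !== null;
  } catch {
    storage = false;
  }
  return {
    // Some browsers only expose the prefixed constructor
    recognition:
      typeof (scope.SpeechRecognition || scope.webkitSpeechRecognition) === "function",
    synthesis: typeof scope.speechSynthesis === "object" && scope.speechSynthesis !== null,
    storage,
  };
}

// Browser-only usage: report what is missing
if (typeof window !== "undefined") {
  console.log(detectVoiceSupport(window));
}
```

Taking the global object as a parameter keeps the check easy to exercise with a fake object in tests.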
Install the SDK
NPM Installation
After publishing on npm:
npx mbz-voice-sdk init
Yarn Installation
yarn add mbz-voice-sdk
Local Installation (if cloned)
cd mbz-voice-sdk/sdk
npm install
CDN Usage
<script src="https://unpkg.com/mbz-voice-sdk@latest/dist/mbz-voice-sdk.min.js"></script>
Backend Setup Guide
This SDK requires a backend API endpoint connected to Gemini (Google AI). We've provided a ready-to-use FastAPI backend in the /backend folder.
1. Navigate to the backend directory
cd ../backend
2. Install Python dependencies
pip install -r requirements.txt
3. Add your Gemini API key
Create a .env file in the backend folder and paste your Gemini API key:
GEMINI_API_KEY=your_google_gemini_api_key_here
Get your key from: https://makersuite.google.com/app/apikey
4. Run the server
uvicorn main:app --reload
Now your backend is live at:
http://localhost:8000/ask
SDK Usage Example
Basic Usage
import { MBZVoiceAgent } from "mbz-voice-sdk";

const agent = new MBZVoiceAgent({
  apiUrl: "http://localhost:8000/ask",
  lang: "en-US",
  speak: true
});

agent.onTranscript((text) => {
  console.log("User said:", text);
});

agent.onResponse((reply) => {
  console.log("AI replied:", reply);
});

document.getElementById("start-btn").onclick = () => agent.listen();
React Integration
import React, { useEffect, useState } from 'react';
import { MBZVoiceAgent } from 'mbz-voice-sdk';

function VoiceAssistant() {
  const [transcript, setTranscript] = useState('');
  const [response, setResponse] = useState('');
  const [isListening, setIsListening] = useState(false);
  const [agent, setAgent] = useState(null);

  useEffect(() => {
    // Initialize the agent
    const voiceAgent = new MBZVoiceAgent({
      apiUrl: "http://localhost:8000/ask",
      lang: "en-US",
      speak: true
    });

    // Set up event handlers
    voiceAgent.onTranscript((text) => {
      setTranscript(text);
    });
    voiceAgent.onResponse((reply) => {
      setResponse(reply);
    });
    voiceAgent.onListeningChange((listening) => {
      setIsListening(listening);
    });

    setAgent(voiceAgent);

    // Cleanup on unmount
    return () => {
      voiceAgent.cleanup();
    };
  }, []);

  const handleListen = () => {
    if (agent) {
      agent.listen();
    }
  };

  return (
    <div className="voice-assistant">
      <button
        onClick={handleListen}
        className={isListening ? 'listening' : ''}
      >
        {isListening ? 'Listening...' : 'Start Talking'}
      </button>
      {transcript && (
        <div className="transcript">
          <h3>You said:</h3>
          <p>{transcript}</p>
        </div>
      )}
      {response && (
        <div className="response">
          <h3>AI response:</h3>
          <p>{response}</p>
        </div>
      )}
    </div>
  );
}

export default VoiceAssistant;
HTML Quick Test
<button id="start-btn">Start Talking</button>
<div id="transcript"></div>
<div id="response"></div>
<script type="module">
  // Note: a bare specifier like 'mbz-voice-sdk' only resolves in the browser
  // through a bundler or an import map.
  import { MBZVoiceAgent } from 'mbz-voice-sdk';

  const agent = new MBZVoiceAgent({
    apiUrl: 'http://localhost:8000/ask',
    speak: true
  });

  const transcriptEl = document.getElementById('transcript');
  const responseEl = document.getElementById('response');

  // Show what the user said
  agent.onTranscript(text => {
    console.log("Transcript:", text);
    transcriptEl.textContent = `You said: ${text}`;
  });

  // Show the AI reply
  agent.onResponse(reply => {
    console.log("Reply:", reply);
    responseEl.textContent = `AI says: ${reply}`;
  });

  document.getElementById("start-btn").onclick = () => agent.listen();
</script>
API Documentation
MBZVoiceAgent Class
The main class for interacting with the SDK.
Constructor
const agent = new MBZVoiceAgent(options);
Options
| Option | Type | Default | Description |
|-----|-----|-----|-----|
| apiUrl | String | Required | The URL of your backend API endpoint |
| lang | String | 'en-US' | The language for speech recognition |
| speak | Boolean | true | Whether to speak the AI's response |
| voiceIndex | Number | 0 | Index of the voice to use for speech synthesis |
| pitch | Number | 1.0 | The pitch of the voice (0.1 to 2.0) |
| rate | Number | 1.0 | The speed of the voice (0.1 to 10.0) |
| volume | Number | 1.0 | The volume of the voice (0.0 to 1.0) |
| maxHistory | Number | 3 | Maximum number of Q&A pairs to store in history |
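To make the defaults and ranges in the table concrete, here is a sketch of how options could be merged and clamped before use. It mirrors the documented defaults and ranges only; `normalizeOptions` is a hypothetical helper, and the SDK itself may validate its options differently or not at all.

```javascript
// Hypothetical helper that mirrors the options table; not an SDK function.
const DEFAULTS = {
  lang: "en-US",
  speak: true,
  voiceIndex: 0,
  pitch: 1.0,
  rate: 1.0,
  volume: 1.0,
  maxHistory: 3,
};

// Documented ranges for the numeric speech settings
const RANGES = { pitch: [0.1, 2.0], rate: [0.1, 10.0], volume: [0.0, 1.0] };

function normalizeOptions(options) {
  if (!options || typeof options.apiUrl !== "string" || options.apiUrl === "") {
    throw new TypeError("apiUrl is required");
  }
  const merged = { ...DEFAULTS, ...options };
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    const value = Number(merged[name]);
    // Clamp valid numbers into range; fall back to the default otherwise
    merged[name] = Number.isFinite(value)
      ? Math.min(max, Math.max(min, value))
      : DEFAULTS[name];
  }
  return merged;
}
```

For example, `normalizeOptions({ apiUrl: "http://localhost:8000/ask", pitch: 5 })` would yield a pitch of 2.0 and the documented defaults for everything else.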
Methods
| Method | Parameters | Description |
|-----|-----|-----|
| listen() | None | Start listening for voice input |
| stop() | None | Stop listening for voice input |
| mute() | None | Mute the voice response |
| unmute() | None | Unmute the voice response |
| cleanup() | None | Clean up resources and event listeners |
| onTranscript(callback) | Function | Set callback for transcript events |
| onResponse(callback) | Function | Set callback for AI response events |
| onListeningChange(callback) | Function | Set callback for listening state changes |
| onError(callback) | Function | Set callback for error events |
| getHistory() | None | Get the conversation history |
| clearHistory() | None | Clear the conversation history |
Troubleshooting
Microphone Not Working
- Ensure your browser has permission to access the microphone
- Check if your microphone is properly connected and working
- Try using a different browser (Chrome and Edge have the best support)
Speech Recognition Not Starting
- Make sure you're using a supported browser (Chrome, Edge, Safari)
- Check your internet connection
- Verify that your site is served over HTTPS (required for production)
Backend Connection Issues
- Confirm your backend server is running
- Check for CORS issues (the backend should allow requests from your frontend)
- Verify your API URL is correct in the SDK initialization
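One way to narrow down a backend connection issue is to call the endpoint directly, without the SDK in between. The sketch below is an illustration under stated assumptions: the JSON field name `question` is a placeholder (the real request schema is defined by the backend in the /backend folder and is not documented here), and `buildAskRequest` and `probeBackend` are hypothetical helpers.

```javascript
// ASSUMPTION: the request field name "question" is a placeholder;
// check the backend code for the schema it actually expects.
function buildAskRequest(question) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  };
}

// Hypothetical probe: resolves to { ok, data } or { ok, reason }, never throws.
// `fetchImpl` defaults to the global fetch and can be swapped for a stub.
async function probeBackend(apiUrl, question, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(apiUrl, buildAskRequest(question));
    if (!res.ok) {
      // The server answered, but with an error status
      return { ok: false, reason: `HTTP ${res.status}` };
    }
    return { ok: true, data: await res.json() };
  } catch (err) {
    // fetch rejects on network failures; a blocked CORS request also lands here
    return { ok: false, reason: `network or CORS error: ${err.message}` };
  }
}
```

Running `probeBackend("http://localhost:8000/ask", "hello").then(console.log)` from the page's console separates the cases: an `HTTP ...` reason means the server is reachable but rejected the request, while a network or CORS reason points at the server not running, a wrong URL, or missing CORS headers.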
Voice Response Not Working
- Check if your device's volume is turned on
- Make sure the speak option is set to true
- Try using a different voice by changing the voiceIndex
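When choosing a voiceIndex, it helps to see which voices the browser actually offers, since the list differs per browser and operating system and may load asynchronously. The following is a rough sketch: `pickVoice` is a hypothetical helper showing one possible fallback policy (use the first voice when the index is out of range), which is not necessarily what the SDK does.

```javascript
// Hypothetical helper, not an SDK function.
// `voices` is the array returned by speechSynthesis.getVoices().
function pickVoice(voices, voiceIndex = 0) {
  if (!Array.isArray(voices) || voices.length === 0) {
    return null; // voice list is empty or not loaded yet
  }
  const inRange =
    Number.isInteger(voiceIndex) && voiceIndex >= 0 && voiceIndex < voices.length;
  return voices[inRange ? voiceIndex : 0];
}

// Browser-only usage: log each available voice with its index
if (typeof speechSynthesis !== "undefined") {
  const logVoices = () =>
    speechSynthesis.getVoices().forEach((v, i) => console.log(i, v.name, v.lang));
  logVoices();
  // Some browsers fill the list later and fire "voiceschanged" when it is ready
  if (typeof speechSynthesis.addEventListener === "function") {
    speechSynthesis.addEventListener("voiceschanged", logVoices);
  }
}
```

The logged indexes are the values to try for the voiceIndex option.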
Contributing
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch:
  git checkout -b feature/amazing-feature
- Commit your changes:
  git commit -m 'Add some amazing feature'
- Push to the branch:
  git push origin feature/amazing-feature
- Open a Pull Request
Development Setup
# Clone the repository
git clone https://github.com/ProMBZ/mbz-voice-sdk.git
# Install dependencies
cd mbz-voice-sdk
npm install
# Run development server
npm run dev
# Build for production
npm run build
Security Notice
This SDK does not use any built-in Gemini key.
You are responsible for adding your own Gemini key to the backend.
Never include your Gemini key in frontend code.
Tools Used
Frontend:
JavaScript (SpeechRecognition + TTS APIs)
localStorage for conversation persistence
Rollup for bundling
Backend:
FastAPI (Python)
Google Generative AI SDK (Gemini 1.5 Flash)
Python-dotenv for environment variables
License
MIT © 2025. Developed by Muhammad (MBZ-Voice-SDK)
GitHub: @ProMBZ
Support
If you have questions, suggestions, or want to collaborate:
Email: [email protected]
Portfolio: https://kzml8bqhnxp4cn0duf08.lite.vusercontent.net/
Made with ❤️ by Muhammad
