esap-aiui-react v1.0.5
AIUI React SDK

Zero-configuration voice and chat control for any React application through autonomous semantic UI discovery.
AIUI React SDK enables natural language interaction with any React application through both voice commands and text chat, without manual UI annotation or intent mapping. The framework employs real-time DOM observation and semantic element discovery to automatically understand your application's interface, allowing users to control your app through conversational voice or text-based commands.
Enterprise features at a glance
- Security: API-key protected WebSocket channels and safety-rule gates
- Privacy: client-side redaction and sensitive-field filtering
- Auditability: server-side action logs and context update tracing
- Deployment: cloud, private VPC, or fully on-prem
Overview
Traditional voice control solutions require extensive manual configuration, predefined intent schemas, or explicit UI element annotation. AIUI eliminates this overhead through a novel semantic discovery architecture that automatically maps UI elements to their contextual meaning, enabling immediate voice interaction with zero setup.
Core Innovation
The SDK implements a hybrid discovery engine combining MutationObserver-based DOM monitoring with intelligent semantic labeling to achieve sub-500ms voice-to-action latency. An incremental context synchronization protocol reduces bandwidth consumption by 70% compared to full-state transmission while maintaining real-time UI awareness.
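The diffing strategy behind the incremental protocol is not spelled out here; as a rough illustration of the idea, two snapshots of discovered elements can be compared by selector so that only new entries are transmitted (the `diffContext` function and `ElementRecord` type below are hypothetical names for illustration, not SDK exports):

```typescript
// Hypothetical sketch of delta computation: keep a set of previously
// transmitted selectors and send only elements not seen before.
interface ElementRecord {
  selector: string; // CSS selector identifying the element
  semantic: string; // human-readable semantic label
}

function diffContext(prev: ElementRecord[], next: ElementRecord[]): ElementRecord[] {
  const seen = new Set(prev.map((e) => e.selector));
  // Only elements whose selector was absent from the last snapshot are sent.
  return next.filter((e) => !seen.has(e.selector));
}
```

Sending only the additions is what allows the claimed bandwidth reduction relative to retransmitting the full UI state on every change.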
Key differentiators:
- Zero-configuration deployment — Works with existing React applications without code modification
- Framework-agnostic compatibility — Supports Material-UI, Ant Design, Chakra UI, and native HTML
- Semantic element discovery — Automatic identification of interactive elements via ARIA labels and heuristic analysis
- Privacy-preserving architecture — Client-side filtering with configurable redaction patterns
- Multi-backend AI support — Compatible with OpenAI GPT-4, Anthropic Claude, Google Gemini, and local models
Architecture

Protocol Design
The SDK implements a multi-channel WebSocket architecture to optimize for both latency and bandwidth:
| Channel | Transport | Purpose | Update Frequency | Latency Requirement |
|---------|-----------|---------|------------------|---------------------|
| /context | JSON over WebSocket | UI state synchronization | Event-driven (~1/sec) | Non-critical |
| /audio | Binary PCM over WebSocket | Voice I/O streams | Continuous (16kHz) | <500ms critical |
| /chat | JSON over WebSocket | Text-based messaging | On-demand | <200ms preferred |
This separation prevents JSON parsing overhead from blocking time-sensitive audio transmission while enabling efficient differential UI updates and real-time text chat.
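As a sketch of how a client might derive the three endpoints from a single server URL (the `channelUrl` helper and the `apiKey` query parameter are illustrative assumptions, not part of the documented SDK API):

```typescript
// Hypothetical helper showing the three-channel layout. The real SDK's
// connection code and auth scheme are internal; this only illustrates
// how one serverUrl fans out into /context, /audio, and /chat endpoints.
type ChannelPath = '/context' | '/audio' | '/chat';

function channelUrl(serverUrl: string, path: ChannelPath, apiKey?: string): string {
  const url = new URL(path, serverUrl);
  if (apiKey) url.searchParams.set('apiKey', apiKey); // assumed auth mechanism
  return url.toString();
}
```

Each resulting URL would then back its own WebSocket, so a large JSON context update on `/context` never sits in front of a 20 ms audio frame on `/audio`.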
Installation
npm install esap-aiui-react

Prerequisites
Required browser APIs:
- AudioWorklet API (Chrome 66+, Firefox 76+, Safari 14.5+)
- WebSocket API
- MediaDevices API (microphone access)
Required static assets:
The SDK requires two AudioWorklet processor files in your public directory for audio I/O:
1. Create public/player-processor.js
class PlayerProcessor extends AudioWorkletProcessor {
constructor() {
super();
this.queue = [];
this.offset = 0;
this.port.onmessage = e => this.queue.push(e.data);
}
process(_, outputs) {
const out = outputs[0][0];
let idx = 0;
while (idx < out.length) {
if (!this.queue.length) {
out.fill(0, idx);
break;
}
const buf = this.queue[0];
const copy = Math.min(buf.length - this.offset, out.length - idx);
out.set(buf.subarray(this.offset, this.offset + copy), idx);
idx += copy;
this.offset += copy;
if (this.offset >= buf.length) {
this.queue.shift();
this.offset = 0;
}
}
return true;
}
}
registerProcessor('player-processor', PlayerProcessor);

2. Create public/worklet-processor.js
class MicProcessor extends AudioWorkletProcessor {
constructor() {
super();
this.dstRate = 16_000;
this.frameMs = 20;
this.srcRate = sampleRate;
this.ratio = this.srcRate / this.dstRate;
this.samplesPerPacket = Math.round(this.dstRate * this.frameMs / 1_000);
this.packet = new Int16Array(this.samplesPerPacket);
this.pIndex = 0;
this.acc = 0;
this.seq = 0;
}
process(inputs) {
const input = inputs[0];
if (!input || !input[0]?.length) return true;
const ch = input[0];
for (let i = 0; i < ch.length; i++) {
this.acc += 1;
if (this.acc >= this.ratio) {
const s = Math.max(-1, Math.min(1, ch[i]));
this.packet[this.pIndex++] = s < 0 ? s * 32768 : s * 32767;
this.acc -= this.ratio;
if (this.pIndex === this.packet.length) {
this.port.postMessage(this.packet.buffer, [this.packet.buffer]);
this.packet = new Int16Array(this.samplesPerPacket);
this.pIndex = 0;
this.seq++;
}
}
}
return true;
}
}
registerProcessor("mic-processor", MicProcessor);

Project structure:
your-application/
├── public/
│ ├── player-processor.js # Audio playback processor
│ ├── worklet-processor.js # Microphone input processor
│ └── index.html
├── src/
│ ├── aiui.config.json # AIUI configuration (REQUIRED)
│ └── App.tsx
└── package.json

Required: AIUI Configuration File
You must create an aiui.config.json file in your src/ directory. This file defines how the AIUI SDK connects to your backend and which actions are allowed on each page.
Create src/aiui.config.json:
{
"applicationId": "your-app-name",
"serverUrl": "http://localhost:8000",
"apiKey": "your-secret-key",
"pages": [
{
"route": "/",
"title": "Home Page",
"safeActions": ["click", "fill", "navigate"],
"dangerousActions": []
},
{
"route": "/dashboard",
"title": "Dashboard",
"safeActions": ["view", "filter", "export"],
"dangerousActions": ["delete"]
},
{
"route": "/users/:id/edit",
"title": "Edit User",
"safeActions": ["edit", "save"],
"dangerousActions": ["delete user", "deactivate"]
}
],
"safetyRules": {
"requireConfirmation": [
"delete user",
"delete item",
"cancel order"
],
"blockedSelectors": [
"input[type=\"password\"]",
"[data-sensitive=\"true\"]",
".admin-only"
],
"allowedDomains": [
"localhost",
"yourdomain.com"
]
},
"privacy": {
"exposePasswords": false,
"exposeCreditCards": false,
"redactPatterns": [
"[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}",
"\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b"
]
}
}

Configuration Properties:
| Property | Required | Description |
|----------|----------|-------------|
| applicationId | ✅ | Unique identifier for your application |
| serverUrl | ✅ | Backend API endpoint (WebSocket server URL) |
| apiKey | ✅ | Authentication key for backend connection |
| pages | ✅ | Array of page routes with allowed actions |
| safetyRules | ❌ | Safety configurations for dangerous actions |
| privacy | ❌ | Privacy settings for sensitive data |
Page Configuration Details:
Each page in the pages array defines:
- route: URL path (supports dynamic routes like /users/:id)
- title: Human-readable page name
- safeActions: Actions users can perform via voice/chat (e.g., "click submit", "fill email")
- dangerousActions: Actions requiring user confirmation before execution
Load the configuration in your app:
import { AIUIProvider } from 'esap-aiui-react';
import aiuiConfig from './aiui.config.json';
function App() {
return (
<AIUIProvider config={aiuiConfig}>
<YourApplication />
</AIUIProvider>
);
}

Why this file is required:
- ✅ Backend connection - SDK needs serverUrl and apiKey to connect
- ✅ Route mapping - Defines which actions are available on each page
- ✅ Security - Prevents dangerous actions without confirmation
- ✅ Navigation - Enables voice commands like "navigate to dashboard"
Without this file, the SDK cannot connect to your backend or understand your application's structure.
Quick Start
Basic Integration
import { AIUIProvider, useAIUI } from 'esap-aiui-react';
import type { AIUIConfig } from 'esap-aiui-react';
const config: AIUIConfig = {
applicationId: 'production-app-v1',
serverUrl: 'wss://aiui.yourdomain.com',
apiKey: process.env.AIUI_API_KEY,
pages: [
{
route: '/',
title: 'Home',
safeActions: ['click', 'set_value'],
},
{
route: '/dashboard',
title: 'Dashboard',
safeActions: ['click', 'set_value', 'select_from_dropdown'],
dangerousActions: ['delete']
}
],
safetyRules: {
requireConfirmation: ['delete', 'submit_payment'],
blockedSelectors: ['.admin-only', '[data-sensitive]'],
allowedDomains: ['yourdomain.com']
},
privacy: {
exposePasswords: false,
exposeCreditCards: false,
redactPatterns: ['ssn', 'social-security']
}
};
function App() {
return (
<AIUIProvider config={config}>
<YourApplication />
</AIUIProvider>
);
}

Voice Control Component
import { useAIUI } from 'esap-aiui-react';
function VoiceController() {
const {
isConnected,
isListening,
startListening,
stopListening
} = useAIUI();
return (
<div className="voice-control">
<div className="status">
{isConnected ? (
<span className="connected">Connected</span>
) : (
<span className="disconnected">Disconnected</span>
)}
</div>
<button
onClick={isListening ? stopListening : startListening}
disabled={!isConnected}
>
{isListening ? 'Stop Listening' : 'Start Voice Control'}
</button>
</div>
);
}

Chat Interface Component
import { useAIUI } from 'esap-aiui-react';
import { useState } from 'react';
function ChatController() {
const {
isChatConnected,
chatMessages,
connectChat,
sendChatMessage
} = useAIUI();
const [input, setInput] = useState('');
const handleSend = async () => {
if (!input.trim()) return;
await sendChatMessage(input);
setInput('');
};
return (
<div className="chat-interface">
<div className="chat-header">
<span>AI Assistant</span>
<span className={isChatConnected ? 'connected' : 'disconnected'}>
{isChatConnected ? 'Connected' : 'Disconnected'}
</span>
{!isChatConnected && (
<button onClick={connectChat}>Connect Chat</button>
)}
</div>
<div className="chat-messages">
{chatMessages.map((msg, idx) => (
<div key={idx} className={`message ${msg.role}`}>
<div className="content">{msg.content}</div>
<div className="timestamp">
{new Date(msg.timestamp).toLocaleTimeString()}
</div>
</div>
))}
</div>
<div className="chat-input">
<input
type="text"
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && handleSend()}
placeholder="Type a command or question..."
disabled={!isChatConnected}
/>
<button
onClick={handleSend}
disabled={!isChatConnected || !input.trim()}
>
Send
</button>
</div>
</div>
);
}

Natural Language Interaction
Once integrated, users can control your application through either voice commands or text chat:
Voice Commands:
User: "Click the submit button"
→ SDK locates and clicks the submit button
User: "Fill the email field with [email protected]"
→ SDK identifies email input and sets its value
User: "Select Engineering and Design from the department dropdown"
→ SDK handles multi-select interaction
User: "Navigate to the dashboard page"
→ SDK triggers navigation to /dashboard

Chat Commands:
User types: "Click the submit button"
Assistant: "Clicking the submit button now."
→ SDK executes the action and confirms
User types: "What options are available in the status dropdown?"
Assistant: "The status dropdown has: Active, Pending, Completed, Archived"
→ SDK analyzes UI context and responds
User types: "Fill out the form with my default information"
Assistant: "I've filled in your name, email, and phone number."
→ SDK executes multiple form actions
User types: "Show me all the buttons on this page"
Assistant: "I found 5 buttons: Submit, Cancel, Save Draft, Delete, and Export"
→ SDK provides context awareness without executing an action

Configuration
AIUIConfig Interface
interface AIUIConfig {
applicationId: string; // Unique application identifier
serverUrl: string; // WebSocket server URL (wss://)
apiKey?: string; // Optional authentication key
pages: MinimalPageConfig[]; // Page-level configurations
safetyRules?: SafetyRules; // Security constraints
privacy?: PrivacyConfig; // Privacy settings
onNavigate?: (route: string) => void | Promise<void>; // Navigation handler
}

Page Configuration
interface MinimalPageConfig {
route: string; // Page route pattern
title?: string; // Human-readable page title
safeActions?: string[]; // Permitted action types
dangerousActions?: string[]; // Actions requiring confirmation
}

Example configuration:
{
route: '/users/:id/edit',
title: 'Edit User Profile',
safeActions: ['click', 'set_value', 'select_from_dropdown'],
dangerousActions: ['delete', 'deactivate_account']
}

Safety Rules
interface SafetyRules {
requireConfirmation?: string[]; // Actions requiring user confirmation
blockedSelectors?: string[]; // CSS selectors to exclude from discovery
allowedDomains?: string[]; // Whitelist for external navigation
}

Implementation example:
safetyRules: {
requireConfirmation: [
'delete',
'submit_payment',
'transfer_funds',
'deactivate_account'
],
blockedSelectors: [
'.admin-controls',
'[data-role="administrative"]',
'#danger-zone'
],
allowedDomains: [
'yourdomain.com',
'api.yourdomain.com',
'cdn.yourdomain.com'
]
}

Privacy Configuration
interface PrivacyConfig {
redactPatterns?: string[]; // Custom patterns to filter from context
exposePasswords?: boolean; // Include password field values (default: false)
exposeCreditCards?: boolean; // Include credit card inputs (default: false)
}

Privacy implementation:
privacy: {
exposePasswords: false,
exposeCreditCards: false,
redactPatterns: [
'ssn',
'social-security',
'tax-id',
'employee-id',
'patient-id'
]
}

The SDK automatically filters sensitive information before transmission. Elements matching privacy patterns are labeled generically (e.g., "Password Input Field") without exposing their values.
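Conceptually, redaction amounts to applying each configured pattern to outgoing text before it leaves the client. A minimal sketch, assuming redactPatterns entries are JavaScript regular-expression sources (the `redact` helper is illustrative, not an SDK export):

```typescript
// Hypothetical redaction pass: replace every match of every configured
// pattern with a mask before the text is included in a context update.
function redact(text: string, patterns: string[], mask = '[REDACTED]'): string {
  return patterns.reduce(
    (out, p) => out.replace(new RegExp(p, 'gi'), mask),
    text
  );
}
```

With the email pattern from the configuration example above, visible text containing an address would reach the server only in masked form.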
API Reference
useAIUI Hook
The primary interface for interacting with the AIUI system.
interface AIUIContextValue {
// Connection state
isConnected: boolean; // Context channel connection status
isListening: boolean; // Microphone active status
isChatConnected: boolean; // Chat channel connection status
currentPage: string | null; // Current route
// Voice control methods
connect: () => void; // Establish context & audio channels
disconnect: () => void; // Close all connections
startListening: () => Promise<void>; // Start microphone capture
stopListening: () => void; // Stop microphone capture
// Chat interface methods
connectChat: () => void; // Establish chat channel
sendChatMessage: (message: string) => Promise<void>; // Send text message
chatMessages: ChatMessage[]; // Message history
// Programmatic control
executeAction: (action: string, params: any) => Promise<any>;
getComponentValue: (selector: string) => any;
registerComponent: (componentId: string, element: HTMLElement) => void;
unregisterComponent: (componentId: string) => void;
// Configuration
config: AIUIConfig;
}
interface ChatMessage {
role: 'user' | 'assistant'; // Message sender
content: string; // Message text
timestamp: string; // ISO 8601 timestamp
}

Programmatic Action Execution
Execute UI actions programmatically without voice input:
import { useAIUI } from 'esap-aiui-react';
function DataTable({ selectedIds }: { selectedIds: string[] }) {
const { executeAction } = useAIUI();
const handleBulkDelete = async (itemIds: string[]) => {
for (const id of itemIds) {
try {
await executeAction('click', {
semantic: `delete button for item ${id}`
});
// Wait for confirmation dialog
await new Promise(resolve => setTimeout(resolve, 500));
await executeAction('click', {
semantic: 'confirm delete'
});
} catch (error) {
console.error(`Failed to delete item ${id}:`, error);
}
}
};
return (
<button onClick={() => handleBulkDelete(selectedIds)}>
Delete Selected
</button>
);
}

Supported Actions
| Action | Target Elements | Parameters | Description |
|--------|----------------|------------|-------------|
| click | button, a, [role="button"] | { semantic: string } | Dispatches native click event |
| set_value | input, textarea | { semantic: string, value: string } | Sets input value with React compatibility |
| select_from_dropdown | select, custom dropdowns | { semantic: string, values: string[] } | Handles single/multi-select |
| toggle | input[type="checkbox"] | { semantic: string } | Toggles checkbox state |
| navigate | N/A | { route: string } | Triggers application navigation |
| get_value | input, textarea, select | { semantic: string } | Retrieves current element value |
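How the semantic parameter resolves to a concrete element is internal to the SDK; a naive token-overlap matcher over the transmitted element records conveys the general idea (`bestMatch` and `Discovered` below are hypothetical illustrations, not the production algorithm):

```typescript
// Illustrative matcher: score each discovered element by how many words
// of the query appear in its semantic label, and pick the highest score.
interface Discovered { selector: string; semantic: string; }

function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

function bestMatch(query: string, elements: Discovered[]): Discovered | null {
  const q = tokens(query);
  let best: Discovered | null = null;
  let bestScore = 0;
  for (const el of elements) {
    const t = tokens(el.semantic);
    let overlap = 0;
    for (const w of q) if (t.has(w)) overlap++;
    if (overlap > bestScore) { bestScore = overlap; best = el; }
  }
  return best;
}
```

A matcher like this also explains the ambiguity warning in Troubleshooting: two elements labeled identically would tie, which is why unique semantic labels matter.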
Advanced Usage
Dual-Mode Interface (Voice + Chat)
Combine both voice and chat interfaces for flexible user interaction:
import { useAIUI } from 'esap-aiui-react';
import { useState } from 'react';
function AIAssistant() {
const {
isConnected,
isListening,
isChatConnected,
chatMessages,
startListening,
stopListening,
connectChat,
sendChatMessage
} = useAIUI();
const [input, setInput] = useState('');
const [mode, setMode] = useState<'voice' | 'chat'>('chat');
const handleSendChat = async () => {
if (!input.trim()) return;
await sendChatMessage(input);
setInput('');
};
return (
<div className="ai-assistant">
{/* Mode selector */}
<div className="mode-selector">
<button
className={mode === 'voice' ? 'active' : ''}
onClick={() => setMode('voice')}
>
Voice Mode
</button>
<button
className={mode === 'chat' ? 'active' : ''}
onClick={() => {
setMode('chat');
if (!isChatConnected) connectChat();
}}
>
Chat Mode
</button>
</div>
{/* Voice mode interface */}
{mode === 'voice' && (
<div className="voice-mode">
<div className="status">
{isConnected ? 'Connected' : 'Disconnected'}
</div>
<button
onClick={isListening ? stopListening : startListening}
disabled={!isConnected}
className={isListening ? 'listening' : ''}
>
{isListening ? 'Listening...' : 'Start Voice Control'}
</button>
<p className="hint">
Try: "Click the submit button" or "Fill email with [email protected]"
</p>
</div>
)}
{/* Chat mode interface */}
{mode === 'chat' && (
<div className="chat-mode">
<div className="chat-header">
<span>AI Assistant</span>
<span className={isChatConnected ? 'connected' : 'disconnected'}>
{isChatConnected ? 'Connected' : 'Disconnected'}
</span>
</div>
<div className="chat-messages">
{chatMessages.length === 0 ? (
<div className="welcome-message">
<p>Hi! I can help you navigate and control this application.</p>
<p>Try asking me to:</p>
<ul>
<li>Click buttons or links</li>
<li>Fill out forms</li>
<li>Select from dropdowns</li>
<li>Navigate to different pages</li>
<li>Get information about what's on the page</li>
</ul>
</div>
) : (
chatMessages.map((msg, idx) => (
<div key={idx} className={`message ${msg.role}`}>
<div className="avatar">
{msg.role === 'user' ? 'User' : 'Assistant'}
</div>
<div className="content">
<div className="text">{msg.content}</div>
<div className="timestamp">
{new Date(msg.timestamp).toLocaleTimeString()}
</div>
</div>
</div>
))
)}
</div>
<div className="chat-input">
<input
type="text"
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && handleSendChat()}
placeholder="Type a command or question..."
disabled={!isChatConnected}
/>
<button
onClick={handleSendChat}
disabled={!isChatConnected || !input.trim()}
>
Send
</button>
</div>
</div>
)}
</div>
);
}

Custom Multi-Select Components
The SDK automatically detects and handles complex multi-select implementations:
// Material-UI Autocomplete
<Autocomplete
multiple
options={categories}
renderInput={(params) => (
<TextField
{...params}
label="Categories"
placeholder="Select categories"
aria-label="Category Selection" // Used for semantic matching
/>
)}
/>
// Voice command: "Select Engineering and Design from categories"

For custom implementations using data-select-field:
<div className="custom-multiselect">
<input
data-select-field="departments"
data-select-options="Engineering|||Marketing|||Sales|||Design"
placeholder="Select departments..."
aria-label="Department Selection"
/>
</div>
// Voice command: "Select Engineering, Marketing, and Design from departments"

The data-select-options attribute defines available options using ||| as a delimiter.
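Splitting on the ||| delimiter is straightforward; a small helper such as the hypothetical `parseSelectOptions` below shows the expected parsing:

```typescript
// Parse the data-select-options attribute format shown above into a list
// of option labels, trimming whitespace and dropping empty entries.
function parseSelectOptions(attr: string): string[] {
  return attr.split('|||').map((s) => s.trim()).filter(Boolean);
}
```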
Form Automation
<form className="user-registration">
<input
type="text"
name="fullName"
placeholder="Full Name"
aria-label="Full Name Input"
/>
<input
type="email"
name="email"
placeholder="Email Address"
aria-label="Email Address Input"
/>
<input
type="tel"
name="phone"
placeholder="Phone Number"
aria-label="Phone Number Input"
/>
<select name="country" aria-label="Country Selection">
<option value="">Select Country</option>
<option value="US">United States</option>
<option value="UK">United Kingdom</option>
<option value="CA">Canada</option>
</select>
<button type="submit">Create Account</button>
</form>

Voice automation sequence:
1. "Set full name to John Smith"
2. "Fill email with [email protected]"
3. "Set phone number to 555-123-4567"
4. "Select United States from country"
5. "Click create account"

Programmatic Navigation
import { useAIUI } from 'esap-aiui-react';
function NavigationHandler() {
const { executeAction } = useAIUI();
const navigateToCheckout = async () => {
await executeAction('navigate', {
route: '/checkout'
});
};
const performWorkflow = async () => {
// Navigate to products
await executeAction('navigate', { route: '/products' });
// Wait for page load
await new Promise(resolve => setTimeout(resolve, 1000));
// Add item to cart
await executeAction('click', { semantic: 'add to cart' });
// Navigate to cart
await executeAction('navigate', { route: '/cart' });
// Proceed to checkout
await executeAction('click', { semantic: 'checkout button' });
};
return (
<button onClick={performWorkflow}>
Quick Purchase Flow
</button>
);
}

Server Implementation
The SDK communicates with a backend server implementing the AIUI protocol. The server processes voice input, interprets user intent through an LLM, and sends action commands back to the client.
Protocol Specification
Context Channel (/context)
Client → Server: UI State Update
{
"type": "context_update",
"context": {
"timestamp": "2025-02-02T10:30:00Z",
"page": {
"route": "/dashboard",
"title": "Analytics Dashboard"
},
"elements": [
{
"selector": "button:nth-of-type(1)",
"semantic": "Export Report Button",
"type": "button",
"actions": ["click"],
"attributes": {
"id": "export-btn",
"aria-label": "Export Report"
}
},
{
"selector": "input[name='dateRange']",
"semantic": "Date Range Input",
"type": "input",
"actions": ["set_value"],
"attributes": {
"placeholder": "Select date range",
"aria-label": "Date Range Selection"
}
}
],
"viewport": {
"width": 1920,
"height": 1080
}
},
"trigger": "navigation"
}

Client → Server: Incremental Update
{
"type": "context_append",
"elements": [
{
"selector": "div.modal button:nth-of-type(1)",
"semantic": "Confirm Action",
"type": "button",
"actions": ["click"]
}
]
}

Server → Client: Action Command
{
"type": "action",
"action": "click",
"params": {
"semantic": "Export Report Button"
},
"timestamp": "2025-02-02T10:30:05Z"
}

Audio Channel (/audio)
Binary PCM Stream Format:
| Direction | Sample Rate | Bit Depth | Channels | Encoding | Frame Size |
|-----------|-------------|-----------|----------|----------|------------|
| Client → Server | 16 kHz | 16-bit | Mono | Int16 PCM | 20 ms (320 samples) |
| Server → Client | 24 kHz | 16-bit | Mono | Int16 PCM | Variable |
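The client-to-server format matches the clamp-and-scale conversion performed in worklet-processor.js above, extracted here as a standalone function for reference:

```typescript
// Convert normalized Float32 audio samples (-1.0..1.0) to Int16 PCM,
// clamping out-of-range values first. The negative range has one extra
// step (-32768), hence the asymmetric scale factors.
function floatToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 32768 : s * 32767;
  }
  return out;
}
```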
Chat Channel (/chat)
Client → Server: Text Message
{
"type": "chat_message",
"content": "Click the submit button",
"timestamp": "2025-02-02T10:30:00Z"
}

Server → Client: AI Response
{
"type": "chat_message",
"role": "assistant",
"content": "I've clicked the submit button for you.",
"timestamp": "2025-02-02T10:30:02Z"
}

Server → Client: Typing Indicator
{
"type": "chat_typing",
"typing": true
}

Server → Client: Connection Status
{
"type": "chat_connected"
}

{
"type": "chat_disconnected",
"reason": "Server shutdown"
}

Production Deployment Considerations
Security:
- Implement API key validation on WebSocket connection
- Use WSS (WebSocket Secure) in production
- Rate limit context updates per client
- Sanitize all client-provided data before LLM processing
Performance:
- Cache UI context to minimize LLM token usage
- Implement connection pooling for concurrent clients
- Use streaming STT/TTS for reduced latency
- Deploy geographically distributed WebSocket servers
Reliability:
- Implement heartbeat/ping-pong for connection health
- Add automatic reconnection with exponential backoff
- Log all action executions for audit trail
- Monitor action success rates and latency metrics
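The reconnection guidance above can be sketched as a deterministic backoff schedule (constants are illustrative; production implementations typically add random jitter so many clients do not retry in lockstep):

```typescript
// Exponential backoff with a cap: delay doubles with each failed attempt
// (500ms, 1s, 2s, ...) until it reaches capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

A reconnect loop would call this with an attempt counter that resets to zero once a connection (and its heartbeat) is re-established.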
Performance Characteristics
Benchmarks
Test environment: Chrome 120, Intel Core i7-12700H, 100 Mbps network, 30ms RTT
| Operation | Mean | Std Dev | P50 | P95 | P99 |
|-----------|------|---------|-----|-----|-----|
| Element Discovery (1000 elements) | 12ms | 3ms | 11ms | 18ms | 24ms |
| Context Update (Full) | 45ms | 8ms | 43ms | 62ms | 78ms |
| Context Update (Delta) | 18ms | 4ms | 17ms | 26ms | 31ms |
| Semantic Match | 2ms | 0.5ms | 2ms | 3ms | 4ms |
| Action Execution (click) | 15ms | 5ms | 14ms | 24ms | 32ms |
| Voice → Action (End-to-End) | 440ms | 95ms | 380ms | 620ms | 780ms |
Optimization Guidelines
Minimize Discovery Overhead:
- Use aria-label attributes for explicit semantic labeling
- Avoid deeply nested DOM structures where possible
- Limit dynamic DOM mutations during active voice sessions

Reduce Context Size:
- Configure blockedSelectors to exclude non-interactive regions
- Use page-specific safeActions to filter action types
- Implement privacy patterns to redact verbose text content
Improve Action Reliability:
- Ensure unique semantic labels for critical actions
- Use stable selectors (data attributes preferred over CSS classes)
- Add proper ARIA labels to custom components
Browser Compatibility
| Browser | Version | Status | Notes |
|---------|---------|--------|-------|
| Chrome | 66+ | ✓ Full Support | Recommended platform |
| Edge | 79+ | ✓ Full Support | Chromium-based |
| Firefox | 76+ | ✓ Full Support | |
| Safari | 14.5+ | ✓ Full Support | Requires webkit prefix for AudioContext |
| Mobile Chrome | 66+ | ✓ Full Support | Microphone permissions required |
| Mobile Safari | 14.5+ | ✓ Full Support | Requires user gesture for AudioContext |
Minimum Requirements:
- ES2020 language features
- WebSocket API
- Web Audio API with AudioWorklet support
- MediaDevices getUserMedia API
- MutationObserver API
Troubleshooting
Common Issues
AudioWorklet Files Not Found
Error: Failed to load worklet-processor.js

Solution: Ensure player-processor.js and worklet-processor.js exist in the public/ directory and are accessible at the /player-processor.js and /worklet-processor.js URLs.
WebSocket Connection Failed
WebSocket error: Connection refused

Solution: Verify the server is running and serverUrl uses the correct protocol (wss:// for production, ws:// for local development).
No Elements Discovered
Warning: 0 interactive elements found

Solution: Add aria-label attributes to interactive elements or verify elements match discovery selectors (button, input, select, a[href], [role="button"]).
Action Ambiguity
Found 3 matches for "Delete"

Solution: Use index notation in voice commands ("Delete item number 2") or ensure unique semantic labels for elements with identical text.
Debug Mode
Enable verbose logging for development:
// Client-side (browser console)
localStorage.setItem('AIUI_DEBUG', 'true');
// Server-side (environment variable)
export LOG_LEVEL=DEBUG
node server.js

Inspect Discovered Elements:
const { executeAction } = useAIUI();
// Retrieve internal context for debugging
await executeAction('get_value', { semantic: '_debug_context' });

Monitor WebSocket Traffic:
- Open Chrome DevTools → Network tab
- Filter by "WS" (WebSocket)
- Select connection → Messages tab
- Inspect JSON payloads and binary frames
Code Standards
- TypeScript strict mode required
- ESLint configuration enforced
- Minimum 80% test coverage for new features
- Conventional Commits specification for commit messages
Roadmap
Version 1.2.0 (Q2 2026)
- Vue.js adapter
- Safari performance improvements
- Enhanced debugging tools with visual element highlighting
Version 2.0.0 (Q4 2026)
- Multimodal interaction (vision + voice)
- Context compression using learned embeddings
- Offline mode with service worker caching
Research Track (2027+)
- Adaptive discovery via reinforcement learning
- Visual grounding for spatial element disambiguation
- Federated learning for privacy-preserving model improvement
License
MIT License - see LICENSE file for details.
Copyright (c) 2025 AIUI Project Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Support
Community Support:
- GitHub Issues: Report bugs and request features
- GitHub Discussions: Community Q&A and ideas
- Stack Overflow: Tag questions with aiui-react
Commercial Support:
- Enterprise integration assistance
- Custom feature development
- On-premise deployment support
- SLA-backed support contracts
Contact:
- Email: [email protected]
- Documentation: https://docs.espai.dev/aiui
- Website: https://espai.dev
Acknowledgments
AIUI builds upon foundational work from the open-source community:
- React Team — Context API and Hooks architecture
- W3C Web Audio Community Group — AudioWorklet specification
- ARIA Working Group — Accessibility semantic standards
- WebSocket Protocol (RFC 6455) — Real-time bidirectional communication
Special thanks to early adopters who provided production feedback and contributed to the framework's evolution.
Built with precision for voice-first web experiences.
