esap-aiui-react

v1.0.5


AIUI React SDK


Zero-configuration voice and chat control for any React application through autonomous semantic UI discovery.

AIUI React SDK enables natural language interaction with any React application through both voice commands and text chat, without manual UI annotation or intent mapping. The framework employs real-time DOM observation and semantic element discovery to automatically understand your application's interface, allowing users to control your app through conversational voice or text-based commands.

Enterprise features at a glance

  • Security: API-key protected WebSocket channels and safety-rule gates
  • Privacy: client-side redaction and sensitive-field filtering
  • Auditability: server-side action logs and context update tracing
  • Deployment: cloud, private VPC, or fully on-prem

Overview

Traditional voice control solutions require extensive manual configuration, predefined intent schemas, or explicit UI element annotation. AIUI eliminates this overhead through a novel semantic discovery architecture that automatically maps UI elements to their contextual meaning, enabling immediate voice interaction with zero setup.

Core Innovation

The SDK implements a hybrid discovery engine combining MutationObserver-based DOM monitoring with intelligent semantic labeling to achieve sub-500ms voice-to-action latency. An incremental context synchronization protocol reduces bandwidth consumption by 70% compared to full-state transmission while maintaining real-time UI awareness.
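
The incremental synchronization described above amounts to diffing the discovered element set between snapshots and transmitting only what changed. A minimal sketch of that idea (the `UIElement` shape and `diffContext` helper are illustrative names, not the SDK's internals):

```typescript
interface UIElement {
  selector: string;
  semantic: string;
  type: string;
}

// Returns only elements that are new or whose semantic label changed since
// the previous snapshot, keyed by selector, so the client can send a small
// `context_append` instead of retransmitting the full element list.
function diffContext(prev: UIElement[], next: UIElement[]): UIElement[] {
  const seen = new Map(prev.map(el => [el.selector, el.semantic]));
  return next.filter(el => seen.get(el.selector) !== el.semantic);
}

const before: UIElement[] = [
  { selector: '#save', semantic: 'Save Button', type: 'button' },
];
const after: UIElement[] = [
  { selector: '#save', semantic: 'Save Button', type: 'button' },
  { selector: '#export', semantic: 'Export Button', type: 'button' },
];

console.log(diffContext(before, after)); // only the #export element
```

An unchanged snapshot diffs to an empty array, which is where the bandwidth savings come from.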

Key differentiators:

  • Zero-configuration deployment — Works with existing React applications without code modification
  • Framework-agnostic compatibility — Supports Material-UI, Ant Design, Chakra UI, and native HTML
  • Semantic element discovery — Automatic identification of interactive elements via ARIA labels and heuristic analysis
  • Privacy-preserving architecture — Client-side filtering with configurable redaction patterns
  • Multi-backend AI support — Compatible with OpenAI GPT-4, Anthropic Claude, Google Gemini, and local models

Architecture

AIUI Architecture

Protocol Design

The SDK implements a multi-channel WebSocket architecture to optimize for both latency and bandwidth:

| Channel | Transport | Purpose | Update Frequency | Latency Requirement |
|---------|-----------|---------|------------------|---------------------|
| /context | JSON over WebSocket | UI state synchronization | Event-driven (~1/sec) | Non-critical |
| /audio | Binary PCM over WebSocket | Voice I/O streams | Continuous (16kHz) | <500ms critical |
| /chat | JSON over WebSocket | Text-based messaging | On-demand | <200ms preferred |

This separation prevents JSON parsing overhead from blocking time-sensitive audio transmission while enabling efficient differential UI updates and real-time text chat.
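
A sketch of how the three endpoints might be derived from a single `serverUrl` (the `channelUrl` helper is illustrative, not part of the SDK's public API; the channel paths come from the table above):

```typescript
type Channel = 'context' | 'audio' | 'chat';

// Normalize an http(s) origin to its WebSocket equivalent and append
// the channel path, e.g. https://host -> wss://host/audio.
function channelUrl(serverUrl: string, channel: Channel): string {
  const ws = serverUrl.replace(/^http/, 'ws').replace(/\/$/, '');
  return `${ws}/${channel}`;
}

console.log(channelUrl('https://aiui.yourdomain.com', 'audio'));
// wss://aiui.yourdomain.com/audio
```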

Installation

npm install esap-aiui-react

Prerequisites

Required browser APIs:

  • AudioWorklet API (Chrome 66+, Firefox 76+, Safari 14.5+)
  • WebSocket API
  • MediaDevices API (microphone access)

Required static assets:

The SDK requires two AudioWorklet processor files in your public directory for audio I/O:

1. Create public/player-processor.js

class PlayerProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.queue = [];
    this.offset = 0;
    this.port.onmessage = e => this.queue.push(e.data);
  }

  process(_, outputs) {
    const out = outputs[0][0];
    let idx = 0;

    while (idx < out.length) {
      if (!this.queue.length) {
        out.fill(0, idx);
        break;
      }
      const buf = this.queue[0];
      const copy = Math.min(buf.length - this.offset, out.length - idx);
      out.set(buf.subarray(this.offset, this.offset + copy), idx);

      idx += copy;
      this.offset += copy;

      if (this.offset >= buf.length) {
        this.queue.shift();
        this.offset = 0;
      }
    }
    return true;
  }
}

registerProcessor('player-processor', PlayerProcessor);

2. Create public/worklet-processor.js

class MicProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.dstRate = 16_000;
    this.frameMs = 20;
    this.srcRate = sampleRate;
    this.ratio = this.srcRate / this.dstRate;
    this.samplesPerPacket = Math.round(this.dstRate * this.frameMs / 1_000);
    this.packet = new Int16Array(this.samplesPerPacket);
    this.pIndex = 0;
    this.acc = 0;
    this.seq = 0;
  }

  process(inputs) {
    const input = inputs[0];
    if (!input || !input[0]?.length) return true;

    const ch = input[0];
    for (let i = 0; i < ch.length; i++) {
      this.acc += 1;
      if (this.acc >= this.ratio) {
        const s = Math.max(-1, Math.min(1, ch[i]));
        this.packet[this.pIndex++] = s < 0 ? s * 32768 : s * 32767;
        this.acc -= this.ratio;

        if (this.pIndex === this.packet.length) {
          this.port.postMessage(this.packet.buffer, [this.packet.buffer]);
          this.packet = new Int16Array(this.samplesPerPacket);
          this.pIndex = 0;
          this.seq++;
        }
      }
    }
    return true;
  }
}

registerProcessor("mic-processor", MicProcessor);

Project structure:

your-application/
├── public/
│   ├── player-processor.js    # Audio playback processor
│   ├── worklet-processor.js   # Microphone input processor
│   └── index.html
├── src/
│   ├── aiui.config.json       # AIUI configuration (REQUIRED)
│   └── App.tsx
└── package.json

Required: AIUI Configuration File

You must create an aiui.config.json file in your src/ directory. This file defines how the AIUI SDK connects to your backend and which actions are allowed on each page.

Create src/aiui.config.json:

{
  "applicationId": "your-app-name",
  "serverUrl": "http://localhost:8000",
  "apiKey": "your-secret-key",
  
  "pages": [
    {
      "route": "/",
      "title": "Home Page",
      "safeActions": ["click", "fill", "navigate"],
      "dangerousActions": []
    },
    {
      "route": "/dashboard",
      "title": "Dashboard",
      "safeActions": ["view", "filter", "export"],
      "dangerousActions": ["delete"]
    },
    {
      "route": "/users/:id/edit",
      "title": "Edit User",
      "safeActions": ["edit", "save"],
      "dangerousActions": ["delete user", "deactivate"]
    }
  ],
  
  "safetyRules": {
    "requireConfirmation": [
      "delete user",
      "delete item",
      "cancel order"
    ],
    "blockedSelectors": [
      "input[type=\"password\"]",
      "[data-sensitive=\"true\"]",
      ".admin-only"
    ],
    "allowedDomains": [
      "localhost",
      "yourdomain.com"
    ]
  },
  
  "privacy": {
    "exposePasswords": false,
    "exposeCreditCards": false,
    "redactPatterns": [
      "[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}",
      "\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b"
    ]
  }
}

Configuration Properties:

| Property | Required | Description |
|----------|----------|-------------|
| applicationId | ✅ | Unique identifier for your application |
| serverUrl | ✅ | Backend API endpoint (WebSocket server URL) |
| apiKey | ✅ | Authentication key for backend connection |
| pages | ✅ | Array of page routes with allowed actions |
| safetyRules | ❌ | Safety configurations for dangerous actions |
| privacy | ❌ | Privacy settings for sensitive data |

Page Configuration Details:

Each page in the pages array defines:

  • route: URL path (supports dynamic routes like /users/:id)
  • title: Human-readable page name
  • safeActions: Actions users can perform via voice/chat (e.g., "click submit", "fill email")
  • dangerousActions: Actions requiring user confirmation before execution
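
Dynamic routes like /users/:id can be matched with a small pattern-to-regex conversion. A sketch under the assumption that `:param` segments match any single path segment (`matchesRoute` is an illustrative helper, not the SDK's actual matcher):

```typescript
// Convert a route pattern with :params into an anchored regex and test a
// concrete path against it. Literal segments are regex-escaped.
function matchesRoute(pattern: string, path: string): boolean {
  const regex = new RegExp(
    '^' +
      pattern
        .split('/')
        .map(seg =>
          seg.startsWith(':') ? '[^/]+' : seg.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
        )
        .join('/') +
      '$'
  );
  return regex.test(path);
}

console.log(matchesRoute('/users/:id/edit', '/users/42/edit')); // true
console.log(matchesRoute('/users/:id/edit', '/users/42'));      // false
```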

Load the configuration in your app:

import { AIUIProvider } from 'esap-aiui-react';
import aiuiConfig from './aiui.config.json';

function App() {
  return (
    <AIUIProvider config={aiuiConfig}>
      <YourApplication />
    </AIUIProvider>
  );
}

Why this file is required:

  • ✅ Backend connection - SDK needs serverUrl and apiKey to connect
  • ✅ Route mapping - Defines which actions are available on each page
  • ✅ Security - Prevents dangerous actions without confirmation
  • ✅ Navigation - Enables voice commands like "navigate to dashboard"

Without this file, the SDK cannot connect to your backend or understand your application's structure.

Quick Start

Basic Integration

import { AIUIProvider, useAIUI } from 'esap-aiui-react';
import type { AIUIConfig } from 'esap-aiui-react';

const config: AIUIConfig = {
  applicationId: 'production-app-v1',
  serverUrl: 'wss://aiui.yourdomain.com',
  apiKey: process.env.AIUI_API_KEY,
  pages: [
    {
      route: '/',
      title: 'Home',
      safeActions: ['click', 'set_value'],
    },
    {
      route: '/dashboard',
      title: 'Dashboard',
      safeActions: ['click', 'set_value', 'select_from_dropdown'],
      dangerousActions: ['delete']
    }
  ],
  safetyRules: {
    requireConfirmation: ['delete', 'submit_payment'],
    blockedSelectors: ['.admin-only', '[data-sensitive]'],
    allowedDomains: ['yourdomain.com']
  },
  privacy: {
    exposePasswords: false,
    exposeCreditCards: false,
    redactPatterns: ['ssn', 'social-security']
  }
};

function App() {
  return (
    <AIUIProvider config={config}>
      <YourApplication />
    </AIUIProvider>
  );
}

Voice Control Component

import { useAIUI } from 'esap-aiui-react';

function VoiceController() {
  const { 
    isConnected, 
    isListening, 
    startListening, 
    stopListening 
  } = useAIUI();

  return (
    <div className="voice-control">
      <div className="status">
        {isConnected ? (
          <span className="connected">Connected</span>
        ) : (
          <span className="disconnected">Disconnected</span>
        )}
      </div>
      
      <button 
        onClick={isListening ? stopListening : startListening}
        disabled={!isConnected}
      >
        {isListening ? 'Stop Listening' : 'Start Voice Control'}
      </button>
    </div>
  );
}

Chat Interface Component

import { useAIUI } from 'esap-aiui-react';
import { useState } from 'react';

function ChatController() {
  const { 
    isChatConnected,
    chatMessages,
    connectChat,
    sendChatMessage 
  } = useAIUI();
  
  const [input, setInput] = useState('');

  const handleSend = async () => {
    if (!input.trim()) return;
    
    await sendChatMessage(input);
    setInput('');
  };

  return (
    <div className="chat-interface">
      <div className="chat-header">
        <span>AI Assistant</span>
        <span className={isChatConnected ? 'connected' : 'disconnected'}>
          {isChatConnected ? 'Connected' : 'Disconnected'}
        </span>
        {!isChatConnected && (
          <button onClick={connectChat}>Connect Chat</button>
        )}
      </div>
      
      <div className="chat-messages">
        {chatMessages.map((msg, idx) => (
          <div key={idx} className={`message ${msg.role}`}>
            <div className="content">{msg.content}</div>
            <div className="timestamp">
              {new Date(msg.timestamp).toLocaleTimeString()}
            </div>
          </div>
        ))}
      </div>
      
      <div className="chat-input">
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && handleSend()}
          placeholder="Type a command or question..."
          disabled={!isChatConnected}
        />
        <button 
          onClick={handleSend}
          disabled={!isChatConnected || !input.trim()}
        >
          Send
        </button>
      </div>
    </div>
  );
}

Natural Language Interaction

Once integrated, users can control your application through either voice commands or text chat:

Voice Commands:

User: "Click the submit button"
→ SDK locates and clicks the submit button

User: "Fill the email field with [email protected]"
→ SDK identifies email input and sets its value

User: "Select Engineering and Design from the department dropdown"
→ SDK handles multi-select interaction

User: "Navigate to the dashboard page"
→ SDK triggers navigation to /dashboard

Chat Commands:

User types: "Click the submit button"
Assistant: "Clicking the submit button now."
→ SDK executes the action and confirms

User types: "What options are available in the status dropdown?"
Assistant: "The status dropdown has: Active, Pending, Completed, Archived"
→ SDK analyzes UI context and responds

User types: "Fill out the form with my default information"
Assistant: "I've filled in your name, email, and phone number."
→ SDK executes multiple form actions

User types: "Show me all the buttons on this page"
Assistant: "I found 5 buttons: Submit, Cancel, Save Draft, Delete, and Export"
→ SDK provides context awareness without action

Configuration

AIUIConfig Interface

interface AIUIConfig {
  applicationId: string;           // Unique application identifier
  serverUrl: string;                // WebSocket server URL (wss://)
  apiKey?: string;                  // Optional authentication key
  pages: MinimalPageConfig[];       // Page-level configurations
  safetyRules?: SafetyRules;        // Security constraints
  privacy?: PrivacyConfig;          // Privacy settings
  onNavigate?: (route: string) => void | Promise<void>;  // Navigation handler
}

Page Configuration

interface MinimalPageConfig {
  route: string;                    // Page route pattern
  title?: string;                   // Human-readable page title
  safeActions?: string[];           // Permitted action types
  dangerousActions?: string[];      // Actions requiring confirmation
}

Example configuration:

{
  route: '/users/:id/edit',
  title: 'Edit User Profile',
  safeActions: ['click', 'set_value', 'select_from_dropdown'],
  dangerousActions: ['delete', 'deactivate_account']
}

Safety Rules

interface SafetyRules {
  requireConfirmation?: string[];   // Actions requiring user confirmation
  blockedSelectors?: string[];      // CSS selectors to exclude from discovery
  allowedDomains?: string[];        // Whitelist for external navigation
}

Implementation example:

safetyRules: {
  requireConfirmation: [
    'delete',
    'submit_payment',
    'transfer_funds',
    'deactivate_account'
  ],
  blockedSelectors: [
    '.admin-controls',
    '[data-role="administrative"]',
    '#danger-zone'
  ],
  allowedDomains: [
    'yourdomain.com',
    'api.yourdomain.com',
    'cdn.yourdomain.com'
  ]
}

Privacy Configuration

interface PrivacyConfig {
  redactPatterns?: string[];        // Custom patterns to filter from context
  exposePasswords?: boolean;        // Include password field values (default: false)
  exposeCreditCards?: boolean;      // Include credit card inputs (default: false)
}

Privacy implementation:

privacy: {
  exposePasswords: false,
  exposeCreditCards: false,
  redactPatterns: [
    'ssn',
    'social-security',
    'tax-id',
    'employee-id',
    'patient-id'
  ]
}

The SDK automatically filters sensitive information before transmission. Elements matching privacy patterns are labeled generically (e.g., "Password Input Field") without exposing their values.
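
Applying `redactPatterns` amounts to running each pattern over outgoing text before transmission. A sketch using the email and phone regexes from the example config (the `redact` function name and replacement token are illustrative):

```typescript
// Replace every match of each redaction pattern with a placeholder before
// the text leaves the client.
function redact(text: string, patterns: string[]): string {
  return patterns.reduce(
    (out, p) => out.replace(new RegExp(p, 'gi'), '[REDACTED]'),
    text
  );
}

const patterns = [
  '[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}',   // email addresses
  '\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b',       // US-style phone numbers
];

console.log(redact('Reach me at jane@example.com or 555-123-4567', patterns));
// Reach me at [REDACTED] or [REDACTED]
```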

API Reference

useAIUI Hook

The primary interface for interacting with the AIUI system.

interface AIUIContextValue {
  // Connection state
  isConnected: boolean;              // Context channel connection status
  isListening: boolean;              // Microphone active status
  isChatConnected: boolean;          // Chat channel connection status
  currentPage: string | null;        // Current route
  
  // Voice control methods
  connect: () => void;               // Establish context & audio channels
  disconnect: () => void;            // Close all connections
  startListening: () => Promise<void>;  // Start microphone capture
  stopListening: () => void;         // Stop microphone capture
  
  // Chat interface methods
  connectChat: () => void;           // Establish chat channel
  sendChatMessage: (message: string) => Promise<void>;  // Send text message
  chatMessages: ChatMessage[];       // Message history
  
  // Programmatic control
  executeAction: (action: string, params: any) => Promise<any>;
  getComponentValue: (selector: string) => any;
  registerComponent: (componentId: string, element: HTMLElement) => void;
  unregisterComponent: (componentId: string) => void;
  
  // Configuration
  config: AIUIConfig;
}

interface ChatMessage {
  role: 'user' | 'assistant';        // Message sender
  content: string;                   // Message text
  timestamp: string;                 // ISO 8601 timestamp
}

Programmatic Action Execution

Execute UI actions programmatically without voice input:

import { useAIUI } from 'esap-aiui-react';

function DataTable() {
  const { executeAction } = useAIUI();

  const handleBulkDelete = async (itemIds: string[]) => {
    for (const id of itemIds) {
      try {
        await executeAction('click', {
          semantic: `delete button for item ${id}`
        });
        
        // Wait for confirmation dialog
        await new Promise(resolve => setTimeout(resolve, 500));
        
        await executeAction('click', {
          semantic: 'confirm delete'
        });
      } catch (error) {
        console.error(`Failed to delete item ${id}:`, error);
      }
    }
  };

  return (
    <button onClick={() => handleBulkDelete(selectedIds)}>
      Delete Selected
    </button>
  );
}

Supported Actions

| Action | Target Elements | Parameters | Description |
|--------|-----------------|------------|-------------|
| click | button, a, [role="button"] | { semantic: string } | Dispatches native click event |
| set_value | input, textarea | { semantic: string, value: string } | Sets input value with React compatibility |
| select_from_dropdown | select, custom dropdowns | { semantic: string, values: string[] } | Handles single/multi-select |
| toggle | input[type="checkbox"] | { semantic: string } | Toggles checkbox state |
| navigate | N/A | { route: string } | Triggers application navigation |
| get_value | input, textarea, select | { semantic: string } | Retrieves current element value |

Advanced Usage

Dual-Mode Interface (Voice + Chat)

Combine both voice and chat interfaces for flexible user interaction:

import { useAIUI } from 'esap-aiui-react';
import { useState } from 'react';

function AIAssistant() {
  const {
    isConnected,
    isListening,
    isChatConnected,
    chatMessages,
    startListening,
    stopListening,
    connectChat,
    sendChatMessage
  } = useAIUI();
  
  const [input, setInput] = useState('');
  const [mode, setMode] = useState<'voice' | 'chat'>('chat');

  const handleSendChat = async () => {
    if (!input.trim()) return;
    await sendChatMessage(input);
    setInput('');
  };

  return (
    <div className="ai-assistant">
      {/* Mode selector */}
      <div className="mode-selector">
        <button 
          className={mode === 'voice' ? 'active' : ''}
          onClick={() => setMode('voice')}
        >
          Voice Mode
        </button>
        <button 
          className={mode === 'chat' ? 'active' : ''}
          onClick={() => {
            setMode('chat');
            if (!isChatConnected) connectChat();
          }}
        >
          Chat Mode
        </button>
      </div>

      {/* Voice mode interface */}
      {mode === 'voice' && (
        <div className="voice-mode">
          <div className="status">
            {isConnected ? 'Connected' : 'Disconnected'}
          </div>
          <button 
            onClick={isListening ? stopListening : startListening}
            disabled={!isConnected}
            className={isListening ? 'listening' : ''}
          >
            {isListening ? 'Listening...' : 'Start Voice Control'}
          </button>
          <p className="hint">
            Try: "Click the submit button" or "Fill email with [email protected]"
          </p>
        </div>
      )}

      {/* Chat mode interface */}
      {mode === 'chat' && (
        <div className="chat-mode">
          <div className="chat-header">
            <span>AI Assistant</span>
            <span className={isChatConnected ? 'connected' : 'disconnected'}>
              {isChatConnected ? 'Connected' : 'Disconnected'}
            </span>
          </div>
          
          <div className="chat-messages">
            {chatMessages.length === 0 ? (
              <div className="welcome-message">
                <p>Hi! I can help you navigate and control this application.</p>
                <p>Try asking me to:</p>
                <ul>
                  <li>Click buttons or links</li>
                  <li>Fill out forms</li>
                  <li>Select from dropdowns</li>
                  <li>Navigate to different pages</li>
                  <li>Get information about what's on the page</li>
                </ul>
              </div>
            ) : (
              chatMessages.map((msg, idx) => (
                <div key={idx} className={`message ${msg.role}`}>
                  <div className="avatar">
                    {msg.role === 'user' ? 'User' : 'Assistant'}
                  </div>
                  <div className="content">
                    <div className="text">{msg.content}</div>
                    <div className="timestamp">
                      {new Date(msg.timestamp).toLocaleTimeString()}
                    </div>
                  </div>
                </div>
              ))
            )}
          </div>
          
          <div className="chat-input">
            <input
              type="text"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyDown={(e) => e.key === 'Enter' && handleSendChat()}
              placeholder="Type a command or question..."
              disabled={!isChatConnected}
            />
            <button 
              onClick={handleSendChat}
              disabled={!isChatConnected || !input.trim()}
            >
              Send
            </button>
          </div>
        </div>
      )}
    </div>
  );
}

Custom Multi-Select Components

The SDK automatically detects and handles complex multi-select implementations:

// Material-UI Autocomplete
<Autocomplete
  multiple
  options={categories}
  renderInput={(params) => (
    <TextField
      {...params}
      label="Categories"
      placeholder="Select categories"
      aria-label="Category Selection"  // Used for semantic matching
    />
  )}
/>

// Voice command: "Select Engineering and Design from categories"

For custom implementations using data-select-field:

<div className="custom-multiselect">
  <input
    data-select-field="departments"
    data-select-options="Engineering|||Marketing|||Sales|||Design"
    placeholder="Select departments..."
    aria-label="Department Selection"
  />
</div>

// Voice command: "Select Engineering, Marketing, and Design from departments"

The data-select-options attribute defines available options using ||| as a delimiter.
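
Parsing that attribute is a single split on the delimiter, which is why `|||` was chosen: option labels may themselves contain commas. A minimal sketch (`parseSelectOptions` is an illustrative name):

```typescript
// Split a data-select-options attribute value into its option labels,
// trimming whitespace and dropping empty entries.
function parseSelectOptions(attr: string): string[] {
  return attr.split('|||').map(s => s.trim()).filter(Boolean);
}

console.log(parseSelectOptions('Engineering|||Marketing|||Sales|||Design'));
// [ 'Engineering', 'Marketing', 'Sales', 'Design' ]
```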

Form Automation

<form className="user-registration">
  <input
    type="text"
    name="fullName"
    placeholder="Full Name"
    aria-label="Full Name Input"
  />
  
  <input
    type="email"
    name="email"
    placeholder="Email Address"
    aria-label="Email Address Input"
  />
  
  <input
    type="tel"
    name="phone"
    placeholder="Phone Number"
    aria-label="Phone Number Input"
  />
  
  <select name="country" aria-label="Country Selection">
    <option value="">Select Country</option>
    <option value="US">United States</option>
    <option value="UK">United Kingdom</option>
    <option value="CA">Canada</option>
  </select>
  
  <button type="submit">Create Account</button>
</form>

Voice automation sequence:

1. "Set full name to John Smith"
2. "Fill email with [email protected]"
3. "Set phone number to 555-123-4567"
4. "Select United States from country"
5. "Click create account"

Programmatic Navigation

import { useAIUI } from 'esap-aiui-react';

function NavigationHandler() {
  const { executeAction } = useAIUI();

  const navigateToCheckout = async () => {
    await executeAction('navigate', {
      route: '/checkout'
    });
  };

  const performWorkflow = async () => {
    // Navigate to products
    await executeAction('navigate', { route: '/products' });
    
    // Wait for page load
    await new Promise(resolve => setTimeout(resolve, 1000));
    
    // Add item to cart
    await executeAction('click', { semantic: 'add to cart' });
    
    // Navigate to cart
    await executeAction('navigate', { route: '/cart' });
    
    // Proceed to checkout
    await executeAction('click', { semantic: 'checkout button' });
  };

  return (
    <button onClick={performWorkflow}>
      Quick Purchase Flow
    </button>
  );
}

Server Implementation

The SDK communicates with a backend server implementing the AIUI protocol. The server processes voice input, interprets user intent through an LLM, and sends action commands back to the client.

Protocol Specification

Context Channel (/context)

Client → Server: UI State Update

{
  "type": "context_update",
  "context": {
    "timestamp": "2025-02-02T10:30:00Z",
    "page": {
      "route": "/dashboard",
      "title": "Analytics Dashboard"
    },
    "elements": [
      {
        "selector": "button:nth-of-type(1)",
        "semantic": "Export Report Button",
        "type": "button",
        "actions": ["click"],
        "attributes": {
          "id": "export-btn",
          "aria-label": "Export Report"
        }
      },
      {
        "selector": "input[name='dateRange']",
        "semantic": "Date Range Input",
        "type": "input",
        "actions": ["set_value"],
        "attributes": {
          "placeholder": "Select date range",
          "aria-label": "Date Range Selection"
        }
      }
    ],
    "viewport": {
      "width": 1920,
      "height": 1080
    }
  },
  "trigger": "navigation"
}

Client → Server: Incremental Update

{
  "type": "context_append",
  "elements": [
    {
      "selector": "div.modal button:nth-of-type(1)",
      "semantic": "Confirm Action",
      "type": "button",
      "actions": ["click"]
    }
  ]
}

Server → Client: Action Command

{
  "type": "action",
  "action": "click",
  "params": {
    "semantic": "Export Report Button"
  },
  "timestamp": "2025-02-02T10:30:05Z"
}

Audio Channel (/audio)

Binary PCM Stream Format:

| Direction | Sample Rate | Bit Depth | Channels | Encoding | Frame Size |
|-----------|-------------|-----------|----------|----------|------------|
| Client → Server | 16 kHz | 16-bit | Mono | Int16 PCM | 20ms (320 samples) |
| Server → Client | 24 kHz | 16-bit | Mono | Int16 PCM | Variable |
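
The upstream frame size follows directly from the format: at 16 kHz mono, a 20 ms frame is 320 samples, and at 16 bits per sample that is 640 bytes per packet. A quick arithmetic sketch:

```typescript
// Samples per frame = sample rate (Hz) × frame duration (s).
function samplesPerFrame(sampleRateHz: number, frameMs: number): number {
  return Math.round((sampleRateHz * frameMs) / 1000);
}

const samples = samplesPerFrame(16_000, 20);              // 320 samples
const bytes = samples * Int16Array.BYTES_PER_ELEMENT;     // 640 bytes
console.log(samples, bytes);
```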

Chat Channel (/chat)

Client → Server: Text Message

{
  "type": "chat_message",
  "content": "Click the submit button",
  "timestamp": "2025-02-02T10:30:00Z"
}

Server → Client: AI Response

{
  "type": "chat_message",
  "role": "assistant",
  "content": "I've clicked the submit button for you.",
  "timestamp": "2025-02-02T10:30:02Z"
}

Server → Client: Typing Indicator

{
  "type": "chat_typing",
  "typing": true
}

Server → Client: Connection Status

{
  "type": "chat_connected"
}
{
  "type": "chat_disconnected",
  "reason": "Server shutdown"
}

Production Deployment Considerations

Security:

  • Implement API key validation on WebSocket connection
  • Use WSS (WebSocket Secure) in production
  • Rate limit context updates per client
  • Sanitize all client-provided data before LLM processing

Performance:

  • Cache UI context to minimize LLM token usage
  • Implement connection pooling for concurrent clients
  • Use streaming STT/TTS for reduced latency
  • Deploy geographically distributed WebSocket servers

Reliability:

  • Implement heartbeat/ping-pong for connection health
  • Add automatic reconnection with exponential backoff
  • Log all action executions for audit trail
  • Monitor action success rates and latency metrics
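
The reconnection point above is commonly implemented as exponential backoff with jitter. A sketch of the delay schedule (names, base, and cap are illustrative, not SDK defaults):

```typescript
// Delay before reconnect attempt N: exponential growth capped at maxMs,
// with full jitter (uniform in [0, cap)) to avoid reconnect stampedes
// when many clients drop at once.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  const cap = Math.min(baseMs * 2 ** attempt, maxMs);
  return Math.floor(Math.random() * cap);
}

// attempt 0 → up to 500 ms, attempt 3 → up to 4 s, attempt 10+ → capped at 30 s
console.log(backoffDelayMs(3));
```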

Performance Characteristics

Benchmarks

Test environment: Chrome 120, Intel Core i7-12700H, 100 Mbps network, 30ms RTT

| Operation | Mean | Std Dev | P50 | P95 | P99 |
|-----------|------|---------|-----|-----|-----|
| Element Discovery (1000 elements) | 12ms | 3ms | 11ms | 18ms | 24ms |
| Context Update (Full) | 45ms | 8ms | 43ms | 62ms | 78ms |
| Context Update (Delta) | 18ms | 4ms | 17ms | 26ms | 31ms |
| Semantic Match | 2ms | 0.5ms | 2ms | 3ms | 4ms |
| Action Execution (click) | 15ms | 5ms | 14ms | 24ms | 32ms |
| Voice → Action (End-to-End) | 440ms | 95ms | 380ms | 620ms | 780ms |

Optimization Guidelines

Minimize Discovery Overhead:

  • Use aria-label attributes for explicit semantic labeling
  • Avoid deeply nested DOM structures where possible
  • Limit dynamic DOM mutations during active voice sessions

Reduce Context Size:

  • Configure blockedSelectors to exclude non-interactive regions
  • Use page-specific safeActions to filter action types
  • Implement privacy patterns to redact verbose text content

Improve Action Reliability:

  • Ensure unique semantic labels for critical actions
  • Use stable selectors (data attributes preferred over CSS classes)
  • Add proper ARIA labels to custom components

Browser Compatibility

| Browser | Version | Status | Notes |
|---------|---------|--------|-------|
| Chrome | 66+ | ✓ Full Support | Recommended platform |
| Edge | 79+ | ✓ Full Support | Chromium-based |
| Firefox | 76+ | ✓ Full Support | |
| Safari | 14.5+ | ✓ Full Support | Requires webkit prefix for AudioContext |
| Mobile Chrome | 66+ | ✓ Full Support | Microphone permissions required |
| Mobile Safari | 14.5+ | ✓ Full Support | Requires user gesture for AudioContext |

Minimum Requirements:

  • ES2020 language features
  • WebSocket API
  • Web Audio API with AudioWorklet support
  • MediaDevices getUserMedia API
  • MutationObserver API

Troubleshooting

Common Issues

AudioWorklet Files Not Found

Error: Failed to load worklet-processor.js

Solution: Ensure player-processor.js and worklet-processor.js exist in the public/ directory and are accessible at /player-processor.js and /worklet-processor.js URLs.

WebSocket Connection Failed

WebSocket error: Connection refused

Solution: Verify the server is running and serverUrl uses the correct protocol (wss:// for production, ws:// for local development).

No Elements Discovered

Warning: 0 interactive elements found

Solution: Add aria-label attributes to interactive elements or verify elements match discovery selectors (button, input, select, a[href], [role="button"]).

Action Ambiguity

Found 3 matches for "Delete"

Solution: Use index notation in voice commands ("Delete item number 2") or ensure unique semantic labels for elements with identical text.

Debug Mode

Enable verbose logging for development:

// Client-side (browser console)
localStorage.setItem('AIUI_DEBUG', 'true');

// Server-side (environment variable)
export LOG_LEVEL=DEBUG
node server.js

Inspect Discovered Elements:

const { executeAction } = useAIUI();

// Retrieve internal context for debugging
await executeAction('get_value', { semantic: '_debug_context' });

Monitor WebSocket Traffic:

  1. Open Chrome DevTools → Network tab
  2. Filter by "WS" (WebSocket)
  3. Select connection → Messages tab
  4. Inspect JSON payloads and binary frames

Code Standards

  • TypeScript strict mode required
  • ESLint configuration enforced
  • Minimum 80% test coverage for new features
  • Conventional Commits specification for commit messages

Roadmap

Version 1.2.0 (Q2 2026)

  • Vue.js adapter
  • Safari performance improvements
  • Enhanced debugging tools with visual element highlighting

Version 2.0.0 (Q4 2026)

  • Multimodal interaction (vision + voice)
  • Context compression using learned embeddings
  • Offline mode with service worker caching

Research Track (2027+)

  • Adaptive discovery via reinforcement learning
  • Visual grounding for spatial element disambiguation
  • Federated learning for privacy-preserving model improvement

License

MIT License - see LICENSE file for details.

Copyright (c) 2025 AIUI Project Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Support

Community Support:

Commercial Support:

  • Enterprise integration assistance
  • Custom feature development
  • On-premise deployment support
  • SLA-backed support contracts

Contact:

  • Email: [email protected]
  • Documentation: https://docs.espai.dev/aiui
  • Website: https://espai.dev

Acknowledgments

AIUI builds upon foundational work from the open-source community:

  • React Team — Context API and Hooks architecture
  • W3C Web Audio Community Group — AudioWorklet specification
  • ARIA Working Group — Accessibility semantic standards
  • WebSocket Protocol (RFC 6455) — Real-time bidirectional communication

Special thanks to early adopters who provided production feedback and contributed to the framework's evolution.


Built with precision for voice-first web experiences.