voicegis

v2.0.0-alpha.6

Published

5 days ago

Voice-Enabled Geospatial Map Web Application — offline-first voice command interface for Leaflet and OpenLayers

0High
0Medium
0Low

voicegis voice gis leaflet openlayers speech whisper tensorflow maps wms geospatial offline pwa

VoiceGIS is a JavaScript library that adds powerful voice-control capabilities to web-based maps (Leaflet and OpenLayers). It uses a hybrid architecture with multiple deployment profiles—ranging from fast online cloud STT, to robust on-device AI (Whisper), down to constrained offline edge keyword spotting.

🚀 Why Hybrid? (Reality Check)

Browser-based speech recognition is a game of tradeoffs between bandwidth, accuracy, and latency. VoiceGIS embraces this reality:

Web Speech API (Cloud): Convenient, instant, and highly accurate. However, it is often cloud‑backed and not fully under app control (audio may leave the device). Furthermore, it is only natively supported in some browsers (e.g., Chrome/Edge) and not fully supported in Firefox/Safari. It typically does not work offline.
Browser-Only Whisper (On-Device): Provides excellent offline accuracy and privacy, but involves downloading a large model (on the order of a few-dozen MB for tiny.en, and 2-3x larger for base.en) and is compute-heavy. It is best used as an advanced or offline-fallback option.
TF.js Command Mode (Edge KWS): Tiny, instantaneous keyword spotting engines are what real edge/embedded systems use for always-on commands. While limited to a fixed vocabulary, it operates flawlessly offline on constrained devices without massive downloads.

VoiceGIS's default auto strategy seamlessly routes between these engines depending on the user's connection and browser capabilities.

📦 Engine Profiles

| Profile | Engine Layer | Best For | Tradeoffs | | --- | --- | --- | --- | | Auto (Hybrid) | VoiceGIS orchestrator | Consumer web apps | Switches dynamically between Cloud and Whisper based on network. | | Online Cloud | WebSpeechEngine | Fast, low-latency UX | Requires internet; audio may be sent to third-party cloud. | | Offline Advanced | WhisperEngine | Privacy, offline fields | Requires downloading 40MB+ model; high CPU usage. | | Offline Command | TfjsEngine | Constrained devices | Only understands a small, fixed set of GIS commands. | | Private Server | WhisperServerEngine | Enterprise / Secure Intranets | Requires hosting your own Whisper API backend. |

📦 Installation

npm install voicegis

(Note: leaflet or ol are peer dependencies. Install the one you plan to use.)

⏱️ Quickstart

1. Auto Mode (Recommended)

import { VoiceGIS } from 'voicegis';

const app = new VoiceGIS({
  mapEngine: 'leaflet',
  mapContainerId: 'map',
  speechEngine: 'auto', // Intelligently routes between WebSpeech and Whisper
  autoExecute: true
});

await app.initSpeech();
app.start();

2. Private Server Mode

If you are hosting your own Whisper endpoint (e.g. whisper.cpp server):

import { VoiceGIS } from 'voicegis';

const app = new VoiceGIS({
  speechEngine: 'server',
  autoExecute: true
});

// Configure the backend API URL
app.speech.apiUrl = 'http://localhost:8000/transcribe';

await app.initSpeech();
app.start();

3. Explicit Offline Command Mode

import { VoiceGIS } from 'voicegis';

const app = new VoiceGIS({
  speechEngine: 'tfjs', // Forces the tiny edge keyword spotter
  autoExecute: true
});

await app.initSpeech();
app.start();

🔌 Public API Reference

The core of the library is the VoiceGIS orchestrator class.

`new VoiceGIS(options)`

| Option | Type | Default | Description | | --- | --- | --- | --- | | mapEngine | 'leaflet' | 'openlayers' | 'leaflet' | The underlying map library you are using. | | mapContainerId | string | undefined | DOM ID of the map container (if provided, map is auto-initialized). | | speechEngine | 'auto' | 'webspeech' | 'whisper' | 'tfjs' | 'auto' | The STT engine strategy to use. | | autoExecute | boolean | true | Whether to automatically run parsed intents on the map controller. | | enableGeocoding| boolean | true | Whether to use Nominatim for resolving unknown place names offline. | | onCommandParsed| function | undefined | Callback (result, rawText) triggered when a voice command is understood. | | onStateChange | function | undefined | Callback (state) where state is 'listening', 'idle', or 'error'. | | onEngineSwitched| function | undefined | Callback (engineType) triggered when the hybrid router switches engines. |

Methods

initSpeech(): Instantiates and warms up the selected speech engine. Returns a Promise.
start(): Begins listening for voice commands.
stop(): Stops listening.
registerCommand(intentName, pattern, action): Register custom application logic (see Recipes below).

🍳 Recipes: Custom Commands

While voicegis comes with built-in intents (zoom, pan, layers, markers), its real power lies in adding domain-specific commands for your application.

import { VoiceGIS } from 'voicegis';

const app = new VoiceGIS({ mapContainerId: 'map' });

// 1. Register a custom regex command
app.registerCommand(
  'NAVIGATE_TO_ROOM', 
  /(?:take me to|navigate to) (conference room [a-c]|the cafeteria)/i, 
  (mapController, match) => {
    const destination = match[1];
    console.log(`Routing user to: ${destination}`);
    // Hook into your custom routing backend here (e.g. OSRM, AR.js)
  }
);

await app.initSpeech();
app.start();

⚡ Performance & Bundle Size

VoiceGIS implements a hybrid engine strategy. Because local on-device AI requires large weights, keep the following in mind:

WebSpeech Engine adds virtually 0 bytes to your bundle, relying on the browser's native C++ implementation.
Whisper & TF.js engines will dynamically load their weights (~40MB for Whisper tiny.en, ~5MB for TF.js) into the browser cache only when instantiated.
If you are building a strict web-app that will never go offline, you can configure your bundler (e.g. Vite/Webpack) to tree-shake @huggingface/transformers to aggressively reduce your vendor chunk size.

🧪 Evaluation Harness

We ship an offline evaluation harness to measure parser accuracy and prevent regressions.

# Run the benchmark suite against src/evaluation/benchmarks.json
npm run evaluate

📖 Architecture & Advanced Usage

See docs/ARCHITECTURE.md for a deep dive into the internal modules, state management, and deployment profiles.

🛠️ Limitations & Future Work

TF.js Engine: The current TfjsEngine acts as a limited "Offline Command Mode" and is not a purpose-trained Keyword Spotting (KWS) model for VoiceGIS. A future iteration could train a custom small model specifically tuned to GIS commands.
Server Engine: The WhisperServerEngine acts as an HTTP client stub. The actual Python/Go backend server implementation is out of scope for the core library.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme