voicegis
v2.0.0-alpha.6
Published
Voice-Enabled Geospatial Map Web Application — offline-first voice command interface for Leaflet and OpenLayers
Maintainers
Readme
VoiceGIS is a JavaScript library that adds powerful voice-control capabilities to web-based maps (Leaflet and OpenLayers). It uses a hybrid architecture with multiple deployment profiles—ranging from fast online cloud STT, to robust on-device AI (Whisper), down to constrained offline edge keyword spotting.
🚀 Why Hybrid? (Reality Check)
Browser-based speech recognition is a game of tradeoffs between bandwidth, accuracy, and latency. VoiceGIS embraces this reality:
- Web Speech API (Cloud): Convenient, instant, and highly accurate. However, it is often cloud‑backed and not fully under app control (audio may leave the device). Furthermore, it is only natively supported in some browsers (e.g., Chrome/Edge) and not fully supported in Firefox/Safari. It typically does not work offline.
- Browser-Only Whisper (On-Device): Provides excellent offline accuracy and privacy, but involves downloading a large model (on the order of a few-dozen MB for
tiny.en, and 2-3x larger forbase.en) and is compute-heavy. It is best used as an advanced or offline-fallback option. - TF.js Command Mode (Edge KWS): Tiny, instantaneous keyword spotting engines are what real edge/embedded systems use for always-on commands. While limited to a fixed vocabulary, it operates flawlessly offline on constrained devices without massive downloads.
VoiceGIS's default auto strategy seamlessly routes between these engines depending on the user's connection and browser capabilities.
📦 Engine Profiles
| Profile | Engine Layer | Best For | Tradeoffs |
| --- | --- | --- | --- |
| Auto (Hybrid) | VoiceGIS orchestrator | Consumer web apps | Switches dynamically between Cloud and Whisper based on network. |
| Online Cloud | WebSpeechEngine | Fast, low-latency UX | Requires internet; audio may be sent to third-party cloud. |
| Offline Advanced | WhisperEngine | Privacy, offline fields | Requires downloading 40MB+ model; high CPU usage. |
| Offline Command | TfjsEngine | Constrained devices | Only understands a small, fixed set of GIS commands. |
| Private Server | WhisperServerEngine | Enterprise / Secure Intranets | Requires hosting your own Whisper API backend. |
📦 Installation
npm install voicegis(Note: leaflet or ol are peer dependencies. Install the one you plan to use.)
⏱️ Quickstart
1. Auto Mode (Recommended)
import { VoiceGIS } from 'voicegis';
const app = new VoiceGIS({
mapEngine: 'leaflet',
mapContainerId: 'map',
speechEngine: 'auto', // Intelligently routes between WebSpeech and Whisper
autoExecute: true
});
await app.initSpeech();
app.start();2. Private Server Mode
If you are hosting your own Whisper endpoint (e.g. whisper.cpp server):
import { VoiceGIS } from 'voicegis';
const app = new VoiceGIS({
speechEngine: 'server',
autoExecute: true
});
// Configure the backend API URL
app.speech.apiUrl = 'http://localhost:8000/transcribe';
await app.initSpeech();
app.start();3. Explicit Offline Command Mode
import { VoiceGIS } from 'voicegis';
const app = new VoiceGIS({
speechEngine: 'tfjs', // Forces the tiny edge keyword spotter
autoExecute: true
});
await app.initSpeech();
app.start();🔌 Public API Reference
The core of the library is the VoiceGIS orchestrator class.
new VoiceGIS(options)
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| mapEngine | 'leaflet' | 'openlayers' | 'leaflet' | The underlying map library you are using. |
| mapContainerId | string | undefined | DOM ID of the map container (if provided, map is auto-initialized). |
| speechEngine | 'auto' | 'webspeech' | 'whisper' | 'tfjs' | 'auto' | The STT engine strategy to use. |
| autoExecute | boolean | true | Whether to automatically run parsed intents on the map controller. |
| enableGeocoding| boolean | true | Whether to use Nominatim for resolving unknown place names offline. |
| onCommandParsed| function | undefined | Callback (result, rawText) triggered when a voice command is understood. |
| onStateChange | function | undefined | Callback (state) where state is 'listening', 'idle', or 'error'. |
| onEngineSwitched| function | undefined | Callback (engineType) triggered when the hybrid router switches engines. |
Methods
initSpeech(): Instantiates and warms up the selected speech engine. Returns a Promise.start(): Begins listening for voice commands.stop(): Stops listening.registerCommand(intentName, pattern, action): Register custom application logic (see Recipes below).
🍳 Recipes: Custom Commands
While voicegis comes with built-in intents (zoom, pan, layers, markers), its real power lies in adding domain-specific commands for your application.
import { VoiceGIS } from 'voicegis';
const app = new VoiceGIS({ mapContainerId: 'map' });
// 1. Register a custom regex command
app.registerCommand(
'NAVIGATE_TO_ROOM',
/(?:take me to|navigate to) (conference room [a-c]|the cafeteria)/i,
(mapController, match) => {
const destination = match[1];
console.log(`Routing user to: ${destination}`);
// Hook into your custom routing backend here (e.g. OSRM, AR.js)
}
);
await app.initSpeech();
app.start();⚡ Performance & Bundle Size
VoiceGIS implements a hybrid engine strategy. Because local on-device AI requires large weights, keep the following in mind:
- WebSpeech Engine adds virtually 0 bytes to your bundle, relying on the browser's native C++ implementation.
- Whisper & TF.js engines will dynamically load their weights (~40MB for Whisper
tiny.en, ~5MB for TF.js) into the browser cache only when instantiated. - If you are building a strict web-app that will never go offline, you can configure your bundler (e.g. Vite/Webpack) to tree-shake
@huggingface/transformersto aggressively reduce your vendor chunk size.
🧪 Evaluation Harness
We ship an offline evaluation harness to measure parser accuracy and prevent regressions.
# Run the benchmark suite against src/evaluation/benchmarks.json
npm run evaluate📖 Architecture & Advanced Usage
See docs/ARCHITECTURE.md for a deep dive into the internal modules, state management, and deployment profiles.
🛠️ Limitations & Future Work
- TF.js Engine: The current
TfjsEngineacts as a limited "Offline Command Mode" and is not a purpose-trained Keyword Spotting (KWS) model for VoiceGIS. A future iteration could train a custom small model specifically tuned to GIS commands. - Server Engine: The
WhisperServerEngineacts as an HTTP client stub. The actual Python/Go backend server implementation is out of scope for the core library.
📝 License
MIT © Sanish Kumar
