# Tauri Plugin: Device AI APIs
A Tauri v2 plugin for device-native AI capabilities, plus a reusable `device-ai`
Rust crate for direct desktop use. The repository covers speech recognition,
text-to-speech, OCR, barcode detection, face detection, image classification,
language identification, and conditional on-device LLM access.
## Features

| Feature                 | iOS | Android | macOS | Windows | Linux |
| ----------------------- | --- | ------- | ----- | ------- | ----- |
| Speech Recognition      | ✅  | ✅      | ✅    | ✅      | ❌    |
| Text-to-Speech          | ✅  | ✅      | ✅    | ✅*     | ❌    |
| Text Recognition (OCR)  | ✅  | ✅      | ✅    | ✅      | ❌    |
| Barcode/QR Detection    | ✅  | ✅      | ✅    | ❌      | ❌    |
| Face Detection          | ✅  | ✅      | ✅    | ❌      | ❌    |
| Image Classification    | ✅  | ✅      | ✅    | ❌      | ❌    |
| Language Identification | ✅  | ✅      | ✅    | ❌      | ❌    |
| Translation             | ❌  | ❌      | ❌    | ❌      | ❌    |
| Language Model (LLM)    | ✅† | ❌      | ✅†   | ❌‡     | ❌    |

Legend: ✅ Implemented | ❌ Not Available

\* Windows TTS completes synthesis, but the current Rust backend does not yet play the generated stream.

† Requires macOS 26+ (Tahoe) or iOS 26+ with Apple FoundationModels. The feature compiles conditionally: if the SDK is not present, LLM commands return a "not available" error gracefully.

‡ Windows Phi Silica APIs (`Microsoft.Windows.AI.Text`) are not yet accessible from Rust via the `windows` crate. Stubs return a clear "not available" error. Full support is planned once WinRT bindings are available.
## Installation

### Rust

Add the plugin to your Tauri project's `Cargo.toml`:

```toml
[dependencies]
tauri-plugin-device-ai-apis = { git = "https://github.com/hypothesi/tauri-plugin-device-ai-apis" }
```

Register the plugin in your `lib.rs`:

```rust
pub fn run() {
tauri::Builder::default()
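        // Expose the plugin's speech, vision, text, and LLM commands to the webview.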
.plugin(tauri_plugin_device_ai_apis::init())
.run(tauri::generate_context!())
.expect("error while running tauri application");
}
```

### Rust library
Use the extracted `device-ai` crate when you want direct Rust access without Tauri on
desktop. The root crate remains the Tauri plugin adapter, and iOS/Android still route
through the plugin-hosted mobile bridge:
```toml
[dependencies]
device-ai = { git = "https://github.com/hypothesi/tauri-plugin-device-ai-apis" }
```

```rust
use device_ai::{DeviceAi, ImageSource, OcrOptions};

fn main() -> device_ai::Result<()> {
let ai = DeviceAi::new();
let capabilities = ai.capabilities();
println!("speech recognition: {}", capabilities.speech_recognition.available);
let ocr = ai.vision().recognize_text(
ImageSource::from_path("receipt.png"),
OcrOptions::new(),
)?;
println!("{}", ocr.text);
Ok(())
}
```

### JavaScript/TypeScript
Install the API package:
```sh
npm install @hypothesi/tauri-plugin-device-ai-apis
```

## Usage

### Capability Detection

Check which features are available on the current device:

```ts
import { getCapabilities } from "@hypothesi/tauri-plugin-device-ai-apis";
const capabilities = await getCapabilities();
if (capabilities.speechRecognition.available) {
console.log("Speech recognition is available!");
}
```

### Speech Recognition

Convert speech to text:

```ts
import { speech } from "@hypothesi/tauri-plugin-device-ai-apis";
// One-shot recognition
const result = await speech.recognize({ language: "en-US" });
console.log("Recognized:", result.text);
console.log("Confidence:", result.confidence);Text-to-Speech
Synthesize speech from text:
import { speech } from "@hypothesi/tauri-plugin-device-ai-apis";
// Speak text
await speech.synthesize("Hello, world!", {
rate: 1.0,
pitch: 1.0,
volume: 1.0,
});
// List available voices
const voices = await speech.getVoices();
console.log("Available voices:", voices);
// Use a specific voice
await speech.synthesize("Hello!", { voice: voices[0].id });Text Recognition (OCR)
Extract text from images:
import { vision } from "@hypothesi/tauri-plugin-device-ai-apis";
// From base64 image data
const result = await vision.recognizeText({ base64: imageData });
console.log("Extracted text:", result.text);
// Access individual text blocks
for (const block of result.blocks) {
console.log("Block:", block.text, "at", block.boundingBox);
}
```

### Barcode Detection

Detect and decode barcodes and QR codes:

```ts
import { vision } from "@hypothesi/tauri-plugin-device-ai-apis";
const barcodes = await vision.detectBarcodes({ base64: imageData });
for (const barcode of barcodes) {
console.log(`${barcode.format}: ${barcode.rawValue}`);
}
```

### Face Detection

Detect faces with optional landmarks:

```ts
import { vision } from "@hypothesi/tauri-plugin-device-ai-apis";
const faces = await vision.detectFaces(
{ base64: imageData },
{ detectLandmarks: true, classifyAttributes: true },
);
for (const face of faces) {
console.log("Face at:", face.boundingBox);
if (face.landmarks) {
console.log("Left eye:", face.landmarks.leftEye);
}
if (face.attributes) {
console.log("Smiling:", face.attributes.smilingProbability);
}
}
```

### Image Classification

Classify images with labels:

```ts
import { vision } from "@hypothesi/tauri-plugin-device-ai-apis";
const classifications = await vision.classifyImage(
{ base64: imageData },
{ maxResults: 5, minConfidence: 0.5 },
);
for (const classification of classifications) {
console.log(`${classification.identifier}: ${classification.confidence}`);
}
```

### Language Identification

Identify the language of text:

```ts
import { text } from "@hypothesi/tauri-plugin-device-ai-apis";
const result = await text.identifyLanguage("Bonjour, comment allez-vous?");
console.log("Language:", result.language); // 'fr'
console.log("Confidence:", result.confidence);Language model (LLM)
import { llm } from "@hypothesi/tauri-plugin-device-ai-apis";
// Check availability
const status = await llm.checkAvailability();
if (!status.available) {
console.log("LLM not available:", status.reason);
}
// Single-shot generation
const result = await llm.generate({
prompt: "Explain quantum computing in one paragraph.",
temperature: 0.7,
maxTokens: 256,
});
console.log(result.content);
// Streaming generation
let streamed = "";
await llm.generateStream(
{ prompt: "Write a short poem about the sea." },
(event) => {
if (event.type === "delta") streamed += event.content;
if (event.type === "done") {
console.log(streamed || event.content);
console.log("Done:", event.finishReason);
}
},
);
// Multi-turn session
const sessionId = await llm.createSession({
systemPrompt: "You are a helpful assistant.",
});
const reply = await llm.sessionSend(sessionId, "What is 2+2?");
console.log(reply.content);
await llm.destroySession(sessionId);
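// A streaming session variant is gated by the allow-llm-session-send-stream
// permission; its JS binding name is not shown in this README, so the call
// below is a hypothetical shape mirroring generateStream:
// await llm.sessionSendStream(sessionId, "Tell me more.", (event) => { ... });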
// Text intelligence
const summary = await llm.summarize({
text: "Long article text here...",
});
console.log(summary.summary);
const rewritten = await llm.rewrite({
text: "hey wanna grab lunch tmrw?",
tone: "formal",
});
console.log(rewritten.rewrittenText);
```

## Permissions

Add permissions in your app's capability file (e.g. `src-tauri/capabilities/default.json`).

Grant all plugin permissions:

```json
{
"permissions": ["core:default", "device-ai-apis:all"]
}
```

Or use granular permission sets:

```json
{
"permissions": [
"core:default",
"device-ai-apis:allow-get-capabilities",
"device-ai-apis:speech-recognition",
"device-ai-apis:speech-synthesis",
"device-ai-apis:vision-all",
"device-ai-apis:text-all"
]
}
```

Individual permissions are also available:
| Permission | Description |
| --------------------------------------- | ---------------------------------- |
| allow-get-capabilities | Query available AI features |
| allow-speech-recognize | One-shot speech recognition |
| allow-speech-recognize-start | Start streaming recognition |
| allow-speech-recognize-stop | Stop streaming recognition |
| allow-speech-synthesize | Text-to-speech synthesis |
| allow-speech-get-voices | List available TTS voices |
| allow-vision-recognize-text | OCR text recognition |
| allow-vision-detect-barcodes | Barcode and QR code detection |
| allow-vision-detect-faces | Face detection |
| allow-vision-classify-image | Image classification |
| allow-text-identify-language | Language identification |
| allow-text-translate | Text translation |
| allow-llm-check-availability | Check on-device LLM availability |
| allow-llm-get-model-info | Get language model metadata |
| allow-llm-generate | Single-shot text generation |
| allow-llm-generate-stream | Streaming text generation |
| allow-llm-create-session | Create multi-turn session |
| allow-llm-session-send | Send message in a session |
| allow-llm-session-send-stream | Stream response in a session |
| allow-llm-destroy-session | Destroy a session |
| allow-llm-summarize | Summarize text |
| allow-llm-rewrite | Rewrite text with a given tone |
Permission sets: `speech-recognition`, `speech-synthesis`, `vision-all`, `text-all`, `llm-all`, and `all` (everything).
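For example, an app that only needs capability detection plus the on-device LLM could combine those sets; a minimal sketch using the set names above:

```json
{
  "permissions": [
    "core:default",
    "device-ai-apis:allow-get-capabilities",
    "device-ai-apis:llm-all"
  ]
}
```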
## Platform Requirements

### iOS
- iOS 13.0+
- Add to `Info.plist`:

  ```xml
  <key>NSSpeechRecognitionUsageDescription</key>
  <string>Speech recognition is used for voice commands</string>
  <key>NSMicrophoneUsageDescription</key>
  <string>Microphone access is needed for speech recognition</string>
  ```
### Android
- Android API 21+
- Permissions are declared in the plugin's `AndroidManifest.xml`
- The runtime permission request for `RECORD_AUDIO` is handled automatically
### macOS/Windows
- macOS and Windows desktop support is available (feature coverage varies by API).
## Architecture
The repository now has two Rust entry points:
- `tauri-plugin-device-ai-apis`: Tauri plugin wiring, permissions, commands, and mobile host bridging
- `crates/device-ai`: Tauri-agnostic Rust library that owns the reusable desktop-native implementation used by the plugin on macOS and Windows
On desktop, plugin commands delegate into device-ai. On iOS/Android, commands still
flow through the Swift/Kotlin mobile bridge in this repository.
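The same delegation can be exercised from your own Rust code; a minimal sketch of capability-gated desktop use, reusing only the `device-ai` calls shown in the Rust example above:

```rust
use device_ai::{DeviceAi, ImageSource, OcrOptions};

fn gated_ocr() -> device_ai::Result<()> {
    // The same DeviceAi entry point backs the plugin's desktop commands.
    let ai = DeviceAi::new();

    // Coverage varies by platform (see the feature table), so check the
    // reported capabilities before invoking a feature.
    if ai.capabilities().speech_recognition.available {
        println!("speech recognition backend present");
    }

    // OCR delegates to the Vision framework on macOS and Windows.Media.Ocr
    // on Windows.
    let ocr = ai.vision().recognize_text(
        ImageSource::from_path("receipt.png"),
        OcrOptions::new(),
    )?;
    println!("{}", ocr.text);
    Ok(())
}
```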
The native backends use:
- iOS/macOS: Speech framework, AVFoundation, Vision framework
- Android: SpeechRecognizer, TextToSpeech, ML Kit
- Windows: Windows.Media.SpeechRecognition, Windows.Media.SpeechSynthesis, Windows.Media.Ocr
Most native features use local platform APIs, but there are important exceptions and current gaps:
- Android speech recognition and browser speech fallbacks may rely on platform/browser services rather than strictly on-device inference
- Native streaming speech recognition (`speech_recognize_start`/`speech_recognize_stop`) is not implemented yet
- Translation commands exist in the API surface but currently return "not available"
- Windows text-to-speech synthesizes successfully but does not yet play audio
## Manual verification

### Direct library verification

Run the desktop example CLI in `crates/device-ai/examples/device-ai.rs`:

```sh
cargo run -p device-ai --example device-ai -- capabilities
cargo run -p device-ai --example device-ai -- speech-voices
cargo run -p device-ai --example device-ai -- speech-speak "Hello from device-ai"
cargo run -p device-ai --example device-ai -- speech-recognize ./path/to/audio.wav
cargo run -p device-ai --example device-ai -- speech-stream
cargo run -p device-ai --example device-ai -- vision-ocr ./path/to/image.png
cargo run -p device-ai --example device-ai -- vision-barcodes ./path/to/image.png
cargo run -p device-ai --example device-ai -- vision-faces ./path/to/image.png
cargo run -p device-ai --example device-ai -- vision-classify ./path/to/image.png
cargo run -p device-ai --example device-ai -- text-language "Bonjour tout le monde"
cargo run -p device-ai --example device-ai -- text-translate en es "hello world"
cargo run -p device-ai --example device-ai -- llm-availability
cargo run -p device-ai --example device-ai -- llm-model-info
cargo run -p device-ai --example device-ai -- llm-generate "Explain OCR in one paragraph."
cargo run -p device-ai --example device-ai -- llm-stream "Write a haiku about Rust."
cargo run -p device-ai --example device-ai -- llm-session "What is 2 + 2?"
cargo run -p device-ai --example device-ai -- llm-session-stream "Describe this platform."
cargo run -p device-ai --example device-ai -- llm-summarize "Long text to summarize..."
cargo run -p device-ai --example device-ai -- llm-rewrite formal "hey wanna meet tomorrow?"
```

Use local audio/image assets that make sense for your host platform. This example is aimed at the direct desktop library (`device-ai` on macOS/Windows). Unsupported features should return explicit structured errors; for example, `speech-stream` is currently expected to report that native streaming recognition is not implemented.
### Plugin + sample app verification

```sh
npm run build
cd examples/tauri-app
npm run tauri dev
```

Use the sample app to exercise capabilities, speech, vision, language identification, and LLM flows end-to-end through the plugin surface.
## Error Handling

All API calls can throw errors with structured error codes:

```ts
import { speech, DeviceAiError } from "@hypothesi/tauri-plugin-device-ai-apis";
try {
await speech.recognize();
} catch (error) {
if ((error as DeviceAiError).code === "FEATURE_NOT_AVAILABLE") {
console.log("Speech recognition not available on this device");
} else if ((error as DeviceAiError).code === "PERMISSION_DENIED") {
console.log("Permission was denied");
}
}
```

## License
MIT OR Apache-2.0
