

Tauri Plugin: Device AI APIs

A Tauri v2 plugin for device-native AI capabilities, plus a reusable device-ai Rust crate for direct desktop use. The repository covers speech recognition, text-to-speech, OCR, barcode detection, face detection, image classification, language identification, and conditional on-device LLM access.

Features

| Feature                 | iOS | Android | macOS | Windows | Linux |
| ----------------------- | --- | ------- | ----- | ------- | ----- |
| Speech Recognition      | ✅  | ✅      | ✅    | ✅      | ❌    |
| Text-to-Speech          | ✅  | ✅      | ✅    | ✅*     | ❌    |
| Text Recognition (OCR)  | ✅  | ✅      | ✅    | ✅      | ❌    |
| Barcode/QR Detection    | ✅  | ✅      | ✅    | ❌      | ❌    |
| Face Detection          | ✅  | ✅      | ✅    | ❌      | ❌    |
| Image Classification    | ✅  | ✅      | ✅    | ❌      | ❌    |
| Language Identification | ✅  | ✅      | ✅    | ❌      | ❌    |
| Translation             | ❌  | ❌      | ❌    | ❌      | ❌    |
| Language Model (LLM)    | ✅† | ❌      | ✅†   | ❌‡     | ❌    |

Legend: ✅ Implemented | ❌ Not Available

* Windows TTS completes synthesis, but the current Rust backend does not yet play the generated stream.

† Requires macOS 26+ (Tahoe) or iOS 26+ with Apple FoundationModels. The feature compiles conditionally — if the SDK is not present, LLM commands return a "not available" error gracefully.

‡ Windows Phi Silica APIs (Microsoft.Windows.AI.Text) are not yet accessible from Rust via the windows crate. Stubs return a clear "not available" error. Full support is planned once WinRT bindings are available.
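Because LLM availability depends on both the platform and the OS version, callers should branch on the runtime check rather than assume support. A minimal sketch, using the checkAvailability and generate calls documented under Usage below:

import { llm } from "@hypothesi/tauri-plugin-device-ai-apis";

// Probe for the on-device LLM before generating; on Android, Windows,
// and pre-26 Apple OSes the unavailable branch is the expected path.
const status = await llm.checkAvailability();
if (status.available) {
  const result = await llm.generate({ prompt: "Say hello in five words." });
  console.log(result.content);
} else {
  console.log("On-device LLM unavailable:", status.reason);
}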

Installation

Rust

Add the plugin to your Tauri project's Cargo.toml:

[dependencies]
tauri-plugin-device-ai-apis = { git = "https://github.com/hypothesi/tauri-plugin-device-ai-apis" }

Register the plugin in your lib.rs:

pub fn run() {
    tauri::Builder::default()
        .plugin(tauri_plugin_device_ai_apis::init())
        .run(tauri::generate_context!())
        .expect("error while running tauri application");
}

Rust library

Use the extracted device-ai crate when you want direct Rust access without Tauri on desktop. The root crate remains the Tauri plugin adapter, and iOS/Android still route through the plugin-hosted mobile bridge:

[dependencies]
device-ai = { git = "https://github.com/hypothesi/tauri-plugin-device-ai-apis" }

use device_ai::{DeviceAi, ImageSource, OcrOptions};

fn main() -> device_ai::Result<()> {
    let ai = DeviceAi::new();
    let capabilities = ai.capabilities();

    println!("speech recognition: {}", capabilities.speech_recognition.available);

    let ocr = ai.vision().recognize_text(
        ImageSource::from_path("receipt.png"),
        OcrOptions::new(),
    )?;

    println!("{}", ocr.text);
    Ok(())
}

JavaScript/TypeScript

Install the API package:

npm install @hypothesi/tauri-plugin-device-ai-apis

Usage

Capability Detection

Check which features are available on the current device:

import { getCapabilities } from "@hypothesi/tauri-plugin-device-ai-apis";

const capabilities = await getCapabilities();
if (capabilities.speechRecognition.available) {
  console.log("Speech recognition is available!");
}

Speech Recognition

Convert speech to text:

import { speech } from "@hypothesi/tauri-plugin-device-ai-apis";

// One-shot recognition
const result = await speech.recognize({ language: "en-US" });
console.log("Recognized:", result.text);
console.log("Confidence:", result.confidence);

Text-to-Speech

Synthesize speech from text:

import { speech } from "@hypothesi/tauri-plugin-device-ai-apis";

// Speak text
await speech.synthesize("Hello, world!", {
  rate: 1.0,
  pitch: 1.0,
  volume: 1.0,
});

// List available voices
const voices = await speech.getVoices();
console.log("Available voices:", voices);

// Use a specific voice
await speech.synthesize("Hello!", { voice: voices[0].id });

Text Recognition (OCR)

Extract text from images:

import { vision } from "@hypothesi/tauri-plugin-device-ai-apis";

// From base64 image data
const result = await vision.recognizeText({ base64: imageData });
console.log("Extracted text:", result.text);

// Access individual text blocks
for (const block of result.blocks) {
  console.log("Block:", block.text, "at", block.boundingBox);
}

Barcode Detection

Detect and decode barcodes and QR codes:

import { vision } from "@hypothesi/tauri-plugin-device-ai-apis";

const barcodes = await vision.detectBarcodes({ base64: imageData });
for (const barcode of barcodes) {
  console.log(`${barcode.format}: ${barcode.rawValue}`);
}

Face Detection

Detect faces with optional landmarks:

import { vision } from "@hypothesi/tauri-plugin-device-ai-apis";

const faces = await vision.detectFaces(
  { base64: imageData },
  { detectLandmarks: true, classifyAttributes: true },
);

for (const face of faces) {
  console.log("Face at:", face.boundingBox);
  if (face.landmarks) {
    console.log("Left eye:", face.landmarks.leftEye);
  }
  if (face.attributes) {
    console.log("Smiling:", face.attributes.smilingProbability);
  }
}

Image Classification

Classify images with labels:

import { vision } from "@hypothesi/tauri-plugin-device-ai-apis";

const classifications = await vision.classifyImage(
  { base64: imageData },
  { maxResults: 5, minConfidence: 0.5 },
);

for (const classification of classifications) {
  console.log(`${classification.identifier}: ${classification.confidence}`);
}

Language Identification

Identify the language of text:

import { text } from "@hypothesi/tauri-plugin-device-ai-apis";

const result = await text.identifyLanguage("Bonjour, comment allez-vous?");
console.log("Language:", result.language); // 'fr'
console.log("Confidence:", result.confidence);

Language model (LLM)

import { llm } from "@hypothesi/tauri-plugin-device-ai-apis";

// Check availability
const status = await llm.checkAvailability();
if (!status.available) {
  console.log("LLM not available:", status.reason);
}

// Single-shot generation
const result = await llm.generate({
  prompt: "Explain quantum computing in one paragraph.",
  temperature: 0.7,
  maxTokens: 256,
});
console.log(result.content);

// Streaming generation
let streamed = "";
await llm.generateStream(
  { prompt: "Write a short poem about the sea." },
  (event) => {
    if (event.type === "delta") streamed += event.content;
    if (event.type === "done") {
      console.log(streamed || event.content);
      console.log("Done:", event.finishReason);
    }
  },
);

// Multi-turn session
const sessionId = await llm.createSession({
  systemPrompt: "You are a helpful assistant.",
});
const reply = await llm.sessionSend(sessionId, "What is 2+2?");
console.log(reply.content);
await llm.destroySession(sessionId);

// Text intelligence
const summary = await llm.summarize({
  text: "Long article text here...",
});
console.log(summary.summary);

const rewritten = await llm.rewrite({
  text: "hey wanna grab lunch tmrw?",
  tone: "formal",
});
console.log(rewritten.rewrittenText);

Permissions

Add permissions in your app's capability file (e.g. src-tauri/capabilities/default.json).

Grant all plugin permissions:

{
  "permissions": ["core:default", "device-ai-apis:all"]
}

Or use granular permission sets:

{
  "permissions": [
    "core:default",
    "device-ai-apis:allow-get-capabilities",
    "device-ai-apis:speech-recognition",
    "device-ai-apis:speech-synthesis",
    "device-ai-apis:vision-all",
    "device-ai-apis:text-all"
  ]
}

Individual permissions are also available:

| Permission                    | Description                      |
| ----------------------------- | -------------------------------- |
| allow-get-capabilities        | Query available AI features      |
| allow-speech-recognize        | One-shot speech recognition      |
| allow-speech-recognize-start  | Start streaming recognition      |
| allow-speech-recognize-stop   | Stop streaming recognition       |
| allow-speech-synthesize       | Text-to-speech synthesis         |
| allow-speech-get-voices       | List available TTS voices        |
| allow-vision-recognize-text   | OCR text recognition             |
| allow-vision-detect-barcodes  | Barcode and QR code detection    |
| allow-vision-detect-faces     | Face detection                   |
| allow-vision-classify-image   | Image classification             |
| allow-text-identify-language  | Language identification          |
| allow-text-translate          | Text translation                 |
| allow-llm-check-availability  | Check on-device LLM availability |
| allow-llm-get-model-info      | Get language model metadata      |
| allow-llm-generate            | Single-shot text generation      |
| allow-llm-generate-stream     | Streaming text generation        |
| allow-llm-create-session      | Create multi-turn session        |
| allow-llm-session-send        | Send message in a session        |
| allow-llm-session-send-stream | Stream response in a session     |
| allow-llm-destroy-session     | Destroy a session                |
| allow-llm-summarize           | Summarize text                   |
| allow-llm-rewrite             | Rewrite text with a given tone   |

Permission sets: speech-recognition, speech-synthesis, vision-all, text-all, llm-all, and all (everything).
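For example, an app that only needs the language-model commands could request just the llm-all set, a sketch following the capability-file format shown above:

{
  "permissions": ["core:default", "device-ai-apis:llm-all"]
}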

Platform Requirements

iOS

  • iOS 13.0+
  • Add to Info.plist:
    <key>NSSpeechRecognitionUsageDescription</key>
    <string>Speech recognition is used for voice commands</string>
    <key>NSMicrophoneUsageDescription</key>
    <string>Microphone access is needed for speech recognition</string>

Android

  • Android API 21+
  • Permissions are declared in the plugin's AndroidManifest.xml
  • Runtime permission request for RECORD_AUDIO is handled automatically

macOS/Windows

  • macOS and Windows desktop support is available (feature coverage varies by API).

Architecture

The repository now has two Rust entry points:

  • tauri-plugin-device-ai-apis: Tauri plugin wiring, permissions, commands, and mobile host bridging
  • crates/device-ai: Tauri-agnostic Rust library that owns the reusable desktop-native implementation used by the plugin on macOS and Windows

On desktop, plugin commands delegate into device-ai. On iOS/Android, commands still flow through the Swift/Kotlin mobile bridge in this repository.

The native backends use:

  • iOS/macOS: Speech framework, AVFoundation, Vision framework
  • Android: SpeechRecognizer, TextToSpeech, ML Kit
  • Windows: Windows.Media.SpeechRecognition, Windows.Media.SpeechSynthesis, Windows.Media.Ocr

Most native features use local platform APIs, but there are important exceptions and current gaps:

  • Android speech recognition and browser speech fallbacks may rely on platform/browser services rather than strictly on-device inference
  • Native streaming speech recognition (speech_recognize_start / speech_recognize_stop) is not implemented yet
  • Translation commands exist in the API surface but currently return "not available"
  • Windows text-to-speech synthesizes successfully but does not yet play audio
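Until these gaps close, callers can treat them uniformly by catching the structured errors described under Error Handling below. A sketch for translation, which is noted above as unimplemented on every platform; the text.translate call shape and translatedText field are assumptions, since only the command and its permission are documented here:

import { text, DeviceAiError } from "@hypothesi/tauri-plugin-device-ai-apis";

try {
  // Hypothetical call shape; translation currently returns "not available"
  // everywhere, so this is expected to land in the catch branch.
  const result = await text.translate({
    text: "hello world",
    sourceLanguage: "en",
    targetLanguage: "es",
  });
  console.log(result.translatedText);
} catch (error) {
  if ((error as DeviceAiError).code === "FEATURE_NOT_AVAILABLE") {
    console.log("Translation is not implemented yet; fall back or hide the UI.");
  } else {
    throw error;
  }
}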

Manual verification

Direct library verification

Run the desktop example CLI in crates/device-ai/examples/device-ai.rs:

cargo run -p device-ai --example device-ai -- capabilities
cargo run -p device-ai --example device-ai -- speech-voices
cargo run -p device-ai --example device-ai -- speech-speak "Hello from device-ai"
cargo run -p device-ai --example device-ai -- speech-recognize ./path/to/audio.wav
cargo run -p device-ai --example device-ai -- speech-stream
cargo run -p device-ai --example device-ai -- vision-ocr ./path/to/image.png
cargo run -p device-ai --example device-ai -- vision-barcodes ./path/to/image.png
cargo run -p device-ai --example device-ai -- vision-faces ./path/to/image.png
cargo run -p device-ai --example device-ai -- vision-classify ./path/to/image.png
cargo run -p device-ai --example device-ai -- text-language "Bonjour tout le monde"
cargo run -p device-ai --example device-ai -- text-translate en es "hello world"
cargo run -p device-ai --example device-ai -- llm-availability
cargo run -p device-ai --example device-ai -- llm-model-info
cargo run -p device-ai --example device-ai -- llm-generate "Explain OCR in one paragraph."
cargo run -p device-ai --example device-ai -- llm-stream "Write a haiku about Rust."
cargo run -p device-ai --example device-ai -- llm-session "What is 2 + 2?"
cargo run -p device-ai --example device-ai -- llm-session-stream "Describe this platform."
cargo run -p device-ai --example device-ai -- llm-summarize "Long text to summarize..."
cargo run -p device-ai --example device-ai -- llm-rewrite formal "hey wanna meet tomorrow?"

Use local audio/image assets that make sense for your host platform. This example is aimed at the direct desktop library (device-ai on macOS/Windows). Unsupported features should return explicit structured errors; for example, speech-stream is currently expected to report that native streaming recognition is not implemented.

Plugin + sample app verification

npm run build
cd examples/tauri-app
npm run tauri dev

Use the sample app to exercise capabilities, speech, vision, language identification, and LLM flows end-to-end through the plugin surface.

Error Handling

All API calls can throw errors with structured error codes:

import { DeviceAiError } from "@hypothesi/tauri-plugin-device-ai-apis";

try {
  await speech.recognize();
} catch (error) {
  if ((error as DeviceAiError).code === "FEATURE_NOT_AVAILABLE") {
    console.log("Speech recognition not available on this device");
  } else if ((error as DeviceAiError).code === "PERMISSION_DENIED") {
    console.log("Permission was denied");
  }
}

License

MIT OR Apache-2.0