# capacitor-gemma-3n
Capacitor plugin for running Google Gemma 3n on-device AI models on Android using LiteRT-LM.
## Features
- On-device inference - No internet required after model download
- Text generation with streaming support
- Audio transcription (batch processing)
- Image understanding (vision) - structure ready, implementation pending
- Multimodal inputs - text + audio + image
## Requirements
- Android 8.0+ (API 26+)
- ~4-5GB RAM available for E4B model
- Gemma 3n model file (`.litertlm` format)
- Capacitor 7.x
## Installation
```bash
npm install capacitor-gemma-3n
npx cap sync android
```

## Model Setup
Download the Gemma 3n model from HuggingFace or copy from Google AI Edge Gallery.
Place the model file at:

```
/sdcard/Android/data/YOUR_APP_ID/files/gemma-3n-E4B-it-int4.litertlm
```

The plugin searches for the model in these locations:

- `<internal_files>/gemma3n_models/gemma-3n-E4B-it-int4.litertlm`
- `<external_files>/gemma-3n-E4B-it-int4.litertlm`
- `/sdcard/Download/gemma-3n-E4B-it-int4.litertlm`
- Google AI Edge Gallery directory
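When debugging a missing model, it can help to log the concrete paths for your app id. Below is a hypothetical helper, not part of the plugin API: the `<internal_files>` prefix (`/data/data/<appId>/files`) is an assumption, `<external_files>` is the path shown above, and the Google AI Edge Gallery directory is device-specific so it is omitted.

```typescript
// Hypothetical helper: expand the search locations above for a concrete app id.
// The /data/data/<appId>/files prefix for <internal_files> is an assumption.
function candidateModelPaths(
  appId: string,
  fileName = 'gemma-3n-E4B-it-int4.litertlm'
): string[] {
  return [
    `/data/data/${appId}/files/gemma3n_models/${fileName}`, // <internal_files>
    `/sdcard/Android/data/${appId}/files/${fileName}`,      // <external_files>
    `/sdcard/Download/${fileName}`,                         // shared Downloads
  ];
}
```

You can then check each path (for example with `@capacitor/filesystem`'s `Filesystem.stat`) to see which one actually holds the model before calling `initialize()`.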
## Usage
### Check Availability
```typescript
import { Gemma3n } from 'capacitor-gemma-3n';

const { available, status, message } = await Gemma3n.isAvailable();
console.log('Gemma 3n available:', available);
```

### Initialize Model
```typescript
const result = await Gemma3n.initialize({
  variant: 'e4b', // 'e4b' (effective 4B params, more accurate) or 'e2b' (effective 2B, smaller)
  maxTokens: 1024
});

if (result.success) {
  console.log('Model loaded!');
}
```

### Generate Text
```typescript
const response = await Gemma3n.generateText({
  prompt: 'What is the capital of France?',
  systemPrompt: 'You are a helpful assistant.',
  maxTokens: 512
});

console.log(response.text);
console.log(`Generated in ${response.timeMs}ms`);
```

### Streaming Generation
```typescript
// Set up listener
const listener = await Gemma3n.addListener('streamResponse', (event) => {
  if (event.error) {
    console.error('Error:', event.error);
    return;
  }
  console.log('Partial:', event.partialText);
  if (event.done) {
    console.log('Complete!');
    listener.remove();
  }
});

// Start streaming
await Gemma3n.generateTextStream({
  prompt: 'Write a short story about a robot.',
  systemPrompt: 'You are a creative writer.'
});
```

### Audio Transcription
```typescript
// From base64 audio (WAV 16kHz mono recommended)
const response = await Gemma3n.generateFromAudio({
  audioBase64: 'BASE64_ENCODED_WAV',
  prompt: 'Transcribe this audio:'
});
console.log('Transcription:', response.text);

// Or with streaming output
const listener = await Gemma3n.addListener('streamResponse', (event) => {
  console.log(event.partialText);
  if (event.done) listener.remove();
});

await Gemma3n.generateFromAudioStream({
  audioBase64: 'BASE64_ENCODED_WAV',
  prompt: 'Transcribe this audio in French:'
});
```

### Get Model Info
```typescript
const info = await Gemma3n.getModelInfo();
console.log('Loaded:', info.loaded);
console.log('Variant:', info.variant);
console.log('Memory usage:', info.memoryUsageMB, 'MB');
console.log('Device info:', info.deviceInfo);
```

### Unload Model
```typescript
await Gemma3n.unloadModel();
```

## API Reference
### Methods
| Method | Description |
|--------|-------------|
| `isAvailable()` | Check if Gemma 3n is available on device |
| `initialize(options?)` | Load the model into memory |
| `generateText(options)` | Generate text response |
| `generateTextStream(options)` | Generate with streaming output |
| `generateFromAudio(options)` | Transcribe/process audio |
| `generateFromAudioStream(options)` | Audio with streaming output |
| `generateFromImage(options)` | Process image (coming soon) |
| `cancelGeneration(options)` | Cancel ongoing generation |
| `getModelInfo()` | Get model and device info |
| `unloadModel()` | Free model from memory |
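`cancelGeneration` has no usage example above. The sketch below tracks the live `sessionId` from `streamResponse` events so a UI "Stop" button can cancel mid-stream; note that the `{ sessionId }` option shape passed to `cancelGeneration` is an assumption, not confirmed by this README.

```typescript
// Shape mirrored from StreamResponseEvent in the Interfaces section below.
interface StreamEvent {
  sessionId: string;
  partialText: string;
  done: boolean;
  error?: string;
}

// Tracks the in-flight session id so a "Stop" button can cancel generation.
class StreamCanceller {
  private activeSessionId: string | null = null;

  onEvent(event: StreamEvent): void {
    // Forget the session once the stream finishes or errors out.
    this.activeSessionId = event.done || event.error ? null : event.sessionId;
  }

  // cancelFn is injected, e.g. (opts) => Gemma3n.cancelGeneration(opts);
  // the { sessionId } argument shape is assumed.
  cancel(cancelFn: (opts: { sessionId: string }) => void): boolean {
    if (!this.activeSessionId) return false;
    cancelFn({ sessionId: this.activeSessionId });
    this.activeSessionId = null;
    return true;
  }
}
```

Wire `onEvent` into the `streamResponse` listener shown under Streaming Generation, and call `cancel(...)` from your stop handler.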
### Events
| Event | Description |
|-------|-------------|
| `streamResponse` | Partial text during streaming |
| `downloadProgress` | Model download progress (future) |
### Interfaces
```typescript
interface InitializeOptions {
  variant?: 'e4b' | 'e2b';
  maxTokens?: number;
  temperature?: number;
  topK?: number;
}

interface GenerateTextOptions {
  prompt: string;
  systemPrompt?: string;
  maxTokens?: number;
}

interface GenerateFromAudioOptions {
  audioPath?: string;
  audioBase64?: string;
  prompt?: string;
  maxTokens?: number;
}

interface GenerateResponse {
  text: string;
  tokenCount: number;
  timeMs: number;
  truncated: boolean;
}

interface StreamResponseEvent {
  sessionId: string;
  partialText: string;
  done: boolean;
  error?: string;
}

interface ModelInfo {
  loaded: boolean;
  variant: 'e4b' | 'e2b' | null;
  memoryUsageMB: number;
  deviceInfo: {
    supportsGemma3n: boolean;
    availableMemoryMB: number;
    cpuCores: number;
    hasNPU: boolean;
  };
}
```

## Performance
Tested on Pixel 9 Pro XL:
- Model load time: ~5-10 seconds
- Generation speed: ~10-20 tokens/second (CPU)
- Memory usage: ~3-4GB
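The throughput figure above can be measured on your own device from any `GenerateResponse` (see Interfaces): tokens per second is `tokenCount / (timeMs / 1000)`. A small helper, assuming `timeMs > 0`:

```typescript
// Derive tokens/second from the tokenCount and timeMs fields of GenerateResponse.
function tokensPerSecond(response: { tokenCount: number; timeMs: number }): number {
  return response.tokenCount / (response.timeMs / 1000);
}

// e.g. after generateText():
// console.log(`${tokensPerSecond(response).toFixed(1)} tok/s`);
```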
## Limitations
- Android only - iOS not supported (no LiteRT-LM)
- Audio: Batch processing only, not real-time streaming STT
- Audio format: WAV 16kHz mono recommended
- Vision: Requires GPU backend (implementation pending)
- Model size: ~4.9GB for E4B variant
## Troubleshooting
### "Model not found"

Ensure the model file is in the correct location with the exact filename `gemma-3n-E4B-it-int4.litertlm`.
### "Out of memory"

Close other apps to free RAM. E4B needs ~4GB available.
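One way to catch this before it crashes is to compare `deviceInfo.availableMemoryMB` from `getModelInfo()` against the variant's footprint. A sketch; the 4096 MB threshold below is an assumption based on the "E4B needs ~4GB" note, not a documented plugin constant:

```typescript
// Pre-flight check using the deviceInfo shape from ModelInfo above.
function canLoadE4b(
  deviceInfo: { availableMemoryMB: number },
  requiredMB = 4096 // assumed threshold, per "E4B needs ~4GB available"
): boolean {
  return deviceInfo.availableMemoryMB >= requiredMB;
}

// Usage (with the plugin imported as in the examples above):
// const { deviceInfo } = await Gemma3n.getModelInfo();
// if (!canLoadE4b(deviceInfo)) {
//   // fall back to the smaller variant or ask the user to free memory
//   await Gemma3n.initialize({ variant: 'e2b' });
// }
```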
### Audio "miniaudio decoder error"

Convert audio to WAV format (16kHz, 16-bit, mono).
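If your recorder yields raw PCM samples rather than a WAV file, a minimal encoder can produce the recommended format directly. A sketch (standard 44-byte WAV header, 16-bit mono PCM; `btoa` is available in modern WebViews and Node 16+) — not part of the plugin API:

```typescript
// Hypothetical helper: encode 16 kHz mono Float32 samples as 16-bit PCM WAV
// and return base64 suitable for generateFromAudio({ audioBase64 }).
function encodeWavBase64(samples: Float32Array, sampleRate = 16000): string {
  const dataSize = samples.length * 2; // 16-bit = 2 bytes per sample
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);   // RIFF chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // PCM format
  view.setUint16(22, 1, true);              // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp floats to [-1, 1] and convert to signed 16-bit little-endian.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }

  // Base64-encode the raw bytes.
  let binary = '';
  const bytes = new Uint8Array(buffer);
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}
```

The result can be passed as `audioBase64` to `generateFromAudio` or `generateFromAudioStream` as shown under Audio Transcription.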
### Kotlin version conflict

The plugin uses Kotlin 2.2.21 and avoids coroutines to prevent version conflicts with other plugins.
## Technical Notes
- Uses `java.util.concurrent.Executors` instead of Kotlin Coroutines (avoids version conflicts)
- Streaming via Capacitor's `notifyListeners`
- Engine configured with `audioBackend = Backend.CPU` for multimodal support
## License
MIT
