react-native-litert-lm
v0.3.1
High-performance on-device LLM inference for React Native, powered by LiteRT-LM and Nitro Modules. Optimized for Gemma 3n and other on-device language models.
Features
- 🚀 Native Performance — Kotlin (Android) / C++ (iOS) via Nitro Modules JSI bindings
- 🧠 Gemma 3n Ready — First-class support for Gemma 3n E2B/E4B models
- ⚡ GPU Acceleration — GPU delegate (Android), Metal/MPS (iOS)
- 🔄 Streaming Support — Token-by-token generation callbacks
- 📱 Cross-Platform — Android API 26+ / iOS 15.0+
- 🖼️ Multimodal — Image and audio input support (Android)
- 🧵 Async API — Non-blocking inference on background threads
- 📊 Real Memory Tracking — OS-level memory metrics (RSS, native heap, available memory) via native APIs
- 🧮 Zero-Copy Buffers — Memory snapshots stored in native ArrayBuffers via Nitro Modules
- 📥 Automatic Model Download — Downloads models from URL with progress tracking and local caching
Installation
```sh
npm install react-native-litert-lm react-native-nitro-modules
```

Expo
Add to your app.json:
```json
{
  "expo": {
    "plugins": ["react-native-litert-lm"],
    "android": {
      "minSdkVersion": 26
    }
  }
}
```

Then create a development build:

```sh
npx expo prebuild
npx expo run:android # Android
npx expo run:ios     # iOS
```

Note: Only ARM devices/simulators are supported. x86_64 Android emulators are not supported.
Bare React Native
```sh
# Android
cd android && ./gradlew clean

# iOS
cd ios && pod install
```

Example App
The example/ directory contains a fully functional test app with a dark-themed diagnostic UI that demonstrates:
- Model downloading with progress tracking
- Text inference (blocking and streaming)
- Multi-turn conversation with context retention
- Performance benchmarking (tokens/sec, latency)
- Real-time memory tracking
- Quick chat interface
Running the Example
Build the library (compiles TypeScript to `lib/`):

```sh
npm run build
```

Install example dependencies:

```sh
cd example
npm install
```

Create a development build and run:

```sh
npx expo prebuild --clean
npx expo run:android # Android
npx expo run:ios     # iOS (requires XCFramework — see "Building the iOS Engine" below)
```

Note: If you change native code (C++/Kotlin/Obj-C++), you must run `npx expo prebuild --clean` again before rebuilding.
Model Management
LiteRT-LM models (like Gemma 3n) are large files (3 GB+) and cannot be bundled into your app binary. They are downloaded at runtime.
Automatic Downloading
The library handles downloading automatically when you pass a URL to loadModel or useModel. Downloads include:
- Progress tracking — real-time download percentage via callbacks
- Local caching — downloaded models are cached and reused across app launches
- Android: app-local temp directory
- iOS: `Library/Caches/litert_models/` (survives app relaunch; reclaimable by iOS under storage pressure)
- HTTPS enforcement — only secure URLs are accepted
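The HTTPS rule can be mirrored client-side before handing a source string to the library. A tiny illustrative helper (not part of the package API):

```typescript
// Local filesystem paths pass through untouched; anything with a URL scheme
// must be HTTPS to match the library's enforcement.
function isAcceptableModelUrl(source: string): boolean {
  if (!source.includes("://")) return true; // treat as a local path
  return source.startsWith("https://");
}
```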
Manual Downloading (Optional)
If you prefer to manage downloads yourself (e.g., using expo-file-system), download the .litertlm file to a local path and pass that path to the library:
```ts
import * as FileSystem from "expo-file-system";

const MODEL_URL =
  "https://huggingface.co/litert-community/gemma-3n-2b-it/resolve/main/model.litertlm";
const localPath = `${FileSystem.documentDirectory}gemma-3n.litertlm`;

async function downloadModel() {
  const info = await FileSystem.getInfoAsync(localPath);
  if (info.exists) return localPath;
  await FileSystem.downloadAsync(MODEL_URL, localPath);
  return localPath;
}
```

Usage
React Hook (Recommended)
The useModel hook manages the full model lifecycle: downloading, loading, inference, and cleanup.
```tsx
import { useModel, GEMMA_3N_E2B_IT_INT4 } from "react-native-litert-lm";
import { Platform } from "react-native";

function App() {
  const {
    model,
    isReady,
    downloadProgress,
    error,
    load, // Manually trigger load
    deleteModel, // Delete cached model file
    memorySummary, // Auto-updated memory stats (if tracking enabled)
  } = useModel(GEMMA_3N_E2B_IT_INT4, {
    backend: Platform.OS === "ios" ? "gpu" : "cpu",
    autoLoad: true, // Default: true. Set false to load manually via load().
    systemPrompt: "You are a helpful assistant.",
    enableMemoryTracking: true,
  });

  if (!isReady) {
    return <Text>Loading... {Math.round(downloadProgress * 100)}%</Text>;
  }

  const generate = async () => {
    const response = await model.sendMessage("Hello!");
    console.log(response);
  };

  return <Button title="Generate" onPress={generate} />;
}
```

Manual Usage
```ts
import { createLLM } from "react-native-litert-lm";

const llm = createLLM();

// Load a model from URL (auto-downloads) or local path
await llm.loadModel("https://example.com/model.litertlm", {
  backend: "gpu",
  systemPrompt: "You are a helpful assistant.",
});

// Generate a response
const response = await llm.sendMessage("What is the capital of France?");
console.log(response);

// Clean up
llm.close();
```

Streaming Generation
```ts
llm.sendMessageAsync("Tell me a story", (token, done) => {
  process.stdout.write(token);
  if (done) console.log("\n--- Done ---");
});
```

Multimodal (Image / Audio)
Note: Multimodal is fully supported on Android. iOS has the code paths implemented, but vision/audio executors may not be available in the current XCFramework build — use `checkMultimodalSupport()` to verify at runtime.
```ts
import { checkMultimodalSupport } from "react-native-litert-lm";

const warning = checkMultimodalSupport();
if (warning) {
  console.warn(warning); // Experimental on iOS
} else {
  // Image input (for vision models like Gemma 3n)
  // Images >1024px are automatically resized to prevent OOM
  const response = await llm.sendMessageWithImage(
    "What's in this image?",
    "/path/to/image.jpg",
  );

  // Audio input
  const transcription = await llm.sendMessageWithAudio(
    "Transcribe this audio",
    "/path/to/audio.wav",
  );
}
```

Performance Stats
```ts
const stats = llm.getStats();
console.log(`Generated ${stats.completionTokens} tokens`);
console.log(`Speed: ${stats.tokensPerSecond.toFixed(1)} tokens/sec`);
console.log(`Time to first token: ${stats.timeToFirstToken.toFixed(0)} ms`);
```

Memory Tracking
The library provides real OS-level memory data — no estimation. It reads directly from mach_task_basic_info (iOS) and Debug.getNativeHeapAllocatedSize() + /proc/self/status (Android).
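Because these are real OS metrics rather than estimates, they can drive runtime policy. A minimal sketch of a backend guard, assuming the MemoryUsage shape documented in the API reference below; the 4 GB threshold is an illustrative assumption, not library behaviour:

```typescript
// MemoryUsage mirrors the shape returned by getMemoryUsage().
interface MemoryUsage {
  nativeHeapBytes: number;
  residentBytes: number;
  availableMemoryBytes: number;
  isLowMemory: boolean;
}

// Illustrative policy: fall back to CPU when the OS reports memory pressure
// or less than ~4 GB of available memory. The threshold is an assumption,
// not a library recommendation.
function pickBackend(usage: MemoryUsage): "cpu" | "gpu" {
  const availableMB = usage.availableMemoryBytes / 1024 / 1024;
  if (usage.isLowMemory || availableMB < 4096) return "cpu";
  return "gpu";
}
```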
Direct Memory Query
```ts
const usage = llm.getMemoryUsage();
console.log(
  `Native heap: ${(usage.nativeHeapBytes / 1024 / 1024).toFixed(1)} MB`,
);
console.log(`RSS: ${(usage.residentBytes / 1024 / 1024).toFixed(1)} MB`);
console.log(
  `Available: ${(usage.availableMemoryBytes / 1024 / 1024).toFixed(1)} MB`,
);
console.log(`Low memory: ${usage.isLowMemory}`);
```

Automatic Tracking with Native Buffers
Enable memory tracking to automatically record snapshots in a native-backed ArrayBuffer after every inference call:
```ts
const llm = createLLM({
  enableMemoryTracking: true,
  maxMemorySnapshots: 256,
});

await llm.loadModel("/path/to/model.litertlm", { backend: "cpu" });
await llm.sendMessage("Hello!");

const summary = llm.memoryTracker!.getSummary();
console.log(
  `Peak RSS: ${(summary.peakResidentBytes / 1024 / 1024).toFixed(1)} MB`,
);
console.log(
  `RSS Delta: ${(summary.residentDeltaBytes / 1024 / 1024).toFixed(1)} MB`,
);
```

Using useModel with Memory Tracking
```ts
const { model, isReady, memorySummary } = useModel(modelUrl, {
  enableMemoryTracking: true,
  maxMemorySnapshots: 100,
});

// memorySummary auto-updates after each inference call
if (memorySummary) {
  console.log(`Current RSS: ${memorySummary.currentResidentBytes}`);
  console.log(`Peak RSS: ${memorySummary.peakResidentBytes}`);
}
```

Standalone Memory Tracker
```ts
import {
  createMemoryTracker,
  createNativeBuffer,
} from "react-native-litert-lm";

const tracker = createMemoryTracker(100);
tracker.record({
  timestamp: Date.now(),
  nativeHeapBytes: 50_000_000,
  residentBytes: 200_000_000,
  availableMemoryBytes: 4_000_000_000,
});

// Access the underlying native buffer (zero-copy transfer to native code)
const buffer = tracker.getNativeBuffer();
```

Supported Models
Download .litertlm models automatically using the exported URL constants, or manually from HuggingFace:
| Constant | Model | Size | Min RAM |
| :--------------------- | :------------------------------------- | :---- | :------ |
| GEMMA_3N_E2B_IT_INT4 | Gemma 3n E2B (Instruction Tuned, Int4) | ~3 GB | 4 GB+ |
Other compatible models (download manually from HuggingFace):
| Model | Size | Min RAM | Notes |
| ------------- | ------- | ------- | --------------------- |
| Gemma 3n E4B | ~4 GB | 8 GB+ | Higher quality |
| Gemma 3 1B | ~1 GB | 4 GB+ | Smallest, fastest |
| Phi-4 Mini | ~2 GB | 4 GB+ | Microsoft's small LLM |
| Qwen 2.5 1.5B | ~1.5 GB | 4 GB+ | Multilingual |
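Given the Min RAM column above, model choice can be gated at runtime. A sketch with a hypothetical lookup table; the keys and the device-RAM probe are assumptions for illustration, not package exports:

```typescript
// Hypothetical mapping from model identifiers to the minimum RAM (GB)
// listed in the tables above.
const MODEL_MIN_RAM_GB: Record<string, number> = {
  "gemma-3n-e2b": 4,
  "gemma-3n-e4b": 8,
  "gemma-3-1b": 4,
  "phi-4-mini": 4,
  "qwen-2.5-1.5b": 4,
};

// Unknown models are rejected rather than assumed to fit.
function fitsDevice(model: string, deviceRamGB: number): boolean {
  return deviceRamGB >= (MODEL_MIN_RAM_GB[model] ?? Infinity);
}
```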
API Reference
createLLM(options?): LiteRTLM
Creates a new LLM inference engine instance.
- `options.enableMemoryTracking` — enable automatic memory snapshot recording
- `options.maxMemorySnapshots` — max number of snapshots to retain (default: 256)
loadModel(path, config?): Promise<void>
Loads a model from a local path or HTTPS URL.
| Parameter | Type | Default | Description |
| --------------------- | -------- | ------- | ----------------------------------------- |
| path | string | — | Absolute path to .litertlm or HTTPS URL |
| config.backend | string | 'gpu' | 'cpu', 'gpu', or 'npu' |
| config.systemPrompt | string | — | System prompt for the model |
| config.temperature | number | 0.7 | Sampling temperature |
| config.topK | number | 40 | Top-K sampling |
| config.topP | number | 0.95 | Top-P (nucleus) sampling |
| config.maxTokens | number | 1024 | Maximum generation length |
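For reference, the defaults from the table assemble into a config object like the following. The LoadConfig name here is illustrative, not necessarily the package's exported type name:

```typescript
// Illustrative shape for loadModel's second argument, with the documented
// defaults filled in explicitly.
interface LoadConfig {
  backend: "cpu" | "gpu" | "npu";
  systemPrompt?: string;
  temperature: number;
  topK: number;
  topP: number;
  maxTokens: number;
}

const config: LoadConfig = {
  backend: "gpu", // default backend
  temperature: 0.7,
  topK: 40,
  topP: 0.95,
  maxTokens: 1024,
};
```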
Backend Options
| Backend | Engine | Speed | Notes |
| ------- | ------------------- | ------- | ---------------------------------------------- |
| 'cpu' | CPU inference | Slowest | Always available, lower RAM requirement |
| 'gpu' | GPU / Metal | Fast | Recommended default |
| 'npu' | NPU / Neural Engine | Fastest | Requires supported hardware; falls back to GPU |
iOS: `'gpu'` uses Metal/MPS and is the recommended backend. The engine automatically tries multiple backend combinations if the primary one fails.
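Applications can also make a fallback order explicit rather than relying on the engine's internal retries. A sketch where the try-order and the tryLoad callback are assumptions for illustration, not library behaviour:

```typescript
type Backend = "npu" | "gpu" | "cpu";

// Try backends from fastest to slowest, starting at the preferred one,
// until one loads. tryLoad stands in for a call like llm.loadModel(...).
async function loadWithFallback(
  tryLoad: (b: Backend) => Promise<void>,
  preferred: Backend = "npu",
): Promise<Backend> {
  const order: Backend[] = ["npu", "gpu", "cpu"];
  for (const b of order.slice(order.indexOf(preferred))) {
    try {
      await tryLoad(b);
      return b;
    } catch {
      // This backend is unavailable; try the next one.
    }
  }
  throw new Error("No backend available");
}
```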
sendMessage(message): Promise<string>
Runs inference synchronously on a background thread. Returns the complete response.
sendMessageAsync(message, callback)
Streaming generation. Callback signature: (token: string, isDone: boolean) => void.
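Since the callback fires per token, a small Promise wrapper (illustrative, not part of the API) can collect the stream into a complete string:

```typescript
// Accumulate streamed tokens into a full response, assuming the
// (token, isDone) callback signature documented above. The llm parameter's
// inline type stands in for the library's engine instance.
function sendMessageStreamed(
  llm: {
    sendMessageAsync(
      msg: string,
      cb: (token: string, done: boolean) => void,
    ): void;
  },
  message: string,
): Promise<string> {
  return new Promise((resolve) => {
    let full = "";
    llm.sendMessageAsync(message, (token, done) => {
      full += token;
      if (done) resolve(full);
    });
  });
}
```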
sendMessageWithImage(message, imagePath): Promise<string>
Send a message with an image (Android only; for vision models like Gemma 3n).
sendMessageWithAudio(message, audioPath): Promise<string>
Send a message with audio (Android only).
getStats(): GenerationStats
Returns performance metrics from the last inference call.
```ts
interface GenerationStats {
  tokensPerSecond: number;
  totalTime: number; // seconds
  timeToFirstToken: number; // seconds
  promptTokens: number;
  completionTokens: number;
  prefillSpeed: number; // tokens/sec
}
```

getMemoryUsage(): MemoryUsage
Returns real OS-level memory usage.
```ts
interface MemoryUsage {
  nativeHeapBytes: number;
  residentBytes: number;
  availableMemoryBytes: number;
  isLowMemory: boolean;
}
```

getHistory(): Message[]
Returns the conversation history.
resetConversation()
Clears conversation context and starts a fresh session.
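getHistory() and resetConversation() combine naturally into a context-bounding guard. A sketch in which the Message shape ({ role, content }) follows the template helpers' input elsewhere in this README, and the turn limit is an arbitrary illustration:

```typescript
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// Minimal slice of the engine surface this helper needs.
interface Session {
  getHistory(): Message[];
  resetConversation(): void;
}

// Reset the session once it exceeds maxTurns user messages; returns
// whether a reset happened. The limit is illustrative, not a library value.
function trimIfTooLong(session: Session, maxTurns = 10): boolean {
  const userTurns = session
    .getHistory()
    .filter((m) => m.role === "user").length;
  if (userTurns > maxTurns) {
    session.resetConversation();
    return true;
  }
  return false;
}
```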
close()
Releases all native resources. Call when the model is no longer needed.
deleteModel(fileName): Promise<void>
Deletes a cached model file from the app's local storage.
Utility Functions
```ts
import {
  checkBackendSupport,
  checkMultimodalSupport,
  getRecommendedBackend,
  applyGemmaTemplate,
  applyPhiTemplate,
  applyLlamaTemplate,
} from "react-native-litert-lm";

// Check if a backend is supported
const warning = checkBackendSupport("npu"); // string | undefined
const mmError = checkMultimodalSupport(); // string | undefined
const backend = getRecommendedBackend(); // 'gpu' | 'cpu'

// Manual prompt formatting (advanced)
const prompt = applyGemmaTemplate(
  [{ role: "user", content: "Hello!" }],
  "You are helpful.",
);
```

Requirements
| Dependency | Version |
| -------------------------- | ------------- |
| React Native | 0.76+ |
| react-native-nitro-modules | 0.35.0+ |
| Android API | 26+ (ARM64) |
| iOS | 15.0+ (ARM64) |
| LiteRT-LM Engine | 0.9.0 |
Platform Support
| Platform | Status | Architecture | Backends |
| -------- | -------- | ------------ | ---------------- |
| Android | ✅ Ready | arm64-v8a | CPU, GPU, NPU |
| iOS | ✅ Ready | arm64 | CPU, GPU (Metal) |
iOS Feature Matrix
| Feature | Status | Notes |
| ---------------------------- | ------ | ----------------------------------------------------- |
| Text inference (blocking) | ✅ | Via LiteRT-LM C API |
| Text inference (streaming) | ✅ | Token-by-token callbacks |
| GPU inference (Metal/MPS) | ✅ | Recommended backend |
| Model download with progress | ✅ | NSURLSession, cached in Caches/ |
| Memory tracking | ✅ | mach_task_basic_info |
| Multi-turn conversation | ✅ | Context retained across turns |
| Multimodal (image/audio) | 🧪 | Code paths exist; vision/audio executors experimental |
| Constrained decoding | ❌ | Requires llguidance Rust runtime |
| Function calling | ❌ | Requires Rust CXX bridge runtime |
Building the iOS Engine
The iOS build uses a Bazel-to-XCFramework pipeline that compiles the LiteRT-LM C engine and all transitive dependencies into a static library (~83 MB).
Prerequisites
- Bazel 7.6.1+ (via Bazelisk recommended)
- Xcode command line tools (`xcode-select --install`)
Build
```sh
./scripts/build-ios-engine.sh
```

This will:
- Clone/checkout LiteRT-LM v0.9.0 source into `.litert-lm-build/`
- Build `//c:engine` for `ios_arm64` and `ios_sim_arm64` via Bazel
- Collect all transitive `.o` files (engine, protobuf, re2, sentencepiece, etc.)
- Compile C/C++ stubs for unavailable Rust dependencies
- Patch `PromptTemplate` to use a simplified template engine (no Rust `MinijinjaTemplate`)
- Merge ~1,900 object files into a static library via `libtool`
- Package into `ios/Frameworks/LiteRTLM.xcframework`
Output
```
ios/Frameworks/LiteRTLM.xcframework/
├── Info.plist
├── ios-arm64/LiteRTLM.framework/            # Device
│   ├── LiteRTLM                             # ~81 MB static library
│   └── Headers/litert_lm_engine.h
└── ios-arm64-simulator/LiteRTLM.framework/  # Simulator
    ├── LiteRTLM                             # ~83 MB static library
    └── Headers/litert_lm_engine.h
```

FFI Stubs
Certain LiteRT-LM features depend on Rust libraries (llguidance, CXX bridge, MinijinjaTemplate) that are not available in the iOS Bazel build. These are replaced with stubs:
| Stub File | Location | Purpose |
| ------------------------------------ | ---------------- | ---------------------------------------- |
| cxx_bridge_stubs.cc | scripts/stubs/ | CXX bridge runtime + Rust FFI type stubs |
| llguidance_stubs.c | scripts/stubs/ | llguidance constrained decoding C API |
| gemma_model_constraint_provider.cc | scripts/stubs/ | Gemma constraint provider factory |
Additionally, PromptTemplate is patched at build time to use a simplified C++ template formatter instead of the Rust MinijinjaTemplate, which avoids all Rust FFI calls during conversation setup.
Text inference works fully without these Rust components. Only constrained decoding, function calling parsers, and advanced Jinja2 template features are affected.
Architecture
```
┌─────────────────────────────────────────────────┐
│ React Native (TypeScript)                       │
│ useModel() / createLLM() / sendMessage()        │
├─────────────────────────────────────────────────┤
│ Nitro Modules JSI Bridge                        │
├──────────────────────┬──────────────────────────┤
│ Android (Kotlin)     │ iOS (C++)                │
│ HybridLiteRTLM.kt    │ HybridLiteRTLM.cpp       │
│ litertlm-android     │ LiteRTLM C API           │
│ AAR (GPU delegate)   │ XCFramework (Metal)      │
└──────────────────────┴──────────────────────────┘
```

- Android: Kotlin (`HybridLiteRTLM.kt`) interfacing with the `litertlm-android` AAR.
- iOS: C++ (`HybridLiteRTLM.cpp`) interfacing with the LiteRT-LM C API via a prebuilt `LiteRTLM.xcframework`. Platform-specific code (model downloading, file management) is in Objective-C++ (`ios/IOSDownloadHelper.mm`).
For contributors: Changes to `cpp/HybridLiteRTLM.cpp` do not affect Android. Feature changes must be applied to both the Kotlin and C++ implementations.
License
The code in this repository is licensed under the MIT License.
⚠️ AI Model Disclaimer
This library is an execution engine for on-device LLMs. The AI models themselves are not distributed with this package and have their own licenses:
- Gemma (Google): Gemma Terms of Use
- Llama 3 (Meta): Llama 3.2 Community License
- Qwen (Alibaba): Apache 2.0
- Phi (Microsoft): MIT License
By downloading and using these models, you agree to their respective licenses and acceptable use policies. The author of react-native-litert-lm takes no responsibility for model outputs or applications built with them.
