expo-fast-llm

v0.1.0

expo-litert-lm

Run Gemma and other LLMs fully on-device in your Expo / React Native app — no API keys, no cloud, no per-token costs, complete user privacy.

Built on Google's LiteRT-LM runtime with automatic backend selection (CPU / GPU / NPU), streaming, multimodal support, cancellation, and a typed React hook.

Status: Android only. iOS support is on the roadmap.


✨ Highlights

  • Dead-simple chat API: one line to ask a question.
  • Streaming: AsyncIterable<string> — no manual DeviceEventEmitter wiring.
  • React hook: useLiteRT() gives you reactive engine state.
  • Smart backend selection: automatic fallback from GPU → NPU → CPU, with device-aware preferences for known-buggy hardware (Pixel Tensor).
  • Multimodal: pass an image alongside your prompt for vision models.
  • Persistence: save the user's model choice so the app reloads instantly on next launch.
  • Cancellable: stop a long generation at any time.
  • Fully typed: TypeScript everywhere.
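The backend ordering described above can be pictured as a pure function. This is a hypothetical sketch, not the module's Kotlin code. `probeOrder` and its `gpuKnownBuggy` flag are made-up names, and the flag stands in for whatever device detection (e.g. for Pixel Tensor SoCs) the module actually performs. Demoting GPU to the end of the list is one assumed reading of "device-aware preferences".

```typescript
type Backend = 'CPU' | 'GPU' | 'NPU';

// Default fallback chain from the Highlights list: GPU → NPU → CPU.
const DEFAULT_ORDER: readonly Backend[] = ['GPU', 'NPU', 'CPU'];

/**
 * Hypothetical: returns the order in which backends would be tried.
 * - requested:     the backend the caller asked for (ModelConfig.backend, default 'GPU')
 * - gpuKnownBuggy: assumed device flag (e.g. Pixel Tensor); demotes GPU to last
 * - strictBackend: mirrors ModelConfig.strictBackend; true means no fallback
 */
function probeOrder(
  requested: Backend = 'GPU',
  gpuKnownBuggy = false,
  strictBackend = false,
): Backend[] {
  if (strictBackend) return [requested];

  // Requested backend first, then the rest of the default chain in order.
  const order: Backend[] = [requested, ...DEFAULT_ORDER.filter((b) => b !== requested)];

  // On hardware with known GPU issues, keep GPU only as a last resort.
  return gpuKnownBuggy ? [...order.filter((b) => b !== 'GPU'), 'GPU'] : order;
}
```

A list like this is what getAvailableBackends() describes as "the device-appropriate probe order".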

🚀 Installation

npx expo install expo-litert-lm

Add the plugin to your app.json / app.config.js:

{
  "expo": {
    "plugins": ["expo-litert-lm"]
  }
}

Then prebuild and run a dev client (this module ships native Android code):

npx expo prebuild --clean
npx expo run:android

📦 Getting a model

LiteRT-LM expects a .task or .tflite file. The most common choice is a Gemma model:

  1. Download from LiteRT Community on Hugging Face (e.g. gemma-3n-E2B-it-int4.task).
  2. Place it on the device — typically via expo-file-system by downloading to FileSystem.documentDirectory.
  3. Pass the resulting absolute path to LiteRTLM.initialize({ modelPath }).

Paths with a file:// URI prefix (as returned by expo-file-system) are accepted; the module strips the prefix automatically.
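How the module normalises paths internally is not documented here. As a minimal sketch, assuming the native side wants a bare absolute filesystem path, accepting file:// URIs typically looks like this. `toAbsolutePath` is a hypothetical name.

```typescript
/**
 * Hypothetical: turn a file:// URI (e.g. from FileSystem.documentDirectory)
 * or a plain absolute path into a bare absolute path for native file APIs.
 */
function toAbsolutePath(pathOrUri: string): string {
  const FILE_SCHEME = 'file://';

  // Strip the scheme only when present; plain absolute paths pass through.
  const stripped = pathOrUri.startsWith(FILE_SCHEME)
    ? pathOrUri.slice(FILE_SCHEME.length)
    : pathOrUri;

  // URIs percent-encode characters such as spaces (%20); the filesystem does not.
  // Note: decodeURIComponent throws a URIError on a malformed escape like "%zz".
  const decoded = decodeURIComponent(stripped);

  if (!decoded.startsWith('/')) {
    throw new Error(`Expected an absolute path, got "${pathOrUri}"`);
  }
  return decoded;
}
```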


💻 Quick start — the simplest possible example

import { useEffect, useState } from 'react';
import { Button, Text, View } from 'react-native';
import { useLiteRT } from 'expo-litert-lm';

export default function App() {
  const { isReady, isInitializing, initialize, generate } = useLiteRT();
  const [reply, setReply] = useState('');

  useEffect(() => {
    initialize({ modelPath: '/sdcard/Download/gemma.task' });
  }, []);

  return (
    <View style={{ padding: 24, marginTop: 60 }}>
      <Text>Status: {isReady ? '✅ ready' : isInitializing ? '⏳ loading' : '⌛'}</Text>
      <Button
        title="Ask"
        disabled={!isReady}
        onPress={async () => setReply(await generate('Why is the sky blue?'))}
      />
      <Text>{reply}</Text>
    </View>
  );
}

🌊 Streaming

generateStream and chatStream return an AsyncIterable<string>. Just use for await:

// Inside a component:
const { generateStream } = useLiteRT();
const [reply, setReply] = useState('');

const askStreaming = async () => {
  setReply('');
  for await (const chunk of generateStream('Write a haiku about the moon')) {
    setReply((r) => r + chunk);
  }
};

Need to stop mid-generation? Either break from the loop or call .cancel() on the returned iterable:

const stream = generateStream('Long answer please');
setTimeout(() => stream.cancel(), 1000);
for await (const chunk of stream) { /* ... */ }
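The README doesn't show how generateStream wraps native events. The hypothetical `ChunkStream` below sketches one common way to turn per-chunk callbacks into an `AsyncIterable<string>` with `cancel()`. Both an explicit `cancel()` and breaking out of `for await` end the stream. `onCancel` is an assumed hook where a real implementation would ask native code to stop generating.

```typescript
/**
 * Hypothetical ChunkStream: adapts push-style callbacks (one per generated chunk)
 * into an AsyncIterable<string> with cancel(), the shape generateStream returns.
 */
class ChunkStream implements AsyncIterable<string> {
  private buffer: string[] = [];
  private done = false;
  private failure: { error: unknown } | null = null;
  private waiter: (() => void) | null = null;

  // onCancel is where a real implementation would tell native code to stop.
  constructor(private readonly onCancel: () => void = () => {}) {}

  /** Called for each chunk the engine emits. */
  push(chunk: string): void {
    if (this.done) return;
    this.buffer.push(chunk);
    this.wake();
  }

  /** Called when generation finishes, optionally with an error. */
  end(error?: unknown): void {
    if (this.done) return;
    this.done = true;
    if (error !== undefined) this.failure = { error };
    this.wake();
  }

  /** Stops the stream: notifies the producer and drops unread chunks. */
  cancel(): void {
    if (this.done) return;
    this.onCancel();
    this.buffer.length = 0;
    this.end();
  }

  private wake(): void {
    const resume = this.waiter;
    this.waiter = null;
    resume?.();
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<string, void, undefined> {
    try {
      while (true) {
        if (this.buffer.length > 0) {
          yield this.buffer.shift() as string;
        } else if (this.done) {
          if (this.failure) throw this.failure.error;
          return;
        } else {
          // Park until push(), end() or cancel() wakes the consumer.
          await new Promise<void>((resolve) => {
            this.waiter = () => resolve();
          });
        }
      }
    } finally {
      // Also runs on normal completion, but cancel() is a no-op once done, so only
      // an early `break` (or an exception in the consumer's loop body) cancels.
      this.cancel();
    }
  }
}
```

In this sketch a producer would create one `ChunkStream` per request, call `push` from its chunk event listener and `end` from its completion event; expo-litert-lm's actual wiring may differ.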

💬 Multi-turn chat

const { chat } = useLiteRT();

const reply = await chat(
  [
    { role: 'user', content: 'My name is Alice.' },
    { role: 'model', content: 'Nice to meet you, Alice!' },
    { role: 'user', content: 'What did I just tell you my name was?' },
  ],
  { systemPrompt: 'You are concise.' },
);

The last message must have role user. Prior turns are folded into the system prompt as plain text, so you get history-aware answers without burning tokens regenerating the assistant's prior responses.
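The folding step can be sketched as follows. The module's actual prompt format is not documented in this README, so `foldHistory`, the `ChatMessage` type, and the "User:" / "Model:" transcript labels are assumptions. The point is that only the final user turn is sent as the prompt, while earlier turns travel as context.

```typescript
type ChatRole = 'user' | 'model';

interface ChatMessage {
  role: ChatRole;
  content: string;
}

/**
 * Hypothetical foldHistory: splits a message list into the final user prompt and
 * a system prompt that carries the earlier turns as plain text.
 */
function foldHistory(
  messages: ChatMessage[],
  systemPrompt = '',
): { systemPrompt: string; prompt: string } {
  const last = messages[messages.length - 1];
  // Mirrors the documented rule: the last message must have role 'user'.
  if (!last || last.role !== 'user') {
    throw new Error("The last message must have role 'user'.");
  }

  // Every turn except the last becomes one line of context (assumed format).
  const transcript = messages
    .slice(0, -1)
    .map((m) => `${m.role === 'user' ? 'User' : 'Model'}: ${m.content}`)
    .join('\n');

  const context = transcript ? `Conversation so far:\n${transcript}` : '';
  const folded = [systemPrompt, context].filter((part) => part.length > 0).join('\n\n');

  return { systemPrompt: folded, prompt: last.content };
}
```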


🧵 Persistent conversations (recommended for chat UIs)

chat() is stateless: every call spins up a fresh native conversation and folds the prior turns into the system prompt. That works for short interactions, but for a real chat UI you want the engine's native chat template tracking state across turns. Use createConversation to get a long-lived handle:

import { LiteRTLM } from 'expo-litert-lm';

const conversation = await LiteRTLM.createConversation({
  systemPrompt: 'You are a helpful, concise assistant.',
});

const a = await conversation.sendMessage('My name is Alice.');
const b = await conversation.sendMessage('What did I just tell you?');

// process.stdout does not exist in React Native; accumulate the chunks instead.
let joke = '';
for await (const chunk of conversation.sendMessageStream('Tell me a joke')) {
  joke += chunk;
}

await conversation.close(); // or just let GC release it

The native Conversation is wired through an Expo SharedObject: it stays in memory across calls and is released when JS drops the reference (or when you explicitly close()). Multimodal works the same way:

await conversation.sendMessage('What is in this image?', {
  imageFilePath: '/sdcard/photo.jpg',
});

🖼 Vision (multimodal)

const reply = await generate('What is in this image?', {
  imageFilePath: '/sdcard/Download/photo.jpg',
});

Make sure the model you loaded is multimodal (e.g. Gemma 3n). Configure the image pipeline at init time:

await LiteRTLM.initialize({
  modelPath: '/sdcard/gemma-3n.task',
  visionBackend: 'GPU',
  maxNumImages: 1,
});

🧠 Reactive engine status

useLiteRT() exposes a live status object. Native events drive re-renders; no polling.

const { status } = useLiteRT();
// status.state: 'NotConfigured' | 'Initializing' | 'Ready' | 'Closing' | 'Error'
// status.actualBackend: which backend actually loaded (CPU/GPU/NPU)
// status.usedFallback: true if we fell back from the requested backend
// status.errorMessage: populated when state === 'Error'

Or outside React:

const sub = LiteRTLM.onStatusChange((s) => console.log(s.state));
// later:
sub.remove();
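Combining getStatus() and onStatusChange() gives a way to await readiness before calling chat() or generate() (see the "Engine is not ready" row in Troubleshooting). `waitForReady` and `StatusSource` are hypothetical names; the interface is typed structurally so that `LiteRTLM` or a test fake can be passed in, assuming the method shapes listed in the API reference.

```typescript
type EngineState = 'NotConfigured' | 'Initializing' | 'Ready' | 'Closing' | 'Error';

// Subset of EngineStatus from the Types section.
interface EngineStatus {
  state: EngineState;
  errorMessage?: string;
  usedFallback: boolean;
}

// The two methods the helper relies on (assumed shapes, per the API reference).
interface StatusSource {
  getStatus(): Promise<EngineStatus>;
  onStatusChange(fn: (status: EngineStatus) => void): { remove(): void };
}

/** Hypothetical helper: resolves on 'Ready', rejects on 'Error' or timeout. */
function waitForReady(source: StatusSource, timeoutMs = 60_000): Promise<EngineStatus> {
  return new Promise<EngineStatus>((resolve, reject) => {
    let settled = false;
    let sub: { remove(): void } | undefined;

    // Settle exactly once and always clean up the timer and the listener.
    const finish = (settle: () => void): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      sub?.remove();
      settle();
    };

    const handle = (status: EngineStatus): void => {
      if (status.state === 'Ready') {
        finish(() => resolve(status));
      } else if (status.state === 'Error') {
        finish(() => reject(new Error(status.errorMessage ?? 'Engine failed to initialize')));
      }
    };

    const timer = setTimeout(
      () => finish(() => reject(new Error(`Engine not ready after ${timeoutMs} ms`))),
      timeoutMs,
    );

    sub = source.onStatusChange(handle);
    // If an event fired synchronously and already settled us, drop the listener now.
    if (settled) sub.remove();

    // Also check the current state, in case initialization finished before we subscribed.
    source.getStatus().then(handle, (e: unknown) => finish(() => reject(e)));
  });
}
```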

💾 Persisting the user's model choice

import { LiteRTLM } from 'expo-litert-lm';

// Save once after the user picks a model
await LiteRTLM.saveConfig({
  modelPath: '/sdcard/gemma.task',
  displayName: 'Gemma 2B int4',
  backend: 'GPU',
  temperature: 0.7,
});

// On next launch
const status = await LiteRTLM.initializeFromStoredConfig().catch(() => null);
if (!status) {
  // Prompt the user to choose a model
}

📚 Full API reference

Lifecycle

| Method | Description |
|---|---|
| initialize(config: ModelConfig): Promise<EngineStatus> | Loads a model, automatically falling back across backends unless strictBackend is true. |
| initializeFromStoredConfig(): Promise<EngineStatus> | Loads the model from the previously saved config. |
| getStatus(): Promise<EngineStatus> | One-shot status read. |
| getAvailableBackends(): Promise<Backend[]> | Returns the device-appropriate probe order. |
| onStatusChange(fn): StatusSubscription | Subscribes to status changes. |
| close(): Promise<boolean> | Releases native resources. Idempotent. |
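initialize() is documented to fall back across backends and to report actualBackend and usedFallback in EngineStatus. The loop below is a minimal sketch of that behaviour, assuming the candidates arrive already ordered (for instance, as returned by getAvailableBackends()). `initializeWithFallback` and `load` are hypothetical names; the error text mirrors the "All backend candidates failed" row in Troubleshooting.

```typescript
type Backend = 'CPU' | 'GPU' | 'NPU';

interface FallbackResult<E> {
  engine: E;
  actualBackend: Backend; // which backend actually loaded
  usedFallback: boolean;  // true if the first (requested) candidate failed
}

/**
 * Hypothetical: try each candidate in order and keep the first that loads.
 * `load` stands in for the native engine constructor.
 */
async function initializeWithFallback<E>(
  candidates: readonly Backend[],
  load: (backend: Backend) => Promise<E>,
): Promise<FallbackResult<E>> {
  const failures: string[] = [];
  let attempt = 0;
  for (const backend of candidates) {
    try {
      const engine = await load(backend);
      return { engine, actualBackend: backend, usedFallback: attempt > 0 };
    } catch (e) {
      // Remember why this backend failed, then move on to the next one.
      failures.push(`${backend}: ${e instanceof Error ? e.message : String(e)}`);
    }
    attempt++;
  }
  throw new Error(`All backend candidates failed (${failures.join('; ')})`);
}
```

With strictBackend set, the candidate list would contain only the requested backend, so a failure surfaces immediately instead of falling back.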

Inference

| Method | Description |
|---|---|
| generate(prompt, opts?) | Single-turn. Returns full response. |
| generateStream(prompt, opts?) | Single-turn. Returns AsyncIterable<string> with .cancel(). |
| chat(messages, opts?) | Multi-turn. Returns full response. |
| chatStream(messages, opts?) | Multi-turn streaming. |
| createConversation(opts?) | Persistent multi-turn conversation backed by a native SharedObject. Returns a Conversation with sendMessage / sendMessageStream / close. |
| cancel(): Promise<boolean> | Cancels the current in-flight inference. |
| benchmark(prompt?): Promise<BenchmarkResult> | Latency / approx-tps measurement. |
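benchmark() reports latency and an approximate tokens-per-second figure, but the README does not say how the approximation is computed. As a hypothetical sketch, any stream like the one generateStream returns can be timed directly. `measureStream` is a made-up name, and treating one streamed chunk as one token is an assumption.

```typescript
interface StreamTiming {
  timeToFirstChunkMs: number;
  totalMs: number;
  chunks: number;
  chars: number;
  approxChunksPerSec: number; // decode-phase rate; chunks used as a proxy for tokens
}

/** Hypothetical: time any AsyncIterable<string>. `now` is injectable for testing. */
async function measureStream(
  stream: AsyncIterable<string>,
  now: () => number = () => Date.now(),
): Promise<StreamTiming> {
  const start = now();
  let firstAt: number | null = null;
  let chunks = 0;
  let chars = 0;

  for await (const chunk of stream) {
    if (firstAt === null) firstAt = now();
    chunks++;
    chars += chunk.length;
  }

  const end = now();
  // Rate is measured after the first chunk, so prompt processing is excluded.
  const decodeMs = firstAt === null ? 0 : end - firstAt;
  return {
    timeToFirstChunkMs: (firstAt ?? end) - start,
    totalMs: end - start,
    chunks,
    chars,
    approxChunksPerSec: decodeMs > 0 ? ((chunks - 1) * 1000) / decodeMs : 0,
  };
}
```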

Persistence

| Method | Description |
|---|---|
| saveConfig(config) | Persist model config to SharedPreferences. |
| loadConfig() | Returns the persisted config, or null. |
| clearConfig() | Erase persisted config. |

Types

See src/types.ts:

type Backend = 'CPU' | 'GPU' | 'NPU';

interface ModelConfig {
  modelPath: string;        // required
  displayName?: string;
  backend?: Backend;        // default 'GPU'
  visionBackend?: Backend;
  audioBackend?: Backend;
  maxNumImages?: number;    // default 1
  maxTokens?: number;       // default 1024
  topK?: number;            // default 40
  topP?: number;            // default 0.95
  temperature?: number;     // default 0.7
  strictBackend?: boolean;  // default false; if true, no fallback
}

interface EngineStatus {
  state: 'NotConfigured' | 'Initializing' | 'Ready' | 'Closing' | 'Error';
  modelConfig?: ModelConfig;
  errorMessage?: string;
  initDurationMs?: number;
  actualBackend?: Backend;
  usedFallback: boolean;
}
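The defaults noted in the ModelConfig comments can be made concrete with a small resolver. `resolveConfig` is a hypothetical name, and the module presumably applies these defaults on the native side. The sketch reuses only the documented default values, and its range checks are labelled as assumptions. Using `??` rather than `||` matters because 0 is a legitimate temperature.

```typescript
type Backend = 'CPU' | 'GPU' | 'NPU';

// Subset of ModelConfig above: only the fields that have documented defaults.
interface ModelConfig {
  modelPath: string;
  backend?: Backend;
  maxNumImages?: number;
  maxTokens?: number;
  topK?: number;
  topP?: number;
  temperature?: number;
  strictBackend?: boolean;
}

type ResolvedConfig = Required<ModelConfig>;

/** Hypothetical helper: apply the documented defaults and sanity-check values. */
function resolveConfig(config: ModelConfig): ResolvedConfig {
  if (!config.modelPath) throw new Error('modelPath is required');

  // `??` (not `||`) so explicit zeros such as temperature: 0 are preserved.
  const resolved: ResolvedConfig = {
    modelPath: config.modelPath,
    backend: config.backend ?? 'GPU',
    maxNumImages: config.maxNumImages ?? 1,
    maxTokens: config.maxTokens ?? 1024,
    topK: config.topK ?? 40,
    topP: config.topP ?? 0.95,
    temperature: config.temperature ?? 0.7,
    strictBackend: config.strictBackend ?? false,
  };

  // These range checks are assumptions about sensible values, not documented limits.
  if (!Number.isInteger(resolved.maxTokens) || resolved.maxTokens <= 0) {
    throw new RangeError(`maxTokens must be a positive integer, got ${resolved.maxTokens}`);
  }
  if (resolved.topP <= 0 || resolved.topP > 1) {
    throw new RangeError(`topP must be in (0, 1], got ${resolved.topP}`);
  }
  if (resolved.temperature < 0) {
    throw new RangeError(`temperature must be >= 0, got ${resolved.temperature}`);
  }
  return resolved;
}
```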

🛠 Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| MODEL_NOT_FOUND | Wrong path / missing file | Verify with FileSystem.getInfoAsync(path) |
| "All backend candidates failed" | Model file corrupted or incompatible | Re-download the model |
| OOM / crash on first inference | Model larger than device RAM permits | Use a smaller int4-quantized model |
| GPU crashes on Pixel 8 / 9 | Known OpenCL / Tensor SoC issues | Plugin auto-falls back to CPU; no action needed |
| App freezes during streaming | Long generation; user wants to stop | Call stream.cancel() or LiteRTLM.cancel() |
| Engine is not ready on chat() | Called before initialize() resolved | Await initialization or check status.state === 'Ready' |
| APK too large | Model bundled in assets | Download at runtime instead of bundling |

Building

npm install
npm run build      # compile TS
npm test           # run jest
npm run lint

🏗 Architecture

The Android side is layered for testability and clarity:

┌──────────────────────────────────────────────────────────────┐
│  LiteRtLMModule  (React Native bridge / thin shim)           │
├──────────────────────────────────────────────────────────────┤
│  LiteRtLlmProvider           LiteRtConversationContext       │
│  (stateless chat / stream)   (SharedObject, multi-turn)      │
├──────────────────────────────────────────────────────────────┤
│  LiteRtEngineManager (lifecycle, fallback, mutex, init cache)│
├──────────────────────────────────────────────────────────────┤
│  LiteRtConfigStore   (SharedPreferences)                     │
├──────────────────────────────────────────────────────────────┤
│  Google LiteRT-LM SDK                                        │
└──────────────────────────────────────────────────────────────┘

All public Kotlin classes carry KDoc — open them in Android Studio for inline reference.
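The Kotlin LiteRtEngineManager is not reproduced here. As a language-consistent illustration only, this hypothetical TypeScript `EngineManager` sketches what "mutex, init cache" in the diagram usually means. Concurrent initialize calls for the same model share one in-flight load, loads for different models never overlap, and a failed load is evicted so it can be retried. It deliberately omits closing the previously loaded engine, which a real manager would have to do.

```typescript
/**
 * Hypothetical EngineManager: an illustration of the "mutex, init cache"
 * responsibilities, not a port of the Kotlin class.
 */
class EngineManager<E> {
  // Tail of the operation queue; every load is chained after it (the "mutex").
  private tail: Promise<unknown> = Promise.resolve();
  // Most recent load, keyed by model path (the "init cache").
  private cached: { modelPath: string; engine: Promise<E> } | null = null;

  constructor(private readonly load: (modelPath: string) => Promise<E>) {}

  initialize(modelPath: string): Promise<E> {
    // Same model already loaded or loading: share that promise instead of reloading.
    if (this.cached && this.cached.modelPath === modelPath) return this.cached.engine;

    // Start only after the previous operation settles, so loads never overlap.
    const engine = this.tail.then(() => this.load(modelPath));
    this.tail = engine.catch(() => undefined);
    this.cached = { modelPath, engine };

    // A failed load is evicted so the next call retries instead of replaying the error.
    engine.catch(() => {
      if (this.cached?.engine === engine) this.cached = null;
    });
    return engine;
  }
}
```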


📝 License

MIT