@ondeinference/react-native

v1.1.0

On-device LLM inference for React Native. Run Qwen 2.5 models locally with Metal on iOS, CPU on Android. No cloud, no API key.

Readme


Run Qwen 2.5 models directly on the device: no server, no API key, and no user data leaving the phone. The SDK page is the quickest place to check install details and platform notes, and Onde CLI is useful if you want to verify model downloads or GGUF output before you ship an app build.

The model downloads from Hugging Face the first time you load it, then runs locally after that. The 1.5B model is about 941 MB. On iPhone, Metal makes it feel surprisingly fast. On Android, it runs on CPU, so it is slower, but it still works well enough for local chat.

Installation

npx expo install @ondeinference/react-native

Quick start

import { OndeChatEngine, userMessage } from "@ondeinference/react-native";

// Picks the default model for the device:
//   iOS     → Qwen 2.5 1.5B (~941 MB, Metal)
//   Android → Qwen 2.5 1.5B (~941 MB, CPU)
const seconds = await OndeChatEngine.loadDefaultModel(
  "You are a helpful assistant."
);

const reply = await OndeChatEngine.sendMessage("Hello!");
console.log(reply.text);

// One-shot — doesn't touch conversation history
const expanded = await OndeChatEngine.generate(
  [userMessage("Expand: a cat in space")],
  { temperature: 0.0 }
);

await OndeChatEngine.unloadModel();

Platforms

| Platform | Backend | Default model |
|----------|---------|---------------|
| iOS      | Metal   | Qwen 2.5 1.5B (~941 MB) |
| Android  | CPU     | Qwen 2.5 1.5B (~941 MB) |

API

OndeChatEngine

| Method | Returns | What it does |
|--------|---------|--------------|
| loadDefaultModel(systemPrompt?, sampling?) | Promise<number> | Load the platform default. Returns load time in seconds. |
| loadModel(config, systemPrompt?, sampling?) | Promise<number> | Load a specific GGUF model. |
| unloadModel() | Promise<string \| null> | Drop the model, free memory. Returns the model name. |
| isLoaded() | boolean | Is anything loaded right now? |
| info() | Promise<EngineInfo> | Status, model name, memory, history length. |
| sendMessage(message) | Promise<InferenceResult> | Chat turn. Appends to history automatically. |
| generate(messages, sampling?) | Promise<InferenceResult> | One-shot. History stays untouched. |
| setSystemPrompt(prompt) | void | Replace the system prompt. |
| clearSystemPrompt() | void | Remove it. |
| setSampling(config) | void | Swap sampling params. |
| history() | Promise<ChatMessage[]> | Full conversation so far. |
| clearHistory() | number | Wipe it. Returns how many messages were removed. |
| pushHistory(message) | void | Inject a message without running inference. |
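Putting a few of these together: a minimal sketch of a session that seeds history, sends a turn, and inspects engine state. It uses only the methods listed above, assumes a model is already loaded, and assumes pushHistory accepts the messages built by the userMessage / assistantMessage helpers shown in the next section. The exact EngineInfo and ChatMessage field names aren't spelled out here, so the objects are logged whole.

import {
  OndeChatEngine,
  userMessage,
  assistantMessage,
} from "@ondeinference/react-native";

OndeChatEngine.setSystemPrompt("Answer in one short sentence.");

// Restore earlier turns without running inference.
OndeChatEngine.pushHistory(userMessage("What is GGUF?"));
OndeChatEngine.pushHistory(assistantMessage("A binary file format for quantized models."));

const reply = await OndeChatEngine.sendMessage("And who maintains it?");
console.log(reply.text);

console.log(await OndeChatEngine.info());     // status, model name, memory, history length
console.log(await OndeChatEngine.history());  // full ChatMessage[] so far

const removed = OndeChatEngine.clearHistory();
console.log(`Cleared ${removed} messages`);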

Helpers

import {
  defaultModelConfig,     // platform-aware (1.5B on mobile, 3B on desktop)
  qwen251_5bConfig,       // force 1.5B (~941 MB)
  qwen253bConfig,         // force 3B (~1.93 GB)
  defaultSamplingConfig,  // temp=0.7, top_p=0.95, max_tokens=512
  deterministicSamplingConfig,  // temp=0.0
  mobileSamplingConfig,   // temp=0.7, max_tokens=128
  systemMessage,
  userMessage,
  assistantMessage,
} from "@ondeinference/react-native";
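With these helpers and loadModel(config, systemPrompt?, sampling?) from the API table, picking a specific model and sampling setup might look like the sketch below. One assumption to call out: the import list above doesn't show whether the *Config exports are plain objects or factory functions; this sketch treats them as plain objects.

import {
  OndeChatEngine,
  qwen253bConfig,              // assumption: a ready-made model config object
  deterministicSamplingConfig, // assumption: a ready-made sampling object (temp=0.0)
  mobileSamplingConfig,
} from "@ondeinference/react-native";

// Force the 3B model (~1.93 GB) with deterministic sampling.
const seconds = await OndeChatEngine.loadModel(
  qwen253bConfig,
  "You are a terse assistant.",
  deterministicSamplingConfig
);
console.log(`3B model loaded in ${seconds}s`);

// Sampling can be swapped later without reloading the model.
OndeChatEngine.setSampling(mobileSamplingConfig); // e.g. cap output at 128 tokens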

Example app

There is a working chat app in example/:

cd example
npm install
npx expo run:ios

It is a single-file example, about 290 lines, and it covers loading, chat, status, history management, and error handling.

Building from source

You need Rust and the right cross-compilation targets.

# iOS
rustup target add aarch64-apple-ios aarch64-apple-ios-sim
./scripts/build-rust.sh ios

# Android (set ANDROID_NDK_HOME first)
rustup target add aarch64-linux-android armv7-linux-androideabi x86_64-linux-android i686-linux-android
./scripts/build-rust.sh android

The script builds the Rust FFI bridge in rust/, then copies the static library for iOS or the shared libraries for Android into the right places under ios/ and android/.

How it fits together

TypeScript  →  Expo Module (Swift / Kotlin)  →  Rust C FFI  →  onde crate  →  mistral.rs
                @_silgen_name (iOS)                               ↓
                JNI external (Android)                     Metal / CPU

The native module talks to Rust through extern "C" functions. Complex types cross the boundary as JSON strings, and the TypeScript layer handles camelCase ↔ snake_case conversion. A global tokio::Runtime, created once, runs the async inference work.
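As an illustration of that boundary (a hypothetical sketch, not the package's actual internals; every name below is made up for the example), the TypeScript side might serialize a camelCase config into the snake_case JSON the Rust side expects:

// Hypothetical helpers: none of these names are part of the public API.
type SamplingConfig = { temperature: number; topP: number; maxTokens: number };

function toSnakeCase(key: string): string {
  return key.replace(/([A-Z])/g, "_$1").toLowerCase();
}

function toNativeJson(config: SamplingConfig): string {
  const snakeCased = Object.fromEntries(
    Object.entries(config).map(([key, value]) => [toSnakeCase(key), value])
  );
  // Produces e.g. {"temperature":0.7,"top_p":0.95,"max_tokens":512},
  // which the extern "C" layer receives as a single JSON string.
  return JSON.stringify(snakeCased);
}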

License

Onde is dual-licensed under MIT and Apache 2.0. You can use either one.

© 2026 Splitfire AB

