expo-apple-llm
v0.2.0
Expo module for Apple's on-device LLM. Wraps the Foundation Models framework (iOS 26+) — one-shot generate() and multi-turn LanguageModelSession, no network, no API key.
Requirements
- iOS 26.0 or later (on older iOS, isAvailable() returns false and calls throw)
- Xcode 26 SDK (Swift 6 compiler) for building
- Expo SDK 54+
- Apple Intelligence–capable device with the feature enabled in Settings
The module uses #if compiler(>=6.0) so older Swift compilers still build (APIs become no-ops at runtime). isAvailable() reflects SystemLanguageModel.default.isAvailable — it is false on ineligible hardware, when Apple Intelligence is disabled, or while the model is still downloading.
Install
npm install expo-apple-llm
# or
bun add expo-apple-llm
Then rebuild the native iOS app:
npx expo prebuild
npx expo run:ios
Usage
Availability check
import { isAvailable, unavailabilityReason } from "expo-apple-llm";
if (!isAvailable()) {
  switch (unavailabilityReason()) {
    case "osTooOld":                    /* iOS < 26 */ break;
    case "deviceNotEligible":           /* hardware lacks Apple Intelligence */ break;
    case "appleIntelligenceNotEnabled": /* ask user to enable in Settings */ break;
    case "modelNotReady":               /* still downloading; retry later */ break;
    case "nativeModuleNotLoaded":       /* wrong platform or not rebuilt */ break;
    case "unknown":                     /* unmapped new case */ break;
  }
}
Always wrap calls in Platform.OS === "ios"; this module is iOS-only.
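The "modelNotReady" case above suggests retrying later. A minimal sketch of that retry, assuming a hypothetical helper named waitUntilAvailable (it is not part of expo-apple-llm) and arbitrary defaults of 5 attempts, 2 seconds apart. The two check functions are passed in rather than imported, so the helper does not depend on the native module being loaded:

```typescript
// Mirrors the UnavailabilityReason union listed in this README's API section.
type UnavailabilityReason =
  | "deviceNotEligible"
  | "appleIntelligenceNotEnabled"
  | "modelNotReady"
  | "osTooOld"
  | "nativeModuleNotLoaded"
  | "unknown";

// Hypothetical helper, not provided by expo-apple-llm.
// Resolves to null once the model is available, otherwise to the blocking reason.
async function waitUntilAvailable(
  checks: {
    isAvailable: () => boolean;
    unavailabilityReason: () => UnavailabilityReason | null;
  },
  { attempts = 5, delayMs = 2000 }: { attempts?: number; delayMs?: number } = {}
): Promise<UnavailabilityReason | null> {
  for (let i = 0; i < attempts; i++) {
    if (checks.isAvailable()) return null; // ready to use

    const reason = checks.unavailabilityReason();
    // Only a pending download can resolve on its own; other reasons need
    // user action or a rebuild, so polling would not help.
    if (reason !== "modelNotReady") return reason ?? "unknown";

    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return "modelNotReady"; // still downloading after all attempts
}

// Possible wiring (assumes the imports shown in the availability check above):
//   const blocked = await waitUntilAvailable({ isAvailable, unavailabilityReason });
//   if (blocked) { /* show a message based on `blocked` */ }
```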
One-shot generation
import { generate } from "expo-apple-llm";
const answer = await generate("Summarize quantum entanglement in one sentence.");
With options:
const answer = await generate(
  "Translate to Japanese: Hello, world!",
  {
    instructions: "You are a professional translator.",
    temperature: 0.2,
    maximumResponseTokens: 256,
  }
);
Multi-turn session
LanguageModelSession preserves conversation history on the native side:
import { createSession } from "expo-apple-llm";
const session = await createSession({
  instructions: "You are a helpful cooking assistant.",
});
const first = await session.respond("What can I make with eggs and rice?");
const second = await session.respond("Make it spicier."); // remembers context
session.release(); // free native resources when done
API
function isAvailable(): boolean;
function unavailabilityReason(): UnavailabilityReason | null;
function generate(prompt: string, options?: GenerateOptions): Promise<string>;
function createSession(options?: { instructions?: string }): Promise<LanguageModelSession>;
interface GenerateOptions {
  instructions?: string;          // system prompt (one-shot only)
  temperature?: number;           // 0.0–2.0
  maximumResponseTokens?: number;
}
type UnavailabilityReason =
  | "deviceNotEligible"
  | "appleIntelligenceNotEnabled"
  | "modelNotReady"
  | "osTooOld"
  | "nativeModuleNotLoaded"
  | "unknown";
class LanguageModelSession {
  respond(prompt: string, options?: GenerateOptions): Promise<string>;
  release(): void;
}
Platform notes
- Android / Web: not supported. platforms: ["ios"] in expo-module.config.json. Guard with Platform.OS.
- Old Architecture: works (newArchEnabled: false is fine).
- Simulator: Foundation Models runs in the iOS 26 simulator on Apple Silicon Macs.
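One way to apply the Platform.OS guard in a single place is a small wrapper. This is a sketch only: createGuardedGenerate is a hypothetical name, not an export of expo-apple-llm. The platform string and the module functions are passed in as arguments, so the wrapper has no imports of its own and returns null instead of calling into native code when the model cannot be used:

```typescript
// Hypothetical wrapper, not provided by expo-apple-llm.
type GenerateFn = (prompt: string) => Promise<string>;

function createGuardedGenerate(
  platformOS: string,          // e.g. Platform.OS from react-native
  isAvailable: () => boolean,  // e.g. isAvailable from expo-apple-llm
  generate: GenerateFn         // e.g. generate from expo-apple-llm
): (prompt: string) => Promise<string | null> {
  return async (prompt) => {
    // Check the platform first so Android and web never reach the module.
    if (platformOS !== "ios") return null;
    if (!isAvailable()) return null;
    return generate(prompt);
  };
}

// Possible wiring in an app (assumed, adjust to your project):
//   import { Platform } from "react-native";
//   import { isAvailable, generate } from "expo-apple-llm";
//   const guarded = createGuardedGenerate(Platform.OS, isAvailable, generate);
//   const text = (await guarded("Hello")) ?? "On-device model unavailable";
```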
Concurrency
The module does not queue calls internally. Compute is serialized by the Neural Engine regardless, so parallel calls don't speed anything up — but each live session holds its own KV cache, so fan-out from a tight loop can pressure memory on lower-end devices (especially with long prompts or multi-turn history). If you call generate() or session.respond() in a fan-out, rate-limit it yourself (e.g. cap to ~3 concurrent).
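Because the module does not queue calls, the cap has to live in application code. A minimal sketch of such a cap, assuming a hypothetical createLimiter helper (not part of expo-apple-llm); a published concurrency-limiting package would serve the same purpose. The limit of 3 in the usage comment simply mirrors the rough figure suggested above:

```typescript
// Hypothetical helper, not provided by expo-apple-llm.
// Returns a `run` function that lets at most `limit` tasks execute at once.
function createLimiter(limit: number) {
  let active = 0;                        // number of occupied slots
  const waiting: Array<() => void> = []; // callers parked until a slot frees up

  const release = () => {
    const next = waiting.shift();
    if (next) {
      next();   // hand the slot straight to the next waiter; `active` is unchanged
    } else {
      active--; // nobody waiting, so the slot becomes free
    }
  };

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // Park until a finishing task hands over its slot.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// Possible usage (assumes `generate` is imported from expo-apple-llm):
//   const run = createLimiter(3);
//   const answers = await Promise.all(prompts.map((p) => run(() => generate(p))));
```

Handing the slot directly to the next waiter, rather than decrementing and re-incrementing the counter, avoids a window in which a newly arriving call could take the slot and push the count past the limit.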
License
MIT © nabettu
