# apple-local-llm
Call Apple's on-device Foundation Models from JavaScript — no servers, no setup.
Works with Node.js, Electron, and VS Code extensions.
## Requirements
- macOS 26+ (Tahoe)
- Apple Silicon (M Series)
- Apple Intelligence enabled in System Settings
## Installation

```bash
npm install apple-local-llm
```

## Quick Start
### Simple API

```js
import { createClient } from "apple-local-llm";
const client = createClient();
// Check compatibility first
const compat = await client.compatibility.check();
if (!compat.compatible) {
console.log("Not available:", compat.reasonCode);
// Handle fallback to cloud API
}
// Generate a response
const result = await client.responses.create({
input: "What is the capital of France?",
});
if (result.ok) {
console.log(result.text); // "The capital of France is Paris."
}
```

### Streaming

```js
for await (const chunk of client.stream({ input: "Count from 1 to 5." })) {
if ("delta" in chunk) {
process.stdout.write(chunk.delta);
}
}
```

## API Reference
### createClient(options?)
Creates a new client instance.
```js
const client = createClient({
model: "default", // Optional: model identifier (currently only "default")
onLog: (msg) => console.log(msg), // Optional: debug logging
idleTimeoutMs: 5 * 60 * 1000, // Optional: helper idle timeout (default: 5 min)
});
```

Defaults:
- Helper auto-shuts down after 5 minutes of inactivity
- Helper auto-restarts up to 3 times on crash (with exponential backoff)
- Request timeout: 60 seconds (configurable per request via `timeoutMs`)
You can also import and instantiate the class directly:
```js
import { AppleLocalLLMClient } from "apple-local-llm";
const client = new AppleLocalLLMClient(options);
```

### client.compatibility.check()
Check if the local model is available. Always call this before making requests.
```js
const result = await client.compatibility.check();
// { compatible: true }
// or { compatible: false, reasonCode: "AI_DISABLED" }
```

Reason codes:
| Code | Description |
|------|-------------|
| `NOT_DARWIN` | Not running on macOS |
| `UNSUPPORTED_HARDWARE` | Not Apple Silicon |
| `AI_DISABLED` | Apple Intelligence not enabled |
| `MODEL_NOT_READY` | Model still downloading |
| `SPAWN_FAILED` | Helper binary failed to start |
| `HELPER_NOT_FOUND` | Helper binary not found |
| `HELPER_UNHEALTHY` | Helper process not responding correctly |
| `PROTOCOL_MISMATCH` | Helper version incompatible with client |
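A small dispatch on these codes can drive your fallback path. A sketch, where the messages and the fallback behavior are illustrative rather than part of the package:

```js
// Illustrative handling of a few reason codes; extend as needed.
const messages = {
  AI_DISABLED: "Enable Apple Intelligence in System Settings.",
  MODEL_NOT_READY: "The on-device model is still downloading; try again shortly.",
  UNSUPPORTED_HARDWARE: "An Apple Silicon Mac is required.",
};

const compat = await client.compatibility.check();
if (!compat.compatible) {
  console.warn(messages[compat.reasonCode] ?? "Local model unavailable.");
  // e.g. fall back to a cloud provider here
}
```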
### client.capabilities.get()
Get detailed model capabilities (calls the helper).
```js
const caps = await client.capabilities.get();
// { available: true, model: "apple-on-device" }
// or { available: false, reasonCode: "AI_DISABLED" }
```

### client.responses.create(params)
Generate a response.
```js
const result = await client.responses.create({
input: "Your prompt here",
model: "default", // Optional: model identifier
max_output_tokens: 500, // Optional: limit response tokens
stream: false, // Optional
signal: abortController.signal, // Optional: AbortSignal
timeoutMs: 60000, // Optional: request timeout (ms)
response_format: { // Optional: structured JSON output
type: "json_schema",
json_schema: {
name: "Result",
schema: { type: "object", properties: { ... } }
}
}
});
```

Structured Output Example:

```js
const result = await client.responses.create({
input: "List 3 colors",
response_format: {
type: "json_schema",
json_schema: {
name: "Colors",
schema: {
type: "object",
properties: {
colors: { type: "array", items: { type: "string" } }
}
}
}
}
});
if (result.ok) {
  const data = JSON.parse(result.text); // { colors: ["red", "blue", "green"] }
}
```
`response_format` is not supported with streaming.
Returns either a success object or an error object:

```js
// Success:
{ ok: true, text: "...", request_id: "..." }
// Error:
{ ok: false, error: { code: "...", detail: "..." } }
```

Note: The return type is a discriminated union, not the exported `ResponseResult` interface.
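In TypeScript, checking `ok` narrows the union, so `text` is only accessible on the success branch. A minimal sketch:

```ts
const result = await client.responses.create({ input: "Hello" });
if (result.ok) {
  console.log(result.text, result.request_id); // success branch
} else {
  console.error(result.error.code, result.error.detail); // error branch
}
```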
Error codes:
| Code | Description |
|------|-------------|
| `UNAVAILABLE` | Model not available (see reason codes above) |
| `TIMEOUT` | Request timed out (default: 60s) |
| `CANCELLED` | Request was cancelled via AbortSignal |
| `RATE_LIMITED` | System rate limit exceeded |
| `GUARDRAIL` | Content violated Apple's safety guidelines |
| `INTERNAL` | Unexpected error |
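One plausible policy for reacting to these codes; the retry and fallback choices are illustrative, and `callCloudFallback` is a hypothetical helper:

```js
const result = await client.responses.create({ input: "Summarize this." });

if (!result.ok) {
  switch (result.error.code) {
    case "UNAVAILABLE":
      await callCloudFallback(); // hypothetical: route to a cloud API
      break;
    case "TIMEOUT":
    case "RATE_LIMITED":
      // transient: retry later, ideally with backoff
      break;
    case "GUARDRAIL":
      console.warn("Request blocked by Apple's safety guardrails.");
      break;
    default:
      console.error(result.error.detail);
  }
}
```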
### client.stream(params)
Async generator for streaming responses.
```js
for await (const chunk of client.stream({ input: "..." })) {
if ("delta" in chunk) {
// Partial content
console.log(chunk.delta);
} else if ("done" in chunk) {
// Final complete text
console.log(chunk.text);
}
}
```

### client.responses.cancel(requestId)
Cancel an in-progress request.
```js
const result = await client.responses.cancel("req_123");
// { ok: true } or { ok: false, error: { code: "NOT_RUNNING", detail: "..." } }
```

### client.shutdown()
Gracefully shut down the helper process.
```js
await client.shutdown();
```

## TypeScript Types
All types are exported:
```ts
import type {
ClientOptions,
ReasonCode,
CompatibilityResult,
CapabilitiesResult,
ResponsesCreateParams,
ResponseResult,
JSONSchema,
ResponseFormat,
} from "apple-local-llm";CLI Usage
The `fm-proxy` binary can also be used directly from the command line:

```bash
# Simple prompt
fm-proxy "What is the capital of France?"
# Streaming output
fm-proxy --stream "Tell me a story"
fm-proxy -s "Tell me a story"
# Limit output tokens
fm-proxy --max-tokens=50 "Count to 100"
# Start HTTP server
fm-proxy --serve
fm-proxy --serve --port=3000
# Other options
fm-proxy --help # Show usage (or -h)
fm-proxy --version # Show version (or -v)
fm-proxy --stdio # stdio mode (used internally by the npm package)
```

## HTTP Server Mode
Run `fm-proxy --serve` to start a local HTTP server:

```bash
fm-proxy --serve --port=8080
```

Endpoints:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check and availability status |
| `/generate` | POST | Text generation (supports streaming) |
Options:
| Option | Description |
|--------|-------------|
| `--port=<PORT>` | Set server port (default: 8080) |
| `--auth-token=<TOKEN>` | Require Bearer token for `/generate` |
You can also set the `AUTH_TOKEN` environment variable instead of passing `--auth-token`.
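For example, these two invocations should behave the same (the token value is illustrative):

```bash
fm-proxy --serve --auth-token=my-secret-token
AUTH_TOKEN=my-secret-token fm-proxy --serve
```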
CORS: All endpoints support CORS with `Access-Control-Allow-Origin: *`.
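So a page served from any origin can call the server directly. A minimal browser-side sketch against the documented `/generate` endpoint (port and prompt are illustrative):

```js
// Browser-side call to a locally running `fm-proxy --serve` (default port 8080).
const res = await fetch("http://127.0.0.1:8080/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ input: "What is 2+2?" }),
});
const { text } = await res.json();
console.log(text); // e.g. "2+2 equals 4."
```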
Examples:
```bash
# Health check
curl http://127.0.0.1:8080/health
# Response: {"status":"ok","model":"apple-on-device","available":true}
# Simple generation
curl -X POST http://127.0.0.1:8080/generate \
-H "Content-Type: application/json" \
-d '{"input": "What is 2+2?"}'
# Response: {"text":"2+2 equals 4."}
# With max_output_tokens
curl -X POST http://127.0.0.1:8080/generate \
-H "Content-Type: application/json" \
-d '{"input": "Count to 100", "max_output_tokens": 50}'
# With structured output (response_format)
curl -X POST http://127.0.0.1:8080/generate \
-H "Content-Type: application/json" \
-d '{"input": "List 3 colors", "response_format": {"type": "json_schema", "json_schema": {"name": "Colors", "schema": {"type": "object", "properties": {"colors": {"type": "array", "items": {"type": "string"}}}}}}}'
# With authentication
curl -X POST http://127.0.0.1:8080/generate \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"input": "Hello"}'Streaming (SSE)
Add "stream": true to get Server-Sent Events with OpenAI-compatible chunks:
curl -N -X POST http://127.0.0.1:8080/generate \
-H "Content-Type: application/json" \
-d '{"input": "Write a haiku", "stream": true}'Response:
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"..."}}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```

## How It Works
This package bundles a small native helper (`fm-proxy`) that communicates with Apple's Foundation Models framework over stdio. The helper is spawned on first request and stays alive to keep the model warm.
- No localhost server: the npm package uses stdio, not HTTP
- No user setup: just `npm install`
- Fails gracefully: call `compatibility.check()` and fall back to cloud
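For long-lived processes you can release the helper eagerly instead of waiting for the idle timeout. A minimal sketch using the documented `shutdown()`:

```js
// Shut the helper down on Ctrl-C rather than waiting for the
// 5-minute idle timeout to reclaim it.
process.on("SIGINT", async () => {
  await client.shutdown();
  process.exit(0);
});
```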
## Runtime Support

JS API (`createClient()`):
| Environment | Supported |
|-------------|-----------|
| Node.js | ✅ |
| Electron (main process) | ✅ |
| VS Code extensions | ✅ |
| Electron (renderer) | ❌ No `child_process` |
| Browser | ❌ |
HTTP Server (`fm-proxy --serve`):
| Environment | Supported |
|-------------|-----------|
| Any HTTP client | ✅ |
| Browser (fetch) | ✅ |
| Electron (renderer) | ✅ |
## License
MIT
