dust-onnx-capacitor
v0.1.11
Capacitor plugin for on-device ONNX Runtime model loading over .onnx files.
Demo
Run the full demo from a clean clone:
git clone https://github.com/rogelioRuiz/dust-onnx-capacitor
cd dust-onnx-capacitor && npm install
npm run test:ios # 22 tests (8 serve + 14 ONNX) — builds & installs app
npm run test:yolo-ios # YOLO inference — requires test:ios run first
Android: replace test:ios/test:yolo-ios with test:android/test:yolo-android.
Add --verbose for full build output (xcodebuild, gradlew, cap sync):
npm run test:ios:verbose
npm run test:android:verbose
capacitor-onnx
Capacitor plugin for on-device ONNX Runtime model loading, image preprocessing, and tensor inference over .onnx files.
Stage O1+O2+O3+O4+O5+O6:
- model lifecycle management (load, unload, list, metadata)
- validated tensor I/O and single inference
- JPEG/PNG image preprocessing to normalized NCHW tensors
- hardware-accelerated execution providers (CoreML on iOS, NNAPI/XNNPACK on Android) with automatic CPU fallback
- DustCore registry integration with ref-counted session lifecycle, priority-based eviction, and OS memory pressure handling
- multi-step pipeline inference with output-to-input chaining
| | Android | iOS | Web |
|---|---|---|---|
| Runtime | ONNX Runtime 1.20.0 | onnxruntime-objc ~1.20 | Stub (throws) |
| Min version | API 26 | iOS 16.0 | — |
| Architecture | arm64-v8a only | arm64 + x86_64 sim | — |
Install
npm install dust-onnx-capacitor dust-core-capacitor
npx cap sync
dust-core-capacitor is a required peer dependency — it provides the shared ML contract types (DustModelServer, DustModelSession, DustCoreError, etc.) that capacitor-onnx implements.
iOS (SPM)
The iOS build resolves onnxruntime-swift-package-manager via Swift Package Manager automatically on first cap sync. No CocoaPods step required.
Android
Add the Kotlin gradle plugin to android/build.gradle:
classpath 'org.jetbrains.kotlin:kotlin-gradle-plugin:2.1.20'
Ensure minSdkVersion is at least 26 in android/variables.gradle.
API
import { ONNX } from 'capacitor-onnx';
loadModel
const result = await ONNX.loadModel({
descriptor: {
id: 'my-model',
format: 'onnx',
url: '/absolute/path/to/model.onnx',
},
config: { // optional
accelerator: 'auto', // 'auto' | 'cpu' | 'nnapi' | 'coreml' | 'xnnpack' | 'metal'
threads: 4, // or { interOp: 2, intraOp: 4 }
graphOptLevel: 'all', // 'disable' | 'basic' | 'extended' | 'all'
memoryPattern: true,
},
priority: 0, // 0 = interactive, 1 = background
});
// result.modelId — string
// result.metadata — { inputs: TensorMetadata[], outputs: TensorMetadata[], accelerator, opset? }
unloadModel
await ONNX.unloadModel({ modelId: 'my-model' });
listLoadedModels
const { modelIds } = await ONNX.listLoadedModels();
// modelIds: string[]
getModelMetadata
const metadata = await ONNX.getModelMetadata({ modelId: 'my-model' });
// metadata.inputs — [{ name, dtype, shape }]
// metadata.outputs — [{ name, dtype, shape }]
runInference
const result = await ONNX.runInference({
modelId: 'my-model',
inputs: [
{ name: 'input_a', dtype: 'float32', shape: [1, 3], data: [1, 2, 3] },
{ name: 'input_b', dtype: 'float32', shape: [1, 3], data: [4, 5, 6] },
],
outputNames: ['output'], // optional — omit to return all outputs
});
// result.outputs — [{ name, dtype, shape, data }]
Input validation runs before inference:
- Shape: rank and static dimensions must match model metadata (-1 dimensions are dynamic and accept any size)
- Dtype: input dtype must match the model's expected dtype
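The validation rules above can be sketched as a pure function. This is illustrative: `validateInput` is not part of the plugin API, and the local `TensorMetadata` merely mirrors the plugin's type.

```typescript
// Illustrative sketch of the pre-inference shape/dtype checks described above.
interface TensorMetadata {
  name: string;
  dtype: string;
  shape: number[]; // -1 marks a dynamic dimension
}

function validateInput(
  meta: TensorMetadata,
  shape: number[],
  dtype: string,
): string | null {
  if (dtype !== meta.dtype) return 'dtypeError';
  if (shape.length !== meta.shape.length) return 'shapeError'; // rank mismatch
  for (let i = 0; i < shape.length; i++) {
    // -1 in metadata is dynamic and accepts any size
    if (meta.shape[i] !== -1 && meta.shape[i] !== shape[i]) return 'shapeError';
  }
  return null; // input is valid
}

const meta: TensorMetadata = { name: 'input', dtype: 'float32', shape: [-1, 3] };
console.log(validateInput(meta, [5, 3], 'float32')); // null (dynamic batch accepted)
console.log(validateInput(meta, [1, 4], 'float32')); // 'shapeError'
console.log(validateInput(meta, [1, 3], 'int64'));   // 'dtypeError'
```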
runPipeline
const { results } = await ONNX.runPipeline({
modelId: 'my-model',
steps: [
{
inputs: [
{ name: 'input', shape: [1, 3], dtype: 'float32', data: [1, 2, 3] },
],
},
{
inputs: [
{ name: 'input', data: 'previous_output' }, // chain from step 0 output named 'input'
],
},
{
inputs: [
{ name: 'input', data: { fromStep: 0, outputName: 'output' } }, // explicit step reference
],
outputNames: ['output'],
},
],
});
// results — [{ outputs: [...] }, { outputs: [...] }, { outputs: [...] }]
runPipeline executes multiple sequential inference steps on the same session within a single bridge call. This eliminates bridge round-trip overhead for multi-step workflows (e.g. PaddleOCR detection → recognition).
Step input types:
- Literal — data: number[] with shape and dtype — raw tensor data, same as runInference
- 'previous_output' — data: 'previous_output' — substitutes the output tensor of the same name from the immediately preceding step
- Step reference — data: { fromStep, outputName } — substitutes a named output from any earlier step
Error behavior: if any step fails, the pipeline halts immediately. The error message includes the failing step index (e.g. "Pipeline step 2 failed: ...").
Memory management: intermediate step results are released as soon as no future step references them.
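The three step-input forms can be sketched as a small resolver. This is a standalone illustration of the chaining rules, not the plugin's internal code; `resolveStepInput` and the local `Tensor` type are invented for the example.

```typescript
// Illustrative resolver for the three runPipeline input forms described above.
interface Tensor { name: string; data: number[] }
type StepData = number[] | 'previous_output' | { fromStep: number; outputName: string };

function resolveStepInput(
  name: string,
  data: StepData,
  priorOutputs: Tensor[][], // outputs of steps 0..n-1, in order
): number[] {
  if (Array.isArray(data)) return data; // literal tensor data
  const pick = (outs: Tensor[], wanted: string): number[] => {
    const t = outs.find((o) => o.name === wanted);
    if (!t) throw new Error(`no output named ${wanted}`);
    return t.data;
  };
  if (data === 'previous_output') {
    // same tensor name, taken from the immediately preceding step
    return pick(priorOutputs[priorOutputs.length - 1], name);
  }
  // explicit { fromStep, outputName } reference to any earlier step
  return pick(priorOutputs[data.fromStep], data.outputName);
}

const step0: Tensor[] = [{ name: 'output', data: [5, 7, 9] }];
console.log(resolveStepInput('output', 'previous_output', [step0])); // [5, 7, 9]
console.log(resolveStepInput('x', { fromStep: 0, outputName: 'output' }, [step0])); // [5, 7, 9]
```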
preprocessImage
const { tensor } = await ONNX.preprocessImage({
data: base64Image, // base64 JPEG/PNG payload, no data: prefix
width: 224,
height: 224,
config: {
resize: 'letterbox', // 'stretch' | 'letterbox' | 'crop_center'
normalization: 'imagenet', // 'imagenet' | 'minus1_plus1' | 'zero_to_1' | 'none'
// mean: [0.5, 0.5, 0.5], // optional custom mean overrides normalization preset
// std: [0.5, 0.5, 0.5], // optional custom std overrides normalization preset
},
});
// tensor — { name: 'image', dtype: 'float32', shape: [1, 3, 224, 224], data: [...] }
preprocessImage decodes JPEG/PNG bytes, resizes to the requested output dimensions, and returns a channel-first tensor ready to pass into runInference.
Resize modes:
- stretch — scale directly to the target size
- letterbox — preserve aspect ratio and pad with RGB(114, 114, 114)
- crop_center — preserve aspect ratio, fill the target frame, and center-crop overflow
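The letterbox geometry is worth seeing as arithmetic. A sketch (the `letterbox` function is illustrative, not plugin API; it only computes the scale and padding, the pad fill color RGB(114, 114, 114) comes from the documented behavior):

```typescript
// Illustrative letterbox math: scale to fit, then pad the remainder symmetrically.
function letterbox(srcW: number, srcH: number, dstW: number, dstH: number) {
  const scale = Math.min(dstW / srcW, dstH / srcH); // preserve aspect ratio
  const newW = Math.round(srcW * scale);
  const newH = Math.round(srcH * scale);
  return {
    scale,
    newW,
    newH,
    padX: Math.floor((dstW - newW) / 2), // left/right padding (pad color RGB 114,114,114)
    padY: Math.floor((dstH - newH) / 2), // top/bottom padding
  };
}

// A 1280x720 photo letterboxed into a 640x640 model input:
console.log(letterbox(1280, 720, 640, 640));
// { scale: 0.5, newW: 640, newH: 360, padX: 0, padY: 140 }
```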
Normalization modes:
- imagenet — (pixel / 255 - mean) / std using ImageNet RGB statistics
- minus1_plus1 — pixel / 127.5 - 1
- zero_to_1 — pixel / 255
- none — raw 0...255 channel values
When config.mean and/or config.std are provided, the plugin applies ((pixel / 255) - mean) / std using those custom values instead of a preset normalization mode.
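The presets reduce to simple per-channel arithmetic. A sketch, assuming the standard ImageNet statistics (mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225]); `normalize` is illustrative, not plugin API:

```typescript
// Illustrative per-pixel normalization math for the presets above.
const IMAGENET_MEAN = [0.485, 0.456, 0.406]; // standard ImageNet RGB statistics
const IMAGENET_STD = [0.229, 0.224, 0.225];

function normalize(rgb: number[], mode: string): number[] {
  switch (mode) {
    case 'imagenet':
      return rgb.map((p, c) => (p / 255 - IMAGENET_MEAN[c]) / IMAGENET_STD[c]);
    case 'minus1_plus1':
      return rgb.map((p) => p / 127.5 - 1);
    case 'zero_to_1':
      return rgb.map((p) => p / 255);
    default:
      return rgb; // 'none': raw 0..255 values
  }
}

console.log(normalize([255, 255, 255], 'minus1_plus1')); // [1, 1, 1]
console.log(normalize([0, 0, 0], 'zero_to_1')); // [0, 0, 0]
```

Passing custom `mean`/`std` simply swaps the ImageNet constants for your own values in the same `(pixel / 255 - mean) / std` formula.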
Accelerator selection
The config.accelerator field controls which ONNX Runtime execution provider (EP) is used:
| Value | Android | iOS |
|---|---|---|
| 'auto' | NNAPI | CoreML |
| 'cpu' | CPU | CPU |
| 'nnapi' | NNAPI | CPU (fallback) |
| 'coreml' | CPU (fallback) | CoreML |
| 'xnnpack' | XNNPACK | CPU (fallback) |
| 'metal' | CPU (fallback) | CPU (fallback) |
Fallback behavior: If the requested EP fails to initialize (e.g. NNAPI unavailable on emulator, CoreML unsupported model op), the plugin automatically retries with CPU-only options. The metadata.accelerator field in the result reflects the EP that was actually used.
CoreML model cache (iOS): When CoreML is selected, compiled .mlmodel files are cached in Application Support/onnx-cache/{modelId}/. ORT handles cache invalidation internally based on the model graph hash — subsequent loads of the same model skip recompilation.
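The selection table above can be expressed as a pure lookup with CPU fallback, in the spirit of the AcceleratorSelector described later. The function and table below are illustrative, not the plugin's actual code:

```typescript
// Illustrative sketch of the accelerator -> execution provider table, with CPU fallback.
type Platform = 'android' | 'ios';

function resolveEP(accelerator: string, platform: Platform): string {
  const table: Record<Platform, Record<string, string>> = {
    android: { auto: 'nnapi', cpu: 'cpu', nnapi: 'nnapi', xnnpack: 'xnnpack' },
    ios: { auto: 'coreml', cpu: 'cpu', coreml: 'coreml' },
  };
  // Anything unsupported on the platform falls back to CPU
  return table[platform][accelerator] ?? 'cpu';
}

console.log(resolveEP('auto', 'ios'));      // 'coreml'
console.log(resolveEP('nnapi', 'ios'));     // 'cpu' (fallback)
console.log(resolveEP('metal', 'android')); // 'cpu' (fallback)
```

Note this models only the static table; the runtime fallback (retrying with CPU when an EP fails to initialize) happens natively, and the actually-used EP is reported in `metadata.accelerator`.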
Error codes
| Code | When |
|---|---|
| inferenceFailed | File not found, corrupt model, ORT load/run failure |
| formatUnsupported | descriptor.format is not 'onnx' |
| modelNotFound | unloadModel / getModelMetadata / runInference with unknown ID |
| invalidInput | Missing required fields |
| shapeError | runInference input shape does not match model metadata |
| dtypeError | runInference input dtype does not match model metadata |
| preprocessError | preprocessImage failed to decode or transform the image |
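A caller can branch on these codes when a plugin call rejects. The helper below is a sketch; the message strings are illustrative and `describeError` is not part of the plugin:

```typescript
// Illustrative mapping from the plugin's error codes to user-facing guidance.
function describeError(code: string): string {
  switch (code) {
    case 'modelNotFound':
      return 'Load the model before calling this method.';
    case 'shapeError':
    case 'dtypeError':
      return 'Check getModelMetadata(): input tensors must match names, shapes, and dtypes.';
    case 'formatUnsupported':
      return 'Only .onnx models are supported.';
    case 'preprocessError':
      return 'Image could not be decoded. Is it valid JPEG/PNG base64?';
    case 'invalidInput':
      return 'A required field is missing from the call.';
    default:
      return 'Model load or inference failed.';
  }
}

console.log(describeError('shapeError'));
```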
Types
type TensorDtype = 'float16' | 'float32' | 'float64' | 'int8' | 'int16' | 'int32' | 'int64' | 'uint8' | 'bool' | 'string' | 'unknown';
interface TensorMetadata {
name: string;
dtype: TensorDtype;
shape: number[];
}
interface TensorValue {
name: string;
data: number[];
shape: number[];
dtype?: TensorDtype; // defaults to 'float32'
}
interface InferenceTensorValue {
name: string;
data: number[];
shape: number[];
dtype: TensorDtype; // always present in outputs
}
type ResizeMode = 'stretch' | 'letterbox' | 'crop_center';
type NormalizationMode = 'imagenet' | 'minus1_plus1' | 'zero_to_1' | 'none';
interface PreprocessConfig {
resize?: ResizeMode;
normalization?: NormalizationMode;
mean?: [number, number, number];
std?: [number, number, number];
}
interface PreprocessResult {
tensor: InferenceTensorValue;
}
interface ONNXModelMetadata {
inputs: TensorMetadata[];
outputs: TensorMetadata[];
accelerator: string;
opset?: number;
}
interface TensorReference {
fromStep: number;
outputName: string;
}
interface PipelineStepInput {
name: string;
shape?: number[];
dtype?: TensorDtype;
data: number[] | 'previous_output' | TensorReference;
}
interface PipelineStep {
inputs: PipelineStepInput[];
outputNames?: string[];
}
interface RunPipelineResult {
results: RunInferenceResult[];
}
Architecture
┌──────────────────────────────────────────┐
│ TypeScript API │
│ src/definitions.ts src/plugin.ts │
└─────────────┬────────────────────────────┘
│ Capacitor bridge
┌──────────┴──────────┐
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Android │ │ iOS │
│ ONNXPlugin │ │ ONNXPlugin │
│ .kt │ │ .swift │
├──────────────┤ ├──────────────┤
│ONNXSession │ │ONNXSession │
│ Manager.kt │ │ Manager │
│ │ │ .swift │
├──────────────┤ ├──────────────┤
│ Accelerator │ │ Accelerator │
│ Selector.kt │ │ Selector │
│ (NNAPI/ │ │ .swift │
│ XNNPACK) │ │ (CoreML) │
├──────────────┤ ├──────────────┤
│ OrtSession │ │ORTSession │
│ Engine.kt │ │ Engine │
│ (ONNXEngine)│ │ .swift │
├──────────────┤ ├──────────────┤
│ onnxruntime │ │onnxruntime │
│ -android │ │ -objc │
│ 1.20.0 │ │ ~1.20 │
└──────────────┘ └──────────────┘
│ │
└────────┬────────┘
▼
dust-core-capacitor
(shared ML contracts)
Both platforms use the same patterns:
- Dedicated inference thread/queue — Android: HandlerThread, iOS: DispatchQueue
- Thread-safe session cache — Android: ReentrantLock, iOS: NSLock
- Reference counting — loading the same model ID twice increments the ref count instead of creating a duplicate session
- ONNXEngine seam — ONNXSession delegates inference to an ONNXEngine protocol/interface; production uses OrtSessionEngine (real ORT), unit tests inject a MockONNXEngine
- ImagePreprocessor seam — JPEG/PNG decode, resize, normalization, and NCHW packing live in a pure ImagePreprocessor on each platform with no Capacitor or ORT dependency
- Pre-inference validation — shape rank/dimensions and dtype checked against model metadata before calling ORT
- Pipeline execution — runPipeline executes sequential inference steps within a single bridge call, resolving previous_output and { fromStep, outputName } references between steps, with automatic release of intermediate tensors
- AcceleratorSelector — pure function/struct that maps accelerator config to execution provider options; self-contained try/catch fallback to CPU on EP failure
- DustCore registry — sessions are registered with DustCoreRegistry for cross-plugin discovery; loadModel(descriptor:priority:) flows through the shared DustModelServer protocol
- Ref-counted session lifecycle — unloadModel decrements refCount and keeps the session cached; forceUnloadModel removes it entirely; evictUnderPressure removes zero-ref sessions by priority (.standard = background only, .critical = all)
- OS memory pressure — iOS: UIApplication.didReceiveMemoryWarningNotification triggers .critical eviction; Android: ComponentCallbacks2.onTrimMemory(RUNNING_CRITICAL) and onLowMemory() trigger .critical eviction
Project structure
capacitor-onnx/
├── src/ # TypeScript definitions + web stub
│ ├── definitions.ts # Public API types
│ ├── plugin.ts # Plugin registration
│ └── index.ts # Exports
├── android/
│ ├── src/main/.../onnx/ # Kotlin plugin implementation
│ │ ├── ONNXPlugin.kt # Capacitor bridge methods
│ │ ├── ONNXSessionManager.kt # Session cache + lifecycle
│ │ ├── ONNXSession.kt # Session + validation + TensorData
│ │ ├── ONNXEngine.kt # Engine interface
│ │ ├── OrtSessionEngine.kt # Production ORT wrapper
│ │ ├── ImagePreprocessor.kt # Pure image preprocessing
│ │ ├── AcceleratorSelector.kt # EP selection (NNAPI/XNNPACK/CPU)
│ │ ├── ONNXConfig.kt # Runtime config
│ │ └── ONNXError.kt # Error types
│ └── src/test/.../onnx/ # JUnit unit tests
│ ├── ONNXSessionManagerTest.kt # 9 O1 lifecycle tests
│ ├── ONNXInferenceTest.kt # 9 O2 inference tests
│ ├── ONNXPreprocessTest.kt # 8 O3 preprocessing tests
│ ├── ONNXAcceleratorTest.kt # 9 O4 accelerator tests
│ ├── ONNXRegistryTest.kt # 9 O5 registry/session lifecycle tests
│ └── ONNXPipelineTest.kt # 7 O6 pipeline tests
├── ios/
│ ├── Sources/ONNXPlugin/ # Swift plugin implementation
│ │ ├── ONNXPlugin.swift
│ │ ├── ONNXSessionManager.swift
│ │ ├── ONNXSession.swift # Session + validation + protobuf parser
│ │ ├── ONNXEngine.swift # Engine protocol
│ │ ├── ORTSessionEngine.swift # Production ORT wrapper
│ │ ├── ImagePreprocessor.swift # Pure image preprocessing
│ │ ├── AcceleratorSelector.swift # EP selection (CoreML/CPU) + cache
│ │ ├── ONNXConfig.swift
│ │ └── ONNXError.swift
│ └── Tests/ONNXPluginTests/ # XCTest unit tests + fixtures
│ ├── ONNXSessionManagerTests.swift # 9 O1 lifecycle tests
│ ├── ONNXInferenceTests.swift # 9 O2 inference tests
│ ├── ONNXPreprocessTests.swift # 8 O3 preprocessing tests
│ ├── ONNXAcceleratorTests.swift # 9 O4 accelerator tests
│ ├── ONNXRegistryTests.swift # 9 O5 registry/session lifecycle tests
│ └── ONNXPipelineTests.swift # 7 O6 pipeline tests
├── example/ # E2E test app
│ ├── www/index.html # Test runner UI (22 tests + YOLO demo)
│ ├── test-e2e-android.mjs # Android E2E runner (22 tests)
│ ├── test-e2e-ios.mjs # iOS E2E runner (22 tests)
│ ├── test-e2e-yolo-android.mjs # YOLO detection E2E (Android)
│ ├── test-e2e-yolo-ios.mjs # YOLO detection E2E (iOS)
│ └── capacitor.config.json
├── test/
│ ├── fixtures/tiny-test.onnx # Minimal Add model for E2E
│ └── generate-test-fixture.py # Generates tiny-test.onnx
├── package.json
├── DustCapacitorOnnx.podspec
└── tsconfig.json
Testing
Test fixture
ios/Tests/ONNXPluginTests/Fixtures/tiny-test.onnx — a minimal ONNX model:
- Op: Add(input_a, input_b) -> output
- Shapes: [1, 3] float32 for all tensors
- Opset: 13, IR version 7
Regenerate with:
pip install onnx
python scripts/generate-test-fixture.py
Unit tests (51 per platform)
All unit tests use mock engines or injected factories — no real ONNX Runtime required.
| ID | Test | What it verifies |
|---|---|---|
| O1-T1 | Load valid path | Session creation with factory |
| O1-T2 | Metadata access | Input/output tensor names |
| O1-T3 | Missing file | fileNotFound error |
| O1-T4 | Corrupt file | loadFailed error |
| O1-T5 | Wrong format | formatUnsupported rejection before load |
| O1-T6 | Unload model | Cache cleared, listLoadedModels empty |
| O1-T6b | Unload unknown ID | modelNotFound error |
| O1-T7 | Load same ID twice | Ref count incremented, single session |
| O1-T8 | Load two models | Both IDs appear in list |
| O2-T1 | Float32 inference | Returns typed output tensor |
| O2-T2 | Uint8 inference | Preserves non-float tensor dtype |
| O2-T3 | Shape mismatch rank | Rejects with shapeError |
| O2-T4 | Shape mismatch dim | Rejects with shapeError |
| O2-T5 | Dynamic dimension | Accepts -1 metadata dims |
| O2-T6 | Dtype mismatch | Rejects with dtypeError |
| O2-T7 | Output filtering | Returns requested output subset |
| O2-T8 | Inference after unload | Maps to modelNotFound |
| O2-T9 | Engine failure | Maps to inferenceFailed |
| O3-T1 | Red image + ImageNet | Produces expected normalized RGB planes |
| O3-T2 | Letterbox resize | Preserves aspect ratio and centers content |
| O3-T3 | Upscale resize | Handles smaller source images safely |
| O3-T4 | minus1_plus1 | Maps white pixels to 1.0 |
| O3-T5 | zero_to_1 | Maps black pixels to 0.0 |
| O3-T6 | none | Preserves raw 0...255 channel values |
| O3-T7 | Invalid image data | Rejects with preprocessError |
| O3-T8 | Custom mean/std | Overrides preset normalization |
| O4-T1 | Auto accelerator | Config reaches factory / selects platform EP |
| O4-T2 | CPU accelerator | Metadata reflects cpu |
| O4-T3 | Platform EP explicit | CoreML (iOS) / NNAPI (Android) propagated |
| O4-T4 | Cached session reuse | Second load reuses session, not EP re-init |
| O4-T5 | Resolved accelerator | Metadata uses EP actually selected |
| O4-T6 | EP failure fallback | Falls back to CPU on EP init failure |
| O4-T7 | CPU loads without retry | Single factory call, no fallback path |
| O4-T8 | Both fail → LoadFailed | EP + CPU both fail → loadFailed error |
| O4-T9 | Metadata via lookup | getModelMetadata returns resolved accelerator |
| O5-T1 | Registry registration | Manager registered in DustCoreRegistry, resolvable |
| O5-T2 | Load ready descriptor | Session created via descriptor, refCount=1 |
| O5-T3 | Load notLoaded descriptor | Throws modelNotReady |
| O5-T4 | Load unregistered ID | Throws modelNotFound |
| O5-T5 | Unload keeps cached | refCount=0, session still in cache |
| O5-T6 | Load twice reuses | Same instance, refCount=2 |
| O5-T7 | Standard eviction | Background zero-ref removed, interactive kept |
| O5-T8 | Critical eviction | All zero-ref sessions removed |
| O5-T9 | allModelIds after evict | Only live session IDs returned |
| O6-T1 | Two-step pipeline | Both results returned, shapes correct, callCount == 2 |
| O6-T2 | Previous output chaining | Step 2 input substituted from step 0 output |
| O6-T3 | Explicit fromStep chaining | StepReference routes correct tensor |
| O6-T4 | Step 0 failure | Pipeline halts, error contains "step 0" |
| O6-T5 | Step 1 failure | Pipeline halts, error contains "step 1" |
| O6-T6 | Single-step equivalence | Pipeline result matches direct runInference |
| O6-T7 | Pipeline on evicted session | modelEvicted thrown before any run() call |
# Android (from example/android/)
ANDROID_HOME=/path/to/sdk ./gradlew :capacitor-onnx:test
# iOS (from capacitor-onnx/, on macOS with simulator)
xcodebuild test -scheme DustCapacitorOnnx \
-destination "platform=iOS Simulator,name=iPhone 16e" \
-skipPackagePluginValidation
E2E tests (22 plugin tests)
The E2E tests run 22 scenarios in two phases on a real device/simulator with the actual ONNX Runtime:
- Phase 1 — Serve lifecycle (S.1–S.8): Register a tiny test model via dust-serve, download it from the test script's HTTP fixture server, verify events and status transitions, capture the serve-managed file path
- Phase 2 — ONNX API (O.1–O.14): Load/unload/inference tests using the serve-managed path from Phase 1. Error tests (missing file, corrupt file) use direct paths to test ONNX error handling
Both runners use an HTTP server on port 8099 to collect test results and serve model fixtures. The run-float32 test verifies real inference: input_a=[1,2,3] + input_b=[4,5,6] produces output=[5,7,9].
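Since the fixture is a single Add op, the expected output is just the elementwise sum. A minimal sketch of the check an E2E runner can perform (the comparison against `result.outputs` is illustrative):

```typescript
// Expected output of the tiny-test.onnx Add fixture: elementwise sum of the inputs.
const inputA = [1, 2, 3];
const inputB = [4, 5, 6];
const expected = inputA.map((a, i) => a + inputB[i]);
console.log(expected); // [5, 7, 9]
// A runner can then compare result.outputs[0].data against `expected`.
```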
Android (device or emulator):
npm run test:android
iOS (simulator):
npm run test:ios
The runners handle the full pipeline: cap sync, build, install, fixture provisioning, and result collection.
Test results
| Suite | Count | Status |
|---|---|---|
| Android unit tests | 51 (9 O1 + 9 O2 + 8 O3 + 9 O4 + 9 O5 + 7 O6) | PASS |
| iOS unit tests | 51 (9 O1 + 9 O2 + 8 O3 + 9 O4 + 9 O5 + 7 O6) | PASS |
| Android E2E | 22 (8 serve + 14 ONNX) | PASS |
| iOS E2E | 22 (8 serve + 14 ONNX) | PASS |
| Android YOLO E2E | 5 detections via dust-serve | PASS |
| iOS YOLO E2E | 5 detections via dust-serve | PASS |
YOLO E2E
The YOLO E2E tests run end-to-end object detection: the app registers and downloads yolo26s.onnx (~37 MB) through dust-serve, runs inference on a test image, and reports detections. The model is cached after the first download.
npm run test:yolo-ios # requires test:ios run first (app must be installed)
npm run test:yolo-android # requires test:android run first
Interactive YOLO demo
The example app has a Demo tab where you can load a YOLO model via dust-serve, pick an image from the gallery or camera, and run object detection interactively.
Using a different ONNX model
You can run any .onnx model — the plugin is not tied to YOLO. Here's how to swap it.
1. Pick a model
ONNX models are available from:
- ONNX Model Zoo — pre-trained vision, NLP, and audio models
- HuggingFace ONNX models — exported from PyTorch/TensorFlow
- Export your own with torch.onnx.export() or tf2onnx
For mobile, keep models under 50 MB and prefer opset 13+ for broad ONNX Runtime 1.20 compatibility.
2. Load your model
const result = await ONNX.loadModel({
descriptor: {
id: 'my-model', // any string — used as the session key
format: 'onnx',
url: '/absolute/path/to/model.onnx',
},
config: {
accelerator: 'auto', // CoreML on iOS, NNAPI on Android, auto-fallback to CPU
threads: 4,
graphOptLevel: 'all',
memoryPattern: true,
},
});
// Inspect the model's expected inputs/outputs
console.log(result.metadata.inputs); // [{ name, dtype, shape }, ...]
console.log(result.metadata.outputs); // [{ name, dtype, shape }, ...]
console.log(result.metadata.accelerator); // 'coreml', 'nnapi', or 'cpu'
3. Prepare inputs and run inference
Use the metadata to discover tensor names, shapes, and dtypes, then build inputs accordingly:
// For image models, use preprocessImage to get a ready-to-use NCHW tensor
const { tensor } = await ONNX.preprocessImage({
data: base64ImageNoPrefix, // raw base64, no data:image/... prefix
width: 640,
height: 640,
config: { resize: 'letterbox', normalization: 'zero_to_1' },
});
const result = await ONNX.runInference({
modelId: 'my-model',
inputs: [{ name: 'images', data: tensor.data, shape: [1, 3, 640, 640], dtype: 'float32' }],
});
// For non-image models, pass raw tensor data directly
const result2 = await ONNX.runInference({
modelId: 'my-model',
inputs: [{ name: 'input_ids', data: [101, 2023, 2003, 1037, 3231, 102], shape: [1, 6], dtype: 'int64' }],
});
4. Configuration reference
| Config key | Default | What it does |
|-----------|---------|-------------|
| accelerator | 'auto' | Execution provider. 'auto' picks CoreML (iOS) or NNAPI (Android). Falls back to CPU transparently. |
| threads | (ORT default) | Thread count. Pass a number for both inter/intra-op, or { interOp: 2, intraOp: 4 } for fine control. |
| graphOptLevel | 'all' | Graph optimization level. 'all' applies all optimizations. Use 'disable' for debugging. |
| memoryPattern | true | Pre-allocate memory based on tensor shapes. Disable only if shapes vary wildly between runs. |
Caveats
Input tensor names and shapes must match exactly. Unlike LLM inference where you just provide a prompt, ONNX models require tensors with specific names, shapes, and dtypes. Use getModelMetadata() after loading to discover what the model expects. Mismatches produce shapeError or dtypeError before inference runs.
Dynamic dimensions accept any size, static dimensions don't. If metadata shows shape: [-1, 3, 224, 224], the first dimension is dynamic (batch size) and accepts any value, but the remaining three must be exactly 3, 224, 224.
accelerator: 'auto' may fall back to CPU silently. Not all models support CoreML or NNAPI — unsupported ops cause the EP to fail initialization. The plugin retries with CPU automatically. Check metadata.accelerator in the load result to see which EP was actually used.
CoreML first-load compiles the model (~5–15s). On iOS with CoreML, the first load compiles the ONNX graph to an internal format. This is cached in Application Support/onnx-cache/{modelId}/ and skipped on subsequent loads of the same model.
preprocessImage expects raw base64, not a data URL. Strip the data:image/jpeg;base64, prefix before passing. The tensor output is always [1, 3, H, W] NCHW float32 — verify this matches your model's expected input layout.
Large tensor data crosses the bridge as JSON arrays. Inference inputs and outputs are serialized as number[] over the Capacitor bridge. For very large tensors (e.g., high-resolution images), this can be slow. Use preprocessImage on the native side instead of sending raw pixel data from JavaScript.
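A back-of-the-envelope estimate shows why this matters. The function and the bytes-per-element figure below are rough illustrative assumptions, not measured plugin behavior:

```typescript
// Rough JSON payload estimate for a tensor crossing the bridge as number[].
// Assumes ~9 characters per serialized float (digits, decimal point, comma).
function jsonPayloadEstimate(shape: number[], bytesPerElement = 9): number {
  const elements = shape.reduce((a, b) => a * b, 1);
  return elements * bytesPerElement; // approximate serialized size in bytes
}

// A 640x640 RGB image tensor is ~1.2M floats, on the order of 10 MB of JSON:
console.log(jsonPayloadEstimate([1, 3, 640, 640])); // 11059200
```

This is why preprocessing on the native side (preprocessImage takes compact base64 image bytes in, rather than raw pixel arrays) is the recommended path for image inputs.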
ONNX Runtime version compatibility. The plugin bundles ONNX Runtime 1.20. Models exported with newer opsets may use ops not yet supported. Stick to opset 13–20 for best compatibility.
iOS sandbox: models must be in the app container. iOS apps can only read files inside their own sandbox. Use dust-serve to download models — it stores them in the app's data container automatically. If loading manually, the absolute path must point to a file inside the app's sandbox.
cap sync may regenerate patched files. If you manually set the iOS deployment target to 16.0 or Android minSdk to 26, cap sync can overwrite those changes. Re-apply patches after syncing. The E2E test scripts handle this automatically, but manual runs require awareness.
Development
# Build TypeScript
npm run build
# Lint
npm run lint
# Type check
npm run typecheck
License
Copyright 2026 Rogelio Ruiz Perez. Licensed under the Apache License 2.0.
