react-native-edge-vision
v0.1.0-alpha.1
Camera → ML model → AR overlay SDK for React Native. Run TFLite models on camera frames, get bounding boxes, render overlays.
15 lines to real-time detection

```tsx
import { useDetector, DetectionOverlay, TEXT_DETECTION_PRESET } from 'react-native-edge-vision'
import { Camera, useFrameProcessor, useCameraDevice } from 'react-native-vision-camera'

function TextDetector() {
  const device = useCameraDevice('back')
  const { detect, result } = useDetector(TEXT_DETECTION_PRESET)
  const fp = useFrameProcessor((frame) => {
    'worklet'
    detect(frame) // result arrives via hook state — no runOnJS needed
  }, [detect])
  return (
    <>
      <Camera style={{ flex: 1 }} frameProcessor={fp} device={device} isActive pixelFormat="yuv" />
      <DetectionOverlay result={result} visible />
    </>
  )
}
```

Install. Point at text. Boxes appear. No native code, no model math, no normalization.
Performance
Tested on Samsung Galaxy S24 (Snapdragon 8 Gen 3), XNNPACK delegate, 4 threads:
| Model | Dims | Type | Boxes | Inference | Total | Size |
|-------|------|------|-------|-----------|-------|------|
| PaddleOCR v2 320×320 | 320×320 | float32 | 40 | 26ms | 33ms | 4.5MB |
| PaddleOCR v2 320×320 | 320×320 | int8 | 49 | 14ms | 29ms | 1.5MB |
| PaddleOCR v2 352×640 | 352×640 | int8 | 51 | 34ms | 56ms | 1.5MB |
| PaddleOCR v2 640×640 | 640×640 | float32 | 55 | 129ms | 145ms | 4.5MB |
Timing is from the test harness (no camera overhead). Live camera adds 8-15ms pipeline overhead + thermal variance. Run benchmark() on your device.
Live camera (352×640 int8, 50 maxDetections): 52-100ms total typical, ~10 FPS median. Inference variance (41-150ms) is CPU thermal throttling, not software.
Built for production AR applications. Zero per-frame allocation, verified on-device with method tracing.
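The p50/p95 numbers reported by benchmark() are plain percentiles over per-iteration timings. As a reference for reading them, here is a minimal nearest-rank percentile sketch; it is illustrative only, not the SDK's internal implementation:

```typescript
// Sketch: nearest-rank percentile over raw per-iteration timings (ms).
// This shows what a p50/p95 summary means; the SDK's exact method may differ.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, rank)]
}

// Hypothetical inference timings from a benchmark run:
const timingsMs = [26, 26, 26, 27, 27, 28, 29, 30, 31, 41]
const p50 = percentile(timingsMs, 50) // median iteration time
const p95 = percentile(timingsMs, 95) // tail latency (throttling shows up here)
const fps = 1000 / p50               // throughput implied by the median
```

A large gap between p50 and p95, like the 41-150ms spread noted above, is the signature of thermal throttling rather than a steady-state cost.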
vs. the alternatives
| | edge-vision | react-native-fast-tflite | DIY native |
|---|:---:|:---:|:---:|
| Lines to first detection | 12 | 40+ | 200+ |
| JS in hot path | 0ms | ~28ms | 0ms |
| Post-processing included | Yes | No | You build it |
| Int8 auto-detection | Yes | No | You build it |
| Debug overlay | Yes | No | You build it |
| Benchmark mode | Yes | No | You build it |
| Test harness (no camera) | Yes | No | You build it |
| Auto-normalization | Yes | No | You build it |
| Bundled demo model | Yes | No | You find one |
Install
```shell
npm install react-native-edge-vision react-native-vision-camera react-native-worklets-core
```

OpenCV (required for dbnet output mode — add to android/app/build.gradle):

```groovy
dependencies {
    implementation 'org.opencv:opencv:4.13.0'
}
```

Rebuild native:

```shell
cd android && ./gradlew clean assembleDebug
# or: npx expo prebuild --clean (Expo projects)
```

The package bundles a PaddleOCR v2 text detection model. For custom models, copy .tflite files to android/app/src/main/assets/models/.
API Reference
useDetector(config): DetectorHandle & { result }
```ts
const { detect, result, benchmark, runDiagnostics, release, isReady } = useDetector(config)
```

DetectorConfig
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| modelPath | string | required | Asset name ('model.tflite') or absolute path ('/data/.../model.tflite'). No / = loaded from assets/models/. |
| inputWidth | number | required | Model input width in pixels. |
| inputHeight | number | required | Model input height in pixels. |
| normalization | { mean, std } | auto-detect | mean: [R, G, B], std: [R, G, B]. Omit if model has TFLite metadata. |
| channelOrder | 'rgb' \| 'bgr' | 'rgb' | Input channel order. PaddleOCR uses 'rgb'. |
| outputMode | 'dbnet' \| 'raw' | required | 'dbnet': probability map → boxes. 'raw': raw tensor to JS. |
| scoreThreshold | number | 0.5 | Minimum probability for detection (dbnet mode). |
| minBoxArea | number | 100 | Minimum box area in model-space pixels. Filters noise. |
| maxDetections | number | 100 | Cap on detections per frame. Largest boxes kept. Reduces post-processing cost. |
| includeRawOutput | boolean | false | Include raw model output tensor in result. |
| numThreads | number | 4 | Inference thread count. |
| delegate | 'cpu' \| 'gpu' | 'cpu' | Inference delegate. Test GPU before shipping. |
| autoRotate | boolean | true | Rotate frame to display orientation before model input. |
| aspectMode | 'letterbox' \| 'stretch' \| 'crop' | 'letterbox' | How to handle aspect ratio mismatch. See below. |
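Putting the table together: a config for a hypothetical custom 352×640 int8 model might look like this. The filename and threshold values are placeholders, not bundled assets:

```typescript
// Assumes the DetectorConfig shape documented above.
// 'my_det_352x640_int8.tflite' is a placeholder filename; drop the real file
// in android/app/src/main/assets/models/.
const config = {
  modelPath: 'my_det_352x640_int8.tflite', // no '/', so loaded from assets/models/
  inputWidth: 352,
  inputHeight: 640,
  outputMode: 'dbnet' as const,   // probability map → boxes (requires OpenCV)
  aspectMode: 'stretch' as const, // 352×640 closely matches a portrait 720×1280 frame
  scoreThreshold: 0.5,
  maxDetections: 50,
  numThreads: 4,
  // normalization omitted: int8 models take a raw byte copy, and float models
  // with TFLite metadata auto-detect mean/std
}
```

Pass it straight to the hook: const { detect, result } = useDetector(config).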
DetectorHandle
| Method/Property | Description |
|----------------|-------------|
| detect(frame) | Call inside useFrameProcessor worklet. Results arrive via result state. |
| result | Latest DetectionResult \| null. Updates on every frame with detections. |
| benchmark(iterations?) | Run inference-only benchmark. Returns { p50Ms, p95Ms, avgMs, fps, boxCount }. |
| runDiagnostics() | Full diagnostic: benchmark + harness + live stats. Results in logcat. |
| release() | Free model resources. Called automatically on unmount. |
| isReady | true after first successful detection. |
DetectionResult
```ts
{
  boxes: DetectionBox[]   // sorted top-to-bottom by y
  inferenceMs: number     // pure model inference time
  totalMs: number         // full pipeline (preprocess + inference + post-process)
  frameWidth: number      // camera frame dimensions
  frameHeight: number
  rawOutput?: number[]    // only when includeRawOutput: true
}
```

Each DetectionBox: { x, y, w, h, score } — all normalized 0-1 relative to frame.
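If you draw boxes yourself instead of using DetectionOverlay, scale the normalized coordinates by the rendered preview size. A minimal sketch, assuming the preview shows the full frame without cropping:

```typescript
interface DetectionBox { x: number; y: number; w: number; h: number; score: number }

// Convert a normalized (0-1) box to absolute pixel coordinates in a view that
// renders the full frame. If your preview crops or letterboxes the frame, you
// would also need to apply that transform on top of this.
function boxToViewRect(box: DetectionBox, viewWidth: number, viewHeight: number) {
  return {
    left: box.x * viewWidth,
    top: box.y * viewHeight,
    width: box.w * viewWidth,
    height: box.h * viewHeight,
  }
}
```

For example, boxToViewRect({ x: 0.25, y: 0.5, w: 0.5, h: 0.1, score: 0.9 }, 360, 640) yields { left: 90, top: 320, width: 180, height: 64 }.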
<DetectionOverlay />
Debug overlay that draws bounding boxes on the camera preview.
```tsx
<DetectionOverlay result={result} visible={__DEV__} />
```

Presets

```ts
import { TEXT_DETECTION_PRESET } from 'react-native-edge-vision'

// Bundled PaddleOCR v2 det — 320×320 float32, letterbox, zero config
const { detect, result } = useDetector(TEXT_DETECTION_PRESET)
```

Aspect Modes
Controls how the camera frame (e.g., 720×1280) maps to the model input (e.g., 352×640):
| Mode | What it does | Distortion | Best for |
|------|-------------|------------|----------|
| 'letterbox' | Preserve aspect ratio, pad with gray | None | General use, square models |
| 'stretch' | Resize directly to model dimensions | Yes | Non-square models matched to orientation |
| 'crop' | Center-crop to model aspect ratio | None (loses edges) | When edge content doesn't matter |
For portrait apps with a non-square model (e.g., 352×640): use aspectMode: 'stretch'. The model dimensions already match the frame's aspect ratio — stretching is a direct resize with no distortion and zero wasted pixels.
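One way to encode that rule: compare the frame and model aspect ratios and fall back to 'letterbox' when they diverge. This helper is illustrative and not part of the SDK; the 5% tolerance is an arbitrary choice:

```typescript
// Pick 'stretch' when frame and model aspect ratios are close enough that a
// direct resize introduces negligible distortion; otherwise letterbox.
function chooseAspectMode(
  frameW: number, frameH: number,
  modelW: number, modelH: number,
  tolerance = 0.05,
): 'stretch' | 'letterbox' {
  const frameAR = frameW / frameH
  const modelAR = modelW / modelH
  return Math.abs(frameAR - modelAR) / modelAR <= tolerance ? 'stretch' : 'letterbox'
}
```

A 720×1280 portrait frame against a 352×640 model is about 2% off (0.5625 vs 0.55), so stretch is safe; against a square 320×320 model the mismatch is large and letterbox is the right call.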
Int8 Model Support
The SDK auto-detects int8/uint8 models and branches preprocessing and postprocessing automatically:
- Preprocessing: raw byte copy (int8) instead of float normalization — skip the normalization config entirely
- Postprocessing: vectorized dequantization via OpenCV NEON ops
- No config changes needed — just swap the .tflite file
Converting to int8
Use the included conversion script:
```shell
python models/quantization/convert_int8.py \
  --input ppocr_v2_det.onnx \
  --output ppocr_v2_352x640_int8.tflite \
  --calibration-image menu_photo.jpg \
  --width 352 --height 640
```

Critical: Use real images for calibration, not random data. Random calibration produces models with dead output (all zeros). The script augments a single image into 50 calibration samples with crops, rotations, and brightness jitter.
Why not onnx2tf -oiqt? It produces hybrid models (mix of float32 and int8 tensors) that are 74% slower than pure float32 due to constant dequant/requant overhead. The two-step pipeline (onnx2tf → SavedModel → TF converter) produces clean models with zero float32 tensors.
Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| detect returns null every frame | Model still loading (lazy init ~200ms) | Wait for isReady === true. Check logcat for [EDGE-VISION] errors. |
| Boxes in wrong positions | Wrong channelOrder | PaddleOCR/OpenCV models: 'rgb'. Other frameworks: check docs. |
| All scores near zero | Wrong normalization | Check your model's expected mean/std. Most use ImageNet: [0.485, 0.456, 0.406] / [0.229, 0.224, 0.225]. |
| Int8 model outputs all zeros | Calibration with random data | Reconvert with real images. See convert_int8.py. |
| "OpenCV required for dbnet" | Missing OpenCV dependency | Add implementation 'org.opencv:opencv:4.13.0' to build.gradle. |
| Changing config doesn't take effect | Stale plugin init (older SDK versions) | Update to latest version. Config changes now trigger automatic re-init. |
| XNNPACK reshape error | Dynamic input shapes not supported | XNNPACK compiles the graph at fixed dimensions. Convert separate models for different sizes. |
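For the "all scores near zero" row: an explicit normalization block with the ImageNet constants looks like the sketch below. Whether a model expects the 0-1 values or their 0-255 equivalents depends on its training pipeline, so verify against the model's own documentation:

```typescript
// ImageNet mean/std on the 0-1 scale.
const imagenetNormalization = {
  mean: [0.485, 0.456, 0.406],
  std: [0.229, 0.224, 0.225],
}

// Some models want the 0-255 equivalents instead (mean × 255, std × 255):
const mean255 = imagenetNormalization.mean.map((v) => +(v * 255).toFixed(3))
// → [123.675, 116.28, 103.53]

// Passed through DetectorConfig, e.g.:
// useDetector({ ...config, normalization: imagenetNormalization })
```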
Requirements
- React Native 0.73+
- react-native-vision-camera 4.0+
- react-native-worklets-core
- Android (iOS coming soon)
- OpenCV 4.13.0 (for dbnet output mode)
How It Works
```
Camera frame (YUV, 720p)
  → YUV→RGB (OpenCV, pre-allocated buffers — zero per-frame allocation)
  → Rotate to display orientation
  → Resize to model input (letterbox / stretch / crop)
  → Normalize (float32) or raw byte copy (int8)
  → LiteRT inference (XNNPACK delegate)
  → Dequantize output (OpenCV vectorized for int8)
  → Threshold → contours → top-N bounding boxes
  → Flat array to JS (<1ms bridge crossing)
  → Typed DetectionResult with boxes ready for overlay
```

Everything runs in native Kotlin. Zero JavaScript in the hot path. Pre-allocated buffers eliminate GC pressure (~4.4MB/frame saved).
Roadmap
- [ ] iOS Swift plugin (Apple Vision + LiteRT)
- [ ] YOLO output mode (direct box decode, no OpenCV needed)
- [ ] Segmentation output mode
- [ ] Multi-model frame scheduling
- [ ] npx CLI model validator
- [ ] Model registry (CDN hosting, versioning)
License
Apache 2.0 — SDK and bundled PaddleOCR v2 det model.
Peer dependencies: OpenCV 4.13.0 (Apache 2.0), react-native-vision-camera (MIT), LiteRT/TFLite (Apache 2.0), react-native-worklets-core (MIT).
Contributing
Issues and PRs welcome. When reporting bugs, include: device model, Android version, model file details, and adb logcat -s EDGE-VISION:D output.
