react-native-edge-vision
v0.1.0-alpha.1
Camera → ML model → AR overlay SDK for React Native. Run TFLite models on camera frames, get bounding boxes, render overlays.
15 lines to real-time detection

```tsx
import { useDetector, DetectionOverlay, TEXT_DETECTION_PRESET } from 'react-native-edge-vision'
import { Camera, useFrameProcessor, useCameraDevice } from 'react-native-vision-camera'

function TextDetector() {
  const device = useCameraDevice('back')
  const { detect, result } = useDetector(TEXT_DETECTION_PRESET)
  const fp = useFrameProcessor((frame) => {
    'worklet'
    detect(frame) // result arrives via hook state — no runOnJS needed
  }, [detect])
  return (
    <>
      <Camera style={{ flex: 1 }} frameProcessor={fp} device={device} isActive pixelFormat="yuv" />
      <DetectionOverlay result={result} visible />
    </>
  )
}
```

Install. Point at text. Boxes appear. No native code, no model math, no normalization.
Performance
Tested on Samsung Galaxy S24 (Snapdragon 8 Gen 3), XNNPACK delegate, 4 threads:
| Model | Dims | Type | Boxes | Inference | Total | Size |
|-------|------|------|-------|-----------|-------|------|
| PaddleOCR v2 320×320 | 320×320 | float32 | 40 | 26ms | 33ms | 4.5MB |
| PaddleOCR v2 320×320 | 320×320 | int8 | 49 | 14ms | 29ms | 1.5MB |
| PaddleOCR v2 352×640 | 352×640 | int8 | 51 | 34ms | 56ms | 1.5MB |
| PaddleOCR v2 640×640 | 640×640 | float32 | 55 | 129ms | 145ms | 4.5MB |
Timing is from the test harness (no camera overhead). Live camera adds 8-15ms pipeline overhead + thermal variance. Run benchmark() on your device.
Live camera (352×640 int8, 50 maxDetections): 52-100ms total typical, ~10 FPS median. Inference variance (41-150ms) is CPU thermal throttling, not software.
Built for production AR applications. Zero per-frame allocation, verified on-device with method tracing.
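The p50/p95 numbers reported by benchmark() are plain percentiles over per-iteration timings. As a reference for reading them, here is a minimal nearest-rank percentile sketch; it is illustrative only, not the SDK's internal implementation:

```typescript
// Sketch: nearest-rank percentile over raw per-iteration timings (ms).
// This shows what a p50/p95 summary means; the SDK's exact method may differ.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, rank)]
}

// Hypothetical inference timings from a benchmark run:
const timingsMs = [26, 26, 26, 27, 27, 28, 29, 30, 31, 41]
const p50 = percentile(timingsMs, 50) // median iteration time
const p95 = percentile(timingsMs, 95) // tail latency (throttling shows up here)
const fps = 1000 / p50               // throughput implied by the median
```

A large gap between p50 and p95, like the 41-150ms spread noted above, is the signature of thermal throttling rather than a steady-state cost.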
vs. the alternatives
| | edge-vision | react-native-fast-tflite | DIY native |
|---|:---:|:---:|:---:|
| Lines to first detection | 12 | 40+ | 200+ |
| JS in hot path | 0ms | ~28ms | 0ms |
| Post-processing included | Yes | No | You build it |
| Int8 auto-detection | Yes | No | You build it |
| Debug overlay | Yes | No | You build it |
| Benchmark mode | Yes | No | You build it |
| Test harness (no camera) | Yes | No | You build it |
| Auto-normalization | Yes | No | You build it |
| Bundled demo model | Yes | No | You find one |
Install
```shell
npm install react-native-edge-vision react-native-vision-camera react-native-worklets-core
```

OpenCV (required for dbnet output mode — add to android/app/build.gradle):

```groovy
dependencies {
    implementation 'org.opencv:opencv:4.13.0'
}
```

Rebuild native:

```shell
cd android && ./gradlew clean assembleDebug
# or: npx expo prebuild --clean (Expo projects)
```

The package bundles a PaddleOCR v2 text detection model. For custom models, copy .tflite files to android/app/src/main/assets/models/.
API Reference
useDetector(config): DetectorHandle & { result }
```ts
const { detect, result, benchmark, runDiagnostics, release, isReady } = useDetector(config)
```

DetectorConfig
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| modelPath | string | required | Asset name ('model.tflite') or absolute path ('/data/.../model.tflite'). No / = loaded from assets/models/. |
| inputWidth | number | required | Model input width in pixels. |
| inputHeight | number | required | Model input height in pixels. |
| normalization | { mean, std } | auto-detect | mean: [R, G, B], std: [R, G, B]. Omit if model has TFLite metadata. |
| channelOrder | 'rgb' \| 'bgr' | 'rgb' | Input channel order. PaddleOCR uses 'rgb'. |
| outputMode | 'dbnet' \| 'raw' | required | 'dbnet': probability map → boxes. 'raw': raw tensor to JS. |
| scoreThreshold | number | 0.5 | Minimum probability for detection (dbnet mode). |
| minBoxArea | number | 100 | Minimum box area in model-space pixels. Filters noise. |
| maxDetections | number | 100 | Cap on detections per frame. Largest boxes kept. Reduces post-processing cost. |
| includeRawOutput | boolean | false | Include raw model output tensor in result. |
| numThreads | number | 4 | Inference thread count. |
| delegate | 'cpu' \| 'gpu' | 'cpu' | Inference delegate. Test GPU before shipping. |
| autoRotate | boolean | true | Rotate frame to display orientation before model input. |
| aspectMode | 'letterbox' \| 'stretch' \| 'crop' | 'letterbox' | How to handle aspect ratio mismatch. See below. |
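Putting the table together: a config for a hypothetical custom 352×640 int8 model might look like this. The filename and threshold values are placeholders, not bundled assets:

```typescript
// Assumes the DetectorConfig shape documented above.
// 'my_det_352x640_int8.tflite' is a placeholder filename; drop the real file
// in android/app/src/main/assets/models/.
const config = {
  modelPath: 'my_det_352x640_int8.tflite', // no '/', so loaded from assets/models/
  inputWidth: 352,
  inputHeight: 640,
  outputMode: 'dbnet' as const,   // probability map → boxes (requires OpenCV)
  aspectMode: 'stretch' as const, // 352×640 closely matches a portrait 720×1280 frame
  scoreThreshold: 0.5,
  maxDetections: 50,
  numThreads: 4,
  // normalization omitted: int8 models take a raw byte copy, and float models
  // with TFLite metadata auto-detect mean/std
}
```

Pass it straight to the hook: const { detect, result } = useDetector(config).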
DetectorHandle
| Method/Property | Description |
|----------------|-------------|
| detect(frame) | Call inside useFrameProcessor worklet. Results arrive via result state. |
| result | Latest DetectionResult \| null. Updates on every frame with detections. |
| benchmark(iterations?) | Run inference-only benchmark. Returns { p50Ms, p95Ms, avgMs, fps, boxCount }. |
| runDiagnostics() | Full diagnostic: benchmark + harness + live stats. Results in logcat. |
| release() | Free model resources. Called automatically on unmount. |
| isReady | true after first successful detection. |
DetectionResult
```ts
{
  boxes: DetectionBox[]   // sorted top-to-bottom by y
  inferenceMs: number     // pure model inference time
  totalMs: number         // full pipeline (preprocess + inference + post-process)
  frameWidth: number      // camera frame dimensions
  frameHeight: number
  rawOutput?: number[]    // only when includeRawOutput: true
}
```

Each DetectionBox: { x, y, w, h, score } — all normalized 0-1 relative to frame.
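If you draw boxes yourself instead of using DetectionOverlay, scale the normalized coordinates by the rendered preview size. A minimal sketch, assuming the preview shows the full frame without cropping:

```typescript
interface DetectionBox { x: number; y: number; w: number; h: number; score: number }

// Convert a normalized (0-1) box to absolute pixel coordinates in a view that
// renders the full frame. If your preview crops or letterboxes the frame, you
// would also need to apply that transform on top of this.
function boxToViewRect(box: DetectionBox, viewWidth: number, viewHeight: number) {
  return {
    left: box.x * viewWidth,
    top: box.y * viewHeight,
    width: box.w * viewWidth,
    height: box.h * viewHeight,
  }
}
```

For example, boxToViewRect({ x: 0.25, y: 0.5, w: 0.5, h: 0.1, score: 0.9 }, 360, 640) yields { left: 90, top: 320, width: 180, height: 64 }.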
<DetectionOverlay />
Debug overlay that draws bounding boxes on the camera preview.
```tsx
<DetectionOverlay result={result} visible={__DEV__} />
```

Presets

```ts
import { TEXT_DETECTION_PRESET } from 'react-native-edge-vision'

// Bundled PaddleOCR v2 det — 320×320 float32, letterbox, zero config
const { detect, result } = useDetector(TEXT_DETECTION_PRESET)
```

Aspect Modes
Controls how the camera frame (e.g., 720×1280) maps to the model input (e.g., 352×640):
| Mode | What it does | Distortion | Best for |
|------|-------------|------------|----------|
| 'letterbox' | Preserve aspect ratio, pad with gray | None | General use, square models |
| 'stretch' | Resize directly to model dimensions | Yes | Non-square models matched to orientation |
| 'crop' | Center-crop to model aspect ratio | None (loses edges) | When edge content doesn't matter |
For portrait apps with a non-square model (e.g., 352×640): use aspectMode: 'stretch'. The model dimensions already match the frame's aspect ratio — stretching is a direct resize with no distortion and zero wasted pixels.
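One way to encode that rule: compare the frame and model aspect ratios and fall back to 'letterbox' when they diverge. This helper is illustrative and not part of the SDK; the 5% tolerance is an arbitrary choice:

```typescript
// Pick 'stretch' when frame and model aspect ratios are close enough that a
// direct resize introduces negligible distortion; otherwise letterbox.
function chooseAspectMode(
  frameW: number, frameH: number,
  modelW: number, modelH: number,
  tolerance = 0.05,
): 'stretch' | 'letterbox' {
  const frameAR = frameW / frameH
  const modelAR = modelW / modelH
  return Math.abs(frameAR - modelAR) / modelAR <= tolerance ? 'stretch' : 'letterbox'
}
```

A 720×1280 portrait frame against a 352×640 model is about 2% off (0.5625 vs 0.55), so stretch is safe; against a square 320×320 model the mismatch is large and letterbox is the right call.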
Int8 Model Support
The SDK auto-detects int8/uint8 models and branches preprocessing and postprocessing automatically:
- Preprocessing: raw byte copy (int8) instead of float normalization — skip the normalization config entirely
- Postprocessing: vectorized dequantization via OpenCV NEON ops
- No config changes needed — just swap the .tflite file
Converting to int8
Use the included conversion script:
```shell
python models/quantization/convert_int8.py \
  --input ppocr_v2_det.onnx \
  --output ppocr_v2_352x640_int8.tflite \
  --calibration-image menu_photo.jpg \
  --width 352 --height 640
```

Critical: Use real images for calibration, not random data. Random calibration produces models with dead output (all zeros). The script augments a single image into 50 calibration samples with crops, rotations, and brightness jitter.
Why not onnx2tf -oiqt? It produces hybrid models (mix of float32 and int8 tensors) that are 74% slower than pure float32 due to constant dequant/requant overhead. The two-step pipeline (onnx2tf → SavedModel → TF converter) produces clean models with zero float32 tensors.
Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| detect returns null every frame | Model still loading (lazy init ~200ms) | Wait for isReady === true. Check logcat for [EDGE-VISION] errors. |
| Boxes in wrong positions | Wrong channelOrder | PaddleOCR/OpenCV models: 'rgb'. Other frameworks: check docs. |
| All scores near zero | Wrong normalization | Check your model's expected mean/std. Most use ImageNet: [0.485, 0.456, 0.406] / [0.229, 0.224, 0.225]. |
| Int8 model outputs all zeros | Calibration with random data | Reconvert with real images. See convert_int8.py. |
| "OpenCV required for dbnet" | Missing OpenCV dependency | Add implementation 'org.opencv:opencv:4.13.0' to build.gradle. |
| Changing config doesn't take effect | Stale plugin init (older SDK versions) | Update to latest version. Config changes now trigger automatic re-init. |
| XNNPACK reshape error | Dynamic input shapes not supported | XNNPACK compiles the graph at fixed dimensions. Convert separate models for different sizes. |
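For the "all scores near zero" row: an explicit normalization block with the ImageNet constants looks like the sketch below. Whether a model expects the 0-1 values or their 0-255 equivalents depends on its training pipeline, so verify against the model's own documentation:

```typescript
// ImageNet mean/std on the 0-1 scale.
const imagenetNormalization = {
  mean: [0.485, 0.456, 0.406],
  std: [0.229, 0.224, 0.225],
}

// Some models want the 0-255 equivalents instead (mean × 255, std × 255):
const mean255 = imagenetNormalization.mean.map((v) => +(v * 255).toFixed(3))
// → [123.675, 116.28, 103.53]

// Passed through DetectorConfig, e.g.:
// useDetector({ ...config, normalization: imagenetNormalization })
```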
Requirements
- React Native 0.73+
- react-native-vision-camera 4.0+
- react-native-worklets-core
- Android (iOS coming soon)
- OpenCV 4.13.0 (for dbnet output mode)
How It Works
```
Camera frame (YUV, 720p)
  → YUV→RGB (OpenCV, pre-allocated buffers — zero per-frame allocation)
  → Rotate to display orientation
  → Resize to model input (letterbox / stretch / crop)
  → Normalize (float32) or raw byte copy (int8)
  → LiteRT inference (XNNPACK delegate)
  → Dequantize output (OpenCV vectorized for int8)
  → Threshold → contours → top-N bounding boxes
  → Flat array to JS (<1ms bridge crossing)
  → Typed DetectionResult with boxes ready for overlay
```

Everything runs in native Kotlin. Zero JavaScript in the hot path. Pre-allocated buffers eliminate GC pressure (~4.4MB/frame saved).
Roadmap
- [ ] iOS Swift plugin (Apple Vision + LiteRT)
- [ ] YOLO output mode (direct box decode, no OpenCV needed)
- [ ] Segmentation output mode
- [ ] Multi-model frame scheduling
- [ ] npx CLI model validator
- [ ] Model registry (CDN hosting, versioning)
License
Apache 2.0 — SDK and bundled PaddleOCR v2 det model.
Peer dependencies: OpenCV 4.13.0 (Apache 2.0), react-native-vision-camera (MIT), LiteRT/TFLite (Apache 2.0), react-native-worklets-core (MIT).
Contributing
Issues and PRs welcome. When reporting bugs, include: device model, Android version, model file details, and adb logcat -s EDGE-VISION:D output.
