expo-mediapipe

v0.4.1

Published

8 days ago

Zero-setup bridge between Google MediaPipe Tasks and Expo / React Native


ayush_jadaun

expo mediapipe react-native hand-tracking face-detection pose-estimation object-detection machine-learning vision

expo-mediapipe

Zero-setup bridge between Google MediaPipe Tasks Vision SDK and Expo / React Native.

Features

5 vision tasks — hand tracking, face detection, pose estimation, object detection, gesture recognition
Live camera hook — useMediaPipe streams results at frame rate via a managed CameraView
Multi-task inference — useMultiMediaPipe runs multiple tasks simultaneously on one camera feed
Video processing — processVideo() runs frame-by-frame inference on video files
One-shot image inference — detect() runs a single prediction on any image URI
Gesture recognition — thumbs up/down, victory, open palm, fist, pointing, ILY out of the box
Face blendshapes — 52 expression coefficients (smile, blink, jaw...) for avatar/AR apps
Runtime model download — downloadModel() fetches and caches models on demand, keeping your app small
GPU acceleration — optional delegate: 'gpu' for faster inference via Metal (iOS) / GPU delegate (Android)
JSI fast path — readLatest() pulls results synchronously as a Float32Array, no events, no per-landmark allocation (experimental, Android)
Drop-in overlay — <MediaPipeOverlay> draws dots/skeletons/boxes for any task, zero extra deps; or pair the JSI buffer with Reanimated + Skia worklets for UI-thread drawing
~50 helper functions — geometric gesture recognition (no ML model), joint angles + rep counting, pinch strength, blendshape/expression helpers, smoothing, projection — all pure & tested (reference)
Universal — same hooks run on iOS, Android, and web (WASM); write once, ship everywhere
Per-task confidence tuning — separate detection, presence, and tracking thresholds per task
Expo Config Plugin — auto-injects native dependencies and bundles model files
TypeScript-first — fully typed results with discriminated unions, zero any

Documentation

Helpers & Components Reference — every helper, hook, constant, and <MediaPipeOverlay> prop, with examples
Recipes — copy-paste solutions: gesture triggers, pinch control, rep counter, blink detection, smoothing, zero-bundle setup, and more
This README — install, quick start, full API reference, platform notes
Or ask your AI assistant: claude mcp add expo-mediapipe -- npx -y expo-mediapipe mcp (see AI Assistant Docs)

Installation

npx expo install expo-mediapipe

Plugin configuration

Add the plugin to your app.json (or app.config.js):

{
  "expo": {
    "plugins": [
      ["expo-mediapipe", { "modelsDir": "./assets/models" }]
    ]
  }
}

The modelsDir option tells the config plugin where to find .task model files. They are copied into the native project at build time and become accessible via the asset://models/<filename> URI scheme.

Place your model files

your-project/
  assets/
    models/
      hand_landmarker.task
      face_landmarker.task
      pose_landmarker_lite.task
      efficientdet_lite0.task
      gesture_recognizer.task

Then rebuild your dev client:

npx expo prebuild --clean
npx expo run:android   # or run:ios

Quick Start

Live camera detection

import { View, Button } from 'react-native';
import { useMediaPipe } from 'expo-mediapipe';

export default function HandTracker() {
  const { results, status, startCamera, stopCamera, CameraView } = useMediaPipe({
    task: 'handLandmarker',
    modelPath: 'hand_landmarker.task',
    maxResults: 2,
    minConfidence: 0.5,
    cameraFacing: 'front',
    delegate: 'gpu',
    onResults: (r) => {
      if (r.task === 'handLandmarker' && r.landmarks.length > 0) {
        console.log('Hand detected!', r.handedness);
      }
    },
  });

  return (
    <View style={{ flex: 1 }}>
      <CameraView style={{ flex: 1 }} />
      <Button title="Start" onPress={startCamera} />
      <Button title="Stop" onPress={stopCamera} />
    </View>
  );
}

Multi-task (hands + pose simultaneously)

import { useMultiMediaPipe } from 'expo-mediapipe';

const { results, status, startCamera, CameraView } = useMultiMediaPipe({
  tasks: [
    { task: 'handLandmarker', modelPath: 'hand_landmarker.task', maxResults: 2 },
    { task: 'poseLandmarker', modelPath: 'pose_landmarker_lite.task' },
  ],
  cameraFacing: 'back',
  onResults: (all) => {
    const hands = all.handLandmarker;   // HandLandmarkerResult | undefined
    const pose = all.poseLandmarker;    // PoseLandmarkerResult | undefined
  },
});

One-shot image detection

import { detect } from 'expo-mediapipe';

const result = await detect({
  task: 'objectDetection',
  modelPath: 'efficientdet_lite0.task',
  source: { type: 'image', uri: 'file:///path/to/photo.jpg' },
  maxResults: 5,
  minConfidence: 0.4,
  delegate: 'gpu',
});

Video processing

import { processVideo } from 'expo-mediapipe';

const frames = await processVideo({
  task: 'poseLandmarker',
  modelPath: 'pose_landmarker_lite.task',
  videoUri: 'file:///path/to/video.mp4',
  frameIntervalMs: 200,  // extract a frame every 200ms
  delegate: 'gpu',
});

frames.forEach((f) => {
  console.log(`Frame at ${f.timestampMs}ms:`, f.result);
});

Drawing results

Don't hand-roll the coordinate math: drop in <MediaPipeOverlay>. It renders with plain React Native Views (no extra dependencies), handles FILL_CENTER projection and front-camera mirroring for you, and works for every task:

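import { useEffect } from 'react';
import { View, StyleSheet } from 'react-native';
import { useMediaPipe, MediaPipeOverlay } from 'expo-mediapipe';

export function HandOverlay() {
  const { results, CameraView, startCamera, stopCamera } = useMediaPipe({
    task: 'handLandmarker',
    modelPath: 'hand_landmarker.task',
  });

  // Start the camera on mount and stop it on unmount.
  useEffect(() => {
    startCamera();
    return () => stopCamera();
  }, []);

  return (
    <View style={{ flex: 1 }}>
      <CameraView style={StyleSheet.absoluteFill} />
      <MediaPipeOverlay
        results={results}
        color="#00ff88" dotRadius={4} lineWidth={2}
        showDots showConnections
      />
    </View>
  );
}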

| Prop | Type | Default | Description |
|---|---|---|---|
| results | MediaPipeResult \| null | required | Latest result from a hook. |
| cameraFacing | 'front' \| 'back' | 'back' | Mirror fallback when the result lacks isFrontCamera. |
| showDots / showConnections | boolean | true | Toggle dots / skeleton lines. |
| color | string | per-task | Dot & line color. |
| dotRadius / lineWidth | number | 3 / 2 | Sizing. |
| faceParts | FacePart[] | all | For faceLandmarker: restrict to 'oval' \| 'leftEye' \| 'rightEye' \| 'lips' \| 'nose'. |
| showObjectLabels | boolean | true | For objectDetection: label + score above each box. |

For UI-thread drawing of heavy meshes, see UI-thread drawing with worklets under Advanced Usage.

Helper Functions

Pure, dependency-free utilities (also tree-shakeable) that turn raw landmarks into something useful. All are unit-tested and run on every platform.

Gestures (no ML model needed) — recognize from a plain handLandmarker result:

import { recognizeHandGesture, isPinching, pinchStrength } from 'expo-mediapipe';

if (results?.task === 'handLandmarker' && results.landmarks[0]) {
  const hand = results.landmarks[0];
  recognizeHandGesture(hand);  // 'Thumb_Up' | 'Victory' | 'Open_Palm' | ... | null
  isPinching(hand);            // boolean
  pinchStrength(hand);         // 0..1, e.g. to drive a slider or zoom
}
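For intuition about what a pinch measure involves, here is a minimal sketch. It is not the library's pinchStrength implementation: it assumes the standard MediaPipe 21-point hand indices (0 wrist, 4 thumb tip, 8 index tip, 9 middle-finger MCP), divides the thumb-to-index distance by the wrist-to-middle-MCP length so the value does not depend on how close the hand is to the camera, and maps that ratio to 0..1 using hypothetical `closed`/`open` ratios.

```typescript
// Minimal landmark shape, mirroring the README's Landmark interface.
interface Landmark { x: number; y: number; z: number; visibility?: number }

// Standard MediaPipe hand landmark indices.
const WRIST = 0;
const THUMB_TIP = 4;
const INDEX_TIP = 8;
const MIDDLE_MCP = 9;

function dist(a: Landmark, b: Landmark): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Hypothetical sketch: 1 = fingertips touching, 0 = fully open.
// `closed` and `open` are assumed ratios of tip distance to palm length.
function pinchStrengthSketch(hand: Landmark[], closed = 0.2, open = 1.0): number {
  const palm = dist(hand[WRIST], hand[MIDDLE_MCP]);
  if (palm === 0) return 0; // degenerate input
  const ratio = dist(hand[THUMB_TIP], hand[INDEX_TIP]) / palm;
  const t = (open - ratio) / (open - closed); // 1 at `closed`, 0 at `open`
  return Math.min(1, Math.max(0, t));
}
```

Normalizing by a palm length rather than by image size is what keeps the value stable as the hand moves toward or away from the camera.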

Pose / fitness:

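import { elbowAngle, RepCounter } from 'expo-mediapipe';

// Create the counter once (e.g. in a ref), not on every frame.
const counter = new RepCounter({ downThreshold: 70, upThreshold: 160 });

if (results?.task === 'poseLandmarker' && results.landmarks[0]) {
  const angle = elbowAngle(results.landmarks[0], 'left'); // degrees, 180 = straight
  counter.update(angle); // call once per frame, then read counter.reps
}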
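A joint angle plus a two-threshold state machine is the usual shape of this kind of rep counting. The sketch below illustrates that idea under stated assumptions; it is not the library's elbowAngle or RepCounter. It assumes the standard MediaPipe pose indices (11 left shoulder, 13 left elbow, 15 left wrist), measures the angle in the image plane from x/y only, and counts a rep when the angle drops below `downThreshold` and later rises above `upThreshold`. The name `RepCounterSketch` is hypothetical.

```typescript
interface Landmark { x: number; y: number; z: number; visibility?: number }

// Standard MediaPipe pose indices for the left arm.
const LEFT_SHOULDER = 11;
const LEFT_ELBOW = 13;
const LEFT_WRIST = 15;

// Angle ABC at vertex b, in degrees (2D, image plane). 180 = straight.
function angleAt(a: Landmark, b: Landmark, c: Landmark): number {
  const v1x = a.x - b.x, v1y = a.y - b.y;
  const v2x = c.x - b.x, v2y = c.y - b.y;
  const norm = Math.hypot(v1x, v1y) * Math.hypot(v2x, v2y);
  if (norm === 0) return 0; // coincident points: angle undefined
  const cos = Math.min(1, Math.max(-1, (v1x * v2x + v1y * v2y) / norm));
  return (Math.acos(cos) * 180) / Math.PI;
}

function leftElbowAngle(pose: Landmark[]): number {
  return angleAt(pose[LEFT_SHOULDER], pose[LEFT_ELBOW], pose[LEFT_WRIST]);
}

// Hysteresis counter: a rep completes on the way back up past `upThreshold`.
class RepCounterSketch {
  reps = 0;
  private phase: 'up' | 'down' = 'up';
  private readonly down: number;
  private readonly up: number;

  constructor(opts: { downThreshold: number; upThreshold: number }) {
    this.down = opts.downThreshold;
    this.up = opts.upThreshold;
  }

  update(angle: number): number {
    if (this.phase === 'up' && angle < this.down) {
      this.phase = 'down';
    } else if (this.phase === 'down' && angle > this.up) {
      this.phase = 'up';
      this.reps += 1;
    }
    return this.reps;
  }
}
```

The gap between the two thresholds is hysteresis: jitter around a single threshold cannot produce extra reps.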

Face:

import { isSmiling, isBlinking, topBlendshapes } from 'expo-mediapipe';
isSmiling(face);                 // needs taskOptions.outputBlendshapes
topBlendshapes(face, 3);         // strongest expressions

Geometry & projection: landmarkToPixel, landmarkDistance, landmarkAngle, boundingBoxOf, boxIoU, centroid, mirrorLandmarks, …
Object detection: detectionsByLabel, filterByScore, nonMaxSuppression, highestConfidence
Temporal: smoothLandmarks, OneEuroFilter (jitter), landmarkVelocity
Hooks: useGesture, usePinch, useRepCounter, useSmoothedResults
Constants & guards: HAND_LANDMARKS.INDEX_TIP, POSE_LANDMARKS.LEFT_ELBOW, MODEL_URLS, isHandResult()
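OneEuroFilter in the list above refers to a well-known jitter filter: the One Euro filter (Casiez, Roussel and Vogel, 2012), a low-pass filter whose cutoff rises with speed, so slow movement is smoothed heavily while fast movement lags little. Below is a minimal scalar sketch of that algorithm, not the library's class; the parameter names (minCutoff, beta, dCutoff) follow the paper, the defaults are assumptions, and timestamps are in seconds. For landmarks you would keep one filter per coordinate.

```typescript
// Low-pass smoothing factor for a given cutoff (Hz) and time step (s).
function smoothingFactor(cutoff: number, dt: number): number {
  const tau = 1 / (2 * Math.PI * cutoff);
  return 1 / (1 + tau / dt);
}

class OneEuroSketch {
  private xPrev: number | null = null;
  private dxPrev = 0;
  private tPrev = 0;
  private readonly minCutoff: number;
  private readonly beta: number;
  private readonly dCutoff: number;

  constructor(minCutoff = 1.0, beta = 0.0, dCutoff = 1.0) {
    this.minCutoff = minCutoff;
    this.beta = beta;
    this.dCutoff = dCutoff;
  }

  // x: raw value, t: timestamp in seconds. Returns the filtered value.
  filter(x: number, t: number): number {
    if (this.xPrev === null) {
      this.xPrev = x;
      this.tPrev = t;
      return x; // first sample passes through
    }
    const dt = t - this.tPrev;
    if (dt <= 0) return this.xPrev; // ignore duplicate or out-of-order samples
    // Smooth the derivative, then let its magnitude raise the cutoff.
    const dx = (x - this.xPrev) / dt;
    const aD = smoothingFactor(this.dCutoff, dt);
    const dxHat = aD * dx + (1 - aD) * this.dxPrev;
    const cutoff = this.minCutoff + this.beta * Math.abs(dxHat);
    const a = smoothingFactor(cutoff, dt);
    const xHat = a * x + (1 - a) * this.xPrev;
    this.xPrev = xHat;
    this.dxPrev = dxHat;
    this.tPrev = t;
    return xHat;
  }
}
```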

// Zero-bundle setup: download the official model at runtime
import { downloadModel, MODEL_URLS } from 'expo-mediapipe';
const modelPath = await downloadModel(MODEL_URLS.handLandmarker);

API Reference

`useMediaPipe(options)`

React hook for real-time single-task camera inference.

`UseMediaPipeOptions`

| Option | Type | Default | Description |
|---|---|---|---|
| task | MediaPipeTask | required | 'handLandmarker' \| 'faceLandmarker' \| 'poseLandmarker' \| 'objectDetection' \| 'gestureRecognizer' |
| modelPath | string | required | Path to the .task model file. Accepts asset://models/<file>, file:///..., or a bare filename. |
| maxResults | number | 1 | Maximum number of detections to return (meaning is task-specific). |
| minConfidence | number | 0.5 | Minimum detection confidence in [0, 1]. Used as the fallback for every per-task threshold. |
| cameraFacing | 'front' \| 'back' | 'back' | Which camera to use. |
| delegate | 'cpu' \| 'gpu' | 'cpu' | Hardware delegate for inference. GPU uses Metal (iOS) or the GPU delegate (Android). |
| taskOptions | object | undefined | Per-task options: confidence thresholds (see Per-task Confidence Thresholds) and outputBlendshapes for faceLandmarker. |
| onResults | (results: MediaPipeResult) => void | undefined | Called on every result frame; use it for gesture logic or custom processing. |
| onError | (error: MediaPipeError) => void | undefined | Called when the native layer emits an error. |
| onPerformance | (stats: { fps: number }) => void | undefined | Called about once per second with the measured result rate. |

`UseMediaPipeReturn`

| Property | Type | Description |
|---|---|---|
| results | MediaPipeResult \| null | Latest inference results, or null before the first frame. |
| status | MediaPipeStatus | 'loading' \| 'ready' \| 'error' |
| startCamera | () => void | Begin camera preview and inference. |
| stopCamera | () => void | Stop camera and inference. |
| pauseInference | () => void | Stop sending frames to MediaPipe; the preview keeps running. |
| resumeInference | () => void | Resume inference after a pause. |
| readLatest | () => LatestResultBuffer \| null | Synchronous JSI pull of the newest result (Android, experimental). See JSI Fast Path. |
| CameraView | React.FC<{ style?: object }> | Drop-in camera preview component. |

Pass onPerformance: (s) => console.log(s.fps) to receive the measured result rate roughly once per second — handy for performance overlays and benchmarking.
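How a measured result rate reported about once per second could be computed is sketched below. createRateMeter is a hypothetical helper, not part of expo-mediapipe: it counts results inside a window of roughly one second and reports the count divided by the elapsed time. Timestamps are passed in so the logic is testable; in an app you would pass performance.now() or Date.now().

```typescript
type PerformanceStats = { fps: number };

// Returns a `tick(nowMs)` function to call on every result. Roughly once per
// `windowMs` it reports the average rate over that window and opens a new one.
function createRateMeter(onPerformance: (s: PerformanceStats) => void, windowMs = 1000) {
  let windowStart: number | null = null;
  let count = 0;
  return function tick(nowMs: number): void {
    if (windowStart === null) {
      windowStart = nowMs; // the first result only opens the window
      return;
    }
    count += 1;
    const elapsed = nowMs - windowStart;
    if (elapsed >= windowMs) {
      onPerformance({ fps: (count * 1000) / elapsed });
      windowStart = nowMs;
      count = 0;
    }
  };
}
```

Counting intervals rather than results avoids an off-by-one: ten results spaced 100 ms apart after the opening one report exactly 10 fps.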

`useMultiMediaPipe(options)`

React hook for running multiple tasks simultaneously on one camera feed.

`UseMultiMediaPipeOptions`

| Option | Type | Default | Description |
|---|---|---|---|
| tasks | MultiTaskConfig[] | required | Array of task configurations. Each has task, modelPath, maxResults?, minConfidence?, delegate?, taskOptions?. |
| cameraFacing | 'front' \| 'back' | 'back' | Which camera to use. |
| onResults | (results: Partial<Record<MediaPipeTask, MediaPipeResult>>) => void | undefined | Called when any task produces a new result. Receives all latest results keyed by task. |
| onError | (error: MediaPipeError) => void | undefined | Called on error from any task. |
| onPerformance | (stats: { fps: number }) => void | undefined | Called ~once per second with the result rate summed across tasks. |

`UseMultiMediaPipeReturn`

| Property | Type | Description |
|---|---|---|
| results | Partial<Record<MediaPipeTask, MediaPipeResult>> | Latest results per task. Access with results.handLandmarker, results.poseLandmarker, etc. |
| status | MediaPipeStatus | 'loading' \| 'ready' \| 'error' |
| startCamera / stopCamera | () => void | Camera controls. |
| pauseInference / resumeInference | () => void | Pause/resume inference while the preview keeps running. |
| CameraView | React.FC<{ style?: object }> | Shared camera preview. |

Performance note: Each task runs its own ML graph. Running 2-3 tasks simultaneously will increase CPU/GPU load and may reduce frame rate on lower-end devices. Test on your target hardware.
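The multi-task onResults callback receives all latest results keyed by task. The bookkeeping this implies can be sketched as follows; mergeLatest and the trimmed-down TaskResult type (including its `frame` field) are hypothetical illustrations, not the library's internals. Returning a new object rather than mutating the previous one lets React detect the change.

```typescript
type MediaPipeTask =
  | 'handLandmarker' | 'faceLandmarker' | 'poseLandmarker'
  | 'objectDetection' | 'gestureRecognizer';

// Simplified stand-in for MediaPipeResult; `frame` is a hypothetical field.
interface TaskResult { task: MediaPipeTask; frame: number }

type LatestByTask = Partial<Record<MediaPipeTask, TaskResult>>;

// Replace only the entry for the task that just produced a result.
function mergeLatest(prev: LatestByTask, next: TaskResult): LatestByTask {
  const merged: LatestByTask = { ...prev }; // copy, never mutate `prev`
  merged[next.task] = next;
  return merged;
}
```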

`detect(options)`

One-shot inference on a single image. Returns Promise<MediaPipeResult>.

Each call creates and destroys a TaskRunner internally. Model loading adds ~100-500 ms of latency. Do not call in a tight loop — use useMediaPipe for real-time.

`DetectOptions`

| Option | Type | Default | Description |
|---|---|---|---|
| task | MediaPipeTask | required | The vision task to run. |
| modelPath | string | required | Path to the .task model file. |
| source | { type: 'image', uri: string } | required | Image source. |
| maxResults | number | 1 | Maximum number of detections. |
| minConfidence | number | 0.5 | Minimum confidence threshold. |
| delegate | 'cpu' \| 'gpu' | 'cpu' | Hardware delegate. |
| taskOptions | object | undefined | Per-task confidence thresholds. |

`processVideo(options)`

Frame-by-frame inference on a video file. Returns Promise<VideoFrameResult[]>.

Uses MediaPipe's VIDEO running mode with temporal tracking between frames (more accurate than processing each frame independently).

`ProcessVideoOptions`

| Option | Type | Default | Description |
|---|---|---|---|
| task | MediaPipeTask | required | The vision task to run. |
| modelPath | string | required | Path to the .task model file. |
| videoUri | string | required | Video file URI (file://, content://). |
| frameIntervalMs | number | 100 | Extract a frame every N milliseconds. Lower = more frames, slower processing. |
| maxResults | number | 1 | Maximum detections per frame. |
| minConfidence | number | 0.5 | Minimum confidence threshold. |
| delegate | 'cpu' \| 'gpu' | 'cpu' | Hardware delegate. |
| taskOptions | object | undefined | Per-task confidence thresholds. |

`VideoFrameResult`

interface VideoFrameResult {
  result: MediaPipeResult;  // Inference result for this frame
  timestampMs: number;      // Frame timestamp in milliseconds
  frameIndex: number;       // Zero-based frame index
}

Memory note: All results are collected in memory. For very long videos, use a larger frameIntervalMs to limit the number of frames processed.
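To estimate memory use before calling processVideo, it helps to count frames. The sketch below lists the timestamps a frame-every-N-ms sampler would visit; it assumes sampling starts at 0 ms and stops before the end of the clip, which may differ slightly from the native implementation. sampleTimestamps is a hypothetical helper.

```typescript
// Timestamps (ms) a frame-every-N-ms sampler would visit, assuming the first
// frame is at 0 and sampling stops before the end of the clip.
function sampleTimestamps(durationMs: number, frameIntervalMs: number): number[] {
  if (frameIntervalMs <= 0) throw new RangeError('frameIntervalMs must be > 0');
  const out: number[] = [];
  for (let i = 0; i * frameIntervalMs < durationMs; i++) {
    out.push(i * frameIntervalMs);
  }
  return out;
}
```

At the default frameIntervalMs of 100, a 10-minute clip (600,000 ms) yields 6,000 results held in memory; doubling the interval halves that.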

Per-task Confidence Thresholds

Pass taskOptions to fine-tune confidence thresholds per task. These override minConfidence for specific stages:

useMediaPipe({
  task: 'handLandmarker',
  modelPath: 'hand_landmarker.task',
  minConfidence: 0.5,  // fallback for any unset threshold
  taskOptions: {
    minDetectionConfidence: 0.7,   // initial detection must be very confident
    minPresenceConfidence: 0.5,    // presence tracking can be looser
    minTrackingConfidence: 0.3,    // landmark tracking can be loosest
  },
});

| Threshold | Applies to | Description |
|---|---|---|
| minDetectionConfidence | hand, face, pose | Confidence for initial detection of a new subject. |
| minPresenceConfidence | hand, face, pose | Confidence that a previously detected subject is still present. |
| minTrackingConfidence | hand, face, pose | Confidence for landmark tracking between frames. |

Object detection uses only minConfidence (no sub-thresholds).
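The fallback rule described above can be written out as a small function. This is a sketch of the documented behavior, not the library's code: resolveThresholds and the Resolved type are hypothetical, and treating gestureRecognizer like the landmarker tasks is an assumption, since the table above lists only hand, face, and pose. Using `??` rather than `||` means an explicit 0 is honored instead of falling back.

```typescript
type MediaPipeTask =
  | 'handLandmarker' | 'faceLandmarker' | 'poseLandmarker'
  | 'objectDetection' | 'gestureRecognizer';

interface TaskOptions {
  minDetectionConfidence?: number;
  minPresenceConfidence?: number;
  minTrackingConfidence?: number;
}

type Resolved =
  | { kind: 'landmarker'; detection: number; presence: number; tracking: number }
  | { kind: 'detector'; score: number };

function resolveThresholds(task: MediaPipeTask, minConfidence = 0.5, opts: TaskOptions = {}): Resolved {
  if (task === 'objectDetection') {
    // Object detection ignores the sub-thresholds entirely.
    return { kind: 'detector', score: minConfidence };
  }
  // Each unset sub-threshold falls back to minConfidence.
  return {
    kind: 'landmarker',
    detection: opts.minDetectionConfidence ?? minConfidence,
    presence: opts.minPresenceConfidence ?? minConfidence,
    tracking: opts.minTrackingConfidence ?? minConfidence,
  };
}
```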

Types

`MediaPipeResult` (discriminated union)

if (result.task === 'handLandmarker') {
  result.landmarks;       // Landmark[][] — one array per hand
  result.worldLandmarks;  // Landmark[][] — metric 3D, optional
  result.handedness;      // ('Left' | 'Right')[]
}

if (result.task === 'faceLandmarker') {
  result.landmarks;    // Landmark[][] — one array per face
  result.blendshapes;  // Category[][] — 52 coefficients, when outputBlendshapes is set
}

if (result.task === 'poseLandmarker') {
  result.landmarks;      // Landmark[][] — one array per person
  result.worldLandmarks; // Landmark[][] — 3D world-space coordinates
}

if (result.task === 'objectDetection') {
  result.detections; // { label: string; score: number; boundingBox: BoundingBox }[]
}

if (result.task === 'gestureRecognizer') {
  result.gestures;        // Category[][] — classifications per hand, best first
  result.landmarks;       // Landmark[][] — 21 per hand
  result.worldLandmarks;  // Landmark[][] — metric 3D, optional
  result.handedness;      // ('Left' | 'Right')[]
}

All result variants also include optional metadata: imageWidth, imageHeight, isFrontCamera.

Core Types

type MediaPipeTask = 'handLandmarker' | 'faceLandmarker' | 'poseLandmarker' | 'objectDetection' | 'gestureRecognizer';
type MediaPipeDelegate = 'cpu' | 'gpu';
type MediaPipeStatus = 'loading' | 'ready' | 'error';

interface Landmark { x: number; y: number; z: number; visibility?: number; }
interface BoundingBox { left: number; top: number; width: number; height: number; }
interface Category { categoryName: string; score: number; }

interface MediaPipeError {
  code: 'MODEL_NOT_FOUND' | 'INFERENCE_FAILED' | 'INVALID_INPUT' | 'CAMERA_UNAVAILABLE' | 'UNSUPPORTED_PLATFORM';
  message: string;
}
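Because MediaPipeResult is discriminated on `task`, a switch narrows each branch to the matching fields, and a `never` check makes the compiler flag any unhandled task. The example below redeclares a trimmed-down copy of the types from this section so it is self-contained; summarize is a hypothetical function for illustration.

```typescript
interface Landmark { x: number; y: number; z: number; visibility?: number }
interface BoundingBox { left: number; top: number; width: number; height: number }
interface Category { categoryName: string; score: number }

// Trimmed-down copy of the README's union, for a self-contained example.
type MediaPipeResult =
  | { task: 'handLandmarker'; landmarks: Landmark[][]; handedness: ('Left' | 'Right')[] }
  | { task: 'faceLandmarker'; landmarks: Landmark[][]; blendshapes?: Category[][] }
  | { task: 'poseLandmarker'; landmarks: Landmark[][] }
  | { task: 'objectDetection'; detections: { label: string; score: number; boundingBox: BoundingBox }[] }
  | { task: 'gestureRecognizer'; gestures: Category[][]; landmarks: Landmark[][]; handedness: ('Left' | 'Right')[] };

function summarize(result: MediaPipeResult): string {
  switch (result.task) {
    case 'handLandmarker':
      return `${result.landmarks.length} hand(s): ${result.handedness.join(', ')}`;
    case 'faceLandmarker':
      return `${result.landmarks.length} face(s)`;
    case 'poseLandmarker':
      return `${result.landmarks.length} person(s)`;
    case 'objectDetection':
      return result.detections.map((d) => `${d.label} ${d.score.toFixed(2)}`).join(', ');
    case 'gestureRecognizer':
      // Each inner array is best-first; an empty one means no gesture.
      return result.gestures.map((g) => g[0]?.categoryName ?? 'None').join(', ');
    default: {
      // Exhaustiveness check: a new, unhandled task variant fails to compile here.
      const unreachable: never = result;
      return unreachable;
    }
  }
}
```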

Supported Tasks

| Task | task value | Model file | Returns |
|---|---|---|---|
| Hand Tracking | 'handLandmarker' | hand_landmarker.task | 21 landmarks per hand + handedness |
| Face Detection | 'faceLandmarker' | face_landmarker.task | 478 landmarks per face + optional 52 blendshapes |
| Pose Estimation | 'poseLandmarker' | pose_landmarker_lite.task (full and heavy variants also exist) | 33 landmarks per person + world-space 3D |
| Object Detection | 'objectDetection' | efficientdet_lite0.task | Label, confidence, bounding box per detection |
| Gesture Recognition | 'gestureRecognizer' | gesture_recognizer.task | Gesture label + score, 21 landmarks, handedness per hand |

Gesture recognition

const { results, CameraView, startCamera } = useMediaPipe({
  task: 'gestureRecognizer',
  modelPath: 'gesture_recognizer.task',
  maxResults: 2,
});

// results.gestures: Category[][] — classifications per hand, best first.
// Canned gestures: Closed_Fist, Open_Palm, Pointing_Up, Thumb_Down,
// Thumb_Up, Victory, ILoveYou, None
if (results?.task === 'gestureRecognizer') {
  const best = results.gestures[0]?.[0];
  if (best?.categoryName === 'Thumb_Up') console.log('👍');
}
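Each inner array in `gestures` is ordered best-first, so the top category per hand sits at index 0. The sketch below collects one gesture per hand above a minimum score and skips `None`. It assumes `gestures[i]` and `handedness[i]` describe the same hand; topGestures and its 0.6 default are hypothetical.

```typescript
interface Category { categoryName: string; score: number }

interface HandGesture { hand: 'Left' | 'Right'; gesture: string; score: number }

// Assumes gestures[i] and handedness[i] describe the same hand.
function topGestures(
  gestures: Category[][],
  handedness: ('Left' | 'Right')[],
  minScore = 0.6,
): HandGesture[] {
  const out: HandGesture[] = [];
  gestures.forEach((categories, i) => {
    const best = categories[0]; // lists are ordered best-first
    if (!best || best.categoryName === 'None' || best.score < minScore) return;
    const hand = handedness[i];
    if (hand === undefined) return; // arrays out of sync: skip this hand
    out.push({ hand, gesture: best.categoryName, score: best.score });
  });
  return out;
}
```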

Face blendshapes

Pass outputBlendshapes: true to receive 52 facial expression coefficients per face — ideal for driving avatars or detecting expressions:

const { results } = useMediaPipe({
  task: 'faceLandmarker',
  modelPath: 'face_landmarker.task',
  taskOptions: { outputBlendshapes: true },
});

if (results?.task === 'faceLandmarker' && results.blendshapes?.[0]) {
  const smile = results.blendshapes[0].find(b => b.categoryName === 'mouthSmileLeft');
  console.log('Smile intensity:', smile?.score);
}
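The snippet above reads only the left mouth corner. A slightly more robust smile signal averages `mouthSmileLeft` and `mouthSmileRight`, both part of MediaPipe's 52-coefficient blendshape set. This is a hypothetical sketch; the library's own isSmiling helper may use different blendshapes or thresholds, and the 0.5 default is an assumption.

```typescript
interface Category { categoryName: string; score: number }

// Score of a named blendshape, or 0 if it is absent.
function blendshapeScore(shapes: Category[], name: string): number {
  return shapes.find((b) => b.categoryName === name)?.score ?? 0;
}

// Average of both mouth corners, so a one-sided smirk scores lower.
function smileScore(shapes: Category[]): number {
  return (blendshapeScore(shapes, 'mouthSmileLeft') + blendshapeScore(shapes, 'mouthSmileRight')) / 2;
}

const isSmilingSketch = (shapes: Category[], threshold = 0.5): boolean =>
  smileScore(shapes) >= threshold;
```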

Runtime model download

Skip bundling models to keep your install size small — download and cache at runtime instead (requires expo-file-system):

import { downloadModel } from 'expo-mediapipe';

const modelPath = await downloadModel(
  'https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/latest/gesture_recognizer.task',
);
// Cached after first call — subsequent calls return instantly.
// Use as modelPath in any hook or function.
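The caching behavior described above can be sketched as promise memoization keyed by URL: concurrent calls share one download, and a failed download is evicted so a later call can retry. createModelCache and the injected fetchToDisk function are hypothetical; in the real package expo-file-system does the downloading, and a persistent cache would also check for an existing file on disk, which this in-memory sketch omits.

```typescript
// Resolves to a local file:// path for the given remote URL.
type Fetcher = (url: string) => Promise<string>;

function createModelCache(fetchToDisk: Fetcher) {
  const inFlight = new Map<string, Promise<string>>();
  return function download(url: string): Promise<string> {
    const cached = inFlight.get(url);
    if (cached) return cached; // same promise for repeat and concurrent calls
    const p = fetchToDisk(url).catch((err: unknown) => {
      inFlight.delete(url); // allow a later call to retry after a failure
      throw err;
    });
    inFlight.set(url, p);
    return p;
  };
}
```

Caching the promise instead of the resolved path is what lets two components that mount at the same time share a single download.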

Model Files

Download pre-trained .task model bundles from the official MediaPipe Solutions page.

Place .task files in your modelsDir and they will be bundled automatically. You can also load models at runtime from a file:// URI.

Example App

The example/ directory contains a full demo app showcasing every feature:

| Tab | Feature | What it demonstrates |
|-----|---------|---------------------|
| Hand Tracking | useMediaPipe + GPU + JSI + worklets | Real-time 21-point hand skeleton drawn on the UI thread via the JSI fast path; live FPS + JSI status badges |
| Gesture | gestureRecognizer | Thumbs up/down, victory, fist, etc. with live emoji feedback |
| Face | Face part toggles + blendshapes | 478-landmark face mesh with selectable parts and a live expression readout |
| Pose | Pose estimation | 33-point body skeleton (full model) with GPU acceleration |
| Objects | Object detection | Bounding boxes with labels and confidence scores |
| Multi | useMultiMediaPipe + per-task JSI | Hand + pose simultaneously on one camera, each with its own JSI slot |
| Video | processVideo | Pick a video from gallery, run frame-by-frame pose analysis |

All inference runs fully offline — model files are bundled as native assets via the Expo Config Plugin. No network connection required after install.

All camera tabs use useFocusEffect to start and stop the camera on tab switches, and each offers a front/back camera toggle and GPU delegation.

Running the example

cd example
npm install
npx expo prebuild --clean
npx expo run:android   # or run:ios

Face part selection

The face tab demonstrates selective landmark rendering. Each part renders all landmark dots in that region (not just outlines):

| Part | What renders | Approx. points |
|------|-------------|----------------|
| 'oval' | Face contour | ~37 dots |
| 'leftEye' | Eyelid, iris, brow | ~30 dots |
| 'rightEye' | Eyelid, iris, brow | ~30 dots |
| 'lips' | Outer + inner lip | ~40 dots |
| 'nose' | Bridge, tip, nostrils | ~30 dots |
| All selected | Full 478-point face mesh | 478 dots |

import { useState } from 'react';
import { ResultOverlay } from '@/components/mediapipe/ResultOverlay';
import type { FacePart } from '@/components/mediapipe/ResultOverlay';

// Show only eyes and lips (much faster than full 478-point mesh)
const [activeParts, setActiveParts] = useState<FacePart[]>(['leftEye', 'rightEye', 'lips']);

<ResultOverlay
  status={status}
  results={results}
  faceParts={activeParts}  // omit for full face mesh
/>

Selecting specific parts instead of the full mesh significantly improves rendering performance: roughly 30-40 dots per part (about 100 for eyes plus lips) instead of 478 per frame.

Advanced Usage

Direct native access

import { NativeCameraView, NativeModule } from 'expo-mediapipe';
import type { NativeCameraViewProps } from 'expo-mediapipe';

// NativeCameraView — full control over the native camera view
<NativeCameraView
  task="handLandmarker"
  modelPath="asset://models/hand_landmarker.task"
  maxResults={2}
  minConfidence={0.5}
  cameraFacing="back"
  delegate="gpu"
  isCameraRunning={true}
  onResults={(e) => console.log(e.nativeEvent)}
  onError={(e) => console.error(e.nativeEvent)}
  onStatusChange={(e) => console.log(e.nativeEvent.status)}
  style={{ flex: 1 }}
/>

// NativeModule — direct native function calls
const raw = await NativeModule.runOnImage(
  'objectDetection', 'asset://models/efficientdet_lite0.task',
  imageUri, 5, 0.4, 'gpu', '{}',
);

JSI Fast Path (experimental, Android)

For latency-critical drawing, skip events and React state entirely — pull the latest result synchronously as a Float32Array:

const { readLatest, CameraView, startCamera } = useMediaPipe({
  task: 'handLandmarker',
  modelPath: 'hand_landmarker.task',
});

// Keep the last drawn frame id outside the loop (e.g. in a ref).
let lastDrawnFrame = -1;

// In your render/draw loop:
const latest = readLatest(); // LatestResultBuffer | null
if (latest && latest.frameId !== lastDrawnFrame) {
  // latest.data: Float32Array laid out as [x0, y0, z0, x1, y1, z1, ...]
  // latest.perEntity: landmarks per hand/face/pose; latest.stride: components per landmark
  drawSkeleton(latest.data, latest.perEntity, latest.stride); // your own drawing function
  lastDrawnFrame = latest.frameId;
}

Pull-based: no per-frame events, no per-landmark object allocation, and a busy JS thread reads the freshest frame instead of draining a backlog. Currently Android-only — readLatest() returns null on iOS and web (where the event API remains the path), so the same code runs everywhere.
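The buffer layout described in the comments above implies a simple offset formula: landmark `j` of entity `e` starts at `(e * perEntity + j) * stride`. The sketch below decodes a buffer into nested landmark objects, assuming entities are stored contiguously and that x, y, z are the first three components of each stride; unpackLandmarks is hypothetical. It allocates objects, which the fast path exists to avoid, so it suits debugging; in a draw loop you would index `data` directly with the same arithmetic.

```typescript
interface Landmark { x: number; y: number; z: number }

// Offset of landmark j of entity e in the flat buffer: (e * perEntity + j) * stride
function unpackLandmarks(data: Float32Array, perEntity: number, stride: number): Landmark[][] {
  if (perEntity <= 0 || stride < 3) return [];
  const entityCount = Math.floor(data.length / (perEntity * stride));
  const entities: Landmark[][] = [];
  for (let e = 0; e < entityCount; e++) {
    const points: Landmark[] = [];
    for (let j = 0; j < perEntity; j++) {
      const o = (e * perEntity + j) * stride;
      points.push({ x: data[o], y: data[o + 1], z: data[o + 2] });
    }
    entities.push(points);
  }
  return entities;
}
```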

UI-thread drawing with worklets

The JSI buffer pairs with Reanimated + Skia to draw the skeleton entirely on the UI thread — React renders the overlay once, then never participates in the per-frame draw loop again. Scrolls, modals, GC pauses, and busy app logic on the JS thread no longer stutter the overlay, because the UI thread just paints the freshest landmarks it has.

The per-frame data path never touches React state or allocates per-landmark objects:

native result → JSI readLatest() (Float32Array) → Reanimated shared value
→ useDerivedValue worklet builds a Skia path → Canvas repaints (no React render)

The example app's WorkletLandmarkOverlay (example/components/mediapipe/) is a complete, copy-pasteable implementation. The wiring:

import { View, StyleSheet } from 'react-native';
import { useMediaPipe } from 'expo-mediapipe';
import { WorkletLandmarkOverlay, useWorkletFrame } from './WorkletLandmarkOverlay';
import { TASK_CONNECTIONS } from './ResultOverlay';

function HandTracker() {
  const { frame, push } = useWorkletFrame();

  const { CameraView, startCamera, readLatest } = useMediaPipe({
    task: 'handLandmarker',
    modelPath: 'hand_landmarker.task',
    // The event is only a tick — the landmark payload rides the JSI buffer.
    onResults: (r) => {
      const latest = readLatest();
      if (latest) {
        push({
          data: Array.from(latest.data),
          stride: latest.stride,
          perEntity: latest.perEntity,
          imgW: r.imageWidth ?? 0,
          imgH: r.imageHeight ?? 0,
          mirror: r.isFrontCamera ?? false,
        });
      }
    },
  });

  return (
    <View style={{ flex: 1 }}>
      <CameraView style={StyleSheet.absoluteFill} />
      <WorkletLandmarkOverlay frame={frame} connections={TASK_CONNECTIONS.handLandmarker} />
    </View>
  );
}

Projection tip: project landmarks against the overlay's measured onLayout box, not useWindowDimensions() — the window includes the header/tab bar that the camera view doesn't occupy, which shifts the skeleton off the subject. WorkletLandmarkOverlay already does this.
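The projection the tip describes can be sketched as follows, assuming FILL_CENTER means the camera frame is scaled to cover the view and cropped equally on both sides, and that imageWidth/imageHeight are already in display orientation. projectFillCenter is hypothetical; the library's own landmarkToPixel helper may take different arguments.

```typescript
interface Size { width: number; height: number }
interface Point { x: number; y: number }

// Maps a normalized landmark (0..1 in image space) to view pixels when the
// preview is scaled to cover the view (FILL_CENTER) and cropped symmetrically.
function projectFillCenter(
  lm: Point,
  image: Size,    // result imageWidth / imageHeight
  view: Size,     // overlay size measured via onLayout
  mirror = false, // true for the front camera
): Point {
  const scale = Math.max(view.width / image.width, view.height / image.height);
  const scaledW = image.width * scale;
  const scaledH = image.height * scale;
  const offsetX = (view.width - scaledW) / 2;  // <= 0 when cropped left/right
  const offsetY = (view.height - scaledH) / 2; // <= 0 when cropped top/bottom
  const nx = mirror ? 1 - lm.x : lm.x;
  return { x: nx * scaledW + offsetX, y: lm.y * scaledH + offsetY };
}
```

For example, a 100x100 frame in a 200x100 overlay is scaled 2x to 200x200 and cropped by 50 px at the top and bottom, so the image's vertical midpoint lands at y = 50.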

Requires react-native-reanimated and @shopify/react-native-skia in your app. Where the JSI fast path is unavailable (iOS, web), feed the worklet from the onResults event payload instead — same shared-value channel, the data just arrives as objects rather than a Float32Array.

AI Assistant Docs (MCP)

The package ships a built-in MCP server that serves this documentation to AI assistants — so Claude Code, Cursor, and other MCP clients answer expo-mediapipe questions from the real docs instead of guessing:

# Claude Code
claude mcp add expo-mediapipe -- npx -y expo-mediapipe mcp

// Cursor / other MCP clients
{
  "mcpServers": {
    "expo-mediapipe": { "command": "npx", "args": ["-y", "expo-mediapipe", "mcp"] }
  }
}

Tools: list_sections, get_section, search_docs. The served docs always match the installed package version, fully offline.

Platform Support

| Platform | Camera Backend | ML Backend | Min SDK |
|---|---|---|---|
| Android | CameraX | MediaPipe Tasks Vision 0.10+ | API 24 (Android 7.0) |
| iOS | AVFoundation | MediaPipe Tasks Vision 0.10+ | iOS 14.0 |
| Web | getUserMedia | @mediapipe/tasks-vision (WASM) | Modern browsers |

Web

The same hooks and functions work on Expo web with zero code changes — inference runs in-browser via Google's official WASM build (loaded from CDN at runtime, pinned to the bundled SDK version).

Web specifics:

Models: put your .task files in your project's public/models/ directory — the asset://models/<file> scheme maps to /models/<file> on web. Full URLs also work, and downloadModel(url) simply returns the URL (browsers handle caching).
Camera: uses getUserMedia; the browser shows its own permission prompt. cameraFacing maps to the facingMode constraint.
GPU: delegate: 'gpu' uses WebGL when available, falling back to CPU/WASM.
processVideo() seeks through the file with an off-screen <video> element; videos must be same-origin or CORS-enabled.
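The model-path mapping described in the list above can be sketched like this. resolveModelPath is hypothetical, and treating a bare filename as a bundled model (asset://models/<file> on native) is an assumption based on the Quick Start passing bare filenames.

```typescript
type Platform = 'ios' | 'android' | 'web';

const ASSET_PREFIX = 'asset://models/';

function resolveModelPath(modelPath: string, platform: Platform): string {
  // Full URLs and file:// paths are used as-is.
  if (/^(https?|file):\/\//.test(modelPath)) return modelPath;
  // Assumption: a bare filename refers to a bundled model.
  const fileName = modelPath.startsWith(ASSET_PREFIX)
    ? modelPath.slice(ASSET_PREFIX.length)
    : modelPath;
  // On web, bundled models are served from public/models/.
  return platform === 'web' ? `/models/${fileName}` : `${ASSET_PREFIX}${fileName}`;
}
```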

Offline Support

All inference runs fully offline. Model files are bundled into the native binary at build time via the Expo Config Plugin — no network connection is needed after installation.

Models are copied from your modelsDir to native assets during expo prebuild
The asset://models/<filename> URI scheme loads directly from the app bundle
For dynamic model loading, use file:// URIs with models downloaded to device storage

Requirements

Expo 49+
New Architecture enabled ("newArchEnabled": true in app.json)
Custom dev build (not Expo Go) — native code is required

Why a Development Build?

This package uses native MediaPipe SDKs (25-30MB per platform) that aren't included in Expo Go. You need a development build — this is standard for any Expo package with custom native code.

The development build experience is nearly identical to Expo Go:

Hot reload, shake menu, QR code scanning all work the same
Build once, then iterate with the same fast workflow
Use EAS Build to build in the cloud if you don't have Android Studio / Xcode

Quick setup with EAS

# Install EAS CLI
npm install -g eas-cli

# Configure (one time)
eas build:configure

# Build development client
eas build --profile development --platform android
# or: eas build --profile development --platform ios

# After installing the build on your device:
npx expo start --dev-client

Local build (no EAS account needed)

npx expo prebuild --clean
npx expo run:android   # or run:ios

Contributing

Contributions are welcome! Please open an issue first to discuss what you would like to change.

Fork the repository
Create your feature branch (git checkout -b feature/my-feature)
Commit your changes
Push to the branch (git push origin feature/my-feature)
Open a Pull Request
