expo-mediapipe
v0.4.1
Zero-setup bridge between Google MediaPipe Tasks Vision SDK and Expo / React Native.
Features
- 5 vision tasks: hand tracking, face detection, pose estimation, object detection, gesture recognition
- Live camera hook: useMediaPipe streams results at frame rate via a managed CameraView
- Multi-task inference: useMultiMediaPipe runs multiple tasks simultaneously on one camera feed
- Video processing: processVideo() runs frame-by-frame inference on video files
- One-shot image inference: detect() runs a single prediction on any image URI
- Gesture recognition: thumbs up/down, victory, open palm, fist, pointing, ILY out of the box
- Face blendshapes: 52 expression coefficients (smile, blink, jaw...) for avatar/AR apps
- Runtime model download: downloadModel() fetches and caches models on demand, keeping your app small
- GPU acceleration: optional delegate: 'gpu' for faster inference via Metal (iOS) / GPU delegate (Android)
- JSI fast path: readLatest() pulls results synchronously as a Float32Array, with no events and no per-landmark allocation (experimental, Android)
- Drop-in overlay: <MediaPipeOverlay> draws dots/skeletons/boxes for any task, zero extra deps; or pair the JSI buffer with Reanimated + Skia worklets for UI-thread drawing
- ~50 helper functions: geometric gesture recognition (no ML model), joint angles + rep counting, pinch strength, blendshape/expression helpers, smoothing, projection; all pure & tested (see the Helpers & Components Reference)
- Universal: same hooks run on iOS, Android, and web (WASM); write once, ship everywhere
- Per-task confidence tuning: separate detection, presence, and tracking thresholds per task
- Expo Config Plugin: auto-injects native dependencies and bundles model files
- TypeScript-first: fully typed results with discriminated unions, zero any
Documentation
- Helpers & Components Reference: every helper, hook, constant, and <MediaPipeOverlay> prop, with examples
- Recipes: copy-paste solutions for gesture triggers, pinch control, rep counter, blink detection, smoothing, zero-bundle setup, and more
- This README: install, quick start, full API reference, platform notes
- Or ask your AI assistant: claude mcp add expo-mediapipe -- npx -y expo-mediapipe mcp (see AI Assistant Docs)
Installation
npx expo install expo-mediapipe
Plugin configuration
Add the plugin to your app.json (or app.config.js):
{
  "expo": {
    "plugins": [
      ["expo-mediapipe", { "modelsDir": "./assets/models" }]
    ]
  }
}
The modelsDir option tells the config plugin where to find .task model files. They are copied into the native project at build time and become accessible via the asset://models/<filename> URI scheme.
Place your model files
your-project/
  assets/
    models/
      hand_landmarker.task
      face_landmarker.task
      pose_landmarker.task
      efficientdet_lite0.task
Then rebuild your dev client:
npx expo prebuild --clean
npx expo run:android # or run:ios
Quick Start
Live camera detection
import { Button, View } from 'react-native';
import { useMediaPipe } from 'expo-mediapipe';

export default function HandTracker() {
  const { results, status, startCamera, stopCamera, CameraView } = useMediaPipe({
    task: 'handLandmarker',
    modelPath: 'hand_landmarker.task',
    maxResults: 2,
    minConfidence: 0.5,
    cameraFacing: 'front',
    delegate: 'gpu',
    onResults: (r) => {
      if (r.task === 'handLandmarker' && r.landmarks.length > 0) {
        console.log('Hand detected!', r.handedness);
      }
    },
  });

  return (
    <View style={{ flex: 1 }}>
      <CameraView style={{ flex: 1 }} />
      <Button title="Start" onPress={startCamera} />
      <Button title="Stop" onPress={stopCamera} />
    </View>
  );
}
Multi-task (hands + pose simultaneously)
import { useMultiMediaPipe } from 'expo-mediapipe';

const { results, status, startCamera, CameraView } = useMultiMediaPipe({
  tasks: [
    { task: 'handLandmarker', modelPath: 'hand_landmarker.task', maxResults: 2 },
    { task: 'poseLandmarker', modelPath: 'pose_landmarker_lite.task' },
  ],
  cameraFacing: 'back',
  onResults: (all) => {
    const hands = all.handLandmarker; // HandLandmarkerResult | undefined
    const pose = all.poseLandmarker;  // PoseLandmarkerResult | undefined
  },
});
One-shot image detection
import { detect } from 'expo-mediapipe';

const result = await detect({
  task: 'objectDetection',
  modelPath: 'efficientdet_lite0.task',
  source: { type: 'image', uri: 'file:///path/to/photo.jpg' },
  maxResults: 5,
  minConfidence: 0.4,
  delegate: 'gpu',
});
Video processing
import { processVideo } from 'expo-mediapipe';

const frames = await processVideo({
  task: 'poseLandmarker',
  modelPath: 'pose_landmarker_lite.task',
  videoUri: 'file:///path/to/video.mp4',
  frameIntervalMs: 200, // extract a frame every 200ms
  delegate: 'gpu',
});

frames.forEach((f) => {
  console.log(`Frame at ${f.timestampMs}ms:`, f.result);
});
Drawing results
Don't hand-roll the coordinate math; drop in <MediaPipeOverlay>. It uses plain React Native Views (no extra deps), handles FILL_CENTER projection and front-camera mirroring for you, and works for every task:
import { StyleSheet, View } from 'react-native';
import { useMediaPipe, MediaPipeOverlay } from 'expo-mediapipe';

const { results, status, CameraView, startCamera } = useMediaPipe({
  task: 'handLandmarker',
  modelPath: 'hand_landmarker.task',
});

<View style={{ flex: 1 }}>
  <CameraView style={StyleSheet.absoluteFill} />
  <MediaPipeOverlay
    results={results}
    color="#00ff88" dotRadius={4} lineWidth={2}
    showDots showConnections
  />
</View>
| Prop | Type | Default | Description |
|---|---|---|---|
| results | MediaPipeResult \| null | required | Latest result from a hook. |
| cameraFacing | 'front' \| 'back' | 'back' | Mirror fallback when the result lacks isFrontCamera. |
| showDots / showConnections | boolean | true | Toggle dots / skeleton lines. |
| color | string | per-task | Dot & line color. |
| dotRadius / lineWidth | number | 3 / 2 | Sizing. |
| faceParts | FacePart[] | all | For faceLandmarker: restrict to 'oval' \| 'leftEye' \| 'rightEye' \| 'lips' \| 'nose'. |
| showObjectLabels | boolean | true | For objectDetection: label + score above each box. |
For UI-thread drawing of heavy meshes, see the worklet pattern.
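To make concrete what the overlay takes care of, here is a minimal sketch of a FILL_CENTER projection. It assumes FILL_CENTER means that the camera image is scaled uniformly until it covers the view, centered, and cropped. The name projectFillCenter and the Size and Point types are hypothetical and not part of the expo-mediapipe API; the library's own landmarkToPixel helper may differ in signature and behavior.

```typescript
interface Size { width: number; height: number; }
interface Point { x: number; y: number; }

// Map a normalized landmark (x, y in 0..1 of the camera image) to view pixels,
// assuming the image is scaled to cover the view and centered (center-crop).
function projectFillCenter(
  landmark: Point,
  image: Size,
  view: Size,
  mirror = false,
): Point {
  // Uniform scale that makes the image cover the whole view.
  const scale = Math.max(view.width / image.width, view.height / image.height);
  const scaledW = image.width * scale;
  const scaledH = image.height * scale;
  // Offsets are zero or negative: overflow is cropped equally on both sides.
  const offsetX = (view.width - scaledW) / 2;
  const offsetY = (view.height - scaledH) / 2;
  // Front-camera previews are mirrored horizontally.
  const nx = mirror ? 1 - landmark.x : landmark.x;
  return { x: nx * scaledW + offsetX, y: landmark.y * scaledH + offsetY };
}

// A 100x200 image in a 200x200 view is scaled by 2 and cropped vertically.
const projected = projectFillCenter(
  { x: 0.25, y: 0.5 },
  { width: 100, height: 200 },
  { width: 200, height: 200 },
);
console.log(projected); // { x: 50, y: 100 }
```

Passing mirror = true flips the x coordinate around the vertical center line, which is the adjustment a front-camera preview needs.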
Helper Functions
Pure, dependency-free utilities (also tree-shakeable) that turn raw landmarks into something useful. All are unit-tested and run on every platform.
Gestures (no ML model needed) — recognize from a plain handLandmarker result:
import { recognizeHandGesture, isPinching, pinchStrength } from 'expo-mediapipe';
const hand = results.landmarks[0];
recognizeHandGesture(hand); // 'Thumb_Up' | 'Victory' | 'Open_Palm' | ... | null
isPinching(hand); // boolean
pinchStrength(hand); // 0..1, e.g. to drive a slider or zoom
Pose / fitness:
import { elbowAngle, RepCounter } from 'expo-mediapipe';
const angle = elbowAngle(pose.landmarks[0], 'left'); // degrees, 180 = straight
const counter = new RepCounter({ downThreshold: 70, upThreshold: 160 });
counter.update(angle); // call per frame → counter.reps
Face:
import { isSmiling, isBlinking, topBlendshapes } from 'expo-mediapipe';
isSmiling(face); // needs taskOptions.outputBlendshapes
topBlendshapes(face, 3); // strongest expressions
Geometry & projection: landmarkToPixel, landmarkDistance, landmarkAngle, boundingBoxOf, boxIoU, centroid, mirrorLandmarks, …
Object detection: detectionsByLabel, filterByScore, nonMaxSuppression, highestConfidence.
Temporal: smoothLandmarks, OneEuroFilter (jitter), landmarkVelocity.
Hooks: useGesture, usePinch, useRepCounter, useSmoothedResults.
Constants & guards: HAND_LANDMARKS.INDEX_TIP, POSE_LANDMARKS.LEFT_ELBOW, MODEL_URLS, isHandResult().
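For intuition about the pose and fitness helpers, the following sketch shows one common way to compute a joint angle from three landmarks and to count repetitions with two thresholds (hysteresis). It is an illustration under assumptions, not the library's implementation: jointAngle and SimpleRepCounter are hypothetical names, and the real elbowAngle and RepCounter may differ in details such as which coordinates they use.

```typescript
interface Point3 { x: number; y: number; z: number; }

// Angle at vertex b, in degrees, between the segments b→a and b→c.
function jointAngle(a: Point3, b: Point3, c: Point3): number {
  const u = { x: a.x - b.x, y: a.y - b.y, z: a.z - b.z };
  const v = { x: c.x - b.x, y: c.y - b.y, z: c.z - b.z };
  const dot = u.x * v.x + u.y * v.y + u.z * v.z;
  const norm = Math.hypot(u.x, u.y, u.z) * Math.hypot(v.x, v.y, v.z);
  if (norm === 0) return 0; // degenerate case: two landmarks coincide
  // Clamp to guard against floating-point drift outside [-1, 1].
  const cos = Math.min(1, Math.max(-1, dot / norm));
  return (Math.acos(cos) * 180) / Math.PI;
}

// One rep = the angle drops below `down`, then rises above `up`.
// The gap between the two thresholds keeps jitter from double-counting.
class SimpleRepCounter {
  reps = 0;
  private isDown = false;
  private readonly down: number;
  private readonly up: number;

  constructor(down: number, up: number) {
    this.down = down;
    this.up = up;
  }

  update(angle: number): number {
    if (!this.isDown && angle < this.down) {
      this.isDown = true;
    } else if (this.isDown && angle > this.up) {
      this.isDown = false;
      this.reps += 1;
    }
    return this.reps;
  }
}

// Shoulder, elbow, wrist forming a right angle at the elbow.
const elbow = jointAngle({ x: 0, y: 1, z: 0 }, { x: 0, y: 0, z: 0 }, { x: 1, y: 0, z: 0 });
const repCounter = new SimpleRepCounter(70, 160);
for (const angle of [170, 60, 100, 65, 165]) repCounter.update(angle);
console.log(Math.round(elbow), repCounter.reps);
```

Feeding the counter one angle per frame means a wobble around a single threshold (for example 60, 100, 65 above) does not register as extra reps.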
// Zero-bundle setup: download the official model at runtime
import { downloadModel, MODEL_URLS } from 'expo-mediapipe';
const modelPath = await downloadModel(MODEL_URLS.handLandmarker);
API Reference
useMediaPipe(options)
React hook for real-time single-task camera inference.
UseMediaPipeOptions
| Option | Type | Default | Description |
|---|---|---|---|
| task | MediaPipeTask | required | 'handLandmarker' \| 'faceLandmarker' \| 'poseLandmarker' \| 'objectDetection' \| 'gestureRecognizer' |
| modelPath | string | required | Path to the .task model file. Accepts asset://models/<file>, file:///..., or a bare filename. |
| maxResults | number | 1 | Maximum number of detections to return (meaning is task-specific). |
| minConfidence | number | 0.5 | Minimum detection confidence in [0, 1]. Used as fallback for all thresholds. |
| cameraFacing | 'front' \| 'back' | 'back' | Which camera to use. |
| delegate | 'cpu' \| 'gpu' | 'cpu' | Hardware delegate for inference. GPU uses Metal (iOS) or GPU delegate (Android). |
| taskOptions | object | undefined | Per-task confidence thresholds. See Per-task thresholds. |
| onResults | (results: MediaPipeResult) => void | undefined | Called on every result frame — use for gesture logic or custom processing. |
| onError | (error: MediaPipeError) => void | undefined | Called when the native layer emits an error. |
| onPerformance | (stats: { fps: number }) => void | undefined | Called ~once per second with the measured result rate. |
UseMediaPipeReturn
| Property | Type | Description |
|---|---|---|
| results | MediaPipeResult \| null | Latest inference results, or null before the first frame. |
| status | MediaPipeStatus | 'loading' \| 'ready' \| 'error' |
| startCamera | () => void | Begin camera preview and inference. |
| stopCamera | () => void | Stop camera and inference. |
| pauseInference | () => void | Stop sending frames to MediaPipe; preview keeps running. |
| resumeInference | () => void | Resume inference after a pause. |
| readLatest | () => LatestResultBuffer \| null | Synchronous JSI pull of the newest result (Android, experimental). See JSI Fast Path. |
| CameraView | React.FC<{ style?: object }> | Drop-in camera preview component. |
Pass onPerformance: (s) => console.log(s.fps) to receive the measured result rate roughly once per second — handy for performance overlays and benchmarking.
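As an illustration of what a "measured result rate" can mean, the sketch below counts results and reports a rate once per window. It is only one plausible way to produce such a number; FpsMeter is a hypothetical class, and the library's internal measurement is not documented here.

```typescript
// Counts results and reports a rate once per reporting window.
class FpsMeter {
  private count = 0;
  private windowStart: number | null = null;
  private readonly windowMs: number;
  private readonly onReport: (stats: { fps: number }) => void;

  constructor(onReport: (stats: { fps: number }) => void, windowMs = 1000) {
    this.onReport = onReport;
    this.windowMs = windowMs;
  }

  // Call once per result with a monotonic timestamp in milliseconds.
  tick(nowMs: number): void {
    if (this.windowStart === null) {
      // The first result only opens the window; intervals are counted after it.
      this.windowStart = nowMs;
      return;
    }
    this.count += 1;
    const elapsed = nowMs - this.windowStart;
    if (elapsed >= this.windowMs) {
      this.onReport({ fps: (this.count * 1000) / elapsed });
      this.count = 0;
      this.windowStart = nowMs;
    }
  }
}

// Simulate a result every 100 ms for two seconds.
const reported: number[] = [];
const meter = new FpsMeter((s) => reported.push(s.fps));
for (let t = 0; t <= 2000; t += 100) meter.tick(t);
console.log(reported); // [ 10, 10 ]
```

In an app, the same shape of callback is what onPerformance hands you, so a meter like this is only needed if you want a rate for your own processing stage.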
useMultiMediaPipe(options)
React hook for running multiple tasks simultaneously on one camera feed.
UseMultiMediaPipeOptions
| Option | Type | Default | Description |
|---|---|---|---|
| tasks | MultiTaskConfig[] | required | Array of task configurations. Each has task, modelPath, maxResults?, minConfidence?, delegate?, taskOptions?. |
| cameraFacing | 'front' \| 'back' | 'back' | Which camera to use. |
| onResults | (results: Partial<Record<MediaPipeTask, MediaPipeResult>>) => void | undefined | Called when any task produces a new result. Receives all latest results keyed by task. |
| onError | (error: MediaPipeError) => void | undefined | Called on error from any task. |
| onPerformance | (stats: { fps: number }) => void | undefined | Called ~once per second with the result rate summed across tasks. |
UseMultiMediaPipeReturn
| Property | Type | Description |
|---|---|---|
| results | Partial<Record<MediaPipeTask, MediaPipeResult>> | Latest results per task. Access with results.handLandmarker, results.poseLandmarker, etc. |
| status | MediaPipeStatus | 'loading' \| 'ready' \| 'error' |
| startCamera / stopCamera | () => void | Camera controls. |
| pauseInference / resumeInference | () => void | Pause/resume inference while the preview keeps running. |
| CameraView | React.FC<{ style?: object }> | Shared camera preview. |
Performance note: Each task runs its own ML graph. Running 2-3 tasks simultaneously will increase CPU/GPU load and may reduce frame rate on lower-end devices. Test on your target hardware.
detect(options)
One-shot inference on a single image. Returns Promise<MediaPipeResult>.
Each call creates and destroys a TaskRunner internally. Model loading adds ~100-500 ms of latency. Do not call in a tight loop; use useMediaPipe for real-time inference.
DetectOptions
| Option | Type | Default | Description |
|---|---|---|---|
| task | MediaPipeTask | required | The vision task to run. |
| modelPath | string | required | Path to the .task model file. |
| source | { type: 'image', uri: string } | required | Image source. |
| maxResults | number | 1 | Maximum number of detections. |
| minConfidence | number | 0.5 | Minimum confidence threshold. |
| delegate | 'cpu' \| 'gpu' | 'cpu' | Hardware delegate. |
| taskOptions | object | undefined | Per-task confidence thresholds. |
processVideo(options)
Frame-by-frame inference on a video file. Returns Promise<VideoFrameResult[]>.
Uses MediaPipe's VIDEO running mode with temporal tracking between frames (more accurate than processing each frame independently).
ProcessVideoOptions
| Option | Type | Default | Description |
|---|---|---|---|
| task | MediaPipeTask | required | The vision task to run. |
| modelPath | string | required | Path to the .task model file. |
| videoUri | string | required | Video file URI (file://, content://). |
| frameIntervalMs | number | 100 | Extract a frame every N milliseconds. Lower = more frames, slower processing. |
| maxResults | number | 1 | Maximum detections per frame. |
| minConfidence | number | 0.5 | Minimum confidence threshold. |
| delegate | 'cpu' \| 'gpu' | 'cpu' | Hardware delegate. |
| taskOptions | object | undefined | Per-task confidence thresholds. |
VideoFrameResult
interface VideoFrameResult {
  result: MediaPipeResult; // Inference result for this frame
  timestampMs: number;     // Frame timestamp in milliseconds
  frameIndex: number;      // Zero-based frame index
}
Memory note: All results are collected in memory. For very long videos, use a larger frameIntervalMs to limit the number of frames processed.
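To size frameIntervalMs before processing a long video, a rough frame-count estimate helps. The sketch below assumes one frame is sampled at t = 0 and one every frameIntervalMs after that until the end of the video; the exact sampling rule is an assumption, and both function names are hypothetical.

```typescript
// Estimate how many frames will be sampled, assuming one frame at t = 0 and
// one every `frameIntervalMs` after that, up to the end of the video.
function estimateFrameCount(durationMs: number, frameIntervalMs: number): number {
  if (durationMs < 0 || frameIntervalMs <= 0) {
    throw new RangeError('durationMs must be >= 0 and frameIntervalMs must be > 0');
  }
  return Math.floor(durationMs / frameIntervalMs) + 1;
}

// Smallest whole-millisecond interval that keeps the estimate within
// `maxFrames`, never going below `minIntervalMs`.
function intervalForBudget(durationMs: number, maxFrames: number, minIntervalMs = 100): number {
  if (maxFrames < 1) throw new RangeError('maxFrames must be >= 1');
  const needed = Math.floor(durationMs / maxFrames) + 1;
  return Math.max(minIntervalMs, needed);
}

// A 60-second clip at the default 100 ms interval versus a 100-frame budget.
const defaultCount = estimateFrameCount(60_000, 100);
const budgetInterval = intervalForBudget(60_000, 100);
console.log(defaultCount, budgetInterval, estimateFrameCount(60_000, budgetInterval));
```

The result of intervalForBudget can be passed as frameIntervalMs; treat the frame count as an estimate, since the real extractor may round timestamps differently.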
Per-task Confidence Thresholds
Pass taskOptions to fine-tune confidence thresholds per task. These override minConfidence for specific stages:
useMediaPipe({
  task: 'handLandmarker',
  modelPath: 'hand_landmarker.task',
  minConfidence: 0.5, // fallback for any unset threshold
  taskOptions: {
    minDetectionConfidence: 0.7, // initial detection must be very confident
    minPresenceConfidence: 0.5,  // presence tracking can be looser
    minTrackingConfidence: 0.3,  // landmark tracking can be loosest
  },
});
| Threshold | Applies to | Description |
|---|---|---|
| minDetectionConfidence | hand, face, pose | Confidence for initial detection of a new subject. |
| minPresenceConfidence | hand, face, pose | Confidence that a previously detected subject is still present. |
| minTrackingConfidence | hand, face, pose | Confidence for landmark tracking between frames. |
Object detection uses only minConfidence (no sub-thresholds).
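The fallback rule described above (each unset threshold takes the value of minConfidence) can be sketched as a small pure function. This is a reading of the documented behavior, not the library's code; resolveThresholds and the two interfaces are hypothetical names.

```typescript
interface ThresholdOptions {
  minDetectionConfidence?: number;
  minPresenceConfidence?: number;
  minTrackingConfidence?: number;
}

interface ResolvedThresholds {
  detection: number;
  presence: number;
  tracking: number;
}

// Each stage uses its own threshold when set, otherwise minConfidence.
function resolveThresholds(
  minConfidence = 0.5,
  taskOptions: ThresholdOptions = {},
): ResolvedThresholds {
  return {
    detection: taskOptions.minDetectionConfidence ?? minConfidence,
    presence: taskOptions.minPresenceConfidence ?? minConfidence,
    tracking: taskOptions.minTrackingConfidence ?? minConfidence,
  };
}

// Only detection is overridden; presence and tracking fall back to 0.4.
console.log(resolveThresholds(0.4, { minDetectionConfidence: 0.7 }));
// { detection: 0.7, presence: 0.4, tracking: 0.4 }
```

Using ?? rather than || matters here: an explicit threshold of 0 is kept instead of being replaced by the fallback.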
Types
MediaPipeResult (discriminated union)
if (result.task === 'handLandmarker') {
  result.landmarks;      // Landmark[][]: one array per hand
  result.worldLandmarks; // Landmark[][]: metric 3D, optional
  result.handedness;     // ('Left' | 'Right')[]
}
if (result.task === 'faceLandmarker') {
  result.landmarks;      // Landmark[][]: one array per face
  result.blendshapes;    // Category[][]: 52 coefficients, when outputBlendshapes is set
}
if (result.task === 'poseLandmarker') {
  result.landmarks;      // Landmark[][]: one array per person
  result.worldLandmarks; // Landmark[][]: 3D world-space coordinates
}
if (result.task === 'objectDetection') {
  result.detections;     // { label: string; score: number; boundingBox: BoundingBox }[]
}
if (result.task === 'gestureRecognizer') {
  result.gestures;       // Category[][]: classifications per hand, best first
  result.landmarks;      // Landmark[][]: 21 per hand
  result.worldLandmarks; // Landmark[][]: metric 3D, optional
  result.handedness;     // ('Left' | 'Right')[]
}
All result variants also include optional metadata: imageWidth, imageHeight, isFrontCamera.
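Because the result is a discriminated union on task, a switch statement gives you narrowing plus a compile-time exhaustiveness check. The sketch below uses simplified local copies of the types so that it stands alone; SketchResult and describeResult are hypothetical, and in an app you would use the package's exported MediaPipeResult type instead.

```typescript
interface Landmark { x: number; y: number; z: number; visibility?: number; }
interface Category { categoryName: string; score: number; }

// Simplified local stand-in for the package's result union.
type SketchResult =
  | { task: 'handLandmarker'; landmarks: Landmark[][]; handedness: ('Left' | 'Right')[] }
  | { task: 'faceLandmarker'; landmarks: Landmark[][]; blendshapes?: Category[][] }
  | { task: 'poseLandmarker'; landmarks: Landmark[][] }
  | { task: 'objectDetection'; detections: { label: string; score: number }[] }
  | { task: 'gestureRecognizer'; gestures: Category[][]; landmarks: Landmark[][] };

function describeResult(result: SketchResult): string {
  switch (result.task) {
    case 'handLandmarker':
      return `${result.landmarks.length} hand(s): ${result.handedness.join(', ')}`;
    case 'faceLandmarker':
      return `${result.landmarks.length} face(s)`;
    case 'poseLandmarker':
      return `${result.landmarks.length} pose(s)`;
    case 'objectDetection':
      return `${result.detections.length} object(s)`;
    case 'gestureRecognizer':
      return `top gesture: ${result.gestures[0]?.[0]?.categoryName ?? 'None'}`;
    default: {
      // If a new task is added to the union, this assignment stops compiling.
      const unreachable: never = result;
      return unreachable;
    }
  }
}

console.log(describeResult({ task: 'handLandmarker', landmarks: [[]], handedness: ['Left'] }));
// 1 hand(s): Left
```

The never assignment in the default branch is what turns a forgotten task into a compile error rather than a silent fallthrough.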
Core Types
type MediaPipeTask = 'handLandmarker' | 'faceLandmarker' | 'poseLandmarker' | 'objectDetection' | 'gestureRecognizer';
type MediaPipeDelegate = 'cpu' | 'gpu';
type MediaPipeStatus = 'loading' | 'ready' | 'error';
interface Landmark { x: number; y: number; z: number; visibility?: number; }
interface BoundingBox { left: number; top: number; width: number; height: number; }
interface Category { categoryName: string; score: number; }
interface MediaPipeError {
  code: 'MODEL_NOT_FOUND' | 'INFERENCE_FAILED' | 'INVALID_INPUT' | 'CAMERA_UNAVAILABLE' | 'UNSUPPORTED_PLATFORM';
  message: string;
}
Supported Tasks
| Task | task value | Model file | Returns |
|---|---|---|---|
| Hand Tracking | 'handLandmarker' | hand_landmarker.task | 21 landmarks per hand + handedness |
| Face Detection | 'faceLandmarker' | face_landmarker.task | 478 landmarks per face + optional 52 blendshapes |
| Pose Estimation | 'poseLandmarker' | pose_landmarker.task | 33 landmarks per person + world-space 3D |
| Object Detection | 'objectDetection' | efficientdet_lite0.task | Label, confidence, bounding box per detection |
| Gesture Recognition | 'gestureRecognizer' | gesture_recognizer.task | Gesture label + score, 21 landmarks, handedness per hand |
Gesture recognition
const { results, CameraView, startCamera } = useMediaPipe({
  task: 'gestureRecognizer',
  modelPath: 'gesture_recognizer.task',
  maxResults: 2,
});

// results.gestures: Category[][], classifications per hand, best first.
// Canned gestures: Closed_Fist, Open_Palm, Pointing_Up, Thumb_Down,
// Thumb_Up, Victory, ILoveYou, None
if (results?.task === 'gestureRecognizer') {
  const best = results.gestures[0]?.[0];
  if (best?.categoryName === 'Thumb_Up') console.log('👍');
}
Face blendshapes
Pass outputBlendshapes: true to receive 52 facial expression coefficients per face — ideal for driving avatars or detecting expressions:
const { results } = useMediaPipe({
  task: 'faceLandmarker',
  modelPath: 'face_landmarker.task',
  taskOptions: { outputBlendshapes: true },
});

if (results?.task === 'faceLandmarker' && results.blendshapes?.[0]) {
  const smile = results.blendshapes[0].find(b => b.categoryName === 'mouthSmileLeft');
  console.log('Smile intensity:', smile?.score);
}
Runtime model download
Skip bundling models to keep your install size small; download and cache them at runtime instead (requires expo-file-system):
import { downloadModel } from 'expo-mediapipe';

const modelPath = await downloadModel(
  'https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/latest/gesture_recognizer.task',
);
// Cached after the first call; subsequent calls return instantly.
// Use as modelPath in any hook or function.
Model Files
Download pre-trained .task model bundles from the official MediaPipe Solutions page.
Place .task files in your modelsDir and they will be bundled automatically. You can also load models at runtime from a file:// URI.
Example App
The example/ directory contains a full demo app showcasing every feature:
| Tab | Feature | What it demonstrates |
|-----|---------|---------------------|
| Hand Tracking | useMediaPipe + GPU + JSI + worklets | Real-time 21-point hand skeleton drawn on the UI thread via the JSI fast path; live FPS + JSI status badges |
| Gesture | gestureRecognizer | Thumbs up/down, victory, fist, etc. with live emoji feedback |
| Face | Face part toggles + blendshapes | 478-landmark face mesh with selectable parts and a live expression readout |
| Pose | Pose estimation | 33-point body skeleton (full model) with GPU acceleration |
| Objects | Object detection | Bounding boxes with labels and confidence scores |
| Multi | useMultiMediaPipe + per-task JSI | Hand + pose simultaneously on one camera, each with its own JSI slot |
| Video | processVideo | Pick a video from gallery, run frame-by-frame pose analysis |
All inference runs fully offline — model files are bundled as native assets via the Expo Config Plugin. No network connection required after install.
All camera tabs use useFocusEffect for proper start/stop on tab switches, front/back camera toggle, and GPU delegation.
Running the example
cd example
npm install
npx expo prebuild --clean
npx expo run:android # or run:ios
Face part selection
The face tab demonstrates selective landmark rendering. Each part renders all landmark dots in that region (not just outlines):
| Part | What renders | Approx. points |
|------|-------------|----------------|
| 'oval' | Face contour | ~37 dots |
| 'leftEye' | Eyelid, iris, brow | ~30 dots |
| 'rightEye' | Eyelid, iris, brow | ~30 dots |
| 'lips' | Outer + inner lip | ~40 dots |
| 'nose' | Bridge, tip, nostrils | ~30 dots |
| All selected | Full 478-point face mesh | 478 dots |
import { ResultOverlay } from '@/components/mediapipe/ResultOverlay';
import type { FacePart } from '@/components/mediapipe/ResultOverlay';

// Show only eyes and lips (much faster than full 478-point mesh)
const [activeParts, setActiveParts] = useState<FacePart[]>(['leftEye', 'rightEye', 'lips']);

<ResultOverlay
  status={status}
  results={results}
  faceParts={activeParts} // omit for full face mesh
/>
Selecting specific parts instead of the full mesh significantly improves rendering performance: 30-40 dots per part vs 478 per frame.
Advanced Usage
Direct native access
import { NativeCameraView, NativeModule } from 'expo-mediapipe';
import type { NativeCameraViewProps } from 'expo-mediapipe';

// NativeCameraView: full control over the native camera view
<NativeCameraView
  task="handLandmarker"
  modelPath="asset://models/hand_landmarker.task"
  maxResults={2}
  minConfidence={0.5}
  cameraFacing="back"
  delegate="gpu"
  isCameraRunning={true}
  onResults={(e) => console.log(e.nativeEvent)}
  onError={(e) => console.error(e.nativeEvent)}
  onStatusChange={(e) => console.log(e.nativeEvent.status)}
  style={{ flex: 1 }}
/>

// NativeModule: direct native function calls
const raw = await NativeModule.runOnImage(
  'objectDetection', 'asset://models/efficientdet_lite0.task',
  imageUri, 5, 0.4, 'gpu', '{}',
);
JSI Fast Path (experimental, Android)
For latency-critical drawing, skip events and React state entirely and pull the latest result synchronously as a Float32Array:
const { readLatest, CameraView, startCamera } = useMediaPipe({
  task: 'handLandmarker',
  modelPath: 'hand_landmarker.task',
});

// In your render/draw loop:
const latest = readLatest(); // LatestResultBuffer | null
if (latest && latest.frameId !== lastDrawnFrame) {
  // latest.data: Float32Array, [x0,y0,z0, x1,y1,z1, ...]
  // latest.perEntity landmarks per hand/face/pose, latest.stride components each
  drawSkeleton(latest.data, latest.perEntity, latest.stride);
  lastDrawnFrame = latest.frameId;
}
Pull-based: no per-frame events, no per-landmark object allocation, and a busy JS thread reads the freshest frame instead of draining a backlog. Currently Android-only: readLatest() returns null on iOS and web (where the event API remains the path), so the same code runs everywhere.
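The flat buffer layout described above can be decoded with plain index arithmetic. The sketch below assumes the entities (hands, faces, or poses) are stored back to back, so the entity count is data.length / (perEntity × stride); that relationship and the name unpackEntities are assumptions for illustration, not documented API.

```typescript
interface Point3 { x: number; y: number; z: number; }

// Decode a flat [x0,y0,z0, x1,y1,z1, ...] buffer into one point list per entity.
// Assumes entities are stored back to back, `perEntity` landmarks each,
// with `stride` floats per landmark (x, y, z first).
function unpackEntities(data: Float32Array, perEntity: number, stride: number): Point3[][] {
  if (perEntity <= 0 || stride < 3) return [];
  const floatsPerEntity = perEntity * stride;
  const entityCount = Math.floor(data.length / floatsPerEntity);
  const entities: Point3[][] = [];
  for (let e = 0; e < entityCount; e++) {
    const points: Point3[] = [];
    for (let i = 0; i < perEntity; i++) {
      const base = e * floatsPerEntity + i * stride;
      points.push({ x: data[base], y: data[base + 1], z: data[base + 2] });
    }
    entities.push(points);
  }
  return entities;
}

// Two entities with two landmarks each, three floats per landmark.
const sample = new Float32Array([0.5, 0.25, 0, 1, 0.75, 0, 0.125, 0.5, 0, 0, 1, 0]);
const decoded = unpackEntities(sample, 2, 3);
console.log(decoded.length, decoded[1][0]); // 2 { x: 0.125, y: 0.5, z: 0 }
```

This helper allocates an object per landmark, so it suits logging and debugging; a per-frame draw loop would more likely use the same base-index arithmetic directly on the buffer to keep the allocation-free property.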
UI-thread drawing with worklets
The JSI buffer pairs with Reanimated + Skia to draw the skeleton entirely on the UI thread: React renders the overlay once, then never participates in the per-frame draw loop again. Scrolls, modals, GC pauses, and busy app logic on the JS thread no longer stutter the overlay, because the UI thread just paints the freshest landmarks it has.
The data path is fully allocation-free per frame:
native result → JSI readLatest() (Float32Array) → Reanimated shared value
  → useDerivedValue worklet builds a Skia path → Canvas repaints (no React render)
The example app's WorkletLandmarkOverlay (example/components/mediapipe/) is a complete, copy-pasteable implementation. The wiring:
import { StyleSheet, View } from 'react-native';
import { useMediaPipe } from 'expo-mediapipe';
import { WorkletLandmarkOverlay, useWorkletFrame } from './WorkletLandmarkOverlay';
import { TASK_CONNECTIONS } from './ResultOverlay';

function HandTracker() {
  const { frame, push } = useWorkletFrame();
  const { CameraView, startCamera, readLatest } = useMediaPipe({
    task: 'handLandmarker',
    modelPath: 'hand_landmarker.task',
    // The event is only a tick; the landmark payload rides the JSI buffer.
    onResults: (r) => {
      const latest = readLatest();
      if (latest) {
        push({
          // Array.from copies the Float32Array into a plain array for the shared value.
          data: Array.from(latest.data),
          stride: latest.stride,
          perEntity: latest.perEntity,
          imgW: r.imageWidth ?? 0,
          imgH: r.imageHeight ?? 0,
          mirror: r.isFrontCamera ?? false,
        });
      }
    },
  });

  return (
    <View style={{ flex: 1 }}>
      <CameraView style={StyleSheet.absoluteFill} />
      <WorkletLandmarkOverlay frame={frame} connections={TASK_CONNECTIONS.handLandmarker} />
    </View>
  );
}
Projection tip: project landmarks against the overlay's measured onLayout box, not useWindowDimensions(). The window includes the header/tab bar that the camera view doesn't occupy, which shifts the skeleton off the subject. WorkletLandmarkOverlay already does this.
Requires react-native-reanimated and @shopify/react-native-skia in your app. Where the JSI fast path is unavailable (iOS, web), feed the worklet from the onResults event payload instead: it uses the same shared-value channel, and the data just arrives as objects rather than a Float32Array.
AI Assistant Docs (MCP)
The package ships a built-in MCP server that serves this documentation to AI assistants, so Claude Code, Cursor, and other MCP clients answer expo-mediapipe questions from the real docs instead of guessing:
# Claude Code
claude mcp add expo-mediapipe -- npx -y expo-mediapipe mcp
// Cursor / other MCP clients
{
  "mcpServers": {
    "expo-mediapipe": { "command": "npx", "args": ["-y", "expo-mediapipe", "mcp"] }
  }
}
Tools: list_sections, get_section, search_docs. The served docs always match the installed package version, fully offline.
Platform Support
| Platform | Camera Backend | ML Backend | Min SDK |
|---|---|---|---|
| Android | CameraX | MediaPipe Tasks Vision 0.10+ | API 24 (Android 7.0) |
| iOS | AVFoundation | MediaPipe Tasks Vision 0.10+ | iOS 14.0 |
| Web | getUserMedia | @mediapipe/tasks-vision (WASM) | Modern browsers |
Web
The same hooks and functions work on Expo web with zero code changes. Inference runs in-browser via Google's official WASM build (loaded from CDN at runtime, pinned to the bundled SDK version).
Web specifics:
- Models: put your .task files in your project's public/models/ directory; the asset://models/<file> scheme maps to /models/<file> on web. Full URLs also work, and downloadModel(url) simply returns the URL (browsers handle caching).
- Camera: uses getUserMedia; the browser shows its own permission prompt. cameraFacing maps to the facingMode constraint.
- GPU: delegate: 'gpu' uses WebGL when available, falling back to CPU/WASM.
- processVideo() seeks through the file with an off-screen <video> element; videos must be same-origin or CORS-enabled.
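The model path mapping on web can be pictured as a small resolver. The sketch below follows the rules stated in the list above for asset:// paths and full URLs; treating a bare filename the same way as asset://models/<file> is an assumption, and resolveWebModelUrl is a hypothetical name rather than an exported function.

```typescript
const ASSET_PREFIX = 'asset://models/';

// Resolve a modelPath to the URL a browser would fetch.
function resolveWebModelUrl(modelPath: string): string {
  // Full URLs pass through unchanged.
  if (/^https?:\/\//.test(modelPath)) return modelPath;
  // asset://models/<file> maps to /models/<file>, served from public/models/.
  if (modelPath.startsWith(ASSET_PREFIX)) {
    return `/models/${modelPath.slice(ASSET_PREFIX.length)}`;
  }
  // Assumption: a bare filename is treated like asset://models/<file>.
  return `/models/${modelPath}`;
}

console.log(resolveWebModelUrl('asset://models/hand_landmarker.task'));
// /models/hand_landmarker.task
```

A resolver like this is mainly useful for checking in the browser's network tab that the file you placed in public/models/ is the one being requested.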
Offline Support
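On iOS and Android, inference with bundled models runs fully offline. Model files are bundled into the native binary at build time via the Expo Config Plugin, so no network connection is needed after installation. (On web, the WASM runtime is loaded from a CDN, as described in the Web section.)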
- Models are copied from your modelsDir to native assets during expo prebuild
- The asset://models/<filename> URI scheme loads directly from the app bundle
- For dynamic model loading, use file:// URIs with models downloaded to device storage
Requirements
- Expo 49+
- New Architecture enabled ("newArchEnabled": true in app.json)
- Custom dev build (not Expo Go); native code is required
Why a Development Build?
This package uses native MediaPipe SDKs (25-30MB per platform) that aren't included in Expo Go, so you need a development build. This is standard for any Expo package with custom native code.
The development build experience is nearly identical to Expo Go:
- Hot reload, shake menu, QR code scanning all work the same
- Build once, then iterate with the same fast workflow
- Use EAS Build to build in the cloud if you don't have Android Studio / Xcode
Quick setup with EAS
# Install EAS CLI
npm install -g eas-cli
# Configure (one time)
eas build:configure
# Build development client
eas build --profile development --platform android
# or: eas build --profile development --platform ios
# After installing the build on your device:
npx expo start --dev-client
Local build (no EAS account needed)
npx expo prebuild --clean
npx expo run:android # or run:ios
Contributing
Contributions are welcome! Please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/my-feature)
- Commit your changes
- Push to the branch (git push origin feature/my-feature)
- Open a Pull Request
License
MIT © 2026 Ayush Jadaun
