@localmode/mediapipe
v2.0.0
MediaPipe Tasks provider for @localmode - hand/pose/face landmarks, gesture recognition, audio classification, language detection, and more via Google's on-device WASM runtime
MediaPipe Tasks provider for LocalMode -- run Google's on-device perception models in the browser. Hand, pose, and face landmark detection, gesture recognition, audio classification, language detection, and more, all running entirely on-device via WebAssembly.
Wraps @mediapipe/tasks-vision, @mediapipe/tasks-audio, and @mediapipe/tasks-text in a single unified LocalMode provider. Privacy is the point: camera frames, microphone audio, and text never leave the browser -- the only network requests are the one-time model and WASM downloads.
Features
- 13 curated models -- landmarks, gestures, classification, detection, segmentation, embeddings, and language detection, all verified against Google's CDN
- Real-time streaming -- hand, pose, face, and gesture trackers run live over a <video> element at 30-60fps
- Universal browser support -- pure WebAssembly + WebGL; no WebGPU required
- Tiny models -- most under 10MB; face detection and selfie segmentation are ~250KB
- Unified LocalMode interface -- landmark tasks use detectHands(), detectPose(), etc.; classification/detection/embedding tasks reuse the existing core functions
- AbortSignal cancellation on every single-frame function
- GPU or CPU delegate, configurable per provider or per model
Installation
```bash
pnpm add @localmode/mediapipe @localmode/core
```

The @mediapipe/tasks-* dependencies are installed automatically. The WASM runtime loads from the jsDelivr CDN by default -- set wasmBasePath to self-host it for fully offline apps.
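One way to self-host is to copy each runtime out of node_modules as a build step. The script below is a minimal sketch, not part of this package: copyMediaPipeWasm is a hypothetical helper, and it assumes each @mediapipe/tasks-* package keeps its runtime in a wasm/ folder (the folder the jsDelivr URLs point at) and that your public/ directory is served at the site root, so the copies end up at /wasm/tasks-vision, /wasm/tasks-audio, and /wasm/tasks-text. Note that pnpm does not hoist transitive dependencies into the top-level node_modules, so either add the @mediapipe/tasks-* packages as direct dependencies or point the script at the directory where they are actually installed.

```typescript
import { cpSync, existsSync, mkdirSync } from 'node:fs';
import { join } from 'node:path';

// Runtime packages to copy. Assumption: each keeps its WASM runtime in a `wasm/` folder.
const MEDIAPIPE_RUNTIMES = ['tasks-vision', 'tasks-audio', 'tasks-text'] as const;

// Copy each installed runtime to <destRoot>/<package name>; returns the folders written.
function copyMediaPipeWasm(nodeModulesDir: string, destRoot: string): string[] {
  mkdirSync(destRoot, { recursive: true });
  const written: string[] = [];
  for (const name of MEDIAPIPE_RUNTIMES) {
    const src = join(nodeModulesDir, '@mediapipe', name, 'wasm');
    if (!existsSync(src)) continue; // not installed at this path (see the pnpm note), skip it
    const dest = join(destRoot, name);
    cpSync(src, dest, { recursive: true });
    written.push(dest);
  }
  return written;
}

// Example build step (layout is an assumption): node_modules -> public/wasm/tasks-*
// copyMediaPipeWasm('node_modules', join('public', 'wasm'));
```

With the copies in place, the object form of wasmBasePath (see Provider Configuration) can point each runtime at its folder.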
Quick Start
Single-frame detection
```typescript
import { detectHands } from '@localmode/core';
import { mediapipe } from '@localmode/mediapipe';

// imageBlob: the input image you already have (a Blob in this example)
const { hands } = await detectHands({
  model: mediapipe.handLandmarker(),
  image: imageBlob,
  numHands: 2,
});

for (const hand of hands) {
  console.log(`${hand.handedness} hand -- ${hand.landmarks.length} landmarks`);
}
```

Real-time streaming
```typescript
import { mediapipe } from '@localmode/mediapipe';

// videoElement: a playing <video> (e.g. a webcam stream); drawHands: your own render function
const tracker = mediapipe.createHandTracker({
  video: videoElement,
  numHands: 2,
  onResults: (hands, timestampMs) => drawHands(hands),
});

await tracker.start();

// later
tracker.stop();
await tracker.close();
```

Tasks
The provider exposes a factory method per task. Landmark and gesture tasks use new core functions; the rest reuse standard LocalMode interfaces.
| Method | Interface | Core function |
| ------------------------------- | -------------------------- | ------------------------- |
| mediapipe.handLandmarker() | HandLandmarkModel | detectHands() |
| mediapipe.poseLandmarker() | PoseLandmarkModel | detectPose() |
| mediapipe.faceLandmarker() | FaceLandmarkModel | detectFaceLandmarks() |
| mediapipe.faceDetector() | FaceDetectionModel | detectFace() |
| mediapipe.gestureRecognizer() | GestureRecognitionModel | recognizeGesture() |
| mediapipe.imageClassifier() | ImageClassificationModel | classifyImage() |
| mediapipe.objectDetector() | ObjectDetectionModel | detectObjects() |
| mediapipe.imageSegmenter() | SegmentationModel | segmentImage() |
| mediapipe.imageEmbedder() | ImageFeatureModel | extractImageFeatures() |
| mediapipe.audioClassifier() | AudioClassificationModel | classifyAudio() |
| mediapipe.textEmbedder() | EmbeddingModel | embed() / embedMany() |
| mediapipe.languageDetector() | LanguageDetectionModel | detectLanguage() |
| mediapipe.textClassifier(modelPath) | ClassificationModel | classify() |
mediapipe.textClassifier() requires an explicit URL to a custom-trained .tflite model (for example, one built with MediaPipe Model Maker); the built-in catalog has no default text classifier. Calling it without a path throws a ValidationError.
Disposing Model Instances
Individual model instances have a close() method for releasing WASM resources when you are done with them:
```typescript
import { mediapipe } from '@localmode/mediapipe';

const model = mediapipe.handLandmarker();
// ... use model ...
model.close(); // Release WASM resources
```

This applies to all model instances created via factory methods, not just streaming trackers. Call close() once a model is no longer needed to free its memory.
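To make sure close() runs even when inference throws, you can scope a model's lifetime with try/finally. This is a minimal sketch; withModel and Closable are hypothetical names, not package exports, and since this README does not say whether a model's close() is synchronous, the helper awaits it either way.

```typescript
// Anything with a close() method, e.g. a model from a mediapipe.*() factory.
interface Closable {
  close(): void | Promise<void>;
}

// Create a model, run `fn` with it, and always release it, even if `fn` throws.
async function withModel<M extends Closable, T>(
  create: () => M,
  fn: (model: M) => T | Promise<T>,
): Promise<T> {
  const model = create();
  try {
    return await fn(model);
  } finally {
    await model.close(); // runs on success and on error
  }
}

// Usage with this package (not run here):
// const { hands } = await withModel(
//   () => mediapipe.handLandmarker(),
//   (model) => detectHands({ model, image: imageBlob }),
// );
```

The trade-off is that the model is reloaded on every call, so for repeated inference keep one instance alive and close it when the feature is torn down.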
Model Catalog
MEDIAPIPE_MODELS ships 13 curated models, all verified against storage.googleapis.com.
| Catalog ID | Model | Domain | Size |
| ---------------------- | ------------------------------------------ | ------ | ------ |
| hand_landmarker | Hand Landmarker | vision | 7.8MB |
| pose_landmarker | Pose Landmarker (Lite) | vision | 5.8MB |
| pose_landmarker_full | Pose Landmarker (Full) | vision | 9.4MB |
| face_landmarker | Face Landmarker (478-point mesh) | vision | 3.8MB |
| face_detector | Face Detector (BlazeFace) | vision | 230KB |
| gesture_recognizer | Gesture Recognizer | vision | 8.4MB |
| image_classifier | Image Classifier (EfficientNet-Lite0) | vision | 18.6MB |
| object_detector | Object Detector (EfficientDet-Lite0) | vision | 7.3MB |
| image_segmenter | Image Segmenter (Selfie) | vision | 250KB |
| image_embedder | Image Embedder (MobileNet-V3 Small) | vision | 4.1MB |
| audio_classifier | Audio Classifier (YAMNet, 521 categories) | audio | 4.1MB |
| language_detector | Language Detector (110 languages) | text | 315KB |
| text_embedder | Text Embedder (Universal Sentence Encoder) | text | 6.1MB |
Each factory uses its catalog default. Pass a catalog ID, a direct URL, or a modelPath setting to override:
```typescript
import { mediapipe } from '@localmode/mediapipe';

const full = mediapipe.poseLandmarker('pose_landmarker_full');
const custom = mediapipe.handLandmarker('https://your-cdn.com/hand_landmarker.task');
```

Streaming API
Four streaming trackers run MediaPipe vision tasks in VIDEO mode over a <video> element, invoking a callback once per processed frame (up to ~60fps):
| Factory | onResults payload |
| ---------------------------------- | -------------------------------------------------- |
| mediapipe.createHandTracker() | (hands: HandLandmarkResultItem[], timestampMs) |
| mediapipe.createPoseTracker() | (poses: PoseLandmarkResultItem[], timestampMs) |
| mediapipe.createFaceTracker() | (faces: FaceLandmarkResultItem[], timestampMs) |
| mediapipe.createGestureTracker() | (gestures: GestureResultItem[], timestampMs) |
Each returns a TrackerInstance:
```typescript
interface TrackerInstance {
  start(): Promise<void>; // load model + begin frame loop
  stop(): void; // pause loop, keep model loaded
  close(): Promise<void>; // stop and dispose the MediaPipe task
  readonly isRunning: boolean;
}
```

```typescript
import { mediapipe } from '@localmode/mediapipe';

// videoElement and updateAvatar are your own: a playing <video> and an avatar-rigging function
const tracker = mediapipe.createFaceTracker({
  video: videoElement,
  numFaces: 1,
  outputBlendshapes: true,
  onResults: (faces) => updateAvatar(faces[0]?.blendshapes),
  onError: (err) => console.error(err),
});

await tracker.start();
```

Streaming trackers report per-frame errors through onError instead of throwing.
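For intuition about what a tracker does each frame, here is a simplified VIDEO-mode frame loop in the style of Google's MediaPipe web samples. It is a sketch, not this package's implementation: it skips detection when the video's currentTime has not advanced (no new frame), keeps timestamps strictly increasing because MediaPipe's VIDEO mode expects monotonically increasing timestamps, and routes errors from detect() or onResults to onError. createFrameLoop, VideoLike, and the detect callback (a stand-in for a task's detectForVideo()) are hypothetical; the now and schedule hooks exist so the loop can be driven without a browser.

```typescript
// Hypothetical minimal view of a <video> element: only currentTime (seconds) is read.
interface VideoLike {
  readonly currentTime: number;
}

interface FrameLoopOptions<R> {
  video: VideoLike;
  detect: (video: VideoLike, timestampMs: number) => R; // e.g. a detectForVideo() wrapper
  onResults: (results: R, timestampMs: number) => void;
  onError?: (err: unknown) => void;
  now?: () => number; // clock in ms; defaults to performance.now()
  schedule?: (next: () => void) => void; // defaults to requestAnimationFrame
}

function createFrameLoop<R>(opts: FrameLoopOptions<R>) {
  const g = globalThis as unknown as { requestAnimationFrame?: (cb: () => void) => number };
  const now = opts.now ?? (() => performance.now());
  const schedule =
    opts.schedule ??
    ((next: () => void) => {
      if (g.requestAnimationFrame) g.requestAnimationFrame(next);
      else setTimeout(next, 16); // non-browser fallback, roughly 60fps
    });

  let running = false;
  let generation = 0; // invalidates callbacks scheduled before a stop()/start() cycle
  let lastVideoTime = -1;
  let lastTimestampMs = -Infinity;

  // Process at most one new video frame; returns true if detect() was called.
  function tick(): boolean {
    const t = opts.video.currentTime;
    if (t === lastVideoTime) return false; // no new frame since the last call
    lastVideoTime = t;
    // Strictly increasing timestamps: nudge forward by 1ms if the clock stalls.
    const ts = Math.max(now(), lastTimestampMs + 1);
    lastTimestampMs = ts;
    try {
      opts.onResults(opts.detect(opts.video, ts), ts);
    } catch (err) {
      opts.onError?.(err); // report per-frame errors instead of throwing
    }
    return true;
  }

  function loop(gen: number): void {
    if (!running || gen !== generation) return;
    tick();
    schedule(() => loop(gen));
  }

  return {
    start(): void {
      if (running) return;
      running = true;
      generation += 1;
      loop(generation);
    },
    stop(): void {
      running = false;
    },
    get isRunning(): boolean {
      return running;
    },
    tick,
  };
}
```

The generation counter keeps a quick stop()/start() from leaving two loops running, since the callback scheduled before stop() would otherwise see running === true again.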
Provider Configuration
```typescript
import { createMediaPipe } from '@localmode/mediapipe';

const myMediaPipe = createMediaPipe({
  delegate: 'CPU', // 'GPU' (default) | 'CPU'
  wasmBasePath: '/wasm/mediapipe', // self-host the WASM runtime
});
```

wasmBasePath also accepts an object to set the vision, audio, and text runtime paths individually:
```typescript
createMediaPipe({
  wasmBasePath: {
    vision: '/wasm/tasks-vision',
    audio: '/wasm/tasks-audio',
    text: '/wasm/tasks-text',
  },
});
```

Per-model settings (modelPath, delegate, wasmBasePath) override provider defaults for a single model.
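One way to picture how provider defaults, per-model overrides, and the two wasmBasePath forms combine is the resolution sketch below. It is an assumption about the semantics described above, not the package's code: resolveSettings and wasmPathFor are hypothetical, the string form is assumed to apply to every runtime, and an undefined result stands for the package default (the jsDelivr runtime, or the catalog model).

```typescript
type Delegate = 'GPU' | 'CPU';
type Runtime = 'vision' | 'audio' | 'text';
type WasmBasePath = string | Partial<Record<Runtime, string>>;

interface ProviderSettings {
  delegate?: Delegate;
  wasmBasePath?: WasmBasePath;
}

interface ModelSettings extends ProviderSettings {
  modelPath?: string;
}

interface ResolvedSettings {
  delegate: Delegate;
  wasmBasePath?: string; // undefined = default CDN runtime
  modelPath?: string; // undefined = catalog default
}

// Read one runtime's path from either form of wasmBasePath.
function wasmPathFor(runtime: Runtime, base?: WasmBasePath): string | undefined {
  if (base === undefined) return undefined;
  return typeof base === 'string' ? base : base[runtime];
}

// Per-model values win; anything the model leaves unset falls back to the provider.
function resolveSettings(
  runtime: Runtime,
  provider: ProviderSettings,
  model: ModelSettings = {},
): ResolvedSettings {
  return {
    delegate: model.delegate ?? provider.delegate ?? 'GPU',
    wasmBasePath:
      wasmPathFor(runtime, model.wasmBasePath) ?? wasmPathFor(runtime, provider.wasmBasePath),
    modelPath: model.modelPath,
  };
}
```

Resolving per field (rather than replacing the whole settings object) is what lets a model override only its delegate while still inheriting the provider's self-hosted runtime path.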
Browser Compatibility
MediaPipe Tasks runs on pure WebAssembly with a WebGL GPU delegate -- it does not require WebGPU.
| Browser | WASM | WebGL (GPU delegate) |
| ----------- | ---- | -------------------- |
| Chrome 80+ | Yes | Yes |
| Edge 80+ | Yes | Yes |
| Firefox 75+ | Yes | Yes |
| Safari 14+ | Yes | Yes |
If the WebGL GPU delegate is unavailable, set delegate: 'CPU' to fall back to CPU inference.
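If you would rather pick the delegate at startup than hard-code it, a cheap feature probe is one option. This sketch is hypothetical (hasWebGL and pickDelegate are not package exports); it assumes a WebGL2 or WebGL context is what the GPU delegate needs, which this README does not specify, and a successful probe does not guarantee the delegate will initialize, so keep an error path that retries with 'CPU'.

```typescript
type Delegate = 'GPU' | 'CPU';

// Minimal structural types so this compiles without the DOM lib.
interface CanvasLike {
  getContext(contextId: string): unknown;
}
interface ProbeGlobals {
  OffscreenCanvas?: new (width: number, height: number) => CanvasLike;
  document?: { createElement(tagName: 'canvas'): CanvasLike };
}

// Returns a throwaway canvas, or undefined where none can be created (e.g. Node).
function createProbeCanvas(): CanvasLike | undefined {
  const g = globalThis as unknown as ProbeGlobals;
  try {
    if (g.OffscreenCanvas) return new g.OffscreenCanvas(1, 1);
    if (g.document) return g.document.createElement('canvas');
  } catch {
    // Canvas creation can fail in locked-down environments; treat as "no WebGL".
  }
  return undefined;
}

// True if any of the given WebGL context types can be created.
function hasWebGL(contextIds: readonly string[] = ['webgl2', 'webgl']): boolean {
  const canvas = createProbeCanvas();
  if (!canvas) return false;
  return contextIds.some((id) => {
    try {
      return canvas.getContext(id) != null;
    } catch {
      return false;
    }
  });
}

function pickDelegate(probe: () => boolean = hasWebGL): Delegate {
  return probe() ? 'GPU' : 'CPU';
}

// Usage (not run here): const mp = createMediaPipe({ delegate: pickDelegate() });
```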
Concurrent audio + vision. The MediaPipe audio and vision WASM runtimes can conflict if run concurrently in the same thread (mediapipe#4737). If your app uses audio classification and a vision task at the same time, run one of them in a Web Worker so each runtime has its own thread. Sequential use is unaffected.
Choosing a LocalMode Vision/Perception Provider
| Provider | When to use |
| ------------------------- | -------------------------------------------------------------------------------------------- |
| @localmode/mediapipe | Real-time human perception -- hand/pose/face landmarks, gestures, live video trackers |
| @localmode/transformers | Broader catalog of pre-trained ONNX models -- captioning, OCR, summarization, classification |
| @localmode/webllm | WebGPU LLM inference for text generation |
| @localmode/wllama | GGUF LLM inference on pure WASM |
| @localmode/litert | First-party Google .litertlm LLM runtime |
Utility Exports
| Function | Description |
|----------|-------------|
| getModelEntry(id) | Look up a MediaPipeModelEntry from the built-in catalog by its MediaPipeModelId |
| resolveModelUrl(idOrUrl) | Resolve a catalog ID or direct URL to the final model download URL |
```typescript
import { getModelEntry, resolveModelUrl } from '@localmode/mediapipe';

const entry = getModelEntry('hand_landmarker');
console.log(entry.url, entry.sizeBytes);

const url = resolveModelUrl('pose_landmarker_full');
```

Implementation Classes
For advanced use or custom wiring, all implementation classes are exported directly:
| Class | Implements |
|-------|------------|
| MediaPipeHandLandmarker | HandLandmarkModel |
| MediaPipePoseLandmarker | PoseLandmarkModel |
| MediaPipeFaceDetector | FaceDetectionModel |
| MediaPipeFaceLandmarker | FaceLandmarkModel |
| MediaPipeGestureRecognizer | GestureRecognitionModel |
| MediaPipeImageClassifier | ImageClassificationModel |
| MediaPipeObjectDetector | ObjectDetectionModel |
| MediaPipeImageSegmenter | SegmentationModel |
| MediaPipeImageEmbedder | ImageFeatureModel |
| MediaPipeAudioClassifier | AudioClassificationModel |
| MediaPipeTextClassifier | ClassificationModel |
| MediaPipeTextEmbedder | EmbeddingModel |
| MediaPipeLanguageDetector | LanguageDetectionModel |
Documentation
Full documentation at localmode.dev/docs/mediapipe.
Acknowledgments
Built on Google's MediaPipe Tasks -- on-device perception models compiled to WebAssembly. Catalog models are published by Google on storage.googleapis.com.
License
MIT (this package). The underlying @mediapipe/tasks-* packages are licensed by Google under Apache-2.0.
