@localmode/mediapipe

v2.0.0

MediaPipe Tasks provider for @localmode - hand/pose/face landmarks, gesture recognition, audio classification, language detection, and more via Google's on-device WASM runtime

Downloads

198

MediaPipe Tasks provider for LocalMode -- run Google's on-device perception models in the browser. Hand, pose, and face landmark detection, gesture recognition, audio classification, language detection, and more, all entirely on-device via WebAssembly.

This package wraps @mediapipe/tasks-vision, @mediapipe/tasks-audio, and @mediapipe/tasks-text as a single unified LocalMode provider. Privacy is the point: camera frames, microphone audio, and text never leave the browser -- the only network requests are the one-time model and WASM downloads.

Features

  • 13 curated models -- landmarks, gestures, classification, detection, segmentation, embeddings, and language detection, all verified against Google's CDN
  • Real-time streaming -- hand, pose, face, and gesture trackers run live over a <video> element at 30-60fps
  • Universal browser support -- pure WebAssembly + WebGL; no WebGPU required
  • Tiny models -- all but one are under 10MB; face detection and selfie segmentation are 230KB and 250KB
  • Unified LocalMode interface -- landmark tasks use detectHands(), detectPose(), etc.; classification/detection/embedding tasks reuse the existing core functions
  • AbortSignal cancellation on every single-frame function
  • GPU or CPU delegate, configurable per provider or per model

Installation

pnpm install @localmode/mediapipe @localmode/core

The @mediapipe/tasks-* dependencies are installed automatically. The WASM runtime loads from the jsDelivr CDN by default -- set wasmBasePath to self-host it for fully offline apps.

Quick Start

Single-frame detection

import { detectHands } from '@localmode/core';
import { mediapipe } from '@localmode/mediapipe';

const { hands } = await detectHands({
  model: mediapipe.handLandmarker(),
  image: imageBlob,
  numHands: 2,
});

for (const hand of hands) {
  console.log(`${hand.handedness} hand -- ${hand.landmarks.length} landmarks`);
}
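
As an illustration of what the returned landmarks can be used for, here is a minimal sketch of a pinch check. It assumes each entry in hand.landmarks is an object with normalized x, y, z fields and that the array follows MediaPipe's 21-point hand ordering (index 4 is the thumb tip, index 8 is the index fingertip). The Landmark type, the helper names, and the 0.05 threshold are illustrative choices, not part of the package.

```typescript
// Assumed landmark shape: normalized coordinates, as MediaPipe hand landmarks use.
interface Landmark {
  x: number;
  y: number;
  z: number;
}

// Indices from MediaPipe's 21-point hand model.
const THUMB_TIP = 4;
const INDEX_FINGER_TIP = 8;

// Euclidean distance between thumb tip and index fingertip.
function pinchDistance(landmarks: Landmark[]): number {
  const thumb = landmarks[THUMB_TIP];
  const index = landmarks[INDEX_FINGER_TIP];
  if (!thumb || !index) {
    throw new RangeError('expected at least 9 hand landmarks');
  }
  return Math.hypot(thumb.x - index.x, thumb.y - index.y, thumb.z - index.z);
}

// The default threshold is an arbitrary starting point; tune it for your camera setup.
function isPinching(landmarks: Landmark[], threshold = 0.05): boolean {
  return pinchDistance(landmarks) < threshold;
}
```

Inside the loop above this could be called as isPinching(hand.landmarks), provided the result items really have that shape.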

Real-time streaming

import { mediapipe } from '@localmode/mediapipe';

const tracker = mediapipe.createHandTracker({
  video: videoElement,
  numHands: 2,
  onResults: (hands, timestampMs) => drawHands(hands),
});

await tracker.start();
// later
tracker.stop();
await tracker.close();
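
The drawHands callback in the snippet above is left to the application. One possible shape for it is sketched below, under the assumption that each hand carries a landmarks array of normalized { x, y } points. The DrawContext interface is a hypothetical subset of CanvasRenderingContext2D, declared here only so the function can be exercised without a real canvas.

```typescript
// Assumed shape of a landmark and a hand result item (not confirmed by the package docs).
interface Landmark {
  x: number;
  y: number;
}
interface HandLike {
  landmarks: Landmark[];
}

// Hypothetical minimal drawing surface; a real CanvasRenderingContext2D satisfies it.
interface DrawContext {
  clearRect(x: number, y: number, w: number, h: number): void;
  beginPath(): void;
  arc(x: number, y: number, radius: number, startAngle: number, endAngle: number): void;
  fill(): void;
}

// Scale a normalized landmark to pixel coordinates.
function toPixels(lm: Landmark, width: number, height: number): [number, number] {
  return [lm.x * width, lm.y * height];
}

// Clear the overlay and draw one dot per landmark.
function drawHands(ctx: DrawContext, hands: HandLike[], width: number, height: number): void {
  ctx.clearRect(0, 0, width, height);
  for (const hand of hands) {
    for (const lm of hand.landmarks) {
      const [px, py] = toPixels(lm, width, height);
      ctx.beginPath();
      ctx.arc(px, py, 3, 0, Math.PI * 2);
      ctx.fill();
    }
  }
}
```

With an overlay canvas sized to match the video, the tracker callback would then look like onResults: (hands) => drawHands(ctx, hands, canvas.width, canvas.height).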

Tasks

The provider exposes a factory method per task. Landmark and gesture tasks use new core functions; the rest reuse standard LocalMode interfaces.

| Method | Interface | Core function |
| --- | --- | --- |
| mediapipe.handLandmarker() | HandLandmarkModel | detectHands() |
| mediapipe.poseLandmarker() | PoseLandmarkModel | detectPose() |
| mediapipe.faceLandmarker() | FaceLandmarkModel | detectFaceLandmarks() |
| mediapipe.faceDetector() | FaceDetectionModel | detectFace() |
| mediapipe.gestureRecognizer() | GestureRecognitionModel | recognizeGesture() |
| mediapipe.imageClassifier() | ImageClassificationModel | classifyImage() |
| mediapipe.objectDetector() | ObjectDetectionModel | detectObjects() |
| mediapipe.imageSegmenter() | SegmentationModel | segmentImage() |
| mediapipe.imageEmbedder() | ImageFeatureModel | extractImageFeatures() |
| mediapipe.audioClassifier() | AudioClassificationModel | classifyAudio() |
| mediapipe.textEmbedder() | EmbeddingModel | embed() / embedMany() |
| mediapipe.languageDetector() | LanguageDetectionModel | detectLanguage() |
| mediapipe.textClassifier(modelPath) | ClassificationModel | classify() |

mediapipe.textClassifier() requires an explicit custom-trained .tflite model URL (built with MediaPipe Model Maker) -- MediaPipe ships no default text classifier. Calling it without a path throws a ValidationError.

Disposing Model Instances

Individual model instances have a close() method for releasing WASM resources when you are done with them:

const model = mediapipe.handLandmarker();
// ... use model ...
model.close(); // Release WASM resources

This applies to all model instances created via factory methods (not just streaming trackers). Call close() when the model is no longer needed to free memory.
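
To make sure close() runs even when inference throws, the call can be wrapped in try/finally. The helper below is a sketch, not something the package exports: withModel and Closable are hypothetical names, and the only assumption about the model is that it has the close() method described above.

```typescript
// Anything with a close() method, synchronous or asynchronous.
interface Closable {
  close(): void | Promise<void>;
}

// Run `use` with the model and always release it afterwards,
// whether `use` resolves or rejects.
async function withModel<M extends Closable, R>(
  model: M,
  use: (model: M) => Promise<R>,
): Promise<R> {
  try {
    return await use(model);
  } finally {
    await model.close();
  }
}
```

A call site might look like withModel(mediapipe.handLandmarker(), (model) => detectHands({ model, image: imageBlob })), so the WASM resources are released once the single detection finishes.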

Model Catalog

MEDIAPIPE_MODELS ships 13 curated models, all verified against storage.googleapis.com.

| Catalog ID | Model | Domain | Size |
| --- | --- | --- | --- |
| hand_landmarker | Hand Landmarker | vision | 7.8MB |
| pose_landmarker | Pose Landmarker (Lite) | vision | 5.8MB |
| pose_landmarker_full | Pose Landmarker (Full) | vision | 9.4MB |
| face_landmarker | Face Landmarker (478-point mesh) | vision | 3.8MB |
| face_detector | Face Detector (BlazeFace) | vision | 230KB |
| gesture_recognizer | Gesture Recognizer | vision | 8.4MB |
| image_classifier | Image Classifier (EfficientNet-Lite0) | vision | 18.6MB |
| object_detector | Object Detector (EfficientDet-Lite0) | vision | 7.3MB |
| image_segmenter | Image Segmenter (Selfie) | vision | 250KB |
| image_embedder | Image Embedder (MobileNet-V3 Small) | vision | 4.1MB |
| audio_classifier | Audio Classifier (YAMNet, 521 categories) | audio | 4.1MB |
| language_detector | Language Detector (110 languages) | text | 315KB |
| text_embedder | Text Embedder (Universal Sentence Encoder) | text | 6.1MB |

Each factory uses its catalog default. Pass a catalog ID, a direct URL, or a modelPath setting to override:

const full = mediapipe.poseLandmarker('pose_landmarker_full');
const custom = mediapipe.handLandmarker('https://your-cdn.com/hand_landmarker.task');
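
Because every model is a one-time download, it can help to total the catalog sizes for the models an app actually uses. The sketch below hard-codes the sizes from the catalog table (in MB) instead of reading them from the package; CATALOG_SIZE_MB and estimateDownloadMB are illustrative names, and the WASM runtime download is not included in the total.

```typescript
// Sizes in MB, copied from the catalog table (230KB -> 0.23, 250KB -> 0.25, 315KB -> 0.315).
const CATALOG_SIZE_MB = {
  hand_landmarker: 7.8,
  pose_landmarker: 5.8,
  pose_landmarker_full: 9.4,
  face_landmarker: 3.8,
  face_detector: 0.23,
  gesture_recognizer: 8.4,
  image_classifier: 18.6,
  object_detector: 7.3,
  image_segmenter: 0.25,
  image_embedder: 4.1,
  audio_classifier: 4.1,
  language_detector: 0.315,
  text_embedder: 6.1,
} as const;

type CatalogId = keyof typeof CATALOG_SIZE_MB;

// Sum the sizes of the distinct models in `ids`, rounded to two decimals.
function estimateDownloadMB(ids: readonly CatalogId[]): number {
  const unique = ids.filter((id, i) => ids.indexOf(id) === i);
  let total = 0;
  for (const id of unique) {
    total += CATALOG_SIZE_MB[id];
  }
  return Math.round(total * 100) / 100;
}
```

For example, an app that uses hand landmarks and gesture recognition would download roughly 7.8 + 8.4 = 16.2MB of model data on first load.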

Streaming API

Four streaming trackers run MediaPipe vision tasks in VIDEO mode over a <video> element, invoking a callback once per processed frame (up to ~60fps):

| Factory | onResults payload |
| --- | --- |
| mediapipe.createHandTracker() | (hands: HandLandmarkResultItem[], timestampMs) |
| mediapipe.createPoseTracker() | (poses: PoseLandmarkResultItem[], timestampMs) |
| mediapipe.createFaceTracker() | (faces: FaceLandmarkResultItem[], timestampMs) |
| mediapipe.createGestureTracker() | (gestures: GestureResultItem[], timestampMs) |

Each returns a TrackerInstance:

interface TrackerInstance {
  start(): Promise<void>;   // load model + begin frame loop
  stop(): void;             // pause loop, keep model loaded
  close(): Promise<void>;   // stop and dispose the MediaPipe task
  readonly isRunning: boolean;
}

const tracker = mediapipe.createFaceTracker({
  video: videoElement,
  numFaces: 1,
  outputBlendshapes: true,
  onResults: (faces) => updateAvatar(faces[0]?.blendshapes),
  onError: (err) => console.error(err),
});

await tracker.start();

Streaming trackers report per-frame errors through onError instead of throwing.
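
Since stop() keeps the model loaded, a tracker can be paused while the page is hidden and resumed when it becomes visible again. The sketch below relies only on the TrackerInstance interface shown above and assumes that calling start() after stop() resumes the frame loop. syncTrackerToVisibility is a hypothetical helper name.

```typescript
// Copied from the interface above so the sketch is self-contained.
interface TrackerInstance {
  start(): Promise<void>;
  stop(): void;
  close(): Promise<void>;
  readonly isRunning: boolean;
}

// Pause the tracker when the page is hidden; resume it when visible.
// Does nothing if the tracker is already in the desired state.
async function syncTrackerToVisibility(
  tracker: TrackerInstance,
  hidden: boolean,
): Promise<void> {
  if (hidden && tracker.isRunning) {
    tracker.stop();
  } else if (!hidden && !tracker.isRunning) {
    await tracker.start();
  }
}

// In a browser this could be wired to the Page Visibility API:
// document.addEventListener('visibilitychange', () => {
//   void syncTrackerToVisibility(tracker, document.hidden);
// });
```

Keeping the decision in a small function that takes the hidden flag as an argument makes the behavior easy to test outside a browser.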

Provider Configuration

import { createMediaPipe } from '@localmode/mediapipe';

const myMediaPipe = createMediaPipe({
  delegate: 'CPU',                     // 'GPU' (default) | 'CPU'
  wasmBasePath: '/wasm/mediapipe',     // self-host the WASM runtime
});

wasmBasePath also accepts an object to set the vision / audio / text runtime paths individually:

createMediaPipe({
  wasmBasePath: {
    vision: '/wasm/tasks-vision',
    audio: '/wasm/tasks-audio',
    text: '/wasm/tasks-text',
  },
});

Per-model settings (modelPath, delegate, wasmBasePath) override provider defaults for a single model.

Browser Compatibility

MediaPipe Tasks runs on pure WebAssembly with a WebGL GPU delegate -- it does not require WebGPU.

| Browser | WASM | WebGL (GPU delegate) |
| --- | --- | --- |
| Chrome 80+ | Yes | Yes |
| Edge 80+ | Yes | Yes |
| Firefox 75+ | Yes | Yes |
| Safari 14+ | Yes | Yes |

If the WebGL GPU delegate is unavailable, set delegate: 'CPU' to fall back to CPU inference.
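
One way to choose the delegate at startup is to probe for a WebGL context first. This is a rough heuristic and an assumption on my part, not the package's own detection logic: being able to create a context does not guarantee that the GPU delegate will work. pickDelegate and CanvasLike are hypothetical names. The canvas factory is passed in so the function can run outside a browser.

```typescript
type Delegate = 'GPU' | 'CPU';

// Minimal canvas surface; an HTMLCanvasElement satisfies it.
interface CanvasLike {
  getContext(kind: string): unknown;
}

// Return 'GPU' if a WebGL2 or WebGL context can be created, otherwise 'CPU'.
// Any error while creating the canvas or context also falls back to 'CPU'.
function pickDelegate(createCanvas: () => CanvasLike): Delegate {
  try {
    const canvas = createCanvas();
    const gl = canvas.getContext('webgl2') ?? canvas.getContext('webgl');
    return gl ? 'GPU' : 'CPU';
  } catch {
    return 'CPU';
  }
}
```

In a browser, the result of pickDelegate(() => document.createElement('canvas')) could be passed as the delegate option of createMediaPipe().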

Concurrent audio + vision. The MediaPipe audio and vision WASM runtimes can conflict if run concurrently in the same thread (mediapipe#4737). If your app uses audio classification and a vision task at the same time, run one of them in a Web Worker so each runtime has its own thread. Sequential use is unaffected.
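
Where moving one task into a Web Worker is not practical, an alternative is to make sure audio and vision calls never overlap by routing them through a single queue. This is a different technique from the worker approach recommended above, and whether it avoids the conflict in a given app is an assumption based on the note that sequential use is unaffected. createSerialQueue is a hypothetical helper, not part of the package.

```typescript
// Returns an `enqueue` function that runs async tasks strictly one after
// another, in the order they were enqueued. A failing task rejects its own
// promise but does not block the tasks queued after it.
function createSerialQueue(): <T>(task: () => Promise<T>) => Promise<T> {
  let tail: Promise<unknown> = Promise.resolve();
  return function enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = tail.then(() => task());
    // Swallow errors on the internal chain only; callers still see them via `run`.
    tail = run.catch(() => undefined);
    return run;
  };
}
```

Both the audio classification call and the single-frame vision call would then be wrapped as enqueue(() => ...) so that one finishes before the other begins. This trades throughput for isolation, so it suits occasional single-frame calls better than live trackers.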

Choosing a LocalMode Vision/Perception Provider

| Provider | When to use |
| --- | --- |
| @localmode/mediapipe | Real-time human perception -- hand/pose/face landmarks, gestures, live video trackers |
| @localmode/transformers | Broader catalog of pre-trained ONNX models -- captioning, OCR, summarization, classification |
| @localmode/webllm | WebGPU LLM inference for text generation |
| @localmode/wllama | GGUF LLM inference on pure WASM |
| @localmode/litert | First-party Google .litertlm LLM runtime |

Utility Exports

| Function | Description |
| --- | --- |
| getModelEntry(id) | Look up a MediaPipeModelEntry from the built-in catalog by its MediaPipeModelId |
| resolveModelUrl(idOrUrl) | Resolve a catalog ID or direct URL to the final model download URL |

import { getModelEntry, resolveModelUrl } from '@localmode/mediapipe';

const entry = getModelEntry('hand_landmarker');
console.log(entry.url, entry.sizeBytes);

const url = resolveModelUrl('pose_landmarker_full');

Implementation Classes

For advanced use or custom wiring, all implementation classes are exported directly:

| Class | Implements |
| --- | --- |
| MediaPipeHandLandmarker | HandLandmarkModel |
| MediaPipePoseLandmarker | PoseLandmarkModel |
| MediaPipeFaceDetector | FaceDetectionModel |
| MediaPipeFaceLandmarker | FaceLandmarkModel |
| MediaPipeGestureRecognizer | GestureRecognitionModel |
| MediaPipeImageClassifier | ImageClassificationModel |
| MediaPipeObjectDetector | ObjectDetectionModel |
| MediaPipeImageSegmenter | SegmentationModel |
| MediaPipeImageEmbedder | ImageFeatureModel |
| MediaPipeAudioClassifier | AudioClassificationModel |
| MediaPipeTextClassifier | ClassificationModel |
| MediaPipeTextEmbedder | EmbeddingModel |
| MediaPipeLanguageDetector | LanguageDetectionModel |

Documentation

Full documentation at localmode.dev/docs/mediapipe.

Acknowledgments

Built on Google's MediaPipe Tasks -- on-device perception models compiled to WebAssembly. Catalog models are published by Google on storage.googleapis.com.

License

MIT (this package). The underlying @mediapipe/tasks-* packages are licensed by Google under Apache-2.0.