@ubox-lib/ubox-human

v1.1.3

Published

a month ago

Webcam pose detection add-on for Ubox — translates Human.js output to the Ubox 14-joint format


alves.reme

ubox pose-detection webcam human-js interactive

Ubox Human

An add-on for ubox-machine that uses the webcam instead of a Kinect sensor. It runs Human.js in the background, detects user skeletons from the camera feed, and feeds them into PhygitalMove using the same data format — making it a drop-in replacement for Kinect input.

CDN

Load ubox-human after ubox-machine, then call UboxHuman.init() once Ubox.setSensor() has run:

<script src="https://unpkg.com/@ubox-lib/ubox-machine@<version>/ubox-machine.min.js"></script>
<script>
  const machine = Ubox.setMachine({
    /* ... */
  });
  const sensor = Ubox.setSensor({ machine: machine });
  machine.init();
</script>
<script src="https://unpkg.com/@ubox-lib/ubox-human@<version>/ubox-human.min.js"></script>
<script>
  UboxHuman.init();
</script>

How It Works

Call UboxHuman.init(config) to start the library. It:

Fetches and loads Human.js from CDN.
Requests camera access from the browser.
Starts a continuous detection loop on the webcam feed.
Calls window.getSkeletons (set by Ubox.setSensor()) with skeleton data on each frame.

Load order matters. Ubox.setSensor() must be called before UboxHuman.init() because ubox-human feeds data into window.getSkeletons, which setSensor assigns. If the sensor is not set up first, skeleton data will be silently dropped.
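The load-order requirement can be enforced with a small guard. This is a minimal sketch, not part of ubox-human: `startWhenSensorReady` is a hypothetical helper, and it assumes only what is stated above, namely that Ubox.setSensor() assigns a function to window.getSkeletons.

```javascript
// Hypothetical helper (not part of ubox-human): refuse to start detection
// until the sensor callback exists, instead of silently dropping skeletons.
function startWhenSensorReady(globalObj, uboxHuman, config) {
  // Assumption from the README: Ubox.setSensor() assigns globalObj.getSkeletons.
  if (typeof globalObj.getSkeletons !== "function") {
    throw new Error("Call Ubox.setSensor() before UboxHuman.init()");
  }
  uboxHuman.init(config);
}

// In the browser, after Ubox.setSensor({ machine: machine }):
//   startWhenSensorReady(window, UboxHuman, { backend: "webgl" });
```

Passing the global object and the library as arguments keeps the guard easy to exercise outside a browser; in a page you would pass `window` and `UboxHuman`.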

Configuration

UboxHuman.init(config) accepts an optional config object that is deep-merged with the defaults. You only need to specify the keys you want to override.

UboxHuman.init({
  body: { maxDetected: 2 }, // detect up to 2 people
  filter: { flip: false },  // disable mirror mode
  backend: "webgl",         // use WebGL instead of WebGPU
});

Default config:

{
  backend: "webgpu",
  modelBasePath: "https://vladmandic.github.io/human-models/models",
  debug: false,
  object: { enabled: false },
  body: {
    enabled: true,
    maxDetected: 1,
    modelPath: "https://vladmandic.github.io/human-models/models/movenet-thunder.json",
  },
  face: { enabled: false },
  hand: { enabled: false },
  gesture: { enabled: false },
  filter: { enabled: false, flip: true },
  segmentation: { enabled: false },
}

Any key from the Human.js config reference can be overridden.
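To make the deep-merge behavior concrete, here is an illustrative sketch. `deepMerge` is a hypothetical stand-in written for this example; the routine ubox-human actually uses may differ in detail (for instance in how it treats arrays). The point is that overriding one nested key leaves its siblings at their defaults.

```javascript
// Illustrative deep merge (hypothetical; not the library's own routine).
// Plain objects are merged key by key; any other value replaces the default.
function deepMerge(defaults, overrides) {
  const out = { ...defaults };
  for (const [key, value] of Object.entries(overrides || {})) {
    const base = out[key];
    const bothObjects =
      value !== null && typeof value === "object" && !Array.isArray(value) &&
      base !== null && typeof base === "object" && !Array.isArray(base);
    out[key] = bothObjects ? deepMerge(base, value) : value;
  }
  return out;
}

// A subset of the default config listed above.
const defaults = {
  backend: "webgpu",
  body: { enabled: true, maxDetected: 1 },
  filter: { enabled: false, flip: true },
};

const merged = deepMerge(defaults, { body: { maxDetected: 2 }, backend: "webgl" });
// merged.body keeps enabled: true from the defaults and takes maxDetected: 2
// from the override; merged.filter is unchanged.
```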

Multi-Person Detection

Set body.maxDetected above 1 to detect multiple people simultaneously:

UboxHuman.init({ body: { maxDetected: 4 } });

When maxDetected > 1, ubox-human uses a nearest-neighbor approximation to maintain stable skeleton IDs across frames:

Each person is tracked by their hip centroid position from the previous frame.
A new detection is matched to the nearest known person; unmatched detections receive a new ID.
A person's ID is kept for up to 30 consecutive frames of occlusion, after which it expires.
When a person re-enters after expiry, they receive a new ID.

When maxDetected === 1 (the default), IDs are assigned by index with no tracking overhead.
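The matching rules above can be sketched as follows. This is not the ubox-human source: `createTracker` is a hypothetical name, the matching is a simple greedy pass in detection order, and no maximum matching distance is applied, all of which are assumptions. The 30-frame expiry comes from the rules above, and the 9990 starting ID from the ID range this add-on reports.

```javascript
// Hypothetical sketch of nearest-neighbor ID tracking (not the library's code).
function createTracker(maxMissedFrames = 30, firstId = 9990) {
  let nextId = firstId;
  const tracks = new Map(); // id -> { x, y, missed }

  // centroids: array of { x, y } hip centroids detected in the current frame.
  // Returns the ID assigned to each centroid, in the same order.
  return function update(centroids) {
    const unclaimed = new Set(tracks.keys());
    const ids = centroids.map((c) => {
      let bestId = null;
      let bestDist = Infinity;
      for (const id of unclaimed) {
        const t = tracks.get(id);
        const d = Math.hypot(c.x - t.x, c.y - t.y);
        if (d < bestDist) {
          bestDist = d;
          bestId = id;
        }
      }
      if (bestId === null) bestId = nextId++; // no known person left: new ID
      unclaimed.delete(bestId);
      tracks.set(bestId, { x: c.x, y: c.y, missed: 0 });
      return bestId;
    });
    // Age every track that was not matched this frame; expire stale ones.
    for (const id of unclaimed) {
      const t = tracks.get(id);
      t.missed += 1;
      if (t.missed > maxMissedFrames) tracks.delete(id);
    }
    return ids;
  };
}
```

Each call to the returned function represents one frame. Because matching is by position rather than array order, two people who swap places in the detection list keep their IDs.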

Camera Controls

| Key | Action |
| --- | ------------------------------------------------ |
| f | Flip the camera feed horizontally (mirror mode). |

Output Format

The skeleton data produced by ubox-human is identical in shape to what a Kinect sensor sends through PhygitalMove. The same userCallback, skeleton, and user objects apply — refer to the PhygitalMove skeleton documentation for the full data structure.

Joint index reference (same 14 joints as PhygitalMove):

| Index | Reference | Name |
| ----- | -------------- | --------------- |
| 0 | HAND_RIGHT | Right hand |
| 1 | ELBOW_RIGHT | Right elbow |
| 2 | SHOULDER_RIGHT | Right shoulder |
| 3 | HAND_LEFT | Left hand |
| 4 | ELBOW_LEFT | Left elbow |
| 5 | SHOULDER_LEFT | Left shoulder |
| 6 | HEAD | Head |
| 7 | SPINE_NECK | Neck |
| 8 | SPINE_CENTER | Center of spine |
| 9 | HIP_CENTER | Center of hip |
| 10 | KNEE_LEFT | Left knee |
| 11 | KNEE_RIGHT | Right knee |
| 12 | ANKLE_LEFT | Left ankle |
| 13 | ANKLE_RIGHT | Right ankle |
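For code that indexes joints by number, the joint index reference can be captured as a constant. The names `JOINT` and `JOINT_NAME` are choices made for this example; they are not exports of ubox-human or PhygitalMove. The indices and reference names are taken directly from the joint index reference.

```javascript
// Joint indices copied from the joint index reference.
// JOINT and JOINT_NAME are hypothetical names, not library exports.
const JOINT = Object.freeze({
  HAND_RIGHT: 0,
  ELBOW_RIGHT: 1,
  SHOULDER_RIGHT: 2,
  HAND_LEFT: 3,
  ELBOW_LEFT: 4,
  SHOULDER_LEFT: 5,
  HEAD: 6,
  SPINE_NECK: 7,
  SPINE_CENTER: 8,
  HIP_CENTER: 9,
  KNEE_LEFT: 10,
  KNEE_RIGHT: 11,
  ANKLE_LEFT: 12,
  ANKLE_RIGHT: 13,
});

// Reverse lookup (index -> reference name); keys are declared in index order.
const JOINT_NAME = Object.keys(JOINT);
```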

Joints HEAD, SPINE_NECK, SPINE_CENTER, and HIP_CENTER are synthesized by averaging nearby detected keypoints — they are not directly detected by the camera model.
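As an illustration of what averaging nearby keypoints can look like, here is a minimal sketch. Which keypoints ubox-human averages for each joint is not documented here, so the choices below (ears and nose for the head, shoulders for the neck, hips for the hip center, and the neck/hip midpoint for the spine center) are assumptions, as are the function names and the input key names.

```javascript
// Average a list of { x, y } points.
function average(points) {
  const n = points.length;
  return {
    x: points.reduce((sum, p) => sum + p.x, 0) / n,
    y: points.reduce((sum, p) => sum + p.y, 0) / n,
  };
}

// Hypothetical mapping from detected keypoints to the four synthesized joints.
// kp is assumed to hold { x, y } points under the key names used below.
function synthesizeJoints(kp) {
  const neck = average([kp.leftShoulder, kp.rightShoulder]);
  const hip = average([kp.leftHip, kp.rightHip]);
  return {
    HEAD: average([kp.nose, kp.leftEar, kp.rightEar]),
    SPINE_NECK: neck,
    SPINE_CENTER: average([neck, hip]), // midpoint between neck and hip center
    HIP_CENTER: hip,
  };
}
```

Because these joints are derived rather than detected, they inherit the error of every keypoint they are averaged from; a single badly placed shoulder shifts both SPINE_NECK and SPINE_CENTER in this sketch.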

Differences from Kinect

| | Kinect (PhygitalMove) | Webcam (ubox-human) |
| ------------ | ------------------------------ | ------------------------------------------------------------- |
| Skeleton IDs | Sensor-assigned integers | 999x range (9990, 9991, …); stable across frames when maxDetected > 1 |
| z (depth) | Accurate meter distance (0–4m) | Not available; always 2 |
| Joint origin | Direct sensor measurement | MoveNet Thunder model via Human.js |
| Backend | Native Kinect SDK | WebGPU (configurable) |
