llm-frames

v0.2.2

Extract video frames as a grid image for LLM context injection.

English | 한국어

What this is

llm-frames wraps ffmpeg to extract frames from a video and compose them into a single grid image — purpose-built for feeding video content into multimodal LLMs.

  • Grid image (~1500×1500 JPEG) — one image, all frames, minimal tokens
  • XML description — frame indices + timestamps as text for LLM context
  • Auto layout — cell size (longest side 384–512px) and grid shape calculated from video aspect ratio
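The auto-layout rule can be reproduced with straightforward arithmetic. The sketch below is an inference from the documented behavior (384–512px cell long side, ~1500×1500 target grid) and the `autoLayout(1920, 1080)` output shown under Utilities; it is not the library's actual source, and `sketchLayout` is a hypothetical name.

```typescript
// Hypothetical sketch of the auto-layout calculation (assumed, not the
// library's source): derive the cell size from the video's aspect ratio,
// then fill a ~1500×1500 canvas with as many cells as fit.
const TARGET = 1500;        // target grid edge in px
const CELL_LONG_SIDE = 384; // longest cell side (the library allows 384–512)

interface Layout {
  cols: number;
  rows: number;
  count: number;
  cellW: number;
  cellH: number;
  longSide: number;
}

function sketchLayout(videoW: number, videoH: number): Layout {
  const landscape = videoW >= videoH;
  const longSide = CELL_LONG_SIDE;
  // scale the cell's short side by the video's aspect ratio
  const cellW = landscape ? longSide : Math.round((longSide * videoW) / videoH);
  const cellH = landscape ? Math.round((longSide * videoH) / videoW) : longSide;
  const cols = Math.round(TARGET / cellW);
  const rows = Math.round(TARGET / cellH);
  return { cols, rows, count: cols * rows, cellW, cellH, longSide };
}

// For 1920×1080 this reproduces the documented layout:
// { cols: 4, rows: 7, count: 28, cellW: 384, cellH: 216, longSide: 384 }
```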

Why llm-frames

Existing tools target either human inspection or general-purpose video processing — not LLM input budgets.

| | llm-frames | vcsi | ffmpeg (raw) |
|---|---|---|---|
| Purpose | LLM context injection | Human contact sheet | Frame extraction |
| Output | Grid JPEG + XML timestamps | Contact sheet image | Individual frames |
| Token budget aware | ✅ one image, all frames | ❌ large human-readable sheet | ❌ N separate images |
| Frame index overlay | ✅ LLM can reference by number | ❌ | ❌ |
| Auto layout | ✅ from video aspect ratio | partial | ❌ |
| Timestamp text pairing | ✅ XML alongside image | ❌ | ❌ |
| Runtime | Node.js + ffmpeg | Python | ffmpeg |

The core insight: sending a video to an LLM needs a paired output — a grid image the model can see, and a text block with timestamps it can reference. Neither alone is sufficient.

Requirements

  • Node.js 18+
  • ffmpeg 4.0+ in $PATH

Install

npm install llm-frames

Usage

import { extract } from "llm-frames";

const result = await extract({ input: "/path/to/video.mp4" });

// result.grid        — JPEG Buffer, inject as image
// result.description — XML string, inject as text
// result.frames      — raw VideoFrame[], for programmatic use
// result.metadata    — layout + per-frame timestamps as a JS object

Inject into LLM (Anthropic example)

const { grid, description } = await extract({ input: "video.mp4" });

const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  messages: [{
    role: "user",
    content: [
      {
        type: "image",
        source: { type: "base64", media_type: "image/jpeg", data: grid.toString("base64") },
      },
      {
        type: "text",
        text: `${description}\n\nDescribe what happens in this video.`,
      },
    ],
  }],
});

The XML description looks like:

<video_frames>
  <meta
    total_duration="1326s"
    frame_count="28"
    sampling="uniform interval=47.4s"
  />
  <frame index="1" timestamp="24s" />
  <frame index="2" timestamp="71s" />
  ...
</video_frames>

Frame numbers are overlaid on the grid image (top-left of each cell), so the LLM can reference them by index.
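The timestamps in the sample XML are consistent with sampling at the midpoint of each uniform interval (1326s / 28 frames ≈ 47.4s per interval; 47.4 / 2 ≈ 24s for frame 1). A sketch of that calculation, inferred from the example output rather than taken from the library's code:

```typescript
// Hypothetical reconstruction of uniform sampling: one frame at the
// midpoint of each of `count` equal intervals across the duration.
function uniformTimestamps(duration: number, count: number): number[] {
  const interval = duration / count;
  return Array.from({ length: count }, (_, i) =>
    Math.round(interval * (i + 0.5)),
  );
}

// uniformTimestamps(1326, 28) → [24, 71, 118, ...],
// matching the <frame> entries above.
```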

Options

interface ExtractOptions {
  input: string;           // path to input video

  mode?: "uniform"         // evenly spaced (default)
       | "highlight";      // scene-change biased

  startTime?: number;      // segment start (seconds, default: 0)
  endTime?: number;        // segment end   (seconds, default: end of video)

  width?: number;          // cell long side in px (default: 512)
                           // ignored by extract() — always overridden by autoLayout()
                           // only applies when calling extractFrames() directly
  quality?: number;        // JPEG quality 1–31, lower = better (default: 4)
  sceneThreshold?: number; // scene detection sensitivity 0.01–1 (highlight only, default: 0.4)
  ffmpegPath?: string;     // custom ffmpeg binary path
}

Return value

interface ExtractResult {
  frames:      VideoFrame[];   // raw frames
  grid:        Buffer;         // composed grid JPEG
  description: string;         // XML text block for LLM
  metadata:    GridMetadata;   // structured data (layout, per-frame timestamps)
  duration:    number;         // total video duration in seconds
  videoWidth:  number;
  videoHeight: number;
}

interface VideoFrame {
  index:         number;
  timestamp:     number;        // seconds from start
  data:          Buffer;        // JPEG
  mimeType:      "image/jpeg";
  isSceneChange?: boolean;      // highlight mode only
}

Sampling modes

uniform (default)

Frames at evenly spaced intervals. Count is determined by autoLayout() based on video resolution (4–36 frames, targeting a ~1500×1500 grid).

highlight

Frames at scene transition points, with uniform fill for low-motion segments. Useful for content with distinct scenes.

const result = await extract({
  input: "video.mp4",
  mode: "highlight",
  sceneThreshold: 0.3,  // lower = more sensitive
});

Segment extraction

const result = await extract({
  input: "video.mp4",
  startTime: 120,   // start at 2:00
  endTime: 360,     // end at 6:00
});

Utilities

import { autoLayout, autoCount, toHMS } from "llm-frames";

// compute grid layout for a given video resolution
const layout = autoLayout(1920, 1080);
// → { cols: 4, rows: 7, count: 28, cellW: 384, cellH: 216, longSide: 384 }

// recommended frame count from duration (8–32, ~1 frame per 2 minutes)
// note: extract() uses autoLayout().count instead — use this when calling extractFrames() directly
const count = autoCount(3600); // → 30

// seconds to HH:MM:SS
toHMS(3723); // → "01:02:03"
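Both utilities are simple enough to sketch for reference. These are hypothetical reimplementations matching the documented behavior, not the package's source:

```typescript
// ~1 frame per 2 minutes, clamped to 8–32 (assumed from the docs above)
function autoCountSketch(durationSeconds: number): number {
  return Math.min(32, Math.max(8, Math.round(durationSeconds / 120)));
}

// seconds → zero-padded HH:MM:SS
function toHMSSketch(totalSeconds: number): string {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  return [h, m, s].map((n) => String(n).padStart(2, "0")).join(":");
}

// autoCountSketch(3600) → 30
// toHMSSketch(3723)     → "01:02:03"
```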

License

MIT