@d0paminedriven/metadata

v6.0.1

convenient metadata extraction helpers


  1. Extract - Full-featured remote URL/Buffer extractor with network capabilities (Node.js/server-side first) → isomorphic by design (edge cases being handled as they emerge)
  2. ExtractClient - Client-side extractor for synchronous buffer processing (browser-friendly) → all the same metadata with no remote fetching capabilities

Lightweight, zero‑native metadata extraction for images and documents. Designed to be fast, safe, and “header‑first” so you can classify and preview files without downloading entire objects.

  • Minimal bytes fetched (range requests, idle and deadline guards)
  • Hardened against slow/origin issues with quarantining and optional CDN origin fallbacks
  • Works with Buffers or remote URLs
  • Rich, typed results for common image and office/PDF formats

Features

  • Image formats: PNG (incl. iTXt, sRGB/ICC), JPEG (EXIF orientation/date, ICC), WebP (VP8/VP8L/VP8X, animation), GIF (frame count), BMP, AVIF (ispe, XMP), ICO (largest entry), TIFF (classic), HEIC/HEIF (ISOBMFF)
  • Documents: PDF (version, pages, linearization, encryption, text preview, dates via Info and XMP), DOCX/PPTX/XLSX (core/app props, slides/sheets, quick preview), RTF, plain text and common code/data files
  • Remote fetch strategy that never blocks the batch: probes first, ranges when possible, streams minimal data otherwise
  • Typed return shapes and small helpers for reuse via mixins

Install

Use your workspace package manager:

pnpm add @d0paminedriven/metadata
# or
npm i @d0paminedriven/metadata
# or
yarn add @d0paminedriven/metadata

Quick Start

Classify any URL or Buffer and get a unified, typed result:

import fs from "node:fs";
import { Extract } from "@d0paminedriven/metadata";

const extract = new Extract({ debug: false });

// From a remote URL
const result1 = await extract.extractRemote(
  "https://cdn.example.com/path/to/file.pdf",
  96 * 1024 // optional head size to fetch/analyze
);

// From a Buffer
const fileBuffer = await fs.promises.readFile("./sample.jpeg");
const result2 = await extract.extractRemote(fileBuffer);

if (result1.type === "IMAGE") {
  console.log(result1.width, result1.height, result1.format);
} else {
  console.log(result1.format, result1.pageCount, result1.textPreview);
}

Update: Buffer byteSize Parity

  • Fix: extractRemote(Buffer) now sets byteSize to the detected total size (the Buffer length), matching how remote URLs report total size via Content-Length/Content-Range.
  • Result: URL and Buffer inputs now return consistent byteSize values across images and documents.

Example parity check:

import fs from "node:fs";
import { Extract } from "@d0paminedriven/metadata";

const extract = new Extract();
const url = "https://assets.example.com/1758473273940-dogespace.jpg";
const r1 = await extract.extractRemote(url, 96 * 1024);
const buf = await fs.promises.readFile("src/test/local/1758473273940-dogespace.jpg");
const r2 = await extract.extractRemote(buf);
console.log(r1.byteSize, r2.byteSize); // identical

When To Use Which API

  • Unified flow for URLs or Buffers: new Extract(opts).extractRemote(source, size?, timeout?)
  • Images only (Buffer input): new ImgMetadataExtractor().getImageSpecsWorkup(buffer, size?)
  • Documents only (Buffer input): new DocMetadataExtractor().getDocumentSpecsWorkup(buffer, mime, filename?)

Exports are available from the package root and by subpath:

import { Extract, DocMetadataExtractor, ImgMetadataExtractor } from "@d0paminedriven/metadata";
// or
import { Extract } from "@d0paminedriven/metadata/extract";
import { DocMetadataExtractor } from "@d0paminedriven/metadata/docs";
import { ImgMetadataExtractor } from "@d0paminedriven/metadata/images";

Return Types (summary)

Image results (ExpandedImgSpecs):

  • Core: type: "IMAGE", width, height, format, frames, animated, hasAlpha, orientation, aspectRatio
  • Color: colorModel, colorSpace, iccProfile
  • EXIF: exifDateTimeOriginal
  • Source: source?, byteSize? (remote: Content-Length/Range; buffer: Buffer.length), fetchedBytes?, contentType?

interface ExpandedImgSpecs {
  type: "IMAGE";
  width: number;
  height: number;
  format:
    | "apng"
    | "png"
    | "jpeg"
    | "gif"
    | "bmp"
    | "webp"
    | "avif"
    | "svg"
    | "ico"
    | "heic"
    | "tiff"
    | "unknown";
  frames: number;
  animated: boolean;
  hasAlpha: boolean | null;
  orientation: number | null;
  aspectRatio: number;
  colorModel:
    | "rgb"
    | "rgba"
    | "grayscale"
    | "grayscale-alpha"
    | "indexed"
    | "cmyk"
    | "ycbcr"
    | "ycck"
    | "vector"
    | "lab"
    | "unknown";
  colorSpace:
    | "unknown"
    | "srgb"
    | "display_p3"
    | "adobe_rgb"
    | "prophoto_rgb"
    | "rec2020"
    | "rec709"
    | "cmyk"
    | "lab"
    | "xyz"
    | "gray";
  iccProfile: string | null;
  exifDateTimeOriginal: string | null;
  metadata?: Record<string, string>;
  source?: string;
  fetchedBytes?: number;
  byteSize?: number;
  contentType?: string;
}

Doc results (ExpandedDocSpecs):

  • Core: type: "DOCUMENT", format, mimeType, pageCount, wordCount, lineCount, textPreview
  • PDF: pdfVersion, isEncrypted, isSearchable, isLinearized
  • Common: author, subject, keywords, createdDate, modifiedDate
  • Source: source?, byteSize? (remote: Content-Length/Range; buffer: Buffer.length), fetchedBytes?, contentType?

interface ExpandedDocSpecs {
  type: "DOCUMENT";
  format: string | null;
  mimeType: string | null;
  pageCount: number | null;
  wordCount: number | null;
  lineCount: number | null;
  language: string | null;
  encoding: string | null;
  author: string | null;
  subject: string | null;
  keywords: string[] | null;
  pdfVersion: string | null;
  isEncrypted: boolean | null;
  isSearchable: boolean | null;
  isLinearized: boolean | null;
  textPreview: string | null;
  createdDate: string | null;
  modifiedDate: string | null;
  source?: string;
  fetchedBytes?: number;
  byteSize?: number;
  contentType?: string;
}

See src/types/index.ts for full type definitions.
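Because both result shapes share the discriminated type field, TypeScript narrows each branch automatically. A minimal sketch (field names follow the interfaces above, trimmed to the ones used here; this is not the library's code):

```typescript
// Trimmed stand-ins for the two result shapes, joined on the `type` discriminant
type MetaResult =
  | { type: "IMAGE"; width: number; height: number; format: string }
  | { type: "DOCUMENT"; format: string | null; pageCount: number | null };

function describe(r: MetaResult): string {
  // checking `type` narrows the union, so each branch sees only its own fields
  return r.type === "IMAGE"
    ? `${r.format} ${r.width}x${r.height}`
    : `${r.format ?? "unknown"} (${r.pageCount ?? "?"} pages)`;
}
```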

The Hardened Fetcher (remote URLs)

The Extract class includes a minimal, safe fetcher that tries to do the least work necessary while avoiding pathological cases:

  • Probes the first chunk via Range with a short deadline to sniff type and server capabilities
  • Prefers Range for exact head bytes; uses HEAD to detect suspicious compression; streams with an idle guard when needed
  • PDFs fetch head + tail when ranged to surface cross‑reference and XMP metadata without full download
  • Quarantines problematic URLs (default 6h) to avoid re‑attempting failing endpoints during batch work
  • Optional CDN hardening hooks:
    • originFallback(cfUrl): resolve a CDN URL to origin (e.g., R2/S3) and re‑probe
    • invalidateCloudFrontKey(key): invalidate bad cache entries when origin succeeds
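The Content-Range bookkeeping behind the probe step can be sketched as follows (assumed behavior, not the library's internals): a 206 Partial Content response reports bytes start-end/total, and parsing the total yields byteSize without a full download.

```typescript
// Extract the total size from a Content-Range header value, e.g.
// "bytes 0-98303/2048576" → 2048576. Returns null when absent or unknown.
function totalFromContentRange(header: string | null): number | null {
  if (!header) return null;
  const m = /bytes\s+\d+-\d+\/(\d+|\*)/.exec(header);
  // "*" means the server does not know the total size
  return m && m[1] !== "*" ? Number(m[1]) : null;
}
```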

Example with a CloudFront → S3 origin fallback:

const extract = new Extract({
  debug: true,
  userAgent: "my-app/metadata-extractor",
  quarantineTtlMs: 6 * 60 * 60 * 1000,
  originFallback: (url) => url.replace("https://cdn.example.com/", "https://s3.example.com/"),
  invalidateCloudFrontKey: async (urlOrKey) => {
    // map full URL to distribution key if needed and call your invalidation API
  },
});

const meta = await extract.extractRemote("https://cdn.example.com/asset.png", 96 * 1024);

Image Extraction Details

  • PNG: IHDR dims, color type → color model/space, sRGB/cHRM/iCCP, iTXt/zTXt (XMP, Creation Time), EXIF (eXIf)
  • JPEG: SOF dims, EXIF orientation (0x0112) and DateTimeOriginal (0x9003), ICC in APP2
  • WebP: VP8/VP8L/VP8X dims, alpha flag, animation frames, ICC/XMP where present
  • GIF: dims and frame counting via blocks
  • BMP: dims and basic color model
  • AVIF/HEIF: ISOBMFF box walk for dims (ispe), XMP extraction
  • ICO: picks largest entry; detects embedded PNG
  • TIFF: classic TIFF IFDs for dims/samples; BigTIFF intentionally not parsed in lightweight path

Image parser is header‑first and avoids full decode; it surfaces useful metadata quickly and safely.
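As an illustration of the header-first approach, the PNG case needs only the first 24 bytes: IHDR is always the first chunk, so width and height sit at fixed offsets. A sketch (not the library's parser):

```typescript
// Read PNG dimensions from the fixed-layout header:
// bytes 0-7 signature, 8-11 IHDR length, 12-15 "IHDR",
// 16-19 width (big-endian u32), 20-23 height (big-endian u32).
function pngDims(buf: Buffer): { width: number; height: number } | null {
  const sig = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  if (buf.length < 24 || !buf.subarray(0, 8).equals(sig)) return null;
  return { width: buf.readUInt32BE(16), height: buf.readUInt32BE(20) };
}
```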

Document Extraction Details

  • PDF: version, linearization, encryption, page counting via Pages/Count (with fallbacks), quick text preview (BT/Tj/TJ), Info dictionary strings (Author/Subject/Title/Keywords), dates parsed from Info and XMP
  • DOCX/PPTX/XLSX: reads docProps/core.xml and app.xml; for DOCX extracts preview text from word/document.xml; for XLSX inspects sheets, sharedStrings, and common features (formulas, charts, pivots, macros)
  • RTF: naive but effective text stripping for preview/word count
  • Plain text and code/data: BOM detection, UTF‑8 validation with Windows‑1252 fallback, word/line counts, language hint from extension
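The BOM-detection step for plain text can be sketched like this (illustrative only; the library's fallback chain also covers UTF-8 validation and Windows-1252):

```typescript
// Sniff a byte-order mark from the first bytes of a text buffer.
// Returns the detected encoding and how many bytes the BOM occupies.
function sniffBom(buf: Buffer): { encoding: string; offset: number } {
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf)
    return { encoding: "utf-8", offset: 3 };
  if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff)
    return { encoding: "utf-16be", offset: 2 };
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe)
    return { encoding: "utf-16le", offset: 2 };
  return { encoding: "unknown", offset: 0 }; // no BOM: fall through to validation
}
```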

ZIP handling is a minimal central‑directory reader with deflate via fflate only when necessary.

API Reference (selected)

  • class Extract(opts?: ExtractorHardenedOptions) → unified image/doc extractor with network hardening.
    • extractRemote(source: Buffer | string, size = 16384, timeout = 5000) → ExpandedDocSpecs | ExpandedImgSpecs
    • Options: originFallback, invalidateCloudFrontKey, quarantineTtlMs, userAgent, debug, plus injection of img/docs extractors
  • class ImgMetadataExtractor → image‑only, Buffer in
    • getImageSpecsWorkup(buffer: Buffer, size = 4096 * 6) → ExpandedImgSpecs
  • class DocMetadataExtractor → document‑only, Buffer in
    • getDocumentSpecsWorkup(buffer: Buffer, mime: string, filename?: string) → ExpandedDocSpecs

Advanced: You can share extractor instances across many Extract instances via the provided mixins if needed.

Examples

Classify many URLs efficiently:

import { Extract } from "@d0paminedriven/metadata";

const extract = new Extract();
const urls = [
  "https://example.com/report.pdf",
  "https://example.com/photo.jpg",
  "https://example.com/deck.pptx",
];

const results = await Promise.all(urls.map(u => extract.extractRemote(u, 96 * 1024)));
for (const r of results) {
  console.log(r.type, r.format, r.source);
}

Images from disk (Buffer):

import fs from "node:fs";
import { ImgMetadataExtractor } from "@d0paminedriven/metadata/images";

const img = new ImgMetadataExtractor();
const buf = await fs.promises.readFile("./photo.webp");
const specs = img.getImageSpecsWorkup(buf);
console.log(specs.width, specs.height, specs.colorSpace);

Documents from an upload (Buffer + MIME):

import { DocMetadataExtractor } from "@d0paminedriven/metadata/docs";

const docs = new DocMetadataExtractor();
const { buffer, mime, filename } = await readUpload(); // your app's upload handler
const specs = docs.getDocumentSpecsWorkup(buffer, mime, filename);
console.log(specs.format, specs.pageCount, specs.textPreview);

Verification

  • Comprehensive test: 629 remote URLs mapped 1:1 to local files across a dozen+ MIME types (JPEG, PNG, WebP, GIF, HEIC/AVIF/ICO/TIFF, PDF, DOCX/PPTX/XLSX, RTF/TXT).
  • Each pair is compared for exact byteSize equality using the unified extractRemote API.
  • Result: 629/629 passed remote vs local byteSize comparisons.

How to run:

pnpm test

The first time you run pnpm test, a pretest script generates the required src/test/local/* directory, pulling all 629 remote assets locally for direct comparison. Once that directory and its contents exist, subsequent runs skip this step via an existence check in src/test/output-local.ts, which the pretest script executes.

Artifacts and helpers:

  • Remote URL list: src/test/data.ts
  • Local assets (fetched once via the fs helper): src/test/local/*
  • Generated tuples mapping remote → local: src/test/tuples.ts (built by pnpm tsx src/test/workup.ts)
  • Full sample output: src/test/test-results.md

Runtime Notes

  • Node 18+ recommended (built‑in fetch and WHATWG streams); the library uses Buffer and fetch
  • No native dependencies; only fflate for ZIP/deflate
  • Network timeouts and idle thresholds are conservative by default to keep batch processing snappy

Limits & Caveats

  • Header‑first by design: not a full codec/renderer
  • BigTIFF is detected but not parsed in the lightweight path
  • Some edge/CDN behaviors may ignore Range on the first probe; the fetcher retries with HEAD and guardrails
  • If a URL is quarantined due to repeated failures, you’ll get QUARANTINED: <url> until TTL expires (override via quarantineTtlMs)

Local Development

  • Build: pnpm --filter @d0paminedriven/metadata build
  • Typecheck/Lint: pnpm --filter @d0paminedriven/metadata typecheck / pnpm --filter @d0paminedriven/metadata lint
  • Ad‑hoc test runner: pnpm --filter @d0paminedriven/metadata test (see src/test/ for examples)

CDN Integration Example (assets.aicoalesce.com)

If you’re serving uploads from assets.aicoalesce.com / assets-dev.aicoalesce.com behind a CDN, wire up originFallback and invalidateCloudFrontKey so the extractor can heal around cache issues without blocking batches.

import { Extract } from "@d0paminedriven/metadata";

// Map CDN URLs to your origin bucket(s)
const ORIGIN = {
  prod: "https://your-prod-bucket.s3.amazonaws.com", // e.g., s3://your-prod-bucket
  dev: "https://your-dev-bucket.s3.amazonaws.com",   // e.g., s3://your-dev-bucket
};

function cfToOrigin(cfUrl: string): string {
  try {
    const u = new URL(cfUrl);
    const path = u.pathname; // keep exact key
    if (u.hostname === "assets.aicoalesce.com") return `${ORIGIN.prod}${path}`;
    if (u.hostname === "assets-dev.aicoalesce.com") return `${ORIGIN.dev}${path}`;
    return cfUrl; // non‑CDN domains untouched
  } catch {
    return cfUrl;
  }
}

// Invalidate a single CDN key; implement either an internal endpoint
// or call AWS CloudFront directly (example below commented out).
async function invalidateKey(urlOrKey: string) {
  const key = urlOrKey.startsWith("http") ? new URL(urlOrKey).pathname : urlOrKey;

  // Option A: Internal service (recommended for apps)
  if (process.env.CF_INVALIDATE_ENDPOINT) {
    await fetch(process.env.CF_INVALIDATE_ENDPOINT, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ key }),
    });
    return;
  }

  // Option B: AWS SDK v3 (uncomment and add dependency if you prefer)
  // import { CloudFrontClient, CreateInvalidationCommand } from "@aws-sdk/client-cloudfront";
  // const cf = new CloudFrontClient({});
  // await cf.send(new CreateInvalidationCommand({
  //   DistributionId: process.env.CF_DISTRIBUTION_ID!,
  //   InvalidationBatch: {
  //     CallerReference: String(Date.now()),
  //     Paths: { Quantity: 1, Items: [key] },
  //   },
  // }));
}

const extract = new Extract({
  userAgent: "d0paminedriven/metadata (+apps/ws-server)",
  originFallback: cfToOrigin,
  invalidateCloudFrontKey: invalidateKey,
  quarantineTtlMs: 6 * 60 * 60 * 1000,
});

// Example: classify with a generous head size for richer headers/XMP
const meta = await extract.extractRemote(
  "https://assets.aicoalesce.com/upload/abc123/some-file.png",
  96 * 1024
);
console.log(meta.type, meta.source);

Questions or ideas to improve extraction heuristics for new formats? Contributions welcome.