
@tryghost/mg-assetscraper-db

v0.3.2


Asset Scraper

Downloads remote assets from Ghost migration data and replaces URLs with local Ghost paths.

What It Does

  • Discovers remote assets (images, media, files) in posts, tags, users, settings, newsletters, and snippets
  • Downloads and stores them locally in Ghost's content structure (/content/images, /content/media, /content/files)
  • Replaces remote URLs with __GHOST_URL__/content/... paths
  • Caches downloads in SQLite to avoid re-downloading duplicates
  • Auto-converts unsupported formats (HEIC/HEIF → JPEG, AVIF → WebP)

Usage

import AssetScraper from '@tryghost/mg-assetscraper-db';
import fsUtils from '@tryghost/mg-fs-utils';
import {makeTaskRunner} from '@tryghost/listr-smart-renderer';

// Create file cache for storing downloaded assets
const fileCache = new fsUtils.FileCache('my-migration');

// Migration data to process
const ctx = {
    posts: [...],
    tags: [...],
    users: [...],
    settings: [...],
    newsletters: [...]
};

// Initialize scraper
const scraper = new AssetScraper(fileCache, {
    domains: [
        'https://old-site.com',
        'https://cdn.image-service.com'
    ]
}, ctx);

await scraper.init();

// Run asset scraping tasks
const tasks = scraper.getTasks();
const taskRunner = makeTaskRunner(tasks, {
    concurrent: 5
});
await taskRunner.run();

// Check for any failed downloads
console.log(scraper.failedDownloads);

Scrape from all domains

Use allowAllDomains to scrape assets from any domain, optionally excluding specific ones via blockedDomains. Blocked entries can be strings or regular expressions (as a literal or a RegExp object).

const scraper = new AssetScraper(fileCache, {
    allowAllDomains: true,
    blockedDomains: [
        'https://ads.example.com',
        /https?:\/\/[a-z0-9-]+\.example\.com/,
        new RegExp('https?://[a-z0-9-]+\\.other-example\\.com')
    ]
}, ctx);

Note: When using allowAllDomains without any custom domains or blockedDomains, only URLs with file extensions (e.g., .jpg, .png, .mp4) are scraped. This prevents scraping non-asset URLs like API endpoints or web pages. Adding custom domain configuration disables this filter.
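The extension check described in the note above can be sketched as a small predicate. This is an illustration of the behavior, not the package's actual implementation, and the extension list here is abridged:

```javascript
// Sketch of an extension-based filter like the one applied when
// allowAllDomains is set without custom domains or blockedDomains.
// Hypothetical helper; the real extension list is longer.
const ASSET_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.gif', '.webp', '.svg', '.mp4', '.mp3', '.pdf'];

function looksLikeAsset(url) {
    // Use the pathname so query strings and hashes don't hide the extension
    const pathname = new URL(url).pathname.toLowerCase();
    return ASSET_EXTENSIONS.some((ext) => pathname.endsWith(ext));
}

console.log(looksLikeAsset('https://example.com/photo.jpg?w=600')); // true
console.log(looksLikeAsset('https://example.com/api/posts'));       // false
```

This is why an API endpoint or a plain web page URL is skipped in that mode, while a direct image or media URL is picked up.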

Options

| Option              | Type                 | Default   | Description                                                               |
|---------------------|----------------------|-----------|---------------------------------------------------------------------------|
| domains             | string[]             | []        | Whitelist of allowed domains to scrape from (include protocol)            |
| allowAllDomains     | boolean              | false     | Scrape from any domain instead of using the whitelist                     |
| blockedDomains      | (string \| RegExp)[] | []        | Domains to exclude when allowAllDomains is true                           |
| optimize            | boolean              | true      | Optimize images using sharp                                               |
| findOnlyMode        | boolean              | false     | Only discover assets, don't download (access via scraper.foundItems)      |
| baseUrl             | string               | undefined | Base URL for resolving relative URLs (only needed for Ghost JSON exports) |
| processBase64Images | boolean              | false     | Extract embedded base64 images and save as files                          |

Context Object

The context object contains the Ghost migration data to process. Pass data directly or via result.data:

// Direct format
{
    posts: [...],
    posts_meta: [...],
    tags: [...],
    users: [...],
    settings: [...],
    custom_theme_settings: [...],
    snippets: [...],
    newsletters: [...]
}

// Alternative format
{
    result: {
        data: {
            posts: [...],
            // etc.
        }
    }
}
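Both shapes carry the same migration data, so resolving the context amounts to preferring result.data when it exists. A minimal sketch of that resolution, using a hypothetical helper rather than the package's internals:

```javascript
// Hypothetical helper showing how either context shape could resolve to
// the same data object; the package's internals may differ.
function resolveData(ctx) {
    return ctx?.result?.data ?? ctx;
}

const direct = {posts: [{title: 'Hello'}]};
const nested = {result: {data: {posts: [{title: 'Hello'}]}}};

console.log(resolveData(direct).posts.length); // 1
console.log(resolveData(nested).posts.length); // 1
```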

Supported File Types

  • Images: JPEG, PNG, GIF, WebP, SVG, ICO, AVIF, HEIC, HEIF
  • Media: MP4, WebM, OGG, MP3, WAV, M4A
  • Files: PDF, JSON, XML, RTF, OpenDocument formats, Microsoft Office formats

Notes

  • HEIC/HEIF images are automatically converted to JPEG
  • AVIF images are automatically converted to WebP
  • The SQLite cache prevents re-downloading assets across multiple runs
  • Failed downloads are tracked in scraper.failedDownloads:
[
    {
        src: 'https://example.com/image.jpg',
        status: 404,        // HTTP status code
        skip: 'Not Found'   // Reason for failure
    }
]
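Since each entry carries an HTTP status and a reason, the failure list can be summarized after a run, for example to decide which sources are worth retrying. The helper below is illustrative and not part of the package; only the array shape comes from scraper.failedDownloads as documented above:

```javascript
// Group failed downloads by HTTP status for a post-run report.
// Illustrative helper; the input shape matches scraper.failedDownloads.
function summarizeFailures(failedDownloads) {
    const byStatus = {};
    for (const {src, status} of failedDownloads) {
        (byStatus[status] ??= []).push(src);
    }
    return byStatus;
}

const report = summarizeFailures([
    {src: 'https://example.com/image.jpg', status: 404, skip: 'Not Found'},
    {src: 'https://example.com/video.mp4', status: 404, skip: 'Not Found'},
    {src: 'https://example.com/doc.pdf', status: 500, skip: 'Server Error'}
]);
console.log(report); // e.g. { '404': [ ...two URLs... ], '500': [ ...one URL... ] }
```

A 404 usually means the source is gone for good, while a 5xx may succeed on a later run thanks to the SQLite cache skipping everything already downloaded.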