site-mapper

v4.1.0

sitemap.xml generation in node.js.

Downloads

1,343

Readme

site-mapper

Site Map Generation in node.js

Requirements

Node.js >= 18

Installation

This module is intended to be used as a dependency in a website-specific site map building project. Add the module to the dependencies section of a package.json file:

{
  "dependencies": {
    "site-mapper": ">= 4.0.0"
  }
}

Then install it:

npm install --save site-mapper

Running site-mapper

Create a directory to hold your site map generation configuration. This directory will hold all the files needed to tell site-mapper what to create.

Dependencies

Create a package.json file similar to the following:

{
  "name": "my-website-site-maps",
  "description": "sitemap generation for mysite.com",
  "version": "1.0.0",
  "dependencies": {
    "site-mapper": ">= 4.0.0"
  },
  "engines": {
    "node": ">=18"
  }
}

Configuration

Create a directory called ./config. For each environment you will generate sitemaps for, create a JavaScript file named for the environment:

./config/production.js

Configuration Format

The configuration file can contain any of the following keys. The values below are defaults that will be used unless overridden in your configuration file.

const config = {};
config.sources = {};
config.sitemaps = {};
config.logConfig = {
  name: 'sitemapper',
  level: 'debug'
};
config.defaultSitemapConfig = {
  targetDirectory: `${process.cwd()}/tmp/sitemaps/${config.env}`,
  sitemapIndex: 'sitemap.xml',
  sitemapRootUrl: 'http://www.mysite.com',
  sitemapFileDirectory: '/sitemaps',
  maxUrlsPerFile: 50000,
  urlBase: 'http://www.mysite.com'
};

module.exports = config;

The sitemaps object contains named keys pointing at objects that define a particular sitemap. The sitemap definition can contain (and override) any of the keys in the config.defaultSitemapConfig object. The produced sitemap consists of a sitemap index XML file referencing one or more gzipped sitemap XML files, created from URLs produced by the config.sources objects.

The configuration allows for defining one or more sitemaps to create — for example, one sitemap for the www subdomain and another for the foobar subdomain. By default, all sources defined in config.sources are used to generate URLs for all sitemaps. To use different sources for different sitemaps, provide a sources key in each sitemap configuration object:

// Specify which sources to include — all others are ignored
sources: {
  includes: ['source1', 'source2']
}

or

// Specify which sources to exclude — all others are included
sources: {
  excludes: ['source1', 'source2']
}
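The include/exclude semantics above can be pictured with a small sketch. The helper name selectSources is hypothetical and is not part of site-mapper's API; it only assumes what the text states (includes keeps the named sources, excludes drops them, no filter keeps everything). Giving includes precedence when both keys are present is an assumption made for this example.

```javascript
// Hypothetical helper for illustration only; not site-mapper's implementation.
function selectSources(allSourceNames, filter = {}) {
  // Assumption: if both keys are present, includes wins.
  if (Array.isArray(filter.includes)) {
    return allSourceNames.filter((name) => filter.includes.includes(name));
  }
  if (Array.isArray(filter.excludes)) {
    return allSourceNames.filter((name) => !filter.excludes.includes(name));
  }
  // No sources key: every source in config.sources is used.
  return allSourceNames;
}

const allSources = ['source1', 'source2', 'source3'];
const included = selectSources(allSources, { includes: ['source1', 'source2'] });
const excluded = selectSources(allSources, { excludes: ['source1', 'source2'] });
const unfiltered = selectSources(allSources);

console.log(included, excluded, unfiltered);
```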

The sources object contains arbitrarily named keys pointing at functions that take a single sitemapConfig object and return an object with type and options keys. The options key points at a source configuration object that can contain: input, options, siteMap, cached, ignoreErrors.

The input parameter sitemapConfig is an object formed by merging config.defaultSitemapConfig with the specific sitemap configuration.
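As a rough illustration of that merge, the sketch below assumes a shallow merge in which keys from the specific sitemap override the defaults. The helper name resolveSitemapConfig is hypothetical, and the library's actual merge strategy may differ.

```javascript
// Hypothetical helper, not part of site-mapper: one plausible way the
// per-sitemap config could be layered over config.defaultSitemapConfig.
const defaultSitemapConfig = {
  sitemapIndex: 'sitemap.xml',
  sitemapRootUrl: 'http://www.mysite.com',
  sitemapFileDirectory: '/sitemaps',
  maxUrlsPerFile: 50000,
  urlBase: 'http://www.mysite.com'
};

// Assumption: a shallow merge where the specific sitemap's keys win.
function resolveSitemapConfig(defaults, sitemap) {
  return { ...defaults, ...sitemap };
}

const sitemapConfig = resolveSitemapConfig(defaultSitemapConfig, {
  sitemapRootUrl: 'http://staging.mysite.com',
  urlBase: 'http://staging.mysite.com'
});

console.log(sitemapConfig.urlBase);        // overridden by the sitemap
console.log(sitemapConfig.maxUrlsPerFile); // inherited from the defaults
```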

type is either one of the built-in source type classes (see below) or a site-specific class derived from SitemapTransformer.

A minimal config might look like ./config/staging.js:

const {StaticSetSource, JsonSource} = require('site-mapper');

const appConfig = {
  sitemaps: {
    main: {
      sitemapRootUrl: 'http://staging.mysite.com',
      urlBase: 'http://staging.mysite.com',
      sitemapIndex: 'sitemap_index.xml',
      targetDirectory: `${process.cwd()}/tmp/sitemaps/staging`
    }
  },
  sources: {
    staticUrls: (sitemapConfig) => ({
      type: StaticSetSource,
      options: {
        siteMap: {
          channel: 'static',
          changefreq: 'weekly',
          priority: 1
        },
        options: {
          urls: ['/', '/about', '/faq', '/jobs']
        }
      }
    }),
    serviceUrls: (sitemapConfig) => ({
      type: JsonSource,
      options: {
        siteMap: {
          changefreq: 'weekly',
          priority: 0.8,
          channel: (url) => url.category,
          urlAugmenter: (url) => {
            // urlBase already includes the scheme, so do not prepend another "http://"
            url.url = `${sitemapConfig.urlBase}/widgets/${url.category}/${url.url}`;
          }
        },
        input: {
          url: 'http://api.mysite.com/widgets'
        },
        options: {
          filter: /urls\./
        }
      }
    })
  }
};

module.exports = appConfig;

Logging

site-mapper logs using pino. To get pretty-printed logs while running, pipe output through pino-pretty:

NODE_ENV=staging ./node_modules/.bin/site-mapper | npx pino-pretty

Running the Code

  1. Install dependencies:
    npm install
  2. Run the generator:
    NODE_ENV=staging ./node_modules/.bin/site-mapper

Overriding the configuration from the command line

For one-off generation of sitemap(s) for a single source, the site-mapper command line tool accepts the following arguments:

Usage: node_modules/.bin/site-mapper [-s SITEMAP] [-i INCLUDES] [-e EXCLUDES]

Options:
  -s, --sitemap  name of the sitemap in the sitemaps section of the config file
  -i, --include  only include specified source(s)                        [array]
  -e, --exclude  add specified source(s) to excludes                     [array]
  -h, --help     Show help                                             [boolean]

Using the configuration above, to generate the sitemap(s) for just the staticUrls source:

NODE_ENV=staging ./node_modules/.bin/site-mapper -s main -i staticUrls

Sitemap Generation

The site-mapper module views the sitemap generation process as follows:

SiteMapper creates one or more Source objects, pipes each one to a Sitemap, which then pipes to one or more SitemapFile objects, depending on the number of URLs the source produces and the configured maximum number of URLs per SitemapFile (50,000 by default).

+------------+          +------------+        +--------------+       +-------------+
| SiteMapper |          |   Source   |        | Sitemap      |       | SitemapFile |
|------------| creates  |------------|creates |--------------| adds  |-------------|
|            +--------->|            |------->|              |------>|             |
|            |          |            |  urls  |              | urls  |             |
+------------+          +------------+        +--------------+       +-------------+

Sources

Sources are Node.js Transform stream implementations that operate in object mode and produce URL objects from data of a specific format. The included source types are:

  1. StaticSetSource — configured with a static list of URL strings in the config file
  2. CsvSource — produces URLs from CSV data
  3. JsonSource — produces URLs from JSON data
  4. XmlSource — produces URLs from XML data

Sources are configured with an input that produces raw text data of the appropriate format. Source inputs can be files, URLs, or an instantiated Readable stream object.
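To make the "object-mode Transform stream" idea concrete, here is a minimal sketch using only Node's built-in stream module. It is not one of site-mapper's source classes and does not derive from SitemapTransformer; createPathSource and toUrlObject are hypothetical names, and the shape of the URL object ({ url }) is an assumption based on the configuration examples above.

```javascript
const { Transform } = require('node:stream');

// Hypothetical conversion step: turns a raw path string into a URL object.
function toUrlObject(path, urlBase) {
  return { url: `${urlBase}${path}` };
}

// Assumption: a source can be modelled as an object-mode Transform that
// receives raw records and pushes URL objects downstream.
function createPathSource(urlBase) {
  return new Transform({
    objectMode: true,
    transform(path, _encoding, callback) {
      callback(null, toUrlObject(path, urlBase));
    }
  });
}

const source = createPathSource('http://www.mysite.com');
source.on('data', (urlObject) => console.log(urlObject.url));
source.write('/about');
source.end('/faq');
```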

Source configuration details

{
  ignoreErrors: true,   // log and continue on error instead of aborting
  input: {
    // one of:
    fileName: '/path/to/file.csv',
    url: 'https://api.mysite.com/data',
    stream: readableStream
  },
  options: {},          // source-specific options (see below)
  siteMap: {},          // per-source sitemap metadata
  cached: {}            // optional caching config (see below)
}

  • ignoreErrors: set to true to log and skip errors instead of aborting the entire run
  • input: one of fileName, url, or stream
  • options: source-specific options (see the individual source sections below)
  • siteMap: sitemap metadata for this source (channel, changefreq, priority, urlFormatter, urlAugmenter, urlFilter, extraUrls)
  • cached: if present, enables caching of input data; contains cacheFile (path) and maxAge (milliseconds)
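The cached option pairs a file path with a maximum age in milliseconds. The sketch below shows one way such a freshness check could work; isCacheFresh is a hypothetical helper, the cache path is made up for the example, and how site-mapper actually decides when to refetch is not documented here.

```javascript
// Hypothetical cache settings; the path is illustrative only.
const cached = {
  cacheFile: '/tmp/widgets.json',
  maxAge: 60 * 60 * 1000 // one hour, in milliseconds
};

// Assumption: cached input is reused while it is no older than maxAge.
function isCacheFresh(cacheWrittenAtMs, maxAgeMs, nowMs = Date.now()) {
  return nowMs - cacheWrittenAtMs <= maxAgeMs;
}

const now = Date.now();
const thirtyMinutesAgo = now - 30 * 60 * 1000;
const twoHoursAgo = now - 2 * 60 * 60 * 1000;

console.log(isCacheFresh(thirtyMinutesAgo, cached.maxAge, now)); // still fresh
console.log(isCacheFresh(twoHoursAgo, cached.maxAge, now));      // stale
```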

URL Channel

Each URL is associated with a channel. The channel names individual sitemap files and can be a static string or derived from each URL at runtime:

sources: {
  staticChannel: (sitemapConfig) => ({
    siteMap: { channel: 'products' }
  }),
  dynamicChannel: (sitemapConfig) => ({
    siteMap: { channel: (url) => url.category }
  })
}

Sitemap files are created as ${CHANNEL}${SEQUENCE}.xml.gz in the target directory.

Important: if two sources produce the same channel name, the second source's files will overwrite the first source's. Ensure each source produces a unique channel.
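A quick way to reason about channel names is sketched below. Both helpers are hypothetical and not part of site-mapper: resolveChannel mirrors the "static string or function of the URL" rule, and findDuplicateChannels is a pre-flight check you could run over statically known channel names to catch the overwrite case described above.

```javascript
// Resolve a channel that is either a fixed string or a function of the URL.
function resolveChannel(channel, url) {
  return typeof channel === 'function' ? channel(url) : channel;
}

// Report channel names that more than one source would write to.
function findDuplicateChannels(channelsBySource) {
  const firstSourceByChannel = new Map();
  const duplicates = [];
  for (const [sourceName, channel] of Object.entries(channelsBySource)) {
    if (firstSourceByChannel.has(channel)) {
      duplicates.push({ channel, sources: [firstSourceByChannel.get(channel), sourceName] });
    } else {
      firstSourceByChannel.set(channel, sourceName);
    }
  }
  return duplicates;
}

const staticName = resolveChannel('products', { url: '/p/1' });
const dynamicName = resolveChannel((url) => url.category, { url: '/w/1', category: 'widgets' });
const clashes = findDuplicateChannels({ a: 'products', b: 'products', c: 'static' });

console.log(staticName, dynamicName, clashes);
```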

Sitemaps

A Sitemap is created for each URL channel. As URLs are added, sequentially numbered files are created: ${CHANNEL}${SEQUENCE}.xml.gz. Each file holds up to maxUrlsPerFile URLs (default: 50,000).
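The arithmetic behind the file split can be sketched as follows. planSitemapFiles is a hypothetical helper; the number that the sequence starts from is not stated in this document, so firstSequence = 1 is purely an assumption for the example.

```javascript
// Hypothetical planner: how many files a channel needs and what they
// would be called under the ${CHANNEL}${SEQUENCE}.xml.gz pattern.
// Assumption: the sequence starts at 1 (not confirmed by the docs).
function planSitemapFiles(channel, urlCount, maxUrlsPerFile = 50000, firstSequence = 1) {
  const fileCount = Math.ceil(urlCount / maxUrlsPerFile);
  return Array.from(
    { length: fileCount },
    (_, index) => `${channel}${firstSequence + index}.xml.gz`
  );
}

// 120,000 URLs at 50,000 per file need three files (50,000 + 50,000 + 20,000).
const files = planSitemapFiles('products', 120000);
console.log(files);
```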

Sitemap Index and Files

site-mapper creates a sitemap index and as many gzipped sitemap files as required. Gzipping cannot be disabled.

Publishing Sitemaps to Search Engines

It is up to you how you expose the generated sitemaps to Google and other search engines — there are no default publishing mechanisms in site-mapper.

Releases

Releases are fully automated via semantic-release and triggered by pushes to master that pass CI. No manual versioning or tagging is required.

How it works

The CI workflow runs tests on Node 22.x and 24.x. If tests pass and the push is to master, semantic-release analyses the commit messages since the last release and:

  1. Determines the next version number (patch/minor/major) from commit types
  2. Publishes the new version to npm
  3. Creates a GitHub release with generated release notes

Commit message conventions

Releases follow the Angular commit message convention:

| Commit type | Release triggered |
|---|---|
| fix: ... | Patch (e.g. 1.0.0 → 1.0.1) |
| feat: ... | Minor (e.g. 1.0.0 → 1.1.0) |
| feat!: ... or BREAKING CHANGE: in body | Major (e.g. 1.0.0 → 2.0.0) |
| chore: ..., docs: ..., refactor: ... | No release |
| Dependabot merge commits | No release |
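The mapping in the table can be read as a small decision function. The sketch below is a simplified illustration, not semantic-release's real commit analyzer; releaseTypeFor is a hypothetical name, and treating a "!" after any commit type (not only feat) as breaking is an assumption.

```javascript
// Simplified sketch of the commit-type table; returns null for "no release".
function releaseTypeFor(commitMessage) {
  const [header, ...bodyLines] = commitMessage.split('\n');
  // type, optional (scope), optional "!", then ": "
  const match = /^(\w+)(\([^)]*\))?(!)?: /.exec(header);
  if (!match) return null; // e.g. merge commits
  const [, type, , bang] = match;
  const hasBreakingFooter = bodyLines.some((line) => line.startsWith('BREAKING CHANGE:'));
  if (bang || hasBreakingFooter) return 'major';
  if (type === 'feat') return 'minor';
  if (type === 'fix') return 'patch';
  return null; // chore, docs, refactor, ...
}

console.log(releaseTypeFor('fix: handle empty sources'));
console.log(releaseTypeFor('feat(cli): add exclude flag'));
console.log(releaseTypeFor('feat!: change config format'));
console.log(releaseTypeFor('docs: update readme'));
```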

A commit-msg git hook (installed via husky on npm install) enforces this format and will reject non-conforming messages. Use the interactive commit helper to avoid rejections:

npm run commit

Required secrets

The GitHub repository must have these secrets configured:

  • NPM_TOKEN — npm automation token with publish access
  • GITHUB_TOKEN — provided automatically by GitHub Actions

Tests

npm test

Sources

StaticSetSource

options: {
  urls: ['/', '/about', '/faq']
}

Produces URLs from a static array of URL strings defined in the configuration.

CsvSource

options: {
  columns: ['url', 'imageUrl', 'lastModified'],
  relax_column_count: true
}

  • columns: names of the CSV columns (default: ['url', 'imageUrl', 'lastModified', 'comments'])
  • relax_column_count: set to true to suppress errors when a row's column count differs from the configured columns
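To show how the columns option relates CSV fields to URL object properties, here is a small sketch. rowToUrl is a hypothetical helper operating on an already-split row; it is not how CsvSource parses CSV, and its handling of mismatched column counts is only an approximation of what relax_column_count is described as doing.

```javascript
const columns = ['url', 'imageUrl', 'lastModified'];

// Hypothetical mapping of one parsed CSV row onto named properties.
function rowToUrl(values, columnNames, relaxColumnCount = false) {
  if (!relaxColumnCount && values.length !== columnNames.length) {
    throw new Error(`expected ${columnNames.length} columns, got ${values.length}`);
  }
  const url = {};
  columnNames.forEach((name, index) => {
    // With a relaxed column count, missing trailing fields are simply left unset.
    if (values[index] !== undefined) url[name] = values[index];
  });
  return url;
}

const fullRow = rowToUrl(['/widgets/1', '/img/1.png', '2024-01-01'], columns);
const shortRow = rowToUrl(['/widgets/2'], columns, true);

console.log(fullRow, shortRow);
```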

XmlSource

options: {
  urlTag: 'url',
  urlAttributes: {
    lastmod: 'lastModified',
    changefreq: 'changeFrequency',
    priority: 'priority',
    loc: 'url'
  }
}

  • urlTag: name of the XML elements containing URL data
  • urlAttributes: map of XML attribute/child tag names to URL object property names

Out of the box, XmlSource is configured to read sitemap.xml files, which is useful for including sitemaps generated by other methods alongside site-mapper's own output.
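The urlAttributes map can be thought of as a rename table from XML names to URL object properties. The sketch below applies it to a plain object standing in for one parsed url element; mapUrlElement is a hypothetical helper, the sample values are made up, and no XML parsing is performed.

```javascript
const urlAttributes = {
  lastmod: 'lastModified',
  changefreq: 'changeFrequency',
  priority: 'priority',
  loc: 'url'
};

// Hypothetical rename step: copy each known XML name to its URL property.
function mapUrlElement(element, mapping) {
  const url = {};
  for (const [xmlName, propertyName] of Object.entries(mapping)) {
    if (element[xmlName] !== undefined) url[propertyName] = element[xmlName];
  }
  return url;
}

// Stand-in for the children of one <url> element in a sitemap.xml file.
const mapped = mapUrlElement(
  { loc: 'http://www.mysite.com/about', lastmod: '2024-01-01', changefreq: 'weekly' },
  urlAttributes
);

console.log(mapped);
```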

JsonSource

The JSON source expects an array in the JSON data containing objects or strings representing URLs.

options: {
  filter: /regex/,
  transformer: (obj) => ({ url: obj.href }),
  stringArray: true
}

  • filter: regex specifying where the URL array(s) are within the JSON document
  • transformer: function that converts a raw object or string from JSON into a URL-compatible object
  • stringArray: set to true if the array contains plain URL strings rather than objects
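The interplay of transformer and stringArray might look like the sketch below. normalizeEntry is a hypothetical helper, and the precedence it uses (transformer first, then stringArray) is an assumption; how JsonSource applies filter to locate the array is not covered here.

```javascript
// Hypothetical normalization of one array entry into a URL-compatible object.
function normalizeEntry(entry, { transformer, stringArray = false } = {}) {
  // Assumption: a transformer, when given, takes precedence.
  if (typeof transformer === 'function') return transformer(entry);
  // Assumption: plain strings become objects with a url property.
  if (stringArray) return { url: entry };
  return entry;
}

const fromObject = normalizeEntry(
  { href: '/widgets/1' },
  { transformer: (obj) => ({ url: obj.href }) }
);
const fromString = normalizeEntry('/widgets/2', { stringArray: true });
const passthrough = normalizeEntry({ url: '/widgets/3' });

console.log(fromObject, fromString, passthrough);
```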

Source Input Streams

Sources read data from local files (input.fileName), from URLs (input.url), or from Readable stream instances (input.stream).

site-mapper ships with two custom input stream classes for HTTP data:

MultipleHttpInput

const {MultipleHttpInput} = require('site-mapper');

new MultipleHttpInput({
  urls: ['https://api.mysite.com/page1', 'https://api.mysite.com/page2'],
  format: 'json',   // wraps all responses in a JSON array: [response1, response2, ...]
  httpOptions: {}   // optional headers etc.
})

Fetches each URL in sequence and concatenates the responses. When format: 'json', the individual responses are wrapped in a JSON array so the combined output can be parsed by JsonSource. Pass it as input.stream to a source:

sources: {
  mySource: (sitemapConfig) => ({
    type: JsonSource,
    options: {
      input: {
        stream: new MultipleHttpInput({
          urls: ['https://api.mysite.com/a', 'https://api.mysite.com/b'],
          format: 'json'
        })
      },
      options: { filter: /.*/ },
      siteMap: { channel: 'products', changefreq: 'daily', priority: 0.8 }
    }
  })
}
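The effect of format: 'json' on the combined output can be illustrated without any HTTP calls. wrapAsJsonArray is a hypothetical helper that mimics the described wrapping on response bodies already held as strings; the real class works on a stream and may differ in detail.

```javascript
// Hypothetical stand-in for the wrapping described above: join individual
// JSON response bodies into one JSON array document.
function wrapAsJsonArray(responseBodies) {
  return `[${responseBodies.join(',')}]`;
}

// Two made-up response bodies, as they might arrive from two URLs.
const combined = wrapAsJsonArray(['{"urls":["/a"]}', '{"urls":["/b"]}']);
const parsed = JSON.parse(combined);

console.log(combined);
console.log(parsed.length);
```

Because the combined text is a single valid JSON document, a JSON-based source can parse it in one pass.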

PaginatedHttpInput

const {PaginatedHttpInput} = require('site-mapper');

new PaginatedHttpInput({
  url: 'https://api.mysite.com/items',
  format: 'json',           // wraps pages in a JSON array
  pagination: {
    page: 'page',           // query param name for the page number (default: 'page')
    per: 'per',             // query param name for page size (default: 'per')
    perPage: 100,           // page size value (default: 100)
    increment: 1,           // amount to increment page number per request (default: 1)
    start: 1                // starting page number (default: 1)
  },
  stop: {
    string: '"done"'        // stop fetching when this string appears in a response
  },
  httpOptions: {}           // optional headers etc.
})

Fetches pages from a paginated HTTP API, incrementing the page parameter with each request. Pagination stops when stop.string is found in a response body. When format: 'json', all page responses are wrapped in a JSON array so the combined output can be parsed by JsonSource.
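The pagination options can be pictured as a URL builder plus a stop check. Both helpers below are hypothetical and make no network requests; the defaults are copied from the option comments above, while sending the page-size parameter on every request is an assumption.

```javascript
// Hypothetical builder for the Nth request URL (requestIndex starts at 0).
function buildPageUrl(baseUrl, pagination = {}, requestIndex = 0) {
  const { page = 'page', per = 'per', perPage = 100, increment = 1, start = 1 } = pagination;
  const url = new URL(baseUrl);
  url.searchParams.set(page, String(start + requestIndex * increment));
  url.searchParams.set(per, String(perPage));
  return url.toString();
}

// Stop once the configured marker string appears in a response body.
function shouldStop(responseBody, stop) {
  return responseBody.includes(stop.string);
}

const firstUrl = buildPageUrl('https://api.mysite.com/items');
const thirdUrl = buildPageUrl('https://api.mysite.com/items', { perPage: 50 }, 2);

console.log(firstUrl);
console.log(thirdUrl);
console.log(shouldStop('{"items":[],"status":"done"}', { string: '"done"' }));
```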