stream-sitemap-parser

v4.0.3

Published 2 years ago

Receive any type of sitemap stream and parse it. Stream back a list of URLs or any errors found.

Vulnerabilities: 0 high, 0 medium, 0 low

sitemap-parser

Stream a sitemap file and get back a stream of URLs or any error found while parsing the file.

Usage

const fs = require('fs');
const { fetch, verify, getRules } = require('stream-sitemap-parser');

// Placeholder path to a local sitemap file
const file = 'sitemap.xml';

fs.createReadStream(file)
  .pipe(fetch())
  .on('data', function (url) {
    // Each chunk is one URL object with all of its given attributes, e.g.:
    // {
    //   loc: 'www.google.com',
    //   lastmod: '2017-01-01T00:00:00.000Z',
    //   changefreq: 'monthly',
    //   priority: '0.8',
    //   alternate: [
    //     {
    //       href: 'https://www.google.com/es/',
    //       hreflang: 'es'
    //     }
    //   ]
    // }
  });

verify(fs.createReadStream(file))
  .then(result => {
    // result is an object describing any warnings or errors found
    // while parsing the sitemap, e.g.:
    // {
    //   messages: [
    //     {
    //       type: 'tooManyTags',
    //       details: {
    //         parent: 'url',
    //         tag: 'loc'
    //       }
    //     }
    //   ],
    //   alternates: [
    //     {
    //       loc: 'https://www.google.com',
    //       alternate: [
    //         {
    //           href: 'https://www.google.com/es/',
    //           hreflang: 'es'
    //         }
    //       ]
    //     }
    //   ]
    // }
  });

getRules();
// Returns an object containing all the rules loaded by the parser
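
When the full list of URLs is needed rather than one data event at a time, the stream can be collected into an array. The following is a minimal sketch, not part of the package: collectUrls is a hypothetical helper, and Readable.from stands in for the stream returned by fetch() so that the example runs without a sitemap file. It assumes the parsed stream emits end once the input is exhausted, as Node.js readable streams normally do.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not part of stream-sitemap-parser): gathers every
// object emitted by a readable stream into an array.
function collectUrls(stream) {
  return new Promise((resolve, reject) => {
    const urls = [];
    stream.on('data', (url) => urls.push(url));
    stream.on('error', reject);
    stream.on('end', () => resolve(urls));
  });
}

// Stand-in for fs.createReadStream(file).pipe(fetch()); the objects mimic
// the chunk shape shown in the usage example.
const fakeUrlStream = Readable.from([
  { loc: 'https://www.example.com/', priority: '0.8' },
  { loc: 'https://www.example.com/es/', priority: '0.5' }
]);

collectUrls(fakeUrlStream).then((urls) => {
  // Log only the location of each collected URL object
  console.log(urls.map((url) => url.loc));
});
```

Collecting everything in memory gives up the main benefit of streaming, so this is only reasonable for sitemaps known to be small.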

Both fetch and verify accept an optional options object.

fetch({ contentType, domain, maxSize, maxUrls })

verify(sitemapStream, { contentType, domain, maxSize, maxUrls })

contentType defaults to xml. Set it to txt when streaming a plain-text sitemap.

domain defaults to null. Set it to a domain to make sure that every parsed URL belongs to that domain.

maxSize defaults to 50MB. Set it to limit the size of the stream; a stream larger than this is not accepted.

maxUrls defaults to 50000. Set it to limit the number of URLs that will be parsed.
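
Because all four options are passed in a plain object, a misspelled key is easy to miss. As a sketch only, a small guard can check an options object against the names listed above before it is handed to fetch or verify. checkOptions is a hypothetical helper, not part of the package, and it assumes that xml and txt are the only content types, which is all this README mentions.

```javascript
// Option names and content types as documented in this README.
const KNOWN_OPTIONS = ['contentType', 'domain', 'maxSize', 'maxUrls'];
const KNOWN_CONTENT_TYPES = ['xml', 'txt'];

// Hypothetical helper (not part of stream-sitemap-parser): returns a copy
// of the options, or throws on a key or content type the README does not
// mention.
function checkOptions(options = {}) {
  for (const key of Object.keys(options)) {
    if (!KNOWN_OPTIONS.includes(key)) {
      throw new TypeError(`Unknown option: ${key}`);
    }
  }
  if ('contentType' in options &&
      !KNOWN_CONTENT_TYPES.includes(options.contentType)) {
    throw new TypeError(`Unsupported contentType: ${options.contentType}`);
  }
  return { ...options };
}

// Options for a plain-text sitemap capped at 1000 URLs
const options = checkOptions({ contentType: 'txt', maxUrls: 1000 });
console.log(options);
```

The checked object could then be passed as fetch(options) or as the second argument of verify. Options that are left out keep the defaults described above.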
