stream-sitemap-parser
v4.0.3
Receives any type of sitemap stream and parses it, streaming back the list of URLs or any errors found.
sitemap-parser
Stream a sitemap file and get back a stream of URLs or any error found while parsing the file.
Usage
const fs = require('fs');
const { fetch, verify, getRules } = require('stream-sitemap-parser');

fs.createReadStream(file)
  .pipe(fetch())
  .on('data', function (url) {
    // each chunk contains a URL and all of its given attributes, for example:
    // {
    //   loc: 'www.google.com',
    //   lastmod: '2017-01-01T00:00:00.000Z',
    //   changefreq: 'monthly',
    //   priority: '0.8',
    //   alternate: [
    //     {
    //       href: 'https://www.google.com/es/',
    //       hreflang: 'es'
    //     }
    //   ]
    // }
  });
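When the whole list of URLs is needed at once rather than one chunk at a time, the stream can be collected into an array. The following is a minimal sketch and not part of the package: collectUrls is a hypothetical helper, and it assumes the stream returned by fetch() behaves like a standard Node.js object-mode stream that emits 'data', 'end' and 'error' events. A stand-in stream is used so the sketch runs without a sitemap file.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not part of stream-sitemap-parser): gathers every
// object emitted by an object-mode stream into an array.
function collectUrls(urlStream) {
  return new Promise((resolve, reject) => {
    const urls = [];
    urlStream
      .on('data', (url) => urls.push(url))
      .on('end', () => resolve(urls))
      .on('error', reject);
  });
}

// Stand-in for fs.createReadStream(file).pipe(fetch()); the entries are
// made-up sample data.
const fakeUrlStream = Readable.from([
  { loc: 'https://www.example.com/' },
  { loc: 'https://www.example.com/about' }
]);

const collected = collectUrls(fakeUrlStream);
collected.then((urls) => console.log(urls.length, 'URLs collected'));
```

With the real parser, the call would presumably look like collectUrls(fs.createReadStream(file).pipe(fetch())). Note that pipe() does not forward errors from the source stream, so the file stream may need its own 'error' handler.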
verify(fs.createReadStream(file))
  .then(result => {
    // result is an object describing any warnings or errors found while parsing the sitemap, for example:
    // {
    //   messages: [
    //     {
    //       type: 'tooManyTags',
    //       details: {
    //         parent: 'url',
    //         tag: 'loc'
    //       }
    //     }
    //   ],
    //   alternates: [
    //     {
    //       loc: 'https://www.google.com',
    //       alternate: [
    //         {
    //           href: 'https://www.google.com/es/',
    //           hreflang: 'es'
    //         }
    //       ]
    //     }
    //   ]
    // }
  });
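A result with many messages is easier to read once it is grouped. The sketch below counts the entries of result.messages per type. countMessagesByType is a hypothetical helper, not part of the package, and it assumes only the result shape shown in the verify example; the sample data is made up for illustration.

```javascript
// Hypothetical helper (not part of stream-sitemap-parser): counts how many
// messages of each type a verify() result contains.
function countMessagesByType(result) {
  const counts = {};
  for (const message of result.messages || []) {
    counts[message.type] = (counts[message.type] || 0) + 1;
  }
  return counts;
}

// Made-up sample shaped like the result documented for verify().
const sampleResult = {
  messages: [
    { type: 'tooManyTags', details: { parent: 'url', tag: 'loc' } },
    { type: 'tooManyTags', details: { parent: 'url', tag: 'lastmod' } }
  ],
  alternates: []
};

console.log(countMessagesByType(sampleResult)); // prints { tooManyTags: 2 }
```

In practice the helper could be called inside the .then() handler, for example verify(fs.createReadStream(file)).then(countMessagesByType).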
getRules();
// returns an object containing all the rules the parser has loaded
fetch and verify accept the same options:
fetch({ contentType, domain, maxSize, maxUrls })
verify(sitemapStream, { contentType, domain, maxSize, maxUrls })
contentType defaults to xml. Set it to txt when streaming a plain-text sitemap.
domain defaults to null. Set it to a domain to make sure that every parsed URL belongs to that domain.
maxSize defaults to 50MB. Set it to another size to cap how large the stream may be.
maxUrls defaults to 50000. Set it to another value to cap how many URLs are parsed.
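Because the options are passed as a plain object, a misspelled key would most likely be ignored silently. The sketch below guards against that by checking an options object against the four option names listed above before it is handed to fetch or verify. checkOptions is a hypothetical helper, not part of the package, and how the library itself treats unknown keys is an assumption.

```javascript
// The four option names documented for fetch() and verify().
const KNOWN_OPTIONS = ['contentType', 'domain', 'maxSize', 'maxUrls'];

// Hypothetical helper (not part of stream-sitemap-parser): throws when the
// options object contains a key that is not documented, otherwise returns
// the object unchanged.
function checkOptions(options = {}) {
  const unknown = Object.keys(options).filter(
    (key) => !KNOWN_OPTIONS.includes(key)
  );
  if (unknown.length > 0) {
    throw new TypeError(`Unknown option(s): ${unknown.join(', ')}`);
  }
  return options;
}

// Options for a plain-text sitemap limited to 1000 URLs.
const txtOptions = checkOptions({ contentType: 'txt', maxUrls: 1000 });
console.log(txtOptions);
```

The checked object can then be passed on as usual, for example fs.createReadStream(file).pipe(fetch(txtOptions)).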
