@dataxquare/sitemapper
v1.2.0
Parser for XML Sitemaps to be used with Robots.txt and web crawlers
📋 Overview
Sitemapper is a Node.js module that makes it easy to parse XML sitemaps. It supports single sitemaps, sitemap indexes with multiple sitemaps, and various sitemap formats including image and video sitemaps.
🚀 Installation
# Using npm
npm install sitemapper --save
# Using yarn
yarn add sitemapper
# Using pnpm
pnpm add sitemapper
🏃‍♂️ Quick Start
Module Usage
import Sitemapper from 'sitemapper';

const sitemap = new Sitemapper({
  timeout: 10000, // 10 second timeout
});

sitemap
  .fetch('https://gosla.sh/sitemap.xml')
  .then(({ url, sites }) => {
    console.log('Sites: ', sites);
  })
  .catch((error) => console.error(error));
CLI Usage
You can also use Sitemapper directly from the command line:
# Using npx
npx sitemapper https://gosla.sh/sitemap.xml
💻 Examples
Promise Example
import Sitemapper from 'sitemapper';

const sitemap = new Sitemapper();

sitemap
  .fetch('https://wp.seantburke.com/sitemap.xml')
  .then(({ url, sites }) => {
    console.log(`Sitemap URL: ${url}`);
    console.log(`Found ${sites.length} URLs`);
    console.log(sites);
  })
  .catch((error) => console.error(error));
Async/Await Example
import Sitemapper from 'sitemapper';

async function parseSitemap() {
  const Google = new Sitemapper({
    url: 'https://www.google.com/work/sitemap.xml',
    timeout: 15000, // 15 seconds
    concurrency: 10,
  });

  try {
    const { sites } = await Google.fetch();
    console.log(`Found ${sites.length} URLs in the sitemap`);
    console.log(sites);
  } catch (error) {
    console.error('Error fetching sitemap:', error);
  }
}

parseSitemap();
Advanced Example with Proxy
import Sitemapper from 'sitemapper';
import { HttpsProxyAgent } from 'hpagent';

const sitemapper = new Sitemapper({
  url: 'https://gosla.sh/sitemap.xml',
  timeout: 30000,
  concurrency: 5,
  retries: 2,
  debug: true,
  proxyAgent: new HttpsProxyAgent({
    proxy: 'http://localhost:8080',
  }),
  requestHeaders: {
    'User-Agent': 'Mozilla/5.0 (compatible; SitemapperBot/1.0)',
  },
  fields: {
    loc: true,
    lastmod: true,
    sitemap: true,
  },
});

sitemapper
  .fetch()
  .then(({ sites }) => console.log(sites))
  .catch((error) => console.error(error));
⚙️ Configuration Options
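Sitemapper can be customized through options passed to the constructor. The examples above use url, timeout, concurrency, retries, debug, proxyAgent, requestHeaders, and fields.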
Available Fields
Important: When using the fields option, the return format changes from an array of URL strings to an array of objects containing your selected fields.
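Because the shape of sites depends on whether fields is set, code that consumes the result may need to handle both forms. The following is a minimal sketch, not part of Sitemapper: toUrls is a hypothetical helper name, and the sample data is copied from the example outputs in this section.

```javascript
// Hypothetical helper (not provided by sitemapper): accepts either return
// shape and always yields an array of URL strings.
function toUrls(sites) {
  return sites
    .map((site) => (typeof site === 'string' ? site : site.loc))
    .filter((url) => typeof url === 'string' && url.length > 0);
}

// Shape returned without the fields option: plain URL strings.
const withoutFields = [
  'https://wp.seantburke.com/?p=234',
  'https://wp.seantburke.com/?p=231',
];

// Shape returned with the fields option: objects with the selected fields.
const withFields = [
  { loc: 'https://wp.seantburke.com/?p=234', lastmod: '2015-07-03T02:05:55+00:00', priority: 0.8 },
  { loc: 'https://wp.seantburke.com/?p=231', lastmod: '2015-07-03T01:47:29+00:00', priority: 0.8 },
];

console.log(toUrls(withoutFields));
console.log(toUrls(withFields));
```

Both calls log the same two URLs. Entries without a loc value are dropped, which assumes that loc was among the requested fields.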
For the fields option, specify which fields to include by setting them to true, as with loc, lastmod, and sitemap in the proxy example above.
Example Default Output (without fields)
// Returns an array of URL strings
[
  'https://wp.seantburke.com/?p=234',
  'https://wp.seantburke.com/?p=231',
  'https://wp.seantburke.com/?p=185',
];
Example Output with Fields
// Returns an array of objects
[
  {
    loc: 'https://wp.seantburke.com/?p=234',
    lastmod: '2015-07-03T02:05:55+00:00',
    priority: 0.8,
  },
  {
    loc: 'https://wp.seantburke.com/?p=231',
    lastmod: '2015-07-03T01:47:29+00:00',
    priority: 0.8,
  },
];
🧩 CLI Usage
Sitemapper includes a simple CLI tool for basic sitemap parsing directly from the command line:
npx sitemapper <sitemap-url>
Example
npx sitemapper https://gosla.sh/sitemap.xml
Output
The CLI will display the sitemap URL and list all URLs found in the sitemap:
Sitemap URL: https://gosla.sh/sitemap.xml
Found URLs:
1. https://gosla.sh/page1
2. https://gosla.sh/page2
3. https://gosla.sh/page3
...
CLI Options
Currently, the CLI supports the --timeout parameter to set the request timeout in milliseconds:
npx sitemapper https://gosla.sh/sitemap.xml --timeout=5000
Note: The CLI implementation is basic and does not yet support all options available in the JavaScript API. More advanced features like fields filtering, concurrency control, and different output formats require using the JavaScript API directly.
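As the note says, other output formats need the JavaScript API. The sketch below shows one way a small script might format a result either in the CLI's text layout or as JSON. The names formatAsText and formatAsJson are hypothetical, and the result object is a hard-coded stand-in for what fetch() resolves with, so the snippet runs without network access.

```javascript
// Hypothetical helpers: none of these names come from sitemapper itself.

// Mirrors the text layout of the CLI output shown above.
function formatAsText(url, sites) {
  const lines = sites.map((site, index) => `${index + 1}. ${site}`);
  return [`Sitemap URL: ${url}`, 'Found URLs:', ...lines].join('\n');
}

// Alternative output format: pretty-printed JSON with a URL count.
function formatAsJson(url, sites) {
  return JSON.stringify({ url, count: sites.length, sites }, null, 2);
}

// Stand-in for the { url, sites } object that fetch() resolves with.
const result = {
  url: 'https://gosla.sh/sitemap.xml',
  sites: ['https://gosla.sh/page1', 'https://gosla.sh/page2'],
};

console.log(formatAsText(result.url, result.sites));
console.log(formatAsJson(result.url, result.sites));
```

In a real script, result would instead come from a Sitemapper instance, for example by awaiting sitemap.fetch('https://gosla.sh/sitemap.xml') as in the Quick Start. These helpers assume sites is an array of URL strings, so they apply when the fields option is not set.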
🤝 Contributing
Contributions from experienced engineers are highly valued. When contributing, please consider:
Guidelines
- Maintain backward compatibility where possible
- Consider performance implications, particularly for large sitemaps
- Add TypeScript types
- Add tests for your change
- Update documentation and examples
- Check for typos
- Code should pass ESLint, Prettier, Spell Check and TypeScript checks
- Try not to bloat the main dependencies with new packages, dev dependencies are fine
- If adding packages, make sure to run npm install with the latest npm version to update package-lock.json
Pull Request Process
- PRs should be focused on a single concern/feature
- Include sufficient context in the PR description
- Reference any relevant issues
- Run npm test locally to verify that your changes pass the tests
  - The tests sometimes fail because they reference real-world sitemaps. If that happens, try running them again.
- PRs do not run GitHub Actions by default; the workflows need to be triggered manually by @seantomburke
For substantial changes, consider opening an issue for discussion before implementation.
Note: The CI pipeline enforces TypeScript type checking, linting rules, formatting standards, and test coverage thresholds.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
