npm package discovery and stats viewer.

© 2024 – Pkg Stats / Ryan Hefner

astro-robots-txt

v1.0.0


Generate a robots.txt for Astro

Downloads

40,989

Readme


astro-robots-txt

This Astro integration generates a robots.txt for your Astro project during build.

License: MIT


Why astro-robots-txt?

The robots.txt file informs search engines which pages on your website should be crawled. See Google's own advice on robots.txt to learn more.

For an Astro project, you usually create robots.txt in a text editor and place it in the public/ directory. In that case, you must manually keep the site option in astro.config.* in sync with the Sitemap: record in robots.txt.
This breaks the DRY principle.

Sometimes, especially during development, it's necessary to prevent your site from being indexed. To achieve this, you need to place the meta tag <meta name="robots" content="noindex"> into the <head> section of your pages or add X-Robots-Tag: noindex to the HTTP response header, and then add the lines User-agent: * and Disallow: / to robots.txt.
Again, you have to do this manually in two different places.
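The two-places problem can be illustrated with a minimal sketch that is not part of astro-robots-txt: a single hypothetical allowIndexing flag drives both the noindex meta tag and the robots.txt body, so the two cannot drift apart. The function names robotsMetaTag and robotsTxtBody are assumptions made for this example.

```javascript
// Hypothetical helpers (not part of astro-robots-txt): one boolean decides
// both outputs, so the meta tag and robots.txt always agree.
function robotsMetaTag(allowIndexing) {
  // Emit the noindex meta tag only when indexing is disabled.
  return allowIndexing ? '' : '<meta name="robots" content="noindex">';
}

function robotsTxtBody(allowIndexing) {
  // "Disallow: /" blocks every path; "Allow: /" permits every path.
  const rule = allowIndexing ? 'Allow: /' : 'Disallow: /';
  return `User-agent: *\n${rule}\n`;
}

// Example assumption: only a production build may be indexed.
const allowIndexing = process.env.NODE_ENV === 'production';
console.log(robotsMetaTag(allowIndexing));
console.log(robotsTxtBody(allowIndexing));
```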

astro-robots-txt handles the robots.txt side of both cases. See details in this demo repo.


Installation

The experimental astro add command-line tool automates the installation for you. Run one of the following commands in a new terminal window. (If you aren't sure which package manager you're using, run the first command.) Then, follow the prompts, and type "y" in the terminal (meaning "yes") for each one.

# Using NPM
npx astro add astro-robots-txt

# Using Yarn
yarn astro add astro-robots-txt

# Using PNPM
pnpm astro add astro-robots-txt

Then, restart the dev server by typing CTRL-C and then npm run astro dev in the terminal window that was running Astro.

Because this command is new, it might not properly set things up. If that happens, log an issue on Astro GitHub and try the manual installation steps below.

First, install the astro-robots-txt package using your package manager. If you're using npm or aren't sure, run this in the terminal:

npm install --save-dev astro-robots-txt

Then, apply this integration to your astro.config.* file using the integrations property:

astro.config.mjs

import robotsTxt from 'astro-robots-txt';

export default {
  // ...
  integrations: [robotsTxt()],
}

Then, restart the dev server.

Usage

The astro-robots-txt integration requires a deployment / site URL for generation. Add your site's URL under your astro.config.* using the site property.

Then, apply this integration to your astro.config.* file using the integrations property.

astro.config.mjs

import { defineConfig } from 'astro/config';
import robotsTxt from 'astro-robots-txt';

export default defineConfig({
  site: 'https://example.com',

  integrations: [robotsTxt()],
});

Note that unlike other configuration options, site is set in the root defineConfig object, rather than inside the robotsTxt() call.

Now, build your site for production via the astro build command. You should find your robots.txt under dist/robots.txt!

Warning: If you forget to add a site, you'll get a friendly warning when you build, and the robots.txt file won't be generated.

robots.txt

User-agent: *
Allow: /
Sitemap: https://example.com/sitemap-index.xml
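The default output above follows directly from the site option. As a rough sketch, assuming a hypothetical defaultRobotsTxt helper rather than the integration's actual code, the Sitemap: line can be derived from site with the standard URL constructor, and a missing site yields no file:

```javascript
// Hypothetical sketch of the default output shown above; not the integration's code.
function defaultRobotsTxt(site) {
  if (!site) {
    // Mirrors the documented behavior: no site means no robots.txt.
    console.warn('site is not set in astro.config.*; skipping robots.txt');
    return null;
  }
  // Resolve the sitemap against the site URL so the two always agree.
  const sitemapUrl = new URL('sitemap-index.xml', site).href;
  return ['User-agent: *', 'Allow: /', `Sitemap: ${sitemapUrl}`].join('\n') + '\n';
}

console.log(defaultRobotsTxt('https://example.com'));
```

Resolving with new URL keeps the Sitemap: record in sync with site. Relative resolution drops the last path segment of a site without a trailing slash (https://example.com/blog resolves to https://example.com/sitemap-index.xml); how the integration treats such base paths is not covered here.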

Configuration

To configure this integration, pass an object to the robotsTxt() function call in astro.config.mjs.

astro.config.mjs

...
export default defineConfig({
  integrations: [robotsTxt({
    transform: ...
  })]
});

sitemap

| Type | Required | Default value |
| :-------------------------: | :------: | :-----------: |
| Boolean / String / String[] | No | true |

If you omit the sitemap parameter or set it to true, the resulting output in a robots.txt will be Sitemap: your-site-url/sitemap-index.xml.

If you want to get the robots.txt file without the Sitemap: ... entry, set the sitemap parameter to false.

astro.config.mjs

import robotsTxt from 'astro-robots-txt';

export default {
  site: 'https://example.com',
  integrations: [
    robotsTxt({
      sitemap: false,
    }),
  ],
};

When sitemap is a String or String[], each value must be a valid URL. Only the http and https protocols are allowed.

astro.config.mjs

import robotsTxt from 'astro-robots-txt';

export default {
  site: 'https://example.com',
  integrations: [
    robotsTxt({
      sitemap: [
        'https://example.com/first-sitemap.xml',
        'http://another.com/second-sitemap.xml',
      ],
    }),
  ],
};
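The http/https rule above can be checked with the standard URL constructor. This is a minimal sketch with a hypothetical validateSitemapUrls function; the integration's own validation and error messages may differ:

```javascript
// Hypothetical validator for the documented rule: each sitemap entry must be
// a valid URL using the http or https protocol.
function validateSitemapUrls(sitemap) {
  const urls = Array.isArray(sitemap) ? sitemap : [sitemap];
  for (const value of urls) {
    let url;
    try {
      url = new URL(value); // throws a TypeError on malformed input
    } catch {
      throw new Error(`Invalid sitemap URL: ${value}`);
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') {
      throw new Error(`Sitemap URL must use http or https: ${value}`);
    }
  }
  return urls;
}

// Passes: both entries are absolute http(s) URLs.
validateSitemapUrls([
  'https://example.com/first-sitemap.xml',
  'http://another.com/second-sitemap.xml',
]);
```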

sitemapBaseFileName

| Type | Required | Default value |
| :----: | :------: | :-----------: |
| String | No | sitemap-index |

The sitemap file name without the .xml extension. It is used only when the sitemap parameter is true or omitted.

Note: the @astrojs/sitemap and astro-sitemap integrations produce sitemap-index.xml as their primary output. That is why the default value of sitemapBaseFileName is sitemap-index.

astro.config.mjs

import robotsTxt from 'astro-robots-txt';

export default {
  site: 'https://example.com',

  integrations: [
    robotsTxt({
      sitemapBaseFileName: 'custom-sitemap',
    }),
  ],
};
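To make the interplay between sitemap and sitemapBaseFileName concrete, here is a hypothetical resolveSitemapLines function. It is a sketch of the documented rules, not the integration's implementation: false drops the entry, true or an omitted value builds the URL from site and the base file name, and a string or array is used as given.

```javascript
// Hypothetical resolver combining the sitemap and sitemapBaseFileName options
// as documented above; the real integration may differ in details.
function resolveSitemapLines(site, sitemap = true, sitemapBaseFileName = 'sitemap-index') {
  if (sitemap === false) return []; // no Sitemap: entry at all
  const urls =
    sitemap === true
      ? [new URL(`${sitemapBaseFileName}.xml`, site).href] // base name only matters here
      : Array.isArray(sitemap)
        ? sitemap
        : [sitemap];
  return urls.map((url) => `Sitemap: ${url}`);
}

console.log(resolveSitemapLines('https://example.com', true, 'custom-sitemap'));
// [ 'Sitemap: https://example.com/custom-sitemap.xml' ]
```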

host

| Type | Required | Default value |
| :--------------: | :------: | :-----------: |
| Boolean / String | No | undefined |

Some crawlers (Yandex) support a Host directive, allowing websites with multiple mirrors to specify their preferred domain.

astro.config.mjs

import robotsTxt from 'astro-robots-txt';

export default {
  site: 'https://example.com',

  integrations: [
    robotsTxt({
      host: 'your-domain-name.com',
    }),
  ],
};

If the host option is set to true, the Host output will be automatically resolved using the site option from Astro config.
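The host rules can be sketched as follows. The resolveHostLine function is hypothetical, and treating false like undefined (no Host line) is an assumption, since the README only documents true and string values:

```javascript
// Hypothetical sketch of the host option: true derives the host from site,
// a string is used as-is, and anything else emits no Host line.
function resolveHostLine(site, host) {
  if (host === true) return `Host: ${new URL(site).host}`;
  if (typeof host === 'string') return `Host: ${host}`;
  return null;
}

console.log(resolveHostLine('https://example.com', true)); // Host: example.com
console.log(resolveHostLine('https://example.com', 'your-domain-name.com')); // Host: your-domain-name.com
```

URL.prototype.host includes a non-default port (for example, example.com:8080); hostname would drop it.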

transform

| Type | Required | Default value |
| :------------------------------------------------------------------: | :------: | :-----------: |
| (content: String): String or (content: String): Promise<String> | No | undefined |

Sync or async function called just before writing the text output to disk.

astro.config.mjs

import robotsTxt from 'astro-robots-txt';

export default {
  site: 'https://example.com',

  integrations: [
    robotsTxt({
      transform(content) {
        return `# Some comments before the main content.\n# Second line.\n\n${content}`;        
      },
    }),
  ],
};
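Because transform may be either synchronous or asynchronous, a caller can simply await its result: await works on plain values as well as Promises. The applyTransform helper below is a hypothetical sketch of that pattern, and its string-type check is an assumption rather than documented behavior:

```javascript
// Hypothetical sketch: accept either a sync or an async transform by awaiting
// the result, which works for plain strings and Promises alike.
async function applyTransform(content, transform) {
  if (typeof transform !== 'function') return content; // option omitted
  const result = await transform(content);
  if (typeof result !== 'string') {
    throw new TypeError('transform must return a string (or a Promise of one)');
  }
  return result;
}

// Both forms produce the same output.
const syncTransform = (content) => `# Generated\n${content}`;
const asyncTransform = async (content) => `# Generated\n${content}`;

applyTransform('User-agent: *\n', syncTransform).then(console.log);
applyTransform('User-agent: *\n', asyncTransform).then(console.log);
```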

policy

| Type | Required | Default value |
| :------: | :------: | :------------------------------: |
| Policy[] | No | [{ allow: '/', userAgent: '*' }] |

List of Policy rules

Type Policy

| Name | Type | Required | Description |
| :--------: | :---------------: | :------: | :---------- |
| userAgent | String | Yes | The name of the automatic client (search engine crawler). Wildcards are allowed. |
| disallow | String / String[] | No | Disallowed paths for crawling |
| allow | String / String[] | No | Allowed paths for crawling |
| crawlDelay | Number | No | Minimum interval (in seconds) the crawler should wait after loading one page before starting to load another |
| cleanParam | String / String[] | No | Indicates that the page's URL contains parameters that should be ignored during crawling. The maximum string length is 500 characters. |

astro.config.mjs

import robotsTxt from 'astro-robots-txt';

export default {
  site: 'https://example.com',

  integrations: [
    robotsTxt({
      policy: [
        {
          userAgent: 'Googlebot',
          allow: '/',
          disallow: ['/search'],
          crawlDelay: 2,
        },
        {
          userAgent: 'OtherBot',
          allow: ['/allow-for-all-bots', '/allow-only-for-other-bot'],
          disallow: ['/admin', '/login'],
          crawlDelay: 2,
        },
        {
          userAgent: '*',
          allow: '/',
          disallow: '/search',
          crawlDelay: 10,
          cleanParam: 'ref /articles/',
        },
      ],
    }),
  ],
};
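As a rough illustration of how the Policy fields map to robots.txt directives, here is a hypothetical renderer. The directive names (User-agent, Allow, Disallow, Crawl-delay, Clean-param) are widely used spellings, with Clean-param being a Yandex extension, but the line order and the blank-line separator between groups are assumptions about the generated output:

```javascript
// Hypothetical renderer for Policy objects; not the integration's code.
// Wraps a single value in an array and leaves undefined as an empty list.
const toArray = (value) => (value === undefined ? [] : [].concat(value));

function renderPolicy({ userAgent, allow, disallow, crawlDelay, cleanParam }) {
  if (!userAgent) throw new Error('userAgent is required');
  const lines = [`User-agent: ${userAgent}`];
  for (const path of toArray(allow)) lines.push(`Allow: ${path}`);
  for (const path of toArray(disallow)) lines.push(`Disallow: ${path}`);
  if (crawlDelay !== undefined) lines.push(`Crawl-delay: ${crawlDelay}`);
  for (const param of toArray(cleanParam)) {
    // The README limits each cleanParam string to 500 characters.
    if (param.length > 500) throw new Error('cleanParam exceeds 500 characters');
    lines.push(`Clean-param: ${param}`);
  }
  return lines.join('\n');
}

// Assumed layout: one group per policy, separated by a blank line.
const renderPolicies = (policies) => policies.map(renderPolicy).join('\n\n');

console.log(
  renderPolicies([
    { userAgent: 'Googlebot', allow: '/', disallow: ['/search'], crawlDelay: 2 },
    { userAgent: '*', allow: '/', disallow: '/search', crawlDelay: 10, cleanParam: 'ref /articles/' },
  ]),
);
```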

Examples

| Example | Source | Playground |
| -------- | ------ | ----------- |
| basic | GitHub | Play Online |
| advanced | GitHub | Play Online |

Contributing

You're welcome to submit an issue or PR!

Changelog

See CHANGELOG.md for a history of changes to this integration.

Inspirations