link-checker

v1.4.2

CLI that tests the existence of linked pages and anchors.

Downloads: 3,407

Link Checker

A link checker for HTML pages that verifies href attributes, including the anchor in the target page.
The command line interface expects a directory on your local file system, which it scans.

Why did I write this tool?

I was using a nice CLI called html-proofer, but I needed a preprocessing step to get Javadoc and Scaladoc working because of their iframe setup. At some point that didn't scale anymore: checking the Scaladoc links with html-proofer took 5 minutes.

link-checker uses cheerio for parsing HTML, which in turn uses htmlparser2, the fastest HTML parser for Node.js. The same Scaladoc that took 5 minutes with html-proofer now takes 5 seconds with link-checker. URL transformation for iframes can also be turned on on the fly via --javadoc. In this mode, a link like /index.html#com.org.company.product.library.Main@init is checked against an HTML file at the path com/org/company/product/library/Main.html and the anchor init.
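To make the --javadoc mapping concrete, here is a minimal sketch of the transformation described above. The function name resolveJavadocDeepLink is hypothetical and this is not link-checker's actual implementation; it only assumes that the part of the fragment before "@" is a fully qualified class name and the part after it is the anchor.

```javascript
// Hypothetical helper, NOT link-checker's own code.
// Maps an iframe deep link such as
//   /index.html#com.org.company.product.library.Main@init
// to the HTML file and the anchor that would need to exist.
function resolveJavadocDeepLink(href) {
  const hashIndex = href.indexOf('#');
  if (hashIndex === -1 || hashIndex === href.length - 1) {
    return null; // no fragment, so there is nothing to transform
  }
  const fragment = href.slice(hashIndex + 1);

  // Before the first "@": fully qualified class name.
  // After the first "@": anchor inside that class page (may be absent).
  const atIndex = fragment.indexOf('@');
  const className = atIndex === -1 ? fragment : fragment.slice(0, atIndex);
  const anchor = atIndex === -1 ? null : fragment.slice(atIndex + 1);

  return {
    file: className.split('.').join('/') + '.html',
    anchor: anchor,
  };
}

const target = resolveJavadocDeepLink(
  '/index.html#com.org.company.product.library.Main@init'
);
console.log(target.file, target.anchor);
// prints: com/org/company/product/library/Main.html init
```

A checker would then verify that the file exists below the scanned directory and that it contains the anchor.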

FAQ

I need to check links on a website via http(s)

Use website-scraper to download all the pages to your file system, then run link-checker on the downloaded directory.

I've used the module with these options:

{
  urls: [urlToScrape],
  directory: outputDirectory,
  recursive: true,
  filenameGenerator: 'bySiteStructure',
  urlFilter: function(url) {
    // keep only URLs that contain the start URL
    return url.indexOf(urlToScrape) !== -1;
  }
}
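As a rough illustration of how these options behave, the sketch below wraps them in a small function and exercises the urlFilter on two sample URLs. The helper name buildScrapeOptions, the example URLs, and the output directory are assumptions made for this example; only the option names come from the snippet above.

```javascript
// Hypothetical helper that builds the options object shown above.
// buildScrapeOptions is NOT part of website-scraper.
function buildScrapeOptions(urlToScrape, outputDirectory) {
  return {
    urls: [urlToScrape],
    directory: outputDirectory,
    recursive: true,
    filenameGenerator: 'bySiteStructure',
    // Follow only URLs that contain the start URL, so the crawl stays on the site.
    urlFilter: function (url) {
      return url.indexOf(urlToScrape) !== -1;
    },
  };
}

const options = buildScrapeOptions('https://example.com/docs/', './site-copy');
console.log(options.urlFilter('https://example.com/docs/guide.html')); // true
console.log(options.urlFilter('https://other.example.org/page.html')); // false

// Assumption: the options are then passed to website-scraper, for example
//   scrape(options).then(function () { /* run link-checker on ./site-copy */ });
// How scrape is loaded (require or import) depends on the installed version.
```

Because the filter uses a substring match, a stricter prefix check such as url.startsWith(urlToScrape) may be preferable if foreign URLs could embed the start URL in a query string.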

Installation

NPM

You can install it via npm:

npm install -g link-checker

You can also install it without -g, but then you need to add the binary, located at node_modules/.bin/link-checker, to your $PATH.

Docker

https://hub.docker.com/r/timaschew/link-checker/

docker pull timaschew/link-checker

Usage

You need to pass exactly one path where to check links
Usage: link-checker path [options]

Options:
  --version             Show version number                            [boolean]
  --allow-hash-href     If `true`, ignores the `href` `#`              [boolean]
  --disable-external    disable checks HTTP links                      [boolean]
  --external-only       check HTTP links only                          [boolean]
  --file-ignore         RegExp to ignore files to scan                   [array]
  --url-ignore          RegExp to ignore URLs                            [array]
  --url-swap            RegExp for URLs which can be replaced on the fly [array]
  --limit-scope         forbid to follow URLs which are out of provided path,
                        like ../somewhere                              [boolean]
  --mkdocs              transforming URLS from foo/#bar to foo/index.html#bar
                                                                       [boolean]
  --javadoc             Enable special URL transforming which allows to check
                        iframe deeplinks for local javadoc and scaladoc[boolean]
  --javadoc-external    Domain or base URL to do URL transformation to check
                        iframe deeplinks                                 [array]
  --http-status-ignore  pass HTTP status code which will be ignore, by default
                        only 2xx are allowed                             [array]
  --json                print errors as JSON                           [boolean]
  --http-redirects      Amount of allowed HTTP redirects            [default: 0]
  --http-timeout        HTTP timeout in milliseconds             [default: 5000]
  --http-always-get     Use always HTTP GET requests, by default HEAD is used
                        for pages without any anchors                  [boolean]
  --warn-name-attr      show warning if name attribute instead of id was used
                        for an anchor                                  [boolean]
  --http-cache          Directory to store the non failing HTTP responses. If
                        none is specified responses won't be cached.    [string]
  --http-cache-max-age  Invalidate the cache after the given period. Allowed
                        values: https://www.npmjs.com/package/ms [default: "1w"]
  -h, --help            Show help                                      [boolean]

Examples:
  link-checker path/to/html/files  checks directory with HTMLfiles for broken
                                   links and anchors
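
The --mkdocs option above rewrites directory-style links from foo/#bar to foo/index.html#bar. A minimal sketch of such a rewrite follows. The function name toMkdocsTarget is hypothetical, and the rule that only paths ending in "/" are rewritten is an assumption, not link-checker's documented behavior.

```javascript
// Hypothetical helper, NOT link-checker's own code.
// Rewrites "foo/#bar" to "foo/index.html#bar".
function toMkdocsTarget(href) {
  const hashIndex = href.indexOf('#');
  const path = hashIndex === -1 ? href : href.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? '' : href.slice(hashIndex); // keeps the '#'

  // Assumption: only directory-style paths (ending in "/") are rewritten.
  if (!path.endsWith('/')) {
    return href;
  }
  return path + 'index.html' + fragment;
}

console.log(toMkdocsTarget('foo/#bar'));          // foo/index.html#bar
console.log(toMkdocsTarget('foo/page.html#bar')); // foo/page.html#bar
```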

linkcheckerrc configuration

The above configuration can, alternatively or in addition, be provided by a .linkcheckerrc in the project root:

{
    "allow-hash-href": true,
    "disable-external": true,
    ...
}

This format also lets you override these settings for URLs that match a regular expression:

{
    "overrides": {
        "https://www\\.google.com/#": {
            "allow-hash-href": true,
            "http-status-ignore": [403, 404]
        },
        "marketplace\\.visualstudio\\.com": {
            "http-always-get": true
        }
    }
}
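
One way to picture the override mechanism is the sketch below. It treats every key under "overrides" as a regular expression, tests it against the URL being checked, and layers matching settings on top of the base settings. The function name resolveSettings, the merge order, and the sample base setting are assumptions for illustration; link-checker's real resolution logic may differ.

```javascript
// Hypothetical resolver, NOT link-checker's own code.
function resolveSettings(config, url) {
  const { overrides = {}, ...base } = config;
  const settings = { ...base };
  for (const [pattern, override] of Object.entries(overrides)) {
    // Each key is interpreted as a regular expression and tested against the URL.
    if (new RegExp(pattern).test(url)) {
      Object.assign(settings, override); // assumption: later matches win
    }
  }
  return settings;
}

// Patterns modeled on the example above; the base setting is made up.
const config = {
  'disable-external': false,
  overrides: {
    'https://www\\.google\\.com/#': {
      'allow-hash-href': true,
      'http-status-ignore': [403, 404],
    },
    'marketplace\\.visualstudio\\.com': {
      'http-always-get': true,
    },
  },
};

const settings = resolveSettings(
  config,
  'https://marketplace.visualstudio.com/items?itemName=x'
);
console.log(settings['http-always-get']); // true
console.log(settings['allow-hash-href']); // undefined
```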