# @floodlight/crawler
v0.0.10
floodlight enterprise crawler with continuation
## Crawler for web accessibility

This is a web accessibility crawler written in Node.js.
## Getting Started
To get started with the crawler, follow these steps:

- Install dependencies with `npm install`.
- Run the crawler for a domain with `DOMAIN=<domain> npm run crawl`. For example, `DOMAIN=deranged.dk npm run crawl` runs for `deranged.dk` and `www.deranged.dk`. `DOMAIN` defaults to `digst.dk`.
- You can also set the env var `LIMIT` to cap the number of pages crawled. E.g. `DOMAIN=kbhbilleder.dk LIMIT=10000 npm run crawl` limits the crawl to ten thousand pages that finish evaluation (skipped or failed pages do not count towards the total).
## Running on Ubuntu
When running the crawler locally on Ubuntu, you can get an error because AppArmor restricts Puppeteer's sandbox. You need to run the following script, where `CHROMIUM_BUILD_PATH` is the path to Puppeteer's bundled Chrome, for example `/home/user/.cache/puppeteer/chrome/linux-140.0.7339.80/chrome-linux64/chrome`. The correct path can be found in the AppArmor error message.
```sh
# Set this to the path from the AppArmor error message
# (Puppeteer's bundled Chrome), e.g.:
export CHROMIUM_BUILD_PATH=/home/user/.cache/puppeteer/chrome/linux-140.0.7339.80/chrome-linux64/chrome

sudo tee /etc/apparmor.d/chrome-dev-builds <<EOF
abi <abi/4.0>,
include <tunables/global>

profile chrome $CHROMIUM_BUILD_PATH flags=(unconfined) {
  userns,

  # Site-specific additions and overrides. See local/README for details.
  include if exists <local/chrome>
}
EOF

# Reload AppArmor profiles to include the new one
sudo service apparmor reload
```
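After reloading, you can check that the new profile was picked up. This assumes `aa-status` is available (it ships with the `apparmor-utils` package on Ubuntu):

```shell
# List loaded AppArmor profiles and look for the new chrome entry
sudo aa-status | grep chrome
```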