@opd/crawler
v1.7.0
web crawler based on Puppeteer
crawler
Web crawler based on Puppeteer
Install
npm install @opd/crawler
Use
import Crawler from '@opd/crawler'
// or commonjs
const Crawler = require('@opd/crawler').default
const crawler = new Crawler(options)
API
new Crawler(options)
Create a crawler instance.
options: crawler instance config
  parallel: maximum number of crawlers running in parallel; defaults to 5
  pageEvaluate: function evaluated on the current page (see Puppeteer's page.evaluate); extra arguments are not supported yet
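As an illustration, here is one possible options object. It assumes, without confirmation from this README, that pageEvaluate is handed to Puppeteer's page.evaluate with no arguments and that its return value becomes the result field of each PageResult; the selector and the title/links fields are arbitrary choices for the example.

```javascript
// Hypothetical options object for `new Crawler(options)`.
// Assumption: pageEvaluate is run by Puppeteer inside the page with no
// arguments, and whatever it returns is reported as PageResult.result.
const options = {
  // Run at most 3 crawlers in parallel (the documented default is 5).
  parallel: 3,
  // Runs in the browser: `document` is the crawled page's DOM, and the
  // returned value must be serializable to travel back to Node.js.
  pageEvaluate: () => ({
    title: document.title,
    links: Array.from(document.querySelectorAll('a[href]'), (a) => a.href),
  }),
};
```

Because Puppeteer serializes a page.evaluate function and executes it inside the browser, pageEvaluate cannot reference variables from the surrounding Node.js scope; everything it needs must come from the page itself.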
crawler.launch([options])
Launch the browser with puppeteer.launch, using the optional launch options.
crawler.queue(urls)
Add urls to the crawler queue.
Note: URLs are checked strictly; each URL must start with http or https (the pattern https?).
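When URLs come from user input or scraped links, it can help to filter them before calling crawler.queue. The helper below is hypothetical (filterCrawlableUrls is not part of @opd/crawler): it approximates the documented prefix rule and additionally rejects strings the WHATWG URL parser cannot parse, so it may be stricter than the library's own check.

```javascript
// Hypothetical pre-filter, not part of @opd/crawler. It approximates the
// documented rule (URL must start with http or https), additionally requires
// "://" and a parseable URL, and splits the input into accepted/rejected.
const HTTP_PREFIX = /^https?:\/\//;

function filterCrawlableUrls(urls) {
  const accepted = [];
  const rejected = [];
  for (const url of urls) {
    let ok = HTTP_PREFIX.test(url);
    if (ok) {
      try {
        new URL(url); // throws on malformed input such as "https://"
      } catch {
        ok = false;
      }
    }
    (ok ? accepted : rejected).push(url);
  }
  return { accepted, rejected };
}
```

Something like crawler.queue(filterCrawlableUrls(input).accepted) then queues only URLs that pass the prefix check. The regex is case-sensitive, so HTTPS://example.com is rejected, which follows a literal reading of https?.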
crawler.start([urls]): Promise<PageResult[]>
Start crawling the queued pages. If urls is provided, crawler.queue(urls) is called first.
const result = await crawler.start()
console.log(result)
// [
// {
// url, // page url
// result // crawled result
// }
// ]
Note: if you call start before launch, the browser is launched automatically, but without any extra launch options.
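The implicit launch described in that note can be pictured with the hypothetical sketch below. LazyCrawler, launchBrowser and visit are illustrative names, not the library's internals; the two functions are injected so the sketch runs without Puppeteer, and it ignores the parallel option by visiting pages one at a time.

```javascript
// Hypothetical sketch, not the library's source: start() launches the
// browser itself when launch() was never called. `launchBrowser` stands in
// for puppeteer.launch and `visit` for the per-page crawl step.
class LazyCrawler {
  constructor(launchBrowser, visit) {
    this.launchBrowser = launchBrowser;
    this.visit = visit;
    this.browser = null;
    this.pending = [];
  }

  async launch(options = {}) {
    this.browser = await this.launchBrowser(options);
    return this.browser;
  }

  queue(urls) {
    this.pending.push(...urls);
  }

  async start(urls) {
    if (urls) this.queue(urls); // queue first when urls are given
    if (!this.browser) {
      await this.launch(); // implicit launch: no extra launch options
    }
    const batch = this.pending.splice(0); // take and clear the queue
    const results = [];
    for (const url of batch) {
      // Sequential for simplicity; the sketch ignores `parallel`.
      results.push({ url, result: await this.visit(this.browser, url) });
    }
    return results;
  }
}
```

In this sketch, calling launch explicitly before start is the only way to pass launch options; the implicit path always uses an empty options object.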
