
node-sr-crawler

v1.0.0

Published

node.js crawler

Readme

Spider.js


Introduction

There are two concepts to understand here: 1. the request instance, and 2. the request scheduling store.

Request instance

Create a request instance

var req = new Request(config);

Send the request

var p = req.request();    // returns a Promise for the request

Configuration object (standard mode)

{
    url: 'http://www.xxx.com/',
    retry: 0,           // number of retry attempts; default 0
    retryTimeout: 0,    // timeout (ms) used to judge a request as failed; default 0
    pageMode: false,    // enable paged crawling mode; default false
    charSet: 'UTF-8',   // character encoding of the requested page; default UTF-8
}
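The retry and retryTimeout fields above suggest retry-with-timeout behavior. As a rough sketch of those semantics (an assumption about how the library behaves, not its actual implementation; requestWithRetry and withTimeout are hypothetical helpers): an attempt that rejects, or that exceeds retryTimeout milliseconds when it is non-zero, is retried up to retry more times.

```javascript
// Hypothetical sketch of the retry/retryTimeout semantics implied by the
// config fields above -- an assumption, not the library's actual code.
function withTimeout(promise, ms) {
    if (!ms) return promise; // retryTimeout: 0 disables the timeout check
    return Promise.race([
        promise,
        new Promise(function (resolve, reject) {
            setTimeout(function () { reject(new Error('timeout')); }, ms);
        })
    ]);
}

function requestWithRetry(attempt, retry, retryTimeout) {
    return withTimeout(attempt(), retryTimeout).catch(function (err) {
        if (retry > 0) return requestWithRetry(attempt, retry - 1, retryTimeout);
        throw err;
    });
}
```

With retry: 2, a request that fails twice and succeeds on the third attempt still resolves normally; only after all retries are exhausted does the Promise reject.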

When paged crawling mode is enabled, the url field of an individual request's config is ignored; the request URL is built by combining the url field of the store's config with the pageIndex of the request's config.

Request scheduling store

Create a store

var store = new Spider(config);

Queue a request

var p = store.queue(req);   // returns a Promise for the queued request

Configuration object (standard mode)

{
    sendRate: 2000,  // interval (ms) between sent requests; default 2000
    retry: 0,        // number of retry attempts for requests in this store; default 0
    retryTimeout: 0, // timeout (ms) used to judge a request as failed; default 0
    pageMode: false  // enable paged mode; default false
}

Paged mode

Paged mode must be enabled on both sides: a request runs in paged mode only when both the store and the request enable it. To use paged mode, the store must set pageMode; individual requests in the store can then opt in selectively.

Store config fields required for paged mode

{
    pageMode: true, 
    url: 'http://www.xxx.com/page=1',   // URL of page 1 of the paged crawl; the URL must contain a single page number
    pagePattern: /page=1/g, // a regular expression matching the (page=1) part above
}

Request config fields required for paged mode

{
    pageMode: true,
    pageIndex: 1,  // page number for this request; replaces the numeric part matched by pagePattern above
}
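Combining the two configs above, the final request URL is presumably produced by substituting the request's pageIndex into the part of the store's url matched by pagePattern. A minimal sketch of that substitution (an assumption about the library's behavior; buildPageUrl is a hypothetical helper, not part of the package's API):

```javascript
// Hypothetical sketch: replace the numeric part of the pagePattern match
// in the store's url with the request's pageIndex.
function buildPageUrl(storeUrl, pagePattern, pageIndex) {
    return storeUrl.replace(pagePattern, function (match) {
        return match.replace(/\d+/, String(pageIndex));
    });
}

var url = buildPageUrl('http://www.xxx.com/page=1', /page=1/g, 3);
// url is 'http://www.xxx.com/page=3'
```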

Basic usage

Import the module

var Spider = require('node-sr-crawler');

Create a spider instance

See the configuration sections above for how to write the config object.

var s1 = new Spider(config);

Queue a request at the tail of the request queue

var request = new Request(requestConfig);
var p = s1.queue(request);  // returns a Promise for the request

Listen for events

The request event

s1.on('request', function (request) {
    // the Request instance
});
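Tying the pieces above together, a paged crawl might look like the following sketch. This is illustrative only: it assumes the Request constructor is exposed on the Spider export (the README does not show where Request comes from), so adjust the require and the exports to the package's actual API.

```javascript
var Spider = require('node-sr-crawler'); // package name from this page
var Request = Spider.Request;            // assumption: adjust to the actual export

// Store with paged mode enabled (see the config sections above).
var store = new Spider({
    sendRate: 2000,
    pageMode: true,
    url: 'http://www.xxx.com/page=1',
    pagePattern: /page=1/g
});

store.on('request', function (request) {
    // fires with the Request instance as each request is sent
});

// Queue pages 1 through 3; each request opts into paged mode and
// supplies the page number that replaces the pagePattern match.
for (var i = 1; i <= 3; i++) {
    var p = store.queue(new Request({ pageMode: true, pageIndex: i }));
    p.then(function (res) {
        // handle one page's response here
    });
}
```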