@swipefintech/scrapist

Modular framework for building and scaling web scraping workloads over CLI, HTTP & WebSockets.

Installation

To install this in your project, make sure Node.js is installed on your workstation, then run one of the commands below:

yarn add @swipefintech/scrapist

# or if using npm
npm install @swipefintech/scrapist --save

Usage

First, implement classes for your scraping jobs (commands).

Commands

Extend either the ScrapeUsingBrowserCommand or the ScrapeUsingHttpClientCommand class to create your jobs, as below.

import { IInput, IOutput, Status, ScrapeUsingHttpClientCommand, HttpClient } from '@swipefintech/scrapist'

export default class YourCommand extends ScrapeUsingHttpClientCommand {

  async handle (input: IInput, client: HttpClient): Promise<IOutput> {
    const { body } = await this.sendRequest(client, {
      // request options
    })
    return {
      data: body,
      status: Status.SUCCESS
    }
  }
}
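
ScrapeUsingBrowserCommand follows the same pattern. A hedged sketch is below; the exact handle() signature for browser-based commands is not shown in this readme, so the page argument is an assumption (check the package's typings for the real type):

import { IInput, IOutput, Status, ScrapeUsingBrowserCommand } from '@swipefintech/scrapist'

export default class YourBrowserCommand extends ScrapeUsingBrowserCommand {

  // assumption: handle() receives a browser/page handle, analogous to the
  // HttpClient passed to ScrapeUsingHttpClientCommand above
  async handle (input: IInput, page: any): Promise<IOutput> {
    // extract whatever you need from the rendered page here
    return {
      data: {},
      status: Status.SUCCESS
    }
  }
}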

You can persist session data (i.e., cookies) between commands automatically by using the @StoreCookies(<unique-key>) decorator. The key you specify in the decorator is the name of the key in your input whose value is used as a unique identifier to load/save data.

import { ScrapeUsingHttpClientCommand, StoreCookies } from '@swipefintech/scrapist'

@StoreCookies("accountId")
export default class YourCommand extends ScrapeUsingHttpClientCommand {
  // command implementation
}

You can also validate the data present in your input (powered by Joi) by overriding the rules() method in your command, as below.

import Joi, { PartialSchemaMap } from 'joi'
import { ScrapeUsingHttpClientCommand } from '@swipefintech/scrapist'

export default class YourCommand extends ScrapeUsingHttpClientCommand {

  rules (): PartialSchemaMap {
    return {
      email: Joi.string().email().required(),
      password: Joi.string().required(),
      ...super.rules() // make sure to keep this
    }
  }
}

For bigger projects, it is advisable to organise commands into modules, as below:

import { IEngine, IModule } from '@swipefintech/scrapist'
import YourCommandNo1 from './YourCommandNo1'
import YourCommandNo2 from './YourCommandNo2'

export default class YourModule implements IModule {

  register (engine: IEngine): void {
    engine.register('YourCommandNo1', new YourCommandNo1())
    engine.register('YourCommandNo2', new YourCommandNo2())
    // and so on
  }
}

Running

Now that you have defined your commands, you need to create an instance of the Engine class, register your commands (or mount modules), and handle the input.

import { Engine, IInput, IOutput } from '@swipefintech/scrapist'
import YourCommand1 from './YourCommand1'
import YourCommand2 from './YourCommand2'
import YourModule from './YourModule'

const engine = new Engine()

// either register commands
engine.register('YourCommand1', new YourCommand1())
engine.register('YourCommand2', new YourCommand2())

// or mount the module
engine.mount('YourModule', new YourModule())

const input: IInput = {
  command: 'YourCommand1', // or 'YourModule/YourCommand1' if using modules
  data: {
    username: '[email protected]',
    password: 'super_secret',
  },
  externalId: 'Premium-User-123', // if using @StoreCookies(...) decorator
}
engine.handle(input)
  .then((output: IOutput) => {
    // deal with output
  })
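
The resolved output carries the data and status fields shown in the command example above. A minimal sketch of inspecting them (assuming that statuses other than Status.SUCCESS indicate failure):

import { IOutput, Status } from '@swipefintech/scrapist'

engine.handle(input)
  .then((output: IOutput) => {
    if (output.status === Status.SUCCESS) {
      console.log('scraped data:', output.data) // the data returned by your command
    } else {
      console.error('command failed with status:', output.status)
    }
  })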

If you are using the @StoreCookies(<unique-key>) decorator, you also need to provide a Cache implementation (from cache-manager) when creating the Engine object, as below.

import path from 'path'
import { caching } from 'cache-manager'
import store from 'cache-manager-fs-hash'
import { Engine } from '@swipefintech/scrapist'

// create a file-system (or any other) cache store
const cache = caching({
  store,
  options: {
    path: path.join(__dirname, 'cache'),
    subdirs: true
  }
})

const engine = new Engine(cache)
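
For quick local testing, an in-memory store also works. A hedged sketch, assuming the same cache-manager version as in the file-system example above:

import { caching } from 'cache-manager'
import { Engine } from '@swipefintech/scrapist'

// in-memory cache: at most 100 entries, each expiring after 600 seconds
const cache = caching({ store: 'memory', max: 100, ttl: 600 })

const engine = new Engine(cache)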

Samples

This project also includes samples showing how to implement and use scrapist via CLI, HTTP (using Express), and WebSocket (using ws) front-ends.

Clone this repository and follow the instructions below to test the sample apps on your local workstation.

CLI

To run the command-line sample, run the command(s) below inside the cloned folder:

npm run start:cli -- \
  ExampleDotCom/GetHomePageLinkUsingBrowser \
  --session=Premium-User-123

npm run start:cli -- \
  ExampleDotCom/GetHomePageLinkUsingHttpClient \
  --referer=https://example.com/

HTTP or Web

To run the web (API) sample, run the command(s) below inside the cloned folder:

# start development server
npm run start:web

# run test commands (in another terminal)
curl http://localhost:3000/ExampleDotCom/GetHomePageLinkUsingBrowser \
  -H "Content-Type: application/json" \
  -d '{"session": "Premium-User-123"}'

curl http://localhost:3000/ExampleDotCom/GetHomePageLinkUsingHttpClient \
  -H "Content-Type: application/json" \
  -d '{"referer": "https://example.com/"}'

WebSockets

To run the WebSocket sample, run the command(s) below inside the cloned folder:

# start development server
npm run start:ws

# connect to web-socket server
npx wscat -c ws://localhost:3000/

# run test commands
{"command": "ExampleDotCom/GetHomePageLinkUsingBrowser", "session": "Premium-User-123"}

{"command": "ExampleDotCom/GetHomePageLinkUsingHttpClient", "referer": "https://example.com/"}

License

Please see the LICENSE file.