
scrapeblocks v0.2.0

Scraping automation framework based on Playwright

Downloads: 7



ScrapeBlocks

ScrapeBlocks is a layer on top of Playwright to make scraping automation easier.

You can define actions to run before scraping starts, and you can choose which scraping strategy to use.

Start with predefined actions and strategies in a matter of minutes. You can also write your own or use ones from the community.

Who is this for? 🤔

  • I just want to start scraping right now with as little effort as possible
  • I have a complicated scraping workflow that I want to simplify while still getting the same results
  • I like to tinker with scraping and build my custom workflows

With ScrapeBlocks, getting started with scraping takes just minutes.

You can use it with batteries included or as an extension to Playwright.

Whether you're a scraping hero or you just want to monitor the price of that one product without knowing much about scraping, ScrapeBlocks is here for you.

Features 🚀

  • Pre-scraping actions: perform actions before running a scraping strategy
    • Example use-case: you need to click something before your target becomes visible
  • Plug-n-play: write your own scraping strategies or use those from the community
    • Example use-cases: scrape for text of certain elements, get all the images, etc.
  • Fully customizable (or not): use it with batteries included or bring your own Playwright instances
  • Easy to start with: it's based on Playwright!
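Since strategies are pluggable, writing your own is mostly a matter of matching the expected shape. The sketch below is illustrative only: the `PageLike` and `ScrapingStrategy` interfaces and the `AttributeScraping` class are hypothetical names invented for this example, not ScrapeBlocks' actual API, and a stub page stands in for a live Playwright browser.

```typescript
// Minimal stand-in for Playwright's Page: only the one method this sketch needs.
interface PageLike {
  // Returns the value of `attr` for every element matching `selector`.
  getAttributes(selector: string, attr: string): Promise<string[]>;
}

// Assumed shape of a strategy: one `execute` method that receives the page
// and returns the scraped data.
interface ScrapingStrategy<T> {
  execute(page: PageLike): Promise<T>;
}

// A custom strategy that collects one attribute from all matching elements,
// e.g. every image URL on the page.
class AttributeScraping implements ScrapingStrategy<string[]> {
  constructor(private selector: string, private attribute: string) {}

  async execute(page: PageLike): Promise<string[]> {
    return page.getAttributes(this.selector, this.attribute);
  }
}

// Demo against a stub page instead of a real browser.
const stubPage: PageLike = {
  async getAttributes(selector, attr) {
    return selector === "img" && attr === "src"
      ? ["/logo.png", "/product.jpg"]
      : [];
  },
};

const images = await new AttributeScraping("img", "src").execute(stubPage);
console.log(images); // ["/logo.png", "/product.jpg"]
```

The same class, handed a real Playwright-backed page, would scrape a live site; only the page implementation changes.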

Actions included ⚡

Strategies included 🧙🏼

  • Scrape text element: retrieve the text within any element
  • Screenshot to map: returns a screenshot of the page along with a JSON object containing the coordinates and XPath/CSS selector for each element of your choice
  • (to be continued...)

Installation 🔧

Install ScrapeBlocks with npm

  npm install scrapeblocks

Usage 🧑🏼‍💻

Using built-in Playwright

Basic textContent strategy

import { Scraper, ScrapingStragegies } from "scrapeblocks";

const URL = "https://webscraper.io/test-sites/e-commerce/allinone";
const selector = "h4.price";

const strategy = new ScrapingStragegies.TextContentScraping(selector);
const result = await new Scraper(URL, strategy).run();

console.log(result);

Output:

['$233.99', '$603.99', '$295.99']

With actions

import { Scraper, ScrapingStragegies, Select } from "scrapeblocks";

const URL = "https://webscraper.io/test-sites/e-commerce/more/product/488";
const selectElement = "div.dropdown > select";
const optionToSelect = "Gold";
const selector = "div.caption > h4:nth-child(2)";

const strategy = new ScrapingStragegies.TextContentScraping(selector);
const selectAction = new Select({
	element: selectElement,
	value: optionToSelect,
});
const result = await new Scraper(URL, strategy, [selectAction]).run();

console.log(result);

Output:

Samsung Galaxy Gold

You can chain multiple actions by passing them as an array, in the order you want them executed.

Example:

const actions = [scrollAction, clickAction, typeAction];
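The ordering matters because each action must finish before the next starts. The sketch below shows how such a chain might be run sequentially; the `Action` and `PageLike` interfaces and the `runActions` helper are hypothetical, written for illustration rather than taken from ScrapeBlocks' internals.

```typescript
// Stand-in page that just records what happened, for illustration.
interface PageLike {
  log: string[];
}

// Assumed action shape: one async `execute` step against the page.
interface Action {
  execute(page: PageLike): Promise<void>;
}

const makeAction = (name: string): Action => ({
  async execute(page) {
    page.log.push(name);
  },
});

// Runs actions one at a time, awaiting each before the next starts,
// which is what "in the order you want them executed" implies.
async function runActions(page: PageLike, actions: Action[]): Promise<void> {
  for (const action of actions) {
    await action.execute(page);
  }
}

const page: PageLike = { log: [] };
const chain = [makeAction("scroll"), makeAction("click"), makeAction("type")];
await runActions(page, chain);
console.log(page.log); // ["scroll", "click", "type"]
```

An await-in-a-loop is deliberate here: running the actions concurrently (e.g. with `Promise.all`) would break any workflow where one action depends on the page state left by the previous one.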

Starting from version 0.1.0, you can also execute actions without providing any strategy.

The method will return instances of Playwright's Browser, BrowserContext, and Page.

Example:

const { browser, context, page } = await new Scraper<PlaywrightBlocks>(
  URL,
  undefined,
  [clickAction]
).run();

TODO ✅

  • Implement more strategies

  • Implement more actions

  • Increase test cases

  • Write more extensive documentation

Contributing 🤝🏼

Feel free to fork this repo and create a PR. I will review it and merge it if everything looks good. The TODOs above are a good place to start.

License 📝

MIT