npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

shittyscrapper

v1.2.2

Published

a JS/TS text extractor

Downloads

3

Readme

shittyscrapper


A JS/TS data extractor for XML

The basic goal of this project is to replace complex and redundant XPATH queries by an XML pattern that will automatically fetch data where it is stored in the dom.

Install

Package is available as TS project at npmjs registry under shittyscrapper.

$ npm i -S shittyscrapper

Usage

1. Variable Naming

To declare a variable, the pattern ${variable.name} is used. The given key will be added to the result if found. Example:

${any.variable.name} => result = {any: {variable: {name: "the value"}}}

Variables ending with [] will be considered as arrays:

${any.array} => result = {any: {array: ["value1", "value2", ...]}}

2. Simple Data extraction

A basic usage of this lib would consist in writing the DOM fragments in which the data is and to fetch data.

Example Data:


<html>
<body>
<div class="test" id="test">
    <span>test</span>
    <span class="t">
        test2
    </span>
    <span class="t">
        test3
    </span>
</div>
<div class="test" id="test bis">
    <span>test bis</span>
    <span class="t">
        test2 bis
    </span>
    <span class="t">
        test3 bis
    </span>
</div>
<div class="people">
    <div>
        <span class="Name">John</span>
        <span class="LastName">Doe</span>
        <span>23</span>
        <div class="marks">
            <span>18</span>
            <span>19</span>
        </div>
    </div>
    <div>
        <span class="Name">Peter</span>
        <span>Parker</span>
        <span class="age">18</span>
        <div>
            <span>20</span>
            <span>19</span>
        </div>
    </div>
</div>
</body>
</html>

Let's extract the first span of the first div with class="test"

import SearchNode from "shittyscrapper";

// format for variable fetching is ${valid.variable.name}
const pattern = "<div class=\"test\"><span>${value1}</span></div>"
const searchNode: SearchNode = SearchNode.BuildSearchNode(pattern)
const data: string = "string of the data file";
const result: any = searchNode.MapData(data)

// result = {value1: "test"}

Now let's extract the first span with class="t" in the first div

import SearchNode from "shittyscrapper";

const pattern: string = "<div class=\"test\"><span class=\"t\">${value}</span></div>"
const searchNode = SearchNode.BuildSearchNode(pattern)
const data: string = "" // the data
const result = searchNode.MapData(data)

// result = {value: "test2"}

3. Complex Types Data extraction

The xml/html tags can be given two additional attributes in pattern.

  • The datatype attribute that can be either "repeatable", "entity" or "block"
  • The key attribute that has to be used alongwith datatype="entitity"

Note: setting datatype="block" is equivalent to setting datatype="entity repeatable"

The datatype Attribute

datatype can be used upon tags to give them different behaviours.

The "repeatable" value is used to say that the given element can be seen multiple times in the DOM and that it should map to the pattern with the key every time an Element in the DOM is matching. Therefore, the variable names in the element should end with []

Let's retrieve all the spans in div.test

import SearchNode from "shittyscrapper";

const pattern = "<div class=\"test\"><span>${spans[]}</span></div>"

const searchNode = SearchNode.BuildSearchNode(pattern)
// const data: string = "<html>...</html>"
const result = searchNode.MapData(data)
// result = {spans:[
// "test","test2",
// "test3", "test bis",
// "test2 bis", "test3 bis"
// ]}

The "block" value groups multiple variables under the same object. It has to be used alongwith the key attribute. The "block" value acts the same as "repeatable" but creates an additional object

Let's fetch Data for the users.

import SearchNode from "shittyscrapper";

const pattern = `<div class="people">
    <div datatype="block" key="people[]">
        <span>\${Name}</span>
        <span>\${LastName}</span>
        <span>\${Age}</span>
        <div class="marks">
            <span datatype="repeatable">\${Marks[]}</span>
        </div>
    </div>
</div>`

const searchNode = SearchNode.BuildSearchNode(pattern)
// const data: string = ...
const result = searchNode.MapData(data)
for (let person of result.people) {
    console.log(person)
}
// {Name: "John", LastName: "Doe", Age: "23", Marks: ["18", "19"]}
// {Name: "Peter", LastName: "Parker", Age: "18", Marks: ["19", "20"]}

4. Data checking

Example Data:


<html>
<body>
<p class="my-text" id="text-number-1">Some text with some data embedded</p>
<p class="my-text" id="text-number-2">Some other text</p>
<p class="my-text" id="text-number-3">Data 1</p>
<p class="my-text" id="text-number-4">Broken Data</p>
<p class="my-text" id="text-number-5">Data 2</p>
</body>
</html>

Variables can be embedded everywhere in the text

import SearchNode from "shittyscrapper";

const pattern = `<p class="my-text">Some text with \${data} embedded</p>`
const searchNode = SearchNode.BuildSearchNode(pattern)
// const data: string = ...
const result = searchNode.MapData(data)

// result = {data: "some data"}
// only first one is matched since the others did not match "Some text with ..."

Variables can be embedded everywhere in the attributes also:

import SearchNode from "./index";

const pattern = `<p class="my-text" id="text-number-\${index}" datatype="block" key="paragraphs[]">\${data}</p>`
const searchNode = SearchNode.BuildSearchNode(pattern)
// const data: string = ...
const result = searchNode.MapData(data)

for (const parag of result.paragraphs) {
    console.log(parag)
}

// {index: "1", data: "Some text with some data embedded"}
// ...
// {index: "5", data: "Data 2"}

Some Regex variables can be added to verify some data.

To add a regex checking, add ${/^any\sregex$/}

It can be added both before a variable to match a variable or anywhere in the pattern to validate the given data


import SearchNode from "./index";

const pattern = `<p class="my-text" id="text-number-\${/\d$/}\${index}" datatype="block" key="paragraphs[]">\${data}</p>`
const searchNode = SearchNode.BuildSearchNode(pattern)
// const data: string = ...
const result = searchNode.MapData(data)

for (const parag of result.paragraphs) {
    console.log(parag)
}

// {index: "3", data: "Data 1"}
// {index: "5", data: "Data 2"}

5. Known Limitations

  • The pattern is supposed to be a XML element.
  • It is not easy and defeating the purpose of this lib to pick a specific element given its position in the parent