vgno-article-parser

v1.0.4

Parses article HTML from DrPublish/DrLib into JSON-representable entities

Parses the article markup (HTML) provided by DrPublish/DrLib and translates it into a JSON-serializable structure.

Installing

npm install --save vgno-article-parser

Usage

Parse a response from DrLib:

var parseArticle = require('vgno-article-parser');
var request = require('request');

request({
    url: 'http://drlib.url.no/articles/10131048.json',
    json: true
}, function(err, res, body) {
    if (err) {
        throw err;
    }

    var parsed = parseArticle(body);

    // Result is an object keyed by the same keys as within `contents.web` of the drlib response:
    // motto, title, leadAsset, preamble, story etc.
    console.log(parsed.story);
});

Parse a specific chunk of HTML:

var articleParser = require('vgno-article-parser');
var htmlString = '<div>some html string</div>';

var parsed = articleParser.parseHtml(htmlString);

// Result is a tree of nodes. The root is an array; each node can have a `children` property
console.log(parsed);

Testing / developing

git clone git@github.com:vg/vgno-article-parser.git && cd vgno-article-parser
npm install
npm test

Adding entities

Adding new entities is fairly simple:

  1. Add a file to src/entities which parses the node into a serializable format
    • Ensure that this.type is set, and that it is a unique, descriptive value
    • node (first argument) is a plain object that cheerio returns when parsing an HTML node
    • At this stage, the children are unparsed (this allows you to skip parsing unused nodes)
    • To traverse the children in a jQuery-like way, simply call cheerio(node) and use its API
    • Set properties on itself (this.someAttribute = parsedThing)
  2. Add a reference to the entity in src/entities/index.js
  3. If the new entity is an overlooked HTML-tag, add it to src/entity-factory.js under tags, otherwise you will have to provide a sniffer (see below).
  4. Write one or more tests to ensure that things work as expected and to guard against regressions over time.
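
As a rough illustration of step 1, an entity could look something like the sketch below. The README does not show an actual entity file, so everything here is an assumption: the `Quote` name, the `quote` type string and the `cite` property are made up, and the sample node only imitates the plain object shape (`type`, `name`, `attribs`, `children`) that cheerio's parser produces.

```javascript
'use strict';

// Hypothetical entity, not part of vgno-article-parser.
// In src/entities it would presumably be exported, e.g. `module.exports = Quote;`
function Quote(node) {
    // A unique, descriptive type, as the steps above require
    this.type = 'quote';

    // `node.attribs` holds the HTML attributes of the parsed element
    var attribs = node.attribs || {};

    // Set serializable properties on the entity itself
    this.cite = attribs.cite || null;
}

// Stand-in for a node as cheerio would hand it to the entity
var sampleNode = {
    type: 'tag',
    name: 'blockquote',
    attribs: { cite: 'http://some.url/' },
    children: []
};

var entity = new Quote(sampleNode);
console.log(JSON.stringify(entity));
```

Because the entity only copies plain values onto itself, the result can be passed straight to JSON.stringify.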

Sniffers

Sniffers are simple functions that detect whether a given node should be treated as a specific entity. If the node does not match your entity, return false. Otherwise, return the entity type that you want to assign to it. This allows a single sniffer to instantiate different entity types based on the node attributes.

Creating a sniffer is simple:

  1. Add a file to src/sniffers which exposes a single function. It takes a single argument (node) and should return false or an entity type, as described above.
  2. Add a reference to the sniffer in src/sniffers/index.js. Note that the order of the sniffers matters here. Think of it like a switch statement: once a sniffer near the top of the list returns an entity type, the sniffers below it never get to look at the node, even if one of them would have been a better match.
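
A sniffer along these lines might look like the sketch below. It is only an illustration under assumptions: the `dp-fact-box` and `dp-wide` class names and the `fact-box` / `fact-box-wide` entity types are hypothetical, and `firstMatch` is a made-up stand-in that models the "first sniffer to return a type wins" behaviour described above, not the library's actual dispatch code.

```javascript
'use strict';

// Hypothetical sniffer: returns false when the node is not a match,
// otherwise the entity type to assign to it.
function sniffFactBox(node) {
    var className = (node.attribs && node.attribs.class) || '';
    var classes = className.split(/\s+/);

    if (classes.indexOf('dp-fact-box') === -1) {
        return false; // not ours, let the next sniffer take a look
    }

    // One sniffer can pick between several entity types
    return classes.indexOf('dp-wide') !== -1 ? 'fact-box-wide' : 'fact-box';
}

// Illustrative model of why order matters: the first non-false result
// is used and the remaining sniffers are never consulted.
function firstMatch(node, sniffers) {
    for (var i = 0; i < sniffers.length; i++) {
        var type = sniffers[i](node);
        if (type !== false) {
            return type;
        }
    }
    return false;
}

var node = { type: 'tag', name: 'div', attribs: { class: 'dp-plugin-element dp-fact-box' } };
console.log(firstMatch(node, [sniffFactBox]));
```

Placing a very general sniffer ahead of `sniffFactBox` in the list would shadow it, which is the ordering pitfall the note above warns about.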

Example

Given the following HTML:

<div>
    <h2>Chapter one: The fury of the seas</h2>
    <p>It was a cold day, according to <a href="http://espen.codes/">The Hooverdam</a>. Then again, he always complained about being cold. Make no mistake, however; the sea was angry that day.</p>
    <p>
        After only a few hours, the hull was <em>riddled</em> with <strong>holes</strong>.<br />
        GoodFire didn't mind, of course. Being a crab, he was used to the sea.

        <div id="dp-article-image229" class="dp-plugin-element dp-article-image dp-plugin-src-images dp-float-none ">
            <div class="dp-article-image-container">
                <div>
                    <img id="dp-image2135218-22993469" src="http://some.url/image.jpg" width="988" height="621" alt="" />
                    <div class="dp-article-image-title">Such title</div>
                    <div class="dp-article-image-description">Some description</div>
                    <div class="dp-article-image-byline">Foto: Whatever</div>
                </div>
            </div>
        </div>

        “I bet we'll hit the rocks before nightfall”, shouted <abbr title="Espen Volden">The Riddler</abbr>. He turned around just in time to see the monumental arms of the Kraken tear the battered ship in two.
    </p>

    <h3>A new beginning</h3>
    <p>The Hooverdam, dazed and confused, found himself throwing up water on a beach...</p>
</div>

This is the excerpts of a fantastic, unwritten, imaginary book:
"The Adventures of Crabman, The Riddler and The Hooverdam"

When run through the parser and JSON-encoded, the result looks like the following:

[{
    "type": "block",
    "attributes": {},
    "children": [{
        "type": "heading",
        "level": 2,
        "attributes": {},
        "children": [{
            "type": "text",
            "content": "Chapter one: The fury of the seas"
        }]
    }, {
        "type": "paragraph",
        "attributes": {},
        "children": [{
            "type": "text",
            "content": "It was a cold day, according to"
        }, {
            "type": "link",
            "attributes": {},
            "to": "http://espen.codes/",
            "children": [{
                "type": "text",
                "content": "The Hooverdam"
            }]
        }, {
            "type": "text",
            "content": ". Then again, he always complained about being cold. Make no mistake, however; the sea was angry that day."
        }]
    }, {
        "type": "paragraph",
        "attributes": {},
        "children": [{
            "type": "text",
            "content": "After only a few hours, the hull was"
        }, {
            "type": "emphasis",
            "attributes": {},
            "children": [{
                "type": "text",
                "content": "riddled"
            }]
        }, {
            "type": "text",
            "content": "with"
        }, {
            "type": "strong",
            "attributes": {},
            "children": [{
                "type": "text",
                "content": "holes"
            }]
        }, {
            "type": "text",
            "content": "."
        }, {
            "type": "linebreak"
        }, {
            "type": "text",
            "content": "GoodFire didn't mind, of course. Being a crab, he was used to the sea."
        }, {
            "type": "article-image",
            "url": "http://some.url/image.jpg",
            "title": "Such title",
            "description": "Some description",
            "byline": "Foto: Whatever"
        }, {
            "type": "text",
            "content": "“I bet we'll hit the rocks before nightfall”, shouted"
        }, {
            "type": "abbreviation",
            "attributes": {},
            "title": "Espen Volden"
        }, {
            "type": "text",
            "content": ". He turned around just in time to see the monumental arms of the Kraken tear the battered ship in two."
        }]
    }, {
        "type": "heading",
        "level": 3,
        "attributes": {},
        "children": [{
            "type": "text",
            "content": "A new beginning"
        }]
    }, {
        "type": "paragraph",
        "attributes": {},
        "children": [{
            "type": "text",
            "content": "The Hooverdam, dazed and confused, found himself throwing up water on a beach..."
        }]
    }]
}, {
    "type": "text",
    "content": "This is the excerpts of a fantastic, unwritten, imaginary book:\n\"The Adventures of Crabman, The Riddler and The Hooverdam\""
}]
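
Since the result is a plain tree, it can be walked with ordinary recursion. The sketch below collects all nodes of one type; `findByType` is a hypothetical helper that is not part of vgno-article-parser, and the sample tree is a trimmed-down copy of the output above.

```javascript
'use strict';

// Hypothetical helper: collects every node of the given type, depth first.
function findByType(nodes, type) {
    var found = [];
    nodes.forEach(function (node) {
        if (node.type === type) {
            found.push(node);
        }
        // Not every node has children (text and linebreak nodes do not)
        if (Array.isArray(node.children)) {
            found = found.concat(findByType(node.children, type));
        }
    });
    return found;
}

// Trimmed-down copy of the parsed output shown above
var tree = [{
    type: 'block',
    attributes: {},
    children: [{
        type: 'paragraph',
        attributes: {},
        children: [
            { type: 'text', content: 'It was a cold day, according to' },
            {
                type: 'link',
                attributes: {},
                to: 'http://espen.codes/',
                children: [{ type: 'text', content: 'The Hooverdam' }]
            }
        ]
    }]
}];

// List the targets of all links in the article
var links = findByType(tree, 'link').map(function (node) {
    return node.to;
});
console.log(links);
```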

Credits

Created by Espen Hovlandsdal in my spare time. Be gentle and respectful when leaving feedback/issues, please ;-)