npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

article-metadata-parser

v1.2.2

Published

A JavaScript library for parsing metadata in a Web Page.

Downloads

197

Readme

Page Metadata Parser

A Javascript library for parsing metadata in web pages.

Coverage Status

Overview

Purpose

The purpose of this library is to be able to find a consistent set of metadata for any given web page. Each individual kind of metadata has many rules which define how it may be located. For example, a description of a page could be found in any of the following DOM elements:

<meta name="description" content="A page's description"/>

<meta property="og:description" content="A page's description" />

Because different web pages represent their metadata in any number of possible DOM elements, the Page Metadata Parser collects rules for different ways a given kind of metadata may be represented and abstracts them away from the caller.

The output of the metadata parser for the above example would be

{description: "A page's description"}

regardless of which particular kind of description tag was used.

Supported schemas

This library employs parsers for the following formats:

opengraph

twitter

meta tags

Requirements

This library is meant to be used either in the browser (embedded directly in a website or into a browser addon/extension) or on a server (node.js).

The parser depends only on the Node URL library or the Browser URL library.

Each function expects to be passed a Document object, which may be created either directly by a browser or on the server using a Document compatible object, such as that provided by domino.

Usage

Installation

npm install --save page-metadata-parser

Usage in the browser

The library can be built to be deployed directly to a modern browser by using

npm run bundle

and embedding the resultant js file directly into a page like so:

<script src="page-metadata-parser.bundle.js" type="text/javascript" />

<script>

  const metadata = metadataparser.getMetadata(window.document, window.location);

  console.log("The page's title is ", metadata.title);

</script>

Usage in node

To use the library in node, you must first construct a DOM API compatible object from an HTML string, for example:

import { getMetadata } from 'page-metadata-parser';
import jsdom from 'jsdom';

const url = 'https://github.com/mozilla/page-metadata-parser';
const response = await fetch(url);
const html = await response.text();
const dom = new jsdom.JSDOM(html);
const doc = dom.window.document;
const metadata = getMetadata(doc, url);

Metadata Rules

Rules

A single rule instructs the parser on a possible DOM node to locate a specific piece of content.

For instance, a rule to parse the title of a page found in a DOM tag like this:

<meta property="og:title" content="Page Title" />

Would be represented with the following rule:

['meta[property="og:title"]', element => element.getAttribute('content')]

A rule consists of two parts, a query selector compatible string which is used to look up the target content, and a callable which receives an element and returns the desired content from that element.

Many rules together form a Rule Set. This library will apply each rule to a page and choose the 'best' result. The order in which rules are defined indicate their preference, with the first rule being the most preferred. A Rule Set can be defined like so:

const titleRules = {
  rules: [
    ['meta[property="og:title"]', node => node.element.getAttribute('content')],
    ['title', node => node.element.text],
  ]
};

In this case, the OpenGraph title will be preferred over the title tag.

This library includes many rules for a single desired piece of metadata which should allow it to consistently find metadata across many types of pages. This library is meant to be a community driven effort, and so if there is no rule to find a piece of information from a particular website, contributors are encouraged to add new rules!

Built-in Rule Sets

This library provides rule sets to find the following forms of metadata in a page:

Field | Description --- | --- description | A user displayable description for the page. icon | A URL which contains an icon for the page. image | A URL which contains a preview image for the page. keywords | The meta keywords for the page. provider | A string representation of the sub and primary domains. title | A user displayable title for the page. type | The type of content as defined by opengraph. url | A canonical URL for the page. publishedDate | The published date of the page.

To use a single rule set to find a particular piece of metadata within a page, simply pass that rule set, a URL, and a Document object to getMetadata and it will apply each possible rule for that rule set until it finds a matching piece of information and return it.

Example:

const {getMetadata, metadataRuleSets} = require('page-metadata-parser');

const pageTitle = getMetadata(doc, url, {title: metadataRuleSets.title});

Extending a single rule

To add your own additional custom rule to an existing rule set, you can simply push it into that rule sets's array.

Example:

const {getMetadata, metadataRuleSets} = require('page-metadata-parser');

const customDescriptionRuleSet = metadataRuleSets.description;

customDescriptionRuleSet.rules.push([
  ['meta[name="customDescription"]', element => element.getAttribute('content')]
]);

const pageDescription = getMetadata(doc, url, {description: customDescriptionRuleSet});

Using all rules

To parse all of the available metadata on a page using all of the rule sets provided in this library, simply call getMetadata on the Document.

const {getMetadata, metadataRuleSets} = require('page-metadata-parser');

const pageMetadata = getMetadata(doc, url);