

© 2024 – Pkg Stats / Ryan Hefner

potent

v1.1.1

Published

XPath rule generalizer - easily learn dynamic matching patterns in XML documents

Downloads

16

Readme

Potent - The XPath Rule Generalizer



This package builds on Potent Tools to solve a single problem: given a set of XPath expressions, find the most specific expression that matches everything the original set would have matched. This lets you learn a generalized pattern, e.g., for scraping data. In a simple use case, a user selects some example nodes, you run potent.simplify(thoseNodes), and the resulting expression should capture all of the 'similar content' on the page.
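To make the idea concrete, here is a toy generalizer for simple absolute paths. It is a minimal sketch, not potent's algorithm: the helper names (parseSteps, generalizePair) are hypothetical, only `name` and `name[n]` steps are handled, and it simply loosens any step where the inputs disagree rather than searching for the most specific rule.

```javascript
// Hypothetical helpers, not potent's API. Only simple absolute paths made of
// `name` or `name[n]` steps are supported, e.g. /html/body/div[2]/a.
function parseSteps(path) {
  return path.split('/').filter(Boolean).map((step) => {
    const m = /^([\w-]+|\*)(?:\[(\d+)\])?$/.exec(step);
    if (!m) throw new Error(`unsupported step: ${step}`);
    return { name: m[1], index: m[2] ? Number(m[2]) : null };
  });
}

// Merge two paths step by step: keep what they agree on, loosen what differs.
function generalizePair(pathA, pathB) {
  const a = parseSteps(pathA);
  const b = parseSteps(pathB);
  if (a.length !== b.length) {
    throw new Error('paths of different depth are out of scope for this sketch');
  }
  const merged = a.map((step, i) => {
    const other = b[i];
    const name = step.name === other.name ? step.name : '*';      // differing tags -> wildcard
    const index = step.index === other.index ? step.index : null; // differing positions -> no index
    return index === null ? name : `${name}[${index}]`;
  });
  return '/' + merged.join('/');
}

console.log(generalizePair('/html/body/div[1]/a', '/html/body/div[2]/a')); // /html/body/div/a

// Several picks can be folded together one at a time.
const picks = ['/html/body/div[1]/a', '/html/body/div[3]/a', '/html/body/div[7]/a'];
console.log(picks.reduce(generalizePair)); // /html/body/div/a
```

Because the output is itself a parseable path, the pairwise merge can be folded over any number of examples, which mirrors the "generalize as you go" loop in the usage section.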

Installation

yarn add potent

Usage

A common use case: find elements and generalize the selector as you go:

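const potent = require('potent');

// firstElement: the first DOM element the user selected (hypothetical name).
let currentRule = potent.get(firstElement);

// selectedElements: however your UI collects further picks (hypothetical name).
for (const userElement of selectedElements) {
  const newElement = potent.get(userElement);
  // Fold each new example into the running rule.
  currentRule = potent.simplify([currentRule, newElement]);
}

// Use currentRule to find all matching elements in the document.
const allMatchingElements = potent.find(currentRule, document);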

API

XPathNodes result = potent.get(DOMElement a);

takes a DOM element and turns it into an object suitable for simplifying.
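Roughly speaking, .get has to record where an element sits in the document. The sketch below assumes a simplified node shape ({ tag, parent, children }, built by a hypothetical el() helper) instead of a real DOM so that it runs under plain Node; toSteps and stepsToString are illustrative names, not potent's API, and potent's actual XPathNodes object is not modeled here.

```javascript
// Hypothetical stand-in for a DOM element: { tag, parent, children }.
// A browser version would read tagName and parentElement instead.
function el(tag, children = []) {
  const node = { tag, parent: null, children };
  for (const child of children) child.parent = node;
  return node;
}

// Walk from the element up to the root, recording one { name, index } step
// per level; index is the 1-based position among same-tag siblings.
function toSteps(node) {
  const steps = [];
  for (let cur = node; cur !== null; cur = cur.parent) {
    const sameTag = cur.parent
      ? cur.parent.children.filter((c) => c.tag === cur.tag)
      : [cur];
    steps.unshift({ name: cur.tag, index: sameTag.indexOf(cur) + 1 });
  }
  return steps;
}

// Render the steps as a fully positional absolute XPath expression.
function stepsToString(steps) {
  return '/' + steps.map((s) => `${s.name}[${s.index}]`).join('/');
}

// Build <html><body><div/><div><a/></div></body></html>; el() wires up parent links.
const link = el('a');
el('html', [el('body', [el('div'), el('div', [link])])]);

console.log(stepsToString(toSteps(link))); // /html[1]/body[1]/div[2]/a[1]
```

A fully positional path like this is as specific as possible, which makes it a natural starting point for simplification: each merge can only loosen it.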

XPathNodes common = potent.simplify(XPathNodes a, XPathNodes b);

takes in two results from .get and produces the common query between them.

XPathNodes common;
common.toString();

produces the XPath expression string for common.

DOMElementList matching = potent.find(XPathNodes a, document);

produces a set of matching elements from your document for query a.
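As an illustration of what evaluating such a query involves, this sketch walks a simplified tree one step at a time. It assumes a hypothetical { tag, text, children } node shape and supports only absolute paths built from element names, `*`, and 1-based `[n]` positions; evaluatePath is an illustrative name, not potent.find.

```javascript
// Hypothetical node shape { tag, text, children }; not potent's data model.
const node = (tag, children = [], text = '') => ({ tag, text, children });

// Evaluate a simple absolute path such as /ul/li/span, /ul/*/span or
// /ul/li[2]/span, returning matching nodes in document order.
function evaluatePath(root, path) {
  const steps = path.split('/').filter(Boolean).map((step) => {
    const m = /^([\w-]+|\*)(?:\[(\d+)\])?$/.exec(step);
    if (!m) throw new Error(`unsupported step: ${step}`);
    return { name: m[1], index: m[2] ? Number(m[2]) : null };
  });
  if (steps.length === 0) return [];
  const nameMatches = (n, step) => step.name === '*' || n.tag === step.name;

  // The first step must match the root itself (which is always at position 1).
  const [first, ...rest] = steps;
  let current = nameMatches(root, first) && (first.index === null || first.index === 1)
    ? [root]
    : [];

  for (const step of rest) {
    const next = [];
    for (const parent of current) {
      const candidates = parent.children.filter((c) => nameMatches(c, step));
      // As in XPath, [n] counts only the children that passed the name test.
      if (step.index === null) next.push(...candidates);
      else if (step.index >= 1 && step.index <= candidates.length) next.push(candidates[step.index - 1]);
    }
    current = next;
  }
  return current;
}

const page = node('ul', [
  node('li', [node('span', [], 'Alpha')]),
  node('li', [node('span', [], 'Beta')]),
]);

console.log(evaluatePath(page, '/ul/li/span').map((n) => n.text));   // [ 'Alpha', 'Beta' ]
console.log(evaluatePath(page, '/ul/*[2]/span').map((n) => n.text)); // [ 'Beta' ]
```

In a browser you would not write this by hand: the standard document.evaluate API evaluates full XPath against a real DOM.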

FAQ

Where did this come from? This was the backend to a commercial project I embarked on in 2012 to provide a GUI scraping tool for academic researchers to collect data from government websites and other mostly structured online data sources. The business was abandoned, and the code instead became a set of tools I used for my own research and consulting work. The current version is essentially a polished-up version of that original code.

What about grouping rules? A common problem in this form of scraping occurs when you want to collect tuples of data. If every field is always present, simple index matching works. When fields are sporadically missing, it does not. One way around this is to match in stages: for example, first match a collection of outer elements, then iterate over them and match the required inner elements within each one.
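The staged approach can be sketched without potent at all. This example assumes a hypothetical { tag, cls, text, children } node shape; the collect, indexMatch and stagedMatch helpers are illustrative. When the second record lacks a price, page-wide index matching pairs 'Beta' with Gamma's price, while matching inside each record first keeps every tuple aligned.

```javascript
// Hypothetical node shape { tag, cls, text, children }; stands in for a DOM.
const page = {
  tag: 'ul',
  children: [
    { tag: 'li', children: [
      { tag: 'span', cls: 'title', text: 'Alpha' },
      { tag: 'span', cls: 'price', text: '10' },
    ] },
    { tag: 'li', children: [
      { tag: 'span', cls: 'title', text: 'Beta' }, // no price on this record
    ] },
    { tag: 'li', children: [
      { tag: 'span', cls: 'title', text: 'Gamma' },
      { tag: 'span', cls: 'price', text: '30' },
    ] },
  ],
};

// Depth-first collection of every descendant of `node` that satisfies `pred`.
function collect(node, pred, out = []) {
  for (const child of node.children || []) {
    if (pred(child)) out.push(child);
    collect(child, pred, out);
  }
  return out;
}

const hasClass = (cls) => (n) => n.cls === cls;
const textOf = (n) => (n ? n.text : null);

// Index matching: gather each field across the whole page, then pair by position.
function indexMatch(root) {
  const titles = collect(root, hasClass('title'));
  const prices = collect(root, hasClass('price'));
  return titles.map((t, i) => [t.text, textOf(prices[i])]);
}

// Staged matching: find the outer records first, then match fields inside each.
function stagedMatch(root) {
  return collect(root, (n) => n.tag === 'li').map((item) => [
    textOf(collect(item, hasClass('title'))[0]),
    textOf(collect(item, hasClass('price'))[0]),
  ]);
}

console.log(indexMatch(page));  // [ [ 'Alpha', '10' ], [ 'Beta', '30' ], [ 'Gamma', null ] ]
console.log(stagedMatch(page)); // [ [ 'Alpha', '10' ], [ 'Beta', null ], [ 'Gamma', '30' ] ]
```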

Is this problem well-defined? Not really! In many cases the right decision is subjective: for example, if we select /a/b/c and /a/b/d, we must decide whether to match /a/b/* or /a/b/(c|d). To support this, we calculate a configurable violation cost for each non-matching pair, with the goal of finding the minimum-cost set. Similar choices arise for attribute patterns. The decisions made here have been used in production by a small number of users for many years.
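A toy version of this trade-off for the /a/b/c versus /a/b/d case might look like the following. The weights, the two violation terms, and generalizeLastStep are assumptions made for illustration; the README does not document potent's actual cost function. Over-matching unselected siblings is charged per extra name, the union form is charged per additional alternative, and the cheaper option wins.

```javascript
// Hypothetical weights: the README does not document potent's actual costs.
const DEFAULT_COSTS = { overMatch: 2, perAlternative: 1 };

// Decide how to generalize sibling paths that differ only in their last step,
// e.g. /a/b/c and /a/b/d: a wildcard (/a/b/*) or an explicit union (/a/b/(c|d)).
//   selected:  last-step names the user actually picked
//   available: every child name that exists under the shared prefix
function generalizeLastStep(prefix, selected, available, costs = DEFAULT_COSTS) {
  const chosen = [...new Set(selected)];
  if (chosen.length === 1) return { expr: `${prefix}/${chosen[0]}`, cost: 0 };

  // Wildcard violation: one charge per unselected sibling name it would also match.
  const extra = [...new Set(available)].filter((name) => !chosen.includes(name)).length;
  const wildcardCost = costs.overMatch * extra;

  // Union violation: one charge per alternative beyond the first.
  const unionCost = costs.perAlternative * (chosen.length - 1);

  // Take the cheaper option; ties go to the shorter wildcard form.
  return wildcardCost <= unionCost
    ? { expr: `${prefix}/*`, cost: wildcardCost }
    : { expr: `${prefix}/(${chosen.join('|')})`, cost: unionCost };
}

console.log(generalizeLastStep('/a/b', ['c', 'd'], ['c', 'd']).expr);      // /a/b/*
console.log(generalizeLastStep('/a/b', ['c', 'd'], ['c', 'd', 'e']).expr); // /a/b/(c|d)
```

Raising overMatch makes the sketch prefer explicit unions and lowering it favors wildcards, which is the kind of knob a configurable cost provides.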

Is there a GUI? Not publicly, yet.

Can I use prewritten XPath queries? Yes, but only simple ones. For example, this is a valid usage of potent.simplify:

const potent = require('potent');
const rule = potent.simplify([
  "//div[@class='title']/a[@id='123']",
  "//div[@class='title']/a[@id='456']",
  "//div[@class='title']/a[@id='999']",
]).toString(); // -> '//div.title/a[@id]'
const allTitles = potent.find(rule, document); // -> ['Title 123', 'Title 124', ... 'Title 456', ..., 'Title 998', 'Title 999']

However, this relies on XPathNodes::fromString(), which is only partially implemented: simple paths, attribute predicates, and anything produced by .toString() should work.
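To show what handling "simple paths and attributes" can involve, here is a small parser and attribute generalizer for the subset used in the example above: `/` and `//` separators, element names, and [@attr='value'] predicates. Everything here (parse, generalizeAll, the null-prototype attribute maps) is a hypothetical sketch, not potent's XPathNodes::fromString(). It keeps an attribute value only when every input agrees, otherwise keeps just [@attr], and it emits standard XPath rather than the '//div.title' form shown in the README's result comment.

```javascript
// Hypothetical parser for a tiny XPath subset: '/' or '//' separators, element
// names, and [@attr='value'] predicates. Not potent's XPathNodes::fromString().
function parse(path) {
  const stepRe = /(\/\/?)([\w-]+|\*)((?:\[@[\w-]+='[^']*'\])*)/g;
  const steps = [];
  let pos = 0;
  let m;
  while ((m = stepRe.exec(path)) !== null) {
    if (m.index !== pos) break; // unsupported text between steps
    const attrs = Object.create(null); // no prototype, so `in` sees only real attributes
    for (const [, name, value] of m[3].matchAll(/\[@([\w-]+)='([^']*)'\]/g)) {
      attrs[name] = value;
    }
    steps.push({ axis: m[1], name: m[2], attrs });
    pos = stepRe.lastIndex;
  }
  if (pos !== path.length || steps.length === 0) {
    throw new Error(`unsupported expression: ${path}`);
  }
  return steps;
}

// Merge same-shaped expressions step by step. An attribute survives only if
// every input has it; its value survives only if every input agrees on it.
function generalizeAll(paths) {
  const parsed = paths.map(parse);
  const depth = parsed[0].length;
  if (!parsed.every((p) => p.length === depth)) {
    throw new Error('expressions of different depth are out of scope for this sketch');
  }
  return parsed[0].map((step, i) => {
    const column = parsed.map((p) => p[i]);
    const axis = column.every((s) => s.axis === step.axis) ? step.axis : '//';
    const name = column.every((s) => s.name === step.name) ? step.name : '*';
    const preds = Object.keys(step.attrs)
      .filter((attr) => column.every((s) => attr in s.attrs))
      .map((attr) => (column.every((s) => s.attrs[attr] === step.attrs[attr])
        ? `[@${attr}='${step.attrs[attr]}']`
        : `[@${attr}]`));
    return axis + name + preds.join('');
  }).join('');
}

console.log(generalizeAll([
  "//div[@class='title']/a[@id='123']",
  "//div[@class='title']/a[@id='456']",
  "//div[@class='title']/a[@id='999']",
])); // //div[@class='title']/a[@id]
```

Anything outside the subset, such as a positional [1] predicate, makes parse throw instead of silently misreading the expression.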

License

Dual-licensed under MIT or BSD-3-Clause. There is a significant dependency on BSD-3-Clause-licensed code, due to the use of old Firebug code from potent-tools.

Development

  • In general, readability is preferred over conciseness.
  • Please ensure all unit tests pass (yarn test).
  • Please ensure new code has sufficient coverage (yarn run coverage).
  • Please ensure code has been linted to meet the formatting standards (I use eslint-config-strawhouse and Prettier).