

x-ray


var Xray = require('@hmb-research/x-ray')
var x = Xray()

x('https://blog.ycombinator.com/', '.post', [
  {
    title: 'h1 a',
    link: '.article-title@href'
  }
])
  .paginate('.nav-previous a@href')
  .limit(3)
  .write('results.json')

Installation

npm install @hmb-research/x-ray

Features

  • Flexible schema: Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.

  • Composable: The API is entirely composable, giving you great flexibility in how you scrape each page.

  • Pagination support: Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't lose what you've already scraped.

  • Crawler support: Start on one page and move to the next easily. The flow is predictable, following a breadth-first crawl through each of the pages.

  • Responsible: X-ray has support for concurrency, throttles, delays, timeouts and limits to help you scrape any page responsibly (see the sketch after this list).

  • Pluggable drivers: Swap in different scrapers depending on your needs. Currently supports HTTP and PhantomJS drivers. In the future, I'd like to see a Tor driver for requesting pages through the Tor network.
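
As a rough sketch of how those "responsible" knobs combine on one instance (the URL and selectors here are placeholders, not part of the library):

var Xray = require('@hmb-research/x-ray')

var x = Xray()
  .concurrency(2)      // at most 2 requests in flight
  .throttle(2, '1s')   // no more than 2 requests per second
  .delay('1s', '5s')   // wait 1-5 seconds between requests
  .timeout(30000)      // give up on a request after 30 seconds

x('http://example.com/posts', '.post', [{ title: 'h2' }])
  .paginate('.next@href')
  .limit(10)           // stop after 10 pages
  .write('posts.json')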

Selector API

xray(url, selector)(fn)

Scrape the url for the given selector, returning an object in the callback fn. The selector takes an enhanced jQuery-like string that can also select on attributes. The syntax for selecting an attribute is selector@attribute. If you do not supply an attribute, the element's innerText is selected by default.

Here are a few examples:

  • Scrape a single tag
xray('http://google.com', 'title')(function(err, title) {
  console.log(title) // Google
})
  • Scrape a single class
xray('http://reddit.com', '.content')(fn)
  • Scrape an attribute
xray('http://techcrunch.com', 'img.logo@src')(fn)
  • Scrape innerHTML
xray('http://news.ycombinator.com', 'body@html')(fn)

xray(url, scope, selector)

You can also supply a scope to each selector. In jQuery, this would look something like this: $(scope).find(selector).
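
A minimal sketch with a placeholder URL and selectors:

// Scrape the first h2 inside the first .post element
x('http://example.com', '.post', 'h2')(function(err, heading) {
  console.log(heading)
})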

xray(html, scope, selector)

Instead of a url, you can also supply raw HTML and all the same semantics apply.

var html = '<body><h2>Pear</h2></body>'
x(html, 'body', 'h2')(function(err, header) {
  header // => Pear
})

API

xray.driver(driver)

Specify a driver to make requests through. Available drivers include:

  • request - A simple driver built around request. Use this to set headers, cookies or HTTP methods.
  • phantom - A high-level browser automation library. Use this to render pages, interact with elements, or handle content created dynamically with JavaScript (e.g. Ajax calls).
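
For example, to route requests through PhantomJS you can plug in a driver package. This is a sketch assuming the x-ray-phantom package, which exposes a driver factory:

var Xray = require('@hmb-research/x-ray')
var phantom = require('x-ray-phantom')

// Render each page in a headless PhantomJS browser before scraping it
var x = Xray().driver(phantom())

x('http://google.com', 'title')(function(err, title) {
  console.log(title) // Google
})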

xray.stream()

Returns a Readable Stream of the data. This makes it easy to build APIs around x-ray. Here's an example with Express:

var app = require('express')()
var x = require('@hmb-research/x-ray')()

app.get('/', function(req, res) {
  var stream = x('http://google.com', 'title').stream()
  stream.pipe(res)
})

xray.write([path])

Stream the results to a path.

If no path is provided, then the behavior is the same as .stream().
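
A quick sketch, assuming the no-path form returns the same readable stream as .stream():

// Stream results to a file
x('http://google.com', 'title').write('results.json')

// With no path, pipe the results wherever you like
x('http://google.com', 'title').write().pipe(process.stdout)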

xray.then(cb)

Constructs a Promise and invokes its then function with the callback cb. Be sure to invoke then() as the last step of the xray method chain, since the other methods are not promisified.

x('https://dribbble.com', 'li.group', [
  {
    title: '.dribbble-img strong',
    image: '.dribbble-img [data-src]@data-src'
  }
])
  .paginate('.next_page@href')
  .limit(3)
  .then(function(res) {
    console.log(res[0]) // prints first result
  })
  .catch(function(err) {
    console.log(err) // handle error in promise
  })

xray.paginate(selector)

Select the next page's URL with the given selector and continue scraping on that page.

xray.limit(n)

Limit the amount of pagination to n requests.

xray.abort(validator)

Abort pagination if validator function returns true. The validator function receives two arguments:

  • result: The scrape result object for the current page.
  • nextUrl: The URL of the next page to scrape.
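
A sketch with placeholder selectors, stopping when a page comes back empty or pagination is about to leave the section of interest:

var x = Xray()

x('http://example.com/page/1', '.post', [{ title: 'h2' }])
  .paginate('.next@href')
  .abort(function(result, nextUrl) {
    // stop when the current page produced no results
    // or the next URL points at the archive section
    return result.length === 0 || nextUrl.indexOf('/archive/') !== -1
  })
  .write('posts.json')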

xray.delay(from, [to])

Delay the next request between from and to milliseconds. If only from is specified, delay exactly from milliseconds.

var x = Xray().delay('1s', '10s')

xray.concurrency(n)

Set the request concurrency to n. Defaults to Infinity.

var x = Xray().concurrency(2)

xray.throttle(n, ms)

Throttle the requests to n requests per ms milliseconds.

var x = Xray().throttle(2, '1s')

xray.timeout(ms)

Specify a timeout of ms milliseconds for each request.

var x = Xray().timeout(30)

Collections

X-ray also has support for selecting collections of tags. While x('ul', 'li') will only select the first list item in an unordered list, x('ul', ['li']) will select all of them.

Additionally, X-ray supports "collections of collections" allowing you to smartly select all list items in all lists with a command like this: x(['ul'], ['li']).
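
A small sketch using an inline HTML string (see xray(html, scope, selector) above):

var html = '<ul><li>a</li><li>b</li></ul><ul><li>c</li></ul>'

// First item of the first list
x(html, 'ul', 'li')(function(err, item) {
  console.log(item) // 'a'
})

// All items of the first list
x(html, 'ul', ['li'])(function(err, items) {
  console.log(items) // ['a', 'b']
})

// All items across every list
x(html, ['ul'], ['li'])(function(err, items) {
  console.log(items)
})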

Selector Types

X-ray supports multiple selector types to handle different data extraction needs. Each type provides unique capabilities for structuring and extracting data.

String Selectors

The most common selector type. Uses CSS-like syntax with support for attribute extraction using @.

x('http://example.com', {
  title: 'h1',                    // Extract text content
  link: 'a@href',                 // Extract href attribute
  html: '.content@html',          // Extract innerHTML
  image: 'img.logo@src'           // Extract src attribute
})

Function Selectors

Use functions for advanced extraction logic or to compose nested x-ray instances.

x('http://example.com', {
  // Custom extraction logic
  custom: function($, callback) {
    const value = $('.selector').text().toUpperCase()
    callback(null, value)
  },

  // Nested x-ray instance (crawling)
  details: x('.link@href', {
    title: 'h1',
    description: 'p'
  })
})

Array Selectors

Extract multiple elements into an array.

// Array of strings
x('http://example.com', {
  links: ['a@href']  // Returns array of all href attributes
})

// Array of objects
x('http://example.com', '.items', [{
  title: 'h2',
  price: '.price',
  link: 'a@href'
}])

RegExp Selectors

Extract data using regular expressions with capture groups.

x('http://example.com', {
  price: /\$(\d+\.\d{2})/,        // Extracts "19.99" from "$19.99"
  email: /[\w.]+@[\w.]+\.\w+/,    // Extracts email address
  orderId: /Order #([A-Z0-9-]+)/  // Extracts order ID from text
})

RegExp selectors return the first capture group if present, otherwise the full match; if nothing matches, they return null.

Optional Fields (null/undefined)

Use null or undefined to define optional fields that always return null.

x('http://example.com', {
  title: 'h1',           // Required field
  subtitle: null,        // Optional field (always null)
  description: undefined // Optional field (always null)
})

This is useful for defining a consistent schema where some fields may not always be present.

Custom Type Handlers

Register custom type handlers for specialized extraction logic.

const x = Xray()

// Define a custom type
function PriceType(selector) {
  this.selector = selector
}

// Register the handler
x.type('price', function(value, $, scope, filters, callback) {
  const text = $(value.selector).text()
  const price = parseFloat(text.replace(/[^0-9.]/g, ''))
  callback(null, isNaN(price) ? null : price)
}, function(value) {
  return value instanceof PriceType
})

// Use the custom type
x('http://example.com', {
  price: new PriceType('.price')
})(function(err, result) {
  console.log(result.price) // Returns numeric price
})

Strict Mode

Enable strict type validation during development to catch selector type errors:

const x = Xray({ strict: true })

x('http://example.com', {
  title: 'h1',
  invalid: 123  // Throws TypeError in strict mode
})

TypeScript Support

X-ray now includes TypeScript definitions for type-safe scraping:

import XRay = require('@hmb-research/x-ray')

interface Article {
  title: string
  author: string
  price: string | null
}

const xray = XRay()
const result: Article[] = await xray('http://example.com', '.article', [{
  title: 'h2',
  author: '.author',
  price: /\$(\d+\.\d{2})/
}])

// result is typed as Article[]

Composition

X-ray becomes more powerful when you start composing instances together. Here are a few possibilities:

Crawling to another site

var Xray = require('@hmb-research/x-ray')
var x = Xray()

x('http://google.com', {
  main: 'title',
  image: x('#gbar a@href', 'title') // follow link to google images
})(function(err, obj) {
  /*
  {
    main: 'Google',
    image: 'Google Images'
  }
*/
})

Scoping a selection

var Xray = require('@hmb-research/x-ray')
var x = Xray()

x('http://mat.io', {
  title: 'title',
  items: x('.item', [
    {
      title: '.item-content h2',
      description: '.item-content section'
    }
  ])
})(function(err, obj) {
  /*
  {
    title: 'mat.io',
    items: [
      {
        title: 'The 100 Best Children\'s Books of All Time',
        description: 'Relive your childhood with TIME\'s list...'
      }
    ]
  }
*/
})

Filters

Filters can be specified when creating a new Xray instance. To apply filters to a value, append them to the selector using |.

var Xray = require('@hmb-research/x-ray')
var x = Xray({
  filters: {
    trim: function(value) {
      return typeof value === 'string' ? value.trim() : value
    },
    reverse: function(value) {
      return typeof value === 'string'
        ? value
            .split('')
            .reverse()
            .join('')
        : value
    },
    slice: function(value, start, end) {
      return typeof value === 'string' ? value.slice(start, end) : value
    }
  }
})

x('http://mat.io', {
  title: 'title | trim | reverse | slice:2,3'
})(function(err, obj) {
  /*
  {
    title: 'oi'
  }
*/
})

Examples

In the Wild

  • Levered Returns: Uses x-ray to pull together financial data from various unstructured sources around the web.

Resources

  • Video: https://egghead.io/lessons/node-js-intro-to-web-scraping-with-node-and-x-ray

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT