read-comfortably

v2.2.7

turns any web page into a clean view for reading

Downloads

226

This module is based on arc90's readability project.

Example

Before -> After

Install

$ npm install --save read-comfortably

Note that as of the 2.0.0 release, this module only works with Node.js >= 4.0. If you use an older Node.js version, you can still install a release in the 1.x series (npm install read-comfortably@1).

Usage

Promise read(html [, options])

Where

  • html is a URL or a string of HTML code.
  • options is an optional options object.
  • Promise is the return value; chain on it with read(..).then(..).
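
Because read returns a Promise, it can also be consumed with async/await (Node.js 7.6 or later). The sketch below is illustrative only: summarize is a hypothetical helper, and read is passed in as a parameter so the snippet runs without the package installed. In real use you would pass require('read-comfortably'). It touches only res.status and article.title, both described in this README.

```javascript
// Hypothetical helper: resolves with the HTTP status and the article title.
// `read` is injected; in real use pass require('read-comfortably').
async function summarize(read, url, options) {
  const { res, article } = await read(url, options);
  return { status: res.status, title: article.title };
}

// Stand-in for the real module, used only to keep this sketch self-contained.
const fakeRead = url =>
  Promise.resolve({ res: { status: 200 }, article: { title: 'Example for ' + url } });

summarize(fakeRead, 'http://example.com/').then(
  summary => console.log(summary),
  err => console.error(err)
);
```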

Example

const read = require('read-comfortably');
const fs = require('fs');
const start = new Date();
read('http://abduzeedo.com/einsteins-theory-general-relativity-turns-100-video').then(
  result => {
    const { res, article } = result;
    console.log('res:', res); // Response Object from fetchUrl Lib
    console.log('dom:', article.dom); // DOM
    console.log('title:', article.title); // Title
    console.log('desc:', article.getDesc(300)); // Description Article
    article.images.then(images => console.log('images:', images)); // Article's Images
    fs.writeFile('test/article.html', article.html, err => { // HTML Source Code
      if (err) return console.error('error:', err);
      console.log('article(%d) is saved!', article.html.length, new Date() - start);
    });
    fs.writeFile('test/content.html', article.content, err => { // Main Article
      if (err) return console.error('error:', err);
      console.log('content(%d) is saved!', article.content.length, new Date() - start);
    });
    const sources = [
      { selector: 'script[src]', attr: 'async', val: 'async' },
      { selector: 'link[rel="stylesheet"]', attr: 'href', tag: 'style' }
    ];
    article.getHtmls(sources).then(
      htmls => { // HTML Source Code by replace css files
        fs.writeFile('test/sources.html', htmls, err => {
          if (err) return console.error('error:', err);
          console.log('sources(%d) is saved!', htmls.length, new Date() - start);
        })
      }
    );
    article.iframes.then(
      iframes => { // Article's Iframes
        iframes.forEach((iframe, index) => {
          fs.writeFile('test/iframe/' + index + '.html', iframe.buf, err => {
            if (err) return console.error('error:', err);
            console.log('%s(%d) is saved!', iframe.url, index, new Date() - start);
          });
        });
      }
    );
  },
  err => console.error(err)
);
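
The example above writes into test/ and test/iframe/. fs.writeFile does not create missing directories, so those folders have to exist before the callbacks run. A minimal sketch, assuming Node.js 10.12 or later for the recursive option; a temporary directory stands in for test/ here.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create the output folders up front. { recursive: true } also creates
// parent folders and does not throw if they already exist.
const outDir = path.join(os.tmpdir(), 'read-comfortably-demo'); // stand-in for 'test'
const iframeDir = path.join(outDir, 'iframe');
fs.mkdirSync(iframeDir, { recursive: true });

console.log('ready:', fs.existsSync(iframeDir));
```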

Options

read-comfortably will pass the options to fetchUrl directly. See fetchUrl lib to view all available options.

read-comfortably has twelve additional options:

  • urlprocess is a function that checks or modifies the URL before it is passed to readability.

options.urlprocess = callback(url, options);

read(
  url,
  {
    urlprocess: (url, options) => {
      //...
    }
  }
);
  • preprocess is a function that checks or modifies the downloaded source before it is passed to readability.

options.preprocess = callback($, options);

read(
  url,
  {
    preprocess: ($, options) => {
      //...
    }
  }
);
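
As a sketch of what a preprocess callback might do, the function below strips unwanted elements before extraction. It assumes $ is a Cheerio-style selector function (the module is built on Cheerio) and that changes made in place are picked up, which this README does not spell out. The selectors and the stubbed $ are hypothetical and exist only so the snippet runs without the package.

```javascript
// Hypothetical preprocess callback: drop elements we never want scored.
const stripClutter = ($, options) => {
  ['.advert', '#comments', 'nav'].forEach(selector => $(selector).remove());
};

// Minimal stand-in for Cheerio's $ that records which selectors were removed.
const removed = [];
const fake$ = selector => ({ remove: () => removed.push(selector) });

stripClutter(fake$, {});
console.log(removed);

// Real use (not run here): read(url, { preprocess: stripClutter })
```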
  • postprocess is a function that checks or modifies the article content after it has been passed through readability.

options.postprocess = callback(node, $);

read(
  url,
  {
    postprocess: (node, $) => {
      //...
    }
  }
);
  • asyncprocess is a function that asynchronously checks or modifies the downloaded source before it is passed to readability. It returns a Promise.

options.asyncprocess = callback(url, options);

read(
  url,
  {
    asyncprocess: (url, options) => {
      return new Promise((resolve, reject) => {
        //...
        resolve(..);
      });
    }
  }
);
  • afterToRemove lets you set your own array of tags to remove after the grabArticle function has run.

options.afterToRemove = array; (default ['script', 'noscript'])

read(
  url,
  {
    afterToRemove: [
      'iframe',
      'script',
      'noscript'
    ]
  }
);
  • nodesToRemove lets you set your own array of tags to remove.

options.nodesToRemove = array;

read(
  url,
  {
    nodesToRemove: [
      'meta',
      'aside',
      'style',
      'object',
      'iframe',
      'script',
      'noscript'
    ]
  }
);
  • noChdToRemove lets you set your own array of tags to remove when they have no children.

options.noChdToRemove = array; (default ['div'])

read(
  url,
  {
    noChdToRemove: [
      'div',
      'li'
    ]
  }
);
  • considerDIVs, when true, turns every div that has no block-level child elements into a p.

options.considerDIVs = boolean; (default false)

read(
  url,
  {
    considerDIVs: true
  }
);
  • nodesToScore lets you set your own array of tags to score.

options.nodesToScore = array; (default ['p', 'article'])

read(
  url,
  {
    nodesToScore: ['p', 'pre']
  }
);
  • nodesToAppend lets you set your own array of tags to append.

options.nodesToAppend = array; (default ['p'])

read(
  url,
  {
    nodesToAppend: ['pre']
  }
);
  • maybeImgsAttr lets you set your own list of attributes that may hold an image URL.

options.maybeImgsAttr = array; (default ['src', 'href'])

read(
  url,
  {
    maybeImgsAttr: ['src', 'data-src']
  }
);
  • hostnameParse lets you map one hostname to another.

options.hostnameParse = object;

read(
  url,
  {
    hostnameParse: { 'www.google.com': 'www.google.com.hk' }
  }
);
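
To make the intent of hostnameParse concrete, here is a sketch of the kind of rewrite such a map describes. This is not the module's implementation. rewriteHostname is a hypothetical helper built on Node's WHATWG URL class (a global since Node.js 10).

```javascript
// Hypothetical helper: swap the hostname of `url` if the map has an entry for it.
function rewriteHostname(url, hostnameMap) {
  const parsed = new URL(url);
  if (Object.prototype.hasOwnProperty.call(hostnameMap, parsed.hostname)) {
    parsed.hostname = hostnameMap[parsed.hostname];
  }
  return parsed.href;
}

const hostnameMap = { 'www.google.com': 'www.google.com.hk' };
console.log(rewriteHostname('https://www.google.com/search?q=node', hostnameMap));
console.log(rewriteHostname('https://example.com/page', hostnameMap));
```

Hostnames without an entry in the map pass through unchanged.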

article object

If html is an image, article is a buffer.

Otherwise, article is an object with the following members:

content

The article content of the web page.

title

The article title of the web page. It may not be the same as the text in the <title> tag.

html

The original html of the web page.

dom

The document of the web page generated by jsdom. You can use it to access the DOM directly (for example, article.document.getElementById('main')).

getDesc(length)

The article description of the web page.

iframes

A Promise that resolves to the iframes found in the article content.

images

A Promise that resolves to the images found in the article content.

getHtmls(files)

Returns a Promise that resolves to the original HTML of the web page with the specified files replaced.

res object

status

HTTP status code

responseHeaders

response headers

finalUrl

last url value, useful with redirects

redirectCount

how many redirects happened

cookieJar

CookieJar object for sharing/retrieving cookies
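
The res fields above combine naturally when logging a fetch. A small sketch, where describeResponse is a hypothetical helper and the sample object only mimics the documented fields:

```javascript
// Hypothetical helper: one-line summary built from the documented res fields.
function describeResponse(res) {
  const hops = res.redirectCount > 0 ? ` after ${res.redirectCount} redirect(s)` : '';
  return `${res.status} ${res.finalUrl}${hops}`;
}

// Sample object shaped like res; the values are made up for illustration.
const sample = { status: 200, finalUrl: 'https://example.com/a', redirectCount: 2 };
console.log(describeResponse(sample));
```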

Why not JSDOM

Before starting this project I used jsdom, but the dependencies of that project plus the slowness of JSDOM made it very frustrating to work with. The compiling of contextify module (dependency of JSDOM) failed 9/10 times. And if you wanted to use it with node-webkit you had to manually rebuild contextify with nw-gyp, which is not the optimal solution.

So I decided to write my own version of Arc90's Readability using the fast Cheerio engine with the least number of dependencies.

The usage of this module is similar to JSDOM, so it's easy to switch.

The library uses the Cheerio engine because it can convert the url to UTF-8 automatically.

Contributors

https://gitlab.com/unrealce/read-comfortably

https://github.com/wzbg/read-comfortably

Related

  • cheerio - Tiny, fast, and elegant implementation of core jQuery designed specifically for the server.
  • fetch-promise - Fetch URL contents By Promise.
  • image-size - get dimensions of any image file.
  • is-image-url - Check if a url is an image.
  • is-pdf - Check if a Buffer/Uint8Array is a PDF file.
  • is-url - Check whether a string is a URL.
  • log4js - Port of Log4js to work with node.
  • string - string contains methods that aren't included in the vanilla JavaScript string such as escaping html, decoding html entities, stripping tags, etc.
  • url - The core url packaged standalone for use with Browserify.

License

The MIT License (MIT)

Copyright (c) 2015 - 2016