graby-ts

v1.1.0

TypeScript version of Graby content extraction library

Graby-TS

A TypeScript implementation of Graby, the PHP content extraction library.

Overview

Graby-TS extracts content from web pages using site-specific configurations from FiveFilters ftr-site-config and the Mozilla Readability algorithm. This library is designed to be platform-agnostic, working in Node.js and NativeScript environments, with theoretical support for browsers and React Native (though these haven't been tested yet).

Installation

npm install graby-ts

Usage

Node.js

import { NodeGraby } from 'graby-ts/node';

// Create a Graby instance for Node.js
const graby = new NodeGraby();

// Extract content from a URL
const result = await graby.extract('https://example.com/article');

console.log(result.title);       // Article title
console.log(result.html);        // Article HTML content
console.log(result.authors);     // Article authors
console.log(result.date);        // Publication date
console.log(result.image);       // Featured image URL

NativeScript

import { NativeScriptGraby } from 'graby-ts/nativescript';

// Create a Graby instance for NativeScript
const graby = new NativeScriptGraby();

// Extract content from a URL
const result = await graby.extract('https://example.com/article');

console.log(result.title);       // Article title
console.log(result.html);        // Article HTML content
// ... and other properties

NativeScript / React Native Configuration

When using Graby-TS with NativeScript or React Native, you need to add the following to your webpack.config.js:

webpack.chainWebpack((config) => {
  config.resolve.set('fallback', {
    stream: false,
    fs: false,
  });
});

This is required because chardet and iconv-lite include extended functionality (relying on the stream and fs modules) that Graby-TS doesn't use.

API Reference

Extraction Result Properties

| Property | Type | Description |
|----------|------|-------------|
| title | string | The extracted title of the article |
| html | string | The extracted HTML content of the article |
| authors | string[] | Array of author names extracted from the article |
| date | string \| null | Publication date in ISO format (if available) |
| language | string \| null | Detected language of the content (if available) |
| image | string \| null | URL of the featured image (if available) |
| nextPageUrl | string \| null | URL to the next page (for multi-page articles) |
| isNativeAd | boolean | Indicates if the content is a native advertisement |
| success | boolean | Whether the extraction was successful |
| originalUrl | string | The original URL that was processed |
| finalUrl | string | The final URL after following any redirects |
| status | number | HTTP status code of the response |
| detectedEncoding | string | The original character encoding of the content (before conversion to UTF-8) |
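The properties above can be written down as a TypeScript shape. The following is a minimal sketch, not the library's own type definitions: the interface name `ExtractionResult` and the helper `describeResult` are assumptions made for illustration, and only the field names and types come from the table.

```typescript
// Field names and types transcribed from the properties table.
// The name `ExtractionResult` is an assumption for this sketch.
interface ExtractionResult {
  title: string;
  html: string;
  authors: string[];
  date: string | null;
  language: string | null;
  image: string | null;
  nextPageUrl: string | null;
  isNativeAd: boolean;
  success: boolean;
  originalUrl: string;
  finalUrl: string;
  status: number;
  detectedEncoding: string;
}

// Hypothetical helper: builds a one-line summary and falls back to
// placeholder text when nullable or empty fields carry no value.
function describeResult(result: ExtractionResult): string {
  if (!result.success) {
    return `Extraction failed for ${result.originalUrl} (HTTP ${result.status})`;
  }
  const byline = result.authors.length > 0 ? result.authors.join(', ') : 'unknown author';
  const published = result.date ?? 'undated';
  return `${result.title} by ${byline} (${published})`;
}
```

Checking `success` before reading the other fields keeps callers from treating a failed extraction as an empty article.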

Configuration Options

When creating a Graby instance, you can provide configuration options:

const graby = new NodeGraby({
  httpClient: {
    userAgent: 'Custom User Agent',
    // other options...
  },
  // other settings...
});

Available Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| httpClient | object | See below | HTTP client configuration |
| httpClient.userAgent | string | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 | User agent string for requests |
| httpClient.referer | string | https://www.google.com/ | Referer header for requests |
| httpClient.maxRedirects | number | 10 | Maximum number of redirects to follow |
| httpClient.autoDetectEncoding | boolean | true | Enable automatic encoding detection |
| httpClient.forceEncoding | string \| null | null | Force a specific encoding (overrides detection) |
| extractor | object | See below | Extractor configuration |
| extractor.enableXss | boolean | true | Enable XSS protection for extracted content |
| silent | boolean | false | Suppress console messages |
| multipage | boolean | true | Enable multi-page article support |
| multipageLimit | number | 10 | Maximum number of pages to process for multi-page articles |
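To illustrate how partial options could combine with the defaults listed above, here is a hedged sketch. The type names, the `DEFAULT_OPTIONS` constant and the `resolveOptions` function are all hypothetical and are not taken from Graby-TS itself; only the option names and default values come from the table.

```typescript
// Option names and defaults transcribed from the table; all type and
// function names below are assumptions made for this sketch.
interface HttpClientOptions {
  userAgent: string;
  referer: string;
  maxRedirects: number;
  autoDetectEncoding: boolean;
  forceEncoding: string | null;
}

interface ExtractorOptions {
  enableXss: boolean;
}

interface GrabyOptions {
  httpClient: HttpClientOptions;
  extractor: ExtractorOptions;
  silent: boolean;
  multipage: boolean;
  multipageLimit: number;
}

interface GrabyOptionsInput {
  httpClient?: Partial<HttpClientOptions>;
  extractor?: Partial<ExtractorOptions>;
  silent?: boolean;
  multipage?: boolean;
  multipageLimit?: number;
}

const DEFAULT_OPTIONS: GrabyOptions = {
  httpClient: {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    referer: 'https://www.google.com/',
    maxRedirects: 10,
    autoDetectEncoding: true,
    forceEncoding: null,
  },
  extractor: { enableXss: true },
  silent: false,
  multipage: true,
  multipageLimit: 10,
};

// Merge nested groups one level deep so that overriding a single
// httpClient field does not discard the other httpClient defaults.
function resolveOptions(input: GrabyOptionsInput = {}): GrabyOptions {
  return {
    ...DEFAULT_OPTIONS,
    ...input,
    httpClient: { ...DEFAULT_OPTIONS.httpClient, ...input.httpClient },
    extractor: { ...DEFAULT_OPTIONS.extractor, ...input.extractor },
  };
}
```

Under these assumptions, `resolveOptions({ httpClient: { userAgent: 'Custom User Agent' } })` keeps `maxRedirects` at 10 while replacing only the user agent.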

Extracting from HTML

If you already have the HTML content, you can extract from it directly:

const graby = new NodeGraby();

// From string (UTF-8 or specified encoding)
const fromString = await graby.extractFromHtml(htmlContent, url);

// From binary data with automatic encoding detection
const fromBinary = await graby.extractFromHtml(binaryData, url);

// From binary data with specified encoding
const fromBinaryForced = await graby.extractFromHtml(binaryData, url, 'windows-1251');

Note: The URL is still required to resolve relative links in the HTML.
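As an illustration of why the page URL matters, the sketch below resolves relative links with the standard `URL` constructor. It does not show how Graby-TS resolves links internally; `resolveLink` is a hypothetical helper.

```typescript
// Hypothetical helper: resolves an href found in the HTML against the
// URL of the page it came from. Returns null when either input cannot
// be parsed as a URL.
function resolveLink(href: string, pageUrl: string): string | null {
  try {
    return new URL(href, pageUrl).href;
  } catch {
    return null;
  }
}

const pageUrl = 'https://example.com/blog/post';
const rootRelative = resolveLink('/images/cover.jpg', pageUrl);
const pathRelative = resolveLink('next.html', pageUrl);
const alreadyAbsolute = resolveLink('https://cdn.example.org/a.png', pageUrl);
```

Without a base URL, `/images/cover.jpg` and `next.html` have no host to attach to, which is why the second argument to `extractFromHtml` cannot be omitted.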

Character Encoding Support

Graby-TS provides robust character encoding detection and conversion:

  1. Multi-level detection: Encodings are detected in the following order:

    • HTTP Content-Type header charset
    • XML/HTML meta tags and charset declarations
    • Binary content analysis using chardet
  2. Automatic conversion: Content is automatically converted to UTF-8 for processing and output

    • The original encoding is preserved in the detectedEncoding property
    • All returned HTML content is always in UTF-8, regardless of the source encoding
  3. Support for many encodings: Including UTF-8, ISO-8859-1, Windows-1251, Shift-JIS, and many more

  4. Special handling: Proper handling for common encodings like ISO-8859-1 to ensure special characters are preserved
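The detection order described above can be sketched as a chain of fallbacks. This is a simplified illustration, not the library's implementation: all function names are hypothetical, the byte-analysis step is an injected callback standing in for chardet, and the final `utf-8` fallback is an assumption.

```typescript
// Step 1: charset parameter of the HTTP Content-Type header.
function charsetFromContentType(header: string | null): string | null {
  if (!header) return null;
  const match = /charset\s*=\s*"?([^\s;"]+)/i.exec(header);
  return match ? match[1].toLowerCase() : null;
}

// Step 2: <meta> or <?xml ?> declarations near the start of the document.
// Declarations are ASCII, so mapping each byte to one character is
// enough for sniffing purposes.
function charsetFromMarkup(bytes: Uint8Array): string | null {
  const head = Array.from(bytes.subarray(0, 1024), (b) => String.fromCharCode(b)).join('');
  const match =
    /<meta[^>]+charset\s*=\s*["']?([^\s"'>\/;]+)/i.exec(head) ??
    /<\?xml[^>]+encoding\s*=\s*["']([^"']+)["']/i.exec(head);
  return match ? match[1].toLowerCase() : null;
}

// Step 3 is delegated to `analyzeBytes`, a stand-in for a statistical
// detector such as chardet. The 'utf-8' default is an assumption.
function detectEncoding(
  contentType: string | null,
  bytes: Uint8Array,
  analyzeBytes: (input: Uint8Array) => string | null = () => null,
): string {
  return (
    charsetFromContentType(contentType) ??
    charsetFromMarkup(bytes) ??
    analyzeBytes(bytes) ??
    'utf-8'
  );
}
```

Each step runs only when the previous one found nothing, so an explicit header wins over a meta tag, and byte analysis is a last resort.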

Features Comparison with PHP Graby

✅ Implemented

  • Basic content extraction using site configs
  • Readability algorithm as fallback
  • HTML cleanup and post-processing
  • HTTP client with proper handling of redirects
  • Support for metadata extraction (OpenGraph, JSON-LD)
  • Lazy image loading detection and fixing
  • XSS protection
  • Multipage article support
  • Site-specific HTTP headers
  • Character encoding detection and conversion
  • wrap_in functionality to enclose content in specific tags
  • Unlike PHP Graby, this implementation uses the xpath-to-selector library to convert XPath expressions to CSS selectors instead of providing full XPath support. This works in most cases where simple XPath expressions can be converted to CSS.
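To give a sense of what "simple XPath expressions can be converted to CSS" means, here is a deliberately small sketch. It is not the xpath-to-selector library's algorithm or API: `simpleXPathToCss` is a hypothetical function that handles only `//tag` and `/tag` steps with an optional `[@attr='value']` predicate, and returns null for anything else.

```typescript
// Hypothetical converter for a tiny XPath subset. Anything outside the
// subset (functions, positions, multiple predicates) yields null.
function simpleXPathToCss(xpath: string): string | null {
  const separatorAndStep = /(\/\/|\/)([^/]+)/g;
  const stepShape = /^([A-Za-z][\w-]*|\*)(?:\[@([\w-]+)=(['"])(.*?)\3\])?$/;
  let css = '';
  let pos = 0;

  while (pos < xpath.length) {
    separatorAndStep.lastIndex = pos;
    const token = separatorAndStep.exec(xpath);
    // Every step must start exactly where the previous one ended.
    if (!token || token.index !== pos) return null;
    const [, separator, rawStep] = token;

    // A leading single slash anchors at the document root, which a
    // plain CSS type selector cannot express, so it is rejected.
    if (css === '' && separator !== '//') return null;

    const step = stepShape.exec(rawStep);
    if (!step) return null;
    const [, tag, attr, , value] = step;
    if (attr && value.indexOf('"') !== -1) return null;

    // '//' maps to the descendant combinator, '/' to the child combinator.
    const combinator = css === '' ? '' : separator === '//' ? ' ' : ' > ';
    css += combinator + tag + (attr ? `[${attr}="${value}"]` : '');
    pos = separatorAndStep.lastIndex;
  }

  return css === '' ? null : css;
}

const converted = simpleXPathToCss("//div[@class='article-body']/p");
const unsupported = simpleXPathToCss("//div[contains(@class, 'body')]");
```

In this sketch the first call produces `div[class="article-body"] > p`, while the second returns null because `contains()` is outside the handled subset. Expressions like that are where a conversion-based approach can differ from full XPath support.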

🚧 Coming Soon

  • PDF and non-HTML content processing
  • Advanced content type exclusion handling
  • URL rewriting rules

Not planned

  • Advanced logging system

Platform Support

  • ✅ Node.js
  • ✅ NativeScript
  • 🔍 Browsers (probably)
  • 🔍 React Native (probably)

Credits

License

MIT