npm package discovery and stats viewer.

About

Hi 👋, I’m Ryan Hefner, and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put it out there. Hopefully others will find it as fast and useful a way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site that will tweet the most popular, newest, and random packages from npm. Please follow that account now; it will start sending out packages soon-ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

graby-ts-site-config

v1.1.1

Published

Site configuration loader for Graby-TS with dynamic imports

Downloads

29

Readme

Graby-TS Site Config

A dynamic site configuration loader for Graby-TS, based on the FiveFilters site patterns format. This library provides standardized content extraction rules for individual websites, allowing consistent extraction across a wide range of domains.

The site configuration rules are sourced from FiveFilters ftr-site-config, which contains a comprehensive collection of extraction rules for thousands of websites.
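
The ftr-site-config files are plain text: one "directive: value" per line, "#" for comments, and list directives such as title or body may repeat. The sketch below shows one way such a file could be turned into data. It is not this package's loader: the ParsedSiteConfig shape, the parseFtrConfig name, the set of yes/no directives, and the handling of the "replace_string(find): replace" and "http_header(name): value" forms are all assumptions for illustration.

```typescript
// Hypothetical, simplified shape for one parsed ftr-site-config file.
interface ParsedSiteConfig {
  lists: Record<string, string[]>;        // repeatable directives, e.g. title, body, strip
  flags: Record<string, boolean>;         // yes/no directives, e.g. prune
  replacements: Array<[string, string]>;  // from replace_string(find): replace
  httpHeaders: Record<string, string>;    // from http_header(name): value
}

// Directives treated as yes/no switches in this sketch (assumed list).
const BOOLEAN_KEYS = new Set([
  "prune",
  "autodetect_on_failure",
  "insert_detected_image",
  "skip_json_ld",
]);

function parseFtrConfig(text: string): ParsedSiteConfig {
  const parsed: ParsedSiteConfig = { lists: {}, flags: {}, replacements: [], httpHeaders: {} };

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // blank line or comment

    // Directives that carry an argument: name(arg): value
    const withArg = /^(\w+)\((.*?)\):(.*)$/.exec(line);
    if (withArg) {
      const [, name, arg, value] = withArg;
      if (name === "replace_string") parsed.replacements.push([arg, value.trim()]);
      else if (name === "http_header") parsed.httpHeaders[arg.toLowerCase()] = value.trim();
      continue;
    }

    // Plain directives: name: value (split on the first colon only,
    // so XPath values containing colons survive intact)
    const colon = line.indexOf(":");
    if (colon === -1) continue; // not a directive line
    const name = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();

    if (BOOLEAN_KEYS.has(name)) {
      parsed.flags[name] = value.toLowerCase() === "yes";
    } else {
      (parsed.lists[name] ??= []).push(value); // repeated directives accumulate
    }
  }
  return parsed;
}
```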

Features

  • Dynamically loads site-specific extraction rules
  • Well-typed with full TypeScript support
  • Memory-efficient with on-demand loading and caching
  • Compatible with all JavaScript environments
  • Supports wildcard domain patterns
  • Based on the established FiveFilters site patterns format
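
For wildcard domain patterns, a common convention (and the one used by ftr-site-config file names such as .wikipedia.org.txt) is that a key with a leading dot covers subdomains of that domain. The following is a minimal sketch of a lookup order under that assumption; candidateKeys and findConfigKey are hypothetical names, and stripping "www." and checking from most to least specific are assumptions rather than this package's documented behaviour.

```typescript
// Candidate config keys for a hostname, most specific first.
// Assumes exact keys ("en.wikipedia.org") and leading-dot wildcards (".wikipedia.org").
function candidateKeys(hostname: string): string[] {
  // Normalise: lower-case and drop a leading "www."
  const host = hostname.toLowerCase().replace(/^www\./, "");
  const keys = [host];
  const labels = host.split(".");
  // Add ".parent.tld" wildcards, stopping before the bare top-level domain
  for (let i = 1; i < labels.length - 1; i++) {
    keys.push("." + labels.slice(i).join("."));
  }
  return keys;
}

// Return the first candidate key that has a config available, if any.
function findConfigKey(hostname: string, available: ReadonlySet<string>): string | undefined {
  return candidateKeys(hostname).find((key) => available.has(key));
}
```

With this ordering an exact entry always beats a wildcard, so a site-specific file can override rules shared across a domain.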

Installation

npm install graby-ts-site-config

Usage

import { SiteConfigManager } from 'graby-ts-site-config';

// Create a site config manager instance
const configManager = new SiteConfigManager();

async function extractContent(url) {
  // Get the site configuration for this URL's hostname
  const { hostname } = new URL(url);
  const config = await configManager.getConfigForHost(hostname);

  // Now use the configuration with your content extractor
  console.log('Using config:', config);

  // Example: check whether this site has specific title extraction rules
  if (config.title && config.title.length > 0) {
    console.log('This site has custom title extraction rules');
  }
}

// Preload configs for frequently used sites. preloadConfigs returns a Promise,
// so handle rejections rather than leaving them unhandled.
configManager.preloadConfigs(['medium.com', 'wikipedia.org']).catch(console.error);

API

SiteConfigManager

getConfigForHost(hostname: string): Promise<SiteConfig>

Asynchronously loads and returns the configuration for the given hostname.

const config = await configManager.getConfigForHost('nytimes.com');

hasConfigForHost(hostname: string): boolean

Checks if a configuration exists for the given hostname.

if (configManager.hasConfigForHost('medium.com')) {
  console.log('Medium has custom extraction rules');
}

preloadConfigs(hostnames: string[]): Promise<void>

Preloads configurations for an array of hostnames to improve performance.

await configManager.preloadConfigs(['medium.com', 'wikipedia.org']);

clearCache(): void

Clears the internal configuration cache.

configManager.clearCache();
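
To show how on-demand loading, preloading and cache clearing can fit together, here is a minimal sketch. It is not this package's implementation: CachedConfigLoader and the ConfigLoader callback are hypothetical, and in practice the callback might wrap a dynamic import() of a per-site module.

```typescript
// Hypothetical loader: resolves the config for a host, or undefined if none exists.
type ConfigLoader<T> = (host: string) => Promise<T | undefined>;

class CachedConfigLoader<T> {
  // Cache the promise (not the value) so concurrent callers share one load.
  private readonly cache = new Map<string, Promise<T | undefined>>();
  private readonly load: ConfigLoader<T>;

  constructor(load: ConfigLoader<T>) {
    this.load = load;
  }

  get(host: string): Promise<T | undefined> {
    const cached = this.cache.get(host);
    if (cached) return cached;

    const pending = this.load(host);
    this.cache.set(host, pending);
    // Evict failed loads so a later call can retry; skip if the entry was replaced.
    pending.catch(() => {
      if (this.cache.get(host) === pending) this.cache.delete(host);
    });
    return pending;
  }

  // Warm the cache for several hosts in parallel.
  async preload(hosts: string[]): Promise<void> {
    await Promise.all(hosts.map((host) => this.get(host)));
  }

  clear(): void {
    this.cache.clear();
  }
}
```

Caching the promise rather than the resolved config means two lookups that start at the same moment trigger only one load.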

Configuration Fields

The SiteConfig object contains various fields that control how content is extracted from a website:

Content Selection (XPath expressions)

| Field | Type | Description |
|-------|------|-------------|
| title | string[] | XPath expressions to extract the page title |
| body | string[] | XPath expressions to extract the article body content |
| date | string[] | XPath expressions to extract the publication date |
| author | string[] | XPath expressions to extract the author(s) information |
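
Because each selection field holds several XPath expressions, an extractor has to decide how to combine them. One plausible strategy, sketched here, is to try them in order and keep the first non-empty result. The XPathEvaluator callback and firstMatch name are hypothetical; in a browser the callback could wrap document.evaluate with XPathResult.STRING_TYPE.

```typescript
// Hypothetical evaluator: returns the text matched by one XPath expression,
// or null when nothing matches.
type XPathEvaluator = (expression: string) => string | null;

// Try each expression in order and keep the first non-empty, trimmed result.
function firstMatch(
  expressions: readonly string[] | undefined,
  evaluate: XPathEvaluator,
): string | null {
  for (const expr of expressions ?? []) {
    const value = evaluate(expr)?.trim();
    if (value) return value;
  }
  return null;
}
```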

Content Cleaning

| Field | Type | Description |
|-------|------|-------------|
| strip | string[] | XPath expressions for elements to remove from the content |
| strip_id_or_class | string[] | Element IDs or classes to remove from the content |
| strip_image_src | string[] | Remove images with matching src attributes |
| native_ad_clue | string[] | XPath expressions to identify native advertisements |
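
The non-XPath cleaning fields can be reduced to XPath so that everything goes through one removal step. The sketch below assumes substring matching for strip_id_or_class and strip_image_src; the function names are hypothetical, and dropping quote characters is a simplification to keep each value inside its XPath string literal.

```typescript
// Remove quote characters so a value cannot break out of the XPath string literal.
function sanitize(value: string): string {
  return value.replace(/['"]/g, "");
}

// Elements whose id or class attribute contains `token` as a substring.
function idOrClassXPath(token: string): string {
  const t = sanitize(token);
  return `//*[contains(@id, '${t}') or contains(@class, '${t}')]`;
}

// <img> elements whose src attribute contains `fragment`.
function imageSrcXPath(fragment: string): string {
  return `//img[contains(@src, '${sanitize(fragment)}')]`;
}
```

Substring matching is broad: a rule of "ad" would also match class="header". A whole-token variant would compare against concat(' ', normalize-space(@class), ' ') instead.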

Processing Options

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| prune | boolean | true | Clean content from non-essential elements using Readability algorithm |
| autodetect_on_failure | boolean | true | Fall back to auto-detection if the pattern-based extraction fails |
| insert_detected_image | boolean | true | Insert the main image detected from metadata |
| skip_json_ld | boolean | true | Skip extraction from JSON-LD structured data |
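
A config loaded from a site file will usually set only some of these options. A minimal sketch of filling in the defaults from the table, assuming the options are optional booleans (ProcessingOptions and resolveProcessingOptions are hypothetical names):

```typescript
interface ProcessingOptions {
  prune?: boolean;
  autodetect_on_failure?: boolean;
  insert_detected_image?: boolean;
  skip_json_ld?: boolean;
}

// Defaults as listed in the table above.
const PROCESSING_DEFAULTS: Required<ProcessingOptions> = {
  prune: true,
  autodetect_on_failure: true,
  insert_detected_image: true,
  skip_json_ld: true,
};

// `??` keeps the default when a key is missing or explicitly undefined,
// while an explicit `false` from the site config still wins.
function resolveProcessingOptions(config: ProcessingOptions): Required<ProcessingOptions> {
  return {
    prune: config.prune ?? PROCESSING_DEFAULTS.prune,
    autodetect_on_failure: config.autodetect_on_failure ?? PROCESSING_DEFAULTS.autodetect_on_failure,
    insert_detected_image: config.insert_detected_image ?? PROCESSING_DEFAULTS.insert_detected_image,
    skip_json_ld: config.skip_json_ld ?? PROCESSING_DEFAULTS.skip_json_ld,
  };
}
```

A plain object spread ({ ...defaults, ...config }) would let a key that is present but undefined overwrite its default, which is why the sketch resolves each field explicitly.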

Multi-page Handling

| Field | Type | Description |
|-------|------|-------------|
| single_page_link | string[] | XPath expressions to find the "view as single page" link |
| single_page_link_in_feed | string[] | XPath for single-page links in feed items |
| next_page_link | string[] | XPath expressions to find links to subsequent pages |
| if_page_contains | string[] | XPath expressions for conditional processing of multi-page content |
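
When next_page_link rules are present, an extractor typically keeps fetching pages until no further link is found. A minimal sketch, assuming hypothetical fetchPage and findNextLink callbacks (the latter would apply the next_page_link XPaths and return an absolute URL); the visited set stops loops and maxPages (an arbitrary default of 10) caps runaway pagination.

```typescript
// Hypothetical page fetcher: returns the HTML for a URL.
type FetchPage = (url: string) => Promise<string>;
// Hypothetical helper: applies next_page_link rules, returns the next URL or null.
type FindNextLink = (html: string, pageUrl: string) => string | null;

// Collect the HTML of every page in a multi-page article.
async function collectPages(
  startUrl: string,
  fetchPage: FetchPage,
  findNextLink: FindNextLink,
  maxPages = 10,
): Promise<string[]> {
  const pages: string[] = [];
  const seen = new Set<string>();
  let url: string | null = startUrl;

  // Stop when there is no next link, a URL repeats, or the page cap is reached.
  while (url !== null && !seen.has(url) && pages.length < maxPages) {
    seen.add(url);
    const html = await fetchPage(url);
    pages.push(html);
    url = findNextLink(html, url);
  }
  return pages;
}
```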

Content Enhancement

| Field | Type | Description |
|-------|------|-------------|
| find_string | string[] | Strings to find and replace in the content |
| replace_string | string[] | Replacement strings (paired with find_string) |
| wrap_in | Record<string, string> | Wrap matching elements with specified tags |
| src_lazy_load_attr | string[] | Image attribute names for lazy-loaded images |
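
Since find_string and replace_string are parallel arrays, entries are paired by index. A minimal sketch of applying the pairs in order to raw HTML; applyReplacements is a hypothetical name, and ignoring unpaired trailing entries is an assumption about how mismatched lengths should be handled.

```typescript
// Apply find_string/replace_string pairs to raw HTML, in order.
// split/join replaces every literal occurrence (no regex interpretation).
function applyReplacements(
  html: string,
  find: readonly string[] = [],
  replace: readonly string[] = [],
): string {
  const pairs = Math.min(find.length, replace.length); // unpaired entries are ignored
  let out = html;
  for (let i = 0; i < pairs; i++) {
    if (find[i] === "") continue; // an empty pattern would match between every character
    out = out.split(find[i]).join(replace[i]);
  }
  return out;
}
```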

HTTP Options

| Field | Type | Description |
|-------|------|-------------|
| http_header | Record<string, string> | Additional HTTP headers to send with requests |
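
When applying http_header, note that HTTP header names are case-insensitive, so a naive object merge can leave both "User-Agent" and "user-agent" in place. A minimal sketch that lower-cases names so site headers cleanly override base headers; mergeHeaders and the base header set are hypothetical.

```typescript
// Merge per-site http_header values over a set of base request headers.
// Names are lower-cased so the site value replaces the base value.
function mergeHeaders(
  base: Record<string, string>,
  siteHeaders: Record<string, string> = {},
): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const [name, value] of Object.entries(base)) merged[name.toLowerCase()] = value;
  for (const [name, value] of Object.entries(siteHeaders)) merged[name.toLowerCase()] = value;
  return merged;
}
```

The result can be passed directly as the headers option of fetch(url, { headers }).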

License

MIT