
easy-scrape v1.0.4

Scrape websites more easily than before.

Easy Scrape

A powerful and flexible HTML scraping library built on top of Cheerio. Easy Scrape provides a declarative way to extract data from HTML with support for nested structures, transformations, advanced filtering, and much more.

Features

Simple & Declarative - Define your scraping schema in plain JavaScript objects
🎯 Flexible Selectors - Use CSS selectors to target any element
🔄 Data Transformation - Built-in conversion and transformation pipeline
📋 List Handling - Easy extraction of arrays and nested lists
🎨 Multiple Extraction Modes - Text, HTML, attributes, or custom functions
🔍 Advanced Filtering - Filter elements before extraction
🧭 Navigation - Parent, siblings, and ancestor traversal
🌐 URL Resolution - Automatically resolve relative URLs
📊 Table Parsing - Built-in support for HTML tables
Validation - Validate extracted data with custom functions
🎭 Conditional Extraction - Extract based on conditions
🛠️ Helper Functions - Common transformations included (toNumber, toDate, etc.)
📦 Presets - Ready-to-use patterns for common tasks
🛡️ Error Handling - Strict mode for validation or graceful fallbacks
📘 TypeScript Support - Full TypeScript definitions included

Installation

npm install easy-scrape

Quick Start

import { easyScrape } from 'easy-scrape';

const html = `
  <div class="product">
    <h2>Laptop</h2>
    <span class="price">$999</span>
  </div>
`;

const result = easyScrape(html, {
  title: 'h2',
  price: '.price'
});

console.log(result);
// { title: 'Laptop', price: '$999' }

API Reference

easyScrape(input, schema, options?)

Parameters:

  • input - HTML string or Cheerio instance
  • schema - Scraping schema defining what to extract
  • options (optional) - Parsing options

Returns: Object with extracted data

Parsing Options

baseUrl (string)

Base URL for resolving relative URLs when resolveUrl is used.

const result = easyScrape(html, schema, {
  baseUrl: 'https://example.com'
});

xmlMode (boolean)

Parse as XML instead of HTML.

const xml = `<?xml version="1.0"?><root><item>Value</item></root>`;

const result = easyScrape(xml, {
  value: 'item'
}, {
  xmlMode: true
});
// { value: 'Value' }

decodeEntities (boolean)

Decode HTML entities. Default: true.
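To illustrate what decoding means here: entity sequences like &amp;amp; become their literal characters. The library delegates the real work to Cheerio's parser; this plain-JS sketch handles only three common entities for illustration.

```javascript
// Minimal sketch of entity decoding for three common entities.
// Illustrative only -- Cheerio's parser does the actual decoding.
const decodeBasic = (s) =>
  s.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&');

console.log(decodeBasic('Fish &amp; Chips &lt;b&gt;'));
// → 'Fish & Chips <b>'
```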

cheerioOptions (object)

Additional Cheerio load options.

Schema Options

Basic Options

selector (string)

CSS selector to find the element(s).

const result = easyScrape(html, {
  title: {
    selector: '.title'
  }
});

Or use shorthand:

const result = easyScrape(html, {
  title: '.title'  // String shorthand
});

attr (string)

Extract a specific attribute value.

const html = `<a href="https://example.com" class="link">Click</a>`;

const result = easyScrape(html, {
  url: {
    selector: '.link',
    attr: 'href'
  }
});
// { url: 'https://example.com' }

attrs (string[])

Extract multiple attributes as an object.

const html = `<a href="/page" class="nav-link" title="Go to page">Link</a>`;

const result = easyScrape(html, {
  linkData: {
    selector: '.nav-link',
    attrs: ['href', 'class', 'title']
  }
});
// { linkData: { href: '/page', class: 'nav-link', title: 'Go to page' } }

html (boolean)

Extract inner HTML instead of text.

const html = `<div class="box"><strong>Bold</strong> text</div>`;

const result = easyScrape(html, {
  content: {
    selector: '.box',
    html: true
  }
});
// { content: '<strong>Bold</strong> text' }

outerHtml (boolean)

Extract outer HTML including the element itself.

const html = `<div class="container"><p>Text</p></div>`;

const result = easyScrape(html, {
  fullHtml: {
    selector: 'p',
    outerHtml: true
  }
});
// { fullHtml: '<p>Text</p>' }

textMode (string)

Control how text is extracted. Options: 'text' (default), 'ownText', 'deepText'.

const html = `<div class="wrapper">Direct text<span>Nested text</span>More direct</div>`;

const result = easyScrape(html, {
  allText: {
    selector: '.wrapper',
    textMode: 'text'  // All text including descendants
  },
  ownText: {
    selector: '.wrapper',
    textMode: 'ownText'  // Only direct text nodes
  }
});
// { allText: 'Direct textNested textMore direct', ownText: 'Direct textMore direct' }

separator (string)

Join multiple text nodes with a separator.

const html = `<div class="list">Apple<br>Banana<br>Cherry</div>`;

const result = easyScrape(html, {
  joined: {
    selector: '.list',
    separator: ', '
  }
});
// { joined: 'Apple, Banana, Cherry' }

trimValue (boolean)

Whether to trim whitespace from extracted values. Default: true.

resolveUrl (boolean)

Resolve relative URLs to absolute using baseUrl option.

const html = `<a href="/about">About</a>`;

const result = easyScrape(html, {
  url: {
    selector: 'a',
    attr: 'href',
    resolveUrl: true
  }
}, {
  baseUrl: 'https://example.com'
});
// { url: 'https://example.com/about' }

Data Transformation

convert (function)

Transform the extracted value.

const html = `<span class="price">$99.99</span>`;

const result = easyScrape(html, {
  price: {
    selector: '.price',
    convert: (value) => parseFloat(value.replace('$', ''))
  }
});
// { price: 99.99 }

transform (function | function[])

Apply transformation pipeline after conversion.

const html = `<span class="amount">  100  </span>`;

const result = easyScrape(html, {
  amount: {
    selector: '.amount',
    transform: [
      (val) => val.trim(),
      (val) => parseInt(val),
      (val) => val * 2
    ]
  }
});
// { amount: 200 }

how (string | function)

Custom extraction method.

const html = `<div class="item" data-id="123">Item</div>`;

const result = easyScrape(html, {
  itemId: {
    selector: '.item',
    how: ($el) => $el.attr('data-id')
  }
});
// { itemId: '123' }

Element Selection & Navigation

eq (number)

Select a specific element by index (0-based).

const html = `
  <ul>
    <li>First</li>
    <li>Second</li>
    <li>Third</li>
  </ul>
`;

const result = easyScrape(html, {
  secondItem: {
    selector: 'li',
    eq: 1
  }
});
// { secondItem: 'Second' }

texteq (number)

Select a specific text node by index.

const html = `<div>Text1<span>Span</span>Text2</div>`;

const result = easyScrape(html, {
  firstText: { selector: 'div', texteq: 0 },
  secondText: { selector: 'div', texteq: 1 }
});
// { firstText: 'Text1', secondText: 'Text2' }

closest (string)

Find the closest ancestor matching the selector.

const html = `
  <div class="container">
    <div class="item">
      <span class="text">Click</span>
    </div>
  </div>
`;

const result = easyScrape(html, {
  containerClass: {
    selector: '.text',
    closest: '.container',
    how: ($el) => $el.attr('class')
  }
});
// { containerClass: 'container' }

parent (number | string)

Navigate to parent element(s).

  • Number: Move up N levels
  • String: Find parent matching selector

const html = `
  <div class="grandparent">
    <div class="parent">
      <span class="child">Text</span>
    </div>
  </div>
`;

const result = easyScrape(html, {
  parentText: {
    selector: '.child',
    parent: 1  // Go up 1 level
  },
  grandparentText: {
    selector: '.child',
    parent: 2  // Go up 2 levels
  }
});

parents (string)

Find ancestor element matching selector.

const result = easyScrape(html, {
  outerDiv: {
    selector: '.child',
    parents: '.grandparent'
  }
});

siblings (string)

Navigate to sibling elements. Options: 'next', 'prev', 'nextAll', 'prevAll'.

const html = `
  <div>
    <span class="first">First</span>
    <span class="target">Target</span>
    <span class="last">Last</span>
  </div>
`;

const result = easyScrape(html, {
  nextSibling: {
    selector: '.target',
    siblings: 'next'
  },
  prevSibling: {
    selector: '.target',
    siblings: 'prev'
  }
});
// { nextSibling: 'Last', prevSibling: 'First' }

siblingSelector (string)

Filter siblings by selector.

const result = easyScrape(html, {
  nextItems: {
    selector: '.marker',
    siblings: 'nextAll',
    siblingSelector: '.item',
    multiple: true
  }
});

Lists and Arrays

listItem (string)

Extract an array of items with nested data.

const html = `
  <ul>
    <li class="item">
      <span class="name">Item 1</span>
      <span class="value">10</span>
    </li>
    <li class="item">
      <span class="name">Item 2</span>
      <span class="value">20</span>
    </li>
  </ul>
`;

const result = easyScrape(html, {
  items: {
    listItem: '.item',
    data: {
      name: '.name',
      value: {
        selector: '.value',
        convert: (v) => parseInt(v)
      }
    }
  }
});
// { items: [{ name: 'Item 1', value: 10 }, { name: 'Item 2', value: 20 }] }

multiple (boolean)

Extract all matching elements as an array.

const html = `
  <span class="tag">JS</span>
  <span class="tag">CSS</span>
  <span class="tag">HTML</span>
`;

const result = easyScrape(html, {
  tags: {
    selector: '.tag',
    multiple: true
  }
});
// { tags: ['JS', 'CSS', 'HTML'] }

includeIndex (boolean)

Add _index property to list items.

const result = easyScrape(html, {
  fruits: {
    listItem: 'li',
    includeIndex: true
  }
});
// { fruits: [{ text: 'Apple', _index: 0 }, { text: 'Banana', _index: 1 }, ...] }

Advanced Features

map (function)

Map over elements with custom transformation.

const html = `
  <div class="product">Product 1</div>
  <div class="product">Product 2</div>
  <div class="product">Product 3</div>
`;

const result = easyScrape(html, {
  products: {
    selector: '.product',
    map: ($el, $, index) => ({
      id: index + 1,
      name: $el.text(),
      upper: $el.text().toUpperCase()
    })
  }
});
// { products: [{ id: 1, name: 'Product 1', upper: 'PRODUCT 1' }, ...] }

Filtering within map: Return null or undefined to exclude items.

const result = easyScrape(html, {
  expensive: {
    selector: '.item',
    map: ($el) => {
      const price = parseInt($el.attr('data-price'));
      if (price < 50) return null;  // Filter out
      return { name: $el.text(), price };
    }
  }
});

filter (function)

Filter elements before extraction.

const html = `
  <div class="item active">Active 1</div>
  <div class="item">Inactive</div>
  <div class="item active">Active 2</div>
`;

const result = easyScrape(html, {
  activeItems: {
    selector: '.item',
    filter: ($el) => $el.hasClass('active'),
    multiple: true
  }
});
// { activeItems: ['Active 1', 'Active 2'] }

regex (RegExp) & regexGroup (number)

Extract data using regular expressions.

const html = `<div class="price">Price: $99.99 USD</div>`;

const result = easyScrape(html, {
  amount: {
    selector: '.price',
    regex: /\$(\d+\.\d+)/,
    regexGroup: 1,  // Capture group (default: 0)
    convert: (val) => parseFloat(val)
  }
});
// { amount: 99.99 }

Conditional Extraction

if (function)

Only extract if condition function returns true.

const html = `<div class="product" data-available="true"><span class="price">$50</span></div>`;

const result = easyScrape(html, {
  price: {
    selector: '.price',
    if: ($) => $('.product').attr('data-available') === 'true'
  }
});
// { price: '$50' }

ifExists (string)

Only extract if selector exists in context.

const html = `
  <div class="container">
    <span class="badge">New</span>
    <span class="price">$99</span>
  </div>
`;

const result = easyScrape(html, {
  price: {
    selector: '.price',
    ifExists: '.badge'  // Only extract if badge exists
  }
});
// { price: '$99' }

ifNotExists (string)

Only extract if selector does NOT exist.

const result = easyScrape(html, {
  regularPrice: {
    selector: '.regular-price',
    ifNotExists: '.sale-price'
  }
});

Array Operations

Array operations are applied in this order: unique → slice → limit → flatten
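That ordering can be sketched in plain JavaScript (conceptual only, not the library's actual implementation; the option names mirror the schema keys below):

```javascript
// Conceptual sketch of the documented order: unique -> slice -> limit -> flatten.
// Illustrative plain JS, not easy-scrape's internals.
function applyArrayOps(arr, { unique, slice, limit, flatten } = {}) {
  let out = arr;
  if (unique) out = [...new Set(out)];                      // dedupe first
  if (slice) out = out.slice(slice[0], slice[1]);           // then [start, end]
  if (limit != null) out = out.slice(0, limit);             // then cap length
  if (flatten) out = out.flat(flatten === true ? 1 : flatten); // then flatten
  return out;
}

console.log(applyArrayOps(['a', 'b', 'a', 'c', 'd'], { unique: true, limit: 3 }));
// → [ 'a', 'b', 'c' ]
```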

unique (boolean)

Remove duplicate values from array.

const html = `
  <div class="tag">JavaScript</div>
  <div class="tag">Python</div>
  <div class="tag">JavaScript</div>
`;

const result = easyScrape(html, {
  uniqueTags: {
    selector: '.tag',
    multiple: true,
    unique: true
  }
});
// { uniqueTags: ['JavaScript', 'Python'] }

slice (array)

Array slice operation [start, end].

const result = easyScrape(html, {
  middleItems: {
    selector: '.item',
    multiple: true,
    slice: [1, 4]  // Get items at index 1, 2, 3
  }
});

limit (number)

Limit number of items in array.

const result = easyScrape(html, {
  topItems: {
    selector: '.item',
    multiple: true,
    limit: 3  // Only get first 3 items
  }
});

flatten (boolean | number)

Flatten nested arrays.

const result = easyScrape(html, {
  allTags: {
    listItem: '.category',
    data: {
      tags: {
        selector: '.tag',
        multiple: true
      }
    },
    flatten: true  // Flatten one level
  }
});

Combining array operations:

const result = easyScrape(html, {
  topUnique: {
    selector: '.item',
    multiple: true,
    unique: true,    // Remove duplicates first
    limit: 3         // Then take first 3
  }
});

Validation

validate (function)

Validate extracted value with custom function.

const html = `<div class="email">user@example.com</div>`;

const result = easyScrape(html, {
  email: {
    selector: '.email',
    validate: (value) => value.includes('@')
  }
});
// { email: 'user@example.com' }

required (boolean)

Field is required - throws error if missing or empty.

try {
  const result = easyScrape(html, {
    title: {
      selector: '.missing-title',
      required: true  // Will throw error
    }
  });
} catch (error) {
  console.error('Required field missing:', error.message);
}

Table Parsing

Extract data from HTML tables.

const html = `
  <table class="data-table">
    <tr>
      <th>Name</th>
      <th>Age</th>
      <th>City</th>
    </tr>
    <tr>
      <td>John</td>
      <td>30</td>
      <td>NYC</td>
    </tr>
    <tr>
      <td>Jane</td>
      <td>25</td>
      <td>LA</td>
    </tr>
  </table>
`;

const result = easyScrape(html, {
  users: {
    selector: '.data-table',
    table: {
      headers: true,
      selector: 'tr'
    }
  }
});
/*
{
  users: [
    { "Name": "John", "Age": "30", "City": "NYC" },
    { "Name": "Jane", "Age": "25", "City": "LA" }
  ]
}
*/
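With headers: true, each body row is presumably paired with the header row to form keyed objects. A plain-JS sketch of that mapping (illustrative, not the library's source):

```javascript
// Sketch of pairing a header row with body rows to build objects,
// as headers: true appears to do. Assumed behavior, not library code.
function rowsToObjects(rows) {
  const [headers, ...body] = rows;
  return body.map((cells) =>
    Object.fromEntries(headers.map((h, i) => [h, cells[i]]))
  );
}

console.log(rowsToObjects([
  ['Name', 'Age', 'City'],
  ['John', '30', 'NYC'],
  ['Jane', '25', 'LA']
]));
// → [ { Name: 'John', Age: '30', City: 'NYC' }, { Name: 'Jane', Age: '25', City: 'LA' } ]
```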

Without headers:

const result = easyScrape(html, {
  data: {
    selector: 'table',
    table: {
      headers: false
    }
  }
});
// { data: [['Item 1', 'Value 1'], ['Item 2', 'Value 2']] }

Custom table conversion:

const result = easyScrape(html, {
  specs: {
    selector: '.spec-table',
    table: {
      headers: false
    },
    convert: (rows) => {
      const specs = {};
      rows.forEach(row => {
        if (row.length >= 2) {
          specs[row[0]] = row[1];
        }
      });
      return specs;
    }
  }
});
// { specs: { "CPU": "Intel i7", "RAM": "16GB" } }

Error Handling

default (any)

Default value when element is not found.

const result = easyScrape(html, {
  missing: {
    selector: '.not-exist',
    default: 'Not Found'
  }
});

strict (boolean)

Throw errors instead of returning null. Default: false.

try {
  const result = easyScrape(html, {
    required: {
      selector: '.not-exist',
      strict: true  // Will throw error
    }
  });
} catch (error) {
  console.error('Missing required field:', error.message);
}

Nested Data

Extract nested objects using the data property.

const html = `
  <div class="card">
    <h2 class="title">Product</h2>
    <div class="meta">
      <span class="price">$50</span>
      <span class="stock">In Stock</span>
    </div>
  </div>
`;

const result = easyScrape(html, {
  product: {
    selector: '.card',
    data: {
      title: '.title',
      price: '.price',
      stock: '.stock'
    }
  }
});
// { product: { title: 'Product', price: '$50', stock: 'In Stock' } }

Helper Functions

Easy Scrape includes common transformation helpers:

import { easyScrape, helpers } from 'easy-scrape';

const result = easyScrape(html, {
  price: {
    selector: '.price',
    convert: helpers.toNumber  // Parse number from "$1,234.56"
  },
  isAvailable: {
    selector: '.status',
    convert: helpers.toBoolean  // Convert "yes" to true
  },
  publishDate: {
    selector: 'time',
    attr: 'datetime',
    convert: helpers.toDate  // Convert to Date object
  }
});

Available Helpers

  • helpers.toNumber(val) - Parse number from string (removes non-numeric chars)
  • helpers.toInt(val) - Parse integer from string
  • helpers.toBoolean(val) - Convert to boolean (accepts: true, yes, 1, on)
  • helpers.toDate(val) - Convert to Date object
  • helpers.extractUrl(val) - Extract first URL from text
  • helpers.extractEmail(val) - Extract first email from text
  • helpers.stripHtml(html) - Remove HTML tags
  • helpers.parseJson(val) - Parse JSON string
  • helpers.capitalize(val) - Capitalize first letter
  • helpers.slug(val) - Convert to URL-friendly slug
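The helpers' exact implementations aren't shown here, but a few of them behave roughly like these plain-JS equivalents (assumptions based on the descriptions above, not the library's source):

```javascript
// Rough plain-JS equivalents of three helpers (assumed behavior, not library code):
const toNumber = (val) => parseFloat(String(val).replace(/[^0-9.-]/g, ''));
const toBoolean = (val) =>
  ['true', 'yes', '1', 'on'].includes(String(val).trim().toLowerCase());
const slug = (val) =>
  String(val).toLowerCase().trim()
    .replace(/[^a-z0-9]+/g, '-')   // non-alphanumeric runs become hyphens
    .replace(/^-+|-+$/g, '');      // trim leading/trailing hyphens

console.log(toNumber('$1,234.56'));   // → 1234.56
console.log(toBoolean('Yes'));        // → true
console.log(slug('Hello, World!'));   // → 'hello-world'
```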

Example:

const html = `
  <div class="data">
    <span class="price">$1,234.56</span>
    <span class="status">yes</span>
    <span class="email">Contact: user@example.com</span>
  </div>
`;

const result = easyScrape(html, {
  price: { selector: '.price', convert: helpers.toNumber },
  isActive: { selector: '.status', convert: helpers.toBoolean },
  email: { selector: '.email', convert: helpers.extractEmail }
});
// { price: 1234.56, isActive: true, email: 'user@example.com' }

Presets

Ready-to-use patterns for common scraping tasks:

import { easyScrape, presets } from 'easy-scrape';

const result = easyScrape(html, {
  logo: presets.image('.logo'),
  aboutLink: presets.link('.nav a'),
  description: presets.meta('description'),
  ogImage: presets.ogMeta('image'),
  structuredData: presets.jsonLd()
});

Available Presets

  • presets.link(selector) - Extract href from link
  • presets.image(selector) - Extract src and alt from image
  • presets.meta(name) - Extract meta tag content by name
  • presets.ogMeta(property) - Extract Open Graph meta tag
  • presets.twitterMeta(name) - Extract Twitter Card meta tag
  • presets.jsonLd(selector) - Extract and parse JSON-LD structured data

Example:

const html = `
  <head>
    <meta name="description" content="Page description">
    <meta property="og:title" content="Page Title">
    <script type="application/ld+json">
    {"@type": "Product", "name": "Widget"}
    </script>
  </head>
  <body>
    <img src="/logo.png" alt="Company Logo">
    <a href="/about">About</a>
  </body>
`;

const result = easyScrape(html, {
  logo: presets.image('img'),
  aboutLink: presets.link('a'),
  description: presets.meta('description'),
  ogTitle: presets.ogMeta('title'),
  productData: presets.jsonLd()
});
/*
{
  logo: { src: '/logo.png', alt: 'Company Logo' },
  aboutLink: '/about',
  description: 'Page description',
  ogTitle: 'Page Title',
  productData: { '@type': 'Product', name: 'Widget' }
}
*/

Complex Examples

E-commerce Product Scraping

const result = easyScrape(html, {
  products: {
    listItem: '.product',
    data: {
      id: {
        selector: '',
        how: ($el) => $el.attr('data-id'),
        convert: helpers.toInt
      },
      title: '.title',
      price: {
        selector: '.price',
        convert: helpers.toNumber
      },
      originalPrice: {
        selector: '.original-price',
        convert: helpers.toNumber,
        default: null
      },
      discount: {
        selector: '.discount',
        regex: /(\d+)%/,
        regexGroup: 1,
        convert: helpers.toInt,
        ifExists: '.discount'
      },
      rating: {
        selector: '.rating',
        attr: 'data-rating',
        convert: helpers.toNumber
      },
      inStock: {
        selector: '.stock-status',
        convert: helpers.toBoolean
      },
      images: {
        selector: '.gallery img',
        multiple: true,
        attr: 'src',
        resolveUrl: true
      },
      features: {
        selector: '.features li',
        multiple: true
      }
    }
  }
}, {
  baseUrl: 'https://example.com'
});

Blog Article Extraction

const result = easyScrape(html, {
  article: {
    selector: 'article',
    data: {
      title: 'h1',
      author: {
        selector: '.author',
        regex: /By (.+)/,
        regexGroup: 1
      },
      publishDate: {
        selector: 'time',
        attr: 'datetime',
        convert: helpers.toDate
      },
      readTime: {
        selector: '.read-time',
        regex: /(\d+)/,
        regexGroup: 1,
        convert: helpers.toInt
      },
      tags: {
        selector: '.tag',
        multiple: true
      },
      content: {
        selector: '.article-body',
        html: true
      },
      headings: {
        selector: 'h2, h3',
        multiple: true
      },
      relatedPosts: {
        selector: '.related a',
        map: ($el) => ({
          title: $el.text(),
          url: $el.attr('href')
        }),
        limit: 5
      }
    }
  }
});

Complete E-commerce Page

const result = easyScrape(html, {
  // Meta data
  meta: {
    selector: 'head',
    data: {
      description: presets.meta('description'),
      ogImage: {
        ...presets.ogMeta('image'),
        resolveUrl: true
      },
      structuredData: presets.jsonLd()
    }
  },
  
  // Navigation
  breadcrumb: {
    selector: '.breadcrumb a',
    map: ($el) => ({
      text: $el.text(),
      url: $el.attr('href')
    })
  },
  
  // Product details
  title: '.product-title',
  
  images: {
    selector: '.gallery img',
    multiple: true,
    attr: 'src',
    resolveUrl: true
  },
  
  pricing: {
    selector: '.pricing',
    data: {
      current: {
        selector: '.sale-price',
        convert: helpers.toNumber,
        required: true
      },
      original: {
        selector: '.original-price',
        convert: helpers.toNumber,
        ifExists: '.sale-price'
      },
      savings: {
        selector: '.discount',
        regex: /\$(\d+)/,
        regexGroup: 1,
        convert: helpers.toNumber
      }
    }
  },
  
  availability: {
    selector: '.stock-info',
    data: {
      inStock: {
        selector: '',
        how: ($el) => $el.attr('data-available'),
        convert: helpers.toBoolean
      },
      quantity: {
        selector: '.quantity',
        regex: /(\d+)/,
        regexGroup: 1,
        convert: helpers.toInt
      }
    }
  },
  
  specifications: {
    selector: '.spec-table',
    table: {
      headers: false
    },
    convert: (rows) => {
      const specs = {};
      rows.forEach(([key, value]) => {
        specs[key] = value;
      });
      return specs;
    }
  },
  
  reviews: {
    listItem: '.review',
    data: {
      author: '.author',
      rating: {
        selector: '.rating',
        attr: 'data-rating',
        convert: helpers.toInt
      },
      comment: '.comment',
      date: {
        selector: 'time',
        attr: 'datetime',
        convert: helpers.toDate
      },
      helpful: {
        selector: '.helpful-count',
        regex: /(\d+)/,
        regexGroup: 1,
        convert: helpers.toInt,
        default: 0
      }
    },
    // Only show reviews with 4+ stars
    filter: ($el) => parseInt($el.find('.rating').attr('data-rating')) >= 4,
    limit: 10
  }
}, {
  baseUrl: 'https://shop.example.com'
});

Social Media Profile Scraping

const result = easyScrape(html, {
  profile: {
    selector: '.profile-card',
    data: {
      name: '.profile-name',
      username: {
        selector: '.username',
        regex: /@(.+)/,
        regexGroup: 1
      },
      bio: '.bio',
      avatar: {
        selector: '.avatar',
        attr: 'src',
        resolveUrl: true
      },
      stats: {
        selector: '.stats',
        data: {
          followers: {
            selector: '.followers-count',
            convert: helpers.toNumber
          },
          following: {
            selector: '.following-count',
            convert: helpers.toNumber
          },
          posts: {
            selector: '.posts-count',
            convert: helpers.toNumber
          }
        }
      },
      verified: {
        selector: '.verified-badge',
        how: ($el) => $el.length > 0,
        default: false
      },
      links: {
        selector: '.profile-links a',
        map: ($el) => ({
          text: $el.text(),
          url: $el.attr('href')
        })
      }
    }
  },
  
  posts: {
    listItem: '.post',
    data: {
      id: {
        selector: '',
        how: ($el) => $el.attr('data-post-id')
      },
      content: '.post-content',
      timestamp: {
        selector: 'time',
        attr: 'datetime',
        convert: helpers.toDate
      },
      likes: {
        selector: '.likes-count',
        convert: helpers.toNumber
      },
      comments: {
        selector: '.comments-count',
        convert: helpers.toNumber
      },
      media: {
        selector: '.post-media img',
        multiple: true,
        attr: 'src'
      },
      hashtags: {
        selector: '.hashtag',
        multiple: true,
        unique: true
      }
    },
    limit: 20
  }
});

News Article Aggregation

const result = easyScrape(html, {
  articles: {
    listItem: 'article.news-item',
    data: {
      headline: 'h2',
      summary: '.summary',
      category: {
        selector: '.category',
        convert: (val) => val.trim().toUpperCase()
      },
      author: {
        selector: '.author',
        ifExists: '.author'
      },
      publishDate: {
        selector: 'time',
        attr: 'datetime',
        convert: helpers.toDate
      },
      url: {
        selector: 'a',
        attr: 'href',
        resolveUrl: true
      },
      thumbnail: {
        selector: 'img',
        attr: 'src',
        resolveUrl: true
      },
      readTime: {
        selector: '.read-time',
        regex: /(\d+)/,
        regexGroup: 1,
        convert: helpers.toInt,
        default: null
      },
      isPremium: {
        selector: '.premium-badge',
        how: ($el) => $el.length > 0,
        default: false
      }
    },
    // Filter out old articles (older than 7 days)
    filter: ($el) => {
      const dateStr = $el.find('time').attr('datetime');
      const date = new Date(dateStr);
      const daysDiff = (Date.now() - date.getTime()) / (1000 * 60 * 60 * 24);
      return daysDiff <= 7;
    }
  }
});

Restaurant Menu Scraping

const result = easyScrape(html, {
  restaurant: {
    selector: '.restaurant-info',
    data: {
      name: 'h1',
      cuisine: '.cuisine-type',
      rating: {
        selector: '.rating',
        attr: 'data-rating',
        convert: helpers.toNumber
      },
      priceRange: '.price-range',
      address: '.address',
      phone: {
        selector: '.phone',
        convert: (val) => val.replace(/\D/g, '')
      }
    }
  },
  
  menu: {
    listItem: '.menu-category',
    data: {
      category: '.category-name',
      items: {
        listItem: '.menu-item',
        data: {
          name: '.item-name',
          description: '.item-description',
          price: {
            selector: '.price',
            convert: helpers.toNumber
          },
          calories: {
            selector: '.calories',
            regex: /(\d+)/,
            regexGroup: 1,
            convert: helpers.toInt,
            default: null
          },
          isVegetarian: {
            selector: '.veg-icon',
            how: ($el) => $el.length > 0,
            default: false
          },
          isSpicy: {
            selector: '.spicy-icon',
            how: ($el) => $el.length > 0,
            default: false
          },
          allergens: {
            selector: '.allergen',
            multiple: true,
            default: []
          }
        }
      }
    }
  }
});

Use Cases

Scraping E-commerce Sites

const productData = easyScrape(html, {
  products: {
    listItem: '.product-card',
    data: {
      name: '.product-name',
      price: {
        selector: '.price',
        convert: helpers.toNumber
      },
      rating: {
        selector: '.rating',
        attr: 'data-rating',
        convert: helpers.toNumber
      },
      inStock: {
        selector: '.stock-status',
        convert: helpers.toBoolean
      }
    }
  }
});

Extracting Article Metadata

const article = easyScrape(html, {
  title: 'h1',
  author: '.author-name',
  publishDate: {
    selector: 'time',
    attr: 'datetime',
    convert: helpers.toDate
  },
  tags: {
    selector: '.tag',
    multiple: true
  },
  content: {
    selector: '.article-body',
    html: true
  }
});

Job Listings Scraper

const jobs = easyScrape(html, {
  listings: {
    listItem: '.job-listing',
    data: {
      title: '.job-title',
      company: '.company-name',
      location: '.location',
      salary: {
        selector: '.salary',
        regex: /\$([\d,]+)\s*-\s*\$([\d,]+)/,
        convert: (val) => {
          const match = val.match(/\$([\d,]+)\s*-\s*\$([\d,]+)/);
          if (match) {
            return {
              min: parseInt(match[1].replace(/,/g, '')),
              max: parseInt(match[2].replace(/,/g, ''))
            };
          }
          return null;
        }
      },
      type: '.job-type',
      remote: {
        selector: '.remote-badge',
        how: ($el) => $el.length > 0,
        default: false
      },
      postedDate: {
        selector: '.posted-date',
        attr: 'data-date',
        convert: helpers.toDate
      },
      description: '.job-description',
      requirements: {
        selector: '.requirements li',
        multiple: true
      },
      benefits: {
        selector: '.benefits li',
        multiple: true
      }
    }
  }
});

Real Estate Listings

const properties = easyScrape(html, {
  listings: {
    listItem: '.property-card',
    data: {
      address: '.address',
      price: {
        selector: '.price',
        convert: helpers.toNumber
      },
      bedrooms: {
        selector: '.bedrooms',
        regex: /(\d+)/,
        regexGroup: 1,
        convert: helpers.toInt
      },
      bathrooms: {
        selector: '.bathrooms',
        regex: /(\d+\.?\d*)/,
        regexGroup: 1,
        convert: helpers.toNumber
      },
      sqft: {
        selector: '.sqft',
        convert: helpers.toNumber
      },
      images: {
        selector: '.gallery img',
        multiple: true,
        attr: 'src',
        limit: 10
      },
      features: {
        selector: '.features li',
        multiple: true
      },
      description: '.description',
      listingUrl: {
        selector: 'a.property-link',
        attr: 'href',
        resolveUrl: true
      }
    }
  }
}, {
  baseUrl: 'https://realestate.example.com'
});

TypeScript Support

Easy Scrape includes full TypeScript definitions:

import { easyScrape, ScrapeSchema, ScrapeOptions, helpers } from 'easy-scrape';

interface Product {
  title: string;
  price: number;
  inStock: boolean;
}

const schema: ScrapeSchema = {
  products: {
    listItem: '.product',
    data: {
      title: '.title',
      price: {
        selector: '.price',
        convert: helpers.toNumber
      },
      inStock: {
        selector: '.stock',
        convert: helpers.toBoolean
      }
    }
  }
};

const result = easyScrape(html, schema);
const products: Product[] = result.products;

Type-safe Options

import { ScrapeOptions, HowFunction, ConvertFunction } from 'easy-scrape';

const customHow: HowFunction = ($el) => $el.attr('data-id');

const customConvert: ConvertFunction = (val) => parseInt(val);

const options: ScrapeOptions = {
  selector: '.item',
  how: customHow,
  convert: customConvert,
  multiple: true,
  filter: ($el) => $el.hasClass('active'),
  validate: (val) => val > 0,
  required: true
};

Error Handling Best Practices

1. Use Default Values for Optional Fields

const result = easyScrape(html, {
  optionalField: {
    selector: '.optional',
    default: 'N/A'
  },
  optionalNumber: {
    selector: '.number',
    convert: helpers.toNumber,
    default: 0
  }
});

2. Use Strict Mode for Required Fields

const result = easyScrape(html, {
  requiredField: {
    selector: '.required',
    strict: true  // Throws if missing
  }
});

3. Use Required Flag with Validation

const result = easyScrape(html, {
  email: {
    selector: '.email',
    required: true,
    validate: (val) => val.includes('@')
  }
});

4. Wrap in Try-Catch for Production

try {
  const result = easyScrape(html, schema, { baseUrl });
  // Process result
} catch (error) {
  console.error('Scraping failed:', error.message);
  // Handle error (log, retry, fallback)
}

5. Use Conditional Extraction

const result = easyScrape(html, {
  salePrice: {
    selector: '.sale-price',
    ifExists: '.on-sale',  // Only extract if on sale
    convert: helpers.toNumber
  },
  regularPrice: {
    selector: '.regular-price',
    ifNotExists: '.sale-price',  // Only if no sale
    convert: helpers.toNumber
  }
});

Performance Tips

1. Use Specific Selectors

More specific selectors are faster:

// Slower
const result = easyScrape(html, { title: 'div span' });

// Faster
const result = easyScrape(html, { title: '.product-title' });

2. Avoid Deep Nesting

Flatten your data structure when possible:

// Less efficient
const result = easyScrape(html, {
  level1: {
    selector: '.level1',
    data: {
      level2: {
        selector: '.level2',
        data: {
          level3: '.level3'
        }
      }
    }
  }
});

// More efficient
const result = easyScrape(html, {
  level3: '.level1 .level2 .level3'
});

3. Use multiple Instead of map for Simple Text

// Less efficient (creates function overhead)
const result = easyScrape(html, {
  tags: {
    selector: '.tag',
    map: ($el) => $el.text()
  }
});

// More efficient
const result = easyScrape(html, {
  tags: {
    selector: '.tag',
    multiple: true
  }
});

4. Cache Cheerio Instances

Reuse parsed HTML for multiple extractions:

import * as cheerio from 'cheerio';

const $ = cheerio.load(html);

const result1 = easyScrape($, schema1);
const result2 = easyScrape($, schema2);

5. Use Array Operations Wisely

Apply filters early to reduce processing:

// Process less data
const result = easyScrape(html, {
  items: {
    selector: '.item',
    filter: ($el) => $el.hasClass('active'),
    multiple: true,
    limit: 10
  }
});

Migration Guide

If you're migrating from other scraping libraries:

From Cheerio Direct Usage

Before:

const $ = cheerio.load(html);
const products = [];
$('.product').each((i, el) => {
  products.push({
    title: $(el).find('.title').text(),
    price: parseFloat($(el).find('.price').text().replace('$', ''))
  });
});

After:

const result = easyScrape(html, {
  products: {
    listItem: '.product',
    data: {
      title: '.title',
      price: {
        selector: '.price',
        convert: helpers.toNumber
      }
    }
  }
});

From scrape-it

Easy Scrape is inspired by scrape-it and offers a similar API with enhanced features:

// scrape-it style (still works!)
const result = easyScrape(html, {
  title: '.title',
  price: {
    selector: '.price',
    convert: x => parseFloat(x)
  }
});

// Enhanced with new features
const result = easyScrape(html, {
  title: '.title',
  price: {
    selector: '.price',
    convert: helpers.toNumber,
    required: true,
    validate: (val) => val > 0
  }
});

Common Patterns

Extract Links with Text

const links = easyScrape(html, {
  navigation: {
    selector: 'nav a',
    map: ($el) => ({
      text: $el.text(),
      href: $el.attr('href')
    })
  }
});

Extract Meta Tags

const meta = easyScrape(html, {
  title: 'title',
  description: presets.meta('description'),
  keywords: {
    ...presets.meta('keywords'),
    convert: (val) => val.split(',').map(k => k.trim())
  },
  ogTitle: presets.ogMeta('title'),
  ogImage: presets.ogMeta('image')
});
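The `keywords` convert above is ordinary string handling and can be verified on its own, without parsing any HTML (a standalone sketch; `splitKeywords` is an illustrative name):

```javascript
// Same logic as the keywords convert above: split a comma-separated
// meta value into an array of trimmed keyword strings.
const splitKeywords = (val) => val.split(',').map(k => k.trim());

console.log(splitKeywords('scraping, cheerio , html'));
// ['scraping', 'cheerio', 'html']
```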

Extract Breadcrumbs

const breadcrumbs = easyScrape(html, {
  trail: {
    selector: '.breadcrumb a',
    map: ($el, $, index) => ({
      position: index + 1,
      name: $el.text(),
      url: $el.attr('href')
    })
  }
});

Extract Pagination Info

const pagination = easyScrape(html, {
  currentPage: {
    selector: '.pagination .active',
    convert: helpers.toInt
  },
  totalPages: {
    selector: '.pagination a:last',
    convert: helpers.toInt
  },
  nextPage: {
    selector: '.pagination .next',
    attr: 'href',
    default: null
  }
});

Debugging Tips

1. Test Selectors Separately

// Test each selector individually
const test = easyScrape(html, {
  test1: '.selector1',
  test2: '.selector2'
});
console.log(test);

2. Use Default Values During Development

const result = easyScrape(html, {
  field: {
    selector: '.test',
    default: 'DEBUG: Not found'  // Makes missing fields obvious
  }
});

3. Log Transform Steps

const result = easyScrape(html, {
  price: {
    selector: '.price',
    transform: [
      (val) => { console.log('1:', val); return val; },
      (val) => val.replace('$', ''),
      (val) => { console.log('2:', val); return val; },
      (val) => parseFloat(val)
    ]
  }
});
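A transform array like this amounts to left-to-right function composition: each function receives the previous one's output. You can reproduce the same pipeline in plain JavaScript to check each step without running the scraper (an illustrative sketch of the behavior, not library internals):

```javascript
// Apply an array of transform functions in order, as a transform
// pipeline does: each function receives the previous result.
const transforms = [
  (val) => val.replace('$', ''),
  (val) => parseFloat(val)
];

const applyTransforms = (val, fns) => fns.reduce((acc, fn) => fn(acc), val);

console.log(applyTransforms('$12.99', transforms)); // 12.99
```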

4. Inspect Extracted HTML

const result = easyScrape(html, {
  debug: {
    selector: '.target',
    outerHtml: true  // See the actual HTML
  }
});
console.log(result.debug);

FAQ

Q: How do I extract data from a specific element without a selector?

Use an empty selector with context:

{
  id: {
    selector: '',  // Use context element
    how: ($el) => $el.attr('data-id')
  }
}

Q: Can I use custom Cheerio methods?

Yes, via the how function:

{
  custom: {
    selector: '.item',
    how: ($el) => $el.prev().text()  // Any Cheerio method
  }
}

Q: How do I handle missing nested elements?

Use default or ifExists:

{
  optional: {
    selector: '.nested .deep',
    default: null,
    ifExists: '.nested'
  }
}

Q: Can I extract data from multiple pages?

Yes, fetch and scrape each page:

const results = [];
for (const url of urls) {
  const html = await fetch(url).then(r => r.text());
  const data = easyScrape(html, schema);
  results.push(data);
}

Q: How do I handle dynamic content (JavaScript-rendered)?

Easy Scrape works with static HTML. For JavaScript-rendered content, use tools like Puppeteer or Playwright to get the HTML first:

import puppeteer from 'puppeteer';
import { easyScrape } from 'easy-scrape';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const html = await page.content();
await browser.close();

const result = easyScrape(html, schema);

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT

Credits

Built on top of Cheerio - Fast, flexible & lean implementation of core jQuery designed specifically for the server.

Inspired by scrape-it - A Node.js scraper for humans.