
ai-scraper-fallback

v0.0.6

A robust HTML-to-JSON scraper for real estate websites, using Google's Gemini API to extract structured data from complex web pages.

Downloads

292

AI Scraper Fallback 🤖

Resilient Web Scraping with LLM-powered Fault Tolerance.

Never let a website layout change break your production scraper again. ai-scraper-fallback provides a smart safety net for your data extraction pipelines using Google Gemini AI.


💡 Why use this?

Use this when your CSS selectors fail because a website changed its layout.

Traditional scrapers (Cheerio, Puppeteer, Playwright) are fast but fragile. When they break, your data pipeline stops. ai-scraper-fallback acts as a self-healing layer:

  1. Try your traditional scraper first (Fast & Cheap).
  2. If it fails (no data found), trigger ai-scraper-fallback (Smart & Resilient).
  3. Extract data successfully even if the HTML structure has completely changed.
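
The three steps above can be sketched as a small wrapper. This is a minimal illustration, not part of the package: `withFallback` and `isEmpty` are hypothetical helpers, and "no data found" is assumed to mean a thrown error, a null result, or an empty array.

```javascript
// Hypothetical helpers: neither function is exported by ai-scraper-fallback.

// Treat null/undefined and empty arrays as "no data found" (an assumption).
function isEmpty(result) {
  return result == null || (Array.isArray(result) && result.length === 0);
}

// Run the fast scraper first; call the fallback only if the fast one
// throws or returns nothing. `primary` and `fallback` are async functions.
async function withFallback(primary, fallback) {
  try {
    const result = await primary();
    if (!isEmpty(result)) {
      return { source: 'primary', data: result };
    }
  } catch (err) {
    // A broken selector often surfaces as an exception; fall through.
  }
  return { source: 'fallback', data: await fallback() };
}
```

In a real pipeline, `primary` would wrap your existing Cheerio or Puppeteer code and `fallback` would be something like `() => scrapeWithAI(html, schema)`. Returning `source` alongside the data makes it easy to log how often the paid fallback is actually used.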

🚀 Quick Start

const { scrapeWithAI } = require('ai-scraper-fallback');

// 1. Define what you want to extract (JSON Schema)
const schema = {
  type: "array",
  items: {
    type: "object",
    properties: {
      title: { type: "string" },
      price: { type: "number" },
      link: { type: "string" }
    }
  }
};

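async function run() {
  const html = "<html>...your messy HTML...</html>";

  // 2. Extract structured data that matches the schema
  const results = await scrapeWithAI(html, schema, "YOUR_GEMINI_API_KEY");
  console.log(results);
}

// 3. Run it and surface any errors (network failure, invalid API key, etc.)
run().catch(console.error);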

🛠️ API Reference

scrapeWithAI(html, schema, apiKey, customPrompt)

The primary generic function for any data extraction task.

| Argument | Type | Description |
| :--- | :--- | :--- |
| html | string | The raw HTML source code to analyze. |
| schema | object | A JSON Schema defining the structure you want. |
| apiKey | string | (Optional) Your Gemini API Key. |
| customPrompt | string | (Optional) Additional instructions for the AI. |

scrapeHouses(html, context, apiKey)

A pre-built shortcut for real estate listings.

  • Context: Describe the source (e.g., 'Yungching', '591') to improve accuracy.
  • Output: Returns an array of objects containing title, address, price, description, link.
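
If you need fields beyond these five, a comparable schema can be passed to `scrapeWithAI` instead. The sketch below only mirrors the field names listed above; the types are assumptions (the README does not state them), and `houseSchema` is a hypothetical name, not something the package exports.

```javascript
// Assumed shape: field names come from the scrapeHouses description above,
// but every type here is a guess, not the package's actual internal schema.
const houseSchema = {
  type: 'array',
  items: {
    type: 'object',
    properties: {
      title: { type: 'string' },
      address: { type: 'string' },
      price: { type: 'number' }, // assumed numeric, as in the Quick Start schema
      description: { type: 'string' },
      link: { type: 'string' }
    }
  }
};
```

Extending `properties` (for example with a floor-area field) is then a one-line change, at the cost of losing whatever site-specific prompting `scrapeHouses` applies through its `context` argument.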

⚙️ Configuration

Instead of passing the API key every time, you can set it as an environment variable:

# .env file
GEMINI_API_KEY=your_actual_api_key_here
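
Note that Node.js does not read a `.env` file on its own; the variable has to reach `process.env` through something like the `dotenv` package or `node --env-file=.env` (Node 20.6 and later). The lookup order implied above (explicit argument first, then the environment) can be sketched as follows. `resolveApiKey` is a hypothetical helper for illustration and is not claimed to be how the package resolves the key internally.

```javascript
// Hypothetical helper, not part of ai-scraper-fallback.
// Prefers an explicitly passed key, then falls back to GEMINI_API_KEY.
// `env` defaults to process.env and is a parameter only to ease testing.
function resolveApiKey(explicitKey, env = process.env) {
  const key = explicitKey || env.GEMINI_API_KEY;
  if (!key) {
    throw new Error(
      'No Gemini API key found: pass one explicitly or set GEMINI_API_KEY'
    );
  }
  return key;
}
```

Failing fast with a clear message here avoids a harder-to-read authentication error from the API later on.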

🌏 Multilingual Introduction

🇺🇸 English

Stop fighting fragile CSS selectors. This package implements a Self-healing Scraper pattern. Use your traditional fast/cheap scrapers for daily tasks, and automatically trigger this AI-driven engine when structural changes occur. It "reads" and understands the page just like a human.

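🇹🇼 Traditional Chinese

Stop pulling all-nighters to fix bugs caused by fragile CSS selectors. This package implements the "Self-healing Scraper" pattern. Keep your high-performance traditional scraper for everyday use; once a site redesign is detected and the data stops coming through, the system automatically switches to the Gemini AI engine, which "reads" the page like a human and accurately recovers the structured data.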

🇮🇩 Indonesian (Susi, this one is for you!)

Stop fixing code that breaks easily. This package uses a "self-healing" system. If a website's layout changes, the AI (Gemini) automatically helps retrieve the data so the program does not go down. Very smart and robust!


✨ Key Features

  • 🛡️ Resilient Scraping: Automatically handles website structural changes.
  • 🧠 Semantic Understanding: Extracts data based on meaning, not just tags.
  • ⚡ LLM-powered Fault Tolerance: A cost-effective safety net for your existing scrapers.
  • 📦 Zero-config Extraction: No complex setup, just provide HTML and get JSON.
  • 🔥 Powered by Gemini: Leveraging gemini-2.0-flash for state-of-the-art speed.

🔧 Relationship with other tools

| Feature | Traditional (Cheerio/Puppeteer) | AI Scraper Fallback |
| :--- | :--- | :--- |
| Speed | ⚡ Extremely Fast | 🐢 Slow (LLM Latency) |
| Cost | 💸 Near Zero | 💰 LLM Token Cost |
| Reliability | 📉 Low (Breaks on CSS changes) | 📈 High (Understands semantics) |


💻 Installation

npm install ai-scraper-fallback

📄 License

MIT