page-content-for-ai

v1.0.0


Extract web page content in a format optimized for AI/LLM consumption. Converts HTML to semantic markdown with enhanced information about forms, buttons, tables, and interactive elements.

Features

Semantic Extraction

  • Preserves form inputs with their current values and states
  • Captures button states (expanded/collapsed/disabled)
  • Converts HTML and ARIA tables to markdown tables
  • Identifies semantic sections (header, nav, main, footer)

🎯 AI-Optimized

  • Clean markdown output perfect for LLM context
  • Captures data-testid and component metadata
  • Includes page metadata (title, URL, language, viewport)
  • Tracks active input and scroll position

🚀 Modern Web Support

  • Handles ARIA roles (role="table", role="button", etc.)
  • Supports React/Vue/modern framework patterns
  • Works in the browser and in Node.js (with a DOM implementation such as JSDOM)
  • TypeScript support with full type definitions

Installation

npm install page-content-for-ai

Usage

Browser Environment

import { extractPageContent } from 'page-content-for-ai';

// Extract current page content
const content = extractPageContent(document.body, document, window);

console.log(content.title);          // "Example Page"
console.log(content.url);            // "https://example.com"
console.log(content.language);       // "en"
console.log(content.scrollPosition); // "25%"
console.log(content.content);        // Markdown representation

// Send to AI
const prompt = `Based on this page content, help the user:\n\n${content.content}`;
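On long pages the content string can be larger than a model's context window. The sketch below builds a prompt within a character budget. buildPrompt, PageContentLike and the truncation strategy are hypothetical and are not part of page-content-for-ai. Only the title, url and content fields come from the documented PageContent shape.

```typescript
// Only the PageContent fields this helper needs.
interface PageContentLike {
  title: string;
  url: string;
  content: string;
}

// Hypothetical helper, not exported by page-content-for-ai: prepends page
// metadata, appends the user's request, and truncates the markdown body so the
// whole prompt is at most maxChars characters long.
function buildPrompt(page: PageContentLike, request: string, maxChars = 8000): string {
  const header = `Page: ${page.title}\nURL: ${page.url}\n\n`;
  const footer = `\n\nUser request: ${request}`;
  const marker = "\n[...truncated]";
  const budget = maxChars - header.length - footer.length;
  if (budget <= 0) {
    throw new Error("maxChars is too small for the header and request");
  }

  let body = page.content;
  if (body.length > budget) {
    // Keep the beginning of the page; add the marker only if it fits.
    body =
      budget > marker.length
        ? body.slice(0, budget - marker.length) + marker
        : body.slice(0, budget);
  }
  return header + body + footer;
}
```

Character counts are only a rough proxy for tokens. When the model's limit is tight, a tokenizer-based budget is more precise.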

With Options

import { extractPageContent } from 'page-content-for-ai';

const content = extractPageContent(document.body, document, window, {
  includeFormData: true,   // Capture input values and states (default: true)
  includeTables: true,     // Convert tables to markdown (default: true)
  includeMetadata: true,   // Add data-testid attributes (default: true)
});

Browser Extension Example

// In your content script (bundle it: content scripts declared in manifest.json cannot use import statements directly)
import { extractPageContent } from 'page-content-for-ai';

chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.action === 'extractContent') {
    const content = extractPageContent(document.body, document, window);
    sendResponse(content);
  }
});
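In TypeScript the listener's request argument is untyped. A small type guard can narrow it before the content script calls extractPageContent. This is a sketch: ExtractContentRequest and isExtractContentRequest are hypothetical names, and only the 'extractContent' action string comes from the example above.

```typescript
// Hypothetical message type for the listener above. Only the "extractContent"
// action string comes from the example; the names here are illustrative.
interface ExtractContentRequest {
  action: "extractContent";
}

// Narrows an untyped runtime message before the content script acts on it.
function isExtractContentRequest(message: unknown): message is ExtractContentRequest {
  return (
    typeof message === "object" &&
    message !== null &&
    (message as { action?: unknown }).action === "extractContent"
  );
}
```

Inside the listener, if (isExtractContentRequest(request)) { ... } replaces the request.action check. On the sending side, a popup or background script would typically call chrome.tabs.sendMessage(tabId, { action: 'extractContent' }).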

Custom Turndown Service

import { extractPageContent } from 'page-content-for-ai';
import TurndownService from 'turndown';

// Customize markdown conversion
const customTurndown = new TurndownService({
  headingStyle: 'setext',
  hr: '---',
});

const content = extractPageContent(document.body, document, window, {
  turndownService: customTurndown,
});

Output Format

The extracted content includes:

interface PageContent {
  title: string;           // Page title
  url: string;             // Current URL
  description: string;     // Meta description
  viewport: string;        // Viewport size (e.g., "1920x1080")
  language: string;        // Page language
  scrollPosition: string;  // Scroll percentage
  activeInput?: string;    // Currently focused input (if any)
  content: string;         // Markdown content
}
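Several PageContent fields are formatted strings rather than numbers. The comments above give the formats: viewport as "1920x1080" and scrollPosition as a percentage such as "25%". Assuming those formats, the hypothetical helpers below convert the fields to numbers.

```typescript
// Hypothetical helpers: convert the viewport and scrollPosition strings of
// PageContent to numbers, returning null when a string does not match.
interface ViewportSize {
  width: number;
  height: number;
}

function parseViewport(viewport: string): ViewportSize | null {
  const match = /^(\d+)x(\d+)$/.exec(viewport.trim());
  return match ? { width: Number(match[1]), height: Number(match[2]) } : null;
}

function parseScrollPercent(scrollPosition: string): number | null {
  const match = /^(\d+(?:\.\d+)?)%$/.exec(scrollPosition.trim());
  return match ? Number(match[1]) : null;
}
```

Returning null instead of throwing lets callers fall back gracefully if a page reports an unexpected value.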

Example Markdown Output

--- HEADER ---
[Homepage](/)
[BUTTON: Menu | collapsed]
--- END HEADER ---

--- MAIN ---
# Welcome to Example Page

[INPUT: Email address | type: email | required]
[INPUT: Password | type: password]
[BUTTON: Sign In]

**Table: User Data**
| Name | Status | Actions |
| --- | --- | --- |
| John Doe | Active | Edit |
| Jane Smith | Pending | Edit |
--- END MAIN ---

--- FOOTER ---
[Privacy Policy](/privacy)
© 2025 Example Inc.
--- END FOOTER ---
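The output keeps each link's original href. In the example above those hrefs are relative (/ and /privacy). The sketch below collects the links and resolves them against PageContent.url with the standard URL constructor. extractLinks and PageLink are hypothetical names, and the pattern assumes link text contains no "]".

```typescript
interface PageLink {
  text: string;
  href: string;
}

// Hypothetical helper: collects markdown links such as "[Privacy Policy](/privacy)"
// and resolves relative hrefs against the page URL (PageContent.url).
// Annotations like "[BUTTON: Menu | collapsed]" are skipped because no "("
// follows the closing bracket.
function extractLinks(markdown: string, pageUrl: string): PageLink[] {
  const links: PageLink[] = [];
  const pattern = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    let href = match[2];
    try {
      href = new URL(href, pageUrl).href;
    } catch {
      // Keep the raw href if it cannot be resolved (e.g. an invalid base URL).
    }
    links.push({ text: match[1], href });
  }
  return links;
}
```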

Features in Detail

Form Extraction

Captures form inputs with their current state:

[INPUT: Search query | type: search | value: "example"]
[INPUT: Email | type: email | required]
[x] Remember me
[ ] Send notifications
[SELECT: Country | selected: "United States"]
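The annotations follow a regular [KIND: label | key: value | flag] pattern, so downstream code can recover structured form state from the markdown. Below is a minimal sketch that assumes labels and values never contain "|" or "]". parseFieldAnnotations, parseCheckboxes and their result types are hypothetical and are not exports of the package.

```typescript
// Hypothetical parsers for the form annotations above; not part of the package.
// Note: parseFieldAnnotations also matches BUTTON annotations; filter on kind.
interface FieldAnnotation {
  kind: string;                       // e.g. "INPUT" or "SELECT"
  label: string;
  attributes: Record<string, string>; // "key: value" parts, surrounding quotes removed
  flags: string[];                    // bare parts such as "required"
}

interface Checkbox {
  label: string;
  checked: boolean;
}

function parseFieldAnnotations(markdown: string): FieldAnnotation[] {
  const fields: FieldAnnotation[] = [];
  const pattern = /\[([A-Z]+): ([^\]]*)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const [label, ...parts] = match[2].split("|").map((p) => p.trim());
    const attributes: Record<string, string> = {};
    const flags: string[] = [];
    for (const part of parts) {
      const colon = part.indexOf(":");
      if (colon === -1) {
        flags.push(part);
      } else {
        const key = part.slice(0, colon).trim();
        attributes[key] = part.slice(colon + 1).trim().replace(/^"(.*)"$/, "$1");
      }
    }
    fields.push({ kind: match[1], label, attributes, flags });
  }
  return fields;
}

// "[x] Remember me" -> checked, "[ ] Send notifications" -> unchecked.
function parseCheckboxes(markdown: string): Checkbox[] {
  const boxes: Checkbox[] = [];
  const pattern = /^\[([ x])\] (.+)$/gm;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    boxes.push({ label: match[2].trim(), checked: match[1] === "x" });
  }
  return boxes;
}
```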

Button States

Enhanced button information:

[BUTTON: Submit]
[BUTTON: Menu | collapsed]
[BUTTON: Save | disabled]
[BUTTON: Language Selector | role: combobox]
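Button states map naturally onto booleans. The sketch below is a hypothetical parseButtons helper for the annotations above. It treats "collapsed" and "expanded" as the expanded state, "disabled" as a flag, and "role: x" as an explicit role. Any other state words the library may emit are ignored here.

```typescript
interface ButtonState {
  label: string;
  disabled: boolean;
  expanded: boolean | null; // null when neither "expanded" nor "collapsed" appears
  role: string | null;      // from "role: ..." when present
}

// Hypothetical parser for [BUTTON: label | state | role: x] annotations.
// Assumes labels contain no "|" or "]"; unrecognised parts are ignored.
function parseButtons(markdown: string): ButtonState[] {
  const buttons: ButtonState[] = [];
  const pattern = /\[BUTTON: ([^\]]*)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const [label, ...parts] = match[1].split("|").map((p) => p.trim());
    const state: ButtonState = { label, disabled: false, expanded: null, role: null };
    for (const part of parts) {
      if (part === "disabled") state.disabled = true;
      else if (part === "expanded") state.expanded = true;
      else if (part === "collapsed") state.expanded = false;
      else if (part.startsWith("role:")) state.role = part.slice("role:".length).trim();
    }
    buttons.push(state);
  }
  return buttons;
}
```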

Table Support

Both HTML and ARIA tables:

**Table: Monthly Sales**
| Month | Revenue | Growth |
| --- | --- | --- |
| January | $50,000 | 5% |
| February | $55,000 | 10% |
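Tables are emitted as standard pipe tables with a header row and a --- separator row, so they can be turned back into records. Below is a sketch of a hypothetical parseMarkdownTable helper. It expects the text of a single table and assumes cells contain no escaped "|" characters.

```typescript
// Hypothetical helper: converts a pipe table from the extracted markdown into
// one record per data row, keyed by the header cells. Lines that do not start
// with "|" (such as the "**Table: ...**" caption) are skipped.
function parseMarkdownTable(markdown: string): Record<string, string>[] {
  const rows = markdown
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("|"));
  if (rows.length < 2) return []; // need at least a header and a separator row

  const splitCells = (line: string): string[] =>
    line
      .replace(/^\|/, "")
      .replace(/\|$/, "")
      .split("|")
      .map((cell) => cell.trim());

  const headers = splitCells(rows[0]);
  // rows[1] is the "| --- | --- |" separator and carries no data.
  return rows.slice(2).map((line) => {
    const cells = splitCells(line);
    const record: Record<string, string> = {};
    headers.forEach((header, i) => {
      record[header] = cells[i] ?? "";
    });
    return record;
  });
}
```

With the Monthly Sales table above, this yields one record per month, e.g. { Month: "January", Revenue: "$50,000", Growth: "5%" }.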

Semantic Sections

Clear section boundaries:

--- NAVIGATION (Main Menu) ---
[Home](/) [About](/about) [Contact](/contact)
--- END NAVIGATION ---

--- MAIN ---
Page content here...
--- END MAIN ---
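The --- NAME --- and --- END NAME --- markers make it easy to split the output into sections, for example to send only MAIN to a model. Below is a sketch of a hypothetical splitSections helper. It assumes section names are uppercase, that the optional label appears in parentheses as in NAVIGATION (Main Menu), and that sections are not nested.

```typescript
interface Section {
  name: string;   // e.g. "NAVIGATION", "MAIN"
  label?: string; // e.g. "Main Menu" from "--- NAVIGATION (Main Menu) ---"
  body: string;   // text between the start and end markers, trimmed
}

// Hypothetical helper: splits extracted markdown on "--- NAME ---" and
// "--- END NAME ---" markers. Text outside any section is ignored.
function splitSections(markdown: string): Section[] {
  const sections: Section[] = [];
  let current: { name: string; label?: string; lines: string[] } | null = null;

  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (current) {
      const end = /^--- END ([A-Z]+) ---$/.exec(trimmed);
      if (end && end[1] === current.name) {
        sections.push({ name: current.name, label: current.label, body: current.lines.join("\n").trim() });
        current = null;
      } else {
        current.lines.push(line);
      }
      continue;
    }
    const start = /^--- ([A-Z]+)(?: \((.+)\))? ---$/.exec(trimmed);
    if (start) {
      current = { name: start[1], label: start[2], lines: [] };
    }
  }
  return sections;
}
```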

Use Cases

  • 🤖 AI Chatbots - Provide page context to conversational AI
  • 🔧 Browser Extensions - Extract content for AI-powered tools
  • 📊 Web Scraping - Get clean, structured content for analysis
  • 🧪 Testing - Verify page content in a readable format
  • 📱 Mobile Apps - Parse web content for in-app AI features

Comparison with Alternatives

| Feature | page-content-for-ai | Mozilla Readability | Turndown | Cheerio |
|---------|---------------------|---------------------|----------|---------|
| Form State | ✅ | ❌ | ❌ | ❌ |
| Button States | ✅ | ❌ | ❌ | ❌ |
| ARIA Tables | ✅ | ❌ | ❌ | ❌ |
| Semantic Sections | ✅ | ✅ | ❌ | ❌ |
| Metadata | ✅ | Limited | ❌ | ❌ |
| AI-Optimized | ✅ | Partial | ❌ | ❌ |
| TypeScript | ✅ | ✅ | ✅ | ✅ |

Browser Compatibility

Works in all modern browsers:

  • Chrome/Edge 90+
  • Firefox 88+
  • Safari 14+

Node.js Usage

For server-side usage with JSDOM:

import { JSDOM } from 'jsdom';
import { extractPageContent } from 'page-content-for-ai';

const html = '<html>...</html>';
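const dom = new JSDOM(html, { url: 'https://example.com/' }); // the url option sets document.URL and location (otherwise "about:blank")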

const content = extractPageContent(
  dom.window.document.body,
  dom.window.document,
  dom.window as any
);

API Reference

extractPageContent(body, document, window, options?)

Extract page content as a structured object.

Parameters:

  • body: HTMLElement - The HTML element to extract (usually document.body)
  • document: Document - The document object
  • window: Window - The window object
  • options?: PageContentOptions - Configuration options

Returns: PageContent

extractPageContentAsToml(body, document, window, options?)

Legacy TOML format output (deprecated).

Returns: string - TOML formatted content

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT © Trung Kien Dang

Acknowledgments

  • Built on top of Turndown for HTML to Markdown conversion
  • Inspired by Mozilla Readability for clean content extraction
  • Designed for modern AI/LLM applications