npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@agent-infra/browser-context

v0.2.2

Published

get browser context for AI Agent

Readme

Browser Context

A powerful web content extraction library that extracts clean, readable content from web pages and converts it to Markdown format. Built for browser automation and content processing workflows.

Features

  • Smart Content Extraction: Uses advanced algorithms (Defuddle + Mozilla Readability) to extract main content from web pages
  • HTML to Markdown Conversion: Clean conversion with support for GitHub Flavored Markdown (GFM)
  • Browser Integration: Works seamlessly with Puppeteer and other browser automation tools
  • Fallback Strategy: Automatically falls back to Readability when primary extraction fails
  • Customizable: Configurable tag removal and conversion options

Installation

pnpm install @agent-infra/browser-context

Usage

Basic Content Extraction

Extract clean content from a web page using Puppeteer:

import { extractContent } from '@agent-infra/browser-context';
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com/article');

// Extract content as Markdown
const result = await extractContent(page);
console.log(result.title); // Article title
console.log(result.content); // Clean Markdown content

await browser.close();

Manual Content Extraction

Extract content from HTML strings:

import {
  extractWithDefuddle,
  extractWithReadability,
} from '@agent-infra/browser-context';

// Using Defuddle (primary method)
const result1 = await extractWithDefuddle(htmlString, url, {
  markdown: true,
});

// Using Readability (fallback method)
const result2 = await extractWithReadability(page, {
  markdown: true,
});

HTML to Markdown Conversion

Convert HTML content to Markdown:

import { toMarkdown } from '@agent-infra/browser-context';

const html = '<h1>Title</h1><p>Content with <strong>bold</strong> text</p>';
const markdown = toMarkdown(html, {
  gfmExtension: true, // Enable GitHub Flavored Markdown
  codeBlockStyle: 'fenced', // Use fenced code blocks
  headingStyle: 'atx', // Use # style headings
});

console.log(markdown);
// # Title
//
// Content with **bold** text

Advanced HTML to Markdown Options

import {
  toMarkdown,
  DEFAULT_TAGS_TO_REMOVE,
} from '@agent-infra/browser-context';

const options = {
  gfmExtension: true,
  codeBlockStyle: 'fenced' as const,
  headingStyle: 'atx' as const,
  emDelimiter: '*',
  strongDelimiter: '**',
  removeTags: [...DEFAULT_TAGS_TO_REMOVE, 'footer', 'nav'], // Remove additional tags
};

const markdown = toMarkdown(htmlContent, options);

API Reference

extractContent(page: Page)

Main extraction function that automatically tries Defuddle first, then falls back to Readability.

Parameters:

  • page: Puppeteer page instance

Returns:

  • Promise<{title: string, content: string}>: Extracted title and Markdown content

extractWithDefuddle(html: string, url: string, options: DefuddleOptions)

Extract content using the Defuddle library.

Parameters:

  • html: HTML content string
  • url: Page URL
  • options: Defuddle configuration options

extractWithReadability(page: Page, options?)

Extract content using Mozilla's Readability algorithm.

Parameters:

  • page: Puppeteer page instance
  • options.markdown: Whether to convert to Markdown (default: false)

toMarkdown(html: string, options?: ToMarkdownOptions)

Convert HTML to Markdown format.

Parameters:

  • html: HTML content string
  • options: Conversion options

ToMarkdownOptions:

  • gfmExtension: Enable GitHub Flavored Markdown (default: true)
  • removeTags: Array of HTML tags to remove
  • codeBlockStyle: 'indented' | 'fenced'
  • headingStyle: 'setext' | 'atx'
  • emDelimiter: Emphasis delimiter
  • strongDelimiter: Strong emphasis delimiter

Content Extraction Strategy

The library uses a smart two-tier extraction strategy:

  1. Primary: Defuddle library for accurate content extraction
  2. Fallback: Mozilla Readability algorithm when Defuddle fails

This ensures maximum compatibility and extraction success across different website structures.

Removed HTML Elements

By default, the following HTML elements are removed during Markdown conversion:

  • script, style, link, head
  • iframe, video, audio, canvas
  • object, embed, noscript
  • aside, dialog

You can customize this list using the removeTags option.

Browser Compatibility

This library is designed to work with:

  • Puppeteer
  • Playwright
  • Any browser automation tool that provides a Page-like interface

License

Apache License 2.0.

Credits

Special thanks to the open source projects that inspired this toolkit:

  • readability - A standalone version of the readability lib