@nicolaygerold/pluck

v0.1.7

LLM-native HTML selector library. Extract data with XPath and CSS.

pluck

LLM-native HTML selector library. Extract data from HTML with XPath and CSS selectors.

bun add pluck

Quick Start

import { pluck } from "pluck";

const html = `
  <div class="product">
    <h1>Wireless Headphones</h1>
    <span class="price">$99.99</span>
    <a href="/buy/123">Buy now</a>
  </div>
`;

const doc = pluck(html);

// Extract text
const title = doc.css("h1::text").get();
// → "Wireless Headphones"

// Extract attribute
const link = doc.xpath("//a/@href").get();
// → "/buy/123"

// Check if element exists
if (doc.css(".discount").ok) {
  console.log("On sale!");
}

API Reference

Entry Point

import { pluck } from "pluck";

const doc = pluck(html);                       // Parse HTML string
const debugDoc = pluck(html, { debug: true }); // Same, with debug logging enabled

Query Methods

doc.css("selector")      // CSS selector
doc.xpath("//expression") // XPath expression

Both return a Selector that can be chained.

Extraction

.get()              // First match or null
.get("default")     // First match or default value
.getall()           // All matches as string[]

.text()             // Combined text content
.attr("name")       // Attribute value or null
.html()             // Inner HTML or null

Feedback

.ok                 // true if selector matched anything
.count              // Number of matches
.selector           // The selector string used

.result()           // { ok: true, value, count } or { ok: false, selector }
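
The two shapes listed for `.result()` map naturally onto a TypeScript discriminated union. The sketch below shows how a caller might type and consume such an object. `PluckResult` and `describeResult` are hypothetical names rather than exports of the library, and `value` is assumed to be a string.

```typescript
// Hypothetical typing for the object returned by .result(). The field names
// come from the reference above; the field types are assumptions.
type PluckResult =
  | { ok: true; value: string; count: number }
  | { ok: false; selector: string };

// Narrowing on `ok` exposes only the fields of the matching branch.
function describeResult(result: PluckResult): string {
  if (result.ok) {
    return `matched ${result.count}: ${result.value}`;
  }
  return `no match for ${result.selector}`;
}

// Hand-built sample objects standing in for real .result() return values.
const hit: PluckResult = { ok: true, value: "$99.99", count: 1 };
const miss: PluckResult = { ok: false, selector: ".discount" };

console.log(describeResult(hit));  // matched 1: $99.99
console.log(describeResult(miss)); // no match for .discount
```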

Chaining

.or(fallback)       // Use fallback selector if no match
.map(fn)            // Transform values: .map(s => s.toUpperCase())
.first()            // First match only
.last()             // Last match only
.eq(n)              // Nth match (0-indexed)

Iteration

.each((sel, i) => {})  // Iterate with callback
.toArray()             // Convert to Selector[]

for (const item of doc.css("li")) {
  console.log(item.text());
}

Pseudo-Elements

Extract text or attributes directly in selectors:

// CSS
doc.css("h1::text").get()           // Text content
doc.css("a::attr(href)").get()      // Attribute value

// XPath
doc.xpath("//h1::text").get()       // Text content  
doc.xpath("//a::attr(href)").get()  // Attribute value
doc.xpath("//a/@href").get()        // Also works (native XPath)

XPath Cheatsheet

Axes

//div                 // All div elements anywhere
/div                  // div that is the root element (child of the document node)
.//div                // Descendant div from current context
..                    // Parent element
ancestor::div         // Ancestor div elements
following-sibling::p  // Following p siblings
preceding-sibling::p  // Preceding p siblings
following::p          // All following p elements
preceding::p          // All preceding p elements

Predicates

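//div[1]                      // Every div that is the first div child of its parent
(//div)[1]                    // First div in the whole document
//div[last()]                 // Every div that is the last div child of its parent
//div[position() > 1]         // Every div except the first div child of each parent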
//a[@href]                    // Has href attribute
//a[@class='active']          // Exact attribute match
//a[@class!='hidden']         // Attribute not equal
//a[contains(@class, 'btn')]  // Attribute contains
//a[starts-with(@href, '/')]  // Attribute starts with
//a[ends-with(@href, '.pdf')] // Attribute ends with (ends-with() is XPath 2.0, not part of XPath 1.0)
//p[text()='Hello']           // Exact text match
//p[contains(text(), 'Hello')] // Text contains
//div[span]                   // Has child span
//div[count(p) > 2]           // Has more than 2 p children
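
In standard XPath, a positional predicate such as `[1]` is evaluated relative to each parent, so `//div[1]` can return several nodes, while `(//div)[1]` returns at most one. The sketch below models that difference on a tiny hand-built tree. The `El` type and both helper functions are illustrative stand-ins, not part of the library.

```typescript
// Minimal stand-in for an element tree; not the library's data model.
interface El {
  tag: string;
  id: string;
  children: El[];
}

// All descendants of `root` in document order (root itself excluded).
function descendants(root: El): El[] {
  const out: El[] = [];
  for (const child of root.children) {
    out.push(child, ...descendants(child));
  }
  return out;
}

// Models //div[1]: for each parent, keep its first div child.
function firstDivOfEachParent(root: El): El[] {
  const out: El[] = [];
  for (const parent of [root, ...descendants(root)]) {
    const divs = parent.children.filter((c) => c.tag === "div");
    if (divs.length > 0) out.push(divs[0]);
  }
  return out;
}

// Models (//div)[1]: collect every div first, then keep the first one.
function firstDivInDocument(root: El): El[] {
  return descendants(root).filter((e) => e.tag === "div").slice(0, 1);
}

const el = (tag: string, id: string, children: El[] = []): El => ({ tag, id, children });

// body > div#a > (div#a1, div#a2), body > div#b > div#b1
const body = el("body", "body", [
  el("div", "a", [el("div", "a1"), el("div", "a2")]),
  el("div", "b", [el("div", "b1")]),
]);

const perParent = firstDivOfEachParent(body).map((e) => e.id);
const inDocument = firstDivInDocument(body).map((e) => e.id);
console.log(perParent.join(", "));  // a, a1, b1
console.log(inDocument.join(", ")); // a
```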

String Functions

//p[normalize-space()='Hello']           // Trim and collapse whitespace before comparing
//p[string-length() > 0]                 // Non-empty text
//p[substring(., 1, 5)='Hello']          // First 5 chars
//p[substring-before(., ':')='Price']    // Before delimiter
//p[substring-after(., ': ')='$99']      // After delimiter
//p[translate(., 'ABC', 'abc')='hello']  // Maps A→a, B→b, C→c only; pass full alphabets for case conversion
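
XPath's `translate()` replaces characters position by position, so the `'ABC'` to `'abc'` call above lowercases only those three letters. The sketch below emulates the XPath 1.0 rules in plain TypeScript to make that visible. It is a stand-alone illustration, not code from the library, and it works on UTF-16 code units, which is enough for ASCII input.

```typescript
// Emulates XPath 1.0 translate(): each character of `input` found in `from`
// is replaced by the character at the same index in `to`. If `to` is too
// short, the character is dropped. All other characters pass through.
function translate(input: string, from: string, to: string): string {
  let out = "";
  for (let k = 0; k < input.length; k++) {
    const ch = input[k];
    const i = from.indexOf(ch); // first occurrence in `from` wins
    if (i === -1) {
      out += ch;
    } else if (i < to.length) {
      out += to[i];
    }
  }
  return out;
}

const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const LOWER = "abcdefghijklmnopqrstuvwxyz";

console.log(translate("HELLO", "ABC", "abc")); // HELLO (no A, B or C to map)
console.log(translate("HELLO", UPPER, LOWER)); // hello
console.log(translate("a-b-c", "-", ""));      // abc (unmapped characters are removed)
```

For a case-insensitive text match, the same idea carries over to the expression itself by passing the full upper- and lower-case alphabets to `translate()`.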

Logical Operators

//a[@class='x' and @id='y']   // Both conditions
//a[@class='x' or @class='y'] // Either condition
//a[not(contains(@class, 'hidden'))] // Negation
//h1 | //h2                   // Union (combine results)

CSS Cheatsheet

div                    // Element
.class                 // Class
#id                    // ID
div.class              // Element with class
div > p                // Direct child
div p                  // Descendant
div + p                // Adjacent sibling
div ~ p                // General sibling
[href]                 // Has attribute
[href="/page"]         // Attribute equals
[href^="/"]            // Starts with
[href$=".pdf"]         // Ends with
[href*="example"]      // Contains
[class~="btn"]         // Contains word

Common Patterns

Tables

const doc = pluck(html);

// Get all rows
const rows = doc.xpath("//table//tr").toArray();

// Get specific cell (row 2, column 3)
const cell = doc.xpath("//table//tr[2]/td[3]::text").get();

// Get column values
const prices = doc.xpath("//table//tr/td[2]::text").getall();

// Get row by content
const row = doc.xpath("//tr[td[text()='Product A']]");

Lists

// All list items
const items = doc.css("ul li::text").getall();

// Nested lists
const nested = doc.xpath("//ul/li/ul/li::text").getall();

Forms

// Input value
const value = doc.css("input[name='email']::attr(value)").get();

// All form fields
doc.css("form input").each((input) => {
  const name = input.attr("name");
  const value = input.attr("value");
});

// Select options
const options = doc.xpath("//select[@name='country']/option/@value").getall();

Links

// All links
const hrefs = doc.css("a::attr(href)").getall();

// External links
const external = doc.xpath("//a[starts-with(@href, 'http')]/@href").getall();

// Links with specific text
const login = doc.xpath("//a[text()='Login']/@href").get();

Definition Lists

// Get value after specific term
const price = doc.xpath("//dt[text()='Price']/following-sibling::dd[1]::text").get();

Structured Data

// Extract product cards
const products = doc.css(".product").map((p) => ({
  name: p.css(".name::text").get(),
  price: p.css(".price::text").get(),
  url: p.css("a::attr(href)").get(),
})).getall();
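
Extracted fields arrive as strings or `null`, so a small normalization step is often useful before the data goes further. The sketch below assumes records shaped like the cards above. `RawProduct`, `Product` and `toProduct` are hypothetical names introduced for illustration, and the price parsing is a simplistic assumption that ignores currencies and locale formats.

```typescript
// One extracted card: every field is a string or null, as returned by .get().
interface RawProduct {
  name: string | null;
  price: string | null;
  url: string | null;
}

interface Product {
  name: string;
  price: number;
  url: string;
}

// Hypothetical helper: converts a raw card into a typed record, or returns
// null when a field is missing or the price contains no usable number.
function toProduct(raw: RawProduct): Product | null {
  if (raw.name === null || raw.price === null || raw.url === null) return null;
  const digits = raw.price.replace(/[^0-9.]/g, ""); // strip currency symbols etc.
  if (digits === "") return null;
  const price = Number(digits);
  if (isNaN(price)) return null;
  return { name: raw.name.trim(), price, url: raw.url };
}

const parsed = toProduct({ name: " Wireless Headphones ", price: "$99.99", url: "/buy/123" });
const rejected = toProduct({ name: "Mystery item", price: "N/A", url: "/buy/456" });
console.log(parsed);   // name "Wireless Headphones", price 99.99, url "/buy/123"
console.log(rejected); // null
```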

CSS vs XPath

| Use CSS When | Use XPath When |
|--------------|----------------|
| Simple element selection | Text content matching |
| Class/ID selection | Attribute contains/starts-with |
| Direct children | Sibling navigation |
| Attribute presence | Parent/ancestor traversal |
| | Position-based selection |
| | Complex predicates |

Rule of thumb: Start with CSS, switch to XPath when you need text matching or axis navigation.

Error Handling

// Check before access
const price = doc.css(".price::text");
if (price.ok) {
  console.log(price.get());
}

// Default values
const stock = doc.css(".stock::text").get("In stock");

// Fallback selectors
const title = doc
  .css("h1::text")
  .or(doc.css(".title::text"))
  .or(doc.css("title::text"))
  .get();

// Structured result
const result = doc.css(".price::text").result();
if (result.ok) {
  console.log(result.value, result.count);
} else {
  console.log("Selector failed:", result.selector);
}

// Invalid selectors return ok: false (no exceptions)
const invalid = doc.xpath("//[broken");
invalid.ok;    // false
invalid.count; // 0
invalid.get(); // null
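
The fallback behaviour shown above amounts to "take the first candidate that matched". The sketch below spells that out against a minimal structural interface. `SelectorLike` and `firstMatch` are hypothetical stand-ins that rely only on the documented `.ok` and `.get()` members, and the fake selectors replace real `doc.css(...)` results so the example runs on its own.

```typescript
// Minimal structural stand-in for the parts of a Selector used here.
interface SelectorLike {
  ok: boolean;
  get(): string | null;
}

// Returns the value of the first candidate that matched, else the fallback.
function firstMatch(
  candidates: SelectorLike[],
  fallback: string | null = null,
): string | null {
  for (const candidate of candidates) {
    if (candidate.ok) return candidate.get();
  }
  return fallback;
}

// Fake selectors standing in for doc.css(...) results.
const missing: SelectorLike = { ok: false, get: () => null };
const found = (value: string): SelectorLike => ({ ok: true, get: () => value });

const pageTitle = firstMatch([missing, found("Page title"), found("ignored")]);
const stockLabel = firstMatch([missing, missing], "N/A");

console.log(pageTitle);  // Page title
console.log(stockLabel); // N/A
```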

LLM Tips

Patterns that work well for code generation:

// ✅ Good: Explicit extraction
doc.css("h1::text").get()
doc.xpath("//a/@href").get()

// ✅ Good: Check existence
if (doc.css(".error").ok) { ... }

// ✅ Good: Safe defaults  
doc.css(".price::text").get("N/A")

// ✅ Good: Structured extraction
doc.css(".item").map(item => ({
  title: item.css(".title::text").get(),
  link: item.css("a::attr(href)").get(),
})).getall()

// ❌ Avoid: Chaining without checks
doc.css(".maybe-missing").css(".child::text").get()

// ✅ Better: Check at each step
const parent = doc.css(".maybe-missing");
const text = parent.ok ? parent.css(".child::text").get() : null;

License

MIT