
marklift

v0.1.3


URL → Clean Markdown SDK (agent-optimized)


Marklift

URL → Clean Markdown — Fetch a webpage, extract the main content, and convert it to LLM-friendly Markdown. Built for agents and pipelines.

  • Fetches HTTP(S) URLs with configurable timeout and headers
  • Source types: website, twitter (Nitter), reddit — inferred from the URL when not specified; the Medium adapter has been removed for now
  • Extracts article content with Mozilla Readability (or raw body)
  • Converts to Markdown with Turndown and custom rules
  • Optimizes for agents: normalizes spacing, dedupes links, strips tracking params, optional chunking
  • Typed API and CLI
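The link-cleanup bullet above ("strips tracking params") can be illustrated with a small sketch. This is not marklift's actual implementation, and the exact set of parameters it strips may differ:

```javascript
// Illustration of tracking-parameter stripping using the standard URL API.
// The parameter list here is an assumption, not marklift's actual list.
const TRACKING_PARAMS = [
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
  "fbclid", "gclid",
];

function stripTrackingParams(rawUrl) {
  const url = new URL(rawUrl);
  for (const param of TRACKING_PARAMS) url.searchParams.delete(param);
  return url.toString();
}

console.log(stripTrackingParams("https://example.com/a?id=1&utm_source=x&fbclid=y"));
// → https://example.com/a?id=1
```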

Requirements: Node.js 18+


The Web Is Not LLM-Ready — Raw HTML is noisy, heavy, littered with tracking junk, inconsistent across sites, and expensive in tokens.

Install

npm install marklift

Usage

Programmatic

import { urlToMarkdown } from "marklift";

// source is inferred from URL when omitted (twitter/x.com → twitter, reddit → reddit, else website)
const result = await urlToMarkdown("https://example.com/article", {
  timeout: 10_000,
});
const tweet = await urlToMarkdown("https://x.com/user/status/123"); // uses twitter adapter

console.log(result.title);
console.log(result.markdown);
console.log(result.wordCount, result.sections.length, result.links.length);
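The inference rule from the comment above can be sketched as follows. This mirrors the documented mapping (twitter/x.com → twitter, reddit → reddit, else website) but is an illustration, not marklift's internal code:

```javascript
// Illustration of the documented source-inference rule.
function inferSource(rawUrl) {
  const host = new URL(rawUrl).hostname.replace(/^www\./, "");
  if (host === "twitter.com" || host === "x.com" || host.includes("nitter")) {
    return "twitter";
  }
  if (host === "reddit.com" || host.endsWith(".reddit.com")) {
    return "reddit";
  }
  return "website";
}

console.log(inferSource("https://x.com/user/status/123"));  // → twitter
console.log(inferSource("https://old.reddit.com/r/node"));  // → reddit
console.log(inferSource("https://example.com/article"));    // → website
```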

CLI

# Install globally to get the `marklift` command
npm install -g marklift

# Convert a URL to Markdown (prints to stdout). Source is inferred from URL.
marklift https://example.com
marklift https://x.com/user/status/123   # uses twitter adapter
marklift https://reddit.com/r/...         # uses reddit adapter

# Output full result as JSON
marklift https://example.com --json

# Options
marklift https://example.com --timeout 15000
marklift https://example.com --chunk-size 2000
marklift https://example.com --source website   # override inferred source

CLI options:

| Option | Description |
| ------ | ----------- |
| `--source <website\|twitter\|reddit>` | Source adapter (default: inferred from URL). Override when needed. |
| `--timeout <ms>` | Request timeout in milliseconds (default: 15000) |
| `--chunk-size <n>` | Split markdown into chunks of ~n characters |
| `--json` | Output full result as JSON instead of markdown |


API

urlToMarkdown(url, options?)

Converts a URL to clean Markdown. Returns a Promise<MarkdownResult>.

Options:

| Option | Type | Description |
| ------ | ---- | ----------- |
| `source` | `"website" \| "twitter" \| "reddit"` | Source adapter. Default: inferred from URL (twitter.com/x.com/nitter → twitter, reddit.com → reddit, else website). Override to force a specific adapter. |
| `timeout` | `number` | Request timeout in ms (default: 15000) |
| `headers` | `Record<string, string>` | Custom HTTP headers (e.g. User-Agent) |
| `chunkSize` | `number` | If set, `result.chunks` will contain token-safe chunks |
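To give a rough idea of what fence-safe chunking (the chunkSize option) involves, here is a sketch. It is illustrative only; marklift's actual splitter also protects tables and may size chunks differently:

````javascript
// Sketch of chunking that never splits inside a fenced code block.
function chunkMarkdown(markdown, chunkSize) {
  // Group lines into blocks; a fenced code block is kept indivisible.
  const blocks = [];
  let current = [];
  let inFence = false;
  for (const line of markdown.split("\n")) {
    current.push(line);
    if (line.trimStart().startsWith("```")) inFence = !inFence;
    if (!inFence && line.trim() === "") {
      blocks.push(current.join("\n") + "\n");
      current = [];
    }
  }
  if (current.length) blocks.push(current.join("\n"));

  // Pack whole blocks into chunks of roughly chunkSize characters.
  const contents = [];
  let buffer = "";
  for (const block of blocks) {
    if (buffer && buffer.length + block.length > chunkSize) {
      contents.push(buffer);
      buffer = "";
    }
    buffer += block;
  }
  if (buffer) contents.push(buffer);
  return contents.map((content, index) => ({ content, index, total: contents.length }));
}
````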

Result (MarkdownResult):

  • url — Original URL
  • title — Page title
  • description — Meta description (if present)
  • markdown — Full markdown with source-specific frontmatter (see below) + body
  • sections — { heading, content }[], split by heading (stable order)
  • links — Deduplicated links, sorted (tracking params stripped)
  • wordCount — Approximate word count
  • contentHash — SHA-256 of optimized markdown (stability checks)
  • metadata? — Structured metadata (OG, canonical, author, publishedAt, image, language)
  • chunks? — When chunkSize is set: { content, index, total }[] (no split inside code blocks or tables)

urlToMarkdownStream(url, options?)

Async generator that yields MarkdownChunk (meta, sections, links) as they are produced. Useful for streaming into an LLM or pipeline.

Markdown format (per source)

Each adapter outputs markdown with a frontmatter block (delimited by --- lines) followed by the body.

Website (and reddit) — format type: website. Medium is not currently supported.

---
source: https://example.com/article
canonical: https://example.com/article
title: Example Article Title
description: Short meta description
author: John Doe
published_at: 2025-01-12
language: en
content_hash: <sha256>
word_count: 1243
---
# Title

Body content…

Twitter:

---
platform: twitter
source: https://twitter.com/username/status/1234567890
tweet_id: 1234567890
author:
  name: Author Name
published_at: 2025-01-10T18:22:00Z
language: en
content_hash: <sha256>
---
Body content…
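A consumer can split this output back into frontmatter and body with a small sketch like the following. It handles only flat key: value lines; indented continuation lines (e.g. the nested author block in the twitter format) are skipped for simplicity:

```javascript
// Sketch: splitting marklift output into frontmatter fields and body,
// assuming the documented format of ---, key: value lines, ---, then body.
function splitFrontmatter(markdown) {
  const match = markdown.match(/^---\n([\s\S]*?)\n---\n?/);
  if (!match) return { frontmatter: {}, body: markdown };
  const frontmatter = {};
  for (const line of match[1].split("\n")) {
    const i = line.indexOf(":");
    // Only flat keys; indented (nested) lines are skipped.
    if (i > 0 && !line.startsWith(" ")) {
      frontmatter[line.slice(0, i).trim()] = line.slice(i + 1).trim();
    }
  }
  return { frontmatter, body: markdown.slice(match[0].length) };
}

const { frontmatter, body } = splitFrontmatter(
  "---\nsource: https://example.com\ntitle: Example\n---\n# Title\n\nBody…"
);
console.log(frontmatter.title); // → Example
console.log(body);              // → # Title ... Body…
```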

Errors

  • InvalidUrlError — Invalid or non-HTTP(S) URL
  • FetchError — Network error, timeout, or non-2xx response
  • ParseError — Readability or parsing failure

Production note: the website and reddit adapters use a browser-like User-Agent by default so that requests from servers and datacenters receive full HTML. The Twitter adapter keeps the Marklift User-Agent so Nitter continues to work. Override via the headers option if needed.


Example

import { urlToMarkdown, urlToMarkdownStream } from "marklift";

// One-shot (source inferred from URL)
const result = await urlToMarkdown("https://blog.example.com/post", {
  timeout: 10_000,
  chunkSize: 2000,
});
console.log(result.title, result.wordCount);
if (result.chunks) {
  for (const chunk of result.chunks) {
    // Send chunk to LLM, etc.
  }
}

// Streaming
for await (const chunk of urlToMarkdownStream(
  "https://blog.example.com/post"
)) {
  process.stdout.write(chunk.content);
}

Testing

npm test          # unit + E2E (E2E needs network)
npm run test:unit # unit only (no network)
npm run test:e2e  # E2E with real URLs only

Set SKIP_E2E=1 to skip E2E tests (e.g. in CI without network).


Contributing

Contributions are welcome. See CONTRIBUTING.md for setup, code style, and how to submit changes.


License

MIT