@evointel/anno

v1.0.1

Web content extraction for AI agents — ensemble extraction with confidence scoring, 93% token reduction vs raw HTML

Downloads

207

Anno

Web content extraction for AI agents. 93% fewer tokens than raw HTML.

Anno fetches web pages, runs an ensemble of extraction methods with confidence scoring, and returns clean structured text — so your AI agent spends tokens on content, not markup. Available via HTTP API, CLI, and MCP.

Benchmark (N=20)

Tested across news sites, documentation, Wikipedia, Stack Overflow, blogs, and data-heavy pages:

| Page Type | Example | Raw HTML | Anno | Reduction |
|-----------|---------|----------|------|-----------|
| News | bbc.com/news | 86,399 tok | 806 tok | 99.1% |
| Docs | developer.mozilla.org | 54,682 tok | 1,925 tok | 96.5% |
| Wiki | en.wikipedia.org/wiki/AI | 303,453 tok | 2,806 tok | 99.1% |
| Forum | stackoverflow.com | 287,846 tok | 1,661 tok | 99.4% |
| Blog | martinfowler.com | 21,510 tok | 2,647 tok | 87.7% |
| Tables | wikipedia.org (browser comparison) | 291,843 tok | 792 tok | 99.7% |
| Minimal | sqlite.org | 5,306 tok | 2,890 tok | 45.5% |

Average: 92.7% reduction (mean of the per-page reductions). Overall: 98.2% (1.56M → 28.5K total tokens across 20 pages).
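The two summary figures differ because one averages the per-page percentages and the other pools all tokens first. The sketch below shows both calculations on the seven rows listed in the table, so its results (roughly 89.6% mean and 98.7% pooled) will not match the published 20-page figures. The type and function names are illustrative and are not part of Anno or its benchmark script.

```typescript
// Token counts copied from the benchmark table (7 of the 20 benchmark pages).
interface Sample {
  page: string;
  rawTokens: number;
  annoTokens: number;
}

const samples: Sample[] = [
  { page: "bbc.com/news", rawTokens: 86_399, annoTokens: 806 },
  { page: "developer.mozilla.org", rawTokens: 54_682, annoTokens: 1_925 },
  { page: "en.wikipedia.org/wiki/AI", rawTokens: 303_453, annoTokens: 2_806 },
  { page: "stackoverflow.com", rawTokens: 287_846, annoTokens: 1_661 },
  { page: "martinfowler.com", rawTokens: 21_510, annoTokens: 2_647 },
  { page: "wikipedia.org (browser comparison)", rawTokens: 291_843, annoTokens: 792 },
  { page: "sqlite.org", rawTokens: 5_306, annoTokens: 2_890 },
];

// Reduction for one page, as a percentage of the raw token count.
function reductionPct(raw: number, anno: number): number {
  return (1 - anno / raw) * 100;
}

// Unweighted mean of the per-page reductions: every page counts equally.
function meanReduction(rows: Sample[]): number {
  const total = rows.reduce((sum, r) => sum + reductionPct(r.rawTokens, r.annoTokens), 0);
  return total / rows.length;
}

// Pooled reduction: sum all tokens first, so large pages dominate.
function overallReduction(rows: Sample[]): number {
  const raw = rows.reduce((sum, r) => sum + r.rawTokens, 0);
  const anno = rows.reduce((sum, r) => sum + r.annoTokens, 0);
  return reductionPct(raw, anno);
}

console.log(meanReduction(samples).toFixed(1), overallReduction(samples).toFixed(1));
```

The pooled figure is higher because the largest pages (Wikipedia, Stack Overflow) also shrink the most, while the small sqlite.org page pulls the unweighted mean down.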

Reproduce it yourself: npx tsx bench/run.ts

Quick Start

npm install --legacy-peer-deps
npm run build
npm start
# Server running at http://localhost:5213

Fetch a page

curl -X POST http://localhost:5213/v1/content/fetch \
  -H "Content-Type: application/json" \
  -d '{"url": "https://en.wikipedia.org/wiki/TypeScript"}'

Fetch with JavaScript rendering

For SPAs and dynamic sites, enable Playwright:

curl -X POST http://localhost:5213/v1/content/fetch \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "options": {"render": true}}'
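The same two requests can be issued from TypeScript. This is a minimal sketch that mirrors the curl calls above, assuming Node 18+ for the global `fetch`. The helper names `buildFetchRequest` and `annoFetch` are hypothetical, and because this README does not document the response schema, the body is returned as raw text rather than parsed into a typed object.

```typescript
// Only the option shown in the curl example is modelled here.
interface FetchOptions {
  render?: boolean;
}

// Hypothetical helper: builds the same POST request as the curl examples.
function buildFetchRequest(baseUrl: string, url: string, options?: FetchOptions) {
  const body: { url: string; options?: FetchOptions } = { url };
  if (options) body.options = options;
  return {
    endpoint: new URL("/v1/content/fetch", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and returns the response body as text.
// The response format is an unknown here, so no fields are assumed.
async function annoFetch(baseUrl: string, url: string, options?: FetchOptions): Promise<string> {
  const { endpoint, init } = buildFetchRequest(baseUrl, url, options);
  const res = await fetch(endpoint, init);
  if (!res.ok) throw new Error(`Anno returned HTTP ${res.status}`);
  return res.text();
}
```

With a server running locally, `annoFetch("http://localhost:5213", "https://example.com", { render: true })` would correspond to the JavaScript-rendering request.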

Batch fetch

curl -X POST http://localhost:5213/v1/content/batch-fetch \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com", "https://news.ycombinator.com"]}'
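For longer URL lists it may help to split the work into several batch requests. The sketch below groups URLs into request bodies of at most 10 entries. That limit is taken from the `anno_batch_fetch` entry in the MCP tool table, and whether the HTTP endpoint enforces the same limit is an assumption. The helper names are hypothetical.

```typescript
// Splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Builds one JSON body per batch, in the same shape as the curl example.
// maxPerBatch = 10 is an assumed limit for the HTTP endpoint.
function buildBatchBodies(urls: string[], maxPerBatch = 10): string[] {
  return chunk(urls, maxPerBatch).map((group) => JSON.stringify({ urls: group }));
}
```

Each returned string can be sent as the body of a POST to `/v1/content/batch-fetch` with the `Content-Type: application/json` header.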

How It Works

URL → Fetch → Ensemble Extraction → Confidence Scoring → Structured Output
              ├─ Readability
              ├─ Ollama LLM (optional)
              └─ DOM heuristic

Anno runs multiple extraction methods in parallel, scores each result for quality, and returns the best one. This ensemble approach handles everything from clean blog posts to messy e-commerce pages.
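The run-in-parallel, score, and pick-the-best pattern can be sketched as follows. This is an illustration of the general technique, not Anno's actual implementation: the `Extraction` shape, the `Extractor` type, and the rule of taking the single highest confidence value are all assumptions.

```typescript
// Assumed result shape: each method reports its text and a confidence in [0, 1].
interface Extraction {
  method: string;
  text: string;
  confidence: number;
}

type Extractor = (html: string) => Promise<Extraction>;

// Runs every extractor concurrently, ignores the ones that fail,
// and returns the successful result with the highest confidence.
async function extractBest(html: string, extractors: Extractor[]): Promise<Extraction> {
  const settled = await Promise.allSettled(extractors.map((run) => run(html)));
  const succeeded = settled
    .filter((r): r is PromiseFulfilledResult<Extraction> => r.status === "fulfilled")
    .map((r) => r.value);
  if (succeeded.length === 0) throw new Error("all extractors failed");
  return succeeded.reduce((best, cur) => (cur.confidence > best.confidence ? cur : best));
}
```

Using `Promise.allSettled` rather than `Promise.all` means one failing method (for example an optional LLM that is not running) does not discard the results of the others.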

MCP Integration (Claude Code, Cursor, etc.)

Anno exposes itself as an MCP server. Any AI tool that supports MCP can use Anno natively.

Setup

  1. Start Anno: npm start
  2. Add to ~/.claude/.mcp.json (global) or .mcp.json (per-project):
{
  "mcpServers": {
    "anno": {
      "command": "node",
      "args": ["/path/to/anno/dist/mcp/server.js"],
      "env": {
        "ANNO_BASE_URL": "http://localhost:5213"
      }
    }
  }
}

MCP Tools

| Tool | Description |
|------|-------------|
| anno_fetch | Extract content from a single URL |
| anno_batch_fetch | Parallel extraction from multiple URLs (up to 10) |
| anno_crawl | Crawl a website with depth/page limits |
| anno_health | Check server status |

CLI

npx anno start --port 5213
npx anno fetch https://example.com
npx anno crawl https://example.com --depth 2 --max-pages 10
npx anno health

Docker

docker build -t anno .
docker run -p 5213:5213 anno

API

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/content/fetch | Extract content from a URL |
| POST | /v1/content/batch-fetch | Batch extract from multiple URLs |
| POST | /v1/crawl | Start a crawl job |
| GET | /v1/crawl/:id | Check crawl job status |
| GET | /v1/crawl/:id/results | Get crawl results |
| GET | /health | Server health check |
| GET | /metrics | Prometheus metrics |
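The crawl endpoints suggest a start, poll, then collect workflow. The sketch below shows only the polling step as a generic helper. The status payload is not documented in this README, so the caller supplies both the status check and the "done" test, and any field such as `status === "completed"` would be a guess.

```typescript
// Repeatedly calls `check` until `isDone` accepts its result or attempts run out.
async function pollUntil<T>(
  check: () => Promise<T>,
  isDone: (value: T) => boolean,
  intervalMs = 1000,
  maxAttempts = 30,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const value = await check();
    if (isDone(value)) return value;
    // Wait between attempts, but not after the final one.
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`not done after ${maxAttempts} attempts`);
}
```

In practice `check` would be a GET to `/v1/crawl/:id` for the job in question, followed by a GET to `/v1/crawl/:id/results` once `isDone` returns true.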

Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| PORT | 5213 | Server port |
| RENDERING_ENABLED | true | Playwright browser rendering |
| REDIS_ENABLED | false | Redis caching (LRU fallback when off) |
| AI_LLM_PROVIDER | none | LLM provider for AI-assisted extraction |
| RESPECT_ROBOTS | true | Respect robots.txt |
| RENDER_STEALTH | true | Stealth mode for browser rendering |
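For reference, the table above can be read as a typed configuration with defaults. This is a sketch of one way to parse those variables, not Anno's own loader: the `AnnoConfig` shape, the `loadConfig` name, and the set of strings accepted as "true" are assumptions.

```typescript
interface AnnoConfig {
  port: number;
  renderingEnabled: boolean;
  redisEnabled: boolean;
  llmProvider: string;
  respectRobots: boolean;
  renderStealth: boolean;
}

// Assumed boolean parsing: unset or empty values fall back to the default.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return ["1", "true", "yes", "on"].includes(value.trim().toLowerCase());
}

// Reads the documented variables from an env-like object, applying the table's defaults.
function loadConfig(env: Record<string, string | undefined>): AnnoConfig {
  const port = Number(env.PORT ?? "5213");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    renderingEnabled: parseBool(env.RENDERING_ENABLED, true),
    redisEnabled: parseBool(env.REDIS_ENABLED, false),
    llmProvider: env.AI_LLM_PROVIDER ?? "none",
    respectRobots: parseBool(env.RESPECT_ROBOTS, true),
    renderStealth: parseBool(env.RENDER_STEALTH, true),
  };
}
```

Calling `loadConfig(process.env)` in Node would pick up any overrides set in the shell before `npm start`.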

When NOT to Use Anno

  • Static text files — If the source is already clean text or JSON, Anno adds overhead for no gain (see SQLite at 45.5% — already minimal HTML)
  • Authenticated pages — Anno doesn't handle login flows (yet). Use a session cookie or authenticated proxy
  • Real-time streaming — Anno extracts on-demand, not as a live stream

Development

npm run dev      # Hot-reload
npm run lint     # ESLint
npm run build    # Compile TypeScript
npm test         # Lint + Vitest (1,958 tests)

Architecture

See ARCHITECTURE.md for system design.

License

MIT — Evolving Intelligence AI