webskim

v1.5.0

Context-efficient web search and reading for AI agents. MCP server powered by Jina AI.

Built-in WebFetch dumps entire pages into context. One page = thousands of tokens gone.

webskim saves pages to disk and returns a table of contents. Your agent reads only what it needs.

Prerequisites

webskim uses Jina AI APIs under the hood — you need a Jina API key to use it.

Get your free API key at jina.ai — 1M tokens included, no credit card required.

Quick Start

Claude Code — add to .mcp.json in your project:

{
  "mcpServers": {
    "webskim": {
      "command": "npx",
      "args": ["-y", "webskim"],
      "env": { "JINA_API_KEY": "jina_..." }
    }
  }
}

Tip: Keep your key in a .env file instead of hardcoding it in .mcp.json:

# .env (gitignored)
JINA_API_KEY=jina_...

Reference the variable from .mcp.json:

"env": { "JINA_API_KEY": "${JINA_API_KEY}" }

Then launch Claude Code with the env loaded:

alias c='set -a; source .env 2>/dev/null; set +a; claude'

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "webskim": {
      "command": "npx",
      "args": ["-y", "webskim"],
      "env": { "JINA_API_KEY": "jina_..." }
    }
  }
}

Cursor / Windsurf / other MCP clients — same pattern, point at npx -y webskim with JINA_API_KEY in env.

How It Works

Agent: webskim_search("react server components")
  → 5 results: title, URL, snippet (minimal tokens)

Agent: webskim_read("https://react.dev/reference/rsc/server-components")
  → Saved: .ai_pages/20260220_143052_react_dev__reference__rsc.md
  → Lines: 342 | ~2800 tokens
  → Table of Contents:
      L1:   # Server Components
      L18:  ## Reference
      L45:  ## Usage
      L89:  ### Fetching data
      L156: ### Streaming

Agent: Read(".ai_pages/..._rsc.md", offset=89, limit=67)
  → reads only the section it needs

No full pages in context. No wasted tokens. The agent decides what to read.
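
The table-of-contents step above can be sketched in a few lines of Python. This is an illustration of the idea only, not webskim's actual implementation: `build_toc` and `section_range` are hypothetical helper names, and the sketch assumes `#`-style headings and 1-indexed line numbers, as in the example output.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def build_toc(markdown):
    """Return (line_number, level, title) for each heading, 1-indexed."""
    toc, in_fence = [], False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # skip '#' comments inside code blocks
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            toc.append((number, len(match.group(1)), match.group(2)))
    return toc


def section_range(toc, index, total_lines):
    """Return (offset, limit) from heading `index` up to the next heading."""
    start = toc[index][0]
    end = toc[index + 1][0] if index + 1 < len(toc) else total_lines + 1
    return start, end - start


# Hypothetical page, used only to exercise the helpers.
page = "# Title\n\nintro\n\n## Usage\n\ntext\nmore\n\n## Notes\n\nend\n"
toc = build_toc(page)
usage_offset, usage_limit = section_range(toc, 1, len(page.splitlines()))
```

With the TOC from the example, a section starting at L89 whose next heading is at L156 gives offset 89 and limit 156 - 89 = 67, which matches the `Read` call shown above.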

Tools

| Tool | What it does |
|------|-------------|
| webskim_search | Web search → titles, URLs, snippets |
| webskim_read | Fetch URL/PDF → save as markdown, return TOC |

webskim_search

| Param | Description |
|-------|-------------|
| query | Search query |
| num_results | 1–10 (default 5) |
| site | Restrict to domain, e.g. "python.org" |
| country | Locale code, e.g. "US", "PL" |
| format | Default markdown. json returns {results:[{i,title,url,snippet,host}]} (preferred for weak models) |
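
When `format` is `json`, a client can filter the result list before deciding what to read. The sketch below assumes only the payload shape quoted in the table, `{results:[{i,title,url,snippet,host}]}`. The sample values and the helper name `pick_urls` are invented for illustration, and a real response may carry additional fields.

```python
import json

# Invented sample payload in the documented shape; not real search output.
sample = json.dumps({
    "results": [
        {"i": 1, "title": "Server Components",
         "url": "https://react.dev/reference/rsc/server-components",
         "snippet": "...", "host": "react.dev"},
        {"i": 2, "title": "Example post",
         "url": "https://example.com/post",
         "snippet": "...", "host": "example.com"},
    ]
})


def pick_urls(payload, host):
    """Return URLs of results whose host matches, in result order."""
    results = json.loads(payload).get("results", [])
    return [r["url"] for r in results if r.get("host") == host]


react_urls = pick_urls(sample, "react.dev")
```

Filtering on `host` client-side is just one option; the `site` parameter does the same restriction on the search side.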

webskim_read

| Param | Description |
|-------|-------------|
| url | Page or PDF URL |
| inline | true returns markdown directly (default false → file path + TOC) |
| head_lines | With inline: true, return only the first N lines |
| target_selector | CSS selector: extract only this element |
| remove_selector | CSS selector: drop these elements (overrides default chrome stripper). Empty string "" opts out of the default. |
| include_images | Default false. Keep <img> tags. Default off saves 30–70% tokens on news pages. |
| links | Default referenced. How to render links: referenced = footer notation, discarded = plain text, inline = full markdown |
| max_tokens | Server-side truncation (saves context) |

Defaults note: webskim removes site chrome (nav/footer/aside/ads/cookie banners) before extraction. If the content you want lives inside one of those elements, for example when your target_selector is aside.article-content, pass an explicit remove_selector to override the default.

Inline mode

For small pages or "give me the top" lookups you can skip the follow-up Read call and get the markdown back directly. The page is still saved to disk, so you can fall back to Read(file, offset, limit) if head_lines truncated the output.

Agent: webskim_read("https://example.com/short", inline=true)
  → **Title**
    <full markdown content>

Agent: webskim_read("https://big-doc.com", inline=true, head_lines=80)
  → **Title**
    <first 80 lines>
    --- Showing 80/420 lines. Full file: .ai_pages/..._big_doc_com.md

# now decide whether to Read more from the file or move on

head_lines requires inline: true (otherwise the saved file would not match what was returned). Lines are 1-indexed and include the <!-- Source: ... --> header, so they line up with Read tool offsets.
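
The truncation behaviour described above can be sketched as follows. This is a rough model, not webskim's code: `check_args` and `head` are hypothetical names, and the footer text is copied from the example output. The sketch assumes the saved file and the returned text share the same line numbering.

```python
def check_args(inline, head_lines):
    """Reject head_lines without inline, mirroring the rule stated above."""
    if head_lines is not None and not inline:
        raise ValueError("head_lines requires inline: true")


def head(markdown, head_lines, saved_path):
    """Return the first `head_lines` lines plus a footer when truncated."""
    lines = markdown.splitlines()
    if head_lines >= len(lines):
        return markdown  # nothing was cut, so no footer is needed
    shown = "\n".join(lines[:head_lines])
    footer = (f"--- Showing {head_lines}/{len(lines)} lines. "
              f"Full file: {saved_path}")
    return f"{shown}\n{footer}"


# Hypothetical 420-line document and file path.
document = "\n".join(f"line {n}" for n in range(1, 421))
preview = head(document, 80, ".ai_pages/example.md")
next_offset = 80 + 1  # first line not shown, under 1-indexed numbering
```

Because lines are 1-indexed, a follow-up read after an 80-line preview would plausibly start at offset 81.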

Why webskim?

Context efficiency: pages saved to .ai_pages/ on disk, not dumped into context. Agent reads sections via offset/limit.

Tiny footprint: ~230 tokens per tool definition in system prompt. Minimal overhead vs. built-in alternatives.

Smart search: returns snippets, not full pages. Agent picks which URLs are worth reading.

PDF support: Jina Reader handles PDFs natively. Same API, same workflow.

Server-side token budget: max_tokens truncates on the server before content reaches your agent.

CSS selectors: target_selector / remove_selector extract exactly the part of the page you need.

Clean markdown: no HTML soup, no boilerplate, just readable content.

Fast and cheap: search ~2.5s, read ~8s. Jina API costs $0.02/1M tokens.
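
As a back-of-the-envelope check on these numbers, the sketch below applies the quoted price to the ~2800-token, 342-line page from the How It Works example. It assumes cost scales linearly with tokens and that tokens are spread evenly across lines. Both are simplifications, and the function names are hypothetical.

```python
PRICE_PER_MILLION_TOKENS = 0.02  # USD, as quoted above


def read_cost(tokens):
    """Approximate cost in USD of fetching a page of `tokens` tokens."""
    return tokens * PRICE_PER_MILLION_TOKENS / 1_000_000


def section_tokens(total_tokens, total_lines, limit):
    """Estimate tokens in `limit` lines, assuming an even spread per line."""
    return round(total_tokens * limit / total_lines)


page_cost = read_cost(2800)                  # about $0.000056 per page
in_context = section_tokens(2800, 342, 67)   # about 549 tokens
pages_per_free_tier = 1_000_000 // 2800      # 357 pages of that size
```

Under these assumptions, reading the 67-line section puts roughly a fifth of the page into context, and the free 1M-token allowance covers a few hundred pages of that size.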

Make It the Default

The tool descriptions already tell the agent to prefer webskim, but for maximum reliability add this to your project's CLAUDE.md:

## Web Research

Always use webskim MCP tools as the primary choice for all web operations:
- **`webskim_search`** instead of `WebSearch` — returns lightweight snippets (title, URL, description)
- **`webskim_read`** instead of `WebFetch` — saves page to disk as markdown, returns file path + TOC

Workflow: webskim_search → webskim_read URL to disk → Read file with offset/limit.
Use WebSearch/WebFetch only as fallback when webskim tools are unavailable or fail.

Add .ai_pages/ to your .gitignore.

Configuration

| Env var | Required | Default | Description |
|---------|----------|---------|-------------|
| JINA_API_KEY | yes | none | Jina AI API key. Get one at https://jina.ai. |
| WEBSKIM_CACHE_DIR | no | <cwd>/.ai_pages | Directory where webskim_read saves fetched pages. Created on demand. Useful for shared volumes or read-only CWDs. |
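
The resolution rules in the table can be modelled roughly like this. It is a sketch of the documented behaviour, not the package's source: `require_api_key` and `resolve_cache_dir` are hypothetical names, and the error message is invented.

```python
import os
from pathlib import Path


def require_api_key(env=None):
    """Return JINA_API_KEY, or fail early since the key is required."""
    env = os.environ if env is None else env
    key = env.get("JINA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("JINA_API_KEY is not set")
    return key


def resolve_cache_dir(env=None, cwd=None):
    """Pick WEBSKIM_CACHE_DIR if set, else <cwd>/.ai_pages, creating it."""
    env = os.environ if env is None else env
    base = Path.cwd() if cwd is None else Path(cwd)
    override = env.get("WEBSKIM_CACHE_DIR")
    target = Path(override) if override else base / ".ai_pages"
    target.mkdir(parents=True, exist_ok=True)  # "created on demand"
    return target
```

Calling `resolve_cache_dir(env={"WEBSKIM_CACHE_DIR": "/mnt/shared/pages"})` would, in this model, return that path and create it if missing, which is the shared-volume case the table mentions.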

Development

git clone <repo-url> && cd webskim
npm install && npm run build
npm test

License

MIT