
docshark

v0.1.20

🦈 Documentation MCP Server: scrape, index, and search any doc website

🦈 DocShark

DocShark is an MCP (Model Context Protocol) server that scrapes, indexes, and searches any documentation website. It builds a local, searchable knowledge base from public documentation pages using SQLite FTS5 full-text search with BM25 ranking, so AI assistants can query up-to-date docs.


🚀 Features

  • Automated Crawling: Discovers pages via sitemap.xml with fallback to BFS link crawling.
  • Smart Extraction: Uses Readability and Turndown to extract main content and convert it to clean Markdown, filtering out navbars and sidebars.
  • Semantic Chunking: Splits content based on headings, preserving contextual headers for better AI understanding.
  • High-Performance Search: Built-in SQLite + FTS5 indexing with BM25 ranking for accurate and lightning-fast search results.
  • JS-Rendered Site Support: Tiered fetching strategy automatically detects React/Vue SPAs (empty shells) and upgrades to puppeteer-core if you have it installed (zero-config, auto-fallback).
  • Polite Crawling: Respects robots.txt and implements rate limiting to prevent overloading documentation servers.
  • Standard MCP Tooling: Connects to Claude Desktop, VS Code, Cursor, and any other MCP-compatible client via the standard stdio or HTTP/SSE transports.
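
The sitemap-first discovery with a BFS fallback listed above could look roughly like the sketch below. It is only an illustration, not DocShark's actual crawler: `discoverPages`, `GetLinks`, and the `sitemapUrls` parameter are hypothetical names, and `getLinks` stands in for "fetch a page and extract its links" (a real crawler would do that asynchronously and also apply robots.txt rules and rate limiting).

```typescript
// Hypothetical sketch: page discovery that prefers a sitemap and otherwise
// falls back to a breadth-first crawl with a depth limit.
type GetLinks = (url: string) => string[];

function discoverPages(
  startUrl: string,
  maxDepth: number,
  getLinks: GetLinks,
  sitemapUrls: string[] = [],
): string[] {
  // If the sitemap listed any pages, use them (deduplicated) and stop.
  if (sitemapUrls.length > 0) {
    return [...new Set(sitemapUrls)];
  }

  const origin = new URL(startUrl).origin;
  const seen = new Set<string>([startUrl]);
  let frontier: string[] = [startUrl];

  // Each loop iteration expands one BFS level, up to maxDepth levels.
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      for (const href of getLinks(url)) {
        // Resolve relative links and drop the fragment so /a#x and /a match.
        const resolved = new URL(href, url);
        resolved.hash = "";
        const link = resolved.toString();
        // Stay on the same site and skip pages that are already queued.
        if (resolved.origin !== origin || seen.has(link)) continue;
        seen.add(link);
        next.push(link);
      }
    }
    frontier = next;
  }
  return [...seen];
}
```

With a depth of 2, as in `--depth 2`, this sketch would visit the start page, the pages it links to, and the pages those link to, while ignoring links that leave the site.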

📦 What We Have Done (Phase 1)

Phase 1: Core Engine is fully implemented and tested.

  • ✅ Custom SQLite Database with FTS5 virtual tables and auto-sync triggers.
  • ✅ Web scraping engine supporting standard fetch() and puppeteer-core.
  • ✅ Markdown processor utilizing Readability + Turndown.
  • ✅ Heading-based semantic chunker (500-1200 tokens per chunk).
  • ✅ Asynchronous job manager and queue system.
  • ✅ Complete HTTP API (REST endpoints + SSE event streams).
  • ✅ Seamless integration of 4 MCP tools: manage_library, search_docs, list_libraries, and get_doc_page.
  • ✅ Robust CLI interface (start, add, rename, search, list).
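
To make the heading-based chunking idea concrete, here is a minimal sketch of one way to split Markdown on headings while keeping the heading trail as context. It is an assumption-laden illustration rather than DocShark's chunker: `Chunk`, `chunkByHeadings`, and `headingPath` are hypothetical names, and the 500-1200 token sizing mentioned above is deliberately left out.

```typescript
// Hypothetical sketch of heading-based chunking; not DocShark's actual code.
interface Chunk {
  headingPath: string[]; // heading trail, e.g. ["Guide", "Install"]
  content: string;
}

const FENCE = "`".repeat(3); // Markdown code-fence marker

function chunkByHeadings(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  const stack: { level: number; title: string }[] = []; // open headings
  let buffer: string[] = [];
  let inFence = false;

  // Emit the buffered section (if it has any text) under the current trail.
  const flush = (): void => {
    const content = buffer.join("\n").trim();
    if (content.length > 0) {
      chunks.push({ headingPath: stack.map((h) => h.title), content });
    }
    buffer = [];
  };

  for (const line of markdown.split("\n")) {
    // Track fenced code so a "# comment" inside code is not read as a heading.
    if (line.trimStart().startsWith(FENCE)) inFence = !inFence;
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the section that ran up to this heading
      const level = match[1].length;
      // Pop headings at the same or a deeper level, then open the new one.
      while (stack.length > 0 && stack[stack.length - 1].level >= level) {
        stack.pop();
      }
      stack.push({ level, title: match[2].trim() });
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

Carrying `headingPath` with each chunk is what lets a search hit on a short paragraph still show which page section it came from.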

🏗️ What We Are Doing

We are actively polishing the integration between the core engine and external MCP clients (like VS Code Agents and Claude Desktop).

🔮 What We Plan To Do (Phase 2 & Beyond)

  • Web Dashboard: An intuitive SvelteKit dashboard to manage your synced libraries, view crawl progress in real time (via SSE), and test searches manually.
  • Incremental Crawling: Smarter refresh jobs that compare ETag and Last-Modified headers to only re-scrape updated pages.
  • Vector Search (RAG): Integration of lightweight vector embeddings for semantic similarity search alongside the existing FTS5 keyword search.
  • Advanced Scraping Setup: Support for custom CSS selectors to define exactly where content lives in non-standard documentation websites.
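
Since incremental crawling is still planned, the following is only a sketch of how the ETag and Last-Modified comparison usually works over HTTP, not a description of DocShark's code. `CachedPage`, `conditionalHeaders`, and `needsRescrape` are hypothetical names; the `If-None-Match` and `If-Modified-Since` request headers and the `304 Not Modified` status are standard HTTP.

```typescript
// Hypothetical sketch: validators stored from the previous crawl of a page.
interface CachedPage {
  etag?: string;
  lastModified?: string;
}

// Build the conditional request headers for a re-crawl of that page.
function conditionalHeaders(cached: CachedPage): Record<string, string> {
  const headers: Record<string, string> = {};
  if (cached.etag) headers["If-None-Match"] = cached.etag;
  if (cached.lastModified) headers["If-Modified-Since"] = cached.lastModified;
  return headers;
}

// A 304 Not Modified response means the stored copy is still current,
// so the page does not need to be scraped and re-indexed again.
function needsRescrape(status: number): boolean {
  return status !== 304;
}
```

A refresh job could pass these headers to `fetch(url, { headers: conditionalHeaders(cached) })` and skip extraction whenever `needsRescrape(response.status)` is false.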

🛠️ Usage

Quick Start (from npm)

You can run DocShark directly without installing it globally using bunx:

# Add a documentation library to the index
bunx docshark add https://valibot.dev/guides/ --depth 2

# Search your indexed docs
bunx docshark search "schema validation"

Installation

DocShark is intended to be installed and run with Bun. To install it globally as a CLI tool:

# Global Bun installation
bun add -g docshark

After installation, you can use the docshark command:

docshark list

# Update the global Bun installation when a new release is published
docshark update

# Script-friendly update check
docshark update --check --quiet

Interactive CLI runs will also let you know when a newer version is available. Update notices are intentionally skipped for MCP stdio mode so they never interfere with protocol output.

For scripts, docshark update --check exits 0 when current, 10 when a newer version is available, and 1 when the version check could not be completed.
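
A script that consumes these exit codes might map them to a status along these lines. This is a small sketch, with `UpdateStatus` and `interpretUpdateCheck` as hypothetical names; treating every code other than 0 and 10 as a failed check is an assumption that goes slightly beyond the documented value of 1.

```typescript
// Hypothetical helper for scripts that run `docshark update --check`.
type UpdateStatus = "current" | "update-available" | "check-failed";

function interpretUpdateCheck(exitCode: number): UpdateStatus {
  switch (exitCode) {
    case 0:
      return "current"; // already on the latest version
    case 10:
      return "update-available"; // a newer version has been published
    default:
      // Documented as 1; any other code is treated the same (assumption).
      return "check-failed";
  }
}
```

Under Bun, the exit code could come from something like `Bun.spawnSync(["docshark", "update", "--check", "--quiet"]).exitCode`.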

🔌 MCP Integration

VS Code (GitHub Copilot / MCP Extension)

Add DocShark to your .vscode/settings.json or global MCP configuration:

{
  "mcpServers": {
    "docshark": {
      "command": "bunx",
      "args": ["-y", "docshark", "start", "--stdio"]
    }
  }
}

Cursor

  1. Open Cursor Settings > Models > MCP.
  2. Click + Add New MCP Server.
  3. Name: docshark
  4. Type: command
  5. Command: bunx -y docshark start --stdio

Claude Desktop

Edit your Claude Desktop configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Then add the DocShark server entry:

{
  "mcpServers": {
    "docshark": {
      "command": "bunx",
      "args": ["-y", "docshark", "start", "--stdio"]
    }
  }
}

🛠️ Development

Local Setup

Ensure you have Bun installed.

# Clone the repository
git clone https://github.com/Michael-Obele/docshark.git
cd docshark

# Install dependencies
bun install

# (Optional) Enable auto-detection and scraping of JavaScript-rendered React/Vue single-page apps
bun add puppeteer-core

# Start the DocShark MCP server in HTTP mode for local testing
bun run src/cli.ts start --port 6380

Local CLI Debugging

# Run CLI directly while developing
bun run src/cli.ts list

🔄 Versioning & Changelog

This project uses Google's Release Please to automate versioning and changelog generation.

  • Semantic Versioning: Versions are bumped automatically (e.g. 0.0.1 -> 0.0.2 or 0.1.0) based on Conventional Commits (feat:, fix:, chore:, etc.).
  • Automated: A release PR is opened automatically against master when Conventional Commits are merged, and CHANGELOG.md is generated from them.
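
As a rough illustration of how commit types translate into version bumps, the sketch below applies the generic Conventional Commits to SemVer mapping (fix gives a patch bump, feat a minor bump, a breaking change a major bump). It is not Release Please's implementation, which has its own configuration options (including how versions below 1.0.0 are handled); `bumpFor` and `nextVersion` are hypothetical names.

```typescript
// Hypothetical sketch of the Conventional Commits -> SemVer mapping.
type Bump = "major" | "minor" | "patch" | "none";

function bumpFor(commits: string[]): Bump {
  let bump: Bump = "none";
  for (const message of commits) {
    const header = message.split("\n")[0];
    // Matches "type(scope)!: subject"; the scope and "!" are optional.
    const match = /^(\w+)(\([^)]*\))?(!)?:\s/.exec(header);
    if (!match) continue; // not a Conventional Commit, ignore it
    const breaking = match[3] === "!" || /^BREAKING CHANGE:/m.test(message);
    if (breaking) return "major";
    if (match[1] === "feat") bump = "minor";
    else if (match[1] === "fix" && bump === "none") bump = "patch";
  }
  return bump; // types such as chore: or docs: do not trigger a bump here
}

function nextVersion(current: string, bump: Bump): string {
  const [major, minor, patch] = current.split(".").map(Number);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      return current;
  }
}
```

For the example versions above, a single fix: commit takes 0.0.1 to 0.0.2 in this sketch, and a feat: commit takes it to 0.1.0.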

📜 License

This project is open-source and available under the MIT License.


Built to empower AI agents with the latest knowledge.