
scmcp v1.0.0

ScMCP — Privacy-First Document Conversion & Research MCP Server

Convert any document to Markdown (so models can read it) and Markdown back to human formats (PDF, DOCX, HTML) — 100% offline, zero telemetry, zero API calls.

Installation

Option 1: Install from npm (recommended)

npm install -g scmcp

Option 2: Use directly with npx (no install needed)

npx -y scmcp

Option 3: Build from source

git clone https://github.com/microsoft/scmcp.git
cd scmcp
npm install
npm run build

Requirements

  • Node.js 18 or later
  • npm 8 or later
  • GitHub Copilot CLI (for MCP integration) — install guide

Setting Up with Copilot CLI

Step 1: Launch Copilot CLI

copilot

Step 2: Add ScMCP as an MCP server

Inside the Copilot CLI interactive session, type:

/mcp

Then select "Add new MCP server" and fill in:

| Field | Value |
|-------|-------|
| Server Name | scmcp |
| Server Type | 2 (STDIO) |
| Command | npx -y scmcp |
| Environment Variables | (leave empty) |
| Tools | * |

If installed globally or built from source, use this command instead:

  • Global: scmcp
  • From source: node C:\path\to\ScMCP\dist\index.js
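If you prefer editing configuration directly, Copilot CLI can also load MCP servers from a JSON config file (commonly `~/.copilot/mcp-config.json`). The exact path and schema may vary by CLI version, so treat this as a sketch rather than a definitive reference:

```json
{
  "mcpServers": {
    "scmcp": {
      "type": "local",
      "command": "npx",
      "args": ["-y", "scmcp"],
      "tools": ["*"]
    }
  }
}
```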

Step 3: Verify

Run /mcp again — you should see:

MCP Server: scmcp
 Status: ✓ Connected
 Tools: 17 tools available

That's it! You can now ask Copilot to convert documents, scrape web pages, summarize text, and more.

What It Does

Agents speak Markdown. Humans use PDFs, Word docs, spreadsheets, presentations. ScMCP bridges that gap with 17 tools:

  • 📄 PDF → Markdown — chunked/paginated for large documents
  • 📝 DOCX → Markdown — handles images, tables, headings, lists
  • 📊 XLSX → Markdown — Excel sheets to Markdown tables
  • 📑 PPTX → Markdown — slide text + speaker notes
  • 🌐 HTML ↔ Markdown — bidirectional conversion
  • 📋 CSV ↔ JSON — data format interchange
  • 📖 RTF → Markdown — Rich Text Format support
  • 📚 EPUB → Markdown — with chapter selection
  • 🔤 OCR — image to text (tesseract.js, fully offline)
  • 🖼️ Image conversion — PNG, JPG, WebP, GIF, TIFF (sharp)
  • 📤 Markdown → PDF — styled PDF generation (puppeteer)
  • 📤 Markdown → DOCX — Word document generation
  • 🔍 Web scraping — URL to clean text/markdown/HTML
  • 📝 Summarization — extractive summarization (TF-IDF, fully local)
  • 📚 Citations — APA, MLA, Chicago style management

Privacy

| What | Network? | Details |
|------|----------|---------|
| All document conversions | ❌ None | Pure local file parsing |
| OCR | ❌ None | WASM engine, no network |
| Summarization | ❌ None | Local TF-IDF algorithm |
| Image conversion | ❌ None | Pre-built native bindings |
| Web scrape | ✅ User-initiated only | Only fetches URLs you provide |

No telemetry. No analytics. No phoning home. Ever.

Tools Reference

Document → Markdown

| Tool | Input | Output |
|------|-------|--------|
| pdf_to_md | { input_path, page_start?, page_end?, max_chars? } | Chunked Markdown + metadata |
| docx_to_md | { input_path, output_path? } | Markdown content or file |
| xlsx_to_md | { input_path, sheet_name?, output_path? } | Markdown tables |
| pptx_to_md | { input_path, output_path? } | Markdown with slide text + notes |
| html_to_md | { input_path \| html_string, output_path? } | Markdown content or file |
| csv_to_json | { input_path, output_path? } | JSON array |
| rtf_to_md | { input_path, output_path? } | Markdown content or file |
| epub_to_md | { input_path, chapter?, output_path? } | Markdown with chapter support |
| ocr_image | { input_path, language? } | Extracted text |

Markdown → Human Formats

| Tool | Input | Output |
|------|-------|--------|
| md_to_pdf | { input_path \| md_string, output_path?, css_path? } | Styled PDF |
| md_to_html | { input_path \| md_string, output_path? } | HTML content or file |
| md_to_docx | { input_path \| md_string, output_path? } | Word DOCX |
| json_to_csv | { input_path, output_path? } | CSV content or file |

Research Tools

| Tool | Input | Output |
|------|-------|--------|
| web_scrape | { url, format: "text" \| "markdown" \| "html" } | Clean web content |
| summarize_text | { text, sentence_count? } | Summary |
| manage_citations | { action, format?, ...data } | Formatted citations |

Image Tools

| Tool | Input | Output |
|------|-------|--------|
| convert_image | { input_path, output_format, width?, height?, quality? } | Converted image file |

Large Document Handling

All document-to-text tools support chunked responses:

{
  "content": "extracted text...",
  "metadata": {
    "total_pages": 200,
    "page_start": 1,
    "page_end": 25,
    "has_more": true,
    "total_chars": 500000,
    "chunk_chars": 50000
  }
}
  • Default chunk: 50,000 characters
  • Request specific page ranges with page_start / page_end
  • Use output_path to write the full document without chunking
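The has_more / page_end metadata lets a client page through a large document chunk by chunk. A minimal TypeScript sketch of that loop, with the actual MCP call mocked out (PdfToMd is a hypothetical stand-in for invoking the pdf_to_md tool, not part of scmcp's API):

```typescript
// Metadata shape returned alongside each chunk (mirrors the JSON above).
interface ChunkMetadata {
  total_pages: number;
  page_start: number;
  page_end: number;
  has_more: boolean;
}

interface ChunkResult {
  content: string;
  metadata: ChunkMetadata;
}

// Hypothetical stand-in for a real pdf_to_md tool invocation.
type PdfToMd = (input_path: string, page_start: number) => ChunkResult;

// Page through a document by following has_more / page_end until done.
function readAllChunks(call: PdfToMd, path: string): string {
  const parts: string[] = [];
  let page = 1;
  for (;;) {
    const { content, metadata } = call(path, page);
    parts.push(content);
    if (!metadata.has_more) break;
    page = metadata.page_end + 1; // resume after the last page returned
  }
  return parts.join("");
}
```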

Usage Examples

Convert a PDF so the agent can read it:

"Read the PDF at C:\docs\report.pdf and summarize it"

Create a Word doc from Markdown:

"Convert my notes.md file to a DOCX"

Scrape a webpage:

"Scrape https://example.com and give me the content as markdown"

Extract text from an image:

"OCR this screenshot at C:\images\whiteboard.png"

Generate a styled PDF:

"Take this markdown and create a PDF with nice formatting"

Development

git clone https://github.com/microsoft/scmcp.git
cd scmcp
npm install
npm run build    # Compile TypeScript
npm run dev      # Watch mode with tsx
npm start        # Run the server

License

MIT