@ketlark/hugin-mcp

v1.0.0

Hugin MCP — 100% local, 100% free MCP server for web search and web reading. No API keys needed. Self-hosted alternative to Tavily, Exa, Jina Reader, Firecrawl, and Brave Search API.

Two tools. That's all your agent needs.

web_search("rust async tutorial")     → 10 results from 70+ engines
web_read("https://github.com/...")    → clean markdown, zero noise

Hugin runs a local SearXNG metasearch engine and 14 specialized page readers. Named after Odin's raven who scouted the world each morning and came back with answers.
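For context, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request. Here is a minimal sketch of that message shape, assuming the standard MCP request format. The helper name `buildToolCall` is hypothetical and not part of Hugin; your MCP client normally builds this message for you.

```javascript
// Hypothetical helper: builds the JSON-RPC 2.0 message an MCP client
// sends to call a tool. Not part of Hugin's codebase.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, "web_search", { query: "rust async tutorial" });
console.log(JSON.stringify(request, null, 2));
```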

Install

Step 1 — Docker

SearXNG runs in Docker and aggregates 70+ search engines (Google, Bing, DuckDuckGo, Brave, Startpage…). Without it, Hugin falls back to Bing, which works but draws on a single engine.

# Make sure Docker is running, then:
docker compose up -d

No Docker? Install Docker Desktop first.

Step 2 — MCP client

Pick your client below.

Claude Code:

claude mcp add hugin-mcp -- npx -y @ketlark/hugin-mcp@latest

Claude Desktop: edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "hugin-mcp": {
      "command": "npx",
      "args": ["-y", "@ketlark/hugin-mcp@latest"]
    }
  }
}

Cursor: create .cursor/mcp.json at the project root:

{
  "mcpServers": {
    "hugin-mcp": {
      "command": "npx",
      "args": ["-y", "@ketlark/hugin-mcp@latest"]
    }
  }
}

Other MCP clients follow the same pattern; point them to npx -y @ketlark/hugin-mcp@latest:

{
  "mcpServers": {
    "hugin-mcp": {
      "command": "npx",
      "args": ["-y", "@ketlark/hugin-mcp@latest"]
    }
  }
}

Step 3 — Verify (optional)

npx @ketlark/hugin-mcp setup

This checks Docker, SearXNG, and Chrome, then prints a status report.

Tools

`web_search`

{
  "query": "rust async await tutorial",
  "count": 10,
  "engine": "auto",
  "categories": "general",
  "language": "en",
  "time_range": "month"
}

| Parameter | Default | Description |
|---|---|---|
| query | required | Search query. Supports site:, "exact", -exclude |
| count | 10 | Max results (1–20) |
| engine | "auto" | "auto", "searxng", or "bing" |
| categories | — | general, news, images, videos, it, science, music, files, social media |
| language | auto | en, fr, de, es, ja, zh, etc. |
| time_range | — | day, month, year |
| pageno | 1 | Page number |

Results are cached for 24 hours.
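To illustrate how the parameters above could be applied, here is a minimal sketch that fills in the documented defaults, clamps `count` to 1–20, and builds a SearXNG JSON query URL. The function names `normalizeSearchArgs` and `buildSearxngUrl` are hypothetical, and the sketch assumes the SearXNG instance has the `json` output format enabled. It is not Hugin's actual implementation.

```javascript
// Hypothetical sketch: apply the documented defaults and limits.
function normalizeSearchArgs(args) {
  if (typeof args.query !== "string" || args.query.trim() === "") {
    throw new Error("query is required");
  }
  const rawCount = Number.isFinite(args.count) ? Math.trunc(args.count) : 10;
  return {
    query: args.query.trim(),
    count: Math.min(20, Math.max(1, rawCount)), // documented range: 1–20
    engine: args.engine ?? "auto",
    categories: args.categories, // optional, no default
    language: args.language,     // optional, auto-detected when omitted
    time_range: args.time_range, // optional, no default
    pageno: args.pageno ?? 1,
  };
}

// Builds a SearXNG /search URL. `count` is not sent because it is
// assumed to be applied client-side by trimming the result list.
function buildSearxngUrl(baseUrl, args) {
  const url = new URL("/search", baseUrl);
  url.searchParams.set("q", args.query);
  url.searchParams.set("format", "json");
  url.searchParams.set("pageno", String(args.pageno));
  for (const key of ["categories", "language", "time_range"]) {
    if (args[key]) url.searchParams.set(key, args[key]);
  }
  return url.toString();
}

const normalized = normalizeSearchArgs({ query: "rust async", count: 50, time_range: "month" });
console.log(buildSearxngUrl("http://localhost:8888", normalized));
```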

`web_read`

Single URL:

{ "url": "https://github.com/microsoft/typescript/issues/1" }

Batch — read multiple URLs in parallel:

{
  "urls": [
    "https://github.com/owner/repo",
    "https://www.reddit.com/r/programming",
    "https://en.wikipedia.org/wiki/Rust_(programming_language)"
  ]
}

| Parameter | Default | Description |
|---|---|---|
| url | — | Single URL to read |
| urls | — | Multiple URLs for batch parallel reads |
| format | "markdown" | "markdown" or "text" |
| llm | false | Use ReaderLM-v2 for higher quality (slower, requires LM Studio) |
| with_links_summary | false | Extract link summary |
| with_images_summary | false | Extract image summary |
| max_length | — | Truncate content to N characters |

Content is cached for 24 hours.
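As a rough illustration of the `url`/`urls` and `max_length` parameters, the sketch below resolves either form into a list, reads the pages in parallel, and truncates each result. All names here (`resolveUrls`, `truncate`, `readAll`, and the injected `readOne` function) are hypothetical stand-ins, not Hugin's internals.

```javascript
// Accept either `url` or `urls` and validate every entry.
function resolveUrls(args) {
  const list = args.urls ?? (args.url ? [args.url] : []);
  if (list.length === 0) throw new Error("url or urls is required");
  list.forEach((u) => new URL(u)); // throws TypeError on an invalid URL
  return list;
}

// Truncate to `maxLength` characters when a positive integer is given.
function truncate(content, maxLength) {
  if (!Number.isInteger(maxLength) || maxLength <= 0) return content;
  return content.length > maxLength ? content.slice(0, maxLength) : content;
}

// Read all URLs in parallel; one failure does not abort the batch.
// `readOne` is an injected async function (url) => string.
async function readAll(args, readOne) {
  const urls = resolveUrls(args);
  const settled = await Promise.allSettled(urls.map((u) => readOne(u)));
  return settled.map((r, i) =>
    r.status === "fulfilled"
      ? { url: urls[i], content: truncate(r.value, args.max_length) }
      : { url: urls[i], error: String(r.reason?.message ?? r.reason) }
  );
}
```

Using `Promise.allSettled` rather than `Promise.all` is a design choice in this sketch: a batch read still returns the pages that succeeded when one URL fails.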

Why Hugin

14 specialized readers. Hugin detects the site you're reading and uses a dedicated API instead of generic HTML scraping. GitHub issues → REST API. YouTube → transcript API. Wikipedia → MediaWiki. Cleaner output, faster responses, fewer rate limits.

| Site | Method | Auth? |
|---|---|---|
| GitHub (issues, PRs, repos, files) | REST API | No (60 req/h) |
| Reddit (posts, comments, subreddits) | JSON API | No |
| YouTube (transcripts with timestamps) | Innertube API | No |
| HackerNews (stories, comments) | Firebase API | No |
| StackExchange (300+ Q&A sites) | Public API | No |
| Wikipedia | MediaWiki API | No |
| ArXiv (papers, abstract) | HTML scraping | No |
| MDN Web Docs | index.json API | No |
| npm packages | Registry API | No |
| Docker Hub (images, tags) | v2 API | No |
| PDF files | pdf-parse | — |
| Any other page | Readability + Turndown + Puppeteer fallback | — |
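The site detection described above can be pictured as a small routing table keyed on the URL's hostname and path. The following is a minimal sketch covering a few of the listed sites. The `ROUTES` table and `pickReader` function are hypothetical, and the real matching rules in Hugin may differ.

```javascript
// Hypothetical routing table: each entry names a reader and a predicate
// over a parsed URL. The first match wins.
const ROUTES = [
  { name: "github", test: (u) => u.hostname === "github.com" },
  { name: "reddit", test: (u) => /(^|\.)reddit\.com$/.test(u.hostname) },
  { name: "youtube", test: (u) => /(^|\.)youtube\.com$/.test(u.hostname) || u.hostname === "youtu.be" },
  { name: "hackernews", test: (u) => u.hostname === "news.ycombinator.com" },
  { name: "wikipedia", test: (u) => /\.wikipedia\.org$/.test(u.hostname) },
  { name: "pdf", test: (u) => u.pathname.toLowerCase().endsWith(".pdf") },
];

// Returns the reader name, or "generic" for the Readability fallback.
function pickReader(rawUrl) {
  const url = new URL(rawUrl);
  const route = ROUTES.find((r) => r.test(url));
  return route ? route.name : "generic";
}

console.log(pickReader("https://github.com/microsoft/typescript/issues/1"));
console.log(pickReader("https://example.com/blog/post"));
```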

SQLite cache. Every search result and page read gets cached for 24 hours. Your agent won't re-fetch the same page twice in a session.
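The 24-hour expiry can be sketched as a time-to-live check on each lookup. The example below uses an in-memory `Map` as a stand-in for the SQLite store so it runs without dependencies. The `TtlCache` class and its key scheme are assumptions for illustration, not Hugin's actual schema.

```javascript
// In-memory stand-in for the SQLite cache (assumption for illustration).
class TtlCache {
  constructor(ttlSeconds = 86400, now = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now; // injectable clock, useful for testing
    this.entries = new Map();
  }

  // Sort argument keys so equivalent calls share one cache entry.
  static key(tool, args) {
    const sorted = Object.keys(args).sort().map((k) => [k, args[k]]);
    return `${tool}:${JSON.stringify(sorted)}`;
  }

  get(key) {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() - hit.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired: drop and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key, value) {
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

const cache = new TtlCache();
const cacheKey = TtlCache.key("web_read", { url: "https://example.com/" });
cache.set(cacheKey, "# Example");
console.log(cache.get(cacheKey));
```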

Zero config. Chrome auto-detection across macOS, Linux, Windows (Chrome, Chromium, Brave, Edge). No browser found? Puppeteer features turn off — everything else works.

Graceful fallback chain. SearXNG down? Bing takes over. Readability fails? Puppeteer renders the page. Puppeteer blocked? ReaderLM-v2 can pick it up (if you have LM Studio running).
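The search half of this chain (SearXNG first, then Bing) can be sketched as a loop over backends that moves on when one throws. The function `searchWithFallback` and the injected `backends` object are hypothetical, and the real error handling in Hugin may differ.

```javascript
// Try each backend in order; return the first successful result.
// `backends` maps an engine name to an async function (args) => results.
async function searchWithFallback(args, backends) {
  const order =
    !args.engine || args.engine === "auto" ? ["searxng", "bing"] : [args.engine];
  const errors = [];
  for (const name of order) {
    try {
      const results = await backends[name](args);
      return { engine: name, results };
    } catch (err) {
      errors.push(`${name}: ${err.message}`); // remember why it failed
    }
  }
  throw new Error(`all search backends failed (${errors.join("; ")})`);
}

// Usage with stub backends: SearXNG is "down", so Bing answers.
searchWithFallback(
  { query: "rust async tutorial", engine: "auto" },
  {
    searxng: async () => { throw new Error("connection refused"); },
    bing: async () => [{ title: "Async Rust", url: "https://example.com/" }],
  }
).then((out) => console.log(out.engine, out.results.length));
```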

vs Others

| | Hugin | Tavily | Exa | Jina | Brave | Firecrawl |
|---|---|---|---|---|---|---|
| Cost | $0 | Freemium | Paid | Freemium | $5/mo | Freemium |
| 100% local | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| No API key | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Web search | ✅ 70+ engines | ✅ | ✅ neural | ✅ | ✅ | ✅ |
| Page reading | ✅ 14 handlers | ✅ | ✅ | ✅ | ❌ | ✅ |
| Batch reads | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Cache | ✅ SQLite | ❌ | ❌ | ❌ | ❌ | ❌ |
| Data privacy | Stays on your machine | Sent to Tavily | Sent to Exa | Sent to Jina | Sent to Brave | Sent to Firecrawl |

Every competitor sends your queries or page content to a cloud API. Hugin doesn't.

Configuration

Defaults work out of the box. Override with environment variables or a .env file:

| Variable | Default | What it does |
|---|---|---|
| HUGIN_SEARXNG_URL | http://localhost:8888 | SearXNG instance URL |
| HUGIN_SEARXNG_PORT | 8888 | Port for auto-started SearXNG container |
| HUGIN_LMSTUDIO_URL | http://localhost:1234 | LM Studio endpoint (ReaderLM) |
| HUGIN_READERLM_MODEL | readerlm-v2-mlx | ReaderLM model name |
| CHROME_PATH | auto-detected | Chrome/Chromium executable |
| HUGIN_PUPPETEER_TIMEOUT | 15000 | Puppeteer timeout (ms) |
| HUGIN_CACHE_DIR | .cache | SQLite cache directory |
| HUGIN_CACHE_TTL | 86400 | Cache TTL in seconds (24h) |
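As a sketch of how these variables and defaults fit together, the function below reads them from an environment object and falls back to the documented values. The `loadConfig` name and the returned property names are hypothetical; only the variable names and defaults come from the table.

```javascript
// Reads Hugin's documented environment variables with their defaults.
// Property names on the returned object are illustrative only.
function loadConfig(env = process.env) {
  const int = (value, fallback) => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isNaN(n) ? fallback : n; // ignore missing or malformed values
  };
  return {
    searxngUrl: env.HUGIN_SEARXNG_URL ?? "http://localhost:8888",
    searxngPort: int(env.HUGIN_SEARXNG_PORT, 8888),
    lmStudioUrl: env.HUGIN_LMSTUDIO_URL ?? "http://localhost:1234",
    readerlmModel: env.HUGIN_READERLM_MODEL ?? "readerlm-v2-mlx",
    chromePath: env.CHROME_PATH ?? null, // null means auto-detect
    puppeteerTimeoutMs: int(env.HUGIN_PUPPETEER_TIMEOUT, 15000),
    cacheDir: env.HUGIN_CACHE_DIR ?? ".cache",
    cacheTtlSeconds: int(env.HUGIN_CACHE_TTL, 86400),
  };
}

// Example: shorten the cache TTL to one hour, keep everything else.
console.log(loadConfig({ HUGIN_CACHE_TTL: "3600" }));
```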

| Platform | Search paths |
|---|---|
| macOS | Google Chrome, Chromium, Brave Browser, Microsoft Edge |
| Linux | /usr/bin/google-chrome, /usr/bin/chromium, /usr/bin/brave-browser, /snap/bin/chromium |
| Windows | C:\Program Files\Google\Chrome\Application\chrome.exe, Brave, Edge |

Set CHROME_PATH to override. If no browser is found, Puppeteer is disabled and everything else still works.
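The detection order can be sketched as follows: an explicit CHROME_PATH wins, otherwise the first candidate path that exists on disk is used. This example uses the Linux paths from the table. The `findBrowser` function is hypothetical and only approximates the behavior described here.

```javascript
const fs = require("node:fs");

// Linux search paths as listed in the table above.
const LINUX_CANDIDATES = [
  "/usr/bin/google-chrome",
  "/usr/bin/chromium",
  "/usr/bin/brave-browser",
  "/snap/bin/chromium",
];

// Returns the browser executable path, or null when none is found
// (in which case Puppeteer features would be disabled).
function findBrowser(env = process.env, candidates = LINUX_CANDIDATES, exists = fs.existsSync) {
  if (env.CHROME_PATH) return env.CHROME_PATH; // explicit override wins
  return candidates.find((p) => exists(p)) ?? null;
}

console.log(findBrowser() ?? "no browser found, Puppeteer disabled");
```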

For complex pages where Readability struggles (heavy JS, nested layouts):

1. Install LM Studio
2. Download ReaderLM-v2 (quantized, e.g. readerlm-v2-q8-mlx for Apple Silicon)
3. Load the model and start the server
4. Pass "llm": true in web_read

Without ReaderLM, Hugin works fine via Readability + Turndown for the vast majority of pages.

Troubleshooting

| Symptom | Fix |
|---|---|
| SearXNG unavailable — Bing fallback | Run docker compose up -d or start Docker Desktop |
| Docker is not installed | Install Docker Desktop |
| Docker daemon is not running | Start Docker Desktop or run sudo systemctl start docker |
| No Chrome/Chromium found | Set CHROME_PATH or install Chrome/Chromium |
| better-sqlite3 build fails | Install python3 + build-essential (Linux) or Xcode CLI tools (macOS) |
| Cloudflare 403 | Install Chrome for Puppeteer fallback |

Architecture

src/
├── index.js              # MCP entry point
├── setup.js              # Setup command (npx @ketlark/hugin-mcp setup)
├── config.js             # Env vars + platform detection
├── cache.js              # SQLite WAL cache
├── fetcher.js            # HTTP fetch (retry, rate-limit) + Puppeteer
├── html.js               # Readability + Turndown
├── llm.js                # ReaderLM-v2 client
├── format.js             # Response formatters
├── search/
│   ├── searxng.js        # SearXNG client + auto-start + Docker detection
│   └── bing.js           # Bing fallback
└── readers/
    ├── index.js          # URL → reader router
    ├── github.js         # GitHub REST API
    ├── reddit.js         # Reddit JSON API
    ├── youtube.js        # YouTube transcripts
    ├── hackernews.js     # HN Firebase API
    ├── stackexchange.js  # 300+ Q&A sites
    ├── wikipedia.js      # MediaWiki API
    ├── arxiv.js          # ArXiv metadata
    ├── mdn.js            # MDN JSON API
    ├── npm.js            # npm registry
    ├── dockerhub.js      # Docker Hub v2
    └── pdf.js            # PDF text extraction

Reading pipeline:

URL → cache? → specialized reader? → Readability → Turndown → markdown
                     │                     │
                     │                     └→ Puppeteer (SPA/403)
                     └→ ReaderLM (if llm=true)
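One way to read the diagram above is as the control flow sketched below: check the cache, try a specialized reader, optionally hand off to ReaderLM when `llm` is set, and otherwise run Readability with Puppeteer as the fallback. This is an interpretation of the diagram, not Hugin's source. The `readPage` function and every member of the injected `steps` object are hypothetical.

```javascript
// `steps` bundles the pipeline stages as injected functions so the
// sketch stays self-contained. All names are illustrative.
async function readPage(url, opts, steps) {
  const cached = steps.cacheGet(url);
  if (cached !== undefined) return cached; // cache hit: skip all fetching

  // A specialized reader returns null when no site-specific handler matches.
  let markdown = await steps.specializedReader(url);

  if (markdown == null && opts.llm) {
    markdown = await steps.readerLm(url); // only when llm=true
  }
  if (markdown == null) {
    try {
      markdown = await steps.readabilityToMarkdown(url); // Readability + Turndown
    } catch {
      markdown = await steps.puppeteerToMarkdown(url); // e.g. SPA or HTTP 403
    }
  }
  steps.cacheSet(url, markdown);
  return markdown;
}
```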

License

MIT

Hugin depends on SearXNG, Mozilla Readability, Turndown, and Puppeteer.
