@ketlark/hugin-mcp
v1.0.0
Hugin MCP — 100% local, 100% free MCP server for web search and web reading. No API keys needed. Self-hosted alternative to Tavily, Exa, Jina Reader, Firecrawl, and Brave Search API.
Two tools. That's all your agent needs.
```
web_search("rust async tutorial")    → 10 results from 70+ engines
web_read("https://github.com/...")   → clean markdown, zero noise
```

Hugin runs a local SearXNG metasearch engine and 14 specialized page readers. It is named after Odin's raven, who scouted the world each morning and came back with answers.
Install
Step 1 — Docker
SearXNG runs in Docker and aggregates 70+ search engines (Google, Bing, DuckDuckGo, Brave, Startpage…). Without it, Hugin falls back to Bing alone: search still works, but results come from a single engine.
```bash
# Make sure Docker is running, then:
docker compose up -d
```

No Docker? Install Docker Desktop first.
Step 2 — MCP client
Pick your client below.
Claude Code:

```bash
claude mcp add hugin-mcp -- npx -y @ketlark/hugin-mcp@latest
```

Claude Desktop: edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
```json
{
  "mcpServers": {
    "hugin-mcp": {
      "command": "npx",
      "args": ["-y", "@ketlark/hugin-mcp@latest"]
    }
  }
}
```

Cursor: create or edit .cursor/mcp.json at the project root:
```json
{
  "mcpServers": {
    "hugin-mcp": {
      "command": "npx",
      "args": ["-y", "@ketlark/hugin-mcp@latest"]
    }
  }
}
```

Other MCP clients: same pattern, pointing to npx -y @ketlark/hugin-mcp@latest:
```json
{
  "mcpServers": {
    "hugin-mcp": {
      "command": "npx",
      "args": ["-y", "@ketlark/hugin-mcp@latest"]
    }
  }
}
```

Step 3 — Verify (optional)
```bash
npx @ketlark/hugin-mcp setup
```

This checks Docker, SearXNG, and Chrome, then prints a status report.
Tools
web_search
```json
{
  "query": "rust async await tutorial",
  "count": 10,
  "engine": "auto",
  "categories": "general",
  "language": "en",
  "time_range": "month"
}
```

| Parameter | Default | Description |
|---|---|---|
| query | required | Search query. Supports site:, "exact", -exclude |
| count | 10 | Max results (1–20) |
| engine | "auto" | "auto", "searxng", or "bing" |
| categories | — | general, news, images, videos, it, science, music, files, social media |
| language | auto | en, fr, de, es, ja, zh, etc. |
| time_range | — | day, month, year |
| pageno | 1 | Page number |
Results are cached for 24 hours.
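An MCP client calls these tools through the protocol's JSON-RPC `tools/call` method, passing the table's fields as `arguments`. The sketch below builds such a request for web_search and enforces the documented limits before sending it. `buildWebSearchCall` is a hypothetical helper, and clamping an out-of-range `count` (rather than rejecting it) is an assumed policy, not Hugin's own validation logic.

```javascript
// Hypothetical helper (not Hugin's code): builds an MCP JSON-RPC `tools/call`
// request for web_search, enforcing the limits from the parameter table.
const ENGINES = new Set(["auto", "searxng", "bing"]);
const TIME_RANGES = new Set(["day", "month", "year"]);

function buildWebSearchCall(id, params) {
  if (typeof params.query !== "string" || params.query.trim() === "") {
    throw new Error("query is required");
  }
  const count = Number(params.count ?? 10);
  if (!Number.isFinite(count)) throw new Error("count must be a number");

  const args = {
    query: params.query,
    // Assumption: clamp into the documented 1–20 range instead of rejecting.
    count: Math.min(20, Math.max(1, Math.trunc(count))),
    engine: params.engine ?? "auto",
    pageno: params.pageno ?? 1,
  };
  if (!ENGINES.has(args.engine)) throw new Error(`unknown engine: ${args.engine}`);

  if (params.time_range !== undefined) {
    if (!TIME_RANGES.has(params.time_range)) {
      throw new Error(`unsupported time_range: ${params.time_range}`);
    }
    args.time_range = params.time_range;
  }
  // Optional fields are passed through untouched.
  for (const key of ["categories", "language"]) {
    if (params[key] !== undefined) args[key] = params[key];
  }

  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "web_search", arguments: args },
  };
}

// Usage: a count of 50 is clamped to 20.
const request = buildWebSearchCall(1, {
  query: "rust async await tutorial",
  count: 50,
  time_range: "month",
});
console.log(JSON.stringify(request, null, 2));
```

Validating on the client side like this only saves a round trip; the server remains the authority on which values it accepts.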
web_read
Single URL:
```json
{ "url": "https://github.com/microsoft/typescript/issues/1" }
```

Batch mode reads multiple URLs in parallel:

```json
{
  "urls": [
    "https://github.com/owner/repo",
    "https://www.reddit.com/r/programming",
    "https://en.wikipedia.org/wiki/Rust_(programming_language)"
  ]
}
```

| Parameter | Default | Description |
|---|---|---|
| url | — | Single URL to read |
| urls | — | Multiple URLs for batch parallel reads |
| format | "markdown" | "markdown" or "text" |
| llm | false | Use ReaderLM-v2 for higher quality (slower, requires LM Studio) |
| with_links_summary | false | Extract link summary |
| with_images_summary | false | Extract image summary |
| max_length | — | Truncate content to N characters |
Content is cached for 24 hours.
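One plausible shape for a batch read, assuming a per-URL async reader named `readOne` (hypothetical) and assuming that one failed URL should not sink the whole batch, is `Promise.allSettled` plus per-result `max_length` truncation. This is a sketch of the idea, not Hugin's batch handler.

```javascript
// Sketch of a batch reader. `readOne(url)` is a hypothetical async function
// that resolves to the page content as a string (e.g. markdown).
async function readBatch(urls, readOne, { maxLength } = {}) {
  // allSettled waits for every URL, even when some of them reject.
  const settled = await Promise.allSettled(urls.map((url) => readOne(url)));
  return settled.map((outcome, i) => {
    if (outcome.status === "rejected") {
      const reason = outcome.reason;
      return { url: urls[i], ok: false, error: String(reason?.message ?? reason) };
    }
    let content = outcome.value;
    // max_length: truncate the content to N characters when it is set.
    if (typeof maxLength === "number" && content.length > maxLength) {
      content = content.slice(0, maxLength);
    }
    return { url: urls[i], ok: true, content };
  });
}

// Usage with a fake reader: the second URL fails, the first still succeeds.
const fakeRead = async (url) => {
  if (url.includes("fail")) throw new Error("HTTP 403");
  return `# Page at ${url}`;
};
readBatch(["https://a.example", "https://fail.example"], fakeRead, { maxLength: 8 })
  .then((results) => console.log(results));
```

Returning a per-URL `ok` flag keeps partial results usable: an agent can work with the pages that loaded and retry or skip the rest.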
Why Hugin
14 specialized readers. Hugin detects the site you're reading and uses a dedicated API instead of generic HTML scraping. GitHub issues → REST API. YouTube → transcript API. Wikipedia → MediaWiki. Cleaner output, faster responses, fewer rate limits.
| Site | Method | Auth? |
|---|---|---|
| GitHub (issues, PRs, repos, files) | REST API | No (60 req/h) |
| Reddit (posts, comments, subreddits) | JSON API | No |
| YouTube (transcripts with timestamps) | Innertube API | No |
| HackerNews (stories, comments) | Firebase API | No |
| StackExchange (300+ Q&A sites) | Public API | No |
| Wikipedia | MediaWiki API | No |
| ArXiv (papers, abstract) | HTML scraping | No |
| MDN Web Docs | index.json API | No |
| npm packages | Registry API | No |
| Docker Hub (images, tags) | v2 API | No |
| PDF files | pdf-parse | — |
| Any other page | Readability + Turndown + Puppeteer fallback | — |
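Routing comes down to inspecting the URL's hostname and path, then rewriting the URL into the matching API call where one exists. The sketch below mirrors part of the table; the `routeUrl` name, the reader labels, and the rule order are assumptions, not the contents of `readers/index.js`. The rewrites target documented public endpoints: GitHub's `GET /repos/{owner}/{repo}/issues/{number}`, Reddit's `.json` suffix, and the HackerNews Firebase `item/{id}.json` endpoint.

```javascript
// Illustrative URL → reader router; reader names and rules are assumptions,
// not the contents of Hugin's readers/index.js.
function routeUrl(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.replace(/^www\./, "");
  const parts = url.pathname.split("/").filter(Boolean);
  const route = (reader, apiUrl = null) => ({ reader, apiUrl });

  if (host === "github.com") {
    const [owner, repo, kind, num] = parts;
    if (!owner || !repo) return route("github");
    const base = `https://api.github.com/repos/${owner}/${repo}`;
    if (kind === "issues" && num) return route("github", `${base}/issues/${num}`);
    if (kind === "pull" && num) return route("github", `${base}/pulls/${num}`);
    if (!kind) return route("github", base);
    return route("github"); // files, trees, etc. need other endpoints
  }
  if (host === "reddit.com" || host.endsWith(".reddit.com")) {
    // Reddit returns JSON when ".json" is appended to a listing or post path.
    const path = url.pathname.replace(/\/+$/, "");
    return route("reddit", `https://www.reddit.com${path || "/"}.json`);
  }
  if (host === "news.ycombinator.com" && url.searchParams.has("id")) {
    const id = url.searchParams.get("id");
    return route("hackernews", `https://hacker-news.firebaseio.com/v0/item/${id}.json`);
  }
  if (host === "youtu.be" || host === "youtube.com" || host.endsWith(".youtube.com")) {
    return route("youtube");
  }
  if (host === "stackoverflow.com" || host.endsWith(".stackexchange.com")) {
    return route("stackexchange");
  }
  if (host === "wikipedia.org" || host.endsWith(".wikipedia.org")) return route("wikipedia");
  if (host === "arxiv.org") return route("arxiv");
  if (host === "developer.mozilla.org") return route("mdn");
  if (host === "npmjs.com") return route("npm");
  if (host === "hub.docker.com") return route("dockerhub");
  if (url.pathname.toLowerCase().endsWith(".pdf")) return route("pdf");
  return route("generic"); // Readability + Turndown, Puppeteer as fallback
}
```

For example, `routeUrl("https://github.com/microsoft/typescript/issues/1").apiUrl` yields `https://api.github.com/repos/microsoft/typescript/issues/1`. Sites without a mapped endpoint in this sketch return `apiUrl: null`, leaving the fetch details to each reader.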
SQLite cache. Every search result and page read is cached for 24 hours, so repeat requests for the same query or page within that window are answered from the cache instead of being fetched again.
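The 24-hour rule reduces to storing a timestamp with each entry and treating anything older than the TTL as a miss. Hugin keeps its entries in SQLite; the sketch below swaps in an in-memory `Map` so the expiry logic stands on its own. The `TtlCache` class, the `cacheKey` scheme, and the injectable clock are assumptions for illustration.

```javascript
// TTL cache sketch. A Map stands in for Hugin's SQLite table; names are hypothetical.
class TtlCache {
  constructor(ttlSeconds = 86400, now = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000; // HUGIN_CACHE_TTL is expressed in seconds
    this.now = now;                 // injectable clock keeps expiry testable
    this.entries = new Map();       // key -> { value, storedAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // stale: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

// Hypothetical key scheme: tool name plus arguments with sorted keys,
// so { a, b } and { b, a } map to the same entry.
function cacheKey(tool, args) {
  const sorted = Object.keys(args).sort().map((k) => [k, args[k]]);
  return `${tool}:${JSON.stringify(sorted)}`;
}
```

In SQLite the same rule could become a `stored_at` column compared against the current time in the lookup query, with stale rows deleted lazily or on a schedule.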
Zero config. Chrome auto-detection across macOS, Linux, Windows (Chrome, Chromium, Brave, Edge). No browser found? Puppeteer features turn off — everything else works.
Graceful fallback chain. SearXNG down? Bing takes over. Readability fails? Puppeteer renders the page. Puppeteer blocked? ReaderLM-v2 can pick it up (if you have LM Studio running).
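Every link in that chain follows one pattern: try a source, and on failure move to the next. A minimal sketch of the pattern, assuming each source is an async function; `searchSearxng` and `searchBing` are hypothetical stand-ins, not Hugin's modules.

```javascript
// Try each async source in order and return the first success.
// Errors are collected so a total failure still explains what went wrong.
async function firstSuccessful(sources, ...args) {
  const errors = [];
  for (const { name, run } of sources) {
    try {
      const result = await run(...args);
      return { source: name, result };
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`all sources failed (${errors.join("; ")})`);
}

// Stand-ins: SearXNG is "down", so Bing answers.
const searchSearxng = async () => { throw new Error("connection refused"); };
const searchBing = async (q) => [`bing result for ${q}`];

firstSuccessful(
  [{ name: "searxng", run: searchSearxng }, { name: "bing", run: searchBing }],
  "rust async tutorial",
).then(({ source, result }) => console.log(source, result));
```

Trying sources one at a time, rather than racing them, keeps the preferred source first and sends nothing to Bing while SearXNG is healthy.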
vs Others
| | Hugin | Tavily | Exa | Jina | Brave | Firecrawl |
|---|---|---|---|---|---|---|
| Cost | $0 | Freemium | Paid | Freemium | $5/mo | Freemium |
| 100% local | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| No API key | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Web search | ✅ 70+ engines | ✅ | ✅ neural | ✅ | ✅ | ✅ |
| Page reading | ✅ 14 handlers | ✅ | ✅ | ✅ | ❌ | ✅ |
| Batch reads | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Cache | ✅ SQLite | ❌ | ❌ | ❌ | ❌ | ❌ |
| Data privacy | Stays on your machine | Sent to Tavily | Sent to Exa | Sent to Jina | Sent to Brave | Sent to Firecrawl |
Every competitor sends your queries or page content to a cloud API. Hugin doesn't.
Configuration
Defaults work out of the box. Override with environment variables or a .env file:
| Variable | Default | What it does |
|---|---|---|
| HUGIN_SEARXNG_URL | http://localhost:8888 | SearXNG instance URL |
| HUGIN_SEARXNG_PORT | 8888 | Port for auto-started SearXNG container |
| HUGIN_LMSTUDIO_URL | http://localhost:1234 | LM Studio endpoint (ReaderLM) |
| HUGIN_READERLM_MODEL | readerlm-v2-mlx | ReaderLM model name |
| CHROME_PATH | auto-detected | Chrome/Chromium executable |
| HUGIN_PUPPETEER_TIMEOUT | 15000 | Puppeteer timeout (ms) |
| HUGIN_CACHE_DIR | .cache | SQLite cache directory |
| HUGIN_CACHE_TTL | 86400 | Cache TTL in seconds (24h) |
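Resolving these settings amounts to reading each variable and falling back to the documented default. A sketch, assuming any `.env` file has already been loaded into `process.env` (for example by a dotenv-style loader); the `loadConfig` name, the returned field names, and the strict integer parsing are assumptions.

```javascript
// Hypothetical config loader mirroring the defaults in the table above.
function loadConfig(env = process.env) {
  // Parse an integer variable, falling back to the default when unset.
  const int = (name, fallback) => {
    const raw = env[name];
    if (raw === undefined || raw === "") return fallback;
    const n = Number.parseInt(raw, 10);
    if (!Number.isFinite(n)) throw new Error(`${name} must be an integer, got "${raw}"`);
    return n;
  };
  return {
    searxngUrl: env.HUGIN_SEARXNG_URL || "http://localhost:8888",
    searxngPort: int("HUGIN_SEARXNG_PORT", 8888),
    lmStudioUrl: env.HUGIN_LMSTUDIO_URL || "http://localhost:1234",
    readerLmModel: env.HUGIN_READERLM_MODEL || "readerlm-v2-mlx",
    chromePath: env.CHROME_PATH || null, // null means "run auto-detection"
    puppeteerTimeoutMs: int("HUGIN_PUPPETEER_TIMEOUT", 15000),
    cacheDir: env.HUGIN_CACHE_DIR || ".cache",
    cacheTtlSeconds: int("HUGIN_CACHE_TTL", 86400),
  };
}

console.log(loadConfig({ HUGIN_CACHE_TTL: "3600" }).cacheTtlSeconds);
```

Failing loudly on a malformed number avoids silently running with a NaN timeout. In MCP client configs, such overrides typically go in an `"env"` object alongside `"command"` and `"args"`.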
Chrome auto-detection searches these locations:

| Platform | Search paths |
|---|---|
| macOS | Google Chrome, Chromium, Brave Browser, Microsoft Edge |
| Linux | /usr/bin/google-chrome, /usr/bin/chromium, /usr/bin/brave-browser, /snap/bin/chromium |
| Windows | C:\Program Files\Google\Chrome\Application\chrome.exe, Brave, Edge |
Set CHROME_PATH to override. No browser found = Puppeteer disabled, everything else works.
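Detection walks a per-platform list of candidate paths and takes the first one that exists, with CHROME_PATH winning outright. A sketch, assuming the common install locations below (the Linux entries come from the table; the macOS and Windows paths are standard defaults, not necessarily Hugin's exact list). The existence check is passed in to keep the function pure; in Node you would pass `fs.existsSync` from `node:fs`.

```javascript
// Candidate browser paths per platform (common install locations; an assumption,
// not necessarily the exact list in Hugin's config.js).
const CANDIDATES = {
  darwin: [
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "/Applications/Chromium.app/Contents/MacOS/Chromium",
    "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
    "/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
  ],
  linux: ["/usr/bin/google-chrome", "/usr/bin/chromium", "/usr/bin/brave-browser", "/snap/bin/chromium"],
  win32: [
    "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    "C:\\Program Files\\BraveSoftware\\Brave-Browser\\Application\\brave.exe",
    "C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe",
  ],
};

// Returns an executable path, or null when nothing is found (Puppeteer disabled).
// `exists` is injected, e.g. fs.existsSync.
function findChrome(platform, env, exists) {
  if (env.CHROME_PATH) return env.CHROME_PATH; // explicit override is trusted as-is
  for (const candidate of CANDIDATES[platform] ?? []) {
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

In a real process the call would be `findChrome(process.platform, process.env, fs.existsSync)`; a `null` result is the signal to skip Puppeteer rather than fail.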
For complex pages where Readability struggles (heavy JS, nested layouts):
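- Install LM Studio
- Download ReaderLM-v2 (quantized, e.g. readerlm-v2-q8-mlx for Apple Silicon); if its model name differs from the default readerlm-v2-mlx, set HUGIN_READERLM_MODEL to match
- Load the model and start the server
- Pass "llm": true in web_read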
Without ReaderLM, Hugin works fine via Readability + Turndown for the vast majority of pages.
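LM Studio serves loaded models through an OpenAI-compatible HTTP API, so a ReaderLM call can be an ordinary chat-completions request to `HUGIN_LMSTUDIO_URL`. The sketch below separates building the request from sending it; the prompt wording, `temperature: 0`, and both function names are assumptions, not what Hugin's `llm.js` sends.

```javascript
// Builds a chat-completions request for LM Studio's OpenAI-compatible server.
// The instruction text and temperature are assumptions for illustration.
function buildReaderLmRequest(html, {
  baseUrl = "http://localhost:1234",
  model = "readerlm-v2-mlx",
} = {}) {
  const instruction = "Extract the main content from the given HTML and convert it to Markdown format.";
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: `${instruction}\n\n${html}` }],
        temperature: 0, // deterministic output suits extraction
      }),
    },
  };
}

// Sending it requires LM Studio running with the model loaded (Node 18+ for fetch).
async function readWithReaderLm(html, opts) {
  const { url, init } = buildReaderLmRequest(html, opts);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`LM Studio returned HTTP ${res.status}`);
  const data = await res.json();
  // OpenAI-style responses put the generated text in choices[0].message.content.
  return data.choices[0].message.content;
}
```

Keeping request construction separate from the network call makes the payload easy to inspect and test without a running model.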
Troubleshooting
| Symptom | Fix |
|---|---|
| SearXNG unavailable — Bing fallback | Run docker compose up -d or start Docker Desktop |
| Docker is not installed | Install Docker Desktop |
| Docker daemon is not running | Start Docker Desktop or run sudo systemctl start docker |
| No Chrome/Chromium found | Set CHROME_PATH or install Chrome/Chromium |
| better-sqlite3 build fails | Install python3 + build-essential (Linux) or Xcode CLI tools (macOS) |
| Cloudflare 403 | Install Chrome for Puppeteer fallback |
Architecture
```
src/
├── index.js          # MCP entry point
├── setup.js          # Setup command (npx @ketlark/hugin-mcp setup)
├── config.js         # Env vars + platform detection
├── cache.js          # SQLite WAL cache
├── fetcher.js        # HTTP fetch (retry, rate-limit) + Puppeteer
├── html.js           # Readability + Turndown
├── llm.js            # ReaderLM-v2 client
├── format.js         # Response formatters
├── search/
│   ├── searxng.js    # SearXNG client + auto-start + Docker detection
│   └── bing.js       # Bing fallback
└── readers/
    ├── index.js          # URL → reader router
    ├── github.js         # GitHub REST API
    ├── reddit.js         # Reddit JSON API
    ├── youtube.js        # YouTube transcripts
    ├── hackernews.js     # HN Firebase API
    ├── stackexchange.js  # 300+ Q&A sites
    ├── wikipedia.js      # MediaWiki API
    ├── arxiv.js          # ArXiv metadata
    ├── mdn.js            # MDN JSON API
    ├── npm.js            # npm registry
    ├── dockerhub.js      # Docker Hub v2
    └── pdf.js            # PDF text extraction
```

Reading pipeline:
```
URL → cache? → specialized reader? → Readability → Turndown → markdown
│ │
│ └→ Puppeteer (SPA/403)
└→ ReaderLM (if llm=true)
```

License
Hugin depends on SearXNG, Mozilla Readability, Turndown, and Puppeteer.
