url-content-extractor-mcp
v1.0.5
MCP server for extracting content from URLs with proper citations
URL Content Extractor MCP Server
A Model Context Protocol server that extracts content from URLs and provides properly formatted citations. Perfect for AI assistants that need to access and cite web content.
🚀 Quick Start
Install and Run with uvx/npx
# Run directly with uvx (recommended)
uvx url-content-extractor-mcp
# Or with npx
npx url-content-extractor-mcp
Install Globally
npm install -g url-content-extractor-mcp
url-content-extractor-mcp
🔌 MCP Client Configuration
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "url-content-extractor": {
      "command": "uvx",
      "args": ["url-content-extractor-mcp"]
    }
  }
}
Continue.dev
Add to your MCP configuration:
{
  "mcpServers": {
    "url-content-extractor": {
      "command": "npx",
      "args": ["url-content-extractor-mcp"]
    }
  }
}
🛠️ Features
- Multiple URL processing: Extract content from multiple URLs in one call
- Citation formats: APA, MLA, and Simple citation styles
- Smart content extraction: Focuses on main content, removes navigation/ads
- Domain filtering: Allow/block specific domains for security
- Metadata extraction: Title, author, publication date, description
- Error handling: Graceful handling of failed URLs
- TypeScript: Full type safety and modern JavaScript features
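The multi-URL processing and graceful error handling listed above could look roughly like the following. This is a minimal sketch, not the package's actual code: `Fetcher`, `Outcome`, `extractAll`, and `summarize` are hypothetical names, and the fetch function is injected so the example runs without network access.

```typescript
// Hypothetical sketch: each URL is handled independently, so one failed
// fetch never aborts the rest of the batch.
type Fetcher = (url: string) => Promise<string>;

type Outcome =
  | { url: string; ok: true; content: string }
  | { url: string; ok: false; reason: string };

async function extractAll(urls: string[], fetchText: Fetcher): Promise<Outcome[]> {
  return Promise.all(
    urls.map(async (url): Promise<Outcome> => {
      try {
        return { url, ok: true, content: await fetchText(url) };
      } catch (err) {
        // Record the failure instead of rethrowing it.
        const reason = err instanceof Error ? err.message : String(err);
        return { url, ok: false, reason };
      }
    }),
  );
}

// Builds a status line in the same shape as the "Processed: ..." line
// in the example output.
function summarize(outcomes: Outcome[]): string {
  const ok = outcomes.filter((o) => o.ok).length;
  return `Processed: ${ok} successful, ${outcomes.length - ok} failed`;
}
```

Catching errors per URL, rather than around the whole batch, is what lets the result report partial success.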
📖 Usage
The server provides one tool: extract_url_content
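MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `extract_url_content`. The argument name `urls` is an assumption, since this README does not document the tool's input schema; a `tools/list` request against the running server should return the real one. `buildExtractRequest` is a hypothetical helper.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// ASSUMPTION: the tool accepts an array under the key `urls`.
// Verify against the schema returned by `tools/list` before relying on it.
function buildExtractRequest(id: number, urls: string[]): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract_url_content", arguments: { urls } },
  };
}

const request = buildExtractRequest(1, ["https://example.com/article"]);
console.log(JSON.stringify(request, null, 2));
```

In normal use the AI assistant issues this call itself; the prompts below only need to mention the URLs.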
Single URL
Extract content from: https://example.com/article
Multiple URLs
Compare these articles: https://site1.com/news, https://site2.com/blog
Example Output
🌐 Web Content Extraction Results
Processed: 1 successful, 0 failed
## 📄 Extracted Content
**📄 Document 1: Breaking News Article**
**Source:** https://example.com/article
**Domain:** example.com
**Author:** Jane Reporter
**Published:** 2024-07-04
**Citation:** Jane Reporter. Breaking News Article. https://example.com/article
**Content:**
[Full article content here...]
## 📖 Citation Summary
1. Jane Reporter. Breaking News Article. https://example.com/article
⚙️ Configuration
The server includes sensible defaults but can be customized by modifying the source:
- Max content length: 15,000 characters
- Min content length: 500 characters
- Timeout: 15 seconds
- Max URLs per call: 5
- Citation style: Simple (configurable to APA/MLA)
- Blocked domains: localhost, 127.0.0.1, 0.0.0.0
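As an illustration, the defaults above could be represented as a typed configuration object like the one below. The values come from the list; the field names, `ExtractorConfig`, `DEFAULT_CONFIG`, and `withOverrides` are hypothetical and may not match the identifiers in the actual source.

```typescript
// Hypothetical shape for the documented defaults; real identifiers may differ.
type CitationStyle = "simple" | "apa" | "mla";

interface ExtractorConfig {
  maxContentLength: number; // characters
  minContentLength: number; // characters
  timeoutMs: number; // milliseconds
  maxUrlsPerCall: number;
  citationStyle: CitationStyle;
  blockedDomains: string[];
}

const DEFAULT_CONFIG: ExtractorConfig = {
  maxContentLength: 15_000,
  minContentLength: 500,
  timeoutMs: 15_000, // 15 seconds
  maxUrlsPerCall: 5,
  citationStyle: "simple",
  blockedDomains: ["localhost", "127.0.0.1", "0.0.0.0"],
};

// Returns a new config with selected defaults replaced.
function withOverrides(overrides: Partial<ExtractorConfig>): ExtractorConfig {
  return { ...DEFAULT_CONFIG, ...overrides };
}

// Example: switch to APA citations, keep everything else.
const apaConfig = withOverrides({ citationStyle: "apa" });
console.log(apaConfig.citationStyle, apaConfig.timeoutMs);
```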
🔒 Security
- Domain filtering prevents access to local/internal resources
- Request timeouts prevent hanging
- Content length limits prevent memory issues
- No execution of JavaScript from scraped pages
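A domain filter of the kind described above might be sketched as follows. This is an assumption about the approach, not the server's actual implementation: `isAllowedUrl` is a hypothetical helper, and the blocklist is copied from the defaults in the Configuration section.

```typescript
// Hypothetical sketch of a hostname blocklist check.
const BLOCKED_DOMAINS = ["localhost", "127.0.0.1", "0.0.0.0"];

function isAllowedUrl(rawUrl: string, blocked: string[] = BLOCKED_DOMAINS): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Only fetch over HTTP(S); rejects file:, ftp:, data:, and similar schemes.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return false;
  }
  // Exact hostname match against the blocklist (port and path are ignored).
  return !blocked.includes(parsed.hostname.toLowerCase());
}

console.log(isAllowedUrl("https://example.com/article"));
console.log(isAllowedUrl("http://localhost:3000/admin"));
```

An exact-match list like this does not cover private IP ranges such as 10.0.0.0/8 or hostnames that resolve to internal addresses, so a stricter deployment would likely need additional checks.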
📋 Requirements
- Node.js 18.0.0 or higher
- Internet connection for URL fetching
🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
📄 License
MIT License - see LICENSE file for details.
🐛 Issues
Report issues on GitHub Issues.
