web-content-extract-mcp
v1.0.0
MCP server for extracting web content using the web-content-extract library
A Model Context Protocol (MCP) server that provides web content extraction capabilities using the web-content-extract library.
🚀 Features
- ✅ Clean Content Extraction: Extract main content from web pages using Mozilla Readability, filtering out ads, navigation, and other non-essential elements
- ✅ SEO Metadata Extraction: Comprehensive SEO metadata extraction, including:
  - Standard meta tags (title, description, keywords, author)
  - Open Graph metadata
  - Schema.org itemprop metadata
  - rel="author" links
  - <time> tags for publication dates
- ✅ Multiple Output Formats: Support for Markdown, YAML Front Matter, and JSON output
- ✅ MCP Integration: Seamlessly integrate with AI assistants through the Model Context Protocol
- ✅ CLI Access: Run directly via npx, with no separate installation step
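To make the metadata and output-format features above more concrete, here is a minimal TypeScript sketch. It is not how web-content-extract works internally (this README does not describe the library's extraction code). It assumes a simple regex scan over raw HTML, which handles ordinary `<meta>` tags but will miss edge cases such as a `>` inside an attribute value, and it covers only meta tags, not rel="author" links or `<time>` elements. The function names `extractMetaTags` and `toFrontMatter` are hypothetical.

```typescript
// Hypothetical sketch: NOT the web-content-extract implementation.
// Collects <meta> tags from raw HTML with regular expressions, then renders
// them as a YAML front matter header above a Markdown body.

type Attrs = Record<string, string | undefined>;

// Parse key="value" / key='value' pairs out of a single tag string.
function parseAttributes(tag: string): Attrs {
  const attrs: Attrs = {};
  const attrRe = /([a-zA-Z_:][-a-zA-Z0-9_:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let m: RegExpExecArray | null;
  while ((m = attrRe.exec(tag)) !== null) {
    // Group 2 is a double-quoted value, group 3 a single-quoted one.
    attrs[m[1].toLowerCase()] = m[2] ?? m[3] ?? "";
  }
  return attrs;
}

// Standard tags use name=, Open Graph uses property=, Schema.org uses itemprop=.
// The first occurrence of each key wins.
function extractMetaTags(html: string): Map<string, string> {
  const result = new Map<string, string>();
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    const attrs = parseAttributes(tag);
    const key = attrs["name"] ?? attrs["property"] ?? attrs["itemprop"];
    const content = attrs["content"];
    if (key !== undefined && content !== undefined && !result.has(key)) {
      result.set(key, content);
    }
  }
  return result;
}

// JSON strings are valid YAML double-quoted scalars, so JSON.stringify
// takes care of quoting and escaping for simple values.
function toFrontMatter(meta: Map<string, string>, markdown: string): string {
  const lines = Array.from(meta.entries()).map(
    ([k, v]) => `${JSON.stringify(k)}: ${JSON.stringify(v)}`,
  );
  return ["---", ...lines, "---", "", markdown].join("\n");
}

// Usage: tags without name/property/itemprop (such as charset) are skipped.
const html =
  `<head><meta charset="utf-8"><meta name="description" content="A demo page">` +
  `<meta property="og:title" content='Demo'></head>`;
console.log(toFrontMatter(extractMetaTags(html), "# Demo"));
```

A production extractor would parse the DOM rather than use regular expressions; the regex version only keeps the sketch dependency-free.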
📦 Installation
As an MCP Server
To use this as an MCP server, configure it in your MCP settings:
{
  "mcpServers": {
    "web-extract": {
      "command": "npx",
      "args": ["web-content-extract-mcp"],
      "disabled": false,
      "alwaysAllow": [],
      "disabledTools": []
    }
  }
}
Direct Installation
npm install web-content-extract-mcp
🛠 Usage
Once the server is configured, your MCP client can call the extract_web_content tool with arguments such as:
// Extract content (default: with SEO metadata, in Markdown format)
{
  "url": "https://example.com"
}
// Extract content without SEO metadata
{
  "url": "https://example.com",
  "includeSeo": false
}
// Extract content in JSON format
{
  "url": "https://example.com",
  "format": "json"
}
Command Line Usage
You can also start the server directly via npx:
npx web-content-extract-mcp
📚 API
extract_web_content Tool
Parameters:
- url (string, required): The URL of the web page to extract content from
- includeSeo (boolean, optional): Whether to include SEO metadata (default: true)
- format (enum: "markdown" | "json", optional): Output format (default: "markdown")
Returns:
- Clean extracted content in the specified format
- SEO metadata when requested (default behavior)
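For readers wiring up their own client, the parameter list above can be expressed as a small TypeScript sketch. The helper names `resolveArgs` and `buildToolCall` are hypothetical and are not exported by this package. The `tools/call` method and the `{ name, arguments }` params shape come from the MCP specification; the URL well-formedness check is an extra assumption, not documented server behavior. Most MCP hosts (such as an AI assistant) build this request for you.

```typescript
// Hypothetical client-side helpers written from the documented parameter list;
// not code from web-content-extract-mcp itself.

type OutputFormat = "markdown" | "json";

interface ExtractArgs {
  url: string;
  includeSeo: boolean;
  format: OutputFormat;
}

function isOutputFormat(value: unknown): value is OutputFormat {
  return value === "markdown" || value === "json";
}

// Apply the documented defaults (includeSeo: true, format: "markdown")
// and reject arguments that do not match the documented types.
function resolveArgs(input: Record<string, unknown>): ExtractArgs {
  const { url, includeSeo = true, format = "markdown" } = input;
  if (typeof url !== "string") throw new TypeError("url (string) is required");
  // Assumption: reject malformed URLs early; new URL() throws a TypeError if invalid.
  new URL(url);
  if (typeof includeSeo !== "boolean") throw new TypeError("includeSeo must be a boolean");
  if (!isOutputFormat(format)) throw new TypeError('format must be "markdown" or "json"');
  return { url, includeSeo, format };
}

// Wrap resolved arguments in an MCP JSON-RPC 2.0 "tools/call" request.
function buildToolCall(id: number, args: ExtractArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "extract_web_content", arguments: args },
  };
}

// Usage: the third example from the Usage section, with defaults filled in.
const request = buildToolCall(1, resolveArgs({ url: "https://example.com", format: "json" }));
console.log(JSON.stringify(request));
```

Filling in defaults on the client keeps the request explicit; since the documented server defaults are the same, sending only `url` should behave identically.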
🧪 Development
Prerequisites
- Node.js 16+
- npm or yarn
Setup
git clone https://github.com/Amoyens1s/web-content-extract-mcp.git
cd web-content-extract-mcp
npm install
npm run build
Building
npm run build
Running Locally
npm start
🤝 Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, and pull requests.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a pull request
📄 License
MIT
