mcp-docs-extractor
v1.0.0
MCP server for extracting documentation from web links
MCP Docs Extractor
A tool that extracts and summarizes documentation from web links for AI consumption.
Features
- Extract and summarize documentation from web URLs
- Intelligently crawl related pages within the same domain for comprehensive documentation
- Convert web content into AI-optimized markdown
- Remove unnecessary content like ads, navigation menus, etc.
- Produce concise, well-structured documentation
- Focus on relevant information based on user query
Installation
# Install dependencies
pnpm install
# Build the project
pnpm build
# Install the server locally
pnpm install-server
If OPENAI_API_KEY and FIRECRAWL_API_KEY are not already defined globally in your environment, open the MCP config file and set both keys there:
OPENAI_API_KEY=your_openai_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
Usage
This tool is designed to be used with Claude or other AI systems that support MCP.
Basic Usage
In Claude, you can extract documentation by calling:
{{mcp_docs-extractor_get-documentation}}
With the parameters:
{
  "links": ["https://example.com/docs"]
}
Advanced Options
You can also specify a focus for the documentation:
{
  "links": ["https://example.com/docs"],
  "documentationFocus": "API endpoints"
}
To include the reasoning process in the result:
{
  "links": ["https://example.com/docs"],
  "includeReasoning": true
}
How It Works
The tool uses:
- FireCrawl to scrape web content
- OpenAI's GPT-4.1 to format and optimize the content
- MCP to integrate with Claude and other AI systems
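The parameters shown under Usage (links, documentationFocus, includeReasoning) can be modeled as a small type with a runtime check. This is only a sketch: the server's actual schema is not shown in this README, so the type name GetDocumentationArgs, the helper parseArgs, and the assumption that only links is required are all hypothetical.

```typescript
// Hypothetical shape of the tool arguments, inferred from the usage examples.
// Which fields are optional is an assumption, not confirmed by the README.
interface GetDocumentationArgs {
  links: string[];
  documentationFocus?: string;
  includeReasoning?: boolean;
}

// Hypothetical helper: validate untyped input before handing it to the tool.
function parseArgs(input: unknown): GetDocumentationArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("arguments must be an object");
  }
  const raw = input as Record<string, unknown>;

  const links = raw["links"];
  if (!Array.isArray(links) || links.length === 0) {
    throw new Error("links must be a non-empty array");
  }
  for (const link of links) {
    if (typeof link !== "string") {
      throw new Error("each link must be a string");
    }
    new URL(link); // throws a TypeError if the link is not an absolute URL
  }

  const args: GetDocumentationArgs = { links: links as string[] };

  // Copy the optional fields only when they have the expected type.
  const focus = raw["documentationFocus"];
  if (typeof focus === "string") {
    args.documentationFocus = focus;
  }
  const reasoning = raw["includeReasoning"];
  if (typeof reasoning === "boolean") {
    args.includeReasoning = reasoning;
  }
  return args;
}
```

Validating up front means a malformed link fails fast, before any scraping or model call is attempted.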
When called, the tool:
- Receives links to documentation
- Uses FireCrawl to retrieve content from those links
- Intelligently discovers and crawls related pages within the same domain to gather comprehensive documentation
- Processes the content through GPT-4.1 to extract and format relevant information
- Returns well-structured documentation in markdown format
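The steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the package's implementation: the README does not describe how related pages are discovered, so the same-hostname filter below is one plausible reading of "within the same domain". The names sameDomainLinks, extractDocs, Scrape, Format, and maxRelated are hypothetical, and the scrape and format functions are injected stand-ins for FireCrawl and GPT-4.1 rather than calls to their real APIs.

```typescript
// Hypothetical stand-ins for the FireCrawl and GPT-4.1 steps.
type Scrape = (url: string) => Promise<{ markdown: string; links: string[] }>;
type Format = (pages: string[], focus?: string) => Promise<string>;

// Keep only links on the same hostname as the starting page,
// resolving relative links and dropping fragments and duplicates.
function sameDomainLinks(startUrl: string, candidates: string[]): string[] {
  const host = new URL(startUrl).hostname;
  const seen = new Set<string>();
  const result: string[] = [];
  for (const candidate of candidates) {
    let url: URL;
    try {
      url = new URL(candidate, startUrl); // resolves relative links
    } catch {
      continue; // skip anything that cannot be parsed as a URL
    }
    if (url.hostname !== host) continue;
    url.hash = ""; // "#section" anchors point at the same page
    const key = url.toString();
    if (!seen.has(key)) {
      seen.add(key);
      result.push(key);
    }
  }
  return result;
}

// Scrape each link, follow a bounded number of same-domain links,
// then pass all collected pages to the formatter.
async function extractDocs(
  links: string[],
  scrape: Scrape,
  format: Format,
  focus?: string,
  maxRelated = 5, // assumed cap; the real limit is not documented
): Promise<string> {
  const pages: string[] = [];
  for (const link of links) {
    const page = await scrape(link);
    pages.push(page.markdown);
    const related = sameDomainLinks(link, page.links).slice(0, maxRelated);
    for (const relatedUrl of related) {
      pages.push((await scrape(relatedUrl)).markdown);
    }
  }
  return format(pages, focus);
}
```

Passing the scraper and formatter in as functions keeps the sketch runnable with stubs and makes the crawl limit explicit.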
License
MIT
