@salesforcebob/sf-docs-mcp-server
v1.3.4
MCP server for scraping Salesforce developer documentation and converting to Markdown. Use with Cursor, Claude Desktop, or any MCP client. Deploy to Heroku with one click.
SF Docs MCP Server
An MCP (Model Context Protocol) server for scraping Salesforce developer documentation and converting it to Markdown. Integrates with Cursor, Claude Desktop, and other MCP-compatible AI assistants. Deploy locally or to Heroku with one click.
What You Get
- 🔍 Smart page analysis - Automatically detects optimal extraction strategy for any Salesforce doc page
- 🕸️ Shadow DOM traversal - Handles React components and deeply nested shadow DOMs
- 📄 Multiple page types - Supports guide, reference, API reference, type definitions, and landing pages
- 🎯 Dynamic selectors - Fall back to custom selectors when automatic extraction fails
- 📝 Clean Markdown output - Converts HTML to GFM-compatible Markdown with tables
- 🚀 Heroku ready - One-click deploy for remote/hosted access
Table of Contents
- Prerequisites
- Install
- Run via npx
- Using with Cursor
- Using with Claude Desktop
- Running Remotely (Heroku)
- Available Tools
- Things You Can Ask
- How It Works
- Agent Usage Guide
- Batch Scraping
- Troubleshooting
- Dependencies
- Disclaimer
- License
Prerequisites
- Node.js >= 18.0.0
- Chrome/Chromium (installed automatically by Puppeteer)
Install
npm install -g @salesforcebob/sf-docs-mcp-server
Or use directly with npx (no installation required):
npx @salesforcebob/sf-docs-mcp-server
Run via npx
npx @salesforcebob/sf-docs-mcp-server
This starts an MCP stdio server. Use it with MCP-compatible clients like Cursor or Claude Desktop.
Using with Cursor
- Open Cursor settings → MCP/Servers
- Add a new stdio server:
{
  "mcpServers": {
    "sf-docs": {
      "command": "npx",
      "args": ["-y", "@salesforcebob/sf-docs-mcp-server"]
    }
  }
}
Or add it to your Cursor MCP configuration file (~/.cursor/mcp.json).
- Save and reload tools. You should see:
  - scrape_sf_docs
  - analyze_page_structure
Using with Claude Desktop
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "sf-docs": {
      "command": "npx",
      "args": ["-y", "@salesforcebob/sf-docs-mcp-server"]
    }
  }
}
Running Remotely (Heroku)
This server includes an Express HTTP transport for remote deployment.
One-Click Deploy to Heroku
After clicking Deploy:
- Choose an app name
- Deploy the app
- Verify endpoints:
  - GET /health → { ok: true }
  - GET /docs → Documentation JSON
  - POST /mcp → MCP HTTP endpoint
Local HTTP (for testing)
npm run serve
# or
npx @salesforcebob/sf-docs-mcp-server serve
Endpoints:
- GET http://localhost:3000/health → Health check
- GET http://localhost:3000/docs → Documentation
- POST http://localhost:3000/mcp → MCP HTTP endpoint
Using with HTTP-capable MCP Clients
Point your client at <your-app-url>/mcp as the MCP HTTP endpoint.
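As a rough illustration of what an HTTP-capable client sends, the sketch below builds a JSON-RPC 2.0 tools/call request (the method MCP uses to invoke a tool) and posts it to the /mcp endpoint. It assumes the server accepts a plain POST in the style of MCP's Streamable HTTP transport; the helper names buildToolCall and callTool and the baseUrl parameter are hypothetical, and a real client would normally use an MCP SDK and perform the initialize handshake first.

```javascript
// Hypothetical helper: build a JSON-RPC 2.0 request that invokes an MCP tool.
function buildToolCall(id, toolName, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Hypothetical helper: POST the request to <baseUrl>/mcp using the global
// fetch available in Node.js 18+.
// Assumption: the server answers a single request without a prior session handshake.
async function callTool(baseUrl, toolName, args) {
  const response = await fetch(new URL("/mcp", baseUrl), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(buildToolCall(1, toolName, args)),
  });
  if (!response.ok) {
    throw new Error(`MCP request failed with HTTP ${response.status}`);
  }
  // The body may be plain JSON or an event stream, depending on the server.
  return response.text();
}
```

Against a local instance this might be called as callTool("http://localhost:3000", "scrape_sf_docs", { url: "https://developer.salesforce.com/docs/einstein/genai/guide/get-started.html" }).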
Available Tools
scrape_sf_docs
Scrape a Salesforce documentation page and return the content as Markdown.
Input:
- url (string, required): The Salesforce documentation URL to scrape
- selector (string, optional): CSS selector for content container (light DOM only)
- shadowPath (string[], optional): Array of selectors to traverse shadow DOM boundaries
Examples:
// Basic usage (automatic detection)
{
  "url": "https://developer.salesforce.com/docs/einstein/genai/guide/get-started.html"
}
// With shadow path for nested shadow DOM
{
  "url": "https://developer.salesforce.com/docs/commerce/einstein-api/references/einstein-profile-connector?meta=type:ClientIdParam",
  "shadowPath": ["doc-amf-reference", "doc-amf-topic", "api-type-documentation"]
}
analyze_page_structure
Analyze the DOM structure of a Salesforce documentation page to determine the best extraction approach. Use this first when the default scraper fails or returns empty content.
Input:
- url (string, required): The Salesforce documentation URL to analyze
Output:
- Detected page type
- List of custom elements found
- Elements with shadow DOM
- Content containers with suggested selectors/shadow paths
- Suggested extraction approach
- DOM tree snapshot for debugging
Things You Can Ask
Here are examples of what you can ask your AI assistant:
- "Get the Agentforce getting started documentation"
- "Scrape the Models API reference page"
- "Extract the GraphQL Send Query endpoint documentation"
- "Analyze the page structure of this Commerce Cloud API page"
- "Get all the type definitions from the Einstein Profile Connector API"
- "Show me the Agent Script language reference"
Quick JSON examples:
Scrape a guide page:
{
  "tool": "scrape_sf_docs",
  "input": {
    "url": "https://developer.salesforce.com/docs/einstein/genai/guide/agent-script.html"
  }
}
Analyze a failing page:
{
  "tool": "analyze_page_structure",
  "input": {
    "url": "https://developer.salesforce.com/docs/commerce/einstein-api/references/einstein-profile-connector?meta=type:CookieIdParam"
  }
}
How It Works
The Salesforce developer docs use a React-based architecture with nested shadow DOM components. This server handles multiple page structures:
Supported Page Types
| Type | URL Pattern | Description |
|------|-------------|-------------|
| guide | /guide/* | Guide/tutorial pages |
| reference | /references/* with markdown | Reference pages with markdown content |
| api-reference | /references/*?meta=Summary | API summary pages |
| api-type | /references/*?meta=type:* | Type definition pages |
| api-method | /references/*?meta=* | Method/endpoint pages |
| overview | Landing pages | Overview/landing pages |
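To make the URL patterns in the table concrete, here is a minimal sketch that guesses a page type from the URL alone. The function name guessPageType is hypothetical, and this is not the server's actual detection logic, which may also inspect the rendered DOM. In particular, the "with markdown" condition for reference pages cannot be checked from a URL, so the sketch assumes any /references/ URL without a meta parameter is a reference page.

```javascript
// Sketch: classify a docs URL using only the URL patterns from the table.
// Assumption: a /references/ URL without a meta parameter is a "reference" page.
function guessPageType(rawUrl) {
  const url = new URL(rawUrl);
  const meta = url.searchParams.get("meta");

  if (url.pathname.includes("/references/")) {
    if (meta === null) return "reference";
    if (meta === "Summary") return "api-reference";
    if (meta.startsWith("type:")) return "api-type";
    return "api-method"; // any other meta value
  }
  if (url.pathname.includes("/guide/")) return "guide";
  return "overview"; // landing pages and anything unmatched
}
```

The more specific meta checks run before the catch-all api-method case, mirroring the order of the rows in the table.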
Custom Elements Handled
- doc-heading - Headings with nested shadow DOM
- doc-content-callout - Tips, notes, warnings
- dx-code-block - Code snippets with syntax highlighting
- api-summary - API overview pages
- api-type-documentation - Type definition pages
- api-method-documentation - Method/endpoint pages
- dx-group-text - Landing page content
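Because these elements nest inside one another's shadow roots, a shadowPath has to be followed one boundary at a time. The sketch below shows one plausible way to do that; resolveShadowPath is a hypothetical name, the error text is illustrative, and the server's real traversal may differ.

```javascript
// Sketch: follow a shadowPath such as
// ["doc-amf-reference", "doc-amf-topic", "api-type-documentation"].
// `root` is anything with querySelector (a Document, Element, or ShadowRoot).
function resolveShadowPath(root, shadowPath) {
  let scope = root;
  let element = null;
  for (const selector of shadowPath) {
    element = scope.querySelector(selector);
    if (!element) {
      throw new Error(`Could not find element: ${selector}`);
    }
    // Continue the search inside the element's shadow root when it has an
    // open one; otherwise keep searching its light DOM children.
    scope = element.shadowRoot ?? element;
  }
  return element; // null when shadowPath is empty
}
```

With Puppeteer, a function like this would need to run in the page context, for example inside page.evaluate. Only open shadow roots are reachable through element.shadowRoot.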
Agent Usage Guide
For detailed instructions on how AI agents should use these tools, see AGENT_GUIDE.md.
Batch Scraping (Optional)
To scrape multiple pages in one run, use the included scraper script:
// Edit the urls array in scraper.js
const urls = [
  'https://developer.salesforce.com/docs/einstein/genai/guide/get-started.html',
  // Add more URLs here
];
Then run:
npm run scrape
Troubleshooting
| Problem | Solution |
|---------|----------|
| Empty content with pageType: "fallback" | Use analyze_page_structure to find the right extraction method |
| Shadow path not working | Check the DOM snapshot for the correct element names |
| Content looks incomplete | Try a different shadowPath or selector |
| "Could not find element" error | The shadow path is incorrect - re-analyze the page |
| Puppeteer/Chrome issues | Ensure Chrome is installed or set PUPPETEER_EXECUTABLE_PATH |
Dependencies
- Puppeteer - Headless browser automation
- Turndown - HTML to Markdown conversion
- turndown-plugin-gfm - GFM table support
- Express - HTTP server for remote deployment
- @modelcontextprotocol/sdk - MCP server implementation
Disclaimer
- This repository and MCP server are provided "as is" without warranties or guarantees of any kind, express or implied, including but not limited to functionality, security, merchantability, or fitness for a particular purpose.
- Use at your own risk. Review the source, perform a security assessment, and harden before any production deployment.
- Do not expose the HTTP endpoints publicly without proper authentication/authorization, rate limiting, logging, and monitoring.
- This tool scrapes publicly available Salesforce documentation. Ensure your usage complies with Salesforce's terms of service.
- You are solely responsible for the protection of your data and compliance with your organization's security policies.
License
MIT
