@siddhantxh/github-extractor-mcp
v1.0.0
MCP server for extracting code from GitHub repositories and Google Colab notebooks
GitHub Extractor MCP
A Model Context Protocol (MCP) server that extracts code and content from GitHub repositories and Google Colab notebooks for LLM ingestion.
🚀 Quick Start
Installation
npm install -g @siddhantxh/github-extractor-mcp
Setup in AI Tools
Cursor IDE
- Open Cursor Settings → AI → Model Context Protocol
- Add new server:
{
  "name": "github-extractor",
  "command": "npx",
  "args": ["-y", "@siddhantxh/github-extractor-mcp"],
  "env": {
    "GITHUB_TOKEN": "your_github_token_here"
  }
}
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "github-extractor": {
      "command": "npx",
      "args": ["-y", "@siddhantxh/github-extractor-mcp"],
      "env": {
        "GITHUB_TOKEN": "your_github_token_here"
      }
    }
  }
}
🔧 Available Tools
1. tree - Repository Structure
Get the file tree structure of any GitHub repository or Colab notebook.
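As an illustration of what the tree tool has to do, the sketch below turns a flat list of paths (the shape GitHub's Git Trees API returns for GET /repos/{owner}/{repo}/git/trees/{branch}?recursive=1) into an indented tree. The renderTree helper is hypothetical; this package's actual output format may differ.

```typescript
// Hypothetical helper: render an indented tree from flat, "/"-separated repo paths.
interface TreeNode {
  [name: string]: TreeNode;
}

function renderTree(paths: string[]): string {
  const root: TreeNode = {};
  for (const path of paths) {
    let current = root;
    for (const part of path.split("/")) {
      // Create the child node the first time a segment is seen, then descend.
      if (!Object.prototype.hasOwnProperty.call(current, part)) current[part] = {};
      current = current[part];
    }
  }
  const lines: string[] = [];
  const walk = (node: TreeNode, depth: number): void => {
    for (const name of Object.keys(node).sort()) {
      const child = node[name];
      // Nodes with children are directories and get a trailing slash.
      const isDir = Object.keys(child).length > 0;
      lines.push("  ".repeat(depth) + name + (isDir ? "/" : ""));
      walk(child, depth + 1);
    }
  };
  walk(root, 0);
  return lines.join("\n");
}

console.log(renderTree(["src/index.ts", "src/utils/url.ts", "README.md", "package.json"]));
```

For these sample paths it prints README.md, package.json and src/ at the top level, with index.ts and utils/url.ts nested under src/. Entries are sorted by plain string order, so uppercase names come first.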
Example:
Use github-extractor tree tool with https://github.com/vercel/next.js
2. fetchAllContent - Extract All Files
Download and format all files from a repository, skipping binary files and honoring .gitignore plus any exclude patterns you pass.
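The packed output format is not spelled out here. As an assumption for illustration only, one common way to hand many files to an LLM is to concatenate them in path order with a header line per file; the formatFiles name and the === path === delimiter below are hypothetical.

```typescript
// Hypothetical sketch: pack fetched files into one LLM-ready string.
// The "=== path ===" delimiter is an illustrative assumption, not this package's format.
interface FetchedFile {
  path: string;
  content: string;
}

function formatFiles(files: FetchedFile[]): string {
  return files
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    // Trim trailing whitespace so every section ends with exactly one newline.
    .map((f) => `=== ${f.path} ===\n${f.content.replace(/\s+$/, "")}\n`)
    .join("\n");
}

console.log(
  formatFiles([
    { path: "src/b.ts", content: "export const b = 2;\n" },
    { path: "src/a.ts", content: "export const a = 1;\n" },
  ]),
);
```

Sorting by path keeps the output stable between runs, which makes it easier to diff two extractions of the same repository.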
Example:
Use github-extractor fetchAllContent for https://github.com/microsoft/vscode excluding "**/*.test.ts,**/node_modules/**"
3. fetchAllContentButExclude - Smart Filtering
Extract files while excluding specific extensions or patterns.
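The two exclusion styles used in these examples (glob patterns such as **/*.test.ts and bare extensions such as .md) can be sketched as a single predicate. This is a minimal, hypothetical implementation that supports only *, ** and ?; a production server would more likely use a glob library, and treating an extension without a leading dot as a suffix that starts with a dot is an assumption.

```typescript
// Hypothetical glob-to-RegExp conversion supporting only "*", "**" and "?".
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*") {
      if (glob[i + 1] === "*") {
        if (glob[i + 2] === "/") {
          re += "(?:.*/)?"; // "**/" matches zero or more whole directories
          i += 2;
        } else {
          re += ".*"; // a trailing "**" matches everything below
          i += 1;
        }
      } else {
        re += "[^/]*"; // "*" stays within one path segment
      }
    } else if (c === "?") {
      re += "[^/]";
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Split a comma-separated option string, dropping empty entries.
function parseList(list: string): string[] {
  return list.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

// A path is excluded if it matches any glob or ends with any listed extension.
function isExcluded(path: string, patterns: string, extensions: string): boolean {
  const byPattern = parseList(patterns).some((g) => globToRegExp(g).test(path));
  const byExtension = parseList(extensions)
    .map((e) => (e.startsWith(".") ? e : `.${e}`)) // "test.js" -> ".test.js"
    .some((e) => path.endsWith(e));
  return byPattern || byExtension;
}
```

With the fetchAllContent patterns above, src/app.test.ts and node_modules/react/index.js are excluded while src/app.ts is kept. Adding the leading dot means the extension test.js does not accidentally exclude latest.js.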
Example:
Use github-extractor fetchAllContentButExclude for https://github.com/facebook/react excluding extensions: "test.js,spec.ts,.md"
4. specificContent - Individual Files
Get content from a specific file or notebook cell.
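For notebooks, a single cell maps naturally onto the cells array of the .ipynb JSON (nbformat 4). The sketch below assumes cells are addressed by zero-based index, which is a guess rather than this tool's documented parameter; the detail taken from nbformat itself is that a cell's source may be either one string or a list of lines.

```typescript
// Minimal subset of the nbformat v4 schema used here.
interface NotebookCell {
  cell_type: "code" | "markdown" | "raw";
  source: string | string[];
}
interface Notebook {
  cells: NotebookCell[];
}

// Hypothetical helper: return one cell's source text by zero-based index.
function getCellSource(ipynbJson: string, index: number): string {
  const nb = JSON.parse(ipynbJson) as Notebook;
  const cell = nb.cells[index];
  if (cell === undefined) {
    throw new RangeError(`Notebook has ${nb.cells.length} cells; index ${index} is out of range`);
  }
  // nbformat allows source as a single string or as a list of lines that already end in "\n".
  return Array.isArray(cell.source) ? cell.source.join("") : cell.source;
}
```

Joining with an empty string is deliberate: in nbformat the line strings keep their own trailing newlines.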
Example:
Use github-extractor specificContent for https://github.com/vercel/next.js with filePath: "packages/next/src/server/next.ts"
🌐 Supported URLs
- GitHub Repositories: https://github.com/owner/repo
- Specific Branches: https://github.com/owner/repo/tree/branch-name
- Individual Files: https://github.com/owner/repo/blob/main/file.js
- Colab Notebooks: https://colab.research.google.com/github/owner/repo/blob/main/notebook.ipynb
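To show how the four URL shapes above can be told apart, here is a hypothetical parseSource function. It is a sketch, not this package's parser: it accepts only single-segment branch names, and it treats a Colab URL as a GitHub path under a leading /github segment, which is how the Colab format above is structured.

```typescript
// Hypothetical classifier for the supported URL shapes; the real parser may differ.
type ParsedSource =
  | { kind: "repo"; owner: string; repo: string }
  | { kind: "branch"; owner: string; repo: string; branch: string }
  | { kind: "file"; owner: string; repo: string; ref: string; path: string }
  | { kind: "colab"; owner: string; repo: string; ref: string; path: string };

function parseSource(input: string): ParsedSource {
  const url = new URL(input);
  let parts = url.pathname.split("/").filter((p) => p.length > 0);
  let colab = false;
  if (url.hostname === "colab.research.google.com") {
    // Colab mirrors GitHub paths under a leading "/github" segment.
    if (parts[0] !== "github") throw new Error(`Unsupported Colab URL: ${input}`);
    parts = parts.slice(1);
    colab = true;
  } else if (url.hostname !== "github.com") {
    throw new Error(`Unsupported host: ${url.hostname}`);
  }
  const [owner, repo, mode, ref, ...rest] = parts;
  if (!owner || !repo) throw new Error(`Missing owner/repo in ${input}`);
  if (mode === undefined) return { kind: "repo", owner, repo };
  // Branch names containing "/" are ambiguous in these URLs; this sketch takes one segment.
  if (mode === "tree" && ref && rest.length === 0) return { kind: "branch", owner, repo, branch: ref };
  if (mode === "blob" && ref && rest.length > 0) {
    return { kind: colab ? "colab" : "file", owner, repo, ref, path: rest.join("/") };
  }
  throw new Error(`Unrecognized URL shape: ${input}`);
}
```

For the Colab example in the usage section, this yields owner tensorflow, repo docs, ref master and path site/en/tutorials/quickstart/beginner.ipynb.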
🔑 Authentication
Required Environment Variable
GITHUB_TOKEN - Your GitHub Personal Access Token
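A sketch of how a token like this is typically attached to GitHub REST API requests. The Accept and X-GitHub-Api-Version values are GitHub's documented headers; the githubHeaders helper itself is hypothetical and not part of this package.

```typescript
// Hypothetical helper: build GitHub REST API request headers.
// An unset or blank token simply omits the Authorization header.
function githubHeaders(token?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "application/vnd.github+json",
    "X-GitHub-Api-Version": "2022-11-28",
  };
  const trimmed = token?.trim();
  if (trimmed) headers.Authorization = `Bearer ${trimmed}`;
  return headers;
}
```

Inside the server the token would presumably come from the environment, for example githubHeaders(process.env.GITHUB_TOKEN), with the result passed as the headers option of fetch.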
Get a GitHub Token
- Go to GitHub Settings → Developer settings → Personal access tokens
- Click "Generate new token (classic)"
- Select scopes: public_repo (and repo for private repositories)
- Copy the token and add it to your MCP configuration
✨ Features
- 🌳 Intelligent Tree View - Clean repository structure visualization
- 📄 Smart Content Extraction - Respects .gitignore and custom patterns
- 🎯 Flexible Filtering - Exclude by file type, size, or custom patterns
- 📊 Token Estimation - Built-in token counting for LLM context planning
- 🔍 Binary File Detection - Automatically skips images, videos, and other binary files
- 📝 Notebook Support - Extract individual cells or full Jupyter notebooks
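How the built-in token counting works is not documented here. A common rough rule for English text and source code is about four characters per token; the sketch below uses that heuristic (an assumption, not a real tokenizer) to check whether extracted files fit a context budget. The function names and the default reserve are hypothetical.

```typescript
// Rough heuristic, not a real tokenizer: about 4 characters per token.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True if all contents fit in the context window after reserving room for the prompt.
function fitsContext(contents: string[], contextTokens: number, reserveTokens = 1000): boolean {
  const total = contents.reduce((sum, c) => sum + estimateTokens(c), 0);
  return total <= contextTokens - reserveTokens;
}
```

estimateTokens counts UTF-16 code units, so the estimate drifts for non-Latin text; when the budget is tight, the target model's own tokenizer is the safer measure.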
🛠️ Usage Examples
Analyze Repository Structure
Show me the structure of the React repository using the tree tool
Extract TypeScript Files Only
Get all TypeScript files from https://github.com/microsoft/vscode but exclude test files
Get Specific Component
Show me the content of src/components/Button.tsx from https://github.com/user/repo
Extract Colab Notebook
Extract the content from this Colab notebook: https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb
📊 Rate Limits & Best Practices
- GitHub API: 5,000 requests/hour with token, 60/hour without
- File Size: Default 64KB limit per file (configurable)
- Repository Size: Optimized for repos under 1,000 files
- Binary Files: Automatically filtered out (images, videos, etc.)
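A sketch of the size and binary filters listed above, assuming the 64KB default means 64 * 1024 bytes and using an illustrative (not the package's) list of binary extensions. The NUL-byte check mirrors the heuristic Git itself uses to decide whether a blob is binary, which covers files whose extension gives no hint.

```typescript
// Illustrative list only; the package's actual list is not documented here.
const BINARY_EXTENSIONS = new Set([
  ".png", ".jpg", ".jpeg", ".gif", ".webp", ".ico", ".pdf",
  ".mp4", ".mov", ".mp3", ".wav", ".zip", ".gz", ".tar", ".woff", ".woff2",
]);

// Lower-cased extension of the last path segment; dotfiles like ".gitignore" have none.
function extensionOf(path: string): string {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot).toLowerCase() : "";
}

// Decide from metadata alone (e.g. the path and size of a Git Trees API entry).
function shouldFetch(path: string, sizeBytes: number, sizeLimitKB = 64): boolean {
  if (BINARY_EXTENSIONS.has(extensionOf(path))) return false;
  return sizeBytes <= sizeLimitKB * 1024;
}

// Content check: a NUL byte in the first 8000 bytes marks the file as binary.
function looksBinary(bytes: Uint8Array): boolean {
  const limit = Math.min(bytes.length, 8000);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}
```

Checking metadata first avoids spending API requests on files that would be discarded anyway.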
🐛 Troubleshooting
"0 tools enabled" in Cursor
- Restart Cursor completely
- Verify your GitHub token is valid
- Check that the package is installed:
npm list -g @siddhantxh/github-extractor-mcp
"Tool not found" errors
- Ensure you're using the exact tool names: tree, fetchAllContent, fetchAllContentButExclude, specificContent
- Check your MCP configuration syntax
- Verify the package is globally accessible:
which github-extractor-mcp
Rate limit errors
- Make sure you're using a valid GitHub token
- Consider using exclude patterns to reduce API calls
- For large repositories, use the sizeLimitKB parameter
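When a rate limit error does occur, GitHub reports the quota in the x-ratelimit-limit, x-ratelimit-remaining and x-ratelimit-reset response headers (the last is a UTC epoch time in seconds). The sketch below, with hypothetical helper names, turns those headers into a wait time before retrying.

```typescript
// Header names are given in lower case here (HTTP header names are case-insensitive).
interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetEpochSeconds: number;
}

// Returns null when any of the three headers is missing or not numeric.
function parseRateLimit(headers: Record<string, string | undefined>): RateLimitInfo | null {
  const limit = Number(headers["x-ratelimit-limit"]);
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const reset = Number(headers["x-ratelimit-reset"]);
  if ([limit, remaining, reset].some((n) => !Number.isFinite(n))) return null;
  return { limit, remaining, resetEpochSeconds: reset };
}

// Milliseconds to wait before retrying: 0 while requests remain, otherwise time until reset.
function msUntilReset(info: RateLimitInfo, nowMs: number = Date.now()): number {
  if (info.remaining > 0) return 0;
  return Math.max(0, info.resetEpochSeconds * 1000 - nowMs);
}
```

A limit of 60 in these headers is a quick sign that requests are going out unauthenticated, i.e. that GITHUB_TOKEN is not reaching the server.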
📄 License
MIT License - see LICENSE file for details.
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test with real repositories
- Submit a pull request
