@kennyfrc/jina-reader
v1.0.1
Published
A Model Context Protocol (MCP) server for integrating with Jina Reader API, allowing Claude and other AI assistants to read and extract content from webpages, HTML, and PDF files.
Downloads
32
Readme
Jina Reader MCP
A Model Context Protocol (MCP) server for integrating with Jina Reader API, allowing Claude and other AI assistants to read and extract content from webpages, HTML, and PDF files.
Features
- Read content from URLs
- Process HTML content directly
- Extract information from PDF files (base64 encoded)
- Format responses with links and metadata
Setup
- Clone this repository
- Install dependencies:
npm install - Build the project:
npm run build
Usage
Using with npx
The easiest way to use this package is with npx. You need to provide your Jina API key as an environment variable:
# Set the API key directly in the command
JINA_API_KEY=your_jina_api_key npx -y @kennyfrc/jina-readerStarting the server locally
If you've cloned the repository:
# Provide the API key when running
JINA_API_KEY=your_jina_api_key npm startIntegrating with Claude Desktop
Add this to your claude-desktop-config.json:
{
"mcpServers": {
"jina-reader": {
"command": "npx",
"args": ["-y", "@kennyfrc/jina-reader"],
"env": {
"JINA_API_KEY": "your_jina_api_key_here"
}
}
}
}Or if you've cloned the repository:
{
"mcpServers": {
"jina-reader": {
"command": "node",
"args": ["/absolute/path/to/your/dist/index.js"],
"env": {
"JINA_API_KEY": "your_jina_api_key_here"
}
}
}
}Place this file in:
- macOS:
~/Library/Application Support/Claude/claude_desktop_config.json - Windows:
%APPDATA%\Claude\claude_desktop_config.json - Linux:
~/.config/Claude/claude_desktop_config.json
Restart Claude Desktop to load your MCP server.
Available Tools
Each tool accepts the following optional parameters:
Common Parameters
engine: Control the rendering engine:none: Default rendering engine (good balance of quality and speed)direct: Speed-optimized renderingbrowser: Quality-optimized renderingcf-browser-rendering: Experimental rendering
max_length: Maximum number of characters to return (default is 20000)- Important: This parameter is essential for controlling token usage, especially with large documents like research papers
- By default, content is truncated to 20,000 characters to prevent excessive token consumption
- The response includes pagination information when content is clipped
- Pagination details show the current character range and guidance for subsequent requests
start_index: Start content from the character index (default is 0)- Essential for paginating through large documents
- Works with the pagination info provided in clipped responses
- For example, to get the next page of content after seeing "Currently showing characters 0-20000", use
start_index=20000
Tools
jina_read_url: Read and extract content from a webpage{ "url": "https://example.com", "engine": "browser", // Optional, for best quality "max_length": 10000, // Optional, limit output length "start_index": 0 // Optional, pagination starting point }jina_read_html: Read and extract content from HTML{ "html": "<html>...</html>", "engine": "direct", // Optional, for faster processing "max_length": 10000, // Optional, limit output length "start_index": 0 // Optional, pagination starting point }jina_read_pdf: Read and extract content from a PDF file{ "pdf": "base64_encoded_pdf_content", "engine": "none", // Optional, uses default engine "max_length": 10000, // Optional, limit output length "start_index": 0 // Optional, pagination starting point }
Pagination
When dealing with large documents (like research papers or lengthy articles), the content will be automatically clipped to control token usage. The response includes clear pagination information:
[PAGINATION INFO]
- Content clipped: Currently showing characters 0-20000 of approximately 78000 total
- To view the next section, use start_index=20000 with the same max_length
- Complete content can be accessed by making multiple paginated requestsTo retrieve subsequent pages, make additional requests with updated start_index values:
// First request (page 1)
{
"url": "https://arxiv.org/html/2401.14196v1",
"max_length": 20000
// start_index defaults to 0
}
// Second request (page 2)
{
"url": "https://arxiv.org/html/2401.14196v1",
"max_length": 20000,
"start_index": 20000
}
// Third request (page 3)
{
"url": "https://arxiv.org/html/2401.14196v1",
"max_length": 20000,
"start_index": 40000
}Caching
This MCP server implements two levels of caching:
- Local Memory Cache: Responses are cached in memory for 3 hours to improve performance and reduce API calls
- Optimized for documentation content which changes infrequently
- Significantly reduces API usage for repeated queries
- Jina API Server Cache: The Jina API also caches responses using the
X-Cacheheader
The caching system helps to:
- Improve response times
- Reduce API usage
- Decrease latency for repeated requests
- Minimize costs when working with large documentation
Development
For development with automatic rebuilding:
npm run devFor publishing instructions, see the docs/publishing.md file.
