unpaywall-mcp

v0.1.2 · Published 3 months ago

MCP server for Unpaywall: DOI metadata, title search, OA links, and PDF text extraction


elliotpadfield

mcp model-context-protocol unpaywall oa open-access

Unpaywall MCP Server

An MCP (Model Context Protocol) server exposing Unpaywall tools so AI clients can:

Fetch metadata by DOI
Search article titles
Retrieve best OA fulltext links
Download and extract text from OA PDFs

Quickstart (npx)

Add this to your MCP client config (Claude Desktop example):

{
  "mcpServers": {
    "unpaywall": {
      "command": "npx",
      "args": ["-y", "unpaywall-mcp"],
      "env": { "UNPAYWALL_EMAIL": "you@example.com" }
    }
  }
}

Then try the tools: unpaywall_search_titles, unpaywall_get_fulltext_links, unpaywall_fetch_pdf_text.

You don't need to clone this repo or run npm install — npx handles fetching and caching on first call.

Requirements

Node.js 18+ (for npx)
An email address for Unpaywall / OpenAlex requests (required by Unpaywall, used for the OpenAlex polite pool).

Local development (contributors only)

End users should use the npx config above. Contributors building from source:

npm install
npm run build
UNPAYWALL_EMAIL=you@example.com npm start   # serves over stdio, the transport MCP clients such as Claude Desktop use to launch it

Hot-run (no build step):

UNPAYWALL_EMAIL=you@example.com npm run dev

Tools

unpaywall_get_by_doi

Description: Fetch Unpaywall metadata for a DOI
Input schema:
- doi (string, required): e.g. 10.1038/nphys1170
- email (string, optional): overrides UNPAYWALL_EMAIL if provided
Output: JSON response from Unpaywall
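For reference, Unpaywall's REST API serves a DOI record at https://api.unpaywall.org/v2/{doi}?email=.... The sketch below shows how the email override and the request URL could be resolved. The helper names (resolveEmail, normalizeDoi, buildDoiUrl) are hypothetical, and stripping a https://doi.org/ or doi: prefix is an assumption, not documented server behavior.

```typescript
// Hypothetical helpers sketching what unpaywall_get_by_doi likely does;
// not the server's actual source.

const UNPAYWALL_BASE = "https://api.unpaywall.org/v2/";

// The per-call `email` argument overrides UNPAYWALL_EMAIL.
// envEmail is passed in explicitly (e.g. process.env.UNPAYWALL_EMAIL).
function resolveEmail(argEmail: string | undefined, envEmail: string | undefined): string {
  const email = (argEmail || envEmail || "").trim();
  if (email === "") {
    throw new Error("No email: pass `email` or set UNPAYWALL_EMAIL");
  }
  return email;
}

// Assumed convenience: accept bare DOIs, doi.org URLs, or "doi:" prefixes.
function normalizeDoi(input: string): string {
  return input.trim().replace(/^(https?:\/\/(dx\.)?doi\.org\/|doi:)/i, "");
}

// Percent-encode each DOI segment but keep the "/" separators intact.
function buildDoiUrl(doi: string, email: string): string {
  const path = normalizeDoi(doi).split("/").map(encodeURIComponent).join("/");
  return UNPAYWALL_BASE + path + "?email=" + encodeURIComponent(email);
}

const url = buildDoiUrl("https://doi.org/10.1038/nphys1170", resolveEmail(undefined, "you@example.com"));
console.log(url);
// https://api.unpaywall.org/v2/10.1038/nphys1170?email=you%40example.com
```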

unpaywall_search_titles

Description: Search article titles and return Unpaywall-style OA metadata for each hit (50 results/page)
Input schema:
- query (string, required): title query
- is_oa (boolean, optional): if true, only OA results; if false, only closed; omit for all
- page (integer >= 1, optional): page number
- email (string, optional): overrides UNPAYWALL_EMAIL
Output: JSON matching the Unpaywall search shape: results[].response is a DOI-style record (doi, title, is_oa, oa_status, best_oa_location, oa_locations), and each result also carries a score and a snippet. A _source: "openalex" field marks the upstream data source.
Note: Backed by OpenAlex's /works endpoint because Unpaywall's own /v2/search has been returning HTTP 500 since its May 2025 rewrite. Unpaywall now runs as a subroutine of OpenAlex, so this is the canonical modern equivalent.
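A sketch of the kind of OpenAlex request this tool might issue. title.search and is_oa are OpenAlex /works filters, and per-page, page, and mailto are its standard query parameters, but buildOpenAlexSearchUrl and bareDoi are hypothetical helpers; the server may build its query differently (for example with search=).

```typescript
// Hypothetical sketch of an OpenAlex title search in the spirit of
// unpaywall_search_titles; parameter choices are assumptions.

const OPENALEX_WORKS = "https://api.openalex.org/works";
const PER_PAGE = 50; // the tool documents 50 results/page

interface SearchArgs {
  query: string;
  is_oa?: boolean; // true: OA only, false: closed only, undefined: all
  page?: number;   // 1-based
  email: string;   // sent as `mailto` for OpenAlex's polite pool
}

function buildOpenAlexSearchUrl(args: SearchArgs): string {
  // Commas separate filters in OpenAlex, so drop them from the title query.
  const title = args.query.replace(/,/g, " ").trim();
  const filters = ["title.search:" + title];
  if (args.is_oa !== undefined) {
    filters.push("is_oa:" + String(args.is_oa));
  }
  const page = args.page !== undefined ? args.page : 1;
  if (!(page >= 1) || Math.floor(page) !== page) {
    throw new Error("page must be an integer >= 1");
  }
  const params: string[] = [
    "filter=" + encodeURIComponent(filters.join(",")),
    "per-page=" + PER_PAGE,
    "page=" + page,
    "mailto=" + encodeURIComponent(args.email),
  ];
  return OPENALEX_WORKS + "?" + params.join("&");
}

// OpenAlex reports DOIs as URLs ("https://doi.org/10...."), while
// Unpaywall-style records use the bare DOI.
function bareDoi(openAlexDoi: string | null): string | null {
  return openAlexDoi === null ? null : openAlexDoi.replace(/^https?:\/\/doi\.org\//i, "");
}

console.log(buildOpenAlexSearchUrl({ query: "graph neural networks survey", is_oa: true, email: "you@example.com" }));
```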

unpaywall_get_fulltext_links

Description: Return the best OA PDF URL and Open URL for a DOI, plus all OA locations
Input schema:
- doi (string, required)
- email (string, optional): overrides UNPAYWALL_EMAIL
Output: JSON with fields: best_pdf_url, best_open_url, best_oa_location, oa_locations, and select metadata
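How best_pdf_url and best_open_url might be derived from an Unpaywall record. The field names (best_oa_location, oa_locations, url, url_for_pdf, url_for_landing_page) follow Unpaywall's documented schema; pickFulltextLinks is hypothetical, and falling back to other OA locations when the best one has no PDF is an assumption about this server.

```typescript
// Hypothetical sketch; the fallback order is an assumption.

interface OaLocation {
  url: string | null; // Unpaywall: url_for_pdf if there is one, else landing page
  url_for_pdf: string | null;
  url_for_landing_page: string | null;
  host_type?: string | null;
  version?: string | null;
  license?: string | null;
}

interface UnpaywallRecord {
  doi: string;
  title: string | null;
  is_oa: boolean;
  oa_status?: string | null;
  best_oa_location: OaLocation | null;
  oa_locations: OaLocation[];
}

interface FulltextLinks {
  doi: string;
  title: string | null;
  is_oa: boolean;
  best_pdf_url: string | null;
  best_open_url: string | null;
  best_oa_location: OaLocation | null;
  oa_locations: OaLocation[];
}

function pickFulltextLinks(rec: UnpaywallRecord): FulltextLinks {
  const best = rec.best_oa_location;
  // Prefer the best location's PDF; otherwise take the first location that has one.
  let pdf: string | null = best && best.url_for_pdf ? best.url_for_pdf : null;
  if (pdf === null) {
    for (const loc of rec.oa_locations) {
      if (loc.url_for_pdf) {
        pdf = loc.url_for_pdf;
        break;
      }
    }
  }
  // Primary OA link of the best location (may be a landing page).
  const open = best ? best.url || best.url_for_landing_page : null;
  return {
    doi: rec.doi,
    title: rec.title,
    is_oa: rec.is_oa,
    best_pdf_url: pdf,
    best_open_url: open,
    best_oa_location: best,
    oa_locations: rec.oa_locations,
  };
}
```

A closed-access record (best_oa_location null, empty oa_locations) yields null for both URLs, which is the case the "Be resilient" tip below addresses.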

unpaywall_fetch_pdf_text

Description: Download and extract text from the best OA PDF for a DOI, or from a provided pdf_url
Input schema:
- pdf_url (string, optional): direct PDF URL (takes precedence)
- doi (string, optional): used to resolve best OA PDF if pdf_url not provided
- email (string, optional): required if using doi and no UNPAYWALL_EMAIL env var
- truncate_chars (integer >= 1000, optional): max characters of extracted text to return (default 20000)
Output: JSON with text (possibly truncated), length_chars, truncated, pdf_url, and PDF metadata
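The documented precedence and truncation rules can be sketched as two pure functions (resolvePdfSource and truncateText are hypothetical names). Downloading and PDF parsing are left out, and treating length_chars as the length of the full, untruncated text is an assumption.

```typescript
// Hypothetical sketch of the input rules documented for unpaywall_fetch_pdf_text.

interface FetchPdfArgs {
  pdf_url?: string;
  doi?: string;
  truncate_chars?: number;
}

type PdfSource = { kind: "url"; pdf_url: string } | { kind: "doi"; doi: string };

// pdf_url takes precedence; a doi would then be resolved to its best OA PDF.
function resolvePdfSource(args: FetchPdfArgs): PdfSource {
  if (args.pdf_url) return { kind: "url", pdf_url: args.pdf_url };
  if (args.doi) return { kind: "doi", doi: args.doi };
  throw new Error("Provide pdf_url or doi");
}

interface TextResult {
  text: string;
  length_chars: number; // assumed: length of the full extracted text
  truncated: boolean;
}

// truncate_chars must be an integer >= 1000; defaults to 20000.
// Note: JS string length counts UTF-16 code units, so a cut can split a surrogate pair.
function truncateText(fullText: string, truncateChars: number = 20000): TextResult {
  if (!(truncateChars >= 1000) || Math.floor(truncateChars) !== truncateChars) {
    throw new Error("truncate_chars must be an integer >= 1000");
  }
  const truncated = fullText.length > truncateChars;
  return {
    text: truncated ? fullText.slice(0, truncateChars) : fullText,
    length_chars: fullText.length,
    truncated,
  };
}
```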

LLM prompting tips (MCP)

When using this server from an MCP-enabled LLM client, ask the model to:

Search then fetch: Use unpaywall_search_titles with a concise title phrase; select a result; then call unpaywall_get_fulltext_links or unpaywall_fetch_pdf_text on the chosen DOI.
Prefer OA: Pass is_oa: true in searches when you only want open-access.
Control size: Set truncate_chars in unpaywall_fetch_pdf_text (default 20000) and summarize long texts before proceeding.
Be resilient: If the best PDF URL is missing, fall back to best_open_url and extract content from the landing page (outside this server).
Respect rate limits: Space requests if making many calls; reuse earlier responses instead of repeating the same call.
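To apply the "reuse earlier responses" tip in your own client code, a small memoizer keyed by tool name and arguments is enough. This is a hypothetical client-side helper, not part of this server; callTool stands in for whatever function your MCP client uses to send a tool call.

```typescript
// Hypothetical client-side memoizer for MCP tool calls.

type ToolArgs = { [key: string]: string | number | boolean };

function cacheKey(name: string, args: ToolArgs): string {
  // Sort keys so {a, b} and {b, a} map to the same entry (flat args only).
  return name + ":" + JSON.stringify(args, Object.keys(args).sort());
}

function memoizeToolCalls<T>(callTool: (name: string, args: ToolArgs) => T) {
  const cache: { [key: string]: T } = {};
  return function cachedCall(name: string, args: ToolArgs): T {
    const key = cacheKey(name, args);
    if (!Object.prototype.hasOwnProperty.call(cache, key)) {
      // Storing the returned value (e.g. a Promise) immediately also
      // dedupes concurrent identical calls.
      cache[key] = callTool(name, args);
    }
    return cache[key];
  };
}
```

With async calls, a rejected Promise stays cached in this minimal version, so a real client would evict failed entries. Spacing requests out is a separate concern, for example a queue with a fixed delay between calls.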

Example user prompts that work well:

"Find 3 OA papers about 'foundation models in biomedicine', then extract and summarize the introduction of the best one."
"Search for 'Graph Neural Networks survey 2024', filter to OA if possible, then fetch the PDF text and produce a 10-bullet summary."

Example tool call payloads

The surrounding request structure varies by MCP client; the core payloads are:

// Search titles
{
  "name": "unpaywall_search_titles",
  "arguments": {
    "query": "graph neural networks survey",
    "is_oa": true,
    "page": 1
  }
}

// Get best OA links for a DOI
{
  "name": "unpaywall_get_fulltext_links",
  "arguments": {
    "doi": "10.48550/arXiv.1812.08434"
  }
}

// Fetch and extract PDF text (by DOI)
{
  "name": "unpaywall_fetch_pdf_text",
  "arguments": {
    "doi": "10.48550/arXiv.1812.08434",
    "truncate_chars": 20000
  }
}
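Under the hood, each payload becomes the params of a JSON-RPC 2.0 request with method tools/call, which is how the MCP specification defines tool invocation. Most clients build this envelope for you; the helper toToolCallRequest below is hypothetical and only shows the shape.

```typescript
// Sketch of the MCP JSON-RPC 2.0 envelope for a tool call.

interface ToolCallPayload {
  name: string;
  arguments: { [key: string]: unknown };
}

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolCallPayload;
}

function toToolCallRequest(id: number, payload: ToolCallPayload): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: payload };
}

const req = toToolCallRequest(1, {
  name: "unpaywall_get_fulltext_links",
  arguments: { doi: "10.48550/arXiv.1812.08434" },
});

// Over stdio, each JSON-RPC message is written as a single newline-delimited line.
console.log(JSON.stringify(req));
// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"unpaywall_get_fulltext_links","arguments":{"doi":"10.48550/arXiv.1812.08434"}}}
```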

Configure in an MCP client

Recommended (no-build) config for Claude Desktop using npm/npx:

{
  "mcpServers": {
    "unpaywall": {
      "command": "npx",
      "args": ["-y", "unpaywall-mcp"],
      "env": {
        "UNPAYWALL_EMAIL": "you@example.com"
      }
    }
  }
}

Alternative (local repo) config using the compiled dist:

{
  "mcpServers": {
    "unpaywall": {
      "command": "node",
      "args": ["/absolute/path/to/dist/index.js"],
      "env": {
        "UNPAYWALL_EMAIL": "you@example.com"
      }
    }
  }
}

After adding, ask your client to list tools and try:

unpaywall_search_titles with a query
unpaywall_get_fulltext_links with a doi
unpaywall_fetch_pdf_text with a doi (or pdf_url)

Notes

Respect Unpaywall's rate limits and usage guidelines: https://unpaywall.org/products/api
The server uses stdio transport and @modelcontextprotocol/sdk.
Set UNPAYWALL_EMAIL or pass email per call so Unpaywall can contact you about usage.

Maintainers: publish to npm

# 1) Build the project (also runs automatically on publish)
npm run build

# 2) Bump version (choose patch/minor/major)
npm version patch

# 3) Publish (ensure you are logged in: npm login)
npm publish --access public

# 4) Tag a release on GitHub (optional, recommended).
#    `npm version` already created a commit and a vX.Y.Z tag; push both:
git push --follow-tags

Users can then configure their MCP client with npx -y unpaywall-mcp as shown above. No clone or build required.