unpaywall-mcp
v0.1.2
MCP server for Unpaywall: DOI metadata, title search, OA links, and PDF text extraction
Unpaywall MCP Server
An MCP (Model Context Protocol) server exposing Unpaywall tools so AI clients can:
- Fetch metadata by DOI
- Search article titles
- Retrieve best OA fulltext links
- Download and extract text from OA PDFs
Quickstart (npx)
Add this to your MCP client config (Claude Desktop example):
{
  "mcpServers": {
    "unpaywall": {
      "command": "npx",
      "args": ["-y", "unpaywall-mcp"],
      "env": { "UNPAYWALL_EMAIL": "you@example.com" }
    }
  }
}
Then try the tools: unpaywall_search_titles, unpaywall_get_fulltext_links, unpaywall_fetch_pdf_text.
You don't need to clone this repo or run npm install — npx handles fetching and caching on first call.
Requirements
- Node.js 18+ (for npx)
- An email address for Unpaywall / OpenAlex requests (required by Unpaywall, used for the OpenAlex polite pool).
Local development (contributors only)
End users should use the npx config above. Contributors building from source:
npm install
npm run build
UNPAYWALL_EMAIL=you@example.com npm start # stdio transport, as required by MCP clients
Hot-run (no build step):
UNPAYWALL_EMAIL=you@example.com npm run dev
Tools
unpaywall_get_by_doi
- Description: Fetch Unpaywall metadata for a DOI
- Input schema:
  - doi (string, required): e.g. 10.1038/nphys1170
  - email (string, optional): overrides UNPAYWALL_EMAIL if provided
- Output: JSON response from Unpaywall
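As an illustration of what a DOI lookup involves, the sketch below builds a request URL for Unpaywall's public v2 endpoint (`https://api.unpaywall.org/v2/{doi}?email=...`). The helper names `normalizeDoi` and `buildUnpaywallUrl` are hypothetical and are not taken from this server's source; the prefix stripping is an assumption about inputs a client might pass.

```typescript
// Hypothetical helpers for illustration; not the server's actual code.
const UNPAYWALL_BASE = "https://api.unpaywall.org/v2";

/** Strip common DOI prefixes such as "doi:" or "https://doi.org/". */
function normalizeDoi(raw: string): string {
  return raw
    .trim()
    .replace(/^doi:\s*/i, "")
    .replace(/^https?:\/\/(dx\.)?doi\.org\//i, "");
}

/** Build the Unpaywall v2 lookup URL for a DOI and contact email. */
function buildUnpaywallUrl(doi: string, email: string): string {
  const cleaned = normalizeDoi(doi);
  if (cleaned === "") throw new Error("doi is required");
  if (email.trim() === "") throw new Error("email is required by Unpaywall");
  // Keep the slashes inside the DOI; percent-encode each segment.
  const path = cleaned.split("/").map(encodeURIComponent).join("/");
  return `${UNPAYWALL_BASE}/${path}?email=${encodeURIComponent(email.trim())}`;
}
```

Passing the resulting URL to `fetch` and parsing the JSON body would yield the record this tool returns.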
unpaywall_search_titles
- Description: Search article titles and return Unpaywall-style OA metadata for each hit (50 results/page)
- Input schema:
  - query (string, required): title query
  - is_oa (boolean, optional): if true, only OA results; if false, only closed; omit for all
  - page (integer >= 1, optional): page number
  - email (string, optional): overrides UNPAYWALL_EMAIL
- Output: JSON matching the Unpaywall search shape: results[].response is a DOI-style record (doi, title, is_oa, oa_status, best_oa_location, oa_locations), with score and snippet per result. _source: "openalex" marks the upstream.
- Note: Backed by OpenAlex's /works endpoint because Unpaywall's own /v2/search has been returning HTTP 500 since its May 2025 rewrite. Unpaywall now runs as a subroutine of OpenAlex, so this is the canonical modern equivalent.
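To make the OpenAlex backing concrete, here is a minimal sketch of how the tool's arguments could map onto an OpenAlex `/works` query using the documented `filter`, `per-page`, `page`, and `mailto` parameters. The function name, the choice of the `title.search` filter, and the comma/pipe stripping are assumptions; the server's real query may be built differently.

```typescript
// Hypothetical mapping from tool arguments to an OpenAlex /works URL.
interface TitleSearchArgs {
  query: string;
  is_oa?: boolean; // omit to include both OA and closed works
  page?: number;   // 1-based page number
  email: string;   // sent as mailto for the OpenAlex polite pool
}

function buildOpenAlexSearchUrl(args: TitleSearchArgs): string {
  // Commas and pipes have special meaning inside OpenAlex filters,
  // so this sketch replaces them with spaces in the title query.
  const title = args.query.replace(/[,|]/g, " ").trim();
  if (title === "") throw new Error("query is required");

  const filters = [`title.search:${title}`];
  if (args.is_oa !== undefined) filters.push(`is_oa:${args.is_oa}`);

  const params: Array<[string, string]> = [
    ["filter", filters.join(",")],
    ["per-page", "50"], // matches the 50 results/page described above
    ["page", String(args.page ?? 1)],
    ["mailto", args.email],
  ];
  const queryString = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://api.openalex.org/works?${queryString}`;
}
```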
unpaywall_get_fulltext_links
- Description: Return the best OA PDF URL and Open URL for a DOI, plus all OA locations
- Input schema:
  - doi (string, required)
  - email (string, optional): overrides UNPAYWALL_EMAIL
- Output: JSON with fields: best_pdf_url, best_open_url, best_oa_location, oa_locations, and select metadata
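One plausible way to derive these fields from an Unpaywall record is sketched below. The `url`, `url_for_pdf`, and `url_for_landing_page` properties are part of Unpaywall's OA location schema; the function name and the fallback order (best location first, then any other location with a PDF) are assumptions rather than the server's documented behavior.

```typescript
// Subset of the Unpaywall OA location schema used by this sketch.
interface OaLocation {
  url?: string | null;
  url_for_pdf?: string | null;
  url_for_landing_page?: string | null;
}

interface UnpaywallRecord {
  doi: string;
  best_oa_location?: OaLocation | null;
  oa_locations?: OaLocation[] | null;
}

interface FulltextLinks {
  best_pdf_url: string | null;
  best_open_url: string | null;
  best_oa_location: OaLocation | null;
  oa_locations: OaLocation[];
}

// Hypothetical selection logic: prefer the best location, then scan the rest.
function pickFulltextLinks(record: UnpaywallRecord): FulltextLinks {
  const best = record.best_oa_location ?? null;
  const locations = record.oa_locations ?? [];
  const best_pdf_url =
    best?.url_for_pdf ??
    locations.find((loc) => loc.url_for_pdf)?.url_for_pdf ??
    null;
  const best_open_url = best?.url_for_landing_page ?? best?.url ?? null;
  return { best_pdf_url, best_open_url, best_oa_location: best, oa_locations: locations };
}
```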
unpaywall_fetch_pdf_text
- Description: Download and extract text from the best OA PDF for a DOI, or from a provided pdf_url
- Input schema:
  - pdf_url (string, optional): direct PDF URL (takes precedence)
  - doi (string, optional): used to resolve best OA PDF if pdf_url not provided
  - email (string, optional): required if using doi and no UNPAYWALL_EMAIL env var
  - truncate_chars (integer >= 1000, optional): max characters of extracted text to return (default 20000)
- Output: JSON with text (possibly truncated), length_chars, truncated, pdf_url, and PDF metadata
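The truncation behavior described by `truncate_chars` can be sketched as follows. The default (20000) and minimum (1000) come from the schema above; the function name is hypothetical, and treating `length_chars` as the length of the full extracted text (before truncation) is an assumption.

```typescript
const DEFAULT_TRUNCATE_CHARS = 20000;
const MIN_TRUNCATE_CHARS = 1000;

interface TruncatedText {
  text: string;         // possibly truncated
  length_chars: number; // assumed: length of the full extracted text
  truncated: boolean;
}

// Hypothetical helper; lengths are counted in UTF-16 code units.
function truncateExtractedText(
  fullText: string,
  truncateChars: number = DEFAULT_TRUNCATE_CHARS,
): TruncatedText {
  if (!Number.isInteger(truncateChars) || truncateChars < MIN_TRUNCATE_CHARS) {
    throw new RangeError(`truncate_chars must be an integer >= ${MIN_TRUNCATE_CHARS}`);
  }
  const truncated = fullText.length > truncateChars;
  return {
    text: truncated ? fullText.slice(0, truncateChars) : fullText,
    length_chars: fullText.length,
    truncated,
  };
}
```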
LLM prompting tips (MCP)
When using this server from an MCP-enabled LLM client, ask the model to:
- Search then fetch: Use unpaywall_search_titles with a concise title phrase; select a result; then call unpaywall_get_fulltext_links or unpaywall_fetch_pdf_text on the chosen DOI.
- Prefer OA: Pass is_oa: true in searches when you only want open-access.
- Control size: Set truncate_chars in unpaywall_fetch_pdf_text (default 20000) and summarize long texts before proceeding.
- Be resilient: If the best PDF URL is missing, fall back to best_open_url and extract content from the landing page (outside this server).
- Respect rate limits: Space requests if making many calls; reuse earlier responses instead of repeating the same call.
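For client authors, the "reuse earlier responses" tip could be implemented with a small in-memory cache keyed by tool name and arguments. This is a minimal sketch under stated assumptions: `cacheKey` and `ToolResponseCache` are hypothetical names, the cache lives on the client side (not in this server), and key normalization is shallow, which is enough for the flat argument objects these tools take.

```typescript
/** Build a key that ignores the order of top-level argument properties. */
function cacheKey(name: string, args: Record<string, unknown>): string {
  const sorted = Object.keys(args)
    .sort()
    .map((key) => [key, args[key]]);
  return `${name}:${JSON.stringify(sorted)}`;
}

/** Remember one result per (tool, arguments) pair for the session. */
class ToolResponseCache<T> {
  private readonly entries = new Map<string, T>();

  getOrCompute(name: string, args: Record<string, unknown>, compute: () => T): T {
    const key = cacheKey(name, args);
    if (this.entries.has(key)) {
      return this.entries.get(key) as T;
    }
    const value = compute();
    this.entries.set(key, value);
    return value;
  }
}
```

With `T` set to a `Promise` of the tool result, concurrent identical calls would also share a single in-flight request.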
Good user instructions to the LLM:
- "Find 3 OA papers about 'foundation models in biomedicine', then extract and summarize the introduction of the best one."
- "Search for 'Graph Neural Networks survey 2024', filter to OA if possible, then fetch the PDF text and produce a 10-bullet summary."
Example tool call payloads
Depending on your MCP client, the structure differs; the core payloads are:
// Search titles
{
  "name": "unpaywall_search_titles",
  "arguments": {
    "query": "graph neural networks survey",
    "is_oa": true,
    "page": 1
  }
}
// Get best OA links for a DOI
{
  "name": "unpaywall_get_fulltext_links",
  "arguments": {
    "doi": "10.48550/arXiv.1812.08434"
  }
}
// Fetch and extract PDF text (by DOI)
{
  "name": "unpaywall_fetch_pdf_text",
  "arguments": {
    "doi": "10.48550/arXiv.1812.08434",
    "truncate_chars": 20000
  }
}
Configure in an MCP client
Recommended (no-build) config for Claude Desktop using npm/npx:
{
  "mcpServers": {
    "unpaywall": {
      "command": "npx",
      "args": ["-y", "unpaywall-mcp"],
      "env": {
        "UNPAYWALL_EMAIL": "you@example.com"
      }
    }
  }
}
Alternative (local repo) config using the compiled dist:
{
  "mcpServers": {
    "unpaywall": {
      "command": "node",
      "args": ["/absolute/path/to/dist/index.js"],
      "env": {
        "UNPAYWALL_EMAIL": "you@example.com"
      }
    }
  }
}
After adding, ask your client to list tools and try:
- unpaywall_search_titles with a query
- unpaywall_get_fulltext_links with a doi
- unpaywall_fetch_pdf_text with a doi (or pdf_url)
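Most clients hide the wire format, but if you are driving the server directly over stdio, listing tools and calling one corresponds to the MCP `tools/list` and `tools/call` JSON-RPC methods. The sketch below only builds and serializes those messages; the helper names are hypothetical, and a real session would first need to complete the MCP `initialize` handshake.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

/** Request the server's tool catalogue. */
function listToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

/** Wrap a tool name and its arguments in a tools/call request. */
function callToolRequest(
  id: number,
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

/** The stdio transport frames each message as one line of JSON. */
function toStdioLine(request: JsonRpcRequest): string {
  return JSON.stringify(request) + "\n";
}
```

For example, `toStdioLine(callToolRequest(2, "unpaywall_get_fulltext_links", { doi: "10.48550/arXiv.1812.08434" }))` produces a single line that could be written to the server's stdin.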
Notes
- Respect Unpaywall's rate limits and usage guidelines: https://unpaywall.org/products/api
- The server uses stdio transport and @modelcontextprotocol/sdk.
- Set UNPAYWALL_EMAIL or pass email per call so Unpaywall can contact you about usage.
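The precedence between the per-call `email` argument and the `UNPAYWALL_EMAIL` environment variable can be pictured with the sketch below. The function name and the loose address check are assumptions for illustration; the server may validate differently or not at all.

```typescript
// Hypothetical resolver: a per-call email wins over UNPAYWALL_EMAIL.
function resolveEmail(
  callEmail: string | undefined,
  env: Record<string, string | undefined>,
): string {
  const fromCall = (callEmail ?? "").trim();
  const fromEnv = (env["UNPAYWALL_EMAIL"] ?? "").trim();
  const candidate = fromCall !== "" ? fromCall : fromEnv;

  if (candidate === "") {
    throw new Error("Set UNPAYWALL_EMAIL or pass email in the tool call");
  }
  // Loose sanity check only; not a full RFC 5322 validation.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(candidate)) {
    throw new Error(`"${candidate}" does not look like an email address`);
  }
  return candidate;
}
```

In a Node process this would typically be called as `resolveEmail(args.email, process.env)`.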
Maintainers: publish to npm
# 1) Build the project (also runs automatically on publish)
npm run build
# 2) Bump version (choose patch/minor/major)
npm version patch
# 3) Publish (ensure you are logged in: npm login)
npm publish --access public
# 4) Tag a release on GitHub (optional, recommended)
Users can then configure their MCP client with npx -y unpaywall-mcp as shown above. No clone or build required.
