@ncbijs/pubmed-xml
v0.1.1
Spec-compliant parser for PubMed/MEDLINE XML format
Runtime: Browser + Node.js
Why
PubMed XML has dozens of structural edge cases — structured abstracts with labeled sections, collective author names, MedlineDate fallback formats, multiple article IDs across different schemes. Writing a one-off parser means rediscovering these pitfalls. This package handles them all and returns clean typed objects.
- PubmedArticleSet XML: full parse of the standard PubMed efetch XML format
- Streaming parser: process large XML responses article-by-article via AsyncIterableIterator
- MEDLINE plain-text: parse the tagged MEDLINE format (PMID- ..., TI - ...)
- Complete field coverage: abstract (structured/unstructured), authors, journal, MeSH, grants, keywords, data banks, comments/corrections, article IDs, publication types
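To make one of these edge cases concrete: PubMed falls back to a free-text MedlineDate value when a publication date does not fit separate year, month, and day fields, for example `1998 Dec-1999 Jan` or `2000 Spring`. The sketch below is not this package's implementation. `PartialDateSketch` and `parseMedlineDateSketch` are hypothetical names, and the field layout is an assumption made only to illustrate extracting what can be recognized while keeping the raw string.

```typescript
// Hypothetical shape for illustration; the package's PartialDate may differ.
interface PartialDateSketch {
  year?: number;
  month?: string;
  season?: string;
  raw: string;
}

const MONTH_ABBREVIATIONS = [
  'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec',
];
const SEASON_NAMES = ['Spring', 'Summer', 'Fall', 'Autumn', 'Winter'];

function parseMedlineDateSketch(input: string): PartialDateSketch {
  const raw = input.trim();
  // Always keep the original text so nothing is lost when parsing is partial.
  const result: PartialDateSketch = { raw };

  // The first standalone four-digit run is treated as the year.
  const yearMatch = /\b(\d{4})\b/.exec(raw);
  if (yearMatch) {
    result.year = Number(yearMatch[1]);
  }

  // The first alphabetic word is checked against season names, then months.
  const wordMatch = /[A-Za-z]+/.exec(raw);
  const word = wordMatch ? wordMatch[0] : '';
  if (SEASON_NAMES.indexOf(word) !== -1) {
    result.season = word;
  } else if (MONTH_ABBREVIATIONS.indexOf(word) !== -1) {
    result.month = word;
  }
  return result;
}
```

For a range such as `1998 Dec-1999 Jan`, this sketch reports only the start of the range (`1998`, `Dec`) and relies on `raw` for the rest.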
Install
npm install @ncbijs/pubmed-xml
Quick start
Parse XML
import { parsePubmedXml } from '@ncbijs/pubmed-xml';
const xml = await fetch(
  'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12345678&retmode=xml',
).then((r) => r.text());
const articles = parsePubmedXml(xml);
for (const article of articles) {
  console.log(article.pmid, article.title);
  console.log(article.journal.title, article.publicationDate.year);
}
Stream large responses
import { createPubmedXmlStream } from '@ncbijs/pubmed-xml';
const response = await fetch(
  'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=1,2,3&retmode=xml',
);
const textStream = response.body!.pipeThrough(new TextDecoderStream());
for await (const article of createPubmedXmlStream(textStream)) {
  console.log(article.pmid, article.title);
}
Parse MEDLINE text
import { parseMedlineText } from '@ncbijs/pubmed-xml';
const medline = `
PMID- 12345678
TI - A novel approach to cancer treatment
AU - Smith J
AU - Jones M
DP - 2024 Mar 15
AB - This study demonstrates...
MH - Neoplasms/*therapy
`;
const articles = parseMedlineText(medline);
console.log(articles[0].title); // 'A novel approach to cancer treatment'
API
parsePubmedXml(xml)
Parse a PubmedArticleSet XML string into an array of typed article objects.
const articles = parsePubmedXml(xmlString);

| Parameter | Type | Description |
| --------- | -------- | ---------------------------- |
| xml | string | PubmedArticleSet XML string. |
Returns ReadonlyArray<PubmedArticle>.
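Because the result is typed as a `ReadonlyArray`, TypeScript rejects in-place mutations such as `sort()` or `push()` on it. A minimal sketch of working on a copy instead is shown here. `ArticleLike` is a local stand-in for `PubmedArticle` that declares only the `pmid` and `title` fields, and `sortByTitle` is a hypothetical helper, not part of the package.

```typescript
// Local stand-in for PubmedArticle; only two documented fields are declared.
interface ArticleLike {
  readonly pmid: string;
  readonly title: string;
}

// Hypothetical helper: returns a new, sorted array and leaves the input untouched.
function sortByTitle(articles: ReadonlyArray<ArticleLike>): ArticleLike[] {
  // slice() copies the array; ReadonlyArray exposes no sort() of its own.
  return articles.slice().sort((a, b) => a.title.localeCompare(b.title));
}
```

The same copy-first pattern applies to any other mutating array method.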
createPubmedXmlStream(input)
Create a streaming parser that yields PubmedArticle objects as complete <PubmedArticle> elements arrive. Useful for large responses that should not be buffered entirely in memory.
for await (const article of createPubmedXmlStream(textStream)) {
  // process each article as it arrives
}

| Parameter | Type | Description |
| --------- | ------------------------ | ---------------------------- |
| input | ReadableStream<string> | Readable stream of XML text. |
Returns AsyncIterableIterator<PubmedArticle>.
Throws if the stream ends with an incomplete <PubmedArticle> element.
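To illustrate what "as complete elements arrive" and the truncation error can mean in practice, here is a rough sketch of a chunk splitter. It is an assumption about the general technique, not the package's actual parser. `ArticleSplitterSketch` is a hypothetical name, and the sketch ignores details such as comments or CDATA sections that might contain tag-like text.

```typescript
const OPEN_TAG = /<PubmedArticle[\s>]/; // does not match <PubmedArticleSet>
const CLOSE_TAG = '</PubmedArticle>';

// Illustrative only: buffers text chunks and emits complete elements.
class ArticleSplitterSketch {
  private buffer = '';

  // Accept one chunk and return every complete element now available.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const complete: string[] = [];
    for (;;) {
      const start = this.buffer.search(OPEN_TAG);
      if (start === -1) break;
      const end = this.buffer.indexOf(CLOSE_TAG, start);
      if (end === -1) break; // element not finished yet, wait for more input
      const stop = end + CLOSE_TAG.length;
      complete.push(this.buffer.slice(start, stop));
      // Drop everything consumed so far, keeping only unread text.
      this.buffer = this.buffer.slice(stop);
    }
    return complete;
  }

  // Call once the input is exhausted; an open element left over means truncation.
  end(): void {
    if (OPEN_TAG.test(this.buffer)) {
      throw new Error('Stream ended inside an incomplete <PubmedArticle> element');
    }
  }
}
```

In a streaming setup, each chunk read from the text stream would be passed to `push()`, and `end()` would be called when the stream closes, which is where a truncated response surfaces as an error.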
parseMedlineText(text)
Parse MEDLINE plain-text format (tagged format with PMID-, TI -, etc.) into typed article objects.
const articles = parseMedlineText(medlineString);

| Parameter | Type | Description |
| --------- | -------- | ----------------------------- |
| text | string | MEDLINE tagged-format string. |
Returns ReadonlyArray<PubmedArticle>.
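For readers unfamiliar with the tagged format: each field starts with a short uppercase tag, a hyphen, and a value; long values continue on indented lines; repeated tags (such as `AU`) carry multiple values; and blank lines separate records. The following sketch shows one way to tokenize that layout. It is not the package's parser. `splitMedlineRecords` and `MedlineRecordSketch` are hypothetical names, and the result is a plain tag-to-values map rather than a `PubmedArticle`.

```typescript
// Hypothetical intermediate shape: tag name mapped to all values seen for it.
type MedlineRecordSketch = Record<string, string[]>;

function splitMedlineRecords(text: string): MedlineRecordSketch[] {
  const records: MedlineRecordSketch[] = [];
  let current: MedlineRecordSketch | null = null;
  let lastTag: string | null = null;

  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '') {
      // A blank line closes the record in progress.
      current = null;
      lastTag = null;
      continue;
    }
    // Tag of 2 to 4 capitals, optional padding, a hyphen, then the value.
    const field = /^([A-Z]{2,4})\s*-\s?(.*)$/.exec(line);
    if (field) {
      const tag = field[1];
      const value = field[2];
      if (current === null) {
        current = {};
        records.push(current);
      }
      (current[tag] = current[tag] || []).push(value);
      lastTag = tag;
    } else if (current !== null && lastTag !== null && /^\s/.test(line)) {
      // Indented line: continuation of the previous field's last value.
      const values = current[lastTag];
      values[values.length - 1] += ' ' + line.trim();
    }
  }
  return records;
}
```

Mapping tags such as `TI`, `AU`, or `MH` onto typed fields would be a separate step on top of this tokenization.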
PubmedArticle
Every parser returns PubmedArticle objects with the following fields:
| Field | Type | Description |
| --------------------- | ---------------------------------- | ------------------------------------------------------------------ |
| pmid | string | PubMed ID. |
| title | string | Article title (tags stripped). |
| vernacularTitle | string (optional) | Title in original language. |
| abstract | AbstractContent | Abstract text, with structured sections if available. |
| authors | ReadonlyArray<Author> | Author list. |
| journal | JournalInfo | Journal title, abbreviation, ISSN, volume, issue. |
| publicationDate | PartialDate | Year, month, day, or season/raw fallback. |
| mesh | ReadonlyArray<MeshHeading> | MeSH headings with descriptors, qualifiers, and major topic flags. |
| articleIds | ArticleIds | PMID, DOI, PMC, PII, MID identifiers. |
| publicationTypes | ReadonlyArray<string> | Publication type list (e.g., "Journal Article", "Review"). |
| grants | ReadonlyArray<Grant> | Funding grants. |
| keywords | ReadonlyArray<Keyword> | Author and NLM keywords. |
| commentsCorrections | ReadonlyArray<CommentCorrection> | Errata, retractions, comments. |
| dataBanks | ReadonlyArray<DataBank> | Data bank accession numbers (e.g., GenBank). |
| language | string | Language code (e.g., "eng"). |
| dateRevised | PartialDate (optional) | Date last revised. |
| dateCompleted | PartialDate (optional) | Date citation completed. |
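As a small sketch of consuming these fields, the helpers below filter and format articles. `ArticleFieldsSketch` is a local stand-in that declares only fields named in the table above. The nested `journal.title` and `publicationDate.year` properties follow the quick start example, while typing `year` as an optional number is an assumption. `formatCitationLine` and `englishReviews` are hypothetical helpers, not package exports.

```typescript
// Local stand-in covering a subset of the documented PubmedArticle fields.
interface ArticleFieldsSketch {
  readonly pmid: string;
  readonly title: string;
  readonly language: string;
  readonly publicationTypes: ReadonlyArray<string>;
  readonly journal: { readonly title: string };
  // Assumption: year is numeric and may be absent for fallback dates.
  readonly publicationDate: { readonly year?: number };
}

// Build a one-line summary, tolerating a missing year.
function formatCitationLine(article: ArticleFieldsSketch): string {
  const year =
    article.publicationDate.year !== undefined
      ? String(article.publicationDate.year)
      : 'n.d.';
  return `${article.pmid}: ${article.title} (${article.journal.title}, ${year})`;
}

// Keep English-language articles whose publication types include "Review".
function englishReviews(
  articles: ReadonlyArray<ArticleFieldsSketch>,
): ArticleFieldsSketch[] {
  return articles.filter(
    (a) => a.language === 'eng' && a.publicationTypes.indexOf('Review') !== -1,
  );
}
```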
Types
All types are exported for use in your own interfaces:
import type {
PubmedArticle,
Author,
MeshHeading,
MeshQualifier,
AbstractSection,
AbstractContent,
CommentCorrection,
DataBank,
Grant,
JournalInfo,
Keyword,
ArticleIds,
PartialDate,
} from '@ncbijs/pubmed-xml';