@ncbijs/xml
v0.1.1
Zero-dependency regex-based XML reader for NCBI document formats
Runtime: Browser + Node.js
Why
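Full XML parsers are more than this job needs. `DOMParser` is a browser API that Node.js does not provide without a polyfill, and libraries such as xml2js add a third-party dependency. NCBI XML responses have a predictable, well-documented structure, so a targeted regex reader is lighter, skips building a full document tree, and runs unchanged in browsers and Node.js.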
@ncbijs/xml extracts tags, blocks, and attributes from NCBI XML without external dependencies or platform assumptions.
- Tag extraction: text content of leaf elements
- Block extraction: full inner content including nested tags, with correct handling of same-name nesting
- Attribute reading: attribute values from opening tags, with entity decoding
- Entity decoding: `&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`, and numeric entities (`&#x...;`, `&#...;`)
- Tag stripping: remove all XML tags and decode entities in one call
Install
```shell
npm install @ncbijs/xml
```

Quick start
```typescript
import { readTag, readBlock, readAllTags, stripTags } from '@ncbijs/xml';

const xml = `
<PubmedArticle>
  <MedlineCitation>
    <PMID>12345678</PMID>
    <Article>
      <ArticleTitle>A <i>novel</i> approach</ArticleTitle>
      <Language>eng</Language>
    </Article>
  </MedlineCitation>
</PubmedArticle>`;

readTag(xml, 'PMID');
// => '12345678'

readBlock(xml, 'ArticleTitle');
// => 'A <i>novel</i> approach'

stripTags('A <i>novel</i> approach');
// => 'A novel approach'

readAllTags(xml, 'Language');
// => ['eng']
```

API
readTag(xml, tagName)
Extract the text content of the first matching tag. Only the text between the opening and closing tags is captured, so it is meant for leaf elements; use `readBlock` when the element contains nested tags.

```typescript
readTag('<PMID Version="1">12345678</PMID>', 'PMID');
// => '12345678'
```

| Parameter | Type | Description |
| --------- | -------- | --------------------- |
| xml | string | XML string to search. |
| tagName | string | Tag name to find. |
Returns string | undefined.
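To make the leaf-only rule concrete, here is a minimal sketch of how a regex-based leaf-tag reader could work. It is not the package's source: `readLeafTag` and `escapeRegExp` are hypothetical names, and the sketch assumes attribute values never contain `>`. The `[^<]*` capture is what rejects nested markup, so an element such as `<ArticleTitle>A <i>novel</i> approach</ArticleTitle>` yields `undefined` in this sketch.

```typescript
// Hypothetical sketch of a leaf-tag reader; not the @ncbijs/xml source.

// Escape regex metacharacters so a tag name is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the text of the first <tagName> element whose content has no child tags.
function readLeafTag(xml: string, tagName: string): string | undefined {
  const name = escapeRegExp(tagName);
  // Attributes are optional but must follow whitespace, so "PMID" does not match <PMIDX>.
  // [^<]* cannot cross a '<', which is what excludes elements with nested tags.
  const pattern = new RegExp(`<${name}(?:\\s[^>]*)?>([^<]*)</${name}>`);
  const match = pattern.exec(xml);
  return match ? match[1] : undefined;
}

console.log(readLeafTag('<PMID Version="1">12345678</PMID>', 'PMID')); // 12345678
```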
readAllTags(xml, tagName)
Extract text content of all matching tags.
```typescript
readAllTags('<Keyword>cancer</Keyword><Keyword>genomics</Keyword>', 'Keyword');
// => ['cancer', 'genomics']
```

| Parameter | Type | Description |
| --------- | -------- | --------------------- |
| xml | string | XML string to search. |
| tagName | string | Tag name to find. |
Returns ReadonlyArray<string>.
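Collecting every match can be done with the same kind of pattern plus the `g` flag. In this hypothetical sketch (`readAllLeafTags` is not a package export), `RegExp.prototype.exec` is called in a loop; each call resumes from the regex's `lastIndex`, and the loop ends when `exec` returns `null`.

```typescript
// Hypothetical sketch; not the @ncbijs/xml source.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Collects the text of every leaf <tagName> element, in document order.
function readAllLeafTags(xml: string, tagName: string): ReadonlyArray<string> {
  const name = escapeRegExp(tagName);
  // The 'g' flag makes exec() continue from pattern.lastIndex on each call.
  const pattern = new RegExp(`<${name}(?:\\s[^>]*)?>([^<]*)</${name}>`, 'g');
  const results: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    results.push(match[1] ?? '');
  }
  return results;
}

console.log(readAllLeafTags('<Keyword>cancer</Keyword><Keyword>genomics</Keyword>', 'Keyword'));
```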
readBlock(xml, tagName)
Extract the full inner content (including nested tags) between the first matching open/close pair. Handles nested same-name tags correctly.
```typescript
readBlock('<Abstract><p>First.</p><p>Second.</p></Abstract>', 'Abstract');
// => '<p>First.</p><p>Second.</p>'
```

| Parameter | Type | Description |
| --------- | -------- | --------------------- |
| xml | string | XML string to search. |
| tagName | string | Tag name to find. |
Returns string | undefined.
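A lazy regex such as `<sec>([\s\S]*?)</sec>` stops at the first `</sec>`, which truncates a block that contains another `<sec>`. One way to find the correct closing tag is to count depth over same-name open and close tags, as sketched below. This illustrates the technique under that assumption and is not the package's code; `readBlockSketch` is a hypothetical name, and self-closing tags are skipped because they open nothing.

```typescript
// Hypothetical sketch; not the @ncbijs/xml source.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the inner content of the first complete <tagName> block, or undefined.
function readBlockSketch(xml: string, tagName: string): string | undefined {
  const name = escapeRegExp(tagName);
  // Matches opening and closing tags for this exact name; group 1 is '/' for a close tag.
  const token = new RegExp(`<(/?)${name}(?:\\s[^>]*)?>`, 'g');
  let depth = 0;
  let contentStart = -1;
  let match: RegExpExecArray | null;
  while ((match = token.exec(xml)) !== null) {
    // A self-closing tag with attributes, e.g. <sec id="a"/>, opens nothing.
    if (match[0].endsWith('/>')) continue;
    if (match[1] === '/') {
      if (depth === 0) continue; // stray close tag before any open tag
      depth -= 1;
      // Depth back at zero: this close tag pairs with the outermost open tag.
      if (depth === 0) return xml.slice(contentStart, match.index);
    } else {
      if (depth === 0) contentStart = match.index + match[0].length;
      depth += 1;
    }
  }
  return undefined; // no complete block found
}

console.log(readBlockSketch('<sec><title>A</title><sec><p>B</p></sec></sec>', 'sec'));
// <title>A</title><sec><p>B</p></sec>
```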
readAllBlocks(xml, tagName)
Extract inner content of all matching blocks.
```typescript
readAllBlocks('<sec><p>A</p></sec><sec><p>B</p></sec>', 'sec');
// => ['<p>A</p>', '<p>B</p>']
```

| Parameter | Type | Description |
| --------- | -------- | --------------------- |
| xml | string | XML string to search. |
| tagName | string | Tag name to find. |
Returns ReadonlyArray<string>.
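Collecting every block can use a depth counter as well: record where a block starts when the depth rises from 0 to 1, and emit its content when the depth returns to 0. This hypothetical `readAllBlocksSketch` returns only outermost blocks, leaving nested same-name blocks inside their parent's content; whether the package's `readAllBlocks` behaves the same way is an assumption to check.

```typescript
// Hypothetical sketch; not the @ncbijs/xml source.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the inner content of every outermost <tagName> block, in document order.
function readAllBlocksSketch(xml: string, tagName: string): ReadonlyArray<string> {
  const name = escapeRegExp(tagName);
  const token = new RegExp(`<(/?)${name}(?:\\s[^>]*)?>`, 'g');
  const blocks: string[] = [];
  let depth = 0;
  let contentStart = -1;
  let match: RegExpExecArray | null;
  while ((match = token.exec(xml)) !== null) {
    if (match[0].endsWith('/>')) continue; // self-closing: no content, no depth change
    if (match[1] === '/') {
      if (depth === 0) continue; // ignore unmatched close tags
      depth -= 1;
      // Back at the top level: one outermost block has just closed.
      if (depth === 0) blocks.push(xml.slice(contentStart, match.index));
    } else {
      if (depth === 0) contentStart = match.index + match[0].length;
      depth += 1;
    }
  }
  return blocks;
}

console.log(readAllBlocksSketch('<sec>a<sec>b</sec></sec><sec>c</sec>', 'sec'));
// [ 'a<sec>b</sec>', 'c' ]
```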
readAttribute(xml, tagName, attrName)
Extract the value of an attribute from the first matching tag.
```typescript
readAttribute('<PMID Version="1">12345678</PMID>', 'PMID', 'Version');
// => '1'
```

| Parameter | Type | Description |
| ---------- | -------- | ----------------------- |
| xml | string | XML string to search. |
| tagName | string | Tag name to find. |
| attrName | string | Attribute name to read. |
Returns string | undefined.
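Reading one attribute takes two steps in the sketch below: find the first opening tag with the exact name, then search its attribute text for `name="value"` or `name='value'`. `readAttributeSketch` is a hypothetical name, and unlike the package, which decodes entities in attribute values, the sketch returns the raw value.

```typescript
// Hypothetical sketch; not the @ncbijs/xml source. No entity decoding is done here.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function readAttributeSketch(xml: string, tagName: string, attrName: string): string | undefined {
  // Step 1: capture the attribute text of the first <tagName ...> or <tagName .../>.
  const openTag = new RegExp(`<${escapeRegExp(tagName)}(\\s[^>]*)?/?>`).exec(xml);
  const attributeText = openTag?.[1];
  if (attributeText === undefined) return undefined;
  // Step 2: the name must start after whitespace, so "Version" does not match "XVersion".
  const attr = new RegExp(
    `(?:^|\\s)${escapeRegExp(attrName)}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`,
  ).exec(attributeText);
  if (!attr) return undefined;
  // Exactly one of the two quote-style groups participated in the match.
  return attr[1] ?? attr[2];
}

console.log(readAttributeSketch('<PMID Version="1">12345678</PMID>', 'PMID', 'Version')); // 1
```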
readTagWithAttributes(xml, tagName)
Extract text content and all attributes from the first matching tag.
```typescript
readTagWithAttributes('<Keyword MajorTopicYN="Y">cancer</Keyword>', 'Keyword');
// => { text: 'cancer', attributes: { MajorTopicYN: 'Y' } }
```

| Parameter | Type | Description |
| --------- | -------- | --------------------- |
| xml | string | XML string to search. |
| tagName | string | Tag name to find. |
Returns TagWithAttributes | null.
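To return every attribute, the sketch below scans the opening tag's attribute text with a global `name="value"` regex and collects the pairs into a plain object. `TagWithAttributesSketch`, `parseAttributes`, and `readTagWithAttributesSketch` are hypothetical; the `{ text, attributes }` shape is inferred from the example output, not copied from the package's type definitions.

```typescript
// Hypothetical sketch; not the @ncbijs/xml source. Shape inferred from the README example.
interface TagWithAttributesSketch {
  readonly text: string;
  readonly attributes: Readonly<Record<string, string>>;
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Collects every name="value" or name='value' pair from an opening tag's attribute text.
function parseAttributes(source: string): Record<string, string> {
  const attributes: Record<string, string> = {};
  const pair = /([A-Za-z_:][\w:.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let match: RegExpExecArray | null;
  while ((match = pair.exec(source)) !== null) {
    const [, key, doubleQuoted, singleQuoted] = match;
    if (key !== undefined) attributes[key] = doubleQuoted ?? singleQuoted ?? '';
  }
  return attributes;
}

function readTagWithAttributesSketch(xml: string, tagName: string): TagWithAttributesSketch | null {
  const name = escapeRegExp(tagName);
  // Group 1: attribute text (optional); group 2: leaf text content.
  const match = new RegExp(`<${name}(\\s[^>]*)?>([^<]*)</${name}>`).exec(xml);
  if (!match) return null;
  return { text: match[2] ?? '', attributes: parseAttributes(match[1] ?? '') };
}

console.log(readTagWithAttributesSketch('<Keyword MajorTopicYN="Y">cancer</Keyword>', 'Keyword'));
```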
readAllTagsWithAttributes(xml, tagName)
Extract text content and attributes from all matching tags.
```typescript
readAllTagsWithAttributes(
  '<DescriptorName UI="D009369" MajorTopicYN="Y">Neoplasms</DescriptorName>',
  'DescriptorName',
);
// => [{ text: 'Neoplasms', attributes: { UI: 'D009369', MajorTopicYN: 'Y' } }]
```

| Parameter | Type | Description |
| --------- | -------- | --------------------- |
| xml | string | XML string to search. |
| tagName | string | Tag name to find. |
Returns ReadonlyArray<TagWithAttributes>.
readAllBlocksWithAttributes(xml, tagName)
Extract inner content and attributes from all matching blocks.
```typescript
readAllBlocksWithAttributes(
  '<article-id pub-id-type="doi">10.1234/example</article-id>',
  'article-id',
);
// => [{ content: '10.1234/example', attributes: { 'pub-id-type': 'doi' } }]
```

| Parameter | Type | Description |
| --------- | -------- | --------------------- |
| xml | string | XML string to search. |
| tagName | string | Tag name to find. |
Returns ReadonlyArray<BlockWithAttributes>.
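A common next step is turning the returned array into a lookup table. The helper below is hypothetical, and `BlockWithAttributesLike` mirrors the `{ content, attributes }` shape shown in the example output rather than the package's exported `BlockWithAttributes` type. In real code the input would presumably be the result of `readAllBlocksWithAttributes(xml, 'article-id')`.

```typescript
// Shape assumed from the README example output; the real BlockWithAttributes may differ.
interface BlockWithAttributesLike {
  readonly content: string;
  readonly attributes: Readonly<Record<string, string>>;
}

// Hypothetical helper: index JATS <article-id> blocks by their pub-id-type attribute.
function indexArticleIds(blocks: ReadonlyArray<BlockWithAttributesLike>): Map<string, string> {
  const ids = new Map<string, string>();
  for (const block of blocks) {
    const type = block.attributes['pub-id-type'];
    // Skip untyped blocks and keep the first value seen for each type.
    if (type !== undefined && !ids.has(type)) {
      ids.set(type, block.content.trim());
    }
  }
  return ids;
}

const ids = indexArticleIds([
  { content: '10.1234/example', attributes: { 'pub-id-type': 'doi' } },
  { content: '12345678', attributes: { 'pub-id-type': 'pmid' } },
]);
console.log(ids.get('doi')); // 10.1234/example
```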
stripTags(xml)
Remove all XML tags from a string.
```typescript
stripTags('<p>A <b>bold</b> statement</p>');
// => 'A bold statement'
```

| Parameter | Type | Description |
| --------- | -------- | -------------------- |
| xml | string | XML string to strip. |
Returns string.
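Order matters when stripping tags: removing tags first and decoding afterwards keeps escaped text such as `&lt;b&gt;` from being mistaken for markup. This is a minimal sketch under that assumption, not the package's implementation; `stripTagsSketch` is hypothetical and decodes only the five predefined XML entities, leaving numeric references as-is.

```typescript
// Hypothetical sketch; not the @ncbijs/xml source.
// Only the five predefined XML entities are decoded here.
const PREDEFINED_ENTITIES: Readonly<Record<string, string>> = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  apos: "'",
};

function stripTagsSketch(xml: string): string {
  return (
    xml
      // 1. Drop every tag: anything from '<' to the next '>'.
      .replace(/<[^>]*>/g, '')
      // 2. Decode in one pass after stripping, so '&lt;b&gt;' survives as text.
      .replace(/&(amp|lt|gt|quot|apos);/g, (entity: string, name: string) => PREDEFINED_ENTITIES[name] ?? entity)
  );
}

console.log(stripTagsSketch('<p>A <b>bold</b> statement</p>')); // A bold statement
```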
removeAllBlocks(xml, tagName)
Remove all occurrences of a block (open tag through close tag, including content). Also removes self-closing elements.
```typescript
removeAllBlocks('<body><xref>1</xref> text</body>', 'xref');
// => '<body> text</body>'
```

| Parameter | Type | Description |
| --------- | -------- | --------------------- |
| xml | string | XML string to modify. |
| tagName | string | Tag name to remove. |
Returns string.
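Here is a regex-only sketch of block removal, assuming the target element is never nested inside itself: self-closing forms are removed first, then each opening tag through its nearest closing tag. `removeAllBlocksSketch` is hypothetical. Because the lazy `[\s\S]*?` stops at the first matching close tag, a nested same-name block would leave a stray closing tag behind; handling that case would need a depth counter.

```typescript
// Hypothetical sketch; not the @ncbijs/xml source. Assumes no same-name nesting.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function removeAllBlocksSketch(xml: string, tagName: string): string {
  const name = escapeRegExp(tagName);
  // <xref/> or <xref rid="b1"/>
  const selfClosing = new RegExp(`<${name}(?:\\s[^>]*)?/>`, 'g');
  // <xref ...>...</xref>, matching lazily up to the first close tag.
  const paired = new RegExp(`<${name}(?:\\s[^>]*)?>[\\s\\S]*?</${name}\\s*>`, 'g');
  // Self-closing tags go first so they are never mistaken for opening tags.
  return xml.replace(selfClosing, '').replace(paired, '');
}

console.log(removeAllBlocksSketch('<body><xref>1</xref> text</body>', 'xref')); // <body> text</body>
```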
decodeEntities(text)
Decode XML entities: `&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`, and numeric character references (`&#123;`, `&#x1F4A1;`).

```typescript
decodeEntities('Smith &amp; Jones &#8212; 2024');
// => 'Smith & Jones — 2024'
```

| Parameter | Type | Description |
| --------- | -------- | --------------- |
| text | string | Text to decode. |
Returns string.
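Decoding in a single regex pass avoids double-decoding: `&amp;lt;` should become the literal text `&lt;`, not `<`. The sketch below is hypothetical (`decodeEntitiesSketch` and `fromCodePointSafe` are not package exports). It relies on `String.fromCodePoint`, which throws a `RangeError` for values above `0x10FFFF`, so out-of-range references are left untouched.

```typescript
// Hypothetical sketch; not the @ncbijs/xml source.
const NAMED: Readonly<Record<string, string>> = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  apos: "'",
};

// String.fromCodePoint throws above U+10FFFF; return the original reference instead.
function fromCodePointSafe(codePoint: number, fallback: string): string {
  return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : fallback;
}

function decodeEntitiesSketch(text: string): string {
  // One pass over decimal (&#123;), hexadecimal (&#x1F4A1;), and named references.
  return text.replace(
    /&(?:#(\d+)|#x([0-9a-fA-F]+)|(amp|lt|gt|quot|apos));/g,
    (entity: string, decimal?: string, hex?: string, name?: string): string => {
      if (decimal !== undefined) return fromCodePointSafe(parseInt(decimal, 10), entity);
      if (hex !== undefined) return fromCodePointSafe(parseInt(hex, 16), entity);
      return name !== undefined ? NAMED[name] ?? entity : entity;
    },
  );
}

console.log(decodeEntitiesSketch('&#123;x&#125; &amp;lt;')); // {x} &lt;
```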
Types
All types are exported for use in your own interfaces:
```typescript
import type { TagWithAttributes, BlockWithAttributes } from '@ncbijs/xml';
```