@ncbijs/pubtator

v0.0.0

Published

9 days ago

Text mining client for PubTator/BioC APIs with entity annotations and relation discovery


gllamas

ncbijs ncbi pubtator bioc text-mining ner nlp entity gene disease chemical annotation bioinformatics typescript

Why

PubTator3 has over 1 billion entity annotations across 36 million PubMed articles and 6 million PMC full-text articles. It identifies genes, diseases, chemicals, mutations, species, and cell lines. But its API is spread across several endpoints (entity autocomplete, publication search, BioC export, text annotation), each with its own response format.

@ncbijs/pubtator wraps them into a typed, promise-based client.

Entity search — autocomplete entities by name with optional type filter
Publication search — search PubTator-indexed publications
BioC export — export annotations in BioC XML or JSON
Free-text annotation — annotate arbitrary text with entity recognition
TSV parsing — parse PubTator tab-separated annotation format

Install

```bash
npm install @ncbijs/pubtator
```

Quick start

```typescript
import { PubTator } from '@ncbijs/pubtator';

const pubtator = new PubTator();

// Search for gene entities
const genes = await pubtator.findEntity('BRCA1', 'gene');
console.log(genes[0].name); // "BRCA1"

// Export BioC annotations for PubMed articles
const bioc = await pubtator.export(['33024307', '32919527']);
for (const doc of bioc.documents) {
  for (const passage of doc.passages) {
    console.log(passage.annotations);
  }
}
```

API

`new PubTator()`

Creates a new PubTator3 client. No configuration required.

`findEntity(query, entityType?)`

Search entities by name via the PubTator3 autocomplete API.

```typescript
const results = await pubtator.findEntity('aspirin', 'chemical');
```

| Parameter  | Type       | Required | Description                        |
| ---------- | ---------- | -------- | ---------------------------------- |
| query      | string     | Yes      | Entity name or partial name.       |
| entityType | EntityType | No       | Filter by entity type (see below). |

Returns Promise<ReadonlyArray<EntityMatch>>.
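Because autocomplete returns partial matches, the first result is not always the entity you typed. Below is a minimal sketch of a hypothetical helper, not part of @ncbijs/pubtator. It prefers a case-insensitive exact name match and otherwise falls back to the first result. It assumes only the `name` field shown in the Quick start, and `NamedMatch` is a local stand-in for `EntityMatch`.

```typescript
// Hypothetical stand-in for EntityMatch: only the `name` field is assumed.
interface NamedMatch {
  readonly name: string;
}

// Prefer an exact (case-insensitive) name match; otherwise fall back to the
// first autocomplete result, or undefined when there are no results at all.
function pickBestMatch<T extends NamedMatch>(
  query: string,
  matches: ReadonlyArray<T>,
): T | undefined {
  const wanted = query.trim().toLowerCase();
  const exact = matches.find((m) => m.name.toLowerCase() === wanted);
  return exact ?? matches[0];
}

// Usage with the client (requires network access):
// const best = pickBestMatch('aspirin', await pubtator.findEntity('aspirin', 'chemical'));
```

The helper is generic over `T`, so it returns the caller's full match objects rather than a narrowed copy.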

`EntityType` values

| Constant              | API value   |
| --------------------- | ----------- |
| ENTITY_TYPES.Gene     | 'gene'      |
| ENTITY_TYPES.Disease  | 'disease'   |
| ENTITY_TYPES.Chemical | 'chemical'  |
| ENTITY_TYPES.Variant  | 'variant'   |
| ENTITY_TYPES.Species  | 'species'   |
| ENTITY_TYPES.CellLine | 'cell_line' |
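When the entity type comes from user input, such as a CLI flag or a query-string parameter, you can narrow it before calling `findEntity`. Below is a minimal sketch of a hypothetical type guard built from the API values in the table above. `ENTITY_TYPE_VALUES`, `EntityTypeValue`, and `isEntityTypeValue` are illustrative names, not exports of the package. The assumption is that the local union mirrors `EntityType`.

```typescript
// API values copied from the EntityType table; assumed to match EntityType.
const ENTITY_TYPE_VALUES = [
  'gene',
  'disease',
  'chemical',
  'variant',
  'species',
  'cell_line',
] as const;

type EntityTypeValue = (typeof ENTITY_TYPE_VALUES)[number];

// Narrow an arbitrary string (e.g. a CLI argument) to a known entity type.
// The check is case-sensitive, matching the lowercase API values.
function isEntityTypeValue(value: string): value is EntityTypeValue {
  return (ENTITY_TYPE_VALUES as ReadonlyArray<string>).includes(value);
}

// Usage:
// const type = isEntityTypeValue(raw) ? raw : undefined;
// const results = await pubtator.findEntity('BRCA1', type);
```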

`search(query, options?)`

Search PubTator-indexed publications by text.

```typescript
const results = await pubtator.search('BRCA1 breast cancer', { page: 1, pageSize: 10 });
console.log(results.total);
```

| Parameter | Type          | Required | Description  |
| --------- | ------------- | -------- | ------------ |
| query     | string        | Yes      | Search text. |
| options   | SearchOptions | No       | Pagination.  |

SearchOptions

| Option   | Type   | Default | Description       |
| -------- | ------ | ------- | ----------------- |
| page     | number | --      | Page number.      |
| pageSize | number | --      | Results per page. |

Returns Promise<SearchResult>.
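To collect more than one page of hits, you can call `search` repeatedly with an increasing `page`. Below is a minimal sketch of a hypothetical pagination helper, not part of the package. It makes three assumptions. `total` (shown above) counts all matching hits, not pages. Pages start at 1, as in the example. `items` is a stand-in for whatever array `SearchResult` actually exposes.

```typescript
// Hypothetical page shape: `total` is assumed to be the overall number of hits,
// and `items` stands in for the result array SearchResult actually exposes.
interface Page<T> {
  readonly total: number;
  readonly items: ReadonlyArray<T>;
}

// Fetch pages 1, 2, 3, ... until `total` items are collected, an empty page
// comes back, or `maxPages` is reached (a guard against runaway loops).
async function collectPages<T>(
  fetchPage: (page: number, pageSize: number) => Promise<Page<T>>,
  pageSize = 10,
  maxPages = 50,
): Promise<T[]> {
  const collected: T[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const { total, items } = await fetchPage(page, pageSize);
    collected.push(...items);
    if (items.length === 0 || collected.length >= total) break;
  }
  return collected;
}

// Usage with the client (the `results` field name is an assumption):
// const hits = await collectPages(async (page, pageSize) => {
//   const r = await pubtator.search('BRCA1 breast cancer', { page, pageSize });
//   return { total: r.total, items: r.results };
// });
```

Pages are fetched sequentially rather than in parallel, which keeps the request rate against the public API low.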

`export(pmids, options?)`

Export BioC annotations for a list of PMIDs.

```typescript
const bioc = await pubtator.export(['33024307'], { format: 'xml', full: true });
```

| Parameter | Type                  | Required | Description                         |
| --------- | --------------------- | -------- | ----------------------------------- |
| pmids     | ReadonlyArray<string> | Yes      | PubMed IDs to export.               |
| options   | ExportOptions         | No       | Format and full-text configuration. |

ExportOptions

| Option | Type            | Default | Description                                   |
| ------ | --------------- | ------- | --------------------------------------------- |
| format | 'json' \| 'xml' | 'json'  | BioC output format.                           |
| full   | boolean         | --      | Include full-text annotations when available. |

Returns Promise<BioDocument>.
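A long PMID list can be sent as several smaller `export` calls. Below is a minimal sketch of a hypothetical batching helper, not part of the package. The default batch size of 100 is an assumption for illustration, not a documented limit of PubTator3 or of this client. Each call returns one `BioDocument`, so the helper merges the `documents` arrays (the field used in the Quick start) from every batch.

```typescript
// Minimal stand-in for BioDocument: only the `documents` array from the
// Quick start is assumed.
interface DocumentCollection<D> {
  readonly documents: ReadonlyArray<D>;
}

// Split a list into consecutive slices of at most `size` elements.
function chunk<T>(items: ReadonlyArray<T>, size: number): T[][] {
  if (size < 1) throw new RangeError('size must be >= 1');
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Export PMIDs batch by batch (sequentially, to keep request volume low) and
// merge every batch's documents into one array.
async function exportInBatches<D>(
  pmids: ReadonlyArray<string>,
  exportBatch: (batch: ReadonlyArray<string>) => Promise<DocumentCollection<D>>,
  batchSize = 100, // assumed value, not a documented limit
): Promise<D[]> {
  const all: D[] = [];
  for (const batch of chunk(pmids, batchSize)) {
    const result = await exportBatch(batch);
    all.push(...result.documents);
  }
  return all;
}

// Usage with the client:
// const docs = await exportInBatches(pmids, (batch) => pubtator.export(batch));
```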

`annotateByPmid(pmids, options?)`

Annotate articles by their PubMed IDs.

```typescript
const annotations = await pubtator.annotateByPmid(['33024307'], {
  concept: 'Gene',
  format: 'PubTator',
});
```

| Parameter | Type                  | Required | Description                |
| --------- | --------------------- | -------- | -------------------------- |
| pmids     | ReadonlyArray<string> | Yes      | PubMed IDs to annotate.    |
| options   | AnnotateOptions       | No       | Concept filter and format. |

AnnotateOptions

| Option  | Type                             | Default | Description                                    |
| ------- | -------------------------------- | ------- | ---------------------------------------------- |
| concept | ConceptType                      | --      | Filter to a specific concept type (see below). |
| format  | 'PubTator' \| 'BioC' \| 'JSON'   | --      | Output format.                                 |

Returns Promise<string>.

`ConceptType` values

| Constant                 | API value    |
| ------------------------ | ------------ |
| CONCEPT_TYPES.Gene       | 'Gene'       |
| CONCEPT_TYPES.Disease    | 'Disease'    |
| CONCEPT_TYPES.Chemical   | 'Chemical'   |
| CONCEPT_TYPES.Mutation   | 'Mutation'   |
| CONCEPT_TYPES.Species    | 'Species'    |
| CONCEPT_TYPES.BioConcept | 'BioConcept' |

`annotateText(text, options?)`

Annotate free text with entity recognition.

```typescript
const annotated = await pubtator.annotateText(
  'BRCA1 is associated with breast cancer susceptibility.',
  { concept: 'Disease' },
);
```

| Parameter | Type            | Required | Description                |
| --------- | --------------- | -------- | -------------------------- |
| text      | string          | Yes      | Free text to annotate.     |
| options   | AnnotateOptions | No       | Concept filter and format. |

Returns Promise<string>.

`parseBioC(input)`

Parse a BioC XML or JSON string into a typed BioDocument.

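```typescript
import { parseBioC } from '@ncbijs/pubtator';

// xmlString holds a BioC XML document as a string (a BioC JSON string also works)
const bioc = parseBioC(xmlString);
```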

| Parameter | Type   | Required | Description                       |
| --------- | ------ | -------- | --------------------------------- |
| input     | string | Yes      | BioC XML or JSON string to parse. |

Returns BioDocument.
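A common first step after parsing is to see which entity types a document set contains. Below is a minimal sketch of a hypothetical helper, not part of the package, that tallies annotations by type. The nesting from documents to passages to annotations follows the Quick start. Reading the entity type from `infons.type` follows the usual BioC convention, but this is an assumption about this package's `Annotation` type. The interfaces are local stand-ins.

```typescript
// Minimal stand-ins for BioDocument/BioPassage/Annotation. The nesting
// (documents, then passages, then annotations) follows the Quick start;
// reading the entity type from `infons.type` is an assumption based on the
// usual BioC convention.
interface AnnotationLike {
  readonly infons: Readonly<Record<string, string | undefined>>;
}
interface PassageLike {
  readonly annotations: ReadonlyArray<AnnotationLike>;
}
interface CollectionLike {
  readonly documents: ReadonlyArray<{ readonly passages: ReadonlyArray<PassageLike> }>;
}

// Tally annotations by entity type across every document and passage;
// annotations without a type are counted under 'unknown'.
function countByType(collection: CollectionLike): Map<string, number> {
  const counts = new Map<string, number>();
  for (const doc of collection.documents) {
    for (const passage of doc.passages) {
      for (const annotation of passage.annotations) {
        const type = annotation.infons['type'] ?? 'unknown';
        counts.set(type, (counts.get(type) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Usage: console.log(countByType(parseBioC(xmlString)));
```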

`parsePubTatorTsv(input)`

Parse a string in the PubTator tab-separated annotation format.

```typescript
import { parsePubTatorTsv } from '@ncbijs/pubtator';

// tsvString holds annotations in the PubTator tab-separated format
const annotations = parsePubTatorTsv(tsvString);
```

| Parameter | Type   | Required | Description                   |
| --------- | ------ | -------- | ----------------------------- |
| input     | string | Yes      | PubTator TSV string to parse. |

Returns ReadonlyArray<PubTatorAnnotation>.
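In the PubTator format, each document starts with a title line (`PMID|t|...`) and an abstract line (`PMID|a|...`). These are followed by one tab-separated line per mention: PMID, start offset, end offset, mention text, entity type, and identifier. Blank lines separate documents. The sketch below shows how such lines map to records. It is an illustration of the format, not this package's implementation. The `TsvAnnotation` field names are hypothetical and may differ from `PubTatorAnnotation`.

```typescript
// Hypothetical record shape; the real PubTatorAnnotation fields may differ.
interface TsvAnnotation {
  readonly pmid: string;
  readonly start: number;
  readonly end: number;
  readonly text: string;
  readonly type: string;
  readonly identifier?: string;
}

// Parse the mention lines of the PubTator format:
//   PMID|t|Title            (title line, skipped)
//   PMID|a|Abstract         (abstract line, skipped)
//   PMID<TAB>start<TAB>end<TAB>mention<TAB>type<TAB>identifier
// Lines with fewer than five tab-separated fields (titles, abstracts, blank
// separators, relation lines) or with non-integer offsets are ignored.
function parseTsvSketch(input: string): TsvAnnotation[] {
  const annotations: TsvAnnotation[] = [];
  for (const line of input.split(/\r?\n/)) {
    const fields = line.split('\t');
    if (fields.length < 5) continue;
    const [pmid, startRaw, endRaw, text, type, identifier] = fields;
    const start = Number(startRaw);
    const end = Number(endRaw);
    if (!Number.isInteger(start) || !Number.isInteger(end)) continue;
    annotations.push({ pmid, start, end, text, type, identifier: identifier || undefined });
  }
  return annotations;
}
```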

Types

All types are exported for use in your own interfaces:

```typescript
import type {
  AnnotateOptions,
  Annotation,
  BioDocument,
  BioPassage,
  ConceptType,
  EntityMatch,
  EntityType,
  ExportOptions,
  PubTatorAnnotation,
  SearchOptions,
  SearchResult,
} from '@ncbijs/pubtator';
```
