gnparser

v2.0.0

Published

4 months ago

Node.js wrapper for GNParser, a scientific name parser


amazingplants

scientific-names taxonomy biodiversity gnparser parser nomenclature

Global Names Parser for Node.js

This module provides a Node.js wrapper for GNParser by the Global Names Project.

Installation

npm install gnparser

The gnparser binary is automatically downloaded from GitHub releases during installation. Downloads are verified against pinned SHA-256 hashes and cached to avoid repeated downloads.
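
As a rough illustration of what checking a download against a pinned SHA-256 hash involves, here is a minimal sketch using Node's built-in crypto module. The helper name `matchesPinnedHash` is hypothetical and this is not the package's installer code; the pinned value is the standard SHA-256 test vector for the string "abc", not a real gnparser release hash.

```javascript
const crypto = require('crypto')

// Hypothetical helper (not part of gnparser): hash the downloaded bytes
// and compare the hex digest with the pinned value.
function matchesPinnedHash(data, expectedHex) {
  const actual = crypto.createHash('sha256').update(data).digest('hex')
  return actual === expectedHex.toLowerCase()
}

// Well-known SHA-256 test vector for the ASCII string "abc".
const pinned = 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'

console.log(matchesPinnedHash(Buffer.from('abc'), pinned)) // true
console.log(matchesPinnedHash(Buffer.from('abd'), pinned)) // false
```

An installer following this pattern would refuse to unpack an archive whose digest does not match.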

Environment variables

| Variable | Description |
|----------|-------------|
| GNPARSER_SKIP_DOWNLOAD=1 | Skip binary download during install (useful if providing your own binary) |
| GNPARSER_BINARY_PATH | Use a custom path to the gnparser binary |
| GNPARSER_USE_SYSTEM=1 | Use system-installed gnparser from PATH instead of downloading |
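
One way to picture how these variables could select a binary is the sketch below. The function `binarySource` and its return shape are hypothetical. This README does not say which variable wins when several are set, so the order used here (custom path first, then system binary) is an assumption. The example path is made up.

```javascript
// Hypothetical sketch, not the package's code. The precedence is an assumption.
function binarySource(env) {
  if (env.GNPARSER_BINARY_PATH) {
    // A custom path to the binary was provided.
    return { kind: 'custom', path: env.GNPARSER_BINARY_PATH }
  }
  if (env.GNPARSER_USE_SYSTEM === '1') {
    // Rely on whatever gnparser is found on PATH.
    return { kind: 'system', path: 'gnparser' }
  }
  // Fall back to the binary downloaded at install time.
  return { kind: 'downloaded' }
}

console.log(binarySource({ GNPARSER_BINARY_PATH: '/opt/tools/gnparser' }).kind) // custom
console.log(binarySource({ GNPARSER_USE_SYSTEM: '1' }).kind)                    // system
console.log(binarySource({}).kind)                                              // downloaded
```

GNPARSER_SKIP_DOWNLOAD is left out of the sketch because it only affects the install step.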

Cache location

Downloaded archives are cached in a platform-specific directory, which avoids repeated downloads (for example in CI):

macOS: ~/Library/Caches/gnparser/
Linux: ~/.cache/gnparser/ (or $XDG_CACHE_HOME/gnparser/)
Windows: %LOCALAPPDATA%\gnparser\cache\
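
The locations above can be expressed as a small lookup. This is a sketch only: `cacheDir` is a hypothetical helper, not an export of this package, and treating every platform other than macOS and Windows like Linux is an assumption. The platform, environment, and home directory are passed in so the function can be tried on any machine.

```javascript
const path = require('path')

// Hypothetical helper mirroring the cache locations listed above.
function cacheDir(platform, env, home) {
  if (platform === 'darwin') {
    return path.posix.join(home, 'Library', 'Caches', 'gnparser')
  }
  if (platform === 'win32') {
    // Assumes LOCALAPPDATA is set, as it normally is on Windows.
    return path.win32.join(env.LOCALAPPDATA, 'gnparser', 'cache')
  }
  // Assumption: anything else is treated like Linux.
  const base = env.XDG_CACHE_HOME || path.posix.join(home, '.cache')
  return path.posix.join(base, 'gnparser')
}

console.log(cacheDir('darwin', {}, '/Users/ada'))
console.log(cacheDir('linux', { XDG_CACHE_HOME: '/tmp/xdg' }, '/home/ada'))
```

In real use the arguments would come from `process.platform`, `process.env`, and `os.homedir()`.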

Supported platforms

macOS (Intel x64 and Apple Silicon ARM64)
Linux (x64 and ARM64)
Windows (x64 and ARM64)

Usage

The package supports both ESM and CommonJS imports:

// ESM
import { parse, parseSync } from 'gnparser'

// CommonJS
const { parse, parseSync } = require('gnparser')

TypeScript

Full TypeScript support with exported types:

import { parse, parseSync, type ParseOptions, type ParsedName } from 'gnparser'

const result: ParsedName = await parse('Homo sapiens')

parse(names, options?): Promise

Parses scientific names asynchronously.

names may be an individual scientific name string or an array of name strings
options is an optional object with these keys:
- details: boolean - include additional details (e.g. the individual parsed words of the name). Default is false.
- cultivars: boolean - parse using the cultivar nomenclatural code. Default is false.
- diaereses: boolean - preserve diaereses (e.g. Leptochloöpsis virgata) in normalized and canonical names. Default is false: diaereses will be transliterated to ASCII.
- removeDiaereses: boolean - transliterate diaereses to their ASCII counterparts without changing the spelling, e.g. Leptochloöpsis virgata → Leptochloopsis virgata

Returns a Promise that resolves to:

A JavaScript object for a single name input
An array of JavaScript objects for an array input

The recommended options for parsing botanical names are { cultivars: true, diaereses: true }.
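
If botanical names are parsed in many places, the recommended options can be kept in one spot. The following is a minimal sketch; `BOTANICAL_DEFAULTS` and `botanicalOptions` are hypothetical names, not exports of this package.

```javascript
// Recommended options for botanical names, as stated above.
const BOTANICAL_DEFAULTS = { cultivars: true, diaereses: true }

// Hypothetical helper: caller-supplied keys override the defaults.
function botanicalOptions(overrides = {}) {
  return { ...BOTANICAL_DEFAULTS, ...overrides }
}

console.log(botanicalOptions())
console.log(botanicalOptions({ details: true }))
```

The returned object can be passed as the second argument to parse() or parseSync().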

Example

import { parse } from 'gnparser'

// Parse a single name
const parsed = await parse('Pardosa moesta Banks, 1892')
console.log(parsed.canonical.simple) // "Pardosa moesta"

// Parse multiple names
const names = ['Pardosa moesta Banks, 1892', 'Parus major L.', "Anthurium 'Ace of Spades'"]
const results = await parse(names)

// With options
const result = await parse("Sarracenia flava 'Maxima'", { details: true, cultivars: true })

parseSync(names, options?)

Synchronous version of parse(). It takes the same arguments and returns the same result shape directly (not wrapped in a Promise), blocking until parsing is complete.

import { parseSync } from 'gnparser'

const parsed = parseSync('Homo sapiens Linnaeus, 1758')
console.log(parsed.canonical.simple) // "Homo sapiens"

Output structure

Each parsed name is returned as an object with these fields:

interface ParsedName {
  id: string                    // Deterministic UUID v5
  verbatim: string              // Original input
  parsed: boolean               // Whether parsing succeeded
  quality: number               // 1-4, lower is better
  normalized?: string           // Normalized form
  canonical?: {
    stemmed: string             // Stemmed form for matching
    simple: string              // Simple canonical name
    full: string                // Full canonical with ranks
  }
  cardinality?: number          // 1=uninomial, 2=binomial, etc.
  rank?: string                 // Taxonomic rank
  authorship?: {
    verbatim: string
    normalized: string
    year?: string
    authors?: string[]
  }
  details?: object              // Detailed parsing info (with details option)
  words?: Word[]                // Parsed words (with details option)
  parserVersion: string
}
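
Because several fields are optional and the result is either one object or an array, consuming code usually needs a little defensive handling. The sketch below shows one way to do that. `asArray` and `displayName` are hypothetical helpers, and the two sample objects are hand-written stand-ins trimmed to the fields used here, not actual parser output.

```javascript
// Hypothetical helper: treat single-name and array results uniformly.
function asArray(result) {
  return Array.isArray(result) ? result : [result]
}

// Hypothetical helper: canonical is optional, so fall back to the input.
function displayName(parsedName) {
  if (!parsedName.parsed || !parsedName.canonical) return parsedName.verbatim
  return parsedName.canonical.simple
}

// Hand-written stand-ins for parser output (illustration only).
const ok = {
  verbatim: 'Homo sapiens Linnaeus, 1758',
  parsed: true,
  canonical: { simple: 'Homo sapiens' }
}
const failed = { verbatim: '???', parsed: false }

console.log(asArray(ok).map(displayName))      // [ 'Homo sapiens' ]
console.log(asArray([ok, failed]).map(displayName)) // [ 'Homo sapiens', '???' ]
```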

Migration from v1.x

Version 2.0 replaces the C bindings with the gnparser binary, run in streaming mode. Key changes:

Async by default: parse() now returns a Promise. Use await or .then().
Sync alternative: Use parseSync() for synchronous parsing (same behavior as v1.x parse()).
No compilation required: No more node-gyp or native dependencies.
Broader platform support: Now supports Windows and more architectures.
ESM and CommonJS: Now supports both module systems with full TypeScript types.

// v1.x (synchronous)
const result = gnparser.parse('Homo sapiens')

// v2.x (async)
const result = await gnparser.parse('Homo sapiens')

// v2.x (sync alternative)
const result = gnparser.parseSync('Homo sapiens')

Development

Building

npm run build        # Build ESM and CJS
npm run clean        # Remove dist/
npm test             # Build and run tests

Updating gnparser version

When updating to a new gnparser version:

Update GNPARSER_VERSION in src/install.ts
Run npm run update-hashes to calculate and update SHA-256 hashes
Run npm test to verify everything works

The preversion script automatically runs hash updates and tests before version bumps.

Versioning

This module's major and minor version numbers match those of the main GNParser Go project; the patch version may differ.
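
The versioning rule can be checked mechanically. The following is a minimal sketch; `sameMajorMinor` is a hypothetical helper and the version strings are illustrative only, not a list of real releases.

```javascript
// Hypothetical helper: compares only the major and minor components,
// ignoring the patch number and an optional leading "v".
function sameMajorMinor(a, b) {
  const [aMajor, aMinor] = a.replace(/^v/, '').split('.')
  const [bMajor, bMinor] = b.replace(/^v/, '').split('.')
  return aMajor === bMajor && aMinor === bMinor
}

console.log(sameMajorMinor('2.0.0', 'v2.0.3')) // true
console.log(sameMajorMinor('2.0.0', '2.1.0'))  // false
```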

License

MIT