@edtf-ts/natural
v0.4.2
Published
Natural language to EDTF parser with locale support
Maintainers
Readme
@edtf-ts/natural
Natural language to EDTF parser with locale support and ambiguity handling.
GitHub Repository | Documentation | Interactive Playground
Installation
npm install @edtf-ts/natural @edtf-ts/core
# or
pnpm add @edtf-ts/natural @edtf-ts/core
# or
yarn add @edtf-ts/natural @edtf-ts/coreFeatures
- Natural Language Parsing - Parse human-readable date expressions
- Locale Support - US and EU date format handling
- Confidence Scoring - Automatic confidence assessment for each interpretation
- Ambiguity Handling - Returns multiple interpretations for ambiguous inputs
- Zero Config - Works out of the box with sensible defaults
- Powered by Nearley - Uses Earley parser for handling ambiguous grammars
Quick Start
import { parseNatural } from '@edtf-ts/natural';
// Parse a simple date
const results = parseNatural('January 12, 1940');
console.log(results[0].edtf); // '1940-01-12'
// Parse uncertain dates
const uncertain = parseNatural('possibly 1984');
console.log(uncertain[0].edtf); // '1984?'
// Parse approximate dates
const circa = parseNatural('circa 1950');
console.log(circa[0].edtf); // '1950~'
// Parse intervals
const interval = parseNatural('from 1964 to 2008');
console.log(interval[0].edtf); // '1964/2008'
console.log(interval[0].type); // 'interval'
// Parse seasons
const season = parseNatural('Spring 2001');
console.log(season[0].edtf); // '2001-21'
// Parse decades
const decade = parseNatural('the 1960s');
console.log(decade[0].edtf); // '196X'Handling Ambiguity
The parser returns multiple interpretations for ambiguous input with confidence scores:
// Ambiguous numeric date (could be MM/DD or DD/MM)
const ambiguous = parseNatural('02/03/2020', { locale: 'en-US' });
console.log(ambiguous);
// [
// {
// edtf: '2020-02-03',
// confidence: 0.6,
// interpretation: 'February 3, 2020 (US format: MM/DD/YYYY)',
// type: 'date',
// ambiguous: true
// },
// {
// edtf: '2020-03-02',
// confidence: 0.4,
// interpretation: 'March 2, 2020 (EU format: DD/MM/YYYY)',
// type: 'date',
// ambiguous: true
// }
// ]
// Get only the best result
const bestOnly = parseNatural('02/03/2020', { returnAllResults: false });
console.log(bestOnly.length); // 1Supported Formats
Complete Dates
parseNatural('January 12, 1940'); // US long format
parseNatural('12 January 1940'); // EU long format
parseNatural('Jan 12, 1940'); // US abbreviated
parseNatural('12 Jan 1940'); // EU abbreviatedPartial Dates
parseNatural('January 2020'); // Month and year
parseNatural('2020'); // Year onlyUncertain Dates (?)
parseNatural('possibly 1984');
parseNatural('maybe 1984');
parseNatural('perhaps 1984');
parseNatural('probably 1984');
parseNatural('1984?');Approximate Dates (~)
parseNatural('circa 1950');
parseNatural('c. 1950');
parseNatural('about 1950');
parseNatural('around 1950');
parseNatural('approximately 1950');
parseNatural('~1950');Intervals
parseNatural('from 1964 to 2008');
parseNatural('1964 to 2008');
parseNatural('between 1964 and 2008');
parseNatural('before 1930'); // ../1930
parseNatural('after 1930'); // 1930/..
parseNatural('since 1930'); // 1930/..Seasons
parseNatural('Spring 2001'); // 2001-21
parseNatural('Summer 2001'); // 2001-22
parseNatural('Fall 2001'); // 2001-23
parseNatural('Winter 2001'); // 2001-24Decades and Centuries
parseNatural('the 1960s'); // 196X
parseNatural('1960s'); // 196X
parseNatural('the 1800s'); // 18XX
parseNatural('19th century'); // 18XXSets and Lists
parseNatural('1667 or 1668'); // [1667,1668] (one of)
parseNatural('1667 and 1668'); // {1667,1668} (all of)
parseNatural('1984 or earlier'); // [..1984]API
parseNatural(input, options?)
Parse natural language date input into EDTF format.
Parameters:
input(string): Natural language date stringoptions(object, optional):locale(string): Locale for date interpretation (default: 'en-US')returnAllResults(boolean): Return all possible interpretations (default: true)minConfidence(number): Minimum confidence threshold 0-1 (default: 0)
Returns: Array of ParseResult objects, sorted by confidence (highest first)
ParseResult:
interface ParseResult {
edtf: string; // The EDTF string representation
type: 'date' | 'interval' | 'season' | 'set' | 'list';
confidence: number; // Confidence score (0-1)
interpretation: string; // Human-readable interpretation
parsed?: EDTFBase; // Parsed EDTF object (from @edtf-ts/core)
ambiguous?: boolean; // Whether this result is ambiguous
}ParseError
Error class thrown when parsing fails.
try {
parseNatural('invalid input');
} catch (error) {
if (error instanceof ParseError) {
console.log(error.message);
console.log(error.input);
console.log(error.position);
}
}Options
Locale Support
The parser supports different date format preferences based on locale:
// US locale (MM/DD/YYYY preference)
parseNatural('02/03/2020', { locale: 'en-US' });
// Returns US format (Feb 3) with higher confidence
// EU locale (DD/MM/YYYY preference)
parseNatural('02/03/2020', { locale: 'en-GB' });
// Returns EU format (Mar 2) with higher confidenceConfidence Threshold
Filter results by minimum confidence:
const results = parseNatural('ambiguous date', {
minConfidence: 0.7,
});
// Only returns interpretations with confidence >= 0.7How It Works
The parser uses Nearley, an Earley parser generator, which is specifically designed to handle ambiguous grammars. Unlike PEG parsers that return the first match, Nearley can return all valid parse trees, which is essential for natural language date parsing where ambiguity is common.
Why Nearley?
- Handles Ambiguity: Returns all valid interpretations for ambiguous input
- Proven in EDTF Space: Used successfully by edtf.js
- Rich Grammar Support: Supports complex, overlapping patterns gracefully
Confidence Scoring
The parser assigns confidence scores based on:
- Format specificity (ISO format = 1.0, numeric MM/DD = 0.6)
- Locale preferences (US locale prefers MM/DD over DD/MM)
- Ambiguity (unambiguous results get higher scores)
- Number of valid interpretations (single valid interpretation = 0.9)
Round-Trip Conversion
The natural language parser supports bidirectional conversion - you can parse EDTF-formatted output back into EDTF. This is particularly useful when displaying formatted dates to users and allowing them to type natural language that gets parsed back.
import { parse } from '@edtf-ts/core';
import { formatHuman } from '@edtf-ts/core';
import { parseNatural } from '@edtf-ts/natural';
// Parse EDTF -> Format to natural language -> Parse back to EDTF
const edtf = '1985/..';
const result = parse(edtf);
if (result.success) {
const formatted = formatHuman(result.value);
console.log(formatted); // "1985 to open end"
// Parse the formatted text back to EDTF
const roundTrip = parseNatural(formatted);
console.log(roundTrip[0].edtf); // "1985/.."
}Supported Round-Trip Patterns
Open and Unknown Endpoints:
// Open end
parseNatural('1985 to open end'); // '1985/..'
parseNatural('1985 onward'); // '1985/..'
parseNatural('1985 onwards'); // '1985/..'
// Open start
parseNatural('open start to 1985'); // '../1985'
// Unknown endpoints
parseNatural('1985 to unknown'); // '1985/'
parseNatural('unknown to 1985'); // '/1985'Qualified Dates:
parseNatural('1984 (uncertain)'); // '1984?'
parseNatural('June 2004 (approximate)'); // '2004-06~'
parseNatural('June 11, 2004 (uncertain/approximate)'); // '2004-06-11%'Intervals:
parseNatural('February 1, 2004 to February 2005'); // '2004-02-01/2005-02'
parseNatural('1964 to 2008'); // '1964/2008'This round-trip capability makes @edtf-ts/natural ideal for building user interfaces where users need to view and edit EDTF dates in natural language.
Development Status
This package is in active development. The grammar currently supports:
- Supported: Complete dates (various formats)
- Supported: Partial dates (month/year, year only)
- Supported: Uncertain and approximate qualifiers
- Supported: Basic intervals
- Supported: Seasons
- Supported: Decades and centuries (partial)
- Supported: Sets and lists (basic)
- Planned: All 200+ patterns from the specification (iterative refinement)
See the parser specification for the complete list of planned formats.
Contributing
The grammar is located in src/grammar.ne. To modify it:
- Edit
src/grammar.ne - Run
pnpm build:grammarto compile - Run
pnpm testto verify
Contributions to expand grammar coverage are welcome!
License
MIT Copyright 2025 Bob Pritchett
Related Packages
- @edtf-ts/core - Core EDTF parser, comparison, and types
