gedcom555-token
v555.0.1
Tokenizer for Gedcom 5.5.5
Gedcom 5.5.5 Token
A tokenizer for 'Gedcom 5.5.5'.
Install
npm i gedcom555-token
Usage
import {tokenizeFromString} from "gedcom555-token";
const tokenized = tokenizeFromString(`0 HEAD
1 GEDC
2 VERS 5.5.5
2 FORM LINEAGE-LINKED
3 VERS 5.5.5
1 CHAR UTF-8
1 SOUR gedcom.org
0 @U@ SUBM
1 NAME gedcom.org
0 TRLR`);
/*
[
  {
    level: 0,
    tag: `HEAD`,
  },
  {
    level: 1,
    tag: `GEDC`,
  },
  {
    level: 2,
    tag: `VERS`,
    lineItem: `5.5.5`,
  },
  {
    level: 2,
    tag: `FORM`,
    lineItem: `LINEAGE-LINKED`,
  },
  {
    level: 3,
    tag: `VERS`,
    lineItem: `5.5.5`,
  },
  {
    level: 1,
    tag: `CHAR`,
    lineItem: `UTF-8`,
  },
  {
    level: 1,
    tag: `SOUR`,
    lineItem: `gedcom.org`,
  },
  {
    level: 0,
    tag: `SUBM`,
    xrefId: `@U@`,
  },
  {
    level: 1,
    tag: `NAME`,
    lineItem: `gedcom.org`,
  },
  {
    level: 0,
    tag: `TRLR`,
  },
]
*/
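The tokenizer returns a flat list, so turning it into a record hierarchy is up to the consumer. The following is a minimal sketch of one way to do that, assuming tokens have the shape shown in the output above. `Token`, `TokenNode`, and `nestTokens` are hypothetical names for illustration and are not exported by the package.

```typescript
// Token shape inferred from the example output; only `level` and `tag`
// appear on every token.
interface Token {
  level: number;
  tag: string;
  xrefId?: string;
  lineItem?: string;
}

// Hypothetical tree node: a token plus the tokens nested beneath it.
interface TokenNode extends Token {
  children: TokenNode[];
}

// Nest a flat token list by level, keeping a stack of open ancestors.
// This sketch does not verify that levels increase by exactly one.
function nestTokens(tokens: Token[]): TokenNode[] {
  const roots: TokenNode[] = [];
  const stack: TokenNode[] = [];
  for (const token of tokens) {
    const node: TokenNode = { ...token, children: [] };
    // Close every open ancestor at the same or a deeper level.
    while (stack.length > 0 && stack[stack.length - 1].level >= node.level) {
      stack.pop();
    }
    if (stack.length === 0) {
      roots.push(node);
    } else {
      stack[stack.length - 1].children.push(node);
    }
    stack.push(node);
  }
  return roots;
}

const tree = nestTokens([
  { level: 0, tag: "HEAD" },
  { level: 1, tag: "GEDC" },
  { level: 2, tag: "VERS", lineItem: "5.5.5" },
  { level: 1, tag: "CHAR", lineItem: "UTF-8" },
  { level: 0, tag: "TRLR" },
]);
console.log(tree.length); // 2 (HEAD and TRLR)
```

A stack keeps the pass linear in the number of tokens, and each node is attached to the nearest preceding token with a lower level.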
Line by line:
When required, the tokenizer can be called for a single line.
import {tokenize} from "gedcom555-token/dist/token";
const tokenized = tokenize(`0 head`);
/*
{
  level: 0,
  tag: `HEAD`
}
*/
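When tokenizing one line at a time, the caller has to split the input first. Below is a minimal sketch of such a split, assuming the terminators of interest are CR, LF, and CRLF and that mixed terminators should be rejected. `splitGedcomLines` is a hypothetical helper, not part of the package, and the package's own check may behave differently.

```typescript
// Hypothetical helper: split text into lines and reject mixed terminators.
function splitGedcomLines(text: string): string[] {
  // Collect every terminator in order; CRLF is tried before a lone CR or LF.
  const terminators: string[] = text.match(/\r\n|\r|\n/g) ?? [];
  if (new Set(terminators).size > 1) {
    throw new Error("Inconsistent line terminators");
  }
  // Drop empty pieces, such as the one left by a final terminator.
  return text.split(/\r\n|\r|\n/).filter((line) => line.length > 0);
}

const lines = splitGedcomLines("0 HEAD\r\n0 TRLR\r\n");
console.log(lines); // [ '0 HEAD', '0 TRLR' ]
```

Each resulting line can then be passed to a single-line tokenizer.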
Notes
- Does not check encoding; the input string is assumed to be Unicode.
- Checks that line terminators are consistent.
- Checks tags against a list of known tags. Todo (low priority): allow the tag list to be extended.
- Checks line items for single "@" signs.
- Does not check other grammar rules; these are left for the parser to implement.
- Because Gedcom 5.5.5 tags are case insensitive, tokenize converts them to upper case.
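To make the notes above concrete, here is a rough sketch of a single-line tokenizer that upper-cases tags and rejects a lone "@" in the line item. It is not the package's implementation: `SketchToken`, `LINE_PATTERN`, and `sketchTokenize` are hypothetical names, the line pattern is a simplification, the known-tag check is omitted, and allowing a line item that is entirely a pointer such as `@I1@` is an assumption.

```typescript
interface SketchToken {
  level: number;
  tag: string;
  xrefId?: string;
  lineItem?: string;
}

// Simplified line shape: level, optional xref id, tag, optional line item.
const LINE_PATTERN = /^(\d+) (?:(@[^@]+@) )?([A-Za-z0-9_]+)(?: (.*))?$/;

function sketchTokenize(line: string): SketchToken {
  const match = LINE_PATTERN.exec(line);
  if (match === null) {
    throw new Error(`Malformed line: ${line}`);
  }
  const [, level, xrefId, tag, lineItem] = match;
  // Tags are normalized to upper case.
  const token: SketchToken = { level: Number(level), tag: tag.toUpperCase() };
  if (xrefId !== undefined) {
    token.xrefId = xrefId;
  }
  if (lineItem !== undefined) {
    // Assumption: a value that is entirely a pointer (e.g. @I1@) is allowed.
    const isPointer = /^@[^@]+@$/.test(lineItem);
    // After removing doubled "@@", any "@" left over is a lone at sign.
    const hasLoneAt = lineItem.replace(/@@/g, "").indexOf("@") !== -1;
    if (!isPointer && hasLoneAt) {
      throw new Error(`Single "@" in line item: ${line}`);
    }
    token.lineItem = lineItem;
  }
  return token;
}

console.log(sketchTokenize("0 head")); // { level: 0, tag: 'HEAD' }
```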
License
MIT