bible-passage-reference-parser-languages
v0.1.0
Language data files for bible-passage-reference-parser.
Language Data for the Bible Passage Reference Parser
The code in this repository is mostly AI-generated, though the source data is not.
This repository contains the YAML-to-JavaScript build pipeline for Bible Passage Reference Parser language data.
Structure
- `data/_defaults.yaml`: shared defaults for variables/options.
- `data/*.yaml`: per-language data (ISO 639-3 codes like `eng`, `zho`).
- `translation_systems/*.yaml`: versification systems and translation aliases.
- `src/`: TypeScript source for building language output.
- `bin/compile.sh`: bundles TypeScript CLI tools in `src/` to `bin/*.js`.
- `bin/build_spec.js`: builds localized-book Jasmine specs from `book_names/all/`.
- `bin/build_all_langs.js`: builds all languages in parallel and runs specs for each language.
- `lang/`: generated output files (optional; regenerate as needed).
- `book_names/all/`: generated book-name lists used to build tests.
- `book_names/preferred/`: preferred display names (default + optional translation overrides).
- `test/`: generated localized-book specs.
Quick start
npm install
bin/compile.sh
node bin/build_lang.js eng
Build a cross-language parser:
node bin/build_lang.js --cross eng spa --out eng_spa
node bin/build_spec.js eng_spa
Cross-language options:
- `--cross`: enable cross-language build mode.
- `--out <code>`: output language code (must not be 3 characters).
- `--merge-mode append|smart`: book merge mode (default `append`).
Load a language module programmatically:
import load_language_code from "bible-passage-reference-parser-languages";
const lang = await load_language_code("eng");
// Reserved Windows code example:
const con = await load_language_code("con");
Generate specs:
node bin/build_spec.js # all languages
node bin/build_spec.js eng # single language
Build all languages:
node bin/build_all_langs.js # parallel build + specs + tests
node bin/build_all_langs.js -j 4 # set worker count
node bin/build_all_langs.js --test-only # skip lang rebuild, build specs + run tests
Output format
The build outputs three JS classes:
- `bcv_regexps`
- `bcv_translations`
- `bcv_grammar_options_default`
The output matches the expected format in Bible-Passage-Reference-Parser.
Creating a new language YAML
- Pick an existing language as a starting point (for example `data/eng.yaml`) and copy it to a new ISO 639-3 code, like `data/isl.yaml`.
- Update the language file contents:
  - `variables`: text tokens and patterns used by the parser (titles, next, ff, etc.).
  - `options`: language-specific parsing options; any missing options fall back to `data/_defaults.yaml`.
  - `books`: list of Bible book names/abbreviations and regex-related data for the language.
  - `ordinals` and `translations` are optional; include them if needed for the language.
- Build and verify:
bin/compile.sh
node bin/build_lang.js isl
Notes for new languages:
- The first language file passed to `build_lang` sets the primary `variables` and `options`; additional languages are only used to merge `books`.
- `data/_defaults.yaml` provides required defaults for `variables` and `options`, so you only need to override what differs.
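The fallback and merge rules in the notes above can be sketched roughly as follows. This is an illustration only: `resolveLanguageData` and the object shapes are hypothetical (they stand in for already-parsed YAML), and treating the `append` merge as plain concatenation is an assumption, not a description of the actual build code.

```javascript
// Sketch only: function name and data shapes are hypothetical.
function resolveLanguageData(defaults, languages) {
  const [primary, ...rest] = languages;
  return {
    // Defaults come first, so the primary language only overrides what differs.
    variables: { ...defaults.variables, ...primary.variables },
    options: { ...defaults.options, ...primary.options },
    // Additional languages contribute books only (assumed: simple concatenation).
    books: [primary, ...rest].flatMap((lang) => lang.books ?? []),
  };
}

const sampleDefaults = {
  variables: { and: ["and"], to: ["-"] },
  options: { normalize: "combining_characters", trailing_dots_in_variables: "optional" },
};
const sampleEng = {
  options: { normalize: "none" },
  books: [{ osis: "Gen", texts: ["Genesis"] }],
};
const sampleSpa = {
  variables: { and: ["y"] },
  options: { trailing_dots_in_variables: "as_is" },
  books: [{ osis: "Gen", texts: ["Génesis"] }],
};

const merged = resolveLanguageData(sampleDefaults, [sampleEng, sampleSpa]);
console.log(merged.options.normalize); // override from the primary language
console.log(merged.variables.and); // the second language's variables are ignored
console.log(merged.books.length); // books from both languages
```

In this sketch the second language's `variables` and `options` are dropped on purpose, mirroring the rule that only the first file sets them.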
YAML structure (data/*.yaml)
variables
Used to build the core grammar and separator patterns. Values can be:
- simple strings: `- "cap."`
- objects for fine control:
  - `text`: string value
  - `regexp`: raw regex (no escaping)
  - `regexp_after`: appended raw regex after the text/regexp
  - `normalize: none` to skip combining-character normalization for that item
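As a rough illustration of how the two value forms might become regex fragments, here is a minimal sketch. The helper names are hypothetical, and escaping `text` values is an assumption inferred from `regexp` being described as raw; the real build may also handle trailing dots and normalization, which are left out here.

```javascript
// Hypothetical helper: escape regex metacharacters in literal text.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch: convert one `variables` entry into a regex source string.
function variableToPattern(entry) {
  if (typeof entry === "string") return escapeLiteral(entry);
  // `regexp` is used as-is; `text` is escaped (assumption).
  const base = entry.regexp !== undefined ? entry.regexp : escapeLiteral(entry.text ?? "");
  // `regexp_after` is appended raw after the text/regexp.
  return base + (entry.regexp_after ?? "");
}

const capPattern = variableToPattern("cap.");
const andPattern = variableToPattern({ text: "a", regexp_after: "(?!\\p{L})" });
const andRegex = new RegExp(andPattern, "u");
console.log(capPattern);
console.log(andRegex.test("a "), andRegex.test("ab"));
```

The `(?!\p{L})` lookahead keeps a one-letter token such as "a" from matching at the start of a longer word.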
Example:
variables:
  and:
    - text: a
      regexp_after: (?!\p{L})
    - vedi
  to:
    - "-"
    - "a"
options
Common options (see data/_defaults.yaml for full list):
- `normalize`: `combining_characters` (default) or `none`
- `trailing_dots_in_variables`: `optional` or `as_is`
- `expand_characters`: array of `{ character, expand: [ ... ] }` to allow alternates anywhere that character appears in book names or variables. Example:
  expand_characters:
    - character: "'"
      expand: ["'", "’"]
- `replace_characters_with`: `{ regexp, replacement }` (default converts spaces to `\s*`)
- `before_book_allowed_characters`, `after_book_allowed_characters`: regex character classes used to enforce valid boundaries before/after a matched book name.
- `before_every_book`, `after_every_book`: regex patterns inserted immediately before/after every book match. Use these to add language-specific required prefixes/suffixes around all books (rare). These are applied in addition to the before/after allowed character boundaries.
- `join_before`, `join_after`: default join strings used when expanding `before`/`after` book patterns (for example the default space between an ordinal and the book name). Override to control whether the joiner is a space, empty string, punctuation, etc.
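To make `expand_characters` and the default space handling more concrete, the sketch below turns a book name into a regex source string. The function names and the sample name are illustrative assumptions, and the space rule is hard-coded to `\s*` rather than read from `replace_characters_with`.

```javascript
// Hypothetical helper: escape regex metacharacters in literal text.
function escapeRegexText(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch: apply expand_characters rules and convert spaces to \s*.
function bookNameToPattern(name, expandCharacters) {
  let out = "";
  for (const ch of name) {
    const rule = expandCharacters.find((r) => r.character === ch);
    if (rule) {
      // Allow any of the listed alternates where this character appears.
      out += "(?:" + rule.expand.map(escapeRegexText).join("|") + ")";
    } else if (ch === " ") {
      out += "\\s*";
    } else {
      out += escapeRegexText(ch);
    }
  }
  return out;
}

const apostropheRule = [{ character: "'", expand: ["'", "’"] }];
const bookPattern = bookNameToPattern("Cantico de' Cantici", apostropheRule);
const bookRegex = new RegExp("^" + bookPattern + "$");
console.log(bookPattern);
console.log(bookRegex.test("Cantico de’ Cantici"));
```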
books
Each entry declares OSIS code(s) and the localized texts. Forms:
- `osis: "Gen"` or `osis: ["Jonah","Job"]`
- `osis` objects with `before`, `after`, `join` for numbered books:
  - osis:
      - osis: 1Sam
        before: *first
      - osis: 2Sam
        before: *second
    texts:
      - Samuel
- `texts` can be strings or objects with `text` and optional `normalize: none`, which prevents diacritics and spacing from changing.
ordinals
Defines ordinal suffixes and optional Psalm handling:
ordinals:
  - after: ["st"]
    numbers: [1, 21, 31]
  - after: ["nd"]
    numbers: [2, 22, 32]
  - between:
      regexp: \s*
    texts: ["Psalm"]
book_names/all
node bin/build_lang.js <lang> writes book_names/all/<lang>.yaml, which is a normalized list of book texts used by bin/build_spec.js.
The output collapses runs of whitespace to single spaces and normalizes names to NFC (so combining-character variants are unified), but it does not add extra variants.
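The normalization described here can be approximated in a couple of lines. This is a sketch, not the build script's code: `normalizeBookName` is a hypothetical name, and trimming leading and trailing whitespace is an added assumption.

```javascript
// Sketch: NFC-normalize and collapse whitespace runs to single spaces.
function normalizeBookName(name) {
  return name
    .normalize("NFC") // unify combining-character variants
    .replace(/\s+/g, " ") // collapse whitespace runs
    .trim(); // assumption: edges are trimmed too
}

const decomposedName = "Ge\u0301nesis"; // "e" followed by a combining acute accent
const spacedName = "1   Samuel";
console.log(normalizeBookName(decomposedName) === "G\u00e9nesis");
console.log(normalizeBookName(spacedName));
```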
book_names/preferred
book_names/preferred/<lang>.yaml documents preferred book names for display and UI. Each file has:
- `default`: preferred names by OSIS code.
- `translations` (optional): translation-specific overrides (currently only in `eng.yaml`).
Example (book_names/preferred/eng.yaml):
default:
  Gen:
    long: Genesis
    short: Gen
    shorter: Ge
  Ps:
    long: Psalms
    long_single: Psalm
    short: Ps
    shorter: Ps
translations:
  niv:
    Ps:
      short_plural: Pss
    1Sam:
      shorter: 1Sa
Keys used in preferred names can include:
- `long`: full name
- `long_single`: singular form (e.g., Psalm vs Psalms)
- `short`: common short form
- `shorter`: shortest form
- `short_plural`: translation-specific plural short form
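A consumer of these files might look up a display name along the following lines. `preferredName` is a hypothetical helper, and the rule that a translation override wins and otherwise falls back to `default` is an assumption based on the word "overrides"; the sample data is copied from the `eng.yaml` excerpt.

```javascript
// Sketch: look up a preferred name, letting a translation override win.
function preferredName(data, osis, key, translation) {
  const override = translation
    ? data.translations?.[translation]?.[osis]?.[key]
    : undefined;
  // Fall back to the default entry when no override exists (assumption).
  return override ?? data.default?.[osis]?.[key];
}

const preferredEng = {
  default: {
    Gen: { long: "Genesis", short: "Gen", shorter: "Ge" },
    Ps: { long: "Psalms", long_single: "Psalm", short: "Ps", shorter: "Ps" },
  },
  translations: {
    niv: { Ps: { short_plural: "Pss" }, "1Sam": { shorter: "1Sa" } },
  },
};

console.log(preferredName(preferredEng, "Gen", "long"));
console.log(preferredName(preferredEng, "Ps", "short_plural", "niv"));
console.log(preferredName(preferredEng, "Ps", "long", "niv"));
```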
tests
Optional per-language Jasmine tests can be added to data/<lang>.yaml (see pol.yaml for an example):
tests:
  - text: "Rdz 1:1"
    osis: "Gen.1.1"
  - it: "should handle odd spacing"
    text: "Rdz 1:1"
    osis: "Gen.1.1"
`bin/build_spec.js` will emit these into `test/<lang>.spec.js`. Entries with `it` get their own `it(<label>)` block; entries without `it` are grouped under `it("should handle custom tests")`.
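The grouping rule can be pictured with a small sketch. `groupCustomTests` and its return shape are hypothetical, and placing the shared group first is an arbitrary choice here; the actual ordering in the generated spec is not stated.

```javascript
// Sketch: split custom tests into labelled blocks plus one shared block.
function groupCustomTests(tests) {
  const blocks = [];
  const unlabelled = [];
  for (const t of tests) {
    const testCase = { text: t.text, osis: t.osis };
    if (t.it) {
      blocks.push({ label: t.it, cases: [testCase] }); // own it(<label>) block
    } else {
      unlabelled.push(testCase);
    }
  }
  if (unlabelled.length > 0) {
    // Ordering is an assumption; the shared group is put first here.
    blocks.unshift({ label: "should handle custom tests", cases: unlabelled });
  }
  return blocks;
}

const customTestBlocks = groupCustomTests([
  { text: "Rdz 1:1", osis: "Gen.1.1" },
  { it: "should handle odd spacing", text: "Rdz 1:1", osis: "Gen.1.1" },
]);
console.log(customTestBlocks.map((b) => b.label));
```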
Notes
- `lang/` can be regenerated at any time; it is not a source of truth.
- Language codes are ISO 639-3 (e.g., `eng`, `zho`).
- Windows-reserved 3-letter basenames (`con`, `prn`, `aux`, `nul`) are stored on disk with a trailing underscore (for example, logical code `con` maps to `con_.yaml` / `con_.js` / `con_.spec.js`).
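The renaming rule for reserved basenames can be expressed as a small mapping function. `diskBasename` is a hypothetical name used for illustration; it is not claimed to be how the package's loader is implemented.

```javascript
// Reserved 3-letter basenames listed in the note above.
const WINDOWS_RESERVED = new Set(["con", "prn", "aux", "nul"]);

// Sketch: map a logical language code to its on-disk basename.
function diskBasename(code) {
  return WINDOWS_RESERVED.has(code.toLowerCase()) ? `${code}_` : code;
}

console.log(`${diskBasename("con")}.yaml`);
console.log(`${diskBasename("eng")}.yaml`);
```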
ISO 639-1 to ISO 639-3 mapping
Mappings used by tooling and for compatibility with the older two-letter abbreviations used in Bible Passage Reference Parser:
| ISO 639-1 | ISO 639-3 | English name |
| --- | --- | --- |
| ar | ara | Arabic |
| bg | bul | Bulgarian |
| cs | ces | Czech |
| cy | cym | Welsh |
| da | dan | Danish |
| de | deu | German |
| el | grc | Greek |
| en | eng | English |
| es | spa | Spanish |
| fa | fas | Persian |
| fi | fin | Finnish |
| fr | fra | French |
| he | heb | Hebrew |
| hi | hin | Hindi |
| hr | hrv | Croatian |
| ht | hat | Haitian Creole |
| hu | hun | Hungarian |
| id | ind | Indonesian |
| is | isl | Icelandic |
| it | ita | Italian |
| ja | jpn | Japanese |
| jv | jav | Javanese |
| kn | kan | Kannada |
| ko | kor | Korean |
| la | lat | Latin |
| lg | lug | Ganda |
| mk | mkd | Macedonian |
| mr | mar | Marathi |
| ne | nep | Nepali |
| nl | nld | Dutch |
| no | nor | Norwegian |
| ny | nya | Nyanja |
| or | ori | Odia |
| pa | pan | Punjabi |
| pl | pol | Polish |
| pt | por | Portuguese |
| ro | ron | Romanian |
| ru | rus | Russian |
| sk | slk | Slovak |
| sl | slv | Slovenian |
| so | som | Somali |
| sq | sqi | Albanian |
| sr | srp | Serbian |
| sv | swe | Swedish |
| sw | swa | Swahili |
| ta | tam | Tamil |
| te | tel | Telugu |
| th | tha | Thai |
| tl | tgl | Tagalog |
| tr | tur | Turkish |
| uk | ukr | Ukrainian |
| ur | urd | Urdu |
| vi | vie | Vietnamese |
| yo | yor | Yoruba |
| zh | zho | Chinese |
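For callers that still hold the older two-letter abbreviations, a lookup along these lines could translate them. The sketch includes only a handful of rows from the table; `toThreeLetterCode` is a hypothetical name, and both passing three-letter codes through unchanged and throwing on unknown codes are assumptions.

```javascript
// A few rows from the table above; a full version would list every row.
const TWO_TO_THREE = new Map([
  ["en", "eng"],
  ["es", "spa"],
  ["de", "deu"],
  ["el", "grc"],
  ["zh", "zho"],
]);

// Sketch: accept either code length and return the three-letter code.
function toThreeLetterCode(code) {
  if (code.length === 3) return code; // assumed to already be ISO 639-3
  const mapped = TWO_TO_THREE.get(code);
  if (mapped === undefined) {
    throw new Error(`Unknown language code: ${code}`);
  }
  return mapped;
}

console.log(toThreeLetterCode("en"));
console.log(toThreeLetterCode("zho"));
```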
Todo
- Move spec building process into the main build_lang script so that it's a one-step process.
- Add more translation-specific versification.
- Improve English translation representation.
Changelog
February 7, 2026. Add a loader function to handle file renaming for the "con" language so that this repo works on Windows.
January 31, 2026. Rework folder naming and include preferred book names from source data. Add an additional 2,100 languages from YouVersion data.
January 29, 2026. First release.
