twl-linker
v1.0.9
Biblical Semantic Linker - Uses the biblical context database to create semantic links between USFM Bible text and biblical articles with confidence scoring.
TWL Linker - Biblical Semantic Linker
A tool that automatically creates semantic links between USFM Bible text and biblical articles using a context database. This tool generates Translation Word List (TWL) files in TSV format with confidence scoring and disambiguation.
Available as:
- Global CLI tool: Install with npm install -g twl-linker, use with twl-linker <file>
- NPM Package: Install with npm install twl-linker for React.js/Node.js projects
- Local development: Clone and run with node twl-linker.js <file>
Features
- Semantic Matching: Intelligent text analysis to find biblical terms in USFM files
- Confidence Scoring: Each match includes a confidence score based on context analysis
- Disambiguation: Automatic disambiguation of ambiguous terms with fallback to manual review
- Batch Processing: Process multiple books at once
- Flexible Output: Customizable output file naming and locations
- Alignment Data Handling: Automatically removes USFM alignment markers to process clean text
- Built-in Database: No separate database setup required for CLI or package usage
Prerequisites
- Python 3.7+ (for building the context database)
- Node.js 12+ (for running the linker)
- Access to the en_tw repository with biblical term definitions (should be in ../en_tw/ relative to this project)
- Access to the en_ult repository with USFM Bible files (should be in ../en_ult/ relative to this project)
Setup
1. Install Python Dependencies
pip install -r requirements.txt
2. Build the Biblical Context Database
The context database must be built before running the linker from a local clone. This script reads biblical term definitions from the en_tw repository:
python build_biblical_context_database.py
Note: This script expects the en_tw repository to be located at ../en_tw/ relative to this project directory. It processes definitions from ../en_tw/bible/, which should contain the kt/, names/, and other/ subdirectories with biblical term definitions.
This will create biblical_context_database.json which contains processed biblical term definitions, variants, and disambiguation rules.
Usage
Installation Options
| Usage Type | Installation | Command | Use Case |
| --------------------- | --------------------------- | ------------------------------------------ | ------------------------- |
| Global CLI | npm install -g twl-linker | twl-linker input.usfm | Command-line processing |
| NPM Package | npm install twl-linker | import { generateTWL } from 'twl-linker' | React.js/Node.js apps |
| Local Development | git clone <repo> | node twl-linker.js input.usfm | Development/customization |
Option 1: Global Installation (Recommended for CLI usage)
npm install -g twl-linker
Option 2: Local Installation (For development or package usage)
git clone <this-repository>
cd twl-linker
npm install # if you add dependencies later
Option 3: NPM Package (For React.js/Node.js projects)
npm install twl-linker
Command Line Interface (CLI)
Global CLI Usage (after npm install -g twl-linker)
Process a single USFM file using the global command:
twl-linker <input_file> [output_file]
Examples:
# Input: 01-GEN.usfm → Output: twl_GEN.tsv (auto-generated)
twl-linker ../en_ult/01-GEN.usfm
# Input: test.usfm → Output: test.tsv (auto-generated)
twl-linker test.usfm
# Custom output file
twl-linker ../en_ult/46-ROM.usfm my_output.tsv
Local CLI Usage (for development)
If you're working with the source code locally:
node cli.js <input_file> [output_file]
Examples:
# Input: 01-GEN.usfm → Output: twl_GEN.tsv (auto-generated)
node cli.js ../en_ult/01-GEN.usfm
# Input: test.usfm → Output: test.tsv (auto-generated)
node cli.js test.usfm
# Custom output file
node cli.js ../en_ult/46-ROM.usfm my_output.tsv
Output File Naming Rules (applies to both CLI methods):
- Files starting with a number and a dash (e.g., 01-GEN.usfm) → twl_GEN.tsv
- Other files (e.g., test.usfm) → test.tsv
- If you specify an output file, it's used exactly as given
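The naming rules above can be sketched as a small function. This only illustrates the documented behavior and is not the package's own code: deriveOutputName is a hypothetical name, and the directory the file is written to is left out because it is not specified here.

```javascript
// Hypothetical helper mirroring the documented naming rules.
function deriveOutputName(inputPath, explicitOutput) {
  // Rule 3: an explicit output file is used exactly as given.
  if (explicitOutput) return explicitOutput;

  // Keep only the file name, then drop its extension.
  const fileName = inputPath.split(/[\\/]/).pop();
  const stem = fileName.replace(/\.[^.]+$/, '');

  // Rule 1: "<number>-<BOOK>" becomes "twl_<BOOK>.tsv".
  const numbered = stem.match(/^\d+-(.+)$/);
  // Rule 2: anything else keeps its name with a .tsv extension.
  return numbered ? `twl_${numbered[1]}.tsv` : `${stem}.tsv`;
}

console.log(deriveOutputName('../en_ult/01-GEN.usfm')); // twl_GEN.tsv
console.log(deriveOutputName('test.usfm')); // test.tsv
```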
Note: The global CLI command (twl-linker) includes the built-in biblical context database, so no separate database setup is required after installation.
Batch Processing
Process all USFM files in a directory (currently requires local installation):
node process_all_books.js <input_directory> [output_directory]
Examples:
# Process files from ../en_ult and output to current directory
node process_all_books.js ../en_ult .
# Process files and output to same directory as input
node process_all_books.js ../en_ult
# Process files and output to a different directory
node process_all_books.js ../en_ult ./output_folder
Output Format
The generated TSV files contain the following columns:
| Column | Description |
| -------------- | ------------------------------------------------------- |
| Reference | Chapter:verse reference (e.g., "1:1") |
| ID | Unique 4-character hexadecimal ID |
| Tags | Category of the biblical term (kt, names, other) |
| OrigWords | The original word(s) found in the text |
| Occurrence | Occurrence number of this term in the verse |
| TWLink | Resource link to the translation word article |
| Confidence | Confidence score (0.1-1.0) |
| Match_Type | Type of match (exact, morphological, theological, etc.) |
| Context | Surrounding text context |
| Disambiguation | Disambiguation method used |
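For downstream processing, the columns above can be read into plain objects. The following is a minimal sketch, assuming the output is tab-separated with one record per line and possibly a header row; parseTwlTsv is a hypothetical name, and the sample row is made up for illustration rather than taken from real output.

```javascript
// Column names in the order listed in the table above.
const COLUMNS = [
  'Reference', 'ID', 'Tags', 'OrigWords', 'Occurrence',
  'TWLink', 'Confidence', 'Match_Type', 'Context', 'Disambiguation',
];

// Hypothetical helper: turn TSV text into an array of row objects.
function parseTwlTsv(tsv) {
  return tsv
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => line.split('\t'))
    // Assumption: a header row, if present, starts with "Reference".
    .filter((cells) => cells[0] !== COLUMNS[0])
    .map((cells) =>
      Object.fromEntries(
        COLUMNS.map((name, i) => [name, cells[i] !== undefined ? cells[i] : ''])
      )
    );
}

// Made-up sample data, for illustration only.
const sampleTsv = [
  COLUMNS.join('\t'),
  ['1:1', 'a1b2', 'kt', 'God', '1', 'rc://*/tw/dict/bible/kt/god',
    '0.95', 'exact', 'In the beginning God created', 'single'].join('\t'),
].join('\n');

const parsedRows = parseTwlTsv(sampleTsv);
console.log(parsedRows[0].OrigWords, parsedRows[0].Confidence);
```

Values are kept as strings; convert Confidence with Number() where numeric comparison is needed.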
Understanding the Output
Confidence Scores
- 0.8-1.0: High confidence matches
- 0.6-0.79: Medium confidence matches
- 0.5-0.59: Lower confidence matches (review recommended)
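The bands above translate directly into a small classifier. This is a sketch only: confidenceBand and its labels are hypothetical, and scores below 0.5 are reported as "unlisted" because the ranges above do not describe them.

```javascript
// Hypothetical helper: map a confidence score to the bands listed above.
function confidenceBand(score) {
  if (score >= 0.8) return 'high';
  if (score >= 0.6) return 'medium';
  if (score >= 0.5) return 'lower'; // review recommended
  return 'unlisted'; // not covered by the documented ranges
}

console.log(confidenceBand(0.95)); // high
console.log(confidenceBand(0.55)); // lower
```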
Match Types
- exact: Direct term match from cleaned terms
- morphological: Match using word variants (plurals, etc.)
- theological: Match using theological variants
- disambiguated: Automatically disambiguated ambiguous term
- ambiguous: Ambiguous term requiring manual review
Disambiguation
- single: Unambiguous term with single meaning
- auto:X.XX: Automatically disambiguated (score shown)
- manual:option1 (alternatives): Manual review needed with options listed
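A consumer of the TSV may want to split the Disambiguation column into its parts. The sketch below assumes the three shapes listed above; the comma separator between alternatives and the reading of option1 as the leading candidate are assumptions, and parseDisambiguation is a hypothetical name.

```javascript
// Hypothetical helper: split a Disambiguation value into its parts.
function parseDisambiguation(value) {
  if (value === 'single') return { method: 'single' };

  // "auto:X.XX" carries the disambiguation score.
  const auto = /^auto:(\d*\.?\d+)$/.exec(value);
  if (auto) return { method: 'auto', score: Number(auto[1]) };

  // "manual:option1 (alternatives)"; comma-separated alternatives are assumed.
  const manual = /^manual:([^()]+?)(?:\s*\((.*)\))?$/.exec(value);
  if (manual) {
    const alternatives = (manual[2] || '').split(/,\s*/).filter(Boolean);
    return { method: 'manual', primary: manual[1].trim(), alternatives };
  }

  // Anything else is passed through untouched.
  return { method: 'unknown', raw: value };
}

console.log(parseDisambiguation('auto:0.85'));
```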
File Structure
twl-linker/
├── build_biblical_context_database.py # Context database builder
├── twl-linker.js # Main semantic linker
├── process_all_books.js # Batch processor
├── usfm-alignment-remover.js # USFM alignment data remover
├── biblical_context_database.json # Generated context database
├── requirements.txt # Python dependencies
└── README.md # This file
../en_tw/ # Biblical term definitions (separate repo)
└── bible/ # Biblical term definitions
    ├── kt/ # Key terms
    ├── names/ # Biblical names
    └── other/ # Other terms
../en_ult/ # Aligned USFM files (separate repo)
├── 01-GEN.usfm
├── 02-EXO.usfm
└── ...
Examples
Processing a Single Book
# Process Genesis
node twl-linker.js ../en_ult/01-GEN.usfm
# Output: twl_GEN.tsv with semantic links
Processing All Books
Examples:
# Process all books in ../en_ult directory, output to ./output
node process_all_books.js ../en_ult ./output
# This will create:
# ./output/twl_GEN.tsv, ./output/twl_EXO.tsv, ./output/twl_LEV.tsv, etc.
Troubleshooting
Common Issues
"Context database not found"
- Run python build_biblical_context_database.py first
"No USFM files found"
- Check that the input directory contains .usfm files
- Verify the directory path is correct
"Output directory does not exist"
- Create the output directory before running batch processing
- Or use . to output to current directory
"No such file or directory: '../en_tw/bible'"
- Ensure the en_tw repository is cloned at ../en_tw/ relative to this project
- The directory structure should be: ../en_tw/bible/kt/, ../en_tw/bible/names/, ../en_tw/bible/other/
- If the en_tw repository is in a different location, you can modify the path in build_biblical_context_database.py
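Two of the batch-processing issues above (no USFM files, missing output directory) can be caught before a run. This is a minimal standalone sketch written as a CommonJS script, not part of the package: preflight is a hypothetical name, and the only library calls used are Node's built-in fs and path modules.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical pre-run check for a batch job.
function preflight(inputDir, outputDir) {
  // Fails with ENOENT if the input directory path is wrong.
  const usfmFiles = fs
    .readdirSync(inputDir)
    .filter((name) => name.toLowerCase().endsWith('.usfm'))
    .sort();

  if (usfmFiles.length === 0) {
    throw new Error(`No USFM files found in ${inputDir}`);
  }

  // Create the output directory (and any parents) if it does not exist yet.
  fs.mkdirSync(outputDir, { recursive: true });

  return usfmFiles.map((name) => path.join(inputDir, name));
}
```

Calling preflight('../en_ult', './output') before node process_all_books.js ../en_ult ./output would surface a wrong path early and make sure ./output exists.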
Performance Tips
- The context database is loaded once per batch operation for efficiency
- Large books (like Psalms) may take longer to process
- High numbers of ambiguous terms may require manual review
Development
Adding New Terms
- Add term definitions to the appropriate directory in ../en_tw/bible/
- Rebuild the context database: python build_biblical_context_database.py
- Test with sample texts
Customizing Disambiguation
Edit the disambiguation rules in twl-linker.js in the disambiguateAmbiguousTerm function.
License
This project processes biblical text and translation resources. Please ensure compliance with the licenses of the source materials.
NPM Package Usage
This package can also be used as an npm module in React.js applications or other Node.js projects. Version 1.0.1+ uses ES6 modules for better compatibility with modern bundlers.
Installation as Package
npm install twl-linker
Note: If you need CommonJS support, use version 1.0.0: npm install twl-linker@1.0.0
Usage in React.js/Node.js
import { generateTWL, contextDatabase } from 'twl-linker';
// Generate TWL from USFM content
const usfmContent = `\\c 1
\\v 1 In the beginning God created the heaven and the earth.`;
const tsvOutput = generateTWL(usfmContent);
console.log(tsvOutput);
// Access the biblical context database if needed
console.log('Database metadata:', contextDatabase.metadata);
console.log('Total articles:', contextDatabase.metadata.total_articles);
// Access specific articles
const godArticle = contextDatabase.articles['god.md'];
console.log('God article:', godArticle);
ES6 Import (React.js)
import { generateTWL, contextDatabase } from 'twl-linker';

function MyComponent() {
  // Convert USFM text to TSV; process the result as needed
  const handleGenerateTWL = (usfmText) => {
    const result = generateTWL(usfmText);
    return result;
  };
  // Look up an article by name (articles are indexed by filename, e.g. god.md)
  const getArticleInfo = (articleName) => {
    return contextDatabase.articles[`${articleName}.md`];
  };
  // Replace null with your component JSX
  return null;
}
API Reference for Package Usage
generateTWL(usfmContent)
Generates Translation Words Links (TWL) from USFM content using the built-in biblical context database.
- Parameters:
  - usfmContent (string): The USFM text content to process
- Returns: String containing TSV format with columns: Reference, ID, Tags, OrigWords, Occurrence, TWLink, Confidence, Match_Type, Context, Disambiguation
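Because the return value is plain TSV text, rows that need attention can be picked out by column position. The sketch below assumes tab-separated rows in the column order listed above; rowsNeedingReview and the 0.6 default threshold are illustrative choices, and the sample text is made up rather than real generateTWL output.

```javascript
// 0-based positions in the documented column order.
const CONFIDENCE_COL = 6;
const MATCH_TYPE_COL = 7;

// Hypothetical helper: return rows with low confidence or an ambiguous match.
function rowsNeedingReview(tsv, minConfidence = 0.6) {
  return tsv
    .split(/\r?\n/)
    .map((line) => line.split('\t'))
    .filter((cells) => cells.length > MATCH_TYPE_COL)
    .filter((cells) => {
      const confidence = Number.parseFloat(cells[CONFIDENCE_COL]);
      // A header row or malformed row has no numeric confidence; skip it.
      if (Number.isNaN(confidence)) return false;
      return confidence < minConfidence || cells[MATCH_TYPE_COL] === 'ambiguous';
    });
}

// Made-up rows, for illustration only.
const reviewSample = [
  'Reference\tID\tTags\tOrigWords\tOccurrence\tTWLink\tConfidence\tMatch_Type\tContext\tDisambiguation',
  '1:1\ta1b2\tkt\tGod\t1\tlink-a\t0.95\texact\tcontext a\tsingle',
  '1:2\tc3d4\tother\tdeep\t1\tlink-b\t0.55\tmorphological\tcontext b\tsingle',
  '1:3\te5f6\tkt\tlight\t1\tlink-c\t0.70\tambiguous\tcontext c\tmanual:light (lamp)',
].join('\n');

console.log(rowsNeedingReview(reviewSample).map((cells) => cells[0]));
```

In a project, the string passed in would come from generateTWL(usfmContent).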
contextDatabase
The complete biblical context database object containing:
- metadata: Statistics about the database (total articles, ambiguous terms, categories)
- articles: Object containing all biblical articles indexed by filename
- ambiguous_terms: Object mapping ambiguous terms to possible articles
Example accessing database:
// Get all articles in the 'kt' (key terms) category
const ktArticles = Object.entries(contextDatabase.articles)
.filter(([filename, article]) => article.category === 'kt')
.map(([filename, article]) => ({ filename, ...article }));
// Get ambiguous terms that need disambiguation
const ambiguousTerms = contextDatabase.ambiguous_terms;
console.log('Ambiguous terms:', Object.keys(ambiguousTerms));
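As a further illustration of working with the articles object, the sketch below tallies articles per category. It runs against a small stand-in object so it works without the package installed; mockDatabase and its entries are made up, and only the articles/category shape is taken from the example above.

```javascript
// Stand-in for contextDatabase; entries are placeholders, not real data.
const mockDatabase = {
  articles: {
    'god.md': { category: 'kt' },
    'grace.md': { category: 'kt' },
    'abraham.md': { category: 'names' },
  },
};

// Hypothetical helper: count articles in each category.
function countByCategory(database) {
  const counts = {};
  for (const article of Object.values(database.articles)) {
    counts[article.category] = (counts[article.category] || 0) + 1;
  }
  return counts;
}

console.log(countByCategory(mockDatabase)); // { kt: 2, names: 1 }
```

Passing the real contextDatabase instead of mockDatabase should work the same way, provided each article carries a category field as the filter example above implies.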