hybridtm
v0.1.0
HybridTM is a Translation Memory (TM) engine that supports n-gram-based and vector-based matching.
HybridTM
HybridTM is a TypeScript translation memory engine that stores bilingual content in LanceDB and scores matches by combining semantic embeddings (Xenova/Transformers.js) with the built-in MatchQuality fuzzy metric.
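How the two signals might be blended can be illustrated with a small, self-contained sketch. It does not reproduce the library's actual formula: `fuzzyPercent` is a Levenshtein-based stand-in for MatchQuality, and `blendScores` with its equal default weighting is a hypothetical helper. Both scores are assumed to be on a 0-100 scale.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Levenshtein edit distance using two rolling rows.
function levenshtein(s: string, t: string): number {
  let prev: number[] = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[t.length];
}

// Stand-in fuzzy metric (NOT MatchQuality): edit distance scaled to 0-100.
function fuzzyPercent(s: string, t: string): number {
  const longest = Math.max(s.length, t.length);
  if (longest === 0) return 100;
  return Math.round(100 * (1 - levenshtein(s, t) / longest));
}

// Hypothetical blend: weighted average of the semantic and fuzzy percentages.
function blendScores(semantic: number, fuzzy: number, semanticWeight = 0.5): number {
  return Math.round(semanticWeight * semantic + (1 - semanticWeight) * fuzzy);
}

const semantic = Math.round(100 * cosine([0.6, 0.8], [0.6, 0.8]));
const fuzzy = fuzzyPercent('Hi world', 'Hello world');
console.log('Semantic', semantic, 'Fuzzy', fuzzy, 'Blend', blendScores(semantic, fuzzy));
```

The point of blending is that a paraphrase can score high semantically while scoring low on character overlap, and a near-identical string with a changed number can do the opposite. A combined score lets a CAT tool rank both kinds of match on one scale.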
Highlights
- Imports XLIFF 2.x and TMX 1.4b files, preserving metadata, notes, and custom properties
- Generates semantic vectors with any Xenova-compatible text model (default: HybridTM.QUALITY_MODEL, LaBSE)
- Provides semanticTranslationSearch, semanticSearch, and concordanceSearch APIs with metadata-aware filtering
- Streams data into LanceDB through a JSONL-based batch importer to keep memory usage predictable
- Prevents duplicate segments by rewriting entries with deterministic IDs (fileId:unitId:segmentIndex:lang)
Models download automatically the first time you initialize an instance and are cached in the standard Hugging Face directory.
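The deterministic-ID idea from the Highlights list can be sketched without LanceDB. The `SegmentEntry` shape, `entryId`, and `InMemoryStore` below are hypothetical names used only for illustration; the sketch assumes the ID is the four fields joined with colons, as the `fileId:unitId:segmentIndex:lang` pattern suggests.

```typescript
// Hypothetical record shape; the real library stores more fields.
interface SegmentEntry {
  fileId: string;
  unitId: string;
  segmentIndex: number;
  lang: string;
  text: string;
}

// Build the deterministic ID in the documented fileId:unitId:segmentIndex:lang form.
function entryId(e: SegmentEntry): string {
  return [e.fileId, e.unitId, e.segmentIndex, e.lang].join(':');
}

// Minimal in-memory stand-in for a table keyed by the deterministic ID.
class InMemoryStore {
  private readonly rows = new Map<string, SegmentEntry>();

  // Writing an entry whose ID already exists replaces the old row instead of adding one.
  upsert(entry: SegmentEntry): string {
    const id = entryId(entry);
    this.rows.set(id, entry);
    return id;
  }

  get(id: string): SegmentEntry | undefined {
    return this.rows.get(id);
  }

  get size(): number {
    return this.rows.size;
  }
}

const store = new InMemoryStore();
store.upsert({ fileId: 'demo.xlf', unitId: 'unit1', segmentIndex: 1, lang: 'en', text: 'Hello world' });
// Re-importing the same segment overwrites the row rather than duplicating it.
store.upsert({ fileId: 'demo.xlf', unitId: 'unit1', segmentIndex: 1, lang: 'en', text: 'Hello, world' });
console.log(store.size, store.get('demo.xlf:unit1:1:en')?.text);
```

Because the ID depends only on where a segment lives and which language it is in, importing the same file twice is idempotent: the second pass rewrites existing rows.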
Requirements
- Node.js 22 LTS or later
- npm 11+
- Disk space for both the LanceDB directory you choose and the embedding model cache
Installation
npm install hybridtm
Quick start
import path from 'node:path';
import { HybridTM, HybridTMFactory, Utils } from 'hybridtm';
const INSTANCE_NAME = 'docs-basic';
const DB_PATH = path.resolve('.hybridtm', INSTANCE_NAME + '.lancedb');

// Reuse the named instance if it already exists; otherwise create it at DB_PATH.
function getOrCreateTM(): HybridTM {
  return HybridTMFactory.getInstance(INSTANCE_NAME)
    ?? HybridTMFactory.createInstance(INSTANCE_NAME, DB_PATH, HybridTM.QUALITY_MODEL);
}

async function main(): Promise<void> {
  const tm = getOrCreateTM();

  // Store one English source segment and its Spanish target under the same unit.
  const source = Utils.buildXMLElement('<source>Hello world</source>');
  const target = Utils.buildXMLElement('<target>Hola mundo</target>');
  await tm.storeLangEntry('demo', 'demo.xlf', 'unit1', 'en', 'Hello world', source, undefined, 1, 1, { state: 'final' });
  await tm.storeLangEntry('demo', 'demo.xlf', 'unit1', 'es', 'Hola mundo', target, undefined, 1, 1, { state: 'final' });

  // Search English-to-Spanish matches for a similar sentence and print their scores.
  const matches = await tm.semanticTranslationSearch('Hi world', 'en', 'es', 50, 5);
  matches.forEach((match) => {
    console.log('Hybrid', match.hybridScore(), 'Semantic', match.semantic, 'Fuzzy', match.fuzzy);
    console.log('Source:', match.source.toString());
    console.log('Target:', match.target.toString());
  });

  await tm.close();
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});

Import XLIFF/TMX content at any time:
await tm.importXLIFF('./translations/project.xlf', { minState: 'reviewed' });
await tm.importTMX('./translations/legacy.tmx');

semanticTranslationSearch automatically pairs every source hit with its matching target segment (same fileId, unitId, and segmentIndex), making the output ready for CAT integrations.
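The pairing step described above can be pictured as a lookup keyed on segment position. This is a simplified sketch, not the library's implementation: `Segment`, `positionKey`, and `pairWithTargets` are hypothetical names, and it assumes a source hit without a stored target in the requested language is simply dropped.

```typescript
// Hypothetical segment shape used only for this illustration.
interface Segment {
  fileId: string;
  unitId: string;
  segmentIndex: number;
  lang: string;
  text: string;
}

interface TranslationPair {
  source: Segment;
  target: Segment;
}

// Source and target of one translation share file, unit, and segment position.
function positionKey(s: Segment): string {
  return `${s.fileId}:${s.unitId}:${s.segmentIndex}`;
}

// Attach to each source hit the stored segment at the same position in the target language.
function pairWithTargets(sourceHits: Segment[], stored: Segment[], tgtLang: string): TranslationPair[] {
  const targets = new Map<string, Segment>();
  for (const seg of stored) {
    if (seg.lang === tgtLang) {
      targets.set(positionKey(seg), seg);
    }
  }
  const pairs: TranslationPair[] = [];
  for (const source of sourceHits) {
    const target = targets.get(positionKey(source));
    if (target) {
      pairs.push({ source, target }); // hits without a target are skipped (assumption)
    }
  }
  return pairs;
}

const storedSegments: Segment[] = [
  { fileId: 'demo.xlf', unitId: 'unit1', segmentIndex: 1, lang: 'en', text: 'Hello world' },
  { fileId: 'demo.xlf', unitId: 'unit1', segmentIndex: 1, lang: 'es', text: 'Hola mundo' },
  { fileId: 'demo.xlf', unitId: 'unit2', segmentIndex: 1, lang: 'en', text: 'Goodbye' },
];
const hits = storedSegments.filter((s) => s.lang === 'en');
for (const pair of pairWithTargets(hits, storedSegments, 'es')) {
  console.log(pair.source.text, '=>', pair.target.text);
}
```

Returning complete source/target pairs means a CAT tool can show a suggestion directly, without a second query to fetch the translation.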
Documentation
Each guide is short and task-oriented, so you can jump directly to the workflow you need.
Runnable samples
The samples project contains three scripts (dev:basic, dev:import, dev:filters) plus miniature XLIFF/TMX fixtures.
When working on the repository:
npm install
npm run build
cd samples
npm install
npm run dev:basic

If you copy samples/ elsewhere, update samples/package.json so the hybridtm dependency points to the published version you intend to test, then run npm install.
Project layout
- ts/ – source files for the library
- dist/ – compiled JavaScript and declarations (npm run build)
- docs/ – task-focused tutorials referenced above
- samples/ – standalone TypeScript project with runnable workflows
- models/ – local cache for pre-downloaded Xenova models (optional)
Development
- npm run build – compile TypeScript to dist/
- node dist/tmxtest.js and node dist/xlifftest.js – regression checks for the TMX and XLIFF importers (run after building)
Contributions should include unit or integration coverage when you touch importer or search logic. Use HybridTMFactory.removeInstance(name) to clean up any throwaway databases you create during manual tests.
License
Eclipse Public License 1.0 — see LICENSE for details.
