excel-l10n
v0.2.1
A configurable Excel (XLSX) extraction → segmentation → XLIFF/JSON export → merge tool for modern JS workflows. Inspired by Okapi's OpenXML filter, with native support for multi-lingual target columns per sheet and rich Excel filter options via a simple JSON/YAML configuration.
Installation
# As a dependency in your project
npm install excel-l10n
# Or globally for CLI usage
npm install -g excel-l10n
Quick Start
CLI Usage
# Extract to XLIFF (uses SRX segmentation if enabled in config)
excel-l10n extract -c config.yml -i workbook.xlsx -o out.xlf --src-lang en
# Extract to XLIFF 1.2 (default is 2.1)
excel-l10n extract -c config.yml -i workbook.xlsx -o out.xlf --src-lang en --xliff-version 1.2
# Extract to JSON
excel-l10n extract -c config.yml -i workbook.xlsx -o out.json --format json
# Merge translated file back
excel-l10n merge -c config.yml -i workbook.xlsx -t translated.xlf -o workbook.translated.xlsx
Programmatic Usage
import { parseConfig, extract, exportUnitsToXliff, parseTranslated, merge } from 'excel-l10n';
// Load configuration
const config = parseConfig('config.yml');
// Extract translatable content
const units = await extract('workbook.xlsx', config);
// Export to XLIFF
const xliff = await exportUnitsToXliff(units, config, { srcLang: 'en' });
// After translation, parse and merge back
const translated = parseTranslated(xliff, 'xlf');
await merge('workbook.xlsx', 'workbook.translated.xlsx', translated, config);
CLI Commands
Advanced:
Per-locale XLIFF export (one file per target language):
# emits out.fr.xlf, out.de.xlf, ... (based on targetColumns in config)
excel-l10n extract -c config.yml -i workbook.xlsx -o out.xlf --per-locale
Single bilingual XLIFF with explicit target language:
excel-l10n extract -c config.yml -i workbook.xlsx -o out.fr.xlf --target-lang fr
Merge all translations in one run (auto-detect trgLang per file):
# input is a directory containing .xlf/.xliff/.json files
excel-l10n merge -c config.yml -i workbook.xlsx -t ./translated/ -o workbook.merged.xlsx
# or a comma-separated list
excel-l10n merge -c config.yml -i workbook.xlsx -t out.fr.xlf,out.de.xlf -o workbook.merged.xlsx
Quick inline (no config file):
# extract with inline flags
excel-l10n extract -i in.xlsx --sheet "Sheet1" --source A --target fr=B,de=C -o out.xlf --src-lang en
Run excel-l10n --help for details.
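The per-locale naming shown above (out.xlf becoming out.fr.xlf, out.de.xlf, ...) can be pictured with a small helper. This is only a sketch: perLocalePaths is a hypothetical name, not part of the excel-l10n API, and the tool's actual naming logic may handle edge cases differently.

```typescript
// Hypothetical helper (not part of excel-l10n): derive one output path per
// target locale by inserting the locale code before the file extension.
function perLocalePaths(
  basePath: string,
  targetColumns: Record<string, string>
): Record<string, string> {
  const dot = basePath.lastIndexOf('.');
  const slash = Math.max(basePath.lastIndexOf('/'), basePath.lastIndexOf('\\'));
  // Only treat the last dot as an extension separator when it sits inside
  // the file name (not in a directory name, and not a leading dot).
  const hasExt = dot > slash + 1;
  const stem = hasExt ? basePath.slice(0, dot) : basePath;
  const ext = hasExt ? basePath.slice(dot) : '';
  const result: Record<string, string> = {};
  for (const locale of Object.keys(targetColumns)) {
    result[locale] = `${stem}.${locale}${ext}`;
  }
  return result;
}

// Locales come from the keys of targetColumns (locale -> column letter).
const perLocaleExample = perLocalePaths('out.xlf', { fr: 'C', de: 'D' });
// perLocaleExample.fr is 'out.fr.xlf' and perLocaleExample.de is 'out.de.xlf'
```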
API Reference
Core Functions
// Configuration
parseConfig(pathOrObject: string | Config): Config
// Extraction
extract(xlsxPath: string, config: Config): Promise<TranslationUnit[]>
// Export
exportUnitsToXliff(units: TranslationUnit[], config: Config, options?: {
  srcLang?: string;
  trgLang?: string;
  generator?: string;
}): Promise<string>
exportUnitsToJson(units: TranslationUnit[], config: Config, options?: {
  fileName?: string;
}): Promise<string>
// Parsing
parseTranslated(content: string, format: 'xlf' | 'json'): TranslationUnit[]
// Merging
merge(
  inputXlsxPath: string,
  outputXlsxPath: string,
  translatedUnits: TranslationUnit[],
  config: Config
): Promise<void>
Using as a Library
When using excel-l10n as a dependency in your project:
import { parseConfig, extract, exportUnitsToXliff, parseTranslated, merge } from 'excel-l10n';
async function localizeWorkbook() {
  // Option 1: Load config from file
  // const config = parseConfig('./localization-config.yml');
  // Option 2: Create config programmatically (use one option; declaring config twice is a syntax error)
  const config = {
    workbook: {
      sheets: [{
        namePattern: 'Sheet1',
        sourceColumns: ['B'],
        targetColumns: { fr: 'C', de: 'D' },
        html: { enabled: true },
        headerRow: 1,
        valuesStartRow: 2
      }]
    },
    global: {
      srcLang: 'en',
      xliffVersion: '2.1'
    }
  };
  // Extract
  const units = await extract('./input.xlsx', config);
  console.log(`Extracted ${units.length} translation units`);
  // Export to XLIFF
  const xliff = await exportUnitsToXliff(units, config, { srcLang: 'en' });
  // ... send to translation service ...
  // Parse translated XLIFF (here the exported string stands in for the translated file)
  const translated = parseTranslated(xliff, 'xlf');
  // Merge back to Excel
  await merge('./input.xlsx', './output.xlsx', translated, config);
}
TypeScript Support
Full TypeScript definitions are included. Import types as needed:
import type { Config, TranslationUnit, Segment } from 'excel-l10n';
Segmentation (SRX)
- SRX rules are supported via segmentation.rules.srxPath (see examples/default_rules.srx).
- If no matching rule is found for the locale, a pragmatic built-in sentence splitter is used.
- The locale is derived from sheet.sourceLocale or global.srcLang.
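To give a feel for what a pragmatic fallback splitter might do when no SRX rule matches, here is a minimal sketch. It is not the library's built-in splitter: splitSentences and its boundary rule (sentence-final punctuation, whitespace, then an uppercase letter or digit) are assumptions for illustration, and it will over-split after abbreviations such as "e.g. Foo".

```typescript
// Hypothetical fallback splitter; the library's built-in rules may differ.
function splitSentences(text: string): string[] {
  const segments: string[] = [];
  // Break after ., ! or ? when followed by whitespace and then an
  // uppercase ASCII letter or a digit.
  const boundary = /[.!?]+\s+(?=[A-Z0-9])/g;
  let start = 0;
  let match: RegExpExecArray | null;
  while ((match = boundary.exec(text)) !== null) {
    // Keep the trailing whitespace with the preceding segment so that
    // joining all segments reproduces the original cell text exactly.
    const end = match.index + match[0].length;
    segments.push(text.slice(start, end));
    start = end;
  }
  if (start < text.length) {
    segments.push(text.slice(start));
  }
  return segments;
}
```

Keeping the whitespace inside each segment means the merge step can rebuild the cell by simple concatenation, without guessing which separators were there.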
XLIFF version selection
- Set global.xliffVersion to "1.2" or "2.1" (default: "2.1") to control XLIFF output format.
- Use CLI flag --xliff-version 1.2 to override the config.
- XLIFF 2.1 uses <pc> elements for inline codes.
- XLIFF 1.2 uses <g> elements for inline codes.
- Both versions are fully compatible with popular Translation Management Systems.
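The precedence described above (CLI flag over config, config over the "2.1" default) can be sketched as follows. The function names are hypothetical and only illustrate the selection logic; they are not exported by excel-l10n.

```typescript
type XliffVersion = '1.2' | '2.1';

// Hypothetical resolver: the CLI flag wins over the config value,
// which wins over the documented default of "2.1".
function resolveXliffVersion(cliFlag?: string, configValue?: string): XliffVersion {
  const candidate = cliFlag ?? configValue ?? '2.1';
  if (candidate === '1.2' || candidate === '2.1') {
    return candidate;
  }
  throw new Error(`Unsupported XLIFF version: ${candidate}`);
}

// Paired inline element used by each version, per the list above.
function inlineElementFor(version: XliffVersion): 'g' | 'pc' {
  return version === '1.2' ? 'g' : 'pc';
}
```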
Placeholders and inline codes
- Configure inlineCodeRegexes per sheet to detect non-translatable tokens (e.g., {0}, %s).
- XLIFF export converts tokens to <ph id="..."/> with a per-segment placeholder map preserved in a <note category="ph"> JSON payload for roundtrip.
- JSON export preserves a placeholder map under unit.meta.placeholders without altering source text.
- During merge, placeholder markers (e.g., [[ph:ph1]] in translated content) are rehydrated back into original tokens.
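The protect/rehydrate roundtrip described above can be sketched in a few lines. This is an illustration under assumptions, not the library's implementation: protectTokens and rehydrate are hypothetical names, and the sequential ph1, ph2, ... numbering is assumed from the [[ph:ph1]] marker format.

```typescript
interface ProtectedText {
  text: string;                          // source text with [[ph:phN]] markers
  placeholders: Record<string, string>;  // marker id -> original token
}

// Hypothetical helper: replace every token matched by any of the
// configured regexes with a numbered marker and remember the original.
function protectTokens(source: string, inlineCodeRegexes: string[]): ProtectedText {
  const placeholders: Record<string, string> = {};
  if (inlineCodeRegexes.length === 0) {
    return { text: source, placeholders };
  }
  const combined = new RegExp(inlineCodeRegexes.map((r) => `(?:${r})`).join('|'), 'g');
  let counter = 0;
  const text = source.replace(combined, (token: string) => {
    counter += 1;
    const id = `ph${counter}`;
    placeholders[id] = token;
    return `[[ph:${id}]]`;
  });
  return { text, placeholders };
}

// Restore original tokens in translated text; unknown markers are left untouched.
function rehydrate(translated: string, placeholders: Record<string, string>): string {
  return translated.replace(/\[\[ph:([A-Za-z0-9_]+)\]\]/g, (marker: string, id: string) => {
    const known = Object.prototype.hasOwnProperty.call(placeholders, id);
    const original = known ? placeholders[id] : undefined;
    return original !== undefined ? original : marker;
  });
}
```

Because the map is keyed by marker id rather than by position, translators can reorder markers within a segment and the original tokens still land in the right places.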
HTML inline tags
- When HTML content is detected in cells (e.g., <div>This is <b>bold</b> text</div>), inline tags are converted to XLIFF inline elements.
- XLIFF 2.1: HTML tags like <b>bold</b> become <pc id="1" dataRef="html_b">bold</pc>
- XLIFF 1.2: HTML tags like <b>bold</b> become <g id="1" ctype="bold">bold</g>
- Inline elements are properly recognized and protected by Translation Management Systems.
- During merge, XLIFF inline elements are converted back to their original HTML tags.
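For the XLIFF 2.1 case, the conversion and its inverse might look like the sketch below. It is deliberately narrow and rests on assumptions: htmlToPc and pcToHtml are hypothetical names, only attribute-free, well-nested tags from a small allow-list are handled, and XML escaping of the surrounding text is left out. The real filter is likely more thorough.

```typescript
// Hypothetical allow-list of inline tags for this sketch.
const INLINE_TAGS: string[] = ['b', 'i', 'u', 'em', 'strong'];

// Convert simple inline HTML tags to XLIFF 2.1 <pc> elements.
function htmlToPc(html: string): string {
  let nextId = 0;
  return html.replace(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)>/g, (full: string, slash: string, name: string) => {
    const tag = name.toLowerCase();
    if (INLINE_TAGS.indexOf(tag) === -1) {
      return full; // leave block-level or unknown tags alone
    }
    if (slash === '/') {
      return '</pc>';
    }
    nextId += 1;
    return `<pc id="${nextId}" dataRef="html_${tag}">`;
  });
}

// Convert <pc> elements back to HTML, using a stack to pair each </pc>
// with the tag name of the most recently opened <pc>.
function pcToHtml(xliff: string): string {
  const open: string[] = [];
  return xliff.replace(/<pc id="\d+" dataRef="html_([a-z0-9]+)">|<\/pc>/g, (full: string, tag: string | undefined) => {
    if (tag !== undefined) {
      open.push(tag);
      return `<${tag}>`;
    }
    const closing = open.pop();
    return closing !== undefined ? `</${closing}>` : full;
  });
}
```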
XLIFF notes
If global.exportComments is true, XLIFF export includes extra <note> entries per unit:
- category=header: the header cell text for the source column (headerRow).
- category=metadataRows: a JSON object of metadata row values for this column.
- category=comments: cell notes/comments if translateComments is enabled.
These notes help maintain roundtrip context (sheet/row/col are always included as a base note).
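As an illustration of how the three extra notes could be assembled, here is a sketch. Everything except the category names is an assumption: the NoteContext shape, the buildExtraNotes name, and the exact serialization are hypothetical, and the always-present base note is omitted because its format is not described here.

```typescript
// Hypothetical per-unit context; field names are illustrative only.
interface NoteContext {
  header?: string;                        // header cell text (headerRow)
  metadataRows?: Record<string, string>;  // metadata row values for this column
  comment?: string;                       // cell note/comment text
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build the extra <note> elements, or none when exportComments is off.
function buildExtraNotes(
  ctx: NoteContext,
  exportComments: boolean,
  translateComments: boolean
): string[] {
  if (!exportComments) {
    return [];
  }
  const notes: string[] = [];
  if (ctx.header !== undefined) {
    notes.push(`<note category="header">${escapeXml(ctx.header)}</note>`);
  }
  if (ctx.metadataRows !== undefined) {
    const json = JSON.stringify(ctx.metadataRows);
    notes.push(`<note category="metadataRows">${escapeXml(json)}</note>`);
  }
  if (translateComments && ctx.comment !== undefined) {
    notes.push(`<note category="comments">${escapeXml(ctx.comment)}</note>`);
  }
  return notes;
}
```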
Style preservation
When preserveStyles is true:
- A minimal style snapshot (font name/size/bold/italic/color, alignment, fill color) is captured at extract time.
- During merge, the snapshot is reapplied to the target cell. If no snapshot exists, styles are copied from the source cell.
Rich text run-level formatting is not preserved in the MVP; this can be extended in future iterations.
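The snapshot-then-fallback rule above can be sketched with plain objects. The StyleSnapshot fields follow the list in this section, but the type names, CellLike, and applyStyle are hypothetical and do not reflect how the library represents cells internally.

```typescript
// Minimal style snapshot, mirroring the properties listed above.
interface StyleSnapshot {
  fontName?: string;
  fontSize?: number;
  bold?: boolean;
  italic?: boolean;
  fontColor?: string;
  alignment?: string;
  fillColor?: string;
}

// Hypothetical stand-in for a worksheet cell.
interface CellLike {
  value: string;
  style: StyleSnapshot;
}

// Prefer the snapshot captured at extract time; if none exists,
// copy the style from the source cell instead.
function applyStyle(target: CellLike, source: CellLike, snapshot?: StyleSnapshot): void {
  const chosen = snapshot !== undefined ? snapshot : source.style;
  // Shallow copy so the target never shares a style object with another cell.
  target.style = { ...chosen };
}
```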
Config highlights
- Sheet selection via namePattern.
- sourceColumns and targetColumns (locale → column letter). Optional auto-create targets.
- Row/column filtering: headerRow, valuesStartRow, skipHiddenRows, skipHiddenColumns, excludedRows/Columns.
- Color exclusion via excludeColors.
- Formula handling via extractFormulaResults.
- Merged regions policy via treatMergedRegions (top-left | expand | skip).
- Comments via translateComments.
- Notes export via global.exportComments.
- Merge fallback via global.mergeFallback (default: source). When a segment lacks a <target>, choose to use its <source> or leave it empty (empty).
- XLIFF version via global.xliffVersion (default: 2.1). Choose between XLIFF 1.2 and 2.1 output formats.
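Of the options above, mergeFallback is the one whose effect is easiest to misread, so here is a small sketch of the decision it describes. SegmentLike, resolveTarget, and mergeSegments are hypothetical names, and treating an absent target property as "no <target>" is an assumption.

```typescript
type MergeFallback = 'source' | 'empty';

// Hypothetical segment shape: target is absent when the XLIFF had no <target>.
interface SegmentLike {
  source: string;
  target?: string;
}

// Pick the text written back for one segment.
function resolveTarget(segment: SegmentLike, fallback: MergeFallback = 'source'): string {
  if (segment.target !== undefined) {
    return segment.target;
  }
  return fallback === 'source' ? segment.source : '';
}

// Rebuild a cell value from its segments.
function mergeSegments(segments: SegmentLike[], fallback: MergeFallback = 'source'): string {
  return segments.map((s) => resolveTarget(s, fallback)).join('');
}
```

With the default, a partially translated cell comes back as a mix of translated and source sentences; with empty, untranslated sentences simply drop out of the merged cell.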
Example CLI flows
- Extract per-locale XLIFFs, then merge all at once:
excel-l10n extract -c config.yml -i workbook.xlsx -o out.xlf --per-locale
# Edit out.fr.xlf, out.de.xlf ...
excel-l10n merge -c config.yml -i workbook.xlsx -t ./outdir -o workbook.merged.xlsx
- Extract a single bilingual XLIFF for French and merge only FR:
excel-l10n extract -c config.yml -i workbook.xlsx -o out.fr.xlf --target-lang fr
# Edit out.fr.xlf
excel-l10n merge -c config.yml -i workbook.xlsx -t out.fr.xlf -o workbook.fr.xlsx --target-lang fr
See src/config/schema.json for the full JSON Schema.
Tests
- Unit tests cover config parsing, SRX segmentation, utilities and more.
- Integration testing can be added to validate end-to-end roundtrips (example scaffold included under tests/).
Status
MVP implementation with SRX segmentation, placeholders, XLIFF notes, and style preservation. Further enhancements planned:
- Rich text run-level formatting support
- Streaming for very large workbooks
- Expanded Okapi option coverage
Pseudo-translation
Generate fake translations to test UI expansion and encoding.
# XLSX → XLSX pseudo
excel-l10n pseudo -c config.yml -i workbook.xlsx -o pseudo.xlsx --target-lang fr
# XLIFF → XLIFF pseudo
excel-l10n extract -c config.yml -i workbook.xlsx -o out.xlf
excel-l10n pseudo -t out.xlf -o out.pseudo.xlf --expand 0.3 --wrap "⟦,⟧"
Behavior:
- Wrap text with markers (default ⟦ ⟧)
- Expand length by +30% (configurable)
- Replace characters with accented/uncommon variants
- Preserve placeholders: {0}, %s, [[ph:ph1]]
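The four behaviors listed above combine roughly as in the sketch below. It is an approximation, not the tool's algorithm: pseudoTranslate is a hypothetical name, only vowels are accented, expansion is done by appending ~ characters, and the protected-token pattern covers just the three placeholder forms named above.

```typescript
const ACCENTED: Record<string, string> = {
  a: 'á', e: 'é', i: 'í', o: 'ó', u: 'ú',
  A: 'Á', E: 'É', I: 'Í', O: 'Ó', U: 'Ú',
};

// Replace vowels with accented variants.
function accent(text: string): string {
  return text.replace(/[aeiouAEIOU]/g, (ch: string) => {
    const mapped = ACCENTED[ch];
    return mapped !== undefined ? mapped : ch;
  });
}

// Hypothetical pseudo-translator: accent translatable text, copy
// placeholders verbatim, pad by `expand`, then wrap with markers.
function pseudoTranslate(
  text: string,
  expand: number = 0.3,
  wrap: [string, string] = ['⟦', '⟧']
): string {
  const protectedToken = /\{\d+\}|%s|\[\[ph:[^\]]+\]\]/g;
  let body = '';
  let translatableLength = 0;
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = protectedToken.exec(text)) !== null) {
    const chunk = text.slice(last, match.index);
    body += accent(chunk) + match[0]; // placeholder is appended untouched
    translatableLength += chunk.length;
    last = match.index + match[0].length;
  }
  const tail = text.slice(last);
  body += accent(tail);
  translatableLength += tail.length;
  // Padding is proportional to the translatable text only.
  const padding = '~'.repeat(Math.ceil(translatableLength * expand));
  return wrap[0] + body + padding + wrap[1];
}
```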
Validate translations
Automatically check translated XLIFF/JSON for common issues.
excel-l10n validate -t translated/ --json --length-factor 2.5
Checks include:
- Missing targets
- Placeholder mismatches ({0}, [[ph:ph1]])
- Length warnings (ratio > factor)
- ICU categories preserved (plural/select)
Exit code 0 = OK; 1 = findings.
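A rough model of the first three checks and the exit-code convention is sketched below. The types and function names are hypothetical, the finding kinds are invented labels for this example, and the ICU category check is left out; the actual validate command may report findings in a different shape.

```typescript
interface TranslatedPair {
  id: string;
  source: string;
  target?: string;
}

interface Finding {
  id: string;
  kind: 'missing-target' | 'placeholder-mismatch' | 'length';
}

// Collect placeholder tokens in a stable order so two texts can be compared.
function placeholderTokens(text: string): string[] {
  const found = text.match(/\{\d+\}|\[\[ph:[^\]]+\]\]/g);
  return found === null ? [] : Array.from(found).sort();
}

function validatePairs(pairs: TranslatedPair[], lengthFactor: number = 2.5): Finding[] {
  const findings: Finding[] = [];
  for (const pair of pairs) {
    const target = pair.target;
    if (target === undefined || target.trim() === '') {
      findings.push({ id: pair.id, kind: 'missing-target' });
      continue; // the remaining checks need a target
    }
    const sourceTokens = placeholderTokens(pair.source).join('\u0000');
    const targetTokens = placeholderTokens(target).join('\u0000');
    if (sourceTokens !== targetTokens) {
      findings.push({ id: pair.id, kind: 'placeholder-mismatch' });
    }
    // Length warning when the target/source ratio exceeds the factor.
    if (pair.source.length > 0 && target.length / pair.source.length > lengthFactor) {
      findings.push({ id: pair.id, kind: 'length' });
    }
  }
  return findings;
}

// Mirrors the exit-code convention above: 0 = OK, 1 = findings.
function exitCodeFor(findings: Finding[]): number {
  return findings.length === 0 ? 0 : 1;
}
```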
ICU handling (plural/select)
ICU plural/select blocks are protected during XLIFF export so structure is not accidentally broken. Inner texts are represented with placeholders to preserve logic while still surfacing translatable parts. Validation ensures ICU categories (e.g., one, other) are preserved between source and target.
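The category check mentioned above can be approximated by scanning each plural/select argument for its top-level keys and comparing the sets. This is a sketch under assumptions: icuCategories and categoriesPreserved are hypothetical names, the scanner ignores ICU quoting and plural offset: clauses, and the library's own parser may be stricter.

```typescript
// Collect the top-level category keys (one, other, =0, ...) of every
// plural/select/selectordinal argument in a message.
function icuCategories(message: string): string[] {
  const header = /\{\s*\w+\s*,\s*(?:plural|select|selectordinal)\s*,/g;
  const categories: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = header.exec(message)) !== null) {
    let depth = 1; // we are inside the argument's outer brace
    let key = '';
    for (let i = match.index + match[0].length; i < message.length && depth > 0; i++) {
      const ch = message.charAt(i);
      if (ch === '{') {
        // Text collected at depth 1 just before a brace is a category key.
        if (depth === 1 && key.trim() !== '') {
          categories.push(key.trim());
        }
        key = '';
        depth++;
      } else if (ch === '}') {
        depth--;
        key = '';
      } else if (depth === 1) {
        key += ch;
      }
    }
  }
  return categories.sort();
}

// True when source and target expose the same set of categories.
function categoriesPreserved(source: string, target: string): boolean {
  return icuCategories(source).join('|') === icuCategories(target).join('|');
}
```

Nested arguments are picked up on later iterations of the outer loop, because the header regex resumes right after each header rather than after the whole block.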
Example:
{count, plural, one {1 file} other {# files}}
Streaming mode (experimental)
For very large workbooks, you can enable streaming extraction.
excel-l10n extract -c config.yml -i huge.xlsx -o huge.xlf --stream
Note: the current release exposes the streaming flag and API; subsequent versions will wire a true streaming reader under the hood for constant-memory processing.
License: Polyform Noncommercial 1.0.0. For personal and non-commercial use only. Commercial licensing inquiries: [email protected]
