rfc-bcp47
v1.5.0
Zero-dependency BCP 47 language tag parser, normalizer and matcher for JavaScript and TypeScript
Built with AI, end to end. Every line in this repo — source, tests, docs, release scripts — is generated, reviewed, and verified through Claude Code.
- Strict conventions enforced by the agent. `.claude/rules/` pin TypeScript strict mode, `Array<T>`, `readonly` by default, no `any`, semantic naming (no `data`/`result`/`item`), and 18 testing rules, applied on every change without manual review.
- Bundled Claude Code skills (`commit`, `verify`, `review`, `release`, `testing`) that drive the full lifecycle: lint → build → test → conventional commit → tag-triggered release with SLSA provenance via OIDC trusted publishing, powered by `npm-trust` (the `/solo-npm:release` skill itself comes from the `solo-npm` marketplace plugin).
- Co-located Vitest specs for every operator (`parse`, `canonicalize`, `filter`, `lookup`, `extensionU`/`extensionT`, `acceptLanguage`). The agent writes the failing test first, then makes it pass; `pnpm test` is the regression net, not a checkbox.
- PRs disabled by design. Contributors can't push code directly; the maintainer handles issues and discussions through Claude Code. See `CONTRIBUTING.md` for the full process.
- Parse any BCP 47 language tag into a structured, typed object
- Stringify a tag object back into a well-formed language tag string
- Canonicalize with case normalization and IANA registry data (deprecated subtags, suppress-script, extlang)
- Match language tags with `filter` and `lookup` per RFC 4647
- Extension U/T extraction for Unicode locales (RFC 6067) and transformed content (RFC 6497)
- Accept-Language header parsing per RFC 9110
- WCAG-ready: use `parse()` to validate `lang` attributes per WCAG 2.x SC 3.1.1 and SC 3.1.2
- TypeScript-first with full type inference and strict types out of the box
- Zero dependencies, tree-shakeable, works in Node.js and browsers
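Of the canonicalization steps listed above, case normalization is the one that needs no IANA registry data. It follows the capitalization conventions of RFC 5646 §2.1.1: lowercase language subtags, titlecase four-letter script subtags, uppercase two-letter region subtags, and lowercase for everything after a singleton. The sketch below shows only that step; `normalizeCase` is a hypothetical helper, not part of rfc-bcp47, and it assumes well-formed input and performs none of the registry mappings `canonicalize` applies.

```typescript
// Hypothetical helper: applies only the RFC 5646 §2.1.1 capitalization
// conventions. No registry lookups (deprecated subtags, suppress-script,
// extlang) happen here, and the input is assumed to be well-formed.
function normalizeCase(tag: string): string {
  const subtags = tag.split('-');
  const output: Array<string> = [];
  let afterSingleton = false;

  for (let index = 0; index < subtags.length; index++) {
    const lower = (subtags[index] ?? '').toLowerCase();

    if (index === 0 || afterSingleton) {
      // The primary subtag, and anything inside an extension or
      // private-use sequence, stays lowercase.
      output.push(lower);
    } else if (lower.length === 2) {
      // Two-letter subtags (regions such as 'US') are uppercase.
      output.push(lower.toUpperCase());
    } else if (/^[a-z]{4}$/.test(lower)) {
      // Four-letter alphabetic subtags (scripts such as 'Latn') are titlecase.
      output.push(lower.charAt(0).toUpperCase() + lower.slice(1));
    } else {
      // Extlangs, 3-digit regions, and variants stay lowercase.
      output.push(lower);
    }

    // A singleton ('u', 't', 'x', 'i', ...) switches the rest of the tag
    // to lowercase. This also covers tags that begin with 'x' or 'i'.
    if (lower.length === 1) {
      afterSingleton = true;
    }
  }

  return output.join('-');
}
```

For example, `normalizeCase('EN-latn-us')` yields `'en-Latn-US'`, and `normalizeCase('en-ca-X-CA')` yields `'en-CA-x-ca'` because the private-use part stays lowercase.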
Install

```sh
npm install rfc-bcp47
```

Operators
Tree-shakeable operators — import only what you need.
parse / stringify
```typescript
import { parse, stringify } from 'rfc-bcp47';

const tag = parse('en-Latn-US');
if (tag?.type === 'langtag') {
  tag.langtag.language; // 'en'
  tag.langtag.script;   // 'Latn'
  tag.langtag.region;   // 'US'
}
stringify(tag!); // 'en-Latn-US'

parse('invalid!'); // null
```

`parse` returns one of three tag types, or `null` for invalid input:
| type | When | Fields available |
|--------|------|-----------------|
| 'langtag' | Standard language tags (en-US, zh-Hant-TW) | langtag.language, langtag.script, langtag.region, langtag.extlang, langtag.variant, langtag.extension, langtag.privateuse |
| 'privateuse' | Private use tags (x-custom) | privateuse |
| 'grandfathered' | Legacy registered tags (i-klingon) | grandfathered.type, grandfathered.tag |
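Because every result carries a `type` discriminant, TypeScript narrows the union inside a `switch` or `if`, so each branch can only reach the fields listed for that type in the table. The types below are a hypothetical, simplified mirror of the table written for illustration only: `SketchTag`, `describeTag`, and the field types (for example, `privateuse` as a string array) are assumptions, not the package's exported declarations, so check the shipped `.d.ts` for the real shapes.

```typescript
// Hypothetical, simplified mirror of the three result shapes in the table.
// The real rfc-bcp47 types carry more fields (extlang, variant, extension, ...).
type SketchLangtag = {
  readonly type: 'langtag';
  readonly langtag: {
    readonly language: string;
    readonly script: string | null;
    readonly region: string | null;
  };
};
type SketchPrivateuse = {
  readonly type: 'privateuse';
  readonly privateuse: ReadonlyArray<string>;
};
type SketchGrandfathered = {
  readonly type: 'grandfathered';
  readonly grandfathered: { readonly type: string; readonly tag: string };
};
type SketchTag = SketchLangtag | SketchPrivateuse | SketchGrandfathered;

// Switching on `type` narrows the union: each branch sees only its own fields.
function describeTag(tag: SketchTag): string {
  switch (tag.type) {
    case 'langtag': {
      const { language, script, region } = tag.langtag;
      return [language, script, region].filter((part) => part !== null).join(' / ');
    }
    case 'privateuse':
      return `private use: ${tag.privateuse.join('-')}`;
    case 'grandfathered':
      return `grandfathered (${tag.grandfathered.type}): ${tag.grandfathered.tag}`;
    default: {
      // Compile-time exhaustiveness check: fails to build if a new type is added.
      const unreachable: never = tag;
      return unreachable;
    }
  }
}
```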
langtag
Build a tag from known parts without parsing a string. Validates subtags and throws RangeError on invalid input:
```typescript
import { langtag, stringify } from 'rfc-bcp47';

const tag = langtag('en', { region: 'US' });
stringify(tag); // 'en-US'

langtag('!!!'); // throws RangeError (invalid language)
```

canonicalize
Reduce equivalent tags to a single canonical form — handles case normalization, deprecated subtags, suppress-script, extlang promotion, and extension ordering using IANA registry data:
```typescript
import { canonicalize } from 'rfc-bcp47';

canonicalize('iw');      // 'he' (deprecated language)
canonicalize('zh-cmn');  // 'cmn' (extlang to preferred)
canonicalize('en-Latn'); // 'en' (suppress-script)
canonicalize('de-DD');   // 'de-DE' (deprecated region)
```

filter
Find all matching tags with subtag-aware filtering per RFC 4647 §3.3.2:
```typescript
import { filter } from 'rfc-bcp47';

const tags = ['de', 'de-DE', 'de-Latn-DE', 'de-AT', 'en-US', 'fr-FR'];
filter(tags, 'de-DE'); // ['de-DE', 'de-Latn-DE'] (skips Latn to match DE)
filter(tags, 'de');    // ['de', 'de-DE', 'de-Latn-DE', 'de-AT'] (all German)
filter(tags, '*-DE');  // ['de-DE', 'de-Latn-DE'] (* wildcard = any language)
```

lookup
Find the single best match via progressive truncation per RFC 4647 §3.4:
```typescript
import { lookup } from 'rfc-bcp47';

const tags = ['en', 'en-US', 'fr', 'de'];
lookup(tags, 'en-US-x-custom'); // 'en-US' (truncates to match)
lookup(tags, 'fr-CA');          // 'fr' (truncates to match)
lookup(tags, 'ja', 'en');       // 'en' (default fallback)
```

Pair with `acceptLanguage` for HTTP content negotiation:
```typescript
import { acceptLanguage, lookup } from 'rfc-bcp47';

const prefs = acceptLanguage('fr-CA, en-US;q=0.8, en;q=0.5');
const best = lookup(['en', 'en-US', 'fr', 'fr-CA'], prefs.map((p) => p.tag));
// 'fr-CA'
```

extensionU / extensionT
Extract Unicode locale and transformed content extensions:
```typescript
import { parse, extensionU, extensionT } from 'rfc-bcp47';

extensionU(parse('de-DE-u-co-phonebk-ca-buddhist')!);
// { attributes: [], keywords: { co: 'phonebk', ca: 'buddhist' } }

extensionT(parse('und-t-it-m0-ungegn')!);
// { source: 'it', fields: { m0: 'ungegn' } }
```

acceptLanguage
Parse HTTP Accept-Language headers:
```typescript
import { acceptLanguage } from 'rfc-bcp47';

acceptLanguage('fr-CA, en-US;q=0.8, en;q=0.5, *;q=0.1');
// [
//   { tag: 'fr-CA', quality: 1.0 },
//   { tag: 'en-US', quality: 0.8 },
//   { tag: 'en', quality: 0.5 },
//   { tag: '*', quality: 0.1 }
// ]
```

Try the interactive demo or see the `examples/` folder for more usage patterns.
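As a rough picture of what header parsing involves, the sketch below performs the steps the `acceptLanguage` example implies: split the header on commas, read an optional `q` weight (defaulting to 1), and sort by descending quality. `parseAcceptLanguage` is a hypothetical stand-in, not the library's code. It does not validate the tags, it ignores malformed `q` parameters, and dropping `q=0` entries (which RFC 9110 defines as "not acceptable") is a choice of this sketch that the real operator may not share.

```typescript
type LanguagePreference = { readonly tag: string; readonly quality: number };

// qvalue grammar from RFC 9110: 0 to 1 with at most three decimal places.
const QVALUE = /^q=(0(?:\.\d{0,3})?|1(?:\.0{0,3})?)$/i;

// Hypothetical sketch, not rfc-bcp47's implementation. Performs no
// language-tag validation.
function parseAcceptLanguage(header: string): Array<LanguagePreference> {
  const preferences: Array<LanguagePreference> = [];

  for (const entry of header.split(',')) {
    const [tag, ...parameters] = entry.split(';').map((part) => part.trim());
    if (!tag) continue; // skip empty list members such as a trailing comma

    let quality = 1; // no q parameter means full preference
    for (const parameter of parameters) {
      // A malformed q parameter (e.g. q=2) is ignored and quality stays 1.
      const match = QVALUE.exec(parameter.replace(/\s+/g, ''));
      if (match !== null) quality = Number(match[1]);
    }

    // Sketch choice: q=0 means "not acceptable", so the entry is dropped.
    if (quality > 0) preferences.push({ tag, quality });
  }

  // Array.prototype.sort is stable (ES2019+), so ties keep header order.
  return preferences.sort((a, b) => b.quality - a.quality);
}
```

With the header from the example above, this sketch produces the same four `{ tag, quality }` entries in the same order.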
Operator Reference
| Operator | Description |
|----------|-------------|
| parse(tag) | Parse a BCP 47 tag string into a structured object. Returns null for invalid input. |
| stringify(tag) | Convert a parsed tag object back into a well-formed string. |
| langtag(language, options?) | Create a langtag object with sensible defaults. Throws RangeError on invalid input. |
| canonicalize(tag) | Normalize casing, sort extensions, apply IANA registry mappings (deprecated subtags, suppress-script, extlang). Returns null for invalid input. |
| filter(tags, patterns) | Subtag-aware filtering with * wildcard support per RFC 4647 §3.3.2. Returns matched tags. |
| lookup(tags, preferences, defaultValue?) | Lookup per RFC 4647 §3.4. Returns first match or defaultValue/null. |
| extensionU(tag) | Extract Unicode locale attributes and keywords from the u extension. Takes a BCP47Tag (not a string). Returns null if absent. |
| extensionT(tag) | Extract transformed content data from the t extension. Takes a BCP47Tag (not a string). Returns null if absent. |
| acceptLanguage(header) | Parse an Accept-Language header into { tag, quality } entries sorted by descending quality. |
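The `lookup` row cites RFC 4647 §3.4, which tries each preference in order and progressively truncates it: drop the last subtag, also drop a singleton left dangling at the end, and retry until something matches or nothing is left. The sketch below implements only that loop with case-insensitive comparison. `lookupSketch` is a hypothetical name; unlike the operator, it accepts preferences only as an array (the examples above also pass a single string), skips the `*` range, and does no canonicalization.

```typescript
// Hypothetical sketch of RFC 4647 §3.4 Lookup, not rfc-bcp47's code.
function lookupSketch(
  tags: ReadonlyArray<string>,
  preferences: ReadonlyArray<string>,
  defaultValue: string | null = null,
): string | null {
  for (const preference of preferences) {
    // This sketch does not let the '*' range select anything.
    if (preference === '*') continue;

    const subtags = preference.toLowerCase().split('-');
    while (subtags.length > 0) {
      const candidate = subtags.join('-');
      // Case-insensitive match; return the tag as it was supplied.
      const match = tags.find((tag) => tag.toLowerCase() === candidate);
      if (match !== undefined) return match;

      // Truncate: drop the last subtag...
      subtags.pop();
      // ...and a singleton (such as the 'x' in 'en-US-x') left at the end.
      const last = subtags[subtags.length - 1];
      if (last !== undefined && last.length === 1) subtags.pop();
    }
  }
  return defaultValue;
}
```

Tracing `'en-US-x-custom'`: `en-us-x-custom` fails, dropping `custom` leaves a trailing `x` that is removed too, and `en-us` then matches `'en-US'`, as in the `lookup` example above.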
CLDR Key References
Typed constants mapping extension keys to human-readable descriptions, sourced from the CLDR BCP 47 data. Zero runtime cost — tree-shaken if unused.
| Constant | Description |
|----------|-------------|
| UNICODE_LOCALE_KEYS | U extension keys → descriptions (e.g. ca → 'Calendar', nu → 'Numbering system') |
| TRANSFORM_KEYS | T extension keys → descriptions (e.g. m0 → 'Transform mechanism', s0 → 'Transform source') |
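The table quotes only two `UNICODE_LOCALE_KEYS` entries, so the self-contained sketch below copies just those into a hypothetical `SKETCH_KEY_DESCRIPTIONS` map and pairs it with a simplified reader for the `-u-` extension of a tag string, following the RFC 6067 layout (optional attributes, then two-character keys, each followed by zero or more type values). It is not the library's `extensionU`, which takes a parsed `BCP47Tag`; `readUnicodeExtension` and `describeKeywords` are hypothetical names, and recording a key without a value as `'true'` follows the UTS #35 convention and may differ from what the operator returns.

```typescript
// Hypothetical stand-in for UNICODE_LOCALE_KEYS: only the two entries quoted
// in the table above. The real constant covers every CLDR key.
const SKETCH_KEY_DESCRIPTIONS: Readonly<Record<string, string>> = {
  ca: 'Calendar',
  nu: 'Numbering system',
};

type UnicodeExtensionSketch = {
  readonly attributes: ReadonlyArray<string>;
  readonly keywords: Readonly<Record<string, string>>;
};

// Simplified reader for the 'u' extension; assumes a well-formed tag.
function readUnicodeExtension(tag: string): UnicodeExtensionSketch | null {
  const subtags = tag.toLowerCase().split('-');

  // Find the 'u' singleton, but stop at 'x': after it, everything is private use.
  let start = -1;
  for (let index = 0; index < subtags.length; index++) {
    if (subtags[index] === 'x') break;
    if (subtags[index] === 'u') {
      start = index + 1;
      break;
    }
  }
  if (start === -1) return null;

  const attributes: Array<string> = [];
  const keywords: Record<string, string> = {};
  let currentKey: string | null = null;
  let currentTypes: Array<string> = [];

  // Store the pending key; a key with no type values becomes 'true' (UTS #35).
  const flush = (): void => {
    if (currentKey !== null) {
      keywords[currentKey] = currentTypes.length > 0 ? currentTypes.join('-') : 'true';
    }
  };

  for (let index = start; index < subtags.length; index++) {
    const subtag = subtags[index] ?? '';
    if (subtag.length === 1) break; // the next singleton ends the extension
    if (subtag.length === 2) {
      flush(); // two characters: a new key
      currentKey = subtag;
      currentTypes = [];
    } else if (currentKey === null) {
      attributes.push(subtag); // 3-8 characters before any key: attribute
    } else {
      currentTypes.push(subtag); // 3-8 characters after a key: type value
    }
  }
  flush();

  return { attributes, keywords };
}

// Label each keyword with its description, falling back to the raw key.
function describeKeywords(extension: UnicodeExtensionSketch): Array<string> {
  return Object.entries(extension.keywords).map(
    ([key, value]) => `${SKETCH_KEY_DESCRIPTIONS[key] ?? key}: ${value}`,
  );
}
```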
Choosing an Operator
| I want to... | Use |
|--------------|-----|
| Validate a language tag string | parse(tag) !== null |
| Read subtags (language, script, region) | parse(tag) → check .type === 'langtag', then access .langtag.* |
| Build a tag from known parts | langtag(language, options) → stringify(tag) |
| Normalize casing and deprecated subtags | canonicalize(tag) |
| Read Unicode locale preferences (calendar, collation) | parse(tag) → extensionU(parsedTag) |
| Read transformed content metadata (source language) | parse(tag) → extensionT(parsedTag) |
| Find all locales matching a preference | filter(tags, patterns) |
| Pick the single best locale for a user | lookup(tags, preferences, defaultValue) |
| Parse an HTTP Accept-Language header | acceptLanguage(header) → lookup() or filter() |
Note: `canonicalize` and `acceptLanguage` take strings, while `extensionU` and `extensionT` take a pre-parsed `BCP47Tag` from `parse()`. This avoids re-parsing when you need multiple operations on the same tag.
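For finding all locales that match a preference, `filter` is described as following RFC 4647 §3.3.2 extended filtering. A minimal sketch of that algorithm: compare the primary subtags (or accept a leading `*`), then walk the range, letting `*` match anything, skipping non-matching tag subtags, and failing if a singleton is reached first. `extendedFilterMatches` and `filterSketch` are hypothetical names; the sketch assumes well-formed input, compares case-insensitively, and is not the package's implementation.

```typescript
// Hypothetical sketch of RFC 4647 §3.3.2 extended filtering for one range.
function extendedFilterMatches(range: string, tag: string): boolean {
  const rangeSubtags = range.toLowerCase().split('-');
  const tagSubtags = tag.toLowerCase().split('-');

  // Primary subtags must match unless the range starts with '*'.
  if (rangeSubtags[0] !== '*' && rangeSubtags[0] !== tagSubtags[0]) return false;

  let rangeIndex = 1;
  let tagIndex = 1;
  while (rangeIndex < rangeSubtags.length) {
    const rangeSubtag = rangeSubtags[rangeIndex];
    if (rangeSubtag === '*') {
      rangeIndex++; // a wildcard matches zero or more tag subtags
      continue;
    }
    if (tagIndex >= tagSubtags.length) return false; // the tag ran out first
    const tagSubtag = tagSubtags[tagIndex] ?? '';
    if (rangeSubtag === tagSubtag) {
      rangeIndex++;
      tagIndex++;
    } else if (tagSubtag.length === 1) {
      return false; // never skip past a singleton
    } else {
      tagIndex++; // skip a non-matching tag subtag, e.g. 'Latn'
    }
  }
  return true; // every range subtag was satisfied
}

// Keep each tag that matches at least one range, preserving input order.
function filterSketch(tags: ReadonlyArray<string>, ranges: ReadonlyArray<string>): Array<string> {
  return tags.filter((tag) => ranges.some((range) => extendedFilterMatches(range, tag)));
}
```

Run against the tag list from the `filter` example, `filterSketch(tags, ['de-DE'])` keeps `'de-DE'` and `'de-Latn-DE'` but not `'de'`, because the range needs a region subtag that `'de'` lacks.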
Changelog
See CHANGELOG.md for breaking changes and release history.
