rfc-bcp47

v1.5.0

Published

14 days ago

Zero-dependency BCP 47 language tag parser, normalizer and matcher for JavaScript and TypeScript


bcp47 parser internationalization i18n language locale language-tag ietf rfc4647 rfc5646 normalize match filter lookup accept-language wcag

Built with AI, end to end. Every line in this repo — source, tests, docs, release scripts — is generated, reviewed, and verified through Claude Code.
Strict conventions enforced by the agent. .claude/rules/ pin TypeScript strict mode, Array<T>, readonly by default, no any, semantic naming (no data / result / item), and 18 testing rules — applied on every change without manual review.
Bundled Claude Code skills (commit, verify, review, release, testing) drive the full lifecycle: lint → build → test → conventional commit → tag-triggered release with SLSA provenance via OIDC trusted publishing, powered by npm-trust. The /solo-npm:release skill itself comes from the solo-npm marketplace plugin.
Co-located Vitest specs for every operator (parse, canonicalize, filter, lookup, extensionU / extensionT, acceptLanguage). The agent writes the failing test first, then makes it pass — pnpm test is the regression net, not a checkbox.
PRs disabled by design. Contributors can't push code directly; the maintainer takes issues and discussions through Claude Code. See CONTRIBUTING.md for the full process.

Parse any BCP 47 language tag into a structured, typed object
Stringify a tag object back into a well-formed language tag string
Canonicalize with case normalization and IANA registry data (deprecated subtags, suppress-script, extlang)
Match language tags with filter and lookup per RFC 4647
Extension U/T extraction for Unicode locales (RFC 6067) and transformed content (RFC 6497)
Accept-Language header parsing per RFC 9110
WCAG-ready — use parse() to validate lang attributes per WCAG 2.x SC 3.1.1 and SC 3.1.2
TypeScript-first with full type inference and strict types out of the box
Zero dependencies, tree-shakeable, works in Node.js and browsers

Install

npm install rfc-bcp47

Operators

Tree-shakeable operators — import only what you need.

parse / stringify

import { parse, stringify } from 'rfc-bcp47';

const tag = parse('en-Latn-US');

if (tag?.type === 'langtag') {
  tag.langtag.language  // 'en'
  tag.langtag.script    // 'Latn'
  tag.langtag.region    // 'US'
}

stringify(tag!);  // 'en-Latn-US'

parse('invalid!');  // null

parse returns one of three tag types or null for invalid input:

| type | When | Fields available |
|------|------|------------------|
| 'langtag' | Standard language tags (en-US, zh-Hant-TW) | langtag.language, langtag.script, langtag.region, langtag.extlang, langtag.variant, langtag.extension, langtag.privateuse |
| 'privateuse' | Private use tags (x-custom) | privateuse |
| 'grandfathered' | Legacy registered tags (i-klingon) | grandfathered.type, grandfathered.tag |

langtag

Build a tag from known parts without parsing a string. Validates subtags and throws RangeError on invalid input:

import { langtag, stringify } from 'rfc-bcp47';

const tag = langtag('en', { region: 'US' });
stringify(tag);  // 'en-US'

langtag('!!!');  // RangeError — invalid language

canonicalize

Reduce equivalent tags to a single canonical form — handles case normalization, deprecated subtags, suppress-script, extlang promotion, and extension ordering using IANA registry data:

import { canonicalize } from 'rfc-bcp47';

canonicalize('iw');          // 'he'       (deprecated language)
canonicalize('zh-cmn');      // 'cmn'      (extlang to preferred)
canonicalize('en-Latn');     // 'en'       (suppress-script)
canonicalize('de-DD');       // 'de-DE'    (deprecated region)
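
To make the case-normalization step concrete, here is a minimal standalone sketch of the casing convention from RFC 5646 §2.1.1: everything is lowercase, except that four-letter subtags are title-cased and two-letter subtags are upper-cased when they are neither first in the tag nor placed after a singleton. The function name normalizeCaseSketch is hypothetical and this is not the library's implementation; it does not consult the IANA registry, so mappings such as 'iw' → 'he' are out of scope here.

```typescript
// Hypothetical helper: illustrates RFC 5646 §2.1.1 casing only.
// It performs no validation and no registry-based replacement.
function normalizeCaseSketch(tag: string): string {
  let afterSingleton = false;

  return tag
    .split('-')
    .map((subtag, index) => {
      const lower = subtag.toLowerCase();

      // A single-character subtag (e.g. 'u', 't', 'x') starts an extension
      // or private-use sequence; everything after it stays lowercase.
      if (lower.length === 1) {
        afterSingleton = true;
        return lower;
      }
      if (index === 0 || afterSingleton) return lower;

      // Four letters: title case (script position, e.g. 'Latn').
      if (/^[a-z]{4}$/.test(lower)) {
        return lower.charAt(0).toUpperCase() + lower.slice(1);
      }
      // Two letters: upper case (region position, e.g. 'US').
      if (/^[a-z]{2}$/.test(lower)) return lower.toUpperCase();

      return lower;
    })
    .join('-');
}

normalizeCaseSketch('EN-latn-us');      // 'en-Latn-US'
normalizeCaseSketch('az-latn-X-LATN');  // 'az-Latn-x-latn'
```

Tracking the singleton is what keeps 'latn' lowercase in the second call: after 'x', the subtag is private use rather than a script.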

filter

Find all matching tags with subtag-aware filtering per RFC 4647 §3.3.2:

import { filter } from 'rfc-bcp47';

const tags = ['de', 'de-DE', 'de-Latn-DE', 'de-AT', 'en-US', 'fr-FR'];

filter(tags, 'de-DE');   // ['de-DE', 'de-Latn-DE']  (skips Latn to match DE)
filter(tags, 'de');      // ['de', 'de-DE', 'de-Latn-DE', 'de-AT']  (all German)
filter(tags, '*-DE');    // ['de-DE', 'de-Latn-DE']  (* wildcard = any language)
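
The results above follow from the extended filtering algorithm in RFC 4647 §3.3.2. As a rough illustration of why 'de-DE' matches 'de-Latn-DE', here is a standalone sketch of that algorithm. The names matchesExtendedRange and filterSketch are hypothetical, the sketch handles a single range only, and it is not the library's source.

```typescript
// Hypothetical helper: one language range against one tag, RFC 4647 §3.3.2.
function matchesExtendedRange(range: string, tag: string): boolean {
  const rangeSubtags = range.toLowerCase().split('-');
  const tagSubtags = tag.toLowerCase().split('-');

  // Step 2: the first subtags must match unless the range starts with '*'.
  if (rangeSubtags[0] !== '*' && rangeSubtags[0] !== tagSubtags[0]) return false;

  let rangeIndex = 1;
  let tagIndex = 1;

  while (rangeIndex < rangeSubtags.length) {
    const rangeSubtag = rangeSubtags[rangeIndex];

    // 3A: a wildcard inside the range matches any run of subtags.
    if (rangeSubtag === '*') {
      rangeIndex++;
      continue;
    }

    // 3B: the tag ran out of subtags before the range did.
    const tagSubtag = tagSubtags[tagIndex];
    if (tagSubtag === undefined) return false;

    // 3C: subtags agree, so advance both.
    if (tagSubtag === rangeSubtag) {
      rangeIndex++;
      tagIndex++;
      continue;
    }

    // 3D: never skip past a singleton (extension or private-use marker).
    if (tagSubtag.length === 1) return false;

    // 3E: skip this tag subtag (this is how 'Latn' is skipped to reach 'DE').
    tagIndex++;
  }

  return true;
}

function filterSketch(tags: ReadonlyArray<string>, range: string): Array<string> {
  return tags.filter((tag) => matchesExtendedRange(range, tag));
}

filterSketch(['de', 'de-DE', 'de-Latn-DE', 'de-AT'], 'de-DE');
// ['de-DE', 'de-Latn-DE']
```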

lookup

Find the single best match via progressive truncation per RFC 4647 §3.4:

import { lookup } from 'rfc-bcp47';

const tags = ['en', 'en-US', 'fr', 'de'];

lookup(tags, 'en-US-x-custom');  // 'en-US' (truncates to match)
lookup(tags, 'fr-CA');           // 'fr'    (truncates to match)
lookup(tags, 'ja', 'en');        // 'en'    (default fallback)
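
Progressive truncation can be sketched in a few lines. The following standalone function is an illustration of RFC 4647 §3.4 for a single language range, not the library's implementation; lookupSketch and its parameter names are hypothetical.

```typescript
// Hypothetical helper: RFC 4647 §3.4 lookup for one language range.
function lookupSketch(
  availableTags: ReadonlyArray<string>,
  range: string,
  fallback: string | null = null,
): string | null {
  // Compare case-insensitively, but return the tag as the caller spelled it.
  const available = new Map(
    availableTags.map((tag) => [tag.toLowerCase(), tag] as const),
  );
  const subtags = range.toLowerCase().split('-');

  while (subtags.length > 0) {
    const match = available.get(subtags.join('-'));
    if (match !== undefined) return match;

    // Drop the last subtag and try again.
    subtags.pop();

    // A trailing single-character subtag (e.g. 'x' or 'u') is dropped as
    // well, since a range must not end in a singleton.
    while (subtags.length > 0 && subtags[subtags.length - 1]?.length === 1) {
      subtags.pop();
    }
  }

  return fallback;
}

const available = ['en', 'en-US', 'fr', 'de'];
lookupSketch(available, 'en-US-x-custom');  // 'en-US'
lookupSketch(available, 'fr-CA');           // 'fr'
lookupSketch(available, 'ja', 'en');        // 'en'
```

For 'en-US-x-custom' the candidates tried are 'en-us-x-custom', then 'en-us' (the 'x' is removed together with 'custom'), which matches.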

Pair with acceptLanguage for HTTP content negotiation:

import { acceptLanguage, lookup } from 'rfc-bcp47';

const prefs = acceptLanguage('fr-CA, en-US;q=0.8, en;q=0.5');
const best = lookup(['en', 'en-US', 'fr', 'fr-CA'], prefs.map((p) => p.tag));
// 'fr-CA'

extensionU / extensionT

Extract Unicode locale and transformed content extensions:

import { parse, extensionU, extensionT } from 'rfc-bcp47';

extensionU(parse('de-DE-u-co-phonebk-ca-buddhist')!);
// { attributes: [], keywords: { co: 'phonebk', ca: 'buddhist' } }

extensionT(parse('und-t-it-m0-ungegn')!);
// { source: 'it', fields: { m0: 'ungegn' } }
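
For readers curious how the u extension breaks down, here is a standalone sketch based on the RFC 6067 layout: after the 'u' singleton come optional attributes (3 to 8 characters), then keywords made of a two-character key followed by zero or more type subtags. The name extractUnicodeExtensionSketch is hypothetical, the function works on a raw string rather than a parsed tag, and storing an empty string for a key without a type is an assumption of this sketch, not documented library behaviour.

```typescript
interface UnicodeExtensionSketch {
  attributes: Array<string>;
  keywords: Record<string, string>;
}

// Hypothetical helper: illustrates the RFC 6067 subtag layout only.
function extractUnicodeExtensionSketch(tag: string): UnicodeExtensionSketch | null {
  const subtags = tag.toLowerCase().split('-');

  // Locate the 'u' singleton. Anything after 'x' is private use, so stop there.
  let start = -1;
  for (let index = 1; index < subtags.length; index++) {
    if (subtags[index] === 'x') break;
    if (subtags[index] === 'u') {
      start = index + 1;
      break;
    }
  }
  if (start === -1) return null;

  const attributes: Array<string> = [];
  const keywords: Record<string, string> = {};
  let currentKey: string | null = null;

  for (let index = start; index < subtags.length; index++) {
    const subtag = subtags[index];
    // The next singleton ends the u extension.
    if (subtag === undefined || subtag.length === 1) break;

    if (subtag.length === 2) {
      // Two characters: a new keyword key.
      currentKey = subtag;
      keywords[currentKey] = '';
    } else if (currentKey === null) {
      // Longer subtags before any key are attributes.
      attributes.push(subtag);
    } else {
      // Longer subtags after a key extend that key's type.
      const existing = keywords[currentKey] ?? '';
      keywords[currentKey] = existing === '' ? subtag : `${existing}-${subtag}`;
    }
  }

  return { attributes, keywords };
}

extractUnicodeExtensionSketch('de-DE-u-co-phonebk-ca-buddhist');
// { attributes: [], keywords: { co: 'phonebk', ca: 'buddhist' } }
```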

acceptLanguage

Parse HTTP Accept-Language headers:

import { acceptLanguage } from 'rfc-bcp47';

acceptLanguage('fr-CA, en-US;q=0.8, en;q=0.5, *;q=0.1');
// [
//   { tag: 'fr-CA', quality: 1.0 },
//   { tag: 'en-US', quality: 0.8 },
//   { tag: 'en',    quality: 0.5 },
//   { tag: '*',     quality: 0.1 }
// ]
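
The header grammar itself is small: comma-separated language ranges, each with an optional ;q= weight that defaults to 1 (RFC 9110 §12.5.4). The sketch below shows one way to parse and sort it. parseAcceptLanguageSketch is a hypothetical name, and dropping entries with q=0 or an unparsable weight is an assumption of this sketch rather than a statement about what acceptLanguage does.

```typescript
interface LanguagePreference {
  readonly tag: string;
  readonly quality: number;
}

// Hypothetical helper: a simplified Accept-Language parser.
function parseAcceptLanguageSketch(header: string): Array<LanguagePreference> {
  return header
    .split(',')
    .map((entry) => {
      const [languageRange = '', ...parameters] = entry
        .split(';')
        .map((part) => part.trim());

      // A missing weight means quality 1.
      const weight = parameters.find((parameter) =>
        parameter.toLowerCase().startsWith('q='),
      );
      const quality = weight === undefined ? 1 : Number.parseFloat(weight.slice(2));

      return { tag: languageRange, quality };
    })
    // Assumption: discard empty ranges, q=0 ("not acceptable"), and bad weights.
    .filter(({ tag, quality }) => tag !== '' && quality > 0 && quality <= 1)
    // Array.prototype.sort is stable, so equal weights keep header order.
    .sort((first, second) => second.quality - first.quality);
}

parseAcceptLanguageSketch('en;q=0.5, fr-CA, de;q=0');
// [ { tag: 'fr-CA', quality: 1 }, { tag: 'en', quality: 0.5 } ]
```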

Try the interactive demo or see the examples/ folder for more usage patterns.

Operator Reference

| Operator | Description |
|----------|-------------|
| parse(tag) | Parse a BCP 47 tag string into a structured object. Returns null for invalid input. |
| stringify(tag) | Convert a parsed tag object back into a well-formed string. |
| langtag(language, options?) | Create a langtag object with sensible defaults. Throws RangeError on invalid input. |
| canonicalize(tag) | Normalize casing, sort extensions, apply IANA registry mappings (deprecated subtags, suppress-script, extlang). Returns null for invalid input. |
| filter(tags, patterns) | Subtag-aware filtering with * wildcard support per RFC 4647 §3.3.2. Returns matched tags. |
| lookup(tags, preferences, defaultValue?) | Lookup per RFC 4647 §3.4. Returns first match or defaultValue/null. |
| extensionU(tag) | Extract Unicode locale attributes and keywords from the u extension. Takes a BCP47Tag (not a string). Returns null if absent. |
| extensionT(tag) | Extract transformed content data from the t extension. Takes a BCP47Tag (not a string). Returns null if absent. |
| acceptLanguage(header) | Parse an Accept-Language header into sorted { tag, quality } entries. |

CLDR Key References

Typed constants mapping extension keys to human-readable descriptions, sourced from the CLDR BCP 47 data. Zero runtime cost — tree-shaken if unused.

| Constant | Description |
|----------|-------------|
| UNICODE_LOCALE_KEYS | U extension keys → descriptions (e.g. ca → 'Calendar', nu → 'Numbering system') |
| TRANSFORM_KEYS | T extension keys → descriptions (e.g. m0 → 'Transform mechanism', s0 → 'Transform source') |

Choosing an Operator

| I want to... | Use |
|--------------|-----|
| Validate a language tag string | parse(tag) !== null |
| Read subtags (language, script, region) | parse(tag) → access .langtag.* |
| Build a tag from known parts | langtag(language, options) → stringify(tag) |
| Normalize casing and deprecated subtags | canonicalize(tag) |
| Read Unicode locale preferences (calendar, collation) | parse(tag) → extensionU(parsedTag) |
| Read transformed content metadata (source language) | parse(tag) → extensionT(parsedTag) |
| Find all locales matching a preference | filter(tags, patterns) |
| Pick the single best locale for a user | lookup(tags, preferences, defaultValue) |
| Parse an HTTP Accept-Language header | acceptLanguage(header) → lookup() or filter() |

Note: canonicalize and acceptLanguage take strings. extensionU and extensionT take a pre-parsed BCP47Tag from parse(). This avoids re-parsing when you need multiple operations on the same tag.

Changelog

See CHANGELOG.md for breaking changes and release history.