guessit-js
v4.0.0
Extract metadata (title, year, season, episode, codec, language, etc.) from media filenames — TypeScript port of Python guessit
Features
- 100% compatibility with Python guessit (1036/1036 fixtures passing) — and more correct in places: ships fixes for 32 upstream guessit bugs that Python still has (see Differences from Python)
- 3.5x faster than Python (6.87ms vs 23.86ms per parse)
- 50 properties detected: title, year, season, episode, resolution, codec, language, and more — with a machine-readable schema
- Single dependency (rebulk-js)
- Dual format: ESM and CommonJS
- TypeScript: full, precise type definitions (typed GuessItResult, enum'd value fields)
- WASM: runs in any WASI-compatible runtime, bit-identical to the JS build
Install
npm install guessit-js
Usage
import { guessit } from 'guessit-js';
const result = guessit('The.Dark.Knight.2008.1080p.BluRay.x264-GROUP.mkv');
// {
// title: 'The Dark Knight',
// year: 2008,
// screen_size: '1080p',
// source: 'Blu-ray',
// video_codec: 'H.264',
// release_group: 'GROUP',
// container: 'mkv',
// type: 'movie'
// }
// Short titles are handled correctly too:
guessit('X2.2003.720p.DSNP.WEB-DL.DDP5.1.H.264-EVO.mkv');
// { title: 'X2', year: 2003, screen_size: '720p',
// streaming_service: 'Disney+', source: 'Web', ... }
CommonJS
const { guessit } = require('guessit-js');
const result = guessit('Breaking.Bad.S01E02.720p.BluRay.x264-DEMAND.mkv');
console.log(result.title); // 'Breaking Bad'
console.log(result.season); // 1
console.log(result.episode); // 2
Options
guessit('file.mkv', { type: 'episode' });
guessit('my 720p show S01E02', { expected_title: ['my 720p show'] });
guessit('file.mkv', { allowed_languages: ['en', 'fr'] });
guessit('file.mkv', { excludes: ['release_group'] });
Detected Properties
| Category | Properties |
|----------|-----------|
| Title | title, alternative_title, episode_title |
| Episode | season, episode, episode_details, episode_count, season_count, absolute_episode, disc, part |
| Date | year, date |
| Video | screen_size, aspect_ratio, frame_rate, video_codec, video_profile, color_depth |
| Audio | audio_codec, audio_profile, audio_channels, audio_bit_rate |
| Source | source, streaming_service |
| Release | release_group, edition, other, proper_count |
| File | container (video / subtitle / archive / image / nfo / torrent / nzb), mimetype, size, crc32, uuid |
| Metadata | language, subtitle_language, country, type |
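As a rough illustration of how the categories above can be used downstream, the sketch below buckets the keys of a parse result by category. The `CATEGORY_OF` map is a hand-copied subset of the table, and `groupByCategory` is a hypothetical helper, not something guessit-js exports. The sample object is the result literal shown in the Usage section, so the block runs without the package installed.

```typescript
// Subset of the category table above, copied by hand (assumption: not exported by guessit-js).
const CATEGORY_OF: Record<string, string> = {
  title: 'Title', alternative_title: 'Title', episode_title: 'Title',
  season: 'Episode', episode: 'Episode',
  year: 'Date', date: 'Date',
  screen_size: 'Video', video_codec: 'Video',
  audio_codec: 'Audio', audio_channels: 'Audio',
  source: 'Source', streaming_service: 'Source',
  release_group: 'Release', edition: 'Release', other: 'Release',
  container: 'File', mimetype: 'File',
  language: 'Metadata', subtitle_language: 'Metadata', country: 'Metadata', type: 'Metadata',
};

// Hypothetical helper: group the keys of a result object by their category.
function groupByCategory(result: Record<string, unknown>): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const key of Object.keys(result)) {
    const category = CATEGORY_OF[key] !== undefined ? CATEGORY_OF[key] : 'Unknown';
    if (groups[category] === undefined) {
      groups[category] = [];
    }
    groups[category].push(key);
  }
  return groups;
}

// Result literal taken from the Usage section; with the package installed,
// this would be the return value of guessit(...).
const darkKnight = {
  title: 'The Dark Knight',
  year: 2008,
  screen_size: '1080p',
  source: 'Blu-ray',
  video_codec: 'H.264',
  release_group: 'GROUP',
  container: 'mkv',
  type: 'movie',
};

const grouped = groupByCategory(darkKnight);
console.log(grouped);
```

Keys that are missing from the map fall into an 'Unknown' bucket rather than being dropped, which makes gaps in the hand-copied map easy to spot.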
Output schema
The result is fully typed — import GuessItResult for autocomplete and type-checking:
import { guessit, properties, GUESSIT_SCHEMA, type GuessItResult } from 'guessit-js';
const r: GuessItResult = guessit('The.Dark.Knight.2008.1080p.BluRay.x264-GRP.mkv');
r.source; // typed as the closed enum: "Blu-ray" | "Web" | "HDTV" | …
properties(); // { source: ["Blu-ray", "Web", …], type: ["episode","movie"], … } for all 50 properties
GUESSIT_SCHEMA.source.enum; // the allowed values, programmatically
- properties(): returns every emittable property with its possible values (value-constrained props list their full enum; free/computed props list [null]), mirroring Python guessit's properties().
- GUESSIT_SCHEMA: the machine-readable schema (type, cardinality, enum) for all properties.
- docs/output-schema.json: a JSON Schema (draft-07) of the output, for validating results or generating clients in other languages.
Regenerate the schema (after parsing changes) with npm run schema. A test (test/schema.test.ts) guarantees it never goes stale — every value emitted across the corpus must be in the schema.
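The "every emitted value must be in the schema" rule can be approximated in user code. The sketch below assumes a map shaped like the output of properties() as described above: property name to an array of allowed values, with [null] marking a free or computed property. `findUnknownValues` is a hypothetical helper, and the `allowed` map is a small illustrative subset rather than the real schema.

```typescript
// Assumed shape of properties(): property name -> allowed values, [null] = unconstrained.
type PropertyValues = Record<string, Array<string | number | null>>;

// Hypothetical helper: list every key or value in `result` that the map does not allow.
function findUnknownValues(
  result: Record<string, unknown>,
  allowed: PropertyValues,
): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(result)) {
    const values = allowed[key];
    if (values === undefined) {
      problems.push(key + ': not in schema');
      continue;
    }
    // Free/computed properties list [null] and accept any value.
    if (values.length === 1 && values[0] === null) {
      continue;
    }
    // Treat single values and arrays of values uniformly.
    const raw = result[key];
    const emitted: unknown[] = Array.isArray(raw) ? raw : [raw];
    for (const value of emitted) {
      if (values.indexOf(value as string | number | null) === -1) {
        problems.push(key + ': unexpected value ' + JSON.stringify(value));
      }
    }
  }
  return problems;
}

// Illustrative subset only; the real map comes from properties().
const allowed: PropertyValues = {
  title: [null],
  year: [null],
  source: ['Blu-ray', 'Web', 'HDTV'],
  type: ['episode', 'movie'],
};

console.log(findUnknownValues({ title: 'X2', year: 2003, source: 'Web', type: 'movie' }, allowed));
console.log(findUnknownValues({ title: 'X2', source: 'Betamax' }, allowed));
```

For validation outside JavaScript, the JSON Schema file is likely the better fit; this in-process check is only useful where the map is already loaded.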
REST API
npm start # port 3847
curl "http://localhost:3847/api/guessit?filename=Movie.2024.1080p.mkv"
WASM
For non-JS environments (Rust, Go, C++, edge compute). Uses Javy (QuickJS → WASM).
npm run wasm
echo '{"filename":"Movie.2024.1080p.mkv"}' | wasmtime wasm/guessit.wasm
The WASM build is bit-identical to the JS build across the entire test corpus (1026/1026, including accented titles), verified by test/wasm-full.test.ts.
Differences from Python guessit
guessit-js is a faithful port (1036/1036 fixtures match Python 3.8.0), but it is not bug-for-bug identical — where Python has a genuine parsing bug, guessit-js is corrected. Highlights:
- 32 upstream guessit bugs fixed that Python still gets wrong, e.g. Us.2019 (title vs country US), The.Collector (title vs edition), X2.2003… (short title), grown-ish…[eztv] (hyphenated title split), cd-matching mid-hash, and source/codec/extension tokens leaking into release_group/title. Full ledger: docs/upstream-issues.md.
- More properties / better detection: imdb_id/tmdb_id/tvdb_id, volume, archive & image containers (.rar/.7z/.jpg…), artwork classification (poster/fanart → other), VR / Opening-Ending credits, month-name dates, CJK season/episode markers, and detection of Telugu/Spanish that Python misses.
- Typed, schema-described output: a precise GuessItResult interface, a complete properties() (Python's is partial), and a JSON Schema.
- No Python runtime: zero runtime dependencies, ESM + CJS + WASM, ~3.5x faster.
- Intentional divergences (cases where guessit-js is more correct than Python) are catalogued per-example in docs/python-parity.md.
Known issues
- Music files are not supported (#599): Artist - Album/01 Track.flac is parsed with the video vocabulary (title / alternative_title), not artist/album/track. Out of scope for now.
- No composite quality field (#802): screen_size, source and video_codec are returned separately, not combined into one string (the request is underspecified).
- A few ambiguous anime conventions remain (#690/#696/#747): e.g. Re ZERO …- Season 2 - 15, romaji + (English title); there is no unambiguous correct parse.
- 12 debatable cases vs Python (neither clearly right) are listed under "② NEUTRAL" in docs/python-parity.md.
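For the missing composite quality field, callers can combine the three separate properties themselves. The sketch below is one possible convention, not a guessit-js API: `compositeQuality` and `QualityParts` are hypothetical names, the space separator is an arbitrary choice, and each field is assumed to be a single string when present.

```typescript
// Assumed shape: the three fields are optional single strings on the result.
interface QualityParts {
  screen_size?: string;
  source?: string;
  video_codec?: string;
}

// Hypothetical helper: join whichever of the three fields are present, in a fixed order.
function compositeQuality(parts: QualityParts, separator: string = ' '): string {
  return [parts.screen_size, parts.source, parts.video_codec]
    .filter((part): part is string => typeof part === 'string' && part.length > 0)
    .join(separator);
}

console.log(compositeQuality({ screen_size: '1080p', source: 'Blu-ray', video_codec: 'H.264' }));
// 1080p Blu-ray H.264
console.log(compositeQuality({ screen_size: '720p' }));
// 720p
```

Missing fields are skipped rather than rendered as placeholders, so a filename with no detectable source still yields a usable string.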
Performance
Warm per-parse, measured on one machine (absolute numbers are hardware-dependent — the live demo times it in your browser):
| Runtime | ms/parse | Notes |
|---------|----------|-------|
| Browser (V8) | ~2–3 ms | fastest; JIT-compiled |
| Node.js 22 (V8) | ~4.4 ms | ~3.5× faster than Python (same machine); recommended for servers/CLI |
| Python 3.8 | ~15.5 ms | reference |
| WASM (QuickJS/Javy) | ~35 ms warm · ~150 ms cold | for portability, not speed (see below) |
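Since the absolute numbers depend on hardware, a small harness can reproduce a warm per-parse figure locally. This is a minimal sketch and not the benchmark used for the table: `warmMsPerParse` is a hypothetical helper, the round counts are arbitrary, and `stubParse` is a stand-in so the block runs without the package. To time the real parser, pass guessit in place of the stub.

```typescript
// Hypothetical harness: average milliseconds per call after a warm-up phase.
function warmMsPerParse(
  parseFn: (filename: string) => unknown,
  filenames: string[],
  rounds: number = 50,
  warmupRounds: number = 5,
): number {
  if (filenames.length === 0 || rounds <= 0) {
    throw new Error('need at least one filename and one round');
  }
  // Warm-up: let the JIT compile hot paths before timing starts.
  for (let i = 0; i < warmupRounds; i++) {
    for (const filename of filenames) {
      parseFn(filename);
    }
  }
  const start = performance.now();
  for (let i = 0; i < rounds; i++) {
    for (const filename of filenames) {
      parseFn(filename);
    }
  }
  const elapsedMs = performance.now() - start;
  return elapsedMs / (rounds * filenames.length);
}

// Stand-in parser (assumption): replace with guessit from 'guessit-js' for real numbers.
const stubParse = (filename: string): string[] => filename.split('.');

const msPerParse = warmMsPerParse(stubParse, [
  'The.Dark.Knight.2008.1080p.BluRay.x264-GROUP.mkv',
  'Breaking.Bad.S01E02.720p.BluRay.x264-DEMAND.mkv',
]);
console.log(msPerParse.toFixed(4) + ' ms/parse');
```

Timing the whole loop and dividing keeps timer resolution from dominating the result, which matters when a single parse takes only a few milliseconds.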
About the WASM build. It exists so you can run guessit in environments without a JS engine (Rust, Go, C/C++, edge/WASI runtimes). Javy compiles the bundle to QuickJS, which is an interpreter (no JIT), so per-parse compute (~35 ms) is slower than V8 and even than Python — that's inherent to the engine, not the code. If you want speed, use the Node/browser build (V8). If you need WASM, the levers that actually help:
- Amortize startup — instantiate the module once and parse many filenames; most of the ~150 ms single-shot cost is wasmtime + module init, not parsing.
- AOT-compile the module (wasmtime compile guessit.wasm -o guessit.cwasm) to skip per-run JIT of the wasm itself (~30 ms off cold start).
- The ~35 ms warm floor is QuickJS interpretation; beating it would require a JIT-capable WASI JS engine (none production-ready) or a native port, which is out of scope. WASM correctness is bit-identical to the JS build.
License
LGPL-3.0
