vin-test-data
v0.2.0
Hundreds of thousands of real, validated VIN numbers with metadata for testing. Ships with optional corgi-based VIN decoding.
vin-test-data
Hundreds of thousands of real, validated VIN numbers — bundled with the
package and ready for use in tests, fixtures, demos, and decoder
benchmarks. Optionally decodes any returned VIN with
@cardog/corgi.
Why
Building or testing anything that touches VINs (forms, decoders, OCR pipelines, NCIC code mappers) needs realistic VIN strings. Generating them is hard because real VINs encode WMI / VDS / VIS rules and a check-digit. This package ships ~547k pre-validated, real-world VINs with their make / model / year / body type / fuel type, so you can:
- Pull N random VINs in tests with getRandomVins(N).
- Filter by make, model, year, body type, fuel type.
- Decode any VIN end-to-end through corgi without wiring it up yourself.
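To illustrate why hand-rolled VINs are hard to get right, here is a minimal sketch of the position-9 check-digit calculation used for North American VINs. It is not corgi's implementation and covers only the check digit, not the WMI / VDS / VIS rules; computeCheckDigit and hasValidCheckDigit are hypothetical helper names, not exports of this package.

```typescript
// Letter values for the check-digit calculation. I, O and Q never appear in a VIN.
const TRANSLITERATION: Record<string, number> = {
  A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8,
  J: 1, K: 2, L: 3, M: 4, N: 5, P: 7, R: 9,
  S: 2, T: 3, U: 4, V: 5, W: 6, X: 7, Y: 8, Z: 9,
};

// Positional weights; position 9 (index 8) is the check digit itself, so its weight is 0.
const WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2];

// Returns the expected check character, or null if the input is not a 17-char VIN.
function computeCheckDigit(vin: string): string | null {
  const chars = vin.toUpperCase();
  if (!/^[A-HJ-NPR-Z0-9]{17}$/.test(chars)) return null;

  let sum = 0;
  for (let i = 0; i < 17; i++) {
    const c = chars[i];
    const value = c >= '0' && c <= '9' ? Number(c) : TRANSLITERATION[c];
    sum += value * WEIGHTS[i];
  }

  // A remainder of 10 is written as the letter X.
  const remainder = sum % 11;
  return remainder === 10 ? 'X' : String(remainder);
}

// True when the character in position 9 matches the computed check digit.
function hasValidCheckDigit(vin: string): boolean {
  const expected = computeCheckDigit(vin);
  return expected !== null && vin.toUpperCase()[8] === expected;
}
```

A string that passes this check can still be an invalid VIN for other reasons, which is the gap a pre-validated dataset is meant to close.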
Install
Requires Node 18 or newer.
npm install vin-test-data
# or
yarn add vin-test-data
@cardog/corgi is a regular dependency, so decoding works out of the
box on Node.
CLI
The package ships a vin-test-data binary, so you can pull VINs and
decode them straight from the shell — handy for quick smoke tests,
seeding test fixtures, or piping into other tools.
# Try without installing
npx vin-test-data random 5
# Or globally
npm install -g vin-test-data
vin-test-data random 5
Common commands
vin-test-data random 5 # 5 random VINs
vin-test-data random 5 --json # ...as JSON
vin-test-data random 20 --make Ford --year 2012 # 20 random Ford 2012 VINs
vin-test-data random 3 --decode # rich per-VIN decoded summary
vin-test-data random 3 --decode --json # full corgi DecodeResult per VIN
vin-test-data random 100 --vins-only > vins.txt # 100 plain VIN strings, one per line
vin-test-data find --make BMW --year-min 2010 --year-max 2015 --limit 10
vin-test-data find --make Toyota --vins-only --limit 50
vin-test-data decode 5UXFA13564LU32275 # detailed decoded breakdown
vin-test-data decode 5UXFA13564LU32275 --json # full corgi DecodeResult (every VDS pattern, etc.)
vin-test-data makes # list all 54 makes
vin-test-data years # list all years (1982–2015)
vin-test-data models --json # all 710 models as JSON
vin-test-data stats # dataset statistics
vin-test-data help # full reference
Output flags
--json, --vins-only, and the human-readable default are available
on every command that returns records (random, find, decode,
makes, models, years, body-types, fuel-types, stats):
- (default): human-readable table or sectioned breakdown
- --json: machine-readable JSON. For --decode this is the full corgi DecodeResult, including every VDS pattern match, diagnostic timing, schema version, etc.
- --vins-only: print only the 17-char VINs, one per line (great for piping)
Filter flags (apply to random and find)
--make, --model, --year, --year-min, --year-max,
--body-type, --fuel-type, plus --limit and --random for find.
Exit codes
- 0: succeeded with results (or, for random 0, you asked for nothing).
- 1: invalid arguments (bad count, unknown command, malformed flag).
- 2: query ran successfully but matched no records. Useful in shell scripts: vin-test-data find --make Foo --vins-only > out.txt && echo "got matches".
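A script that wraps the CLI can branch on these three codes rather than treating every non-zero exit as a failure. The sketch below is illustrative only: classifyExit and CliOutcome are hypothetical names, not part of this package, and the status value is assumed to come from something like the status field returned by Node's child_process.spawnSync.

```typescript
// Hypothetical labels for the three documented exit codes.
type CliOutcome = 'results' | 'invalid-arguments' | 'no-matches' | 'unknown';

// Maps a process exit status to one of the documented outcomes.
// A null status (process killed by a signal) or any other code is reported as 'unknown'.
function classifyExit(status: number | null): CliOutcome {
  switch (status) {
    case 0:
      return 'results';
    case 1:
      return 'invalid-arguments';
    case 2:
      return 'no-matches';
    default:
      return 'unknown';
  }
}
```

Separating 'no-matches' from 'invalid-arguments' lets a fixture-seeding script skip an empty query quietly while still failing loudly on a mistyped flag.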
Programmatic quick start
import {
getRandomVins,
findVinsByMake,
findVinsByYear,
getMakes,
getYears,
decodeVin,
getRandomVinsDecoded,
closeDecoder,
} from 'vin-test-data';
// 5 random VINs from the entire dataset
getRandomVins(5);
// → [{ vin: '2C3CDXCT5EH288332', year: 2014, make: 'Dodge', model: 'Charger', ... }, ... ]
// 20 random VINs filtered by make
getRandomVins(20, { make: 'Ford' });
// 100 random VINs from a year range
getRandomVins(100, { yearMin: 2010, yearMax: 2015 });
// Deterministic lookups (records returned in dataset order; same input → same output across calls)
findVinsByMake('Toyota', { limit: 10 });
findVinsByYear(2012, { limit: 10 });
// Discover what's in the dataset
getMakes(); // → ['Acura', 'Aston Martin', 'Audi', 'Bentley', 'BMW', ... 54 total]
getYears(); // → [1982, 1983, ..., 2015]
// Decode a single VIN through corgi
const decoded = await decodeVin('5UXFA13564LU32275');
// → corgi DecodeResult — { valid, components: { vehicle, wmi, modelYear, ... } }
// Or pull random VINs and decode them in one call
const batch = await getRandomVinsDecoded(5, { make: 'BMW' });
// → [{ record: VinRecord, decoded: DecodeResult }, ...]
// Release the corgi decoder when you're done (e.g. in test teardown)
await closeDecoder();
Dataset
| Stat | Value |
|------|-------|
| Total VINs | 547,706 |
| Unique makes | 54 |
| Unique models | 710 |
| Year range | 1982 – 2015 |
| Body types | 11 |
| Fuel types | 8 |
| Compressed payload | ~6.4 MB (data/vins.tsv.gz) |
All VINs are real 17-character strings that previously passed check-digit validation against an external source.
Programmatic API
Random sampling
function getRandomVins(count: number, filters?: VinFilters): VinRecord[];
Returns up to count random records, sampled without replacement. If
fewer matching records exist than requested, returns all of them.
Math.random() is used; this is intended for testing/fixtures, not
cryptographic use.
Filtered lookups
function findVins(filters: VinFilters, options?: FindOptions): VinRecord[];
function findVinsByMake(make: string, options?: FindOptions): VinRecord[];
function findVinsByYear(year: number, options?: FindOptions): VinRecord[];
FindOptions:
- limit?: number (caps the result count)
- random?: boolean (randomizes selection; default: deterministic dataset order)
Discovery — what's in the dataset
function getMakes(): string[]; // sorted, case-insensitive unique
function getModels(): string[];
function getYears(): number[]; // ascending
function getBodyTypes(): string[];
function getFuelTypes(): string[];
function getStats(): VinStats;
Use these to drive UIs ("which makes can I filter by?") or to validate
filter inputs before calling getRandomVins.
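As a rough sketch of that validation step, the helper below maps user input onto a canonical spelling from a discovery list, or returns undefined when nothing matches. resolveCanonical is a hypothetical helper, not part of this package; trimming whitespace is an added assumption; and sampleMakes is a short stand-in for what getMakes() would return.

```typescript
// Returns the canonical spelling from `known` that matches `input`
// case-insensitively, or undefined when there is no exact match.
function resolveCanonical(input: string, known: readonly string[]): string | undefined {
  const wanted = input.trim().toLowerCase();
  return known.find((name) => name.toLowerCase() === wanted);
}

// Stand-in for getMakes(); the real list has 54 entries.
const sampleMakes = ['Acura', 'Aston Martin', 'Audi', 'Bentley', 'BMW'];

const make = resolveCanonical('bmw', sampleMakes); // 'BMW'
const unknown = resolveCanonical('BM', sampleMakes); // undefined: no substring matching
```

Rejecting an unknown make up front gives a clearer error than an empty result from a later query.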
Corgi-backed decoding
import type { DecodeResult } from '@cardog/corgi';
function decodeVin(vin: string): Promise<DecodeResult>;
function decodeRecords(records: VinRecord[]): Promise<DecodedVinRecord[]>;
function getRandomVinsDecoded(count: number, filters?: VinFilters): Promise<DecodedVinRecord[]>;
function closeDecoder(): Promise<void>;
DecodeResult is the corgi-native type; re-exporting it here would
just shadow the original, so import it directly from @cardog/corgi
when you need full typing.
The corgi decoder is created lazily on first use and shared across all
decode calls. It costs a few hundred milliseconds to spin up (it loads
a SQLite VPIC database) but subsequent decodes are fast. Call
closeDecoder() in test teardown or before process exit to release
the SQLite connection — otherwise the open handle can keep the Node
event loop alive past your script's last line.
If createDecoder fails (e.g. a missing native binding) the error
surfaces from your first await. The internal cache is cleared on
failure so the next call gets a fresh attempt rather than a
permanently-rejected cached promise.
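The lazy-create, share, and retry-on-failure behaviour described above can be sketched generically as follows. This is not the package's actual code: createLazy is a hypothetical helper, and the factory argument stands in for whatever creates the corgi decoder.

```typescript
// Wraps an async factory so that it runs at most once per successful result.
// A failed attempt is dropped from the cache so the next get() retries.
function createLazy<T>(factory: () => Promise<T>): { get(): Promise<T>; reset(): void } {
  let cached: Promise<T> | undefined;

  return {
    get(): Promise<T> {
      if (cached) return cached;

      const attempt = factory();
      cached = attempt;

      // On failure, clear the cache (only if it still holds this attempt).
      // The caller still sees the rejection through the returned promise.
      attempt.catch(() => {
        if (cached === attempt) cached = undefined;
      });

      return attempt;
    },

    // Forget the shared instance, for example after closing it in test teardown.
    reset(): void {
      cached = undefined;
    },
  };
}
```

Caching the promise rather than the resolved value means concurrent first calls share one initialization instead of each starting their own.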
Filter shape
interface VinFilters {
year?: number; // exact year
yearMin?: number; // inclusive (use with yearMax for a range)
yearMax?: number;
make?: string; // case-insensitive exact match
model?: string;
bodyType?: string;
fuelType?: string;
}
All filters combine with logical AND. String matches are
case-insensitive but exact (no substring matching) — use getMakes() /
getModels() to discover the canonical spellings.
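The matching rules can be sketched as a single predicate. This is an illustration of the documented semantics, not the package's indexed implementation: matchesFilters is a hypothetical name, yearMax is assumed to be inclusive like yearMin, and the sample rows use placeholder VINs and made-up attribute values.

```typescript
interface VinRecord {
  vin: string;
  year: number;
  make: string;
  model: string;
  bodyType: string;
  fuelType: string;
}

interface VinFilters {
  year?: number;
  yearMin?: number;
  yearMax?: number;
  make?: string;
  model?: string;
  bodyType?: string;
  fuelType?: string;
}

// Case-insensitive but exact string comparison (no substring matching).
const sameText = (a: string, b: string): boolean => a.toLowerCase() === b.toLowerCase();

// A record matches only if every filter that is set accepts it (logical AND).
function matchesFilters(r: VinRecord, f: VinFilters): boolean {
  if (f.year !== undefined && r.year !== f.year) return false;
  if (f.yearMin !== undefined && r.year < f.yearMin) return false;
  if (f.yearMax !== undefined && r.year > f.yearMax) return false; // assumed inclusive
  if (f.make !== undefined && !sameText(r.make, f.make)) return false;
  if (f.model !== undefined && !sameText(r.model, f.model)) return false;
  if (f.bodyType !== undefined && !sameText(r.bodyType, f.bodyType)) return false;
  if (f.fuelType !== undefined && !sameText(r.fuelType, f.fuelType)) return false;
  return true;
}

// Illustrative rows only; body and fuel values are placeholders, not dataset contents.
const sampleRows: VinRecord[] = [
  { vin: '2C3CDXCT5EH288332', year: 2014, make: 'Dodge', model: 'Charger', bodyType: 'Sedan', fuelType: 'Gasoline' },
  { vin: 'PLACEHOLDER-VIN-1', year: 2012, make: 'Ford', model: 'Focus', bodyType: 'Hatchback', fuelType: 'Gasoline' },
  { vin: 'PLACEHOLDER-VIN-2', year: 2015, make: 'Ford', model: 'F-150', bodyType: 'Pickup', fuelType: 'Diesel' },
];

const fords2012 = sampleRows.filter((r) => matchesFilters(r, { make: 'ford', year: 2012 }));
```

Here fords2012 holds only the Focus row: 'ford' matches 'Ford' regardless of case, and the year filter excludes the 2015 row.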
Result shape
interface VinRecord {
vin: string; // 17-char uppercase VIN
year: number;
make: string;
model: string;
bodyType: string; // empty string if unknown in source
fuelType: string;
}
interface DecodedVinRecord {
record: VinRecord;
decoded: DecodeResult; // corgi type; import it from @cardog/corgi
}
Performance notes
- The first call to any lookup function lazily decompresses the bundled TSV (~250 ms for 547k records on a recent laptop) and builds in-memory indexes by year, make, model, body type, and fuel type. Subsequent calls are O(1) for filter resolution + O(n) over the matching subset.
- Indexed filters (year, make, model, bodyType, fuelType) avoid full-dataset scans. yearMin / yearMax ranges fall back to a scan over year buckets, which is still fast.
- getRandomVins does a partial Fisher–Yates shuffle, so sampling 100 VINs out of 500k is O(100), not O(500k).
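For readers unfamiliar with the technique, a partial Fisher–Yates shuffle can be sketched like this. It is one possible way to get O(count) sampling, not necessarily how the package does it: sampleWithoutReplacement is a hypothetical name, and the Map of swapped positions is an assumption made here to avoid copying the pool.

```typescript
// Draws up to `count` distinct items from `pool` in O(count) time and memory.
// Only the first `count` steps of a Fisher–Yates shuffle are performed, and
// swaps are tracked in a Map instead of mutating or copying the pool.
function sampleWithoutReplacement<T>(
  pool: readonly T[],
  count: number,
  random: () => number = Math.random,
): T[] {
  const k = Math.max(0, Math.min(Math.floor(count), pool.length));

  // position -> index of the pool item currently "stored" at that position
  const swapped = new Map<number, number>();
  const at = (i: number): number => swapped.get(i) ?? i;

  const out: T[] = [];
  for (let i = 0; i < k; i++) {
    // Pick a position uniformly from the not-yet-fixed tail [i, pool.length).
    const j = i + Math.floor(random() * (pool.length - i));
    const picked = at(j);
    // Move whatever sits at position i into position j; position i is never read again.
    swapped.set(j, at(i));
    out.push(pool[picked]);
  }
  return out;
}
```

Passing a custom random function, as in sampleWithoutReplacement(items, 2, () => 0), makes the sketch deterministic for tests; with that stub it returns the first two items in order.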
Data attribution
The bundled VINs are derived from the "Used Car Auction Prices" dataset on Kaggle by Bojan Tunguz, distributed under Creative Commons Attribution 4.0 International (CC BY 4.0).
Modifications applied:
- Each VIN in the source CSV was validated through @cardog/corgi (which checks WMI / VDS / VIS rules and the check-digit). Only VINs that passed validation were kept.
- All non-vehicle columns (trim, transmission, state, condition, odometer, color, interior, seller, mmr, sellingprice, saledate) were dropped; only vin, year, make, model, body, and an inferred fuelType were retained.
- The result was gzip-encoded as TSV for in-package distribution.
The vehicle attributes that ship with each VIN (year, make, model, body type, fuel type) are inherent properties of the VIN itself — they are derivable from any public VIN decoder (e.g. NHTSA's vPIC) given the VIN string.
This derivative contains only vehicle identifiers and vehicle attributes; no owner names, plate numbers, addresses, prices, or sale records from the source dataset are included in this package.
License
The source code in this repository is licensed under the MIT
License (see LICENSE). The bundled VIN dataset is a derivative
work under CC BY 4.0; see "Data attribution" above.
