
@dataset.sh/file

v0.1.1


A TypeScript library for reading and writing the DatasetFile ZIP-based archive format. It provides an efficient way to store and access structured datasets, with support for collections, type annotations, and binary files.

Features

  • 📦 ZIP-based format - Efficient compression and packaging of datasets
  • 📊 Multiple collections - Organize data into named collections (train, test, validation, etc.)
  • 🏷️ Type annotations - Include type information for each collection
  • 🔤 Typelang support - Define schemas using TypeScript-like syntax for cross-platform compatibility
  • 📎 Binary files - Attach model weights, images, or other binary assets

Installation

pnpm add @dataset.sh/file

Optional: Typelang Compiler

For enhanced type validation and cross-platform type generation, you can also install the Typelang compiler:

pnpm add @dataset.sh/typelang

Quick Start

Writing a Dataset

import {DatasetFile, DatasetFileWriter} from '@dataset.sh/file';

// Create a new dataset file
const writer = DatasetFile.open('my-dataset.dataset', 'w') as DatasetFileWriter;

// Add metadata
writer.updateMeta({
    author: 'Your Name',
    authorEmail: 'you@example.com',
    description: 'My awesome dataset',
    tags: ['nlp', 'classification'],
    dataset_metadata: {
        version: '1.0.0',
        created: new Date().toISOString()
    }
});

// Add a collection with data and Typelang schema
const trainData = [
    {id: 1, text: 'Hello world', label: 'greeting'},
    {id: 2, text: 'How are you?', label: 'question'}
];

// Define schema using Typelang syntax
const typeSchema = `// use TrainItem
type TrainItem = {
  id: int
  text: string
  label: string
}`;

await writer.addCollection('train', trainData, typeSchema);

// Add binary files (optional)
const modelWeights = Buffer.from('...');
writer.addBinaryFile('model.bin', modelWeights);

await writer.close();

Reading a Dataset

import {DatasetFile, DatasetFileReader} from '@dataset.sh/file';

// Open an existing dataset
const reader = DatasetFile.open('my-dataset.dataset', 'r') as DatasetFileReader;

// Access metadata
console.log('Author:', reader.meta.author);
console.log('Collections:', reader.collections());

// Read a collection
const trainCollection = reader.collection('train');

// Get type annotation (raw Typelang schema)
const typeAnnotation = await trainCollection.typeAnnotation();
console.log('Type annotation:', typeAnnotation);

// Generate code from type annotation
const codeUsage = await trainCollection.generateCode();
if (codeUsage) {
    console.log('Type name:', codeUsage.useClass);
    console.log('Compilation result:', codeUsage.result);
}

// Access data
console.log('First 5 items:', trainCollection.top(5));
console.log('Random sample:', trainCollection.randomSample(3));

// Iterate through data
for (const item of trainCollection) {
    console.log(item);
}

// Convert to array
const allData = trainCollection.toList();

// Access binary files
const modelData = reader.openBinaryFile('model.bin');

reader.close();

API Reference

DatasetFile

Main entry point for opening dataset files.

DatasetFile.open(filePath: string, mode: 'r' | 'w')

Opens a dataset file for reading or writing.

  • filePath: Path to the dataset file
  • mode: 'r' for reading, 'w' for writing
  • Returns: DatasetFileReader or DatasetFileWriter

DatasetFileWriter

Used for creating new dataset files.

Methods

  • updateMeta(meta: Partial<DatasetFileMeta>): Update dataset metadata
  • async addCollection(name: string, data: any[], type_annotation?: string): Add a data collection with optional Typelang schema
  • addBinaryFile(fileName: string, data: Buffer): Add a binary file
  • async close(): Close and save the dataset file

DatasetFileReader

Used for reading existing dataset files.

Properties

  • meta: Dataset metadata

Methods

  • collections(): Get list of collection names
  • collection(name: string): Get a collection reader
  • coll(name: string): Shorthand for collection()
  • binaryFiles(): List binary file names
  • openBinaryFile(fileName: string): Read a binary file
  • close(): Close the dataset file

CollectionReader

Reader for individual collections within a dataset.

Properties

  • length: Number of items in the collection

Methods

  • async typeAnnotation(): Get raw Typelang schema string
  • async generateCode(): Generate code usage information from type annotation (returns CodeUsage with source, useClass, and compile result)
  • top(n: number): Get first n items
  • randomSample(n: number): Get random sample
  • toList(): Convert to array
  • [Symbol.iterator](): Iterate through items
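
The read surface above can be illustrated with a hypothetical in-memory stand-in. `MiniCollection` below is not part of `@dataset.sh/file`; it is only a sketch of how `length`, `top()`, `toList()`, and `[Symbol.iterator]()` fit together:

```typescript
// Hypothetical in-memory stand-in for CollectionReader's read surface.
// MiniCollection is NOT part of @dataset.sh/file; it only illustrates
// how the listed members interact.
class MiniCollection<T> {
  constructor(private readonly items: T[]) {}

  // Number of items in the collection
  get length(): number {
    return this.items.length;
  }

  // First n items
  top(n: number): T[] {
    return this.items.slice(0, n);
  }

  // Copy out as a plain array
  toList(): T[] {
    return [...this.items];
  }

  // Implementing the iteration protocol makes for...of and spreading work
  [Symbol.iterator](): Iterator<T> {
    return this.items[Symbol.iterator]();
  }
}

const coll = new MiniCollection([{id: 1}, {id: 2}, {id: 3}]);
console.log(coll.length);   // 3
console.log(coll.top(2));   // [{id: 1}, {id: 2}]

const ids: number[] = [];
for (const item of coll) {
  ids.push(item.id);
}
console.log(ids);           // [1, 2, 3]
```

Because iteration is lazy, `for...of` lets you scan large collections without materializing them first; `toList()` is the eager alternative.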

File Format

DatasetFile uses a ZIP archive with the following structure:

dataset.dataset/
├── meta.json          # Dataset metadata
├── coll/              # Collections folder
│   ├── train/
│   │   ├── data.jsonl # Data in JSON Lines format
│   │   └── type.tl    # Typelang schema (optional)
│   └── test/
│       ├── data.jsonl
│       └── type.tl
└── bin/               # Binary files folder
    └── model.bin
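
Each collection's data.jsonl is plain JSON Lines: one JSON object per line. A minimal sketch of how such a payload maps to objects (`parseJsonl` is a hypothetical helper, not part of the library's API):

```typescript
// Each line of data.jsonl is one JSON object; blank lines are ignored.
function parseJsonl<T>(payload: string): T[] {
  return payload
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as T);
}

interface TrainItem {
  id: number;
  text: string;
  label: string;
}

const payload = [
  '{"id": 1, "text": "Hello world", "label": "greeting"}',
  '{"id": 2, "text": "How are you?", "label": "question"}',
].join('\n');

const items = parseJsonl<TrainItem>(payload);
console.log(items.length); // 2
console.log(items[0].label); // "greeting"
```

Because the container is a standard ZIP and the data is standard JSON Lines, datasets remain readable with ordinary tooling even without this library.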

Typelang Support

This library supports Typelang, a TypeScript-flavored schema definition language for cross-platform type generation.

Using Typelang Schemas

import {DatasetFile, DatasetFileWriter} from '@dataset.sh/file';

const writer = DatasetFile.open('typed-dataset.dataset', 'w') as DatasetFileWriter;

// Define complex types with Typelang
const userSchema = `// use User
type Address = {
  street: string
  city: string
  country: string
  postalCode?: string
}

type User = {
  id: string
  name: string
  email: string
  age: int
  address: Address
  tags: string[]
  status: "active" | "inactive" | "pending"
}`;

const userData = [{
    id: 'u1',
    name: 'Alice',
    email: 'alice@example.com',
    age: 30,
    address: {
        street: '123 Main St',
        city: 'San Francisco',
        country: 'USA'
    },
    tags: ['developer', 'team-lead'],
    status: 'active'
}];

await writer.addCollection('users', userData, userSchema);
await writer.close();

Generic Types

const responseSchema = `// use ApiResponse
type Response<T> = {
  success: bool
  data?: T
  error?: string
  timestamp: string
}

type UserData = {
  userId: string
  username: string
}

type ApiResponse = Response<UserData>`;

// Example payload matching the schema
const responseData = [{
    success: true,
    data: {userId: 'u1', username: 'alice'},
    timestamp: new Date().toISOString()
}];

await writer.addCollection('responses', responseData, responseSchema);

Examples

Working with NLP Datasets

const writer = DatasetFile.open('nlp-dataset.dataset', 'w') as DatasetFileWriter;

writer.updateMeta({
    description: 'Sentiment analysis dataset',
    tags: ['nlp', 'sentiment', 'classification']
});

const data = [
    {text: 'This movie is great!', sentiment: 'positive'},
    {text: 'Terrible experience.', sentiment: 'negative'}
];

const sentimentSchema = `// use SentimentItem
type SentimentItem = {
  text: string
  sentiment: "positive" | "negative" | "neutral"
}`;

await writer.addCollection('train', data, sentimentSchema);

await writer.close();

Reading Python-created Datasets

This library is fully compatible with datasets created using the Python dataset-sh library, including those with Typelang type annotations:

const reader = DatasetFile.open('python-dataset.dataset', 'r') as DatasetFileReader;

// Read collections created in Python
const collection = reader.collection('data');

// Check for type annotation and generate code
const typeAnnotation = await collection.typeAnnotation();
if (typeAnnotation) {
    console.log('Type annotation:', typeAnnotation);
    const codeUsage = await collection.generateCode();
    if (codeUsage) {
        console.log('Type name:', codeUsage.useClass);
        console.log('Validation errors:', codeUsage.result.errors);
    }
}

// Iterate through data
for (const item of collection) {
    console.log(item);
}

reader.close();

Development

Building

pnpm build

Testing

pnpm test
pnpm test:watch
pnpm test:coverage

Running Examples

pnpm example
pnpm verify-python

Requirements

  • Node.js >= 16.0.0
  • TypeScript >= 5.0.0

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and feature requests, please use the GitHub issue tracker.