@dataset.sh/cli

A powerful command-line interface for managing datasets with local caching, remote downloads, and flexible storage management. It works much like a package manager such as pnpm, but is designed specifically for dataset files.

Features

  • 📦 Local and Global Installation - Install datasets per-project or globally
  • 🔄 Intelligent Caching - Global cache with SHA-256 integrity verification
  • 🏷️ Tag and Version Support - Install by semantic tags or specific versions
  • 🔗 Symbolic Linking - Efficient storage with automatic linking strategies
  • 🌐 Multiple Servers - Support for multiple dataset servers with authentication
  • 📤 Dataset Unpacking - Extract dataset contents for direct use
  • 🔐 Security - Built-in checksum verification and retry logic

Installation

Global Installation

pnpm add -g @dataset.sh/cli
# or
npm install -g @dataset.sh/cli

After global installation, use the dataset.sh command:

dataset.sh init
dataset.sh install nlp/sentiment

Using npx (No Installation Required)

npx @dataset.sh/cli init
npx @dataset.sh/cli install nlp/sentiment
npx @dataset.sh/cli unpack nlp/sentiment

Local Project Installation

pnpm add @dataset.sh/cli
# or
npm install @dataset.sh/cli

Then use it via npm scripts or npx.
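
For example, a hypothetical scripts block in package.json could wire the locally installed CLI into your workflow (the script names and dataset are illustrative, assuming the package exposes the dataset.sh binary as described above):

{
  "scripts": {
    "datasets:install": "dataset.sh install",
    "datasets:unpack": "dataset.sh unpack nlp/sentiment -d ./data"
  }
}

Running npm run datasets:install then resolves the binary from node_modules/.bin.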

Quick Start

1. Initialize a Project

# Using global installation
dataset.sh init

# Using npx (no installation required)
npx @dataset.sh/cli init

2. Install a Dataset

# Install dataset with default tag (main)
dataset.sh install nlp/sentiment
# or
npx @dataset.sh/cli install nlp/sentiment

# Install specific tag
dataset.sh install nlp/sentiment -t v1.2

# Install specific version (using version hash)
dataset.sh install nlp/sentiment -v a1b2c3d4e5f6...

# Install globally
dataset.sh install -g nlp/sentiment

3. Unpack for Direct Use

# Unpack to public/datasets/nlp/sentiment
dataset.sh unpack nlp/sentiment
# or
npx @dataset.sh/cli unpack nlp/sentiment

# Unpack to custom location
dataset.sh unpack nlp/sentiment -d ./data

Global Options

--debug

Enable detailed debug logging to stderr. This shows internal operations including:

  • Configuration loading and path resolution
  • Network requests and responses
  • Cache operations (hits/misses)
  • File system operations
  • Linking strategies and operations

# Enable debug logging for any command
dataset.sh --debug init
dataset.sh --debug install nlp/sentiment

# Using npx
npx @dataset.sh/cli --debug init
npx @dataset.sh/cli --debug install nlp/sentiment

Debug output includes timestamped logs with module prefixes:

  • [CLI] - Command-line interface operations
  • [CONFIG] - Configuration and path management
  • [NETWORK] - HTTP requests and server communication
  • [CACHE] - Cache operations and integrity checking
  • [LINKING] - File linking and symlink operations
  • [FS] - File system operations
  • [INIT] - Init command operations
  • [INSTALL] - Install command operations
  • [UNPACK] - Unpack command operations

Commands

dataset.sh init

Creates a datasets.json file in the current directory.

dataset.sh init
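
The exact contents of the generated file aren't documented here, but a minimal sketch, assuming init starts from an empty manifest, would be:

{
  "datasets": {}
}

Datasets added with dataset.sh install <dataset> are then recorded in this file (see the datasets.json Format section below).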

dataset.sh install [dataset]

Installs datasets from datasets.json or adds and installs a specific dataset.

# Install all datasets from datasets.json
dataset.sh install

# Install specific dataset
dataset.sh install nlp/sentiment

# Install with options
dataset.sh install nlp/sentiment -t v1.2 -s myserver
dataset.sh install -g nlp/sentiment -v a1b2c3d4e5f6...

Options:

  • -g, --global - Install to global directory (~/.dataset_sh/global)
  • -s, --server <profile> - Use specific server profile
  • -t, --tag <tag> - Install specific tag (default: main)
  • -v, --version <version> - Install specific version (64-character hex string)

dataset.sh unpack <dataset>

Unpacks dataset content to a destination folder. The dataset must be installed first.

# Unpack to public/datasets
dataset.sh unpack nlp/sentiment

# Unpack to custom directory
dataset.sh unpack nlp/sentiment -d ./data

# Unpack specific version
dataset.sh unpack nlp/sentiment -v a1b2c3d4e5f6...

Options:

  • -v, --version <version> - Unpack specific version (default: latest available)
  • -d, --dest <folder> - Destination folder (default: public/datasets)

Configuration

Environment Variables

  • DSH_CACHE_DIR - Global cache directory (default: ~/.dataset_sh/cache)
  • DSH_GLOBAL_DIR - Global install directory (default: ~/.dataset_sh/global)
  • DSH_PROFILE_FILE - Server profiles file (default: ~/.dataset_sh/profile.json)
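
For example, to keep everything under a custom root instead of ~/.dataset_sh, you could export these variables before running the CLI (the paths below are placeholders):

# Relocate cache, global installs, and server profiles
export DSH_CACHE_DIR="$HOME/data/dsh-cache"
export DSH_GLOBAL_DIR="$HOME/data/dsh-global"
export DSH_PROFILE_FILE="$HOME/data/dsh-profile.json"

dataset.sh install nlp/sentiment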

Server Profiles

Create ~/.dataset_sh/profile.json to configure server access:

{
  "servers": {
    "production": {
      "host": "https://api.example.com",
      "accessKey": "your-access-key"
    },
    "staging": {
      "host": "https://staging-api.example.com",
      "accessKey": "staging-key"
    }
  }
}

datasets.json Format

The datasets.json file tracks project dependencies:

{
  "datasets": {
    "nlp/sentiment": [
      {
        "tag": "v1.2",
        "host": "https://api.example.com"
      }
    ],
    "vision/imagenet": [
      {
        "version": "a1b2c3d4e5f6789...",
        "host": "https://api.example.com"
      }
    ]
  }
}

File Organization

Local Installation Structure

project/
├── datasets.json          # Project dataset manifest
└── dsh_datasets/          # Local dataset installations
    └── nlp/
        └── sentiment/
            ├── tag/
            │   ├── main -> ../version/a1b2c3d4...
            │   └── v1.2 -> ../version/f6e5d4c3...
            └── version/
                ├── a1b2c3d4.../
                └── f6e5d4c3.../
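
Because tags are symlinks into the version directory, you can check what a tag currently resolves to (a sketch, assuming a POSIX platform where symlinks rather than copies are used):

# Show the version the main tag points at
readlink dsh_datasets/nlp/sentiment/tag/main
# ../version/a1b2c3d4...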

Global Cache Structure

~/.dataset_sh/
├── cache/                 # Global cache with integrity checking
│   └── nlp/
│       └── sentiment/
│           └── version/
│               ├── a1b2c3d4.../
│               │   └── sentiment.dataset
│               └── f6e5d4c3.../
│                   └── sentiment.dataset
├── global/                # Global installations
└── profile.json           # Server configurations

How It Works

Installation Process

  1. Tag Resolution - If installing by tag, resolves to specific version via API
  2. Cache Check - Checks if dataset exists in global cache and validates checksum
  3. Download - Downloads dataset if not cached or corrupted
  4. Verification - Validates SHA-256 checksum before caching
  5. Linking - Creates symbolic links (or copies) to target location
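
A quick way to see the cache in action is to install the same dataset from two projects with --debug enabled; the second install should report a cache hit and only perform linking rather than downloading again (project names are placeholders, and both projects are assumed to have already run dataset.sh init):

# First project: resolves, downloads, verifies, and caches the dataset
cd project-a
dataset.sh --debug install nlp/sentiment

# Second project: reuses the global cache and links it into place
cd ../project-b
dataset.sh --debug install nlp/sentiment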

Caching Strategy

  • Global Cache - All datasets stored in ~/.dataset_sh/cache by version
  • Integrity Checking - SHA-256 checksums verify file integrity
  • Automatic Redownload - Corrupted cache entries are automatically redownloaded
  • Cross-Platform - Uses appropriate linking strategy per platform

Network Resilience

  • Exponential Backoff - Retries failed downloads with 1s, 2s, 4s delays
  • Smart Error Handling - Distinguishes between retryable and permanent failures
  • Authentication Support - Bearer token authentication for private servers
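
If you need to check a private server by hand, the access key from your profile is presumably sent as a bearer token, so a manual request could look like the following (host, endpoint, and key are placeholders taken from the examples in this README):

# Probe a server with the same style of credentials the CLI uses
curl -H "Authorization: Bearer your-access-key" https://api.example.com/api/health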

Examples

Machine Learning Workflow

# Initialize project
dataset.sh init

# Install training data
dataset.sh install ml/training-data -t latest

# Install validation set
dataset.sh install ml/validation-data -v a1b2c3d4e5f6...

# Unpack for training script
dataset.sh unpack ml/training-data -d ./data/train
dataset.sh unpack ml/validation-data -d ./data/val

Multi-Environment Setup

# Development
dataset.sh install nlp/dataset -t dev -s staging

# Production
dataset.sh install nlp/dataset -t v2.1 -s production

Global Dataset Management

# Install commonly used datasets globally
dataset.sh install -g common/embeddings
dataset.sh install -g common/stopwords

# Use in any project without reinstalling
dataset.sh unpack common/embeddings

Error Handling

The CLI provides clear, actionable error messages:

  • Network failures - Suggests checking the connection and retrying
  • Authentication errors - Points to profile configuration
  • Missing datasets - Shows available versions and tags
  • Disk space issues - Advises on freeing space
  • Permission errors - Guides on fixing file permissions

Troubleshooting

Debug Mode

When encountering issues, enable debug logging to see detailed internal operations:

dataset.sh --debug install problem/dataset
# or
npx @dataset.sh/cli --debug install problem/dataset

This will show:

  • Which server profiles are being used
  • Network request details and response codes
  • Cache hit/miss information
  • File system operations and linking strategies
  • Checksum verification steps

Common Issues

"datasets.json not found"

# Run init first
dataset.sh init
# or
npx @dataset.sh/cli init

"Server profile not found"

# Check your profile configuration
cat ~/.dataset_sh/profile.json

# Or create one
mkdir -p ~/.dataset_sh
echo '{"servers":{"default":{"host":"https://api.example.com"}}}' > ~/.dataset_sh/profile.json

"Checksum verification failed"

# Clear cache and retry
rm -rf ~/.dataset_sh/cache/category/dataset
dataset.sh --debug install category/dataset
# or
npx @dataset.sh/cli --debug install category/dataset

Network issues

# Use debug mode to see network details
dataset.sh --debug install category/dataset
# or
npx @dataset.sh/cli --debug install category/dataset

# Check server connectivity
curl -v https://your-server.com/api/health

Development

Building

pnpm build

Testing

pnpm test
pnpm test:watch

Compatibility

  • Node.js >= 16.0.0
  • TypeScript >= 5.0.0
  • Cross-platform - Works on Windows, macOS, and Linux

License

MIT