@dataset.sh/client
v0.0.2
TypeScript client library for dataset.sh - A powerful dataset management system supporting both local and remote storage with seamless transfer capabilities.
Installation
npm install @dataset.sh/client
# or
pnpm add @dataset.sh/client
# or
yarn add @dataset.sh/client
Features
- 💾 Local Storage: Manage datasets on your local filesystem
- ☁️ Remote Storage: Connect to dataset.sh servers
- 🔄 Seamless Transfer: Upload/download with resume support
- 📦 Version Control: Track dataset versions with checksums
- 🏷️ Tagging System: Tag versions for easy reference
- 📊 Progress Tracking: Monitor transfer progress in real-time
- 🔐 Authentication: Secure API key authentication
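To make the checksum-based versioning idea concrete, here is a minimal sketch of deriving an identifier from a file's bytes. It does not use this library: the `checksumOf` helper is hypothetical, and SHA-256 is an assumption, since this README does not say which algorithm dataset.sh uses for version IDs.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper, not part of @dataset.sh/client.
// SHA-256 is an assumption; the README does not name the checksum algorithm.
function checksumOf(bytes: Uint8Array): string {
  return createHash('sha256').update(bytes).digest('hex');
}

const encoder = new TextEncoder();
const a = checksumOf(encoder.encode('{"id":1}\n'));
const b = checksumOf(encoder.encode('{"id":1}\n'));
const c = checksumOf(encoder.encode('{"id":2}\n'));

console.log(a === b);  // identical bytes yield the same identifier
console.log(a === c);  // different bytes yield a different identifier
console.log(a.length); // a SHA-256 hex digest is 64 characters long
```

Because the identifier depends only on content, re-importing unchanged data would map to the same ID under this scheme, which is one reason content checksums are a common choice for version IDs.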
Quick Start
Local Storage
import { LocalStorage } from '@dataset.sh/client';
// Initialize local storage
const storage = new LocalStorage({ location: './my-datasets' });
// Create a dataset
const dataset = storage.dataset('my-namespace/my-dataset');
// Import data from collections
await dataset.importData({
  users: {
    name: 'users',
    data: [
      { id: 1, name: 'Alice', age: 30 },
      { id: 2, name: 'Bob', age: 25 }
    ],
    typeAnnotation: 'Array<{id: number, name: string, age: number}>'
  }
}, null, ['v1.0', 'latest'], 'My user dataset');
// Access the latest version
const latest = dataset.latest();
if (latest) {
  const reader = latest.open();
  const users = reader.collection('users');
  for (const user of users) {
    console.log(user);
  }
}
Remote Storage
import { RemoteClient } from '@dataset.sh/client';
// Initialize remote client
const client = new RemoteClient({
  host: 'https://api.dataset.sh',
  accessKey: 'your-api-key'
});
// Access a remote dataset
const dataset = client.dataset('my-namespace/my-dataset');
// List all versions
const versions = await dataset.versions();
console.log('Available versions:', versions.map(v => v.getVersion()));
// Get latest version
const latest = await dataset.latest();
if (latest) {
  // Read README
  const readme = await latest.getReadme();
  console.log(readme);
}
Usage Guide
Local Client API
LocalStorage
The main entry point for local dataset operations.
import { LocalStorage } from '@dataset.sh/client';
// Initialize with custom location
const storage = new LocalStorage({ location: '/path/to/datasets' });
// List all namespaces
const namespaces = storage.namespaces();
// Access a specific namespace
const namespace = storage.namespace('my-namespace');
// List all datasets
const datasets = storage.datasets();
// Access a specific dataset
const dataset = storage.dataset('namespace/dataset-name');
LocalDataset
Manage dataset versions and tags.
// Import from file
const fileVersion = dataset.importFile('./data.dataset', {
  replace: false,      // Don't replace if version exists
  removeSource: false, // Keep source file
  tags: ['v1.0'],      // Apply tags
  asLatest: true       // Mark as latest
});
// Import from data
const dataVersion = await dataset.importData(
  {
    collection1: { name: 'collection1', data: [...] },
    collection2: { name: 'collection2', data: [...] }
  },
  null,               // Type dictionary (optional)
  ['v2.0', 'latest'], // Tags
  'Dataset description'
);
// Manage versions
const versions = dataset.versions();     // List all versions
const v1 = dataset.version('abc123...'); // Get specific version
const latest = dataset.latest();         // Get latest tagged version
// Manage tags
dataset.setTag('stable', 'abc123...');
dataset.removeTag('beta');
const tags = dataset.tags();                        // Get all tags
const stableVersion = dataset.resolveTag('stable'); // Resolve tag to version ID
Transfer Between Local and Remote
The transfer module provides seamless data movement between local and remote storage.
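The transfer examples in this section pass a progress reporter to each call. This README does not show the interface those reporters implement, so the following is only a sketch of the idea: `TransferProgressReporter`, `onProgress`, and `PercentReporter` are hypothetical names, not the library's API. The sketch records a line each time the whole-number percentage changes.

```typescript
// Hypothetical reporter shape, for illustration only.
// The real interface expected by download()/upload() may differ.
interface TransferProgressReporter {
  onProgress(transferredBytes: number, totalBytes: number): void;
}

class PercentReporter implements TransferProgressReporter {
  readonly lines: string[] = [];
  private lastPercent = -1;

  onProgress(transferredBytes: number, totalBytes: number): void {
    // Guard against a zero or unknown total to avoid dividing by zero.
    const percent = totalBytes > 0
      ? Math.floor((transferredBytes / totalBytes) * 100)
      : 0;
    // Only record a line when the displayed percentage actually changes.
    if (percent !== this.lastPercent) {
      this.lastPercent = percent;
      this.lines.push(`${percent}%`);
    }
  }
}

// Simulate a 1000-byte transfer delivered in 250-byte chunks.
const reporter = new PercentReporter();
for (let sent = 250; sent <= 1000; sent += 250) {
  reporter.onProgress(sent, 1000);
}
console.log(reporter.lines.join(' ')); // prints "25% 50% 75% 100%"
```

Throttling on percentage change keeps output readable for large transfers, where a callback may fire once per chunk.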
Download from Remote to Local
import {
  LocalStorage,
  RemoteClient,
  download,
  downloadToFile,
  ConsoleDownloadProgressReporter
} from '@dataset.sh/client';
// Setup
const remote = new RemoteClient({ host: 'https://api.dataset.sh', accessKey: 'key' });
const local = new LocalStorage({ location: './datasets' });
// Get source and target
const remoteDataset = remote.dataset('namespace/dataset');
const localDataset = local.dataset('namespace/dataset');
const remoteVersion = await remoteDataset.latest();
const localVersion = localDataset.version('abc123'); // Target version
// Download with progress tracking
await download(
  remoteVersion,
  localVersion,
  new ConsoleDownloadProgressReporter()
);
// Or download to specific file
await downloadToFile(
  remoteVersion,
  './downloads/dataset.zip',
  new ConsoleDownloadProgressReporter()
);
Upload from Local to Remote
import {
  upload,
  uploadFromFile,
  ConsoleUploadProgressReporter
} from '@dataset.sh/client';
// Get source and target
// (localDataset and remoteDataset are set up as in the download example)
const localVersion = localDataset.latest();
const remoteVersion = remoteDataset.version('new-version-id');
// Upload with progress tracking
await upload(
  localVersion,
  remoteVersion,
  new ConsoleUploadProgressReporter()
);
// Or upload from file
await uploadFromFile(
  './my-dataset.zip',
  remoteVersion,
  new ConsoleUploadProgressReporter()
);
Storage Structure
Local Storage Layout
storage-base/
├── namespace1/
│   ├── dataset1/
│   │   ├── version/
│   │   │   ├── abc123.../
│   │   │   │   └── abc123.../
│   │   │   │       ├── file.dataset
│   │   │   │       ├── readme
│   │   │   │       └── cache/
│   │   │   │           ├── meta.json
│   │   │   │           ├── data_sample_*.jsonl
│   │   │   │           └── typing_*.tl
│   │   │   └── def456.../
│   │   └── tag/
│   │       ├── latest
│   │       ├── v1.0
│   │       └── stable
│   └── dataset2/
└── namespace2/
License
MIT
