@wholebuzz/fs
v1.3.0

File system abstraction with implementations for GCP GCS, AWS S3, Azure, SMB, HTTP, and Local file systems. Provides atomic primitives enabling multiple readers and writers.
- LocalFileSystem employs content hashing to approximate GCS Object Versioning.
- GoogleCloudFileSystem provides consistent parallel access patterns.
- S3FileSystem provides basic file system primitives.
- SMBFileSystem provides basic file system primitives.
- HTTPFileSystem provides a basic HTTP file system.
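To make the content-hashing bullet concrete, here is a minimal sketch of how a content hash can stand in for an object version on a store with no native versioning. The `contentVersion` helper is hypothetical, and FNV-1a is used only to keep the example dependency-free; the hash that LocalFileSystem actually uses is not stated here.

```typescript
// Hypothetical helper: derive a version token from file contents.
// FNV-1a (32-bit) is a dependency-free stand-in for the real hash.
function contentVersion(content: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16)
}

const before = contentVersion('{"foo":"bar"}')
const unchanged = contentVersion('{"foo":"bar"}')
const after = contentVersion('{"foo":"baz"}')

// Identical bytes give the same version and an edit gives a new one, so a
// writer can detect that the file changed since it was last read.
console.log(before === unchanged, before === after) // true false
```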
Provides file format implementations for:
- Lines
- CSV (via csv)
- JSON, ND-JSON / JSONL (via JSONStream and ndjson)
- Parquet, including streaming Parquet codec and parquetjs.
- TFRecord, including tfrecord-stream.
Additionally provides sharding & merging utilities.
Dependencies
The FileSystem implementations require peer dependencies:
- AnyFileSystem: None. URL resolution as a FileSystem. Files have URLs and HTTP is a file system.
- AzureBlobStorageFileSystem: @azure/storage-blob and @azure/identity
- AzureFileShareFileSystem: @azure/storage-file-share
- GoogleCloudFileSystem: @google-cloud/storage
- HTTPFileSystem: axios
- LocalFileSystem: fs-ext, glob, and glob-stream
- S3FileSystem: aws-sdk, s3-stream-upload, and athena-express
- SMBFileSystem: @marsaud/smb2
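The AnyFileSystem entry describes URL resolution as a FileSystem. A minimal sketch of that idea is prefix-based dispatch, shown below. The `Backend`, `Route`, and `resolve` names are hypothetical, and first-match ordering is an assumption rather than documented behavior.

```typescript
// Hypothetical stand-in for a FileSystem; only a label is needed here.
interface Backend {
  label: string
}

interface Route {
  urlPrefix: string
  fs: Backend
}

// Return the backend of the first route whose prefix matches the URL.
// An empty prefix matches everything, so it acts as the fallback and
// has to be listed last.
function resolve(routes: Route[], url: string): Backend {
  for (const route of routes) {
    if (url.startsWith(route.urlPrefix)) return route.fs
  }
  throw new Error(`No file system registered for ${url}`)
}

const httpBackend: Backend = { label: 'http' }
const routeTable: Route[] = [
  { urlPrefix: 'gs://', fs: { label: 'gcs' } },
  { urlPrefix: 's3://', fs: { label: 's3' } },
  { urlPrefix: 'http://', fs: httpBackend },
  { urlPrefix: 'https://', fs: httpBackend },
  { urlPrefix: '', fs: { label: 'local' } },
]

console.log(resolve(routeTable, 's3://bucket/file').label) // s3
console.log(resolve(routeTable, './data/file.json').label) // local
```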
Credits
Built with the tree-stream primitives ReadableStreamTree and WritableStreamTree.
Project history
The project started to support @wholebuzz/archive, a terabyte-scale archive for GCS. The focus has since expanded to include powering dbcp and @wholebuzz/mapreduce with a collection of file system implementations under a common interface. The atomic primitives are available only for Google Cloud Storage and the local file system.
Example
import { AnyFileSystem } from '@wholebuzz/fs/lib/fs'
import { GoogleCloudFileSystem } from '@wholebuzz/fs/lib/gcp'
import { HTTPFileSystem } from '@wholebuzz/fs/lib/http'
import { LocalFileSystem } from '@wholebuzz/fs/lib/local'
import { S3FileSystem } from '@wholebuzz/fs/lib/s3'
import { readJSON, writeJSON } from '@wholebuzz/fs/lib/json'
const httpFileSystem = new HTTPFileSystem()
const fs = new AnyFileSystem([
  { urlPrefix: 'gs://', fs: new GoogleCloudFileSystem() },
  { urlPrefix: 's3://', fs: new S3FileSystem() },
  { urlPrefix: 'http://', fs: httpFileSystem },
  { urlPrefix: 'https://', fs: httpFileSystem },
  { urlPrefix: '', fs: new LocalFileSystem() },
])
await writeJSON(fs, 's3://bucket/file', { foo: 'bar' })
const foobar = await readJSON(fs, 's3://bucket/file')
CLI
node lib/cli.js ls .
node lib/cli.js --help
API Reference
Modules
Methods
- appendToFile
- copyFile
- createFile
- ensureDirectory
- fileExists
- getFileStatus
- moveFile
- openReadableFile
- openWritableFile
- queueRemoveFile
- readDirectory
- readDirectoryStream
- removeDirectory
- removeFile
- replaceFile
Constructors
constructor
+ new FileSystem(): FileSystem
Returns: FileSystem
Methods
appendToFile
▸ Abstract appendToFile(urlText: string, writeCallback: (stream: WritableStreamTree) => Promise<boolean>, createCallback?: (stream: WritableStreamTree) => Promise<boolean>, createOptions?: CreateOptions, appendOptions?: AppendOptions): Promise<null | FileStatus>
Appends to the file, safely. Either writeCallback or createCallback is called.
For simple appends, the same parameter can be supplied for both writeCallback and createCallback.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to append to. |
| writeCallback | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for appending to the file. |
| createCallback? | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for initializing the file, if necessary. |
| createOptions? | CreateOptions | Initial metadata for initializing the file, if necessary. |
| appendOptions? | AppendOptions | - |
Returns: Promise<null | FileStatus>
Defined in: src/fs.ts:209
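The rule that exactly one of the two callbacks runs can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the library's implementation: `appendModel`, `Writer`, and the `Map`-backed store are hypothetical, the model is synchronous, and both callbacks are required for simplicity.

```typescript
// A callback receives the chunks to write into and reports success.
type Writer = (chunks: string[]) => boolean

// In-memory model; a Map stands in for the storage backend.
const files = new Map<string, string[]>()

// createCallback runs when the file is missing, writeCallback when it exists.
function appendModel(url: string, writeCallback: Writer, createCallback: Writer): boolean {
  const existing = files.get(url)
  const chunks: string[] = existing === undefined ? [] : existing.slice()
  const callback = existing === undefined ? createCallback : writeCallback
  if (!callback(chunks)) return false // leave the file untouched on failure
  files.set(url, chunks)
  return true
}

const writeHeader: Writer = (chunks) => {
  chunks.push('id,name\n')
  return true
}
const writeRow: Writer = (chunks) => {
  chunks.push('1,alice\n')
  return true
}

appendModel('log.csv', writeRow, writeHeader) // file missing: writeHeader runs
appendModel('log.csv', writeRow, writeHeader) // file exists: writeRow runs
console.log(JSON.stringify(files.get('log.csv')))
```

Passing the same function for both callbacks gives the simple-append case described above.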
copyFile
▸ Abstract copyFile(sourceUrlText: string, destUrlText: string): Promise<boolean>
Copies the file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| sourceUrlText | string | The URL of the source file to copy. |
| destUrlText | string | The destination URL to copy the file to. |
Returns: Promise<boolean>
Defined in: src/fs.ts:178
createFile
▸ Abstract createFile(urlText: string, createCallback?: (stream: WritableStreamTree) => Promise<boolean>, options?: CreateOptions): Promise<boolean>
Creates file, failing if the file already exists.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to create. |
| createCallback? | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for initializing the file. |
| options? | CreateOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:155
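Because creation fails when the file already exists, createFile can serve as a simple claim or lock primitive among several writers. The model below is a hypothetical in-memory sketch of that behavior; `createExclusive` is not part of the library, and returning false on conflict is an assumption (the real method may reject instead).

```typescript
// In-memory model; a Map stands in for the storage backend.
const lockStore = new Map<string, string>()

// Hypothetical model of an exclusive create: it succeeds only if nothing
// exists at the URL yet.
function createExclusive(url: string, content: string): boolean {
  if (lockStore.has(url)) return false
  lockStore.set(url, content)
  return true
}

// Two workers race to claim the same job; exactly one create succeeds.
const workerA = createExclusive('jobs/42.lock', 'worker-a')
const workerB = createExclusive('jobs/42.lock', 'worker-b')
console.log(workerA, workerB, lockStore.get('jobs/42.lock')) // true false worker-a
```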
ensureDirectory
▸ Abstract ensureDirectory(urlText: string, options?: EnsureDirectoryOptions): Promise<boolean>
Ensures the directory exists.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the directory. |
| options? | EnsureDirectoryOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:109
fileExists
▸ Abstract fileExists(urlText: string): Promise<boolean>
Returns true if the file exists.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file whose existence is checked. |
Returns: Promise<boolean>
Defined in: src/fs.ts:121
getFileStatus
▸ Abstract getFileStatus(urlText: string, options?: GetFileStatusOptions): Promise<FileStatus>
Determines the file status. The file version is used to implement atomic mutations.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to retrieve the status for. |
| options? | GetFileStatusOptions | - |
Returns: Promise<FileStatus>
Defined in: src/fs.ts:127
moveFile
▸ Abstract moveFile(sourceUrlText: string, destUrlText: string): Promise<boolean>
Moves the file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| sourceUrlText | string | The URL of the source file to move. |
| destUrlText | string | The destination URL to move the file to. |
Returns: Promise<boolean>
Defined in: src/fs.ts:185
openReadableFile
▸ Abstract openReadableFile(url: string, options?: OpenReadableFileOptions): Promise<ReadableStreamTree>
Opens a file for reading.
When the optional version is given, fails if the version doesn't match (for GCS URLs).
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| url | string | The URL of the file to read from. |
| options? | OpenReadableFileOptions | - |
Returns: Promise<ReadableStreamTree>
Defined in: src/fs.ts:134
openWritableFile
▸ Abstract openWritableFile(url: string, options?: OpenWritableFileOptions): Promise<WritableStreamTree>
Opens a file for writing.
When the optional version is given, fails if the version doesn't match (for GCS URLs).
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| url | string | The URL of the file to write to. |
| options? | OpenWritableFileOptions | - |
Returns: Promise<WritableStreamTree>
Defined in: src/fs.ts:144
queueRemoveFile
▸ Abstract queueRemoveFile(urlText: string): Promise<boolean>
Queues deletion, e.g. after DaysSinceCustomTime.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to remove. |
Returns: Promise<boolean>
Defined in: src/fs.ts:171
readDirectory
▸ Abstract readDirectory(urlText: string, options?: ReadDirectoryOptions): Promise<DirectoryEntry[]>
Returns the URLs of the files in a directory.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the directory to list files in. |
| options? | ReadDirectoryOptions | - |
Returns: Promise<DirectoryEntry[]>
Defined in: src/fs.ts:94
readDirectoryStream
▸ Abstract readDirectoryStream(urlText: string, options?: ReadDirectoryOptions): Promise<ReadableStreamTree>
Returns a stream of the URLs of the files in a directory.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the directory to list files in. |
| options? | ReadDirectoryOptions | - |
Returns: Promise<ReadableStreamTree>
Defined in: src/fs.ts:100
removeDirectory
▸ Abstract removeDirectory(urlText: string, options?: RemoveDirectoryOptions): Promise<boolean>
Removes the directory.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the directory. |
| options? | RemoveDirectoryOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:115
removeFile
▸ Abstract removeFile(urlText: string): Promise<boolean>
Deletes the file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to remove. |
Returns: Promise<boolean>
Defined in: src/fs.ts:165
replaceFile
▸ Abstract replaceFile(urlText: string, writeCallback: (stream: WritableStreamTree) => Promise<boolean>, options?: ReplaceFileOptions): Promise<boolean>
Replaces the file, failing if the file version doesn't match.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to replace. |
| writeCallback | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for replacing the file. |
| options? | ReplaceFileOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:194
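A version-checked replace is a compare-and-swap: the write lands only if the file is still at the version the caller last saw. The sketch below models that with an in-memory store. The names `replaceIfVersion` and `Stored` and the numeric version counter are assumptions for illustration; the shape of the real version token is not specified here.

```typescript
interface Stored {
  content: string
  version: number
}

// In-memory model; a Map stands in for the storage backend.
const objects = new Map<string, Stored>()
objects.set('counter.json', { content: '{"n":1}', version: 1 })

// Hypothetical model of a version-checked replace: the write is applied only
// if the caller's expected version still matches the stored one.
function replaceIfVersion(url: string, expected: number, content: string): boolean {
  const current = objects.get(url)
  if (current === undefined || current.version !== expected) return false
  objects.set(url, { content, version: current.version + 1 })
  return true
}

// Two writers both read version 1; only the first replace succeeds.
const firstWriter = replaceIfVersion('counter.json', 1, '{"n":2}')
const secondWriter = replaceIfVersion('counter.json', 1, '{"n":2}')
console.log(firstWriter, secondWriter) // true false
```

The losing writer would typically re-read the file status, recompute its update, and retry.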
Module: json
Table of contents
Variables
Functions
- newJSONLinesFormatter
- newJSONLinesParser
- parseJSON
- parseJSONLines
- pipeJSONFormatter
- pipeJSONLinesFormatter
- pipeJSONLinesParser
- pipeJSONParser
- readJSON
- readJSONHashed
- readJSONLines
- serializeJSON
- serializeJSONLines
- writeJSON
- writeJSONLines
- writeShardedJSONLines
Variables
JSONStream
• Const JSONStream: any
Defined in: src/json.ts:11
Functions
newJSONLinesFormatter
▸ Const newJSONLinesFormatter(): Transform
Returns: Transform
Defined in: src/json.ts:146
newJSONLinesParser
▸ Const newJSONLinesParser(): ThroughStream
Returns: ThroughStream
Defined in: src/json.ts:147
parseJSON
▸ parseJSON(stream: ReadableStreamTree): Promise<unknown>
Parses JSON object from stream. Used to implement readJSON.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| stream | ReadableStreamTree | The stream to read a JSON object from. |
Returns: Promise<unknown>
Defined in: src/json.ts:72
parseJSONLines
▸ parseJSONLines(stream: ReadableStreamTree): Promise<unknown[]>
Parses JSON Lines from stream. Used to implement readJSONLines.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| stream | ReadableStreamTree | The stream to read JSON Lines from. |
Returns: Promise<unknown[]>
Defined in: src/json.ts:80
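For readers unfamiliar with the JSON Lines format, here is a minimal sketch of what parsing it involves. It works on a string rather than a ReadableStreamTree, and `parseJSONLinesText` is a hypothetical helper, not a function exported by this module.

```typescript
// Parse JSON Lines text: one JSON value per non-empty line.
function parseJSONLinesText(text: string): unknown[] {
  return text
    .split('\n')
    .map((line) => line.trim()) // also drops a trailing '\r'
    .filter((line) => line.length > 0) // skip blank lines
    .map((line) => JSON.parse(line) as unknown)
}

const parsedRows = parseJSONLinesText('{"id":1}\n{"id":2}\n')
console.log(parsedRows.length) // 2
```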
pipeJSONFormatter
▸ pipeJSONFormatter(stream: WritableStreamTree, isArray: boolean): WritableStreamTree
Create JSON formatter stream.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| stream | WritableStreamTree | - |
| isArray | boolean | Accept array objects or property tuples. |
Returns: WritableStreamTree
Defined in: src/json.ts:127
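The isArray description is terse, so here is one plausible reading as a sketch: with isArray set, each written item becomes an element of a top-level JSON array; otherwise each item is a [key, value] tuple that becomes a property of one top-level object. The `formatItems` function is hypothetical and non-streaming, and the exact whitespace the real formatter emits is not modeled.

```typescript
// Hypothetical, non-streaming model of the two formatter modes.
function formatItems(items: unknown[], isArray: boolean): string {
  if (isArray) {
    // Each item becomes one element of a top-level JSON array.
    return '[' + items.map((item) => JSON.stringify(item)).join(',') + ']'
  }
  // Each item is a [key, value] tuple that becomes one property.
  const props = (items as [string, unknown][]).map(
    ([key, value]) => JSON.stringify(key) + ':' + JSON.stringify(value)
  )
  return '{' + props.join(',') + '}'
}

console.log(formatItems([{ a: 1 }, { a: 2 }], true)) // [{"a":1},{"a":2}]
console.log(formatItems([['a', 1], ['b', 2]], false)) // {"a":1,"b":2}
```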
pipeJSONLinesFormatter
▸ pipeJSONLinesFormatter(stream: WritableStreamTree): WritableStreamTree
Create JSON-lines formatter stream.
Parameters
| Name | Type |
| :------ | :------ |
| stream | WritableStreamTree |
Returns: WritableStreamTree
Defined in: src/json.ts:142
pipeJSONLinesParser
▸ pipeJSONLinesParser(stream: ReadableStreamTree): ReadableStreamTree
Create JSON-lines parser stream.
Parameters
| Name | Type |
| :------ | :------ |
| stream | ReadableStreamTree |
Returns: ReadableStreamTree
Defined in: src/json.ts:119
pipeJSONParser
▸ pipeJSONParser(stream: ReadableStreamTree, isArray: boolean): ReadableStreamTree
Create JSON parser stream.
Parameters
| Name | Type |
| :------ | :------ |
| stream | ReadableStreamTree |
| isArray | boolean |
Returns: ReadableStreamTree
Defined in: src/json.ts:110
readJSON
▸ readJSON(fileSystem: FileSystem, url: string): Promise<unknown>
Reads a serialized JSON object or array from a file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON object or array from. |
Returns: Promise<unknown>
Defined in: src/json.ts:17
readJSONHashed
▸ readJSONHashed(fileSystem: FileSystem, url: string): Promise<[unknown, null | string]>
Reads a serialized JSON object from a file, and also hashes the file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON object from. |
Returns: Promise<[unknown, null | string]>
Defined in: src/json.ts:25
readJSONLines
▸ readJSONLines(fileSystem: FileSystem, url: string): Promise<unknown[]>
Reads a serialized JSON-lines array from a file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON-lines array from. |
Returns: Promise<unknown[]>
Defined in: src/json.ts:35
serializeJSON
▸ serializeJSON(stream: WritableStreamTree, obj: object | any[]): Promise<boolean>
Serializes JSON object to stream. Used to implement writeJSON.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| stream | WritableStreamTree | The stream to write a JSON object to. |
| obj | object | any[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:88
serializeJSONLines
▸ serializeJSONLines(stream: WritableStreamTree, obj: any[]): Promise<boolean>
Serializes array to stream as JSON Lines. Used to implement writeJSONLines.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| stream | WritableStreamTree | The stream to write a JSON object to. |
| obj | any[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:103
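As a sketch of what JSON Lines serialization produces, the hypothetical `toJSONLines` helper below writes one JSON value per line into a string instead of a WritableStreamTree. It also shows why the format is safe for values that contain newlines: JSON.stringify escapes them, so every record stays on a single line.

```typescript
// Serialize each value as one line of JSON, terminated by a newline.
function toJSONLines(values: unknown[]): string {
  return values.map((value) => JSON.stringify(value) + '\n').join('')
}

const serialized = toJSONLines([{ id: 1 }, { note: 'two\nlines' }])

// The embedded newline is escaped as \n inside the JSON string, so the
// output still has exactly one record per line.
const recordLines = serialized.split('\n').filter((line) => line.length > 0)
console.log(recordLines.length) // 2
```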
writeJSON
▸ writeJSON(fileSystem: FileSystem, url: string, value: object | any[]): Promise<boolean>
Serializes object or array to a JSON file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| fileSystem | FileSystem | - |
| url | string | The URL of the file to serialize a JSON object or array to. |
| value | object | any[] | The object or array to serialize. |
Returns: Promise<boolean>
Defined in: src/json.ts:44
writeJSONLines
▸ writeJSONLines(fileSystem: FileSystem, url: string, obj: object[]): Promise<boolean>
Serializes array to a JSON Lines file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| fileSystem | FileSystem | - |
| url | string | The URL of the file to serialize a JSON array to. |
| obj | object[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:53
writeShardedJSONLines
▸ writeShardedJSONLines(fileSystem: FileSystem, url: string, obj: object[], shards: number, shardFunction?: (x: object, modulus: number) => number): Promise<boolean>
Parameters
| Name | Type |
| :------ | :------ |
| fileSystem | FileSystem |
| url | string |
| obj | object[] |
| shards | number |
| shardFunction | (x: object, modulus: number) => number |
Returns: Promise<boolean>
Defined in: src/json.ts:57
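No description is given for writeShardedJSONLines, but the shardFunction signature suggests it maps each object to a shard index below the modulus. The sketch below shows one such function and the partitioning step it implies. `shardById`, `partition`, and the `id` field are hypothetical; how the library names the output files and what default shard function it uses are not covered.

```typescript
interface Row {
  id: string
}

// Hypothetical shard function matching (x, modulus) => number: hash the
// row's id and reduce it to a shard index in [0, modulus).
function shardById(x: object, modulus: number): number {
  const id = (x as Row).id
  let hash = 0
  for (let i = 0; i < id.length; i++) {
    hash = (Math.imul(hash, 31) + id.charCodeAt(i)) >>> 0
  }
  return hash % modulus
}

// Partition rows into one bucket per shard.
function partition(
  rows: object[],
  shards: number,
  shardFunction: (x: object, modulus: number) => number
): object[][] {
  const buckets: object[][] = []
  for (let i = 0; i < shards; i++) buckets.push([])
  for (const row of rows) buckets[shardFunction(row, shards)].push(row)
  return buckets
}

const shardBuckets = partition([{ id: 'a' }, { id: 'b' }, { id: 'c' }, { id: 'a' }], 2, shardById)
console.log(shardBuckets.map((bucket) => bucket.length)) // [ 1, 3 ]
```

Because the index depends only on the id, rows with the same id always land in the same shard, which is what makes later per-shard merging or joining possible.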
