@sanity/import
v6.0.0
Import documents to a Sanity dataset
Imports documents from an ndjson stream into a Sanity dataset.
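ndjson (newline-delimited JSON) stores one complete JSON document per line. The sketch below does not use `@sanity/import`; it only shows how an array of documents maps to that format and back. The helper names `toNdjson` and `parseNdjson` and the sample documents are hypothetical.

```javascript
// Hypothetical helpers illustrating the ndjson format: one JSON document per line.
function toNdjson(documents) {
  // JSON.stringify without indentation keeps each document on a single line
  return documents.map((doc) => JSON.stringify(doc)).join('\n') + '\n'
}

function parseNdjson(text) {
  const documents = []
  text.split('\n').forEach((line, index) => {
    // Skip blank lines, such as the empty string after the final newline
    if (line.trim() === '') return
    try {
      documents.push(JSON.parse(line))
    } catch (err) {
      // Report the 1-based line number so a corrupt export is easy to locate
      throw new Error(`Invalid JSON on line ${index + 1}: ${err.message}`)
    }
  })
  return documents
}

const docs = [
  {_id: 'movie-1', _type: 'movie', title: 'Alien'},
  {_id: 'person-1', _type: 'person', name: 'Sigourney Weaver'},
]
const ndjson = toNdjson(docs)
console.log(ndjson)
// {"_id":"movie-1","_type":"movie","title":"Alien"}
// {"_id":"person-1","_type":"person","name":"Sigourney Weaver"}
```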
Requirements
- Node.js >= 20.19.1 (or >= 22.12 for Node 22)
Installing
```sh
npm install --save @sanity/import
```

Usage
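Two options in the usage example below, `allowReplacementCharacters` and `allowSystemDocuments`, exist to catch suspicious input. If you would rather inspect an array of documents yourself before importing it, a minimal sketch could look like this. It assumes that system documents are recognizable by an `_id` starting with `_.`, which is an assumption about Sanity's ID conventions rather than something this README states. The function names are hypothetical and not part of `@sanity/import`.

```javascript
// Hypothetical pre-flight check; not part of the @sanity/import API.
const REPLACEMENT_CHARACTER = '\uFFFD'

// Recursively collect the dotted paths of string values that contain U+FFFD
function findReplacementCharacters(value, path = []) {
  if (typeof value === 'string') {
    return value.includes(REPLACEMENT_CHARACTER) ? [path.join('.')] : []
  }
  if (Array.isArray(value)) {
    return value.flatMap((item, index) => findReplacementCharacters(item, [...path, String(index)]))
  }
  if (value !== null && typeof value === 'object') {
    return Object.entries(value).flatMap(([key, child]) =>
      findReplacementCharacters(child, [...path, key]),
    )
  }
  return []
}

function findSuspiciousDocuments(documents) {
  const report = {replacementCharacters: [], systemDocuments: []}
  for (const doc of documents) {
    // Assumption: system documents have IDs starting with "_."
    if (typeof doc._id === 'string' && doc._id.startsWith('_.')) {
      report.systemDocuments.push(doc._id)
    }
    for (const path of findReplacementCharacters(doc)) {
      report.replacementCharacters.push({id: doc._id, path})
    }
  }
  return report
}

const report = findSuspiciousDocuments([
  {_id: 'movie-1', _type: 'movie', title: 'Caf\uFFFD au lait'},
  {_id: '_.groups.example'}, // hypothetical system document ID
])
console.log(report)
```

The check only reports what it finds; whether to clean the data or deliberately set the corresponding option to `true` remains your call.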
```javascript
import fs from 'node:fs'
import {createClient} from '@sanity/client'
import {sanityImport} from '@sanity/import'

const client = createClient({
  projectId: '<your project id>',
  dataset: '<your target dataset>',
  token: '<token-with-write-perms>',
  useCdn: false,
})

// Input can be a readable stream (for a `.tar.gz` or `.ndjson` file), a folder location (string), or an array of documents
const input = fs.createReadStream('my-documents.ndjson')

const options = {
  /**
   * A Sanity client instance, preconfigured with the project ID and dataset
   * you want to import data to, and with a token that has write access.
   */
  client: client,

  /**
   * Which mutation type to use for creating documents:
   * `create` (default) - throws an error if a document with the same ID already exists
   * `createOrReplace` - replaces existing documents that have the same IDs
   * `createIfNotExists` - skips documents whose IDs already exist
   *
   * Optional.
   */
  operation: 'create',

  /**
   * Function called as the import makes progress. It receives an object of
   * the following shape:
   * `step` (string) - the name of the current step of the import process
   * `current` (number) - the current progress of the step, only present on some steps
   * `total` (number) - the total number of items in the step, only present on some steps
   */
  onProgress: (progress) => {
    /* report progress */
  },

  /**
   * Whether or not to allow assets in different datasets. This is usually
   * an error in the export, where asset documents are part of the export.
   *
   * Optional, defaults to `false`.
   */
  allowAssetsInDifferentDataset: false,

  /**
   * Whether or not to allow Unicode replacement characters (U+FFFD) in imported
   * documents. These are often a sign of a corrupt export.
   *
   * Optional, defaults to `false`.
   */
  allowReplacementCharacters: false,

  /**
   * Whether or not to allow assets that fail to download or upload.
   *
   * Optional, defaults to `false`.
   */
  allowFailingAssets: false,

  /**
   * Whether or not to replace any existing assets with the same hash.
   * Setting this to `true` regenerates image metadata on the server,
   * but slows down the import.
   *
   * Optional, defaults to `false`.
   */
  replaceAssets: false,

  /**
   * Whether or not to skip cross-dataset references. This may be required
   * when importing a dataset with cross-dataset references into a different
   * project, unless a dataset with the referenced name exists there.
   *
   * Optional, defaults to `false`.
   */
  skipCrossDatasetReferences: false,

  /**
   * Whether or not to import system documents (such as permissions, custom retention
   * and content releases). This is usually not necessary, and may cause conflicts if
   * the target dataset already contains these documents. On a new dataset, it is
   * recommended to re-create roles and any custom retention policies manually.
   *
   * Optional, defaults to `false`.
   */
  allowSystemDocuments: false,
}

sanityImport(input, options)
  .then(({numDocs, warnings}) => {
    console.log('Imported %d documents', numDocs)
    // Note: there might be warnings! Check `warnings`
  })
  .catch((err) => {
    console.error('Import failed: %s', err.message)
  })
```

Future improvements
- When documents are imported, record which IDs are actually touched
  - Only upload assets for documents that are still within that window
  - Only strengthen references for documents that are within that window
  - Only count the number of imported documents from within that window
- Asset uploads and reference strengthening can be done in parallel, but we need a way to cancel the operations if one of them fails
- Introduce retrying of asset uploads based on hash + indexing delay
- Validate that the dataset exists on start
- Reference verification
  - Create a set of all document IDs in the import file
  - Create a set of all document IDs in references
  - Create a set of referenced IDs that do not exist locally
  - Batch-wise, check whether documents with the missing IDs exist remotely
  - When all missing IDs have been cross-checked with the remote API (or a maximum of, say, 100 items have been found missing), reject with a useful error message
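The reference-verification steps listed above can be sketched roughly as follows. This is not how `@sanity/import` works today; it assumes a reference is any object with a string `_ref` property, and it replaces the remote API call with a hypothetical synchronous `existsRemotely` callback so the example is self-contained (a real implementation would query the API asynchronously and reject a promise). The function names and the default batch size of 50 are hypothetical; the cap of 100 comes from the list above.

```javascript
// Hypothetical sketch of the reference-verification steps; not part of @sanity/import.
// Assumption: a reference is any object with a string `_ref` property.
function collectReferencedIds(value, into = new Set()) {
  if (Array.isArray(value)) {
    value.forEach((item) => collectReferencedIds(item, into))
  } else if (value !== null && typeof value === 'object') {
    if (typeof value._ref === 'string') {
      into.add(value._ref)
    }
    Object.values(value).forEach((child) => collectReferencedIds(child, into))
  }
  return into
}

// `existsRemotely(ids)` is a hypothetical synchronous stand-in for an API lookup;
// it must return a Set containing those of the given IDs that exist in the target dataset.
function verifyReferences(documents, existsRemotely, {batchSize = 50, maxMissing = 100} = {}) {
  // Set of all document IDs in the import file
  const documentIds = new Set(documents.map((doc) => doc._id))
  // Set of all document IDs in references
  const referencedIds = new Set()
  documents.forEach((doc) => collectReferencedIds(doc, referencedIds))
  // Referenced IDs that do not exist locally
  const missingLocally = [...referencedIds].filter((id) => !documentIds.has(id))

  // Batch-wise, check which of those exist remotely; stop once maxMissing are confirmed missing
  const missing = []
  for (let i = 0; i < missingLocally.length && missing.length < maxMissing; i += batchSize) {
    const batch = missingLocally.slice(i, i + batchSize)
    const found = existsRemotely(batch)
    missing.push(...batch.filter((id) => !found.has(id)))
  }

  if (missing.length > 0) {
    throw new Error(`References to missing documents: ${missing.slice(0, maxMissing).join(', ')}`)
  }
}

const documents = [
  {_id: 'movie-1', _type: 'movie', director: {_type: 'reference', _ref: 'person-1'}},
  {_id: 'person-1', _type: 'person'},
  {
    _id: 'movie-2',
    _type: 'movie',
    cast: [
      {_key: 'a', _type: 'reference', _ref: 'remote-person'},
      {_key: 'b', _type: 'reference', _ref: 'person-404'},
    ],
  },
]
// Pretend that only IDs starting with "remote-" exist in the target dataset
const existsRemotely = (ids) => new Set(ids.filter((id) => id.startsWith('remote-')))

try {
  verifyReferences(documents, existsRemotely)
  console.log('All references resolved')
} catch (err) {
  console.error(err.message) // References to missing documents: person-404
}
```

Collecting IDs into `Set`s keeps each membership check constant-time, and batching the remaining candidates bounds the number of IDs sent per remote request.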
License
MIT-licensed. See LICENSE.
