apify-schema-tools
v3.1.0
Apify schema managing tools.
Apify Schema Tools
This tool is intended for developers of Apify Actors.
It generates JSON schemas and TypeScript types for the input and the dataset from a single source of truth, and adds a few extra features.
As a quick example, assume you have a project that looks like this:
my-project
├── .actor
│   ├── actor.json
│   ├── dataset_schema.json
│   └── input_schema.json
└── src-schemas
    ├── dataset-item.json <-- source file for dataset
    └── input.json <-- source file for input
After running this script, you will have:
my-project
├── .actor
│   ├── actor.json
│   ├── dataset_schema.json <-- updated with the definitions from src-schemas
│   └── input_schema.json <-- updated with the definitions from src-schemas
├── src
│   └── generated
│       ├── dataset.ts <-- TypeScript types generated from src-schemas
│       ├── input-utils.ts <-- utilities to fill input default values
│       └── input.ts <-- TypeScript types generated from src-schemas
└── src-schemas
    ├── dataset-item.json
    └── input.json
Quickstart
These instructions will quickly get you to a point where you can use
apify-schema-tools to generate your schemas and TypeScript types.
Let's assume you are starting from a new project created from an Apify template.
- Install apify-schema-tools:
  npm i -D apify-schema-tools
- Initialize your project with default settings:
  npx apify-schema-tools init
  This command will:
  - Create a src-schemas folder with input.json and dataset-item.json files.
  - Create the necessary .actor files if they don't exist.
  - Add configuration to your package.json.
  - Add a generate script to your package.json.
- Generate JSON schemas and TypeScript types from the source schemas:
  npx apify-schema-tools sync
- Now, you will be able to use TypeScript types and utilities in your project:
import { Actor } from 'apify';
import type { DatasetItem } from './generated/dataset.ts';
import type { Input } from './generated/input.ts';
import { getInputWithDefaultValues, type InputWithDefaults } from './generated/input-utils.ts';

await Actor.init();

// Read the Actor input and fill in the default values declared in the source schema.
const input: InputWithDefaults = getInputWithDefaultValues(await Actor.getInput<Input>());

// ...

// Push an item that matches the generated DatasetItem type.
await Actor.pushData<DatasetItem>({
    title: '...',
    url: '...',
    text: '...',
    timestamp: '...',
});

await Actor.exit();
Configuration
You can configure apify-schema-tools in two ways:
Using package.json configuration
The init command automatically adds configuration to your package.json. You can also manually add an apify-schema-tools section to customize the behavior:
{
  "name": "my-actor",
  "version": "1.0.0",
  "apify-schema-tools": {
    "input": ["input", "dataset"],
    "output": ["json-schemas", "ts-types"],
    "srcInput": "src-schemas/input.json",
    "srcDataset": "src-schemas/dataset-item.json",
    "outputTSDir": "src/generated",
    "includeInputUtils": true
  }
}
Using command-line arguments
You can also pass options directly to the sync command. You can check which options are available:
$ npx apify-schema-tools --help
usage: apify-schema-tools [-h] {init,sync,check} ...

Apify Schema Tools - Generate JSON schemas and TypeScript files for Actor input and output dataset.

positional arguments:
  {init,sync,check}
    init               Initialize the Apify Schema Tools project with default settings.
    sync               Generate JSON schemas and TypeScript files from the source schemas.
    check              Check the schemas for consistency and correctness.

optional arguments:
  -h, --help           show this help message and exit

$ npx apify-schema-tools sync --help
usage: apify-schema-tools sync [-h] [-i [{input,dataset} ...]] [-o [{json-schemas,ts-types} ...]] [--src-input SRC_INPUT] [--src-dataset SRC_DATASET] [--add-input ADD_INPUT] [--add-dataset ADD_DATASET] [--input-schema INPUT_SCHEMA] [--dataset-schema DATASET_SCHEMA] [--output-ts-dir OUTPUT_TS_DIR]
                               [--deep-merge] [--include-input-utils {true,false}]

optional arguments:
  -h, --help            show this help message and exit
  -i [{input,dataset} ...], --input [{input,dataset} ...]
                        specify which sources to use for generation (default: input,dataset)
  -o [{json-schemas,ts-types} ...], --output [{json-schemas,ts-types} ...]
                        specify what to generate (default: json-schemas,ts-types)
  --src-input SRC_INPUT
                        path to the input schema source file (default: src-schemas/input.json)
  --src-dataset SRC_DATASET
                        path to the dataset schema source file (default: src-schemas/dataset-item.json)
  --add-input ADD_INPUT
                        path to an additional schema to merge into the input schema (default: undefined)
  --add-dataset ADD_DATASET
                        path to an additional schema to merge into the dataset schema (default: undefined)
  --input-schema INPUT_SCHEMA
                        the path of the destination input schema file (default: .actor/input_schema.json)
  --dataset-schema DATASET_SCHEMA
                        the path of the destination dataset schema file (default: .actor/dataset_schema.json)
  --output-ts-dir OUTPUT_TS_DIR
                        path where to save generated TypeScript files (default: src/generated)
  --deep-merge          whether to deep merge additional schemas into the main schema (default: false)
  --include-input-utils {true,false}
                        include input utilities in the generated TypeScript files: 'input' input and 'ts-types' output are required (default: true)
Setting up your project manually
If you prefer to set up your project manually instead of using the init command, you can follow these steps:
- Create a src-schemas folder:
  mkdir src-schemas
- Create the files input.json and dataset-item.json inside the src-schemas folder. Here is some example content:
{
  "title": "Input schema for Web Scraper",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "type": "array",
      "title": "Start URLs",
      "description": "List of URLs to scrape",
      "default": [],
      "editor": "requestListSources",
      "items": {
        "type": "object",
        "properties": {
          "url": { "type": "string" }
        }
      }
    }
  },
  "required": ["startUrls"],
  "additionalProperties": false
}

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Dataset schema for Web Scraper",
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "title": "Title",
      "description": "Page title"
    },
    "url": {
      "type": "string",
      "title": "URL",
      "description": "Page URL"
    },
    "text": {
      "type": "string",
      "title": "Text content",
      "description": "Extracted text"
    },
    "timestamp": {
      "type": "string",
      "title": "Timestamp",
      "description": "When the data was scraped"
    }
  },
  "required": ["title", "url"]
}
- Create the file .actor/dataset_schema.json and enter some empty content:
{
  "actorSpecification": 1,
  "fields": {},
  "views": {}
}
- Link the dataset schema in .actor/actor.json:
{
  "actorSpecification": 1,
  "...": "...",
  "input": "./input_schema.json",
  "storages": {
    "dataset": "./dataset_schema.json"
  },
  "...": "..."
}
- Generate JSON schemas and TypeScript types from the source schemas:
  npx apify-schema-tools sync
Resolving conflicts
The sync command includes interactive conflict resolution to help you handle schema inconsistencies.
When the tool detects conflicts between your source schemas and existing target schemas,
it will prompt you to choose which version to keep.
When conflicts are detected
Conflicts occur when there are differences between your source schema files and the schemas that would be generated in the target locations. Common scenarios include:
- The source and the target schema have a different title or description.
- The same property has a different title or description in the source and target schemas.
- Properties that exist in the target schema are missing from the source schema.
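The scenarios above can be pictured as a recursive comparison of the two schemas. The following is only a sketch under stated assumptions, not the tool's actual implementation: the SchemaNode and Conflict types and the findConflicts function are hypothetical names, and the sketch assumes a field counts as a conflict only when the target already holds a different value.

```typescript
// Hypothetical types and names, for illustration only.
interface SchemaNode {
  title?: string;
  description?: string;
  properties?: Record<string, SchemaNode>;
}

interface Conflict {
  kind: 'field-differs' | 'removed-from-source';
  path: string; // e.g. "properties > startUrls > description"
}

function findConflicts(source: SchemaNode, target: SchemaNode, path: string[] = []): Conflict[] {
  const conflicts: Conflict[] = [];

  // Scenarios 1 and 2: title or description differ (on the schema or on a property).
  // Assumption: a field that is absent from the target is not reported.
  const fields = ['title', 'description'] as const;
  for (const field of fields) {
    if (target[field] !== undefined && source[field] !== target[field]) {
      conflicts.push({ kind: 'field-differs', path: [...path, field].join(' > ') });
    }
  }

  // Scenario 3: a property exists in the target but is missing from the source.
  const sourceProps = source.properties ?? {};
  for (const [name, targetProp] of Object.entries(target.properties ?? {})) {
    const propPath = [...path, 'properties', name];
    const sourceProp = sourceProps[name];
    if (sourceProp === undefined) {
      conflicts.push({ kind: 'removed-from-source', path: propPath.join(' > ') });
    } else {
      conflicts.push(...findConflicts(sourceProp, targetProp, propPath));
    }
  }
  return conflicts;
}

const source: SchemaNode = {
  title: 'Input',
  properties: { startUrls: { description: 'List of URLs to scrape' } },
};
const target: SchemaNode = {
  title: 'Input',
  properties: {
    startUrls: { description: 'List of URLs to parse' },
    searchTerm: { title: 'Search term' },
  },
};

const conflicts = findConflicts(source, target);
console.log(conflicts);
```

In this example the sketch reports one differing description and one property that was removed from the source.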
Interactive mode (default behavior)
By default, when conflicts are detected, the tool will prompt you interactively to resolve each conflict:
⚠️ Field [properties > startUrls > description] in the source schema differs from
the target schema. Choose which to keep: (Use arrow keys)
❯ [source] List of URLs to scrape
  [target] List of URLs to parse

⚠️ Property "searchTerm" was removed from the source schema. What do you want to do? (Use arrow keys)
❯ Confirm deletion
  Restore field
Non-interactive modes
For automated scripts or CI/CD pipelines, you can use these options:
Force mode (--force)
Automatically resolves all conflicts by preferring the source schema:
npx apify-schema-tools sync --force
This will:
- Always use values from the source schema when there are conflicts
- Remove properties that exist in target but not in source
- Overwrite target schemas without prompting
Fail on conflict (--fail-on-conflict)
Stops execution and exits with an error code when conflicts are detected:
npx apify-schema-tools sync --fail-on-conflict
Checking if the schemas are in sync with the source schemas
The check command allows you to verify that your generated schemas and TypeScript files are up-to-date with your source schemas.
This is particularly useful in CI/CD pipelines to ensure that developers haven't forgotten to run the generation after making changes to the source schemas.
npx apify-schema-tools check
The check command will:
- Compare the current generated files with what would be generated from the source schemas
- Exit with code 0 if everything is in sync
- Exit with code 1 if there are differences, showing you which files are out of sync
You can add this to your CI pipeline to automatically detect when schemas need to be regenerated:
{
  "scripts": {
    "generate": "apify-schema-tools sync",
    "check-schemas": "apify-schema-tools check",
    "test": "npm run check-schemas && npm run test:unit"
  }
}
The check command accepts the same configuration options as the sync command,
either through package.json configuration or command-line arguments,
ensuring it checks the same files that would be generated.
Ignoring descriptions while checking (--ignore-descriptions)
The check command can ignore the title and description fields in the source and target schemas, and their properties.
This allows you to edit your descriptions and change how your Actor will appear on the Apify platform,
without having to run this tool to synchronize the schemas, while still being able to check for semantic correctness:
npx apify-schema-tools check --ignore-descriptions
The next time someone runs the sync command,
they will be prompted to resolve the conflicts in the descriptions.
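One way to picture what --ignore-descriptions compares is to strip every title and description keyword from both schemas before checking equality. This is a minimal sketch that assumes this is roughly the behavior; stripDescriptions and sameIgnoringDescriptions are hypothetical helpers, not part of apify-schema-tools. Note that keys directly inside a "properties" map are property names, so a property that happens to be called "title" must be kept.

```typescript
// Hypothetical helpers, for illustration only.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function stripDescriptions(node: Json, isPropertyMap = false): Json {
  if (Array.isArray(node)) return node.map((item) => stripDescriptions(item));
  if (node === null || typeof node !== 'object') return node;

  const result: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(node)) {
    // Drop the keywords, but never drop a property that is merely named "title".
    if (!isPropertyMap && (key === 'title' || key === 'description')) continue;
    result[key] = stripDescriptions(value, !isPropertyMap && key === 'properties');
  }
  return result;
}

// Simplification: JSON.stringify is sensitive to key order.
function sameIgnoringDescriptions(a: Json, b: Json): boolean {
  return JSON.stringify(stripDescriptions(a)) === JSON.stringify(stripDescriptions(b));
}

const sourceSchema: Json = {
  title: 'Dataset schema',
  type: 'object',
  properties: {
    title: { type: 'string', description: 'Page title' },
  },
};
const editedTarget: Json = {
  title: 'Dataset schema (edited)',
  type: 'object',
  properties: {
    title: { type: 'string', description: 'Title of the scraped page' },
  },
};
const changedTarget: Json = {
  title: 'Dataset schema',
  type: 'object',
  properties: {
    title: { type: 'number', description: 'Page title' },
  },
};

console.log(sameIgnoringDescriptions(sourceSchema, editedTarget));
console.log(sameIgnoringDescriptions(sourceSchema, changedTarget));
```

The first comparison differs only in wording, the second changes a type, so only the second would count as out of sync in this sketch.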
Extra features
Keep only allowed properties in Input schema
The tool keeps only the properties that the Apify Input schema allows. As an example, when type is "array", the property items is forbidden if editor is different from "select".
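The rule in this example can be sketched as a small filter. This is an illustration of that single rule only, with hypothetical names (InputProperty, keepAllowedKeys); the rules the tool actually applies are not listed here.

```typescript
// Hypothetical names, for illustration only.
interface InputProperty {
  type: string;
  editor?: string;
  items?: unknown;
  [key: string]: unknown;
}

// Drops "items" from array properties whose editor is not "select".
function keepAllowedKeys(property: InputProperty): InputProperty {
  if (property.type === 'array' && property.editor !== 'select' && 'items' in property) {
    const copy: InputProperty = { ...property };
    delete copy.items;
    return copy;
  }
  return property;
}

const startUrls: InputProperty = {
  type: 'array',
  title: 'Start URLs',
  editor: 'requestListSources',
  items: { type: 'object', properties: { url: { type: 'string' } } },
};
const choices: InputProperty = {
  type: 'array',
  editor: 'select',
  items: { type: 'string' },
};

console.log(keepAllowedKeys(startUrls)); // "items" is dropped
console.log(keepAllowedKeys(choices)); // "items" is kept
```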
Merge a second schema into the main one
This feature is useful when working in monorepos. It allows you to define a single common schema across all the Actors in the repo, and to add or override the title, the description, and some properties, when necessary.
To use it, use the parameters --add-input and --add-dataset, e.g.:
npx apify-schema-tools sync \
  --input input,dataset \
  --output json-schemas,ts-types \
  --src-input ../src-schemas/input.json \
  --src-dataset ../src-schemas/dataset-item.json \
  --add-input src-schemas/input.json \
  --add-dataset src-schemas/dataset-item.json
You can also define the order of the properties in the merged schema.
To do so, add a position field to the properties. The script will follow these rules:
- Properties without a position, or with the same position, keep the order in which they appear in the source schemas, with the ones from the additional schema after the ones from the base schema.
- When some properties have a position and others do not, the ones without a position appear at the end.
- When a property is overridden, its position is overridden as well.
An example:
# Source input schema
{
  "title": "My input schema",
  "description": "My input properties",
  "type": "object",
  "properties": {
    "a": { "type": "string", "position": 3 },
    "b": { "type": "string" }, // will be last, because it has no position
    "c": { "type": "string", "position": 1 }
  },
  "required": ["a"],
  "additionalProperties": false
}
# Additional input schema
{
  "description": "My input properties, a bit changed", // will override the description
  "type": "object",
  "properties": {
    "c": { "type": "boolean", "position": 5 }, // will override also the position
    "d": { "type": "string", "position": 1 } // will be first
  },
  "required": ["c", "d"], // will be merged to the source required parameters
  "additionalProperties": false
}
# Final input schema
{
  "title": "My input schema",
  "description": "My input properties, a bit changed",
  "type": "object",
  "properties": {
    "d": { "type": "string" },
    "a": { "type": "string" },
    "c": { "type": "boolean" },
    "b": { "type": "string" }
  },
  "required": ["a", "c", "d"],
  "additionalProperties": false
}
Use the option --deep-merge to merge object properties and array items, instead of overwriting every definition.
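The position rules and the example above can be reproduced with a shallow merge followed by a stable sort. This is a minimal sketch, not the tool's actual code: Property, Properties and mergeAndOrder are hypothetical names, and the sketch assumes the default behavior (no --deep-merge), where an overriding property replaces the base definition entirely.

```typescript
// Hypothetical names, for illustration only.
interface Property {
  type: string;
  position?: number;
  [key: string]: unknown;
}
type Properties = Record<string, Property>;

function mergeAndOrder(base: Properties, additional: Properties): Properties {
  // Shallow merge: an overriding property replaces the base one, position included.
  // Base keys keep their order, keys new in the additional schema come after them.
  const merged: Properties = { ...base, ...additional };

  // Properties without a position rank last.
  const rank = (p: Property): number => p.position ?? Number.POSITIVE_INFINITY;

  // Array.prototype.sort is stable, so ties keep their merged order.
  const entries = Object.entries(merged).sort(([, a], [, b]) => {
    const ra = rank(a);
    const rb = rank(b);
    return ra === rb ? 0 : ra < rb ? -1 : 1;
  });

  // "position" only drives the ordering; it is left out of the final schema.
  const ordered: Properties = {};
  for (const [name, property] of entries) {
    const copy: Property = { ...property };
    delete copy.position;
    ordered[name] = copy;
  }
  return ordered;
}

const baseProps: Properties = {
  a: { type: 'string', position: 3 },
  b: { type: 'string' },
  c: { type: 'string', position: 1 },
};
const additionalProps: Properties = {
  c: { type: 'boolean', position: 5 },
  d: { type: 'string', position: 1 },
};

const ordered = mergeAndOrder(baseProps, additionalProps);
console.log(Object.keys(ordered)); // d, a, c, b
```

The result matches the property order of the final input schema shown above: d, a, c, b.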
