@entryscape/entrysync
v0.9.0
A JavaScript library for harvesting and synchronizing metadata to and from EntryStore
EntrySync - library for synchronizing entries with metadata from various sources
The EntryScape platform, with EntryStore as its backend, relies on entries that may contain a resource, metadata and external metadata. At the heart of this library is a mechanism that synchronizes the metadata of entries using metadata fingerprinting.
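The README does not say how fingerprints are computed. As a rough illustration only, the following sketch assumes a fingerprint is a SHA-256 hash of an entry's metadata triples, serialized and sorted so that triple order does not matter. The triple format and the `fingerprint`/`hasChanged` helpers are hypothetical. Comparing a stored fingerprint with a freshly computed one tells a synchronizer whether an entry needs to be rewritten.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical representation (not the library's): a triple is
// [subject, predicate, object], each term already serialized as a string.
function fingerprint(triples) {
  // Sort the serialized triples so the hash does not depend on the
  // order in which the triples were produced.
  const canonical = triples
    .map(([s, p, o]) => `${s} ${p} ${o} .`)
    .sort()
    .join("\n");
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// An entry only needs to be rewritten when its fingerprint differs
// from the one recorded during the previous synchronization.
function hasChanged(triples, previousFingerprint) {
  return fingerprint(triples) !== previousFingerprint;
}
```

Sorting serialized triples is not a full RDF canonicalization: blank node labels that differ between parses yield different fingerprints. A robust implementation would use a proper algorithm such as RDF Dataset Canonicalization (RDFC-1.0).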
Installation
Dependencies are installed by running pnpm install.
Testing
Run the tests with pnpm test.
A Docker container containing test fixtures is started for the test run. Before you can run the tests again, you may need to stop this container (it is removed automatically once stopped).
Synchronization patterns
The core functionality of the library lets you build a custom synchronization mechanism. However, most use cases are covered by the following established synchronization patterns, listed below together with their corresponding CLI commands.
Graph synchronization pattern - src/graph/graphSync.js
This pattern takes a single graph as input and breaks it up into smaller graphs centred around entities and synchronizes them as entries.
The graph is broken up by detecting entities via rdf:type. For each entity, all outgoing triples are included, and the procedure is then repeated for every blank node found in object position.
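The splitting step described above can be sketched as follows. This is an illustrative reimplementation, not the code in src/graph/graphSync.js: it assumes triples are plain `{ s, p, o }` objects, blank nodes are strings starting with `_:`, and entities are named (non-blank) subjects that have an rdf:type. The hypothetical `splitGraph` function returns one subgraph per entity.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Assumption: blank nodes are identified by a "_:" prefix.
const isBlank = (term) => term.startsWith("_:");

// Hypothetical helper: split one graph into per-entity subgraphs.
function splitGraph(triples) {
  // Index outgoing triples per subject for fast lookup.
  const outgoing = new Map();
  for (const t of triples) {
    if (!outgoing.has(t.s)) outgoing.set(t.s, []);
    outgoing.get(t.s).push(t);
  }

  // Entities: named subjects that have at least one rdf:type.
  const entities = new Set(
    triples.filter((t) => t.p === RDF_TYPE && !isBlank(t.s)).map((t) => t.s)
  );

  const result = new Map();
  for (const entity of entities) {
    const subgraph = [];
    const visited = new Set();
    const stack = [entity];
    while (stack.length > 0) {
      const node = stack.pop();
      if (visited.has(node)) continue; // guard against blank-node cycles
      visited.add(node);
      for (const t of outgoing.get(node) || []) {
        subgraph.push(t);
        // Follow blank nodes in object position; stop at named resources.
        if (isBlank(t.o)) stack.push(t.o);
      }
    }
    result.set(entity, subgraph);
  }
  return result;
}
```

Named resources in object position are not followed, so a link from one entity to another stays a plain reference instead of pulling the other entity's triples into the subgraph.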
CLI command:
cd cli
node graphSync.js config.json
Here config.json is a configuration file you must provide; see the example in cli/graphSync_exampleConfig.json.
Type-based synchronization pattern - src/context/typeSync.js
This pattern synchronizes entries in one context with another context (potentially in another EntryStore instance). Entries are selected by one or more classes (rdf:type).
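Selecting entries by class might look like the minimal sketch below. The entry shape `{ uri, metadata }` and the `selectByType` helper are assumptions for illustration; the real implementation in src/context/typeSync.js may query EntryStore directly instead of filtering in memory.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Hypothetical entry shape: { uri, metadata } where metadata is an array
// of { s, p, o } triples. Keeps entries whose main resource has at least
// one of the given classes as rdf:type.
function selectByType(entries, classes) {
  const wanted = new Set(classes);
  return entries.filter((entry) =>
    entry.metadata.some(
      (t) => t.s === entry.uri && t.p === RDF_TYPE && wanted.has(t.o)
    )
  );
}
```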
CLI command:
cd cli
node typeSync.js config.json
Here config.json is a configuration file you must provide; see the example in cli/typeSync_exampleConfig.json.
Traversal synchronization pattern - src/context/traverseSync.js
This pattern synchronizes entries in one context with another context (potentially in another EntryStore instance). Entries are detected starting from one or more initial entries and include every entry reachable from them via a given set of properties.
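The traversal can be viewed as a breadth-first search over entries. The sketch below assumes a hypothetical in-memory `Map` from resource URI to that entry's `{ s, p, o }` triples; the actual src/context/traverseSync.js works against EntryStore and may differ in detail.

```javascript
// Hypothetical helper: collect all entries reachable from the start URIs
// by following only the given properties. `entries` maps URI -> triples.
function traverse(entries, startUris, properties) {
  const follow = new Set(properties);
  const reached = new Set();
  const queue = [...startUris];
  while (queue.length > 0) {
    const uri = queue.shift();
    // Skip URIs already visited and URIs that are not entries.
    if (reached.has(uri) || !entries.has(uri)) continue;
    reached.add(uri);
    for (const t of entries.get(uri)) {
      // Only outgoing triples with a whitelisted property are followed.
      if (t.s === uri && follow.has(t.p)) queue.push(t.o);
    }
  }
  return reached; // Set of entry URIs in breadth-first order
}
```

For example, starting from a catalog and following dcat:dataset and dcat:distribution would collect the catalog, its datasets and their distributions, while links via other properties are ignored.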
CLI command:
cd cli
node traverseSync.js config.json
Here config.json is a configuration file you must provide; see the example in cli/traverseSync_exampleConfig.json.
Utility functionality in CLI
Context creation
cd cli
node context.js config.json create TYPE [ENTRYID]
The ENTRYID parameter is optional; if you leave it out, an id is generated for you. The value of TYPE must be one of:
- catalog - Data catalog typically handled by EntryScape Catalog
- terms - Terminology context typically handled by EntryScape Terms
- workbench - Any kind of project with linked data typically handled by EntryScape Workbench
- model - Modelling project typically handled by EntryScape Models
Context removal
cd cli
node context.js config.json remove ENTRYID
Here ENTRYID must be the entry id of an existing context.
Context listing
cd cli
node context.js config.json list
Core functionality
The following classes are central to how the synchronization works:
- src/EntrySync.js: Synchronizes metadata as entries in an EntryStore instance, using EntityIndex and DuplicateIndex to determine what should be synchronized.
- src/EntityIndex.js: Maintains an index of synchronized entries together with their metadata fingerprints. The index speeds up consecutive synchronizations and can be persisted to disk.
- src/DuplicateIndex.js: Keeps track of which entities have already been synchronized and prevents them from being duplicated.
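To illustrate how a fingerprint index and a duplicate guard could steer a synchronization run, here is a hypothetical sketch. `FingerprintIndex` and `planSync` are invented names and do not mirror the API of EntityIndex, DuplicateIndex or EntrySync. Fingerprints are taken as precomputed strings, and removing entries that disappeared from the source is an assumption about the desired behaviour.

```javascript
// Hypothetical index: entity URI -> { entryId, fingerprint }.
// Not the API of src/EntityIndex.js.
class FingerprintIndex {
  constructor(records = {}) {
    this.records = new Map(Object.entries(records));
  }
  get(uri) { return this.records.get(uri); }
  set(uri, entryId, fingerprint) { this.records.set(uri, { entryId, fingerprint }); }
  uris() { return [...this.records.keys()]; }
  // Serialize so the index can be written to disk between runs,
  // e.g. with fs.writeFileSync(path, index.serialize()).
  serialize() { return JSON.stringify(Object.fromEntries(this.records)); }
  static deserialize(json) { return new FingerprintIndex(JSON.parse(json)); }
}

// Hypothetical planner: decide what to do with each incoming
// { uri, fingerprint } pair given the index from the previous run.
function planSync(index, incoming) {
  const plan = { create: [], update: [], unchanged: [], remove: [], duplicates: [] };
  const seen = new Set(); // acts as a duplicate guard within this run
  for (const { uri, fingerprint } of incoming) {
    if (seen.has(uri)) { plan.duplicates.push(uri); continue; }
    seen.add(uri);
    const known = index.get(uri);
    if (!known) plan.create.push(uri);
    else if (known.fingerprint !== fingerprint) plan.update.push(uri);
    else plan.unchanged.push(uri); // fingerprint match: skip the write
  }
  // Assumption: entities indexed earlier but absent now are removed.
  plan.remove = index.uris().filter((uri) => !seen.has(uri));
  return plan;
}
```

Persisting the serialized index between runs is what allows a later synchronization to skip every entity whose fingerprint has not changed.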
