@cwrc/leafwriter-validator

v4.4.2

Published

9 months ago

webworker to handle validation on Leaf-Writer

canadian-writing-research-collaboratory

lucaju

ajmacdonald

xml rdf editor annotation rng validation

LEAF-Writer Validator

LEAF-Writer Validator is a web worker that validates XML documents stored in DOM trees against a Relax NG schema. It is a fully typed wrapper around Salve (Schema-Aware Library for Validation and Edition) and salve-dom, the libraries that perform the actual validation. Please refer to their respective documentation for information about how these two libraries work.

Given an XML document and its Relax NG schema, LEAF-Writer Validator can perform validation, retrieve information about errors (including documentation, if available), get information about elements defined in the schema (tags, attributes, attribute values), list the possible attributes and child elements for a tag, and speculatively validate the insertion of tags in a specific context (before, after, inside, or around a tag) to assist the user in editing the document.

As a web worker, LEAF-Writer Validator is fast and non-blocking. Validation will occur in parallel to the browser’s main thread. However, depending on the schema’s complexity and the document’s length, the validation processes (including all the features listed above) might take some time to respond. When simply validating the document, the web worker emits events as it goes through the document. These events can be used to keep the UI updated. Other features, such as speculatively validated tag insertion in a specific context, are asynchronous and will only return at the end of the process.

Install

To install as a dependency, simply type npm install @cwrc/leafwriter-validator

LEAF-Writer Validator uses comlink as peer-dependency to facilitate communication to and from web workers. So, we should install it as well. npm install comlink

Load as a web worker

There are two ways to incorporate LEAF-Writer Validator.

Prebuilt

The first method uses a prebuilt bundled minified version leafwriter-validator.worker.js found in the dist folder.

Copy leafwriter-validator.worker.js to the root of the website public folder.
Import Comlink and LEAF-Writer Validator type
Load Web worker from the root of the public folder

import type { Validator } from '@cwrc/leafwriter-validator';
import * as Comlink from 'comlink';

const worker = new Worker('leafwriter-validator.worker.js');
const validator: Comlink.Remote<Validator> = Comlink.wrap(worker);

For development

The second method is more suitable for development. In this case, the worker is imported directly from the node_modules dependency. Webpack will take care of splitting the file into a web worker.

Import Comlink and LEAF-Writer Validator type
Load Web worker from the @cwrc/leafwriter-validator

import type { Validator } from '@cwrc/leafwriter-validator';
import * as Comlink from 'comlink';

const worker = new Worker(new URL('@cwrc/leafwriter-validator', import.meta.url));
const validator: Comlink.Remote<Validator> = Comlink.wrap(worker);

Initialize

To use the validator, we first need to initialize it with the schema. Call the method initialize, passing the InitializeParameters with the schema's id and the URL from which LEAF-Writer Validator can download a Relax NG schema. We use the id to identify the schema and avoid reloading it if the method is called again with the same parameters. Optionally, we can pass a third property, shouldCache (default: true), to cache the schema.

This is an asynchronous function because LEAF-Writer Validator has to download and convert the schema to salve's internal format (more about how Salve converts schemas here). Besides converting the schema, Salve can export a JSON version to be cached and a manifest with a hash that can be used to check if the file got updated and should be reprocessed. For this reason, we cache and store the schemas by default.

The cached schema is stored in the browser's IndexedDB. Besides the schema's id and url, it contains the stringified version of Salve's processed object, the date when the cache was created, any warnings generated by Salve, and a hash representation of the schema file.

Every time you invoke this method, LEAF-Writer Validator first checks whether the validator was already initialized, then checks whether there is a cache for the requested schema. If there is no cache, we process the schema and store the cache. If there is a cache, we use the hash to check whether the file has changed. If it has changed, we reprocess the file and save a new cache; otherwise, we use the cached version.

It returns an object (InitializeResponse) with a success (boolean) property. If there is an error, there is a second property, error, with an error message.
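The decision flow described above can be sketched as a small pure function. This is only an illustration, not the library's source: `decideSchemaAction`, `CacheEntry`, `InitState`, and the action names are hypothetical, and the real implementation reads the cache from IndexedDB and computes the hash itself.

```typescript
// Hypothetical sketch of the initialization flow; names are assumptions.
interface CacheEntry {
  id: string;
  url: string;
  hash: string; // hash of the schema file when the cache was written
}

type SchemaAction =
  | 'already-initialized'
  | 'process-and-cache'
  | 'reprocess-and-recache'
  | 'use-cache';

interface InitState {
  initializedSchemaId?: string; // id of the schema currently loaded, if any
  cache?: CacheEntry; // cached entry for the requested schema, if any
  currentHash: string; // hash of the schema file as it exists now
}

function decideSchemaAction(requestedId: string, state: InitState): SchemaAction {
  // 1. Nothing to do if the same schema is already loaded.
  if (state.initializedSchemaId === requestedId) return 'already-initialized';
  // 2. No cache yet: process the schema and store the result.
  if (!state.cache) return 'process-and-cache';
  // 3. Cache exists but the file changed: reprocess and overwrite the cache.
  if (state.cache.hash !== state.currentHash) return 'reprocess-and-recache';
  // 4. Cache exists and is still current.
  return 'use-cache';
}
```

Keeping the decision separate from the I/O makes each branch easy to check in isolation.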

Example:

const response = await validator.initialize({
  id: 'cwrcTeiLite',
  url: 'https://cwrc.ca/schemas/cwrc_tei_lite.rng',
});

console.log(response.success); // true
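Since the response carries either a success flag or an error, the caller usually wants to branch on it. The sketch below is one possible way to turn the response into a message for the user; `describeInitialization` is a hypothetical helper, and the interface is a local mirror of the documented InitializeResponse shape rather than an import from the package.

```typescript
// Local mirror of the documented InitializeResponse shape (see Types).
interface InitializeResponse {
  success: boolean;
  error?: { message: string };
}

// Hypothetical helper: convert the response into a short message for the UI.
function describeInitialization(schemaId: string, response: InitializeResponse): string {
  if (response.success) return `Schema "${schemaId}" is ready.`;
  const reason = response.error?.message ?? 'unknown error';
  return `Could not load schema "${schemaId}": ${reason}`;
}
```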

Validate

With the schema loaded, we can now send the document into LEAF-Writer Validator, calling the method validate, passing the XML document as a string and a callback. We can call this method at any time after the schema is loaded. The validation process dispatches state-update events that trigger a callback function as the validator walks through the document. This callback will give us updates and let us know when the process is completed.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| XML document* | string | The XML document. |
| callback | func | A function to receive event updates and the final results. Signature: (workingStateData: WorkingStateData) => void. See more about WorkingStateData. |

Example:

// xmlDocument is the XML Document currently being edited.
const documentString = new XMLSerializer().serializeToString(xmlDocument);

// Comlink cannot clone functions, so the callback is wrapped with Comlink.proxy.
validator.validate(
  documentString,
  Comlink.proxy((event) => {
    if (event.state === 1 || event.state === 2) {
      console.log(event.partDone * 100); // progress in percent, e.g. 25 when partDone === 0.25
    }

    if (event.state === 4) {
      console.log(event.valid); // true
    }

    if (event.state === 3) {
      console.log(event.valid); // false
      event.errors.forEach((error) => console.log(error)); // the list of errors
    }
  }),
);

While the state is [1] INCOMPLETE or [2] WORKING, the WorkingStateData only returns { partDone, state }, which can be used to show a progress bar expressed in percentage.

If the state is [3] INVALID, the WorkingStateData returns the full object { errors, partDone, state, valid }, where valid is false. This is the last time the callback will be triggered, since the validator has completed the process. Each error is an object of type ValidationError.

If the state is [4] VALID, the WorkingStateData only returns { partDone, state, valid }, where valid is true. This is the last time the callback will be triggered, since the validator has completed the process.
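The four states described above map naturally onto a status line or progress bar. The following is a minimal sketch under that assumption; `statusLine` is a hypothetical helper, and the interfaces are local mirrors of the documented types, reduced to the fields used here.

```typescript
// Local mirrors of the documented types; the real ones ship with the package.
interface ValidationError {
  type: string;
  msg: string;
}

interface WorkingStateData {
  state: 1 | 2 | 3 | 4; // 1 INCOMPLETE, 2 WORKING, 3 INVALID, 4 VALID
  partDone: number; // fraction of the document validated, from 0 to 1
  valid?: boolean; // only present on states 3 and 4
  errors?: ValidationError[]; // only present on state 3
}

// Hypothetical helper: reduce an event to a short status line.
function statusLine(event: WorkingStateData): string {
  if (event.state === 3) return `Invalid: ${event.errors?.length ?? 0} error(s)`;
  if (event.state === 4) return 'Valid';
  // States 1 and 2 only carry progress information.
  return `Validating: ${Math.round(event.partDone * 100)}%`;
}
```

Rounding the percentage avoids showing floating-point noise in the interface.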

On Demand vs. Auto-validation

We can call validate on demand, that is, every time a user clicks on a button, or auto-validate every time the document changes (recommended). Auto-validation is helpful because the web worker is not synchronized with the XML in the DOM: revalidating on every change keeps the copy of the document held by the web worker up to date.

This is important because when we need information about a tag or an attribute, or the possible validated children for a tag, the validator must already hold the most recent, validated version of the document to perform the task. Otherwise, every time we wanted to do this, we would have to pass the entire document and wait for validation before getting the results.

Auto-validation does not prevent on-demand validation. We added a validate button to the LEAF-Writer interface to improve usability to allow users to trigger the actions themselves. We also trigger validate before saving the document to warn users of potential mistakes.
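Revalidating on every keystroke would send the whole document to the worker far more often than needed, so auto-validation is typically debounced. The sketch below shows one way to do that; it is an assumption about how an application might wire things up, not a description of what LEAF-Writer does. `debounce`, `Timer`, and `realTimer` are hypothetical names, and the timer is injectable only so the behaviour can be checked without waiting for real time to pass.

```typescript
// Hypothetical debounce helper for auto-validation.
interface Timer {
  set(callback: () => void, delayMs: number): unknown;
  clear(handle: unknown): void;
}

// Default timer backed by the platform's setTimeout/clearTimeout.
const realTimer: Timer = {
  set: (callback, delayMs) => setTimeout(callback, delayMs),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function debounce<Args extends unknown[]>(
  action: (...args: Args) => void,
  delayMs: number,
  timer: Timer = realTimer,
): (...args: Args) => void {
  let pending: unknown;
  return (...args: Args) => {
    // A newer change supersedes any call still waiting to run.
    if (pending !== undefined) timer.clear(pending);
    pending = timer.set(() => {
      pending = undefined;
      action(...args);
    }, delayMs);
  };
}
```

An editor's change handler could then call the debounced function with the serialized document, so that only the latest version is sent to validate once the user pauses.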

Types of errors

Salve can report several different validation errors (see here). The most important in our context are ElementNameError, AttributeNameError, AttributeValueError, ChoiceError, and ValidationError. In LEAF-Writer Validator, these errors are sent to the main thread as an array of objects (ValidationError) with details about where the error occurs and documentation, if available.

Element Name

An Element Name Error occurs when a (parent) tag contains a child element that is not allowed or not defined in the schema. For instance, the document contains <div><sunny>In the park</sunny></div> but the tag <div> cannot have <sunny> as a child. In this case, <sunny> is the target, and <div> is the element.

We can use {error.element.xpath} to inform, highlight, and navigate directly to the location where the error occurs in the document. If available, we can also display the element's documentation {error.element.documentation} and full name {error.element.fullname}. If the target is defined in the schema, it might have some documentation {error.target.documentation} and a full name {error.target.fullname} as well.

Attribute Name

An Attribute Name Error occurs when a tag contains an attribute not defined in the schema. For instance, the document contains <closer day="22">, but the element <closer> in the schema does not have the attribute day. In this case, <closer> is the element, and day is the target.

Attribute Value

An Attribute Value Error occurs when a tag contains an attribute that has a value outside the range defined in the schema. For instance, the document contains <persName cert="none"> but the attribute cert [certainty] of element <persName> [personal name] cannot have none as a value. In this case, cert is the target, and <persName> is the element.

We can use {error.element.xpath} to inform, highlight, and navigate directly to the location where the error occurs in the document. If available, we can also display the element's documentation {error.element.documentation} and full name {error.element.fullname}. The same can be done to the attribute. If available, we can also display documentation {error.target.documentation} and full name {error.target.fullname}.
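To make the target/element distinction concrete, here is a minimal sketch that turns an error into a one-line summary. `summarizeError` is a hypothetical helper, the message wording is invented for illustration, and the interfaces are local mirrors of the documented ValidationError shapes, reduced to the fields used.

```typescript
// Local mirrors of the documented shapes (see Types), reduced for this example.
interface ValidationErrorTarget {
  xpath: string;
  index: number;
  isAttr: boolean;
  name?: string;
}

interface ValidationErrorElement {
  xpath: string;
  name?: string;
}

interface ValidationError {
  type: 'AttributeNameError' | 'AttributeValueError' | 'ElementNameError' | 'ChoiceError' | 'ValidationError';
  msg: string;
  target: ValidationErrorTarget;
  element: ValidationErrorElement;
}

// Hypothetical helper: build a short, human-readable summary of an error.
function summarizeError(error: ValidationError): string {
  const target = error.target.name ?? 'unknown';
  const element = error.element.name ?? 'unknown';
  const where = error.element.xpath; // used to locate the error in the document
  switch (error.type) {
    case 'ElementNameError':
      return `<${target}> is not allowed inside <${element}> (${where})`;
    case 'AttributeNameError':
      return `Attribute "${target}" is not allowed on <${element}> (${where})`;
    case 'AttributeValueError':
      return `Attribute "${target}" on <${element}> has an invalid value (${where})`;
    default:
      // ChoiceError and ValidationError: fall back to the validator's message.
      return `${error.msg} (${where})`;
  }
}
```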

Choice Error

Validation Error

Has validator

LEAF-Writer Validator has a handy function to check if the validator is initialized. The validator needs a schema and a document. So, calling hasValidator without one or the other will return false. Otherwise, this function returns true. Use this function before triggering any method to check if the task can be done.

const hasValidator = await validator.hasValidator();
console.log(hasValidator); // true or false

Get information from Schema

We can use the validator to get detailed information about a tag or an attribute defined in the schema. We can use these methods to expand error details with documentation and to get information about elements at any time. They are also helpful for populating the list of possible children for a tag or attribute.

Get Tag at

Get the element definition using xpath. Call getTagAt, passing tagName, parentXpath, and index. The validator looks for tagName among the possible elements at the tag's parentXpath and index. It returns a NodeDetail object with the element type (tag), the tag name, documentation [if available], fullName [extracted from documentation, if available], and ns [namespace, if available].

Parameters

| Name | Type | Description |
| --- | --- | --- |
| tagName* | string | The tag name. |
| parentXpath* | string | The tag's parent Xpath. |
| index | number | The index position relative to its parent. Default is 0. |

Example:

const tag = await validator.getTagAt('p', 'TEI/text/body/div');

console.log(tag);
/*
{
  type: 'tag',
  name:'p',
  ns: 'http://www.tei-c.org/ns/1.0',
  documentation: '(paragraph) marks paragraphs in prose. [3.1. 7.2.5. ]',
  fullName: 'Paragraph',
  eventType: 'enterStartTag',
}
*/

Get Nodes for a Tag at

Get a list of elements for a tag using xpath. Call getNodesForTagAt, passing xpath and index. The validator returns an array of NodeDetail objects with the element type (tag), the tag name, documentation [if available], fullName [extracted from documentation, if available], and ns [namespace, if available].

Parameters

| Name | Type | Description |
| --- | --- | --- |
| xpath* | string | The tag's Xpath. |
| index | number | The index position relative to its parent. Default is 0. |

Example:

const tags = await validator.getNodesForTagAt('TEI/text/body/div');

console.log(tags);
/*
[
  {
    type: 'tag',
    name:'p',
    ns: 'http://www.tei-c.org/ns/1.0',
    documentation: '(paragraph) marks paragraphs in prose. [3.1. 7.2.5. ]',
    fullName: 'Paragraph',
    eventType: 'enterStartTag',
  }
  ...
]
*/

Get Attributes for a Tag at

Get a list of attributes for a tag using xpath. Call getAttributesForTagAt, passing xpath and index. The validator returns an array of NodeDetail objects with the element type (attribute), the attribute name, documentation [if available], fullName [extracted from documentation, if available], and ns [namespace, if available].

Parameters

Example:

const attributes = await validator.getAttributesForTagAt('TEI/text/body/div/p');

console.log(attributes);
/*
[
  {
    type: 'attribute',
    name: 'seg',
    fullName: 'arbitrary segment',
    documentation: `(arbitrary segment) represents any segmentation of text below the chunk level. [16.3.  6.2.  7.2.5. ]`,
    ns: 'http://www.tei-c.org/ns/1.0',
    eventType: 'attributeName',
  }
  ...
]
*/

Get Tag Attribute at

Get an attribute's details for a tag using xpath. Call getTagAttributeAt, passing attributeName and parentXpath. The validator returns a NodeDetail object with the element type (attribute), the attribute name, documentation [if available], fullName [extracted from documentation, if available], and ns [namespace, if available].

Parameters

| Name | Type | Description |
| --- | --- | --- |
| attributeName* | string | The attribute's name. |
| parentXpath* | string | The attribute's parent Xpath (i.e., the tag's xpath). |

Example:

const attribute = await validator.getTagAttributeAt('seg', 'TEI/text/body/div/p');

console.log(attribute);
/*
{
  type: 'attribute',
  name: 'seg',
  fullName: 'arbitrary segment',
  documentation: `(arbitrary segment) represents any segmentation of text below the chunk level. [16.3.  6.2.  7.2.5. ]`,
  ns: 'http://www.tei-c.org/ns/1.0',
  eventType: 'attributeName',
}
*/

Get Values for a Tag Attribute at

Get a list of possible values for a tag's attribute using xpath. Call getValuesForTagAttributeAt, passing xpath. The validator returns an array of NodeDetail objects with the element type (attributeValue), the value name, and the value itself.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| xpath* | string | The attribute's Xpath. The last part of the Xpath must start with an @ sign to define it as an attribute. |

Example:

const attributeValue = await validator.getValuesForTagAttributeAt(
  '/TEI/text/body/div/closer/signed/persName/persName/@cert',
);

console.log(attributeValue);
/*
[
  { type: 'attributeValue', name: 'high', value: 'high', eventType: 'attributeValue' },
  { type: 'attributeValue', name: 'medium', value: 'medium', eventType: 'attributeValue' },
  { type: 'attributeValue', name: 'low', value: 'low', eventType: 'attributeValue' },
]
*/
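Because the last segment of the XPath must start with an @ sign, it can be convenient to build the attribute XPath from the tag's XPath and the attribute name. `attributeXpath` below is a hypothetical helper written for this illustration; it is not part of the package.

```typescript
// Hypothetical helper: build the attribute XPath expected by
// getValuesForTagAttributeAt from a tag XPath and an attribute name.
function attributeXpath(tagXpath: string, attributeName: string): string {
  const base = tagXpath.replace(/\/+$/, ''); // drop any trailing slashes
  const name = attributeName.replace(/^@/, ''); // tolerate a leading @
  return `${base}/@${name}`;
}
```

The result can be passed directly as the xpath argument, for example `attributeXpath(tagXpath, 'cert')`.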

Get Possible Nodes At

We can use the validator to get valid children tags and nodes for context. For instance, let's say we want to insert a tag inside an empty <p>.

Call the method getPossibleNodesAt, passing a Target object and an optional PossibleNodesAtOptions object. The target should contain the tag's xpath and index and, optionally, a selection object. The selection tells the validator the context of your action and which type of action you want to perform. The second, optional object has only one property, speculativeValidate (boolean), which tells the validator to speculatively validate the intended action. The speculativeValidate property is true by default.

This method first gets all the possible child tags for <p> at the exact position in the document (the context). If speculativeValidate is false, the validator returns the list of possible tags in that context.

If speculativeValidate is true or omitted, the validator virtually loops through the list of possible tags, inserting each one inside <p> and (speculatively) validating the result in the tag's context against the schema. If the insertion produces an invalid structure, the tag is marked as invalid. The remaining tags are considered validated suggestions.

The method returns a PossibleNodesAt object, which includes the original target object and a list of possible tags with a boolean property invalid attached to each one.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| target* | Target | The target xpath and index and, optionally, a selection object. |
| options | PossibleNodesAtOptions | The options for this request, most notably speculativeValidate. |

Example:

With speculative validation

const results = await validator.getPossibleNodesAt({
  xpath: 'TEI/text/body/div/p',
  index: 0,
  selection: {
    endContainerIndex: 0,
    endOffset: 20,
    startContainerIndex: 0,
    startOffset: 14,
    type: 'span',
  }
}, {speculativeValidate: true}); // this can be omitted since it is the default

console.log(results);
/*
{
  target: {
    index: 0,
    xpath: 'TEI/text/body/div/p',
    selection: {...}
  }
  nodes: [
    {
      type: 'tag',
      name:'p',
      ns: 'http://www.tei-c.org/ns/1.0',
      documentation: '(paragraph) marks paragraphs in prose. [3.1. 7.2.5. ]',
      fullName: 'Paragraph',
      eventType: 'enterStartTag',
      invalid: true,
    },
    {
      type: 'tag',
      name:'pb',
      ns: 'http://www.tei-c.org/ns/1.0',
      documentation: '(page break) marks the start of a new page in a paginated document. [3.10.3. ]',
      fullName: 'Page Break',
      eventType: 'enterStartTag',
      invalid: false,
    },
    ...
  ]
}
*/

Without speculative validation

const results = await validator.getPossibleNodesAt({
  xpath: 'TEI/text/body/div/p',
  index: 0,
  selection: {
    endContainerIndex: 0,
    endOffset: 20,
    startContainerIndex: 0,
    startOffset: 14,
    type: 'span',
  }
}, {speculativeValidate: false});

console.log(results);
/*
{
  target: {
    index: 0,
    xpath: 'TEI/text/body/div/p',
    selection: {...}
  }
  nodes: [
    {
      type: 'tag',
      name:'p',
      ns: 'http://www.tei-c.org/ns/1.0',
      documentation: '(paragraph) marks paragraphs in prose. [3.1. 7.2.5. ]',
      fullName: 'Paragraph',
      eventType: 'enterStartTag',
    },
    {
      type: 'tag',
      name:'pb',
      ns: 'http://www.tei-c.org/ns/1.0',
      documentation: '(page break) marks the start of a new page in a paginated document. [3.10.3. ]',
      fullName: 'Page Break',
      eventType: 'enterStartTag',
    },
    ...
  ]
}
*/
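When speculative validation is enabled, the invalid flag can be used to separate safe suggestions from the ones that would break the document, for instance to grey out the latter in a menu. The sketch below assumes only the documented NodeDetail fields; `partitionSuggestions` is a hypothetical helper.

```typescript
// Local mirror of the documented NodeDetail shape, reduced to the fields used here.
interface NodeDetail {
  type: string;
  name: string;
  invalid?: boolean; // set when speculative validation was requested
}

// Hypothetical helper: split suggestions into valid and invalid groups.
function partitionSuggestions(nodes: NodeDetail[]): { valid: NodeDetail[]; invalid: NodeDetail[] } {
  const valid: NodeDetail[] = [];
  const invalid: NodeDetail[] = [];
  for (const node of nodes) {
    // Nodes without the flag (no speculative validation) are treated as valid.
    (node.invalid ? invalid : valid).push(node);
  }
  return { valid, invalid };
}
```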

Get Valid Nodes At

A convenient method to get valid nodes. This is similar to Get Possible Nodes At, except that it returns only the nodes considered valid.

Call the method getValidNodesAt, passing a Target object. The target should contain the tag's xpath and index and, optionally, a selection object. The selection tells the validator the context of your action and which type of action you want to perform.

This method first gets all the possible child tags for <p> at the exact position in the document (the context). Then the validator virtually loops through the list of possible tags, inserting each one inside <p> and (speculatively) validating the result in the tag's context against the schema. If the insertion produces an invalid structure, the tag is discarded. The remaining tags are considered validated suggestions.

The method returns a PossibleNodesAt object, which includes the original target object and a list of speculatively validated tags.

Parameter

| Name | Type | Description |
| --- | --- | --- |
| target* | Target | The target xpath and index and, optionally, a selection object. |

Example:

const results = await validator.getValidNodesAt({
  xpath: 'TEI/text/body/div/p',
  index: 0,
  selection: {
    endContainerIndex: 0,
    endOffset: 20,
    startContainerIndex: 0,
    startOffset: 14,
    type: 'span',
  }
});

console.log(results);
/*
{
  target: {
    index: 0,
    xpath: 'TEI/text/body/div/p',
    selection: {...}
  }
  nodes: [
    {
      type: 'tag',
      name:'p',
      ns: 'http://www.tei-c.org/ns/1.0',
      documentation: '(paragraph) marks paragraphs in prose. [3.1. 7.2.5. ]',
      fullName: 'Paragraph',
      eventType: 'enterStartTag',
    },
    {
      type: 'tag',
      name:'pb',
      ns: 'http://www.tei-c.org/ns/1.0',
      documentation: '(page break) marks the start of a new page in a paginated document. [3.10.3. ]',
      fullName: 'Page Break',
      eventType: 'enterStartTag',
    },
    ...
  ]
}
*/

Reset

Use reset to dispose of the validator, schema and the document from the web worker. This is handy when switching between documents with different schemas in the same session.

Example:

validator.reset();

Clear Cache

Convenient function to delete all schemas cached by the LEAF-Writer Validator stored in the IndexedDB on the table LEAF-Writer-Validator.

Example:

await validator.clearCache();

Types

We use TypeDoc to autogenerate documentation from the code. Run npm run build-documentation to get a nice page with all the types.

InitializeParameters

| Name | Type | Description |
| --- | --- | --- |
| id* | string | Schema identifier. |
| url* | string | The schema URL. |
| shouldCache | boolean | Whether or not to cache the validator for this specific schema. Default is true. |

InitializeResponse

| Name | Type | Description |
| --- | --- | --- |
| success* | boolean | Indicates whether LEAF-Writer Validator was initialized successfully. |
| error | Error | An error object with a property message (string). |

NodeDetail

| Name | Type | Description |
| --- | --- | --- |
| type* | attribute \| attributeValue \| tag \| text | A simplification of the internal validator events. Note: tag includes all tag events (endTag, enterStartTag, and leaveStartTag). |
| name* | string | The name of the node (tag name, attribute name, or attribute value). |
| eventType* | attributeName \| attributeValue \| endTag \| enterStartTag \| leaveStartTag \| text | Internal validator events. Useful for debugging. |
| documentation | string | Documentation (if available). |
| fullName | string | Full name extracted from documentation (if available). |
| ns | string | The namespace. |
| invalid | boolean | If speculatively validated, indicates that the node would produce an invalid structure in this context. |
| value | string | The value a node can hold. Only available for attributeValue (the value of the attribute itself) or for text, which can take the form of a regular expression (RegExp). |

PossibleNodesAt

| Name | Type | Description |
| --- | --- | --- |
| target* | Target | The target in the document. |
| nodes* | NodeDetail[] | An array of possible tags. |

PossibleNodesAtOptions

| Name | Type | Description |
| --- | --- | --- |
| speculativeValidate | boolean | Enables or disables speculative validation. Default is true. |

Target

| Name | Type | Description |
| --- | --- | --- |
| xpath* | string | The tag Xpath in the document. |
| index* | number | The index position relative to its parent. |
| selection | TargetSelection | Gives more specificity to the request. Omit to use the exact caret position. |

TargetSelection

| Name | Type | Description |
| --- | --- | --- |
| type* | 'span' \| 'inside' \| 'around' \| 'before' \| 'after' \| 'change' | span: Use to wrap a portion of the document inside a new tag. We must also provide startContainerIndex, startOffset, endContainerIndex, and endOffset. inside: Similar to span. Use to add a new tag containing all the content of the target tag into the parent tag, as if we had made a text selection with everything inside the target tag. We must also provide startContainerIndex, endContainerIndex, and xpath. around: Similar to inside. Use to add a new tag containing all the content of the target tag (including the target tag itself) into the parent tag. We must also provide xpath. before: Use to add a new tag before a target and into the parent container. We must also provide containerIndex and xpath. after: Similar to before. Use to add a new tag after a target and into the parent container. We must also provide containerIndex and xpath. change: Similar to inside. Use to change the target tag, preserving the content inside. We must also provide startContainerIndex, endContainerIndex, xpath, and skip. |
| startContainerIndex | number | The container index relative to its parent where the selection starts. Used with span, inside, and change. |
| startOffset | number | The index position relative to startContainerIndex where the selection starts. This is the starting point of the selection caret. Used with span. |
| endContainerIndex | number | The container index relative to its parent where the selection ends. Used with span, inside, and change. |
| endOffset | number | The index position relative to endContainerIndex where the selection ends. This is the endpoint of the selection caret. Used with span. |
| skip | string | The name of the tag to skip. Used with change to avoid suggesting changing a tag for itself. |
| xpath | string | The tag Xpath in the document. Used with inside, around, change, before, and after. |
| containerIndex | number | The container index relative to the target parent. Used with before and after. |
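Each selection type requires a different set of companion fields, which is easy to get wrong when building the object by hand. The sketch below encodes the requirements listed in the table as a lookup and reports what is missing; `REQUIRED_FIELDS` and `missingFields` are hypothetical names, and this check is an illustration rather than something the package performs.

```typescript
type SelectionType = 'span' | 'inside' | 'around' | 'before' | 'after' | 'change';

// Companion fields required by each selection type, as listed in the table.
const REQUIRED_FIELDS: Record<SelectionType, readonly string[]> = {
  span: ['startContainerIndex', 'startOffset', 'endContainerIndex', 'endOffset'],
  inside: ['startContainerIndex', 'endContainerIndex', 'xpath'],
  around: ['xpath'],
  before: ['containerIndex', 'xpath'],
  after: ['containerIndex', 'xpath'],
  change: ['startContainerIndex', 'endContainerIndex', 'xpath', 'skip'],
};

// Hypothetical check: list the required fields a selection object is missing.
function missingFields(selection: { type: SelectionType; [field: string]: unknown }): string[] {
  return REQUIRED_FIELDS[selection.type].filter((field) => selection[field] === undefined);
}
```

Running such a check before calling the validator gives a clearer failure than an unexpected empty result.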

ValidationError

| Name | Type | Description |
| --- | --- | --- |
| type* | 'AttributeNameError' \| 'AttributeValueError' \| 'ElementNameError' \| 'ChoiceError' \| 'ValidationError' | The error type. |
| msg* | string | An explanatory message about the error, indicating, for instance, that an attribute doesn't belong to a tag or a tag cannot be a child of its parent. |
| target* | ValidationErrorTarget | The invalid element. |
| element* | ValidationErrorElement | The specific parent tag where the error was found. |

ValidationErrorElement

| Name | Type | Description |
| --- | --- | --- |
| xpath* | string | The target Xpath in the document. It can be useful to locate and navigate to the exact error position quickly. |
| name | string | The name of the element (tag or attribute), if defined in the schema. |
| documentation | string | If available in the schema. It can help users understand the context where the error occurred. |
| fullname | string | The full name of the element (tag or attribute), if defined in the document schema. |
| parentElementXpath | string | Exposes the parent element Xpath. It comes in handy if the error is an attribute. |
| parentElementIndex | number | Exposes the parent element index relative to its parent. It comes in handy if the error is an attribute. |
| parentElementName | string | Exposes the parent element name. It comes in handy if the error is an attribute. |

ValidationErrorTarget

| Name | Type | Description |
| --- | --- | --- |
| xpath* | string | The target's Xpath in the document. Useful to locate and navigate to the exact error position quickly. |
| index* | number | The index position relative to its parent. |
| isAttr* | boolean | Whether the error is on an attribute. Default is false. |
| ns | string | The namespace. |
| name | string | The name of the element (tag or attribute), if defined in the schema. |
| documentation | string | If available in the schema. It can help users understand the context where the error occurred. |
| fullname | string | The full name of the element (tag or attribute), if defined in the document schema. |

WorkingStateData

| Name | Type | Description |
| --- | --- | --- |
| state* | 1 [INCOMPLETE] \| 2 [WORKING] \| 3 [INVALID] \| 4 [VALID] | The state of the validation process. |
| partDone* | number | The fraction of the document validated (0-1). |
| valid | boolean | Whether the document is valid or not. Only available on states 3 and 4. |
| errors | ValidationError[] | An array of errors. Only available on state 3. |

CachedSchema

This is an internal type. It is only described here for completeness.

| Name | Type | Description |
| --- | --- | --- |
| createdAt* | Date | The timestamp when the cache was created. |
| gramarJson* | string | Stringified JSON representation of the schema processed by Salve. |
| hash* | string | The hash representation of the schema file. We use sha-256. |
| id* | string | The schema ID. |
| url* | string | The schema URL. |
| warnings | string[] | A list of warnings generated by Salve when the schema was processed. |

Development

I am indebted to Louis-Dominique Dubeau, who developed Salve and salve-dom at the Mangalam Research Center for Buddhist Languages. For any information about Salve (Schema-Aware Library for Validation and Edition) and salve-dom, please refer to their documentation.

This project uses a slightly different version of Salve, tweaked by Raffaele Viglianti to add element documentation to Salve's result.

Why a web worker?

Validating an XML document is CPU-intensive, and JavaScript is a single-threaded language. Depending on the document's size and the schema's complexity, the process can overload the main thread and freeze the page. By moving the process to a web worker and using events and async/await, we take that burden off the main thread and get a smoother frontend experience.

However, web workers have some limitations. The main one is that a web worker has no access to the DOM, where the XML document lives, nor to the DOM APIs that salve-dom relies on. We overcome this by using JSDOM (read more below) to manipulate a virtual DOM inside the web worker. The downside is that the web worker file grows substantially, but once it is minified by Webpack and cached by the browser, the impact on loading time is not significant.

Since the DOM cannot be accessed, the document needs to be stringified and transferred to the web worker to be validated. Moreover, this transfer must occur every time the document changes to keep both versions in sync, which is essential to support some of the functions on LEAF-Writer (e.g., getting tag attributes, speculative validation).
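The sync step described above can be sketched as a small helper that re-sends the serialized document only when it has actually changed. This is a minimal illustration, not the package's implementation: `createDocumentSync`, `serialize`, and `send` are hypothetical names, and the caller supplies both callbacks.

```typescript
// Hypothetical helper: `serialize` produces the current XML string and
// `send` forwards it to the worker. Both are supplied by the caller.
function createDocumentSync(
  serialize: () => string,
  send: (xml: string) => void,
): () => boolean {
  let lastSent: string | undefined;

  // Returns true when a new copy was sent, false when nothing changed.
  return () => {
    const xml = serialize();
    if (xml === lastSent) return false; // unchanged, skip the transfer
    lastSent = xml;
    send(xml);
    return true;
  };
}
```

In a browser, `serialize` could wrap `new XMLSerializer().serializeToString(xmlDocument)`, and `send` would pass the string to the validator worker. The returned function can then be called from whichever change event the editor emits.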

Despite these limitations, using a web worker for validation works better than the other solutions we have used in the past: an external Java-based microservice, and running the validator on the main thread. We have not run a formal benchmark yet, but in tests with large files (a whole 2 MB XML book in TEI), validation takes 1-2 seconds.

Changes to Salve and Salve-Dom to be able to work as a web worker

We made minor tweaks to the code to make Salve and salve-dom work in a web worker environment.

On Salve:

Added globalObject: "this" to the Webpack config file (7124f82d).

On salve-dom:

Added globalObject: "this" to the Webpack config file (0ce55e81).
Exported the NODE object (460d2212).
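For context, `globalObject` is a Webpack `output` option that sets which global object a library bundle attaches itself to. The fragment below is a minimal sketch of that single setting, not a copy of the Salve or salve-dom config files (the commits referenced above are the authority), and the constant name is hypothetical. To our understanding, Webpack 4 defaults this option to `window`, which does not exist inside a worker, whereas `this` resolves in both window and worker scopes.

```typescript
// Sketch of the relevant part of a Webpack config; all other options omitted.
const webpackConfigSketch = {
  output: {
    // Use `this` instead of a browser-only global so the bundle also loads in a worker.
    globalObject: 'this',
  },
};
```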

JSDOM inside web worker

From JSDOM documentation: https://github.com/jsdom/jsdom#running-jsdom-inside-a-web-browser

Running JSDOM inside a web browser

jsdom has some support for being run inside a web browser, using browserify. Inside a web browser, we can use a browserified jsdom to create an entirely self-contained set of plain JavaScript objects that look and act much like the browser's existing DOM objects while being entirely independent of them. "Virtual DOM," indeed!

jsdom's primary target is still Node.js, and so we use language features that are only present in recent Node.js versions (namely, Node.js v8+). Thus, older browsers will likely not work. (Even transpilation will not help: we use Proxys extensively throughout the jsdom codebase.)

Notably, jsdom works well inside a web worker. The original contributor, @lawnsea, who made this possible, has published a paper about his project which uses this capability.

Not everything works perfectly when running jsdom inside a web browser. Sometimes that is because of fundamental limitations (such as not having filesystem access). Still, sometimes it is simply because we haven't spent enough time making the appropriate minor tweaks. Bug reports are certainly welcome.

Discussion

https://github.com/jsdom/jsdom/issues/245

https://github.com/jsdom/jsdom/issues/1284

https://github.com/jsdom/jsdom/issues/2427

How To use JSDOM on LEAF-Writer Validator Web Worker

A browserified version of jsdom (v21.1.2) is already in place in the web worker's folder: /src/lib/jsdom

Important: jsdom v21.1.2 is the latest version that can be browserified. v22.0.0 removed this support. Check here: https://github.com/jsdom/jsdom/releases/tag/22.0.0

If the file needs to be updated or regenerated, follow these steps:

Install JSDOM and Browserify: npm install -D jsdom browserify
Browserify jsdom: npm run browserify-jsdom (check package.json for the details)

Unit tests

We use Jest and jest-fetch-mock for unit tests. Run npm test
