yml2vocab

v1.8.1

Generation of vocabulary files starting from YAML

iherman

1. Generate RDFS vocabulary files from YAML

1.1. Introduction

The script in this module converts a simple RDF vocabulary, described in YAML, into a formal RDFS vocabulary in JSON-LD, Turtle, and HTML. Optionally, a simple JSON-LD @context is also generated for the vocabulary. Neither the script nor the YAML format is prepared for complex vocabularies; their primary goal is to simplify the generation of simple, straightforward RDFS vocabularies that do not require, for instance, sophisticated OWL statements.

When running, the script relies on two files:

The vocabulary.yml file, containing the definition for the vocabulary entries. (It is also possible to use a different name for the YAML file, see below.)
The template.html file, used to create the HTML file version of the vocabulary. (It is also possible to use a different name for the template file, see below.) The template may also be an HTML fragment (i.e., without the <html>, <head>, etc. elements), which comes in handy if the generated fragment is to be included in the full specification as, say, an Appendix. Note that if a fragment is used, the defined_by entries use only the fragment part of the URL in the generated code (the URL is supposed to refer to the specification file itself).
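
The fragment rule for defined_by can be pictured with a small helper. This is only a sketch of the behaviour described above: the function name `definedByHref` and the boolean `templateIsFragment` are hypothetical, and how the script itself detects a fragment template is not covered here.

```typescript
// Hypothetical helper: pick the link target for a defined_by URL.
// With a full HTML template the URL is used as-is; with a fragment template
// only the "#..." part is kept, since the fragment lives inside the
// specification document the URL points to.
function definedByHref(definedBy: string, templateIsFragment: boolean): string {
    if (!templateIsFragment) return definedBy;
    const hash = definedBy.indexOf("#");
    // A URL without a fragment part is returned unchanged.
    return hash >= 0 ? definedBy.slice(hash) : definedBy;
}

const fullLink = definedByHref("https://example.org/vocabulary-definition#class1", false);
const fragmentLink = definedByHref("https://example.org/vocabulary-definition#class1", true);
```

Here `fullLink` keeps the complete URL, while `fragmentLink` is reduced to `#class1`.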

1.2. Definition of the vocabulary in the YAML file

The vocabulary is defined in a YAML file, which contains several block sequences with the following keys: vocab, prefix, ontology, class, property, individual, datatype, and json_ld. Only the vocab and ontology blocks are required; all others are optional.

Each block sequence consists of blocks with a number of keys, depending on the specific block. The interpretation of these key/value pairs may depend on the top level block where they reside, but some have a common interpretation. The detailed specification of the keys and values is as follows.
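
As a rough picture of the overall file structure, the top-level blocks can be written down as a TypeScript type. This is a sketch for illustration only: `VocabularyFile`, `Block`, and `missingRequiredBlocks` are hypothetical names that are not part of the package, and the check covers only the single rule stated above, namely that vocab and ontology are required.

```typescript
// Hypothetical shape of the parsed YAML file; the keys are the top-level
// blocks named in the text, the content of each block is left open.
type Block = Record<string, unknown>;

interface VocabularyFile {
    vocab?: Block | Block[];
    prefix?: Block[];
    ontology?: Block[];
    class?: Block[];
    property?: Block[];
    individual?: Block[];
    datatype?: Block[];
    json_ld?: Block;
}

// The two blocks the text describes as required.
const REQUIRED_BLOCKS = ["vocab", "ontology"] as const;

// Returns the names of the required blocks that are absent or empty lists.
function missingRequiredBlocks(file: VocabularyFile): string[] {
    return REQUIRED_BLOCKS.filter((key) => {
        const value = file[key];
        if (value == null) return true;
        return Array.isArray(value) && value.length === 0;
    });
}
```

A check like this could be run on the parsed YAML object before anything else is generated, so that a missing block is reported early.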

1.2.1. General Vocabulary blocks

1.2.1.1. Vocabulary Constants —`vocab` Block

Constants for the vocabulary being defined. This block is required.

Example:

vocab:
    id: ex
    value: https://example.org/vocabulary#
    context: https://example.org/context.jsonld

1.2.1.2. CURIE Prefixes — `prefix` Block

List of CURIE prefix definitions for each external vocabulary being used. Each entry is as follows:

| Key   | Possible values | Description                                                                                                       | Required? |
| ----- | --------------- | ----------------------------------------------------------------------------------------------------------------- | --------- |
| id    | prefix string   | Provides a CURIE prefix that can be used in the vocabulary description and in the generated RDF serializations.   | Yes       |
| value | URL             | The URL for the external vocabulary                                                                               | Yes       |

Example:

prefix:
    - id: oa
      value: http://www.w3.org/ns/oa#

    - id: as
      value: http://www.w3.org/ns/activitystreams#

Some id/value pairs are defined by default, and it is not necessary to define them here.
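
To make the role of the prefixes concrete, the following sketch expands a CURIE into a full URL using entries shaped like the prefix block. `PrefixEntry` and `expandCurie` are hypothetical names used only for this illustration; the script's own handling of prefixes may differ.

```typescript
// One entry of the `prefix` block: a CURIE prefix and the URL it stands for.
interface PrefixEntry {
    id: string;
    value: string;
}

// Expand "prefix:local" into a full URL. A term whose prefix is unknown
// (which includes absolute URLs, whose scheme is not a declared prefix)
// is returned unchanged.
function expandCurie(term: string, prefixes: PrefixEntry[]): string {
    const colon = term.indexOf(":");
    if (colon < 0) return term;
    const prefix = term.slice(0, colon);
    const local = term.slice(colon + 1);
    const entry = prefixes.find((p) => p.id === prefix);
    return entry ? entry.value + local : term;
}

const prefixes: PrefixEntry[] = [
    { id: "oa", value: "http://www.w3.org/ns/oa#" },
    { id: "as", value: "http://www.w3.org/ns/activitystreams#" },
];

// expanded === "http://www.w3.org/ns/oa#hasTarget"
const expanded = expandCurie("oa:hasTarget", prefixes);
```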

1.2.1.3. Vocabulary Metadata —`ontology` Block

Definition of “ontology properties”, that is, statements made about the vocabulary itself. These are added to the top level definition of the ontology in the generated files. This block is required.

The block is a list of property/value pairs; each entry is as follows:

| Key      | Possible values | Description                                                 | Required? |
| -------- | --------------- | ----------------------------------------------------------- | --------- |
| property | CURIE           | Possible values: dc:title, dc:description, and rdfs:seeAlso | Yes       |
| value    | URL or string   | The value of the property.                                  | Yes       |

Example:

ontology:
    - property: dc:title
      value: EPUB Annotations vocabulary

    - property: dc:description
      value: See also <a href="http://example.org/explanation">further details</a>.

    - property: rdfs:seeAlso
      value: https://example.org/description

It is good practice to provide, at least, dc:description as an ontology property with a short description of the vocabulary and dc:title with a short name of the vocabulary.

The script automatically adds a dc:date key with the generation time as a value.

1.2.1.4. Generated JSON-LD context data — `json_ld` block

A block affecting the generated JSON-LD context file, if generated.

Example:

json_ld:
    alias:
        "language"  : "@language"
        "direction" : "@direction"
        "id"        : "@id"
    import: "http://example.org/othervoc.jsonld"
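
One plausible reading of this block is sketched below: each alias becomes a keyword alias in the generated @context, and import becomes an `@import` entry (a JSON-LD 1.1 feature). Both mappings are assumptions made for illustration, as are the names `JsonLdBlock` and `contextFragment`; the context actually produced by the script may be organized differently.

```typescript
// Hypothetical shape of the `json_ld` block shown in the example above.
interface JsonLdBlock {
    alias?: Record<string, string>;
    import?: string;
}

// Build the part of a JSON-LD @context that the block would contribute,
// under the assumptions stated in the text (alias -> keyword alias,
// import -> "@import").
function contextFragment(block: JsonLdBlock): Record<string, string> {
    const context: Record<string, string> = {};
    if (block.import !== undefined) context["@import"] = block.import;
    for (const [alias, keyword] of Object.entries(block.alias ?? {})) {
        context[alias] = keyword;
    }
    return context;
}

const fragment = contextFragment({
    alias: { language: "@language", direction: "@direction", id: "@id" },
    import: "http://example.org/othervoc.jsonld",
});
```

With such a context in place, a JSON-LD document could write `"id"` instead of `"@id"`, which is the usual motivation for keyword aliases.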

1.2.2. Ontology term blocks

1.2.2.1. Common Term Entries

These keys are common to all term definitions, although their exact interpretation may be dependent on the terms themselves.

1.2.2.2. Class definitions —`class` Block

Example:

class:
    - id: Class1
      label: An example Class1
      upper_value: [schema:Resource, Class2]
      defined_by: https://example.org/vocabulary-definition#class1
      comment: Something about class1

    - id: Class2
      label: An example Class2
      upper_value: [Class3, Class4]
      defined_by: https://example.org/vocabulary-definition#class2
      comment: Referring to the union of <code>Class3</code> and <code>Class4</code>
      upper_union: true

    - id: Class5
      label: An example Class5
      one_of: [ex:Individual_1, ex:Individual_2, …,ex:Individual_n]
      comment: The class consists of the listed individuals.

1.2.2.3. Property definitions — `property` Block

Example:

property:
    - id: pr1
      label: this is example pr1
      upper_value: oa:hasTarget
      domain: Class1
      range: Class4
      defined_by: https://example.org/vocabulary-definition#pr11
      comment: Something about pr1.

    - id: pr2
      label: this is example pr2
      upper_value: [pr1, pr3]
      domain: Class2
      range: [Class3, Class10]
      defined_by: https://example.org/vocabulary-definition#pr2
      comment: Something about pr2; the range is the union of Class3 and Class10
      range_union: true

    - id: pr3
      label: this is an example pr3
      one_of: [ex:val1, ex:val2, ex:val3]
      comment: Restricting the values to ex:val1, ex:val2, or ex:val3; in JSON-LD using the generated context "val1", "val2", "val3" should be used.
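
To give a feel for what a property entry may turn into, here is a sketch that renders the simplest case (such as pr1) as Turtle. It assumes that upper_value maps to rdfs:subPropertyOf, that unprefixed terms belong to the vocabulary's own prefix, and that string literals can be escaped with JSON.stringify; none of this is taken from the script's source, and `PropertyEntry`, `asList`, `qualify`, and `propertyToTurtle` are hypothetical names. The range_union and one_of keys are not handled: a list of ranges is emitted as plain multiple values, which RDFS reads as an intersection rather than a union.

```typescript
// Hypothetical subset of a `property` entry, limited to keys used above.
interface PropertyEntry {
    id: string;
    label?: string;
    upper_value?: string | string[];
    domain?: string | string[];
    range?: string | string[];
    comment?: string;
}

// Accept both the single-value and the list form used in the YAML examples.
function asList(value: string | string[] | undefined): string[] {
    if (value === undefined) return [];
    return Array.isArray(value) ? value : [value];
}

// Assumption: terms without a prefix live in the vocabulary's own namespace.
function qualify(term: string, vocabPrefix: string): string {
    return term.indexOf(":") >= 0 ? term : `${vocabPrefix}:${term}`;
}

function propertyToTurtle(entry: PropertyEntry, vocabPrefix: string): string {
    const q = (terms: string[]) => terms.map((t) => qualify(t, vocabPrefix)).join(", ");
    const statements: string[] = ["a rdf:Property"];
    if (entry.label !== undefined) statements.push(`rdfs:label ${JSON.stringify(entry.label)}`);
    const uppers = asList(entry.upper_value);
    if (uppers.length > 0) statements.push(`rdfs:subPropertyOf ${q(uppers)}`);
    const domains = asList(entry.domain);
    if (domains.length > 0) statements.push(`rdfs:domain ${q(domains)}`);
    const ranges = asList(entry.range);
    if (ranges.length > 0) statements.push(`rdfs:range ${q(ranges)}`);
    if (entry.comment !== undefined) statements.push(`rdfs:comment ${JSON.stringify(entry.comment)}`);
    return `${vocabPrefix}:${entry.id} ${statements.join(" ;\n    ")} .`;
}

const pr1Turtle = propertyToTurtle(
    { id: "pr1", label: "this is example pr1", upper_value: "oa:hasTarget", domain: "Class1", range: "Class4" },
    "ex",
);
```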

1.2.2.4. Individual definitions —`individual` Block

No extra keys are defined for this block. The type key is used to define the class to which this individual belongs.

Example:

individual:
    - id: ind1
      label: this is an individual belonging to Class1
      type: Class1
      defined_by: https://example.org/vocabulary-definition#ind1

1.2.2.5. Datatype definitions — `datatype` Block

Example:

datatype:
  - id: dt1
    label: Datatype some usage
    upper_value: xsd:string
    one_of: [One, Two, Three]
    defined_by: https://example.org/vocabulary-definition#dt1
    see_also:
      - label: Goal of the datatype
        url:  https://example.org/further-description.html

  - id: jsonld-number
    label: JSON-LD notion of numbers
    type: [xsd:decimal, xsd:float, xsd:double]
    upper_union: true
    comment: Datatype used by JSON-LD

1.3. Formatting the output

Some effort is made to format the output files (HTML, JSON-LD, and Turtle) properly so that they are readable. A subset of the editorconfig facilities is also taken into account: if an .editorconfig file is found, the following supported key/value pairs are used (with the default values in parentheses):

indent_style (space)
insert_final_newline (false)
indent_size (4)
max_line_length (0)
end_of_line (lf)

See the .editorconfig for further details.
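
The interplay between the defaults and an .editorconfig file can be sketched as follows. This is a simplified illustration, not the script's own parser: `FormatOptions` and `readFormatOptions` are hypothetical names, section headers such as `[*]` are skipped rather than interpreted, and only the five keys listed above are read.

```typescript
interface FormatOptions {
    indent_style: string;
    insert_final_newline: boolean;
    indent_size: number;
    max_line_length: number;
    end_of_line: string;
}

// Default values as listed in the text.
const DEFAULTS: FormatOptions = {
    indent_style: "space",
    insert_final_newline: false,
    indent_size: 4,
    max_line_length: 0,
    end_of_line: "lf",
};

// Parse "key = value" lines; unknown keys and unparsable numbers are ignored,
// so the corresponding default stays in place.
function readFormatOptions(editorconfig: string): FormatOptions {
    const options: FormatOptions = { ...DEFAULTS };
    for (const rawLine of editorconfig.split(/\r?\n/)) {
        const line = rawLine.trim();
        // Skip blanks, comments, and section headers such as "[*]".
        if (line === "" || /^[#;\[]/.test(line)) continue;
        const eq = line.indexOf("=");
        if (eq < 0) continue;
        const key = line.slice(0, eq).trim().toLowerCase();
        const value = line.slice(eq + 1).trim().toLowerCase();
        const asNumber = Number.parseInt(value, 10);
        switch (key) {
            case "indent_style": options.indent_style = value; break;
            case "end_of_line": options.end_of_line = value; break;
            case "insert_final_newline": options.insert_final_newline = value === "true"; break;
            case "indent_size": if (!Number.isNaN(asNumber)) options.indent_size = asNumber; break;
            case "max_line_length": if (!Number.isNaN(asNumber)) options.max_line_length = asNumber; break;
        }
    }
    return options;
}

const formatOptions = readFormatOptions(
    "root = true\n[*]\nindent_style = tab\nindent_size = 2\ninsert_final_newline = true\nmax_line_length = off\n",
);
```

In this example the three recognized settings override the defaults, while `max_line_length = off` cannot be read as a number and therefore leaves the default of 0 untouched.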

1.3.1. HTML Templating

The generation of the HTML output requires an HTML template file. This can either be a complete HTML file or an HTML fragment (within a <section> or <div>). The template should contain some specific HTML elements with predefined identifier values (i.e., id attribute values); yml2vocab fills in the respective content within those HTML elements. Each term is put into its own section, with the term's id value serving both as the section header and as its element identifier. The content of the section is an HTML definition list with a human readable version of the vocabulary term data.

The required id values, and the containing elements, are as follows:

| id value                          | Corresponding element | Generated content                                                                    |
| :-------------------------------- | --------------------- | :----------------------------------------------------------------------------------- |
| title or ontology_title           | any textual           | The title/name of the vocabulary (see the ontology block)                            |
| description                       | any textual           | The description of the vocabulary (see the ontology block)                           |
| see_also                          | any textual           | External reference for the vocabulary (see the ontology block)                       |
| alt-turtle                        | <a>                   | Add a reference to the Turtle version of the vocabulary as an href attribute value   |
| alt-jsonld                        | <a>                   | Add a reference to the JSON-LD version of the vocabulary as an href attribute value  |
| time                              | any textual           | Date of the vocabulary generation                                                    |
| namespaces                        | <dl>                  | List of namespaces used by the vocabulary                                            |
| contexts                          | <ul>                  | List of context files where the vocabulary terms appear                              |
| term_definitions                  | <section>             | Section for the fully defined vocabulary terms                                       |
| class_definitions                 | <section>             | Subsection for the fully defined classes                                             |
| property_definitions              | <section>             | Subsection for the fully defined properties                                          |
| datatype_definitions              | <section>             | Subsection for the fully defined datatypes                                           |
| individual_definitions            | <section>             | Subsection for the fully defined individuals                                         |
| reserved_term_definitions         | <section>             | Section for the reserved vocabulary terms                                            |
| reserved_class_definitions        | <section>             | Subsection for the reserved classes                                                  |
| reserved_property_definitions     | <section>             | Subsection for the reserved properties                                               |
| reserved_datatype_definitions     | <section>             | Subsection for the reserved datatypes                                                |
| reserved_individual_definitions   | <section>             | Subsection for the reserved individuals                                              |
| deprecated_term_definitions       | <section>             | Section for the deprecated vocabulary terms                                          |
| deprecated_class_definitions      | <section>             | Subsection for the deprecated classes                                                |
| deprecated_property_definitions   | <section>             | Subsection for the deprecated properties                                             |
| deprecated_datatype_definitions   | <section>             | Subsection for the deprecated datatypes                                              |
| deprecated_individual_definitions | <section>             | Subsection for the deprecated individuals                                            |

If an element is missing, the content is ignored by the script.

The reserved and deprecated sections are filled with terms that have been labelled as reserved or deprecated, respectively, using the status key in the term's defining block. If no reserved or deprecated entries are defined, the respective sections will be removed from the resulting HTML. Similarly, if no alternate context files are specified, the <ul id=contexts> entry is removed.

The template may contain CSS to provide a proper presentation in HTML; please look at the example template for a suitable CSS for the classes generated by the script.
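
Because content for a missing element is silently ignored, it can be useful to check a template for the expected id values before running the script. The sketch below does this with a plain text search; `TEMPLATE_IDS`, `escapeRegExp`, and `missingTemplateIds` are hypothetical names, the default list covers only part of the table above, and a regular expression is a rough stand-in for real HTML parsing.

```typescript
// A few of the id values from the table above; "title" is left out because
// either "title" or "ontology_title" is accepted.
const TEMPLATE_IDS: string[] = [
    "description", "see_also", "alt-turtle", "alt-jsonld",
    "time", "namespaces", "contexts", "term_definitions",
];

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Report which ids do not occur as an id attribute (quoted or unquoted)
// anywhere in the template text.
function missingTemplateIds(template: string, ids: string[] = TEMPLATE_IDS): string[] {
    return ids.filter((id) => {
        const pattern = new RegExp(`\\sid\\s*=\\s*["']?${escapeRegExp(id)}["']?(?=[\\s>/])`);
        return !pattern.test(template);
    });
}

const sampleTemplate = '<section><h1 id="title"></h1><p id="description"></p><dl id=namespaces></dl></section>';
const missingIds = missingTemplateIds(sampleTemplate, ["description", "namespaces", "contexts"]);
```

For the sample template, `missingIds` contains only `contexts`, which signals that the list of context files would not appear in the generated HTML.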

2. Installation and use

The script has been written in TypeScript (version 5.0.2 and beyond) running on top of node.js (version 21 and beyond) or deno (version 2.1 and beyond).

Beyond the YAML file itself, the script relies on an HTML template file, i.e., a skeleton file in HTML that is completed by the vocabulary entries. The example template file on GitHub provides a good starting point for a template that also makes use of respec. The script relies on the existing id values and section structures to be modified/extended by the script. Unused subsections (e.g., when there are no deprecated classes) are removed from the final HTML file.

2.1. Running the script on a command line

2.1.1. NPM + Node.js

The script can be used as a standard npm module via:

npm install yml2vocab

The npm installation installs the node_modules/.bin/yml2vocab script. The script can be used as:

yml2vocab [-v vocab_file_name] [-t template_file_name] [-c]

2.1.2. Deno

If deno is installed globally, one can also run the script directly (without any further installation) from the code by

deno run -A /a/b/c/main.ts [-v vocab_fname] [-t template_fname] [-c]

on the top level. To make it simpler, a binary, compiled version of the program can be generated by

deno compile --allow-read --allow-write --allow-env main.ts

which results in an executable file, called yml2vocab, that can be stored anywhere in the user's $PATH.

The program can also be run directly from JSR, without installing the package locally:

deno run -A jsr:@iherman/yml2vocab/cli [-v vocab_file_name] [-t template_file_name] [-c]

2.1.3. Command line argument

The script generates the vocab_file_name.ttl, vocab_file_name.jsonld, and vocab_file_name.html files for the Turtle, JSON-LD, and HTML versions, respectively. The script relies on the vocab_file_name.yml file for the vocabulary specification in YAML and a template_file_name file for a template file. The defaults are vocabulary and template.html, respectively.

If the -c flag is set, the additional vocab_file_name.context.jsonld file is also generated; it contains a JSON-LD document that can be used as a separate @context reference in a JSON-LD file. Note that this file does not necessarily use all the sophistication that JSON-LD defines for @context; such features may have to be added manually.
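
The meaning of the three arguments and their defaults can be summarized in a few lines of code. This is not the script's actual argument parser, only a sketch of the behaviour described above; `CliOptions` and `parseArgs` are hypothetical names.

```typescript
interface CliOptions {
    vocab: string;      // base name of the YAML file, without the .yml extension
    template: string;   // name of the HTML template file
    context: boolean;   // whether the separate @context file is generated
}

// Read "-v name", "-t name", and "-c" from an argument list, falling back
// to the defaults named in the text; anything else is ignored.
function parseArgs(argv: string[]): CliOptions {
    const options: CliOptions = { vocab: "vocabulary", template: "template.html", context: false };
    for (let i = 0; i < argv.length; i++) {
        const arg = argv[i];
        const next = argv[i + 1];
        if (arg === "-c") {
            options.context = true;
        } else if (arg === "-v" && next !== undefined) {
            options.vocab = next;
            i++;
        } else if (arg === "-t" && next !== undefined) {
            options.template = next;
            i++;
        }
    }
    return options;
}

const cliOptions = parseArgs(["-v", "myvocab", "-c"]);
```

With these arguments the vocabulary is read from myvocab.yml, the default template.html is used, and the context file is requested.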

2.2. Running from a Javascript/TypeScript program

2.2.1. Usage with node.js

The simplest way of using the module from Javascript is:

const yml2vocab = require('yml2vocab');
async function main() {
    await yml2vocab.generateVocabularyFiles("vocabulary","template.html",false);
}
main();

This reads (asynchronously) the YAML and template files and stores the generated vocabulary representations (see the command line interface for details) in the directory alongside the YAML file. By setting the last argument to true a @context is also generated.

The somewhat lower level yml2vocab.VocabGeneration class can also be used:

const yml2vocab = require('yml2vocab');
// yml_content: the YAML content in text form, before parsing
const vocabGeneration = new yml2vocab.VocabGeneration(yml_content);
// returns the Turtle content as a string
const turtle = vocabGeneration.getTurtle();
// returns the JSON-LD content as a string
const jsonld = vocabGeneration.getJSONLD();
// returns the HTML content as a string
// The third argument specifies (as a boolean) whether a context file is also generated
// (if yes, some extra explanatory notes may appear in the HTML output)
const html = vocabGeneration.getHTML(template_file_content, basename_for_files, context_generated);
// returns the minimal @context file for the vocabulary as a string
const context = vocabGeneration.getContext();

Running TypeScript instead of Javascript is similar, except that the require must be replaced by:

import yml2vocab from 'yml2vocab';

There is no need to install any extra typing; it is included in the package. The interfaces simply use strings; no extra TypeScript type definitions have been added.
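
The four getters can be combined to produce the same set of outputs as the command line tool. The sketch below is written against a structural interface that mirrors the method calls shown above, so that it can be tried out with a stub instead of the real class; `VocabGenerationLike` and `collectOutputs` are hypothetical names, and writing the collected strings to disk is left to the caller.

```typescript
// Structural view of the four methods used in the example above; the real
// class is yml2vocab.VocabGeneration, this interface exists only for the sketch.
interface VocabGenerationLike {
    getTurtle(): string;
    getJSONLD(): string;
    getHTML(template: string, basename: string, contextGenerated: boolean): string;
    getContext(): string;
}

// Collect every serialization under the file name used by the command line tool.
function collectOutputs(
    generation: VocabGenerationLike,
    template: string,
    basename: string,
    withContext: boolean,
): Map<string, string> {
    const outputs = new Map<string, string>();
    outputs.set(`${basename}.ttl`, generation.getTurtle());
    outputs.set(`${basename}.jsonld`, generation.getJSONLD());
    outputs.set(`${basename}.html`, generation.getHTML(template, basename, withContext));
    if (withContext) {
        outputs.set(`${basename}.context.jsonld`, generation.getContext());
    }
    return outputs;
}
```

Keeping the result in memory rather than writing files directly makes it easy to post-process the output or to store it somewhere other than alongside the YAML file.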

2.2.1.1. Usage with deno

The package is also available on JSR @iherman/yml2vocab. All previous examples are valid for deno, except for the import statements which should be:

import yml2vocab from 'jsr:@iherman/yml2vocab'

Note that deno can also import npm packages if explicitly named, so the following import statement is also valid:

import yml2vocab from 'npm:yml2vocab'

No prior installation step is necessary.

3. Cloning the repository

The repository may also be cloned.

3.1. Content of the directory

Readme.md: this file.
package.json: configuration file for npm.
deno.json: configuration file for deno.
example: a folder with examples for vocabulary definition files and the generated RDF vocabulary files.
lib directory: the TypeScript modules for the script.
dist directory: the Javascript distribution files (compiled from the TypeScript sources using tsc in node.js)
main.ts: the TypeScript entry point to the script as a command line tool
index.ts: the top level type interface, to be used if the files are used by an external script.
docs directory: documentation of the package as generated by Typedoc

The following files and directories are generated/modified by either the script or npm; better not to touch these directly:

package-lock.json: used by npm as an internal file for the packages.
node_modules directory: the various Javascript libraries used by the script. This directory should not be uploaded to GitHub; it is strictly for the local activation of the script.
deno.lock: used by deno to manage imported packages using its own mechanism (bypassing node_modules).

4. Acknowledgement

I got inspired by the structure and Ruby script that was created by my late colleague and friend Gregg Kellogg for version 1 of the Credentials Vocabulary. The vocabulary definition itself was using CSV. The CSV definitions have been changed to YAML, and the script itself has been re-written in TypeScript, and developed further since by adding new features based on usage.

Many features are the result of further discussions with Manu Sporny, Benjamin Young, and Pierre-Antoine Champin.

I dedicate this script to the memory of Gregg. R.I.P.
