yml2vocab
v1.8.1
Generation of vocabulary files starting from YAML
- 1. Generate RDFS vocabulary files from YAML
- 2. Installation and use
- 3. Cloning the repository
- 4. Acknowledgement
1. Generate RDFS vocabulary files from YAML
1.1. Introduction
The script in this module converts a simple RDF vocabulary, described in YAML, into a formal RDFS vocabulary serialized in JSON-LD, Turtle, and HTML. Optionally, a simple JSON-LD @context is also generated for the vocabulary. Neither the script nor the YAML format is meant for complex vocabularies; their primary goal is to simplify the generation of simple, straightforward RDFS vocabularies that do not require, for instance, sophisticated OWL statements.
When running, the script relies on two files:
- The vocabulary.yml file, containing the definition for the vocabulary entries. (It is also possible to use a different name for the YAML file, see below.)
- The template.html file, used to create the HTML version of the vocabulary. (It is also possible to use a different name for the template file, see below.) The template may also be an HTML fragment (i.e., without the <html>, <head>, etc.), which comes in handy if the generated fragment is to be included in the full specification as, say, an Appendix. Note that if a fragment is used, the defined_by entries use only the fragment part of the URL in the generated code (the URL is supposed to refer to the specification file itself).
1.2. Definition of the vocabulary in the YAML file
The vocabulary is defined in a YAML file, which contains several block sequences with the following keys: vocab, prefix, ontology, class, property, individual, datatype, and json_ld. Only the vocab and ontology blocks are required; all others are optional.
Each block sequence consists of blocks with a number of keys, depending on the specific block. The interpretation of these key/value pairs may depend on the top level block where they reside, but some have a common interpretation. The detailed specification of the keys and values is as follows.
1.2.1. General Vocabulary blocks
1.2.1.1. Vocabulary Constants — vocab Block
Constants for the vocabulary being defined. This block is required.
Example:
vocab:
    id: ex
    value: https://example.org/vocabulary#
    context: https://example.org/context.jsonld
1.2.1.2. CURIE Prefixes — prefix Block
List of CURIE prefix definitions for each external vocabulary being used. Each entry is as follows:
| Key | Possible values | Description | Required? |
| ----- | --------------- | --------------------------------------------------------------------------------------------------------------- | --------- |
| id | prefix string | Provides a CURIE prefix that can be used in the vocabulary description and in the generated RDF serializations. | Yes |
| value | URL | The URL for the external vocabulary | Yes |
Example:
prefix:
    - id: oa
      value: http://www.w3.org/ns/oa#
    - id: as
      value: http://www.w3.org/ns/activitystreams#
Some id/value pairs are defined by default, and it is not necessary to define them here. These are:
| Prefix | URL |
| :------- | :-------------------------------------------- |
| dc | http://purl.org/dc/terms/ |
| owl | http://www.w3.org/2002/07/owl# |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| rdfs | http://www.w3.org/2000/01/rdf-schema# |
| xsd | http://www.w3.org/2001/XMLSchema# |
| schema | http://schema.org/ |
| foaf | http://xmlns.com/foaf/0.1/ |
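To illustrate how a CURIE relates to the prefix definitions, here is a minimal TypeScript sketch. It is not taken from the yml2vocab sources: the `expandCurie` helper and the `PrefixEntry` interface are hypothetical names, and the only data used is the default prefix table above plus any extra id/value pairs passed in.

```typescript
// Default prefixes, copied from the table above.
const DEFAULT_PREFIXES: ReadonlyArray<[string, string]> = [
    ["dc", "http://purl.org/dc/terms/"],
    ["owl", "http://www.w3.org/2002/07/owl#"],
    ["rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"],
    ["rdfs", "http://www.w3.org/2000/01/rdf-schema#"],
    ["xsd", "http://www.w3.org/2001/XMLSchema#"],
    ["schema", "http://schema.org/"],
    ["foaf", "http://xmlns.com/foaf/0.1/"],
];

// Shape of one entry of the prefix block (an id/value pair).
interface PrefixEntry {
    id: string;
    value: string;
}

// Hypothetical helper (an assumption, not part of the package API):
// expand a CURIE such as "oa:hasTarget" into a full URL.
// Strings without a known prefix are returned unchanged.
function expandCurie(curie: string, extra: PrefixEntry[] = []): string {
    const prefixes = new Map<string, string>(DEFAULT_PREFIXES);
    for (const entry of extra) {
        prefixes.set(entry.id, entry.value);
    }
    const colon = curie.indexOf(":");
    if (colon === -1) {
        return curie;
    }
    const base = prefixes.get(curie.slice(0, colon));
    return base === undefined ? curie : base + curie.slice(colon + 1);
}
```

With the prefix block of the example, `expandCurie("oa:hasTarget", [{ id: "oa", value: "http://www.w3.org/ns/oa#" }])` would yield the full URL of the Web Annotation property, while default prefixes such as dc need no extra entry.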
1.2.1.3. Vocabulary Metadata — ontology Block
Definition of “ontology properties”, that is, statements made about the vocabulary itself. These are added to the top level definition of the ontology in the generated files. This block is required.
The block is a list of property/value pairs; each entry is as follows:
| Key | Possible values | Description | Required? |
| ---------- | --------------- | ----------------------------------------------------------------- | --------- |
| property | CURIE | Possible values: dc:title, dc:description, and rdfs:seeAlso | Yes |
| value | URL or string | The value of the property. | Yes |
Example:
ontology:
    - property: dc:title
      value: EPUB Annotations vocabulary
    - property: dc:description
      value: See also <a href="http://example.org/explanation">further details</a>.
    - property: rdfs:seeAlso
      value: https://example.org/description
It is good practice to provide, at least, dc:description as an ontology property with a short description of the vocabulary and dc:title with a short name of the vocabulary.
The script automatically adds a dc:date key with the generation time as a value.
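The good-practice rule above can be checked mechanically before running the script. The following TypeScript sketch is only an illustration under stated assumptions: `OntologyEntry` mirrors the property/value pairs of the ontology block, and `missingRecommended` is a hypothetical helper, not something yml2vocab provides.

```typescript
// One property/value pair of the ontology block.
interface OntologyEntry {
    property: string;
    value: string;
}

// Hypothetical helper (an assumption, not part of the package):
// return the recommended ontology properties that are absent.
function missingRecommended(entries: OntologyEntry[]): string[] {
    const recommended = ["dc:title", "dc:description"];
    const present = new Set(entries.map((entry) => entry.property));
    return recommended.filter((property) => !present.has(property));
}
```

An empty result means that both dc:title and dc:description are present; dc:date is deliberately not checked, since the script adds it on its own.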
1.2.1.4. Generated JSON-LD context data — json_ld block
A block affecting the generated JSON-LD context file, if generated.
Example:
json_ld:
    alias:
        "language" : "@language"
        "direction" : "@direction"
        "id" : "@id"
    import: "http://example.org/othervoc.jsonld"
1.2.2. Ontology term blocks
1.2.2.1. Common Term Entries
These keys are common to all term definitions, although their exact interpretation may be dependent on the terms themselves.
1.2.2.2. Class definitions — class Block
Example:
class:
    - id: Class1
      label: An example Class1
      upper_value: [schema:Resource, Class2]
      defined_by: https://example.org/vocabulary-definition#class1
      comment: Something about class1
    - id: Class2
      label: An example Class2
      upper_value: [Class3, Class4]
      defined_by: https://example.org/vocabulary-definition#class2
      comment: Referring to the union of <code>Class3</code> and <code>Class4</code>
      upper_union: true
    - id: Class5
      label: An example Class5
      one_of: [ex:Individual_1, ex:Individual_2, …,ex:Individual_n]
      comment: The class consists of the listed individuals.
1.2.2.3. Property definitions — property Block
Example:
property:
    - id: pr1
      label: this is example pr1
      upper_value: oa:hasTarget
      domain: Class1
      range: Class4
      defined_by: https://example.org/vocabulary-definition#pr11
      comment: Something about pr1.
    - id: pr2
      label: this is example pr2
      upper_value: [pr1, pr3]
      domain: Class2
      range: [Class3, Class10]
      defined_by: https://example.org/vocabulary-definition#pr2
      comment: Something about pr2; the range is the union of Class3 and Class10
      range_union: true
    - id: pr3
      label: this is an example pr3
      one_of: [ex:val1, ex:val2, ex:val3]
      comment: Restricting the values to ex:val1, ex:val2, or ex:val3; in JSON-LD using the generated context "val1", "val2", "val3" should be used.
1.2.2.4. Individual definitions — individual Block
No extra keys are defined for this block. The type key is used to define the class that contains this individual.
Example:
individual:
    - id: ind1
      label: this is an individual belonging to Class1
      type: Class1
      defined_by: https://example.org/vocabulary-definition#ind1
1.2.2.5. Datatype definitions — datatype Block
Example:
datatype:
    - id: dt1
      label: Datatype some usage
      upper_value: xsd:string
      one_of: [One, Two, Three]
      defined_by: https://example.org/vocabulary-definition#dt1
      see_also:
          - label: Goal of the datatype
            url: https://example.org/further-description.html
    - id: jsonld-number
      label: JSON-LD notion of numbers
      type: [xsd:decimal, xsd:float, xsd:double]
      upper_union: true
      comment: Datatype used by JSON-LD
1.3. Formatting the output
Some effort is made to format the output files (HTML, JSON-LD, and Turtle) properly so that they are readable. A subset of the editorconfig facilities is also taken into account. Namely, if an .editorconfig file is found, the following supported key/value pairs are used (with the default values in parentheses):
- indent_style (space)
- insert_final_newline (false)
- indent_size (4)
- max_line_length (0)
- end_of_line (lf)
See the .editorconfig for further details.
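As an illustration of how the supported pairs and their defaults might combine, here is a minimal TypeScript sketch. It is an assumption-laden simplification and not the script's actual code: `FormatOptions` and `readFormatOptions` are hypothetical names, and section headers such as [*] are skipped, so every supported key/value line is applied regardless of which files its section targets.

```typescript
// The five supported settings, with the value ranges EditorConfig defines.
interface FormatOptions {
    indent_style: "space" | "tab";
    insert_final_newline: boolean;
    indent_size: number;
    max_line_length: number;
    end_of_line: "lf" | "crlf" | "cr";
}

// Default values as listed above.
const DEFAULT_FORMAT: FormatOptions = {
    indent_style: "space",
    insert_final_newline: false,
    indent_size: 4,
    max_line_length: 0,
    end_of_line: "lf",
};

// Hypothetical helper: overlay the supported keys found in the text of an
// .editorconfig file on top of the defaults. Comments, section headers,
// unsupported keys, and invalid values are ignored.
function readFormatOptions(text: string): FormatOptions {
    const result: FormatOptions = { ...DEFAULT_FORMAT };
    for (const raw of text.split(/\r?\n/)) {
        const line = raw.trim();
        const eq = line.indexOf("=");
        if (eq === -1 || /^[#;\[]/.test(line)) continue;
        const key = line.slice(0, eq).trim().toLowerCase();
        const value = line.slice(eq + 1).trim().toLowerCase();
        const asNumber = Number.parseInt(value, 10);
        if (key === "indent_style" && (value === "space" || value === "tab")) {
            result.indent_style = value;
        } else if (key === "insert_final_newline" && (value === "true" || value === "false")) {
            result.insert_final_newline = value === "true";
        } else if (key === "indent_size" && !Number.isNaN(asNumber)) {
            result.indent_size = asNumber;
        } else if (key === "max_line_length" && !Number.isNaN(asNumber)) {
            result.max_line_length = asNumber;
        } else if (key === "end_of_line" && (value === "lf" || value === "crlf" || value === "cr")) {
            result.end_of_line = value;
        }
    }
    return result;
}
```

Calling `readFormatOptions("")` returns a copy of the defaults, which corresponds to the case where no .editorconfig file is found.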
1.3.1. HTML Templating
The generation of the HTML output requires an HTML template file. This can either be a complete HTML file or an HTML fragment (within a <section> or <div>). The template should contain some specific HTML elements with predefined identifier values (i.e., id attribute values); the yml2vocab script fills in the respective content within those HTML elements. Each term is put into its own section, with the term's id value serving both as the section header and as its element identifier. The content of the section is an HTML definition list with a human-readable version of the vocabulary term data.
The required id values, and the containing elements, are as follows:
| id value | Corresponding element | Generated content |
| :---------------------------------- | --------------------- | :------------------------------------------------------------------------------------- |
| title or ontology_title | any textual | The title/name of the vocabulary (see the ontology block) |
| description | any textual | The description of the vocabulary (see the ontology block) |
| see_also | any textual | External reference for the vocabulary (see the ontology block) |
| alt-turtle | <a> | Add a reference to the Turtle version of the vocabulary as an href attribute value |
| alt-jsonld | <a> | Add a reference to the JSON-LD version of the vocabulary as an href attribute value |
| time | any textual | Date of the vocabulary generation |
| namespaces | <dl> | List of namespaces used by the vocabulary |
| contexts | <ul> | List of context files where the vocabulary terms appear |
| term_definitions | <section> | Section for the fully defined vocabulary terms |
| class_definitions | <section> | Subsection for the fully defined classes |
| property_definitions | <section> | Subsection for the fully defined properties |
| datatype_definitions | <section> | Subsection for the fully defined datatypes |
| individual_definitions | <section> | Subsection for the fully defined individuals |
| reserved_term_definitions | <section> | Section for the reserved vocabulary terms |
| reserved_class_definitions | <section> | Subsection for the reserved classes |
| reserved_property_definitions | <section> | Subsection for the reserved properties |
| reserved_datatype_definitions | <section> | Subsection for the reserved datatypes |
| reserved_individual_definitions | <section> | Subsection for the reserved individuals |
| deprecated_term_definitions | <section> | Section for the deprecated vocabulary terms |
| deprecated_class_definitions | <section> | Subsection for the deprecated classes |
| deprecated_property_definitions | <section> | Subsection for the deprecated properties |
| deprecated_datatype_definitions | <section> | Subsection for the deprecated datatypes |
| deprecated_individual_definitions | <section> | Subsection for the deprecated individuals |
If an element is missing from the template, the corresponding content is ignored by the script.
The reserved and deprecated sections are filled with terms that have been labelled as reserved or deprecated, respectively, using the status key in the term's defining block. If no reserved or deprecated entries are defined, the respective sections will be removed from the resulting HTML. Similarly, if no alternate context files are specified, the <ul id=contexts> entry is removed.
The template may contain CSS to provide a proper presentation in HTML; please look at the example template for a suitable CSS for the classes generated by the script.
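Because content for a missing element is silently ignored, it can be useful to check a template for the expected id values before using it. The TypeScript sketch below is a hypothetical helper, not part of yml2vocab; it uses a regular expression rather than a real HTML parser, which is enough for a quick check but may be fooled by id-like text inside comments or scripts.

```typescript
// Hypothetical helper (an assumption, not part of the package): return the
// entries of `expected` that never occur as an id attribute value in the
// template text. Quoted and unquoted attribute values are both recognized.
function missingTemplateIds(template: string, expected: string[]): string[] {
    const found = new Set<string>();
    const idAttribute = /\bid\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
    let match: RegExpExecArray | null = idAttribute.exec(template);
    while (match !== null) {
        // Exactly one of the three capture groups is set for each match.
        found.add(match[1] ?? match[2] ?? match[3] ?? "");
        match = idAttribute.exec(template);
    }
    return expected.filter((id) => !found.has(id));
}
```

The list of expected values is left to the caller, since not every template needs every id from the table above (for instance, either title or ontology_title can carry the vocabulary name).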
2. Installation and use
The script has been written in TypeScript (version 5.0.2 and beyond) running on top of node.js (version 21 and beyond) or deno (version 2.1 and beyond).
Beyond the YAML file itself, the script relies on an HTML template file, i.e., a skeleton file in HTML that is completed with the vocabulary entries. The example template file on GitHub provides a good starting point for a template that also makes use of respec. The script relies on the id values and section structure already present in the template, which it modifies and extends. Unused subsections (e.g., when there are no deprecated classes) are removed from the final HTML file.
2.1. Running the script on a command line
2.1.1. NPM + Node.js
The script can be used as a standard npm module via:
npm install yml2vocab
The npm installation installs the node_modules/.bin/yml2vocab script. The script can be used as:
yml2vocab [-v vocab_file_name] [-t template_file_name] [-c]
2.1.2. Deno
If deno is installed globally, one can also run the script directly (without any further installation) from the code by
deno run -A /a/b/c/main.ts [-v vocab_fname] [-t template_fname] [-c]
on the top level. To make it simpler, a binary, compiled version of the program can be generated by
deno compile --allow-read --allow-write --allow-env main.ts
which results in an executable file, called yml2vocab, that can be stored anywhere in the user's $PATH.
The program can also be run from JSR without installing the package locally:
deno run -A jsr:@iherman/yml2vocab/cli [-v vocab_file_name] [-t template_file_name] [-c]
2.1.3. Command line argument
The script generates the vocab_file_name.ttl, vocab_file_name.jsonld, and vocab_file_name.html files for the Turtle, JSON-LD, and HTML versions, respectively. The script relies on the vocab_file_name.yml file for the vocabulary specification in YAML and a template_file_name file for a template file. The defaults are vocabulary and template.html, respectively.
If the -c flag is also set, an additional vocab_file_name.context.jsonld file is generated, containing a JSON-LD document that can be used as a separate @context reference in a JSON-LD file. Note that this JSON-LD file does not necessarily use all the sophistication that JSON-LD defines for @context; such features may have to be added manually.
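The naming scheme described above can be summarized in a few lines of TypeScript. This is only a sketch of the file names mentioned in the text; `outputFileNames` is a hypothetical helper, not a function exported by yml2vocab.

```typescript
// Hypothetical helper (an assumption, not part of the package): list the
// files generated for a given vocabulary base name, following the naming
// scheme described above. The default base name is "vocabulary".
function outputFileNames(vocabFileName: string = "vocabulary", withContext: boolean = false): string[] {
    const names = [
        `${vocabFileName}.ttl`,     // Turtle version
        `${vocabFileName}.jsonld`,  // JSON-LD version
        `${vocabFileName}.html`,    // HTML version
    ];
    if (withContext) {
        // Only generated when the -c flag is set.
        names.push(`${vocabFileName}.context.jsonld`);
    }
    return names;
}
```

For example, running the script with `-v ex -c` would, under this scheme, be expected to read ex.yml and produce the four files returned by `outputFileNames("ex", true)`.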
2.2. Running from a Javascript/TypeScript program
2.2.1. Usage with node.js
The simplest way of using the module from Javascript is:
const yml2vocab = require('yml2vocab');
async function main() {
    await yml2vocab.generateVocabularyFiles("vocabulary", "template.html", false);
}
main();
This reads (asynchronously) the YAML and template files and stores the generated vocabulary representations (see the command line interface for details) in the directory alongside the YAML file. By setting the last argument to true, a @context is also generated.
The somewhat lower level yml2vocab.VocabGeneration class can also be used:
const yml2vocab = require('yml2vocab');
// The YAML content is in text form, before parsing
const vocabGeneration = new yml2vocab.VocabGeneration(yml_content);
// returns the Turtle content as a string
const turtle = vocabGeneration.getTurtle();
// returns the JSON-LD content as a string
const jsonld = vocabGeneration.getJSONLD();
// returns the HTML content as a string
// The third argument specifies (as a boolean) whether a context file is also generated
// (if yes, some extra explanatory notes may appear in the HTML output)
const html = vocabGeneration.getHTML(template_file_content, basename_for_files, context_generated);
// returns the minimal @context file for the vocabulary
const context = vocabGeneration.getContext();
Running TypeScript instead of Javascript is similar, except that the require must be replaced by:
import yml2vocab from 'yml2vocab';
There is no need to install any extra typing; it is included in the package. The interfaces simply use strings, and no extra TypeScript type definitions have been added.
2.2.1.1. Usage with deno
The package is also available on JSR as @iherman/yml2vocab. All previous examples are valid for deno, except for the import statements, which should be:
import yml2vocab from 'jsr:@iherman/yml2vocab'
Note that deno can also import npm packages if explicitly named, so the following import statement is also valid:
import yml2vocab from 'npm:yml2vocab'
No prior installation step is necessary.
3. Cloning the repository
The repository may also be cloned.
3.1. Content of the directory
- Readme.md: this file.
- package.json: configuration file for npm.
- deno.json: configuration file for deno.
- example: a folder with examples for vocabulary definition files and the generated RDF vocabulary files.
- lib directory: the TypeScript modules for the script.
- dist directory: the Javascript distribution files (compiled from the TypeScript sources using tsc in node.js).
- main.ts: the TypeScript entry point to the script as a command line tool.
- index.ts: the top level type interface, to be used if the files are used by an external script.
- docs directory: documentation of the package as generated by Typedoc.
The following files and directories are generated/modified by either the script or npm; better not to touch these directly:
- package-lock.json: used by npm as an internal file for the packages.
- node_modules directory: the various Javascript libraries used by the script. This directory should not be uploaded to GitHub; it is strictly for the local activation of the script.
- deno.lock: used by deno to manage imported packages using its own mechanism (bypassing node_modules).
4. Acknowledgement
I was inspired by the structure and the Ruby script created by my late colleague and friend Gregg Kellogg for version 1 of the Credentials Vocabulary. That vocabulary definition used CSV. The CSV definitions have been changed to YAML, and the script itself has been rewritten in TypeScript and developed further since, with new features added based on usage.
Many features are the result of further discussions with Manu Sporny, Benjamin Young, and Pierre-Antoine Champin.
I dedicate this script to the memory of Gregg. R.I.P.
