@lde/pipeline-void
v0.26.14
Published
VOiD (Vocabulary of Interlinked Datasets) statistical analysis for RDF datasets
Readme
Pipeline VoID
Extensions to @lde/pipeline for VoID (Vocabulary of Interlinked Datasets) statistical analysis of RDF datasets.
Stage factories
voidStages(options?)
Returns all VoID stages in their recommended execution order. The ordering is optimised for cache warming: classPartitions() runs before the per-class stages, so the ?s a ?class pattern is already cached on the SPARQL endpoint when the heavier per-class queries execute — preventing 504 timeouts on cold caches.
Accepts an optional VoidStagesOptions object:
| Option | Default | Description |
| ---------------- | ------- | --------------------------------------------------------------------- |
| timeout | 60 000 | SPARQL query timeout in milliseconds |
| batchSize | 10 | Maximum class bindings per executor call (per-class stages only) |
| maxConcurrency | 10 | Maximum concurrent in-flight executor batches (per-class stages only) |
| perClass | — | Override per-class iteration for all five per-class stages |
| uriSpaces | — | When provided, includes the object URI space stage |
| vocabularies | — | Additional vocabulary namespace URIs to detect beyond the built-in defaults |
import { voidStages } from '@lde/pipeline-void';
import { Pipeline, SparqlUpdateWriter, provenancePlugin } from '@lde/pipeline';
const stages = await voidStages({ uriSpaces: uriSpaceMap });
await new Pipeline({
datasetSelector: selector,
stages,
plugins: [provenancePlugin()],
writers: new SparqlUpdateWriter({
endpoint: new URL('http://localhost:7200/repositories/lde/statements'),
}),
}).run();Individual stage factories
Global and domain-specific factories accept VoidStageOptions (timeout) and return Promise<Stage>. Per-class factories accept PerClassVoidStageOptions (timeout, batchSize, maxConcurrency, perClass) — they default perClass to true; set it to false to run them as monolithic queries instead.
Global stages (one CONSTRUCT query per dataset):
| Factory | Query |
| ----------------------- | ------------------------------------------------------------------------------- |
| classPartitions() | class-partition.rq — Classes with entity counts |
| countDatatypes() | datatypes.rq — Dataset-level datatypes |
| countObjectLiterals() | object-literals.rq — Literal object counts |
| countObjectUris() | object-uris.rq — URI object counts |
| countProperties() | properties.rq — Distinct properties |
| countSubjects() | subjects.rq — Distinct subjects |
| countTriples() | triples.rq — Total triple count |
| detectLicenses() | licenses.rq — License detection |
| subjectUriSpaces() | subject-uri-space.rq — Subject URI namespaces |
Per-class stages (iterated with a class selector):
| Factory | Query |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| classPropertySubjects() | class-properties-subjects.rq — Properties per class (subject counts) |
| classPropertyObjects() | class-properties-objects.rq — Properties per class (object counts) |
| perClassDatatypes() | class-property-datatypes.rq — Per-class datatype partitions |
| perClassLanguages() | class-property-languages.rq — Per-class language tags |
| perClassObjectClasses() | class-property-object-classes.rq — Per-class object class partitions |
Domain-specific stages:
| Factory | Description |
| ------------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
| detectVocabularies() | entity-properties.rq — Entity properties with automatic void:vocabulary detection. Accepts DetectVocabulariesOptions with an optional vocabularies array to extend the built-in defaults. |
| uriSpaces(uriSpaceMap) | object-uri-space.rq — Object URI namespace linksets, aggregated against a provided URI space map |
Executor decorators
VocabularyExecutor— Wraps an executor; detects known vocabulary namespace prefixes invoid:propertyquads and appendsvoid:vocabularytriples. The built-in defaults are exported asdefaultVocabularies(sourced from@zazuko/prefixes).UriSpaceExecutor— Wraps an executor; consumesvoid:Linksetquads, matches eachvoid:objectsTargetagainst configured URI space prefixes usingstartsWith, and aggregates triple counts per matched space. Emitsvoid:objectsTargetpointing to the target dataset IRI (taken from the metadata quad subjects), not the raw prefix. Unmatched linksets are discarded.
