@lde/pipeline-void
v0.12.0
VoID (Vocabulary of Interlinked Datasets) statistical analysis for RDF datasets
Pipeline VoID
Extensions to @lde/pipeline for VoID (Vocabulary of Interlinked Datasets) statistical analysis of RDF datasets.
Stage factories
Global stages (one CONSTRUCT query per dataset):
| Factory | Query |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| classPartitions() | class-partition.rq — Classes with entity counts |
| classPropertySubjects() | class-properties-subjects.rq — Properties per class (subject counts) |
| classPropertyObjects() | class-properties-objects.rq — Properties per class (object counts) |
| countDatatypes() | datatypes.rq — Dataset-level datatypes |
| countObjectLiterals() | object-literals.rq — Literal object counts |
| countObjectUris() | object-uris.rq — URI object counts |
| countProperties() | properties.rq — Distinct properties |
| countSubjects() | subjects.rq — Distinct subjects |
| countTriples() | triples.rq — Total triple count |
| detectLicenses() | licenses.rq — License detection |
| subjectUriSpaces() | subject-uri-space.rq — Subject URI namespaces |
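To give a feel for what "one CONSTRUCT query per dataset" can look like, here is a minimal sketch of a triple-count query built as a string. The contents of triples.rq are not shown in this README, so the query text and the `tripleCountQuery` helper are assumptions made for illustration; only the `void:` namespace and the `void:triples` property come from the VoID vocabulary itself.

```typescript
// VoID namespace, as defined by the VoID vocabulary.
const VOID_NS = 'http://rdfs.org/ns/void#';

// Hypothetical helper (not part of @lde/pipeline-void): builds a CONSTRUCT
// query that counts all triples and attaches the count to the dataset IRI.
function tripleCountQuery(datasetIri: string): string {
  return [
    `PREFIX void: <${VOID_NS}>`,
    'CONSTRUCT {',
    `  <${datasetIri}> a void:Dataset ;`,
    '    void:triples ?count .',
    '}',
    'WHERE {',
    // The subquery computes the aggregate once for the whole dataset.
    '  { SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o } }',
    '}',
  ].join('\n');
}

console.log(tripleCountQuery('http://example.org/dataset'));
```

The result of such a query is itself RDF, which is why the stage output can be written straight to a triple store.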
Per-class stages (iterated with a class selector):
| Factory | Query |
| ------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| perClassDatatypes() | class-property-datatypes.rq — Per-class datatype partitions |
| perClassLanguages() | class-property-languages.rq — Per-class language tags |
| perClassObjectClasses() | class-property-object-classes.rq — Per-class object class partitions |
Domain-specific stages:
| Factory | Description |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| detectVocabularies() | entity-properties.rq — Entity properties with automatic void:vocabulary detection |
| uriSpaces(uriSpaces) | object-uri-space.rq — Object URI namespace linksets, aggregated against a provided URI space map |
All factories return Promise<Stage>.
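Because each factory returns a promise, the stages need to be resolved before they can be used as plain `Stage` values. The sketch below shows one way to do that with `Promise.all`. The `Stage` interface and the two factory functions here are simplified stand-ins written for this example, not the package's real implementations.

```typescript
// Hypothetical stand-in for the Stage type from @lde/pipeline.
interface Stage {
  name: string;
}

// Hypothetical stand-ins for two factories from this package.
async function countTriples(): Promise<Stage> {
  return { name: 'triples' };
}
async function classPartitions(): Promise<Stage> {
  return { name: 'class-partition' };
}

// Resolve all factories concurrently; results keep the input order.
async function buildStages(): Promise<Stage[]> {
  return Promise.all([countTriples(), classPartitions()]);
}

buildStages().then((stages) => {
  // prints "triples, class-partition"
  console.log(stages.map((stage) => stage.name).join(', '));
});
```

`Promise.all` rejects as soon as any factory fails, so a stage that cannot be created surfaces before the pipeline starts.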
Executor decorators
- VocabularyExecutor: wraps an executor, detects known vocabulary namespace prefixes in void:property quads, and appends void:vocabulary triples.
- UriSpaceExecutor: wraps an executor, consumes void:Linkset quads, matches void:objectsTarget against the configured URI spaces, and emits aggregated linksets.
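The matching logic behind the two decorators can be sketched with plain strings instead of RDF quads. This is an illustration under stated assumptions, not the package's code: the namespace list, the `LinksetCount` shape, the function names, and the choice to drop unmatched targets are all hypothetical.

```typescript
// Hypothetical namespace list; the package's real list is not shown here.
const KNOWN_VOCABULARIES: readonly string[] = [
  'http://xmlns.com/foaf/0.1/',
  'http://purl.org/dc/terms/',
  'https://schema.org/',
];

// Return the known namespaces that prefix at least one property IRI,
// i.e. candidate values for void:vocabulary.
function matchVocabularies(
  propertyIris: readonly string[],
  known: readonly string[] = KNOWN_VOCABULARIES,
): string[] {
  const found = new Set<string>();
  for (const iri of propertyIris) {
    for (const namespace of known) {
      if (iri.startsWith(namespace)) found.add(namespace);
    }
  }
  return Array.from(found);
}

// Hypothetical shape of one linkset observation.
interface LinksetCount {
  objectsTarget: string;
  triples: number;
}

// Sum triple counts per configured URI space. In this sketch, targets that
// match no configured space are dropped.
function aggregateByUriSpace(
  linksets: readonly LinksetCount[],
  uriSpaces: readonly string[],
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const { objectsTarget, triples } of linksets) {
    const space = uriSpaces.find((candidate) => objectsTarget.startsWith(candidate));
    if (space === undefined) continue;
    totals.set(space, (totals.get(space) ?? 0) + triples);
  }
  return totals;
}
```

Both functions are pure post-processing steps over a wrapped executor's output, which is what makes the decorator pattern a natural fit here.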
Usage
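import {
  countTriples,
  classPartitions,
  detectVocabularies,
} from '@lde/pipeline-void';
import { Pipeline, SparqlUpdateWriter, provenancePlugin } from '@lde/pipeline';

// `selector` is a dataset selector defined elsewhere in your application.
await new Pipeline({
  datasetSelector: selector,
  // Each factory returns Promise<Stage>, so the stages are resolved first.
  stages: await Promise.all([
    countTriples(),
    classPartitions(),
    detectVocabularies(),
  ]),
  plugins: [provenancePlugin()],
  // Write the resulting VoID triples to a SPARQL Update endpoint.
  writers: new SparqlUpdateWriter({
    endpoint: new URL('http://localhost:7200/repositories/lde/statements'),
  }),
}).run();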