mdld-parse

v0.6.2

Published

3 days ago

A standards-compliant parser for **MD-LD (Markdown-Linked Data)** — a human-friendly RDF authoring format that extends Markdown with semantic annotations.

0High
0Medium
0Low

davay

mdld markdown rdf semantic-web linked-data n3 rdfjs browser web-worker parser prefix-folding curie iri-authoring

MD-LD

Markdown-Linked Data (MD-LD) — a deterministic, streaming-friendly RDF authoring format that extends Markdown with explicit {...} annotations.

Documentation | Repository | Playground

What is MD-LD?

MD-LD allows you to author RDF graphs directly in Markdown using explicit {...} annotations:

[my] <tag:[email protected],2026:>

# 2024-07-18 {=my:journal-2024-07-18 .my:Event my:date ^^xsd:date}

## A good day {label}

Mood: [Happy] {my:mood}
Energy level: [8] {my:energyLevel ^^xsd:integer}

Met [Sam] {+my:sam .my:Person ?my:attendee} on my regular walk at [Central Park] {+my:central-park ?my:location .my:Place label @en} and talked about [Sunny] {my:weather} weather. 

Activities:

- **Walking** {+ex:walking ?my:hasActivity .my:Activity label}
- **Reading** {+ex:reading ?my:hasActivity .my:Activity label}

Generates valid RDF triples:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix sh: <http://www.w3.org/ns/shacl#>.
@prefix prov: <http://www.w3.org/ns/prov#>.
@prefix ex: <http://example.org/>.
@prefix my: <tag:[email protected],2026:>.

my:journal-2024-07-18 a my:Event;
    my:date "2024-07-18"^^xsd:date;
    rdfs:label "A good day";
    my:mood "Happy";
    my:energyLevel 8;
    my:attendee my:sam;
    my:location my:central-park;
    my:weather "Sunny";
    my:hasActivity <tag:[email protected],2026:journal-2024-07-18#walking>, <tag:[email protected],2026:journal-2024-07-18#reading>.
my:sam a my:Person.
my:central-park a my:Place;
    rdfs:label "Central Park"@en.
<tag:[email protected],2026:journal-2024-07-18#walking> a my:Activity;
    rdfs:label "Walking".
<tag:[email protected],2026:journal-2024-07-18#reading> a my:Activity;
    rdfs:label "Reading".

Read the FULL SPEC.

Core Features

Prefix folding: Build hierarchical namespaces with lightweight IRI authoring
Subject declarations: {=IRI} and {=#fragment} for context setting
Object IRIs: {+IRI} and {+#fragment} for temporary object declarations
Four predicate forms: p (S→L), ?p (S→O), !p (O→S)
Type declarations: .Class for rdf:type triples
Datatypes & language: ^^xsd:date and @en support
Fragments: Built-in document structuring with {=#fragment}
Round-trip serialization: Markdown ↔ RDF ↔ Markdown preserves structure

Installation

npm install mdld-parse

Node.js

import { parse } from 'mdld-parse';

const markdown = `# Document {=ex:doc .Article}

[Alice] {author}`;

const result = parse(markdown, {
  context: { ex: 'http://example.org/' }
});

console.log(result.quads);
// RDF/JS quads ready for n3.js, rdflib, etc.

Browser (ES Modules)

<script type="module">
  import { parse } from 'https://cdn.jsdelivr.net/npm/mdld-parse/+esm';
  
  const result = parse('# Hello {=ex:hello}');
</script>

Semantic Model

MD-LD encodes a directed labeled multigraph where three nodes may be in scope:

S — current subject (IRI)
O — object resource (IRI from link/image)
L — literal value (string + optional datatype/language)

Predicate Routing (§8.1)

Each predicate form determines the graph edge:

| Form | Edge | Example | Meaning | |-------|---------|------------------------------|------------------| | p | S → L | [Alice] {name} | literal property | | ?p | S → O | [NASA] {=ex:nasa ?org} | object property | | !p | O → S | [Parent] {=ex:p !hasPart} | reverse object |

Syntax Reference

Subject Declaration

Set current subject (emits no quads):

## Apollo 11 {=ex:apollo11}

Fragment Syntax

Create fragment IRIs relative to current subject:

# Document {=ex:document}
{=#summary}
[Content] {label}

ex:document#summary rdfs:label "Content" .

Fragments replace any existing fragment and require a current subject.

Subject remains in scope until reset with {=} or new subject declared.

Type Declaration

Emit rdf:type triple:

## Apollo 11 {=ex:apollo11 .ex:SpaceMission .ex:Event}

ex:apollo11 a ex:SpaceMission, ex:Event .

Literal Properties

Inline value carriers emit literal properties:

# Mission {=ex:apollo11}

[Neil Armstrong] {ex:commander}
[1969] {ex:year ^^xsd:gYear}
[Historic mission] {ex:description @en}

ex:apollo11 ex:commander "Neil Armstrong" ;
  ex:year "1969"^^xsd:gYear ;
  ex:description "Historic mission"@en .

Object Properties

Links create relationships (use ? prefix):

# Mission {=ex:apollo11}

[NASA] {=ex:nasa ?ex:organizer}

ex:apollo11 ex:organizer ex:nasa .

Resource Declaration

Declare resources inline with {=iri}:

# Mission {=ex:apollo11}

[Neil Armstrong] {=ex:armstrong ?ex:commander .prov:Person}

ex:apollo11 ex:commander ex:armstrong .
ex:armstrong a prov:Person .

Lists

Lists are pure Markdown structure. Each list item requires explicit annotations:

# Recipe {=ex:recipe}

Ingredients:

- **Flour** {+ex:flour ?ex:ingredient .ex:Ingredient label}
- **Water** {+ex:water ?ex:ingredient .ex:Ingredient label}

ex:recipe ex:ingredient ex:flour, ex:water .
ex:flour a ex:Ingredient ; rdfs:label "Flour" .
ex:water a ex:Ingredient ; rdfs:label "Water" .

Key Rules:

No semantic propagation from list scope
Each item must have explicit annotations
Use +IRI to maintain subject chaining for repeated object properties

Code Blocks

Code blocks are value carriers:

# Example {=ex:example}

```javascript {=ex:code .ex:SoftwareSourceCode ex:text}
console.log("hello");
```

ex:code a ex:SoftwareSourceCode ;
  ex:text "console.log(\"hello\")" .

Blockquotes

# Article {=ex:article}

> MD-LD bridges Markdown and RDF. {comment}

ex:article rdfs:comment "MD-LD bridges Markdown and RDF." .

Reverse Relations

Reverse the relationship direction:

# Part {=ex:part}

Part of: {!ex:hasPart}

- Book {=ex:book}

ex:book ex:hasPart ex:part .

Prefix Declarations

[ex] <http://example.org/>
[foaf] <http://xmlns.com/foaf/0.1/>

# Person {=ex:alice .foaf:Person}

Prefix Folding: Lightweight IRI Authoring

Build hierarchical namespaces by referencing previously defined prefixes:

# Create your domain authority
[my] <tag:[email protected],2026:>

# Build namespace hierarchy
[j] <my:journal:>
[p] <my:property:>
[c] <my:class:>
[person] <my:people:>

# Use in content
# 2026-01-27 {=j:2026-01-27 .c:Event p:date ^^xsd:date}

## Harry {=person:harry p:name}

Resolves to absolute IRIs:

j:2026-01-27 → tag:[email protected],2026:journal:2026-01-27
c:Event → tag:[email protected],2026:class:Event
p:date → tag:[email protected],2026:property:date
person:harry → tag:[email protected],2026:people:harry

Benefits:

Lightweight: No external ontology dependencies
Domain authority: Use tag: URIs for personal namespaces
Hierarchical: Build deep namespace structures
Streaming-safe: Forward-reference only, single-pass parsing

API Reference

`parse(markdown, options)`

Parse MD-LD markdown and return RDF quads with origin tracking.

Parameters:

markdown (string) — MD-LD formatted text
options (object, optional):
- context (object) — Prefix mappings (default: { '@vocab': 'http://www.w3.org/2000/01/rdf-schema#', rdf, rdfs, xsd, sh, prov })
- dataFactory (object) — Custom RDF/JS DataFactory

Returns: { quads, origin, context }

quads — Array of RDF/JS Quads
origin — Origin tracking object with:
- blocks — Map of block IDs to source locations
- quadIndex — Map of quads to block IDs
context — Final context used (includes prefixes)

Example:

const result = parse(
  `# Article {=ex:article .ex:Article}
  
  [Alice] {=ex:alice ?author}`,
  { context: { ex: 'http://example.org/' } }
);

console.log(result.quads);
// [
//   {
//     subject: { termType: 'NamedNode', value: 'http://example.org/article' },
//     predicate: { termType: 'NamedNode', value: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' },
//     object: { termType: 'NamedNode', value: 'http://example.org/Article' }
//   },
//   ...
// ]

`applyDiff({ text, diff, origin, options })`

Apply RDF changes back to markdown with proper positioning.

Parameters:

text (string) — Original markdown
diff (object) — Changes to apply:
- add (array) — Quads to add
- delete (array) — Quads to remove
origin (object) — Origin from parse() result
options (object, optional):
- context (object) — Context for IRI shortening

Returns: { text, origin }

text — Updated markdown
origin — Updated origin tracking vacant slots

Example:

const original = `# Article {=ex:article}

[Alice] {ex:author}`;

const result = parse(original, { context: { ex: 'http://example.org/' } });

// Add a new property
const newQuad = {
  subject: { termType: 'NamedNode', value: 'http://example.org/article' },
  predicate: { termType: 'NamedNode', value: 'http://example.org/datePublished' },
  object: { termType: 'Literal', value: '2024-01-01' }
};

const updated = applyDiff({
  text: original,
  diff: { add: [newQuad] },
  origin: result.origin,
  options: { context: result.context }
});

console.log(updated.text);
// # Article {=ex:article}
//
// [Alice] {author}
// [2024-01-01] {datePublished}

`generate(quads, context)`

Generate deterministic MDLD from RDF quads with origin tracking.

Parameters:

quads (array) — Array of RDF/JS Quads to convert
context (object, optional) — Prefix mappings (default: {})
- Merged with DEFAULT_CONTEXT for proper CURIE shortening
- Only user-defined prefixes are rendered in output

Returns: { text, origin, context }

text — Generated MDLD markdown
origin — Origin tracking object with:
- blocks — Map of block IDs to source locations
- quadIndex — Map of quads to block IDs
context — Final context used (includes defaults)

Example:

const quads = [
  {
    subject: { termType: 'NamedNode', value: 'http://example.org/article' },
    predicate: { termType: 'NamedNode', value: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' },
    object: { termType: 'NamedNode', value: 'http://example.org/Article' }
  },
  {
    subject: { termType: 'NamedNode', value: 'http://example.org/article' },
    predicate: { termType: 'NamedNode', value: 'http://example.org/author' },
    object: { termType: 'NamedNode', value: 'http://example.org/alice' }
  }
];

const result = generate(quads, { 
  ex: 'http://example.org/',
});

console.log(result.text);
// # Article {=ex:article .ex:Article}
//
// > alice {+ex:alice ?ex:author}

`locate(quad, origin, text, context)`

Locate the precise text range of a quad in MDLD text using origin tracking.

Parameters:

quad (object) — The quad to locate (subject, predicate, object)
origin (object, optional) — Origin object containing blocks and quadIndex
text (string, optional) — MDLD text (auto-parsed if origin not provided)
context (object, optional) — Context for parsing when text needs to be parsed

Returns: { blockId, entryIndex, range, content, blockRange, carrierType, isVacant } or null

blockId — ID of the containing block
entryIndex — Position within block entries
range — Precise character range of the quad content
content — Actual text content at that range
blockRange — Full range of the containing block
carrierType — Type of carrier (heading, blockquote, list, span)
isVacant — Whether the slot is marked as vacant

Example:

import { parse, locate } from './src/index.js';

const result = parse(mdldText, { context: { ex: 'http://example.org/' } });
const quad = result.quads[0]; // Find a quad to locate

// Pattern 1: With origin (most efficient)
const location1 = locate(quad, result.origin, mdldText);

// Pattern 2: Auto-parse text (convenient)
const location2 = locate(quad, null, mdldText, { ex: 'http://example.org/' });

console.log(location1.range); // { start: 38, end: 44 }
console.log(location1.content); // " Alice"
console.log(location1.carrierType); // "blockquote"

Value Carriers

Only specific markdown elements can carry semantic values:

Inline:

[text] {...} — span with annotation
[text](url) {...} — link to external resource
[text] {...} — inline resource declaration
![alt text](image.png) {...} — embedding with annotation

Block:

Headings (# Title)
List items (- item, 1. item) — pure Markdown structure
Blockquotes (> quote)
Code blocks (```lang)

Architecture

Design Principles

Zero dependencies — Pure JavaScript, ~15KB minified
Streaming-first — Single-pass parsing, O(n) complexity
Standards-compliant — RDF/JS data model
Origin tracking — Full round-trip support with source maps
Explicit semantics — No guessing, inference, or heuristics

RDF/JS Compatibility

Quads are compatible with:

n3.js — Turtle/N-Triples serialization
rdflib.js — RDF store and reasoning
sparqljs — SPARQL queries
rdf-ext — Extended RDF utilities

Forbidden Constructs

MD-LD explicitly forbids to ensure deterministic parsing:

❌ Implicit semantics or structural inference
❌ Auto-generated subjects or blank nodes
❌ Predicate guessing from context
❌ Multi-pass or backtracking parsers

Below is a tight, README-ready refinement of the Algebra section. It keeps the math precise, examples exhaustive, and language compact.

Algebra

Every RDF triple (s, p, o) can be authored explicitly, deterministically, and locally, with no inference, guessing, or reordering.

MD-LD models RDF authoring as a closed edge algebra over a small, explicit state. To be algebraically complete for RDF triple construction, a syntax must support:

Binding a subject S
Binding an object O
Emitting predicates in both directions
Distinguishing IRI nodes from literal nodes
Operating with no implicit state or inference

MD-LD satisfies these requirements with four explicit operators.

Each predicate is partitioned by direction and node kind:

| Predicate form | Emitted triple | | -------------- | -------------- | | p | S ─p→ L | | ?p | S ─p→ O | | not allowed | L ─p→ S | | !p | O ─p→ S |

This spans all 2 × 2 combinations of:

source ∈ {subject, object/literal}
target ∈ {subject, object/literal}

Therefore, the algebra is closed.

Use Cases

Personal Knowledge Management

[alice] <tag:[email protected],2026:>

# Meeting Notes {=alice:meeting-2024-01-15 .alice:Meeting}

Attendees:

- **Alice** {+alice:alice ?alice:attendee label}
- **Bob** {+alice:bob ?alice:attendee label}

Action items:

- **Review proposal** {+alice:task-1 ?alice:actionItem label}

Developer Documentation

# API Endpoint {=api:/users/:id .api:Endpoint}

[GET] {api:method}
[/users/:id] {api:path}

Example:

```bash {=api:/users/:id#example .api:CodeExample api:code}
curl https://api.example.com/users/123
```

Academic Research

[alice] <tag:[email protected],2026:>

# Paper {=alice:paper-semantic-markdown .alice:ScholarlyArticle}

[Semantic Web] {label}
[Alice Johnson] {=alice:alice-johnson ?alice:author}
[2024-01] {alice:datePublished ^^xsd:gYearMonth}

> This paper explores semantic markup in Markdown. {comment @en}

Testing

The parser includes comprehensive tests covering all spec requirements:

npm test

Tests validate:

Subject declaration and context
All predicate forms (p, ?p, !p)
Datatypes and language tags
Explicit list item annotations
Code blocks and blockquotes
Round-trip serialization

Contributing

Contributions welcome! Please:

Read the specification
Add tests for new features
Ensure all tests pass
Follow existing code style

Acknowledgments

Developed by Denis Starov.

Inspired by:

Thomas Francart's Semantic Markdown article
RDFa decades of structured data experience
CommonMark's rigorous parsing approach

License

See LICENCE

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

MD-LD

What is MD-LD?

Core Features

Installation

Node.js

Browser (ES Modules)

Semantic Model

Predicate Routing (§8.1)

Syntax Reference

Subject Declaration

Fragment Syntax

Type Declaration

Literal Properties

Object Properties

Resource Declaration

Lists

Code Blocks

Blockquotes

Reverse Relations

Prefix Declarations

Prefix Folding: Lightweight IRI Authoring

API Reference

parse(markdown, options)

applyDiff({ text, diff, origin, options })

generate(quads, context)

locate(quad, origin, text, context)

Value Carriers

Architecture

Design Principles

RDF/JS Compatibility

Forbidden Constructs

Algebra

Use Cases

Personal Knowledge Management

Developer Documentation

Academic Research

Testing

Contributing

Acknowledgments

License

`parse(markdown, options)`

`applyDiff({ text, diff, origin, options })`

`generate(quads, context)`

`locate(quad, origin, text, context)`