mdld-parse
v0.9.9
A standards-compliant parser for **MD-LD (Markdown-Linked Data)** — a human-friendly RDF authoring format that extends Markdown with semantic annotations.
MD-LD
Markdown-Linked Data (MD-LD) — a deterministic, streaming-friendly RDF authoring format that extends Markdown with explicit {...} annotations.
🚀 Quick Start
pnpm install mdld-parse
import { parse } from 'mdld-parse';
const result = parse(`
[ex] <http://example.org/>
# Document {=ex:doc .ex:Article label}
[Alice] {?ex:author =ex:alice .prov:Person ex:firstName label}
[Smith] {ex:lastName}`);
console.log(result.quads);
// RDF/JS quads ready for n3.js, rdflib, etc.
// @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
// @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
// @prefix prov: <http://www.w3.org/ns/prov#>.
// @prefix ex: <http://example.org/>.
// ex:doc a ex:Article;
//   rdfs:label "Document";
//   ex:author ex:alice.
// ex:alice a prov:Person;
//   rdfs:label "Alice";
//   ex:firstName "Alice";
//   ex:lastName "Smith".
📚 Documentation Hub
- 📖 Documentation - Complete documentation with guides and references
- 🎯 Examples - Real-world MD-LD examples and use cases
- 📋 Specification - Formal specification and test suite
✨ Core Features
- 🔗 Prefix folding - Build hierarchical namespaces with lightweight IRI authoring
- 📍 Subject declarations - {=IRI} and {=#fragment} for context setting
- 🎯 Object IRIs - {+IRI} and {+#fragment} for temporary object declarations
- 🔄 Three predicate forms - p (S→L), ?p (S→O), !p (O→S)
- 🏷️ Type declarations - .Class for rdf:type triples
- 📅 Datatypes & language - ^^xsd:date and @en support
- 🧩 Fragments - Document structuring with {=#fragment}
- ⚡ Polarity system - Sophisticated diff authoring with + and - prefixes
- 📍 Origin tracking - Complete provenance with lean quad-to-source mapping
- 🎯 Elevated statements - Automatic rdf:Statement pattern detection for "golden" graph extraction
- 🏷️ Primary metadata - Structured primary object with subject, type, and label for document identity
🌟 What is MD-LD?
MD-LD allows you to author RDF graphs directly in Markdown using explicit {...} annotations:
[my] <tag:[email protected],2026:>
# 2024-07-18 {=my:journal-2024-07-18 .my:Event my:date ^^xsd:date}
## A good day {label}
Mood: [Happy] {my:mood}
Energy level: [8] {my:energyLevel ^^xsd:integer}
Met [Sam] {+my:sam .my:Person ?my:attendee} on my regular walk at [Central Park] {+my:central-park ?my:location .my:Place label @en} and talked about [Sunny] {my:weather} weather.
This generates valid RDF triples with complete provenance tracking.
📦 Installation
Node.js
pnpm install mdld-parse
import { parse } from 'mdld-parse';
const markdown = `[ex] <tag:[email protected],2026:>
# Demo document {=ex:example/doc .prov:Entity label}
> A demo document for MD-LD {comment}
[Alice] {+ex:alice ?ex:author .prov:Person label}
`;
const result = parse({ text: markdown });
console.log(result.quads);
// RDF/JS quads ready for n3.js, rdflib, etc.
Browser (ES Modules)
<script type="module">
  import { parse } from 'https://cdn.jsdelivr.net/npm/mdld-parse/+esm';
  const result = parse('[ex] <tag:[email protected],2026:>\n\n# Hello {=ex:hello label}');
</script>
🧠 Semantic Model
MD-LD encodes a directed labeled multigraph where three nodes may be in scope:
- S — current subject (IRI)
- O — object resource (IRI from link/image)
- L — literal value (string + optional datatype/language)
Predicate Routing (§8.1)
Each predicate form determines the graph edge:
| Form | Edge | Example | Meaning |
|-------|---------|------------------------------|------------------|
| p | S → L | [Alice] {label} | literal property |
| ?p | S → O | [NASA] {=ex:nasa ?org} | object property |
| !p | O → S | [Parent] {=ex:p !hasPart} | reverse object |
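The routing table can be sketched as a small function. This is an illustrative model only, not the parser's internals: `routePredicate` and the plain `{ subject, predicate, object }` result shape are hypothetical names chosen for this example.

```javascript
// Illustrative sketch of the routing table above (not the library's code).
// S = current subject, O = object resource, L = literal carrier value.
function routePredicate(token, { S, O, L }) {
  if (token.startsWith('?')) {
    // ?p : S → O (object property)
    return { subject: S, predicate: token.slice(1), object: O };
  }
  if (token.startsWith('!')) {
    // !p : O → S (reverse object property)
    return { subject: O, predicate: token.slice(1), object: S };
  }
  // p : S → L (literal property)
  return { subject: S, predicate: token, object: L };
}

const scope = { S: 'ex:apollo11', O: 'ex:nasa', L: 'NASA' };
console.log(routePredicate('ex:name', scope));
console.log(routePredicate('?ex:organizer', scope));
console.log(routePredicate('!ex:hasPart', scope));
```

The only difference between the three forms is which pair of in-scope nodes the edge connects and in which direction.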
📍 Elevated Statements
MD-LD automatically detects rdf:Statement patterns during parsing and extracts elevated SPO quads for convenient consumption by applications.
Pattern Detection
When the parser encounters a complete rdf:Statement pattern with rdf:subject, rdf:predicate, and rdf:object, it automatically adds the corresponding SPO quad to the statements array:
[ex] <http://example.org/>
## Elevated statement {=ex:stmt1 .rdf:Statement}
**Alice** {+ex:alice ?rdf:subject} *knows* {+ex:knows ?rdf:predicate} **Bob** {+ex:bob ?rdf:object}
Direct statement:
**Alice** {=ex:alice} knows **Bob** {?ex:knows +ex:bob}
🎨 Syntax Quick Reference
Subject Declaration
Set current subject (emits no quads):
## Apollo 11 {=ex:apollo11}
Fragment Syntax
Create fragment IRIs relative to current subject:
# Document {=ex:document}
{=#summary}
[Content] {label}
Fragments replace any existing fragment and require a current subject.
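The fragment rule can be illustrated with a few lines of JavaScript. This is a minimal sketch: `resolveFragment` is a hypothetical helper, and returning `null` when no subject is in scope is an assumption made for the example, not documented parser behavior.

```javascript
// Hypothetical helper: resolve {=#fragment} against the current subject.
function resolveFragment(currentSubject, fragment) {
  // A fragment needs a subject to attach to (assumed: no subject means no IRI).
  if (!currentSubject) return null;
  // Drop any fragment already present, then append the new one.
  const base = currentSubject.split('#')[0];
  return `${base}#${fragment}`;
}

console.log(resolveFragment('http://example.org/document', 'summary'));
// → http://example.org/document#summary
console.log(resolveFragment('http://example.org/document#summary', 'details'));
// → http://example.org/document#details
console.log(resolveFragment(null, 'summary'));
// → null
```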
Type Declaration
Emit rdf:type triple:
## Apollo 11 {=ex:apollo11 .ex:SpaceMission .ex:Event}
Literal Properties
Inline value carriers emit literal properties:
# Mission {=ex:apollo11}
[Neil Armstrong] {ex:commander}
[1969] {ex:year ^^xsd:gYear}
[Historic mission] {ex:description @en}
Object Properties
Links create relationships (use ? prefix):
# Mission {=ex:apollo11}
[NASA] {=ex:nasa ?ex:organizer}
Resource Declaration
Declare resources inline with {=iri}:
# Mission {=ex:apollo11}
[Neil Armstrong] {=ex:armstrong ?ex:commander .prov:Person}
🔧 API Reference
parse({ text, context, dataFactory, graph })
Parse MD-LD markdown and return RDF quads with lean origin tracking.
Parameters (named object):
- text (string, required): MD-LD formatted text
- context (object, optional): Prefix mappings (default: { '@vocab': 'http://www.w3.org/2000/01/rdf-schema#', rdf, rdfs, xsd, sh, prov })
- dataFactory (object, optional): Custom RDF/JS DataFactory
- graph (string, optional): Named graph IRI
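The default context explains why a bare term such as label in the Quick Start ends up as rdfs:label. The sketch below shows one plausible way a context with '@vocab' resolves terms. `expandTerm` is a hypothetical function written for this example; it is not the library's `expandIRI`, whose exact behavior is not shown in this document.

```javascript
// Hypothetical sketch of prefix/@vocab expansion (not the library's expandIRI).
const context = {
  '@vocab': 'http://www.w3.org/2000/01/rdf-schema#',
  ex: 'http://example.org/',
};

function expandTerm(term, ctx) {
  const colon = term.indexOf(':');
  if (colon > 0) {
    const prefix = term.slice(0, colon);
    // Known prefix: concatenate namespace and local part.
    if (Object.prototype.hasOwnProperty.call(ctx, prefix)) {
      return ctx[prefix] + term.slice(colon + 1);
    }
    // Unknown prefix: assume the term is already an absolute IRI.
    return term;
  }
  // No prefix: fall back to the default vocabulary.
  return ctx['@vocab'] + term;
}

console.log(expandTerm('label', context));
// → http://www.w3.org/2000/01/rdf-schema#label
console.log(expandTerm('ex:doc', context));
// → http://example.org/doc
```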
Returns: { quads, remove, statements, origin, context, primarySubject, primary, md }
- quads: Array of RDF/JS Quads (final resolved graph state)
- remove: Array of RDF/JS Quads (external retractions targeting prior state)
- statements: Array of elevated RDF/JS Quads extracted from rdf:Statement patterns
- origin: Lean origin tracking object with quadIndex for UI navigation
- context: Final context used (includes prefixes)
- primarySubject: String IRI or null (canonical append identity)
- primary: Object containing primary metadata (semantic surface descriptor)
- md: Clean Markdown with all MD-LD annotations stripped (round-trip safe)
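To make the statements field more concrete, the following sketch shows how elevated quads could be derived from reified rdf:Statement patterns. It is an approximation under stated assumptions: quads are simplified to `{ s, p, o }` IRI strings, and `elevate` is a hypothetical function, not the parser's detection logic.

```javascript
// Sketch: derive elevated SPO quads from reified rdf:Statement patterns.
// Quads are simplified to { s, p, o } IRI strings for brevity.
const RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

function elevate(quads) {
  // Collect subjects typed as rdf:Statement.
  const statements = quads
    .filter(q => q.p === RDF + 'type' && q.o === RDF + 'Statement')
    .map(q => q.s);

  const elevated = [];
  for (const stmt of new Set(statements)) {
    const pick = name => quads.find(q => q.s === stmt && q.p === RDF + name)?.o;
    const s = pick('subject');
    const p = pick('predicate');
    const o = pick('object');
    // Only complete patterns are elevated.
    if (s && p && o) elevated.push({ s, p, o });
  }
  return elevated;
}

const EX = 'http://example.org/';
const quads = [
  { s: EX + 'stmt1', p: RDF + 'type', o: RDF + 'Statement' },
  { s: EX + 'stmt1', p: RDF + 'subject', o: EX + 'alice' },
  { s: EX + 'stmt1', p: RDF + 'predicate', o: EX + 'knows' },
  { s: EX + 'stmt1', p: RDF + 'object', o: EX + 'bob' },
];
console.log(elevate(quads));
```

A statement node missing any of the three parts yields nothing, which mirrors the "complete pattern" requirement described for elevated statements.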
Dual-Layer Architecture:
| Layer | Field | Purpose | Use Cases |
|-------|-------|---------|-----------|
| Canonical Identity | primarySubject | Append routing, storage, synchronization | append(), file placement, authority validation |
| Semantic Surface | primary | UI, indexing, navigation, agent orientation | Dashboards, search, previews, timelines |
Primary Object Structure:
primary: {
  subject: string | null, // First non-fragment subject declaration
  type: string | null, // First rdf:type declaration
  label: string | null // First rdfs:label literal
}
merge(docs, options)
Merge multiple MDLD documents with diff polarity resolution.
Parameters:
- docs (array): Array of markdown strings or ParseResult objects
- options (object, optional):
  - context (object): Prefix mappings (merged with DEFAULT_CONTEXT)
Returns: { quads, remove, origin, context, primarySubjects, primary }
- quads: Array of RDF/JS Quads (final resolved graph state)
- remove: Array of RDF/JS Quads (external retractions targeting prior state)
- origin: Merge origin with document tracking
- context: Final context used (includes prefixes)
- primarySubjects: Array of string IRIs (canonical append identities, ordered by merge)
- primary: Array of primary objects (semantic surface descriptors, ordered by merge)
Dual-Layer Architecture:
| Layer | Field | Purpose | Use Cases |
|-------|-------|---------|-----------|
| Canonical Identity | primarySubjects | Append routing, storage, synchronization | Multi-document append, file organization |
| Semantic Surface | primary | UI, indexing, navigation, agent orientation | Vault indexing, document discovery, search |
Primary Object Array Structure:
primary: [
  {
    subject: string | null, // First non-fragment subject declaration
    type: string | null, // First rdf:type declaration
    label: string | null // First rdfs:label literal
  },
  // ... one object per document
]
generate({ quads, context, primarySubject })
Generate deterministic MDLD from RDF quads with visual styling.
Parameters (named object):
- quads (array, required): Array of RDF/JS Quads to convert
- context (object, optional): Prefix mappings (default: {})
- primarySubject (string, optional): String IRI to place first in output (ensures round-trip safety). If not provided, falls back to the first subject from quads.
Returns: { text, context }
Features:
- Visual carrier styles based on datatype (code spans for numbers, bold booleans, etc.)
- Label-in-heading: Uses rdfs:label in subject headings when available
- Multiple labels: First label in heading, additional labels rendered as literals
- Round-trip safe: All data preserved through parse → generate → parse
- Composable: generate(parse(text)) extracts semantics; parse(generate({quads})) normalizes quads
generateNode({ quads, focusIRI, context })
Generate node-centric MDLD showing all quads where a specific IRI appears in any position.
Parameters (named object):
- quads (array, required): Array of RDF/JS Quads to search
- focusIRI (string, required): The IRI to center the view on
- context (object, optional): Prefix mappings (default: {})
Returns: { text, context }
Behavior (Safety-First):
- If focusIRI is null/undefined: Returns empty text
- If focusIRI not in graph: Returns empty text (never falls back to all data)
- If quads is empty: Returns empty text
Safety rationale: This prevents accidentally rendering an entire database when an IRI is misspelled, which matters in production where LLM usage is billed per token. Explicit emptiness signals "not found" to the caller.
Use case: Perfect for exploring a specific node and all its relationships—where it appears as subject, object, predicate, type, or datatype. Creates an exhaustive view of everything related to the focus IRI. Ideal for node-centric knowledge graph explorers.
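The selection step behind a node-centric view can be approximated as a filter over RDF/JS-shaped terms. This is a sketch only: `quadsAbout`, `iri`, and `lit` are hypothetical helpers, and the real generateNode() also renders MDLD text, which is not attempted here.

```javascript
// Sketch: keep only quads that mention the focus IRI in any position.
// Terms follow the RDF/JS shape ({ termType, value, datatype? }).
function quadsAbout(quads, focusIRI) {
  // Safety-first: a missing focus or missing graph yields nothing.
  if (!focusIRI || !Array.isArray(quads)) return [];
  return quads.filter(q =>
    q.subject.value === focusIRI ||
    q.predicate.value === focusIRI ||
    (q.object.termType === 'NamedNode' && q.object.value === focusIRI) ||
    (q.object.termType === 'Literal' && q.object.datatype?.value === focusIRI)
  );
}

// Small helpers to build plain RDF/JS-like terms for the demo.
const iri = value => ({ termType: 'NamedNode', value });
const lit = (value, datatype) => ({ termType: 'Literal', value, datatype: iri(datatype) });

const EX = 'http://example.org/';
const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string';
const graph = [
  { subject: iri(EX + 'doc'), predicate: iri(EX + 'author'), object: iri(EX + 'alice') },
  { subject: iri(EX + 'alice'), predicate: iri(EX + 'firstName'), object: lit('Alice', XSD_STRING) },
  { subject: iri(EX + 'doc'), predicate: iri(EX + 'title'), object: lit('Document', XSD_STRING) },
];

console.log(quadsAbout(graph, EX + 'alice').length); // 2
console.log(quadsAbout(graph, EX + 'alise').length); // 0 (misspelled IRI)
```

Returning an empty array for an unknown IRI, instead of the whole graph, is the same safety choice described above.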
updateValue() — Update Quad Carrier Text in Source Text
Update the carrier text of a literal quad in MDLD text. Only the carrier content is replaced — datatype (^^xsd:integer) and language (@en) annotations inside the {…} block are preserved as-is, since they are part of the annotation, not the carrier.
import { parse, updateValue } from 'mdld-parse';
const mdld = `[ex] <http://example.org/>
# Article {=ex:article .ex:Article}
[Alice Smith] {ex:author}`;
const result = parse({ text: mdld });
const authorQuad = result.quads.find(q =>
  q.subject.value === 'http://example.org/article' &&
  q.predicate.value === 'http://example.org/author'
);
const updatedText = updateValue({
  text: mdld,
  quad: authorQuad,
  value: 'Bob Johnson',
  origin: result.origin // optional (auto-parses if not provided)
});
console.log(updatedText);
// [ex] <http://example.org/>
//
// # Article {=ex:article .ex:Article}
//
// [Bob Johnson] {ex:author}
Datatype and language annotations are preserved:
// Original: > 25 {ex:age ^^xsd:integer}
updateValue({ text, quad, value: '30' });
// Result: > 30 {ex:age ^^xsd:integer} ← datatype preserved
// Original: > Hello {ex:greeting @en}
updateValue({ text, quad, value: 'Good morning' });
// Result: > Good morning {ex:greeting @en} ← language tag preserved
Parameters:
- text (string): The original MDLD text
- quad (object): The quad to update (subject, predicate, object)
- value (string): The new carrier text to set
- origin (object, optional): ParseResult.origin (auto-parses if not provided)
Returns: Updated MDLD text, or original text if update fails (fail-safe)
How it works:
- Uses locate() to find the quad's position in the source text
- Uses valueRange to replace only the carrier text (excluding carrier markers like [, ])
- Annotation block {…} with predicate, datatype, language is untouched
- Auto-parses if origin is not provided (convenient but less efficient)
Fail-safe behavior:
- Returns original text if quad cannot be located
- Returns original text if valueRange is not available
- Never corrupts the source text
Use case: Perfect for editor applications that need to update literal values while preserving carrier syntax, datatype annotations, and language tags.
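The replace-by-range idea with its fail-safe fallback can be sketched independently of the library. The shape of valueRange is not specified in this document, so the example assumes `{ start, end }` character offsets with an exclusive end. `replaceCarrier` is a hypothetical name.

```javascript
// Sketch of a fail-safe carrier replacement.
// Assumption: valueRange is { start, end } in character offsets, end exclusive.
function replaceCarrier(text, valueRange, value) {
  const ok =
    valueRange &&
    Number.isInteger(valueRange.start) &&
    Number.isInteger(valueRange.end) &&
    valueRange.start >= 0 &&
    valueRange.start <= valueRange.end &&
    valueRange.end <= text.length;
  // Fail-safe: return the source untouched when the range is unusable.
  if (!ok) return text;
  return text.slice(0, valueRange.start) + value + text.slice(valueRange.end);
}

const source = '[25] {ex:age ^^xsd:integer}';
// The carrier text "25" sits between the brackets.
const start = source.indexOf('[') + 1;
const end = source.indexOf(']');
console.log(replaceCarrier(source, { start, end }, '30'));
// → [30] {ex:age ^^xsd:integer}
console.log(replaceCarrier(source, null, '30') === source);
// → true
```

Because only the slice between the carrier markers changes, the annotation block after it survives byte for byte.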
Composition Patterns
With the unified named parameter API, parse() and generate() compose seamlessly through object spreading:
import { parse, generate, generateNode } from 'mdld-parse';
// Pattern 1: parse → generate (semantic extraction)
const canonical = generate({ ...parse({ text, context }) });
// text → quads → canonical MDLD (deterministic, visual styling applied)
// Pattern 2: generate → parse (normalize external RDF)
const normalized = parse({ ...generate({ quads: externalQuads, context }) });
// external quads → MDLD → validated quads (DataFactory-safe, no blank nodes)
// Pattern 3: parse → generateNode (node-centric exploration)
const nodeView = generateNode({ ...parse({ text }), focusIRI: 'http://example.org/alice' });
// full graph → isolated node view (safe: returns empty if IRI not found)
Why this works:
- parse() returns { quads, context, primarySubject, md, ... }
- generate() accepts { quads, context, primarySubject }
- generateNode() accepts { quads, context, focusIRI } (with focusIRI override)
- Perfect shape alignment enables elegant { ...spread } composition
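The mechanism is plain JavaScript destructuring: a function that takes a named object picks the keys it knows and ignores the rest. The stand-ins below, `parseStub` and `generateStub`, are hypothetical and only mimic the documented shapes. They perform no real parsing or generation.

```javascript
// Stand-in functions with the documented shapes (not the real implementations).
function parseStub({ text, context = {} }) {
  return { quads: [`quad-from:${text}`], context, primarySubject: null, md: text };
}

function generateStub({ quads, context = {}, primarySubject }) {
  // Extra keys such as `md` are simply ignored by destructuring.
  return { text: quads.join('\n'), context };
}

// parse → generate: the result object is spread straight into the next call.
const out = generateStub({ ...parseStub({ text: 'hello', context: { ex: 'http://example.org/' } }) });
console.log(out.text); // quad-from:hello

// generate → parse: { text, context } matches what the parser stand-in expects.
const back = parseStub({ ...out });
console.log(back.context.ex); // http://example.org/
```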
The md Field — Clean Markdown Extraction
Every parse() result includes an md field containing the original Markdown with all MD-LD annotations stripped:
const result = parse({
  text: `# Document {=ex:doc .Article}\n[Content] {ex:content}`,
  context: { ex: 'http://example.org/' }
});
console.log(result.md);
// # Document\nContent
// Round-trip safety: re-parsing clean MD produces zero quads
const reparsed = parse({ text: result.md });
console.log(reparsed.quads.length); // 0
Behavior:
- Valid MD-LD annotations ({=...}, {+...}, {...}) are completely removed
- Content from value carriers ([text], **bold**, `code`) is preserved
- Invalid syntax (annotations not at end-of-line) is preserved as visible markers
- Headings, lists, blockquotes, code blocks maintain their structure
- Prefix declarations at start of line are stripped
- Standalone subject declarations ({=ex:subject}) are stripped
Use cases:
- Content extraction - Get readable Markdown without semantic markup
- Syntax validation - Remaining {...} patterns indicate invalid MD-LD syntax
- Round-trip testing - Re-parsing result.md should produce zero quads
- Preview generation - Show clean document before publishing
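The syntax-validation use case amounts to scanning the cleaned Markdown for surviving {...} blocks. The sketch below is a naive approximation: `findLeftoverAnnotations` is a hypothetical helper, it skips fenced code blocks so braces inside code are not flagged, and it does not handle inline code spans.

```javascript
// Sketch: report {...} blocks that survive in cleaned Markdown.
const FENCE = '`'.repeat(3); // fence marker, built indirectly for readability here

function findLeftoverAnnotations(md) {
  const leftovers = [];
  let inFence = false;
  md.split('\n').forEach((line, index) => {
    // Toggle on fence lines and skip everything inside a fenced block.
    if (line.trimStart().startsWith(FENCE)) {
      inFence = !inFence;
      return;
    }
    if (inFence) return;
    for (const match of line.matchAll(/\{[^{}]*\}/g)) {
      leftovers.push({ line: index + 1, text: match[0] });
    }
  });
  return leftovers;
}

console.log(findLeftoverAnnotations('# Document\nContent'));
// → []
console.log(findLeftoverAnnotations('# Document\nText {ex:bad} more text'));
// → [ { line: 2, text: '{ex:bad}' } ]
```

An empty result suggests every annotation was recognized and stripped. Any entry points at a line worth inspecting.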
locate(quad, origin)
Locate the origin entry for a quad using the lean origin system.
Parameters:
- quad (object): The quad to locate (subject, predicate, object)
- origin (object): Origin object containing quadIndex
Returns: { blockId, range, valueRange, carrierType, subject, predicate, context, value, polarity } or null
- range: Full character range including carrier markers (e.g., [, ], {, })
- valueRange: Character range excluding carrier markers (null if not available)
render(quads, options)
Render RDF quads as HTML+RDFa for web display.
Parameters:
- quads (array): Array of RDF/JS Quads to render
- options (object, optional):
  - context (object): Prefix mappings for CURIE shortening
  - baseIRI (string): Base IRI for resolving relative references
Returns: { html, context }
Utility Functions
import {
  DEFAULT_CONTEXT, // Default prefix mappings
  DataFactory, // RDF/JS DataFactory instance
  hash, // String hashing function
  expandIRI, // IRI expansion with context
  shortenIRI, // IRI shortening with context
  parseSemanticBlock // Parse semantic block syntax
} from 'mdld-parse';
🏗️ Architecture
Design Principles
- Zero dependencies — Pure JavaScript, ~15KB minified
- Streaming-first — Single-pass parsing, O(n) complexity
- Standards-compliant — RDF/JS data model
- Origin tracking — Full round-trip support with source maps
- Explicit semantics — No guessing, inference, or heuristics
RDF/JS Compatibility
Quads are compatible with:
- n3.js: Turtle/N-Triples serialization
- rdflib.js: RDF store and reasoning
- sparqljs: SPARQL queries
- rdf-ext: Extended RDF utilities
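To show what "RDF/JS compatible" means in practice, here is a dependency-free sketch that serializes RDF/JS-shaped terms to N-Triples. In a real project one of the libraries above would normally do this. `termToNT` and `quadToNT` are hypothetical helpers, the escaping covers only the most common characters, and graph names are ignored.

```javascript
// Sketch: serialize RDF/JS-shaped terms to N-Triples without dependencies.
const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string';

function escapeLiteral(value) {
  return value
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
}

function termToNT(term) {
  if (term.termType === 'NamedNode') return `<${term.value}>`;
  if (term.termType === 'BlankNode') return `_:${term.value}`;
  // Literal: a language tag wins over the datatype; xsd:string stays implicit.
  const quoted = `"${escapeLiteral(term.value)}"`;
  if (term.language) return `${quoted}@${term.language}`;
  const dt = term.datatype && term.datatype.value;
  return dt && dt !== XSD_STRING ? `${quoted}^^<${dt}>` : quoted;
}

function quadToNT(q) {
  return `${termToNT(q.subject)} ${termToNT(q.predicate)} ${termToNT(q.object)} .`;
}

const quad = {
  subject: { termType: 'NamedNode', value: 'http://example.org/doc' },
  predicate: { termType: 'NamedNode', value: 'http://www.w3.org/2000/01/rdf-schema#label' },
  object: { termType: 'Literal', value: 'Document', language: '', datatype: { termType: 'NamedNode', value: XSD_STRING } },
};
console.log(quadToNT(quad));
// → <http://example.org/doc> <http://www.w3.org/2000/01/rdf-schema#label> "Document" .
```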
🧪 Testing
The parser includes comprehensive tests covering all spec requirements:
pnpm test
Tests validate:
- Subject declaration and context
- All predicate forms (p, ?p, !p)
- Datatypes and language tags
- Explicit list item annotations
- Code blocks and blockquotes
- Round-trip serialization
