rdf-parser-ts

v0.2.5

Fast RDF-JS parser and writer for Turtle, N-Triples, N-Quads, and TriG for the browser and NodeJS.

pietercolpaert

rdf rdf-js turtle trig n-triples n-quads parser

RDF Parser for TypeScript

Fast RDF-JS parsing for Turtle, N-Triples, N-Quads, and TriG for the browser and NodeJS. Try the interactive browser demo at https://www.pieter.pm/rdf-parser.ts/

The package exposes a Comunica-compatible Parser and StreamParser API so it can be evaluated as a replacement for N3.js in consumers such as Comunica's actor-rdf-parse-n3.

The implementation is intentionally compact: the hot path is a single scanner/parser that avoids reusable lexer abstractions, token object allocation, and runtime dependency overhead. Public APIs stay RDF-JS-compatible while parser internals remain optimized for machine-generated maintenance.

[!NOTE]
I built this as an agentic coding experiment for myself. I’m happy to see spec compliance, messages support, and a significant performance improvement over N3.js, but integration tests with other software will need to show whether this work is as maintainable and useful as other libraries. This project would not have been possible without Blake Regalia’s work on Graphy and Ruben Verborgh’s work on N3.js.

Supported formats

N-Triples
N-Quads
Turtle: prefixes, base IRIs, literals, numeric/boolean literals, predicate/object lists, blank nodes, collections, and RDF 1.2 triple terms
TriG: graph blocks and RDF 1.2 triple terms
RDF Message Logs for N-Triples, N-Quads, Turtle, and TriG through VERSION "...-messages", MESSAGE, @version, and @message .

The spec-test wiring is copied from the adjacent N3.js setup and runs the official RDF 1.1 and RDF1.2 manifests through rdf-test-suite. This initial parser scaffold is designed to grow toward full conformance while keeping performance-focused internals.

Install

npm install rdf-parser-ts

For local development:

npm install
npm run build
npm test

Node.js 24 or newer is required.

Package layout

src/index.ts contains the RDF-JS data model, parser, stream parser, and serializer helpers.
src/bin/rdf-parser.ts provides the rdf-parser-ts CLI.
test/ contains unit tests with Vitest.
spec/ contains the rdf-test-suite adapter and EARL metadata, matching the N3.js spec-test setup.
perf/ contains synthetic performance benchmarks against N3.js.
dist/ is generated by npm run build, including Node.js builds and minified browser bundles.

Build and validation scripts

npm run build        # Build CommonJS, ESM, declarations, CLI, and minified browser bundles
npm run build:browser # Build dist/browser/index.mjs and dist/browser/index.global.js
npm run lint         # Type-check with tsc --noEmit
npm test             # Run unit tests
npm run check        # Type-check, build, then test
npm run ci           # Run check plus the quick performance regression warning check
npm run hooks:install # Install the pre-commit hook that rebuilds and stages dist/
npm run spec         # Run RDF 1.1 and RDF1.2 spec suites
npm run perf         # Benchmark 10⁴, 10⁵, and 10⁶ generated quads/triples
npm run perf:quick   # Smaller benchmark for local iteration
npm run perf:regression # Compare current build to a git baseline and warn on >20% throughput drops
npm run perf:graphy  # Graphy-compatible benchmark without RDF1.2 triple terms
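
The threshold used by perf:regression (warn on throughput drops above 20%) comes down to a relative-difference check. The sketch below shows that arithmetic only; checkThroughput, its result shape, and the sample numbers are illustrative assumptions and are not taken from the project's benchmark script.

```typescript
// Hypothetical helper: names, result shape, and sample numbers are assumptions.
interface RegressionResult {
  dropPercent: number; // positive when the current build is slower than the baseline
  warn: boolean;
}

function checkThroughput(
  baselineQuadsPerSecond: number,
  currentQuadsPerSecond: number,
  thresholdPercent = 20,
): RegressionResult {
  if (baselineQuadsPerSecond <= 0) {
    throw new RangeError('baseline throughput must be positive');
  }
  const dropPercent =
    ((baselineQuadsPerSecond - currentQuadsPerSecond) / baselineQuadsPerSecond) * 100;
  // Warn only when the drop is strictly greater than the threshold.
  return { dropPercent, warn: dropPercent > thresholdPercent };
}

const steady = checkThroughput(400_000, 380_000); // slightly slower, within tolerance
const slower = checkThroughput(400_000, 300_000); // a quarter slower, above tolerance
console.log(steady.warn, slower.warn);
```

Because quick benchmark runs are noisy, a check like this is better suited to warnings than to hard CI failures.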

The package exports CommonJS, ESM, and browser builds:

{
  "main": "./dist/index.js",
  "types": "./dist/index.d.ts",
  "browser": "./dist/browser/index.mjs",
  "unpkg": "./dist/browser/index.global.js"
}

The browser bundle is minified and supports string parsing with Parser, RDF serialization with Writer, and Web Streams parsing with StreamParser. The Node.js build keeps the Node Transform-based StreamParser; the browser build exposes a Web Streams-compatible StreamParser with readable and writable properties.

Browser usage and bundle size

With a browser-aware bundler, import the browser entry explicitly:

import { Parser, Writer, quadToString } from 'rdf-parser-ts/browser';

const quads = new Parser({ baseIRI: 'https://example.org/' }).parse('<s> <p> <o>.') ?? [];
console.log(quadToString(quads[0]!));

const writer = new Writer({ prefixes: { ex: 'https://example.org/' } });
writer.addQuad(quads[0]!);
writer.end((error, output) => {
  if (error) throw error;
  console.log(output);
});

For direct browser usage through a CDN, use the minified ESM bundle:

<script type="module">
  import { Parser, quadToString } from 'https://cdn.jsdelivr.net/npm/rdf-parser-ts/dist/browser/index.mjs';

  const quads = new Parser({ baseIRI: 'https://example.org/' }).parse('<s> <p> <o>.') ?? [];
  console.log(quadToString(quads[0]));
</script>

Or use the minified global bundle, which exposes RDFParserTS:

<script src="https://unpkg.com/rdf-parser-ts/dist/browser/index.global.js"></script>
<script>
  const { Parser, quadToString } = RDFParserTS;
  const quads = new Parser({ baseIRI: 'https://example.org/' }).parse('<s> <p> <o>.') || [];
  console.log(quadToString(quads[0]));
</script>

For streaming in browsers, StreamParser works with Web Streams. It can be passed to pipeThrough() or used through its import() convenience method:

import { StreamParser, quadToString } from 'rdf-parser-ts/browser';

const parser = new StreamParser({ baseIRI: 'https://example.org/' });
const rdfStream = new Blob(['<s> <p>', ' <o>.']).stream();

for await (const quad of rdfStream.pipeThrough(parser)) {
  console.log(quadToString(quad));
}

The browser StreamParser accepts string, Uint8Array, and ArrayBuffer chunks, emits RDF-JS quads, and supports prefix, comment, and messageCounter listeners with on() or addEventListener().
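
Accepting binary chunks means a multi-byte UTF-8 sequence can be split across two chunks. The sketch below shows one way a consumer could normalize mixed chunks to text with the standard TextDecoder in streaming mode, assuming UTF-8 input. createChunkDecoder is a hypothetical helper, not part of the package's API, and the library's internals may work differently.

```typescript
type Chunk = string | Uint8Array | ArrayBuffer;

// Hypothetical helper, not part of rdf-parser-ts.
function createChunkDecoder(): (chunk: Chunk) => string {
  const decoder = new TextDecoder('utf-8');
  return (chunk) => {
    if (typeof chunk === 'string') return chunk;
    const bytes = chunk instanceof Uint8Array ? chunk : new Uint8Array(chunk);
    // stream: true keeps an incomplete trailing byte sequence buffered
    // until the next chunk completes it.
    return decoder.decode(bytes, { stream: true });
  };
}

const decode = createChunkDecoder();
const encoded = new TextEncoder().encode('<s> <p> "é".');
const splitAt = encoded.indexOf(0xc3) + 1; // cut inside the two-byte "é"
const decodedText = decode(encoded.slice(0, splitAt)) + decode(encoded.slice(splitAt));
console.log(decodedText === '<s> <p> "é".'); // true
```

A complete implementation would also flush the decoder when the stream ends, so that a truncated final sequence is reported rather than silently dropped.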

The browser build follows the same package-shipping idea as N3.js—publish a prebuilt minified browser artifact—but uses a browser-specific entry and sideEffects: false so bundlers avoid pulling Node stream code into browser builds.

Current browser bundle sizes after npm run build:

| Bundle | Raw | gzip |
| --- | ---: | ---: |
| dist/browser/index.mjs | 40,160 bytes (39.2 KiB) | 10,567 bytes (10.3 KiB) |
| dist/browser/index.global.js | 40,643 bytes (39.7 KiB) | 10,747 bytes (10.5 KiB) |

The example/ folder contains a browser-only parser and writer demo. It accepts a URL or pasted RDF text, auto-detects the input format from URL, content type, or syntax hints, optionally serializes the parsed data in the selected output format, and reports quads/messages per second while processing.
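
Format detection from a URL or content type can be approximated with a small lookup. The following is a sketch under assumptions: detectFormat and both tables are hypothetical, the demo's real heuristics may differ, and syntax-hint detection is not covered here. The media types are the registered ones for the four supported formats.

```typescript
// Hypothetical helper: the demo's actual detection logic may differ.
const MEDIA_TYPE_BY_EXTENSION: Record<string, string> = {
  nt: 'application/n-triples',
  nq: 'application/n-quads',
  ttl: 'text/turtle',
  trig: 'application/trig',
};
const KNOWN_MEDIA_TYPES = [
  'application/n-triples',
  'application/n-quads',
  'text/turtle',
  'application/trig',
];

function detectFormat(url?: string, contentType?: string): string | undefined {
  // Prefer an explicit content type, ignoring parameters such as charset.
  const mediaType = (contentType ?? '').split(';')[0]!.trim().toLowerCase();
  if (KNOWN_MEDIA_TYPES.indexOf(mediaType) !== -1) return mediaType;
  if (!url) return undefined;
  // Fall back to the file extension of the URL without query or fragment.
  const path = url.split(/[?#]/)[0]!;
  const extension = path.slice(path.lastIndexOf('.') + 1).toLowerCase();
  return Object.prototype.hasOwnProperty.call(MEDIA_TYPE_BY_EXTENSION, extension)
    ? MEDIA_TYPE_BY_EXTENSION[extension]
    : undefined;
}

console.log(detectFormat('https://example.org/data.ttl')); // text/turtle
console.log(detectFormat('https://example.org/export', 'application/n-quads; charset=utf-8')); // application/n-quads
```

The result can be passed as the format option shown in the parser examples, which accept media types such as application/n-quads.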

Parsing strings

import { Parser, quadToString } from 'rdf-parser-ts';

const parser = new Parser({ baseIRI: 'http://example.org/' });
const quads = parser.parse(`
  @prefix ex: <http://example.com/>.
  ex:s ex:p "hello"@en;
       ex:n 42;
       a ex:Thing.
`);

for (const quad of quads ?? []) {
  console.log(quad.subject.termType, quad.predicate.value, quad.object.value);
  console.log(quadToString(quad));
}

Parser#parse() returns RDF-JS quads when no callback is provided. With a callback, it follows the N3.js-style callback flow and calls the callback once per quad, then once with quad === null and the prefix map.

const parser = new Parser();

parser.parse('<s> <p> <o>.', (error, quad, prefixes) => {
  if (error) throw error;
  if (quad) console.log(quad);
  else console.log('done', prefixes);
});

Writing RDF

Writer serializes RDF-JS quads to Turtle/TriG-style output by default and supports N-Triples or N-Quads line formats through the format option. Its API follows N3.js-style usage: add quads with addQuad() or addQuads(), add prefixes with addPrefix() or addPrefixes(), and collect the final string with end() when no output stream is supplied.

import { DataFactory, Writer } from 'rdf-parser-ts';

const { namedNode, literal, quad } = DataFactory;

const writer = new Writer({ prefixes: { ex: 'https://example.org/' } });
writer.addQuad(quad(
  namedNode('https://example.org/s'),
  namedNode('https://example.org/p'),
  literal('hello'),
));

writer.end((error, output) => {
  if (error) throw error;
  console.log(output);
  // @prefix ex: <https://example.org/>.
  //
  // ex:s ex:p "hello".
});

For line-based output, choose N-Triples or N-Quads:

import { DataFactory, Writer } from 'rdf-parser-ts';

const { namedNode, literal, quad } = DataFactory;

const writer = new Writer({ format: 'N-Quads' });
writer.addQuad(quad(namedNode('s'), namedNode('p'), literal('o'), namedNode('g')));
writer.end((error, output) => {
  if (error) throw error;
  console.log(output); // <s> <p> "o" <g> .
});

Writer also supports blank-node property-list helpers through blank(), RDF list helpers through list(), RDF 1.2 triple terms, base IRI shortening, datatype/language literal serialization, and output streams. The parser accepts RDF 1.2 triple terms (<<(...)>>) and Turtle/TriG reified triples (<<...>>) with RDF 1.2 rdf:reifies semantics. On Node.js, StreamWriter is a Transform stream in object mode for serializing quad streams to text.

Writer can also serialize RDF Message Logs. In line formats such as N-Quads, it writes VERSION "1.2-messages" and MESSAGE delimiters. In Turtle/TriG-style output, it writes @version "1.2-messages" . and @message . delimiters.

One option is to pass message quads directly, which is useful when piping parser output or streaming message entries. Gaps in messageCounter values are preserved as empty messages.

import { Parser, Writer, isMessageQuad } from 'rdf-parser-ts';

const output = new Parser({ format: 'N-Quads', rdfMessages: true }).parse(`
  <http://example.org/s1> <http://example.org/p> <http://example.org/o1> .
  MESSAGE
  <http://example.org/s2> <http://example.org/p> <http://example.org/o2> .
`);

const writer = new Writer({ format: 'N-Quads' });

for (const item of output ?? []) {
  writer.addQuad(isMessageQuad(item) ? item : { quad: item, messageCounter: 0 });
}

writer.end((error, serialized) => {
  if (error) throw error;
  console.log(serialized);
});

Alternatively, call addMessage() with the quads belonging to each message:

import { DataFactory, Writer } from 'rdf-parser-ts';

const { namedNode, quad } = DataFactory;

const writer = new Writer({ prefixes: { ex: 'http://example.org/' }, version: '1.2-messages' });

writer.addMessage([
  quad(namedNode('http://example.org/s1'), namedNode('http://example.org/p'), namedNode('http://example.org/o1')),
]);
writer.addMessage([]); // preserve an empty message
writer.addMessage([
  quad(namedNode('http://example.org/s2'), namedNode('http://example.org/p'), namedNode('http://example.org/o2')),
]);

writer.end((error, serialized) => {
  if (error) throw error;
  console.log(serialized);
});

StreamWriter accepts both RDF-JS quads and { quad, messageCounter } entries, so new StreamParser({ rdfMessages: true }).pipe(new StreamWriter({ format: 'N-Quads' })) preserves message boundaries.

RDF Messages

RDF Messages mode is enabled automatically when the input contains a messages version label, such as @version "1.2-messages" . or VERSION "1.2-messages". It can also be enabled explicitly with rdfMessages: true or messages: true in the parser options.

When RDF Messages mode is active, Parser#parse() returns entries that contain both the parsed quad and the message counter. Counters start at 0 and increase at each MESSAGE or @message . delimiter.

import { Parser, isMessageQuad, quadToString } from 'rdf-parser-ts';

const output = new Parser().parse(`
  VERSION "1.2-messages"
  <http://example.org/s1> <http://example.org/p> <http://example.org/o1> .
  MESSAGE
  <http://example.org/s2> <http://example.org/p> <http://example.org/o2> .
`);

for (const entry of output ?? []) {
  if (isMessageQuad(entry)) {
    console.log(entry.messageCounter, quadToString(entry.quad));
  }
}

The callback form still emits quads, with an additional optional message-counter argument when RDF Messages mode is active:

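import { Parser, quadToString } from 'rdf-parser-ts';

// `input` is a string holding RDF Messages content, such as the document parsed in the previous example.
new Parser().parse(input, (error, quad, prefixes, messageCounter) => {
  if (error) throw error;
  if (quad) console.log(messageCounter, quadToString(quad));
});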

Use toMessages() to group parser output into Message instances. Message extends Array and contains the quads belonging to one RDF Message. Empty messages are preserved when the input contains delimiters before the first quad or between two delimiters, while a final delimiter after a non-empty message does not create an additional empty trailing message.

import { Parser, toMessages } from 'rdf-parser-ts';

const output = new Parser({ rdfMessages: true }).parse(`
  MESSAGE
  <http://example.org/s> <http://example.org/p> <http://example.org/o> .
`);

const messages = toMessages(output ?? []);
console.log(messages[0]?.length); // 0
console.log(messages[1]?.length); // 1
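
The grouping rules can be reproduced from the message counters alone. The sketch below is not the implementation of toMessages(); groupByMessageCounter and CountedEntry are hypothetical names, and plain strings stand in for quads so the block runs without the package.

```typescript
// Hypothetical entry shape mirroring the { quad, messageCounter } entries described above.
interface CountedEntry<Q> {
  quad: Q;
  messageCounter: number;
}

// Group entries into one array per message; counters without quads become empty messages.
function groupByMessageCounter<Q>(entries: ReadonlyArray<CountedEntry<Q>>): Q[][] {
  const messages: Q[][] = [];
  for (const entry of entries) {
    while (messages.length <= entry.messageCounter) messages.push([]);
    messages[entry.messageCounter]!.push(entry.quad);
  }
  return messages;
}

const grouped = groupByMessageCounter([
  { quad: 's1 p o1', messageCounter: 1 }, // a delimiter preceded the first quad
  { quad: 's2 p o2', messageCounter: 3 }, // counter 2 had no quads
]);
console.log(grouped.map(message => message.length)); // [ 0, 1, 0, 1 ]
```

A trailing delimiter with no following quad produces no entry, so this grouping never adds an empty message after the last non-empty one.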

For direct message-level parsing, use parseMessages():

import { Parser } from 'rdf-parser-ts';

const messages = new Parser({ baseIRI: 'http://example.org/' }).parseMessages(`
  VERSION "1.2-messages"
  <s1> <p> <o1> .
  MESSAGE
  <s2> <p> <o2> .
`);

Blank node labels are scoped per message in RDF Messages mode, so the same blank node label in two messages produces distinct blank node terms.
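
One way to picture per-message scoping is to key each label by the message it appears in. The sketch below is a hypothetical relabeling helper, not the parser's actual label scheme; createBlankNodeScoper and the b0, b1 identifiers are assumptions for illustration.

```typescript
// Hypothetical relabeling helper; the parser's real label scheme may differ.
function createBlankNodeScoper(): (label: string, messageCounter: number) => string {
  const scoped: Record<string, string> = {};
  let next = 0;
  return (label, messageCounter) => {
    // The same label maps to the same identifier only within one message.
    const key = `${messageCounter}:${label}`;
    let id = scoped[key];
    if (id === undefined) {
      id = `b${next++}`;
      scoped[key] = id;
    }
    return id;
  };
}

const scopeLabel = createBlankNodeScoper();
console.log(scopeLabel('x', 0), scopeLabel('x', 0), scopeLabel('x', 1)); // b0 b0 b1
```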

Streaming parsing

StreamParser is a Node.js Transform stream in object mode. It accepts string or Buffer chunks and emits RDF-JS quads. In RDF Messages mode, it emits { quad, messageCounter } entries and a messageCounter event for each parsed quad.

import { createReadStream } from 'node:fs';
import { StreamParser } from 'rdf-parser-ts';

const parser = new StreamParser({
  baseIRI: 'http://example.org/',
  format: 'application/n-quads',
});

createReadStream('data.nq')
  .pipe(parser)
  .on('data', quad => {
    console.log(quad.subject.value, quad.predicate.value, quad.object.value);
  })
  .on('prefix', (prefix, iri) => {
    console.log('prefix', prefix, iri.value);
  })
  .on('comment', comment => {
    console.log('comment', comment);
  });

The import() convenience method mirrors N3.js:

import { createReadStream } from 'node:fs';
import { StreamParser } from 'rdf-parser-ts';

const parser = new StreamParser();
parser.import(createReadStream('data.ttl')).on('data', quad => console.log(quad));

RDF-JS data model

The default DataFactory creates RDF-JS-compatible terms:

import { DataFactory } from 'rdf-parser-ts';

const s = DataFactory.namedNode('http://example.org/s');
const p = DataFactory.namedNode('http://example.org/p');
const o = DataFactory.literal('hello', 'en');
const q = DataFactory.quad(s, p, o);

console.log(q.termType);        // Quad
console.log(q.object.termType); // Literal
console.log(q.equals(q));       // true
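
Term equality in the RDF-JS data model compares termType and value, and for literals also the language and datatype. The classes below are a minimal sketch of that rule, not the package's own classes; SimpleNamedNode and SimpleLiteral are hypothetical names, and the RDF 1.2 base direction is left out.

```typescript
// Minimal RDF-JS-style terms; illustrative only, not the package's classes.
interface Term {
  termType: string;
  value: string;
  equals(other: Term | null | undefined): boolean;
}

class SimpleNamedNode implements Term {
  readonly termType = 'NamedNode';
  constructor(readonly value: string) {}
  equals(other: Term | null | undefined): boolean {
    return !!other && other.termType === this.termType && other.value === this.value;
  }
}

class SimpleLiteral implements Term {
  readonly termType = 'Literal';
  constructor(
    readonly value: string,
    readonly language: string,
    readonly datatype: SimpleNamedNode,
  ) {}
  equals(other: Term | null | undefined): boolean {
    if (!other || other.termType !== this.termType || other.value !== this.value) return false;
    const literal = other as SimpleLiteral;
    return literal.language === this.language && this.datatype.equals(literal.datatype);
  }
}

const langString = new SimpleNamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#langString');
const helloEn = new SimpleLiteral('hello', 'en', langString);
const helloFr = new SimpleLiteral('hello', 'fr', langString);
console.log(helloEn.equals(new SimpleLiteral('hello', 'en', langString))); // true
console.log(helloEn.equals(helloFr)); // false
```

Because equality is structural, terms created by different factories compare equal as long as their fields match.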

Exports include:

DataFactory
NamedNode
BlankNode
Literal
Variable
DefaultGraph
Quad
Message
Writer
StreamWriter
namedNode
blankNode
literal
variable
defaultGraph
quad
termToString()
quadToString()
termToId()
termFromId()
isMessageQuad()
toMessages()

Custom RDF-JS factories

Pass a custom factory to produce terms owned by another RDF-JS implementation, such as Comunica's data factory.

import { StreamParser } from 'rdf-parser-ts';

const parser = new StreamParser({
  factory: dataFactory,
  baseIRI: action.metadata?.baseIRI,
  format: mediaType,
  parseUnsupportedVersions: true,
  version: action.metadata?.version,
});

This option shape matches the usage pattern in Comunica's ActorRdfParseN3: a consumer can replace import { StreamParser } from 'n3' with import { StreamParser } from 'rdf-parser-ts' for evaluation.

CLI

After building or installing the package, the rdf-parser-ts binary reads RDF from a file or stdin and writes N-Quads-style output.

rdf-parser-ts --base http://example.org/ data.ttl
cat data.nq | rdf-parser-ts --format application/n-quads

Options:

--format, -f: format hint, such as text/turtle or application/n-quads.
--base, -b: base IRI for relative IRIs.
--help, -h: print usage.

RDF Working Group test suites

The spec/ setup mirrors N3.js:

spec/parser.cjs implements the rdf-test-suite parser interface by piping streamify-string(data) into new StreamParser(...) and collecting with arrayify-stream.
spec/earl-meta.json contains metadata for EARL report generation.
.rdf-test-suite-cache/ is used for downloaded manifests.
Library-specific RDF Messages tests cover the behavior described by the RDF Messages tests document: VERSION and @version, MESSAGE and @message ., message counters, empty messages, final delimiters, repeated prefixes, named graphs, blank-node scoping, and delimiter errors.

Available spec commands:

npm run spec-1-1-ntriples
npm run spec-1-1-nquads
npm run spec-1-1-turtle
npm run spec-1-1-trig
npm run spec-1-2-ntriples
npm run spec-1-2-nquads
npm run spec-1-2-turtle
npm run spec-1-2-trig
npm run spec-1-1-earl
npm run spec-1-2-earl

Use npm run spec-clean to remove the manifest cache.

The N-Triples and N-Quads RDF 1.1/RDF1.2 scripts run without skips. The Turtle and TriG scripts run the same official manifests with explicit --skip patterns for currently unsupported edge cases such as full PN_CHARS Unicode coverage, escaped prefixed names, some IRI-resolution cases, RDF1.2 annotation/reifier syntax, and Turtle/TriG version directives. This keeps npm run spec reproducible and green while making the remaining conformance work visible in package.json.

Performance benchmarks

The benchmark generates synthetic RDF1.2 N-Quads-like input with a mix of:

default-graph triples
named-graph quads
IRI objects
string literals
language-tagged literals
integer, decimal, and boolean literals
triple terms as objects

Default sizes are $10^4$, $10^5$, and $10^6$ statements.

npm run perf
npm run perf:quick
node perf/bench.js --sizes 10000,50000 --no-n3
node perf/bench.js --sizes 10000,50000 --no-triple-terms

Graphy 4.x's N-Quads reader does not parse RDF1.2 triple terms, so the default triple-term benchmark prints a skipped Graphy row. Use --no-triple-terms or npm run perf:graphy for direct rdf-parser-ts, N3.js, Graphy, and Graphy relaxed-mode numbers on the same generated line-format input.

Quick benchmark snapshot

The following results were captured with Node.js v25.9.0 on Linux x64 using the quick benchmark commands. They are intended as a local performance snapshot, not as stable release guarantees; larger runs with npm run perf and node --expose-gc are more representative.

Default RDF1.2 triple-term input, from npm run perf:quick:

| Statements | Parser | Time | Throughput | Input | RSS delta |
| ---: | --- | ---: | ---: | ---: | ---: |
| 1,000 | rdf-parser-ts | 0.002s | 439,540 q/s | 0.1 MiB | 2.1 MiB |
| 1,000 | rdf-parser-ts/relax | 0.001s | 782,497 q/s | 0.1 MiB | 0.4 MiB |
| 1,000 | N3.js | 0.008s | 127,099 q/s | 0.1 MiB | 1.9 MiB |
| 10,000 | rdf-parser-ts | 0.023s | 435,162 q/s | 1.1 MiB | 5.6 MiB |
| 10,000 | rdf-parser-ts/relax | 0.012s | 839,620 q/s | 1.1 MiB | 5.0 MiB |
| 10,000 | N3.js | 0.047s | 214,567 q/s | 1.1 MiB | 6.6 MiB |

Line-format input without RDF1.2 triple terms, from node perf/bench.js --sizes 1000,10000 --no-triple-terms:

| Statements | Parser | Time | Throughput | Input | RSS delta |
| ---: | --- | ---: | ---: | ---: | ---: |
| 1,000 | rdf-parser-ts | 0.001s | 773,045 q/s | 0.1 MiB | 0.5 MiB |
| 1,000 | rdf-parser-ts/relax | 0.002s | 525,116 q/s | 0.1 MiB | 0.3 MiB |
| 1,000 | N3.js | 0.006s | 166,228 q/s | 0.1 MiB | 1.9 MiB |
| 1,000 | Graphy | 0.003s | 317,648 q/s | 0.1 MiB | 1.9 MiB |
| 1,000 | Graphy/relax | 0.002s | 594,226 q/s | 0.1 MiB | -1.0 MiB |
| 10,000 | rdf-parser-ts | 0.010s | 959,069 q/s | 0.9 MiB | 1.4 MiB |
| 10,000 | rdf-parser-ts/relax | 0.010s | 982,829 q/s | 0.9 MiB | 4.7 MiB |
| 10,000 | N3.js | 0.026s | 386,462 q/s | 0.9 MiB | 3.3 MiB |
| 10,000 | Graphy | 0.011s | 881,601 q/s | 0.9 MiB | 7.5 MiB |
| 10,000 | Graphy/relax | 0.005s | 1,857,713 q/s | 0.9 MiB | 2.4 MiB |
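
To read the tables as relative speedups, divide one throughput by another. The snippet below does this for the 10,000-statement rows; the numbers are copied from the tables above, and speedup is just an illustrative helper.

```typescript
// Throughput values (quads per second) copied from the 10,000-statement rows above.
const tripleTermRun = { strict: 435_162, relaxed: 839_620, n3: 214_567 };
const lineFormatRun = { strict: 959_069, relaxed: 982_829, n3: 386_462 };

// Ratio of candidate throughput to baseline throughput, rounded to two decimals.
function speedup(candidate: number, baseline: number): string {
  return (candidate / baseline).toFixed(2);
}

console.log(speedup(tripleTermRun.strict, tripleTermRun.n3));  // 2.03
console.log(speedup(tripleTermRun.relaxed, tripleTermRun.n3)); // 3.91
console.log(speedup(lineFormatRun.strict, lineFormatRun.n3));  // 2.48
```

These ratios describe one local quick run and will shift with hardware, Node.js version, and input size.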

On these generated inputs, the strict parser is already ahead of N3.js, and relax: true improves the RDF1.2 triple-term case by reducing validation overhead on hot line-format paths. The no-triple-term run shows the intended fast-path shape most clearly: common escapeless N-Quads statements are parsed with direct index scanning, bounded named-node caching, and fallback only when the specialized parser cannot handle a line. Graphy remains a strong baseline for ordinary N-Quads and benefits from its own relaxed mode, but the current Graphy reader is skipped for the default RDF1.2 triple-term workload. Memory deltas in the quick run are noisy because the process is short-lived and includes JIT, parser warmup, and garbage-collection timing.
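
The fast-path shape described above (index scanning, a bounded named-node cache, and a fallback signal) can be sketched in a few lines. This is an illustration under assumptions, not the library's code: fastParseLine, cachedNamedNode, and the cache limit are hypothetical, only IRI-only triples are handled, and no IRI validation is performed.

```typescript
// Illustrative only: handles `<s> <p> <o> .` lines with IRI terms and nothing else.
type NamedNodeLike = { termType: 'NamedNode'; value: string };

const MAX_CACHE_ENTRIES = 1024; // assumed bound, not the library's actual value
const predicateCache = new Map<string, NamedNodeLike>();

function cachedNamedNode(value: string): NamedNodeLike {
  let node = predicateCache.get(value);
  if (!node) {
    // Keep the cache bounded by dropping everything once the limit is reached.
    if (predicateCache.size >= MAX_CACHE_ENTRIES) predicateCache.clear();
    node = { termType: 'NamedNode', value };
    predicateCache.set(value, node);
  }
  return node;
}

// Returns null when the line does not fit the fast path, signalling that
// the caller should hand the line to the general parser instead.
function fastParseLine(line: string): [NamedNodeLike, NamedNodeLike, NamedNodeLike] | null {
  if (line.indexOf('\\') !== -1) return null; // escapes need the slow path
  const terms: NamedNodeLike[] = [];
  let index = 0;
  for (let i = 0; i < 3; i++) {
    while (line.charCodeAt(index) === 0x20) index++; // skip spaces
    if (line.charCodeAt(index) !== 0x3c) return null; // expect '<'
    const end = line.indexOf('>', index + 1);
    if (end === -1) return null;
    const value = line.slice(index + 1, end);
    // Only the predicate goes through the cache, since predicates recur most often.
    terms.push(i === 1 ? cachedNamedNode(value) : { termType: 'NamedNode', value });
    index = end + 1;
  }
  if (line.slice(index).trim() !== '.') return null;
  return [terms[0]!, terms[1]!, terms[2]!];
}

const firstTriple = fastParseLine('<http://example.org/s1> <http://example.org/p> <http://example.org/o> .');
const secondTriple = fastParseLine('<http://example.org/s2> <http://example.org/p> <http://example.org/o> .');
console.log(firstTriple !== null && secondTriple !== null && firstTriple[1] === secondTriple[1]); // true
console.log(fastParseLine('<s> <p> "literal" .')); // null
```

Returning null instead of throwing keeps the fast path cheap: any line it does not recognize is simply re-parsed by the slower, fully validating code.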

For cleaner memory measurements, run Node with explicit garbage collection:

npm run build
node --expose-gc perf/bench.js

The benchmark prints elapsed time, throughput, input size, and RSS delta for this parser and N3.js.

Performance-oriented implementation notes

The parser uses a single pass over the input string.
It tracks positions with numeric indexes and avoids token objects.
It emits quads directly from parse routines.
For strict N-Triples/N-Quads input, it uses a Graphy-inspired fast path for common escapeless statements before falling back to the general parser.
The fast path caches recurring predicate, datatype, and graph named nodes in a bounded cache.
A relax: true option mirrors Graphy's relaxed mode by skipping part of the validation cost on hot line-format paths and by fast-parsing common RDF1.2 triple-term objects; spec tests use strict validation by default.
The default data model has small classes with simple equals() implementations.
StreamParser incrementally parses complete statement prefixes and only retains incomplete trailing input between chunks; very large single statements still need to be held until their terminating boundary arrives.
Runtime dependencies are avoided for the library itself; dependencies are development/test/benchmark-only.
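
The chunk-retention behaviour noted for StreamParser can be illustrated for line-based formats, where a newline is a safe statement boundary. The sketch below is a hypothetical splitter, not the library's implementation; createStatementSplitter is an assumed name, and Turtle or TriG would need a real statement-boundary check because statements can span lines.

```typescript
// Hypothetical line-format splitter; names are illustrative.
function createStatementSplitter(onStatement: (line: string) => void): {
  write(chunk: string): void;
  end(): void;
} {
  let pending = ''; // incomplete trailing input carried between chunks
  return {
    write(chunk: string): void {
      const data = pending + chunk;
      const lastNewline = data.lastIndexOf('\n');
      if (lastNewline === -1) {
        pending = data; // no complete statement yet, keep buffering
        return;
      }
      pending = data.slice(lastNewline + 1);
      for (const line of data.slice(0, lastNewline).split('\n')) {
        if (line.trim() !== '') onStatement(line);
      }
    },
    end(): void {
      // Flush whatever remains once the input is exhausted.
      if (pending.trim() !== '') onStatement(pending);
      pending = '';
    },
  };
}

const statements: string[] = [];
const splitter = createStatementSplitter(line => statements.push(line));
splitter.write('<s1> <p> <o1> .\n<s2> <p');
splitter.write('> <o2> .\n<s3> <p> <o3> .');
splitter.end();
console.log(statements.length); // 3
```

Memory use stays proportional to the longest unfinished statement rather than to the whole input, which is why a single very large statement still has to be buffered in full.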

License

MIT Licensed