@markschema/mdshape

v1.0.0

Published

a month ago

Type-safe Markdown validation with a schema-first API.


danielrefiski

markdown schema validation type-safe remark frontmatter mdshape markschema

Read the docs →

What is mdshape?

mdshape transforms unstructured Markdown into typed, validated data. You define a schema — sections, fields, frontmatter, lists — and mdshape parses the Markdown AST, extracts the values, and returns precise errors when the document doesn't match.

Features

Schema-driven extraction — define schemas with document(), section(), match(), and block(), get typed JSON back
Type-safe — full TypeScript type inference from your schemas
Structure validation — enforce heading order, section sequence, field presence, and block constraints
Rich block support — tables, code blocks, Mermaid diagrams, math expressions, footnotes, images, and links — all typed in a single schema
GitHub Flavored Markdown — tables, task lists, strikethrough out of the box
Math — LaTeX notation via remark-math
Frontmatter — YAML parsing and validation with metadata() and metadataObject()
Typed diagnostics — issue code, field path, line number, column position, and actionable messages
RAG-ready — convert Markdown to structured, typed JSON chunks ready for vector databases
Composable — schemas compose like Zod: .optional(), .default(), .refine(), .transform(), .pipeline()
Zero config — works out of the box with sensible defaults
Zero external runtime — built on top of remark, no additional dependencies needed

Installation

npm install @markschema/mdshape

Basic usage

Say you have a Markdown file like this:

---
title: mdshape
version: 1
---

# What is mdshape?

## Overview

- Name: mdshape
- Category: Schema Validation
- License: MIT

## Description

**mdshape** is a TypeScript library that lets you define schemas for Markdown
documents. Think of it as **Zod or Yup for Markdown** — it parses and validates
the structure of your `.md` files, extracting typed data you can trust.

## Use Cases

### RAG Pipelines

**SUMMARY:** Validate Markdown before feeding it into your retrieval-augmented
generation pipeline. Catch structural errors early and ensure consistent
document formats.

### PDF-to-Markdown

**SUMMARY:** After converting PDFs to Markdown, use mdshape to verify the output
matches your expected structure — headings, sections, metadata, and fields.

### Documentation Standards

**SUMMARY:** Enforce consistent structure across your docs: required sections,
valid metadata, and properly formatted content.

You define a schema that describes the expected structure:

import { md } from "@markschema/mdshape";

const useCaseSchema = md.object({
  title: md.headingText(),
  summary: md.match.label("SUMMARY").value(md.string().min(20)),
});

const schema = md.document({
  metadata: md.metadataObject(
    md.object({
      title: md.string().min(1),
      version: md.coerce.number().pipeline(md.number().int().min(1)),
    }),
  ),
  title: md.heading(1),
  overview: md.section("Overview").fields({
    Name: md.string().min(1),
    Category: md.string(),
    License: md.enum(["MIT", "Apache-2.0", "GPL-3.0"]),
  }),
  description: md.section("Description").paragraph(),
  useCases: md.section("Use Cases").subsections(3).each(useCaseSchema).min(2),
});

Parsing data

Use .safeParse() to validate the Markdown string against your schema. If the document is valid, you get back strongly typed, validated data.

const result = schema.safeParse(markdownString);

if (result.success) {
  console.log(result.data);
}

Output:

{
  "metadata": {
    "title": "mdshape",
    "version": 1
  },
  "title": "What is mdshape?",
  "overview": {
    "Name": "mdshape",
    "Category": "Schema Validation",
    "License": "MIT"
  },
  "description": "mdshape is a TypeScript library that lets you define schemas for Markdown documents. Think of it as Zod or Yup for Markdown — it parses and validates the structure of your .md files, extracting typed data you can trust.",
  "useCases": [
    {
      "title": "RAG Pipelines",
      "summary": "Validate Markdown before feeding it into your retrieval-augmented generation pipeline. Catch structural errors early and ensure consistent document formats."
    },
    {
      "title": "PDF-to-Markdown",
      "summary": "After converting PDFs to Markdown, use mdshape to verify the output matches your expected structure — headings, sections, metadata, and fields."
    },
    {
      "title": "Documentation Standards",
      "summary": "Enforce consistent structure across your docs: required sections, valid metadata, and properly formatted content."
    }
  ]
}
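
As a rough illustration of the RAG-ready use case, the parsed result above can be flattened into chunks for a vector database. The sketch below does not use mdshape itself: `ParsedDoc` is a hand-written mirror of the output shown, and `Chunk` and `toChunks` are hypothetical names chosen for this example, not part of the library.

```typescript
// Hand-written mirror of the parsed output shown above (an assumption for
// this sketch; in practice the type would come from your schema).
interface UseCase {
  title: string;
  summary: string;
}

interface ParsedDoc {
  metadata: { title: string; version: number };
  title: string;
  overview: Record<string, string>;
  description: string;
  useCases: UseCase[];
}

// Hypothetical chunk format; adapt it to whatever your vector store expects.
interface Chunk {
  id: string;
  text: string;
  metadata: { doc: string; version: number; section: string };
}

// Emit one chunk for the description and one per use case.
function toChunks(doc: ParsedDoc): Chunk[] {
  const base = { doc: doc.metadata.title, version: doc.metadata.version };
  const chunks: Chunk[] = [
    {
      id: `${doc.metadata.title}#description`,
      text: doc.description,
      metadata: { ...base, section: "Description" },
    },
  ];
  doc.useCases.forEach((useCase, index) => {
    chunks.push({
      id: `${doc.metadata.title}#use-case-${index}`,
      text: `${useCase.title}: ${useCase.summary}`,
      metadata: { ...base, section: useCase.title },
    });
  });
  return chunks;
}
```

Because the data has already been validated, the chunking code can rely on every field being present and correctly typed instead of re-checking each one.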

You can also use .parse(), which returns the data directly or throws a TypeMdError on failure:

const data = schema.parse(markdownString);
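
To make the difference between the throwing and non-throwing styles concrete, here is a minimal sketch of how a result object can be built around any throwing parse() method. It is not mdshape's implementation: `Parser`, `Result`, `parseOrResult`, and the stand-in `nonEmpty` schema are all hypothetical names defined only for this example.

```typescript
// Minimal structural type: anything with a throwing parse() method.
interface Parser<T> {
  parse(input: string): T;
}

// Result object in the same spirit as the success/failure shape used above.
type Result<T> =
  | { success: true; data: T }
  | { success: false; error: unknown };

// Convert a thrown error into a failure result instead of letting it propagate.
function parseOrResult<T>(schema: Parser<T>, input: string): Result<T> {
  try {
    return { success: true, data: schema.parse(input) };
  } catch (error) {
    return { success: false, error };
  }
}

// Stand-in schema for illustration only; a real schema would come from md.document().
const nonEmpty: Parser<string> = {
  parse(input: string): string {
    if (input.trim() === "") throw new Error("Expected non-empty Markdown");
    return input.trim();
  },
};

const outcome = parseOrResult(nonEmpty, "# Title");
if (outcome.success) {
  console.log(outcome.data);
}
```

The throwing style suits scripts where any invalid document should abort the run, while the result style is easier to use when processing many files and collecting failures.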

Handling errors

When validation fails, .safeParse() returns detailed errors with position information pointing to the exact location in the original Markdown.

const result = schema.safeParse(invalidMarkdown);

if (!result.success) {
  result.error.issues;
  /* [
    {
      code: 'invalid_type',
      path: ['metadata', 'title'],
      message: 'Expected string',
      position: { start: { line: 2, column: 1 } }
    }
  ] */
}
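
Issues like the one above are easy to turn into compiler-style log lines. The following sketch assumes only the fields visible in the example (code, path, message, position.start); the `Issue` interface, the `formatIssue` helper, and the `document.md` file name are assumptions made for illustration, and real issues may carry additional fields.

```typescript
// Issue shape copied from the example above (an assumption for this sketch).
interface Issue {
  code: string;
  path: Array<string | number>;
  message: string;
  position?: { start: { line: number; column: number } };
}

// Render one issue as "file:line:column [code] path: message".
function formatIssue(issue: Issue, file = "document.md"): string {
  const where = issue.position
    ? `${file}:${issue.position.start.line}:${issue.position.start.column}`
    : file;
  const path = issue.path.length > 0 ? issue.path.join(".") : "(root)";
  return `${where} [${issue.code}] ${path}: ${issue.message}`;
}

const example: Issue = {
  code: "invalid_type",
  path: ["metadata", "title"],
  message: "Expected string",
  position: { start: { line: 2, column: 1 } },
};

console.log(formatIssue(example));
// document.md:2:1 [invalid_type] metadata.title: Expected string
```

The file:line:column prefix is a common convention that many editors and CI log viewers recognize, which makes failures in a docs pipeline easier to locate.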

Main builders

| Builder | Description |
| --- | --- |
| md.document() | Root document schema |
| md.section() | Groups content under headings |
| md.heading() | Validates headings by level |
| md.headingText() | Extracts heading text |
| md.block() | Block-level elements |
| md.object() | Extracts structured fields |
| md.metadata() | Validates YAML frontmatter |
| md.match() | Pattern matching with labels |
| md.string(), md.number(), md.boolean() | Typed primitives |
| md.email(), md.url(), md.date() | Specialized primitives |
| md.enum(), md.literal() | Restricted values |
| md.array(), md.tuple(), md.list(), md.record() | Collections |
| md.union(), md.discriminatedUnion() | Composite types |
| md.preprocess(), md.coerce() | Input transforms |
