@thds/markdown-block-extractor

v1.2.1

A TypeScript library for extracting structured blocks and media items from markdown content. Optimized for React, Vite, and Deno.

Markdown Block Extractor

A TypeScript library for extracting structured blocks and media items from markdown content. This library processes markdown with custom block markers and extracts both regular content blocks and media items with detailed metadata. Optimized for React, Vite, Deno, and modern JavaScript applications.

Features

  • Block Extraction: Extract content blocks marked with HTML comments
  • Flexible Block IDs: Support for any string as block ID including GUIDs, alphanumeric strings, and special characters
  • Media Detection: Automatically detect images and videos in both markdown and HTML syntax
  • Rich Metadata: Generate detailed metadata for each block including word count, line count, and content features
  • TypeScript Support: Full TypeScript definitions included
  • React/Vite Optimized: Built with Vite for optimal bundling in modern React applications
  • Deno Compatible: Works seamlessly in Deno environments and edge functions
  • Browser Compatible: Uses native crypto.randomUUID() for UUID generation
  • Tree Shakeable: ES modules with proper exports for efficient bundling
  • Multiple Build Formats: CommonJS, ES modules, UMD builds, and Deno imports available
  • Proper AST Structure: Each block contains a complete Abstract Syntax Tree with individual nodes
  • Title Extraction: Automatic title extraction from headings or first text line

How the Parsing Pipeline Works

The extractor runs markdown through a multi-stage pipeline: an initial parse, a fixed sequence of AST transforms, and a per-block re-parse that gives each block its own AST of individual nodes. The stages are:

1. Initial Markdown Parsing

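import { unified } from "unified";
import remarkParse from "remark-parse";
import type { Node } from "unist";

const ast = unified()
  .use(remarkParse)
  .parse(markdown) as Node;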

The original markdown is parsed into an Abstract Syntax Tree (AST) using remark-parse. This creates the initial structure, but blocks with HTML content get treated as single HTML nodes.

2. Transform Pipeline (Applied in Order)

The AST goes through several transformation stages:

Stage 1: Orphan Content Wrapper

  • Wraps content outside of blocks in custom blocks
  • Ensures all content is contained within blocks
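
The wrapping step can be sketched at the text level. This is an illustration under assumptions, not the library's plugin (which works on the AST): `wrapOrphans` and its `makeId` parameter are hypothetical names, and the scan assumes each marker sits on its own line and that blocks are not nested. The ID generator is injected so the example stays deterministic; the feature list says the library itself uses crypto.randomUUID().

```typescript
// Hypothetical text-level sketch of orphan wrapping; the real plugin operates on AST nodes.
const OPEN_MARKER = /^<!--\s*(block|custom-block):id=(.+?)\s*-->$/;
const CLOSE_MARKER = /^<!--\s*end-(block|custom-block):id=(.+?)\s*-->$/;

function wrapOrphans(markdown: string, makeId: () => string): string {
  const out: string[] = [];
  let orphan: string[] = []; // lines seen outside any block
  let insideBlock = false;

  // Emit buffered orphan lines, wrapped in a custom block unless they are all blank.
  const flush = (): void => {
    if (orphan.some((line) => line.trim() !== "")) {
      const id = makeId();
      out.push(`<!-- custom-block:id=${id} -->`, ...orphan, `<!-- end-custom-block:id=${id} -->`);
    } else {
      out.push(...orphan);
    }
    orphan = [];
  };

  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (!insideBlock && OPEN_MARKER.test(trimmed)) {
      flush();
      insideBlock = true;
      out.push(line);
    } else if (insideBlock) {
      out.push(line);
      // Any end marker closes the block here; ID matching is not checked in this sketch.
      if (CLOSE_MARKER.test(trimmed)) insideBlock = false;
    } else {
      orphan.push(line);
    }
  }
  flush();
  return out.join("\n");
}
```

Calling `wrapOrphans(text, () => crypto.randomUUID())` would give each orphan run a fresh UUID in runtimes that expose the global `crypto` object.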

Stage 2: Custom Blocks Processing

  • Identifies and processes <!-- block:id=X --> and <!-- custom-block:id=X --> markers
  • Creates BlockNode and CustomBlockNode structures
  • Collects content between markers as children

Stage 3: Media Extraction

  • Extracts images and videos from both markdown and HTML content
  • Associates media items with their containing blocks
  • Runs on the original parsed content (before re-parsing)
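
To make the two syntaxes concrete, here is a minimal regex-based sketch of media detection. It is an assumption-laden illustration, not the library's extractor: `findMedia` and `SketchMediaItem` are hypothetical names (the item shape is modelled on the MediaItem type documented below), only quoted `src` attributes are recognised, and `<source>` children of `<video>` are ignored.

```typescript
// Hypothetical shape, modelled on the MediaItem fields documented in this README.
interface SketchMediaItem {
  type: "image" | "video";
  url: string;
  alt?: string;
  syntax: "markdown" | "html";
}

function findMedia(source: string): SketchMediaItem[] {
  const items: SketchMediaItem[] = [];
  let m: RegExpExecArray | null;

  // Markdown images: ![alt](url) or ![alt](url "title")
  const mdImage = /!\[([^\]]*)\]\(\s*([^\s)]+)[^)]*\)/g;
  while ((m = mdImage.exec(source)) !== null) {
    const item: SketchMediaItem = { type: "image", url: m[2], syntax: "markdown" };
    if (m[1] !== "") item.alt = m[1];
    items.push(item);
  }

  // HTML <img> and <video> tags that carry a quoted src attribute.
  const htmlTag = /<(img|video)\b[^>]*?\bsrc=["']([^"']+)["'][^>]*>/gi;
  while ((m = htmlTag.exec(source)) !== null) {
    const type = m[1].toLowerCase() === "img" ? "image" : "video";
    items.push({ type, url: m[2], syntax: "html" });
  }

  // Note: markdown matches are listed before HTML matches, not in document order.
  return items;
}
```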

Stage 4: Block Extraction with Proper AST (Key Innovation)

  • Uses position information to slice each block's original markdown out of the source
  • Re-parses that slice so the block's AST holds individual nodes (headings, paragraphs, etc.) instead of a single HTML node

3. The Re-parsing Process

When processing each block, the system:

  1. Extracts Original Content:

    const blockContent = originalMarkdown.substring(startOffset, endOffset);
    const markdownContent = blockContent.substring(contentStart, contentEnd).trim();
  2. Re-parses the Original Markdown:

    const parsed = unified()
      .use(remarkParse)
      .parse(markdownContent) as Node;
  3. Creates Proper AST Structure:

    • Instead of a single HTML node, you get individual nodes:
      • heading nodes for # Title
      • paragraph nodes for text content
      • image nodes for ![alt](url)
      • text nodes for plain text
      • etc.
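
The slicing half of this process can be sketched without any dependencies. The helper below is hypothetical (`sliceBlockContent` is not a documented export) and assumes the offsets span the whole block, markers included. It returns null when the markers cannot be located, which is the point where the README says the library falls back to node-based extraction. The returned string is what would then be handed to remark-parse.

```typescript
// Hypothetical helper: given the offsets of a whole block (markers included) in the
// original markdown, return only the content between the two marker comments.
function sliceBlockContent(
  originalMarkdown: string,
  startOffset: number,
  endOffset: number
): string | null {
  const blockContent = originalMarkdown.substring(startOffset, endOffset);

  // Content starts after the first "-->" (the end of the opening marker) ...
  const openEnd = blockContent.indexOf("-->");
  // ... and ends before the last "<!--" (the start of the closing marker).
  const closeStart = blockContent.lastIndexOf("<!--");

  if (openEnd === -1 || closeStart === -1 || closeStart < openEnd + 3) {
    return null; // markers not found: caller should fall back to another strategy
  }
  return blockContent.substring(openEnd + 3, closeStart).trim();
}
```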

4. Example Transformation

Before (Single HTML node):

{
  "type": "html",
  "value": "<img src=\"test.jpg\">\n# Test Block\nSome content here."
}

After (Proper AST structure):

{
  "type": "root",
  "children": [
    {
      "type": "html",
      "value": "<img src=\"test.jpg\">"
    },
    {
      "type": "heading",
      "depth": 1,
      "children": [
        {
          "type": "text",
          "value": "Test Block"
        }
      ]
    },
    {
      "type": "paragraph",
      "children": [
        {
          "type": "text",
          "value": "Some content here."
        }
      ]
    }
  ]
}

5. Benefits of This Approach

  • Proper AST Structure: Each block has individual nodes instead of monolithic HTML
  • Preserved Functionality: Media extraction, metadata, and title extraction all work correctly
  • Position-Based Extraction: Uses original markdown positions for accurate content extraction
  • Fallback Handling: If position extraction fails, falls back to node-based extraction
  • Backward Compatibility: All existing functionality is preserved

6. Final Result Structure

Each BlockExtract contains:

  • ast: Proper AST with individual nodes (headings, paragraphs, images, etc.)
  • title: Extracted from the proper AST structure
  • markdown: Stringified version of the proper AST
  • mediaItems: Correctly associated media items
  • metadata: Accurate metadata based on original content
  • id, type, and the optional position, as declared in the BlockExtract interface under API Reference
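
As an illustration of how a title could be derived from such an AST, the sketch below follows the rule stated in the feature list (a heading, otherwise the first text line). It is an assumption about the approach, not the library's code: `SketchNode`, `textOf`, and `extractTitle` are hypothetical names, and the node shape is a minimal stand-in for mdast nodes.

```typescript
// Minimal mdast-like node shape, declared locally so the sketch has no dependencies.
interface SketchNode {
  type: string;
  value?: string;
  children?: SketchNode[];
}

// Concatenate the text of a node and all of its descendants (text nodes only).
function textOf(node: SketchNode): string {
  if (node.type === "text") return node.value ?? "";
  return (node.children ?? []).map(textOf).join("");
}

// Prefer the first top-level heading; otherwise use the first non-empty text line.
function extractTitle(root: SketchNode): string {
  const children = root.children ?? [];
  const heading = children.find((n) => n.type === "heading");
  if (heading) return textOf(heading).trim();

  for (const child of children) {
    const firstLine = textOf(child)
      .split("\n")
      .find((line) => line.trim() !== "");
    if (firstLine) return firstLine.trim();
  }
  return "";
}
```

Applied to the "After" tree shown in the example transformation, this returns "Test Block": the leading html node is skipped because it contains no text nodes.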

Installation

NPM (React/Vite/Node.js)

npm install @thds/markdown-block-extractor

import { parse } from "@thds/markdown-block-extractor";

Deno

import { parse } from "https://deno.land/x/[email protected]/src/index.ts";

Or using an import map in deno.json:

{
  "imports": {
    "@thds/markdown-block-extractor": "https://deno.land/x/[email protected]/src/index.ts"
  }
}

import { parse } from "@thds/markdown-block-extractor";

React Usage

import React, { useEffect, useState } from 'react';
import { parse, type ParseResult } from '@thds/markdown-block-extractor';

function MarkdownProcessor() {
  const [result, setResult] = useState<ParseResult | null>(null);
  
  useEffect(() => {
    const markdown = `<!-- block:id=my-block-123 -->
# My Block
![Image](https://example.com/image.jpg)
Some content here.
<!-- end-block:id=my-block-123 -->`;
    
    const parsed = parse(markdown);
    setResult(parsed);
  }, []);
  
  return (
    <div>
      {result?.blockExtracts.map(block => (
        <div key={block.id}>
          <h3>{block.title || `Block ${block.id}`}</h3>
          <p>Word count: {block.metadata.wordCount}</p>
          <p>Has images: {block.metadata.hasImages ? 'Yes' : 'No'}</p>
          {/* block.markdown is markdown source, not HTML, so render it as plain text */}
          <pre>{block.markdown}</pre>
        </div>
      ))}
    </div>
  );
}

Deno Edge Function Usage

import { parse } from "https://deno.land/x/[email protected]/src/index.ts";

Deno.serve(async (req: Request) => {
  if (req.method !== "POST") {
    return new Response("Method not allowed", { status: 405 });
  }

  try {
    const { markdown } = await req.json();
    const result = parse(markdown);
    
    return new Response(JSON.stringify(result), {
      headers: { "Content-Type": "application/json" }
    });
  } catch (error) {
    return new Response(
      JSON.stringify({ error: error instanceof Error ? error.message : String(error) }),
      { status: 500, headers: { "Content-Type": "application/json" } }
    );
  }
});

Deno Local Usage

import { parse } from "https://deno.land/x/[email protected]/src/index.ts";

const markdown = `<!-- block:id=my-block-123 -->
# My Block
![Image](https://example.com/image.jpg)
Some content here.
<!-- end-block:id=my-block-123 -->`;

const result = parse(markdown);
console.log(result.blockExtracts);

Usage

import { parse } from '@thds/markdown-block-extractor';

const markdown = `<!-- block:id=my-block-123 -->
# My Block
![Image](https://example.com/image.jpg)
Some content here.
<!-- end-block:id=my-block-123 -->`;

const result = parse(markdown);

console.log(result.blockExtracts);
// [
//   {
//     id: "my-block-123",
//     type: "block",
//     title: "My Block",
//     markdown: "# My Block\n![Image](https://example.com/image.jpg)\nSome content here.",
//     ast: {
//       type: "root",
//       children: [
//         {
//           type: "heading",
//           depth: 1,
//           children: [{ type: "text", value: "My Block" }]
//         },
//         {
//           type: "image",
//           url: "https://example.com/image.jpg",
//           alt: "Image"
//         },
//         {
//           type: "paragraph",
//           children: [{ type: "text", value: "Some content here." }]
//         }
//       ]
//     },
//     mediaItems: [
//       {
//         type: "image",
//         url: "https://example.com/image.jpg",
//         syntax: "markdown"
//       }
//     ],
//     metadata: {
//       lineCount: 3,
//       hasImages: true,
//       hasVideos: false,
//       hasCodeBlocks: false,
//       hasTables: false,
//       hasLists: false,
//       hasLinks: false,
//       wordCount: 4,
//       characterCount: 50
//     }
//   }
// ]

Block Syntax

The library recognizes two types of blocks with flexible ID support:

Regular Blocks

<!-- block:id=1 -->
Your content here
<!-- end-block:id=1 -->

<!-- block:id=my-block-123 -->
Your content here
<!-- end-block:id=my-block-123 -->

<!-- block:id=550e8400-e29b-41d4-a716-446655440000 -->
Your content here
<!-- end-block:id=550e8400-e29b-41d4-a716-446655440000 -->

Custom Blocks

<!-- custom-block:id=2 -->
Your content here
<!-- end-custom-block:id=2 -->

<!-- custom-block:id=special-block_123 -->
Your content here
<!-- end-custom-block:id=special-block_123 -->

Block ID Rules

  • Block IDs can be any non-empty string
  • Supports GUIDs, alphanumeric strings, and special characters
  • Must match exactly between start and end markers
  • Empty or whitespace-only IDs are treated as invalid
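
These rules can be checked ahead of parsing with a small validator. The sketch below is a hypothetical pre-flight check, not part of the package: `validateMarkers` is an invented name, it assumes blocks are not nested, and how the library itself reports invalid IDs is not specified here.

```typescript
// Hypothetical pre-flight check for the ID rules; returns a list of problems found.
function validateMarkers(markdown: string): string[] {
  const problems: string[] = [];
  // Matches start and end markers of both block kinds; group 3 is the raw ID.
  const marker = /<!--\s*(end-)?(custom-block|block):id=(.*?)\s*-->/g;
  let open: { kind: string; id: string } | null = null;
  let m: RegExpExecArray | null;

  while ((m = marker.exec(markdown)) !== null) {
    const isEnd = m[1] !== undefined;
    const kind = m[2];
    const id = m[3].trim();

    if (id === "") {
      problems.push(`empty ID on ${isEnd ? "end" : "start"} marker`);
    } else if (!isEnd) {
      if (open) problems.push(`block "${open.id}" was never closed`);
      open = { kind, id };
    } else if (!open) {
      problems.push(`end marker "${id}" has no start marker`);
    } else {
      // IDs (and block kinds) must match exactly between start and end markers.
      if (open.kind !== kind || open.id !== id) {
        problems.push(`end marker "${id}" does not match start marker "${open.id}"`);
      }
      open = null;
    }
  }
  if (open) problems.push(`block "${open.id}" was never closed`);
  return problems;
}
```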

API Reference

parse(markdown: string): ParseResult

Parses markdown content and returns extracted blocks and media items.

Parameters:

  • markdown (string): The markdown content to parse

Returns:

  • ParseResult: Object containing:
    • blockExtracts: Array of extracted blocks
    • mediaItems: Array of all media items found
    • ast: The parsed AST tree
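
A short sketch of consuming this result shape: grouping the top-level mediaItems by owning block. The `Sketch*` types are local stand-ins that copy only the field names documented in this README, and `mediaUrlsByBlock` is a hypothetical helper, not an export of the package.

```typescript
// Local stand-ins that mirror the documented field names; not the package's own types.
interface SketchMedia {
  type: "image" | "video";
  url: string;
  blockId?: string;
}
interface SketchBlock {
  id: string;
  title: string;
}
interface SketchParseResult {
  blockExtracts: SketchBlock[];
  mediaItems: SketchMedia[];
  ast: unknown;
}

// Group media URLs by the block each item belongs to; blocks without media get [].
function mediaUrlsByBlock(result: SketchParseResult): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const block of result.blockExtracts) grouped.set(block.id, []);

  for (const item of result.mediaItems) {
    if (item.blockId === undefined) continue; // no owning block recorded
    const urls = grouped.get(item.blockId) ?? [];
    urls.push(item.url);
    grouped.set(item.blockId, urls);
  }
  return grouped;
}
```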

Types

BlockExtract

interface BlockExtract {
  id: string;
  type: 'block' | 'customBlock';
  title: string;
  markdown: string;
  ast: Node;
  mediaItems: MediaItem[];
  metadata: BlockMetadata;
  position?: Position;
}

MediaItem

interface MediaItem {
  type: 'image' | 'video';
  url: string;
  alt?: string;
  title?: string;
  blockId?: string;
  blockType?: string;
  position?: Position;
  syntax: 'markdown' | 'html';
}

BlockMetadata

interface BlockMetadata {
  lineCount: number;
  hasImages: boolean;
  hasVideos: boolean;
  hasCodeBlocks: boolean;
  hasTables: boolean;
  hasLists: boolean;
  hasLinks: boolean;
  wordCount: number;
  characterCount: number;
}
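
For intuition about what these fields describe, here is a naive way to compute similar values from a markdown string. It is only a sketch under assumptions: `describeBlock` and `SketchMetadata` are hypothetical names, the checks are regex heuristics rather than AST inspection, and the library's own counting rules are not documented here, so its numbers may differ from this sketch's.

```typescript
// Local copy of the documented field names so the sketch is self-contained.
interface SketchMetadata {
  lineCount: number;
  hasImages: boolean;
  hasVideos: boolean;
  hasCodeBlocks: boolean;
  hasTables: boolean;
  hasLists: boolean;
  hasLinks: boolean;
  wordCount: number;
  characterCount: number;
}

function describeBlock(markdown: string): SketchMetadata {
  // Words are whitespace-separated tokens, so markup such as "#" counts as a word here.
  const words = markdown.split(/\s+/).filter((w) => w !== "");
  return {
    lineCount: markdown.split("\n").length,
    hasImages: /!\[[^\]]*\]\([^)]*\)|<img\b/i.test(markdown),
    hasVideos: /<video\b/i.test(markdown),
    // A fence is a line starting with three or more backticks or tildes.
    hasCodeBlocks: /^ {0,3}(`{3,}|~{3,})/m.test(markdown),
    // A table is detected by its delimiter row, e.g. "| --- | --- |".
    hasTables: /^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)+\|?\s*$/m.test(markdown),
    hasLists: /^\s*([-*+]|\d+[.)])\s+\S/m.test(markdown),
    // A link is [text](url) that is not preceded by "!" (which would make it an image).
    hasLinks: /(^|[^!])\[[^\]]+\]\([^)]+\)/.test(markdown),
    wordCount: words.length,
    characterCount: markdown.length,
  };
}
```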

Development

Running the Example

# Node.js/NPM
npm run example
# or
npx tsx examples/example.ts

# Deno
npm run example:deno
# or
deno run --allow-read examples/deno-example.ts

Running Tests

# Node.js/NPM
npm test
# or
npm run test:run

# Deno
npm run test:deno
# or
deno test --allow-read --allow-net

Building

# Node.js/NPM builds
npm run build

# Deno build
npm run build:deno
# or
deno run --allow-read --allow-write npm:typescript@^5.9.2 --project tsconfig.deno.json

Releasing (npm + JSR)

# Interactive release: bumps package.json and deno.json, builds, validates, and publishes
npm run release

# Under the hood, you can also run each step manually:
npm run build
deno publish --dry-run
npm run publish:npm
npm run publish:jsr

Development Mode

# Node.js/NPM
npm run dev
# Runs the example in watch mode

# Deno
npm run dev:deno
# or
deno run --allow-read --watch examples/deno-example.ts

Project Structure

markdown-block-extractor/
├── src/                    # Source code
│   ├── index.ts           # Main library entry point
│   ├── types/             # TypeScript type definitions
│   │   └── index.ts
│   ├── utils/             # Utility functions
│   │   ├── index.ts
│   │   ├── block-utils.ts
│   │   ├── html-parser.ts
│   │   ├── media-utils.ts
│   │   ├── metadata-utils.ts
│   │   ├── node-utils.ts
│   │   └── uuid-utils.ts
│   └── plugins/           # Remark plugins
│       ├── remark-block-extractor.ts
│       ├── remark-custom-blocks.ts
│       ├── remark-media-extractor.ts
│       └── remark-orphan-content-wrapper.ts
├── dist/                  # Built output
│   ├── index.cjs.js       # CommonJS build
│   ├── index.es.js        # ES modules build
│   ├── index.umd.js       # UMD build
│   └── index.d.ts         # TypeScript definitions
├── dist-deno/             # Deno build output
├── tests/                 # Test files
│   └── test.spec.ts
├── examples/              # Example usage
│   ├── example.ts         # Node.js example
│   ├── deno-example.ts    # Deno example
│   └── deno-edge-function.ts # Deno edge function example
├── package.json           # NPM package configuration
├── deno.json              # Deno configuration
├── vite.config.ts         # Vite build configuration
├── tsconfig.json          # TypeScript configuration
├── tsconfig.deno.json     # Deno TypeScript configuration
└── README.md

Package Information

  • NPM Package: @thds/markdown-block-extractor
  • Version: 1.2.1
  • Author: THDS GmbH
  • License: CC-BY-NC-4.0
  • Node.js: >=16.0.0

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC-4.0).

Important: This license prohibits commercial use. You may use this library for personal, educational, or non-commercial projects, but commercial use requires explicit permission from the copyright holder.

For commercial licensing inquiries, please contact THDS GmbH.