@wordpress/block-serialization-default-parser

v4.56.0

Block serialization specification parser for WordPress posts.

Downloads

186,554

Readme

Block Serialization Default Parser

This library contains the default block serialization parser implementations for WordPress documents. It provides native PHP and JavaScript parsers that implement the specification from @wordpress/block-serialization-spec-parser and that normally operate on the document stored in post_content.

Installation

Install the module

npm install @wordpress/block-serialization-default-parser --save

This package assumes that your code will run in an ES2015+ environment. If you're using an environment that has limited or no support for such language features and APIs, you should include the polyfill shipped in @wordpress/babel-preset-default in your code.

API

parse

Parser function that converts input HTML into a block-based structure.

Usage

Input post:

<!-- wp:columns {"columns":3} -->
<div class="wp-block-columns has-3-columns">
	<!-- wp:column -->
	<div class="wp-block-column">
		<!-- wp:paragraph -->
		<p>Left</p>
		<!-- /wp:paragraph -->
	</div>
	<!-- /wp:column -->

	<!-- wp:column -->
	<div class="wp-block-column">
		<!-- wp:paragraph -->
		<p><strong>Middle</strong></p>
		<!-- /wp:paragraph -->
	</div>
	<!-- /wp:column -->

	<!-- wp:column -->
	<div class="wp-block-column"></div>
	<!-- /wp:column -->
</div>
<!-- /wp:columns -->

Parsing code:

import { parse } from '@wordpress/block-serialization-default-parser';

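// Illustrative only: `===` here stands for "is structurally equal to".
// Two distinct arrays are never strictly equal in JavaScript.
parse( post ) ===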
	[
		{
			blockName: 'core/columns',
			attrs: {
				columns: 3,
			},
			innerBlocks: [
				{
					blockName: 'core/column',
					attrs: null,
					innerBlocks: [
						{
							blockName: 'core/paragraph',
							attrs: null,
							innerBlocks: [],
							innerHTML: '\n<p>Left</p>\n',
						},
					],
					innerHTML: '\n<div class="wp-block-column"></div>\n',
				},
				{
					blockName: 'core/column',
					attrs: null,
					innerBlocks: [
						{
							blockName: 'core/paragraph',
							attrs: null,
							innerBlocks: [],
							innerHTML: '\n<p><strong>Middle</strong></p>\n',
						},
					],
					innerHTML: '\n<div class="wp-block-column"></div>\n',
				},
				{
					blockName: 'core/column',
					attrs: null,
					innerBlocks: [],
					innerHTML: '\n<div class="wp-block-column"></div>\n',
				},
			],
			innerHTML:
				'\n<div class="wp-block-columns has-3-columns">\n\n\n\n</div>\n',
		},
	];

Parameters

  • doc string: The HTML document to parse.

Returns

  • ParsedBlock[]: A block-based representation of the input HTML.

Theory

What is different about this one from the spec-parser?

This is a recursive-descent parser that scans linearly once through the input document. Instead of directly recursing it utilizes a trampoline mechanism to prevent stack overflow. It minimizes data copying and passing through the use of globals for tracking state through the parse. Between every token (a block comment delimiter) we can instrument the parser and intervene should we want to; for example we might put a hard limit on how long we can be parsing a document or provide additional debugging diagnostics for a document.
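To make the trampoline idea concrete, here is a minimal sketch of the general technique. It is not the package's code: `trampoline`, `sumTo`, and `maxSteps` are hypothetical names chosen for illustration. Each step returns a function describing the next step instead of calling it, so a plain loop drives the work and the call stack stays flat. The gap between steps is also where a limit or diagnostic could be added.

```javascript
// Hypothetical sketch: these names are illustrative and are not part of
// the package's API.

// Runs `step` repeatedly until it returns something that is not a function.
// Every step returns to this loop before the next one starts, so the call
// stack does not grow with the amount of work.
function trampoline( step, maxSteps = Infinity ) {
	let steps = 0;
	let next = step;
	while ( typeof next === 'function' ) {
		// Between steps we can instrument the run, here with a step budget.
		if ( steps >= maxSteps ) {
			throw new Error( `Gave up after ${ maxSteps } steps` );
		}
		steps++;
		next = next();
	}
	return next;
}

// A "recursive" sum where each step returns a thunk for the next step
// instead of calling itself directly.
function sumTo( n, total = 0 ) {
	if ( n === 0 ) {
		return total;
	}
	return () => sumTo( n - 1, total + n );
}

// 100,000 nested calls could overflow the stack if written as direct
// recursion; driven by the loop above they run at constant stack depth.
const result = trampoline( () => sumTo( 100000 ) );
```

Passing a small `maxSteps` makes the loop throw once the budget is spent, which mirrors the kind of hard limit on parsing time described above.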

The spec parser is defined via a Parsing Expression Grammar (PEG) which answers many questions inherently that we must answer explicitly in this parser. The goal for this implementation is to match the characteristics of the PEG so that it can be directly swapped out and so that the only changes are better runtime performance and memory usage.

How does it work?

Every serialized Gutenberg document is nominally an HTML document which, in addition to normal HTML, may also contain specially designed HTML comments -- the block comment delimiters -- which separate and isolate the blocks serialized in the document.

This parser attempts to create a state machine around the transitions triggered by those delimiters -- the "tokens" of the grammar. Every time we find one we should do exactly one of two things:

  • enter a new block;
  • exit out of a block.

Those actions have different effects depending on the context; for instance, when we exit a block we either need to add it to the output block list or we need to append it as the next innerBlock on the parent block below it in the block stack (the place where we track open blocks). The details are documented below.

The biggest challenge in this parser is making the right accounting of indices required to construct the innerHTML values for each block at every level of nesting depth. We take a simple approach:

  • Start each newly opened block with an empty innerHTML.
  • Whenever we push a first block into the innerBlocks list, add the content from where the content of the parent block started to where this inner block starts.
  • Whenever we push another block into the innerBlocks list, add the content from where the previous inner block ended to where this inner block starts.
  • When we close out an open block, add the content from where the last inner block ended to where the closing block delimiter starts.
  • If there are no inner blocks then we take the entire content between the opening and closing block comment delimiters as the innerHTML.
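The accounting rules above can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: `parseSketch`, `DELIMITER`, `tokenStart`, and `prevOffset` are hypothetical names, and the pattern ignores attributes, void blocks, freeform HTML between top-level blocks, and mismatched or unclosed delimiters. It also assumes every block lives in the `core/` namespace.

```javascript
// Simplified delimiter pattern (assumption): matches `<!-- wp:name -->` and
// `<!-- /wp:name -->` only. Attributes and void blocks are left out.
const DELIMITER = /<!--\s+(\/)?wp:([a-z][a-z0-9_-]*)\s+-->/g;

function parseSketch( doc ) {
	const output = [];
	const stack = []; // Open blocks, innermost last.
	DELIMITER.lastIndex = 0;

	let match;
	while ( ( match = DELIMITER.exec( doc ) ) !== null ) {
		const start = match.index;
		const end = start + match[ 0 ].length;

		if ( match[ 1 ] !== '/' ) {
			// Enter a block: innerHTML starts empty, and the block's own
			// content begins right after the opening delimiter.
			stack.push( {
				block: {
					blockName: 'core/' + match[ 2 ],
					innerBlocks: [],
					innerHTML: '',
				},
				tokenStart: start, // Where this block's opener begins.
				prevOffset: end, // Where unclaimed content currently begins.
			} );
			continue;
		}

		// Exit a block. Stray closers are ignored in this sketch.
		const frame = stack.pop();
		if ( ! frame ) {
			continue;
		}

		// Add the content from the last inner block (or from the opener, if
		// there were no inner blocks) up to the closing delimiter.
		frame.block.innerHTML += doc.slice( frame.prevOffset, start );

		const parent = stack[ stack.length - 1 ];
		if ( ! parent ) {
			output.push( frame.block );
			continue;
		}

		// Append to the parent: first add the parent's content between its
		// previous offset and the start of this inner block, then move the
		// parent's offset past this inner block's closer.
		parent.block.innerHTML += doc.slice( parent.prevOffset, frame.tokenStart );
		parent.block.innerBlocks.push( frame.block );
		parent.prevOffset = end;
	}

	return output;
}

const sample =
	'<!-- wp:column -->\n<div><!-- wp:paragraph --><p>Left</p><!-- /wp:paragraph --></div>\n<!-- /wp:column -->';
const blocks = parseSketch( sample );
```

Because `prevOffset` starts at the end of the opening delimiter, the "first inner block" and "another inner block" cases collapse into the same slice. For `sample`, the column's `innerHTML` comes out as `'\n<div></div>\n'` and the paragraph's as `'<p>Left</p>'`.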

I meant, how does it perform?

This parser operates much faster than the generated parser from the specification. Because we know more about the parsing than the PEG does we can take advantage of several tricks to improve our speed and memory usage:

  • We only have one or two distinct tokens, depending on how you look at it, and they are all readily matched via a regular expression. Instead of parsing character by character we can let the PCRE RegExp engine skip over large swaths of the document for us in order to find those tokens.
  • Since preg_match() takes an offset parameter we can crawl through the input without passing copies of the input text on every step. We can track our position in the string and only pass a number instead.
  • Not copying all those strings means that we'll also skip many memory allocations.
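The points above describe PHP's preg_match() and its offset parameter. As a rough JavaScript analogue, a regular expression with the `g` flag exposes a writable `lastIndex`, which plays the same role: we pass a number and the engine resumes scanning the same string from there. The sketch below is illustrative only. `TOKEN`, `nextToken`, and `tokenOffsets` are hypothetical names, and the pattern is a simplified assumption that does not handle block attributes.

```javascript
// Simplified token pattern (assumption): openers, closers, and void blocks
// without attributes, for example `<!-- wp:x -->` or `<!-- /wp:x -->`.
const TOKEN = /<!--\s+\/?wp:[a-z][a-z0-9_\/-]*\s+\/?-->/g;

// Finds the next delimiter at or after `offset`. Only the number changes
// between calls; `doc` itself is never sliced or copied.
function nextToken( doc, offset ) {
	TOKEN.lastIndex = offset;
	const match = TOKEN.exec( doc );
	if ( match === null ) {
		return null;
	}
	return { start: match.index, length: match[ 0 ].length };
}

// Collects the start offset of every delimiter in a single pass.
function tokenOffsets( doc ) {
	const found = [];
	let offset = 0;
	let token;
	while ( ( token = nextToken( doc, offset ) ) !== null ) {
		found.push( token.start );
		offset = token.start + token.length;
	}
	return found;
}

const offsets = tokenOffsets( 'a<!-- wp:x -->b<!-- /wp:x -->c' );
```

Here `offsets` holds the positions of the two delimiters, and everything between them was skipped by the RegExp engine rather than inspected by our own loop.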

Further, tokenizing with a RegExp brings an additional advantage. The parser generated by the PEG provides predictable performance characteristics in exchange for control over tokenization rules -- it doesn't allow us to define RegExp patterns in the rules, which guards against problems such as catastrophic backtracking that would break the PEG guarantees.

However, since our "token language" of the block comment delimiters is regular and can be trivially matched with RegExp patterns, we can do that here and then something magical happens: we jump out of PHP or JavaScript and into a highly-optimized RegExp engine written in C or C++ on the host system. We thereby leave the virtual machine and its overhead.

Contributing to this package

This is an individual package that's part of the Gutenberg project. The project is organized as a monorepo. It's made up of multiple self-contained software packages, each with a specific purpose. The packages in this monorepo are published to npm and used by WordPress as well as other software projects.

To find out more about contributing to this package or Gutenberg as a whole, please read the project's main contributor guide.