@zamanehmedia/gutenberg-block-parser

v2.1.1

Published

2 months ago

Parse WordPress blocks, enforce content schemas, reliably receive consistent content, and enjoy the full design flexibility of a truly headless CMS, all with WordPress.


article19

droo

wordpress wordpress-development wordpress-headless gutenberg gutenberg-blocks headless-cms headless-wordpress

Gutenberg Block Parser

✅ Fully parse WordPress blocks, including innerHTML content
✅ Parse custom blocks
✅ Enforce sanitation schemas
✅ Supports most core blocks
✅ TypeScript support

Turn WordPress block data from this:

<!-- wp:paragraph -->
<p>This is a paragraph</p>
<!-- /wp:paragraph -->

<!-- wp:image {"id":58,"sizeSlug":"large","linkDestination":"none"} -->
<figure class="wp-block-image size-large">
  <img
    src="/wp-content/uploads/2022/05/nice-flower-1337198953S0C-1024x678.jpg"
    alt="a photo of a purple flower"
    class="wp-image-58"
  />
  <figcaption class="wp-element-caption">image caption</figcaption>
</figure>
<!-- /wp:image -->

Into this:

[
  {
    "__typeName": "BlockCoreParagraph",
    "attrs": {},
    "blockName": "core/paragraph",
    "content": "This is a paragraph",
    "innerHTML": "<p>This is a paragraph</p>",
    "innerContent": ["<p>This is a paragraph</p>"]
  },
  {
    "__typeName": "BlockCoreImage",
    "attrs": {
      "id": 58,
      "linkDestination": "none",
      "sizeSlug": "large"
    },
    "blockName": "core/image",
    "src": "/wp-content/uploads/2022/05/nice-flower-1337198953S0C-1024x678.jpg",
    "alt": "a photo of a purple flower",
    "caption": "image caption",
    "className": ["wp-image-58"],
    "innerHTML": "<figure class=\"wp-block-image size-large\"><img src=\"/wp-content/uploads/2022/05/nice-flower-1337198953S0C-1024x678.jpg\" alt=\"a photo of a purple flower\" class=\"wp-image-58\" /><figcaption class=\"wp-element-caption\">image caption</figcaption></figure>",
    "innerContent": [
      "<figure class=\"wp-block-image size-large\"><img src=\"/wp-content/uploads/2022/05/nice-flower-1337198953S0C-1024x678.jpg\" alt=\"a photo of a purple flower\" class=\"wp-image-58\" /><figcaption class=\"wp-element-caption\">image caption</figcaption></figure>"
    ]
  }
]

So you can use the data like this:

{#each data.post.blocks as block}
  {#if block.blockName === 'core/image'}
    <!-- In addition to the attributes returned by the Block Serialization Default Parser,
         you also have access to the parsed content of the core/image block. -->
    <figure>
      <img src={block.src} alt={block.alt} />
      {#if block.caption}
        <figcaption>{block.caption}</figcaption>
      {/if}
    </figure>
  {/if}

  {#if block.blockName === 'core/paragraph'}
    <!-- For text properties, the HTML sanitation allows the default WP inline elements -->
    <p>{@html block.content}</p>
  {/if}
{/each}

Note: If the parsing of the block failed, the properties will be undefined.
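
Because a failed parse leaves the parsed properties undefined, a renderer may want to check for them before use. The following is a minimal TypeScript sketch, assuming a locally declared `ImageBlockLike` type that mirrors the example output above (it is not the package's exported type) and two hypothetical helpers, `hasParsedImage` and `renderImage`:

```typescript
// Local stand-in for the parsed core/image block shown above.
// The package exports its own types; this shape is an assumption for illustration.
type ImageBlockLike = {
  blockName: "core/image";
  innerHTML: string;
  src?: string;
  alt?: string;
  caption?: string;
};

// Type guard: true only when the parser managed to extract a non-empty src.
function hasParsedImage(
  block: ImageBlockLike
): block is ImageBlockLike & { src: string } {
  return typeof block.src === "string" && block.src.length > 0;
}

// Render from parsed fields when available, otherwise fall back to the raw innerHTML.
// Values are interpolated without escaping to keep the sketch short.
function renderImage(block: ImageBlockLike): string {
  if (!hasParsedImage(block)) {
    return block.innerHTML;
  }
  const caption = block.caption
    ? `<figcaption>${block.caption}</figcaption>`
    : "";
  return `<figure><img src="${block.src}" alt="${block.alt ?? ""}" />${caption}</figure>`;
}
```

Whether to fall back to `innerHTML`, skip the block, or throw is a design choice for your own front end; the `failOn*` options described further down control how the parser itself reports failures.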

Installation

npm install @zamanehmedia/gutenberg-block-parser

WordPress compatibility

The current version has been tested with WordPress 6.2. If some blocks do not parse successfully with older versions of WordPress, consider writing a custom parser. We will attempt to keep the package up to date with any changes to the WordPress block formats. Please submit issues on GitHub.

Basic Usage

With the WP REST API

import { parsePost } from "@zamanehmedia/gutenberg-block-parser";

// wpRestAuthenticatedFetch is your own helper; it is not part of this package.
const post = await wpRestAuthenticatedFetch(
  `/wp-json/wp/v2/posts/${params.id}?context=edit`
).then((rawPost) => {
  return parsePost(rawPost);
});
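
The `wpRestAuthenticatedFetch` helper in the snippet above is not part of this package. One possible sketch, assuming WordPress Application Passwords sent over HTTP Basic auth, is shown below; the base URL, user name, and password are placeholders, and `buildBasicAuthHeader` is a hypothetical name:

```typescript
// Hypothetical helper; not exported by the package.
const WP_BASE_URL = "https://example.com"; // placeholder site URL
const WP_USER = "editor"; // placeholder user name
const WP_APP_PASSWORD = "xxxx xxxx xxxx xxxx"; // placeholder application password

// Build the HTTP Basic Authorization header value.
function buildBasicAuthHeader(user: string, password: string): string {
  return `Basic ${btoa(`${user}:${password}`)}`;
}

// Fetch a REST path relative to the site and return the decoded JSON body.
async function wpRestAuthenticatedFetch<T = unknown>(path: string): Promise<T> {
  const response = await fetch(new URL(path, WP_BASE_URL), {
    headers: { Authorization: buildBasicAuthHeader(WP_USER, WP_APP_PASSWORD) },
  });
  if (!response.ok) {
    throw new Error(`WordPress request failed: ${response.status}`);
  }
  return (await response.json()) as T;
}
```

The request is authenticated because the REST API only returns `content.raw`, which holds the block comments, when the post is requested with `context=edit`.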

With WPGraphQL

import { parseRawContent } from "@zamanehmedia/gutenberg-block-parser";

const query = `
  query fetchPost($id:ID!) {
    post(id: $id, idType: DATABASE_ID) {
      id
      databaseId
      title
      content(format: RAW)
    }
  }
`;
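// wpGraphqlAuthenticatedFetch is your own helper; it is not part of this package.
// The callback must be async because it awaits parseRawContent.
const post = await wpGraphqlAuthenticatedFetch(query, { id: params.id }).then(
  async (rawPost) => {
    return {
      ...rawPost,
      blocks: await parseRawContent(rawPost.content),
    };
  }
);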

API

All functions are asynchronous to allow for async operations in custom parsers.

| Function | Description |
| --- | --- |
| parsePosts(posts: WP_REST_API_Post[], options?: Options) | Takes an array of WP Posts in the WP REST Post format, returns array with blocks added to each post |
| parsePost(post: WP_REST_API_Post, options?: Options) | Takes a WP Post in the WP REST Post format, returns object with added blocks |
| parseRawContent(rawContent: string, options?: Options) | Takes a string with WP block data and returns an array with parsed content |
| parseBlocks(blocks: ParsedBlock[], options?: Options) | For advanced usage: Takes array from WP's Block Serialization Default Parser and runs it through the innerHTML parsers. |
| parseBlock(block: ParsedBlock, options?: Options) | For advanced usage: parseBlocks, but for a single block. |

Options

| Option | Default | Description |
| --- | --- | --- |
| parsers | undefined | An object with custom parsers to use. See more on Usage with Custom Blocks. |
| sanitationSchema | undefined | An object with a custom sanitation schema to use. See more on Usage with Custom Sanitation Schemas. |
| stripNullishBlocks | true | WordPress passes along empty blocks that only contain a couple of \n. We strip them off by default, but here you can specify you want to keep them. |
| failOnMissingParser | false | By default we log an error if we can't find a parser. Pass true to throw an error instead. |
| failOnParserError | false | By default we log an error if something goes wrong in the parser. Pass true to throw an error instead. |
| failOnMissingBlockName | false | By default we log an error if a blockName is missing. Pass true to throw an error instead. |
| failOnAnyError | false | Pass true to throw errors on all scenarios above. |
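
The `failOn*` options lend themselves to failing loudly during development while staying tolerant in production. The sketch below assumes a locally declared `ParserFailureOptions` type that mirrors the option names from the table above (the package exports its own `Options` type) and a hypothetical `buildParserOptions` helper:

```typescript
// Mirrors the option names documented above; declared locally for illustration only.
type ParserFailureOptions = {
  stripNullishBlocks?: boolean;
  failOnMissingParser?: boolean;
  failOnParserError?: boolean;
  failOnMissingBlockName?: boolean;
  failOnAnyError?: boolean;
};

// Hypothetical helper: throw on every problem in development, only log in production.
function buildParserOptions(isProduction: boolean): ParserFailureOptions {
  return { failOnAnyError: !isProduction };
}

// The result would be passed as the second argument, e.g. parsePost(post, developmentOptions).
const developmentOptions = buildParserOptions(false);
```

Throwing in development surfaces missing parsers early, while logging in production keeps one malformed block from breaking a whole page.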

Usage with Custom Parsers

Custom Blocks

If you are using custom blocks you can pass along a custom parser to extract any content from the template.

We recommend using hast and its util packages for this. See any of the core parsers for an example on how to go about this.

To get you started, here is a basic vanilla JS example:

import { fromHtml } from "hast-util-from-html";
import { select } from "hast-util-select";
import { toString } from "hast-util-to-string";

/*
  @param {ParsedBlock} parsedBlock - result from the WP block serialization parser with the block comments parsed, including any attributes
  @param {Options} options - options passed down from the top level API methods
*/
export async function parseCustomCallToAction(parsedBlock, options) {
  // parse the block's innerHTML into a hast tree
  const root = fromHtml(parsedBlock.innerHTML);
  const heading = select("h3", root);
  const link = select("a", root);

  return {
    ...parsedBlock,
    heading: toString(heading),
    href: link.properties.href,
    label: toString(link),
  };
}

import { parsePost } from "@zamanehmedia/gutenberg-block-parser";
import { parseCustomCallToAction } from "custom-parsers/call-to-action/call-to-action";

const rawContentWithCustomBlock = `
  <!-- wp:custom/call-to-action -->
  <h3>Do something nice</h3>
  <a href="https://example.com">Go to this page</a>
  <!-- /wp:custom/call-to-action -->
`;

const post = {
  id: 1,
  content: {
    raw: rawContentWithCustomBlock,
  },
};

const parsers = {
  "custom/call-to-action": parseCustomCallToAction,
};

const postWithParsedBlocks = await parsePost(post, { parsers });

Override Core Parsers

You can also override or extend core parsers. In the example below we are pulling media data from our cache for the image after running the block through the default core parser.

import { parseCoreImage } from "@zamanehmedia/gutenberg-block-parser";

export async function parseCoreImageWithMediaData(parsedBlock, options) {
  // run the block through the package core parser so we don't have to repeat the HTML parsing
  const parsedCoreBlock = await parseCoreImage(parsedBlock);

  return {
    ...parsedCoreBlock,
    mediaData: await loadMediaDataForImageId(parsedBlock.attrs.id),
  };
}
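
The same pattern can be generalized into a small higher-order function that runs any parser first and then merges extra fields into its result. This is only a sketch with locally declared types; `extendParser`, `BlockLike`, and the stub parser in the usage lines are hypothetical and not part of the package:

```typescript
// Minimal local stand-ins for the package's block and parser types.
type BlockLike = { blockName: string; attrs: Record<string, unknown> };
type Parser<In extends BlockLike, Out extends In> = (block: In) => Promise<Out>;

// Wrap a parser so that extra, asynchronously loaded fields are merged into its result.
function extendParser<In extends BlockLike, Out extends In, Extra extends object>(
  baseParser: Parser<In, Out>,
  loadExtra: (block: Out) => Promise<Extra>
): (block: In) => Promise<Out & Extra> {
  return async (block) => {
    const parsed = await baseParser(block);
    const extra = await loadExtra(parsed);
    return { ...parsed, ...extra };
  };
}

// Usage with a stub standing in for a core parser such as parseCoreImage.
const stubImageParser = async (block: BlockLike) => ({ ...block, src: "/a.jpg" });
const parseImageWithMedia = extendParser(stubImageParser, async (parsed) => ({
  mediaData: { id: parsed.attrs.id },
}));
```

The wrapped function takes a single block argument here; a real parser passed in the `parsers` option also receives `options` as its second argument, so a production version would forward it.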

Usage with Custom Sanitation Schemas

By default the core parsers strip out any HTML that is not a default WordPress inline element. You can find the defaults here.

You can pass a custom schema in the options on a global or block level. The schema API can be found in the hast-util-sanitize documentation.

const sanitationSchema = {
  // for all parsers, allow only em, strong and a tags
  global: {
    tagNames: ["em", "strong", "a"],
  },
  // make exceptions on a block level by passing a schema per sanitized property
  "core/pullquote": {
    content: {
      tagNames: ["em", "strong"],
    },
    citation: {
      tagNames: ["a"],
    },
  },
};

// parsePosts returns the posts with a blocks array added to each one
const postsWithParsedBlocks = await parsePosts(posts, { sanitationSchema });

Usage with TypeScript

Type narrowing

We add the __typeName property to each block in order to enable type narrowing in your block loop.

E.g., in SvelteKit:

{#if block.__typeName === 'BlockCoreParagraph'}
	<Paragraph {block} />
{/if}

{#if block.__typeName === 'BlockCoreImage'}
	<Image {block} />
{/if}
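
Outside of templates, the same narrowing works with a `switch` on `__typeName`. The sketch below uses two locally declared block types that mirror the example output near the top of this README; they stand in for the package's exported `Block` union, which covers many more block types, and `describeBlock` is a hypothetical function:

```typescript
// Local stand-ins for two members of the package's Block union.
type BlockCoreParagraph = {
  __typeName: "BlockCoreParagraph";
  blockName: "core/paragraph";
  content?: string;
};

type BlockCoreImage = {
  __typeName: "BlockCoreImage";
  blockName: "core/image";
  src?: string;
  alt?: string;
};

type ExampleBlock = BlockCoreParagraph | BlockCoreImage;

// Each case only sees the properties of the matching block type.
function describeBlock(block: ExampleBlock): string {
  switch (block.__typeName) {
    case "BlockCoreParagraph":
      return `paragraph: ${block.content ?? ""}`;
    case "BlockCoreImage":
      return `image: ${block.src ?? ""}`;
    default: {
      // Compile-time exhaustiveness check: stops type-checking if a union member is unhandled.
      const unhandled: never = block;
      return unhandled;
    }
  }
}
```

The `never` assignment in the default branch means that adding a new member to the union without handling it produces a type error instead of a silently skipped block.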

Custom Parsers

In order to type any custom blocks, create a new union with the Block type, and pass it as a generic into the API methods. The type narrowing should now work with your custom block typing. Make sure you add the __typeName property to your custom block type.

import {
  type GutenbergParsedBlock,
  type Block,
  type BaseBlock,
  parsePosts,
} from "@zamanehmedia/gutenberg-block-parser";

type BlockCustomAd = BaseBlock & {
  __typeName: "BlockCustomAd";
  blockName: "custom/ad";
};

type CustomBlock = Block | BlockCustomAd;

const parsers = {
  // an async function must declare a Promise as its return type
  "custom/ad": async (block: GutenbergParsedBlock): Promise<BlockCustomAd> => {
    return {
      ...block,
      __typeName: "BlockCustomAd",
      blockName: "custom/ad",
      innerBlocks: [],
      innerContent: [],
    };
  },
};

const postsWithParsedBlocks = await parsePosts<CustomBlock>(posts, {
  parsers,
});

Supported Core Blocks

| Name | __typeName | Supported? |
| --- | --- | --- |
| Classic | | 🚫 Not formatted as Gutenberg block |
| Text | | |
| Code | BlockCoreCode | ✅ |
| Heading | BlockCoreHeading | ✅ |
| List | BlockCoreList | ✅ |
| Paragraph | BlockCoreParagraph | ✅ |
| Preformatted | BlockCorePreformatted | ✅ |
| Pullquote | BlockCorePullquote | ✅ |
| Quote | BlockCoreQuote | ✅ |
| Verse | BlockCoreVerse | ✅ |
| Media | | |
| Audio | BlockCoreAudio | ✅ |
| Cover | BlockCoreCover | ✅ |
| File | BlockCoreFile | ✅ |
| Gallery | BlockCoreGallery | ✅ |
| Image | BlockCoreImage | ✅ |
| Media & Text | BlockCoreMediaText | ✅ |
| Video | BlockCoreVideo | ✅ |
| Design | | |
| Buttons & Button | BlockCoreButtons & BlockCoreButton | ✅ |
| Columns & Column | BlockCoreColumns & BlockCoreColumn | ✅ |
| Group | BlockCoreGroup | ✅ |
| More | BlockCoreMore | ✅ |
| Page Break (nextpage) | BlockCorePageBreak | ✅ |
| Separator | BlockCoreSeparator | ✅ |
| Spacer | BlockCoreSpacer | ✅ |
| Row | | ✅ Through group block |
| Stack | | ✅ Through group block |
| Widgets | | |
| Archives | BlockCoreArchives | ✅ |
| Calendar | BlockCoreCalendar | ✅ |
| Categories (post-terms) | BlockCorePostTerms | ✅ |
| Custom HTML | BlockCoreHtml | ✅ |
| Latest Comments | BlockCoreLatestComments | ✅ |
| Latest Posts | BlockCoreLatestPosts | ✅ |
| Page List | BlockCorePageList | ✅ |
| RSS | BlockCoreRSS | ✅ |
| Search | BlockCoreSearch | ✅ |
| Shortcode* | BlockCoreShortcode | ✅ |
| Social Links & Social Link | BlockCoreSocialLinks & BlockCoreSocialLink | ✅ |
| Tag Cloud | BlockCoreTagCloud | ✅ |
| Theme | | |
| We don't support any of the theme blocks | | |
| Embeds | | |
| Embed | BlockCoreEmbed | ✅ |
| Amazon Kindle | | ✅ Through embed block |
| Animoto | | ✅ Through embed block |
| Cloudup | | ✅ Through embed block |
| Crowdsignal | | ✅ Through embed block |
| Dailymotion | | ✅ Through embed block |
| Flickr | | ✅ Through embed block |
| Imgur | | ✅ Through embed block |
| Issuu | | ✅ Through embed block |
| Kickstarter | | ✅ Through embed block |
| Mixcloud | | ✅ Through embed block |
| Pinterest | | ✅ Through embed block |
| Reddit | | ✅ Through embed block |
| ReverbNation | | ✅ Through embed block |
| Screencast | | ✅ Through embed block |
| Scribd | | ✅ Through embed block |
| Slideshare | | ✅ Through embed block |
| SmugMug | | ✅ Through embed block |
| Soundcloud | | ✅ Through embed block |
| Speaker Deck | | ✅ Through embed block |
| Spotify | | ✅ Through embed block |
| TED | | ✅ Through embed block |
| TikTok | | ✅ Through embed block |
| Tumblr | | ✅ Through embed block |
| Twitter | | ✅ Through embed block |
| VideoPress | | ✅ Through embed block |
| Vimeo | | ✅ Through embed block |
| Wolfram | | ✅ Through embed block |
| WordPress | | ✅ Through embed block |
| WordPress.tv | | ✅ Through embed block |
| YouTube | | ✅ Through embed block |

*Zamaneh Media advises against using shortcodes from scraped sites, as this would necessitate replicating WordPress's backend functionality. If the use of internal shortcodes is essential, only specific shortcodes from trusted sites should be allowlisted.