@atlaskit/editor-bitbucket-transformer
v9.6.4
Editor Bitbucket transformer
Editor Bitbucket transformer for converting between ADF, Markdown, and HTML formats.
Description
This package provides transformation utilities specifically designed for Bitbucket's editor integration. It handles the complex pipeline of converting between ADF (Atlassian Document Format), Markdown, and HTML, with special focus on caption escaping and attribute preservation through the transformation process.
Key Features
Transformers
- Serializer: Main serialization utilities for ADF to Markdown conversion
- Table Serializer: Specialized table transformation handling
- Utility Functions: Common transformation utilities
Caption Escaping Pipeline
- HTML Attribute Escaping: Safely escapes captions for Markdown storage
- Unified Escaping Strategy: Handles both HTML meta characters and Markdown punctuation
- Round-trip Safety: Ensures data integrity through ADF → Markdown → HTML → ADF pipeline
Examples
The package includes comprehensive examples in the examples/ directory:
- Basic transformer example
- Bitbucket HTML handling
- Bitbucket Markdown processing
- Helper utilities and styling
Team
Editor: Collaboration
Caption escaping and the Markdown/HTML/ADF pipeline
This package serializes ADF to Markdown for storage, and later reconstructs ADF from HTML that is rendered by the backend using python-markdown. Image captions are stored in Markdown using python-markdown’s attr_list syntax on the image:
{: data-layout='center' data-caption='...'}
Why escaping captions is tricky
- python-markdown parses Markdown first, then applies attr_list to bind attributes to elements. If caption text inside data-caption contains Markdown markers (e.g. **, __, `, ~~), those markers can be interpreted during Markdown parsing.
- When Markdown markers are interpreted within the attribute text, the attribute list can be malformed or ejected, so the attributes are never set on the element (e.g. data-caption is lost).
Unified attribute escaping
To ensure captions survive the Markdown → HTML transformation safely, we use a unified escaping strategy:
escapeHtmlAttribute(text): Escapes both HTML attribute meta characters (&, <, >, ", ') and Markdown/attr_list-sensitive punctuation by converting each character into its numeric entity. The punctuation covered is *, _, `, ~, |, {, }, [, ], (, ), and !.
This prevents python-markdown from interpreting the caption content as Markdown or as part of an attr_list, preserving the attribute safely.
unescapeHtmlAttribute(text): Decodes the same superset of entities when reading back from HTML into ADF. Decoding is order-sensitive: the ampersand entity (&amp;) is decoded last to prevent double-unescape issues. After unescaping, we sanitize and parse a limited subset of Markdown formatting for display (**, _, ~~, and `) into safe HTML tags we control.
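The escape/unescape pair described above can be sketched roughly as follows. This is an illustration only, not the package's source: the function names carry a Sketch suffix to mark them as hypothetical, and the use of decimal numeric entities (for example &#42; for *) is an assumption.

```typescript
// Hypothetical sketch; the package's real implementation may differ.
// Assumption: Markdown punctuation maps to decimal numeric entities.
const ESCAPES: ReadonlyArray<readonly [string, string]> = [
  ['&', '&amp;'], // encoded first so the entities added below are not re-escaped
  ['<', '&lt;'],
  ['>', '&gt;'],
  ['"', '&quot;'],
  ["'", '&#39;'],
  ['*', '&#42;'],
  ['_', '&#95;'],
  ['`', '&#96;'],
  ['~', '&#126;'],
  ['|', '&#124;'],
  ['{', '&#123;'],
  ['}', '&#125;'],
  ['[', '&#91;'],
  [']', '&#93;'],
  ['(', '&#40;'],
  [')', '&#41;'],
  ['!', '&#33;'],
];

function escapeHtmlAttributeSketch(text: string): string {
  // split/join replaces every occurrence without needing regex escaping.
  return ESCAPES.reduce((acc, [ch, entity]) => acc.split(ch).join(entity), text);
}

function unescapeHtmlAttributeSketch(text: string): string {
  // Walk the table in reverse so that '&amp;' is decoded last; decoding it
  // first would turn a literal "&amp;lt;" into "&lt;" and then into "<".
  return ESCAPES.slice()
    .reverse()
    .reduce((acc, [ch, entity]) => acc.split(entity).join(ch), text);
}
```

With this ordering, a caption that already contains entity-like text such as &lt; should survive a round trip unchanged, which is the double-unescape case the ordering rule guards against.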
End-to-end flow
- ADF → Markdown (bbc-frontbucket, using this package):
  - Captions are serialized into data-caption using escapeHtmlAttribute to encode HTML meta characters and Markdown punctuation.
- Markdown → HTML (bbc-core):
  - python-markdown takes the Markdown, renders HTML, and applies attr_list.
  - Because punctuation was entity-encoded, the attributes remain intact even if the caption contains Markdown markers or {...}-like text.
- HTML → ADF (frontend):
  - We read data-caption from the HTML and call unescapeHtmlAttribute, which decodes both HTML and punctuation entities.
  - We then sanitize and parse a safe subset of Markdown to generate ADF caption nodes (e.g., <strong>, <em>, <s>, <code>).
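To make the first step of this flow concrete, here is a minimal sketch of how a caption might be written into the attr_list suffix. Both encodeCaption and buildCaptionAttrList are hypothetical names, and the encoder is a simplified stand-in for escapeHtmlAttribute that assumes every sensitive character can be written as a decimal numeric reference.

```typescript
// Hypothetical stand-in for escapeHtmlAttribute: encodes each sensitive
// character as a decimal numeric character reference (a simplification).
const SENSITIVE = /[&<>"'*_`~|{}[\]()!]/g;

function encodeCaption(text: string): string {
  return text.replace(SENSITIVE, (ch) => `&#${ch.charCodeAt(0)};`);
}

// Builds the attr_list suffix that follows the image in the stored Markdown.
function buildCaptionAttrList(layout: string, caption: string): string {
  return `{: data-layout='${layout}' data-caption='${encodeCaption(caption)}'}`;
}

const attrList = buildCaptionAttrList('center', "It's **bold** {really}");
// attrList:
// {: data-layout='center' data-caption='It&#39;s &#42;&#42;bold&#42;&#42; &#123;really&#125;'}
```

Because the quote, asterisks, and braces inside the caption are encoded, the only braces and single quotes left in the output are the ones that delimit the attr_list itself.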
Migration
escapeAttrListValue has been removed. Use escapeHtmlAttribute and unescapeHtmlAttribute instead.
Security considerations
- We always encode < and > in attributes and, when parsing back from HTML, we sanitize caption content before any Markdown formatting is converted to HTML.
- Our Markdown parsing is intentionally minimal and maps directly to a small set of safe tags.
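The "sanitize first, then format" ordering can be illustrated with a small sketch. The helper names are hypothetical and the regular expressions are deliberately naive (no nesting, and text inside a code span is not protected from the later rules); the package's actual parser may work quite differently.

```typescript
// Hypothetical sketch: neutralize HTML first, then map a tiny Markdown
// subset onto a fixed set of tags that the code itself emits.
function escapeHtmlText(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderCaptionSketch(caption: string): string {
  // Step 1: any markup typed by the user becomes inert text.
  const safe = escapeHtmlText(caption);
  // Step 2: only these four patterns ever produce tags.
  return safe
    .replace(/`([^`]+)`/g, '<code>$1</code>')
    .replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>')
    .replace(/~~([^~]+)~~/g, '<s>$1</s>')
    .replace(/_([^_]+)_/g, '<em>$1</em>');
}

const rendered = renderCaptionSketch('**Hi** <script>');
// rendered: <strong>Hi</strong> &lt;script&gt;
```

Since escaping runs before formatting, the only tags in the output are the ones this function inserts, which is the property the security notes above rely on.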
