@mtcute/html-parser
v0.29.5
HTML entities parser for mtcute
NOTE: The html variant uses HTML-like whitespace collapsing, which is incompatible with Bot API HTML. Use thtml for Bot API-compatible whitespace handling. Please read Syntax below for a detailed explanation.
Features
- Supports all entities that Telegram supports
- Supports nested entities
- Proper newline/whitespace handling (just like in real HTML)
- Interpolation!
Usage
This package exports two tagged template functions: html and thtml.
html - HTML-like whitespace
Whitespace is collapsed just like in real HTML: newlines and consecutive spaces become a single space.
Use <br> for line breaks and &nbsp; for multiple spaces.
import { html } from '@mtcute/html-parser'
tg.sendText(
    'me',
    html`
        Hello, <b>me</b>! Updates from the feed:<br>
        ${await getUpdatesFromFeed()}
    `
)
// text: "Hello, me! Updates from the feed:\n..."
thtml - preserved whitespace
Whitespace (spaces and newlines) is kept as-is, Bot API style. Common leading indentation is automatically stripped (dedented), so it's safe to use in indented code.
import { thtml } from '@mtcute/html-parser'
tg.sendText(
    'me',
    thtml`
        Hello, <b>me</b>!
        Updates from the feed:
        ${await getUpdatesFromFeed()}
    `
)
// text: "Hello, me!\nUpdates from the feed:\n..."
Both functions also have .escape() and .unparse() static methods.
thtml.unparse() preserves whitespace in the output (no <br> / &nbsp; conversion).
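To picture the dedenting that thtml is described as performing, here is a minimal standalone sketch. The function name `dedent` is hypothetical and this is not the package's code. It assumes that "common leading indentation" means the smallest indent among non-blank lines, and that blank lines at the start and end of the template are dropped, which is what the expected text in the example above suggests.

```typescript
// Hypothetical helper: an illustration of dedenting, not the library's implementation.
function dedent(text: string): string {
    const lines = text.split('\n')

    // Indentation width of every line that contains something other than whitespace.
    const indents = lines
        .filter((line) => line.trim().length > 0)
        .map((line) => line.length - line.trimStart().length)
    const common = indents.length > 0 ? Math.min(...indents) : 0

    return lines
        // Whitespace-only lines become empty; other lines lose the common indent.
        .map((line) => (line.trim().length === 0 ? '' : line.slice(common)))
        .join('\n')
        // Assumption: blank lines at the very start and end are removed.
        .replace(/^\n+|\n+$/g, '')
}

const sample = '\n    Hello, me!\n    Updates:\n      item\n  '
console.log(JSON.stringify(dedent(sample)))
```

Indentation deeper than the common prefix (the two extra spaces before `item`) is kept, since only the shared part is stripped.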
Syntax
@mtcute/html-parser uses htmlparser2 under the hood, so the parser
supports nearly any HTML. However, since the text is still processed in a custom way for Telegram, the supported subset
of features is documented below:
Line breaks and spaces (html)
When using html, line breaks are not preserved; use <br> instead,
which makes the syntax very close to the one used when building web pages.
Multiple spaces and indents are collapsed (except in pre). When you do need multiple spaces, use &nbsp; instead.
When using thtml, whitespace is preserved as-is and no collapsing is performed.
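The collapsing rule for html can be illustrated with a small standalone sketch. `renderHtmlLikeWhitespace` is a hypothetical name, and the function works on a plain string instead of going through a real HTML parser. It ignores the `pre` exception, and it emits a regular space for each `&nbsp;`, which is an assumption about the output and not something the text above states.

```typescript
// Hypothetical illustration of HTML-like whitespace handling; not the library's code.
function renderHtmlLikeWhitespace(source: string): string {
    return source
        // Runs of spaces, tabs and newlines collapse into a single space.
        .replace(/[ \t\r\n]+/g, ' ')
        // <br> (or <br/>) becomes a line break; one adjacent collapsed space is dropped.
        .replace(/ ?<br\s*\/?> ?/gi, '\n')
        // Leading and trailing whitespace is removed before &nbsp; is expanded,
        // so explicitly requested spaces survive.
        .trim()
        // Assumption: each &nbsp; yields one ordinary space in the result.
        .replace(/&nbsp;/g, ' ')
}

const source = '\n  Hello,   world!<br>\n  a&nbsp;&nbsp;b\n'
console.log(JSON.stringify(renderHtmlLikeWhitespace(source)))
```

With thtml none of these steps would apply: the newlines and repeated spaces in the source would be kept as written.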
Inline entities
Inline entities are entities that are in-line with other text. We support these entities:
| Name | Code | Result (visual) |
| ---------------- | ---------------------------------------------------------------- | ---------------------------- |
| Bold | <b>text</b>, <strong>text</strong> | text |
| Italic | <i>text</i>, <em>text</em> | text |
| Underline | <u>text</u>, <ins>text</ins> | text |
| Strikethrough | <s>text</s>, <del>text</del>, <strike>text</strike> | ~~text~~ |
| Spoiler | <spoiler>text</spoiler>, <tg-spoiler>, <span class="tg-spoiler"> | N/A |
| Monospace (code) | <code>text</code> | text |
| Text link | <a href="https://google.com">Google</a> | Google |
| Text mention | <a href="tg://user?id=1234567">Name</a> | N/A |
| Custom emoji | <emoji id="12345">😄</emoji> (or <tg-emoji emoji-id="...">) | N/A |
| Date-time | <tg-time unix="1647531900" format="t">22:45</tg-time> | N/A |
Note: It is up to the client to look up user's input entity by ID for text mentions. In most cases, you can only use IDs of users that were seen by the client while using given storage.
Alternatively, you can explicitly provide the access hash like this:
<a href="tg://user?id=1234567&hash=abc">Name</a>, where abc is the user's access hash written as a hexadecimal integer. The order of the parameters matters, i.e. tg://user?hash=abc&id=1234567 will not be processed as expected.
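Because the parameter order matters, it can help to build these links in one place. The helper below is a hypothetical sketch (`buildMentionHref` is not part of the package). It always writes `id` first and `hash` second, and formats the hash in hexadecimal. How a negative access hash should be written is not covered above, so the sketch rejects negative values instead of guessing.

```typescript
// Hypothetical helper for building text-mention links; not exported by the package.
function buildMentionHref(userId: number, accessHash?: bigint): string {
    if (!Number.isSafeInteger(userId) || userId <= 0) {
        throw new RangeError('userId must be a positive integer')
    }
    if (accessHash === undefined) {
        // Without a hash, the client has to resolve the user from its storage.
        return `tg://user?id=${userId}`
    }
    if (accessHash < BigInt(0)) {
        // Assumption: the encoding of negative hashes is unspecified here.
        throw new RangeError('negative access hashes are not handled by this sketch')
    }
    // `id` must come before `hash`; the hash is written in hexadecimal.
    return `tg://user?id=${userId}&hash=${accessHash.toString(16)}`
}

console.log(buildMentionHref(1234567, BigInt(2748)))
```

The resulting string can be placed in the href attribute of an `<a>` tag as in the examples above.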
Date-time entities
Date-time entities display a unix timestamp formatted according to the user's locale.
Two tag syntaxes are supported:
<tg-time unix="1647531900" format="t">22:45</tg-time>
<time datetime="2022-03-17T22:45:00" format="t">22:45</time>
The format attribute is optional and must match r|w?[dD]?[tT]?:
| Char | Meaning |
|------|---------|
| r | Relative time (cannot combine with others) |
| w | Day of the week |
| d | Short date (e.g. "17.03.22") |
| D | Long date (e.g. "March 17, 2022") |
| t | Short time (e.g. "22:45") |
| T | Long time (e.g. "22:45:00") |
When omitted, the underlying text is displayed as-is, but the user can still see the date in their local format.
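The format pattern can be checked before building a tag. The sketch below is a standalone illustration with hypothetical names (`isValidTimeFormat`, `describeTimeFormat`). It anchors the pattern quoted above to the whole string, so the empty string also passes, consistent with the attribute being optional. Whether the parser itself validates in this way is not stated here.

```typescript
// The pattern from the text, anchored so the whole attribute value must match.
const TIME_FORMAT_PATTERN = /^(?:r|w?[dD]?[tT]?)$/

// Meanings copied from the table above.
const TIME_FORMAT_MEANINGS: Record<string, string> = {
    r: 'Relative time',
    w: 'Day of the week',
    d: 'Short date',
    D: 'Long date',
    t: 'Short time',
    T: 'Long time',
}

// Hypothetical helper: true when the value fits r|w?[dD]?[tT]?
function isValidTimeFormat(format: string): boolean {
    return TIME_FORMAT_PATTERN.test(format)
}

// Hypothetical helper: lists what each character of a valid format asks for.
function describeTimeFormat(format: string): string[] {
    if (!isValidTimeFormat(format)) {
        throw new Error(`invalid format: ${format}`)
    }
    return format.split('').map((char) => TIME_FORMAT_MEANINGS[char])
}

console.log(isValidTimeFormat('wDT'), isValidTimeFormat('rt'), isValidTimeFormat('tw'))
console.log(describeTimeFormat('wd'))
```

Note that the order is fixed by the pattern: the day of the week comes first, then at most one date character, then at most one time character.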
Block entities
The only block entities that Telegram supports are <pre> and <blockquote>, so these are the only block tags we support too.
<pre>
Optionally, the language for a <pre> block can be specified in two ways:
<!-- mtcute style -->
<pre language="typescript">export type Foo = 42</pre>
<!-- Bot API style -->
<pre><code class="language-typescript">export type Foo = 42</code></pre>
| Code | Result (visual) |
| --------------------------------------------------- | ------------------ |
| <pre>multiline\ntext</pre> | multiline<br>text |
| <pre language="javascript"> export default 42</pre> | export default 42 |
<blockquote>
<blockquote> can be "expandable", in which case clients will only render the first three lines of the blockquote,
and the rest will only be shown when the user clicks on the blockquote.
<blockquote expandable>
This is a blockquote that will be collapsed by default.<br/>
Lorem ipsum dolor sit amet, consectetur adipiscing elit.<br/>
Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.<br/>
This text is not shown until the blockquote is expanded.
</blockquote>
Nested and overlapped entities
HTML is a nested language, and so is this parser. It does support nested entities, but overlapped entities will not work as expected!
Overlapping entities are supported in unparse(), though.
| Code | Result (visual) |
|---------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|
| <b>Welcome back, <i>User</i>!</b> | Welcome back, User! |
| <b>bold <i>and</b> italic</i> | bold and italic<br>⚠️ word "italic" is not actually italic! |
| <b>bold <i>and</i></b><i> italic</i><br>⚠️ this is how unparse() handles overlapping entities | bold and italic |
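The last table row shows overlapping entities being rewritten as properly nested tags. One way to get that shape is sketched below. This is a standalone illustration and not the package's unparse() implementation. The `Entity` shape and the function name `unparseOverlapping` are hypothetical, and entities are assumed to lie fully inside the text, with offsets counted in UTF-16 code units.

```typescript
// Hypothetical, simplified entity: a tag name plus a range in the text.
interface Entity { tag: string; offset: number; length: number }

function escapeHtml(text: string): string {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
}

// Sketch: emit nested tags, closing and reopening inner tags where ranges overlap.
function unparseOverlapping(text: string, entities: Entity[]): string {
    const sorted = entities
        .filter((e) => e.length > 0 && e.offset >= 0 && e.offset + e.length <= text.length)
        // Earlier start first; for equal starts the longer entity becomes the outer tag.
        .sort((a, b) => a.offset - b.offset || b.length - a.length)
    const open: Entity[] = [] // stack of currently open tags
    let next = 0
    let out = ''

    for (let pos = 0; pos <= text.length; pos++) {
        // Close every entity that ends here.
        while (open.some((e) => e.offset + e.length === pos)) {
            const reopen: Entity[] = []
            for (;;) {
                const top = open.pop()!
                out += `</${top.tag}>`
                if (top.offset + top.length === pos) break
                reopen.push(top) // closed only to keep nesting valid
            }
            // Reopen the tags that were closed early, in their original order.
            for (const e of reopen.reverse()) {
                out += `<${e.tag}>`
                open.push(e)
            }
        }
        // Open every entity that starts here.
        while (next < sorted.length && sorted[next].offset === pos) {
            out += `<${sorted[next].tag}>`
            open.push(sorted[next])
            next++
        }
        if (pos < text.length) out += escapeHtml(text.charAt(pos))
    }
    return out
}

console.log(unparseOverlapping('bold and italic', [
    { tag: 'b', offset: 0, length: 8 },
    { tag: 'i', offset: 5, length: 10 },
]))
```

For the input above, the italic tag is closed together with the bold one and reopened right after it, which is the form shown in the table.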
Interpolation
Both html and thtml support interpolation as tagged template literals.
You can interpolate one of the following:
- string - will not be parsed, and will be appended to plain text as-is
  - In case you want the string to be parsed, use html as a simple function: html`... ${html('<b>bold</b>')} ...`
- number - will be converted to string and appended to plain text as-is
- TextWithEntities or MessageEntity - will add the text and its entities to the output. This is the type returned by html itself:
  const bold = html`<b>bold</b>`
  const text = html`Hello, ${bold}!`
- falsy value (i.e. null, undefined, false) - will be ignored
Note that because of interpolation, you almost never need to think about escaping anything, since the values are not even parsed as HTML, and are appended to the output as-is.
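The interpolation rules above can be sketched as a function that appends one value to an accumulated result. This is a simplified illustration and not the package's code: the `Entity` and `TextWithEntities` interfaces below are hypothetical stand-ins, and `appendValue` is a made-up name. Only null, undefined and false are treated as ignorable, since those are the values listed. How 0 or an empty string is treated is not stated, so the sketch simply appends them.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Entity { type: string; offset: number; length: number }
interface TextWithEntities { text: string; entities: Entity[] }
type Interpolated = string | number | TextWithEntities | null | undefined | false

// Sketch: append one interpolated value to the accumulated text and entities.
function appendValue(acc: TextWithEntities, value: Interpolated): void {
    // The listed falsy values contribute nothing.
    if (value === null || value === undefined || value === false) return

    // Strings and numbers are plain text: no parsing, so no escaping is needed.
    if (typeof value === 'string' || typeof value === 'number') {
        acc.text += String(value)
        return
    }

    // Already-parsed text: shift its entities by the length of the text so far.
    const base = acc.text.length
    acc.text += value.text
    for (const entity of value.entities) {
        acc.entities.push({ ...entity, offset: entity.offset + base })
    }
}

const acc: TextWithEntities = { text: 'Hello, ', entities: [] }
appendValue(acc, { text: 'bold', entities: [{ type: 'bold', offset: 0, length: 4 }] })
appendValue(acc, '<i>!</i>') // stays literal text, no italic entity is created
appendValue(acc, null)
console.log(acc)
```

The string `'<i>!</i>'` ends up in the text verbatim, which is why user-supplied strings can be interpolated without escaping.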
