wikitext
v0.0.4
Parse wikitext into an Abstract Syntax Tree
Wikitext
A zero-dependency wikitext parser for JavaScript.
Description
This library parses wikitext so you can extract values from it or manipulate its contents. It turns raw MediaWiki-style wikitext into a structured, machine-friendly representation that you can explore, transform, lint, or post-process however you want.
This project is actively evolving. The API is expected to remain stable, with work focusing mostly on edge cases and on support for more wikitext elements, but breaking changes may occur before 1.0.
Features
- Parses common wikitext constructs, like headings, links, and templates.
- Produces a clean, navigable AST.
- Zero dependencies.
- Written in TypeScript.
Design
The resulting AST is inspired by the HTMLElement structure. All elements except TextNodes can have children that hold their values; some structures, such as TemplateNodes, have multiple groups of children: one for the template name and another for its parameters.
All nodes implement their own toString method, which turns the AST back into wikitext. You can modify the children of any node and get the resulting wikitext back.
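To make the design above concrete, here is a minimal sketch of an HTMLElement-like tree in which every node serializes itself back to wikitext. Only the idea of TextNodes and TemplateNodes comes from this README; the `WikiNode` interface, the `ParameterNode` class, all constructor fields, and the `serialize` helper are hypothetical and do not describe this library's actual classes.

```typescript
// Hypothetical node model illustrating the design; not this library's real classes.
interface WikiNode {
  toString(): string; // every node can turn itself back into wikitext
}

// Serializes a group of children in order.
function serialize(children: WikiNode[]): string {
  return children.map((child) => child.toString()).join("");
}

// Leaf node: plain text has no children.
class TextNode implements WikiNode {
  constructor(public value: string) {}
  toString(): string {
    return this.value;
  }
}

// A parameter keeps its name (empty for positional parameters) and its value as children.
class ParameterNode implements WikiNode {
  constructor(public name: WikiNode[], public value: WikiNode[]) {}
  toString(): string {
    const name = serialize(this.name);
    return name === "" ? `|${serialize(this.value)}` : `|${name}=${serialize(this.value)}`;
  }
}

// A template holds two groups of children: one for its name, one for its parameters.
class TemplateNode implements WikiNode {
  constructor(public name: WikiNode[], public params: ParameterNode[]) {}
  toString(): string {
    return `{{${serialize(this.name)}${serialize(this.params)}}}`;
  }
}

// Build "Cost: {{Price|200}}", then replace the parameter's children in place.
const price = new TemplateNode(
  [new TextNode("Price")],
  [new ParameterNode([], [new TextNode("200")])],
);
const page: WikiNode[] = [new TextNode("Cost: "), price];

price.params[0].value = [new TextNode("300")];
console.log(serialize(page)); // Cost: {{Price|300}}
```

Because each node only concatenates its own delimiters with its children's output, editing any child is enough to change the resulting wikitext. For the round trip to be exact, whitespace such as the spaces in `| name = Item Name` has to live inside the children.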
In its current version, this library does not support some complex elements that wikitext commonly allows, such as wikitables and HTML tags.
This project does not try to be a resilient parser and will throw an error when it finds wikitext it can’t understand.
Installation
npm install wikitext
yarn add wikitext
Usage
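import { parse } from 'wikitext';
function main() {
  // The template literal stays flush-left because it is the wikitext input.
  const source = `{{Item Infobox
| name = Item Name
| price = {{Price|200}}
}}
'''Item Name''' is an item in [[game]]. `;
  // Parse the wikitext into an AST (paranoid mode is on by default).
  const page = parse(source);
  // Look up the template whose name matches /price/i, i.e. {{Price|200}}.
  const price = page.findTemplate(/price/i);
  if (!price) return;
  // Replace the first positional parameter: 200 becomes 300.
  price.set(1, '300');
  // Stringifying the AST yields the updated wikitext.
  console.log(`${page}`);
}
main();
Running the previous code prints the following in the console: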
{{Item Infobox
| name = Item Name
| price = {{Price|300}}
}}
'''Item Name''' is an item in [[game]].
Paranoid mode
Because wikitext contains ambiguous tokens, parsing it is error-prone. For this reason, the parse function runs in “paranoid mode” by default: it stringifies the resulting AST, compares it with the input text, and throws an error if the two do not match. This check is unnecessary in most cases, and you can disable it by passing an options object to the function:
parse(input, { paranoid: false });
Some examples of input that can make the parser fail (this list is not exhaustive):
- Pages with unbalanced tokens, such as {{ or [[.
- Pages with parameters like {{{1}}}, which should not appear in regular pages but make this parser unsuitable for parsing template code.
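Paranoid mode's round-trip comparison, together with a rough pre-check for the failure cases listed above, might look like the sketch below. Both `roundTripParse` and `findProblems` are hypothetical helpers written for illustration: they are not exported by this library and do not reproduce how `parse` performs its check internally. `findProblems` only counts delimiters, so it ignores escaping mechanisms such as `<nowiki>` and comments.

```typescript
// Hypothetical sketch of a paranoid-style round-trip check; not the library's implementation.
function roundTripParse<T extends { toString(): string }>(
  input: string,
  parseFn: (source: string) => T,
): T {
  const ast = parseFn(input);
  // Stringify the result and compare it with the original input.
  if (ast.toString() !== input) {
    throw new Error("Round-trip mismatch: the stringified AST differs from the input");
  }
  return ast;
}

// Naive pre-check for the failure cases above: triple-brace parameters and unbalanced pairs.
function findProblems(text: string): string[] {
  const problems: string[] = [];
  if (text.indexOf("{{{") !== -1) {
    problems.push("contains a template parameter such as {{{1}}}");
  }
  const pairs: Array<[string, string]> = [
    ["{{", "}}"],
    ["[[", "]]"],
  ];
  for (const [open, close] of pairs) {
    // Count non-overlapping occurrences of each delimiter.
    const opens = text.split(open).length - 1;
    const closes = text.split(close).length - 1;
    if (opens !== closes) {
      problems.push(`unbalanced ${open} and ${close}: ${opens} open, ${closes} close`);
    }
  }
  return problems;
}

console.log(findProblems("{{Price|200}} in [[game]]")); // []
console.log(findProblems("{{Price|200 in [[game]]"));

// A deliberately lossy toy "parser" that drops trailing whitespace, so the check fails.
const lossyParse = (source: string) => ({ toString: () => source.replace(/\s+$/, "") });
try {
  roundTripParse("[[game]] ", lossyParse);
} catch (err) {
  console.log((err as Error).message);
}
```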
Use cases
- Format pages.
- Extract information from pages, e.g. collecting values in specific templates’ parameters.
- Update information in pages, e.g. templates’ parameters.
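For the updating use case, the calls shown in the Usage section (`findTemplate` with a regular expression, `set` with a positional index, and string conversion) can be wrapped in a small reusable helper. The `PageLike` and `TemplateLike` interfaces below are hypothetical structural types describing only those calls as they appear in the Usage example; they are not the library's published type definitions. The `FakePage` and `FakeTemplate` classes are likewise hypothetical in-memory stand-ins so the sketch runs without the package installed.

```typescript
// Hypothetical structural types covering only the calls used in the Usage example.
interface TemplateLike {
  set(index: number, value: string): void;
}
interface PageLike {
  findTemplate(pattern: RegExp): TemplateLike | undefined | null;
  toString(): string;
}

// Sets a positional parameter on the template returned by findTemplate and returns the
// updated wikitext, or undefined when no template matches.
function updateTemplateParam(
  page: PageLike,
  pattern: RegExp,
  index: number,
  value: string,
): string | undefined {
  const template = page.findTemplate(pattern);
  if (!template) return undefined;
  template.set(index, value);
  return `${page}`;
}

// Hypothetical stand-in for a template with positional parameters only.
class FakeTemplate implements TemplateLike {
  constructor(public name: string, public positional: string[]) {}
  set(index: number, value: string): void {
    this.positional[index - 1] = value; // positional parameters are 1-based in wikitext
  }
  toString(): string {
    return `{{${[this.name, ...this.positional].join("|")}}}`;
  }
}

// Hypothetical stand-in for a parsed page made of templates only.
class FakePage implements PageLike {
  constructor(public templates: FakeTemplate[]) {}
  findTemplate(pattern: RegExp): FakeTemplate | undefined {
    for (const template of this.templates) {
      if (pattern.test(template.name)) return template;
    }
    return undefined;
  }
  toString(): string {
    return this.templates.join("\n");
  }
}

const page = new FakePage([new FakeTemplate("Price", ["200"])]);
console.log(updateTemplateParam(page, /price/i, 1, "300")); // {{Price|300}}
console.log(updateTemplateParam(page, /weight/i, 1, "5")); // undefined
```

Typing the helper against a structural interface keeps it independent of how the page was produced, which also makes it easy to test with a stand-in like the one above.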
