clean-html-js
v1.3.16
extract reading material from a url
Clean HTML content for reading. Pass in your content as HTML and get back a readability object.
Installation Instructions
$ yarn add clean-html-js

Example

import cleanHtml from "clean-html-js";

const url = "https://www.a11ywatch.com";

// Fetch the page yourself, then pass the html and its source url.
async function grabReaderData() {
  const source = await fetch(url);
  const html = await source.text();
  // Return the article so callers receive it instead of undefined.
  return cleanHtml(html, url);
}

// Pass an empty html string and let cleanHtml fetch the page from the url.
async function grabReaderDataSimple() {
  return cleanHtml("", url);
}

grabReaderData().then((data) => {
  console.log(data);
});

// or just the url
grabReaderDataSimple().then((data) => {
  console.log(data);
});

- For more help getting started, check out Example.
Available Params
| param     | default | type   | description                                                          |
| --------- | ------- | ------ | -------------------------------------------------------------------- |
| html      | ""      | string | Required: html string to parse                                       |
| sourceUrl | ""      | string | Optional: url of the html source to prevent fetching extra resources |
| config    | {}      | Config | Optional: config object                                              |
If html is empty and sourceUrl is provided, an attempt is made to fetch the html from sourceUrl.
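The fallback rule above can be sketched as a small helper. This is only an illustration of the documented behavior, not the library's implementation: `resolveHtml`, `fetchText`, and `fakeFetch` are hypothetical names, and the fetcher is injected so the sketch runs without network access.

```javascript
// Hypothetical helper, not part of clean-html-js: picks the html to parse.
// `fetchText` is an injected function (url -> Promise<string>).
async function resolveHtml(html, sourceUrl, fetchText) {
  if (html) return html; // html provided: use it as-is
  if (sourceUrl) return fetchText(sourceUrl); // html empty: fetch from sourceUrl
  return ""; // neither provided: nothing to parse
}

// Stub fetcher standing in for a real HTTP request.
const fakeFetch = async (url) =>
  `<html><body>fetched from ${url}</body></html>`;

resolveHtml("", "https://www.a11ywatch.com", fakeFetch).then((html) => {
  console.log(html);
});
```

In real use the caller does not need such a helper; passing an empty string and a url to cleanHtml, as in the Example, relies on the package's own fallback.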
Config
The props you pass are merged with the existing config.
| prop        | default | type             | description                                       |
| ----------- | ------- | ---------------- | ------------------------------------------------- |
| allowedTags | null    | array of strings | html elements allowed note:(svgs must be inlined) |
| nonTextTags | null    | array of strings | html elements that should not be treated as text  |
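A minimal sketch of what a config object might look like and how a merge with defaults could behave. The merge shown here (a shallow object spread) is an assumption for illustration, and `defaultConfig` and `mergeConfig` are hypothetical names; the package may merge differently.

```javascript
// Hypothetical defaults mirroring the Config table (both props default to null).
const defaultConfig = { allowedTags: null, nonTextTags: null };

// Assumed merge strategy: supplied props override defaults, omitted props keep them.
function mergeConfig(userConfig = {}) {
  return { ...defaultConfig, ...userConfig };
}

const config = mergeConfig({ allowedTags: ["p", "h1", "h2", "img"] });
console.log(config);

// Per the params table, the config object is the third argument:
// cleanHtml(html, sourceUrl, config)
```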
Testing
To test custom pages, pass your params separated by commas to the jest test, for example yarn jest '-params=mozilla,https://www.mozilla.com' or yarn jest '-params=a11ywatch,https://www.a11ywatch.com'. The first param is the name of the html file pulled from the examples folder, and the second is an optional uri for the resources.
npm test
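
To make the `-params` format concrete, here is a sketch of how such a flag could be split into its two parts. `parseParams` is a hypothetical function written for illustration; it is not taken from the project's test code.

```javascript
// Hypothetical parser: splits a `-params=` flag into the example file name
// and the optional resource uri (empty string when the uri is omitted).
function parseParams(arg) {
  const prefix = "-params=";
  if (!arg.startsWith(prefix)) return null;
  const [name, uri = ""] = arg.slice(prefix.length).split(",");
  return { name, uri };
}

console.log(parseParams("-params=mozilla,https://www.mozilla.com"));
console.log(parseParams("-params=a11ywatch"));
```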
