npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

destructure-html

v2.1.6

Published

destructure-html simplifies HTML element deconstruction and data extraction, making it effortless to extract desired information from complex HTML structures with its intuitive syntax and powerful features.

Downloads

16

Readme

The destructure-html is a lightweight package that simplifies HTML deconstruction and data extraction, making it easy to extract information & elements from complex HTML structures.

Install the package:

npm i destructure-html

This package

  • was created to extract relevant information seamlessly from scraped data
  • enables destructuring data which is in the form of html
  • constructs data in any form from raw html

New features will be consistently updated and released on a regular basis.


🏃 Quick Start

CommonJS

// commonjs require statement
const dsh = require('destructure-html')


// scraped data from netflix
const htmlData = `
<div class="lolomoRow ltr-0" data-context="genre">42479280414AECBB...
<div class="lolomoRow ltr-0" data-context="continueWatching"><h2 class="rowHeader"...
<div class="lolomoRow  ltr-0" data-context="trendingNow"><h2 class="rowHeader ltr-0"...
`;


// This will return an array of src values which may containt images or other important data from the html content
const getHtmlText = dsh.grabSrcValues(htmlData)
console.log(getHtmlText);



// output: [
//  'https://occ-0-1947-2164.1.nflxso.net/dnm/api/v6/6gmvu2hxdfnQ55LZZjyzYR4kzGk/AAAABaJ71EC0meuaQJkcwU3H1IVx-9PSbCQ-1vzPySh7k3264YotnvQ9lQmPQP_S_cb95GRP9lUkJsTlkmGcIpqXspMai9q5C_2Mq-k.jpg?r=183',
  .
  .
  .
//  'https://occ-0-1947-2164.1.nflxso.net/dnm/api/v6/6gmvu2hxdfnQ55LZZjyzYR4kzGk/AAAABeo26eQTyK5t9xceCCE86N3JsqgZ2eCMMsHxyBzGx8UTvD8-aHTe6EAtYMbn5R4gfMWLRNbUhOZZljpBjZ8zTIiPJjt3L-3TWyKv-5fSvooKuS0sLg0v0oT9--ay1HFx3MU3.jpg?r=438' ]

ModernJS

// modernjs import statement
import { getContentByUniqueText } from 'destructure-html'


// scraped data from netflix
const exampleHtmlData = `
<div class="lolomoRow ltr-0" data-context="genre">42479280414AECBB...
<div class="lolomoRow ltr-0" data-context="continueWatching"><h2 class="rowHeader"...
<div class="lolomoRow  ltr-0" data-context="trendingNow"><h2 class="rowHeader ltr-0"...
`;


// This will return the whole html content from the starting of the tag with a unique text
// like an unique class or other attribute that only the div contains in the whole page
const htmlTag = getContentByUniqueText(html, "continueWatching")
console.log(htmlTag);



// output: <div class="lolomoRow ltr-0" data-context="continueWatching"><h2 
// class="title">Continue Watching for Aditya</div><div class="aro - row 
// - header more - visible"><div><di ... div></div></div></div></div>

CDN package

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">


    <!-- The package can be imported via CDN links as well -->
    <script src="https://unpkg.com/[email protected]/lib/es5/index.js"></script>


    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    I don't know what I'm doing with my life.
</body>
</html>

✂️ What to use When && When to use What

To establish a clearer relationship with the table below, let's consider the following example data.

Example data (htmlData)

const htmlData =

<div class="gray">
    <p>Some text</p>
    <div class="blue" id="blue-div">
      <div>More text</div>
      <a href="https://lorem-ipsum.com/browse/69">
        <img src="https://placeholder.com/first.png" alt="" />
      </a>
    </div>
    <div class="blue">
      <div>Some More text</div>
      <a href="https://lorem-ipsum.com/browse/420">
        <img src="https://placeholder.com/second.png" alt="" />
      </a>    
    </div>
</div>
<div>
    <p>Another paragraph</p>
</div>

| Functions | Parameter(s) | Parameter Example | Output | Takes | Returns | | --- | --- | --- | --- | --- | --- | | grabHrefValues() | html (string) | grabHrefValues(htmlData); | [ 'https://lorem-ipsum.com/browse/69', 'https://lorem-ipsum.com/browse/420' ] | Accepts the HTML data as a parameter. | Returns an array of all href values found in the provided input. | | grabSrcValues() | html (string) | grabSrcValues(htmlData); | [ 'https://placeholder.com/first.png', 'https://placeholder.com/second.png' ] | Accepts the HTML data as a parameter. | Returns an array of all src values found in the provided input. | | findNestedTexts() | html (string) | const htmlText = findNestedTexts(exampleHtmlData); | [ 'Some text', 'More text', 'Some More text', 'Another paragraph' ] | Accepts the complete HTML data as a parameter. | Returns an array containing all the text found at different locations within the HTML data. | | getContentById() | html (string), id (string) | const htmlContent = getContentById(exampleHtmlData, "blue-div"); | <div class="blue" id="blue-div"><div>More text</div><a href="https://lorem-ipsum.com/browse/69">![alt](https//placeholder.com/first.png)</a></div>| Important: The ID should be a unique identifier present only within this element. Accepts the HTML data and a unique ID as parameters. | Returns the entire HTML content of the specified element, including its tags and inner content, which can be used to extract text or other relevant data later. | | findTagById() | htmlData,uniqueId | findTagById(exampleHtmlData, "gray"); | <div id="gray"> | Accepts the HTML data and a unique text identifier present within the HTML tag as parameters. | Returns the complete opening tag of the HTML element matching the specified ID, without its content or closing tag. | | findTagByClass() | htmlData,className | const htmlTag = findTagByClass(exampleHtmlData, "blue"); | 2 | Accepts the HTML data and a class name used for styling as parameters. | If there is a single HTML tag with the provided class name, it returns a string containing the entire HTML tag similar to the findTagById() function. If there are multiple HTML tags with the same class, it returns the total count of occurrences. |
| getContentBetweenTags() | htmlData,openingTag | const htmlContent = getContentBetweenTags(exampleHtmlData, `<div class="gray">`); | <div class="gray"> <p>Some text</p> <div class="blue"> <div>More text</div> </div> <div class="blue"> <div>Some More text</div> </div></div> | Accepts the HTML data and the complete opening tag of a div element (obtained from either findTagById() or findTagByClass()) as parameters. | Returns all the HTML content starting from the specified opening tag, including all content within until the closing tag. |


🙌 Contributing

Contributions to destructure-html are welcome and encouraged! To contribute to the project, follow these steps:

  1. Fork the repository and clone it to your local machine.
  2. Set up your development environment.
  3. Make changes or add new features to the codebase.
  4. Write tests to ensure the code behaves as expected.
  5. Commit your changes and push them to your forked repository.
  6. Submit a pull request with a clear description of the changes you made and their purpose.
  7. Your pull request will be reviewed by the maintainers, and any necessary feedback will be provided.
  8. Once your changes pass the review process, they will be merged into the main repository.

By contributing to destructure-html, you help improve the package and make it more robust for everyone to use.


📲 Contact me

If you have any questions, feedback, or need support with destructure-html, you can reach out to me through the following channels:

GitHub Issues: https://github.com/adxxtya/destructure-html/issues

I am always ready to assist you and appreciate any feedback or suggestions you may have.