craw

v1.0.0

Published

4 years ago

a website-crawler library for nodejs

sammwy

crawler web-scrap spider

CRAW

Documentation

A concise summary of the library's API.

Usage

const craw = require('craw');

async function start () {
  const result = await craw("https://2lstudios.dev/");
  console.log(result.toJSON());
}

start();
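
The call above does not handle failures. As a minimal sketch, assuming `craw` returns a promise that rejects when the request fails (the library does not document this), a wrapper can turn errors into a plain result object. `crawlSafely` is a hypothetical helper, not part of craw. It takes the crawl function as a parameter so it can be tried with a stub.

```javascript
// Hypothetical helper, not part of craw.
// Assumes the crawl function returns a promise that rejects on failure.
async function crawlSafely (crawFn, url) {
  try {
    const result = await crawFn(url);
    return { ok: true, data: result.toJSON() };
  } catch (error) {
    // Report the failure instead of letting the rejection propagate.
    return { ok: false, error: error.message };
  }
}

// Usage with the real library (assumed to be installed):
// const craw = require('craw');
// crawlSafely(craw, "https://2lstudios.dev/").then(console.log);
```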

result.getContent()

Get the text content of the website: the full text, the headings (h1 to h6) and a list of words.
Output:

{
  text: "....", // String
  h1: [], // Array
  h2: [], // Array
  h3: [], // Array
  h4: [], // Array
  h5: [], // Array
  h6: [], // Array
  words: [] // Array
}
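
One possible use of this output is a word-frequency count. The sketch below works on a made-up object in the shape shown above and assumes `words` holds plain strings. `countWords` is a hypothetical helper, not part of craw.

```javascript
// Sample object in the documented shape; the values are made up.
const content = {
  text: "Hello world hello",
  h1: ["Hello"],
  h2: [], h3: [], h4: [], h5: [], h6: [],
  words: ["Hello", "world", "hello"]
};

// Hypothetical helper: count how often each word appears, ignoring case.
// A Map is used so that words like "constructor" cannot clash with object properties.
function countWords (words) {
  const counts = new Map();
  for (const word of words) {
    const key = word.toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const counts = countWords(content.words);
console.log(counts.get("hello")); // 2
```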

result.getFrames()

Get a list of the iframes on the website.
Output:

[...]  // Array

result.getImports()

Get the resources imported by the website, such as CSS, JS and the favicon.
Output:

{
  scripts: [ // Array
    {
      integrity: "...", // String
      src: "...", // String
      async: ... // Boolean
    }
  ],

  styles: [ // Array
    {
      integrity: "...", // String
      href: "...", // String
      rel: "..." // String
    }
  ],
  
  favicon: {
    type: "...", // String
    href: "..." // String 
  }
}
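
As an illustration of reading this structure, the sketch below lists scripts that have no `integrity` value. It runs on made-up data in the shape shown above and assumes a missing attribute appears as an empty string or `undefined`, which the library does not document. `scriptsWithoutIntegrity` is a hypothetical helper.

```javascript
// Sample object in the documented shape; the values are made up.
const imports = {
  scripts: [
    { integrity: "sha384-abc", src: "https://cdn.example.com/lib.js", async: true },
    { integrity: "", src: "/app.js", async: false }
  ],
  styles: [],
  favicon: { type: "image/png", href: "/favicon.png" }
};

// Hypothetical helper: return the src of every script with no integrity value.
// Assumes a missing attribute shows up as "" or undefined.
function scriptsWithoutIntegrity (imports) {
  return imports.scripts
    .filter((script) => !script.integrity)
    .map((script) => script.src);
}

console.log(scriptsWithoutIntegrity(imports)); // [ '/app.js' ]
```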

result.getLinks()

Get a list of hyperlinks from the website.
Output:

[ // Array
  {
    url: "...", // String
    anchor: "...", // String
    rel: [ ... ] // Array of Strings
  }
]
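
A common next step is to separate external links from internal ones. The sketch below does this on made-up data in the shape shown above, using Node's built-in `URL` class. It assumes `url` may be relative, which the library does not document. `externalLinks` is a hypothetical helper.

```javascript
// Sample data in the documented shape; the values are made up.
const links = [
  { url: "/about", anchor: "About", rel: [] },
  { url: "https://github.com/", anchor: "GitHub", rel: ["nofollow"] },
  { url: "mailto:hi@example.com", anchor: "Mail", rel: [] }
];

// Hypothetical helper: keep only http(s) links that point to another host.
function externalLinks (links, baseUrl) {
  const baseHost = new URL(baseUrl).hostname;
  return links.filter((link) => {
    let target;
    try {
      target = new URL(link.url, baseUrl); // resolves relative URLs against the base
    } catch (error) {
      return false; // skip values that are not valid URLs
    }
    const isWeb = target.protocol === "http:" || target.protocol === "https:";
    return isWeb && target.hostname !== baseHost;
  });
}

const external = externalLinks(links, "https://2lstudios.dev/");
console.log(external.map((link) => link.anchor)); // [ 'GitHub' ]
```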

result.getMedia()

Get the multimedia elements of the website, such as images, audio and video.
Output:

{
  audios: [ // Array
    {
      src: "...", // String
      type: "..." // String
    }
  ],
  images: [ // Array
    {
      src: "...", // String
      alt: "...", // String
      loading: "..." // String
    }
  ],
  videos: [ ... ] // Array of strings
}
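
Note that `audios` and `images` hold objects while `videos` holds plain strings. The sketch below, run on made-up data in the shape shown above, flattens all three into one list of source URLs. `allMediaSources` is a hypothetical helper.

```javascript
// Sample object in the documented shape; the values are made up.
const media = {
  audios: [{ src: "/intro.mp3", type: "audio/mpeg" }],
  images: [{ src: "/logo.png", alt: "Logo", loading: "lazy" }],
  videos: ["/demo.mp4"]
};

// Hypothetical helper: collect every media source into a single array.
function allMediaSources (media) {
  return [
    ...media.audios.map((audio) => audio.src),
    ...media.images.map((image) => image.src),
    ...media.videos // already plain strings
  ];
}

console.log(allMediaSources(media)); // [ '/intro.mp3', '/logo.png', '/demo.mp4' ]
```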

result.getMeta()

Get the metadata tags of the website as a single object.
Output:

{
  author: "...", // String
  viewport: "...", // String
  robots: "...", // String
  description: "...", // String
  keywords: [], // Array of strings
  image: "...", // String (Favicon)
  charset: "...", // String
  // ...plus any other metadata tag, such as OG or Twitter tags
}
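
The extra tags can be picked out by key prefix. The sketch below assumes Open Graph and Twitter tags are stored under keys such as `og:title` and `twitter:card`. The library does not document its key format, so check real output before relying on this. `pickByPrefix` is a hypothetical helper.

```javascript
// Sample object loosely following the documented shape; the values are made up.
const meta = {
  author: "Jane Doe",
  description: "An example page",
  keywords: ["example", "demo"],
  charset: "utf-8",
  "og:title": "Example",     // assumed key format
  "twitter:card": "summary"  // assumed key format
};

// Hypothetical helper: return only the entries whose key starts with the prefix.
function pickByPrefix (meta, prefix) {
  const picked = {};
  for (const [key, value] of Object.entries(meta)) {
    if (key.startsWith(prefix)) {
      picked[key] = value;
    }
  }
  return picked;
}

console.log(pickByPrefix(meta, "og:")); // { 'og:title': 'Example' }
```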

result.getTitle()

Get the title of the website.
Output:

"..." // String

result.toJSON()

Run all of the functions above and combine their results into a single object.