bluemango-scraper

v1.0.0

Published

3 years ago

Scraper

This library is a helper for writing scripts that scrape values from a page.

var scraper = new Scraper({
  title: {
    type:'text',
    selector:'h1.the-title' // a jQuery selector that matches the first h1 with the class the-title
  }
})
return scraper.getResults() // {title:'hello world'}

The config

The config is an object in which every key stands for a field that will be returned in the results. The value of each config item can be a string, an object, or a function.

whitelistedDomain

Only allow the library to run when the current domain matches this value. The value should be a regular expression written as a string.
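As a rough illustration of such a domain check, the sketch below tests a hostname against a pattern supplied as a string. The helper name `isWhitelisted`, the plain pattern format (no surrounding slashes), and passing the hostname as an argument are all assumptions made for this example; the library's own check may work differently.

```javascript
// Hypothetical helper, not part of the library's API.
// pattern:  regular expression source given as a string
// hostname: the domain to check, e.g. window.location.hostname in a browser
function isWhitelisted(pattern, hostname) {
  return new RegExp(pattern).test(hostname)
}

console.log(isWhitelisted('(^|\\.)example\\.com$', 'shop.example.com')) // prints: true
console.log(isWhitelisted('(^|\\.)example\\.com$', 'example.org'))      // prints: false
```

Note the doubled backslashes: inside a string literal, `\\.` is needed to produce the regular expression `\.`.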

using an object

When a config item is an object, its type determines how the value is extracted. The available types are listed below.

text

Use text to extract text from a page

var scraper = new Scraper({
  title: {
    type:'text', // default value
    selector: '.title span',
    test: '/[0-9]*/g' // regular expression (as a string) that the result is tested against; this one matches digits
  }
})

url

var scraper = new Scraper({
  clickUrl: {
    type:'url', // will return the current page URL
    prefix: 'http://yourredirect.com/url=', // (optional) use when you want to prefix your URL
    query: {myparam:''} // (optional) returns the URL with only myparam appended
  }
})

image

var scraper = new Scraper({
  imageUrl: {
    type:'image', // will return the src of an image
    selector: 'img#myImage'
  }
})

regex

var scraper = new Scraper({
  title: {
    type:'regex',
    selector: '.title span',
    test: '/€([0-9]*)/g' // will return the first regex group
  }
})
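The `test` values in these examples are regular expressions written as strings, including the slashes and flags. The sketch below shows one way such a string could be converted to a `RegExp` and used to pull out the first capture group. The helper names `parseRegexString` and `firstGroup` are hypothetical and are not taken from the library.

```javascript
// Hypothetical helpers; the library's internals may differ.
// Converts a string such as '/€([0-9]*)/g' into a RegExp object.
function parseRegexString(str) {
  var match = /^\/(.*)\/([a-z]*)$/.exec(str)
  if (!match) {
    return new RegExp(str) // no slashes: treat the whole string as the pattern
  }
  return new RegExp(match[1], match[2])
}

// Returns the first capture group of the first match, or null when nothing matches.
function firstGroup(regexString, text) {
  var result = parseRegexString(regexString).exec(text)
  return result ? result[1] : null
}

console.log(firstGroup('/€([0-9]*)/g', 'Price: €25')) // prints: 25
```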

template

var scraper = new Scraper({
  title: {
    type:'template',
    template: 'hello {{name}}' // {{name}} is replaced with the value of the name field
  },
  name: {
    type: 'text',
    selector: '.profile .name'
  }
})
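To make the placeholder substitution concrete, here is a minimal sketch of filling `{{field}}` markers from an object of already scraped values. The function `renderTemplate` is a hypothetical stand-in, and replacing unknown placeholders with an empty string is an assumption of this example rather than documented behaviour.

```javascript
// Hypothetical helper; shows one way {{field}} placeholders could be filled in.
function renderTemplate(template, fields) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, function (whole, key) {
    // Assumption: unknown placeholders become an empty string.
    return fields[key] !== undefined ? String(fields[key]) : ''
  })
}

console.log(renderTemplate('hello {{name}}', { name: 'world' })) // prints: hello world
```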

dictionary

var scraper = new Scraper({
  custom1:{
    type:'dictionary',
    selector:'#yourdealCompareBlock > div > div > img',
    dictionary:{
      'VODAFONE': '/vodafone/g', // result is VODAFONE when vodafone is found in the selector text
      'TELFORT': '/telfort/g', // result is TELFORT when telfort is found in the selector text
      'T-MOBILE': '/tmobile/g', // result is T-MOBILE when tmobile is found in the selector text
      'TELE2': '/tele2/g',
      'BEN': '/ben/g',
      'KPN': '/kpn/g',
      'HI': '/hi/g'
    }
  },
})
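The lookup a dictionary performs can be sketched as follows: walk the entries in order and return the first key whose pattern matches. The helper `lookup` is hypothetical, and which text the library actually matches against (element text, attributes, or markup) is not specified here, so the sample input is only an assumption.

```javascript
// Hypothetical helper; returns the first dictionary key whose pattern matches the text.
function lookup(dictionary, text) {
  var keys = Object.keys(dictionary)
  for (var i = 0; i < keys.length; i++) {
    var value = dictionary[keys[i]]
    // Patterns are strings such as '/vodafone/g'; split off the slashes and flags.
    var parts = /^\/(.*)\/([a-z]*)$/.exec(value)
    var regex = parts ? new RegExp(parts[1], parts[2]) : new RegExp(value)
    if (regex.test(text)) {
      return keys[i]
    }
  }
  return null // no entry matched
}

var providers = { 'VODAFONE': '/vodafone/g', 'KPN': '/kpn/g' }
console.log(lookup(providers, 'logo-kpn.png')) // prints: KPN
```

Because the first match wins in this sketch, short patterns such as `/hi/g` are best placed last so they do not shadow longer ones.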

using a string

A string is shorthand for a selector with the type text. The two scrapers below are equivalent.

var scraper1 = new Scraper({
  title: 'h1.the-title'
})

var scraper2 = new Scraper({
  title: {
    type:'text',
    selector:'h1.the-title' // a jQuery selector that matches the first h1 with the class the-title
  }
})
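The equivalence between the two forms can be expressed as a small normalization step. This is only a sketch; `normalizeField` is a hypothetical name and not a function the library is known to expose.

```javascript
// Hypothetical helper; expands the string shorthand into the object form.
function normalizeField(value) {
  if (typeof value === 'string') {
    return { type: 'text', selector: value }
  }
  return value // objects and functions are passed through unchanged
}

console.log(normalizeField('h1.the-title')) // prints: { type: 'text', selector: 'h1.the-title' }
```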

using a function

var scraper = new Scraper({
  number1: {type:'template', template:'1'},
  number2: {type:'template', template:'3'},
  sum: function(scraper){
    var n1 = Number(scraper.getField('number1'))
    var n2 = Number(scraper.getField('number2'))
    return n1 + n2 // = 4
  }
})

Default fields

By default, the scraper returns only the following fields: ['id', 'available', 'title', 'imageUrl', 'clickUrl', 'category', 'basket', 'description', 'priceNormal', 'priceDiscount', 'logoUrl', 'stickerText', 'custom1', 'custom2', 'custom3', 'custom4']
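In practice this means that a field with any other name, such as the `sum` field from the function example, would not appear in the results. The filtering can be sketched as below; `pickFields` and `DEFAULT_FIELDS` are hypothetical names used only for this illustration.

```javascript
// The default field names listed above.
var DEFAULT_FIELDS = ['id', 'available', 'title', 'imageUrl', 'clickUrl', 'category',
  'basket', 'description', 'priceNormal', 'priceDiscount', 'logoUrl', 'stickerText',
  'custom1', 'custom2', 'custom3', 'custom4']

// Hypothetical helper; keeps only the keys that appear in the allowed list.
function pickFields(results, allowed) {
  var picked = {}
  allowed.forEach(function (key) {
    if (Object.prototype.hasOwnProperty.call(results, key)) {
      picked[key] = results[key]
    }
  })
  return picked
}

console.log(pickFields({ title: 'hello world', sum: 4 }, DEFAULT_FIELDS)) // prints: { title: 'hello world' }
```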