@datafire/link_fish

v5.0.0

Published

2 years ago

DataFire integration for link.fish API

Downloads

0High
0Medium
0Low

datafire

@datafire/link_fish

Client library for link.fish API

Installation and Usage

npm install --save @datafire/link_fish

let link_fish = require('@datafire/link_fish').create({
  username: "",
  password: ""
});

.then(data => {
  console.log(data);
});

Description

API to easily extract data from websites.

Base URL

All URLs referenced in the documentation have the following base:

https://api.link.fish

The REST API is only served over HTTPS. To ensure data privacy, unencrypted HTTP is not supported.

Authentication

HTTP requests to the REST API are protected with HTTP Basic authentication. You will use the email address of your link.fish account as the username and your API access token as the password for HTTP Basic authentication.

If you do not have an account yet, go to https://link.fish/api and create one first.

You will receive the API access token automatically via email after you signed up. To generate a new token and invalidate the current one log into your link.fish account at https://app.link.fish and go to: "Plugins" -> "API Dashboard"

There you can also see how many credits you used already.

Errors

The API uses standard HTTP status codes to indicate the success or failure of the API call. The body of the response will be JSON in the following format:

{

  "status": {HTTP STATUS CODE}
  "message": "{ERROR MESSAGE}"
}

Like for example when the authorization is not provided or wrong:

{

  "status": 401
  "message": "Unauthorized"
}

Request IDs

Each API request has an associated request identifier. You can find it in the response headers, under X-LF-Request-Id. In case you have problems please provide this identifier that we can help you as good and fast as possible.

Example:

X-LF-Request-Id: f7f0036f-5277-421a-b143-f7a151571d18

Item format

The data is by default deeply nested. So if it should be checked if there is an offer with a price, the whole tree has to be checked. To make that simpler, it is also possible to return the data "flat". If selected it will flatten the tree by copying all the data to the main level under a property with the name of its type and link the data internally.

Information: We created a node module which allows converting between the two formats. It did not get open sourced yet. If you are in need, simply contact us via [email protected].

Response Content Type

By default, all data gets returned as JSON. If the data should be returned as XML add the following header:

Accept: application/xml

Credits

Depending on the request made a different amount of credits get charged. How many which request costs can be found on the API pricing page. Additionally, does a header named "X-LF-Credits-Charged" get added to each successful response with information about the credits.

Example:

X-LF-Credits-Charged: 1 # Credits used for current requests
X-LF-Credits-Subscription-Max: 1000 # Total credits available in subscription
X-LF-Credits-Subscription-Used: 512 # Credits still left in current month

You can check anytime how many credits you did use already by logging into your link.fish account at https://app.link.fish and checking under: "Plugins" -> "API Dashboard"

If you have problems, questions or improvement advice please send us an email to [email protected]

Actions

Urls.apps.get

Visits the URL and checks if there are mobile apps on them and returns the found ones.

Will by default return the app identifiers and not the full URL to the apps. To return URLs instead set the parameter "return_urls" to true.

The URLs can also be created manually like this:

| Property | URL | | -------- | -------------------------------------------------- | | android | https://play.google.com/store/apps/details?id={ID} | | ios | https://itunes.apple.com/us/app/app-name/id{ID} |

Properties only get set when a value for it has been found. That means that if no app has been found only the property "url" will be set.

link_fish.Urls.apps.get({
  "url": ""
}, context)

Input

input object
- url required string: The URL of the website to query
- return_urls boolean: Returns app URLs instead of the identifiers
- browser_render boolean: If the page should be fully rendered with a browser to extract data. The request will then cost 5 credits instead of 1!

Output

output Apps

Urls.browser_data.get

Visits the URL with a full browser and extracts the data. This request costs 5 credits.

link_fish.Urls.browser_data.get({
  "url": ""
}, context)

Input

input object
- url required string: The URL of the website to query
- item_format string (values: normal, flat): If the items should be return "normal" with multiple levels or "flat" with just one level and linked instead.
- simplify_special_types boolean: Some types like "PropertyValue" do save key and value in separate properties which makes the data harder to process. If this option gets set it converts them automatically into the regular key -> value format.
- include_raw_html boolean: Returns additionally also the raw HTML as property "rawHtml".
- screenshot string (values: none, normal, full): If and what kind of screenshot should be returned. Do only request screenshot generation when really needed because it will increase the response time significantly.
- screenshot_width integer: The widh of the screenshot in pixel.
- screenshot_file_format string (values: png, jpg): The file format of the screenshot

Output

output UrlBrowser

Urls.browser_screenshot.get

Visits the URL with full browser and creates a screenshot. This request costs 5 credits.

link_fish.Urls.browser_screenshot.get({
  "url": ""
}, context)

Input

input object
- url required string: The URL of the website to create screenshot of
- type string (values: normal, full): What kind of screenshot should be returned. If it should be a regular 16:9 screenshot or one with the full page height
- file_format string (values: png, jpg): The file format of the screenshot
- width integer: The widh of the screenshot in pixel.

Output

Output schema unknown

Urls.data.get

Visits the URL and extracts the data.

link_fish.Urls.data.get({
  "url": ""
}, context)

Input

input object
- url required string: The URL of the website to query
- item_format string (values: normal, flat): If the items should be return "normal" with multiple levels or "flat" with just one level and linked instead.
- simplify_special_types boolean: Some types like "PropertyValue" do save key and value in separate properties which makes the data harder to process. If this option gets set it converts them automatically into the regular key -> value format.
- include_raw_html boolean: Returns additionally also the raw HTML as property "rawHtml".
- browser_render boolean: If the page should be fully rendered with a browser to extract data. The request will then cost 5 credits instead of 1!

Output

output Url

Urls.data_raw.get

Visits the URL and extracts the data.

link_fish.Urls.data_raw.get({
  "url": ""
}, context)

Input

input object
- url required string: The URL to get the data of

Output

output DataRaw

Urls.data_tabular.get

Visits the URL and extracts tabular data.

link_fish.Urls.data_tabular.get({
  "url": ""
}, context)

Input

input object
- url required string: The URL to get the data of
- selector string: CSS selector to define tabular data which should get returned
- browser_render boolean: If the page should be fully rendered with a browser to extract data. The request will then cost 5 credits instead of 1!

Output

output DataTabular

Urls.geo_coordinates.get

Visits the URL and checks if there are Geo Coordinates on them and returns the found ones.

Properties only get set when a value for both latitude and longitude have been found. That means that if no geo coordinates have been found only the property "url" will be set.

link_fish.Urls.geo_coordinates.get({
  "url": ""
}, context)

Input

input object
- url required string: The URL of the website to query
- browser_render boolean: If the page should be fully rendered with a browser to extract data. The request will then cost 5 credits instead of 1!

Output

output GeoCoordinates

Urls.social_media.get

Visits the URL and checks if there are any social media accounts and returns the found ones.

Will by default return the account identifiers and not the full URL to the profiles. To return URLs instead set the parameter "return_urls" to true.

The URLs can also be created manually like this:

| Property | URL | | --------------- | -------------------------------------- | | facebookPage | https://facebook.com/{ID} | | githubUser | https://github.com/{ID} | | googlePlus | https://plus.google.com/+{ID} | | instagram | https://instagram.com/{ID} | | linkedInCompany | https://linkedin.com/company/{ID} | | pinterest | https://pinterest.com/{ID} | | twitter | https://twitter.com/{ID} | | youTubeUser | https://youtube.com/user/{ID} |

Properties only get set when a value for it has been found. That means that if no social media account has been found only the property "url" will be set.

link_fish.Urls.social_media.get({
  "url": ""
}, context)

Input

input object
- url required string: The URL of the website to query
- return_urls boolean: Returns profile URLs instead of the profile names/ids
- browser_render boolean: If the page should be fully rendered with a browser to extract data. The request will then cost 5 credits instead of 1!

Output

output SocialMedia

Definitions

ApiResponsError

ApiResponsError object
- message required string: The error message.
- status required string: The HTTP status code for the error.

Apps

Apps object
- android required string: Android app
- ios required string: iOS app
- url required string: The url of the website. Can be different to requested one if there were redirects

DataRaw

DataRaw object
- data required object: The found data (can be an object or array)
- sourceFormat string: The format the source data was in. (json/xml)
- statusCode required string: The HTTP status code the URL returned
- url required string: The url of the website. Can be different to requested one if there were redirects

DataTabular

DataTabular object
- data required object
  - data array: The found tabular data on the website.
    - items array
      - items array
        items string
  - metadata object
- statusCode required string: The HTTP status code the URL returned
- url required string: The url of the website. Can be different to requested one if there were redirects

GeoCoordinates

GeoCoordinates object
- latitude required number: The latitude
- longitude required number: The longitude
- url required string: The url of the website. Can be different to requested one if there were redirects

SocialMedia

SocialMedia object
- facebookPage required string: The facebook page name
- githubUser required string: The Github user name
- linkedInCompany required string: The LinkedIn page name
- twitter required string: The Twitter handle
- url required string: The url of the website. Can be different to requested one if there were redirects

Url

Url object
- additionalData required object
  - locality object
    - country string: The recognized country of the website determined by TLD.
    - language string: The language of website. Recognized by header information if supplied or text analysis.
- favicon string: Url of website favicon
- items required array: The found data items on the website.
  - items object: The properties depend on the type of the item. The field "@type" exists for all.
    - @type string: The item type
- statusCode required string: The HTTP status code the URL returned
- title required string: The title of the page
- url required string: The url of the website. Can be different to requested one if there were redirects

UrlBrowser

UrlBrowser object
- additionalData required object
  - locality object
    - country string: The recognized country of the website determined by TLD.
    - language string: The language of website. Recognized by header information if supplied or text analysis.
- favicon string: Url of website favicon
- items required array: The found data items on the website.
  - items object: The properties depend on the type of the item. The field "@type" exists for all.
    - @type string: The item type
- screenshot string: Base64 encoded PNG screenshot of website (if generation got requested)
- statusCode required string: The HTTP status code the URL returned
- title required string: The title of the page
- url required string: The url of the website. Can be different to requested one if there were redirects

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@datafire/link_fish

Installation and Usage

Description

Base URL

Authentication

Errors

Request IDs

Item format

Response Content Type

Credits

Actions

Urls.apps.get

Input

Output

Urls.browser_data.get

Input

Output

Urls.browser_screenshot.get

Input

Output

Urls.data.get

Input

Output

Urls.data_raw.get

Input

Output

Urls.data_tabular.get

Input

Output

Urls.geo_coordinates.get

Input

Output

Urls.social_media.get

Input

Output

Definitions

ApiResponsError

Apps

DataRaw

DataTabular

GeoCoordinates

SocialMedia

Url

UrlBrowser