webidl-scraper

v0.0.4

Published

4 years ago

Scrape IDL definitions from Web standard specs

0High
0Medium
0Low

motiz88

webidl

webidl-scraper

Scrape IDL definitions from Web standard specs

Installation

Download node at nodejs.org and install it, if you haven't already.

npm install -g webidl-scraper

Usage

  webidl-scraper [options] <inputs: file | URL | "-" ...> (use - for stdin)

  Scrape IDL definitions from Web standard specs.

  Options:

    -h, --help                output usage information
    -V, --version             output the version number
    -o, --output-file <file>  output the scraped IDL to <file> (use - for stdout, the default)
    --with-class-extract      do not ignore <pre class="idl extract" />
    --with-data-no-idl        do not ignore <pre data-no-idl />
    --with-idl-index          do not ignore IDL after id="idl-index"

Examples

Scrape a Web page for IDL fragments:

webidl-scraper https://html.spec.whatwg.org/
# Output to stdout

webidl-scraper http://dev.w3.org/csswg/cssom/ -o cssom.idl
# Save to cssom.idl

curl -sL http://dev.w3.org/csswg/cssom/ | webidl-scraper - > cssom.idl 
# Use curl for HTTP and redirect stdout to cssom.idl

Scrape an HTML file for IDL fragments:

webidl-scraper html5-spec.html -o html5-spec.idl

Scraping algorithm

These steps are derived experimentally and may change. I have tried to include links to sources and/or motivating examples for all the rules.

Get the contents of <pre class="idl" />, tags, excluding class="idl extract" (reference #1, #2).
If the document has an IDL Index section (example) - marked by an element with id="idl-index" - ignore IDL fragments that follow, on the assumption that they will contain no new IDL.
Also ignore tags that have the data-no-idl attribute (following Bikeshed).

Tests

npm install
npm test

  Scraper CLI
    fixtures/html/*.html
      cssom-with-class-extract.html [--with-class-extract]
        √ should match cssom-with-class-extract.idl (111ms)
      cssom-with-idl-index.html [--with-idl-index]
        √ should match cssom-with-idl-index.idl
      cssom.html
        √ should match cssom.idl
      dom-with-data-no-idl.html [--with-data-no-idl]
        √ should match dom-with-data-no-idl.idl
      dom.html
        √ should match dom.idl
      html5.html
        √ should match html5.idl (553ms)
      noidl.html
        √ should match noidl.idl
    with input type
      URL
        √ should complete without errors
      glob pattern (test/**/cs*.html)
        √ should complete without errors
      file name (test/fixtures/html/cssom.html)
        √ should complete without errors
      stdin (-)
        √ should complete without errors
    with -o <file>
      √ should create the file
  scraper-core
    fixtures/html/*.html
      cssom-with-class-extract.html + options/cssom-with-class-extract.json
        √ should match cssom-with-class-extract.idl
      cssom-with-idl-index.html + options/cssom-with-idl-index.json
        √ should match cssom-with-idl-index.idl
      cssom.html
        √ should match cssom.idl
      dom-with-data-no-idl.html + options/dom-with-data-no-idl.json
        √ should match dom-with-data-no-idl.idl
      dom.html
        √ should match dom.idl
      html5.html
        √ should match html5.idl (566ms)
      noidl.html
        √ should match noidl.idl
  19 passing (2s)

Dependencies

commander: the complete solution for node.js command-line programs
glob: a little globber
htmlparser2: Fast & forgiving HTML/XML/RSS parser
request: Simplified HTTP request client.
rx: Library for composing asynchronous and event-based operations in JavaScript

Dev Dependencies

chai: BDD/TDD assertion library for node.js and the browser. Test framework agnostic.
find-port: find an unused port in your localhost
mocha: simple, flexible, fun test framework
node-static: simple, compliant file streaming module for node
rx-node: RxJS Bindings for Node.js and io.js
temp: Temporary files and directories

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

webidl-scraper

Installation

Usage

Examples

Scraping algorithm

Tests

Dependencies

Dev Dependencies

License