url-reference
v0.9.1
URLReference class
URLReference
NB. This package has been obsoleted by spec-url which now exposes the same URLReference API. Please use the spec-url package instead.
"URL or Relative Reference"
The URLReference class is designed to overcome shortcomings of the URL class.
Features
- Supports Relative and scheme-less URLs.
- Supports Nullable Components.
- Distinct Rebase, Normalize and Resolve methods.
- Resolve is Behaviourally Equivalent with the WHATWG URL Standard.
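The first feature is easiest to appreciate by looking at what the built-in WHATWG URL class does with relative and scheme-less input. The sketch below uses only the global URL class available in Node.js and browsers, not this package; the helper name tryParse is made up for the illustration.

```javascript
// Returns the parsed href, or null if the WHATWG URL constructor rejects the input.
function tryParse (input, base) {
  try { return new URL (input, base) .href }
  catch (e) { return null }
}

// Without a base, relative and scheme-less inputs are rejected.
const relative = tryParse ('filename.txt')
const schemeless = tryParse ('//host/filename.txt')
// relative == null
// schemeless == null

// With an absolute base the same input parses, but the result is always absolute.
const withBase = tryParse ('filename.txt', 'http://host/dir/')
// withBase == 'http://host/dir/filename.txt'
```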
Examples
new URLReference ('filename.txt#top', '//host') .href
// => '//host/filename.txt#top'
new URLReference ('?do=something', './path/to/resource?do=nothing') .href
// => './path/to/resource?do=something'
new URLReference ('take/action.html') .resolve ('http://🌲') .href
// => 'http://xn--vh8h/take/action.html'
Summary
The module exports a single class URLReference with nullable properties (getters/setters):
- scheme, username, password, hostname, port, pathname, pathroot, driveletter, filename, query, fragment.
It has three key methods:
- rebase, normalize and resolve.
It can be converted to an ASCII, or to a Unicode string via:
- the href getter and the toString method.
Terminology
The WHATWG URL standard uses the phrase "special URL" for URLs that have a special scheme.
A scheme is a special scheme if it is equivalent to http, https, ws, wss, ftp or file.
The path of a URL may either be hierarchical, or opaque:
A hierarchical path is subdivided into path components, an opaque path is not.
The path of a "special URL" is always considered to be hierarchical.
The path of a non-special URL is opaque unless the URL has an authority or its path starts with a path-root /.
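The rules above can be restated as a small classifier. This is a sketch only, not code from the package: the names specialSchemes, isSpecialScheme and hasOpaquePath are hypothetical, the input is a plain object rather than a URLReference, and it assumes that a scheme is present and that "equivalent" means a case-insensitive match.

```javascript
const specialSchemes = new Set (['http', 'https', 'ws', 'wss', 'ftp', 'file'])

// Assumption: scheme equivalence is a case-insensitive comparison.
function isSpecialScheme (scheme) {
  return specialSchemes.has (scheme.toLowerCase ())
}

// Applies the terminology rules to a URL that has a scheme.
function hasOpaquePath ({ scheme, hasAuthority, pathname }) {
  if (isSpecialScheme (scheme)) return false // special URLs are always hierarchical
  if (hasAuthority) return false             // an authority makes the path hierarchical
  return !pathname.startsWith ('/')          // and so does a leading path-root
}

const opaque = hasOpaquePath ({ scheme: 'ofp', hasAuthority: false, pathname: 'this/is/an/opaque-path/' })
const rooted = hasOpaquePath ({ scheme: 'ofp', hasAuthority: false, pathname: '/not/an/opaque-path/' })
const special = hasOpaquePath ({ scheme: 'HTTP', hasAuthority: false, pathname: 'foo/bar' })
// opaque == true
// rooted == false
// special == false
```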
URLReference API
Constructor
new URLReference ()
new URLReference (input)
new URLReference (input, base)
Constructs a new URLReference object. The result may represent a relative URL. The resolve method can be used to ensure that the result represents an absolute URL.
Arguments input and base are optional. Each may be a string to be parsed, or an existing URLReference object. If a base argument is supplied, then input is rebased onto base after parsing.
Parsing behaviour
The parsing behaviour adapts to the scheme of input or the scheme of base otherwise:
- The invalid \ code-points before the host and in the path are converted to / if the input has a special scheme or if it has no scheme at all.
- Windows drive letters are detected if the scheme is equivalent to file or if no scheme is present at all. If no scheme is present and a windows drive letter is detected then the scheme is implicitly set to file.
The hostname is always parsed as an opaque hostname string. Parsing and validating a hostname as a domain is done by the resolve method instead.
Examples:
const r1 = new URLReference ();
// r1.href == '' // The 'empty relative URL'
const r2 = new URLReference ('/big/trees/');
// r2.href == '/big/trees/'
const r3 = new URLReference ('index.html', '/big/trees/');
// r3.href == '/big/trees/index.html'
const r4 = new URLReference ('README.md', r3);
// r4.href == '/big/trees/README.md'
Parsing Behaviour Examples:
const r1 = new URLReference ('\\foo\\bar', 'http:')
// r1.href == 'http:/foo/bar'
const r2 = new URLReference ('\\foo\\bar', 'ofp:/')
// r2.href == 'ofp:/\\foo\\bar'
const r3 = new URLReference ('/c:/path/to/file')
// r3.href == 'file:/c:/path/to/file'
// r3.hostname == null
// r3.driveletter == 'c:'
const r4 = new URLReference ('/c:/path/to/file', 'http:')
// r4.href == 'http:/c:/path/to/file'
// r4.hostname == null
// r4.driveletter == null
Rebase
Rebase – urlReference .rebase (base)
The base argument may be a string or a URLReference object. Rebase returns a new URLReference instance. It throws an error if the base argument represents a URL with an opaque path (unless urlReference consists of a fragment identifier only).
Rebase implements a slight generalisation of reference transformation as defined in RFC 3986 (URI). In our case the base argument is allowed to be a relative reference, in addition to an absolute URL.
Rebase applies a non-strict reference transformation to URLReferences that have a "special scheme" and a strict reference transformation in all other cases:
- The RFC3986 (URI) standard defines a strict and a non-strict variant of reference transformation. The non-strict variant ignores the scheme of the input if it is equivalent to the scheme of the base. The WHATWG uses the non-strict behaviour for "special" URLs and the strict behaviour for other URLs.
Note: The non-strict WHATWG behaviour has a surprising consequence. A URLReference that has a special scheme may still "behave as a relative URL".
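For comparison, the same split between special and non-special schemes can be observed with the built-in WHATWG URL class. The sketch below does not use this package; the inputs and variable names are chosen for the illustration.

```javascript
// Special scheme: the scheme of the input matches the base and is ignored,
// so the input is resolved as if it were the relative path 'index.html'.
const nonStrict = new URL ('https:index.html', 'https://host/dir/page') .href
// nonStrict == 'https://host/dir/index.html'

// Non-special scheme: the input keeps its scheme and is treated as an
// absolute URL with an opaque path; the base is not used.
const strict = new URL ('ofp:index.html', 'ofp://host/dir/page') .href
// strict == 'ofp:index.html'
```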
Example — non-strict behaviour:
const base = new URLReference ('http://host/dir/')
const rel = new URLReference ('http:?do=something')
const rebased = rel.rebase (base)
// rebased.href == 'http://host/dir/?do=something'
Example — strict behaviour:
Rebase applies a "strict" reference transformation to non-special URLReferences. The strict variant does not remove the scheme from the input:
const base = new URLReference ('ofp://host/dir/')
const abs = new URLReference ('ofp:?do=something')
const rebased = abs.rebase (base)
// rebased.href == 'ofp:?do=something'
Example — opaque path behaviour:
It is not possible to rebase a relative URLReference on a base that has an opaque path.
const base = new URLReference ('ofp:this/is/an/opaque-path/')
const rel = new URLReference ('filename.txt')
// const rebased = rel.rebase (base) // throws:
// TypeError: Cannot rebase <filename.txt> onto <ofp:this/is/an/opaque-path/>
const base2 = new URLReference ('ofp:/not/an/opaque-path/')
const rebased = rel.rebase (base2) // This works as expected
// rebased.href == 'ofp:/not/an/opaque-path/filename.txt'
Normalize
Normalize – urlReference .normalize ()
Normalize collapses dotted segments in the path, removes default ports and percent-encodes certain code-points. It behaves in the same way as the WHATWG URL constructor, except that it also supports relative URLs. It does not interpret hostnames as domains; this is done by the resolve method instead. Normalize always returns a new URLReference instance.
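The three effects named above can be seen on an absolute URL with the built-in WHATWG URL constructor, which is the behaviour that normalize is described as matching. This sketch uses only the global URL class, not this package, and the variable names are made up.

```javascript
// Dotted segments are collapsed, the default port 80 is removed
// and the space in the path is percent-encoded.
const normalized = new URL ('http://host:80/a/./b/../c d') .href
// normalized == 'http://host/a/c%20d'

// The WHATWG constructor cannot do the same for a relative URL:
// without a base it throws.
let relativeError = null
try { new URL ('a/./b/../c d') }
catch (e) { relativeError = e }
// relativeError instanceof TypeError
```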
Resolve
Resolve
urlReference .resolve ()
urlReference .resolve (base)
The optional base argument may be a string or an existing URLReference object.
Resolve returns a new URLReference that represents an absolute URL.
It throws an error if this is not possible.
Resolve does additional processing and checks on the authority:
- Asserts that file-URLs and web-URLs have an authority.
- Asserts that the authority of web-URLs is not empty.
- Asserts that file-URLs do not have a username, password or port.
- Parses opaque hostnames of file-URLs and web-URLs as a domain or an IPv4-address.
Resolve uses the same forceful error correcting behaviour as the WHATWG URL constructor.
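Since resolve is described as matching the WHATWG URL constructor, the built-in URL class gives a rough picture of these checks and corrections. The sketch below uses only the global URL class, not this package; the inputs and variable names are chosen for the illustration.

```javascript
// Hostnames of web-URLs are parsed as a domain or an IPv4-address.
const domain = new URL ('http://EXAMPLE.com/') .hostname
const ipv4 = new URL ('http://0x7f.1/') .hostname
// domain == 'example.com'
// ipv4 == '127.0.0.1'

// Forceful error correction: the missing slashes are tolerated.
const corrected = new URL ('https:example.org/x') .href
// corrected == 'https://example.org/x'

// The authority of a web-URL must not be empty.
let emptyAuthorityError = null
try { new URL ('http://') }
catch (e) { emptyAuthorityError = e }
// emptyAuthorityError instanceof TypeError
```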
Note: An unpleasant aspect of the WHATWG behaviour is that if the input is a non-file special URL, and the input has no authority, then the first non-empty path component will be coerced to an authority:
const r1 = new URLReference ('http:/foo/bar')
// r1.hostname == null
// r1.pathname == '/foo/bar'
const r2 = r1.resolve ('http://host/')
// The scheme of r1 is ignored because it matches the base.
// Thus the hostname is taken from the base.
// r2.href == 'http://host/foo/bar'
const r3 = r1.resolve ()
// r1 does not have an authority, so the first non-empty path
// component `foo` is coerced into an authority for the result.
// r3.href == 'http://foo/bar'
String – urlReference .toString ()
Converts the URLReference to a string. This preserves unicode characters in the URL, unlike the href getter which ensures that the result consists of ASCII code-points only.
new URLReference ('take/action.html') .resolve ('http://🌲') .toString ()
// => 'http://🌲/take/action.html'
new URLReference ('take/action.html') .resolve ('http://🌲') .href
// => 'http://xn--vh8h/take/action.html'
Properties
Access to the components of the URLReference goes through the following getters/setters. All properties are nullable, however some invariants are maintained.
- scheme, username, password, hostname, port, pathname
- driveletter, pathroot, filename
- query, fragment
The properties driveletter, pathroot and filename do not use the idiomatic
camelCase style. This is done to remain consistent with existing property
names of the WHATWG URL class, such as pathname and hostname.
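One reason nullable properties matter is that the getters of the built-in WHATWG URL class do not distinguish an absent component from an empty one. The sketch below shows this with the global URL class only, not this package; the variable names are made up.

```javascript
const absent = new URL ('http://host/')   // no query, no fragment
const empty = new URL ('http://host/?#')  // empty query, empty fragment

// Both cases report the empty string through the getters.
const sameSearch = absent.search === '' && empty.search === ''
const sameHash = absent.hash === '' && empty.hash === ''
// sameSearch == true
// sameHash == true

// The difference is only visible in the serialised URL.
// absent.href == 'http://host/'
// empty.href == 'http://host/?#'
```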
Licence
MIT Licenced.
