hyper-digraph

v0.1.2

Published

3 years ago

Decentralised directed graphs based on hypercore


hdegroote

digraph directed graph decentralised digraph decentralised graph hyperbee hypercore

Hyper DiGraph

Decentralised directed graphs based on hypercore, using hyperbees under the hood.

Warning: alpha status. Breaking changes are possible until the v1 release.

Note: this package is not concerned with requesting the required hyperbees on the network. If a bee is not found, the diGraph will error out when trying to access nodes in that bee. Support for better handling such scenarios is pending.

Install

npm add hyper-digraph

Usage

See example.js

API

const diGraph = await DiGraph.createNew({name: 'My digraph', hyperInterface, NodeClass? })

Creates a new diGraph, with an empty and childless root node.

You can optionally specify your own NodeClass (a subclass of Node). All methods that return Node objects will then return instances of this class instead. This is useful for adding validation logic for node content, or for defining more fine-grained getters than the generic node.getContent().
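As an illustration, a NodeClass might validate content and add a typed getter. This is a self-contained sketch, not the package's code: `BaseNode` is a hypothetical stand-in for the package's Node class (its real export name and constructor are not documented here) and is assumed only to expose an async `getContent()`. `TodoNode`, `getTitle()` and the `{ title }` content shape are likewise made up for the example.

```javascript
// Hypothetical stand-in for the package's Node class; only an async
// getContent() is assumed. Real code would extend the library's Node instead.
class BaseNode {
  constructor (content) {
    this._content = content;
  }

  async getContent () {
    return this._content;
  }
}

// Example NodeClass: validates the content shape and adds a fine-grained getter.
class TodoNode extends BaseNode {
  async getContent () {
    const content = await super.getContent();
    if (typeof content !== 'object' || content === null || typeof content.title !== 'string') {
      throw new Error('TodoNode content must be an object with a string "title"');
    }
    return content;
  }

  async getTitle () {
    return (await this.getContent()).title;
  }
}
```

With the real package, such a class would be passed as `NodeClass` to `DiGraph.createNew()` or `DiGraph.loadFromBee()`.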

const diGraph = await DiGraph.loadFromBee({ hash: hypercoreKey, hyperInterface, NodeClass? })

Loads a diGraph from an existing hyperbee, identified by its hypercore key (hash).

const root = diGraph.rootNode

Gets the root node.

await diGraph.createIndex()

Creates an index with all referenced bees, so a reader can ensure access to all required hyperbees before accessing the diGraph.

await diGraph.getReferencedBees()

Returns a list of all bees referenced by this diGraph. Note that this returns the bees referenced at the time the index was last created, so it is not guaranteed to be up to date.
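Conceptually, such an index is the set of distinct bee keys reachable through child references. The sketch below computes it over a hypothetical in-memory map from `"<beeKey>:<location>"` to `{ content, children }` values; that storage layout, the inclusion of the root's own bee, and the choice to ignore `version` (always following the latest entry) are assumptions for illustration, not how the package stores or indexes data.

```javascript
// Hypothetical in-memory store: "<beeKey>:<location>" -> { content, children }.
// Each child is { location, hash?, version? }; version is ignored in this sketch.
function collectReferencedBees (entries, rootBee, rootLocation) {
  const bees = new Set([rootBee]); // includes the root's own bee
  const seen = new Set();
  const stack = [{ bee: rootBee, location: rootLocation }];

  while (stack.length > 0) {
    const { bee, location } = stack.pop();
    const id = `${bee}:${location}`;
    if (seen.has(id)) continue; // guards against cycles and shared children
    seen.add(id);

    const entry = entries.get(id);
    if (!entry) throw new Error(`Entry not found: ${id}`);

    for (const child of entry.children) {
      const childBee = child.hash ?? bee; // no hash: same bee as the parent
      bees.add(childBee);
      stack.push({ bee: childBee, location: child.location });
    }
  }
  return [...bees];
}
```

Because it walks the graph, this also shows why the index can go stale: any later change to an unversioned child changes what the walk would find.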

await diGraph.deepValidate()

Checks if all nodes are available, and that they are indeed valid nodes. Note that this is an expensive operation (it requests all nodes of the graph), which is not needed if you just want to consume a graph: the validity of each individual node is always checked when first accessing it.

const allNodesIterator = diGraph.yieldAllNodesOnce()

Async iterator which efficiently yields all nodes of the diGraph once, without guaranteeing any order. This method currently throws an error if a node is not found (for example if it is stored in a bee to which you currently don't have access).

You should have all referenced bees available before calling this method, either by having them locally or by being connected to a peer that has them.

const depthFirstIterator = diGraph.walkDepthFirst()

Async iterator to walk the diGraph depth-first. Note that this algorithm enters an infinite loop if the diGraph contains cycles.

const stepper = diGraph.recStepper()

A high-level algorithm for efficiently traversing the diGraph in an order of your choice, by implementing your own recursive algorithm on top of it. It yields {node, childYielders} pairs, where childYielders contains one function per child; calling one of these functions and awaiting the result yields that child node together with its own childYielders.

See diGraph.walkDepthFirst() for an example of how to implement an algorithm on top of diGraph.recStepper().
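The stepper pattern can be modelled without the package. In this sketch (an approximation, not the package's code) the graph is a hypothetical in-memory Map from location to `{ content, children }`, children are plain location strings, and `recStep()` returns a promise of `{ node, childYielders }` rather than being an async iterator. `walkDepthFirst()` is then built on top of it; it starts all child lookups of a node before descending, so sibling fetches overlap.

```javascript
// Simulated lookup over a hypothetical in-memory graph:
// location -> { content, children: [childLocation, ...] }.
async function fetchEntry (graph, location) {
  const entry = graph.get(location);
  if (!entry) throw new Error(`Node not found: ${location}`);
  return { location, ...entry };
}

// One recursive step: resolve a node and expose one thunk per child.
// Calling a thunk starts fetching that child and resolves to the child's own step.
async function recStep (graph, location) {
  const node = await fetchEntry(graph, location);
  const childYielders = node.children.map((childLocation) => () => recStep(graph, childLocation));
  return { node, childYielders };
}

// Pre-order depth-first walk built on the stepper. Like the documented
// walkDepthFirst(), it has no cycle detection and never terminates on a cycle.
async function * walkDepthFirst (graph, rootLocation) {
  async function * visit (stepPromise) {
    const { node, childYielders } = await stepPromise;
    yield node;
    // Start every child fetch before descending, so sibling lookups overlap.
    const childSteps = childYielders.map((yieldChild) => {
      const step = yieldChild();
      step.catch(() => {}); // errors still surface when the step is awaited in visit()
      return step;
    });
    for (const childStep of childSteps) {
      yield * visit(childStep);
    }
  }
  yield * visit(recStep(graph, rootLocation));
}
```

A node reachable through two paths is yielded once per path, which is the "with repetitions" behaviour of a plain depth-first walk.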

Node API

await node.getContent()

Gets the content of the node.

await node.setContent(content)

Sets the content of the node (can be any JSON value).

await node.getChildren()

Returns a list of child nodes.

await node.pushChild({location, hash?, version?})

Appends a child to the node. The child should exist at location 'location' in the hyperbee with key 'hash'. If hash is not specified, the location is assumed to be in the same hyperbee as where the parent node lives. If version is specified, that particular version is referenced. If not, it references the latest version.

Note: it is not checked whether the child actually exists at the specified location when pushing it.
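To make the reference semantics concrete, the sketch below shows what a child reference could look like once recorded, operating on a plain entry value rather than a real hyperbee. `appendChildRef()` and its type checks are hypothetical, not the package's implementation; like pushChild(), it does not check that the child exists.

```javascript
// Hypothetical helper mirroring the documented pushChild() semantics on a plain
// entry value { content, children }. Nothing checks that the child exists.
function appendChildRef (entryValue, { location, hash, version }) {
  if (typeof location !== 'string') throw new TypeError('location must be a string');
  const child = { location };
  if (hash !== undefined) child.hash = hash; // omitted: same bee as the parent
  if (version !== undefined) {
    if (!Number.isInteger(version)) throw new TypeError('version must be an integer');
    child.version = version; // omitted: always follow the latest version
  }
  // Return a new value instead of mutating the input.
  return { ...entryValue, children: [...entryValue.children, child] };
}
```

Omitting `hash` and `version` keeps the reference short and makes it follow the parent's bee and the latest version respectively.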

node.invalidateCache()

Invalidates the node's cache, so that the next time a property is accessed, the entry is reloaded from storage.

Data Model

Nodes are represented as entries of a hyperbee. The key (str) is their location, and the value is JSON which must contain an entry for keys 'children' and 'content' (and may contain arbitrary other keys as well).

Children must be a list of child-objects, where each child object must have a location (str), and may have a hash (hypercore hexadecimal hash) and a version (int).

A child without hash is interpreted as belonging to the same hyperbee as the parent.

A child without version is interpreted as meaning the latest version. Note that the latest version can change over time.

The presence of a child in a node implies the existence of a directed edge from the node to the child.
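The rules above can be expressed as a small validator. This is a sketch, not the package's validation code: the function names are hypothetical, and the `hash` check assumes a hypercore key is 32 bytes written as 64 hexadecimal characters. `resolveChild()` applies the two defaulting rules (missing hash means the parent's bee, missing version means latest, represented here as `null`).

```javascript
// Assumption: a hypercore key is 32 bytes, i.e. 64 hexadecimal characters.
const HEX_KEY = /^[0-9a-f]{64}$/i;

function isValidChild (child) {
  if (typeof child !== 'object' || child === null) return false;
  if (typeof child.location !== 'string') return false;
  if (child.hash !== undefined && !HEX_KEY.test(child.hash)) return false;
  if (child.version !== undefined && !Number.isInteger(child.version)) return false;
  return true;
}

// Entry values need 'children' and 'content'; any other keys are allowed.
function isValidEntryValue (value) {
  return typeof value === 'object' && value !== null &&
    'content' in value && // content may be any JSON, including null
    Array.isArray(value.children) &&
    value.children.every(isValidChild);
}

// Apply the defaulting rules: no hash -> parent's bee, no version -> latest (null).
function resolveChild (parentBee, child) {
  return { bee: child.hash ?? parentBee, location: child.location, version: child.version ?? null };
}
```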

Graphs in Scope

Any directed graph with a single root node can be represented by this module, including trees, graphs with cycles, and graphs with multiple paths to a node.

A single root is assumed because it is easier to reason about. Any graph with multiple roots can be transformed into a single-root graph by creating a new root node whose children are all the previous roots.
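A possible sketch of that transformation, using only the documented `pushChild({location, hash?, version?})`. The intended target is `diGraph.rootNode` of a graph created with `DiGraph.createNew()`, but that path is not exercised here; the `oldRoots` shape and the helper name `adoptRoots` are assumptions. Appends are awaited one at a time so the children keep a predictable order.

```javascript
// Hypothetical helper: make every previous root a child of newRoot.
// newRoot is any object with the documented async pushChild({ location, hash?, version? }).
async function adoptRoots (newRoot, oldRoots) {
  for (const { location, hash, version } of oldRoots) {
    const child = { location };
    if (hash !== undefined) child.hash = hash; // roots in other bees need their key
    if (version !== undefined) child.version = version; // pin a version if desired
    await newRoot.pushChild(child);
  }
}
```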

Note on Algorithmic Efficiency

The main bottleneck when traversing a decentralised digraph is the time spent fetching remote entries.

When using diGraph.recStepper() to create your own walking algorithm, take care not to wait unnecessarily for one node to resolve before requesting the next.

When order is unimportant, diGraph.yieldAllNodesOnce() is efficient in the sense that child entries are always requested as soon as they are known, before they are awaited.

Consider the following graph:

    ROOT
    |   \
    N1-->N2
  / | \    \
N3  N4 N5   N6

Suppose all entries exist on different hyperbees, and that an entry is obtained in 100ms.

Ignoring processing time:

The algorithm will first query the root.
After 100ms it yields the ROOT, and knows the location of N1 and N2.
After 200ms it yields N1 and N2, and knows the location of N3-N6.
After 300ms it yields N3-N6 (it skips N2 because it was already handled).
Each additional level adds another 100ms.

Had this graph been walked by awaiting one node at a time, it would have taken 700ms without repetitions (7 nodes), or 900ms with repetitions (N2 and N6 are visited twice, since N2 is reachable from both ROOT and N1).
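The timeline above can be reproduced with a small simulation. This sketches the described technique (request each child as soon as its parent resolves, yield in completion order, skip already-requested nodes); it is not the package's implementation. The in-memory graph, plain-string children and the `setTimeout`-based 100ms latency are assumptions standing in for remote hyperbee lookups.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulated remote lookup with fixed latency over a hypothetical in-memory graph.
async function fetchEntry (graph, location, latencyMs) {
  await sleep(latencyMs);
  const entry = graph.get(location);
  if (!entry) throw new Error(`Node not found: ${location}`);
  return { location, ...entry };
}

// Yields every reachable node exactly once, in completion order. A child is
// requested as soon as its parent resolves, so each depth costs one latency period.
async function * yieldAllNodesOnce (graph, rootLocation, latencyMs = 100) {
  const requested = new Set();
  const pending = new Map(); // location -> promise of the fetched node
  const request = (location) => {
    requested.add(location);
    const promise = fetchEntry(graph, location, latencyMs);
    promise.catch(() => {}); // rejections still surface through Promise.race below
    pending.set(location, promise);
  };

  request(rootLocation);
  while (pending.size > 0) {
    const node = await Promise.race(pending.values());
    pending.delete(node.location);
    for (const child of node.children) {
      if (!requested.has(child)) request(child); // skip nodes already handled
    }
    yield node;
  }
}
```

On the example graph this yields ROOT, then N1 and N2, then N3-N6, finishing after roughly 300ms instead of 700ms.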

Pending Work

Pending work is to facilitate interactions with the network (e.g. hyperswarm), by exposing an event which triggers when a certain hyperbee is needed and not yet available.

This is necessary to make this module more robust. Currently, before walking a diGraph, we need to ensure access to all referenced bees, which means relying on the writer of the diGraph to have created an up-to-date index. Furthermore, children referenced without a version always resolve to their latest version, which can change over time: such a child can start referencing nodes in bees that are not contained in the index (and may stop referencing previously indexed bees).

This will be solved by waiting on unavailable entries (rather than throwing an error) and emitting an event indicating their unavailability.
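One way this proposed behaviour could look is sketched below using Node's `EventEmitter`. Everything here is hypothetical: the `EntryResolver` class, the `'bee-needed'` event name and the `addBee()` method are illustrations of the idea, not part of the package. A lookup on a missing bee waits instead of throwing, and the event fires once per missing bee so that networking code (e.g. hyperswarm) can go and fetch it.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical sketch: wait for unavailable bees and announce the need via an event.
class EntryResolver extends EventEmitter {
  constructor () {
    super();
    this.bees = new Map(); // beeKey -> Map(location -> entry)
    this.waiters = new Map(); // beeKey -> [resolve, ...]
  }

  // Called by networking code once a bee becomes available.
  addBee (beeKey, entries) {
    this.bees.set(beeKey, entries);
    for (const resolve of this.waiters.get(beeKey) ?? []) resolve();
    this.waiters.delete(beeKey);
  }

  async get (beeKey, location) {
    if (!this.bees.has(beeKey)) {
      const isFirstWaiter = !this.waiters.has(beeKey);
      if (isFirstWaiter) this.waiters.set(beeKey, []);
      const available = new Promise((resolve) => this.waiters.get(beeKey).push(resolve));
      if (isFirstWaiter) this.emit('bee-needed', beeKey); // announce each missing bee once
      await available;
    }
    return this.bees.get(beeKey).get(location);
  }
}
```

Registering the waiter before emitting means a listener that supplies the bee synchronously still wakes the pending lookup.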
