bricabrac-sfmodules v0.22.3
Bric-A-Brac Standard Brics
A collection of (sometimes not-so) small-ish utilities
Table of Contents generated with DocToc
- DBric Database Adapter
- InterMission: Tables and Methods to Handle Integer Intervals
- Prototype Tools
- Coverage Analyzer
- Unsorted
DBric Database Adapter
- can use `node:sqlite` ('nSQL') or `better-sqlite3` ('bSQL') as implementation class for the `Dbric::db` property, but since there are subtle (and not so subtle) differences in behavior and capabilities, it will probably be best to choose one of the two and stick to it.
  - generally recommended to use bSQL (`better-sqlite3`) as it has stricter error handling (e.g. bSQL will complain when a statement without required parameters is executed, whereas nSQL has been found to silently assume `null` for missing parameters)
  - subtle differences in the interpretation of options for UDF aggregate functions
- handlers invoked for lifetime stages: `on_create()`, `on_prepared()`, `on_fresh()`, `on_populated()`
- consider applying 'Gaps and Islands' empty select statements for referential integrity (ESSFRIs) (i.e. `select * from my_view where false;`) to newly created views
  - alternatively, implement a method to do zero-row selects from all relations / only views; call it always on instantiation
- create a method to do system-defined and user-defined health checks on the DB
- generate `insert` statements
- implement the option to generate triggers to be called before each `insert`, thus enabling error messages that quote the offending row; this could be enabled by registering a function with a suitable known name, such as `trigger_on_before_insert()`
Overwriting / Overriding / Shadowing Behavior
- pending; probably to be handled by life cycle methods
API
Class property plugins
- the class property `plugins` defines the so-called 'acquisition chain', which is the sequence of objects that are visited during the construction of a `Dbric` instance
- neutral values are `null`, an empty list, `[ 'prototypes', ]`, `[ 'me', ]` and `[ 'prototypes', 'me', ]`
- in addition to the optional entries `'prototypes'` and `'me'`, which indicate the relative positioning of the instance's prototype chain and the instance itself, suitable objects that act as Dbric plugins may be placed into the `plugins` list
Using Ersatz super() in Plugin methods
- functions defined in the `exports.methods` member of a plugin object will become methods of the database adapter instance and can be used like regular (non-plugin) methods
- a hitch with defining methods on a further unspecified object is that JavaScript limits the use of `super()` to instance methods
- for this reason, instead of using `super()` in your plugin methods, use the 'Ersatz Super Device' `@_super()`. `@_super()` needs to be told the name of the upstream method to be called. Typically, within a plugin method `frob()`, when you want to call the original version of that method, you'd use `@_super 'frob'`, followed by whatever are deemed the appropriate arguments. Other than that, any valid API name can be used when calling `@_super()`.
- In any event, `@_super method_name` will refer to the instance's method named `method_name`, not another plugin's method having the same name. Just as with named `statements`, methods coming later in the plugin chain will replace values of the same name that have been provided by a plugin coming earlier in the chain.
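The dispatch described above can be illustrated with a small, self-contained sketch (assumed semantics for illustration only, not the actual DBric code; all names here are made up):

```javascript
// Illustrative sketch: plugin methods are merged onto an instance, later
// plugins shadowing earlier names; `_super( name, ... )` dispatches to the
// instance's own (pre-plugin) method of that name, as described above.
const build_instance = ( base_methods, plugins ) => {
  const instance = Object.assign( {}, base_methods );
  instance._super = function ( name, ...args ) {
    if ( !( name in base_methods ) ) { throw new Error( `no upstream method ${name}` ); }
    return base_methods[ name ].call( this, ...args ); };
  for ( const plugin of plugins ) {
    Object.assign( instance, plugin.methods ); } // later plugins shadow earlier ones
  return instance; };

// Usage: the plugin's `frob()` shadows the base `frob()` but can still reach
// it by name, much like the `@_super 'frob'` call shown above.
const base   = { frob ( x ) { return x * 2; }, };
const plugin = { methods: { frob ( x ) { return this._super( 'frob', x ) + 1; }, }, };
const d      = build_instance( base, [ plugin, ] );
console.log( d.frob( 10 ) ); // → 21
```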
Dbric_classprop_absorber::_get_acquisition_chain() (private)
- returns a list of objects `{ type, contributor, }` that the capabilities of the `Dbric` instance will be based on
- uses class property `plugins`, q.v.
- order in list follows the logic of `Object.assign()` (i.e. later entries shadow earlier ones)
- always omitted from the list are `Object.getPrototypeOf {}`, `Object.getPrototypeOf Object`, `Dbric_classprop_absorber` and `Dbric`, since these never contribute to instance capabilities
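Since the chain follows `Object.assign()` semantics, the shadowing order can be demonstrated in miniature (generic JavaScript, unrelated to the actual contributor objects):

```javascript
// `Object.assign()` shadowing: later sources win for same-named keys,
// unshadowed keys from earlier sources survive.
const prototypes = { greet: () => 'hello from prototype', extra: 42, };
const me         = { greet: () => 'hello from instance', };
const merged     = Object.assign( {}, prototypes, me );
console.log( merged.greet() ); // → 'hello from instance' (later entry shadows earlier)
console.log( merged.extra );   // → 42
```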
To Do
- [—] DBric: parameterized views as in DBay `parametrized-views.demo`
- [—] adapter for recutils?
  - https://www.gnu.org/software/recutils/manual/recutils.html
  - https://news.ycombinator.com/item?id=46265811
- [—] check that setting `prefix` is valid both in JS and SQL contexts when used to form unescaped identifiers as in (JS) `object.$prefix_property` and (SQL) `create table $prefix_name`
- [—] rename class properties: `build` -> `build_statements`, `statements` -> `runtime_statements`
- [—] allow functions for the entire `@build` property or any of its elements (but not both; functions may not transitively return functions) as well as for the other class properties whose names start with one of `scalar_udf_`, `table_udf_`, `aggregate_udf_`, `window_udf_`, `virtual_table_udf_`; these functions will be called in the context of the instance and thus allow the use of values that are only known at runtime
- [—] allow single string for `build`, can be segmented
Examples for class definitions; notice use of symbolic $PREFIX
```coffee
class My_db extends Dbric_std

  @build: [
    SQL"create table words ( w text );",
    SQL"insert into words ( w ) values ( 'first' );",
    ]

  @create_statement_$PREFIX_insert_word: SQL"insert into words ( w ) values ( $w );"

  @create_statement_$PREFIX_select_word: SQL"select w as word from words where w regexp $pattern;"

  @create_scalar_udf_$PREFIX_square:
    deterministic:  true
    value:          ( n ) -> n * n

  @create_table_udf_$PREFIX_letters_of:
    deterministic:  true
    value:          ( word ) -> yield chr for chr in Array.from word

  @create_aggregate_udf_your_name_here:     ...
  @create_window_udf_your_name_here:        ...
  @create_virtual_table_udf_your_name_here: ...
```

Values can be represented as functions:
```coffee
class My_db extends Dbric_std

  @build: -> [
    SQL"create table words ( w text );",
    SQL"insert into words ( w ) values ( #{LIT @cfg.first_word} );",
    ]

  @create_statement_$PREFIX_insert_word: ->
    ...
    return SQL"insert into words ( w ) values ( $w );"

  @create_statement_$PREFIX_select_word: SQL"select w as word from words where w regexp $pattern;"

  @create_scalar_udf_$PREFIX_square: ->
    value = if whatever then ( ( n ) -> n * n ) else ( ( n ) -> n ** 2 )
    return { value, }

  @create_table_udf_$PREFIX_letters_of:
    deterministic:  true
    value:          ( word ) -> yield chr for chr in Array.from word

  @create_aggregate_udf_your_name_here:     ...
  @create_window_udf_your_name_here:        ...
  @create_virtual_table_udf_your_name_here: ...
```

- [—] what about transitive plugins, i.e. plugins as declared by a base class?
- [—] implement life cycle methods to be called at various points during instantiation; might use these to handle name clashes
Won't Do
- [+] abandoned prefix schema altogether because implementation effort appears to be unbalanced with realistically assumed benefits; implemented parts remain in code for the time being
InterMission: Tables and Methods to Handle Integer Intervals
Ranges / Integer Intervals
- **Points**: The smallest entity of a hoard, represented as an integer; integers between `0x00_0000` and `0x10_ffff` can alternatively be represented as their corresponding Unicode character (i.e. a string that will yield an array of length one when passed into `Array.from string`; it may have `string.length == 2` notwithstanding, which is an unfortunate legacy of JavaScript being based on UTF-16 code units, not Unicode 32-bit code points).
- **Bounds**:
  - In the case of a Run instance, its defining `lo`west and `hi`ghest points.
  - In the case of a Scatter instance, its minimal and maximal points found among the `lo`s and `hi`s of its constituent runs, if any.
  - In the case of a Hoard instance, its `first` and `last` points, which demarcate the maximal extent of any runs it contains via its constituent scatters. The default has `first: 0x00_0000, last: 0x10_ffff` so as to accommodate all Unicode code points, although in an actual implementation one may want to set `first: 0x00_0001` so as to avoid `'\x00'` for compatibility with traditional C string processing.
- **Run**: A span of consecutive integers `i` defined by two numbers `lo` and `hi` such that `lo <= hi` and `lo <= i <= hi`; empty intervals are not representable, nor are non-contiguous runs possible; for single-integer runs `lo == i == hi` holds.
- **Scatter**: A set of Runs that is associated with a shared set of data. For example, the set of Unicode code points that are Latin letters and that are representable with a single UTF-8 byte is commonly seen in the regular expression `/[A-Za-z]/`, which in our model can be represented as a scatter `[ (0x41,0x5a), (0x61,0x7a), ]` (i.e. `[ ('A','Z'), ('a','z'), ]`).
  - The canonical ordering of a scatter is by ascending `lo` bounds, then by ascending `hi` bounds for runs that share `lo` points.
  - A normalized (a.k.a. 'simplified') scatter has no overlapping and no directly adjacent runs and is sorted by ascending bounds; thus, `[ (3,7), (0,5), ]` and `[ (0,5), (6,7), ]` both get normalized to `[ (0,7), ]`, but `[ (0,5), (7,7), ]` is already normal as it is sorted and the gap at `(6,6)` prevents further simplification.
  - A crucial functionality of our model is the ability to build non-normalized scatters which can at some later point get normalized to a lesser or greater degree depending on their associated data.
  - Scatter of a single Universal Inclusion Run (`0x00_0000..0x10_ffff`); this sets data `ugc:Cn` (Unassigned, 'a reserved unassigned code point or a noncharacter') as a baseline for all other, more specific data; it also reflects the absolute boundaries of the universe of discourse. Among other things, this enables set algebra with finite complementary sets such as 'the set of all code points that are not classified as `ugc:Lu` (uppercase Letters)'.
  - Multiple scatters of multiple universal Exclusion / Gap / Hole Runs
- **Hoard** (Horde?) or Layered Scalar Property Set: An unordered collection of any number of positive and negative scatters plus one neutral scatter.
  - The neutral scatter `s0` defines the universe of discourse (universal set, G. Grundmenge); its bounds `s0.lo`, `s0.hi` define the lowest `r.lo` and the highest `r.hi` that any normalized run `r` can have. Since the single neutral scatter must axiomatically be contiguous, it can be represented by a single run whose sole properties `r0.lo`, `r0.hi` can then be implemented as properties of the SPS.
  - **Hits**: The positive or inclusive scatters determine which integers are part of the universal set, i.e. which are considered 'hits' with respect to one or more scatters; the (one or more) scatters that are hit by a given element (point, scalar, integer) determine which properties that element will be associated with.
  - **Gaps**: The negative or exclusive scatters determine which integers are not part of the 'set of interest', i.e. which are considered 'holes' or 'gaps'. Gaps take precedence over hits in the sense that any element that is found in both positive and negative scatters is considered a gap and not a hit.
  - **Layers** are selectable sets of hit scatters so element properties can be selected by purpose (ex. different virtual / composite fonts for different printing styles).
    - To this end, hit layers can add 'tags', which are arbitrary data items that can be selected by way of an API. The universal set and gap scatters could likewise have tags, but the present intention is to probably disallow those and at any rate disregard any tags on non-hit scatters when selecting.
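The normalization rule described above (sort by ascending bounds, then merge overlapping and directly adjacent runs) can be sketched in a few lines; this is an illustration of the idea, not the InterMission API, and the names are made up:

```javascript
// Hypothetical scatter normalization: runs are [ lo, hi ] pairs. Sort by
// ascending ( lo, hi ), then merge any run that overlaps or is directly
// adjacent ( run.lo <= prev.hi + 1 ) into its predecessor.
const normalize = ( runs ) => {
  const sorted = [ ...runs, ].sort( ( a, b ) => ( a[ 0 ] - b[ 0 ] ) || ( a[ 1 ] - b[ 1 ] ) );
  const R      = [];
  for ( const [ lo, hi, ] of sorted ) {
    const prev = R[ R.length - 1 ];
    if ( ( prev != null ) && ( lo <= prev[ 1 ] + 1 ) ) { prev[ 1 ] = Math.max( prev[ 1 ], hi ); }
    else                                               { R.push( [ lo, hi, ] ); } }
  return R; };

console.log( normalize( [ [ 3, 7, ], [ 0, 5, ], ] ) ); // → [ [ 0, 7 ] ]
console.log( normalize( [ [ 0, 5, ], [ 6, 7, ], ] ) ); // → [ [ 0, 7 ] ] (adjacent runs merge)
console.log( normalize( [ [ 0, 5, ], [ 7, 7, ], ] ) ); // → [ [ 0, 5 ], [ 7, 7 ] ] (gap at 6 preserved)
```

The three calls reproduce the three normalization examples given above.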
Implementation of Hoards in SQL:
The below implementation notes assume use of hoard data structures specifically for the purpose of classifying Unicode code points (i.e. practically speaking, glyphs) and determining associated properties; given a code point (CP), such data structures should be able to answer questions like: Is this CP a CJK ideograph? Is it part of IICORE (i.e. moderately frequently used / not rare)? Assuming a bold gothic (or handwritten, or running text) typesetting context, which font should be used and what typographic tweaks should be applied to the outline when rendering it?

The boundaries represented by the universal set can probably best be hard-coded as SQL `check` constraints `lo between 0x000000 and 0x10ffff`, `hi between 0x000000 and 0x10ffff`. These are not moving targets and already represent the entire neutral scatter.

```sql
create table jzr_glyphruns (
    rowid    text    unique not null generated always as ( 't:uc:rsg:V=' || rsg ),
    scatter  text    not null,
    lo       integer not null,
    hi       integer not null,
    -- primary key ( rowid ),
    foreign key ( scatter ) references jzr_glyphscatters ( rowid ),
    constraint "Ωconstraint___5" check ( lo between 0x000000 and 0x10ffff ),
    constraint "Ωconstraint___6" check ( hi between 0x000000 and 0x10ffff ),
    constraint "Ωconstraint___7" check ( lo <= hi ),
    constraint "Ωconstraint___8" check ( rowid regexp '^.*$' ) );

create table jzr_glyphscatters (
    rowid  text unique not null generated always as ( 't:uc:rsg:V=' || rsg ),
    data   json not null
    -- primary key ( rowid )
    );

create table jzr_glyphhoard (
    rowid  text unique not null generated always as ( 't:uc:rsg:V=' || rsg ),
    data   json not null
    -- primary key ( rowid )
    );
```
To Do
- [—] reject floats
- [—] implement UR bounds, default `0x00_0000..0x10_ffff`
Prototype Tools
Usage Example
```coffee
{ enumerate_prototypes_and_methods,
  wrap_methods_of_prototypes, } = require 'bricabrac-sfmodules/lib/prototype-tools'
```

- `enumerate_prototypes_and_methods = ( clasz ) ->`: Given an ES class object, return an object whose keys are method names and whose values are objects in the shape of `{ prototype, descriptor, }`, where `prototype` contains the object that the method was found on and `descriptor` is the property descriptor as returned by `Object.getOwnPropertyDescriptor()`. The algorithm will start by examining `clasz::` (i.e. `clasz.prototype`) for own property descriptors, add all descriptors whose values are `function`s, and then proceed to do the same recursively with the result of `Object.getPrototypeOf clasz::` and so on until it arrives at the end of the prototype chain. The result is useful to wrap methods of classes (and is used by `wrap_methods_of_prototypes()`, below) while ensuring that all instances of those classes use the wrapped versions right from instance creation onwards.
- `wrap_methods_of_prototypes = ( clasz, handler = -> ) ->`: Given an ES class object and a `handler` function, re-define all `function`s defined on `clasz::` (i.e. `clasz.prototype`) and its prototypes to call the handler instead of the original method. The handler will be passed an object `{ name, fqname, prototype, method, context, P, callme, }` where `name` is the method's name, `fqname` is the concatenation of the prototype's constructor's name, a dot, and the method's name, `prototype` is the object it was found on, `context` represents the instance (what the docs call `thisArg`, i.e. the first argument to `Function::apply()` and `Function::call()`), `P` is an array with the arguments, and `callme` is a ready-made convenience function defined as `callme = ( -> method.call @, P... ).bind @` that can be called by the handler to execute the original method and obtain its return value.

This setup gives handlers a maximum of flexibility to intercept and change arguments, to measure the execution time of methods, and to look at and change their return values. A conservative wrapper that only takes notes on which methods have been called would not need to make use of `prototype`, `method`, `context`, or `P` and could look like this:

```coffee
counts  = {}
handler = ({ name, fqname, prototype, method, context, P, callme, }) ->
  counts[ name ] = ( counts[ name ] ? 0 ) + 1
  return callme()
```

A more invasive handler could record the return value as `R = method.call context, P...` (possibly with extra arguments appended) and access `R` before returning it as a proxy for the original method's result.

An important characteristic of `wrap_methods_of_prototypes()` is that the class you pass in and all the classes it extends, directly or transitively, will get their methods wrapped, which will affect the entirety of the current execution context (the process). This is different from instrumenting instances, where it is easier to restrict the effects of instrumentation to the scope of, say, a single unit test case—typically, each test case in your entire test suite will get to use a wrapped version of the instrumented class once you have instrumented it with `wrap_methods_of_prototypes()`.
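The wrapping mechanism can be sketched in plain JavaScript; this is an illustration of the technique as described above, not the library's actual code (the function and class names here are made up):

```javascript
// Walk the prototype chain of a class and replace each own function-valued
// property with a wrapper that delegates to the handler.
const wrap_methods = ( clasz, handler ) => {
  let walker = clasz.prototype;
  while ( ( walker != null ) && ( walker !== Object.prototype ) ) {
    const prototype = walker; // stable binding for the closures below
    for ( const name of Object.getOwnPropertyNames( prototype ) ) {
      if ( name === 'constructor' ) { continue; }
      const descriptor = Object.getOwnPropertyDescriptor( prototype, name );
      if ( typeof descriptor.value !== 'function' ) { continue; }
      const method = descriptor.value;
      prototype[ name ] = function ( ...P ) {
        const context = this;
        const callme  = () => method.call( context, ...P );
        return handler( { name, prototype, method, context, P, callme, } ); }; }
    walker = Object.getPrototypeOf( walker ); } };

// Usage: the conservative counting handler from above, applied to a toy class.
class Greeter { hi ( name ) { return `hi, ${name}`; } }
const counts = {};
wrap_methods( Greeter, ( { name, callme, } ) => {
  counts[ name ] = ( counts[ name ] ?? 0 ) + 1;
  return callme(); } );
const g = new Greeter();
console.log( g.hi( 'world' ) ); // → 'hi, world'
console.log( counts );          // → { hi: 1 }
```

Since the prototype itself is modified, every instance of `Greeter`—whether created before or after wrapping—uses the wrapped method, which is the point made in the last paragraph above.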
To Do
- [—] Examples
- [—] Should extend to cover the other callable types (generator functions, async functions, ...)
Is Done
- [+] Move `enumerate_prototypes_and_methods()` and `wrap_methods_of_prototypes()` to `object-tools`
Coverage Analyzer
Usage Example
```coffee
{ Coverage_analyzer, } = require '../../../apps/bricabrac-sfmodules/lib/coverage-analyzer'
ca = new Coverage_analyzer()
ca.wrap_class My_class
db = new My_class()
debug 'Ωbbdbr_320', ca
warn  'Ωbbdbr_320', ca.unused_names
help  'Ωbbdbr_320', ca.used_names
help  'Ωbbdbr_320', ca.counts
```

To Do
Is Done
- [+] Move `enumerate_prototypes_and_methods()` and `wrap_methods_of_prototypes()` to `object-tools`
Unsorted
> [!NOTE]
> Documentation in this section is considered WTBD
To Do
Infrastructure for `letsfreezethat`

- Clone actions: `take`, `toss`, `call`, `fallback`, `error`, `assign`, `dive`
- rename clone -> project

```coffee
s =
  take:    Symbol 'take'
  toss:    Symbol 'toss'
  call:    Symbol 'call'
  error:   Symbol 'error'
  assign:  Symbol 'assign'
  dive:    Symbol 'dive'
s.fallback = s.error
```
```coffee
clone = ( x, howto = new Howto() ) ->
  if x?
    prototype = Object.getPrototypeOf x
    R = if prototype? then ( new x.constructor ) else ( Object.create null )
    switch action
      when 'assign'
        Object.assign R, x
      when 'dive'
        for k, v of x
          R[ k ] = clone v
      else throw new Error "Ω___8 unknown action #{rpr_string action}"
    return R
  else
    prototype = null
    R = x
```

```coffee
p = Object.getPrototypeOf
debug 'Ωjzrsdb__11', p {}
debug 'Ωjzrsdb__12', p 8
debug 'Ωjzrsdb__13', p Bsql3
debug 'Ωjzrsdb__14', p new Bsql3()
debug 'Ωjzrsdb__15', ( p -> ) is ( p -> )
```

```coffee
misfit = Symbol 'misfit'
clone  = ( x, seen = new Map() ) ->
```

Fast Line Reader

- [+] fix bugs where start-of-lines are missing with small `chunk_size`s
- [—] allow alternative sources for buffers
- [—] ensure compatibility with `GUY.fs.walk_lines_with_positions()`, especially
  - [—] recognition of different line endings
  - [—] treatment of `\r` in the vicinity of `\n`
- [+] treatment of trailing empty lines
  - [—] `( '\n' ).split /\r\n|\r|\n/` gives `[ '', '', ]`, so this method should do the same
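The split behavior referenced in the last item can be checked directly in Node:

```javascript
// Reference behavior for line splitting: a string consisting of a single
// newline splits into two empty strings (the empty line before the
// terminator and the empty remainder after it), and the same alternation
// recognizes all three common line endings.
console.log( ( '\n' ).split( /\r\n|\r|\n/ ) );           // → [ '', '' ]
console.log( ( 'a\r\nb\rc\nd' ).split( /\r\n|\r|\n/ ) ); // → [ 'a', 'b', 'c', 'd' ]
```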
Coarse SQLite Statement Segmenter
- meant to be the basis for being able to read SQLite DB dump files from within a DBric application
- necessitated by the fact that calls to UDFs may be present in DDL statements for generated columns, views, and triggers
- even using the `REGEXP` operator will fail with the `sqlite3` command line tool in case a regular expression with capabilities beyond what `sqlite3` offers is used (e.g. it doesn't understand `\p{...}` escapes, which JavaScript does with the `v` flag, as in `/^\p{L}+/v`)
- to do its job, the segmenter has to locate those semicolons in an SQL source that are not part of comments, string literals, or quoted names, and are not statement-internal syntax
- turns out that with a lexer that recognizes line comments (double slash `//` to the end of line), block comments (enclosed in `/* ... */`), string literals (using single quotes `'...'`), and quoted names (enclosed in either quotes as in `"name"` or brackets as in `[name]`), the only remaining cases are statement-internal semicolons; those can only appear in `CREATE TRIGGER` statements.
- Unfortunately, since the relevant portions of SQLite's SQL syntax allow arbitrary expressions in the crucial parts of `CREATE TRIGGER` statements, combined with the fact that SQLite is happy not only to accept but also to emit—also, and entirely unnecessarily, in dump files—unquoted names that are also keywords (such as in `delete from end where end = 'x' returning end;`, which is, incredibly, valid SQL as understood by SQLite), recognizing all top-level statement-final semicolons is beyond what can be done with a lexer without turning it into a more-or-less full-fledged parser.
- We therefore accept that lexing can capture a good portion, but not all, of what one might encounter in a file full of SQL statements and provide—next to a `Segmenter` class that does its best to fish good candidates for portions of text that represent exactly one statement—another class, `Undumper`, that, when given a `DBric` instance, will walk over the segments (candidate statements) yielded by the `Segmenter` and apply them to the `DBric` database; if this should result in an `incomplete source` error, it will then append the next segment to the source and try again; if the source file is well-formed, this will eventually lead to the database accepting the input. This is not a beautiful state of affairs, but it's also irrelevant to performance in realistic cases.
- To this we may add that, given how SQL is commonly written and what SQLite itself produces for its dump files, it can be said with a high degree of certainty that the syntactically relevant semicolons of SQL source text are found as the last character of lines. That is, as soon as we have the 'incomplete source' coping mechanism in place, we can actually fall back to a much faster way to process the source—essentially only looking for semicolons at the end of lines, `/;[\x20\x09]*\n/gm`.
- two modes of operation:
  - `{ mode: 'fast', }`: assume only line-trailing semicolons. This should be compatible with DB dump files produced by the SQLite command line tool. Observe that since `fast` mode gives a 20x speed gain over `slow` mode, it has been made the default. If the assumption is violated by the input, behavior is undefined but will likely result in SQL errors; in that case, try `slow` mode.
  - `{ mode: 'slow', }`: scan source for string literals, comments and so on
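The `fast` strategy—cut only at line-trailing semicolons—can be sketched with the regex quoted above (an illustration of the idea, not the library's `Segmenter` class):

```javascript
// Split an SQL source only where a semicolon, optionally followed by spaces
// or tabs, ends a line; mid-line semicolons (e.g. inside string literals)
// are left alone, which matches the layout of SQLite shell dump files.
const segment_fast = ( source ) => {
  return source
    .split( /;[\x20\x09]*\n/ )          // cut after line-trailing semicolons
    .map( ( part ) => part.trim() )
    .filter( ( part ) => part.length > 0 )
    .map( ( part ) => part + ';' ); };  // restore the statement terminator

const dump = `create table words ( w text );
insert into words ( w ) values ( 'a; b' );
select * from words;
`;
console.log( segment_fast( dump ) );
// → [ "create table words ( w text );",
//     "insert into words ( w ) values ( 'a; b' );",
//     "select * from words;" ]
```

Note how the semicolon inside the string literal `'a; b'` survives because it is not followed by end-of-line; a statement body that spans several lines would likewise stay in one segment until its line-final semicolon.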
SQLite Undumper
Assuming `db` is an instance of DBric, better-sqlite3, or NodeJS `SQLITE.DatabaseSync`, then from within
your application, when you already have all necessary UDFs declared, a single call to `Undumper.undump()`
will read a dump file line by line, look for statements, apply them to the `db` object's `.exec()` or
`.execute()` method, and return a number of statistics when finished; while it's doing its job, it'll
display a nice progress bar in the terminal:
```coffee
{ Undumper, } = SFMODULES.require_sqlite_undumper()
path          = 'path/to/my-db.dump.sql'
statistics    = Undumper.undump { db, path, } # default is { mode: 'fast', }
# statistics:
#   { line_count:       102726,
#     statement_count:  102600,
#     dt_ms:            1724.422669,
#     statements_per_s: 59498 }
```

```
read_and_apply_dump:  19 % ▕██▌       ▏
read_and_apply_dump:  dt:     1,724.423 ms
read_and_apply_dump:  n:    102,700.000
read_and_apply_dump:  ???:       16.791 ms/1k
read_and_apply_dump:  f:     59,556.164 Hz
```

JetStream
- JetStream is a utility to construct data processing pipelines from a series of data transforms.
- Each transform is a generator function that accepts one data item and yields any number of transformed data items.
- Because of this setup, each transform can choose to ignore a data item, to swallow it, or to produce many new data items in response.
- This makes JetStream much more flexible and useful than approaches that depend on non-generator functions. You can still build useful stuff with those, but they're inherently incapable of, say, turning a (stream of) string(s) into a stream of characters.
- Currently JetStream only uses synchronous transforms.
- Actually more of a bucket chain than a garden hose, for what it's worth.
- Transforms can be 'configured' to receive only some, not all, data items; the same transform can be re-used in multiple configurations in a single pipeline.
- Default is for a transform to receive all data items but no cues.
- Whatever the last transform in the pipeline `yield`s becomes part of the pipeline's output, except that it will be implicitly filtered by a conceptual 'outlet' transform. The default for the outlet is the same as that for any transform (i.e. all data items, no cues); this can be changed by configuring the pipeline:
  - at instantiation time: `jet = new Jetstream { outlet: 'data,#stop', }`
  - dynamically: `jet.configure { outlet: 'data,#stop', }`
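The transform model above can be illustrated with a minimal, self-contained generator pipeline (plain JavaScript, not JetStream's API; selectors, cues, and the outlet are omitted):

```javascript
// Each transform is a generator taking one item and yielding zero or more
// items, so a transform may explode, swallow, or multiply its input.
function* explode     ( d ) { yield* d; }                                 // one string → many characters
function* keep_vowels ( d ) { if ( 'aeiou'.includes( d ) ) { yield d; } } // may swallow items
function* double      ( d ) { yield d; yield d; }                         // one item → two items

// Feed every item through each transform in turn, collecting all yields.
const run_pipeline = ( transforms, items ) => {
  let current = items;
  for ( const transform of transforms ) {
    const next = [];
    for ( const d of current ) { next.push( ...transform( d ) ); }
    current = next; }
  return current; };

console.log( run_pipeline( [ explode, keep_vowels, double, ], [ 'jetstream', ] ) );
// → [ 'e', 'e', 'e', 'e', 'a', 'a' ]
```

The first transform turns one string into many characters—exactly the kind of step that non-generator (one-in, one-out) pipeline models cannot express.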
JetStream: Instantiation, Configuration, Building
- `Jetstream::constructor: ( cfg ) ->`—`Jetstream::configure: ( cfg ) ->`—dynamically set properties to determine pipeline characteristics. `cfg` should be an object with the following optional keys:
  - `outlet`—a JetStream selector that determines the filtering to be applied after the last transform for a given item has finished and before that value is made available in the output. Default is `'data'`, meaning all data items but no cues will be considered for output.
  - `pick`—after result items have been filtered as determined by the `outlet` setting, apply a sieve to the stream or list of results. Default is `{ pick: 'all', }`, meaning 'return all results' (as an iterator for `walk()`, as a list for `run()`). `'first'` will pick the first, `'last'` the last element (of the stream or list); observe that in these cases, `walk()` will still be an iterator (over zero or one values), but `run()` will return only the first or last value instead of a possibly empty list. If there are no results, calling `run()` will cause an error unless a `fallback` has been explicitly set.
  - `fallback`—determine a return value for `run()` to be used in case no other values were produced by the pipeline.
  - Observe that no matter whether you use `pick: 'all'`, `pick: 'first'`, or `pick: 'last'`—when you call `Jetstream::run()`, all transforms will be called the same number of times with the same values. The same is true when you use `Jetstream::walk()` and make sure the generator runs to completion.
  - `empty_call`—the value to send, if any, when `Jetstream::walk()` or `Jetstream::run()` are called without any data values and an empty 'shelf' (i.e. no unprocessed data from calls to `Jetstream::send()`). This allows starting pipelines with a value-producing transform that `yield`s values from some source whenever it gets called; whether it drops or passes on the value of `Jetstream::cfg.empty_call` is up to the implementation.
- `Jetstream::push: ( P..., t ) ->`—add a transform `t` to the pipeline. `t` can be a generator function or a non-generator function; in the latter case the transform is called a 'watcher', as it can only observe items. If `t` is preceded by one or several arguments, those arguments will be interpreted as configurations of `t`. So far, selectors are the only implemented configuration option.
JetStream: Adding Data
- `Jetstream::send: ( ds... ) ->`—'shelve' zero or more items in the pipeline; processing will start when `Jetstream::walk()` is called.
- `Jetstream::cue: ( id ) ->`—create a public Symbol from `id` and `Jetstream::send()` it. Convenience method equivalent to `Jetstream::send Symbol.for id`.
JetStream: Running and Retrieving Results
- `Jetstream::walk: ( ds... ) ->`—'shelve' zero or more items in the pipeline and return an iterator over the processed results. Calling `Jetstream::walk d1, d2, ...` is equivalent to calling `Jetstream::send d1, d2, ...` followed by `Jetstream::walk()`. When the iterator stops, the pipeline has been exhausted (no more shelved items); further processing will only occur when at least one item has been sent and `Jetstream::walk()` has been called.
- `Jetstream::run: ( ds... ) ->`—same as calling `Jetstream::walk()` with the same arguments, but will return either a list containing all results or—depending on configuration—a single result.
- `Jetstream::pick_first: ( ds... ) ->`—same as calling `[ ( Jetstream::walk()... )..., ]` with the same arguments, and either picking the first value in the list, or, if it's empty, using the configured `fallback` value, or else throwing an error. Observe that for a pipeline that is configured to always `pick` the first or last value, using `Jetstream::pick_first()` will behave just like `Jetstream::run()`.
- `Jetstream::pick_last: ( ds... ) ->`—same as calling `[ ( Jetstream::walk()... )..., ]` with the same arguments, and either picking the last value in the list, or, if it's empty, using the configured `fallback` value, or else throwing an error. Observe that for a pipeline that is configured to always `pick` the first or last value, using `Jetstream::pick_last()` will behave just like `Jetstream::run()`.
JetStream: Note on Picking Values
The result of a JetStream run is always a (possibly empty) list of values, unless either the stream has been
configured to pick the last or the first value, or `Jetstream::pick_first()` or `Jetstream::pick_last()` have
been called. The semantics of picking or getting singular values have been intentionally designed so that
the least possible change is made with regard to the calling of transforms and the handling of intermediate
values. This also means that if your pipeline computes a million values of which you only need the first—one
that doesn't depend on any other value in the result list—then `{ pick: 'first', }` is probably not the right
tool, because in order to get that single value, each transform will still be called a million times. One
should rather look for a cutoff point early on in the input and terminate processing as soon as possible,
rather than burdening the pipeline with throwaway values.
JetStream: Selectors
to be rewritten
- [—] When instantiating a pipeline (`new Jetstream()`), should it be possible to register cues? Registered cues would then only be sent into transforms that are configured to listen to them (ex. `$ { first, }, ( d ) -> ...`). Signals can be sent by transforms or the `Jetstream` API.
- [—] problem with that: composing pipelines. Transforms rely on testing for `d is whatever_cue`, which fails when a sub-pipeline has been built with a different private symbol
- [—] maybe treat all symbols specially? Could match `s1 = Symbol 'A'`, `s2 = Symbol 'A'` by demanding configuration of `$ { A, }, ( d ) -> ...`, matching the string value of symbols
- [—] 'Signals' are meta data as opposed to 'common' / 'business' data. As such, cues should, in general, only be sent into those transforms that are built to digest them; ex. when you have a transform `( d ) -> d ** 2`, that transform will fail when anything but a number is sent into it. That's a Good Thing if the business data itself contained something other than numbers (now you know your pipeline was incorrectly constructed), but a Bad Thing if it happens because the transform was called with a cue it didn't ask for and isn't prepared to deal with.
- [—] hold open the possibility to send arbitrary structured data as cues (meta data), not only `Symbol`s
- [—] The Past: The way we've been dealing with cues is we had a few known ones like `first`, `before_last`, `last`, and so on; the user would declare them with the `$` (transform configurator) method, using values of their own choosing. Most of the time, cue values are declared in the application as appropriately named private symbols right before usage, as in `first = Symbol 'first'`; then the transform gets declared and added as `$ { first, }, t = ( d ) -> ...`; finally, in the transform, a check `d is first` is used to sort out meta data from business data.
  This all hinges on the pipeline object (`Jetstream` instance) knowing the names (`first`, `before_last` and so on) and their semantics (so names are a controlled vocabulary), and the transform knowing their identity (because you can't check for a specific private symbol if you don't hold that symbol). In essence, we're using the same parameter `d` to transport both business data and meta data.
- [—] The Future:
  - Meta data has distinct types: private symbols, public symbols, instances of class `Signal`.
  - Each piece of meta data has a name; for symbols `s`, that's `( String s )[ 7 ... ( String s ).length - 1 ]`.
  - Meta data is only sent to transforms that are explicitly configured to handle it.
  - Generic configuration could use `$ { select, }, ( d ) -> ...` where `select` is a boolean or a boolean function.
  - The default is `select: ( d ) -> not @is_cue d` (or `select: 'data'`), i.e. 'deselect all cues'. `select: -> true` (or indeed `select: true`) means 'send all business and meta data'. `select: false` indicates 'transform not used'. `select: 'cues'` means 'send all cues but no data'.
  - ### TAINT unify usage of 'meta', 'cue'
  - Unselected data that is not sent into the transform is to be sent on to the next transform.
  - The custom `select()` function will be called in a context that provides convenience methods.
  - As a shortcut, a descriptive string may be used to configure selection:
    - format similar to CSS selectors
    - `'data'`: select all business data, no cues (the default)
    - `'cue'`: select all cues, no data
    - `'cue, data'`: select all data and all cues (same as `select: ( -> true )`)
Another approach:

- `Jetstream::push()` defined as `( selectors..., transform ) -> ...`
- `selectors` can be one or more single selectors and arrays of selectors
- will be flattened, meaning `Jetstream::push s1, [ s2, [ s3, ], ], s4, t` means the same as `Jetstream::push s1, s2, s3, s4, t`
- at first we don't support concatenation of selectors, only series of disjunct selectors
  - concatenated selectors will likely have to default to logical conjunction ('and')
  - as such, concatenating selectors with `,` (comma) will likely be used to indicate disjunction ('or'), as in CSS
- the transform gets to see an item when (at least) one selector matches
- a missing selector expands to the `data` selector; by default, transforms get to see only data items, no cues, which is the right thing to do in most cases
- an empty selector selects nothing, so the transform gets skipped; as is true for transforms that do not accept everything, unselected items are sent to the successor of the current transform
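The flattening rule is the easy part; a sketch in plain JavaScript (the helper name is illustrative):

```javascript
// A nested list of selectors is equivalent to the flat series,
// so `push s1, [ s2, [ s3, ], ], s4, t` equals `push s1, s2, s3, s4, t`.
const flatten_selectors = (selectors) => selectors.flat(Infinity);

const flat = flatten_selectors(['s1', ['s2', ['s3']], 's4']);
// flat is ['s1', 's2', 's3', 's4']
```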
Selector semantics:

- `data` matches business data items (implicitly present)
- `cue` matches cues (opt-in); equivalent to `:not(data)`
- `cue#first` matches cues with ID (name) `first`
- `cue#last` matches cues with ID (name) `last`
- `cue#first, cue#last` (or `[ 'cue#first', 'cue#last', ]`) matches cues with IDs `first` or `last`
- `#first`, `#last`: same; ID selectors implicitly refer to `cue`, therefore `#first` equals `cue#first`
- `*`, `data, cue`, and `data#*, cue#*` are alternatives to mean 'all data items and all cues'
- `:not(data)` prevents business data items from being sent (opt-out); since all items are classified as either `data` or `cue`, it implicitly selects all cues
- `:not(cue)` prevents cues from being sent (implicitly present); implicitly selects all `data` items
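An illustrative sketch of this selector grammar in plain JavaScript (not the package's actual implementation; a comma-separated list is a disjunction, and `#name` is shorthand for `cue#name`):

```javascript
// Classify items: symbols are cues, everything else is business data.
const classify = (d) => (typeof d === 'symbol' ? 'cue' : 'data');
const cue_id   = (d) => (typeof d === 'symbol' ? String(d).slice(7, -1) : null);

const matches_one = (selector, d) => {
  selector = selector.trim();
  if (selector === '*') return true;
  if (selector.startsWith(':not(')) return !matches_one(selector.slice(5, -1), d);
  let [type, id] = selector.split('#');
  if (type === '') type = 'cue';                 // '#first' implies 'cue#first'
  if (type !== classify(d)) return false;
  return id == null || id === '*' || id === cue_id(d);
};

// The item is seen when at least one selector matches (disjunction):
const matches = (selectors, d) =>
  selectors.split(',').some((s) => matches_one(s, d));

// matches('cue#first, cue#last', Symbol('first')) is true
// matches(':not(cue)', 42) is true; matches(':not(cue)', Symbol('x')) is false
```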
Jetstream data items are conceptualized as HTML elements:

```
"abc"              -> <data type=text   value='abc'/>
876                -> <data type=float  value='876'/>
Symbol 'first'     -> <cue  type=symbol id=first/>
Symbol.for 'first' -> <cue  type=symbol id=first/>
```

```
stream.push 'data', '#first', '#last', ( d ) ->
```

### See Also
- in case more complicated selectors must be parsed: https://github.com/fb55/css-what
## Loupe, Show

- add `cfg` parameter
- implement stripping of ANSI codes
- implement 'colorful', 'symbolic' mode
- implement callbacks for specific types / filters
## Random

- provide an alternative to the ditched `unique`, such as filling a `Set` to a certain size with characters
- provide internal implementations that capture attempt counts for testing, better insights
- use a custom class for `stats` that handles excessive retry counts
- implement iterators
- should `on_exhaustion`, `on_stats`, `max_retries` be implemented for each method?
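The 'fill a `Set` to a certain size' idea, with attempt counting, can be sketched in plain JavaScript (names are illustrative, not the package API):

```javascript
// Retry the producer until the set holds `size` distinct values or the
// retry budget is exhausted; the attempt count is returned alongside the
// values, which gives better insights for testing.
const collect_unique = (produce, size, max_retries = 1000) => {
  const seen = new Set();
  let attempts = 0;
  while (seen.size < size) {
    if (attempts++ >= max_retries) throw new Error('exhausted retries');
    seen.add(produce());
  }
  return { values: [...seen], attempts };
};

// Example with a deterministic producer cycling over three characters:
let counter = 0;
const { values, attempts } = collect_unique(() => 'abc'[counter++ % 3], 3);
// values is ['a', 'b', 'c']; attempts is 3
```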
### Random: Implementation Structure

The library currently supports four data types to generate instance values for: `float`, `integer`, `chr`, `text`.

For each case, instance values can be produced...

- ...that are not smaller than a given `minimum` and not larger than a given `maximum`
- ...that are filtered according to a given RegEx pattern or an arbitrary function
- ...that, in the case of `text`s, are not shorter and not longer than a given pair of `minimum_length` and `maximum_length`
- ...that are unique in relation to a given collection (IOW that are new to a given collection)
The foundational Pseudo-Random Number Generator (PRNG) that enables the generation of pseudo-random values is a piece of code that I found on the Internet (duh); it is called SplitMix32 and is, according to the poster,

> A 32-bit state PRNG that was made by taking MurmurHash3's mixing function, adding a incrementor and tweaking the constants. It's potentially one of the better 32-bit PRNGs so far; even the author of Mulberry32 considers it to be the better choice. It's also just as fast.

Like JavaScript's built-in `Math.random()` generator, this PRNG generates evenly distributed values `t` between `0` (inclusive) and `1` (exclusive) (i.e. `0 ≤ t < 1`); but unlike `Math.random()`, it can be given a `seed` to set its state to a known fixed point, from which the series of random numbers to be generated will remain constant for each instantiation. This randomly-deterministic (or deterministically random, or 'random but foreseeable') operation is valuable for testing.

Since the random core value `t` (accessible as `Get_random::_float()`) is always in the interval `[0,1)`, it's straightforward to both scale (stretch or shrink) it to any other length `[0,p)` and / or transpose (shift left or right) it to any other starting point `[q,q+1)`, meaning it can be projected into any interval `[min,max)` by computing `j = min + ( t * ( max - min ) )`. That projected value `j` can then be rounded e.g. to an integer number `n`, and that integer `n` can be interpreted as a Unicode code point and used in `String.fromCodePoint()` to obtain a 'character'. Since many Unicode code points are unassigned or denote control characters, `Get_random` methods will filter code points to include only 'printable' characters. Lastly, characters can be concatenated to strings which, again, can be made shorter or longer, be built from filtered code points from a narrowed set like, say, `/^[a-zA-ZäöüÄÖÜß]$/` (most commonly used letters to write German), or adhere to some predefined pattern or other arbitrary restrictions. It all comes out of `[0,1)`, which I find amazing.

A further desirable restriction on random values that is sometimes encountered is the exclusion of duplicates; `Get_random` can help with that.

Each type has dedicated methods to produce its instances:

- a convenience function bearing the name of the type: `Get_random::float()`, `Get_random::chr()` and so on. These convenience functions will call the associated 'producer methods' `Get_random::float_producer()`, `Get_random::chr_producer()` and so on, which will analyze the given arguments and return a function that, in turn, will produce random values according to the specs indicated by those arguments.
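The whole chain — seeded core value, projection, code point, character — can be sketched in plain JavaScript. The SplitMix32 constants below follow a commonly circulated JavaScript rendering of the algorithm; treat them, and all names here, as an illustration rather than the package's actual code:

```javascript
// Seeded SplitMix32 core: returns a generator of values t with 0 <= t < 1;
// the same seed always yields the same series.
const splitmix32 = (seed) => () => {
  seed = (seed + 0x9e3779b9) | 0;
  let t = seed ^ (seed >>> 16);
  t = Math.imul(t, 0x21f0aaad);
  t = t ^ (t >>> 15);
  t = Math.imul(t, 0x735a2d97);
  t = t ^ (t >>> 15);
  return (t >>> 0) / 4294967296; // scale 32-bit word into [0, 1)
};

// Project t from [0, 1) into [min, max): j = min + t * (max - min)
const project = (t, min, max) => min + t * (max - min);

const rng = splitmix32(42);                   // known fixed point
const t   = rng();                            // core value in [0, 1)
const n   = Math.floor(project(t, 0x61, 0x7b)); // integer in 0x61..0x7a
const chr = String.fromCodePoint(n);          // a letter in 'a'..'z'
```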
### References

### To Do

- implement a 'raw codepoint' convenience method?
- adapt `Get_random::float()`, `Get_random::integer()` to match `Get_random::chr()`, `Get_random::text()`
- ensure `Get_random::cfg`'s `on_stats` is called when given, even when missing or `null` in a method call
- need a better `rpr()`
  - one `rpr()` for use in texts such as error messages, one `rpr()` (`show()`?) for use in presentational contexts
- review the shuffling algorithm; see *The Danger of Naïveté* for a discussion of how to shuffle correctly
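The correct approach discussed in *The Danger of Naïveté* is the Fisher-Yates shuffle: walk the array once from the end, swapping each position with a uniformly chosen position at or below it. Unlike the naive swap-with-anywhere loop, this makes every permutation equally likely. A plain-JavaScript sketch:

```javascript
// In-place Fisher-Yates shuffle; `random` can be swapped for a seeded
// generator to get a deterministic, testable shuffle.
const shuffle = (arr, random = Math.random) => {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // 0 <= j <= i, never beyond
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
};

// shuffle([1, 2, 3, 4]) is always some permutation of [1, 2, 3, 4]
```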
## Benchmark

- implement `?min_count` / `?max_count` / `?min_dt` / `?max_dt`, `prioritize: ( 'dt' | 'count' )`
  - probably best to stick with `min` or `max` for both `count` and `dt`
- allow to call as `timeit name, -> ...` and / or `timeit { name, ..., }, -> ...` so the function name can be overridden
- implement 'tracks' / 'splits' such that within a `timeit()` run, the executed function can call sub-timers with `track track_name, -> ...`. Different contestants can re-use track names that can then be compared, ex.:

  ```coffee
  timeit contestant_a = ( { track, progress, } ) ->
    data = null
    track load_data = -> data = a.load_data()
    a.do_other_stuff()
    track evaluate = -> data = a.evaluate()
    track only_a_does_this = -> data = a.only_a_does_this()
    track save_data = -> data = a.save_data()

  timeit contestant_b = ( { track, progress, } ) ->
    data = null
    track load_data = -> data = b.load_data()
    b.do_other_stuff()
    track evaluate = -> data = b.evaluate()
    track save_data = -> data = b.save_data()
  ```

  This will show elapsed total times for `contestant_a` and `contestant_b`, as well as comparisons of tracks `contestant_a/load_data` v. `contestant_b/load_data`, `contestant_a/evaluate` v. `contestant_b/evaluate`, and so on; track `contestant_a/only_a_does_this` is shown without comparison. There will inevitably also be an 'anonymous track', i.e. time spent by each contestant outside of any named track (here symbolized by `_.do_other_stuff()`, but in principle also comprising any part of the function between tracks, and time spent to set up and finish each track); these extra times should also be shown, at least when exceeding a given threshold. In `timeit()` runs that have no `track()` calls, the anonymous track is all there is.
- incorporate functionality of `with_capture_output()` (setting `{ capture_output: true, }`), return stdout, stderr contents
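A minimal sketch of what `with_capture_output()` could do under the hood, assuming a Node environment (this is an illustration, not the package's implementation; it captures stdout only by temporarily patching `process.stdout.write`):

```javascript
// Run `fn`, collecting everything it writes to stdout; the original write
// method is restored even when `fn` throws.
const with_capture_output = (fn) => {
  const chunks = [];
  const original_write = process.stdout.write;
  process.stdout.write = (chunk) => { chunks.push(String(chunk)); return true; };
  try { fn(); }
  finally { process.stdout.write = original_write; }
  return { stdout: chunks.join('') };
};

const { stdout } = with_capture_output(() => console.log('helo'));
// stdout is 'helo\n'
```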
## Errors

- custom error base class
  - or multiple ones, each derived from a built-in class such as `RangeError`, `TypeError`, `AggregateError`
- solution to capture an existing error and issue a new one, à la Python's `raise Error_2 from Error_1`
- omit repeated lines when displaying `error.cause`?
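Standard JavaScript already covers the Python-style `raise Error_2 from Error_1` pattern with the `cause` option of the `Error` constructor; a sketch:

```javascript
// Capture an existing error and issue a new one that carries it:
let caught;
try {
  try {
    throw new RangeError('value out of bounds');
  } catch (error) {
    throw new Error('could not process record', { cause: error });
  }
} catch (error) { caught = error; }

// Walking the chain is what a display routine would do; to omit repeated
// lines, it could diff the stacks of `error` and `error.cause` and drop
// the shared tail.
const chain = [];
for (let e = caught; e != null; e = e.cause) chain.push(e.message);
// chain is ['could not process record', 'value out of bounds']
```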
## Remap

- provide a facility to retrieve all own keys (strings + symbols)
- use property descriptors
- can be expanded to provide `shallow_clone()`, `deep_clone()`
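Both to-do items above map directly onto standard built-ins, which a Remap facility could delegate to:

```javascript
// All own keys, strings plus symbols, in one call:
const tag = Symbol('tag');
const obj = { a: 1, [tag]: 2 };

const own_keys = Reflect.ownKeys(obj); // string keys first, then symbols
// own_keys is ['a', Symbol(tag)]

// Property descriptors for the same object:
const descriptors = Object.getOwnPropertyDescriptors(obj);
// descriptors.a is { value: 1, writable: true, enumerable: true, configurable: true }
```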
## Other

- publish `clean()` solution to the 'Assign-Problem with Intermediate Nulls and Undefineds' in the context of a Bric-A-Brac SFModule
- integrate `jizura-sources-db/bin/_lxu-utils` as a Bric-A-Brac SFModule
- implement API for `loadExtension`
