bricabrac-sfmodules v0.22.3
Bric-A-Brac Standard Brics
A collection of (sometimes not-so) small-ish utilities
Table of Contents generated with DocToc
- DBric Database Adapter
- InterMission: Tables and Methods to Handle Integer Intervals
- Prototype Tools
- Coverage Analyzer
- Unsorted
DBric Database Adapter
- can use `node:sqlite` ('nSQL') or `better-sqlite3` ('bSQL') as implementation class for the `Dbric::db` property, but since there are subtle (and not so subtle) differences in behavior and capabilities, it will probably be best to choose one of the two and stick to it.
  - generally recommended to use bSQL (`better-sqlite3`) as it has stricter error handling (e.g. bSQL will complain when a statement without required parameters is executed, whereas nSQL has been found to silently assume `null` for missing parameters)
  - subtle differences in the interpretation of options for UDF aggregate functions
- handlers invoked for lifetime stages: `on_create()`, `on_prepared()`, `on_fresh()`, `on_populated()`
- consider applying 'Gaps and Islands' empty select statements for referential integrity (ESSFRIs) (i.e. `select * from my_view where false;`) to newly created views
  - alternatively, implement a method to do zero-row selects from all relations / only views; call it always on instantiation
- create a method to do system-defined and user-defined health checks on the DB
- generate `insert` statements
- implement the option to generate triggers to be called before each `insert`, thus enabling error messages that quote the offending row; this could be enabled by registering a function with a suitable known name, such as `trigger_on_before_insert()`
Overwriting / Overriding / Shadowing Behavior
- pending; probably to be handled by life cycle methods
API
Class property plugins
- the class property `plugins` defines the so-called 'acquisition chain', which is the sequence of objects that are visited during the construction of a `Dbric` instance
- neutral values are `null`, an empty list, `[ 'prototypes', ]`, `[ 'me', ]` and `[ 'prototypes', 'me', ]`
- in addition to the optional entries `'prototypes'` and `'me'`, which indicate the relative positioning of the instance's prototype chain and the instance itself, suitable objects that act as Dbric plugins may be placed into the `plugins` list
Using Ersatz super() in Plugin methods
- functions defined in the `exports.methods` member of a plugin object will become methods of the database adapter instance and can be used like regular (non-plugin) methods
- a hitch with defining methods on a further unspecified object is that JavaScript limits the use of `super()` to instance methods
- for this reason, instead of using `super()` in your plugin methods, use the 'Ersatz Super Device' `@_super()`. `@_super()` needs to be told the name of the upstream method to be called. Typically, within a plugin method `frob()`, when you want to call the original version of that method, you'd use `@_super 'frob'`, followed by whatever are deemed the appropriate arguments. Other than that, any valid API name can be used when calling `@_super()`.
- In any event, `@_super method_name` will refer to the instance's method named `method_name`, not another plugin's method having the same name. Just as with named `statements`, methods coming later in the plugin chain will replace values of the same name that have been provided by a plugin coming earlier in the chain.
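The dispatch described above can be illustrated with a small, self-contained sketch (assumed semantics for illustration only, not the actual DBric code; all names here are made up):

```javascript
// Illustrative sketch: plugin methods are merged onto an instance, later
// plugins shadowing earlier names; `_super( name, ... )` dispatches to the
// instance's own (pre-plugin) method of that name, as described above.
const build_instance = ( base_methods, plugins ) => {
  const instance = Object.assign( {}, base_methods );
  instance._super = function ( name, ...args ) {
    if ( !( name in base_methods ) ) { throw new Error( `no upstream method ${name}` ); }
    return base_methods[ name ].call( this, ...args ); };
  for ( const plugin of plugins ) {
    Object.assign( instance, plugin.methods ); } // later plugins shadow earlier ones
  return instance; };

// Usage: the plugin's `frob()` shadows the base `frob()` but can still reach
// it by name, much like the `@_super 'frob'` call shown above.
const base   = { frob ( x ) { return x * 2; }, };
const plugin = { methods: { frob ( x ) { return this._super( 'frob', x ) + 1; }, }, };
const d      = build_instance( base, [ plugin, ] );
console.log( d.frob( 10 ) ); // → 21
```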
Dbric_classprop_absorber::_get_acquisition_chain() (private)
- returns a list of objects `{ type, contributor, }` that the capabilities of the `Dbric` instance will be based on
- uses class property `plugins`, q.v.
- order in list follows the logic of `Object.assign()` (i.e. later entries shadow earlier ones)
- always omitted from the list are `Object.getPrototypeOf {}`, `Object.getPrototypeOf Object`, `Dbric_classprop_absorber` and `Dbric`, since these never contribute to instance capabilities
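Since the chain follows `Object.assign()` semantics, the shadowing order can be demonstrated in miniature (generic JavaScript, unrelated to the actual contributor objects):

```javascript
// `Object.assign()` shadowing: later sources win for same-named keys,
// unshadowed keys from earlier sources survive.
const prototypes = { greet: () => 'hello from prototype', extra: 42, };
const me         = { greet: () => 'hello from instance', };
const merged     = Object.assign( {}, prototypes, me );
console.log( merged.greet() ); // → 'hello from instance' (later entry shadows earlier)
console.log( merged.extra );   // → 42
```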
To Do
- [—] DBric: parameterized views as in DBay `parametrized-views.demo`
- [—] adapter for recutils?
  - https://www.gnu.org/software/recutils/manual/recutils.html
  - https://news.ycombinator.com/item?id=46265811
- [—] check that setting `prefix` is valid both in JS and SQL contexts when used to form unescaped identifiers as in (JS) `object.$prefix_property` and (SQL) `create table $prefix_name`
- [—] rename class properties: `build` -> `build_statements`, `statements` -> `runtime_statements`
- [—] allow functions for the entire `@build` property or any of its elements (but not both; functions may not transitively return functions) as well as for the other class properties whose names start with one of `scalar_udf_`, `table_udf_`, `aggregate_udf_`, `window_udf_`, `virtual_table_udf_`; these functions will be called in the context of the instance and thus allow the use of values that are only known at runtime
- [—] allow single string for `build`, can be segmented
Examples for class definitions; notice use of symbolic $PREFIX
```coffee
class My_db extends Dbric_std

  @build: [
    SQL"create table words ( w text );",
    SQL"insert into words ( w ) values ( 'first' );",
    ]

  @create_statement_$PREFIX_insert_word: SQL"insert into words ( w ) values ( $w );"

  @create_statement_$PREFIX_select_word: SQL"select w as word from words where w regexp $pattern;"

  @create_scalar_udf_$PREFIX_square:
    deterministic:  true
    value:          ( n ) -> n * n

  @create_table_udf_$PREFIX_letters_of:
    deterministic:  true
    value:          ( word ) -> yield chr for chr in Array.from word

  @create_aggregate_udf_your_name_here:     ...
  @create_window_udf_your_name_here:        ...
  @create_virtual_table_udf_your_name_here: ...
```

Values can be represented as functions:
```coffee
class My_db extends Dbric_std

  @build: -> [
    SQL"create table words ( w text );",
    SQL"insert into words ( w ) values ( #{LIT @cfg.first_word} );",
    ]

  @create_statement_$PREFIX_insert_word: ->
    ...
    return SQL"insert into words ( w ) values ( $w );"

  @create_statement_$PREFIX_select_word: SQL"select w as word from words where w regexp $pattern;"

  @create_scalar_udf_$PREFIX_square: ->
    value = if whatever then ( ( n ) -> n * n ) else ( ( n ) -> n ** 2 )
    return { value, }

  @create_table_udf_$PREFIX_letters_of:
    deterministic:  true
    value:          ( word ) -> yield chr for chr in Array.from word

  @create_aggregate_udf_your_name_here:     ...
  @create_window_udf_your_name_here:        ...
  @create_virtual_table_udf_your_name_here: ...
```

- [—] what about transitive plugins, i.e. plugins as declared by a base class?
- [—] implement life cycle methods to be called at various points during instantiation; might use these to handle name clashes
Won't Do
- [+] abandoned prefix schema altogether because implementation effort appears to be unbalanced with realistically assumed benefits; implemented parts remain in code for the time being
InterMission: Tables and Methods to Handle Integer Intervals
Ranges / Integer Intervals
- **Points**: The smallest entity of a hoard, represented as an integer; integers between `0x00_0000` and `0x10_ffff` can alternatively be represented as their corresponding Unicode character (i.e. a string that will yield an array of length one when passed into `Array.from string`; it may have `string.length == 2` notwithstanding, which is an unfortunate legacy of JavaScript being based on UTF-16 code units, not Unicode 32-bit code points).
- **Bounds**:
  - In the case of a Run instance, its defining `lo`west and `hi`ghest points.
  - In the case of a Scatter instance, its minimal and maximal points found among the `lo`s and `hi`s of its constituent runs, if any.
  - In the case of a Hoard instance, its `first` and `last` points, which demarcate the maximal extent of any runs it contains via its constituent scatters. The default has `first: 0x00_0000, last: 0x10_ffff` so as to accommodate all Unicode code points, although in an actual implementation one may want to set `first: 0x00_0001` so as to avoid `'\x00'` for compatibility with traditional C string processing.
- **Run**: A span of consecutive integers `i` defined by two numbers `lo` and `hi` such that `lo <= hi` and `lo <= i <= hi`; empty intervals are not representable, nor are non-contiguous runs possible; for single-integer runs `lo == i == hi` holds.
- **Scatter**: A set of Runs that is associated with a shared set of data. For example, the set of Unicode code points that are Latin letters and that are representable with a single UTF-8 byte is commonly seen in the regular expression `/[A-Za-z]/`, which in our model can be represented as a scatter `[ (0x41,0x5a), (0x61,0x7a), ]` (i.e. `[ ('A','Z'), ('a','z'), ]`).
  - The canonical ordering of a scatter is by ascending `lo` bounds, then by ascending `hi` bounds for runs that share `lo` points.
  - A normalized (a.k.a. 'simplified') scatter has no overlapping and no directly adjacent runs and is sorted by ascending bounds; thus, `[ (3,7), (0,5), ]` and `[ (0,5), (6,7), ]` both get normalized to `[ (0,7), ]`, but `[ (0,5), (7,7), ]` is already normal as it is sorted and the gap at `(6,6)` prevents further simplification.
  - A crucial functionality of our model is the ability to build non-normalized scatters which can at some later point get normalized to a lesser or greater degree depending on their associated data.
  - Scatter of a single Universal Inclusion Run (`0x00_0000..0x10_ffff`); this sets data `ugc:Cn` (Unassigned, 'a reserved unassigned code point or a noncharacter') as a baseline for all other, more specific data; it also reflects the absolute boundaries of the universe of discourse. Among other things, this enables set algebra with finite complementary sets such as 'the set of all code points that are not classified as `ugc:Lu` (uppercase Letters)'.
  - Multiple scatters of multiple universal Exclusion / Gap / Hole Runs
- **Hoard** (Horde?) or Layered Scalar Property Set: An unordered collection of any number of positive and negative scatters plus one neutral scatter.
  - The neutral scatter `s0` defines the universe of discourse (universal set, G. Grundmenge); its bounds `s0.lo`, `s0.hi` define the lowest `r.lo` and the highest `r.hi` that any normalized run `r` can have. Since the single neutral scatter must axiomatically be contiguous, it can be represented by a single run whose sole properties `r0.lo`, `r0.hi` can then be implemented as properties of the SPS.
  - **Hits**: The positive or inclusive scatters determine which integers are part of the universal set, i.e. which are considered 'hits' with respect to one or more scatters; the (one or more) scatters that are hit by a given element (point, scalar, integer) determine which properties that element will be associated with.
  - **Gaps**: The negative or exclusive scatters determine which integers are not part of the 'set of interest', i.e. which are considered 'holes' or 'gaps'. Gaps take precedence over hits in the sense that any element that is found in both positive and negative scatters is considered a gap and not a hit.
  - **Layers** are selectable sets of hit scatters so element properties can be selected by purpose (ex. different virtual / composite fonts for different printing styles).
    - To this end, hit layers can add 'tags', which are arbitrary data items that can be selected by way of an API. The universal set and gap scatters could likewise have tags, but the present intention is to probably disallow those and at any rate disregard any tags on non-hit scatters when selecting.
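The normalization rule described above (sort by ascending bounds, then merge overlapping and directly adjacent runs) can be sketched in a few lines; this is an illustration of the idea, not the InterMission API, and the names are made up:

```javascript
// Hypothetical scatter normalization: runs are [ lo, hi ] pairs. Sort by
// ascending ( lo, hi ), then merge any run that overlaps or is directly
// adjacent ( run.lo <= prev.hi + 1 ) into its predecessor.
const normalize = ( runs ) => {
  const sorted = [ ...runs, ].sort( ( a, b ) => ( a[ 0 ] - b[ 0 ] ) || ( a[ 1 ] - b[ 1 ] ) );
  const R      = [];
  for ( const [ lo, hi, ] of sorted ) {
    const prev = R[ R.length - 1 ];
    if ( ( prev != null ) && ( lo <= prev[ 1 ] + 1 ) ) { prev[ 1 ] = Math.max( prev[ 1 ], hi ); }
    else                                               { R.push( [ lo, hi, ] ); } }
  return R; };

console.log( normalize( [ [ 3, 7, ], [ 0, 5, ], ] ) ); // → [ [ 0, 7 ] ]
console.log( normalize( [ [ 0, 5, ], [ 6, 7, ], ] ) ); // → [ [ 0, 7 ] ] (adjacent runs merge)
console.log( normalize( [ [ 0, 5, ], [ 7, 7, ], ] ) ); // → [ [ 0, 5 ], [ 7, 7 ] ] (gap at 6 preserved)
```

The three calls reproduce the three normalization examples given above.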
Implementation of Hoards in SQL:
The below implementation notes assume use of hoard data structures specifically for the purpose of classifying Unicode code points (i.e. practically speaking, glyphs) and determining associated properties; given a code point (CP), such data structures should be able to answer questions like: Is this CP a CJK ideograph? Is it part of IICORE (i.e. moderately frequently used / not rare)? Assuming a bold gothic (or handwritten, or running text) typesetting context, which font should be used and what typographic tweaks should be applied to the outline when rendering it?

The boundaries represented by the universal set can probably best be hard-coded as SQL `check` constraints `lo between 0x000000 and 0x10ffff`, `hi between 0x000000 and 0x10ffff`. These are not moving targets and already represent the entire neutral scatter.

```sql
create table jzr_glyphruns (
    rowid    text    unique not null generated always as ( 't:uc:rsg:V=' || rsg ),
    scatter  text    not null,
    lo       integer not null,
    hi       integer not null,
    -- primary key ( rowid ),
    foreign key ( scatter ) references jzr_glyphscatters ( rowid ),
    constraint "Ωconstraint___5" check ( lo between 0x000000 and 0x10ffff ),
    constraint "Ωconstraint___6" check ( hi between 0x000000 and 0x10ffff ),
    constraint "Ωconstraint___7" check ( lo <= hi ),
    constraint "Ωconstraint___8" check ( rowid regexp '^.*$' ) );

create table jzr_glyphscatters (
    rowid  text unique not null generated always as ( 't:uc:rsg:V=' || rsg ),
    data   json not null
    -- primary key ( rowid )
    );

create table jzr_glyphhoard (
    rowid  text unique not null generated always as ( 't:uc:rsg:V=' || rsg ),
    data   json not null
    -- primary key ( rowid )
    );
```
To Do
- [—] reject floats
- [—] implement UR bounds, default `0x00_0000..0x10_ffff`
Prototype Tools
Usage Example
```coffee
{ enumerate_prototypes_and_methods,
  wrap_methods_of_prototypes, } = require 'bricabrac-sfmodules/lib/prototype-tools'
```

- `enumerate_prototypes_and_methods = ( clasz ) ->`: Given an ES class object, return an object whose keys are method names and whose values are objects in the shape of `{ prototype, descriptor, }`, where `prototype` contains the object that the method was found on and `descriptor` is the property descriptor as returned by `Object.getOwnPropertyDescriptor()`. The algorithm will start by examining `clasz::` (i.e. `clasz.prototype`) for own property descriptors, add all descriptors whose values are `function`s, and then proceed to do the same recursively with the result of `Object.getPrototypeOf clasz::` and so on until it arrives at the end of the prototype chain. The result is useful to wrap methods of classes (and is used by `wrap_methods_of_prototypes()`, below) while ensuring that all instances of those classes use the wrapped versions right from instance creation onwards.
- `wrap_methods_of_prototypes = ( clasz, handler = -> ) ->`: Given an ES class object and a `handler` function, re-define all `function`s defined on `clasz::` (i.e. `clasz.prototype`) and its prototypes to call the handler instead of the original method. The handler will be passed an object `{ name, fqname, prototype, method, context, P, callme, }` where `name` is the method's name, `fqname` is the concatenation of the prototype's constructor's name, a dot, and the method's name, `prototype` is the object it was found on, `context` represents the instance (what the docs call `thisArg`, i.e. the first argument to `Function::apply()` and `Function::call()`), `P` is an array with the arguments, and `callme` is a ready-made convenience function defined as `callme = ( -> method.call @, P... ).bind @` that can be called by the handler to execute the original method and obtain its return value.

This setup gives handlers a maximum of flexibility to intercept and change arguments, to measure the execution time of methods, and to look at and change their return values. A conservative wrapper that only takes notes on which methods have been called would not need to make use of `prototype`, `method`, `context`, or `P` and could look like this:

```coffee
counts  = {}
handler = ({ name, fqname, prototype, method, context, P, callme, }) ->
  counts[ name ] = ( counts[ name ] ? 0 ) + 1
  return callme()
```

A more invasive handler could record the return value as `R = method.call context, P...` (possibly with extra arguments appended) and access `R` before returning it as a proxy for the original method's result.

An important characteristic of `wrap_methods_of_prototypes()` is that the class you pass in and all the classes it extends, directly or transitively, will get their methods wrapped, which will affect the entirety of the current execution context (the process). This is different from instrumenting instances, where it is easier to restrict the effects of instrumentation to the scope of, say, a single unit test case—typically, each test case in your entire test suite will get to use a wrapped version of the instrumented class once you have instrumented it with `wrap_methods_of_prototypes()`.
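The wrapping mechanism can be sketched in plain JavaScript; this is an illustration of the technique as described above, not the library's actual code (the function and class names here are made up):

```javascript
// Walk the prototype chain of a class and replace each own function-valued
// property with a wrapper that delegates to the handler.
const wrap_methods = ( clasz, handler ) => {
  let walker = clasz.prototype;
  while ( ( walker != null ) && ( walker !== Object.prototype ) ) {
    const prototype = walker; // stable binding for the closures below
    for ( const name of Object.getOwnPropertyNames( prototype ) ) {
      if ( name === 'constructor' ) { continue; }
      const descriptor = Object.getOwnPropertyDescriptor( prototype, name );
      if ( typeof descriptor.value !== 'function' ) { continue; }
      const method = descriptor.value;
      prototype[ name ] = function ( ...P ) {
        const context = this;
        const callme  = () => method.call( context, ...P );
        return handler( { name, prototype, method, context, P, callme, } ); }; }
    walker = Object.getPrototypeOf( walker ); } };

// Usage: the conservative counting handler from above, applied to a toy class.
class Greeter { hi ( name ) { return `hi, ${name}`; } }
const counts = {};
wrap_methods( Greeter, ( { name, callme, } ) => {
  counts[ name ] = ( counts[ name ] ?? 0 ) + 1;
  return callme(); } );
const g = new Greeter();
console.log( g.hi( 'world' ) ); // → 'hi, world'
console.log( counts );          // → { hi: 1 }
```

Since the prototype itself is modified, every instance of `Greeter`—whether created before or after wrapping—uses the wrapped method, which is the point made in the last paragraph above.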
To Do
- [—] Examples
- [—] Should extend to cover the other callable types (generator functions, async functions, ...)
Is Done
- [+] Move `enumerate_prototypes_and_methods()` and `wrap_methods_of_prototypes()` to `object-tools`
Coverage Analyzer
Usage Example
```coffee
{ Coverage_analyzer, } = require '../../../apps/bricabrac-sfmodules/lib/coverage-analyzer'
ca = new Coverage_analyzer()
ca.wrap_class My_class
db = new My_class()
debug 'Ωbbdbr_320', ca
warn  'Ωbbdbr_320', ca.unused_names
help  'Ωbbdbr_320', ca.used_names
help  'Ωbbdbr_320', ca.counts
```

To Do
Is Done
- [+] Move `enumerate_prototypes_and_methods()` and `wrap_methods_of_prototypes()` to `object-tools`
Unsorted
> [!NOTE]
> Documentation in this section is considered WTBD
To Do
Infrastructure for `letsfreezethat`

- Clone actions: `take`, `toss`, `call`, `fallback`, `error`, `assign`, `dive`
- rename clone -> project

```coffee
s =
  take:    Symbol 'take'
  toss:    Symbol 'toss'
  call:    Symbol 'call'
  error:   Symbol 'error'
  assign:  Symbol 'assign'
  dive:    Symbol 'dive'
s.fallback = s.error
```
```coffee
clone = ( x, howto = new Howto() ) ->
  if x?
    prototype = Object.getPrototypeOf x
    R = if prototype? then ( new x.constructor ) else ( Object.create null )
    switch action
      when 'assign'
        Object.assign R, x
      when 'dive'
        for k, v of x
          R[ k ] = clone v
      else throw new Error "Ω___8 unknown action #{rpr_string action}"
    return R
  else
    prototype = null
    R = x
```

```coffee
p = Object.getPrototypeOf
debug 'Ωjzrsdb__11', p {}
debug 'Ωjzrsdb__12', p 8
debug 'Ωjzrsdb__13', p Bsql3
debug 'Ωjzrsdb__14', p new Bsql3()
debug 'Ωjzrsdb__15', ( p -> ) is ( p -> )
```

```coffee
misfit = Symbol 'misfit'
clone  = ( x, seen = new Map() ) ->
```

Fast Line Reader

- [+] fix bugs where start-of-lines are missing with small `chunk_size`s
- [—] allow alternative sources for buffers
- [—] ensure compatibility with `GUY.fs.walk_lines_with_positions()`, especially
  - [—] recognition of different line endings
  - [—] treatment of `\r` in the vicinity of `\n`
- [+] treatment of trailing empty lines
  - [—] `( '\n' ).split /\r\n|\r|\n/` gives `[ '', '', ]`, so this method should do the same
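The split behavior referenced in the last item can be checked directly in Node:

```javascript
// Reference behavior for line splitting: a string consisting of a single
// newline splits into two empty strings (the empty line before the
// terminator and the empty remainder after it), and the same alternation
// recognizes all three common line endings.
console.log( ( '\n' ).split( /\r\n|\r|\n/ ) );           // → [ '', '' ]
console.log( ( 'a\r\nb\rc\nd' ).split( /\r\n|\r|\n/ ) ); // → [ 'a', 'b', 'c', 'd' ]
```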
Coarse SQLite Statement Segmenter
- meant to be the basis for being able to read SQLite DB dump files from within a DBric application
- necessitated by the fact that calls to UDFs may be present in DDL statements for generated columns, views, and triggers
- even using the `REGEXP` operator will fail with the `sqlite3` command line tool in case a regular expression with capabilities beyond what `sqlite3` offers is used (e.g. it doesn't understand `\p{...}` escapes, which JavaScript does with the `v` flag, as in `/^\p{L}+/v`)
- to do its job, the segmenter has to locate those semicolons in an SQL source that are not part of comments, string literals, or quoted names, and are not statement-internal syntax
- turns out that with a lexer that recognizes line comments (double slash `//` to the end of line), block comments (enclosed in `/* ... */`), string literals (using single quotes `'...'`), and quoted names (enclosed in either quotes as in `"name"` or brackets as in `[name]`), the only remaining cases are statement-internal semicolons; those can only appear in `CREATE TRIGGER` statements.
- Unfortunately, since the relevant portions of SQLite's SQL syntax allow arbitrary expressions in the crucial parts of `CREATE TRIGGER` statements, combined with the fact that SQLite is happy not only to accept but also to emit—also, and entirely unnecessarily, in dump files—unquoted names that are also keywords (such as in `delete from end where end = 'x' returning end;`, which is, incredibly, valid SQL as understood by SQLite), recognizing all top-level statement-final semicolons is beyond what can be done with a lexer without turning it into a more-or-less full-fledged parser.
- We therefore accept that lexing can capture a good portion, but not all, of what one might encounter in a file full of SQL statements and provide—next to a `Segmenter` class that does its best to fish good candidates for portions of text that represent exactly one statement—another class, `Undumper`, that, when given a `DBric` instance, will walk over the segments (candidate statements) yielded by the `Segmenter` and apply them to the `DBric` database; if this should result in an `incomplete source` error, it will then append the next segment to the source and try again; if the source file is well-formed, this will eventually lead to the database accepting the input. This is not a beautiful state of affairs, but it's also irrelevant to performance in realistic cases.
- To this we may add that, given how SQL is commonly written and what SQLite itself produces for its dump files, it can be said with a high degree of certainty that the syntactically relevant semicolons of SQL source text are found as the last character of lines. That is, as soon as we have the 'incomplete source' coping mechanism in place, we can actually fall back to a much faster way to process the source—essentially only looking for semicolons at the end of lines, `/;[\x20\x09]*\n/gm`.
- two modes of operation:
  - `{ mode: 'fast', }`: assume only line-trailing semicolons. This should be compatible with DB dump files produced by the SQLite command line tool. Observe that since `fast` mode gives a 20x speed gain over `slow` mode, it has been made the default. If the assumption is violated by the input, behavior is undefined but will likely result in SQL errors; in that case, try `slow` mode.
  - `{ mode: 'slow', }`: scan source for string literals, comments and so on
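The `fast` strategy—cut only at line-trailing semicolons—can be sketched with the regex quoted above (an illustration of the idea, not the library's `Segmenter` class):

```javascript
// Split an SQL source only where a semicolon, optionally followed by spaces
// or tabs, ends a line; mid-line semicolons (e.g. inside string literals)
// are left alone, which matches the layout of SQLite shell dump files.
const segment_fast = ( source ) => {
  return source
    .split( /;[\x20\x09]*\n/ )          // cut after line-trailing semicolons
    .map( ( part ) => part.trim() )
    .filter( ( part ) => part.length > 0 )
    .map( ( part ) => part + ';' ); };  // restore the statement terminator

const dump = `create table words ( w text );
insert into words ( w ) values ( 'a; b' );
select * from words;
`;
console.log( segment_fast( dump ) );
// → [ "create table words ( w text );",
//     "insert into words ( w ) values ( 'a; b' );",
//     "select * from words;" ]
```

Note how the semicolon inside the string literal `'a; b'` survives because it is not followed by end-of-line; a statement body that spans several lines would likewise stay in one segment until its line-final semicolon.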
SQLite Undumper
Assuming `db` is an instance of DBric, better-sqlite3, or NodeJS `SQLITE.DatabaseSync`, then from within
your application, when you already have all necessary UDFs declared, a single call to `Undumper.undump()`
will read a dump file line by line, look for statements, apply them to the `db` object's `.exec()` or
`.execute()` method, and return a number of statistics when finished; while it's doing its job, it'll
display a nice progress bar in the terminal:
```coffee
{ Undumper, } = SFMODULES.require_sqlite_undumper()
path          = 'path/to/my-db.dump.sql'
statistics    = Undumper.undump { db, path, } # default is { mode: 'fast', }
# statistics:
#   { line_count:       102726,
#     statement_count:  102600,
#     dt_ms:            1724.422669,
#     statements_per_s: 59498 }
```

```
read_and_apply_dump:  19 % ▕██▌       ▏
read_and_apply_dump:  dt:     1,724.423 ms
read_and_apply_dump:  n:    102,700.000
read_and_apply_dump:  ???:       16.791 ms/1k
read_and_apply_dump:  f:     59,556.164 Hz
```

JetStream
- JetStream is a utility to construct data processing pipelines from a series of data transforms.
- Each transform is a generator function that accepts one data item and yields any number of transformed data items.
- Because of this setup, each transform can choose to ignore a data item, to swallow it, or to produce many new data items in response.
- This makes JetStream much more flexible and useful than approaches that depend on non-generator functions. You can still build useful stuff with those, but they're inherently incapable of, say, turning a (stream of) string(s) into a stream of characters.
- Currently JetStream only uses synchronous transforms.
- Actually more of a bucket chain than a garden hose, for what it's worth.
- Transforms can be 'configured' to receive only some, not all, data items; the same transform can be re-used in multiple configurations in a single pipeline.
- Default is for a transform to receive all data items but no cues.
- Whatever the last transform in the pipeline `yield`s becomes part of the pipeline's output, except that it will be implicitly filtered by a conceptual 'outlet' transform. The default for the outlet is the same as that for any transform (i.e. all data items, no cues); this can be changed by configuring the pipeline:
  - at instantiation time: `jet = new Jetstream { outlet: 'data,#stop', }`
  - dynamically: `jet.configure { outlet: 'data,#stop', }`
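The transform model above can be illustrated with a minimal, self-contained generator pipeline (plain JavaScript, not JetStream's API; selectors, cues, and the outlet are omitted):

```javascript
// Each transform is a generator taking one item and yielding zero or more
// items, so a transform may explode, swallow, or multiply its input.
function* explode     ( d ) { yield* d; }                                 // one string → many characters
function* keep_vowels ( d ) { if ( 'aeiou'.includes( d ) ) { yield d; } } // may swallow items
function* double      ( d ) { yield d; yield d; }                         // one item → two items

// Feed every item through each transform in turn, collecting all yields.
const run_pipeline = ( transforms, items ) => {
  let current = items;
  for ( const transform of transforms ) {
    const next = [];
    for ( const d of current ) { next.push( ...transform( d ) ); }
    current = next; }
  return current; };

console.log( run_pipeline( [ explode, keep_vowels, double, ], [ 'jetstream', ] ) );
// → [ 'e', 'e', 'e', 'e', 'a', 'a' ]
```

The first transform turns one string into many characters—exactly the kind of step that non-generator (one-in, one-out) pipeline models cannot express.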
JetStream: Instantiation, Configuration, Building
- `Jetstream::constructor: ( cfg ) ->`—`Jetstream::configure: ( cfg ) ->`—dynamically set properties to determine pipeline characteristics. `cfg` should be an object with the following optional keys:
  - `outlet`—a JetStream selector that determines the filtering to be applied after the last transform for a given item has finished and before that value is made available in the output. Default is `'data'`, meaning all data items but no cues will be considered for output.
  - `pick`—after result items have been filtered as determined by the `outlet` setting, apply a sieve to the stream or list of results. Default is `{ pick: 'all', }`, meaning 'return all results' (as an iterator for `walk()`, as a list for `run()`). `'first'` will pick the first, `'last'` the last element (of the stream or list); observe that in these cases, `walk()` will still be an iterator (over zero or one values), but `run()` will return only the first or last value instead of a possibly empty list. If there are no results, calling `run()` will cause an error unless a `fallback` has been explicitly set.
  - `fallback`—determine a return value for `run()` to be used in case no other values were produced by the pipeline.
  - Observe that no matter whether you use `pick: 'all'`, `pick: 'first'`, or `pick: 'last'`—when you call `Jetstream::run()`, all transforms will be called the same number of times with the same values. The same is true when you use `Jetstream::walk()` and make sure the generator runs to completion.
  - `empty_call`—the value to send, if any, when `Jetstream::walk()` or `Jetstream::run()` are called without any data values and an empty 'shelf' (i.e. no unprocessed data from calls to `Jetstream::send()`). This allows starting pipelines with a value-producing transform that `yield`s values from some source whenever it gets called; whether it drops or passes on the value of `Jetstream::cfg.empty_call` is up to the implementation.
- `Jetstream::push: ( P..., t ) ->`—add a transform `t` to the pipeline. `t` can be a generator function or a non-generator function; in the latter case the transform is called a 'watcher', as it can only observe items. If `t` is preceded by one or several arguments, those arguments will be interpreted as configurations of `t`. So far, selectors are the only implemented configuration option.
JetStream: Adding Data
- `Jetstream::send: ( ds... ) ->`—'shelve' zero or more items in the pipeline; processing will start when `Jetstream::walk()` is called.
- `Jetstream::cue: ( id ) ->`—create a public Symbol from `id` and `Jetstream::send()` it. Convenience method equivalent to `Jetstream::send Symbol.for id`.
JetStream: Running and Retrieving Results
- `Jetstream::walk: ( ds... ) ->`—'shelve' zero or more items in the pipeline and return an iterator over the processed results. Calling `Jetstream::walk d1, d2, ...` is equivalent to calling `Jetstream::send d1, d2, ...` followed by `Jetstream::walk()`. When the iterator stops, the pipeline has been exhausted (no more shelved items); further processing will only occur when at least one item has been sent and `Jetstream::walk()` has been called.
- `Jetstream::run: ( ds... ) ->`—same as calling `Jetstream::walk()` with the same arguments, but will return either a list containing all results or—depending on configuration—a single result.
- `Jetstream::pick_first: ( ds... ) ->`—same as calling `[ ( Jetstream::walk()... )..., ]` with the same arguments, and either picking the first value in the list, or, if it's empty, using the configured `fallback` value, or else throwing an error. Observe that for a pipeline that is configured to always `pick` the first or last value, using `Jetstream::pick_first()` will behave just like `Jetstream::run()`.
- `Jetstream::pick_last: ( ds... ) ->`—same as calling `[ ( Jetstream::walk()... )..., ]` with the same arguments, and either picking the last value in the list, or, if it's empty, using the configured `fallback` value, or else throwing an error. Observe that for a pipeline that is configured to always `pick` the first or last value, using `Jetstream::pick_last()` will behave just like `Jetstream::run()`.
JetStream: Note on Picking Values
The result of a JetStream run is always a (possibly empty) list of values, unless either the stream has been
configured to pick the last or the first value, or `Jetstream::pick_first()` or `Jetstream::pick_last()` have
been called. The semantics of picking or getting singular values have been intentionally designed so that
the least possible change is made with regard to the calling of transforms and the handling of intermediate
values. This also means that if your pipeline computes a million values of which you only need the first—one
that doesn't depend on any other value in the result list—then `{ pick: 'first', }` is probably not the right
tool, because in order to get that single value, each transform will still be called a million times. One
should rather look for a cutoff point early on in the input and terminate processing as soon as possible,
rather than burdening the pipeline with throwaway values.
JetStream: Selectors
to be rewritten
- [—] When instantiating a pipeline (`new Jetstream()`), should it be possible to register cues? Registered cues would then only be sent into transforms that are configured to listen to them (ex. `$ { first, }, ( d ) -> ...`). Signals can be sent by transforms or the `Jetstream` API.
- [—] problem with that: composing pipelines. Transforms rely on testing for `d is whatever_cue`, which fails when a sub-pipeline has been built with a different private symbol
- [—] maybe treat all symbols specially? Could match `s1 = Symbol 'A'`, `s2 = Symbol 'A'` by demanding configuration of `$ { A, }, ( d ) -> ...`, matching the string value of symbols
- [—] 'Signals' are meta data as opposed to 'common' / 'business' data. As such, cues should, in general, only be sent into those transforms that are built to digest them; ex. when you have a transform `( d ) -> d ** 2`, that transform will fail when anything but a number is sent into it. That's a Good Thing if the business data itself contained something other than numbers (now you know your pipeline was incorrectly constructed), but a Bad Thing if it happens because the transform was called with a cue it didn't ask for and isn't prepared to deal with.
- [—] hold open the possibility to send arbitrary structured data as cues (meta data), not only `Symbol`s
- [—] The Past: The way we've been dealing with cues is we had a few known ones like `first`, `before_last`, `last`, and so on; the user would declare them with the `$` (transform configurator) method, using values of their own choosing. Most of the time, cue values are declared in the application as appropriately named private symbols right before usage, as in `first = Symbol 'first'`; then the transform gets declared and added as `$ { first, }, t = ( d ) -> ...`; finally, in the transform, a check `d is first` is used to sort out meta data from business data.
  This all hinges on the pipeline object (`Jetstream` instance) knowing the names (`first`, `before_last` and so on) and their semantics (so names are a controlled vocabulary), and the transform knowing their identity (because you can't check for a specific private symbol if you don't hold that symbol). In essence, we're using the same parameter `d` to transport both business data and meta data.
- [—] The Future:
  - Meta data has distinct types: private symbols, public symbols, instances of class `Signal`.
  - Each piece of meta data has a name; for symbols `s`, that's `( String s )[ 7 ... ( String s ).length - 1 ]`.
  - Meta data is only sent to transforms that are explicitly configured to handle it.
  - Generic configuration could use `$ { select, }, ( d ) -> ...` where `select` is a boolean or a boolean function.
  - The default is `select: ( d ) -> not @is_cue d` (or `select: 'data'`), i.e. 'deselect all cues'. `select: -> true` (or indeed `select: true`) means 'send all business and meta data'. `select: false` indicates 'transform not used'. `select: 'cues'` means 'send all cues but no data'.
  - ### TAINT unify usage of 'meta', 'cue'
  - Unselected data that is not sent into the transform is to be sent on to the next transform.
  - The custom `select()` function will be called in a context that provides convenience methods.
  - As a shortcut, a descriptive string may be used to configure selection:
    - format similar to CSS selectors
    - `'data'`: select all business data, no cues (the default)
    - `'cue'`: select all cues, no data
    - `'cue, data'`: select all data and all cues (same as `select: ( -> true )`)
Another approach:

- `Jetstream::push()` defined as `( selectors..., transform ) -> ...`
- `selectors` can be one or more single selectors and arrays of selectors
- will be flattened, meaning `Jetstream::push s1, [ s2, [ s3, ], ], s4, t` means the same as `Jetstream::push s1, s2, s3, s4, t`
- at first we don't support concatenation of selectors, only series of disjunct selectors
  - concatenated selectors will likely have to default to logical conjunction ('and')
  - as such, concatenating selectors with `,` (comma) will likely be used to indicate disjunction ('or'), as in CSS
- the transform gets to see an item when (at least) one selector matches
- a missing selector expands to the `data` selector; by default, transforms get to see only data items, no cues, which is the right thing to do in most cases
- an empty selector selects nothing, so the transform gets skipped; as is true for transforms that do not accept everything, unselected items are sent to the successor of the current transform
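The flattening rule is the easy part; a sketch in plain JavaScript (the helper name is illustrative):

```javascript
// A nested list of selectors is equivalent to the flat series,
// so `push s1, [ s2, [ s3, ], ], s4, t` equals `push s1, s2, s3, s4, t`.
const flatten_selectors = (selectors) => selectors.flat(Infinity);

const flat = flatten_selectors(['s1', ['s2', ['s3']], 's4']);
// flat is ['s1', 's2', 's3', 's4']
```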
Selector semantics:

- `data` matches business data items (implicitly present)
- `cue` matches cues (opt-in); equivalent to `:not(data)`
- `cue#first` matches cues with ID (name) `first`
- `cue#last` matches cues with ID (name) `last`
- `cue#first, cue#last` (or `[ 'cue#first', 'cue#last', ]`) matches cues with IDs `first` or `last`
- `#first`, `#last`: same; ID selectors implicitly refer to `cue`, therefore `#first` equals `cue#first`
- `*`, `data, cue`, and `data#*, cue#*` are alternatives to mean 'all data items and all cues'
- `:not(data)` prevents business data items from being sent (opt-out); since all items are classified as either `data` or `cue`, it implicitly selects all cues
- `:not(cue)` prevents cues from being sent (implicitly present); implicitly selects all `data` items
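An illustrative sketch of this selector grammar in plain JavaScript (not the package's actual implementation; a comma-separated list is a disjunction, and `#name` is shorthand for `cue#name`):

```javascript
// Classify items: symbols are cues, everything else is business data.
const classify = (d) => (typeof d === 'symbol' ? 'cue' : 'data');
const cue_id   = (d) => (typeof d === 'symbol' ? String(d).slice(7, -1) : null);

const matches_one = (selector, d) => {
  selector = selector.trim();
  if (selector === '*') return true;
  if (selector.startsWith(':not(')) return !matches_one(selector.slice(5, -1), d);
  let [type, id] = selector.split('#');
  if (type === '') type = 'cue';                 // '#first' implies 'cue#first'
  if (type !== classify(d)) return false;
  return id == null || id === '*' || id === cue_id(d);
};

// The item is seen when at least one selector matches (disjunction):
const matches = (selectors, d) =>
  selectors.split(',').some((s) => matches_one(s, d));

// matches('cue#first, cue#last', Symbol('first')) is true
// matches(':not(cue)', 42) is true; matches(':not(cue)', Symbol('x')) is false
```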
Jetstream data items are conceptualized as HTML elements:

```
"abc"              -> <data type=text   value='abc'/>
876                -> <data type=float  value='876'/>
Symbol 'first'     -> <cue  type=symbol id=first/>
Symbol.for 'first' -> <cue  type=symbol id=first/>
```

```
stream.push 'data', '#first', '#last', ( d ) ->
```

### See Also
- in case more complicated selectors must be parsed: https://github.com/fb55/css-what
## Loupe, Show

- add `cfg` parameter
- implement stripping of ANSI codes
- implement 'colorful', 'symbolic' mode
- implement callbacks for specific types / filters
## Random

- provide an alternative to the ditched `unique`, such as filling a `Set` to a certain size with characters
- provide internal implementations that capture attempt counts for testing, better insights
- use a custom class for `stats` that handles excessive retry counts
- implement iterators
- should `on_exhaustion`, `on_stats`, `max_retries` be implemented for each method?
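The 'fill a `Set` to a certain size' idea, with attempt counting, can be sketched in plain JavaScript (names are illustrative, not the package API):

```javascript
// Retry the producer until the set holds `size` distinct values or the
// retry budget is exhausted; the attempt count is returned alongside the
// values, which gives better insights for testing.
const collect_unique = (produce, size, max_retries = 1000) => {
  const seen = new Set();
  let attempts = 0;
  while (seen.size < size) {
    if (attempts++ >= max_retries) throw new Error('exhausted retries');
    seen.add(produce());
  }
  return { values: [...seen], attempts };
};

// Example with a deterministic producer cycling over three characters:
let counter = 0;
const { values, attempts } = collect_unique(() => 'abc'[counter++ % 3], 3);
// values is ['a', 'b', 'c']; attempts is 3
```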
### Random: Implementation Structure

The library currently supports four data types to generate instance values for: `float`, `integer`, `chr`, `text`.

For each case, instance values can be produced...

- ...that are not smaller than a given `minimum` and not larger than a given `maximum`
- ...that are filtered according to a given RegEx pattern or an arbitrary function
- ...that, in the case of `text`s, are not shorter and not longer than a given pair of `minimum_length` and `maximum_length`
- ...that are unique in relation to a given collection (IOW that are new to a given collection)
The foundational Pseudo-Random Number Generator (PRNG) that enables the generation of pseudo-random values is a piece of code that I found on the Internet (duh); it is called SplitMix32 and is, according to the poster,

> A 32-bit state PRNG that was made by taking MurmurHash3's mixing function, adding a incrementor and tweaking the constants. It's potentially one of the better 32-bit PRNGs so far; even the author of Mulberry32 considers it to be the better choice. It's also just as fast.

Like JavaScript's built-in `Math.random()` generator, this PRNG generates evenly distributed values `t` between `0` (inclusive) and `1` (exclusive) (i.e. `0 ≤ t < 1`); but unlike `Math.random()`, it can be given a `seed` to set its state to a known fixed point, from which the series of random numbers to be generated will remain constant for each instantiation. This randomly-deterministic (or deterministically random, or 'random but foreseeable') operation is valuable for testing.

Since the random core value `t` (accessible as `Get_random::_float()`) is always in the interval `[0,1)`, it's straightforward to both scale (stretch or shrink) it to any other length `[0,p)` and / or transpose (shift left or right) it to any other starting point `[q,q+1)`, meaning it can be projected into any interval `[min,max)` by computing `j = min + ( t * ( max - min ) )`. That projected value `j` can then be rounded e.g. to an integer number `n`, and that integer `n` can be interpreted as a Unicode code point and used in `String.fromCodePoint()` to obtain a 'character'. Since many Unicode code points are unassigned or denote control characters, `Get_random` methods will filter code points to include only 'printable' characters. Lastly, characters can be concatenated to strings which, again, can be made shorter or longer, be built from filtered code points from a narrowed set like, say, `/^[a-zA-ZäöüÄÖÜß]$/` (most commonly used letters to write German), or adhere to some predefined pattern or other arbitrary restrictions. It all comes out of `[0,1)`, which I find amazing.

A further desirable restriction on random values that is sometimes encountered is the exclusion of duplicates; `Get_random` can help with that.

Each type has dedicated methods to produce its instances:

- a convenience function bearing the name of the type: `Get_random::float()`, `Get_random::chr()` and so on. These convenience functions will call the associated 'producer methods' `Get_random::float_producer()`, `Get_random::chr_producer()` and so on, which will analyze the given arguments and return a function that, in turn, will produce random values according to the specs indicated by those arguments.
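The whole chain — seeded core value, projection, code point, character — can be sketched in plain JavaScript. The SplitMix32 constants below follow a commonly circulated JavaScript rendering of the algorithm; treat them, and all names here, as an illustration rather than the package's actual code:

```javascript
// Seeded SplitMix32 core: returns a generator of values t with 0 <= t < 1;
// the same seed always yields the same series.
const splitmix32 = (seed) => () => {
  seed = (seed + 0x9e3779b9) | 0;
  let t = seed ^ (seed >>> 16);
  t = Math.imul(t, 0x21f0aaad);
  t = t ^ (t >>> 15);
  t = Math.imul(t, 0x735a2d97);
  t = t ^ (t >>> 15);
  return (t >>> 0) / 4294967296; // scale 32-bit word into [0, 1)
};

// Project t from [0, 1) into [min, max): j = min + t * (max - min)
const project = (t, min, max) => min + t * (max - min);

const rng = splitmix32(42);                   // known fixed point
const t   = rng();                            // core value in [0, 1)
const n   = Math.floor(project(t, 0x61, 0x7b)); // integer in 0x61..0x7a
const chr = String.fromCodePoint(n);          // a letter in 'a'..'z'
```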
### References

### To Do

- implement a 'raw codepoint' convenience method?
- adapt `Get_random::float()`, `Get_random::integer()` to match `Get_random::chr()`, `Get_random::text()`
- ensure `Get_random::cfg`'s `on_stats` is called when given, even when missing or `null` in a method call
- need a better `rpr()`
  - one `rpr()` for use in texts such as error messages, one `rpr()` (`show()`?) for use in presentational contexts
- review the shuffling algorithm; see *The Danger of Naïveté* for a discussion of how to shuffle correctly
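The correct approach discussed in *The Danger of Naïveté* is the Fisher-Yates shuffle: walk the array once from the end, swapping each position with a uniformly chosen position at or below it. Unlike the naive swap-with-anywhere loop, this makes every permutation equally likely. A plain-JavaScript sketch:

```javascript
// In-place Fisher-Yates shuffle; `random` can be swapped for a seeded
// generator to get a deterministic, testable shuffle.
const shuffle = (arr, random = Math.random) => {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // 0 <= j <= i, never beyond
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
};

// shuffle([1, 2, 3, 4]) is always some permutation of [1, 2, 3, 4]
```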
## Benchmark

- implement `?min_count` / `?max_count` / `?min_dt` / `?max_dt`, `prioritize: ( 'dt' | 'count' )`
  - probably best to stick with `min` or `max` for both `count` and `dt`
- allow to call as `timeit name, -> ...` and / or `timeit { name, ..., }, -> ...` so the function name can be overridden
- implement 'tracks' / 'splits' such that within a `timeit()` run, the executed function can call sub-timers with `track track_name, -> ...`. Different contestants can re-use track names that can then be compared, ex.:

  ```coffee
  timeit contestant_a = ( { track, progress, } ) ->
    data = null
    track load_data = -> data = a.load_data()
    a.do_other_stuff()
    track evaluate = -> data = a.evaluate()
    track only_a_does_this = -> data = a.only_a_does_this()
    track save_data = -> data = a.save_data()

  timeit contestant_b = ( { track, progress, } ) ->
    data = null
    track load_data = -> data = b.load_data()
    b.do_other_stuff()
    track evaluate = -> data = b.evaluate()
    track save_data = -> data = b.save_data()
  ```

  This will show elapsed total times for `contestant_a` and `contestant_b`, as well as comparisons of tracks `contestant_a/load_data` v. `contestant_b/load_data`, `contestant_a/evaluate` v. `contestant_b/evaluate`, and so on; track `contestant_a/only_a_does_this` is shown without comparison. There will inevitably also be an 'anonymous track', i.e. time spent by each contestant outside of any named track (here symbolized by `_.do_other_stuff()`, but in principle also comprising any part of the function between tracks, and time spent to set up and finish each track); these extra times should also be shown, at least when exceeding a given threshold. In `timeit()` runs that have no `track()` calls, the anonymous track is all there is.
- incorporate functionality of `with_capture_output()` (setting `{ capture_output: true, }`), return stdout, stderr contents
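A minimal sketch of what `with_capture_output()` could do under the hood, assuming a Node environment (this is an illustration, not the package's implementation; it captures stdout only by temporarily patching `process.stdout.write`):

```javascript
// Run `fn`, collecting everything it writes to stdout; the original write
// method is restored even when `fn` throws.
const with_capture_output = (fn) => {
  const chunks = [];
  const original_write = process.stdout.write;
  process.stdout.write = (chunk) => { chunks.push(String(chunk)); return true; };
  try { fn(); }
  finally { process.stdout.write = original_write; }
  return { stdout: chunks.join('') };
};

const { stdout } = with_capture_output(() => console.log('helo'));
// stdout is 'helo\n'
```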
## Errors

- custom error base class
  - or multiple ones, each derived from a built-in class such as `RangeError`, `TypeError`, `AggregateError`
- solution to capture an existing error and issue a new one, à la Python's `raise Error_2 from Error_1`
- omit repeated lines when displaying `error.cause`?
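Standard JavaScript already covers the Python-style `raise Error_2 from Error_1` pattern with the `cause` option of the `Error` constructor; a sketch:

```javascript
// Capture an existing error and issue a new one that carries it:
let caught;
try {
  try {
    throw new RangeError('value out of bounds');
  } catch (error) {
    throw new Error('could not process record', { cause: error });
  }
} catch (error) { caught = error; }

// Walking the chain is what a display routine would do; to omit repeated
// lines, it could diff the stacks of `error` and `error.cause` and drop
// the shared tail.
const chain = [];
for (let e = caught; e != null; e = e.cause) chain.push(e.message);
// chain is ['could not process record', 'value out of bounds']
```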
## Remap

- provide a facility to retrieve all own keys (strings + symbols)
- use property descriptors
- can be expanded to provide `shallow_clone()`, `deep_clone()`
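Both to-do items above map directly onto standard built-ins, which a Remap facility could delegate to:

```javascript
// All own keys, strings plus symbols, in one call:
const tag = Symbol('tag');
const obj = { a: 1, [tag]: 2 };

const own_keys = Reflect.ownKeys(obj); // string keys first, then symbols
// own_keys is ['a', Symbol(tag)]

// Property descriptors for the same object:
const descriptors = Object.getOwnPropertyDescriptors(obj);
// descriptors.a is { value: 1, writable: true, enumerable: true, configurable: true }
```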
## Other

- publish `clean()` solution to the 'Assign-Problem with Intermediate Nulls and Undefineds' in the context of a Bric-A-Brac SFModule
- integrate `jizura-sources-db/bin/_lxu-utils` as a Bric-A-Brac SFModule
- implement API for `loadExtension`
