@fraczak/k
v2.1.6
k-language for JSON-like data transformation
k-language
npm install '@fraczak/k'
From javascript:
import k from "@fraczak/k";
const fn = k.compile("<.name,.nom,'?'>");
console.log([{name:"x"},{nom:"y"},{}].map(fn));
// returns: [ "x", "y", "?" ]
k - the way of building and manipulating JSON-like data
Technically, k is a notation for defining first-order partial functions.
An example of a partial function is the projection, e.g., ".toto", which maps an object to its property named toto, or is not defined if the property doesn't exist. E.g.,
1 .toto :
2 {"toto": 5, "titi": 10} --> 5
3 {"titi": 10} ... undefined // it is not a value!
Note: The above 3 lines should be read as follows. A k-expression is printed on the first line (before ":"). The following lines are examples of the function defined by the k-expression applied to JSON values (first part of each line). If the function is defined for the value, then the result is printed after "-->" (line 2 in the above example). If the function is not defined for the value, then "... undefined" is printed (line 3).
Combining "partial functions"
There are three ways of combining functions:
- composition: (f1 f2 ...), e.g., (.toto .titi) extracts a nested field.
(.toto .titi) :
{"toto": {"titi": 10}} --> 10
{"toto": 10 } ... undefined
{} ... undefined
- merge: < f1, f2, ... >, e.g., <.toto, .titi> extracts field toto if present; otherwise extracts titi.
<.toto, .titi> :
{"toto": 5, "titi": 10} --> 5
{"titi": 10 } --> 10
{} ... undefined
- product: { f1 label1, f2 label2, ...}, e.g., {.toto TOTO, .titi TITI} extracts two fields and builds a record out of them.
{.toto TOTO, .titi TITI} :
{"toto": 5, "titi": 10, "x": 3} --> {"TOTO": 5, "TITI": 10}
{"titi": 10 } ... undefined
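The three combinators can be pictured with a small JavaScript model. This is only a sketch for intuition, assuming a partial function is represented as a plain function that returns undefined where it is not defined; the helper names proj, comp, merge, and product are hypothetical and are not part of the @fraczak/k API.

```javascript
// Illustrative model only (not the @fraczak/k implementation):
// a partial function returns `undefined` where it is not defined.
const proj = (label) => (v) =>
  v !== null &&
  typeof v === "object" &&
  Object.prototype.hasOwnProperty.call(v, label)
    ? v[label]
    : undefined;

// composition: apply the functions left to right, stop at the first undefined
const comp = (...fs) => (v) =>
  fs.reduce((acc, f) => (acc === undefined ? undefined : f(acc)), v);

// merge: the result of the first function that is defined for the input
const merge = (...fs) => (v) => {
  for (const f of fs) {
    const r = f(v);
    if (r !== undefined) return r;
  }
  return undefined;
};

// product: defined only if every component is defined
const product = (fields) => (v) => {
  const out = {};
  for (const [label, f] of Object.entries(fields)) {
    const r = f(v);
    if (r === undefined) return undefined;
    out[label] = r;
  }
  return out;
};

const nested = comp(proj("toto"), proj("titi"));
console.log(nested({ toto: { titi: 10 } })); // 10
console.log(nested({ toto: 10 }));           // undefined
```

Representing "not defined" by undefined is convenient here because undefined is not a JSON value, which mirrors the remark that "undefined" is not a value in k.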
QUIZ: What is:
- empty composition: () ?
- empty merge: <> ?
- empty product: {} ?
- {{{{} s} s} s} ?
- ({{{() a} b} c} .c .b .a) ?
Syntactic sugar
- parentheses can be omitted, except for the empty composition (),
- the dot (.) in a "projection" acts as a separator, so the space around it can be omitted.
For example, [(.toto .titi (.0 .1))] can be written as [.toto.titi.0.1].
Comments can be introduced by //, --, %, or # and extend to the end of the line. Multiline C-like comments, /* ... */, are also supported.
Basic extensions
Constants, i.e., literals for strings, integers, booleans, and null
A constant defines a function which ignores its argument and produces the constant value. E.g.:
{123 int, "kScript" str, true bool, null null} :
"any" --> {"int":123,"str":"kScript","bool":true,"null":null}
Those values, i.e., strings, integers, booleans, and null, admit the projection to the canonical string representation of the value, e.g.:
.2 :
2 --> {}
"2" --> {}
4 ... undefined
true ... undefined
.true :
true --> {}
"true" --> {}
4 ... undefined
"toto" ... undefined
.null :
null --> {}
"null" --> {}
4 ... undefined
false ... undefined
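The behaviour shown in these examples can be mimicked by the following JavaScript sketch. It assumes that the "canonical string representation" coincides with JavaScript's String() conversion for the values shown; scalarProj is a hypothetical name used for illustration only.

```javascript
// Illustrative model only: projection on a scalar is defined (yielding {})
// exactly when the label equals the value's string representation.
// Assumption: String(v) matches k's canonical representation for these cases.
const scalarProj = (label) => (v) => {
  const isScalar =
    v === null || ["string", "number", "boolean"].includes(typeof v);
  return isScalar && String(v) === label ? {} : undefined;
};

console.log(scalarProj("2")(2));       // {}
console.log(scalarProj("2")("2"));     // {}
console.log(scalarProj("2")(4));       // undefined
console.log(scalarProj("null")(null)); // {}
```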
Vector product
Vector product can be seen as an abbreviation for a product whose field names are integers starting from zero. E.g., {.toto 0, .titi 1, 123 2} can be written as [.toto, .titi, 123].
[.toto, .titi, 12] :
{"toto": 5, "titi": 10 } --> [5, 10, 12]
.1 :
["A","B","C"] --> "B"
["a"] ... undefined
Pragmatic extensions, aka "standard library"
- GT -- identity for lists of decreasing elements; undefined otherwise
GT:
[4,3] --> [4,3]
[3,4] ... undefined
[] --> []
[4,3,0] --> [4,3,0]
- EQ -- identity for lists of equal elements; undefined otherwise
EQ:
[4,4] --> [4,4]
[4,5] ... undefined
[4,4,4] --> [4,4,4]
[] --> []
- PLUS and TIMES -- sum and product of lists of numbers
{PLUS plus, TIMES times} :
[1,2] --> {"plus":3,"times":2}
[2,2,2] --> {"plus":6,"times":8}
[] --> {"plus":0,"times":1}
- CONCAT -- concatenation of lists of strings
CONCAT:
["a","bc","d"] --> "abcd"
- toJSON and fromJSON -- conversion to and from JSON strings
toJSON:
{"a": 12} --> "{\"a\":12}"
fromJSON:
"{\"a\":12}" --> {"a":12}
"2.12" --> 2.12
"[1,2.11,0.3e-32]" --> [1,2.11,3e-33]
- other predefined partial functions are: DIV, FDIV, CONS, SNOC, toDateMsec, toDateStr, and _log!.
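For readers more at home in JavaScript, the listed examples are consistent with the following rough models of GT, EQ, PLUS, TIMES, and CONCAT. These are sketches written from the examples above, not the library's code; in particular, treating GT as strictly decreasing is an assumption.

```javascript
// Hypothetical JS models of a few built-ins, derived from the examples above.
// `undefined` stands for "the partial function is not defined".

// GT: identity on decreasing lists (strictness is an assumption)
const GT = (xs) =>
  Array.isArray(xs) && xs.every((x, i) => i === 0 || xs[i - 1] > x)
    ? xs
    : undefined;

// EQ: identity on lists whose elements are all equal
const EQ = (xs) =>
  Array.isArray(xs) && xs.every((x) => x === xs[0]) ? xs : undefined;

// PLUS / TIMES: sum and product, with 0 and 1 for the empty list
const PLUS = (xs) => xs.reduce((a, b) => a + b, 0);
const TIMES = (xs) => xs.reduce((a, b) => a * b, 1);

// CONCAT: concatenation of a list of strings
const CONCAT = (xs) => xs.join("");

console.log(GT([4, 3, 0]));           // [ 4, 3, 0 ]
console.log(GT([3, 4]));              // undefined
console.log(PLUS([]), TIMES([]));     // 0 1
console.log(CONCAT(["a", "bc", "d"])); // abcd
```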
Function and code (i.e., type) definitions
dec = [(),-1] PLUS;
max = <SNOC [.0, .1 max] <GT.0, .1>, .0> ;
factorial = < [(),0] GT .0 [dec factorial, ()] TIMES, 1 >;
Codes (prefixed by $) can be defined by tagged union and product. E.g.:
$nat = <nat 1, {} 0>;
$pair = {nat x, nat y};
suc = {$nat 1};
add = $pair <{.x.1 x, .y suc y} add, .y>;
Basic extension codes
Since the basic extension introduces integers, booleans, and strings, there are three predefined types: int, bool, and string. A vector product code can also be defined by [ codeExp ]. All members of the vector have the same code. E.g.,
$intVector = [ int ];
$boolVector = [ bool ];
$tree = [ tree ];
emptyList? = $intVector $[ string ]; -- as only an empty vector can be a vector
-- of integers and a vector of strings
List comprehension on vectors (experimental!)
A vector can be "opened" by the PIPE (|) operator, so that the following partial function is applied to each element of the vector one by one, yielding another open value. An open value can be "closed", i.e., turned into a regular vector, using the CARET (^) operator. E.g.:
| .x ^ :
[{x:12}, {x:8,y:10}, {y:98}] --> [12,8]
[1,2,3] --> []
The PIPE operator can be used for defining the Cartesian product:
[.0 |, .1 |] ^ :
[[1,2], [3,4]] --> [[1,3],[1,4],[2,3],[2,4]]
or
{ | x, | y } ^ :
[1,2] --> [{"x":1,"y":1},{"x":2,"y":1},{"x":1,"y":2},{"x":2,"y":2}]
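As a rough analogy, the first two examples above behave like the JavaScript below. This is a sketch under the assumption that elements for which the function is not defined are simply dropped when the open value is closed (as the examples suggest); mapDefined and cartesian are hypothetical helper names, and the record form { | x, | y } ^ is not modelled here.

```javascript
// Illustrative model only: `| f ^` as "apply f to each element and keep
// the results for which f is defined".
const mapDefined = (f) => (xs) =>
  xs.map(f).filter((r) => r !== undefined);

// Model of the projection `.x` (undefined when the field is missing)
const getX = (v) =>
  v !== null && typeof v === "object" ? v.x : undefined;

console.log(mapDefined(getX)([{ x: 12 }, { x: 8, y: 10 }, { y: 98 }])); // [ 12, 8 ]
console.log(mapDefined(getX)([1, 2, 3]));                               // []

// Model of `[.0 |, .1 |] ^`: Cartesian product of the two inner vectors,
// with the first component varying slowest, as in the example output.
const cartesian = ([as, bs]) => as.flatMap((a) => bs.map((b) => [a, b]));

console.log(JSON.stringify(cartesian([[1, 2], [3, 4]])));
// [[1,3],[1,4],[2,3],[2,4]]
```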
QUIZ: Write a function which will take a list of integers and an integer x, and count how many times value x appears in the list. (see Examples/list-comprehension.k for a solution)
count_occurrences =
$ { [int] list, int x }
-- ???
$ int;
WARNING: When using the CARET operator, parentheses may be required, e.g.:
| ( [|,|] ^ ) ^ :
[[1,2],[3,4]] --> [[[1,1],[1,2],[2,1],[2,2]],[[3,3],[3,4],[4,3],[4,4]]]
Examples
projection:
.x ."field name" .4
The function is defined only if its argument is a structure with the field (or a vector with the index).
constants, literals for Strings, Booleans, and Integers. Examples:
"a string" 'another "String"' 123 false null
"built-in" functions:
[1, 2, 3] PLUS -- integer constant function 6
[4, 4] TIMES toJSON -- string constant function "16"
[3, 2] GT -- vector constant function [3, 2]
[3, 4] GT -- ... undefined
A more interesting example could be:
< GT .0, .1>
which selects the larger element of a two-element vector, i.e.,
[3,8] < GT .0, .1 > --> 8
User defined functions
A k-expression can be prefixed by function definitions. E.g.:
dec = [(),-1] PLUS;
zero? = [(),0] EQ 0;
factorial = <
zero? 1,
[dec factorial, ()] TIMES
>;
{ () x, factorial "x!" }
Another example could be finding the biggest (max) value in a vector:
max = <
SNOC # [x0, x1, x2, ...] --> [x0, [x1, x2, ...]]
[.0, .1 max] # [x0, [x1, x2, ...]] --> [x0, max(x1,x2,...)], i.e., recursive call
<GT .0, .1> # if x0 > max(x1,x2,...) then x0 else max(x1,x2,...)
, # when SNOC is not defined, i.e., if the input vector has one element:
.0 # [x0] --> x0
>;
max
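The recursion in max may be easier to follow next to a JavaScript transcription. The sketch below follows the comments in the k code; SNOC is modelled from those comments only (its behaviour on an empty vector is an assumption), and none of this is the library's implementation.

```javascript
// Illustrative transcription of the k definition of `max`.
// SNOC as described in the comments: [x0, x1, ...] --> [x0, [x1, ...]],
// not defined for a one-element vector (the empty vector is treated the
// same way here, which is an assumption).
const SNOC = (xs) => (xs.length > 1 ? [xs[0], xs.slice(1)] : undefined);

const max = (xs) => {
  const split = SNOC(xs);
  if (split === undefined) return xs[0]; // second branch of the merge: .0
  const [head, tail] = split;
  const rest = max(tail);                // [.0, .1 max], the recursive call
  return head > rest ? head : rest;      // <GT .0, .1>
};

console.log(max([3, 8, 5])); // 8
console.log(max([7]));       // 7
```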
Value encodings (codes)
There are three predefined value encodings: int, string, and bool. The language supports code-expressions:
- product, e.g., {int x, int y, bool flag}
- disjoint union, e.g., <{} true, {} false>
- vector, e.g., [ int ] (all elements of the vector use the same encoding)
One can define recursive codes. E.g.:
$tree = <string leaf, {tree left, tree right} tree>;
Each code definition starts with a $. The above example defines a new code called tree.
The code can then be used in a k-expression as a filter. A code-expression within a k-expression is again prefixed by $.
$ tree = <string leaf, {tree left, tree right} tree>;
inc = [(),1] PLUS;
max = <GT .0, .1>;
height = $ tree <
.leaf 0,
.tree [.left height, .right height] max inc
> $ int;
height
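To make the idea of a code used as a filter more concrete, here is a JavaScript sketch of what the height program computes. It assumes that a value of a union code is an object with exactly one tag field and that a value of a product code has exactly the listed fields; isTree is a hypothetical helper, not a function of @fraczak/k.

```javascript
// Hypothetical check for: $tree = <string leaf, {tree left, tree right} tree>
// Assumption: a union value carries exactly one tag; a product value has
// exactly the listed fields.
const isTree = (v) => {
  if (v === null || typeof v !== "object" || Array.isArray(v)) return false;
  const keys = Object.keys(v);
  if (keys.length !== 1) return false;
  if (keys[0] === "leaf") return typeof v.leaf === "string";
  if (keys[0] === "tree") {
    const t = v.tree;
    return (
      t !== null &&
      typeof t === "object" &&
      !Array.isArray(t) &&
      Object.keys(t).length === 2 &&
      isTree(t.left) &&
      isTree(t.right)
    );
  }
  return false;
};

// height: not defined (undefined) for values outside the tree code
const height = (v) => {
  if (!isTree(v)) return undefined;
  return "leaf" in v
    ? 0
    : 1 + Math.max(height(v.tree.left), height(v.tree.right));
};

const sample = {
  tree: {
    left: { leaf: "a" },
    right: { tree: { left: { leaf: "b" }, right: { leaf: "c" } } },
  },
};
console.log(height(sample));        // 2
console.log(height({ leaf: 12 }));  // undefined
```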
k (a command-line JSON processor using k syntax)
There is a wrapper, k (./node_modules/.bin/k), which makes it easy to run the language from the command line.
> k
... errors ...
Usage: ./node_modules/.bin/k ( k-expr | -k k-file) [ -1 ] [ json-file ]
E.g., cat '{"a": 10}' | ./node_modules/.bin/k '[(),()]'
For example:
One k-expression with one json-object:
> echo '{"x": 12, "y": 13}' | k '{ <.x, "no x"> x, () input}'
{"x":12,"input":{"x":12,"y":13}}
By providing only a k-expression, the script will compile the k-expression and apply the generated function to the stdin, line by line:
> k '<["x=",.x," & y=",.y],["only x=",.x],["only y=",.y],["no x nor y"]>{CONCAT "x&y"}'
{"y": 123, "x": 432,"others": "..."} --> {"x&y":"x=432 & y=123"}
{"x": 987} --> {"x&y":"only x=987"}
{"z": 123} --> {"x&y":"no x nor y"}
^D - to interrupt
If the input is a multiline json object, we need to add -1 to the command-line options.
If the k-expression is long, it can be put in a file, e.g.:
> cat test.k
--------- comments start by #, --, or // ----------------------------------
<                          -- merge of 4 partial functions...
  ["x=", .x, " & y=", .y], -- produces a vector of 4 values, if fields 'x' and 'y' are present
  ["only x=", .x],         -- produces a pair '["only x=", "value-of-x"]', for input like {"x":"value-of-x"}
                           -- it is defined only if field 'x' is present
  ["only y=", .y],
  ["no x nor y"]           -- defined for all input, returns always the same one element vector
>                          -- one of the string vectors is passed to the following partial function,
                           -- which produces a record (map) with one field "x&y", whose value is the
                           -- result of concatenating elements of the passed in vector
{ CONCAT "x&y" }
------------------------------------------------------------------------------
We can use it by:
> k -k test.k
If we want to read json objects from a file, e.g., my-objects.json, we do
> k -k test.k my-objects.json
{"x&y":"x=432 & y=123"}
{"x&y":"only x=987"}
{"x&y":"no x nor y"}
where:
> cat my-objects.json
####################################################
# empty lines and lines starting with # are ignored
{"y": 123, "x": 432,"others": "..."}
{"x": 987}
{"z": 123}
####################################################
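The line-by-line processing described above can be imitated in plain JavaScript. The sketch below is not how the k wrapper is implemented; applyToLines and describe are hypothetical names, and describe is a hand-written stand-in for the function that test.k defines.

```javascript
// Illustrative only: apply a function to each JSON line of a text,
// skipping empty lines and lines starting with '#', as the k wrapper is
// described to do. Not the wrapper's actual implementation.
const applyToLines = (fn, text) =>
  text
    .split("\n")
    .filter((line) => line.trim() !== "" && !line.startsWith("#"))
    .map((line) => fn(JSON.parse(line)));

// Hand-written stand-in for the function defined in test.k
const describe = (o) => {
  const hasX = "x" in o;
  const hasY = "y" in o;
  const parts =
    hasX && hasY ? ["x=", o.x, " & y=", o.y]
    : hasX ? ["only x=", o.x]
    : hasY ? ["only y=", o.y]
    : ["no x nor y"];
  return { "x&y": parts.join("") };
};

const input = [
  "####################################################",
  "# empty lines and lines starting with # are ignored",
  '{"y": 123, "x": 432,"others": "..."}',
  '{"x": 987}',
  "",
  '{"z": 123}',
].join("\n");

for (const result of applyToLines(describe, input)) {
  console.log(JSON.stringify(result));
}
// {"x&y":"x=432 & y=123"}
// {"x&y":"only x=987"}
// {"x&y":"no x nor y"}
```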
Short comparison with jq tutorial examples: https://stedolan.github.io/jq/tutorial/
curl 'https://api.github.com/repositories/5101141/commits?per_page=5' | jq '.'
curl 'https://api.github.com/repositories/5101141/commits?per_page=5' | k '()' -1
curl 'https://api.github.com/repositories/5101141/commits?per_page=5' | jq '.[0]'
curl 'https://api.github.com/repositories/5101141/commits?per_page=5' | k '.0' -1
jq '.[0] | {message: .commit.message, name: .commit.committer.name}'
k '.0 {.commit.message message, .commit.committer.name name}' -1
- note: a k-expression defines a partial function yielding a single json object, i.e., as far as k is concerned, examples 4 and 5 of the jq tutorial are equivalent.
jq '[.[] | {message: .commit.message, name: .commit.committer.name}]'
k '| .commit {.message message, .committer.name name} ^' -1
jq '[.[] | {message: .commit.message, name: .commit.committer.name, parents: [.parents[].html_url]}]'
k '| {.commit.message message, .commit.committer.name name, .parents | .html_url ^ parents} ^' -1
k-REPL (Read-Evaluate-Print Loop)
There is also a REPL, k-repl (./node_modules/.bin/k-repl), which acts like a toy shell for the language. E.g.:
> ./node_modules/.bin/k-repl
{'a' a, 'b' b} toJSON
=> "{\"a\":\"a\",\"b\":\"b\"}"
{"a" a} toJSON fromJSON
=> {"a": "a"}
inc = [(),1] PLUS; 1 inc inc
=> 3
inc inc inc inc
=> 7
Using k from javascript
import k from "@fraczak/k";
let k_expression = '()';
k.run(k_expression,"ANYTHING...");
// RETURNS: "ANYTHING..."
k_expression = '{"ala" name, 23 age}';
k.run(k_expression,"ANYTHING...");
// RETURNS: {"name":"ala","age":23}
k_expression = '[.year, .age]';
k.run(k_expression,{"year":2002,"age":19});
// RETURNS: [2002,19]
k_expression = '[(), ()]';
k.run(k_expression,"duplicate me");
// RETURNS: ["duplicate me","duplicate me"]
k_expression = '[[[()]]]';
k.run(k_expression,"nesting");
// RETURNS: [[["nesting"]]]
k_expression = '[[()]] {() nested, .0.0 val}';
k.run(k_expression,"nesting and accessing");
// RETURNS: {"nested":[["nesting and accessing"]],"val":"nesting and accessing"}
k_expression = '0000';
k.run(k_expression,{"test":"parse integer"});
// RETURNS: 0
k_expression = '[.y,.x] PLUS';
k.run(k_expression,{"x":3,"y":4});
// RETURNS: 7
var k_fn = k.compile('{.name nom, <[.age, 18] GT .0, [.age, 12] GT "ado", "enfant"> age}');
k_fn({"age":23,"name":"Emily"});
// RETURNS: {"nom":"Emily","age":23}
k_fn({"age":16,"name":"Katrina"});
// RETURNS: {"nom":"Katrina","age":"ado"}
k_fn({"age":2,"name":"Mark"});
// RETURNS: {"nom":"Mark","age":"enfant"}
k_fn = k.compile('$t = < i: int, t: [ t ] > ; <$t, $int>');
k_fn(1);
// RETURNS: 1
k_fn({"i":1});
// RETURNS: {"i":1}
k_fn([{"i":2},{"i":3},{"t":[]}]);
// RETURNS: undefined
k_fn({"t":[{"i":2},{"i":3},{"t":[]}]});
// RETURNS: {"t":[{"i":2},{"i":3},{"t":[]}]}
k_fn = k.compile('$ < < [ int ] ints, [ bool ] bools > list, string None>');
k_fn({"None":"None"});
// RETURNS: {"None":"None"}
k_fn({"list":{"ints":[]}});
// RETURNS: {"list":{"ints":[]}}
k_fn({"list":{"ints":[1,2,3]}});
// RETURNS: {"list":{"ints":[1,2,3]}}