@linely-io/extractly
v1.2.0
Playbook-driven extraction engine: DOM actions, fetch, transforms, and loops for structured data from pages.
name: extractly
package: '@linely-io/extractly'
runtime:
  node: '>=24'
  module: ESM
entrypoint: './index.js'
exports:
  - Engine
  - EventEmitter
  - actionRegistry
  - taskPlanFromPlaybook
  - taskPlanFromDefinition
  - defaultBrowserEnv
  - ExpressionEvaluator
  - validatePlan
  - validateAction
playbook:
  schemaFile: './playbook.schema.json'
  acceptedFormats:
    - json
    - yaml
  actionSyntax:
    - object
    - stringFlags
    - arrayTuple
actions:
  - fetch
  - stop
  - waitfor.timeout
  - waitfor.el
  - doc.use
  - doc.select
  - doc.selectall
  - doc.exists
  - doc.style.print
  - interact.click
  - interact.type
  - interact.scroll
  - localstorage.get
  - nextdata.find
  - screenshot
  - code.run
What it is
extractly is a playbook-driven extraction engine: you define a playbook (JSON/YAML) with actions like DOM selection, fetch, loops, transforms, and conditionals, then run it through Engine.
- Runtime: ESM-only, Node 24+
Install
npm install @linely-io/extractly
Quickstart
import { Engine, taskPlanFromPlaybook, defaultBrowserEnv } from '@linely-io/extractly';

// Create an environment and an engine that runs against it.
const env = defaultBrowserEnv();
const engine = new Engine(env);

// A playbook with one variable and two expression actions.
const playbook = {
  id: 'example',
  vars: { name: 'Bobby' },
  actions: [
    { id: 'name', run: '$ vars.name' },
    { id: 'return', run: '$ results.name' },
  ],
};

// Convert the playbook into a task plan, run it, and print the result value.
const plan = taskPlanFromPlaybook(playbook);
const execution = await engine.run(plan);
console.log(execution.toResultValue());
Playbook format (human + machine readable)
- Machine-readable schema: see playbook.schema.json (JSON Schema).
- Human-readable guide: the rest of this README describes the same fields and behavior as the schema.
Root fields
- id (string, optional): identifier for the playbook (defaults to "root").
- version (string, optional): freeform version tag.
- vars (object, optional): playbook-level variables (merged into the vars scope).
- definitions (object, optional): named nested playbooks.
- actions (array, required): a list of action definitions (see below).
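The root fields above can be checked with a few lines of plain JavaScript before a playbook is handed to the engine. The following is only a sketch: checkPlaybookShape is a hypothetical helper written for this illustration, not the package's validatePlan, and playbook.schema.json remains the authoritative definition.

```javascript
// Hypothetical helper (not part of extractly): checks the root fields
// described in this README and returns a list of human-readable problems.
function checkPlaybookShape(playbook) {
  const errors = [];
  const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

  if (!isPlainObject(playbook)) return ['playbook must be an object'];

  // Optional string fields.
  for (const key of ['id', 'version']) {
    if (key in playbook && typeof playbook[key] !== 'string') {
      errors.push(`${key} must be a string`);
    }
  }
  // Optional object fields.
  for (const key of ['vars', 'definitions']) {
    if (key in playbook && !isPlainObject(playbook[key])) {
      errors.push(`${key} must be an object`);
    }
  }
  // The only required field.
  if (!Array.isArray(playbook.actions)) {
    errors.push('actions is required and must be an array');
  }
  return errors;
}

const shapeOk = checkPlaybookShape({ id: 'example', vars: { name: 'Bobby' }, actions: [] });
const shapeBad = checkPlaybookShape({ id: 42, vars: [] });
console.log(shapeOk);  // []
console.log(shapeBad); // three messages: id, vars, and the missing actions array
```

A check like this only covers the top level; it says nothing about whether individual actions are well formed.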
Action definition syntax
Actions can be expressed in three equivalent forms:
- Object form (most explicit)
- String flags form (compact)
- Array tuple form (compact)
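To make the idea of three interchangeable forms concrete, here is a rough sketch of how they could be normalized into the object form. Both parseFlagString and normalizeAction are hypothetical names invented for this example; the package's own parseFlags may tokenize differently, and this sketch leaves a flag value such as params=h1 as a plain string because the README does not say how it is expanded.

```javascript
// Hypothetical sketch; not the library's parseFlags.
// Parses "key=value" pairs; a backtick-quoted value may contain spaces.
function parseFlagString(text) {
  const action = {};
  const pattern = /(\w+)=(`[^`]*`|\S+)/g;
  for (const [, key, raw] of text.matchAll(pattern)) {
    const quoted = raw.length > 1 && raw.startsWith('`') && raw.endsWith('`');
    action[key] = quoted ? raw.slice(1, -1) : raw;
  }
  return action;
}

// Accepts a string (flags form), an array (tuple form), or an object.
function normalizeAction(definition) {
  if (typeof definition === 'string') return parseFlagString(definition);
  if (Array.isArray(definition)) {
    const [run, params] = definition;
    return params === undefined ? { run } : { run, params };
  }
  return { ...definition };
}

const fromFlags = normalizeAction('run=doc.select params=h1 transform=`t?.textContent`');
const fromTuple = normalizeAction(['doc.select', { selector: 'h1' }]);
console.log(fromFlags); // { run: 'doc.select', params: 'h1', transform: 't?.textContent' }
console.log(fromTuple); // { run: 'doc.select', params: { selector: 'h1' } }
```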
Object form:
{
  "id": "title",
  "run": "doc.select",
  "params": { "selector": "h1" },
  "transform": "t?.textContent"
}
String flags form (parsed by parseFlags):
run=doc.select params=h1 transform=`t?.textContent`
Array tuple form:
["doc.select", { "selector": "h1" }]
Execution data available to expressions
Expressions (e.g. in when, transform, $ ...) can reference:
- vars: merged variable scope (playbook vars + action vars + inherited vars)
- results: results keyed by action id (or auto ids when no id is provided)
- t: the “current value” for transforms and some control-flow expressions
- Environment data: values provided by the environment (e.g. browser/document when available)
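One way to picture this scope is as three named arguments passed to a compiled expression. The sketch below is an assumption about the general technique, not a description of the package's ExpressionEvaluator: evaluate is a hypothetical function, stripping a leading "$ " is a guess at how the shorthand is handled, and environment data is left out.

```javascript
// Hypothetical evaluator; the real ExpressionEvaluator may work differently.
// The Function constructor executes arbitrary code, so only use it with
// playbooks you trust.
function evaluate(expression, scope) {
  const source = expression.startsWith('$ ') ? expression.slice(2) : expression;
  const fn = new Function('vars', 'results', 't', `return (${source});`);
  return fn(scope.vars, scope.results, scope.t);
}

const exprScope = {
  vars: { name: 'Bobby' },
  results: { name: 'Bobby' },
  t: { textContent: ' Hello ' },
};

const fromVars = evaluate('$ vars.name', exprScope);
const fromCurrent = evaluate('t?.textContent.trim()', exprScope);
const compared = evaluate('results.name === vars.name', exprScope);
console.log(fromVars);    // Bobby
console.log(fromCurrent); // Hello
console.log(compared);    // true
```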
Control flow
- when (string): expression; the action is skipped when it evaluates to a falsy value.
- loop (array|string|object): iterates over items; per-iteration data is available as vars.loop.item and vars.loop.index.
- loopOptions (object): supports { "parallel": boolean, "limit": number, "transform": string }.
- until (string): repeats an action until the expression becomes truthy (or stop is returned).
- transform (string): expression applied to the result of run/loop/until (receives the prior result as t).
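The interplay of when, loop, and transform can be sketched in a few lines. This is an illustration under stated assumptions rather than the engine's code: runStep and evalExpr are hypothetical, only array loops are handled, a skipped action is assumed to yield undefined, and until and loopOptions are left out.

```javascript
// Hypothetical sketch of the control-flow fields; not the engine's code.
const evalExpr = (expr, scope) =>
  new Function('vars', 'results', 't', `return (${expr});`)(scope.vars, scope.results, scope.t);

function runStep(step, scope, runOnce) {
  // when: skip the action if the expression is falsy.
  if (step.when && !evalExpr(step.when, scope)) return undefined;

  let result;
  if (Array.isArray(step.loop)) {
    // loop: expose the item and its index under vars.loop for each iteration.
    result = step.loop.map((item, index) =>
      runOnce({ ...scope, vars: { ...scope.vars, loop: { item, index } } }));
  } else {
    result = runOnce(scope);
  }
  // transform: receives the prior result as t.
  return step.transform ? evalExpr(step.transform, { ...scope, t: result }) : result;
}

const flowScope = { vars: { factor: 10 }, results: {}, t: undefined };
const scale = (s) => s.vars.loop.item * s.vars.factor;

const total = runStep(
  { loop: [1, 2, 3], transform: 't.reduce((a, b) => a + b, 0)' },
  flowScope,
  scale,
);
const skipped = runStep({ when: 'vars.factor > 100' }, flowScope, () => 'ran');
console.log(total);   // 60
console.log(skipped); // undefined
```

Here the transform runs once over the whole array of loop results; loopOptions.transform is listed separately in the README, and its exact behavior is not described.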
Actions (registered run values)
These are the canonical run strings registered in actionRegistry:
- fetch: HTTP fetch (supports decoder and optional pager flow).
- code.run: evaluate an expression (also available via { run: "$ <expr>" } shorthand).
- stop: returns a special value to signal “stop” (notably used by until).
- waitfor.timeout: sleep for timeout/ms.
- waitfor.el: wait until a DOM selector exists (browser contexts).
- doc.use: set the active document/root for subsequent DOM actions.
- doc.select: querySelector.
- doc.selectall: querySelectorAll (returns array).
- doc.exists: boolean selector existence check.
- doc.style.print: debug/print computed style information (browser contexts).
- interact.click: click a selector (browser only).
- interact.type: type/set value in an input (browser only).
- interact.scroll: scroll to top/bottom (browser only).
- localstorage.get: read a key from localStorage (browser only).
- nextdata.find: extract __NEXT_DATA__ (browser contexts).
- screenshot: take a screenshot (browser contexts).
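As a mental model, a registry of run strings can be thought of as a lookup table from name to handler, with the "$ <expr>" shorthand rewritten to code.run before dispatch. The sketch below is purely illustrative: the shape of the real actionRegistry is not documented here, and registry, resolveRun, dispatch, the STOP symbol, and the expression parameter name are all assumptions made for this example.

```javascript
// Hypothetical registry; the real actionRegistry may look quite different.
const STOP = Symbol('stop');
const registry = new Map([
  // code.run: evaluate an expression against vars and results.
  ['code.run', (params, scope) =>
    new Function('vars', 'results', `return (${params.expression});`)(scope.vars, scope.results)],
  // stop: return a sentinel value that callers can compare against.
  ['stop', () => STOP],
]);

// Rewrite the "$ <expr>" shorthand into an explicit code.run call.
function resolveRun(action) {
  if (typeof action.run === 'string' && action.run.startsWith('$ ')) {
    return { ...action, run: 'code.run', params: { expression: action.run.slice(2) } };
  }
  return action;
}

function dispatch(action, scope) {
  const resolved = resolveRun(action);
  const handler = registry.get(resolved.run);
  if (!handler) throw new Error(`Unknown action: ${resolved.run}`);
  return handler(resolved.params, scope);
}

const dispatchScope = { vars: { name: 'Bobby' }, results: {} };
const upper = dispatch({ run: '$ vars.name.toUpperCase()' }, dispatchScope);
const stopped = dispatch({ run: 'stop' }, dispatchScope) === STOP;
console.log(upper);   // BOBBY
console.log(stopped); // true
```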
Example: calling a nested definition (run: "playbook")
{
  "id": "example",
  "vars": { "name": "Bobby" },
  "definitions": {
    "invoice": {
      "actions": [
        { "id": "customerName", "run": "$ vars.name" },
        { "id": "return", "run": "$ results.customerName" }
      ]
    }
  },
  "actions": [
    { "id": "invoiceData", "run": "playbook", "params": ["invoice", { "name": "Bobby" }] }
  ]
}