retold-facto
v1.0.2
Data warehouse and knowledge graph storage for the Retold ecosystem.
Retold Facto
A data warehouse and knowledge graph for the Retold ecosystem. Facto ingests records from arbitrary sources, tracks their provenance and certainty, compiles them into projections via declarative mappings, and deploys those projections to any Meadow-supported backend. It runs as a standalone REST server, a Pict web application, and -- optionally -- as an Ultravisor beacon exposing its ingest / transform / deploy operations as workflow capabilities.
Features
- Records + Certainty -- Every ingested record carries source provenance, schema version, ingest-job lineage, and a configurable `CertaintyIndex` entry so downstream queries can filter on confidence
- Ingest Engine -- Batch ingests from CSV, JSON, folder scans, or direct API calls; tracks `IngestJob` status, dedupes with content signatures, and auto-increments dataset versions
- Projection Engine -- Compiles raw records into flat, denormalized projections using declarative `Mappings` JSON and five built-in merge strategies (WriteAll, FirstWriteWins, ReliabilityOverwrite, MergeAndReinforce, FieldFillOnly)
- Connection Manager -- First-class support for SQLite, MySQL, PostgreSQL, and MSSQL projection targets via Meadow connectors; masked-password safe API
- Mapping DSL -- `Entity` + `GUIDTemplate` + `Mappings` JSON descriptors drive `meadow-integration`'s `TabularTransform` for flattening, comprehension, and de-duplication
- Source Catalog -- Research-grade catalog with `SourceCatalogEntry`, `CatalogDatasetDefinition`, and a folder scanner that discovers datasets from `README.md` files
- Multi-Entity Web UI -- Two Pict browser applications (`pict-app` and `pict-app-full`) provide source, dataset, record, projection, mapping, and connection management
- Ultravisor Beacon -- Optional beacon mode exposes three capabilities (`FactoData`, `FactoTransform`, `FactoDeploy`) so workflows can orchestrate Facto remotely
- Meadow Native -- Schema is defined in a single stricture JSON file; REST endpoints for every entity come for free via `meadow-endpoints`
- Orator Native -- Built on the standard Retold Orator + Restify stack; every subsystem exposes its own REST surface
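To make the "dedupes with content signatures" idea concrete, here is a minimal sketch of signature-based de-duplication in plain Node.js. It is not Facto's implementation: the helper names (`canonicalize`, `contentSignature`, `dedupe`), the choice of SHA-256, and the sorted-key canonical form are all assumptions made for illustration.

```javascript
const crypto = require('crypto');

// Serialize a value with object keys sorted, so key order does not change the signature.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalize(value[k])).join(',') + '}';
  }
  return JSON.stringify(value);
}

// Hypothetical helper: SHA-256 hex digest of the canonical form (algorithm is an assumption).
function contentSignature(record) {
  return crypto.createHash('sha256').update(canonicalize(record)).digest('hex');
}

// Keep the first record seen for each signature and drop later duplicates.
function dedupe(records) {
  const seen = new Set();
  const unique = [];
  for (const record of records) {
    const signature = contentSignature(record);
    if (!seen.has(signature)) {
      seen.add(signature);
      unique.push(record);
    }
  }
  return unique;
}

const batch = [
  { name: 'Ada', city: 'London' },
  { city: 'London', name: 'Ada' }, // same content, different key order
  { name: 'Grace', city: 'New York' }
];
console.log(dedupe(batch).length); // 2
```

Hashing a canonical form rather than the raw input means two rows that differ only in column order are treated as the same record.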
Installation
npm install retold-facto
# or globally for the CLI
npm install -g retold-facto
Quick Start
# Initialize the default SQLite schema
retold-facto init
# Start the server (default :8386)
retold-facto serve
# Start on a custom port with a custom database
retold-facto serve --port 9000 --db /var/data/facto.sqlite
# Scan a folder of README-based dataset definitions
retold-facto scan ./my-data
Open http://localhost:8386/ for the web UI, or hit the REST API at /1.0/* (auto-generated Meadow CRUD) and /facto/* (subsystem endpoints).
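As a rough sketch of calling the auto-generated CRUD surface from Node.js 18+ (which ships a global `fetch`), the snippet below builds URLs under the `/1.0/` prefix. Only the base URL and the prefix come from this README. The `Sources` route name and the begin/cap paging segments are assumptions about how the endpoints are laid out, and `crudUrl` and `getJson` are hypothetical helpers.

```javascript
// Base URL from the Quick Start default port.
const FACTO_BASE = 'http://localhost:8386';

// Build a URL under the auto-generated Meadow CRUD prefix (/1.0/).
function crudUrl(base, ...segments) {
  const path = segments.map((s) => encodeURIComponent(String(s))).join('/');
  return new URL('/1.0/' + path, base).toString();
}

// Fetch a URL and parse the body as JSON; rejects on a non-2xx status.
async function getJson(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`GET ${url} failed: ${response.status}`);
  }
  return response.json();
}

// 'Sources' with begin/cap paging is an assumed route, not confirmed by this README.
console.log(crudUrl(FACTO_BASE, 'Sources', 0, 10));
// http://localhost:8386/1.0/Sources/0/10
```

With a server running, `getJson(crudUrl(FACTO_BASE, 'Sources', 0, 10)).then(console.log)` would print whatever that route returns.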
CLI
retold-facto <command> [options]
Commands:
  serve [default]              Start the REST API + Pict web UI
  init                         Create the default schema (21 tables)
  ingest <file> [dataset-id] [source-id] [type]
                               Parse and ingest a CSV/JSON file
  source list | add            Source CRUD shortcuts
  dataset list | add           Dataset CRUD shortcuts
  scan <folder>                Discover datasets from README-annotated folders
  scan provision <folder>      Provision discovered datasets into Facto
  scan ingest <folder>         Ingest discovered datasets end-to-end
Options:
  -c, --config <file>          JSON configuration file
  -p, --port <port>            API server port (default 8386)
  -d, --db <path>              SQLite database path (default ./data/facto.sqlite)
  -s, --scan-path <path>       Add a scan path (repeatable)
  -l, --log [path]             Write a log file
Subsystems
Facto is composed of twelve service managers layered over Meadow. Each one owns a subset of the schema and a subset of the REST surface:
| Subsystem | Purpose | Docs |
|---|---|---|
| Recordset | Records, CertaintyIndex, IngestJob lifecycle | docs/subsystems/recordset.md |
| Projection | MultiSetProjection, ProjectionStore, merge strategies, deployment | docs/subsystems/projection.md |
| Mapping | Entity + GUIDTemplate + Mappings transform descriptors | docs/subsystems/mapping.md |
| Connection | External database connections for projection targets | docs/subsystems/connection.md |
| Audit | Timestamped CRUD columns, ingest job logs, certainty logs | docs/subsystems/audit.md |
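To illustrate how a merge strategy could shape a projection, the sketch below implements three of the five strategy names as plain reducers. The semantics are inferred from the names alone and are assumptions, not the behavior documented in docs/subsystems/projection.md. The row shape (`fields` plus a numeric `certainty`) and the `project` helper are hypothetical as well.

```javascript
// Each strategy takes the current projection row and an incoming record and
// returns a new row. Both have the assumed shape { fields: {...}, certainty: number }.
const strategies = {
  // Assumed: incoming values always replace existing ones.
  WriteAll: (current, incoming) => ({
    fields: { ...current.fields, ...incoming.fields },
    certainty: incoming.certainty
  }),
  // Assumed: only fields that are missing or empty get filled.
  FieldFillOnly: (current, incoming) => {
    const fields = { ...current.fields };
    for (const [key, value] of Object.entries(incoming.fields)) {
      if (fields[key] === undefined || fields[key] === null || fields[key] === '') {
        fields[key] = value;
      }
    }
    return { fields, certainty: Math.max(current.certainty, incoming.certainty) };
  },
  // Assumed: incoming replaces existing only when its certainty is at least as high.
  ReliabilityOverwrite: (current, incoming) =>
    incoming.certainty >= current.certainty ? strategies.WriteAll(current, incoming) : current
};

// Fold a list of records into one projection row using the named strategy.
function project(records, strategyName) {
  const merge = strategies[strategyName];
  if (!merge) {
    throw new Error(`Unknown strategy: ${strategyName}`);
  }
  return records.reduce((row, record) => merge(row, record), { fields: {}, certainty: 0 });
}

const records = [
  { fields: { name: 'Ada', city: '' }, certainty: 0.9 },
  { fields: { name: 'A. Lovelace', city: 'London' }, certainty: 0.4 }
];

console.log(project(records, 'WriteAll').fields.name);             // A. Lovelace
console.log(project(records, 'FieldFillOnly').fields);             // { name: 'Ada', city: 'London' }
console.log(project(records, 'ReliabilityOverwrite').fields.name); // Ada
```

The same two input records produce three different rows, which is why the strategy is worth choosing deliberately per projection.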
Ultravisor Integration
Facto can register as an Ultravisor beacon and expose its operations as workflow capabilities:
- `FactoData` -- Source / Dataset / Record / IngestJob / ProjectionStore CRUD
- `FactoTransform` -- Apply a mapping to a batch of records (pure function, no side effects)
- `FactoDeploy` -- Deploy a projection schema to an external store
In a typical Retold deployment, Ultravisor orchestrates pipelines that dispatch these capabilities to one or more Facto beacons running close to their data sources. See docs/ultravisor-integration.md for the full beacon contract and workflow patterns.
Facto also runs perfectly well without Ultravisor -- beacon mode is optional.
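The "pure function, no side effects" property of `FactoTransform` can be pictured with a small sketch. It reuses the `Entity`, `GUIDTemplate`, and `Mappings` descriptor keys named in this README. The `{field}` placeholder syntax is a simplification invented for this example (the template syntax used by `meadow-integration` may differ), and `render` and `applyMapping` are hypothetical helpers.

```javascript
// Hypothetical descriptor; the {field} placeholder syntax is for illustration only.
const mapping = {
  Entity: 'Person',
  GUIDTemplate: 'Person_{id}',
  Mappings: { FullName: '{first} {last}', City: '{city}' }
};

// Replace {key} placeholders with record values; missing keys become ''.
function render(template, record) {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    record[key] === undefined || record[key] === null ? '' : String(record[key]));
}

// Pure: builds new row objects and never mutates the descriptor or the input records.
function applyMapping(descriptor, records) {
  return records.map((record) => {
    const row = { Entity: descriptor.Entity, GUID: render(descriptor.GUIDTemplate, record) };
    for (const [column, template] of Object.entries(descriptor.Mappings)) {
      row[column] = render(template, record);
    }
    return row;
  });
}

// Frozen input shows that the transform only reads its arguments.
const input = Object.freeze([
  Object.freeze({ id: 7, first: 'Ada', last: 'Lovelace', city: 'London' })
]);

console.log(JSON.stringify(applyMapping(mapping, input)));
// [{"Entity":"Person","GUID":"Person_7","FullName":"Ada Lovelace","City":"London"}]
```

Because the output depends only on the descriptor and the batch, a step like this can be retried or dispatched to any beacon without coordination.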
Testing
npm test # Mocha TDD unit tests
npm run test-browser # Puppeteer headless browser tests
Building
npm run build
npm run build-codemirror
Related Packages
- meadow -- ORM / query DSL
- meadow-endpoints -- auto-generated REST CRUD
- meadow-integration -- `TabularTransform` and `CertaintyAccumulator` used by the projection engine
- orator -- REST server framework
- pict -- MVC framework for the web UI
- stricture -- schema definition language (MicroDDL)
- ultravisor -- workflow orchestrator (optional beacon target)
- ultravisor-beacon -- beacon protocol client
- bibliograph -- (dependency; reserved for richer audit logging)
License
MIT
Contributing
Pull requests welcome. See the Retold Contributing Guide for the code of conduct, contribution process, and testing requirements.
