@quereus/quereus
v0.6.7
Published
Federated SQL query processor
Downloads
1,165
Readme
Quereus - A TypeScript SQL Query Processor
Quereus is a feature-complete SQL query processor specifically designed for efficient in-memory data processing with a strong emphasis on the virtual table interface. It provides rich SQL query and constraint capabilities (joins, aggregates, subqueries, CTEs, window functions, constraints) over data sources exposed via the virtual table mechanism. Quereus features a modern type system with temporal types, JSON support, and plugin-extensible custom types. It has no persistent file storage, though one could be built as a virtual table module.
Project Goals
- Virtual Table Centric: Provide a robust and flexible virtual table API as the primary means of interacting with data sources. All tables are virtual tables.
- In-Memory Focus: Includes a comprehensive in-memory virtual table implementation (
MemoryTable) with support for transactions and savepoints. - Modern Type System: Extensible logical/physical type separation with built-in temporal types (DATE, TIME, DATETIME), native JSON type with deep equality comparison, and plugin support for custom types. See Type System Documentation.
- TypeScript & Modern JS: Leverage TypeScript's type system and modern JavaScript features and idioms.
- Async VTab Operations: Virtual table data operations (reads/writes) are asynchronous. Cursors are implemented as async iterables.
- Cross-Platform: Target diverse Javascript runtime environments, including Node.js, browser, and React Native. Plugin loading (via
@quereus/plugin-loader) uses dynamicimport()and is not compatible with React Native; use static imports for RN. - Minimal Dependencies: Avoid heavy external dependencies where possible.
- SQL Compatibility: Comprehensive support for modern SQL features including joins, window functions, subqueries, CTEs, constraints, views, and advanced DML/DDL operations.
- Key-Based Addressing: All tables are addressed by their defined Primary Key. The concept of a separate, implicit
rowidfor addressing rows is not used. - Third Manifesto Friendly: Embraces some of the principles of the Third Manifesto, such as allowing for empty keys. Utilizes algebraic planning.
Architecture Overview
Quereus is built on a modern architecture based on partially immutable PlanNodes and an Instruction-based runtime with robust attribute-based context system.
- SQL Input: The process starts with a SQL query string.
- Parser (
src/parser):- Lexer (
lexer.ts): Tokenizes the raw SQL string. - Parser (
parser.ts): Builds an Abstract Syntax Tree (AST).
- Lexer (
- Planner (
src/planner):- Traverses the AST to construct a tree of immutable
PlanNodeobjects representing the logical query structure. - Handles Common Table Expressions (CTEs) and Subqueries by converting them into relational
PlanNodes. - Resolves table and function references using the Schema Manager.
- Performs query planning, incorporating virtual table
getBestAccessPlanmethod and table schema statistics. - Optimizer (
src/planner/optimizer): Transforms logical plans into efficient physical execution plans through a rule-based optimization system. See Optimizer Documentation for details.
- Traverses the AST to construct a tree of immutable
- Runtime (
src/runtime):- Emitters (
src/runtime/emitters.ts,src/runtime/emit/): TranslatePlanNodes into a graph ofInstructionobjects. - Scheduler (
src/runtime/scheduler.ts): Manages the execution flow of theInstructiongraph. - Instructions: JavaScript functions that operate on
RuntimeValues (which can beSqlValueorAsyncIterable<Row>). Async parameters are awaited. - Invokes virtual table methods (e.g.,
querywhich returnsAsyncIterable<Row>,update) to interact with data. - Calls User-Defined Functions (UDFs) and aggregate functions.
- Handles transaction and savepoint control.
- Emitters (
- Virtual Tables (
src/vtab):- The core data interface. Modules implement
VirtualTableModule. MemoryTable(vtab/memory/table.ts) is a key implementation usingdigitree.
- The core data interface. Modules implement
- Schema Management (
src/schema): Manages schemas, tables, columns, functions. - User-Defined Functions (
src/func): Support for custom JS functions in SQL. - Core API (
src/core):Database,Statementclasses.
Source File Layout
The project is organized into the following main directories:
src/common: Foundational types, constants, and error classes.src/core: High-level API classes (Database,Statement).src/parser: SQL lexing, parsing, and AST definitions.src/planner: Building and optimizingPlanNodetrees from AST.src/runtime: Emission, scheduling, and execution of runtimeInstructions.src/schema: Management of database schemas.src/vtab: Virtual table interface and implementations (includingmemory).src/func: User-defined functions.src/util: General utility functions.docs: Project documentation.
Logging
Quereus uses the debug library for internal logging. This allows for fine-grained control over log output based on namespaces, which correspond to the different modules of the system (e.g., planner, runtime, vtab:memory).
To enable logging during development or troubleshooting, set the DEBUG environment variable. Examples:
# Enable all Quereus logs
DEBUG=quereus:*
# Enable all virtual table logs
DEBUG=quereus:vtab:*
# Enable VDBE runtime logs and any warnings/errors from other modules
DEBUG=quereus:runtime,quereus:*:warn,quereus:*:error
# Enable everything EXCEPT verbose runtime logs
DEBUG=*,-quereus:runtimeDeveloper Usage
To add logging within a module:
Import the logger factory:
import { createLogger } from '../common/logger.js'; // Adjust path as needed const log = createLogger('my-module:sub-feature');Log messages: Use the logger instance like
console.log, utilizing format specifiers (%s,%d,%j,%O) for better performance and readability.log('Processing item ID %d', itemId); log('Current state: %O', complexObject);(Optional) Create specialized loggers for levels: You can use
.extend()for specific levels like warnings or errors, which allows finer control via theDEBUGvariable.const warnLog = log.extend('warn'); const errorLog = log.extend('error'); warnLog('Potential issue detected: %s', issueDescription); if (errorCondition) { errorLog('Operation failed: %O', errorObject); // It's often still good practice to throw an actual Error here }
Instruction Tracing
For detailed runtime analysis and debugging, Quereus provides an instruction tracing system. Set a tracer on the database instance, and all statement executions will be traced:
import { Database, CollectingInstructionTracer } from 'quereus';
const db = new Database();
const tracer = new CollectingInstructionTracer();
// Enable tracing for all database operations
db.setInstructionTracer(tracer);
// Execute statements - they will automatically be traced
await db.exec("CREATE TABLE users (id INTEGER, name TEXT)");
await db.exec("INSERT INTO users VALUES (:id, :name)", { id: 1, name: "Alice" });
const users = await db.prepare("SELECT * FROM users").all();
// Analyze the trace
const events = tracer.getTraceEvents();
console.log(`Executed ${events.length} instruction operations`);
// Disable tracing
db.setInstructionTracer(null);The CollectingInstructionTracer captures detailed information about each instruction execution, including inputs, outputs, timing, and any sub-programs. This is particularly useful for performance analysis and debugging complex query plans.
Type Representations
Quereus uses native JavaScript types for SQL values. Understanding these mappings is essential when working with query results and parameters:
| SQL Type | JavaScript Type | Example |
|----------|----------------|---------|
| NULL | null | null |
| INTEGER | number or bigint | 42, 9007199254740992n |
| REAL | number | 3.14 |
| TEXT | string | "hello" |
| BLOB | Uint8Array | new Uint8Array([1, 2, 3]) |
| DATE | string | "2024-01-15" |
| TIME | string | "14:30:00" |
| DATETIME | string | "2024-01-15T14:30:00" |
| TIMESPAN | string | "PT1H30M" (ISO 8601 duration) |
Important Notes:
- Dates and times are always strings in ISO 8601 format (e.g.,
date('now')returns"2024-01-15") - Timespans are ISO 8601 duration strings (e.g.,
"PT1H30M"for 1 hour 30 minutes) - BLOBs are
Uint8Arraytyped arrays for binary data - Large integers use
bigintwhen they exceed JavaScript's safe integer range (±2^53 - 1) - Query results are objects with column names as keys:
{ id: 1, name: "Alice" } - Rows are internally arrays (
SqlValue[]) but the API returns objects for convenience - Results are async iterators - use
for awaitto stream rows without loading all into memory - Multi-statement
evalreturns only the last statement's results
See the Usage Guide for detailed information on type handling, row representation, async iteration, and best practices.
React Native
Quereus works in React Native with two considerations:
structuredClonePolyfill: React Native doesn't providestructuredClone. Install a polyfill like@ungap/structured-cloneand apply it before importing Quereus:import structuredClone from '@ungap/structured-clone'; if (typeof globalThis.structuredClone === 'undefined') { globalThis.structuredClone = structuredClone; } import { Database } from 'quereus';Static Plugin Loading: Dynamic
import()isn't supported in React Native. Register plugins statically instead. See Plugin System - Static Registration for details.
Documentation
- Usage Guide: Detailed usage examples, type representations, and API reference
- SQL Reference Guide: Detailed SQL reference guide (includes Declarative Schema)
- Functions: Details on the built-in functions
- Window Function Architecture: Details on the window function architecture and implementation.
- Memory Tables: Implementation details of the built-in MemoryTable module
- Date/Time Handling: Details on date/time parsing, functions, and the Temporal API.
- Runtime: Details on the runtime and opcodes.
- Error Handling: Details on the error handling and status codes.
- Plugin System: Complete guide to creating and using plugins for virtual tables, functions, and collations.
- TODO List: Planned features and improvements
Key Design Decisions
- Federated / VTab-Centric: All tables are virtual tables.
- Async Core: Core operations are asynchronous. Cursors are
AsyncIterable<Row>. - Key-Based Addressing: Rows are identified by their defined Primary Key. No separate implicit
rowid. - Relational Orthogonality: Any statement that results in a relation can be used anywhere that expects a relation value, including mutating statements with RETURNING clauses.
- Declarative Schema (Optional): Keep using DDL normally. Optionally use order‑independent
declare schema { ... }to describe end‑state; the engine computes diffs against current state using module‑reported catalogs and emits canonical DDL. You may auto‑apply viaapply schemaor fetch the DDL and run it yourself (enabling custom backfills). Supports seeds, imports (URL + cache), versioning, and schema hashing. Destructive changes require explicit acknowledgement. - JavaScript Types: Uses standard JavaScript types (
number,string,bigint,boolean,Uint8Array,null) internally. - Object-Based API: Uses classes (
Database,Statement) to represent resources with lifecycles, rather than handles. - Transient Schema: Schema information is primarily in-memory; persistence is not a goal. Emission of schema SQL export is supported.
- Bags vs Sets Distinction: Explicit type-level distinction between relations that guarantee unique rows (sets) and those that allow duplicates (bags), enabling sophisticated optimizations and maintaining algebraic correctness in line with Third Manifesto principles.
- Attribute-Based Context System: Robust column reference resolution using stable attribute IDs eliminates architectural fragilities and provides deterministic context lookup across plan transformations.
Key Design Differences
While Quereus supports standard SQL syntax, it has several distinctive design choices:
- Modern Type System: Uses logical/physical type separation instead of SQLite's type affinity model. Includes native temporal types (DATE, TIME, DATETIME) and JSON type with deep equality comparison. Conversion functions (
integer(),date(),json()) are preferred over CAST syntax. See Type System Documentation. - Virtual Table Centric: Uses
CREATE TABLE ... USING module(...)syntax. All tables are virtual tables. - Default NOT NULL Columns: Following Third Manifesto principles, columns default to NOT NULL unless explicitly specified otherwise. This behavior can be controlled via
pragma default_column_nullability = 'nullable'to restore SQL standard behavior. - No Rowids: All tables are addressed by their Primary Key. When no explicit PRIMARY KEY is defined, Quereus includes all columns in the primary key.
- Async API: Core execution is asynchronous with async/await patterns throughout.
- No Triggers or Built-in Persistence: Persistent storage can be implemented as a VTab module.
Row-Level Constraints
- Row-level CHECKs that reference only the current row are enforced immediately.
- Row-level CHECKs that reference other tables (e.g., via subqueries) are automatically deferred and enforced at COMMIT using the same optimized engine as global assertions. No
DEFERRABLEorSET CONSTRAINTSmanagement is required by the user. - Determinism Enforcement: CHECK constraints and DEFAULT values must use only deterministic expressions. Non-deterministic values (like
datetime('now')orrandom()) must be passed via mutation context to ensure captured statements are replayable. See Runtime Documentation.
Current Status
Quereus is a feature-complete SQL query processor with a modern planner and instruction-based runtime architecture. The engine successfully handles complex SQL workloads including joins, window functions, subqueries, CTEs, constraints, and comprehensive DML/DDL operations.
Current capabilities include:
- Modern Type System - Temporal types (DATE, TIME, DATETIME), JSON with deep equality, plugin-extensible custom types
- Complete JOIN support - INNER, LEFT, RIGHT, CROSS joins with proper NULL padding
- Advanced window functions - Ranking, aggregates, and frame specifications
- Full constraint system - NOT NULL, CHECK constraints with operation-specific triggers
- Comprehensive subqueries - Scalar, correlated, EXISTS, and IN subqueries
- Relational orthogonality - INSERT/UPDATE/DELETE with RETURNING can be used as table sources
- Complete set operations - UNION, INTERSECT, EXCEPT with proper deduplication
- DIFF (symmetric difference) -
A diff Bequals(A except B) union (B except A), handy for table equality checks vianot exists(A diff B) - Robust transaction support - Multi-level savepoints and rollback
- Rich built-in function library - Scalar, aggregate, window, JSON, and date/time functions
Optimizer Status:
Quereus features a sophisticated rule-based query optimizer that transforms logical plans into efficient physical execution plans. The optimizer uses a single plan node hierarchy with logical-to-physical transformation, generic tree rewriting infrastructure, and comprehensive optimization rules including constant folding, intelligent caching, and streaming aggregation.
See the Optimizer Documentation for architecture details and Optimizer Conventions for development guidelines. TODO List has remaining priorities.
Recent changes:
- Retrieve growth and push-down stabilized: query-based modules slide full nodes via
supports(); index-style fallback injects supported-only fragments insideRetrieve, preserving residuals above. - Retrieve logical properties now expose
bindingsCountandbindingsNodeTypes(visible inquery_plan().properties) to aid verification that parameters/correlations are captured.
Testing
The tests are located in test/*.spec.ts and are driven by Mocha with ts-node/esm.
yarn testQuereus employs a multi-faceted testing strategy:
SQL Logic Tests (
test/logic/):- Inspired by SQLite's own testing methodology.
- Uses simple text files (
*.sqllogic) containing SQL statements and their expected JSON results (using→marker) or expected error messages (using-- error:directive). - Driven by a Mocha test runner (
test/logic.spec.ts) that executes the SQL against a freshDatabaseinstance for each file. - Configurable Diagnostics: On unexpected failures, the test runner provides clean error messages by default with optional detailed diagnostics controlled by command line arguments:
yarn test --verbose- Show execution progress during testsyarn test --show-plan- Include concise query plan in diagnosticsyarn test --plan-full-detail- Include full detailed query plan (JSON format)yarn test --plan-summary- Show one-line execution path summaryyarn test --expand-nodes node1,node2...- Expand specific nodes in concise planyarn test --max-plan-depth N- Limit plan display depthyarn test --show-program- Include instruction program in diagnosticsyarn test --show-stack- Include full stack trace in diagnosticsyarn test --show-trace- Include execution trace in diagnosticsyarn test --trace-plan-stack- Enable plan stack tracing in runtime- Environment variables (deprecated):
QUEREUS_TEST_VERBOSE=true,QUEREUS_TEST_SHOW_PLAN=true, etc.
- This helps pinpoint failures at the Parser, Planner, or Runtime layer while keeping output manageable.
- Provides comprehensive coverage of SQL features: basic CRUD, complex expressions, all join types, window functions, aggregates, subqueries, CTEs, constraints, transactions, set operations, views, and error handling.
Property-Based Tests (
test/property.spec.ts):- Uses the
fast-checklibrary to generate a wide range of inputs for specific, tricky areas. - Focuses on verifying fundamental properties and invariants that should hold true across many different values.
- Currently includes tests for:
- Collation Consistency: Ensures
ORDER BYresults match the behavior of thecompareSqlValuesutility forBINARY,NOCASE, andRTRIMcollations across various strings. - Numeric Affinity: Verifies that comparisons (
=,<) in SQL handle mixed types (numbers, strings, booleans, nulls) consistently with SQLite's affinity rules, usingcompareSqlValuesas the reference. - JSON Roundtrip: Confirms that arbitrary JSON values survive being processed by
json_quote()andjson_extract('$')without data loss or corruption.
- Collation Consistency: Ensures
- Uses the
Performance Sentinels (Planned):
- Micro-benchmarks for specific scenarios (e.g., bulk inserts, complex queries) to catch performance regressions.
CI Integration (Planned):
- Utilize GitHub Actions (or similar) to run test suites automatically, potentially with different configurations (quick checks, full runs, browser environment).
This layered approach aims for broad coverage via the logic tests while using property tests to explore edge cases in specific subsystems more thoroughly.
Supported Built-in Functions
- Scalar:
lower,upper,length,substr/substring,abs,round,coalesce,nullif,like,glob,typeof - Aggregate:
count,sum,avg,min,max,group_concat,json_group_array,json_group_object - Window Functions: Complete implementation with
row_number,rank,dense_rank,ntile(ranking);count,sum,avg,min,maxwith OVER clause (aggregates); Full frame specification support (ROWS BETWEEN,UNBOUNDED PRECEDING/FOLLOWING);NULLS FIRST/LASTordering - Date/Time:
date,time,datetime,julianday,strftime(supports common formats and modifiers) - JSON:
json_valid,json_schema,json_type,json_extract,json_quote,json_array,json_object,json_insert,json_replace,json_set,json_remove,json_array_length,json_patch - Query Analysis:
query_plan,scheduler_program,execution_trace(debugging and performance analysis)
