flatsql
v0.3.9
Published
SQL queries over raw FlatBuffer storage - zero conversion, real-time indexing, pure streaming
Downloads
424
Maintainers
Readme
FlatSQL
SQL queries over raw FlatBuffer storage — A streaming query engine that keeps data in native FlatBuffer format while providing SQL access via SQLite virtual tables.
Live Demo
Try FlatSQL in your browser: https://digitalarsenal.github.io/flatsql/
Installation
npm install flatsqlQuick Start
import { initFlatSQL } from 'flatsql/wasm';
// Initialize
const flatsql = await initFlatSQL();
// Create database with schema
const db = flatsql.createDatabase(`
table User {
id: int (id);
name: string;
email: string (key);
age: int;
}
`, 'myapp');
// Register file identifier for routing
db.registerFileId('USER', 'User');
// Ingest FlatBuffer data (streaming)
db.ingest(flatbufferStream);
// Query with SQL
const result = db.query('SELECT * FROM User WHERE age > 25');
console.log(result.columns, result.rows);What is FlatSQL?
FlatSQL bridges two technologies:
- FlatBuffers — Google's efficient cross-platform serialization library. Data is stored in binary format with zero-copy access (no parsing/unpacking needed).
- SQLite — The most widely deployed SQL database engine, used here only for SQL parsing and query execution.
The key insight: instead of converting FlatBuffers to SQLite rows (expensive), FlatSQL uses SQLite virtual tables to query FlatBuffer data directly. Your data stays in portable FlatBuffer format, readable by any FlatBuffer tooling, while you get SQL query capabilities.
Why FlatSQL?
Traditional approach:
FlatBuffer → Deserialize → SQLite rows → Query → Serialize → FlatBufferFlatSQL approach:
FlatBuffer → Query (via virtual table) → FlatBufferBenefits:
- Zero conversion overhead — Data stays in FlatBuffer format
- Streaming ingestion — Indexes built during data arrival, not after
- Portable output — Exported data is standard FlatBuffers, readable by any tooling
- Multi-source federation — Query across multiple FlatBuffer sources with automatic source tagging
Source Code
| Repository | Description | |------------|-------------| | digitalarsenal/flatsql | This project — FlatSQL query engine | | digitalarsenal/flatbuffers | Fork of Google FlatBuffers with WASM support | | flatc-wasm | FlatBuffer compiler running in WebAssembly |
Usage
WASM (Browser/Node.js)
The C++ engine compiles to WebAssembly for cross-platform deployment:
import { initFlatSQL } from 'flatsql/wasm';
const flatsql = await initFlatSQL();
// Create database with schema
const db = flatsql.createDatabase(`
table User {
id: int (id);
name: string;
email: string (key);
age: int;
}
`, 'myapp');
// Register file identifier routing
db.registerFileId('USER', 'User');
// Enable demo field extractors (for testing)
db.enableDemoExtractors();
// Ingest FlatBuffer stream
// Format: [4-byte size LE][FlatBuffer][4-byte size LE][FlatBuffer]...
db.ingest(streamData);
// Query with SQL
const result = db.query('SELECT id, name, email FROM User WHERE age > 25');
console.log(result.columns); // ['id', 'name', 'email']
console.log(result.rows); // [[1, 'Alice', '[email protected]'], ...]
// Export database
const exported = db.exportData();
// Cleanup
db.destroy();TypeScript (Pure JavaScript)
A TypeScript implementation for environments where WASM isn't available:
import { FlatSQLDatabase, FlatcAccessor } from 'flatsql';
import { FlatcRunner } from 'flatc-wasm';
const flatc = await FlatcRunner.init();
const schema = `
namespace App;
table User {
id: int (key);
name: string (required);
email: string;
age: int;
}
`;
const accessor = new FlatcAccessor(flatc, schema);
const db = FlatSQLDatabase.fromSchema(schema, accessor, 'myapp');
// Insert records
db.insert('User', { id: 1, name: 'Alice', email: '[email protected]', age: 30 });
db.insert('User', { id: 2, name: 'Bob', email: '[email protected]', age: 25 });
// Query
const result = db.query('SELECT name, email FROM User WHERE age > 20');
console.log(result.rows);
// Export as standard FlatBuffers
const exported = db.exportData();Native C++ (Embedded)
For performance-critical applications, link the C++ library directly:
#include <flatsql/database.h>
auto db = flatsql::FlatSQLDatabase::fromSchema(schema, "mydb");
// Register file ID routing
db.registerFileId("USER", "User");
// Set field extractor
db.setFieldExtractor("User", extractUserField);
// Ingest streaming data
size_t recordsIngested = 0;
db.ingest(data, length, &recordsIngested);
// Query
auto result = db.query("SELECT * FROM User WHERE id = 5");
for (size_t i = 0; i < result.rowCount(); i++) {
std::cout << result.getString(i, "name") << std::endl;
}Architecture
┌─────────────────────────────────────────────────────────────┐
│ FlatSQLDatabase │
├─────────────────────────────────────────────────────────────┤
│ SchemaParser │ SQLiteEngine │
│ (FlatBuffers IDL) │ (Virtual Tables) │
├─────────────────────────────────────────────────────────────┤
│ TableStore (per table) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ SQLite Indexes Field Extractors │ │
│ │ (id, email, timestamp) (getField callbacks) │ │
│ └────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ StackedFlatBufferStore (append-only) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ [Header][FB₁][FB₂][FB₃]... │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘Stream Format
FlatSQL ingests size-prefixed FlatBuffer streams:
[4-byte size LE][FlatBuffer with file_id][4-byte size LE][FlatBuffer]...The 4-byte file identifier in each FlatBuffer determines which table receives the record.
SQL Support
Supported
SELECTwith column selectionWHEREwith=,<,>,<=,>=,BETWEEN,LIKE,AND,ORORDER BY(ASC/DESC)LIMITandOFFSETCOUNT(*)aggregate- Index-accelerated queries on
(id)and(key)columns
Not Supported
JOIN(query one table at a time)GROUP BY,HAVING, most aggregatesINSERT,UPDATE,DELETE(use API methods instead)- Subqueries, CTEs, window functions
Performance
FlatSQL outperforms traditional SQLite on query operations:
| Operation | FlatSQL | SQLite | Speedup | |-----------|---------|--------|---------| | Point query (by id) | 3.50 ms | 3.93 ms | 1.1x | | Point query (by key) | 5.23 ms | 6.94 ms | 1.3x | | Direct index lookup | 1.56 ms | 3.93 ms | 2.5x | | Full scan | 0.84 ms | 1.25 ms | 1.5x | | Direct iteration | 0.05 ms | 1.25 ms | 25x |
Benchmarks: 10,000 records, 10,000 query iterations, Apple M3 Ultra
Performance Trade-offs
FlatSQL uses SQLite's virtual table (VTable) API to expose FlatBuffer data as queryable tables. This architecture enables SQL queries over raw binary data, but comes with fundamental trade-offs that affect performance characteristics.
Access Paths
FlatSQL provides three access paths with different performance profiles:
| Access Path | Latency | Throughput | Use Case | |-------------|---------|------------|----------| | Zero-Copy API | 1.7 µs | 580K ops/sec | Direct index lookup, returns raw FlatBuffer pointer | | VTable SQL | 12.9 µs | 78K ops/sec | Full SQL queries via SQLite | | Pure SQLite | 2.5 µs | 400K ops/sec | Baseline comparison |
Why VTable Queries Are Slower Than Pure SQLite
SQLite's VTable API has fundamental limitations that prevent FlatSQL from matching pure SQLite query performance:
Per-Column Extraction — The
xColumn()callback is invoked once per column per row. There is no batch API to extract multiple fields at once, meaning each field access has function call overhead.Mandatory Value Conversion — All values must be converted to SQLite's internal format via
sqlite3_result_*()functions. Even though FlatBuffers already store data in an efficient binary format, we must convert strings to SQLite strings, integers to SQLite integers, etc.Row-by-Row Processing — The
xNext()callback advances one row at a time with no vectorized or batch iteration. This prevents SIMD optimizations or processing multiple records per call.No Direct Memory Access — SQLite cannot read FlatBuffer memory directly; all data must flow through the VTable callback interface, adding overhead for every field access.
When to Use Each Access Path
Use the Zero-Copy API when:
- You need maximum throughput for point lookups
- You're building hot paths that query by indexed keys
- You can work directly with FlatBuffer data structures
// Zero-copy: returns pointer to raw FlatBuffer (1.7 µs)
const uint8_t* data = db.findRawByIndex("User", "email", email, &len);
auto user = GetUser(data); // Direct FlatBuffer accessUse VTable SQL when:
- You need complex queries (filtering, sorting, aggregation)
- Query flexibility is more important than raw speed
- You're doing ad-hoc exploration of the data
// VTable SQL: full query capability (12.9 µs)
auto result = db.query("SELECT * FROM User WHERE age > 25 ORDER BY name");Architecture: SQLite-Backed Indexes
FlatSQL uses SQLite's highly optimized B-tree for indexing (not a custom implementation). This provides:
- Battle-tested performance — SQLite's B-tree is used by billions of devices
- Consistent behavior — Same indexing code path as pure SQLite
- Fast path optimization — Type-specific lookups bypass
std::variantoverhead
The index stores (key, sequence) → (offset, length) mappings, allowing O(log n) lookups that return pointers directly into the FlatBuffer storage.
Trade-off Summary
| Aspect | FlatSQL | Pure SQLite | |--------|---------|-------------| | Point lookup (indexed) | 1.4x faster (zero-copy API) | Baseline | | SQL queries | 5-7x slower (VTable overhead) | Baseline | | Storage format | FlatBuffers (portable, zero-copy) | SQLite pages | | Data conversion | None (zero-copy) or on-demand | Always required | | Streaming ingest | Append-only, real-time indexing | Row-by-row inserts |
Bottom line: FlatSQL excels when you need streaming ingestion of FlatBuffer data with SQL query capability. Use the zero-copy API for performance-critical lookups; use SQL for complex queries where flexibility matters more than speed.
Building from Source
Prerequisites
- Node.js 18+
- CMake 3.20+ (for WASM builds)
- Emscripten (for WASM builds)
TypeScript Build
npm install
npm run build
npm testWASM Build
# Clone DA-FlatBuffers (required dependency)
git clone https://github.com/DigitalArsenal/flatbuffers.git ../flatbuffers
# Build WASM
cd cpp
emcmake cmake -B build-wasm -DCMAKE_BUILD_TYPE=Release
cmake --build build-wasmOutput: wasm/flatsql.js and wasm/flatsql.wasm
Run Demo Locally
npm run serve
# Open http://localhost:8081API Reference
initFlatSQL()
import { initFlatSQL } from 'flatsql/wasm';
const flatsql = await initFlatSQL();createDatabase(schema, name)
const db = flatsql.createDatabase(schemaString, 'dbname');db.registerFileId(fileId, tableName)
db.registerFileId('USER', 'User'); // Route "USER" FlatBuffers to User tabledb.ingest(data, source?)
const bytesConsumed = db.ingest(uint8ArrayStream);
// Or with source tagging (requires registerSource first):
const bytesConsumed = db.ingest(uint8ArrayStream, 'satellite-1');db.query(sql)
const result = db.query('SELECT * FROM User WHERE age > 25');
// result.columns: string[]
// result.rows: any[][]db.exportData()
const data = db.exportData(); // Returns Uint8Arraydb.registerSource(sourceName)
db.registerSource('satellite-1'); // Creates User@satellite-1, Post@satellite-1, etc.db.createUnifiedViews()
db.createUnifiedViews(); // Creates unified views with _source columndb.listSources()
const sources = db.listSources(); // ['satellite-1', 'satellite-2', ...]Multi-Source Queries
FlatSQL supports federating multiple data sources with the same schema. Each source gets its own set of tables, and you can query them individually or across all sources.
Use Case
Imagine you have multiple satellites streaming telemetry data with the same schema:
// Register sources
db.registerSource('satellite-1');
db.registerSource('satellite-2');
db.registerSource('ground-station');
// Register file IDs and extractors (must be done after registerSource)
db.registerFileId('TELE', 'Telemetry');
db.enableDemoExtractors();
// Create unified views (call once after all sources registered)
db.createUnifiedViews();
// Ingest from different sources
db.ingest(satellite1Stream, 'satellite-1');
db.ingest(satellite2Stream, 'satellite-2');
db.ingest(groundStream, 'ground-station');
// Query a specific source
db.query('SELECT * FROM "Telemetry@satellite-1" WHERE signal > 50');
// Query across all sources (unified view)
db.query('SELECT * FROM Telemetry WHERE timestamp > 1000');Table Naming Convention
- Source-specific tables:
TableName@sourceName(e.g.,User@siteA,Telemetry@satellite-1) - Unified views:
TableName(e.g.,User,Telemetry) - combines all source tables with a_sourcecolumn
The _source Column
Unified views include a _source column that identifies which source each row came from:
-- See source for each record
SELECT _source, id, name FROM User LIMIT 10;
-- Count records by source
SELECT _source, COUNT(*) as count FROM User GROUP BY _source;
-- Filter by source in unified view
SELECT * FROM User WHERE _source = 'satellite-1';License
Apache 2.0
Contributing
Contributions welcome. Please open an issue first to discuss significant changes.
Contact
For questions, licensing inquiries, or commercial support: [email protected]
Built on DA-FlatBuffers and SQLite.
