@colql/colql
v0.5.0
Memory-efficient in-memory columnar query engine for TypeScript
ColQL is a zero-dependency, in-memory columnar query engine for TypeScript apps that need compact process-local storage, typed schemas, explicit indexes, and safe mutations.
It is not a SQL database or persistence layer. ColQL is for data you already want to keep inside a Node.js process.
Why ColQL?
- Compact columnar storage backed by typed arrays, dictionaries, and bit-packed booleans
- Lazy queries with filtering, projection, aggregation, streaming, limit, and offset
- Object predicates plus tuple-style where(column, operator, value)
- Explicit equality indexes and sorted numeric indexes for hot predicates
- Unique indexes for stable ID lookups and duplicate-key protection
- Public query.explain() diagnostics for planner visibility without executing queries
- JS Array migration helpers such as fromRows, firstWhere, countWhere, and exists
- Mutable tables with updateMany and deleteMany
- Runtime validation with structured ColQLError codes
- Binary serialization for table data
- TypeScript inference for rows, predicates, projections, and mutation payloads
- Zero runtime dependencies
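The storage ideas behind the first bullet can be sketched in plain TypeScript. This is a simplified illustration of dictionary encoding and bit-packing in general, not ColQL's actual internals:

```typescript
// Dictionary encoding and bit-packing, the two compaction techniques the
// feature list mentions, shown on plain typed arrays.

const statuses = ["active", "passive", "archived"] as const;
type Status = (typeof statuses)[number];

// Dictionary encoding: store a 1-byte code per row instead of a string.
function encodeStatuses(rows: Status[]): Uint8Array {
  const codes = new Uint8Array(rows.length);
  rows.forEach((s, i) => {
    codes[i] = statuses.indexOf(s); // 0, 1, or 2
  });
  return codes;
}

// Bit-packing: store 8 booleans per byte instead of one slot each.
function packBooleans(flags: boolean[]): Uint8Array {
  const packed = new Uint8Array(Math.ceil(flags.length / 8));
  flags.forEach((flag, i) => {
    if (flag) packed[i >> 3] |= 1 << (i & 7);
  });
  return packed;
}

function readBit(packed: Uint8Array, i: number): boolean {
  return (packed[i >> 3] & (1 << (i & 7))) !== 0;
}

const codes = encodeStatuses(["active", "archived", "active"]);
const packed = packBooleans([true, false, true]);
console.log(codes, readBit(packed, 2)); // Uint8Array [0, 2, 0], true
```

Either representation decodes back to the original values, which is why a columnar engine can expose normal strings and booleans while storing far less per row.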
Install
npm install @colql/colql
Quick Example
import { column, table } from "@colql/colql";
const users = table({
id: column.uint32(),
age: column.uint8(),
status: column.dictionary(["active", "passive", "archived"] as const),
score: column.float64(),
verified: column.boolean(),
});
users.insertMany([
{ id: 1, age: 29, status: "active", score: 91.5, verified: true },
{ id: 2, age: 17, status: "passive", score: 72.0, verified: false },
{ id: 3, age: 44, status: "active", score: 88.2, verified: true },
]);
users.createIndex("status");
users.createSortedIndex("age");
users.createUniqueIndex("id");
const activeAdults = users
.where({
status: "active",
age: { gte: 18 },
})
.select(["id", "age", "score"])
.toArray();
console.log(
users.where({ status: "active", age: { gte: 18 } }).select(["id"]).explain(),
);
const result = users.updateMany(
{ status: "passive", age: { lt: 18 } },
{ status: "archived" },
);
console.log(activeAdults);
console.log(result.affectedRows);
console.log(users.findBy("id", 1));
Performance Snapshot
ColQL includes a Fastify example that can boot with 1M deterministic rows and exercise indexed, range, scan, callback-filter, mutation, stress, and memory paths.
These are local reference numbers from examples/fastify-api, captured after the example's mutation validation path; they are not guarantees. Actual numbers vary by Node.js version, CPU, memory pressure, data distribution, query selectivity, projection size, and mutation frequency.
| 1M-row workload | Local avg | Expected shape |
|---|---:|---|
| Selective equality query with createIndex() | 2.06ms | Fastest path when candidate sets are small |
| Numeric range query with createSortedIndex() | 27.69ms | Helps selective ranges; broad ranges may resemble scans |
| Broad structured predicate | 25.31ms | May intentionally scan when an index is not selective |
| filter(fn) callback predicate | 218.93ms | Full-scan escape hatch; not index-aware |
ColQL also includes a JS Array comparison benchmark. It is meant to show tradeoffs, not to prove that ColQL is always faster. In local 1M-row runs, JS arrays are often faster for simple broad scans and callback predicates, while ColQL is strongest when compact storage, projection with limits, structured predicates, or explicit indexed lookups matter.
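The scan side of that tradeoff can be reproduced in isolation with plain TypeScript, independent of ColQL. This sketch only shows that both layouts must visit every row on a broad scan and produce identical results; the cost difference comes from how much memory each layout touches per row, and timings vary by machine:

```typescript
// Summing one numeric column from an object array vs a typed-array column.
// Same answer either way; the columnar layout reads 1 byte per row while
// the object array chases a pointer per row.

const N = 100_000;
const objectRows: { id: number; age: number }[] = [];
const ageColumn = new Uint8Array(N);

for (let i = 0; i < N; i++) {
  const age = i % 90;
  objectRows.push({ id: i, age });
  ageColumn[i] = age;
}

let objectSum = 0;
for (const row of objectRows) objectSum += row.age;

let columnSum = 0;
for (let i = 0; i < N; i++) columnSum += ageColumn[i];

console.log(objectSum === columnSum); // true: identical results either way
```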
Run the example locally:
cd examples/fastify-api
npm install
npm run test:large
For benchmark scripts and interpretation notes, see Performance and Benchmarks.
For PR-level regression checks, run npm run bench:codspeed; CodSpeed uses smaller deterministic datasets as regression signals, not absolute throughput claims. Setup is excluded where possible, while destructive mutation benchmarks that need fresh tables use setup-inclusive names. Larger 1M-row comparisons remain covered by the local/manual benchmark scripts.
For JS Array comparisons, run npm run benchmark:array-comparison; results are local guidance, not universal promises or CI requirements.
For a scenario-style local workload, run npm run benchmark:session-analytics.
When To Use ColQL
Use ColQL when:
- you need to keep thousands to millions of records in memory
- JavaScript object arrays use too much memory
- filters and aggregations should avoid intermediate arrays
- a TypeScript schema can describe your columns
- explicit indexes are acceptable for hot equality or range predicates
- stable identity can be modeled with an explicit ID column and unique index
- runtime validation matters because data may come from untyped sources
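The memory bullet above can be made concrete with a back-of-envelope estimate for the Quick Example schema. Column widths follow from the underlying typed arrays; the assumption that a dictionary code fits in one byte (for up to 256 distinct values) is ours, not a documented ColQL guarantee:

```typescript
// Rough columnar footprint per row for the Quick Example schema.

const bytesPerRow =
  4 +     // id: uint32
  1 +     // age: uint8
  1 +     // status: dictionary code (assumed 1 byte for <= 256 entries)
  8 +     // score: float64
  1 / 8;  // verified: one bit, bit-packed

const rowCount = 1_000_000;
const columnarMB = (bytesPerRow * rowCount) / (1024 * 1024);
console.log(bytesPerRow, columnarMB.toFixed(1)); // 14.125 13.5
```

The same million rows as plain JS objects cost substantially more, since each row pays object headers, property storage, and heap-allocated strings; the exact multiple depends on the V8 version and shapes involved.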
Avoid ColQL when:
- you need durable storage, transactions, joins, or SQL
- data must be shared across pods, workers, processes, or machines
- writes dominate the workload and broad indexes are frequently dirtied
- row indexes must be stable external identifiers
- a small/simple JavaScript array is already clear and fast enough
- every query requires arbitrary sorting or grouping
- you need concurrent writers or multi-process coordination
- you want automatic indexes, compound indexes, or query planning across tables
- you want analytical SQL over files or large columnar datasets, where DuckDB may be a better fit
Row indexes are physical positions and can change after deletes. Use an explicit id column for stable identity.
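The hazard behind that rule can be shown with a plain array, independent of ColQL: after a delete, a saved physical position points at the wrong row (or past the end), while a lookup by an explicit id column stays correct:

```typescript
// Physical positions are unstable across deletes; explicit ids are not.

const rows = [
  { id: 101, status: "active" },
  { id: 102, status: "archived" },
  { id: 103, status: "active" },
];

const savedPosition = 2;  // physical position of id 103 before the delete
rows.splice(1, 1);        // delete the row at physical index 1

console.log(rows[savedPosition]);            // undefined: stale position
console.log(rows.find((r) => r.id === 103)); // stable lookup by explicit id
```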
Decision Guide
| Tool | Good fit | Not a good fit |
|---|---|---|
| ColQL | Process-local TypeScript data that benefits from compact in-memory columns, explicit indexes, validation, and inspectable query plans | Persistence, SQL, joins, transactions, shared state, or distributed coordination |
| JavaScript arrays | Small or simple datasets, ad hoc transforms, or write-heavy logic where object arrays are already clear and fast enough | Memory-sensitive data, repeated projections, structured indexed lookups, or runtime schema validation |
| SQLite | Durable embedded relational storage with SQL, transactions, and indexes | Pure process-local ephemeral caches where a database file and SQL layer are unnecessary |
| DuckDB | Analytical SQL, file-based analytics, large columnar datasets, and ad hoc aggregations | TypeScript-first mutable process-local tables with explicit in-memory indexes |
Examples
The Fastify example demonstrates HTTP query params mapped to object predicates, range queries, filter(fn), updateMany, deleteMany, query diagnostics, index stats, and memory counters.
Documentation
Detailed documentation is available under docs/doc.
Recommended reading:
- Overview
- Installation
- Schema and Columns
- Querying
- Equality Indexes
- Sorted Indexes
- Unique Indexes
- Mutations
- Serialization
- Memory Model
- Limitations and Design Decisions
- API Reference
Common APIs
users.insert(row);
users.insertMany(rows);
users.where({ status: "active", age: { gte: 18 } }).toArray();
users.where("age", ">=", 18).select(["id"]).toArray();
users.whereIn("status", ["active", "passive"]);
users.filter((row) => row.score > 90);
users.where({ status: "active" }).explain();
users.count();
users.avg("age");
users.top(10, "score");
users.update(0, { status: "active" });
users.updateMany({ status: "passive" }, { status: "active" });
users.deleteMany({ status: "archived" });
users.createIndex("id");
users.createSortedIndex("age");
users.createUniqueIndex("id");
users.findBy("id", 123);
const buffer = users.serialize();
const restored = table.deserialize(buffer);
restored.createIndex("status");
filter(fn) is intentionally a full-scan escape hatch. Prefer structured predicates when you want index planning.
query.explain() returns structured diagnostics without executing the query, scanning rows, materializing rows, calling onQuery, or rebuilding dirty indexes.
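The distinction between filter(fn) and structured predicates comes down to whether an index can answer the predicate. A minimal sketch of an equality index over a plain array (an illustration of the technique, not ColQL's internals):

```typescript
type Row = { id: number; status: string };

const rows: Row[] = [
  { id: 1, status: "active" },
  { id: 2, status: "passive" },
  { id: 3, status: "active" },
];

// Equality index: value -> physical row positions. Built once, then
// consulted per query instead of scanning every row.
function buildIndex(data: Row[], key: keyof Row): Map<unknown, number[]> {
  const index = new Map<unknown, number[]>();
  data.forEach((row, pos) => {
    const bucket = index.get(row[key]) ?? [];
    bucket.push(pos);
    index.set(row[key], bucket);
  });
  return index;
}

const statusIndex = buildIndex(rows, "status");
const hits = (statusIndex.get("active") ?? []).map((pos) => rows[pos].id);
console.log(hits); // [1, 3]

// A callback predicate is opaque: the only correct plan is to run it
// against every row, which is why filter(fn) cannot use an index.
const scanned = rows.filter((r) => r.status === "active").map((r) => r.id);
console.log(scanned); // [1, 3], same result via full scan
```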
Error Handling
ColQL validates schemas, inserted rows, query predicates, mutation payloads, indexes, and serialized input at runtime. Failures throw ColQLError with a stable code, a message, and optional details.
import { ColQLError } from "@colql/colql";
try {
users.insert({ id: 4, age: 300, status: "active", score: 1, verified: true });
} catch (error) {
if (error instanceof ColQLError) {
console.log(error.code); // COLQL_OUT_OF_RANGE
}
}
Development
npm install
npm run check
npm test
npm run test:types
npm run build
npm run bench:codspeed
npm run benchmark:memory
npm run benchmark:query
npm run benchmark:indexed
npm run benchmark:range
npm run benchmark:optimizer
npm run benchmark:serialization
npm run benchmark:delete
npm run benchmark:array-comparison
npm run benchmark:session-analytics
Status
ColQL v0.5.x focuses on trust and stability hardening: narrower index invalidation after updates, stricter snapshot deserialization, a type-test gate for public TypeScript behavior, and clearer diagnostics. Breaking changes may still happen before 1.0.0, but the project is moving toward API stabilization.
Limitations
ColQL intentionally does not include SQL parsing, joins, transactions, concurrency control, automatic indexes, compound indexes, or durable storage. Equality and sorted indexes are derived performance structures; query results must be the same whether ColQL uses an index or a full scan. Unique indexes are derived too, but they also enforce uniqueness while present and are not serialized.
