miniml
v1.0.16
Published
A minimal, embeddable semantic data modeling language for generating SQL queries from YAML model definitions. Inspired by LookML.
Maintainers
Readme
MiniML
A minimal, embeddable semantic data modeling language for generating SQL queries from YAML model definitions, inspired by LookML.
Generates SQL queries driven by a semantic model and selected dimensions and measures. Capable of generating optimal SQL from minimal queries to very complex queries with multiple joins based on the specified configuration.
Also generates metadata for each model suitable for use by AI to execute queries.
Designed to be a stand-alone and light-weight library, embeddable in any server or app without requiring additional data platforming. Supports BigQuery and Snowflake dialects.
Table of Contents
- Quick Start
- Installation
- Programmatic Usage
- Model Definition
- Query Generation
- API Reference
- Examples
- Supported SQL Dialects
- Contributing
Quick Start
1. Define a Model
Create a YAML model file (sales.yaml) that describes your data structure:
description: Sales fact table capturing transactional-level purchase data across products, stores, and customers.
from: acme.sales
join:
customer_join: JOIN acme.customers USING (customer_id)
product_join: JOIN acme.products USING (product_id)
store_join: JOIN acme.stores USING (store_id)
always_join:
- product_join # Always include product data for consistency
date_field: date # Primary date field for filtering
default_date_range: last 7 days # Default time window when no dates specified
include_today: false # Exclude today's partial data
dimensions:
date:
- The calendar date on which the sale occurred.
- DATE(sale_date)
sale_id: Unique identifier for each sale transaction.
customer_id: Unique identifier for the customer making the purchase.
customer_name:
- Display name or label associated with the customer.
- customer_name
- customer_join
product_id: Unique identifier for the product sold.
product_name:
- Display name or label associated with the product.
- product_name
- product_join
store_id: Unique identifier for the store where the sale took place.
store_name:
- Display name or label of the store location.
- store_name
- store_join
measures:
total_amount:
- Total sale amount generated by the transaction.
- SUM(total_amount)
price_avg:
- Average unit price across all sales.
- AVG(unit_price)
count:
- The count of records for the row.
- COUNT(*)2. Produce a description of the model
npx miniml sales.yamlOUTPUT
## DIMENSIONS
- `date` The calendar date on which the sale occurred.
- `sale_id` Unique identifier for each sale transaction.
- `customer_id` Unique identifier for the customer making the purchase.
- `customer_name` Display name or label associated with the customer.
- `product_id` Unique identifier for the product sold.
- `product_name` Display name or label associated with the product.
- `store_id` Unique identifier for the store where the sale took place.
- `store_name` Display name or label of the store location.
## MEASURES
- `total_amount` Total sale amount generated by the transaction.
- `price_avg` Average unit price across all sales.
- `count` The count of records for the row.3. Generate a SQL query from the model
npx miniml sales.yaml --dimensions=date --measures=total_amountOUTPUT
SELECT
DATE(sale_date) AS date,
SUM(total_amount) AS total_amount
FROM acme.sales
JOIN acme.products USING (product_id)
WHERE DATE(sale_date) >= CURRENT_TIMESTAMP - INTERVAL 7 DAY
AND DATE(sale_date) < DATE_TRUNC(CURRENT_TIMESTAMP, DAY)
GROUP BY ALLAdd more options:
npx miniml sales.yaml \
--dimensions=date,customer_name,product_name \
--measures=total_amount,price_avg,count \
--date-from=2025-01-01 \
--date-to=2025-01-31 \
--order-by=-date \
--limit=10OUTPUT
SELECT
DATE(sale_date) AS date,
customer_name AS customer_name,
product_name AS product_name,
SUM(total_amount) AS total_amount,
AVG(unit_price) AS price_avg,
COUNT(*) AS count
FROM acme.sales
JOIN acme.customers USING (customer_id)
JOIN acme.products USING (product_id)
WHERE DATE(sale_date) BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY ALL
ORDER BY date DESC
LIMIT 10Installation
To use MiniML in your project:
npm install minimlProgrammatic Usage
import { loadModel, renderQuery } from 'miniml';
// Load the model
const model = await loadModel('./sales.yaml');
// Generate a SQL query
const sql = renderQuery(model, {
dimensions: ['date', 'customer_name'],
measures: ['total_amount', 'count'],
date_from: '2025-01-01',
date_to: '2025-01-31',
order_by: ['-date'],
limit: 10
});
console.log(sql);OUTPUT
SELECT
DATE(sale_date) AS date,
customer_name AS customer_name,
SUM(total_amount) AS total_amount,
COUNT(*) AS count
FROM acme.sales
JOIN acme.customers USING (customer_id)
WHERE DATE(sale_date) BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY ALL
ORDER BY date DESC
LIMIT 10Model Definition
Basic Structure
A MiniML model is a YAML file with the following structure:
description: Human-readable description of the model
dialect: bigquery | snowflake
from: schema.table_name
join:
join_name: JOIN clause definition
always_join:
- join_name # Joins to always include
where: base_filter_conditions
date_field: primary_date_column # Auto-detected if not specified
default_date_range: last 30 days # Default time window
include_today: false # Include/exclude current day
dimensions:
dimension_name: definition
measures:
measure_name: definition
tags:
- public # Tags that can be used for purposes such as access control
info: Detailed usage instructions for the modelDimensions
Dimensions are fields used for grouping data. They can be defined in several formats:
Simple string (field name only):
dimensions:
customer_id: Unique customer identifierArray format (description, SQL, join):
dimensions:
customer_name:
- Display name of the customer
- customer_name
- customer_joinObject format:
dimensions:
date:
description: The transaction date
sql: DATE(created_at)
join: date_dimension_joinMeasures
Measures are aggregated fields. They default to SUM() if no SQL is provided:
Simple string (defaults to SUM):
measures:
revenue: Total revenue amount # becomes SUM(revenue) AS revenueCustom aggregation:
measures:
avg_price:
- Average unit price
- AVG(unit_price)Joins
Define reusable join clauses. Joins are rendered in the order they are defined under the join section, regardless of the order they are referenced by dimensions or measures.
join:
customer_join: LEFT JOIN customers USING (customer_id)
product_join: LEFT JOIN products USING (product_id)
store_join: LEFT JOIN stores USING (store_id)
always_join:
- product_join # Always include this join, regardless of field usage
dimensions:
customer_name:
- Customer display name
- customer_name
- customer_join # References the join above
store_name:
- Store location name
- store_name
- store_joinGenerated SQL join order:
-- Joins appear in YAML definition order: customer_join, product_join, store_join
-- Even if referenced in reverse order by dimensions/measures
FROM base_table
JOIN customers USING (customer_id) -- First in YAML
JOIN products USING (product_id) -- Second in YAML
JOIN stores USING (store_id) -- Third in YAMLThis predictable ordering ensures consistent SQL generation and helps with query optimization and dependency management between joins.
Always Join
The always_join property specifies joins that should always be included in generated SQL queries, regardless of whether any dimensions or measures explicitly reference them. This is useful for joins that provide essential context, filtering, or data integrity constraints.
join:
customer_join: LEFT JOIN customers USING (customer_id)
product_join: LEFT JOIN products USING (product_id)
store_join: LEFT JOIN stores USING (store_id)
always_join:
- product_join # Always included for data integrity
- store_join # Always included for consistent filtering
dimensions:
date: Transaction date
customer_name:
- Customer name from customer table
- customer_name
- customer_join # Only included when customer_name is usedAlternatively, specify all to always include all joins.
always_join: allExample Query:
// This query only uses dimensions that don't require joins
const sql = renderQuery(model, {
dimensions: ['date'],
measures: ['count']
});Generated SQL:
SELECT
DATE(sale_date) AS date,
COUNT(*) AS count
FROM acme.sales
JOIN products USING (product_id) -- Always included
JOIN stores USING (store_id) -- Always included
GROUP BY ALLUse Cases:
- Data Quality: Ensure referential integrity by always joining lookup tables
- Security: Always include joins for row-level security filters
- Business Logic: Maintain consistent business rules across all queries
- Performance: Pre-join commonly used dimensions to optimize query plans
Best Practices:
- Use sparingly - only for joins that are truly always needed
- Consider the performance impact of additional joins on large datasets
- Document why specific joins are marked as
always_joinin your model
Date Field Configuration
MiniML provides sophisticated date handling capabilities through three key properties: date_field, default_date_range, and include_today.
Date Field (date_field)
The date_field property specifies the primary date dimension used for automatic date filtering. If not explicitly defined, MiniML automatically detects it using these rules (in order):
- Exact matches:
"date","timestamp" - Pattern matches: Fields ending with
"date","time","_at","_on","_until", starting with"date", or containing"datetime"
# Explicit date field
date_field: order_date
dimensions:
order_date: The date when the order was placed
created_at: Record creation timestamp
# Auto-detected: MiniML will choose "order_date" (exact match)# Auto-detection example
dimensions:
transaction_date: Purchase date # Will be auto-detected
customer_id: Customer identifier
# No explicit date_field needed - "transaction_date" will be chosenDefault Date Range (default_date_range)
Automatically applies date filtering when no explicit date_from/date_to parameters are provided. Supports human-readable relative date expressions:
default_date_range: last 30 days
# Other examples:
# default_date_range: last 7 days
# default_date_range: last 12 months
# default_date_range: last 1 year
# default_date_range: last 24 hoursSupported time units: hour/hours, day/days, week/weeks, month/months, year/years
Generated SQL Examples:
BigQuery:
WHERE order_date >= CURRENT_TIMESTAMP - INTERVAL 30 DAYSnowflake:
WHERE order_date >= CURRENT_TIMESTAMP - INTERVAL '30 DAY'Include Today (include_today)
Controls whether the current day is included in default date range filtering. Defaults to true if not specified.
default_date_range: last 7 days
include_today: false # Exclude today from the range (default is true)With include_today: true (default):
-- BigQuery
WHERE order_date >= CURRENT_TIMESTAMP - INTERVAL 7 DAY
-- Includes all of today's dataWith include_today: false:
-- BigQuery
WHERE order_date >= CURRENT_TIMESTAMP - INTERVAL 7 DAY
AND order_date < DATE_TRUNC(CURRENT_TIMESTAMP, DAY)
-- Excludes today's data (up to midnight)Complete Date Configuration Example
description: E-commerce orders with 30-day default window
dialect: bigquery
from: ecommerce.orders
date_field: order_date # Primary date field for filtering
default_date_range: last 30 days # Auto-apply 30-day lookback
include_today: false # Exclude today's partial data
dimensions:
order_date:
- Order placement date
- DATE(order_date)
customer_id: Customer identifier
measures:
revenue:
- Total revenue
- SUM(amount)Query without explicit dates:
const sql = renderQuery(model, {
dimensions: ['order_date'],
measures: ['revenue']
});Generated SQL:
SELECT
DATE(order_date) AS order_date,
SUM(amount) AS revenue
FROM ecommerce.orders
WHERE order_date >= CURRENT_TIMESTAMP - INTERVAL 30 DAY
AND order_date < DATE_TRUNC(CURRENT_TIMESTAMP, DAY)
GROUP BY ALLQuery with explicit dates (overrides defaults):
const sql = renderQuery(model, {
dimensions: ['order_date'],
measures: ['revenue'],
date_from: '2024-01-01',
date_to: '2024-01-31'
});
// Uses explicit dates instead of default_date_rangeDate Granularity
When using the primary date_field as a dimension, you can apply date truncation:
const sql = renderQuery(model, {
dimensions: ['order_date'],
measures: ['revenue'],
date_granularity: 'week' // Group by week
});Generated SQL:
-- BigQuery
SELECT
DATE_TRUNC(DATE(order_date), WEEK) AS order_date,
SUM(amount) AS revenue
FROM ecommerce.orders
WHERE order_date >= CURRENT_TIMESTAMP - INTERVAL 30 DAY
GROUP BY ALLSupported granularities: MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR
SQL Dialect Detection
MiniML can automatically detect the SQL dialect from the file path:
- Files containing "bigquery" use BigQuery syntax
- Files containing "snowflake" use Snowflake syntax
- Otherwise, specify explicitly with the
dialectfield
Examples:
models/bigquery/sales.yaml # Automatically uses BigQuery dialect
models/snowflake/users.yaml # Automatically uses Snowflake dialect
models/sales.yaml # Requires explicit dialect: bigqueryThis means you can omit the dialect field when your file paths include the database type.
Query Generation
Query Options
The renderQuery function accepts these options:
interface MinimlQueryOptions {
dimensions?: string[]; // Fields to group by
measures?: string[]; // Aggregated fields to calculate
date_from?: string | null; // Start date filter (YYYY-MM-DD)
date_to?: string | null; // End date filter (YYYY-MM-DD)
where?: string; // Additional WHERE conditions
having?: string; // HAVING clause for aggregate filters
order_by?: string[]; // Sort fields (prefix with - for DESC)
limit?: number; // Limit result count
distinct?: boolean; // Add DISTINCT to SELECT
date_granularity?: string; // Date truncation (day, week, month, year)
}Date Handling
MiniML provides powerful date filtering and processing capabilities. For comprehensive documentation on date configuration, see the Date Field Configuration section.
Basic date filtering:
// Uses model's default_date_range if defined
renderQuery(model, { measures: ['count'] });
// Override with specific date range
renderQuery(model, {
measures: ['count'],
date_from: '2024-01-01',
date_to: '2024-01-31'
});
// Specify a null for date_from to bypass the default_date_range
renderQuery(model, { measures: ['count'], date_from: null });Date granularity:
// Group by week using the primary date_field
renderQuery(model, {
dimensions: ['date'],
measures: ['count'],
date_granularity: 'week'
});
// Generates: DATE_TRUNC('WEEK', sale_date) AS dateFiltering
WHERE clause (for dimensions):
renderQuery(model, {
dimensions: ['customer_id'],
measures: ['total_amount'],
where: "customer_type = 'premium'"
});HAVING clause (for measures):
renderQuery(model, {
dimensions: ['customer_id'],
measures: ['total_amount'],
having: "total_amount > 1000"
});Ordering
renderQuery(model, {
dimensions: ['customer_id'],
measures: ['total_amount'],
order_by: ['customer_id', '-total_amount'] // ASC, then DESC
});API Reference
Core Functions
loadModel(file: string): Promise<MinimlModel>
Asynchronously loads and processes a YAML model file.
loadModelSync(file: string): MinimlModel
Synchronously loads and processes a YAML model file.
renderQuery(model: MinimlModel, options: MinimlQueryOptions): string
Generates a SQL query from a model and query options.
Interfaces
MinimlModel
interface MinimlModel {
description: string;
dialect: string;
from: string;
join: Record<string, string>;
always_join?: string[]; // Joins to always include
where: string;
date_field?: string; // Primary date field (auto-detected)
default_date_range?: string; // Default time window (e.g., "last 30 days")
include_today?: boolean; // Include current day in default range
dimensions: Record<string, MinimlDef>;
measures: Record<string, MinimlDef>;
info: string; // Auto-generated documentation
}MinimlDef
interface MinimlDef {
key: string;
description: string;
sql?: string;
join?: string;
}Examples
E-commerce Analytics
# ecommerce.yaml
description: E-commerce sales and customer analytics
dialect: bigquery
from: ecommerce.orders
date_field: order_date
join:
customer_join: LEFT JOIN ecommerce.customers USING (customer_id)
product_join: LEFT JOIN ecommerce.products USING (product_id)
where: status = 'completed'
dimensions:
date: Order date
customer_id: Customer identifier
customer_email:
- Customer email address
- email
- customer_join
product_category:
- Product category
- category
- product_join
measures:
revenue: Total revenue
order_count:
- Number of orders
- COUNT(DISTINCT order_id)
avg_order_value:
- Average order value
- AVG(total_amount)Usage:
const model = await loadModel('./ecommerce.yaml');
// Revenue by category, last 7 days
const sql = renderQuery(model, {
dimensions: ['product_category'],
measures: ['revenue', 'order_count'],
date_from: '2024-01-20',
date_to: '2024-01-27',
order_by: ['-revenue'],
limit: 10
});Complex Joins and Calculations
# advanced.yaml
description: Advanced analytics with multiple joins
dialect: snowflake
from: fact.sales
date_field: sale_date
join:
customer_join: LEFT JOIN dim.customers c ON c.id = customer_id
product_join: LEFT JOIN dim.products p ON p.id = product_id
region_join: LEFT JOIN dim.regions r ON r.code = c.region_code
dimensions:
customer_segment:
- Customer segment classification
- c.segment
- customer_join
product_brand:
- Product brand name
- p.brand
- product_join
region_name:
- Geographic region
- r.name
- region_join
measures:
revenue_per_customer:
- Average revenue per customer
- SUM(amount) / COUNT(DISTINCT customer_id)
profit_margin:
- Profit margin percentage
- (SUM(amount) - SUM(cost)) / SUM(amount) * 100Supported SQL Dialects
BigQuery
- Supports
GROUP BY ALLsyntax - Uses
DATE_TRUNCfor date granularity - Compatible with BigQuery standard SQL functions
Snowflake
- Supports Snowflake-specific functions
- Uses
DATE_TRUNCfor date granularity - Compatible with Snowflake SQL syntax
More dialects coming soon!
Advanced Features
Template Variables in Info
Models support Jinja-style templating in the info section:
info: |
## Available Dimensions
{%- for dimension in dimensions %}
- `{{ dimension.key }}`: {{ dimension.description }}
{%- endfor %}
## Available Measures
{%- for measure in measures %}
- `{{ measure.key }}`: {{ measure.description }}
{%- endfor %}Filter Reference Expansion
MiniML automatically expands dimension and measure references in filter clauses:
// If you have a measure: conversion_rate: "clicks / impressions"
// And use it in a HAVING clause:
renderQuery(model, {
having: "conversion_rate > 0.05"
});
// MiniML expands this to: HAVING (clicks / impressions) > 0.05Validation
MiniML validates query parameters against model definitions and provides helpful error messages:
// This will throw an error with specific invalid keys
renderQuery(model, {
dimensions: ['invalid_dimension'],
measures: ['invalid_measure']
});
// Error: The following keys are invalid:
// - dimensions: invalid_dimension
// - measures: invalid_measureSecurity
Because MiniML allows flexible where and having clauses to be specified in native SQL format, comprehensive SQL validation is implemented to guard against SQL injection attacks for user-provided query parameters:
- AST-based validation: All user-provided SQL expressions are parsed and validated using node-sql-parser
- Allowlist approach: Only safe SQL constructs are permitted (comparisons, logical operators, basic functions)
- Model-aware validation: Column references are validated against your model schema
- Dialect-specific rules: Validation adapts to BigQuery/Snowflake syntax differences
- Date input validation: Date parameters (
date_from,date_to) are strictly validated using regex patterns
Security Model
Trusted Inputs: YAML model files are considered trusted input and are not validated for SQL injection. Model files should be:
- Created and maintained by trusted developers
- Stored securely with appropriate access controls
- Reviewed before deployment to production environments
- Treated as code artifacts subject to version control and security policies
Untrusted Inputs: All query parameters provided at runtime (where, having, date_from, date_to, etc.) are considered untrusted and undergo strict validation.
Safe Expression Examples
WHERE clauses:
renderQuery(model, {
where: "account_name = 'Acme Corp'"
});
renderQuery(model, {
where: "date >= '2024-01-01' AND category_name LIKE 'Electronics%'"
});
renderQuery(model, {
where: "revenue > 1000 OR quantity IS NOT NULL"
});HAVING clauses:
renderQuery(model, {
having: "total_amount > 1000 AND order_count >= 5"
});Blocked Constructs
For security, the following are automatically blocked:
- Subqueries:
SELECT,EXISTS,ANY,ALL - DDL/DML statements:
DROP,ALTER,CREATE,INSERT,UPDATE,DELETE - System functions: Potentially dangerous database functions
- Comments: SQL comments that could hide malicious code
- Unauthorized columns: References to columns not defined in your model
Validation Errors
When unsafe expressions are detected, MiniML provides helpful error messages:
// Error: "Subqueries are not allowed in WHERE clauses. Use simple comparisons instead."
renderQuery(model, {
where: "customer_id IN (SELECT id FROM users WHERE active = 1)"
});
// Error: "Column 'user_password' not found. Available dimensions: account_name, date, category_name"
renderQuery(model, {
where: "user_password = 'secret'"
});Building and Development
# Install dependencies
npm install
# Build TypeScript
npm run build
# Run tests (requires Mocha setup)
npx mocha test/**/*.test.jsContributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
MIT License - see LICENSE file for details.
