dataform-date
v0.0.6
Published
Offers date range control for Dataform in GCP
Readme
Dataform Date
Simplify date range control for Dataform and Google Analytics 4.
Getting Started
- In Dataform add the dataform-date package to package.json
"dataform-date":"latest" - In the
jsblock of your SQLX file, require the package:js { const dates = require("dataform-date"); } - Call the functions inline e.g.:
${dates.range("suffix","20241001","20241010")}
Functions
range(date_range_column, start_date, end_date, overwrite)load(date_range_column, start_date, end_date, full_refresh_start_date, full_refresh_end_date)create_table_id(date_override)
range(date_range_column, start_date, end_date, overwrite)
Creates a SQL BETWEEN clause string to filter a date range. Invalid dates like "2024-13-02" will throw an error.
Arguments:
date_range_column(string): The name of the date column to filter on.- Use
"suffix"or"ga4"to filter on_table_suffix(for GA4 partitioned tables), which formats dates asYYYYMMDD. - For any other string, it's treated as a regular date column, and dates are formatted as
YYYY-MM-DD.
- Use
start_date(string | number): The start of the date range.- As a string: A static date in
YYYY-MM-DDorYYYYMMDDformat. - As a number: A dynamic offset in days from the current date (e.g.,
-7for 7 days ago). - Defaults to
-1(yesterday) if not provided.
- As a string: A static date in
end_date(string | number): The end of the date range.- Same format as
start_date. - Defaults to
-1(yesterday) if not provided.
- Same format as
overwrite(boolean): Defaults totrue. Iffalse, it adds a condition to only select dates that are not already in the target table (...AND ${date_range_column} NOT IN (SELECT DISTINCT ${date_range_column} FROM ${self()})).
Examples:
- Default (no arguments): Filters
_table_suffixfor yesterday.${dates.range()} -- output: _table_suffix between "<yesterday_YYYYMMDD>" and "<yesterday_YYYYMMDD>" - GA4
_table_suffixwith dynamic dates: Filters the last 7 days of data.${dates.range("suffix", -7, -1)} -- output: _table_suffix between "<7_days_ago_YYYYMMDD>" and "<yesterday_YYYYMMDD>" - Custom date column: Filters
session_datefor the last 3 days.${dates.range("session_date", -3, -1)} -- output: session_date between "<3_days_ago_YYYY-MM-DD>" and "<yesterday_YYYY-MM-DD>" - Mixed static and dynamic dates: Filters from a fixed date up to yesterday.
${dates.range("session_date", "2024-01-01", -1)} -- output: session_date between "2024-01-01" and "<yesterday_YYYY-MM-DD>" - Disabling overwrite:
${dates.range("session_date", -7, -1, false)} -- output: session_date between "<7_days_ago_YYYY-MM-DD>" and "<yesterday_YYYY-MM-DD>" and session_date not in (select distinct session_date from ${self()})
load(date_range_column, start_date, end_date, full_refresh_start_date, full_refresh_end_date)
A wrapper around range() to simplify creating incremental tables in Dataform. It generates a date filter that behaves differently for incremental runs vs. full refreshes.
Arguments:
date_range_column(string): The date column name (seerange()for details).start_date(string | number): The start date for an incremental run.end_date(string | number): The end date for an incremental run.full_refresh_start_date(string | number): The start date for a full refresh.full_refresh_end_date(string | number): The end date for a full refresh.
Example:
In your definitions/my_table.sqlx:
config { type: "incremental" }
select * from source_table
where ${dates.load("session_date", -3, -1, "2023-01-01", -1)}
-- For an incremental run, this resolves to:
-- where session_date between "<3_days_ago_YYYY-MM-DD>" and "<yesterday_YYYY-MM-DD>"
-- For a full refresh, this resolves to:
-- where session_date between "2023-01-01" and "<yesterday_YYYY-MM-DD>"create_table_id(date_override)
Creates a fully-qualified table ID for a specific GA4 daily table (e.g., my_project.my_dataset.events_20241010). The date suffix changes automatically based on the execution date.
Arguments:
date_override(string | number):- As a string: A static date suffix in
YYYYMMDDformat. - As a number: A dynamic offset in days from the current date.
- Defaults to
-1(yesterday) if not provided.
- As a string: A static date suffix in
Examples:
(Assuming dataform.json has defaultDatabase: "my_gcp_project" and defaultSchema: "my_gcp_dataset")
- Default (yesterday's table):
${dates.create_table_id()} // output: `my_gcp_project.my_gcp_dataset.events_<yesterday_YYYYMMDD>` - Dynamic date (table from 3 days ago):
${dates.create_table_id(-3)} // output: `my_gcp_project.my_gcp_dataset.events_<3_days_ago_YYYYMMDD>` - Static date:
${dates.create_table_id("20241001")} // output: `my_gcp_project.my_gcp_dataset.events_20241001`
This updated documentation provides a much clearer and more comprehensive guide to using your package.
