npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

ga4-export-fixer

v0.9.0

Published

<img src="docs/images/header.svg" alt="ga4-export-fixer">

Readme

An enhanced, incremental GA4 events table, built with Dataform

npm version License Dependencies

ga4-export-fixer is a Dataform NPM package that transforms raw GA4 BigQuery export data into a cleaner, more queryable incremental table. It combines daily, fresh (360), and intraday exports so the best available version of each event is always in use, adds session-level fields like session_id and landing_page, promotes key event parameters to columns, and fixes known GA4 export issues — handling the boilerplate transformations that are otherwise tedious to include in every GA4 query.

The goal of the package is to speed up development when building data models and pipelines on top of GA4 export data, allowing you to focus on your use case instead of wrestling with the raw export format.

Example data model built with ga4-export-fixer

Table of Contents

Main Features

Planned Features

Features under consideration for future releases:

  • Aggregated tables (ga4_session, ga4_ecommerce...)
  • Web and app specific default configurations
  • Custom channel grouping
  • Custom traffic source attribution

Installation

Bash

npm install ga4-export-fixer

In Google Cloud Dataform

Include the package in the package.json file in your Dataform repository.

package.json

{
  "dependencies": {
    "@dataform/core": "3.0.42",
    "ga4-export-fixer": "0.9.0"
  }
}

Note: The best practice is to specify the package version explicitly (e.g. "0.1.2") rather than using "latest" or "*", to avoid unexpected breaking changes when the package is updated.

In Google Cloud Dataform, click "Install Packages" to install it in your development workspace.

If your Dataform repository does not have a package.json file, see this guide: https://docs.cloud.google.com/dataform/docs/manage-repository#move-to-package-json

Usage

Create GA4 Events Enhanced Table

Creates an enhanced version of the GA4 BigQuery export (daily & intraday).

JS Deployment (Recommended) .JS

Create a new ga4_events_enhanced table using a .js file in your repository's definitions folder.

Using Defaults

definitions/ga4/ga4_events_enhanced.js

const { ga4EventsEnhanced } = require('ga4-export-fixer');

const config = {
  // using hard-coded GA4 export path
  sourceTable: '`project.analytics_12345.events_*`'
};

ga4EventsEnhanced.createTable(publish, config);

With Custom Configuration

definitions/ga4/ga4_events_enhanced.js

const { ga4EventsEnhanced } = require('ga4-export-fixer');

const config = {
  // GA4 export path declared, using the table reference object
  sourceTable: constants.GA4_TABLES.MY_GA4_EXPORT,
  // use dataformTableConfig to make changes to the default Dataform table configuration
  dataformTableConfig: {
      schema: 'ga4'
  },
  // test configurations
  test: false,
  testConfig: {
      dateRangeStart: 'current_date()-1',
      dateRangeEnd: 'current_date()',
  },
  schemaLock: '20260101', // lock to daily export; also supports 'intraday_20260101' or 'fresh_20260101'
  customTimestampParam: 'custom_event_timestamp', // custom timestamp collected as an event param
  timezone: 'Europe/Helsinki',
  // not needed data
  excludedColumns: [
    'app_info',
    'publisher'
  ],
  // not needed events
  excludedEvents: [
    'session_start',
    'first_visit',
    'user_engagement'
  ],
  // transform to session-level
  sessionParams: [
    'user_agent'
  ],
  // promote as columns
  eventParamsToColumns: [
    {name: 'session_engaged'},
    {name: 'ga_session_number', type: 'int'},
    {name: 'page_type', type: 'string'},
  ],
  // not needed in the event_params array
  excludedEventParams: [
    'session_engaged',
    'ga_session_number',
    'page_type',
    'user_agent'
  ],
  // use export type for data_is_final instead of the default DAY_THRESHOLD
  dataIsFinal: {
    detectionMethod: 'EXPORT_TYPE',
  },
  // attribute item lists to downstream ecommerce events within the same session
  itemListAttribution: {
    lookbackType: 'SESSION',
  },
};

ga4EventsEnhanced.createTable(publish, config);

SQLX Deployment .SQLX

Alternatively, you can create the ga4_events_enhanced table using a .SQLX file.

definitions/ga4/ga4_events_enhanced.sqlx

config {
  type: "incremental",
  description: "GA4 Events Enhanced table",
  schema: "ga4",
  onSchemaChange: "EXTEND",
  bigquery: {
    partitionBy: "event_date",
    clusterBy: ['event_name', 'session_id', 'page_location', 'data_is_final'],
  },
  tags: ['ga4_export_fixer']
}

js {
  const { ga4EventsEnhanced } = require('ga4-export-fixer');

  const config = {
    // using hard-coded GA4 export path
    sourceTable: '`project.analytics_12345.events_*`',
    self: self(),
    incremental: incremental()
  };
}

${ga4EventsEnhanced.generateSql(config)}

pre_operations {
  ${ga4EventsEnhanced.setPreOperations(config)}
}

Configuration Object

All fields are optional except sourceTable. Default values are applied automatically, so you only need to specify the fields you want to override.

| Field | Type | Default/Required | Description | | ---------------------- | ----------------------- | ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | sourceTable | Dataform ref / object / string | required | Source GA4 export table. Inside an SQLX js { } block use ref(...). From a .js definition file use a { schema, name } ref object (resolved later via ctx.ref()) or a backtick-quoted `project.dataset.events_*` string for an external table. | | self | Dataform self() | required for .SQLX deployment | Reference to the table itself. Use self() in Dataform | | incremental | Dataform incremental() | required for .SQLX deployment | Switch between incremental and full refresh logic. Use incremental() in Dataform | | dataformTableConfig | object | In JS deployment only. See default | Override the default Dataform table configuration for JS deployment. See: ITableConfig reference | | schemaLock | string | undefined | Lock the table schema to a specific GA4 export table suffix. Accepts "YYYYMMDD" (daily), "intraday_YYYYMMDD", or "fresh_YYYYMMDD". Date must be >= "20241009" | | timezone | string | 'Etc/UTC' | IANA timezone for event datetime (e.g. 'Europe/Helsinki') | | customTimestampParam | string | undefined | Name of a custom event parameter containing a JS timestamp in milliseconds (e.g. collected via Date.now()) | | bufferDays | integer | 1 | Extra days to include for sessions that span midnight. Auto-adjusted when itemListAttribution.lookbackType is 'TIME' and the lookback exceeds bufferDays | | itemListAttribution | object | undefined | Enable item list attribution. See Item List Attribution | | test | boolean | false | Enable test mode (uses testConfig date range instead of pre-operations) | | excludedEventParams | string[] | [] | Event parameter names to exclude from the event_params array | | excludedEvents | string[] | ['session_start', 'first_visit'] | Event names to exclude from the table. These events are excluded by default because they have no use for analysis purposes. Override this to include them if needed | | excludedColumns | string[] | [] | Default GA4 export columns to exclude from the final table, for example 'app_info' or 'publisher' | | sessionParams | string[] | [] | Event parameter names to aggregate as session-level parameters | | includedExportTypes | object | See details | Which GA4 export types to include (daily, fresh, intraday) | | dataIsFinal | object | See details | How to determine whether data is final (not expected to change) | | testConfig | object | See details | Date range used when test is true | | preOperations | object | See details | Date range and incremental refresh configuration | | eventParamsToColumns | object[] | [] | Event parameters to promote to columns. See item schema | | customSteps | object[] | [] | User-defined CTEs appended to the pipeline after enhanced_events. See Custom CTEs | | enrichments | object[] | [] | Declarative external-data enrichments joined into enhanced_events. See Data Enrichments |

{
    "name": "ga4_events_enhanced_<dataset_id>",
    "type": "incremental",
    "schema": "<source_dataset>",
    "description": "<default description>",
    "bigquery": {
        "partitionBy": "event_date",
        "clusterBy": [
            "event_name",
            "session_id",
            "page_location",
            "data_is_final"
        ],
        "labels": {
            "ga4_export_fixer": "true"
        }
    },
    "onSchemaChange": "EXTEND",
    "tags": [
        "ga4_export_fixer"
    ]
}

The onSchemaChange: "EXTEND" setting updates the result table schema on incremental runs, adding columns for any new fields the query produces.

includedExportTypes — which GA4 export types to include:

| Field | Type | Default | Description | | ------------------------------ | ------- | ------- | -------------------------------- | | includedExportTypes.daily | boolean | true | Include daily (processed) export | | includedExportTypes.fresh | boolean | false | Include fresh (hourly-updated) export | | includedExportTypes.intraday | boolean | true | Include intraday export |

Export priority: daily > fresh > intraday. Each lower-priority export only provides data not already covered by a higher-priority one. All seven combinations of the three export types are supported.

When all three exports are enabled, the package:

  1. Gets all data from daily export tables
  2. Gets fresh export data for days not yet covered by a daily table
  3. Gets intraday export data for events after the latest fresh event timestamp

The boundary between fresh and intraday is timestamp-based because the fresh export is updated hourly, so within the same day some events come from the fresh export and the rest from intraday.

Without daily export: When daily is false, dataIsFinal.detectionMethod must be set to 'DAY_THRESHOLD', because EXPORT_TYPE detection relies on daily tables to mark data as final.

dataIsFinal — how to determine whether data is final (not expected to change):

| Field | Type | Default | Description | | ----------------------------- | ------- | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | dataIsFinal.detectionMethod | string | 'DAY_THRESHOLD' | 'DAY_THRESHOLD' (uses days since event; data older than dayThreshold is considered final) or 'EXPORT_TYPE' (uses table suffix; all data from the daily export is considered final). 'EXPORT_TYPE' is suitable for most web only properties as data is rarely received with a delay. Must be 'DAY_THRESHOLD' when daily export is not enabled | | dataIsFinal.dayThreshold | integer | 3 | Days after which data is considered final. According to GA4 documentation, data up to 72 hours old is subject to possible changes. Required when detectionMethod is 'DAY_THRESHOLD' |

testConfig — date range used when test is true:

| Field | Type | Default | Description | | --------------------------- | ----------------- | -------------------- | --------------------------- | | testConfig.dateRangeStart | string (SQL date) | 'current_date()-1' | Start date for test queries | | testConfig.dateRangeEnd | string (SQL date) | 'current_date()' | End date for test queries |

preOperations — date range and incremental refresh configuration:

| Field | Type | Default | Description | | ------------------------------------------ | ----------------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | preOperations.dateRangeStartFullRefresh | string (SQL date) | 'date(2000, 1, 1)' | Start date for full refresh | | preOperations.dateRangeEnd | string (SQL date) | 'current_date()' | End date for queries | | preOperations.numberOfPreviousDaysToScan | integer | 10 | Number of days to scan backwards from the result table's last partition when determining the incremental refresh start checkpoint. Needs to cover the number of days that can still contain not final (data_is_final = false) data | | preOperations.incrementalStartOverride | string (SQL date) | undefined | Override the incremental start date to re-process a specific range | | preOperations.incrementalEndOverride | string (SQL date) | undefined | Override the incremental end date to re-process a specific range | | preOperations.numberOfDaysToProcess | integer | undefined | Limit each run to N days of data. When set, the end date becomes start + N - 1 (capped at current_date()). When undefined, dateRangeEnd is used as-is. incrementalEndOverride takes priority |

Date fields (dateRangeStart, dateRangeEnd, etc.) accept string dates in YYYYMMDD or YYYY-MM-DD format, or BigQuery SQL expressions (e.g. 'current_date()', 'date(2026, 1, 1)').

eventParamsToColumns — each item in the array is an object:

| Field | Type | Required | Description | | ------------ | ------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------- | | name | string | Yes | Event parameter name | | type | string | No | Data type: 'string', 'int', 'int64', 'double', 'float', or 'float64'. If omitted, returns the value converted to a string | | columnName | string | No | Column name in the output. Defaults to the parameter name |

itemListAttribution — when set to an object, enables attribution of item_list_name, item_list_id, and item_list_index from select_item/select_promotion events to downstream ecommerce events (e.g. add_to_cart, purchase). Disabled by default.

| Field | Type | Required | Description | | ---------------- | ------- | --------------------------- | --------------------------------------------------------------------- | | lookbackType | string | Yes | 'SESSION' (partition by session) or 'TIME' (time-based window) | | lookbackTimeMs | integer | When lookbackType: 'TIME' | Lookback window in milliseconds (e.g. 86400000 for 24h) |

// Session-based: attribute within the same session
itemListAttribution: { lookbackType: 'SESSION' }

// Time-based: attribute within a 24-hour window across sessions
itemListAttribution: { lookbackType: 'TIME', lookbackTimeMs: 86400000 }

Note: This feature adds a compute-heavy CTE with a window function over unnested items. Only enable it if you need item list attribution for ecommerce analysis.

customSteps — append CTEs after enhanced_events. Each entry is either a raw {name, query} or a structured {name, select, from, ...}. The last entry becomes the table's final SELECT; earlier entries become CTEs.

Stable CTE names you can reference from your custom steps:

| Name | Always present? | Contents | | ------------------------ | ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | | event_data | yes | Extracted and shaped events from sourceTable, with date filtering and column promotions applied. Unfiltered for the buffer-days range. | | session_data | yes | Session-level aggregations (grouped by session_id). | | items_unnested | only when itemListAttribution is on | Per-event item rows (one row per item per ecommerce event), with attribution window function applied. | | items_rebuilt | only when itemListAttribution is on | Re-aggregated items with attributed list fields, joined back to events via _item_row_id. | | enrich_<name> | only when configured via enrichments | One CTE per enrichment entry, providing dim data for joining into enhanced_events. | | enhanced_events | yes | The package's standard output shape (joined event_data + session_data + items_rebuilt + enrich_*, columns ordered, incremental date filter applied). The natural starting point for most custom CTEs. |

Example custom step using the raw SQL format:

// Add a content_group column derived from page.path
customSteps: [
    {
        name: 'final',
        query: `select
  enhanced_events.*,
  case
    when page.path like '/blog/%' then 'blog'
    when page.path like '/products/%' then 'product'
    when page.path = '/' then 'home'
    else 'other'
  end as content_group
from enhanced_events`,
    },
],

The same example in the structured shape:

customSteps: [
    {
        name: 'final',
        select: {
            columns: {
                '[sql]passthrough': 'enhanced_events.*',
                content_group: `case
  when page.path like '/blog/%' then 'blog'
  when page.path like '/products/%' then 'product'
  when page.path = '/' then 'home'
  else 'other'
end`,
            },
        },
        from: 'enhanced_events',
    },
],

Note: Custom columns aren't auto-documented. Use dataformTableConfig.columns to add descriptions — it's deep-merged with the package's defaults, so your keys are added or override matching defaults, and untouched defaults stay.

Note: Built-in assertions assume the package's standard schema. If your custom CTEs rename, drop, or filter rows in ways that break those assumptions, disable the affected assertions explicitly via the assertions config option.

enrichments — declaratively join external dimension data into enhanced_events (cohort labels, page metadata, marketing attribution, etc.). Each entry describes one dim source plus the join — the package generates the source CTE, the LEFT JOIN, and column descriptions automatically.

For typical use cases this is the right tool; reach for customSteps only when you need a transformation that doesn't fit a flat dim join.

Per-enrichment shape:

| Field | Type | Required | Description | | --- | --- | --- | --- | | name | string | Yes | Used in the generated enrich_<name> CTE name. Unique within enrichments. | | level | 'row' / 'item' | No, defaults to 'row' | Join grain. 'row' joins external dim data onto each row of enhanced_events (any column on enhanced_events as the key). 'item' joins external dim data onto each item inside the items array (any field on the items struct or any event_data column as the key). | | source | Dataform ref / object / string | Yes | Source dim table. Inside an SQLX js { } block use ref(...). From a .js definition file use a { schema, name } ref object (resolved later via ctx.ref()) or a backtick-quoted `project.dataset.table` string for an external table. | | joinKey | string / string[] | Yes | For level: 'row': column name(s) on enhanced_events. For level: 'item': field name(s) on the items struct (e.g. 'item_id') or column name(s) on event_data (e.g. 'user_pseudo_id'). Composite keys (array) compile to USING(col1, col2, ...). | | columns | string[] | Yes | Source columns to add to the output (excluding joinKey). Names matching existing columns are coalesced with the original (coalesce(enrich.col, original)) so missed JOINs fall back to the existing value. | | dedupe | boolean | No, defaults to false | When true, wraps the source CTE in qualify row_number() over (partition by <joinKey>) = 1 for non-unique-key dim sources. Non-deterministic which row wins; for strict needs, pre-aggregate in source SQL. |

Coalesce-or-add semantics. If an enrichment column name matches an existing column on enhanced_events (a column promoted via eventParamsToColumns, a package-generated column, or a default GA4 column from the export), the enrichment value is coalesced with the original: coalesce(enrich_<name>.<col>, <original>) as <col>. Rows where the JOIN matches get the enrichment value; rows where it misses fall back to the existing value rather than going NULL. If there is no overlap, the column is added as a plain enrich_<name>.<col>.

Example — attach user cohort labels by user_pseudo_id (Dataform-declared table referenced by { schema, name }):

enrichments: [
    {
        name: 'cohorts',
        // level omitted → defaults to 'row'
        source: { schema: 'analytics', name: 'user_cohorts' },
        joinKey: 'user_pseudo_id',
        columns: ['cohort_label', 'lifecycle_stage'],
    },
],

Example — composite key (date + user) for daily-varying dim data, with dedupe safety net (external table referenced by backtick-FQN):

enrichments: [
    {
        name: 'segments',
        level: 'row',
        source: '`my-project.analytics.daily_user_segments`',
        joinKey: ['event_date', 'user_pseudo_id'],
        columns: ['segment'],
        dedupe: true,
    },
],

Example — fix a promoted event parameter via enrichment (coalesce case: enrichment value wins where the JOIN matches, original kept where it doesn't):

{
    eventParamsToColumns: [{ name: 'page_title', type: 'string' }],
    enrichments: [
        {
            name: 'titles',
            level: 'row',
            source: { schema: 'analytics', name: 'page_title_overrides' },
            joinKey: 'page_location',
            columns: ['page_title'],   // overlaps the promoted column → coalesce(enrich.page_title, event_data.page_title)
        },
    ],
}

Example — item-level enrichment: attach product master data to each item via item_id. The enrichment flows into the items array struct; margin_bucket is added as a new item-struct field, and item_category overlap-coalesces against the original. Item-level enrichment columns do NOT appear at the row grain — they live inside items[].<col>:

enrichments: [
    {
        name: 'products',
        level: 'item',
        source: { schema: 'analytics', name: 'product_master' },
        joinKey: 'item_id',                                  // joins on item.item_id
        columns: ['margin_bucket', 'item_category'],         // margin_bucket is additive; item_category overlap-coalesces
    },
],

For level: 'item', valid joinKey values are any field on the GA4 items struct (item_id, item_category, etc.) or any column on event_data (user_pseudo_id, event_date, etc.). A row-level and an item-level enrichment may share the same column name (e.g. both writing cohort) — the two columns target structurally distinct slots (enhanced_events.cohort at row grain vs items[].cohort inside the items array) and are not in collision.

Note: Each enrichment generates a CTE named enrich_<name> at the top of the pipeline. The enrich_* namespace is part of the reserved-names contract — customSteps cannot use these names. The active reserved set includes only the names of enrichments actually configured.

Note: Row-level enrichment columns get auto-generated descriptions (Added by enrichment '<name>' (joined on <joinKey> from <source>). for new columns; Coalesced by enrichment '<name>' (...; falls back to original on missed JOIN). Original: <description> for overlapping columns). User-supplied dataformTableConfig.columns overrides win — the auto-generated description is the default. Item-level enrichment columns do not receive auto-generated descriptions (BigQuery does not surface per-field descriptions on STRUCT-array fields cleanly through Dataform's column-description mechanism).

Note: joinKey and columns entries must be plain SQL identifiers — inline aliases like 'id as user_id' are rejected at validation time. If your dim source uses a different column name, alias it in an upstream Dataform view and point source at that view.


Assertions

The package includes built-in data quality assertions that can be automatically created alongside the enhanced events table. Pass Dataform's assert function as the third argument to createTable:

ga4EventsEnhanced.createTable(publish, config, { assert });

This creates the table along with the default-enabled assertions, using the same configuration:

| Assertion | Name | Enabled by default | Description | | --------- | ---- | ------------------ | ----------- | | dailyQuality | {tableName}_daily_quality | Yes | Compares session count, event count, item revenue, and ecommerce purchase revenue per day between the enhanced table and raw export. Also reconciles item_revenue at the (event_date, item_id) grain on purchase events for days both sides consider final. Detects missing days, count mismatches, and non-final data inflation |

The assertion inherits the table's schema and tags from dataformTableConfig and queries the last 5 days of data.

Selective Assertions

Disable the assertion by setting it to false:

ga4EventsEnhanced.createTable(publish, config, {
    assert,
    assertions: { dailyQuality: false },
});

Assertion Config Overrides

Override the assertion's Dataform configuration (name, schema, tags):

ga4EventsEnhanced.createTable(publish, config, {
    assert,
    assertions: {
        dailyQuality: { tags: ['data_quality', 'ga4_export_fixer'] },
    },
});

Standalone Assertions (SQLX Deployment)

For SQLX deployments or when you need full control, assertions can also be used as standalone SQL generators:

const { ga4EventsEnhanced } = require('ga4-export-fixer');

assert('daily_quality_check', {
    schema: 'analytics_123456789',
    tags: ['ga4_export_fixer'],
}).query(ctx => {
    return ga4EventsEnhanced.assertions.dailyQuality(
        ctx.ref('ga4_events_enhanced_123456789'),
        { ...config, sourceTable: ctx.ref(config.sourceTable) }
    );
});

Creating Incremental Downstream Tables from ga4_events_enhanced

Setting up incremental updates is easy using the setPreOperations() function. Just ensure that your result table includes the data_is_final flag from the ga4_events_enhanced table.

The incrementalDateFilter() function applies the same date filtering used by ga4_events_enhanced, based on the config options and the variables declared by setPreOperations().

Key fields such as session_id, user_id, and session_traffic_source_last_click are available as clean, sessionized versions that handle edge cases like sessions spanning midnight.

definitions/ga4/ga4_sessions.sqlx

config {
  type: "incremental",
  description: "GA4 sessions table",
  schema: "ga4_export_fixer",
  bigquery: {
    partitionBy: "event_date",
    clusterBy: ['session_id', 'data_is_final'],
  },
  tags: ['ga4_export_fixer']
}

js {
  const { setPreOperations, helpers } = require('ga4-export-fixer');

  const config = {
    self: self(),
    incremental: incremental(),
    /*
    Default options that can be overriden:
    test: false,
    testConfig: {
        dateRangeStart: 'current_date()-1',
        dateRangeEnd: 'current_date()',
    },
    preOperations: {
        dateRangeStartFullRefresh: 'date(2000, 1, 1)',
        dateRangeEnd: 'current_date()',
        // incremental date range overrides allow re-processing only a subset of the data:
        //incrementalStartOverride: undefined,
        //incrementalEndOverride: undefined,
    },
    */
  };
}

select
  event_date,
  session_id,
  user_pseudo_id,
  user_id,
  any_value(session_traffic_source_last_click.cross_channel_campaign) as session_traffic_source,
  any_value(landing_page) as landing_page,
  current_datetime() as row_inserted_timestamp,
  min(data_is_final) as data_is_final
from
  ${ref('ga4_events_enhanced_298233330')}
where
  ${helpers.incrementalDateFilter(config)}
group by 
  event_date,
  session_id,
  user_pseudo_id,
  user_id

pre_operations {
  ${setPreOperations(config)}
}

Helpers

The helpers contain templates for common SQL expressions. The functions are referenced by ga4EventsEnhanced but can also be imported as utility functions for working with GA4 data.

const { helpers } = require('ga4-export-fixer');

SQL Templates

| Name | Example | Description | | ----------- | ------------------- | ------------------------------------------------------------------------- | | eventDate | helpers.eventDate | Casts event_date string to a DATE using YYYYMMDD format | | sessionId | helpers.sessionId | Builds a session ID by concatenating user_pseudo_id and ga_session_id |

Functions

Unnesting parameters

| Function | Example | Description | | ------------------ | --------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | unnestEventParam | unnestEventParam('page_location', 'string') | Extracts a value from the event_params array by key. Supported types: 'string', 'int', 'int64', 'double', 'float', 'float64'. Omit type to get the value converted as a string |

Date and time

| Function | Example | Description | | ------------------------- | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- | | getEventTimestampMicros | getEventTimestampMicros('custom_ts') | Returns SQL for event timestamp in microseconds. With a custom parameter, uses it (converted from ms) with fallback to event_timestamp | | getEventDateTime | getEventDateTime({ timezone: 'Europe/Helsinki' }) | Returns SQL for event datetime in the given timezone. Defaults to 'Etc/UTC' |

Date filters

| Function | Example | Description | | --------------------- | -------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | | ga4ExportDateFilter | ga4ExportDateFilter('daily', 'current_date()-7', 'current_date()') | Generates a _table_suffix filter for a single export type ('daily' or 'intraday') and date range |

Page details

| Function | Example | Description | | ----------------------- | ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | | extractUrlHostname | extractUrlHostname('page_location') | Extracts hostname from a URL column | | extractUrlPath | extractUrlPath('page_location') | Extracts the path component from a URL column | | extractUrlQuery | extractUrlQuery('page_location') | Extracts the query string (including ?) from a URL column | | extractUrlQueryParams | extractUrlQueryParams('page_location') | Parses URL query parameters into ARRAY<STRUCT<key STRING, value STRING>> | | extractPageDetails | extractPageDetails() | Returns a struct with hostname, path, query, and query_params. Defaults to page_location event parameter |

Aggregation

| Function | Example | Description | | ---------------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------- | | aggregateValue | aggregateValue('user_id', 'last', 'event_timestamp') | Aggregates a column using 'max', 'min', 'first', 'last', or 'any'. 'first' and 'last' use the timestamp column for ordering |

Ecommerce

| Function | Example | Description | | -------------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | | fixEcommerceStruct | fixEcommerceStruct() | Cleans the ecommerce struct: sets transaction_id to null when '(not set)', and fixes missing/NaN purchase_revenue for purchase events |

Data freshness

| Function | Example | Description | | ------------- | --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | isFinalData | isFinalData('DAY_THRESHOLD', 3) | Returns SQL that evaluates to true when data is final. 'DAY_THRESHOLD' uses days since event (dayThreshold is required and must be a non-negative integer); 'EXPORT_TYPE' checks table suffix |

License

MIT