DataformLogging Library

A utility library for structured logging of Dataform tasks, designed to help track, audit, and debug Dataform pipeline executions. It provides automatic logging of task metadata, performance metrics, and dependency management for robust pipeline observability.


Features

  • Structured Logging: Automatically inserts and updates log records for each Dataform task.
  • Performance Metrics: Captures start time, end time, duration, row count, and more.
  • Dependency Management: Ensures log records are written in the correct order, respecting task dependencies.
  • Flexible API: Supports logging for single tasks or arrays of tasks.
  • Extensible: Easily add custom fields or logic for advanced logging needs.

Installation

Copy the following files into your Dataform project:

  • includes/logging/dataform_logger.js
  • includes/logging/sql.js
  • includes/logging/util.js

Ensure your project’s constants file defines the logging table location:

// Example in includes/constants.js
module.exports = {
  logging_project_id: "your_project",
  logging_schema: "your_schema",
  logging_table: "your_logging_table",
  default_location: "us-central1"
};

Logging Table Schema

Create a logging table in your warehouse (e.g., BigQuery):

CREATE OR REPLACE TABLE `your_project.your_schema.your_logging_table` (
  id STRING,
  job_id STRING,
  run_id STRING,
  task_index INT64,
  task_name STRING,
  task_description STRING,
  action STRING,
  start_time TIMESTAMP,
  end_time TIMESTAMP,
  duration_seconds FLOAT64,
  row_count INT64,
  comments STRING,
  created_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
);

Usage

1. Import and Instantiate

const { DataformLogger } = require("./includes/logging/dataform_logger.js");

const logger = new DataformLogger({
  project: "your_project",
  schema: "your_schema",
  table: "your_logging_table",
  location: "us-central1",
  run_id: "your_run_id"
});

2. Log Tasks

Wrap your Dataform tasks with logging:

logger.logTask({
  name: "supplier_table_creation",
  description: "Creates supplier table from source",
  comments: "Initial supplier load",
  task: supplierTableTask, // Dataform task object
  config: { countTarget: true } // Optional: count rows in target table
});

You can also log multiple tasks at once:

logger.logTask({
  name: "batch_processing",
  description: "Processes batches",
  comments: "Batch step",
  task: [task1, task2, task3], // Array of Dataform task objects
  config: {}
});

Creating Dataform Functions for Logging

To use DataformLogger, your Dataform functions must return a Dataform task object or an array of such objects. A Dataform task object is typically the result of calling publish, declare, or operation in your JS definitions.

Example: Function Returning a Single Task

// definitions/supplier_table.js
function createSupplierTable(config) {
  return publish(config.table_name, {
    schema: config.schema,
    type: "table",
    query: `SELECT * FROM ${config.source_table}`
  });
}

// Usage with logger:
const supplierTableTask = createSupplierTable({
  table_name: "supplier_table",
  schema: "staging",
  source_table: "raw_supplier"
});

logger.logTask({
  name: "supplier_table_creation",
  description: "Creates supplier table from source",
  comments: "Initial supplier load",
  task: supplierTableTask,
  config: { countTarget: true }
});

Example: Function Returning an Array of Tasks

// definitions/batch_tables.js
function createBatchTables(batchConfigs) {
  return batchConfigs.map(cfg =>
    publish(cfg.table_name, {
      schema: cfg.schema,
      type: "table",
      query: `SELECT * FROM ${cfg.source_table} WHERE batch_id = ${cfg.batch_id}`
    })
  );
}

// Usage with logger:
const batchTasks = createBatchTables([
  { table_name: "batch_table_1", schema: "staging", source_table: "raw_data", batch_id: 1 },
  { table_name: "batch_table_2", schema: "staging", source_table: "raw_data", batch_id: 2 }
]);

logger.logTask({
  name: "batch_processing",
  description: "Processes batches",
  comments: "Batch step",
  task: batchTasks,
  config: { countTarget: true }
});

Important Notes

  • The returned object(s) must be the result of a Dataform action (publish, declare, or operation), not just a SQL string or config object.
  • If you want to log multiple tasks, return an array of Dataform task objects.
  • The logger will automatically handle both single and array cases; a sketch of this normalization follows the summary below.

Summary:

  • Your function should return a Dataform task object (for a single task) or an array of such objects (for multiple tasks).
  • Pass the returned value to logger.logTask as the task parameter.
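
For reference, here is a minimal sketch of how that single-vs-array handling might look internally. This is an illustration under assumed internals, not the library's actual code:

// Hypothetical normalization step inside logTask (illustrative only)
function normalizeTasks(task) {
  // Accept either a single Dataform task object or an array of them
  return Array.isArray(task) ? task : [task];
}

// Both calls resolve to the same internal shape:
// normalizeTasks(supplierTableTask)      -> [supplierTableTask]
// normalizeTasks([task1, task2, task3])  -> [task1, task2, task3]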

API Reference

DataformLogger

Constructor

new DataformLogger({ project, schema, table, location, run_id })
  • project (string): Project ID for logging table.
  • schema (string): Schema (dataset) for logging table.
  • table (string): Logging table name.
  • location (string): Warehouse location/region.
  • run_id (string): Unique identifier for the pipeline run.

logTask

logTask({ name, description, comments, task, config })
  • name (string): Task name.
  • description (string): Task description.
  • comments (string): Additional comments.
  • task (object|array): Dataform task object(s).
  • config (object): Additional config for logging (see below).

Config Options

  • countTarget (boolean): If true, logs row count of the target table.
  • customSQL (object): Custom SQL for additional fields.
  • sql (string): Custom SQL for row count or other metrics.
  • getJobID (boolean): If true, logs the warehouse job ID (a combined example follows this list).
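
Putting these options together, a combined call might look like the following. The exact shape of customSQL is an assumption based on the list above, and ordersTableTask is a hypothetical Dataform task object:

logger.logTask({
  name: "orders_table_creation",
  description: "Creates orders table from source",
  comments: "Nightly load",
  task: ordersTableTask, // hypothetical Dataform task object
  config: {
    countTarget: true, // log the row count of the target table
    getJobID: true,    // log the warehouse job ID
    customSQL: {
      // Assumed shape: SQL used for row count or other metrics
      sql: "SELECT COUNT(*) FROM `your_project.staging.orders`"
    }
  }
});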

How It Works

  1. Insert Log Record:
    Before each task runs, a log record is inserted with metadata (name, description, start time, etc.).

  2. Update Log Record:
    After the task completes, the log record is updated with end time, duration, row count, and other metrics (both steps are sketched in SQL below).

  3. Dependency Chaining:
    Each log record is chained to the previous one, ensuring correct execution order and traceability.
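
As a rough illustration of the insert-then-update pattern, the generated statements might resemble the SQL below. The real statements live in sql.js and may differ; column names follow the logging table schema above:

-- Step 1: inserted before the task runs
INSERT INTO `your_project.your_schema.your_logging_table`
  (id, run_id, task_index, task_name, task_description, action, start_time)
VALUES
  (GENERATE_UUID(), 'your_run_id', 1, 'supplier_table_creation',
   'Creates supplier table from source', 'publish', CURRENT_TIMESTAMP());

-- Step 2: run after the task completes, filling in the metrics
UPDATE `your_project.your_schema.your_logging_table`
SET
  end_time = CURRENT_TIMESTAMP(),
  duration_seconds = TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), start_time, MILLISECOND) / 1000,
  row_count = 12345 -- populated via countTarget or customSQL
WHERE run_id = 'your_run_id'
  AND task_name = 'supplier_table_creation';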


Extending

  • Add new fields to the logging table and update sql.js to support them (see the example below).
  • Customize logging logic in dataform_logger.js for advanced use cases.
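
For example, adding a hypothetical triggered_by field would mean extending the table and the statements that write to it (the column name here is illustrative):

-- Add the new column to the logging table
ALTER TABLE `your_project.your_schema.your_logging_table`
  ADD COLUMN triggered_by STRING;

The INSERT and UPDATE statements in sql.js would then need to populate the new column.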

Consuming Logged Data

The logging table provides a rich source of metadata and performance metrics for your Dataform pipeline. You can use standard SQL queries to analyze task execution, monitor performance, and audit pipeline runs.

Example: Basic Query

SELECT
  task_name,
  start_time,
  end_time,
  duration_seconds,
  row_count,
  comments
FROM `your_project.your_schema.your_logging_table`
ORDER BY start_time DESC

Enriching Log Data with BigQuery Job Metadata

If you log the job_id for each Dataform task (using the getJobID config option), you can join your logging table to BigQuery's INFORMATION_SCHEMA.JOBS to obtain additional details about each warehouse job, such as slot usage, bytes processed, and job status.

Example: Join with INFORMATION_SCHEMA.JOBS

SELECT
  log.task_name,
  log.start_time,
  log.end_time,
  log.duration_seconds,
  log.row_count,
  log.comments,
  jobs.user_email,
  jobs.query,
  jobs.total_bytes_processed,
  jobs.total_slot_ms,
  jobs.state,
  jobs.creation_time,
  jobs.end_time AS job_end_time
FROM `your_project.your_schema.your_logging_table` AS log
LEFT JOIN `your_project.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` AS jobs
  ON log.job_id = jobs.job_id
WHERE log.start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY log.start_time DESC

Notes:

  • Replace region-us with your BigQuery region.
  • The join allows you to see who ran the job, how much data was processed, how long it took, and the SQL text for debugging or auditing.

Use Cases

  • Performance Monitoring: Track which tasks consume the most resources or take the longest (see the example query after this list).
  • Auditing: See who triggered each job and what SQL was executed.
  • Debugging: Investigate failed or slow jobs by correlating log records with warehouse job metadata.
  • Cost Analysis: Analyze slot usage and bytes processed for cost optimization.
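
For instance, the performance-monitoring case can be served by aggregating over the logging table directly; this query surfaces the slowest tasks of the last week:

SELECT
  task_name,
  COUNT(*) AS runs,
  AVG(duration_seconds) AS avg_duration_seconds,
  MAX(duration_seconds) AS max_duration_seconds,
  SUM(row_count) AS total_rows
FROM `your_project.your_schema.your_logging_table`
WHERE start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY task_name
ORDER BY avg_duration_seconds DESC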

Tip:
You can further join to other INFORMATION_SCHEMA tables (e.g., QUERY_HISTORY, TABLES) for deeper insights into your pipeline and warehouse.


Troubleshooting

  • Table Not Created: Ensure the logging table exists before running your pipeline.
  • Missing Metrics: Check your config options and logging table schema; a diagnostic query follows this list.
  • Dependency Issues: Make sure all Dataform tasks are properly wrapped with logTask.
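
When chasing missing metrics, one quick diagnostic is to look for log records that were inserted but never updated; end_time stays NULL when the update step did not run:

SELECT
  task_name,
  run_id,
  start_time
FROM `your_project.your_schema.your_logging_table`
WHERE end_time IS NULL
ORDER BY start_time DESC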

License

MIT License


Authors

  • DataformLogger originally by your team.
  • Contributions welcome!

For questions or issues, open an issue in your project repository.