time-grain-detector

v1.1.0

Published

3 years ago

Infer time grain from a list of dates

Downloads

0High
0Medium
0Low

paulrosenzweig

date time timestamp timeunit grain

Time Grain Detector

What it does

This Javascript library takes a list of timestamps and tells you the grain.

For example, ["2020-01-01", "2021-01-01", "2022-01-01"] all fall at the start of the year. While ["2021-11-10T09:00", "2021-11-10T10:00", "2021-11-10T11:00"] are on hour boundaries.

This should work if there are gaps in the data or if the timestamps are in a non-GMT timezone.

Why it's useful

If you have data that was already grouped along some time unit, this will let you guess what it was.

As a concrete example, suppose you run this SQL query and pass the results elsewhere for visualization.

select
  date_trunc('month', date) as date,
  sum(revenue) as revenue
from revenue
group by 1

The results from that query might look like this.

[
    {"date": "1999-10-01", "revenue": 100},
    {"date": "1999-11-01", "revenue": 200},
    {"date": "1999-12-01", "revenue": 300},
]

When you go to visualize the data, you might not know that 'month' was used in the SQL query. If you want to draw a bar chart of this data with a continuous temporal axis, you'll need to know that the bars are one month wide.

Incorrectly thinking it's daily data would lead to really skinny bars that are a month apart from eachother.

Knowing the grain is also helpful for imputing missing data and generating reasonable ticks/labels.

How to use it

The library's default export is a function, detectTimeGrain. It takes a list of Javascript date objects or ISO 8601 strings.

detectTimeGrain(["2000-01-01", "2000-01-02"])

detectTimeGrain([new Date(2000, 0, 1), new Date(2000, 0, 2)])

The return value is an object with two keys unit and count. The call above would return this object:

{ unit: 'day', count: 1 }

The count property is usually 1, but sometimes a multiple of the base unit is a better fit:

detectTimeGrain(["2000-01-01", "2010-01-01", "2020-01-01"])
// => { unit: 'year', count: 10 }

Algorithm

We want to pick the biggest unit where all the values fall along boundaries.

The basic algorithm checks for alignment along increasingly large grains. As soon as some grain fails, we know the previous grain was the largest.

The main iteration is almost a one-liner. Most of the logic lives in these "alignment" functions defined for each grain.

There are a few complications to this discussed below.

Timezones

Timezones complicate this algorithm by making it hard to see when dates fall on some boundary.

The timestamp 2000-06-11T12:00:00Z looks like it doesn't fall on the boundary between days, but it does in Auckland, New Zealand.

This library tries to be generous. If a series of timestamps all could fall on a boundary in some timezone, it considers that boundary a valid grain. We can't do this perfectly without a database of timezones, but we can be slightly more permissive than reality and still perform well on real data.

e.g. when checking for date alignment: ["2000-01-01T23:00:00Z", "2000-01-02T23:00:00Z", "2000-01-03T23:00:00Z"] ✅ These work somewhere. ["2000-01-01T23:00:00Z", "2000-01-02T22:00:00Z", "2000-01-03T21:00:00Z"] ❌ These don't.

Implementation Note

Javascript's Date class is limited to work in UTC or the local time. For example, there's no ability to get the time of day in US/Pacific unless you happen to be running in that timezone. This library should function the same in any system timezone, so the code only uses the UTC methods.

Once Temporal is widely available, I'd like to update this so it can optionally incorporate explicit timezone information.

Daylight savings

Many timezones around the world shift their UTC offset twice a year. This compounds the complexities of timezone offsets, since multiple offsets are valid in the same dataset.

Weeks

If a series of timestamps fall on year intervals, they also are aligned on months, days, hours, etc. This pattern lets us check grains and find the largest acceptable.

Weeks break this pattern. Dates that are aligned on month or year boundaries likely fall on different days of the week. The code has a notion of "skippable" grains to account for weeks.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme