An Introduction To Data Scaffolding

Sometimes, we’re interested in surfacing data on something that doesn’t happen. For example, that no units of product A were sold yesterday or that no new customer accounts were created last Thursday. This can be challenging, because most databases capture what happens, and not necessarily what doesn’t happen.

Take customer signups for example. A database might contain a customers table like the following. In it, we have some basic customer information – name, email address, and registration (signup) date.

This data, when aggregated, might look something like this:

Notice that we have values missing for certain dates, 3/4/2023 and 3/7/2023. On those days, no new customers signed up.

That’s useful information that we want to be able to see. But without any manipulation, we only see the below, that is, the days where we did have customer signups.

Then, we can use all_dates as a base and LEFT JOIN onto our aggregated customers data.

We’re taking care to select the date value from all_dates instead of aggregated_customer_data so we don’t get NULL registration_date values. And we’re using COALESCE() to replace NULL values for dates with no data with zeroes.

Then, the data when aggregated might look something like this:

Data Scaffolding with dates is a common use case, but there are other use cases too. Returning to the very first example we mentioned, handling when no units of Product A were sold, your data scaffold in this case would be a list of all the products. Hopefully you already have access to that in a products table in the database. In that case, use the products table as your base and then LEFT JOIN onto sales.

If you have questions on data scaffolding or other data problems, don’t be afraid to contact us!

Previous
Previous

Strategies for Improving Data Quality