dbt Macros: Group By
One of the benefits of using dbt is the macros. A dbt macro is a piece of Jinja code that can be reused multiple times. They can be a time saver by using Python code to reduce tedium. Listing a bunch of numbers in a group by statement can be very tedious, especially if there are a lot of columns that need to be included. dbt Labs provides a group by macro in their dbt-utils package.
{{ group_by(10) }}
compiles to
GROUP BY 1,2,3,4,5,6,7,8,9,10
However, there are cases where you may not want to have all of the aggregates in the last columns of your table. For example, there might be a table with 18 columns, but the aggregated columns are in the 11th, 12th, 13th, and 14th columns. Since the group by macro is a code replacement, it can be used for the first 10 columns while the 15th through 18th columns are added manually.
{{ group_by(10) }},15,16,17,18
compiles to
GROUP BY 1,2,3,4,5,6,7,8,9,10,15,16,17,18
What about those cases where there is an aggregated column in the middle of everything? The default group by macro can help for the columns before the aggregated column. But after that, you are on your own! Seeing some potential for improvement, I made a small modification to the existing macro which allows for specified columns to be removed. Below is the code for the adjusted group by macro.
{%- macro group_by(num, except) -%}
{#- Check if except is a list and convert to a list if needed -#}
{%- if except is not iterable -%}
{%- set except = [except] -%}
{%- endif -%}
{#- Generate group by with expected numbers -#}
GROUP BY {% for i in range(1, num + 1) -%}
{%- if i not in except -%}
{{ i }}{{ ',' if not loop.last }}
{%- endif -%}
{%- endfor -%}
{%- endmacro -%}
This simple addition to the existing dbt macro allows for more flexibility. Being able to code away some of the tedious parts of model building lets you get back to fun and exciting parts sooner.