Skip to main content

Why I Stopped Worrying and Learned to Love Denormalized Tables

glean.io

I quickly learned that writing one giant query with a bunch of joins or even bunch of Python helper functions could get me stuck. My transformation functions weren’t flexible enough, or my joins were too complicated to answer the endless variety of questions thrown my way while keeping the numbers correct.

Instead, the easiest way to be fast, nimble, and answer all the unexpected questions was to prepare a giant table or dataframe and limit myself to it. As long as I understood the table’s contents, it was harder to make mistakes. I could group by and aggregate on the fly with confidence.