Power BI Dataflow and Power Query Housekeeping

Data Organization

I have been working a lot lately with Power BI Dataflows. Dataflows were quite easy to pick up considering data prep is handled using Power Query. This version of Power Query is similar to Power Query in Power BI Desktop with only a few minor differences. As I worked through my queries, I noticed that over the years I have developed a routine or a standard set of steps that most of my queries follow. Of course, there are always exceptions and queries that require me to do things differently, but here are some general guidelines I follow to make sure my queries are efficient and easy to understand.

I make modifications as close to the source as possible. This prevents potentially duplicating efforts later in my data prep and (sometimes) allows me take advantage of query folding. On the topic of duplicating efforts, I always consider whether an existing query can be Duplicated or Referenced in Power Query before creating a new one. Performing data prep on the same data in different queries can produce different results if not careful.

I eliminate multiple steps that perform the same operation/function. For example, I do not have multiple “Renamed Columns” or “Remove Columns” steps. This makes my Applied Steps easier to understand and helps me to avoid errors and confusion about what is happening in each step. It can be very frustrating to search in multiple steps to find where a column name was misspelled.

On average, each of my queries only has 10-12 steps. Actually, my first seven steps are typically the same or very similar. After connecting to my source, I normally apply filters (if applicable), remove invalid columns (type table, record, list, nullable binary, binary or function), remove unnecessary columns, change/correct data types and finally rename columns.

I always make sure to provide a descriptive name to every Applied Step. Power Query does a fairly good job of naming steps. However, the names can get repetitive if the same function is used multiple times. Descriptive names help when modifying existing queries.

When changing column names, I always make the changes in the Power Query Editor instead of directly in Power BI Desktop. This prevents potential extra “Renamed Column” steps from being added to my queries.

For each of my queries, I always consider whether “Enable Load” and “Include in Report Refresh” should be checked/enabled. If a query is only used as an intermediate step for another query, there is typically no need to “Enable Load”. If I know queried data is not going to change, I will uncheck “Include in Report Refresh” to avoid pulling it again for no reason. This can help to speed up refresh times.

Finally, I like to organize my queries into logical Groups/Folders and sort them alphabetically to make them easier to find and work with.

Hopefully, you find this useful or my guidelines at least get you thinking about what your typical Power Query set of Applied Steps looks like. It is always interesting to see how others work and I am always looking for better ways to get the job done.

Leave a Reply

Your email address will not be published. Required fields are marked *