Power BI Dataflow and Power Query Housekeeping

Data Organization

I have been working a lot lately with Power BI Dataflows. Dataflows were quite easy to pick up considering data prep is handled using Power Query. This version of Power Query is similar to Power Query in Power BI Desktop with only a few minor differences. As I worked through my queries, I noticed that over the years I have developed a routine or a standard set of steps that most of my queries follow. Of course, there are always exceptions and queries that require me to do things differently, but here are some general guidelines I follow to make sure my queries are efficient and easy to understand.

I make modifications as close to the source as possible. This prevents potentially duplicating efforts later in my data prep and (sometimes) allows me to take advantage of query folding. On the topic of duplicating efforts, I always consider whether an existing query can be Duplicated or Referenced in Power Query before creating a new one. Performing data prep on the same data in different queries can produce inconsistent results if you are not careful.

I eliminate multiple steps that perform the same operation/function. For example, I do not have multiple “Renamed Columns” or “Removed Columns” steps. This makes my Applied Steps easier to understand and helps me avoid errors and confusion about what is happening in each step. It can be very frustrating to search through multiple steps to find where a column name was misspelled.

On average, each of my queries has only 10-12 steps. In fact, my first several steps are typically the same or very similar. After connecting to my source, I normally apply filters (if applicable), remove invalid columns (type table, record, list, nullable binary, binary or function), remove unnecessary columns, change/correct data types and finally rename columns.
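As a rough illustration, those typical opening steps might look like this in Power Query M (the server, database, table and column names below are placeholders, not from any real query):

```powerquery
let
    // Connect to the source (server/database names are hypothetical)
    Source = Sql.Database("MyServer", "MyDatabase"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Apply filters as close to the source as possible
    #"Filtered Rows" = Table.SelectRows(Sales, each [OrderDate] >= #date(2020, 1, 1)),
    // Remove invalid and unnecessary columns in a single consolidated step
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows", {"InternalNotes", "RowVersion"}),
    // Change/correct data types once
    #"Changed Types" = Table.TransformColumnTypes(#"Removed Columns",
        {{"OrderDate", type date}, {"Amount", Currency.Type}}),
    // Rename columns in one step, not several
    #"Renamed Columns" = Table.RenameColumns(#"Changed Types", {{"Amount", "Sales Amount"}})
in
    #"Renamed Columns"
```

Keeping each operation to a single step, as above, is what makes the Applied Steps pane easy to scan later.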

I always make sure to provide a descriptive name to every Applied Step. Power Query does a fairly good job of naming steps. However, the names can get repetitive if the same function is used multiple times. Descriptive names help when modifying existing queries.

When changing column names, I always make the changes in the Power Query Editor instead of directly in Power BI Desktop. This prevents potential extra “Renamed Column” steps from being added to my queries.

For each of my queries, I always consider whether “Enable Load” and “Include in Report Refresh” should be checked/enabled. If a query is only used as an intermediate step for another query, there is typically no need to “Enable Load”. If I know queried data is not going to change, I will uncheck “Include in Report Refresh” to avoid pulling it again for no reason. This can help to speed up refresh times.

Finally, I like to organize my queries into logical Groups/Folders and sort them alphabetically to make them easier to find and work with.

Hopefully, you find this useful or my guidelines at least get you thinking about what your typical Power Query set of Applied Steps looks like. It is always interesting to see how others work and I am always looking for better ways to get the job done.

Use a Stored Procedure as your Source for the Copy Data Activity in Azure Data Factory

Azure Data Factory is Microsoft’s cloud-based data integration service, positioned as the new and modern tool for ETL / ELT. From ADF you can connect to over 80 different types of data sources, including Microsoft Azure services, Amazon Redshift or S3, Google Cloud Storage and applications such as Dynamics 365, Salesforce and SAP Cloud. You can also connect to your on-premises data using a self-hosted Azure Data Factory Integration Runtime. This opens access to files within your network, on-premises databases such as SQL Server, or even Microsoft Access databases/files.

Once you’ve established connections to your data using the Integration Runtimes and the ADF Linked Services, you will use Pipelines to move data and perform your transformations. Azure Data Factory offers a wide range of Activities that can be used within a Pipeline. These include activities to move & transform data such as Copy Data and Data Flow. There are also a wide range of other activity types including Azure Function, Batch Service, Databricks, Data Lake Analytics / U-SQL, HDInsight and Machine Learning using Azure. Whether you are dealing with a small number of files, a transactional database or a large data lake, Azure Data Factory can handle it.

One of the common activities within an Azure Data Factory Pipeline is Copy Data. The Copy Data activity can be used to copy data between a Source and a Sink (destination). This can be from your on-premises database to your Azure SQL Data Warehouse (Azure Synapse Analytics), from flat files to your Azure Data Lake Storage, or between any number of combinations within ADF’s 80+ available data sources. Each of Azure Data Factory’s different activity types could warrant an entire book of tips and tricks. As with most technologies, their implementation can be as simple or complex as you need. Azure Data Factory is advertised as “Code-Free ETL as a Service”!

This blog post focuses specifically on how to use an Azure SQL / SQL Server Stored Procedure as your Source in a Copy Data activity. I wanted to write about this because I find it very confusing that you must choose a table as your Source Dataset before you can see the “Use Query” option in the Copy Data activity.
As an example, consider a new ADF Pipeline with a new Copy Data activity added to it.

As you will notice, the “Source dataset” dropdown only allows you to choose an existing Dataset or choose from a list of Tables in your Azure SQL / SQL Server database. What if you want to use a SQL query or a Stored Procedure as your data source? Unfortunately, you must first choose an existing Table before you can see the “Use query” radio button. (I am embarrassed to say how long it took me to realize this.) Once you select a Table, you are able to choose from Table, Query or Stored Procedure. Although it doesn’t indicate this, if you use Query or Stored Procedure, your query or procedure does not have to reference the Table that was selected and is still showing in the “Source dataset” dropdown.
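Once “Stored procedure” is selected, the Copy activity’s source settings end up looking roughly like the fragment below in the pipeline JSON. This is a sketch for an Azure SQL Database source; the procedure and parameter names are hypothetical placeholders:

```json
"source": {
    "type": "AzureSqlSource",
    "sqlReaderStoredProcedureName": "[dbo].[usp_GetSalesData]",
    "storedProcedureParameters": {
        "StartDate": { "type": "DateTime", "value": "2020-01-01" }
    }
}
```

For an on-premises SQL Server dataset the source type would be "SqlSource" instead, but the stored procedure properties work the same way.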

I am very curious about this setup and feel it would make more sense to move the “Use query” radio button before the “Source dataset” dropdown. Of course, you’d have to also choose a Connection, but it would still be more user friendly than it is now. Hopefully, you find this post helpful if you are just getting started with Azure Data Factory. Feel free to reach out if you have any questions or need assistance with a project you are working on!

Pause Power BI Embedded with Azure Functions

Power BI Embedded is great for embedding Power BI content into custom applications, websites and portals via an embedded iframe. With its ability to scale up, scale down and even be paused, it is also an excellent way to test out Power BI Premium functionality. If you are in the process of setting up Power BI Premium capacity but are not quite ready to commit to the Premium capacity P SKUs (P1-P3), Power BI Embedded is a great alternative for your development and testing.

Currently, Power BI Embedded offers 6 different nodes/sizes. The cores, memory and infrastructure can vary based on your needs. An A4 Embedded SKU is the equivalent of a P1 Premium SKU and is the minimum required to use Power BI Paginated Reports, but the smaller SKUs (A1-A3) can be used to test other Power BI Premium capabilities. Also, Power BI Embedded can be scaled up or down easily via the Azure portal and can even be paused when you are not using it. This is great for adjusting the performance to suit your needs or even pausing it during off hours to reduce costs.

As with most services available in Azure, there are multiple ways for interacting with your Power BI Embedded service. There is a REST API that provides the ability to create, retrieve, update and delete Power BI dedicated capacities. There are also Power BI Embedded PowerShell Cmdlets that allow you to interact with your Power BI Embedded Capacity and related Workspaces.
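For example, the REST API exposes suspend and resume operations on a dedicated capacity as simple management-plane calls. A sketch of the suspend request is below; the subscription, resource group and capacity names are placeholders, and you should confirm the current api-version in the REST API reference:

```
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.PowerBIDedicated/capacities/{capacityName}/suspend?api-version=2021-01-01
Authorization: Bearer <token>
```

The matching resume operation uses the same URL with /resume in place of /suspend.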

In my demo below, I will be using the PowerShell Cmdlets to pause Power BI Embedded using an Azure Function. However, this is only one of the many ways you can use the Power BI Embedded REST API and PowerShell Cmdlets. I chose Azure Functions due to their flexibility in supporting multiple languages as well as the ease of integrating with other Azure services. Also, using the Azure Functions Consumption plan, you get 1 million requests and up to 400,000 GB-s free every month. My function will only run once a day and will not use many resources, so it will run free in my Azure subscription.

If you do not have Power BI Embedded set up, you will need to do that as your first step.

After setting up my Power BI Embedded service, my next step was to create an Azure Function. Since I will be using PowerShell Cmdlets to interact with Power BI Embedded, I chose PowerShell Core as my Runtime Stack.

As discussed previously, I will be using the Consumption plan to take advantage of the free requests/resources. Also, be sure to turn on Application Insights so you can monitor the Azure Function.

Once the Azure Function has been created, the next step is to turn on the System Assigned Managed Identity for the Azure Function. This will allow the function to control Power BI Embedded.

With the Azure Function set up with a system-assigned managed identity, the next step is to go to your Power BI Embedded service and add a role assignment for that identity. Click “Access control (IAM)”, then “Add a role assignment”. I used the Owner role in my demo, but you may want to get more granular with your security. The managed identity will have the same name as your Azure Function App Service (mine was faedwglock001).

Now we need to set up the Azure Function. For this demo, I am using the Timer Trigger template. The HTTP Trigger is great if you want to send an HTTP request to interact with your function/Power BI Embedded Capacity.

I set my function up to run once daily at 5 PM. The CRON expression uses UTC time, so you will need to adjust accordingly.
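For reference, Azure Functions timer triggers use a six-field NCRONTAB expression ({second} {minute} {hour} {day} {month} {day-of-week}) in function.json. A sketch of a daily schedule is below; "0 0 23 * * *" fires at 23:00 UTC, which corresponds to 5 PM for a UTC-6 offset, so adjust the hour for your own time zone and daylight saving time:

```json
{
  "bindings": [
    {
      "name": "Timer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 23 * * *"
    }
  ]
}
```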

Here is the PowerShell script I used. For this demo, it is very basic and checks to see if my Power BI Embedded Capacity is paused. If not, it pauses it using the Suspend-AzPowerBIEmbeddedCapacity cmdlet.
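A minimal sketch of that logic is below, assuming the Az.PowerBIEmbedded module is available to the Function and the managed identity has rights on the capacity; the resource group and capacity names are placeholders:

```powershell
# Placeholder names - replace with your own resource group and capacity.
$resourceGroup = "rg-powerbi"
$capacityName  = "mycapacity"

# Check the current state of the Power BI Embedded capacity
$capacity = Get-AzPowerBIEmbeddedCapacity -ResourceGroupName $resourceGroup -Name $capacityName

if ($capacity.State -ne "Paused") {
    # Pause the capacity to stop incurring compute charges
    Suspend-AzPowerBIEmbeddedCapacity -ResourceGroupName $resourceGroup -Name $capacityName
    Write-Host "Capacity '$capacityName' has been paused."
}
else {
    Write-Host "Capacity '$capacityName' is already paused."
}
```

A Resume-AzPowerBIEmbeddedCapacity cmdlet exists as well, so the same pattern could be used in a second function to start the capacity back up in the morning.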

The hardest part of setting this up for me was figuring out how and where to access the PowerShell cmdlets. To find the proper path for my PowerBIEmbedded.psd1, I used the Azure Functions Advanced Tools (Kudu). In the PowerShell Debug Console, I was able to browse the folders and find what I was looking for.

This was a quick demo of how to pause Power BI Embedded using Azure Functions. If this is something you need assistance with or you are looking for someone to help on your Power BI journey, contact us.

Welcome to Ali’i Solutions!

Welcome to Ali’i Solutions! We are a full-service consulting firm specializing in Data and Analytics and Management Consulting. Whether you have a short-term project you need help with, or you are looking for a long-term partner, we would love to hear from you! We offer a wide range of services including:

Data and Analytics

Ali’i Solutions is a proud Microsoft Partner specializing in Data and Analytics. We have expertise in Power BI Development and Power BI Adoption and can help with Power BI POCs, enterprise deployments and Power BI Managed Services. We also have extensive experience working with transactional data from ERP systems such as Microsoft Dynamics 365 (previously Dynamics AX, GP, NAV and SL), JD Edwards, NetSuite and Oracle and CRM systems such as Salesforce and Dynamics CRM.

Management Consulting

Our consultants have many years of experience in Project Management, Vendor Management and Process Improvement. Whether it’s a quick project or a corporate initiative, we are ready to apply best practices and industry standards to fulfill your needs.

Salesforce Consulting

Ali’i Solutions offers Salesforce Administrator and Salesforce Developer consulting. Whether you need part-time administrator help or you have a development project in the pipeline, Ali’i Solutions consultants can help! We have experience integrating Salesforce with DocuSign or subscription billing platforms such as Zuora, or we can integrate Salesforce with a wide range of other APIs.

Cloud Adoption

As a Microsoft Partner, we have expertise and experience with many services offered by Microsoft. This includes their cloud computing platform, Microsoft Azure. Whether you are looking to migrate some of your on-premises services or applications to the cloud or you are interested in exploring new services such as Microsoft Cognitive Services or Azure Bot Services, we are here to help! Contact us for more information.

Brand Development

Whether you need assistance with brand identity, brand strategy or marketing and design, Ali’i Solutions can help! We are experienced marketing and design professionals who specialize in differentiating you from the competition.

Custom Development

Let Ali’i Solutions help with your next custom development project! Whether it is a custom application, a new website or you are interested in API development, give us a call! We can transform existing applications and forms into more modern solutions, or we can build new ones from scratch.