Azure Data Factory is Microsoft’s cloud-based data integration service, positioned as the modern tool for ETL / ELT. From ADF you can connect to over 80 different types of data sources, including Microsoft Azure services, Amazon Redshift or S3, Google Cloud Storage, and other platforms such as Dynamics 365, Salesforce and SAP Cloud. You can also connect to your on-premises data using a self-hosted Integration Runtime. This opens access to files within your network, on-premises databases such as SQL Server, or even Microsoft Access databases/files.
Once you’ve established connections to your data using Integration Runtimes and ADF Linked Services, you will use Pipelines to move data and perform your transformations. Azure Data Factory offers a wide range of Activities that can be used within a Pipeline. These include activities that move and transform data, such as Copy Data and Data Flow. There is also a wide range of other activity types, including Azure Function, Batch Service, Databricks, Data Lake Analytics / U-SQL, HDInsight and Azure Machine Learning. Whether you are dealing with a small number of files, a transactional database or a large data lake, Azure Data Factory can handle it.
One of the most common activities within an Azure Data Factory Pipeline is Copy Data. The Copy Data activity copies data between a Source and a Sink (destination). This can be from your on-premises database to Azure SQL Data Warehouse (Azure Synapse Analytics), from flat files to Azure Data Lake Storage, or between any combination of ADF’s 80+ available data sources. Each of Azure Data Factory’s activity types could warrant an entire book of tips and tricks. As with most technologies, their implementation can be as simple or as complex as you need it to be. Azure Data Factory is advertised as “Code-Free ETL as a Service”!
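Even though ADF is marketed as code-free, everything you build in the designer is stored as JSON behind the scenes. As a rough sketch, a minimal Copy Data activity definition looks something like the following (the dataset names here are hypothetical placeholders, and some properties are omitted for brevity):

```json
{
  "name": "CopySqlToBlob",
  "type": "Copy",
  "inputs": [
    { "referenceName": "SqlServerSourceDataset", "type": "DatasetReference" }
  ],
  "outputs": [
    { "referenceName": "BlobSinkDataset", "type": "DatasetReference" }
  ],
  "typeProperties": {
    "source": { "type": "SqlSource" },
    "sink": { "type": "BlobSink" }
  }
}
```

You rarely need to author this JSON by hand, but it is worth knowing it exists: the “Code” button in the ADF designer shows you exactly this representation of your pipeline.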
This blog post focuses specifically on how to use an Azure SQL / SQL Server Stored Procedure as your Source in a Copy Data activity. I wanted to write about this because I find it very confusing that you must first choose a table as your Source Dataset before the “Use query” option appears in the Copy Data activity.
As an example, here is a new ADF Pipeline with a new Copy Data Activity added to it:
As you will notice, the “Source dataset” dropdown only allows you to choose an existing Dataset or pick from a list of tables in your Azure SQL / SQL Server database. What if you want to use a SQL query or a Stored Procedure as your data source? Unfortunately, you must first choose an existing table before the “Use query” radio button becomes visible. (I am embarrassed to say how long it took me to realize this.) Once you select a table, you are able to choose between Table, Query or Stored Procedure. Although the UI doesn’t indicate this, if you use Query or Stored Procedure, you are not required to use the table that was selected, even though it still appears in the “Source dataset” dropdown.
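When you do select Stored Procedure, the underlying JSON of the Copy activity’s source changes to reference the procedure instead of the table. A minimal sketch of what that source block looks like is shown below; the procedure name and parameter are hypothetical examples, not values from a real pipeline:

```json
"source": {
  "type": "AzureSqlSource",
  "sqlReaderStoredProcedureName": "dbo.GetOrders",
  "storedProcedureParameters": {
    "StartDate": { "type": "DateTime", "value": "2020-01-01" }
  }
}
```

Notice that nothing in this source block references the table you were forced to pick in the “Source dataset” dropdown, which is exactly why that table selection ends up being irrelevant once you switch to a stored procedure.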
I am very curious about this design and feel it would make more sense to place the “Use query” radio button before the “Source dataset” dropdown. Of course, you’d also have to choose a connection, but it would still be more user friendly than it is now. Hopefully you find this post helpful if you are just getting started with Azure Data Factory. Feel free to reach out if you have any questions or need assistance with a project you are working on!