
Azure Data Factory (ADF) – How to create offline documentation (in CSV) of your project


Hello everybody!
In this article, I would like to share a PowerShell script that creates offline documentation (in CSV) of your Azure Data Factory project. This is a very common need, especially when you want an easy way to list and track source datasets, and also to list Integration Runtimes (IR), Dataflows, Linked Services, Pipelines and Triggers.

The list of objects documented by this script follows:

  • Tables read and written by the ADF (Datasets)
  • Integration Runtimes (IR): list, nodes and metrics
  • Dataflows
  • Linked Services
  • Pipelines
  • Triggers

Documentation result example:

My motivation for writing this article was the need to identify the source tables of my Data Factory project, which had more than 300 datasets in total: 70 were tables in the source systems, and the rest were datasets produced by transformations.

When I needed to do this, I found the article Azure Data Factory Documenting, which had a very complete PowerShell script, but the datasets part didn't return what I needed: more details about each table, such as the linked service used, the table name, the schema name, etc.

That's when I started studying the ADF API and created my own version of this script, implementing these improvements, plus a filter based on the ADF name, in case you want to update the documentation for just one instance instead of all.
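To give an idea of what this kind of export involves, here is a minimal sketch built on the Az.DataFactory cmdlets. It is not the script from this article. The function names, the column names and the choice of properties are assumptions made for illustration, and it assumes the dataset objects returned by Get-AzDataFactoryV2Dataset expose Name and Properties.LinkedServiceName.ReferenceName.

```powershell
# Flatten one dataset object into a CSV-friendly row.
# Assumption: the object has the shape returned by Get-AzDataFactoryV2Dataset.
function ConvertTo-DatasetRow {
    param(
        [Parameter(Mandatory)] $Dataset,
        [Parameter(Mandatory)] [string] $DataFactoryName
    )
    [pscustomobject]@{
        DataFactory   = $DataFactoryName
        Dataset       = $Dataset.Name
        DatasetType   = $Dataset.Properties.GetType().Name
        LinkedService = $Dataset.Properties.LinkedServiceName.ReferenceName
    }
}

# Export the datasets of every factory in the current subscription,
# or of a single factory when -DataFactoryName is given (hypothetical helper).
function Export-AdfDatasetList {
    param(
        [Parameter(Mandatory)] [string] $OutputFile,
        [string] $DataFactoryName
    )
    $factories = Get-AzDataFactoryV2
    if ($DataFactoryName) {
        $factories = $factories | Where-Object { $_.DataFactoryName -eq $DataFactoryName }
    }
    $rows = foreach ($factory in $factories) {
        Get-AzDataFactoryV2Dataset -ResourceGroupName $factory.ResourceGroupName -DataFactoryName $factory.DataFactoryName |
            ForEach-Object { ConvertTo-DatasetRow -Dataset $_ -DataFactoryName $factory.DataFactoryName }
    }
    $rows | Export-Csv -Path $OutputFile -NoTypeInformation -Encoding UTF8
}
```

After signing in to Azure, a call such as Export-AdfDatasetList -OutputFile .\Datasets.csv -DataFactoryName "my-adf" (a placeholder name) would write one row per dataset. The same pattern applies to pipelines, triggers, linked services and integration runtimes through their respective Get-AzDataFactoryV2* cmdlets.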

Prerequisite: Install Azure PowerShell Module (Az)

The Azure PowerShell module is not installed by default. If you haven't installed it yet, you'll need to do so before using the script in this article, and the step-by-step instructions below walk you through it.

To install the Azure (Az) PowerShell module, open Command Prompt (cmd.exe):

Enter the command powershell to start the PowerShell interface:

Enter the command Install-Module -Name Az

An untrusted repository warning will appear. Type the letter “A” (Yes to all) to proceed with the installation.

Done! The Azure PowerShell module (Az) has been installed successfully.

Notice: If PowerShell returns an error when you run the Install-Module command, your machine is probably still using PowerShell 4.0 or earlier. In that case, you will need to install Windows Management Framework 5.1, as described in the official documentation, or upgrade PowerShell to a newer version.

If you have already installed the module, you can ignore this topic and go straight to using the script.
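The checks described above can be sketched as a small helper. The function name Get-AzModuleStatus and its three return values are made up for this example. It only reports the state of the machine and installs nothing, and it looks for Az.Accounts because that module is installed together with Az.

```powershell
# Report whether this machine is ready for the Az module, without changing anything.
function Get-AzModuleStatus {
    # Install-Module comes with PowerShellGet, which ships with PowerShell 5.0 and later.
    if ($PSVersionTable.PSVersion.Major -lt 5) {
        return 'UpgradePowerShell'
    }
    # Az.Accounts is installed as part of the Az module.
    if (Get-Module -ListAvailable -Name Az.Accounts) {
        return 'Installed'
    }
    return 'Missing'
}

switch (Get-AzModuleStatus) {
    'UpgradePowerShell' { Write-Warning 'Install Windows Management Framework 5.1 or a newer PowerShell first.' }
    'Installed'         { Write-Output 'The Az module is already installed.' }
    'Missing'           { Write-Output 'Run: Install-Module -Name Az -Scope CurrentUser' }
}
```

The -Scope CurrentUser option installs the module for your user only, so an elevated prompt is not required.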

Testing the Azure PowerShell connection

If you want to test if the connection is working normally, use the guide below.

If you think it's working, you can skip this step. If you try to run the documentation script and encounter error messages, try the tests below.

To test the connection to Azure PowerShell, open Command Prompt (cmd.exe):

Enter the command powershell to start the PowerShell interface:

We will use the command Connect-AzAccount to connect to Azure via PowerShell, but first we need to import the module. To do this, type the command Set-ExecutionPolicy Unrestricted to allow scripts to run, then type the command Import-Module Az.Accounts:

Now enter the command Connect-AzAccount to connect to Azure through PowerShell:

Enter your credentials and you will be authenticated to Azure.

As I have more than one Tenant ID in my Azure account, I got some notices stating this. The account selected by default is the first one returned, but to avoid the risk of running the script against the wrong tenant, let's set the tenant I want manually using the command Set-AzContext -TenantId "xxxx-xxxx-xxxx-xxxx"

To find out which Tenant ID you want to use, go to the Azure portal, open the properties of one of the Azure Data Factories (ADF) you want to document, and copy the “Managed Identity Tenant” property.

The tenant I selected also has more than one subscription. To manually define which one I want to use, I'll use the command Set-AzContext -Subscription "xxxx-xxxx-xxxx-xxxx"

To find out the Subscription ID, go back to the Azure portal, open the properties of one of the Azure Data Factories (ADF) you want to document, and copy the “Subscription” property. This ID also appears in the portal URL and is part of the Resource ID.
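The connection steps in this section can be combined into one helper. This is a sketch: the function name Connect-AdfContext is hypothetical, and both IDs are the values copied from the portal.

```powershell
# Sign in and pin the session to one tenant and one subscription.
function Connect-AdfContext {
    param(
        [Parameter(Mandatory)] [string] $TenantId,
        [Parameter(Mandatory)] [string] $SubscriptionId
    )
    Import-Module Az.Accounts
    # Opens the interactive sign-in for the chosen tenant.
    Connect-AzAccount -Tenant $TenantId | Out-Null
    # Select the subscription so later commands do not run against the wrong one.
    Set-AzContext -Tenant $TenantId -Subscription $SubscriptionId | Out-Null
    # Return the active context so the selection can be checked.
    Get-AzContext
}
```

Calling Connect-AdfContext -TenantId "xxxx-xxxx-xxxx-xxxx" -SubscriptionId "xxxx-xxxx-xxxx-xxxx" returns the active context, where you can confirm that the tenant and subscription shown are the expected ones.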

How to use the PowerShell script

To use the ExportAzureDataFactoryDocumentation.ps1 PowerShell script and start documenting your Azure Data Factory (ADF) instances, open Command Prompt (cmd.exe):

Enter the command powershell to start the PowerShell interface:

Navigate to the local directory where you downloaded the ExportAzureDataFactoryDocumentation.ps1 script using the command cd "local_directory", then run the script:

To find out which Tenant ID you want to use, go to the Azure portal, open the properties of one of the Azure Data Factories (ADF) you want to document, and copy the “Managed Identity Tenant” property.

To find out the Subscription ID, copy the “Subscription” property. This ID also appears in the portal URL and is part of the Resource ID.

You can also use the full file path without having to navigate to the directory and also filter the Data Factory name to export the documentation from one instance only instead of exporting from all instances of the subscription.
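If you are unsure which parameters your copy of the script accepts, you can ask PowerShell to list them before running anything. The helper below is a small sketch. The name Get-ScriptParameterName is made up for this example, and it only reads the script's param block without executing the script.

```powershell
# Return the parameter names declared by a script file.
function Get-ScriptParameterName {
    param([Parameter(Mandatory)] [string] $Path)
    # Get-Command needs a full path to treat the file as an external script.
    $fullPath = (Resolve-Path -Path $Path).Path
    (Get-Command -Name $fullPath).Parameters.Keys
}
```

For example, Get-ScriptParameterName -Path .\ExportAzureDataFactoryDocumentation.ps1 lists the names to pass on the command line. Once you know them, the script can be run from any directory with the call operator and its full path, as in & "C:\full\path\ExportAzureDataFactoryDocumentation.ps1" followed by the parameters.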

Execution Result:

Successfully generated files!

Content of generated files

I'll show you here what you can expect from the generated documentation and what information is returned.

List of Data Factories

List of Dataflows

List of Datasets

List of Pipelines

List of Integration Runtime (IR)

List of Linked Services
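To get a quick overview of the generated files without opening each one, a sketch like the following summarizes every CSV in a folder. The function name Get-CsvSummary is hypothetical, and it assumes the files are comma-delimited and have a header row.

```powershell
# Summarize each CSV file in a folder: file name, row count and column names.
function Get-CsvSummary {
    param([Parameter(Mandatory)] [string] $Folder)
    Get-ChildItem -Path $Folder -Filter '*.csv' | ForEach-Object {
        # @() keeps Count reliable even when a file has a single row.
        $rows = @(Import-Csv -Path $_.FullName)
        [pscustomobject]@{
            File    = $_.Name
            Rows    = $rows.Count
            Columns = if ($rows.Count -gt 0) { $rows[0].PSObject.Properties.Name -join ', ' } else { '' }
        }
    }
}
```

Running Get-CsvSummary -Folder "C:\path\to\output" | Format-Table -AutoSize (with your own output folder) shows at a glance how many objects of each type were documented and which columns each file contains.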

ExportAzureDataFactoryDocumentation.ps1 script source code

The code for this script is available in this GitHub repository, where you can always get the most up-to-date version and also submit improvements and new features. I'll also leave the code right here below:

Note: I won't be updating the code here in the post, so prefer the code in the GitHub repository.

That's it folks!
Hope you enjoyed this article and see you next time!