Hello everybody!
In this article, I would like to share with you a PowerShell script that creates offline documentation (in CSV format) of your Azure Data Factory project, a very common need, especially for listing and tracking source datasets in an easy-to-read form, as well as listing Integration Runtimes (IR), Dataflows, Linked Services, Pipelines, and Triggers.
The list of objects documented by this script follows:
- Tables read and written by the ADF (Datasets)
- Integration Runtimes (IR): list, nodes, and metrics
- Dataflows
- Linked Services
- Pipelines
- Triggers
My motivation for creating this article was the need to identify the source tables of my Data Factory project, which had more than 300 datasets in total: 70 of them were tables in the source systems, and the rest were datasets created by transformations.
When I needed to do this, I found the article Azure Data Factory Documenting, which had a very complete PowerShell script, but the datasets part didn't return what I needed, namely more details about the tables: the Linked Service used, the table name, the schema name, and so on.
That's when I started studying the ADF API and created my own version of this script, implementing these improvements plus a filter on the ADF name, in case you want to update the documentation for just one instance instead of all of them.
Prerequisite: Install the Azure PowerShell Module (Az)
If you haven't installed the Azure PowerShell module yet (it doesn't come installed by default), you'll need to install it before using the script in this article.
If you have already installed the module, you can skip this section and go straight to using the script.
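For reference, the installation itself usually comes down to the commands below, run from an elevated PowerShell prompt (a minimal sketch of the standard PowerShell Gallery install; the -Scope choice here is my suggestion, not a requirement):

```powershell
# Install the Az module from the PowerShell Gallery for the current user
Install-Module -Name Az -Scope CurrentUser -Repository PSGallery -Force

# Verify that the module was installed
Get-Module -ListAvailable -Name Az.Accounts
```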
Testing the Azure PowerShell connection
If you want to check that the connection is working normally, use the test below.
If you're confident it already works, you can skip this step; if you later run the documentation script and encounter error messages, come back and try these tests.
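A quick sanity check looks something like this (a minimal sketch; replace the tenant ID with your own):

```powershell
# Log in to Azure (a browser or device-code prompt will appear)
Connect-AzAccount -Tenant "8a74e1e0-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Show the account, tenant, and subscription currently in use
Get-AzContext

# If this lists the Data Factory instances in the subscription, the connection is fine
Get-AzDataFactoryV2
```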
How to use the PowerShell script
To use the ExportAzureDataFactoryDocumentation.ps1 PowerShell script and start documenting your Azure Data Factory (ADF) instances, open Command Prompt (cmd.exe).
Enter the command powershell to start the PowerShell interface.
Navigate to the local directory where you downloaded the ExportAzureDataFactoryDocumentation.ps1 script using the command cd "local_directory", then run the script below:
```powershell
# Don't forget the dot (.) before the file path!
# Also, the output directory (OutputFolder) must already exist
."ExportAzureDataFactoryDocumentation.ps1" `
    -TenantId "8a74e1e0-xxxx-xxxx-xxxx-xxxxxxxxxxxx" `
    -SubscriptionId "c74c4d41-xxxx-xxxx-xxxx-xxxxxxxxxxxx" `
    -OutputFolder "C:\Dirceu\Exported\"
```
To find out which Tenant ID you should use, go to the Azure portal, open the properties of one of the Azure Data Factory (ADF) instances you want to document, and copy the “Managed Identity Tenant” property.
To find the Subscription ID, copy the “Subscription” property. This ID is also present in the URL itself and makes up part of the Resource ID.
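If you prefer, you can also grab both IDs from PowerShell instead of the portal (a small convenience sketch using standard Az.Accounts cmdlets):

```powershell
# List every subscription (with its tenant ID) your account can access
Get-AzSubscription | Select-Object Name, Id, TenantId

# Or read the IDs from the currently active context
(Get-AzContext).Tenant.Id
(Get-AzContext).Subscription.Id
```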
You can also use the full file path (so you don't have to navigate to the directory first) and filter by Data Factory name to export the documentation from a single instance instead of all instances in the subscription.
```powershell
Set-ExecutionPolicy Unrestricted
Import-Module Az.Accounts

."C:\Dirceu\adf-doc\ExportAzureDataFactoryDocumentation.ps1" `
    -TenantId "8a74e1e0-xxxx-xxxx-xxxx-xxxxxxxxxxxx" `
    -SubscriptionId "c74c4d41-xxxx-xxxx-xxxx-xxxxxxxxxxxx" `
    -OutputFolder "C:\Dirceu\Exported\" `
    -DataFactoryName "name of your ADF"
```
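A quick note on the first line: Set-ExecutionPolicy Unrestricted usually needs an elevated prompt and is broader than strictly necessary. If your environment allows it, something like the sketch below is a less permissive alternative:

```powershell
# Allow locally created scripts and signed remote scripts, for the current user only
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser

# Remove the "downloaded from the internet" mark that would otherwise block the script under RemoteSigned
Unblock-File -Path "C:\Dirceu\adf-doc\ExportAzureDataFactoryDocumentation.ps1"
```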
Content of generated files
I'll show you here what you can expect from the generated documentation and what information is returned.
List of Integration Runtimes (IR)
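After a run completes, you can peek at any of the generated CSVs directly from PowerShell, for example (assuming the output folder used earlier and a run from the same day):

```powershell
# Quick look at the Integration Runtimes CSV generated by the script
Import-Csv "C:\Dirceu\Exported\ADF_IntegrationRuntimes_$(Get-Date -Format yyyyMMdd).csv" |
    Format-Table -AutoSize
```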
ExportAzureDataFactoryDocumentation.ps1 script source code
The code for this script is available in this GitHub repository, where you can always access the most up-to-date version and also submit improvements and new features, but I'll also leave the code right below:
```powershell
<#
.History
    21/07/2020 - 1.0 - Initial release - David Alzamendi
    03/10/2021 - 1.1 - Included DataFactoryName parameter and Datasets enhancements

.Synopsis
    Export Azure Data Factory V2 information using PowerShell cmdlets

.DESCRIPTION
    This script exports Azure Data Factory V2 information using the Az.DataFactory cmdlets.
    Pre-requirements:
        Az module ----> Install-Module -Name Az
        Be connected to Azure ----> Connect-AzAccount

    API documentation is available at https://docs.microsoft.com/en-us/powershell/module/az.datafactory/?view=azps-4.2.0#data_factories
    Module descriptions are at https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.commands.datafactoryv2.models?view=azurerm-ps

.EXAMPLE
    ExportAzureDataFactoryDocumentation -TenantId "XXXXXX-XXXXXX-XXXXXX-XXXXXX" -SubscriptionId "XXXXXX-XXXXXX-XXXXXX-XXXXXX" -OutputFolder "C:\Temp\"
#>
Param
(
    # Tenant Id
    [Parameter(Mandatory = $true)]
    $TenantId,

    # Subscription Id
    [Parameter(Mandatory = $true)]
    $SubscriptionId,

    # Define folder name
    [Parameter(Mandatory = $true)]
    $OutputFolder,

    # Define Data Factory name (optional filter)
    [Parameter(Mandatory = $false)]
    $DataFactoryName
)

Begin {
    Write-Host "Starting tenant" $TenantId

    # Define file names
    $ADFOutputFile = $OutputFolder + "ADF_$(Get-Date -f yyyyMMdd).csv"
    $ADFDataflowOutputFile = $OutputFolder + "ADF_Dataflows_$(Get-Date -f yyyyMMdd).csv"
    $ADFDatasetsOutputFile = $OutputFolder + "ADF_Datasets_$(Get-Date -f yyyyMMdd).csv"
    $ADFLinkedServiceOutputFile = $OutputFolder + "ADF_LinkedServices_$(Get-Date -f yyyyMMdd).csv"
    $ADFPipelineOutputFile = $OutputFolder + "ADF_Pipelines_$(Get-Date -f yyyyMMdd).csv"
    $ADFTriggerOutputFile = $OutputFolder + "ADF_Triggers_$(Get-Date -f yyyyMMdd).csv"
    $ADFIntegrationRuntimeOutputFile = $OutputFolder + "ADF_IntegrationRuntimes_$(Get-Date -f yyyyMMdd).csv"
    $ADFIntegrationRuntimeMetricOutputFile = $OutputFolder + "ADF_IntegrationRuntimeMetrics_$(Get-Date -f yyyyMMdd).csv"
    $ADFIntegrationRuntimeNodeOutputFile = $OutputFolder + "ADF_IntegrationRuntimeNodes_$(Get-Date -f yyyyMMdd).csv"

    # Connect to Azure
    Connect-AzAccount -Tenant $TenantId

    # Change subscription
    Set-AzContext -Subscription $SubscriptionId

    $adflist = Get-AzDataFactoryV2

    # Flatten hash tables (like tags or nodes) into a "[key | value]" string
    function Resolve-Hashtable {
        param ($Collection)

        if ($Collection.Keys.Count -gt 0) {
            $KeyArray = @()
            $Collection.Keys | ForEach-Object { $KeyArray += "[$_ | $($Collection[$_])] " }
        }
        else {
            $KeyArray = ''
        }

        [string]$KeyArray
    }

    # Flatten the pipeline parameters hash table into a "[key]" string
    function Resolve-Hashtable-Pipelines {
        param ($Collection)

        if ($Collection.Keys.Count -gt 0) {
            $KeyArray = @()
            $Collection.Keys | ForEach-Object { $KeyArray += "[$_] " }
        }
        else {
            $KeyArray = ''
        }

        [string]$KeyArray
    }

    # Flatten pipeline activity lists into a "[name | type | description]" string
    function Resolve-List-Pipelines {
        param ($List)

        if ($List.Count -gt 0) {
            for ($i = 0; $i -lt $List.Count; $i++) {
                $KeyString += "[" + $List.Item($i).Name + " | " + $List.Item($i) + " | " + $List.Item($i).Description + "]"
                $KeyString = $KeyString.Replace("Microsoft.Azure.Management.DataFactory.Models.", "")
            }
        }
        else {
            $KeyString = ''
        }

        [string]$KeyString
    }
}

Process {
    # Start
    foreach ($adfname in $adflist) {

        # Export everything, or only the instance given in -DataFactoryName
        if ($null -eq $DataFactoryName -or $DataFactoryName -eq $adfname.DataFactoryName) {

            Write-Host "Starting data factory" $adfname.DataFactoryName

            # Get-AzDataFactoryV2
            # Gets information about Data Factory.
            Get-AzDataFactoryV2 -ResourceGroupName $adfname.ResourceGroupName -Name $adfname.DataFactoryName `
            | Select-Object ResourceGroupName, DataFactoryName, Location, @{Name = 'Tags'; Expression = { Resolve-Hashtable($_.Tags) } } `
            | Export-Csv -Append -Path $ADFOutputFile -NoTypeInformation

            # Get-AzDataFactoryV2DataFlow
            # Gets information about data flows in Data Factory.
            # 21/07/2020: Mapping Data Flows are not available for factories in the following regions:
            # West Central US, Australia Southeast. To use this feature, please create a factory in a supported region.
            Write-Host "Exporting" $adfname.DataFactoryName "Data flows"
            try {
                Get-AzDataFactoryV2DataFlow -ErrorAction Stop -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName `
                | Select-Object ResourceGroupName, DataFactoryName, @{L = 'DataFlowName'; E = { $_.Name } }, @{L = 'DataFlowType'; E = { $_.Properties -replace "Microsoft.Azure.Management.DataFactory.Models.", "" } } `
                | Export-Csv -Append -Path $ADFDataflowOutputFile -NoTypeInformation -ErrorAction Stop
            }
            catch {
                Write-Host "Data Flows are not available for factories in the following regions: West Central US, Australia Southeast."
            }

            # Get-AzDataFactoryV2Dataset
            # Gets information about datasets in Data Factory.
            Write-Host "Exporting" $adfname.DataFactoryName "Data sets"
            Get-AzDataFactoryV2Dataset -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName | Select-Object `
                ResourceGroupName,
                DataFactoryName,
                @{ L = 'Folder'; E = { ($_).Properties.Folder.Name } },
                @{ L = 'DatasetType'; E = { $_.Properties -replace "Microsoft.Azure.Management.DataFactory.Models.", "" } },
                @{ L = 'DatasetName'; E = { $_.Name } },
                @{ L = 'SchemaName'; E = {
                    $obj = ($_)
                    $type = $obj.Properties -replace "Microsoft.Azure.Management.DataFactory.Models.", ""
                    switch ($type) {
                        "SqlServerTableDataset" { $obj.Properties.SqlServerTableDatasetSchema; break }
                        "AzureSqlTableDataset" { $obj.Properties.AzureSqlTableDatasetSchema; break }
                    }
                } },
                @{ L = 'TableName'; E = {
                    $obj = ($_)
                    $type = $obj.Properties -replace "Microsoft.Azure.Management.DataFactory.Models.", ""
                    switch ($type) {
                        "CommonDataServiceForAppsEntityDataset" { $obj.Properties.entityName; break }
                        "SqlServerTableDataset" { $obj.Properties.Table; break }
                        "AzureSqlTableDataset" { $obj.Properties.Table; break }
                        "DelimitedTextDataset" { -join ($obj.Properties.location.container, "\", $obj.Properties.location.fileName); break }
                    }
                } },
                @{ L = 'LinkedServiceName'; E = { ($_).Properties.LinkedServiceName.ReferenceName } },
                @{ L = 'LinkedServiceConnectionString'; E = {
                    $dataRawLocal = (Get-AzDataFactoryV2LinkedService -ResourceGroupName ($_).ResourceGroupName -DataFactoryName ($_).DataFactoryName -Name ($_).Properties.LinkedServiceName.ReferenceName).Properties.AdditionalProperties
                    $secretNameValid = (Select-String -InputObject $($dataRawLocal.typeProperties) -Pattern "secretName.*")
                    if ($null -eq $secretNameValid) {
                        (Get-AzDataFactoryV2LinkedService -ResourceGroupName ($_).ResourceGroupName -DataFactoryName ($_).DataFactoryName -Name ($_).Properties.LinkedServiceName.ReferenceName).Properties.ConnectionString -replace "Integrated Security=False;Encrypt=True;Connection Timeout=30;", ""
                    }
                    else {
                        -join ("AzureKeyVaultSecret: ", $secretNameValid.Matches.Value.Split(":")[1].Trim().Replace("`"", ""))
                    }
                } },
                @{ L = 'LinkedServiceServiceUri'; E = { (Get-AzDataFactoryV2LinkedService -ResourceGroupName ($_).ResourceGroupName -DataFactoryName ($_).DataFactoryName -Name ($_).Properties.LinkedServiceName.ReferenceName).Properties.ServiceUri } } `
            | Export-Csv -Append -Path $ADFDatasetsOutputFile -NoTypeInformation

            #Get-AzDataFactoryV2Dataset -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName `
            #| Select-Object ResourceGroupName,DataFactoryName,@{L='DatasetName';E={$_.Name}}, @{L='DatasetType';E={$_.Properties -replace "Microsoft.Azure.Management.DataFactory.Models.",""}}, @{L='TableName';E={($_).Properties.Table}}, @{L='SchemaName';E={($_).Properties.SqlServerTableDatasetSchema}} `
            #| Export-Csv -Append -Path $ADFDatasetsOutputFile -NoTypeInformation

            # Get-AzDataFactoryV2IntegrationRuntime
            # Gets information about integration runtime resources.
            # Nodes column is available, but they are not being populated
            Write-Host "Exporting" $adfname.DataFactoryName "Integration runtimes"
            Get-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName -Status `
            | Select-Object ResourceGroupName, DataFactoryName, @{L = 'IntegrationRuntimeName'; E = { $_.Name } }, @{L = 'IntegrationRuntimeType'; E = { $_.Type } }, Description, State, CreateTime, AutoUpdate, ScheduledUpdateDate, Version, VersionStatus, LatestVersion, PushedVersion `
            | Export-Csv -Append -Path $ADFIntegrationRuntimeOutputFile -NoTypeInformation

            $irlist = Get-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName

            foreach ($irname in $irlist) {

                # Integration Runtime needs to be online to capture further information
                if ($irname.State -eq "Online" -or $irname.State -eq "Starting" -or $irname.Type -eq "SelfHosted") {

                    # Get-AzDataFactoryV2IntegrationRuntimeMetric
                    # Gets information about integration runtime metrics.
                    Write-Host "Exporting" $adfname.DataFactoryName "Integration runtime metrics" $irname.Name
                    Get-AzDataFactoryV2IntegrationRuntimeMetric -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName $irname.Name `
                    | Select-Object ResourceGroupName, DataFactoryName, @{Name = 'Nodes'; Expression = { Resolve-Hashtable($_.Nodes) } } `
                    | Export-Csv -Append -Path $ADFIntegrationRuntimeMetricOutputFile -NoTypeInformation

                    $metrics = Get-AzDataFactoryV2IntegrationRuntimeMetric -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName $irname.Name

                    # Iterate over the nodes returned by the metrics call
                    # (the Nodes property on the $irlist items is not populated)
                    foreach ($metricname in $metrics.Nodes) {
                        try {
                            # Get-AzDataFactoryV2IntegrationRuntimeNode
                            # Gets integration runtime node information.
                            Write-Host "Exporting" $adfname.DataFactoryName "Integration runtime" $irname.Name " node" $metricname.NodeName
                            Get-AzDataFactoryV2IntegrationRuntimeNode -ErrorAction Stop -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName -IntegrationRuntimeName $irname.Name -IpAddress -Name $metricname.NodeName `
                            | Select-Object ResourceGroupName, DataFactoryName, IntegrationRuntimeName, @{L = 'NodeName'; E = { $_.Name } }, Status, MachineName, VersionStatus, Version, IPAddress, ConcurrentJobsLimit `
                            | Export-Csv -Append -Path $ADFIntegrationRuntimeNodeOutputFile -NoTypeInformation
                        }
                        catch {
                            Write-Host "Impossible to retrieve information from" $metricname.NodeName
                            # Add-Content -Path -Append $ADFIntegrationRuntimeNodeOutputFile -NoTypeInformation -Value "$adfname.ResourceGroupName,$adfname.DataFactoryName,$irname.Name,$metricname.NodeName,Unreachable"
                            $Unreachable = "{0},{1},{2},{3},{4}" -f $adfname.ResourceGroupName, $adfname.DataFactoryName, $irname.Name, $metricname.NodeName, "Unreachable"
                            $Unreachable | Add-Content -Path $ADFIntegrationRuntimeNodeOutputFile
                        }
                    }
                }
            }

            # Get-AzDataFactoryV2LinkedService
            # Gets information about linked services in Data Factory.
            Write-Host "Exporting" $adfname.DataFactoryName "Linked services"
            Get-AzDataFactoryV2LinkedService -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName `
            | Select-Object ResourceGroupName, DataFactoryName, @{L = 'LinkedServiceName'; E = { $_.Name } }, @{L = 'LinkedServiceType'; E = { $_.Properties -replace "Microsoft.Azure.Management.DataFactory.Models.", "" } }, @{L = 'LinkedServiceConnectionString'; E = { ($_).Properties.ConnectionString -replace "Integrated Security=False;Encrypt=True;Connection Timeout=30;", "" -replace ",", "|" -replace "`r`n", " " } }, @{L = 'LinkedServiceServiceUri'; E = { ($_).Properties.ServiceUri } } `
            | Export-Csv -Append -Path $ADFLinkedServiceOutputFile -NoTypeInformation

            # Get-AzDataFactoryV2Pipeline
            # Gets information about pipelines in Data Factory.
            Write-Host "Exporting" $adfname.DataFactoryName "Pipelines"
            Get-AzDataFactoryV2Pipeline -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName `
            | Select-Object ResourceGroupName, DataFactoryName, @{L = 'PipelineName'; E = { $_.Name } }, @{Name = 'Activities'; Expression = { Resolve-List-Pipelines($_.Activities) } }, @{Name = 'Parameters'; Expression = { Resolve-Hashtable-Pipelines($_.Parameters) } } `
            | Export-Csv -Append -Path $ADFPipelineOutputFile -NoTypeInformation

            # Get-AzDataFactoryV2Trigger
            # Gets information about triggers in a data factory.
            Write-Host "Exporting" $adfname.DataFactoryName "Triggers"
            Get-AzDataFactoryV2Trigger -ResourceGroupName $adfname.ResourceGroupName -DataFactoryName $adfname.DataFactoryName `
            | Select-Object ResourceGroupName, DataFactoryName, @{L = 'TriggerName'; E = { $_.Name } }, @{L = 'TriggerType'; E = { $_.Properties -replace "Microsoft.Azure.Management.DataFactory.Models.", "" } }, @{L = 'TriggerStatus'; E = { $_.RuntimeState } } `
            | Export-Csv -Append -Path $ADFTriggerOutputFile -NoTypeInformation
        }
    }
    # End
}

End {
    Write-Host "Finish tenant" $TenantId
}
```
That's it folks!
Hope you enjoyed this article and see you next time!