In this post I bring you a simple solution, but I really liked it, because it helped me to greatly reduce the processing time of an Azure Data Factory (ADF) pipeline by changing the Service Objective and resizing an Azure SQL Database using T commands -SQL before the start of processing, and back to the original tier at the end of all processing.
To change the Azure SQL Database tier, you can use the Azure portal interface, Powershell, Azure CLI, Azure DevOps and a series of alternatives, in addition to our dear Transact-SQL (T-SQL), which, in my view , is the easiest and most practical choice of all.
See how easy it is:
ALTER DATABASE [dirceuresende]
MODIFY(SERVICE_OBJECTIVE = 'S3')
The problem with all these solutions, especially in the scenario I'm talking about, is that changing the Service Objective, aka Service Tier, is not done immediately, that is, you execute the command and Azure SQL will effect the tier change in the whenever Azure sees fit. It can take 1 second, 10 seconds, 60 seconds, etc… And when the Azure SQL Database tier changes, sessions are disconnected.
In the scenario I listed above, where I change the tier before starting the data processing, if I simply run the T-SQL command and continue processing, in the middle of it Azure will change the tier, the connection will drop for a few seconds and the bank will be unavailable for a few seconds.
You can also use Retry in all components of your pipeline and thus, avoid this being a problem for you, because when the connection drops, the retry will be activated and the operations will be done again, but you would have to configure the retry in all components instead of configuring only in the upsizing component, you may have been charged for data movement and use of ADF resources during that time that you processed for nothing. In addition, side effects can occur if a component is not prepared to be stopped mid-execution and run again.
Another possible solution is to put a Wait operator in Azure Data Factory and specify whatever time you think is enough for Azure to change the tier. From my tests, the time required for Azure SQL Database to effect the tier switch is usually somewhere between 50 and 90 seconds.
While this may work in some cases (and in some cases it doesn't), this solution doesn't seem to me to be very reliable. If the time to change the tier exceeds the limit I set, I will have waited a long time and it will still fail in the middle of processing. And if the change ends earlier, I will have waited a long time unnecessarily.
I looked for some solutions to solve my problem and ended up falling for the idea of Data Platform MVP Greg Low in this post here, but I chose to create my procedure in order to have a simpler solution to try to solve this problem.
Stored Procedure source code
CREATE PROCEDURE dbo.stpAltera_Tier_DB (
@TimeoutEmSegundos INT = 60
SET NOCOUNT ON
@DataHoraLimite DATETIME2 = DATEADD(SECOND, @TimeoutEmSegundos, GETDATE()),
@ServiceLevelObjectiveAtual VARCHAR(20) = CONVERT(VARCHAR(100), DATABASEPROPERTYEX( DB_NAME(), 'ServiceObjective' ))
IF (@ServiceLevelObjectiveAtual <> @ServiceLevelObjective)
SET @Query = N'ALTER DATABASE [' + DB_NAME() + '] MODIFY (SERVICE_OBJECTIVE = ''' + @ServiceLevelObjective + ''');'
EXEC sp_executesql @Query;
WHILE ((DATABASEPROPERTYEX( DB_NAME(), 'ServiceObjective' ) <> @ServiceLevelObjective) AND GETDATE() <= @DataHoraLimite)
WAITFOR DELAY '00:00:00.500';
Using this procedure is very simple:
@ServiceLevelObjective = 'S3',
@TimeoutEmSegundos = 60
After executing the command, the procedure will wait until the change is made by Azure, re-evaluating every 500 milliseconds if the change has already been made, respecting the defined time limit. If the limit is reached, the procedure will end the execution even if the change is not yet effective. If the change effect is less than the time limit, the procedure will end the execution as soon as the tier is changed, avoiding wasting time.
As Azure SQL Database eliminates all connections and the database is unavailable for a few seconds, even with this treatment, the procedure will return an error because its own session was eliminated:
In the second execution of the Stored Procedure, as the change will be by the same tier of the previous execution, the command will be executed instantly and Azure will just ignore the command and return successful execution, as shown below:
And with that, now your bank has the new tier and you can start processing the data. At the end of processing, I downscale to return the tier to the original value, but this time, I don't need to wait for the change to finish, because I won't process anything else.
To be very honest with you, thinking about simplicity first, not even the Stored Procedure is necessary after all. As the connection is always broken, I can just put a simple IF with a very long WAITFOR DELAY inside the Script block and have the same behavior:
ALTER DATABASE [dirceuresende] MODIFY(SERVICE_OBJECTIVE = 'S6')
IF (CONVERT(VARCHAR(100), DATABASEPROPERTYEX( DB_NAME(), 'ServiceObjective' )) <> 'S6')
WAITFOR DELAY '00:10:00';
And just like using the Stored Procedure, the routine gave an error on the first run, waiting until the tier change is effected in the database. When this occurs, the connection is broken and execution returns to failure.
Azure Data Factory waits 10 seconds (time I set) after failure and tries again. This time, the execution is very fast, since the tier has already been changed to the chosen tier. This second run returns success and the pipeline cycle proceeds normally.
The behavior ended up being the same as the Procedure, but much simpler. I put a very long wait (10 minutes), which will end up being the time limit that Azure will have to make the change, which is much more than enough. Finishing the change before, the cycle continues without having to wait these 10 minutes.
It turned out that the solution was even simpler than I thought. Now you can increase the tier of your Azure SQL Database before starting the ETL processing using Azure Data Factory, so that the processing is faster, and at the end of the processing, you go back to the original tier, paying more only during the time spent processing data. A smart way to perform much better while paying much less 🙂
I hope you enjoyed this tip and until next time.