Introduction
Hey folks!
In this article I'd like to share some code I had to use today to UPDATE a relatively large table (55M+ records) in an Azure SQL Database. After waiting 1 hour and 30 minutes, I got a connection error and had to start all over again.
Connection errors aside, it can also happen that the transaction log fills up and the operation fails. And, as we know, when an error occurs during an UPDATE or DELETE, an automatic rollback kicks in and no rows are actually changed. Breaking this single operation into smaller, batched operations allows the log to be truncated between batches, minimizing the chance of it filling up.
You may also want to break these large, time-consuming operations into smaller pieces to be able to track progress, or because your maintenance window isn't long enough to process the UPDATE/DELETE on all the necessary rows and you want to continue in another window.
I've also seen cases of tables with triggers that end up generating very large overhead when an UPDATE/DELETE changes many rows. And we also have to remember the locks on the table, which, when the work is broken into smaller pieces, can be released quickly while the next batch starts processing.
There are several reasons that can influence the decision to partition a large UPDATE/DELETE, and in this article I'll show you some easy ways to do it.
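Before the concrete examples, here is the general shape of the pattern, as a minimal sketch (dbo.Tabela and the [Status] filter are hypothetical stand-ins for your own table): each iteration touches a limited number of rows, and since each batch commits on its own, the log space it used can be reused after the automatic log backups Azure SQL Database takes, instead of one giant transaction holding the log for the entire operation.

DECLARE @Qt_Linhas INT = 1

WHILE (@Qt_Linhas > 0)
BEGIN

    BEGIN TRANSACTION

    -- Touch only a limited number of rows per iteration
    DELETE TOP (100000)
    FROM dbo.Tabela
    WHERE [Status] = 6

    SET @Qt_Linhas = @@ROWCOUNT

    -- Each small commit means a failure only rolls back the current
    -- batch, not hours of accumulated work
    COMMIT TRANSACTION

END

Strictly speaking, each statement already runs in its own transaction in autocommit mode; the explicit BEGIN/COMMIT just makes the batch boundary obvious.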
How to delete or update data in large tables
UPDATE partitioned by integer field
In the example below, I'm using an auto-increment (identity) integer column to build the ranges of values that will be updated. The @Aumento variable defines the number of rows in each batch; in this example, I set it so that 1 million rows are updated at a time.
DECLARE
    @Min INT,
    @Max INT,
    @Contador INT = 0,
    @Aumento INT = 1000000,       -- batch size: 1 million rows per iteration
    @LimiteInferior INT = 0,
    @LimiteSuperior INT = 0,
    @Msg VARCHAR(MAX)

-- The boundaries of the identity column define the full range to process
SELECT
    @Min = MIN(Id_Registro),
    @Max = MAX(Id_Registro)
FROM dbo.Tabela

WHILE (@LimiteSuperior < @Max)
BEGIN

    SET @LimiteInferior = @Min + (@Contador * @Aumento)
    SET @LimiteSuperior = @LimiteInferior + @Aumento

    -- Update only the rows whose Id_Registro falls inside the current window
    UPDATE A
    SET Dt_Registro = (CASE
                           WHEN ISNUMERIC(A.Data_Registro_String) = 1
                               THEN CONVERT(DATE, DATEADD(DAY, TRY_CONVERT(INT, A.Data_Registro_String), '1900-01-01'))
                           WHEN LEN(A.Data_Registro_String) = 10
                                AND TRY_CONVERT(DATE, A.Data_Registro_String, 103) IS NOT NULL
                               THEN CONVERT(DATE, A.Data_Registro_String, 103)
                           ELSE CONVERT(DATE, A.Data_Registro_String)
                       END)
    FROM dbo.Tabela A
    WHERE Id_Registro >= @LimiteInferior
      AND Id_Registro < @LimiteSuperior

    SET @Contador += 1

    -- WITH NOWAIT flushes the progress message immediately
    SET @Msg = CONCAT('Processing data in range ', @LimiteInferior, '-', @LimiteSuperior, '...')
    RAISERROR(@Msg, 1, 1) WITH NOWAIT

END
Result:

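One thing this loop doesn't give you is resumability: if the connection drops, @Contador is lost. Here is a sketch of one way to persist the high-water mark between sessions, using a hypothetical control table (dbo.Controle_Batch) that is not part of the script above:

-- Hypothetical one-row control table that records the last processed Id
CREATE TABLE dbo.Controle_Batch (Ultimo_Id INT NOT NULL)
INSERT INTO dbo.Controle_Batch (Ultimo_Id) VALUES (0)
GO

DECLARE @LimiteInferior INT,
        @LimiteSuperior INT,
        @Aumento INT = 1000000,
        @Max INT

SELECT @Max = MAX(Id_Registro) FROM dbo.Tabela
SELECT @LimiteInferior = Ultimo_Id FROM dbo.Controle_Batch

WHILE (@LimiteInferior < @Max)
BEGIN

    SET @LimiteSuperior = @LimiteInferior + @Aumento

    -- Same batched UPDATE as above (conversion logic simplified for brevity)
    UPDATE A
    SET Dt_Registro = TRY_CONVERT(DATE, A.Data_Registro_String, 103)
    FROM dbo.Tabela A
    WHERE Id_Registro >= @LimiteInferior
      AND Id_Registro < @LimiteSuperior

    -- Persist the high-water mark so a new session can resume from here
    UPDATE dbo.Controle_Batch SET Ultimo_Id = @LimiteSuperior

    SET @LimiteInferior = @LimiteSuperior

END

If the session dies, rerunning the block picks up from the last committed batch instead of starting over.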
UPDATE partitioned by date field
In the example below, I'm using a date column to build the ranges of values that will be updated. The @Aumento variable defines the number of days covered by each batch; in this example, each block updates 30 days of data.
DECLARE
    @Min DATE,
    @Max DATE,
    @Contador INT = 0,
    @Aumento INT = 30,                    -- batch size: 30 days per iteration
    @LimiteInferior DATE = '1900-01-01',
    @LimiteSuperior DATE = '1900-01-01',
    @Msg VARCHAR(MAX)

-- The boundaries of the date column define the full range to process
SELECT
    @Min = MIN(Dt_Cadastro),
    @Max = MAX(Dt_Cadastro)
FROM dbo.Tabela

WHILE (@LimiteSuperior < @Max)
BEGIN

    SET @LimiteInferior = DATEADD(DAY, (@Contador * @Aumento), @Min)
    SET @LimiteSuperior = DATEADD(DAY, @Aumento, @LimiteInferior)

    -- Update only the rows whose Dt_Cadastro falls inside the current window
    UPDATE A
    SET Dt_Registro = (CASE
                           WHEN ISNUMERIC(A.Data_Registro_String) = 1
                               THEN CONVERT(DATE, DATEADD(DAY, TRY_CONVERT(INT, A.Data_Registro_String), '1900-01-01'))
                           WHEN LEN(A.Data_Registro_String) = 10
                                AND TRY_CONVERT(DATE, A.Data_Registro_String, 103) IS NOT NULL
                               THEN CONVERT(DATE, A.Data_Registro_String, 103)
                           ELSE CONVERT(DATE, A.Data_Registro_String)
                       END)
    FROM dbo.Tabela A
    WHERE Dt_Cadastro >= @LimiteInferior
      AND Dt_Cadastro < @LimiteSuperior

    SET @Contador += 1

    SET @Msg = CONCAT('Processing data in range ', CONVERT(VARCHAR(10), @LimiteInferior, 103), '-', CONVERT(VARCHAR(10), @LimiteSuperior, 103), '...')
    RAISERROR(@Msg, 1, 1) WITH NOWAIT

END
Result:

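Since tracking progress was one of the motivations, it also helps to know up front how many batches the loop will run, so the messages can say "batch X of Y". A minimal sketch using the same @Min/@Max/@Aumento logic as above:

DECLARE @Min DATE,
        @Max DATE,
        @Aumento INT = 30,
        @TotalLotes INT

SELECT @Min = MIN(Dt_Cadastro), @Max = MAX(Dt_Cadastro) FROM dbo.Tabela

-- Number of 30-day windows needed to cover the whole range
SET @TotalLotes = (DATEDIFF(DAY, @Min, @Max) / @Aumento) + 1

PRINT CONCAT('Total batches to process: ', @TotalLotes)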
DELETE TOP(N) partitioned using percentage
In the example below, I'm deleting 10% of my table on each iteration. Since the data is deleted as I go, I don't need to track ranges; I just keep deleting 10% at a time.
DECLARE
    @Msg VARCHAR(MAX),
    @Qt_Linhas INT

WHILE (1=1)
BEGIN

    -- Delete 10% of the rows that match the filter on each iteration
    DELETE TOP (10) PERCENT
    FROM dbo.Tabela
    WHERE [Status] = 6

    SET @Qt_Linhas = @@ROWCOUNT

    -- Stop when there is nothing left to delete
    IF (@Qt_Linhas = 0)
        BREAK

    SET @Msg = CONCAT('Rows deleted: ', @Qt_Linhas)
    RAISERROR(@Msg, 1, 1) WITH NOWAIT

END
Result:

Observation: This solution can be problematic on very large tables, since 10% may represent a very large volume of rows per batch. And once few records are left, the 10% batches become so small that it can take many iterations to finish deleting.
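A way around this is to convert the percentage into a fixed row count once, before the loop, so every batch has a predictable size. A minimal sketch (still using the example's 10% and [Status] filter):

DECLARE @Qt_Lote BIGINT

-- Compute 10% of the matching rows a single time, before looping
SELECT @Qt_Lote = COUNT_BIG(*) / 10
FROM dbo.Tabela
WHERE [Status] = 6

IF (@Qt_Lote < 1) SET @Qt_Lote = 1

WHILE (1=1)
BEGIN

    -- TOP accepts a variable, so the batch size stays constant
    DELETE TOP (@Qt_Lote)
    FROM dbo.Tabela
    WHERE [Status] = 6

    IF (@@ROWCOUNT = 0)
        BREAK

END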

DELETE TOP(N) partitioned using number of rows
In the example below, I'm deleting 500,000 rows from my table on each iteration. Since the data is deleted as I go, I don't need to track ranges; I just keep deleting 500,000 rows at a time until no rows are left that meet the filter criteria.
DECLARE
    @Msg VARCHAR(MAX),
    @Qt_Linhas INT

WHILE (1=1)
BEGIN

    -- Delete 500,000 rows that match the filter on each iteration
    DELETE TOP (500000)
    FROM dbo.Tabela
    WHERE [Status] = 6

    SET @Qt_Linhas = @@ROWCOUNT

    -- Stop when there is nothing left to delete
    IF (@Qt_Linhas = 0)
        BREAK

    SET @Msg = CONCAT('Rows deleted: ', @Qt_Linhas)
    RAISERROR(@Msg, 1, 1) WITH NOWAIT

END
Result:

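Before running any of these loops against a live workload, two things are worth checking. First, that the filter column is indexed, so each TOP batch doesn't rescan the whole table; second, whether a short pause between batches would help concurrent sessions, which ties back to the lock concern from the introduction. A rough sketch (the index name and the 5-second delay are arbitrary examples):

-- Hypothetical supporting index for the [Status] filter
CREATE NONCLUSTERED INDEX IX_Tabela_Status
    ON dbo.Tabela ([Status])
GO

WHILE (1=1)
BEGIN

    DELETE TOP (500000)
    FROM dbo.Tabela
    WHERE [Status] = 6

    IF (@@ROWCOUNT = 0)
        BREAK

    -- Give concurrent sessions a chance to run before the next batch
    WAITFOR DELAY '00:00:05'

END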
And that's it folks!
I hope you liked it, and a big hug!
Good tips. But when deleting, I always use TRUNCATE TABLE, since you're going to delete everything anyway! The performance is much better too: it uses fewer log resources, doesn't fire triggers, and deallocates all of the table's pages.
Hi Colli, good idea, but the idea of the post is purging data, where not all of the data gets deleted, so TRUNCATE wouldn't work there.
Good tips. But for deletes, so as not to waste time running a COUNT in the loop condition, I like to put 1=1 in the WHILE and, after the DELETE, add IF @@ROWCOUNT = 0 BREAK. The initial counts can take a long time if the table is very large, or if you have a WHERE clause that deletes only part of the table.
Good idea! It really does get faster.