Hello everybody!
In this post, I would like to demonstrate to you how to use the MERGE command, available since SQL Server 2008, to perform INSERT and UPDATE commands between two tables in a single statement.
For the examples in this post, I will use a base that I will create using the script below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
IF (OBJECT_ID('dbo.Venda') IS NOT NULL) DROP TABLE dbo.Venda CREATE TABLE dbo.Venda ( Id_Venda INT IDENTITY(1,1) NOT NULL, Dt_Venda DATE NOT NULL, Id_Produto INT NOT NULL, Quantidade INT NOT NULL, Valor NUMERIC(9, 2) NOT NULL ) INSERT INTO dbo.Venda ( Dt_Venda, Id_Produto, Quantidade, Valor ) VALUES ( '2018-09-21', 1, 2, 25.99 ), ( '2018-09-21', 2, 3, 29.99 ), ( '2018-09-21', 1, 1, 15.99 ), ( '2018-09-22', 1, 2, 25.99 ), ( '2018-09-22', 2, 1, 10.00 ), ( '2018-09-22', 9, 1, 35.99 ), ( '2018-09-22', 5, 3, 20.00 ), ( '2018-09-22', 3, 2, 25.87 ), ( '2018-09-22', 2, 1, 11.25 ), ( '2018-09-22', 1, 1, 21.90 ), ( '2018-09-22', 4, 3, 29.99 ) IF (OBJECT_ID('dbo.Dim_Venda') IS NOT NULL) DROP TABLE dbo.Dim_Venda CREATE TABLE dbo.Dim_Venda ( Id_Venda INT NOT NULL, Dt_Venda DATE NOT NULL, Id_Produto INT NOT NULL, Quantidade INT NOT NULL, Valor NUMERIC(9, 2) NOT NULL ) INSERT INTO dbo.Dim_Venda ( Id_Venda, Dt_Venda, Id_Produto, Quantidade, Valor ) SELECT Id_Venda, Dt_Venda, Id_Produto, Quantidade, Valor FROM dbo.Venda WHERE Dt_Venda = '2018-09-21' |
And our source and destination tables look like this:
Introduction
A very common day-to-day activity for data workers, especially those working with ETL, the task of updating a table based on data from another table often ends up generating distinct 2 data update operations:
1 Method: Insert / Update
- 1 Activity: Update the records that exist in both tables
- 2 Activity: Insert Records That Only Exist in Source Table
Let's see how we would do this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
-- Atividade 1: Atualizar os registros que existem nas duas tabelas UPDATE A SET A.Dt_Venda = B.Dt_Venda, A.Id_Produto = B.Id_Produto, A.Quantidade = B.Quantidade, A.Valor = B.Valor FROM dbo.Dim_Venda A JOIN dbo.Venda B ON A.Id_Produto = B.Id_Venda -- Atividade 2: Inserir os registros que só existem na tabela de origem INSERT INTO dbo.Dim_Venda ( Id_Venda, Dt_Venda, Id_Produto, Quantidade, Valor ) SELECT A.Id_Venda, A.Dt_Venda, A.Id_Produto, A.Quantidade, A.Valor FROM dbo.Venda A LEFT JOIN dbo.Dim_Venda B ON A.Id_Venda = B.Id_Venda WHERE B.Id_Venda IS NULL SELECT * FROM dbo.Dim_Venda SELECT * FROM dbo.Venda |
And how did the table data look?
2 Method: Delete / Insert
- 1 Activity: Deletes the records that exist in both tables
- 2 Activity: Inserting Records That From Source Table
Which we can understand how it works through the script below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
-- Atividade 1: Apaga os registros que existem nas duas tabelas DELETE A FROM dbo.Dim_Venda A JOIN dbo.Venda B ON A.Id_Produto = B.Id_Venda -- Atividade 2: Inserir os registros que só existem na tabela de origem INSERT INTO dbo.Dim_Venda ( Id_Venda, Dt_Venda, Id_Produto, Quantidade, Valor ) SELECT A.Id_Venda, A.Dt_Venda, A.Id_Produto, A.Quantidade, A.Valor FROM dbo.Venda A LEFT JOIN dbo.Dim_Venda B ON A.Id_Venda = B.Id_Venda WHERE B.Id_Venda IS NULL SELECT * FROM dbo.Dim_Venda SELECT * FROM dbo.Venda |
Using the MERGE command for INSERT + UPDATE + DELETE
So far, I have not presented any news, right? DELETE / INSERT and UPDATE / INSERT commands are very common in everyday data professionals.
However, to solve this problem, from SQL Server 2008 we can use the MERGE command, which allows us to perform this same activity in just 1 statement. Its operation is simple: One or more columns of the tables involved are considered keys (identifiers) so if the key value exists in the target table, the values will be updated according to the source table. If this identifier does not exist, this record will be inserted into the target table.
Your syntax works like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
MERGE dbo.Dim_Venda AS Destino USING dbo.Venda AS Origem ON (Origem.Id_Venda = Destino.Id_Venda) -- Registro existe nas 2 tabelas WHEN MATCHED THEN UPDATE SET Destino.Dt_Venda = Origem.Dt_Venda, Destino.Id_Produto = Origem.Id_Produto, Destino.Quantidade = Origem.Quantidade, Destino.Valor = Origem.Valor -- Registro não existe no destino. Vamos inserir. WHEN NOT MATCHED THEN INSERT VALUES(Origem.Id_Venda, Origem.Dt_Venda, Origem.Id_Produto, Origem.Quantidade, Origem.Valor); |
Controlling the NOT MATCHED Clause on Source and Destination
We can also control when a record does not exist at the destination or source, and make different decisions for these 2 situations:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
MERGE dbo.Dim_Venda AS Destino USING dbo.Venda AS Origem ON (Origem.Id_Venda = Destino.Id_Venda) -- Registro existe nas 2 tabelas WHEN MATCHED THEN UPDATE SET Destino.Dt_Venda = Origem.Dt_Venda, Destino.Id_Produto = Origem.Id_Produto, Destino.Quantidade = Origem.Quantidade, Destino.Valor = Origem.Valor -- Registro não existe na origem, apenas no destino. Vamos apagar o dado da origem. WHEN NOT MATCHED BY SOURCE THEN DELETE -- Registro não existe no destino. Vamos inserir. WHEN NOT MATCHED BY TARGET THEN INSERT VALUES(Origem.Id_Venda, Origem.Dt_Venda, Origem.Id_Produto, Origem.Quantidade, Origem.Valor); |
Updating specific values through specific filters
You may only need to update the quantity and value, and where these differ:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
MERGE dbo.Dim_Venda AS Destino USING dbo.Venda AS Origem ON (Origem.Id_Venda = Destino.Id_Venda) -- Registro existe nas 2 tabelas WHEN MATCHED AND (Origem.Quantidade <> Destino.Quantidade OR Origem.Valor <> Destino.Valor) THEN UPDATE SET Destino.Quantidade = Origem.Quantidade, Destino.Valor = Origem.Valor -- Registro não existe no destino. Vamos inserir. WHEN NOT MATCHED THEN INSERT VALUES(Origem.Id_Venda, Origem.Dt_Venda, Origem.Id_Produto, Origem.Quantidade, Origem.Valor); |
Displaying output with all changes made by MERGE
What if we want to collect MERGE results and display the operations performed on the screen? Just use OUTPUT!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
MERGE dbo.Dim_Venda AS Destino USING dbo.Venda AS Origem ON (Origem.Id_Venda = Destino.Id_Venda) -- Registro existe nas 2 tabelas WHEN MATCHED THEN UPDATE SET Destino.Dt_Venda = Origem.Dt_Venda, Destino.Id_Produto = Origem.Id_Produto, Destino.Quantidade = Origem.Quantidade, Destino.Valor = Origem.Valor -- Registro não existe na origem, apenas no destino. Vamos apagar o dado da origem. WHEN NOT MATCHED BY SOURCE THEN DELETE -- Registro não existe no destino. Vamos inserir. WHEN NOT MATCHED BY TARGET THEN INSERT VALUES(Origem.Id_Venda, Origem.Dt_Venda, Origem.Id_Produto, Origem.Quantidade, Origem.Valor) OUTPUT $action, Deleted.Id_Venda, Deleted.Dt_Venda, Deleted.Id_Produto, Deleted.Quantidade, Deleted.Valor, Inserted.Id_Venda, Inserted.Dt_Venda, Inserted.Id_Produto, Inserted.Quantidade, Inserted.Valor; |
Storing MERGE Output in a Table
Would you like to save log records to a table to query the data?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 |
CREATE TABLE dbo.Dim_Venda_Log ( [Id_Log] BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, [Acao] NVARCHAR(10), [Inserted_Id_Venda] INT, [Inserted_Dt_Venda] DATE, [Inserted_Id_Produto] INT, [Inserted_Quantidade] INT, [Inserted_Valor] DECIMAL(9, 2), [Deleted_Id_Venda] INT, [Deleted_Dt_Venda] DATE, [Deleted_Id_Produto] INT, [Deleted_Quantidade] INT, [Deleted_Valor] DECIMAL(9, 2) ) WITH(DATA_COMPRESSION=PAGE) MERGE dbo.Dim_Venda AS Destino USING dbo.Venda AS Origem ON (Origem.Id_Venda = Destino.Id_Venda) -- Registro existe nas 2 tabelas WHEN MATCHED THEN UPDATE SET Destino.Dt_Venda = Origem.Dt_Venda, Destino.Id_Produto = Origem.Id_Produto, Destino.Quantidade = Origem.Quantidade, Destino.Valor = Origem.Valor -- Registro não existe na origem, apenas no destino. Vamos apagar o dado da origem. WHEN NOT MATCHED BY SOURCE THEN DELETE -- Registro não existe no destino. Vamos inserir. WHEN NOT MATCHED BY TARGET THEN INSERT VALUES(Origem.Id_Venda, Origem.Dt_Venda, Origem.Id_Produto, Origem.Quantidade, Origem.Valor) OUTPUT $action, Inserted.*, Deleted.* INTO dbo.Dim_Venda_Log; SELECT * FROM dbo.Dim_Venda_Log |
Using queries as source data
MERGE also allows us to gather data from queries instead of fixed tables, and use JOINS if necessary.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
MERGE dbo.Dim_Venda AS Destino USING (SELECT Id_Venda, Dt_Venda, Id_Produto, Quantidade, Valor FROM dbo.Venda WHERE Dt_Venda = '2018-09-22') AS Origem ON (Origem.Id_Venda = Destino.Id_Venda) -- Registro existe nas 2 tabelas WHEN MATCHED THEN UPDATE SET Destino.Dt_Venda = Origem.Dt_Venda, Destino.Id_Produto = Origem.Id_Produto, Destino.Quantidade = Origem.Quantidade, Destino.Valor = Origem.Valor -- Registro não existe na origem, apenas no destino. Vamos apagar o dado da origem. WHEN NOT MATCHED BY SOURCE THEN DELETE -- Registro não existe no destino. Vamos inserir. WHEN NOT MATCHED BY TARGET THEN INSERT VALUES(Origem.Id_Venda, Origem.Dt_Venda, Origem.Id_Produto, Origem.Quantidade, Origem.Valor) OUTPUT $action, Inserted.*, Deleted.*; |
Result - see that, as you only brought the data for 22/09 on MERGE, the data for 21/09 that were manually loaded before MERGE were deleted from the destination table, as they did not exist at the source
Using CTE with JOIN to generate source data
Merge also lets you use CTE to generate the data source resultset and use JOINS if necessary.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
;WITH cteVenda AS ( SELECT A.Id_Venda, A.Dt_Venda, A.Id_Produto, A.Quantidade, A.Valor, B.Id_Erro, B.Dt_Erro, B.Ds_Erro FROM dbo.Venda A LEFT JOIN Consultoria.dbo.Log_Erro B ON A.Id_Produto = B.Id_Erro WHERE A.Dt_Venda = '2018-09-21' ) MERGE dbo.Dim_Venda AS Destino USING cteVenda AS Origem ON (Origem.Id_Venda = Destino.Id_Venda) -- Registro existe nas 2 tabelas WHEN MATCHED THEN UPDATE SET Destino.Dt_Venda = Origem.Dt_Venda, Destino.Id_Produto = Origem.Id_Produto, Destino.Quantidade = Origem.Quantidade, Destino.Valor = Origem.Valor -- Registro não existe na origem, apenas no destino. Vamos apagar o dado da origem. WHEN NOT MATCHED BY SOURCE THEN DELETE -- Registro não existe no destino. Vamos inserir. WHEN NOT MATCHED BY TARGET THEN INSERT VALUES(Origem.Id_Venda, Origem.Dt_Venda, Origem.Id_Produto, Origem.Quantidade, Origem.Valor) OUTPUT $action, Inserted.*, Deleted.*; |
Cautions when using MERGE
Now that I have demonstrated several ways to use MERGE on SQL Server, I would like to warn you that there are some bugs when using MERGE on SQL Server 2008, ranging from incorrect output when using OUTPUT to severe errors like the one below. :
Msg 0, Level 11, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
If you are using the 2008 (STILL) version, update your SQL with the latest SP and Cumulative Update version available and carefully read the articles I will make available below, as they have several scenarios that can create a headache when using the MERGE on SQL Server 2008 (in 2012 + versions these bugs no longer exist):
- http://sqlblog.com/blogs/paul_white/archive/2010/08/04/another-interesting-merge-bug.aspx
- http://andreyzavadskiy.com/2015/07/23/sql-server-2008r2-causes-severe-error-in-merge-output/
- https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/
Using the stored procedure stpETL_Upsert
To make MERGE easier to use, especially in tables with many columns, I will share with you the Stored Procedure stpETL_Upsert, originally created by Murilo Mielke and adapted by me, which aims to facilitate the use of MERGE in everyday situations.
SP Parameters:
- @Nm_Source: Parameter of type VARCHAR (MAX), it contains the name of the data source table (source). Table name must be specified in full form (database.schema.table)
- @Nm_Target: Parameter of type VARCHAR (MAX), it contains the name of the data target table (target), ie where the data will be inserted / updated. Table name must be specified in full form (database.schema.table)
- @Cd_Join: Parameter of type VARCHAR (MAX), the column or columns of the JOIN must be informed to link the data from the source (source) and destination (target) table. For simplified use, you can specify only the column names, if it is the same between the source and the destination, separating with a comma "," if there is more than one column to perform the JOIN. If the name of the columns between the source and the destination are different, the format must be used: origin_column_name: destination_column_name, also using the comma separator "," in case of multiple columns
- @Cd_Key: Parameter of type VARCHAR (MAX), the name of the sequential column (IDENTITY) of the target table must be entered to prevent SP from attempting an UPDATE or INSERT using that column. If you have no IDENTITY column in the target, you can ignore this parameter since its default value is already empty.
- @Fl_Update: Parameter of type BIT, with default value = 1, this column tells SP whether records in the target table should be updated if they exist at source and destination (@Fl_Update = 1) or not.
- @Fl_Debug: BIT type parameter that causes the SP to display in the output window the MERGE command generated at the end of its execution.
Example 1 - Simple use
Using the same scenario as the examples in this article, let's use stpETL_Upset to perform our MERGE:
1 2 3 4 5 6 7 8 9 10 11 |
SELECT * FROM dbo.Dim_Venda EXEC dbo.stpETL_Upsert @Nm_Source = 'dirceuresende.dbo.Venda', -- varchar(max) @Nm_Target = 'dirceuresende.dbo.Dim_Venda', -- varchar(max) @Cd_Join = 'Id_Venda', -- varchar(max) @Cd_Chave = '', -- varchar(max) @Fl_Update = 1, -- bit @Fl_Debug = 1 -- bit SELECT * FROM dbo.Dim_Venda |
Example 2 - Using multiple columns for the JOIN
If you need to use several columns to JOIN the data, just separate them with a comma “,” in the SP call:
1 2 3 4 5 6 7 |
EXEC dbo.stpETL_Upsert @Nm_Source = 'dirceuresende.dbo.Venda', -- varchar(max) @Nm_Target = 'dirceuresende.dbo.Dim_Venda', -- varchar(max) @Cd_Join = 'Id_Venda, Id_Produto, Dt_Venda', -- varchar(max) @Cd_Chave = '', -- varchar(max) @Fl_Update = 1, -- bit @Fl_Debug = 1 -- bit |
Example 3 - Using JOINS with different columns
If the JOIN columns do not have exactly the same name or you want to use more than one JOIN clause, you can use the SP as shown below, using the syntax column_name_table: column_name_table_destination, and you can also use the comma character "," to use multiple mappings.
In the example below, the source table "Sale" has a field called Sales_ID, which will perform the JOIN with the column Cod_Venda in the destination table "Dim_Venda".
1 2 3 4 5 6 7 |
EXEC dbo.stpETL_Upsert @Nm_Source = 'dirceuresende.dbo.Venda', -- varchar(max) @Nm_Target = 'dirceuresende.dbo.Dim_Venda', -- varchar(max) @Cd_Join = 'Id_Venda:Cod_Venda', -- varchar(max) @Cd_Chave = '', -- varchar(max) @Fl_Update = 1, -- bit @Fl_Debug = 1 -- bit |
Source code
Did you like stpETL_Upset? Good practice right? I will make the source code of this Stored Procedure available below so that you can start using it in your daily life. Don't forget to create the fncSplit and fncSplitText objects that are in the requirements huh, otherwise it will give error and you will complain to me in the comments .. kkkkk
View stpETL_Upsert source code
Requirements:
References
- https://docs.microsoft.com/pt-br/sql/t-sql/statements/merge-transact-sql?view=sql-server-2017
- https://www.mssqltips.com/sqlservertip/1704/using-merge-in-sql-server-to-insert-update-and-delete-at-the-same-time/
Well guys!
I hope you enjoyed this post and see you next time.
Big hug!
Very good, Dirceu! It showed some usage situations that are not very addressed and helped me a lot! ..
Excellent post. I use the merge in a few situations, but it is always very practical.
Excellent post Dirceu, we recently had a challenge of using merge in a job to do SCD 2 load without tool etl, he answered magnificently. This procedure of Murilo is show !! Abs