Speak guys!
In this post I would like to share with you the operation of an operator that I rarely see anyone using, which is CROSS JOIN, and which in certain scenarios may be useful.
What is CROSS JOIN?
CROSS JOIN returns the combination of each row between the tables involved, also known as Cartesian Product, which I will demonstrate below for ease of understanding.
1 Example
In this example I will create 2 tables and cross data from the two tables to create a Cartesian product by combining each row of the #Sorbet table with each row of the #Complement table (yes, I feel like eating ice cream).
Script for data generation from this example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | IF (OBJECT_ID('tempdb..#Sorvete') IS NOT NULL) DROP TABLE #Sorvete CREATE TABLE #Sorvete ( Nome VARCHAR(60) ) INSERT INTO #Sorvete VALUES ('Chocolate'), ('Baunilha'), ('Morango') GO IF (OBJECT_ID('tempdb..#Complemento') IS NOT NULL) DROP TABLE #Complemento CREATE TABLE #Complemento ( Nome VARCHAR(60) ) GO INSERT INTO #Complemento VALUES ('Paçoca'), ('Granulado'), ('Castanha'), ('Nutella') GO SELECT * FROM #Sorvete SELECT * FROM #Complemento |
Well, now let's create the Cartesian product between the two tables:
The only way to create a Cartesian product is by using CROSS JOIN?
As we have seen, the Cartesian product creates all possible combinations between the data involved. Since in our example it was a table of 3 records in a CROSS JOIN operation with another table of 4 records, our final result was 12 rows (3 x 4).
If each table had 1.000 records, the end result would be 1 million records (1.000 x 1.000), so the Cartesian product is generally not desirable in a JOIN operation, and can cause performance issues if it occurs unwanted. (as in a wrong join clause, for example)
2 Example
In this example, I will demonstrate a practical scenario where the Cartesian product justifies its use, using the organizers and speakers of the 1 SQL Day ES.
Script for data generation from this example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | IF (OBJECT_ID('tempdb..#Pessoas') IS NOT NULL) DROP TABLE #Pessoas CREATE TABLE #Pessoas ( Id_Pessoa INT IDENTITY(1,1) NOT NULL PRIMARY KEY, Nome VARCHAR(60) NOT NULL, Sexo VARCHAR(1) NOT NULL, Estado VARCHAR(2) NOT NULL, Cidade VARCHAR(60) NOT NULL ) GO INSERT INTO #Pessoas (Nome, Sexo, Estado, Cidade) VALUES ('Dirceu', 'M', 'ES', 'Vitória'), ('Vithor', 'M','ES', 'Vila Velha'), ('Fabrício', 'M','ES', 'Vila Velha'), ('Tiago', 'M','ES', 'Serra'), ('Caio', 'M','SP', 'São José dos Campos'), ('Cláudio', 'M','RJ', 'Rio de Janeiro'), ('Daniel', 'M','ES', 'Vila Velha'), ('Felippe', 'M','RJ', 'Rio de Janeiro'), ('Jéssica', 'F','SP', 'São José dos Campos'), ('Raiane', 'F','DF', 'Brasília'), ('Reginaldo', 'M','SP', 'Boituva'), ('Sulamita', 'F', 'MG', 'Belo Horizonte') |
Now, I want to show you how to create a report that shows me gender, state, and how many people we have in this grouping:
1 2 3 4 5 6 7 8 9 10 11 12 13 | SELECT Estado, Cidade, Sexo, COUNT(*) AS Qt_Pessoas FROM #Pessoas GROUP BY Estado, Cidade, Sexo ORDER BY 1, 2, 3 |
As you may have noticed, only the existing records in the table are returned. For example, in the state of "DF", only results with the sex "F" returned, because in our table, there are only people with the female sex in the DF state.
But what if I want to show also genres that do not have people registered in the table? How do I do? Well, for starters, let's use CTE (Common Table Expressions) to match gender with cities and states and create all possible combinations:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | ;WITH cteEstados AS ( SELECT DISTINCT Estado, Cidade FROM #Pessoas ), cteSexo AS ( SELECT DISTINCT Sexo FROM #Pessoas ) SELECT * FROM cteEstados CROSS JOIN cteSexo ORDER BY 1, 2, 3 |
Now, let's modify this script to get us the number of people from each of the combinations, which is what we want to show in the result of our query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | ;WITH cteEstados AS ( SELECT DISTINCT Estado, Cidade FROM #Pessoas ), cteSexo AS ( SELECT DISTINCT Sexo FROM #Pessoas ), cteQuantidade AS ( SELECT Cidade, Estado, Sexo, COUNT(*) AS Quantidade FROM #Pessoas GROUP BY Cidade, Estado, Sexo ) SELECT cteEstados.Estado, cteEstados.Cidade, cteSexo.Sexo, ISNULL(cteQuantidade.Quantidade, 0) AS Quantidade FROM cteEstados CROSS JOIN cteSexo LEFT JOIN cteQuantidade ON cteEstados.Cidade = cteQuantidade.Cidade AND cteEstados.Estado = cteQuantidade.Estado AND cteQuantidade.Sexo = cteSexo.Sexo ORDER BY 1, 2, 3 |
Conclusion
Well guys, I hope I was able to demonstrate that the Cartesian product can indeed have its usefulness in developing T-SQL queries. To create the 2 example without using it, we would have to create cursor or while loop, which has a bad performance and the routine ends up being much more complex.
Although I have shown good use of this feature, be very careful when using it, especially with large data, as CROSS JOIN or Cartesian product creates all possible combinations between 2 or more ResultSets, crossing each table row. A with each row of table B.
Many times, we observe JOIN's (INNER, LEFT, RIGHT) badly constructed and that end up making an unintended Cartesian product, generating many unwanted lines and bringing a performance problem in the consultation.
That said, I close this post.
A big hug to you and until next time.
Congratulations on the tips! Helped me a lot! Big hug and all the success in the world to you!
Excellent post, you explain with clarity and clear examples! Thanks
Does the use of the character '#' in the table name mean that you are creating a temporary table?
Hello Marcos. Good night.
Yes, that's right. In these examples I used temporary tables.