Speak guys!
In this article I will demonstrate to you when to use ORDER BY and when we should not use it at all, because it has no effect on practice and just makes our query take longer and more resource consuming.
The main purpose of this article is to break the myth that data is physically ordered in the table when you INSERT… FROM SELECT and ORDER BY, causing many programmers to insist on using ORDER BY in INSERT operations, a scenario that I find a lot on consulting clients and is a lot more common than I would like.
To begin with, I will make a very simple test of this by creating a new table and inserting the data into the table from a query in sys.objects:
1 2 3 4 5 6 7 8 9 10 11 | CREATE TABLE dbo.Post_OrderBy ( Id INT IDENTITY(1,1) NOT NULL, Nome VARCHAR(900) NOT NULL ) INSERT INTO dbo.Post_OrderBy SELECT [name] FROM sys.objects ORDER BY name SELECT * FROM dbo.Post_OrderBy |
And when we analyze the results, we see that even using ORDER BY on INSERT, the records are not being sorted:
Well, the myth has been demystified. That theory is worth smashing into INSERT so that you don't have to use ORDER BY in SELECT. That is, in such a scenario, you should NEVER use ORDER BY to insert data into a table unless you have a SELECT TOP clause.
Now I will explain why this happens. As I explained in my article Understanding Index Functioning in SQL Server, who is responsible for sorting these records is the index, since indexes often use algorithms like QuickSort, which require the data to be sorted for maximum performance in Seek operations.
To demonstrate how the index sorts the data, I will create one clustered and one nonclustered index on the table and then query the data:
1 2 | CREATE CLUSTERED INDEX SK01_Post_OrderBy ON dbo.Post_OrderBy(Id) WITH(FILLFACTOR=100) CREATE NONCLUSTERED INDEX SK02_Post_OrderBy ON dbo.Post_OrderBy(Nome) WITH(FILLFACTOR=100) |
See how the data got sorted? Very cool, right 🙂
So can we trust that indexes will keep my table data always sorted and no longer need to use ORDER BY even in queries? NOT!! Indexes are subject to data fragmentation as they are inserted / deleted / updated, losing order in index pages.
In the example of the index I created above, I specified the index FILLFACTOR as 100%, that is, the index will sort all data during creation and will store it in an orderly fashion, trying to occupy all index pages (each page contains 8 KB, between data and headers), leaving no vacant space for any new data that is inserted / updated in the table. That is, since SQL Server has no free space in the index, if I insert a new record in this table, that record will not be inserted in order but at the end of the last page (or a new page will be created for it if all already full).
After creating my index, where he ordered the records, the fragmentation of my indexes should be very close to zero:
1 2 3 4 5 6 7 8 9 10 11 12 | SELECT OBJECT_NAME(B.object_id) AS TableName, B.name AS IndexName, A.index_type_desc AS IndexType, A.avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') A INNER JOIN sys.indexes B WITH(NOLOCK) ON B.object_id = A.object_id AND B.index_id = A.index_id WHERE OBJECT_NAME(B.object_id) = 'Post_OrderBy' ORDER BY A.avg_fragmentation_in_percent DESC |
And now let's force data fragmentation using the T-SQL command below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | -- Força a fragmentação do índice DECLARE @Contador INT = 1, @Total INT = (SELECT COUNT(*) FROM dbo.Post_OrderBy) WHILE(@Contador <= @Total) BEGIN UPDATE dbo.Post_OrderBy SET Nome = REPLICATE(CHAR(65 + (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 57), (ABS(CHECKSUM(PWDENCRYPT(N''))) / 2147483647.0) * 10) WHERE Id = @Contador INSERT INTO dbo.Post_OrderBy SELECT [name] FROM sys.objects WHERE SUBSTRING([name], 1, 1) NOT IN ('D', 'H', 'M', 'T', 'S') SET @Contador += 1 END |
Analyzing the level of index fragmentation:
Did my data remain orderly with this level of fragmentation?
Vixi .. Now messed up everything. To fix this fragmentation problem caused by the script I used above, we can use the operations of REORGANIZE (Recommendation: 5 to 30% fragmentation) and REBUILD (Recommendation: Above 30% fragmentation) to reorder the data in the index:
1 | ALTER INDEX ALL ON dbo.Post_OrderBy REBUILD |
Or you can REBUILD / REORGANIZE on individual indexes:
1 2 | ALTER INDEX SK01_Post_OrderBy ON dbo.Post_OrderBy REBUILD ALTER INDEX SK02_Post_OrderBy ON dbo.Post_OrderBy REBUILD |
As a result, our fragmentation diminished:
And the data is sorted again. But can we rely only on index ordering to return ordered data? The answer is no! THE ONLY The reliable way to return data in an orderly manner is through ORDER BY.
But here comes the key question: Do you really need to sort this data in the bank? In many scenarios (not to say most), the answer is NO. The data can be perfectly consulted in the bank without ordering, returned to the application and ordered there, in the user interface. The problem with this approach is that it “takes more work” for the Developer, and therefore, we hardly see this scenario happen on a daily basis, which is a pity, as it is the ideal scenario, as it does not change anything for the end user and the The bank is not burdened with several sorting operations in “heavy” queries.
Are there any scenarios where ORDER BY in the bank query is justified? Yes for sure! Especially in scenarios where there is a TOP operator to return the top / bottom TOP N records according to some sort criteria. In this case, ORDER BY is indicated as it is better to sort a table of 10 million records in the bank and return the largest 10 than to return all 10 million records and filter / sort it in the application.
Well guys, I hope you enjoyed this article and were able to clarify some questions about indexes, fragmentation, fillfactor (later I will create a more detailed post about it) and when ORDER BY should be used and when should NEVER be used.
If you are a Developer, I urge you to consider removing ORDER BY from queries and start sorting these records at the application layer instead of the database. DBA and Database Thank You 🙂
Do you like studying about Performance Tuning? So don't waste your time and start reading my Series articles Performance tuning and I hope you finish these readings by taking another look at your database.
A big hug to you and until the next post!