SQL Server - When you should use ORDER BY in the query and when you should not use it at all!

Views: 2.177
Reading Time: 5 minutes

Hey guys!
In this article I will demonstrate to you when to use ORDER BY and when we should not use it at all, because it has no effect on practice and just makes our query take longer and more resource consuming.

The main purpose of this article is to break the myth that data is physically ordered in the table when you do INSERT… FROM SELECT and ORDER BY, causing many programmers to insist on using ORDER BY in INSERT operations, a scenario I find. quite a lot in consulting clients and is far more common than I would like.

To begin with, I will make a very simple test of this by creating a new table and inserting the data into the table from a query in sys.objects:

And when we analyze the results, we see that even using ORDER BY on INSERT, the records are not being sorted:

Well, the myth has been demystified. That theory is worth smashing into INSERT so that you don't have to use ORDER BY in SELECT. That is, in such a scenario, you should NEVER use ORDER BY to insert data into a table unless you have a SELECT TOP clause.

Now I will explain why this happens. As I explained in my article Understanding Index Functioning in SQL Server, who is responsible for sorting these records is the index, since indexes often use algorithms like QuickSort, which require the data to be sorted for maximum performance in Seek operations.

To demonstrate how the index sorts the data, I will create one clustered and one nonclustered index on the table and then query the data:

Query Result:

See how the data got sorted? Very cool, right 🙂

So can we trust that indexes will keep my table data always sorted and no longer need to use ORDER BY even in queries? NOT!! Indexes are subject to data fragmentation as they are inserted / deleted / updated, losing order in index pages.

In the example of the index I created above, I specified the index FILLFACTOR as 100%, that is, the index will sort all data during creation and will store it in an orderly fashion, trying to occupy all index pages (each page contains 8 KB, between data and headers), leaving no vacant space for any new data that is inserted / updated in the table. That is, since SQL Server has no free space in the index, if I insert a new record in this table, that record will not be inserted in order but at the end of the last page (or a new page will be created for it if all already full).

After creating my index, where he ordered the records, the fragmentation of my indexes should be very close to zero:

Result:

And now let's force data fragmentation using the T-SQL command below:

Analyzing the level of index fragmentation:

Did my data remain orderly with this level of fragmentation?

Vixi .. Now messed up everything. To fix this fragmentation problem caused by the script I used above, we can use the operations of REORGANIZE (Recommendation: 5 to 30% fragmentation) and REBUILD (Recommendation: Above 30% fragmentation) to reorder the data in the index:

Or you can REBUILD / REORGANIZE on individual indexes:

As a result, our fragmentation diminished:

And the data is sorted again. But can we rely only on index ordering to return ordered data? The answer is no! THE ONLY The reliable way to return data in an orderly manner is through ORDER BY.

But here comes the key question: Do you really need to sort this data in the database? In many scenarios (not to say the majority), the answer is NO. Data can be perfectly queried in the unordered database, returned to the application and sorted there in the user interface. The problem with this approach is that it is "more work" for the Developer, so we hardly see this scenario happening on a day to day basis, which is a shame as it is the ideal scenario as it does not change anything for the end user and the user. The bank is not overwhelmed with various sorting operations on “heavy” queries.

Are there any scenarios where ORDER BY in the bank query is justified? Yes for sure! Especially in scenarios where there is a TOP operator to return the top / bottom TOP N records according to some sort criteria. In this case, ORDER BY is indicated as it is better to sort a table of 10 million records in the bank and return the largest 10 than to return all 10 million records and filter / sort it in the application.

Well guys, I hope you enjoyed this article and were able to clarify some questions about indexes, fragmentation, fillfactor (later I will create a more detailed post about it) and when ORDER BY should be used and when should NEVER be used.

If you are a Developer, I urge you to consider removing ORDER BY from queries and start sorting these records at the application layer instead of the database. DBA and Database Thank You 🙂

Do you like studying about Performance Tuning? So don't waste your time and start reading my Series articles Performance tuning and I hope you finish these readings by taking another look at your database.

A big hug to you and until the next post!