Performance problems are a common challenge for developers working with MS SQL relational databases, and they can have many causes. We all know the situations: business users complain that their reports are not running smoothly, a daily process has got stuck and no new data appears in the business application, or data update operations in the application work much more slowly than they should.
The question then arises: where should the investigation begin?
In this article, I will present the basic approaches that can be used to track down these causes.
#1 What is happening in the database?
First of all, it is worth getting familiar with the "sp_whoisactive" procedure.
This procedure, created by Adam Machanic, makes working with databases much easier, and our administrators can install it in our database.
Once we have access to it, we can check information such as:
- what query is currently running on the database, and when was it launched?
- what is its query plan?
- are there any sessions whose queries are blocking others?
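As a sketch, a typical call might look like this (the parameter names below come from the procedure's public documentation; the exact set you use depends on what you want to see):

```sql
-- Show running requests with their plans and highlight blocking chains.
EXEC sp_whoisactive
    @get_plans = 1,            -- include the query plan of each running request
    @find_block_leaders = 1,   -- count how many sessions each session is blocking
    @sort_order = '[blocked_session_count] DESC';  -- worst blockers first
```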
So the first method I like to use is to run this procedure and check what else is happening on the server at the time. Sometimes when a query I run fails, it is because another person or a scheduled job is executing code that blocks access to the objects I need, or because a long-running process is saturating the server's resources.
#2 Check the code and logs
Once we have excluded the external factors above, the next step is to find the piece of code that causes the performance drop.
If logging of executed procedures/processes is available, it is worth reviewing the logs.
The code under review may write log entries after specific operations, so by looking at these logs we can spot which part of the code takes the most time.
Additionally, we can check in the logs whether something else was running in parallel at a given moment of the code's execution. If so, this already suggests that the performance of the code under review may be influenced by other processes that we did not capture in point #1.
If we don't have such logs, unfortunately, we have to run the code piece by piece and check where it gets stuck.
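If the code has no instrumentation yet, a minimal logging sketch could look like the following (the table, process, and step names are made up for illustration):

```sql
-- Hypothetical log table: one row per completed step, with a timestamp.
CREATE TABLE dbo.ProcessLog
(
    LogId    INT IDENTITY(1, 1) PRIMARY KEY,
    Process  SYSNAME       NOT NULL,
    Step     NVARCHAR(100) NOT NULL,
    LoggedAt DATETIME2     NOT NULL DEFAULT SYSDATETIME()
);
GO

-- Inside a long procedure, write a row after each major operation:
INSERT INTO dbo.ProcessLog (Process, Step) VALUES (N'DailyLoad', N'Start');
-- ... step 1: load staging data ...
INSERT INTO dbo.ProcessLog (Process, Step) VALUES (N'DailyLoad', N'Staging loaded');
-- ... step 2: merge into the target table ...
INSERT INTO dbo.ProcessLog (Process, Step) VALUES (N'DailyLoad', N'Merge finished');
```

Comparing consecutive LoggedAt values then shows immediately which step takes the most time.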
#3 Draw information from the query plan
When we find the bottleneck in the code, i.e. the part that should execute faster than it currently does, we can start examining it and testing for the causes of the situation.
! BUT ! It is important to remember that when a problem occurs in production, it is not always possible to test the code 1:1 – I mean mainly code that modifies data. Firstly, we could change this data by mistake, and secondly, by testing the code we could affect the server's performance. Therefore, it is worth being deliberate about what we want to test, and it is also worth trying to recreate the whole situation on a copy of the objects, for example in a test environment.
We can start checking the code by examining the query plan, which we enable from the SQL Server Management Studio toolbar.
The "Display Estimated Execution Plan" option shows the plan that will likely be used when the code executes, while the "Include Live Query Statistics" option shows the actual plan as the code runs.
In the latter case, if after some time the next steps of the plan are still not executing, it is worth looking for the place where the query got stuck: the operators still in progress will be connected by dashed lines. This may point us to, for example, a table from which data is retrieved inefficiently.
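Besides the graphical plan, it can help to collect runtime statistics for the suspicious statement. The SET options below are standard SQL Server features; the table and columns in the query are hypothetical:

```sql
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement
SET STATISTICS IO ON;    -- report logical and physical reads per table

SELECT OrderId, OrderDate            -- hypothetical query under investigation
FROM dbo.Orders
WHERE OrderDate >= '20240101';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The read counts in the Messages tab often point to the same table that the dashed lines do.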
Once you have the execution plan, it is worth paying special attention to details such as:
- Table size and joins
When it comes to table size, performance may suffer if your query hits one or more tables with millions of rows or more.
Joins can also reveal useful information, such as the possibility of duplicates in the join keys, which may result in unexpectedly large amounts of returned data. Your query is likely to be slow if it joins two tables in a way that significantly multiplies the number of rows in the result set. It is worth knowing what types of joins appear in the plan and when they should appear.
What's more, you can learn a lot by checking the aggregations. Adding many rows together to produce a single result requires more processing than simply retrieving the individual values. You should understand the aggregations you see in your execution plan and consider whether there are options to simplify the code you are trying to execute.
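The duplicate-multiplying join mentioned above can be sketched like this (both tables are hypothetical; assume dbo.OrderLines has many rows per OrderId):

```sql
-- BUG SKETCH: joining before aggregating repeats each order's Freight
-- once per order line, so SUM(o.Freight) is inflated.
SELECT o.OrderId, SUM(o.Freight) AS Freight, SUM(l.Quantity) AS Quantity
FROM dbo.Orders AS o
JOIN dbo.OrderLines AS l ON l.OrderId = o.OrderId
GROUP BY o.OrderId;

-- Pre-aggregating the many side first keeps each order counted once
-- and feeds far fewer rows into the join:
SELECT o.OrderId, o.Freight, l.Quantity
FROM dbo.Orders AS o
JOIN (SELECT OrderId, SUM(Quantity) AS Quantity
      FROM dbo.OrderLines
      GROUP BY OrderId) AS l
  ON l.OrderId = o.OrderId;
```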
#4 When data volume is an issue
Apart from the query plan, there are other likely causes worth checking to improve query execution.
Firstly, check whether any indexes are missing or incorrectly defined. Sometimes existing indexes are outdated – fragmented, or with stale statistics – and need maintenance.
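One way to look for candidates is SQL Server's missing-index DMVs. The view and column names below come from the standard system catalog; treat the output as hints to analyze, not as indexes to create blindly:

```sql
-- Most impactful missing-index suggestions recorded since the last restart.
SELECT TOP (20)
    d.statement AS TableName,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,
    s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;
```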
Also check whether more data has flowed into the system than usual. There may be a regular period when the data volume increases, but there can also be issues in related systems that have started sending more data than expected.
Moreover, check the way you transfer data between objects. For example, the temp tables or table variables used so far may no longer be sufficient, and loading data into them may become inefficient with larger data volumes.
This is also a good moment to consider whether the amount of data we currently process is appropriate.
For example, does the report into which we load data daily really have to include the last 5 years of data? Does anyone analyze that much data? Perhaps it would be enough to limit the data at load time and narrow its scope, and the loading speed would increase dramatically.
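As a sketch, limiting the scope at load time can be as simple as adding a date filter (the table and the one-year window are hypothetical):

```sql
-- Load only the window the report actually needs, instead of the full history.
SELECT OrderId, OrderDate, Amount
FROM dbo.Orders
WHERE OrderDate >= DATEADD(YEAR, -1, CAST(GETDATE() AS date));
```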
It is also worth looking for certain keywords in the code itself that can significantly reduce its efficiency. Here are a few that I encounter most often:
- DISTINCT – when we see this keyword, it is worth taking a closer look, because it is sometimes used to quickly cover up duplicates without understanding how the code works, which can ultimately cause unexpected behavior.
- UNION – it differs significantly from UNION ALL because it eliminates duplicates in the merged data, which requires extra work. As in the example above, check whether it is used in the code intentionally.
- SELECT * – it means "add all columns from each joined table to the result" and may pull far more data than needed. It is good practice to avoid this way of reading data in production code.
- WHERE LIKE '%abc%' – searching long text strings this way is slow, because the leading wildcard prevents SQL Server from using an index seek. Check whether such a condition is really needed and, if so, whether you can rewrite the code without it and get the same results faster.
- multiple ORs in the WHERE clause – SQL Server cannot process an OR within a single operation; instead, it evaluates each component of the OR separately, which may lead to poor performance. If we split the query, SQL Server can make use of the indexes and optimize each part. So it is often worth splitting such a query and merging the results, rather than running one query with many disjoint conditions.
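The last point can be illustrated with a sketch (hypothetical table and columns; assume separate indexes on CustomerId and SalesRepId, and that CustomerId is not nullable):

```sql
-- One query with disjoint conditions; the optimizer may fall back to a scan:
SELECT OrderId, OrderDate
FFROM dbo.Orders
WHERE CustomerId = 42 OR SalesRepId = 7;

-- Split version: each branch can use its own index.
-- UNION ALL skips the de-duplication step, so the second branch
-- explicitly excludes rows already returned by the first:
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE CustomerId = 42
UNION ALL
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE SalesRepId = 7
  AND CustomerId <> 42;
```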
Improve your query performance!
The tips above are just the basics from which you can start your adventure with testing the performance of SQL code. The more experience you gain in this area, the easier and faster you will find the causes of such situations. Depending on where you start your code review, different verification techniques may apply, but it is still a good idea to go through these basic steps every time.
Let us help you with your query performance!