Troubleshoot and Capture Blocking on a SQL database

Prerequisites to capture blocking chains is to install Brent Ozar’s First Responder Kit. The tool kit is an excellent resource for SQL troubleshooting.

The below mentioned process uses these scripts when checking out SQL Servers. Sometimes we can run them as needed and you can even have them running from an Agent job and logging to tables on a regular basis (Usually run them every 15 minutes but might run them more or less regularly as needed). It’s a great way to catch performance data over time if you don’t have any other monitoring tools, or even if you do, the data these procs catch can be a great supplement to those tools.

By far the most common cause for poor performance is blocking. Unless you’ve got some monitoring in place it’s near impossible to identify the cause of any historical blocking incidents. If you have scheduled SQL agent job and running the First responder kit regularly into tables, you will have a history of running statement thanks to sp_BlitzWho. That data will contain the SPIDs of any blocking processes.

Now, if you’ve ever had to look at that data and tried to figure out what the blocking chain is then you’ll know that it can be a total headache.

For that reason, we can use BlitzBlocking Chain (courtesy of David Fowler) to clearly show the lead blocker and how the chain is hanging together. By default, the proc expects the data to be in a table named BlitzWho. 

Parameters – The proc takes two parameteres, @StartDate and @EndDate to specify the date range that you want to return. By default the proc will return any blocks recorded for the past hour.

CREATE OR ALTER PROC BlitzBlockingChain
@StartDate AS DATETIME = NULL,
@EndDate AS DATETIME = NULL

AS

BEGIN

IF @StartDate IS NULL
SELECT @StartDate = DATEADD(MINUTE, -60, GETDATE());

IF @EndDate IS NULL
SELECT @EndDate = DATEADD(MINUTE, 60, @StartDate);

WITH BlockChain
( Chain,
ID,
CheckDate,
session_id,
blocking_session_id,
login_name,
host_name,
program_name,
database_name,
elapsed_time,
query_text,
status,
wait_info)

AS
(
SELECT CAST(‘Lead Blocker: ‘ + CAST(BlitzWho.session_id AS VARCHAR) AS VARCHAR(50)) AS Chain,
BlitzWho.ID,
BlitzWho.CheckDate,
BlitzWho.session_id,
BlitzWho.blocking_session_id,
BlitzWho.login_name,
BlitzWho.host_name,
BlitzWho.program_name,
BlitzWho.database_name,
BlitzWho.elapsed_time,
BlitzWho.query_text,
BlitzWho.status,
BlitzWho.wait_info
FROM BlitzWho
WHERE blocking_session_id IS NULL
AND EXISTS (SELECT 1 FROM BlitzWho BW WHERE BlitzWho.CheckDate = BW.CheckDate AND BlitzWho.session_id = BW.blocking_session_id)
AND CheckDate BETWEEN @StartDate AND @EndDate

UNION ALL

SELECT CAST(BlockChain.Chain + ‘ -> ‘ + CAST(BlitzWho.session_id AS VARCHAR) AS VARCHAR(50)) AS Chain,
BlitzWho.ID,
BlitzWho.CheckDate,
BlitzWho.session_id,
BlitzWho.blocking_session_id,
BlitzWho.login_name,
BlitzWho.host_name,
BlitzWho.program_name,
BlitzWho.database_name,
BlitzWho.elapsed_time,
BlitzWho.query_text,
BlitzWho.status,
BlitzWho.wait_info
FROM BlitzWho
INNER JOIN BlockChain ON BlitzWho.CheckDate = BlockChain.CheckDate AND BlitzWho.blocking_session_id = BlockChain.session_id
)

SELECT * FROM BlockChain
ORDER BY CheckDate, database_name, Chain

END

So what’s it look like if we run it, for the sake of this post we’ll just use the default parameters and return any blocks for the past hour.

EXEC BlitzBlockingChain

and the output:

The output shown easily spots the lead blocker and deal with it as you feel fit.

What are few ways to fix locking and blocking in SQL Server?

  1. Have enough indexes to make queries go fast but not so many that they make our queries go slow. That wont fix the 5000 rows problem. SQL server will escalate to table level lock if it needs to access/lock more than 5000 rows.
  2. Keep your transactions as short as possible. That helps but you have modify the query to make that happen.
  3. Last is changing your default isolation level -> RCSI. This is a server level setting and you need account for extra tempdb usage to keep track of all those transactions.

As long as you de-dupe and eliminate the indexes, then you add indexes that are desperately needed to help the queries go faster. Just the above talked point almost eliminates blocking problem.

Typically, you see users have blocking issues, they either had no indexes whatsoever or they had dozens of indexes.

%d bloggers like this: