SQL Server Statistics – Explained in short summary

There are 3 types of statistics in a SQL Server database

  1. Statistics created due to index creation. These statistics have the index name
  2. Statistics created by Optimizer (Column stats). Starts with _WA_*
  3. User defined stats which are created with CREATE STATISTICS command by the DBA

General Facts about statistics

  • Index stats are always updated with fullscan when we rebuild the index
  • Column stats are not updated because of daily/weekly index rebuilds, they are only updated either by DBA Auto update stats option or if we manually update statistics using UPDATE STATISTICS command
  • If we do not specify the sampling rate while running the UPDATE STATISTICS command it will take the default sampling rate.
  • If you specify UPDATE STATISTICS command with just the table name, it will update all the statistics including the index statistics on the table. So you may lose the full scan advantage which you get for index stats if you update stats with sampling rate after rebuilding indexes

General Advice: Recommend to leave the Auto Stats option for the database to ON, and if there is a particular case where you are seeing bad query plans on certain tables, just disable Auto Stats for that table and enable the manual statistics job for that table.

How does the Auto Update stats Algorithm work you may ask?

So the Auto Update stats will fire for every 500 + 20% change in table rows. Of course, we have an improved algorithm in version SQL 2012 and above which is SQRT(1000 * Table rows). This is much better than the older algorithm.

When it fires, it will use the default sampling rate and here is the algorithm how it calculates the sampling rate.

  1. If the table < 8MB, then it updates the statistics with a full scan.
  2. If the table > 8MB, it follows an algorithm. It reduces the sampling rate as the number of rows in the table are increased to make sure we are not scanning too much data. This is not a fixed value but is under the control of optimizer. It is not a linear algorithm either.

Example: If we have 1 million rows, it would use a sampling rate of 30% but when the number of rows increase to 8 million, it would reduce the sampling rate to 10%. These sampling rates are not under the DBA’s control but optimizer decides it.

If you ever want to find out which table has been modified the most since the last statistics update. You can do so using the DMV sys.dm_db_stats_properties, simply look at the column modification_counter. This column contains the total number of modifications for the leading statistics column since the last time statistics were updated.

SELECT obj.name, obj.object_id, stat.name, stat.stats_id,
last_updated, modification_counter
FROM sys.objects AS obj
INNER JOIN sys.stats AS stat ON stat.object_id = obj.object_id
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
ORDER BY modification_counter DESC

The column last_updated suggests when was the statistics last updated and modification counter column indicates how many times the column was updated since the last statistics updated.

Please note that if any of your processes drop and recreate tables in the nightly ETLs, you would see the modification_counter as zero since the statistics are dropped when you drop a table and create it.

sys.dm_db_stats_properties is a newer replacement to the uber popular DBCC SHOW_STATISTICS command.

If you have auto-update statistics enabled for the database configuration, in most case, you do not need to worry about these details.

How to Move TempDB to Another Drive

I figured i will store this on my blog since each time i scramble to google for the script to move the files. So here we are:

SELECT 'ALTER DATABASE tempdb MODIFY FILE (NAME = [' + f.name + '],'
	+ ' FILENAME = ''T:\TempDB\' + f.name
	+ CASE WHEN f.type = 1 THEN '.ldf' ELSE '.mdf' END
	+ ''');'
FROM sys.master_files f
WHERE f.database_id = DB_ID(N'tempdb');

Copy the output of the above script and paste it in SSMS. I kind of like using the script to dynamically generate the script I need to run. Sometimes, depending of available space, I may have to move log file to another drive.

Sample Output (run this on a new query window)

Please be sure to verify the file path used above. Otherwise, you would fail to start up the SQL server instance after you bounce the services.

If you need up in that situation. Here is what you do:

Re-start the instance in safe mode (-f startup parameter) and move tempdb. Then re-start without the parameter.

Here is an example of what that would look like:

Get Backup history

The following code will get you backup history across all databases in a SQL instance:

SELECT 
 bs.database_name,
(bs.backup_start_date),
 bs.backup_finish_date,
 CAST(CAST(bs.backup_size/1000000 AS INT) AS VARCHAR(14)) + ' ' + 'MB' AS [Size],
 CAST(DATEDIFF(second, bs.backup_start_date,
 bs.backup_finish_date) AS VARCHAR(4)) + ' ' + 'Seconds' [TimeTaken],
 CASE bs.[type]
 WHEN 'D' THEN 'Full Backup'
 WHEN 'I' THEN 'Differential Backup'
 WHEN 'L' THEN 'TLog Backup'
 WHEN 'F' THEN 'File or filegroup'
 WHEN 'G' THEN 'Differential file'
 WHEN 'P' THEN 'Partial'
 WHEN 'Q' THEN 'Differential Partial'
 END AS BackupType,
 bs.server_name,
 bs.recovery_model
 From msdb.dbo.backupset bs
 INNER JOIN msdb.dbo.backupmediafamily bmf 
 ON bs.media_set_id = bmf.media_set_id
 group by bs.backup_start_date,bs.database_name,bs.backup_finish_date,bs.backup_size,bs.[type], bs.server_name, bs.recovery_model
 ORDER BY bs.database_name asc,bs.backup_start_date desc;
 GO

If you need to look at the backup history at a specific database, use the following:

SELECT 
 bs.database_name,
(bs.backup_start_date),
 bs.backup_finish_date,
 CAST(CAST(bs.backup_size/1000000 AS INT) AS VARCHAR(14)) + ' ' + 'MB' AS [Size],
 CAST(DATEDIFF(second, bs.backup_start_date,
 bs.backup_finish_date) AS VARCHAR(4)) + ' ' + 'Seconds' [TimeTaken],
 CASE bs.[type]
 WHEN 'D' THEN 'Full Backup'
 WHEN 'I' THEN 'Differential Backup'
 WHEN 'L' THEN 'TLog Backup'
 WHEN 'F' THEN 'File or filegroup'
 WHEN 'G' THEN 'Differential file'
 WHEN 'P' THEN 'Partial'
 WHEN 'Q' THEN 'Differential Partial'
 END AS BackupType,
 bs.server_name,
 bs.recovery_model
 From msdb.dbo.backupset bs
 INNER JOIN msdb.dbo.backupmediafamily bmf 
 ON bs.media_set_id = bmf.media_set_id
 group by bs.backup_start_date,bs.database_name,bs.backup_finish_date,bs.backup_size,bs.[type], bs.server_name, bs.recovery_model
 HAVING bs.database_name = 'SQLDB'
 ORDER BY bs.database_name asc,bs.backup_start_date desc;
 GO

Why you cannot sometimes kill a DBCC CHECKDB

what does it do in the background?

Lets say you run DBCC CHECKDB regularly at night during off-peak load times and has no problems with it but last week you ran it during the day because of a corruption indication, then decided to stop it, but couldn’t. You killed the SPID and nothing happened. Except now the SPID is showing as KILLED/ROLLBACK state.

What happens under the hood?

It needs to see a transactionally-consistent, unchanging view of the database. Prior to SQL Server 2005 it did this by essentially running its own version of crash recovery inside itself, by analyzing the database’s transaction log. The newer logic was to instead use a private database snapshot, because a database snapshot gives a transactionally-consistent, unchanging view of a database.

So, the first thing DBCC CHECKDB does in SQL server 2005 and later versions is to create a database snapshot.

A database snapshot runs crash recovery of the target database into the database snapshot, and there in lies the problem: crash recovery cannot be interrupted, and there’s no check in the crash recovery code to tell whether it’s real crash recovery, or crash recovery into a database snapshot or into the DBCC CHECKDB private snapshot.

If there’s a lot of transaction log to be recovered as part of the initial crash recovery into the database snapshot, that could take a long time. And if someone tries to kill the DBCC CHECKDB while the database snapshot is still running crash recovery, nothing will happen until the crash recovery finishes. It’s not a DBCC shortcoming, it’s a shortcoming of a database snapshot, and you have need to let it finish.

How do you find which page an index row is on – Corruption scenario

There’s an undocumented column that you can select %%PHYSLOC%% that gives the page:record:slot of the selected record. It gives the information back in hex, so there’s a companion function named sys.fn_PhysLocCracker that formats the output nicely.

For example, lets use Stackoverflow database.

Let’s say there’s a corruption issue where it says there’s a missing row for DisplayName = ‘anonymous’ and Reputation = 7777 in the DisplayName_Reputation nonclustered index (which is index ID 4) of the dbo.Users table. If you want to prove that the row is not missing, here is what you can do:

SELECT DisplayName,
       Reputation,
       physloc.*
FROM dbo.Users
WITH (INDEX = 4)
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS [physloc]
WHERE DisplayName = 'anonymous'
AND Reputation = 7777;

You are selecting the index keys for the nonclustered index and using an index hint to force that index to be used.

And if you can get any output, the row exists. In this case, you get:

DisplayName	Reputation	file_id	page_id	slot_id
Anonymous	7777	3	11284812	133

Pretty awesome.

You can also use this to find rows that need to use DBCC WRITEPAGE to around corruptions to allow data recovery from trashed databases – exhaustive, but possible.