SQL Server Statistics – Explained in short summary

There are 3 types of statistics in a SQL Server database

  1. Statistics created due to index creation. These statistics have the index name
  2. Statistics created by Optimizer (Column stats). Starts with _WA_*
  3. User defined stats which are created with CREATE STATISTICS command by the DBA

General Facts about statistics

  • Index stats are always updated with fullscan when we rebuild the index
  • Column stats are not updated because of daily/weekly index rebuilds, they are only updated either by DBA Auto update stats option or if we manually update statistics using UPDATE STATISTICS command
  • If we do not specify the sampling rate while running the UPDATE STATISTICS command it will take the default sampling rate.
  • If you specify UPDATE STATISTICS command with just the table name, it will update all the statistics including the index statistics on the table. So you may lose the full scan advantage which you get for index stats if you update stats with sampling rate after rebuilding indexes

General Advice: Recommend to leave the Auto Stats option for the database to ON, and if there is a particular case where you are seeing bad query plans on certain tables, just disable Auto Stats for that table and enable the manual statistics job for that table.

How does the Auto Update stats Algorithm work you may ask?

So the Auto Update stats will fire for every 500 + 20% change in table rows. Of course, we have an improved algorithm in version SQL 2012 and above which is SQRT(1000 * Table rows). This is much better than the older algorithm.

When it fires, it will use the default sampling rate and here is the algorithm how it calculates the sampling rate.

  1. If the table < 8MB, then it updates the statistics with a full scan.
  2. If the table > 8MB, it follows an algorithm. It reduces the sampling rate as the number of rows in the table are increased to make sure we are not scanning too much data. This is not a fixed value but is under the control of optimizer. It is not a linear algorithm either.

Example: If we have 1 million rows, it would use a sampling rate of 30% but when the number of rows increase to 8 million, it would reduce the sampling rate to 10%. These sampling rates are not under the DBA’s control but optimizer decides it.

If you ever want to find out which table has been modified the most since the last statistics update. You can do so using the DMV sys.dm_db_stats_properties, simply look at the column modification_counter. This column contains the total number of modifications for the leading statistics column since the last time statistics were updated.

SELECT obj.name, obj.object_id, stat.name, stat.stats_id,
last_updated, modification_counter
FROM sys.objects AS obj
INNER JOIN sys.stats AS stat ON stat.object_id = obj.object_id
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
ORDER BY modification_counter DESC

The column last_updated suggests when was the statistics last updated and modification counter column indicates how many times the column was updated since the last statistics updated.

Please note that if any of your processes drop and recreate tables in the nightly ETLs, you would see the modification_counter as zero since the statistics are dropped when you drop a table and create it.

sys.dm_db_stats_properties is a newer replacement to the uber popular DBCC SHOW_STATISTICS command.

If you have auto-update statistics enabled for the database configuration, in most case, you do not need to worry about these details.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: