Frequently Asked Questions

Q: Why is the same query sometimes fast and sometimes slow, depending on where I run it?

It’s most likely parameter sniffing: SQL Server caches the execution plan compiled for the first set of parameter values it sees, and that plan can be a poor fit for the values you’re running with.

Q: What should you do when your index maintenance jobs take forever?

When you rebuild a 50GB table’s indexes, you’re basically reloading that table from scratch. SQL Server has to make a brand-new copy of the table on new data pages, and it logs all of that in the transaction log – which means your log backups take longer and your Availability Group falls way behind.

If you’ve been rebuilding indexes nightly, consider easing off to weekends instead. If you’re worried that will hurt performance, you’re probably conflating rebuilding indexes with updating statistics. Consider running a daily statistics update job rather than rebuilding your indexes.

In general, daily statistics updates give you most of the plan-quality benefit at a small fraction of the I/O and log cost of daily index rebuilds.
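
As a minimal sketch of that daily job, you could simply run sp_updatestats, which only touches statistics with pending modifications, or target specific busy tables (the dbo.Orders name below is just a placeholder):

-- Update every statistic in the database that has pending modifications
EXEC sp_updatestats;

-- Or target a specific busy table (dbo.Orders is a hypothetical name)
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;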

Q: Which cloud provider should you use to host SQL Server?

  • If you run SQL Server in a VM, you get more VM selection & performance at AWS
  • If you want to rent SQL Server as a service, Microsoft’s Azure SQL Managed Instance is quite a bit ahead of Amazon RDS

Find outdated statistics

How often do you update your statistics?

Here is my general advice: https://vinayrahulare.wordpress.com/2020/12/08/database-maintenance-best-practices/

Everyone has different criteria for how often to update statistics.

If you would like to find outdated statistics, here is a useful script which shows you the last time your statistics were updated and the number of modifications made on a column. Based on these two parameters you can decide whether to update the statistics for a particular table.

-- Lists every auto-created or user-created statistic on user tables,
-- with its age in days and the modification counter since the last update
SELECT DISTINCT
    OBJECT_NAME(s.[object_id]) AS TableName,
    c.name AS ColumnName,
    s.name AS StatName,
    STATS_DATE(s.[object_id], s.stats_id) AS LastUpdated,
    DATEDIFF(DAY, STATS_DATE(s.[object_id], s.stats_id), GETDATE()) AS DaysOld,
    dsp.modification_counter,
    s.auto_created,
    s.user_created,
    s.no_recompute,
    s.[object_id],
    s.stats_id,
    sc.stats_column_id,
    sc.column_id
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.[object_id] = s.[object_id] AND sc.stats_id = s.stats_id
JOIN sys.columns AS c
    ON c.[object_id] = sc.[object_id] AND c.column_id = sc.column_id
JOIN sys.partitions AS par
    ON par.[object_id] = s.[object_id]
JOIN sys.objects AS obj
    ON par.[object_id] = obj.[object_id]
CROSS APPLY sys.dm_db_stats_properties(sc.[object_id], s.stats_id) AS dsp
WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
  -- Exclude index statistics; those get a fullscan update on every rebuild
  AND (s.auto_created = 1 OR s.user_created = 1)
ORDER BY DaysOld;

Temp table caching

When you create a temp table, you might think that it is exclusive to your session. However, it’s not: other sessions can reuse your cached table structure and its statistics.

Temp tables in stored procs: here is what happens when you repeatedly call a stored procedure which uses temp tables. The statistics created on the temp table during one execution get reused by the next, even though the data is different. How do you solve this problem?

First, you could execute the stored procedure WITH RECOMPILE to produce better plans.

Second, you could add OPTION (RECOMPILE) at the end of the SELECT query. The second option is better than the first, since WITH RECOMPILE forces the entire stored procedure to recompile, whereas OPTION (RECOMPILE) recompiles only that one statement.

Third, to handle the parameter sniffing problem you could add an UPDATE STATISTICS command to the stored procedure instead of OPTION (RECOMPILE). The statistics we want to update are created during compilation or recompilation of the SELECT statement, suggesting the UPDATE STATISTICS should come after the SELECT. Even then we don’t get accurate statistics.

Finally, we add both UPDATE STATISTICS and OPTION (RECOMPILE). This finally takes care of the parameter sniffing issues we were seeing earlier. It’s a success!
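
As a rough sketch of that final combination – the procedure, table, and column names here are made up for illustration:

CREATE OR ALTER PROCEDURE dbo.GetOrdersForCustomer  -- hypothetical procedure
    @CustomerID integer
AS
BEGIN
    CREATE TABLE #Orders (OrderID integer NOT NULL);

    INSERT #Orders (OrderID)
    SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustomerID;

    -- Refresh the (possibly cached) statistics on the temp table...
    UPDATE STATISTICS #Orders;

    -- ...and recompile this statement so the fresh statistics are used
    SELECT o.OrderID
    FROM #Orders AS o
    JOIN dbo.OrderLines AS ol ON ol.OrderID = o.OrderID
    OPTION (RECOMPILE);
END;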

This goes to show that cardinality estimation is a tough problem – try using the statistics histograms yourself to predict how the values in the temp table will join to the histogram on the same column of the much larger table. You may find that matching values to values using RANGE_ROWS, EQ_ROWS, and so on is not an easy task at all.

Temporary Table Caching Internals:

The key point is that CREATE TABLE and DROP TABLE do not actually create and drop a temp table in a stored procedure if the temporary object can be cached. Instead, the temporary object is renamed to an internal form when DROP TABLE executes, and renamed back to the same user-visible name when the corresponding CREATE TABLE is encountered on the next execution.

In addition, any statistics that were auto-created on the temporary table are also cached. This means that statistics from a previous execution remain when the procedure is called again.

-- Demo: the temp table's object_id stays the same across executions,
-- showing the object is cached rather than dropped and recreated
CREATE OR ALTER PROCEDURE dbo.TempTableDemo
AS
BEGIN
    CREATE TABLE #Temp (col1 integer NULL);

    -- Report the object_id of the temp table for this execution
    SELECT ObjectID =
        OBJECT_ID(N'tempdb..#Temp');

    DROP TABLE #Temp;
END;
GO
EXECUTE dbo.TempTableDemo;
EXECUTE dbo.TempTableDemo;
EXECUTE dbo.TempTableDemo;
GO

Example output: all three executions return the same ObjectID value, confirming that the cached temp table was reused rather than created and dropped each time.

This is quite different from how a permanent table behaves. With a permanent table, CREATE and DROP really do create and drop a table on each execution of the procedure: the table is new each time, and new statistics and plans are created on each execution.

Caching gives temporary tables an opportunity to reuse plans and temporary objects across executions, which is not possible with a permanent table.

Recompilation Thresholds

Cached query plans include a Recompilation Threshold (RT) for statistics considered interesting by the optimizer while compiling the plan.

The query plan also records the cardinality of the tables and indexed views referenced by the plan at the time the plan is compiled.

Generally, if the column modification counter (colmodctr) for an interesting statistic changes by RT or more, the query recompiles.

Similarly, if the table cardinality has changed by RT or more compared with the value stored when the cached plan was created, the query plan also recompiles.

The value of RT depends on the type of table (permanent or temporary) and the number of rows (n) in the table or indexed view at the time of compilation.

The following summary of RT values has been taken from a Microsoft technical article.

  • Permanent table
    • If n <= 500, RT = 500
    • If n > 500, RT = 500 + 0.20 * n
  • Temporary table
    • If n < 6, RT = 6
    • If n >= 6 and n <= 500, RT = 500
    • If n > 500, RT = 500 + 0.20 * n
  • Table variable
    • RT does not exist. Therefore, recompilations do not happen because of changes in the cardinality of table variables.
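
As a quick sanity check on these formulas, here is a small T-SQL sketch that computes RT for a given row count (the @n value is just an example):

DECLARE @n bigint = 20000;  -- rows in the table at compile time

SELECT
    -- Permanent table: 500, or 500 + 20% of n above 500 rows
    PermanentTableRT = CASE WHEN @n <= 500 THEN 500
                            ELSE 500 + 0.20 * @n END,
    -- Temporary table: 6 below 6 rows, 500 up to 500 rows, then 500 + 20% of n
    TempTableRT      = CASE WHEN @n < 6 THEN 6
                            WHEN @n <= 500 THEN 500
                            ELSE 500 + 0.20 * @n END;
-- For @n = 20000 both formulas return 4500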

Generally, queries referencing permanent tables recompile when interesting statistics modification counters change by RT or more. This is not the case for cached temporary tables in stored procedures, which only recompile when table cardinality changes by RT or more – unless OPTION (RECOMPILE) is used, when the statistics counters come back into play.

This could be a bug in SQL Server, but on the other hand, it is reasonable that a cached temporary table should not cause a recompilation simply because the procedure is called a number of times with the same parameter, with the number of modifications accumulating on each call until it eventually passes RT.

Summary

Temporary tables in stored procs have a couple of unexpected behaviors:

  • Temporary objects may be cached across executions, in spite of the CREATE and DROP statements
  • Statistics associated with a cached temporary object are also cached
  • Statistics can be incorrect compared with the current contents of the temporary table
  • A TRUNCATE TABLE at the end of the procedure increases the expected impact on column modification counters
  • It is possible to update cached statistics before the statement that caused them to be created
  • OPTION (RECOMPILE) changes the way column modifications are carried over between procedure executions
  • A manual UPDATE STATISTICS is not enough to force a recompilation

These behaviors are not intentional, but they have been this way for a long time and are unlikely to change in the foreseeable future.

Are you likely to be affected by these issues?

A procedure whose temporary table holds a widely varying number of rows per execution is not likely to be affected. If the recompilation threshold is passed, chances are you will get good statistics, and any cached plan will recompile without intervention.

For a cached plan where the temp table contained 20,000 rows at compile time, the default threshold is 500 + 0.2 * 20,000 = 4,500 rows. This means the next execution with more than 24,500 rows or fewer than 15,500 rows will pass the threshold test.

Avoiding temporary table caching removes the risk of stale statistics, but a cached plan may still be reused unless the RECOMPILE query hint is used. It will also tend to increase contention on tempdb, and can be difficult to enforce in practice.

Procedures that routinely store relatively small result sets in cached temporary tables are probably most at risk – and this is a common scenario, since temporary tables are often used for smaller sets.

The general preference should still be to opt for simpler relational queries and, where necessary, for temporary tables over table variables as a starting point.

SQL Server Statistics – Explained in a short summary

There are 3 types of statistics in a SQL Server database:

  1. Statistics created due to index creation. These statistics carry the index name
  2. Statistics created by the optimizer (column stats). Their names start with _WA_*
  3. User-defined stats created by the DBA with the CREATE STATISTICS command

General Facts about statistics

  • Index stats are always updated with a full scan when we rebuild the index
  • Column stats are not updated by daily/weekly index rebuilds; they are updated only by the auto update stats option or when we manually run the UPDATE STATISTICS command
  • If we do not specify a sampling rate when running UPDATE STATISTICS, it uses the default sampling rate
  • If you run UPDATE STATISTICS with just the table name, it updates all the statistics on the table, including the index statistics. So you may lose the full scan advantage you get for index stats if you update stats with a sampling rate after rebuilding indexes (see the sketch below)
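
To avoid overwriting those fullscan index stats, UPDATE STATISTICS can be limited to column statistics only (dbo.Orders is a placeholder table name):

-- Updates only the column statistics, leaving the index statistics
-- (refreshed with a full scan by the rebuild) untouched
UPDATE STATISTICS dbo.Orders WITH SAMPLE 25 PERCENT, COLUMNS;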

General advice: leave the Auto Update Statistics option ON for the database, and if there is a particular case where you are seeing bad query plans on certain tables, disable auto stats for just those tables and enable a manual statistics job for them.
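
A sketch of that per-table override, using the built-in sp_autostats procedure (dbo.VolatileTable is a hypothetical name):

-- Turn off automatic statistics updates for just this table...
EXEC sp_autostats 'dbo.VolatileTable', 'OFF';

-- ...and refresh its statistics from a scheduled job instead
UPDATE STATISTICS dbo.VolatileTable WITH FULLSCAN;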

How does the Auto Update stats algorithm work, you may ask?

Auto Update stats fires after 500 modifications plus 20% of the table’s rows. There is also an improved dynamic threshold, SQRT(1000 * table rows), which became the default behavior with compatibility level 130 (SQL Server 2016) and was available earlier via trace flag 2371. This is much better than the older algorithm for large tables.
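
For a sense of scale: on a 1-million-row table the old threshold is 500 + 0.20 * 1,000,000 = 200,500 modifications, while SQRT(1000 * 1,000,000) ≈ 31,623, so the dynamic threshold triggers an update roughly six times sooner.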

When it fires, it uses the default sampling rate, which is calculated roughly as follows:

  1. If the table is smaller than 8MB, statistics are updated with a full scan.
  2. If the table is larger than 8MB, the sampling rate is reduced as the number of rows grows, to make sure we are not scanning too much data. This is not a fixed value but is under the control of the optimizer, and it is not a linear algorithm either.

Example: with 1 million rows the sampling rate might be around 30%, but by 8 million rows it might drop to around 10%. These sampling rates are not under the DBA’s control; the optimizer decides them.
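
If you want to see what sampling rate was actually used for a statistic, sys.dm_db_stats_properties exposes both rows and rows_sampled, so the effective rate can be computed (a sketch):

-- Effective sampling rate of the last statistics update, per statistic
SELECT OBJECT_NAME(sp.object_id) AS TableName,
       st.name AS StatName,
       sp.rows,
       sp.rows_sampled,
       CAST(100.0 * sp.rows_sampled / NULLIF(sp.rows, 0) AS decimal(5, 2))
           AS SamplePercent
FROM sys.stats AS st
CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) AS sp
ORDER BY SamplePercent;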

If you ever want to find out which table has been modified the most since its last statistics update, you can do so using the DMF sys.dm_db_stats_properties: simply look at the modification_counter column, which contains the total number of modifications of the leading statistics column since the last time statistics were updated.

-- Tables and statistics ordered by modifications since the last stats update
SELECT obj.name, obj.object_id, stat.name, stat.stats_id,
       sp.last_updated, sp.modification_counter
FROM sys.objects AS obj
INNER JOIN sys.stats AS stat ON stat.object_id = obj.object_id
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
ORDER BY sp.modification_counter DESC;

The last_updated column shows when the statistics were last updated, and the modification_counter column indicates how many modifications have been made to the leading column since then.

Please note that if any of your processes drop and recreate tables in nightly ETLs, you will see a modification_counter of zero, since the statistics are dropped along with the table when you drop and recreate it.

sys.dm_db_stats_properties is a newer, programmatic alternative to the header output of the ever-popular DBCC SHOW_STATISTICS command.

If you have auto update statistics enabled in the database configuration, in most cases you do not need to worry about these details.
