How often should you do corruption checks if your databases are large?

Usually we skip corruption until weekend since it takes really long time especially when your databases sizes is larger than 400GB.

For example, a server with 8 cores and 64GB RAM with 200GB database would take about 15 minutes to perform a complete DBCC CHECK with logical check.

Instead, I would suggest the following:

DBCC CHECKDB (Physical_Only)Weekdays
DBCC CHECKDB (Extended checks)Weekends

This way, physical_only checks takes about 5 minutes instead of 15 minutes for a full check. You are still covered in some way rather than no checks whatsoever.

Why you cannot sometimes kill a DBCC CHECKDB

what does it do in the background?

Lets say you run DBCC CHECKDB regularly at night during off-peak load times and has no problems with it but last week you ran it during the day because of a corruption indication, then decided to stop it, but couldn’t. You killed the SPID and nothing happened. Except now the SPID is showing as KILLED/ROLLBACK state.

What happens under the hood?

It needs to see a transactionally-consistent, unchanging view of the database. Prior to SQL Server 2005 it did this by essentially running its own version of crash recovery inside itself, by analyzing the database’s transaction log. The newer logic was to instead use a private database snapshot, because a database snapshot gives a transactionally-consistent, unchanging view of a database.

So, the first thing DBCC CHECKDB does in SQL server 2005 and later versions is to create a database snapshot.

A database snapshot runs crash recovery of the target database into the database snapshot, and there in lies the problem: crash recovery cannot be interrupted, and there’s no check in the crash recovery code to tell whether it’s real crash recovery, or crash recovery into a database snapshot or into the DBCC CHECKDB private snapshot.

If there’s a lot of transaction log to be recovered as part of the initial crash recovery into the database snapshot, that could take a long time. And if someone tries to kill the DBCC CHECKDB while the database snapshot is still running crash recovery, nothing will happen until the crash recovery finishes. It’s not a DBCC shortcoming, it’s a shortcoming of a database snapshot, and you have need to let it finish.

%d bloggers like this: