Core and Master databases keep falling into recovery_pending state

We are on an upgraded Sitecore 9.0.2 instance and are restoring the Core and Master databases from an existing, running environment into a new environment we are building.

After the restore, as soon as we start the app pool and the Sitecore 9.0.2 instance tries to connect to these databases, their status changes to recovery_pending.

We confirmed that there are no disk space issues on the SQL Server, that auto-grow is turned on, and that the .ldf/.mdf files are not read-only. Taking the databases offline and bringing them back online did not help. We see the following exceptions in the logs as well as in the browser.

ERROR Exception processing remote events from database: master 
Exception: System.Data.SqlClient.SqlException 
Message: Database 'Sitecore_Master_stg' cannot be opened due to inaccessible files or insufficient memory or disk space.

Issue Summary:

After a restart of SQL Server 2016, the Core and Master databases are online and accessible on SQL Server. When the application starts, these two databases go into the recovery pending state and cannot be accessed. Once SQL Server is restarted again, they come back online. Each time the databases go into recovery pending, SQL Server issues a stack dump. Reviewing the dump, I found an EXCEPTION_ACCESS_VIOLATION in the function CSecCtxtCacheStore. The SQL Server error log shows the following:

2019-04-15 11:35:15.00 spid67      SqlDumpExceptionHandler: Process 67 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process.
2019-04-15 11:35:15.00 spid67      * *******************************************************************************
2019-04-15 11:35:15.00 spid67      *
2019-04-15 11:35:15.00 spid67      * BEGIN STACK DUMP:
2019-04-15 11:35:15.00 spid67      *   04/15/19 11:35:15 spid 67
2019-04-15 11:35:15.00 spid67      *
2019-04-15 11:35:15.00 spid67      *
2019-04-15 11:35:15.00 spid67      *   Exception Address = 00007FFD7876F4AE Module(sqllang+000000000000F4AE)
2019-04-15 11:35:15.00 spid67      *   Exception Code    = c0000005 EXCEPTION_ACCESS_VIOLATION
2019-04-15 11:35:15.00 spid67      *   Access Violation occurred reading address 0000000000001278
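When the error log is long, it can help to pull out just the fatal exception lines. The following is a minimal sketch, assuming the error log has already been read into a Python string; the function name `find_fatal_exceptions` and the regular expression are illustrative and are only modelled on the line format shown in the excerpt above.

```python
import re

# Modelled on the "SqlDumpExceptionHandler" line in the excerpt above.
# Assumption: other SQL Server versions may format this line differently.
FATAL_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2})\s+"
    r"(?P<spid>spid\d+s?)\s+"
    r"SqlDumpExceptionHandler: Process \d+ generated fatal exception "
    r"(?P<code>[0-9a-fA-F]{8}) (?P<name>[A-Z_]+)"
)


def find_fatal_exceptions(errorlog_text):
    """Return one dict (ts, spid, code, name) per fatal exception line."""
    hits = []
    for line in errorlog_text.splitlines():
        match = FATAL_RE.match(line)
        if match:
            hits.append(match.groupdict())
    return hits


# Sample input copied from the log excerpt in this post.
sample = (
    "2019-04-15 11:35:15.00 spid67      SqlDumpExceptionHandler: Process 67 "
    "generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. "
    "SQL Server is terminating this process.\n"
    "2019-04-15 11:35:15.00 spid67      * BEGIN STACK DUMP:\n"
)

for hit in find_fatal_exceptions(sample):
    print(hit["ts"], hit["spid"], hit["code"], hit["name"])
```

Comparing the timestamps of these lines with the time the app pool was started is one way to confirm that each dump coincides with the application connecting.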

The dump shows the following exception.

An Access Violation (AV) occurs when a program attempts an operation on a memory address that is not allowed for that address. Each memory page is given its protection options when it is allocated or when its protection is changed. If an application accesses a memory address in a way that does not match the page protection for that memory, an Access Violation exception is thrown.

The most common AV pattern is a read or write to address 0 (zero). It typically comes from a programming error in which the code expects a value to be a valid address, but for whatever reason it was not set properly (e.g. a function failed and returned 0 or null instead of the expected value).
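The address in our log is not exactly zero, but it is very small (0000000000001278), which would be consistent with a null pointer plus a field offset. The sketch below shows one way to check this from a log line. It assumes the usual Windows behaviour that the lowest 64 KB of a process's address space is never mapped; the helper name `classify_access_violation` and the threshold constant are illustrative, not part of any SQL Server tooling.

```python
import re

# Assumption: on Windows the lowest 64 KB of the address space is left
# unmapped, so an access below this limit usually means "null + offset".
NULL_GUARD_LIMIT = 0x10000

ADDRESS_RE = re.compile(
    r"Access Violation occurred (?P<op>reading|writing) "
    r"address (?P<addr>[0-9A-Fa-f]+)"
)


def classify_access_violation(line):
    """Extract the faulting address and flag it if it is close to null."""
    match = ADDRESS_RE.search(line)
    if match is None:
        return None
    address = int(match.group("addr"), 16)
    return {
        "operation": match.group("op"),
        "address": address,
        "near_null": address < NULL_GUARD_LIMIT,
    }


# Line copied from the stack dump header in this post.
log_line = (
    "2019-04-15 11:35:15.00 spid67      *   "
    "Access Violation occurred reading address 0000000000001278"
)
result = classify_access_violation(log_line)
print(result["operation"], hex(result["address"]), result["near_null"])
```

A near-null address only hints at the cause. It does not say which component passed the bad pointer, which is why the dump still had to go to Microsoft Support.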

Resolution:

We raised a case with Microsoft Support, and they identified it as a known problem with SQL Server 2016. Support recommended applying a cumulative update, which resolved the issue:

https://support.microsoft.com/en-us/help/4488536/cumulative-update-6-for-sql-server-2016-sp2
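To check whether a given server already has this update, you can compare its build number (for example the string returned by `SELECT SERVERPROPERTY('ProductVersion')`) against the build of the cumulative update. The sketch below is only a comparison helper. The build number `13.0.5292.0` for SP2 CU6 is my assumption and should be verified against the KB article above; the function names are illustrative.

```python
# Assumption: 13.0.5292.0 is the build of SQL Server 2016 SP2 CU6.
# Verify this value against the KB article before relying on it.
ASSUMED_CU6_BUILD = "13.0.5292.0"


def parse_build(version):
    """Turn a dotted version string such as '13.0.5292.0' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def has_update(installed, minimum=ASSUMED_CU6_BUILD):
    """Return True if the installed build is at or above the minimum build."""
    return parse_build(installed) >= parse_build(minimum)


# Hypothetical build strings, used only to exercise the comparison.
for build in ("13.0.5026.0", "13.0.5292.0", "13.0.5400.0"):
    print(build, "patched" if has_update(build) else "needs update")
```

Comparing tuples of integers avoids the mistake of comparing version strings alphabetically, where "13.0.999.0" would sort after "13.0.5292.0".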
