Recently, I had to re-learn a lesson. About 8 months ago, we migrated most of our environment to SQL Server 2012. Initially, we ran into some real issues with our SQL Server 2012 deployment on a Windows 2008 R2 clustered server. With Microsoft’s help, we were able to address the issues and finish the migration. More recently, we had to stand up some new servers and encountered some similar problems that we encountered with our initial migration. Unfortunately, we did not recognize them to be the same problems initially and lost a day of productivity. Thus, I decided to blog about these issues, so others will have an easier time with their migrations. The issues we faced were as follows:
- Large amounts of memory – SQL Server 2012 has some known issues with large amounts of memory (we had 1TB). There are some hotfixes for specific memory problems that are available. Our issue was that we had a “non-yielding scheduler” which caused the SQL Server service to crash. After opening a ticket with Microsoft, the engineer had me enable the lock pages in memory (LPIM) option (http://technet.microsoft.com/en-us/library/ms190730.aspx). This option causes SQL Server to use different windows APIs to allocate the memory. This worked in our case. The engineer also mentioned this was not a problem with Windows 2012, and I would recommend deploying that operating system with SQL Server 2012.
- SQL Server Resource Group “IsAlive” Check (http://technet.microsoft.com/en-us/library/ff878664.aspx) – SQL Server 2012 is different than SQL Server 2008 in the way it handles clusters. Prior to SQL Server 2012, the cluster host node checked @@SERVERNAME, as a way to see if the server “looks alive”. This was a very speedy check. In SQL Server 2012, they introduced a new procedure, sp_server_diagnostics (http://technet.microsoft.com/en-us/library/ff878233.aspx), which is used for the “looks alive” check. This is not as quick. So, when the server is doing something IO intensive, this check was not responding in a timely fashion, and the resource group was restarting on the same host. The fix/workaround was to use one of the new settings in SQL Server 2012 to manage the failover behavior. We adjusted the HealthCheckTimeout setting to allow more time for the “looks alive” check.
So, hopefully this may be helpful to others doing SQL Server 2012 migrations. Just remember to document each step and the issues involved, so you don’t have to re-learn lessons.