Azure SQL Managed Instance: 1. excessive initial compile times leading to app timeouts and 2. partition maintenance: long-running, bloats data file
7
votes
3
answers
1532
views
We are having a couple of issues preventing us from going live with an on-prem SQL 2016 Std Ed to Azure SQL Managed Instance Business Critical tier re-platform, and I was wondering if anyone had similar issues and had advice for us. Our issues are:
1. The **#1 show-stopping issue is excessive query plan **initial compilation**** for queries in at least a dozen stored procedures and other queries (40 seconds to 174 seconds...I've even seen examples up to 725 seconds!). This is leading to frequent and random **application timeouts** since most are set to the default 30 seconds. To clarify, this is not excessive re-compilations, but a very high CompileTime in the query plan.
The queries and info in https://erikdarling.com/are-long-compile-times-bringing-you-down/ have been very helpful to identify these queries (I've read https://littlekendra.com/2024/03/05/long-compilers-who-time-out-not-in-query-store/ as well). We do see some compilation timeouts as well: https://www.brentozar.com/blitzcache/compilation-timeout/ .
We are attempting to tune these queries and reduce their complexity as we've been told that should reduce CompileTime. We have also tried upgrading the AZ MI tier to Business Critical, adding vCores, and tuning instance and database settings for months.
We have been working a Sev B case with Microsoft for several weeks now and received some tips, but still no resolution. Level 3 Support recommended making sure the compilation timeout queries do not have forced plans, because plan forcing allows the SQL Optimizer to spend 3x the duration than normal to compile, which increases the likelihood of timeout. The only thing that was forcing plans is Automatic tuning which applies to MI for only the FORCE_LAST_GOOD_PLAN option per https://learn.microsoft.com/en-us/sql/relational-databases/automatic-tuning/automatic-tuning?view=sql-server-ver16 and https://learn.microsoft.com/en-us/azure/azure-sql/database/automatic-tuning-overview?view=azuresql . We turned off FORCE_LAST_GOOD_PLAN per their recommendation and that helped some of the high CompileTimes but not all of them. We are not forcing plans with the plan guide (USE PLAN N'') or manual forcing in the Query Store. They recommended tuning queries to reduce their complexity and running a DUMP if turning off FORCE_LAST_GOOD_PLAN doesn't work. I'm hoping to schedule a working session with their performance team to troubleshoot these issues.
**Update:** During our call on 5/28 with Level 3 Support, we noticed that even some trivial queries like SELECT COUNT(*) FROM Table were using OptimizationLevel = FULL instead of TRIVIAL. Level 3 Support said there weren't any settings we could change to affect that, but would mention it to the Product Team.
2. Our **partition maintenance job** which does ALTER PARTITION SPLIT **runs a lot longer (days to weeks)** in MI BusCrit compared to the same code in SQL 2016 Std Ed. It **also chews up a lot of extra used space in the data (and log) file during processing** which doesn't happen in SQL 2016, like a database that is 52 GB grew to 588 GB! while the job was running. The bloating (and performance?) may be partially related to Accelerated Database Recovery (ADR) since that is in SQL 2019+ and AZ MI: https://dba.stackexchange.com/questions/265446/table-size-in-azure-sql-managed-instance-vs-on-premise-sql-server (see also https://learn.microsoft.com/en-us/sql/relational-databases/accelerated-database-recovery-concepts?view=sql-server-ver16) .
We are reviewing the algorithm we inherited as it appears to not follow the best practice in https://learn.microsoft.com/en-us/sql/t-sql/statements/alter-partition-function-transact-sql?view=sql-server-ver16 "Always keep empty partitions at both ends of the partition range. Keep the partitions at both ends to guarantee that the partition split and the partition merge don't incur any data movement. The partition split occurs at the beginning and the partition merge occurs at the end. Avoid splitting or merging populated partitions. Splitting or merging populated partitions can be inefficient. They can be inefficient because the split or merge may cause as much as four times more log generation, and may also cause severe locking." But we're puzzled why the same code runs so much longer and excessively bloats the data file in MI when it didn't in SQL 2016 Std Ed.
Another recommendation we've considered is to move to Azure VMs with SQL, but this will add months to our re-platform launch to set up, test, and operationalize this environment as we don't have any Azure VMs now. We'll also have to operationalize patching, backups, etc.
I appreciate your help!
Mike
Asked by Mike Petrak
(436 rep)
May 23, 2024, 01:28 PM
Last activity: Jul 22, 2024, 10:18 PM
Last activity: Jul 22, 2024, 10:18 PM