Sample Header Ad - 728x90

Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 votes
0 answers
92 views
SLEEP_TASK waits on single-threaded UPDATE that does not generate an in-flight plan, how are these waits possible?
Yesterday, I witnesses a running query in `sp_WhoIsActive` that I could not explain. It had the following properties: * It ran in a nested transaction. The first transaction bulk inserts into a temp table that is later given a unique clustered index. The second runs an `UPDATE` against a system-vers...
Yesterday, I witnesses a running query in sp_WhoIsActive that I could not explain. It had the following properties: * It ran in a nested transaction. The first transaction bulk inserts into a temp table that is later given a unique clustered index. The second runs an UPDATE against a system-versioned table. * The UPDATE was running for hours despite usually taking less than seconds. 5 minutes at most. * There were multiple calls to this UPDATE in different sessions, all being blocked by one of them. * sp_WhoIsActive revealed that: * The query has no in-flight execution plan (@get_plans = 2) * **The block leader is almost always blocked on "SLEEP_TASK", but sometimes it isn't blocked at all. I recall that it seems to stay blocked on this for about 4ms.** * The query was still doing work, only reads as far as I could tell, but incredible slowly (from @delta_interval = 10) * My guess is that the query waits on SLEEP_TASK for some set amount of time, does one read, and then goes back to sleep. * All of the reads were logical, not physical. * The block leading query had so far done millions of reads, but the blocked queries had only done tens of thousands * The query was single-threaded (@get_task_info = 2) * open_tran_count was 3 * I cannot share the XML showing the locks, but I find it interesting that it shows that the block leader shows: * S lock on "DATABASE.ENCRYPTION_SCAN" * Sch-S lock on "METADATA.PLAN_GUIDE_HASH" * Lots of locks on both the table and its history table * tempdb locks all presumably taken by the transaction populating the temp table. * For the blocked queries, the lock XML showed that they were all waiting on an IX lock on the clustered index of the history table. This index is partitioned and not unique. The history table has no other indexes. * The main table has only a primary key. It is not partitioned. * I have confirmed that the history table is free of corruption. The text of the query this, but with the names changed.
UPDATE
    MyTable
SET
    MyTable.FloatCol = TempTable.FloatCol,
    MyTable.Date = TempTable.Date
FROM
   #TempTableWithUniqueClusteredIndex AS TempTable
JOIN
    TemporalTableWithJustAPrimaryKey AS MyTable
ON
    MyTable.PrimaryKeyInt = TempTable.PrimaryKeyInt;
Query Store holds an estimated plan for this query. It is of this shape. Query plan shape With this database dropping and creating script, I can get an actual execution plan of identical shape.
SET STATISTICS XML OFF;
IF @@TRANCOUNT > 0
BEGIN
    ROLLBACK;
END;

-- =========================================
-- 1. Drop/Create sample DB and switch context
-- =========================================
USE master;
GO

ALTER DATABASE SampleDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO

DROP DATABASE IF EXISTS SampleDB;
GO

CREATE DATABASE SampleDB;
GO

USE SampleDB;
GO

-- =========================================
-- 2. Partition function & scheme for history
--    (partitioning by ValidFrom date/time)
-- =========================================
CREATE PARTITION FUNCTION PF_ValidFrom (DATETIME2) 
AS RANGE LEFT FOR VALUES
    ('2015-01-01','2020-01-01','2025-01-01');
GO

CREATE PARTITION SCHEME PS_ValidFrom 
AS PARTITION PF_ValidFrom
    ALL TO ([PRIMARY]);
GO

-- =========================================
-- 3. Pre-create history table on scheme
-- =========================================
CREATE TABLE dbo.MainTableHistory
(
    PrimaryKeyInt INT        NOT NULL,
    FloatCol      FLOAT      NOT NULL,
    [Date]        DATE       NOT NULL,
    ValidFrom     DATETIME2  NOT NULL,
    ValidTo       DATETIME2  NOT NULL
)
ON PS_ValidFrom (ValidFrom);
GO

-- Partitioned, non-unique clustered index on history
CREATE CLUSTERED INDEX CX_MainTableHistory
    ON dbo.MainTableHistory(PrimaryKeyInt, ValidFrom)
    ON PS_ValidFrom (ValidFrom);
GO

-- =========================================
-- 4. Create main temporal table (unpartitioned)
--    with a DATE column and system-period cols
-- =========================================
CREATE TABLE dbo.TemporalTableWithJustAPrimaryKey
(
    PrimaryKeyInt INT         NOT NULL
      CONSTRAINT PK_TemporalTable PRIMARY KEY,
    FloatCol       FLOAT       NOT NULL,
    [Date]         DATE        NOT NULL,
    ValidFrom      DATETIME2   GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo        DATETIME2   GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
ON [PRIMARY];
GO

-- =========================================
-- 5. Seed MainTable with 1,000 rows
-- =========================================
INSERT INTO dbo.TemporalTableWithJustAPrimaryKey (PrimaryKeyInt, FloatCol, [Date])
SELECT TOP (1000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       RAND(CHECKSUM(NEWID())) * 100,                -- random float
       DATEADD(day, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 365, '2020-01-01')
FROM sys.all_columns a
CROSS JOIN sys.all_columns b;
GO

-- =========================================
-- 6. Turn on system versioning pointing at our history
-- =========================================
ALTER TABLE dbo.TemporalTableWithJustAPrimaryKey
    SET (SYSTEM_VERSIONING = ON 
         (HISTORY_TABLE = dbo.MainTableHistory
         ,DATA_CONSISTENCY_CHECK = ON));
GO

-- =========================================
-- 7. Simulate nested transactions:
-- =========================================
BEGIN TRAN
    CREATE TABLE #TempTableWithUniqueClusteredIndex
    (
        PrimaryKeyInt INT   NOT NULL,
        FloatCol       FLOAT NOT NULL,
        [Date]         DATE  NOT NULL
    );

    INSERT INTO #TempTableWithUniqueClusteredIndex (PrimaryKeyInt, FloatCol, [Date])
    SELECT TOP (50)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
           RAND(CHECKSUM(NEWID())) * 100,
           DATEADD(day, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 365, '2021-01-01')
    FROM sys.all_columns a
    CROSS JOIN sys.all_columns b;

    CREATE UNIQUE CLUSTERED INDEX CX_Temp_UC
      ON #TempTableWithUniqueClusteredIndex(PrimaryKeyInt);

BEGIN TRAN
    SET STATISTICS XML ON;

    UPDATE
        MyTable
    SET
        MyTable.FloatCol = TempTable.FloatCol,
        MyTable.Date = TempTable.Date
    FROM
       #TempTableWithUniqueClusteredIndex AS TempTable
    JOIN
        TemporalTableWithJustAPrimaryKey AS MyTable
    ON
        MyTable.PrimaryKeyInt = TempTable.PrimaryKeyInt;

ROLLBACK;
My question is this: **does anything that I have said above explain these SLEEP_TASK waits?** To [my knowledge](https://www.sqlskills.com/help/waits/sleep_task/) , such waits should be very rare on single-threaded queries.
J. Mini (1225 rep)
Jun 29, 2025, 01:02 AM • Last activity: Jul 2, 2025, 07:27 PM
2 votes
1 answers
241 views
Recurring High PREEMPTIVE_OS_QUERYREGISTRY on Sql Serever 2022
We experienced an issue on our Sql Server running on a VM with Windows Server 2022 after migrating from Sql Server 2019 and without any change to workload and to our schema. >Microsoft SQL Server 2022 (RTM-GDR) (KB5046861) - 16.0.1135.2 (X64) Oct 18 2024 15:31:58 Copyright (C) 2022 Microsoft Corpora...
We experienced an issue on our Sql Server running on a VM with Windows Server 2022 after migrating from Sql Server 2019 and without any change to workload and to our schema. >Microsoft SQL Server 2022 (RTM-GDR) (KB5046861) - 16.0.1135.2 (X64) Oct 18 2024 15:31:58 Copyright (C) 2022 Microsoft Corporation Standard Edition (64-bit) The issue occurs about every 2 months, for no apparent reason the PREEMPTIVE_OS_QUERYREGISTRY waits suddenly appear and the server gradually begins to be less and less responsive. This is the image taken by datadog monitoring system showing the number of these waits detected enter image description here We need to restart the service in order to return immediately at the standard performance. In the past few months we turned on trace flag 12502, how suggested by Microsoft in this link https://learn.microsoft.com/en-us/troubleshoot/sql/releases/sqlserver-2022/cumulativeupdate5#2351584 , but we still encountered the issue. Our hosting providers has not encountered any anomalies in the past occurrences of this issue that is similar to one described here: https://dba.stackexchange.com/questions/343585/high-preemptive-os-queryregistry-waits-on-new-sql-server2022 I've taken an Extended Event capture with callstack_rva filtered on the external wait PREEMPTIVE_OS_QUERYREGISTRY here . CREATE EVENT SESSION [External_Waits] ON SERVER ADD EVENT sqlos.wait_info_external( ACTION(package0.callstack_rva) WHERE ([wait_type]='PREEMPTIVE_OS_QUERYREGISTRY')) ADD TARGET package0.event_file(SET filename=N'External_Waits') Do you have any suggestions please? Thank you!
Raffaele Della Germania (23 rep)
Apr 16, 2025, 08:14 AM • Last activity: Apr 22, 2025, 03:24 PM
3 votes
1 answers
3254 views
WAIT STATS - high Wait Time on ASYNC_IO_COMPLETION
I currently look at my wait stats. I followed best practices explained by [Paul Randal][1] and go through the documents he references. Looking at the statistic slices I take once / day for a while now from all production servers, I see significant high values for Avg_Wait_S, meaning the relation of...
I currently look at my wait stats. I followed best practices explained by Paul Randal and go through the documents he references. Looking at the statistic slices I take once / day for a while now from all production servers, I see significant high values for Avg_Wait_S, meaning the relation of total Wait in Seconds divided by WaitCounts. The value for wait count is very low compared to all the other wait types (the lowest), but on each of my servers the AvgWait_S for this type is between 120 - 2493 seconds! This looks like a massive wait time. I read this can anyhow be caused by backups. But I wonder if this is kind of a 'normal' value? How to get more details about it? Here is a sample resultset of my wait stats: WAITTYPE Wait_S Resource_S Signal_S WaitCount Percentage AvgWait_S AvgRes_S AvgSig_S BACKUPIO 46151.80 45093.07 1058.74 22819151 26.55 0.0020 0.0020 0.0000 CXPACKET 45057.81 27926.59 17131.22 20387211 25.92 0.0022 0.0014 0.0008 BACKUPBUFFER 16658.83 15867.71 791.12 8993341 9.58 0.0019 0.0018 0.0001 PAGEIOLATCH_SH 15326.95 15284.65 42.30 2131848 8.82 0.0072 0.0072 0.0000 ASYNC_IO_COMPLETION 14203.17 14203.17 0.00 9 8.17 1578.1303 1578.1303 0.0000 WRITELOG 8570.20 8377.81 192.39 2136964 4.93 0.0040 0.0039 0.0001 PAGEIOLATCH_EX 7691.32 7673.31 18.01 1777069 4.43 0.0043 0.0043 0.0000 SOS_SCHEDULER_YIELD 4548.90 43.44 4505.47 13294384 2.62 0.0003 0.0000 0.0003 LCK_M_S 3018.76 3018.06 0.71 1704 1.74 1.7716 1.7712 0.0004 ASYNC_NETWORK_IO 2678.83 2147.78 531.05 4372825 1.54 0.0006 0.0005 0.0001 LCK_M_SCH_S 2612.09 2612.08 0.01 37 1.50 70.5971 70.5968 0.0003
Magier (4827 rep)
Nov 17, 2015, 12:46 PM • Last activity: Mar 12, 2025, 12:04 PM
3 votes
1 answers
103 views
What does it mean if all of my wait_resources in the System Health Extended Event are the same pointer?
It's notoriously impossible to decode [what the wait_resource pointer in the System Health Extended Event means](https://dba.stackexchange.com/questions/320594/identifying-wait-resource-for-wait-info-extended-events). But if all of my long waits in that Extended Event for a certain time period are r...
It's notoriously impossible to decode [what the wait_resource pointer in the System Health Extended Event means](https://dba.stackexchange.com/questions/320594/identifying-wait-resource-for-wait-info-extended-events) . But if all of my long waits in that Extended Event for a certain time period are reporting the same pointer, then does that indicate anything at all? I'm experiencing high PAGELATCH_SH waits for this time period and expect that the reason why all of pointers are the same will be related to that. I've not included the XML here because I am absolutely certain that it won't help answer the question. I'm just trying to fathom if there is ever a case where I should pay attention to the wait_resource value in this Extended Event when it is obviously a pointer.
J. Mini (1225 rep)
Mar 5, 2025, 06:28 PM • Last activity: Mar 11, 2025, 03:04 AM
0 votes
1 answers
363 views
Diagnose the exact cause of WAIT_ON_SYNC_STATISTICS_REFRESH
We are currently facing query timeout issues with our online API . The related request is an ad hoc select query referencing two tables. In Redgate, we see that each time the request is blocked, it's due to "WAIT_ON_SYNC_STATISTICS_REFRESH". I tried to match occurrence of blocking with ongoing stati...
We are currently facing query timeout issues with our online API . The related request is an ad hoc select query referencing two tables. In Redgate, we see that each time the request is blocked, it's due to "WAIT_ON_SYNC_STATISTICS_REFRESH". I tried to match occurrence of blocking with ongoing statistic refresh using requests on sys.stats or sys.dm_tran_locks but I can't find any clear relations. I can't find any updated statistic related to the request during the occurrence of blocking, neither I can manage to find CPU or I/O consumption issues. I know we could try to use the asynchronous statistic update strategy on the database, but I would like to understand the reasons of the problem first. Any idea would be welcome.
Nicolas Gicquel (1 rep)
Jun 24, 2022, 01:54 PM • Last activity: Feb 5, 2025, 03:06 PM
0 votes
0 answers
148 views
Blocking with Strange Wait Resource
My shop had two significant blocking storms yesterday with most of the blocking victims piling up behind Drop table commands in a user database for short life processing tables (coding practice that is under review). This wait resource has surprisingly few online references. I haven’t seen this one...
My shop had two significant blocking storms yesterday with most of the blocking victims piling up behind Drop table commands in a user database for short life processing tables (coding practice that is under review). This wait resource has surprisingly few online references. I haven’t seen this one before. **WaitText**
-none
LCK_M_SCH_M
**WaitResource**
-none              
METADATA: database_id = 18
ALL_AUDIT_SPECIFICATIONS_AND_ACTIONS(database_id = 18), lockPartitionId = 0
Any insights or resources on this wait resource would be appreciated. --- Audit scripts:
/****** Object:  Audit [DatabasePermissionsAudit]    Script Date: 12/18/2024 3:30:13 PM ******/

CREATE SERVER AUDIT [name1]

TO FILE

(              FILEPATH = N'C:\mypath\DatabaseAuditLogs\'

               ,MAXSIZE = 500 MB

               ,MAX_FILES = 5

               ,RESERVE_DISK_SPACE = OFF

) WITH (QUEUE_DELAY = 1000, ON_FAILURE = CONTINUE, AUDIT_GUID = 'b6c33b5a-93a0-419f-ad33-d0d12ce73f62')
USE [master]

GO

 

CREATE SERVER AUDIT SPECIFICATION [Name1]

FOR SERVER AUDIT [Name1]

ADD (DATABASE_ROLE_MEMBER_CHANGE_GROUP),

ADD (DATABASE_PERMISSION_CHANGE_GROUP),

ADD (DATABASE_PRINCIPAL_CHANGE_GROUP)

WITH (STATE = ON)

GO

 

 

USE [master]

GO

 



ALTER SERVER AUDIT [Name1] WITH (STATE = ON)

GO

 

 

/****** Object:  Audit [ServerPermissionsAudit]    Script Date: 12/18/2024 3:30:55 PM ******/

CREATE SERVER AUDIT [Name2]

TO FILE

(              FILEPATH = N'C:\mypath\ServerAuditLogs\'

               ,MAXSIZE = 500 MB

               ,MAX_FILES = 5

               ,RESERVE_DISK_SPACE = OFF

) WITH (QUEUE_DELAY = 1000, ON_FAILURE = CONTINUE, AUDIT_GUID = '966a14e5-6efe-4085-87af-3f48d61b5dad')

ALTER SERVER AUDIT [name2] WITH (STATE = ON)
CREATE SERVER AUDIT SPECIFICATION [Name2]

FOR SERVER AUDIT [Name2]

ADD (SERVER_ROLE_MEMBER_CHANGE_GROUP),

ADD (SERVER_PRINCIPAL_CHANGE_GROUP)

WITH (STATE = ON)

GO

GO
user901135 (11 rep)
Dec 18, 2024, 08:07 PM • Last activity: Dec 20, 2024, 02:24 AM
0 votes
2 answers
163 views
What wait types would terrible RCSI performance cause?
Read Committed Snapshot Isolation (RCSI) is well-understood. The key way that it can cause performance bottlenecks is if the version chains get too long. Are there any wait type that indicate this specific issue? Or is it only shown by other wait types (and if so, what?).
Read Committed Snapshot Isolation (RCSI) is well-understood. The key way that it can cause performance bottlenecks is if the version chains get too long. Are there any wait type that indicate this specific issue? Or is it only shown by other wait types (and if so, what?).
J. Mini (1225 rep)
Nov 15, 2024, 02:44 PM • Last activity: Nov 30, 2024, 12:42 PM
0 votes
1 answers
94 views
MySQL Amazon Aurora - why is CPU low while max vCPU is high?
In MySQL Amazon Aurora, is it normal for `CPU` to be be low while other metrics are above the `max vCPU` line? [![DB metrics][1]][1] [1]: https://i.sstatic.net/dUnGz.png
In MySQL Amazon Aurora, is it normal for CPU to be be low while other metrics are above the max vCPU line? DB metrics
mstrom (143 rep)
Mar 6, 2024, 07:00 PM • Last activity: Mar 7, 2024, 09:51 AM
-2 votes
1 answers
662 views
Huge PREEMPTIVE_OS_GETPROCADDRESS Waits From sp_OAMethod. How to debug?
Some queries are hitting my CPU extremely hard. `sp_WhoIsActive` reports that `sp_OAMethod` is the cause (the `sql_text` column points to it) and that it has of huge waits of the `PREEMPTIVE_OS_GETPROCADDRESS` wait type. Given that `sp_OAMethod` is a built-in stored procedure, how can I debug this?...
Some queries are hitting my CPU extremely hard. sp_WhoIsActive reports that sp_OAMethod is the cause (the sql_text column points to it) and that it has of huge waits of the PREEMPTIVE_OS_GETPROCADDRESS wait type. Given that sp_OAMethod is a built-in stored procedure, how can I debug this? I am on a 2019 version of SQL server, 15.0.something.
J. Mini (1225 rep)
Nov 22, 2023, 12:12 PM • Last activity: Dec 1, 2023, 10:56 PM
6 votes
4 answers
19446 views
Why decrease the wait_timeout configuration parameter in MySQL?
**Question** I'm working on an application that uses a connection pool in Apache Tomcat to communicate with MySQL. I was wondering why would you want to use a smaller wait_timeout than the default of 28,800 seconds? I see plenty of downsides to decreasing the wait timeout but what are the upsides? *...
**Question** I'm working on an application that uses a connection pool in Apache Tomcat to communicate with MySQL. I was wondering why would you want to use a smaller wait_timeout than the default of 28,800 seconds? I see plenty of downsides to decreasing the wait timeout but what are the upsides? **Background** The parameter wait_timeout is defined as > The number of seconds the server waits for activity on a > noninteractive connection before closing it. **Problem** If I make the wait_timeout too small then MySQL kills connections that are still valid in the connection pool my web application is using. In my situation I am the only person currently using the application at any time so naturally the db connection goes idle when I'm not testing it. I could solve this problem by simply extending the wait_timeout on MySQL to something very large so that even if I'm away from my desk for 2 weeks the pool never returns a dead connection. # QUESTIONS - Why would anyone ever want a shorter wait_time? - Is the overhead of storing connections that bad? - Although I understand connection pools alleviate the overhead of CREATING connections, is it such an issue to keep idle connections in memory?
Usman Mutawakil (163 rep)
May 2, 2013, 12:25 PM • Last activity: Mar 2, 2023, 10:28 PM
4 votes
2 answers
2854 views
Does a query that is suspended due to an ongoing ASYNC_NETWORK_IO cause blocking?
I am looking at the *dm_exec_sessions* and *requests* DMVs on a production server. There are some queries with status 'suspended', a duration of 12 minutes, and a wait type of `ASYNC_NETWORK_IO`. I understand this is because the client application isn't quick enough to fetch the data (or the result...
I am looking at the *dm_exec_sessions* and *requests* DMVs on a production server. There are some queries with status 'suspended', a duration of 12 minutes, and a wait type of ASYNC_NETWORK_IO. I understand this is because the client application isn't quick enough to fetch the data (or the result is so large that the client program is taking time to consume it). Does such a query (that is suspended due to an ongoing ASYNC_NETWORK_IO) cause blocking on this table? The query is a SELECT and I cannot see this query blocking any other query. Hence my question to ask is whether it can potentially block anything. For example: Does ASYNC_NETWORK_IO mean that execution is ongoing or execution has finished and the client app is pulling the data? In the latter case, I don't see the reason why this query could potentially block other queries given that it has already produced its result. The locks held by the query are as follows: - Page lock request mode S - Object lock request mode IS
variable (3590 rep)
Feb 2, 2023, 06:57 AM • Last activity: Feb 2, 2023, 12:28 PM
1 votes
0 answers
183 views
Which SQL Server performance condition DB alert can be used to find "RESOURCE_SEMAPHORE" waits?
I'm trying 2 SQL Server Alerts: SQL Server:Memory Manager / Memory Grants Pending / Waits in progress / rises above / Value: 5 SQL Server:wait Statistics / Memory grant queue waits / Waits in progress / rises above / Value: 5 This is SQL Server 2008 R2. (I know were building DEV env now for 2019) 8...
I'm trying 2 SQL Server Alerts: SQL Server:Memory Manager / Memory Grants Pending / Waits in progress / rises above / Value: 5 SQL Server:wait Statistics / Memory grant queue waits / Waits in progress / rises above / Value: 5 This is SQL Server 2008 R2. (I know were building DEV env now for 2019) 8 cores, 50 GB memory server, 25GB Max SQL memory size - soon to be 35GB, 3500-4000 sessions, 350GB DB/250GB TLOG - Manufacturing lines monitoring Application. Many parallel queries due to 1 billion row table. (I know... purge and archive is part of upgrade)
sherveyj (111 rep)
Oct 11, 2022, 08:58 PM • Last activity: Oct 12, 2022, 05:25 AM
2 votes
2 answers
3457 views
How to dig deeper into ASYNC_NETWORK_IO waits
I've been handed a graph that comes from SolarWinds showing top wait events broken down by day. The last 2 days there's a part of the daily color coded bar representing a total of 3 hours for ASYNC_NETWORK_IO. Still trying to see if I can get access to Solar Winds panels to try and see if it offers...
I've been handed a graph that comes from SolarWinds showing top wait events broken down by day. The last 2 days there's a part of the daily color coded bar representing a total of 3 hours for ASYNC_NETWORK_IO. Still trying to see if I can get access to Solar Winds panels to try and see if it offers a drill down into the graph or not. I've been searching through google, websites and documentation for the last 2 hours and finding little of consequence to tell me how to drill down into this 3 hours. Is it one large block, is it a few minutes here and there through out the day, does it match any particular window of high I/O activity (data warehouse refresh or something). All I have is "here, there's a total of 3 hours of this wait yesterday and the day before, now tell me why" - and I must admit I really don't have any idea on how to drill down into this any further. I've read all sorts of articles about "Well, it's usually a poorly designed application" or something along those lines. Funny this is of the last 7 days, the first 5 are all clear. No signs of this wait. Suddenly there's a huge chunk of it. I need to know how I can dig more. As far as I'm aware there are no user complaints about system performance. Is there a DMV or something to help out here ? Anyone able to give me some pointers? Thanks
SJWales (109 rep)
Aug 15, 2020, 12:48 AM • Last activity: Oct 11, 2022, 11:24 PM
35 votes
2 answers
16977 views
Find Queries that Are Causing Waits
Given a certain kind of wait, how do you find which queries are causing those waits safely on a production SQL 2008 R2 Server? In this particular case I am wondering about `async_network_io`.
Given a certain kind of wait, how do you find which queries are causing those waits safely on a production SQL 2008 R2 Server? In this particular case I am wondering about async_network_io.
Kyle Brandt (2335 rep)
Mar 4, 2011, 09:10 PM • Last activity: Jun 8, 2022, 08:01 AM
0 votes
2 answers
643 views
Does ASYNC_NETWORK_IO wait types result in thread locking until entire result is consumed by client application?
ASYNC_NETWORK_IO wait types are mostly caused because the client application cannot process the data that arrives from SQL Server fast enough. As the query executes, and as the results start getting available for consumption, the client application starts consuming the results. Assuming the result i...
ASYNC_NETWORK_IO wait types are mostly caused because the client application cannot process the data that arrives from SQL Server fast enough. As the query executes, and as the results start getting available for consumption, the client application starts consuming the results. Assuming the result is held by say 4 threads, then, until the result is entirely consumed by the client application, do the threads that hold the data remain locked?
variable (3590 rep)
May 30, 2022, 01:36 PM • Last activity: Jun 2, 2022, 06:21 AM
4 votes
1 answers
1458 views
Extended event - slow queries and waits
I have an extended event which filters on my slow queries. I have created the following script for it. CREATE EVENT SESSION [SlowQueriesAndStatementsLargerThan3Seconds] ON SERVER ADD EVENT sqlserver.rpc_completed(SET collect_statement=(1) ACTION(sqlserver.client_app_name,sqlserver.client_hostname,sq...
I have an extended event which filters on my slow queries. I have created the following script for it. CREATE EVENT SESSION [SlowQueriesAndStatementsLargerThan3Seconds] ON SERVER ADD EVENT sqlserver.rpc_completed(SET collect_statement=(1) ACTION(sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.database_id,sqlserver.database_name,sqlserver.nt_username,sqlserver.session_id,sqlserver.session_nt_username,sqlserver.sql_text,sqlserver.username) WHERE ([duration]>=(3000000))), ADD EVENT sqlserver.sp_statement_completed( ACTION(sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.database_name,sqlserver.nt_username,sqlserver.server_principal_name,sqlserver.session_id) WHERE ([duration]>=(3000000))), ADD EVENT sqlserver.sql_batch_completed( ACTION(sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.database_id,sqlserver.database_name,sqlserver.nt_username,sqlserver.session_id,sqlserver.session_nt_username,sqlserver.sql_text,sqlserver.username) WHERE ([duration]>=(3000000))) ADD TARGET package0.event_file(SET filename=N'SlowQueriesAndStatementsLargerThan3Seconds'), ADD TARGET package0.ring_buffer WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=ON) GO I would like to include where my query is waiting for when it is being slow. I would like to include the wait_info for it, however when I add this, my trace fills up with unrelated waits for other queries. To trace the waits I've added the following event: CREATE EVENT SESSION [SlowQueriesAndStatementsLargerThan3Seconds] ON SERVER ADD EVENT sqlos.wait_info( ACTION(sqlos.task_elapsed_quantum,sqlos.task_time,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.database_id,sqlserver.database_name,sqlserver.session_id,sqlserver.sql_text) WHERE ([opcode]='End')), ADD EVENT sqlserver.rpc_completed(SET collect_statement=(1) ACTION(sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.database_id,sqlserver.database_name,sqlserver.nt_username,sqlserver.session_id,sqlserver.session_nt_username,sqlserver.sql_text,sqlserver.username) WHERE ([duration]>=(3000000))), ADD EVENT sqlserver.sp_statement_completed( ACTION(sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.database_name,sqlserver.nt_username,sqlserver.server_principal_name,sqlserver.session_id) WHERE ([duration]>=(3000000))), ADD EVENT sqlserver.sql_batch_completed( ACTION(sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.database_id,sqlserver.database_name,sqlserver.nt_username,sqlserver.session_id,sqlserver.session_nt_username,sqlserver.sql_text,sqlserver.username) WHERE ([duration]>=(3000000))) ADD TARGET package0.event_file(SET filename=N'SlowQueriesAndStatementsLargerThan3Seconds'), ADD TARGET package0.ring_buffer WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=ON) GO Is it possible to have the waits included for only the queries which ran for in this case 3 seconds or longer?
Peter (2540 rep)
May 31, 2022, 06:28 AM • Last activity: May 31, 2022, 05:46 PM
0 votes
1 answers
303 views
XE_SERVICES_RWLOCK in top waits
We start facing some uncommon waits after the last release. `XE_SERVICES_RWLOCK` waits starts to appear in our top waits. Investigation shows that this waits happen during insert to our main OLTP tables. The only thing we change in this part is that we change the insert into identity column by inser...
We start facing some uncommon waits after the last release. XE_SERVICES_RWLOCK waits starts to appear in our top waits. Investigation shows that this waits happen during insert to our main OLTP tables. The only thing we change in this part is that we change the insert into identity column by inserting with sequence. Actually we have not removed identity option from the column. Rather we are just doing SET IDENTITY_INSERT ON INSERT STATEMENT HERE SET IDENTITY INSERT OFF The reason that we have not removed the identity property was that it requires to rebuild the table, but our table is huge and it will require hours of downtime. The question is what are XE_SERVICES_RWLOCK waits, can they appear in case of massive SET IDENTITY_INSERT ON|OFF statements or can the root case be using sequence instead of identity? We run sp_WhoIsActive every 30 sec and save result into tables to be able to investigate problems later. You can see results from that table attached. enter image description here We are using SQL Server 2016 SP2.
Artashes Khachatryan (1533 rep)
Dec 8, 2021, 11:51 AM • Last activity: Dec 9, 2021, 05:59 PM
4 votes
3 answers
15294 views
Postgres 10 Query waits on IO: BufFileWrite causing failure to get new database connections
I have a query (attached with query plan) that is run every 5-7 seconds from every device that is running our dashboard (typically, 500+ devices at peak time). This query, at the outset looks like it spends time in wait state `IO:BufFileWrite`. From AWS Aurora Performance Insights dashboard, one can...
I have a query (attached with query plan) that is run every 5-7 seconds from every device that is running our dashboard (typically, 500+ devices at peak time). This query, at the outset looks like it spends time in wait state IO:BufFileWrite. From AWS Aurora Performance Insights dashboard, one can see that the query in question spends more time in IO: BUfFileWrite wait state (sky blue color in the graph) Performance_Insights **Postgres Configuration / Details:** 1. AWS Aurora PostgreSQL 10.6 1. R5.4X Large instance (128 GB RAM) 1. work_mem = 80MB 1. I use a hikari connection pool to manage the DB connections. A week earlier, I started seeing many errors on my app server: > Failed to get database connection errors and/or > java.sql.SQLTimeoutException: Timeout after 30000ms of waiting for a connection A little debugging and with help of PGAdmin, saw that most of the connections were waiting on IO: BufFileWrite and hence realised that the default **4MB** work_mem was not enough for the query. PGAdmin stats With the help of PgBadger, saw that the average temp file size was 70MB and hence updated the work_mem to 80 MB. Temp_File_Pg_Badger This definitely helped, I started seeing lesser DB connections issue, but it did not go away completely. I used Postgres Visualize Analyzer and Explain Depesz to understand the query plan and saw that an index only scan has a **Total Cost of 4984972.45 and Plan Rows 111195272.** This table (students_points) actually has 100M+ rows and is ~15GB in size and is not partitioned. I tried adding a partial index ( create index students_latest_point_idx ON students_points (student_id, evaluation_date desc) ), in the hope that the cost of the above scan would improve, but in vain. I have run VACUUM ( FULL, ANALYZE, VERBOSE); and REINDEX on the tables involved, but no visible performance improvement. I need help with the following 1. What does the never executed part of the query plan mean? - I have checked the literature on the web, but no satisfactory explanation apart from Postgres engine thinks that it's not relevant / returned 0 rows. 1. Should I look at/be worried at **Total Cost of 4984972.45 and Plan Rows 111195272.** from the query plan, even though it says **never executed**? 1. What would lead to excessive time spent in the wait state BufFileWrite? From what I understand, when a sort/filter is being applied, temp files are used and this shows up as the BufFileWrite wait state. Monitoring_Postgres 1. Where would you advise me to start with, to reduce the time spent by the query in the IO wait state of BufFileWrite? - I have tried, Vacuum, Reindex, adding new partial index - but didn't help. 1. One thing on my mind is, instead of using the students_points table (which has 1 row, for every student, for every test, he takes every week, over 4 years) so it builds up fast, create a new table that will hold only the latest points for every student (hence only as many rows as there are students) and use that in the query. Any help is appreciated. Thanks in advance. If you need any more information, I would be happy to provide. ### The query and the plan
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, TIMING, SUMMARY) 
SELECT 
    SUM(CASE WHEN just_pass = 1 THEN 1 ELSE 0 END) point,
    SUM(CASE WHEN week_of_birthday = TRUE THEN 1 ELSE 0 END) birthday, 
    SUM(CASE WHEN fresh_grad = 1 THEN 1 ELSE 0 end) fresher, 
    SUM(CASE WHEN intervention_type::INT = 2 OR intervention_type::INT = 3 THEN 1 ELSE 0 END) attention 
FROM 
    (
        SELECT 
            checkins.student_id, intercepts.intervention_type ,max(evaluation_date), just_pass, 
            compute_week_of_birthday(student_birthdate, 4 , 'US/Central') as week_of_birthday, 
            CASE 
                WHEN student_enrolment_date NOTNULL AND student_enrolment_date >= '2017-01-29' AND student_enrolment_date  '2019-06-11 01:00:40' AND timestamp=18 
        AND 
            lower(registration_type_description) !~* '.*temporary.*' 
        GROUP BY 
            checkins.student_id, students.student_enrolment_date, student_birthdate, just_pass, intercepts.intervention_type
    ) AS result 
WHERE 
    max IN 
        (
            SELECT 
                evaluation_date 
            FROM 
                students_points 
            ORDER BY 
                evaluation_date DESC LIMIT 1
        ) 
        OR 
    max ISNULL;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                      QUERY PLAN                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=74433.83..74433.84 rows=1 width=32) (actual time=0.081..0.081 rows=1 loops=1)
   Output: sum(CASE WHEN (result.just_pass = 1) THEN 1 ELSE 0 END), sum(CASE WHEN result.week_of_birthday THEN 1 ELSE 0 END), sum(CASE WHEN (result.fresh_grad = 1) THEN 1 ELSE 0 END), sum(CASE WHEN (((result.intervention_type)::integer = 2) OR ((result.intervention_type)::integer = 3)) THEN 1 ELSE 0 END)
   Buffers: shared hit=20
   ->  Subquery Scan on result  (cost=74412.92..74432.98 rows=34 width=15) (actual time=0.079..0.079 rows=0 loops=1)
         Output: result.student_id, result.intervention_type, result.max, result.just_pass, result.week_of_birthday, result.fresh_grad, students.student_enrolment_date, students.student_birthdate
         Filter: ((hashed SubPlan 1) OR (result.max IS NULL))
         Buffers: shared hit=20
         ->  GroupAggregate  (cost=74412.31..74431.52 rows=68 width=35) (actual time=0.079..0.079 rows=0 loops=1)
               Output: checkin_table.student_id, intervention_table.intervention_type, max(points.evaluation_date), points.just_pass, compute_week_of_birthday(students.student_birthdate, 4, 'US/Central'::text), CASE WHEN ((students.student_enrolment_date IS NOT NULL) AND (students.student_enrolment_date >= '2017-01-29'::date) AND (students.student_enrolment_date   Sort  (cost=74412.31..74412.48 rows=68 width=30) (actual time=0.078..0.078 rows=0 loops=1)
                     Output: checkin_table.student_id, intervention_table.intervention_type, points.just_pass, students.student_enrolment_date, students.student_birthdate, points.evaluation_date
                     Sort Key: checkin_table.student_id, students.student_enrolment_date, students.student_birthdate, points.just_pass, intervention_table.intervention_type
                     Sort Method: quicksort  Memory: 25kB
                     Buffers: shared hit=20
                     ->  Nested Loop Left Join  (cost=70384.64..74410.24 rows=68 width=30) (actual time=0.035..0.035 rows=0 loops=1)
                           Output: checkin_table.student_id, intervention_table.intervention_type, points.just_pass, students.student_enrolment_date, students.student_birthdate, points.evaluation_date
                           Buffers: shared hit=6
                           ->  Nested Loop Left Join  (cost=70384.08..74151.91 rows=1 width=22) (actual time=0.035..0.035 rows=0 loops=1)
                                 Output: checkin_table.student_id, intervention_table.intervention_type, students.student_birthdate, students.student_enrolment_date
                                 Buffers: shared hit=6
                                 ->  Nested Loop  (cost=8.90..25.46 rows=1 width=16) (actual time=0.034..0.034 rows=0 loops=1)
                                       Output: checkin_table.student_id, students.student_birthdate, students.student_enrolment_date
                                       Buffers: shared hit=6
                                       ->  Group  (cost=8.46..8.47 rows=2 width=8) (actual time=0.034..0.034 rows=0 loops=1)
                                             Output: checkin_table.student_id
                                             Group Key: checkin_table.student_id
                                             Buffers: shared hit=6
                                             ->  Sort  (cost=8.46..8.47 rows=2 width=8) (actual time=0.033..0.033 rows=0 loops=1)
                                                   Output: checkin_table.student_id
                                                   Sort Key: checkin_table.student_id
                                                   Sort Method: quicksort  Memory: 25kB
                                                   Buffers: shared hit=6
                                                   ->  Append  (cost=0.00..8.45 rows=2 width=8) (actual time=0.027..0.027 rows=0 loops=1)
                                                         Buffers: shared hit=6
                                                         ->  Seq Scan on public.checkin_table  (cost=0.00..0.00 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=1)
                                                               Output: checkin_table.student_id
                                                               Filter: ((checkin_table.checkin_time > '2019-06-11 01:00:40+00'::timestamp with time zone) AND (checkin_table.checkin_time   Index Scan using checkins_y2019_m6_house_id_timestamp_idx on public.checkins_y2019_m6  (cost=0.43..8.45 rows=1 width=8) (actual time=0.024..0.024 rows=0 loops=1)
                                                               Output: checkins_y2019_m6.student_id
                                                               Index Cond: ((checkins_y2019_m6.house_id = 9001) AND (checkins_y2019_m6.checkin_time > '2019-06-11 01:00:40+00'::timestamp with time zone) AND (checkins_y2019_m6.checkin_time   Index Scan using students_student_id_idx on public.students  (cost=0.43..8.47 rows=1 width=16) (never executed)
                                             Output: students.student_type, students.house_id, students.registration_id, students.registration_status, students.registration_type_status, students.total_non_core_subjects, students.registration_source, students.total_core_subjects, students.registration_type_description, students.non_access_flag, students.address_1, students.address_2, students.city, students.state, students.zipcode, students.registration_created_date, students.registration_activation_date, students.registration_cancellation_request_date, students.registration_termination_date, students.cancellation_reason, students.monthly_dues, students.student_id, students.student_type, students.student_status, students.student_first_name, students.student_last_name, students.email_address, students.student_enrolment_date, students.student_birthdate, students.student_gender, students.insert_time, students.update_time
                                             Index Cond: (students.student_id = checkin_table.student_id)
                                             Filter: ((lower((students.registration_type_description)::text) !~* '.*temporary.*'::text) AND (date_part('year'::text, age((CURRENT_DATE)::timestamp with time zone, (students.student_birthdate)::timestamp with time zone)) >= '18'::double precision))
                                 ->  Nested Loop  (cost=70375.18..74126.43 rows=2 width=14) (never executed)
                                       Output: intervention_table.intervention_type, intervention_table.student_id
                                       Join Filter: (intervention_table.intervention_date = (max(intervention_table_1.intervention_date)))
                                       ->  HashAggregate  (cost=70374.75..70376.75 rows=200 width=8) (never executed)
                                             Output: (max(intervention_table_1.intervention_date))
                                             Group Key: max(intervention_table_1.intervention_date)
                                             ->  HashAggregate  (cost=57759.88..63366.49 rows=560661 width=16) (never executed)
                                                   Output: max(intervention_table_1.intervention_date), intervention_table_1.student_id
                                                   Group Key: intervention_table_1.student_id
                                                   ->  Seq Scan on public.intervention_table intervention_table_1  (cost=0.00..46349.25 rows=2282125 width=16) (never executed)
                                                         Output: intervention_table_1.record_id, intervention_table_1.student_id, intervention_table_1.intervention_type, intervention_table_1.intervention_date, intervention_table_1.house_id, intervention_table_1.teacher_id, intervention_table_1.expiration_date, intervention_table_1.point_at_intervention
                                       ->  Index Scan using intervention_table_student_id_idx on public.intervention_table  (cost=0.43..18.70 rows=4 width=22) (never executed)
                                             Output: intervention_table.record_id, intervention_table.student_id, intervention_table.intervention_type, intervention_table.intervention_date, intervention_table.house_id, intervention_table.teacher_id, intervention_table.expiration_date, intervention_table.point_at_intervention
                                             Index Cond: (checkin_table.student_id = intervention_table.student_id)
                           ->  Index Scan using students_latest_points_idx on public.students_points points  (cost=0.57..257.65 rows=68 width=16) (never executed)
                                 Output: points.record_id, points.student_id, points.registration_id, points.house_id, points.evaluation_date, points.just_pass, points.five_star, points.star1, points.star2, points.star3, points.star4, points.updatedate
                                 Index Cond: (checkin_table.student_id = points.student_id)
         SubPlan 1
           ->  Limit  (cost=0.57..0.61 rows=1 width=4) (never executed)
                 Output: students_points.evaluation_date
                 ->  Index Only Scan Backward using students_points_evaluation_date_idx on public.students_points  (cost=0.57..4984972.45 rows=111195272 width=4) (never executed)
                       Output: students_points.evaluation_date
                       Heap Fetches: 0
 Planning time: 23.993 ms
 Execution time: 17.648 ms
(72 rows)
PS: The names of the table and the attributes are replaced for privacy concerns. At the outset, it seems like I can partition the students_points table by year, but that is not an option that the team is open to for reasons I can't specify and it does not make sense to partition it based on the year, since most of our joins are on student_id and partitioning on student_id would lead to 1M+ partitions. Edited to address Jjanes comment . 1. **checkin_table seems to be empty** - checkin_table is a partitioned table. The query actually hits the partition - checkins_y2019_m6, which actually has data. 1. **What led you to think this query was the culprit?** - When using PGBadger at the peak time, see that 30 out of the 40 DB connections are in the wait state. Looking at the queries of these connections, it's the same query described above - but with different combinations of house_id and checkin_time. Also, from the RDS insights, (image 1 above), if you look at the bottom portion of the screenshot, it has bar graphic, under the **Load By Waits (AAS)** and you can see that 2/3rds of the bar graph is light blue color (IOWait) and 1/3 is Green (CPU) and the corresponding query. Look at the attached pgbadger view (redacted the query details). This query is the most time consuming query. Redacted_PgBadger_Long_Running_Query 1. **pg_stat_statements** - Yes I have had a look at it And this is the top query on total_time desc, which concurs with the PG Badger one. pg_stat_statements 1. **auto_explain** looks doable. Just one question - would it hamper the performance in any way? 1. **Regarding IO Churn and slowest queries** - I agree, but I am hitting dead ends and run out of ideas. I may be misinterpreting things, like you pointed out. I am not looking at all queries writing to temporary files, and that might be hogging the buffers, resulting in an IOWait here. **Edited to add solution:** The backend team re-wrote the sql query and the performance improved. This brought down the query execution time to milliseconds.
gvatreya (201 rep)
Jun 19, 2019, 03:36 PM • Last activity: Mar 12, 2021, 06:54 AM
2 votes
1 answers
2526 views
How to determine right value for innodb_sync_array_size
How do I determine the right value for innodb_sync_array_size in MySQL? We have version 5.7 deployed on AWS RDS with 32 vCPUs, and it sees about 26 concurrent threads on average. When I measure waits using: ``` show engine innodb mutex```, the total number of waits increases by about 143k / hour. Th...
How do I determine the right value for innodb_sync_array_size in MySQL? We have version 5.7 deployed on AWS RDS with 32 vCPUs, and it sees about 26 concurrent threads on average. When I measure waits using:
show engine innodb mutex
, the total number of waits increases by about 143k / hour. The MySQL documentation does say that a higher number will help with high concurrency and wait situation, but I cannot find any documentation on what to set this number and how to determine whether this will indeed help me.
Shahid Thaika (223 rep)
Aug 29, 2020, 02:03 AM • Last activity: Feb 25, 2021, 12:48 AM
3 votes
1 answers
2481 views
High ASYNC_NETWORK_IO when user locks computer with Excel open
I have a user who gets in at 8AM and leaves at 6PM every day, and every day I see the same strange pattern in SolarWinds DPA. It seems that Excel is refreshing the Query Table when the user locks their screen. My questions are: 1. Is there any way to stop Excel from doing that? I can't be the only D...
I have a user who gets in at 8AM and leaves at 6PM every day, and every day I see the same strange pattern in SolarWinds DPA. It seems that Excel is refreshing the Query Table when the user locks their screen. My questions are: 1. Is there any way to stop Excel from doing that? I can't be the only DBA who has fought such problems. 2. Is there any way I can reproduce the problem on my machine? 3. Help finding Google keywords to search for a known issue... I've tried unsuccessfully High Async Network IO from Excel *UPDATE* It occurred to me there could be other factors at play here, so here goes: 1. The file is on a Network-Attached Storage Drive shared among a work group. 2. I looked into various Network Interface Card powersaving feature issues that could be causing Excel to think the file handle is lost (???) 3. "Refresh data when opening the file" is checked under Connection Properties 4. This may be occurring on a Virtual Machine. I have found one other user with a similar problem: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/bcc35121-1575-4a7f-b82f-1c20f6fed58d/process-blocked-by-asyncnetworkio-status-blocks-other-queries?forum=sqldatabaseengine
John Zabroski (485 rep)
Jul 11, 2017, 05:06 PM • Last activity: Nov 24, 2020, 02:16 AM
Showing page 1 of 20 total questions