Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

3 votes

1 answers

103 views

What does it mean if all of my wait_resources in the System Health Extended Event are the same pointer?

sql-server extended-events sql-server-2022 waits latch

It's notoriously impossible to decode [what the wait_resource pointer in the System Health Extended Event means](https://dba.stackexchange.com/questions/320594/identifying-wait-resource-for-wait-info-extended-events). But if all of my long waits in that Extended Event for a certain time period are r...

                                  It's notoriously impossible to decode [what the wait_resource pointer in the System Health Extended Event means](https://dba.stackexchange.com/questions/320594/identifying-wait-resource-for-wait-info-extended-events) . But if all of my long waits in that Extended Event for a certain time period are reporting the same pointer, then does that indicate anything at all?

I'm experiencing high PAGELATCH_SH waits for this time period and expect that the reason why all of pointers are the same will be related to that. I've not included the XML here because I am absolutely certain that it won't help answer the question. I'm just trying to fathom if there is ever a case where I should pay attention to the wait_resource value in this Extended Event when it is obviously a pointer.

J. Mini (1225 rep)

Mar 5, 2025, 06:28 PM • Last activity: Mar 11, 2025, 03:04 AM

1 votes

1 answers

118 views

What is an example to understand that having multiple data files helps with latch pages in parallel?

sql-server filegroups files latch

> You will always benefit from multiple data files, because then SQL > Server is able to latch multiple system pages in parallel in the > Buffer Pool. Link: https://www.sqlpassion.at/archive/2016/08/29/files-and-file-groups-in-sql-server/ I currently have one data file. I'm exploring the use case of...

                                  > You will always benefit from multiple data files, because then SQL
> Server is able to latch multiple system pages in parallel in the
> Buffer Pool.

Link: https://www.sqlpassion.at/archive/2016/08/29/files-and-file-groups-in-sql-server/ 

I currently have one data file. I'm exploring the use case  of adding a new data file, or increasing the size of an existing data file.

Given that my IO subsystem (Data drive) can withstand high IOPS, I understand that there is no performance improvement whether I have one or more data files. And because there is no limit to how many threads can use the data file for read/write in parallel, SQL server must allow parallel latch of pages.

So then, why does the author say that - having multiple data files gives the  Server ability latch multiple system pages in parallel in the Buffer Pool.

Why would this not be possible with single data file?

variable (3590 rep)

Jan 17, 2024, 07:07 PM • Last activity: Feb 1, 2024, 10:32 AM

2 votes

1 answers

13075 views

Non-yielding Scheduler Error with dump caused by heavy queries on SQL 2016

sql-server sql-server-2016 dump timeout latch

After bringing our databases to new servers with new hardware and new os (2012 server) and new SQL Server version (2016 sp1) we face the following issue: During nightly reporting database update runs, the SQL Server service suddenly gets into trouble. Those report DB updates are performed using `mer...

                                  After bringing our databases to new servers with new hardware and new os (2012 server) and new SQL Server version (2016 sp1) we face the following issue:

During nightly reporting database update runs, the SQL Server service suddenly gets into trouble. Those report DB updates are performed using merge statements that collect, update and insert delta within multiple steps. The first step of the job started at 01:00 AM and caused around 500 Mio IO READS, but it succeeded. The second step started at 01:31 am but failed. It's all not highly efficient, I know and I also know there are still lots of indexes that have to be created to make those queries and the whole process more efficient. But this is not the point. Nobody cares if it would take some hours.

The point is that so far on the old SQL Server systems, those same procedures took long, but they never made the SQL Server stumble. 

Now, we observe the following:
The jobs start and run for a while, around 30 minutes and complete some job steps. During 01:35 and 01:55 AM the CPU load gets close to or to 100%. Between 01:55 AM and 02:11 AM the load varies btw 70% an 95%. 

In the SQL logs we see those entries btw 01.35 and 01:53.: 

   

>  **Unable to get thread context for spid 0
>  BEGIN STACK DUMP:

>  05/18/17 01:35:02 spid 5416

>   Non-yielding Scheduler

>   Stack Signature for the dump is 0x0000000000000266

>   External dump process return code 0x20000001.

>   External dump process returned no errors.

>     
>  Process 62:0:7 (0x3ab4) Worker 0x00000066A0FD8160 appears to be non-yielding on Scheduler 0. Thread creation time: 13139533134164.
> Approx Thread CPU Used: kernel 0 ms, user 50156 ms. Process
> Utilization 97%. System Idle 1%. Interval: 73484 ms.
>     
>  Process 62:0:6 (0x3b64) Worker 0x00000066A0FBC160 appears to be non-yielding on Scheduler 1. Thread creation time: 13139503419122.
> Approx Thread CPU Used: kernel 0 ms, user 57343 ms. Process
> Utilization 95%. System Idle 3%. Interval: 78640 ms.
> 
> Process 0:0:0 (0x2464) Worker 0x00000066A0E42160 appears to be
> non-yielding on Scheduler 2. Thread creation time: 13139456436014.
> Approx Thread CPU Used: kernel 15 ms, user 62515 ms. Process
> Utilization 94%. System Idle 4%. Interval: 71437 ms.
> 
> Process 62:0:5 (0x1688) Worker 0x00000066A0E46160 appears to be
> non-yielding on Scheduler 3. Thread creation time: 13139225220768.
> Approx Thread CPU Used: kernel 0 ms, user 53156 ms. Process
> Utilization 95%. System Idle 3%. Interval: 73437 ms.
> 
> SQL Trace ID 2 was started by login "mydomain\mySentryOneServiceUser".
> 
>  I/O is frozen on database SDB. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
>     
> A time-out occurred while waiting for buffer latch -- type 3, bp 000000751B746240, page 4:218376, stat 0x10f, database id: 2,
> allocation unit Id: 6488064, task 0x00000066A0FEE8C8 : 5, waittime 300
> seconds, flags 0x1000000019, owning task 0x00000066A0DF5848. Not
> continuing to wait.

At 01:53 the netapp SnapManager tasks start to fail taking snapshots:

> Error: 3041, Severity: 16, State: 1. BACKUP failed to complete the
> command BACKUP DATABASE DB1. Check the backup application log for
> detailed messages.

    
Between  the same timeslots, Job Agent log shows this:   
01:36 AM - 02:08 AM 

>  SQLServer Error: 65535, SQL Server Network Interfaces: Error
> Locating Server/Instance Specified [xFFFFFFFF]. [SQLSTATE 08001] 

>  ODBC Error: 0, Login timeout expired [SQLSTATE HYT00] 

>  SQLServer Error: 65535, A network-related or instance-specific error has
> occurred while establishing a connection to SQL Server. Server is not
> found or not accessible. Check if instance name is correct and if SQL
> Server is configured to allow remote connections. For more information
> see SQL Server Books Online. [SQLSTATE 08001] 

>   Logon to server 'xxx\xxx' failed (ConnUpdateStartExecutionDate)

Most of them repeat multiple times per minute.

This is the CPU spec: INTEL XEON E5-3690 v3 30M 4x 2,6GHz 
4 virtual CPUs
Software: Microsoft SQL Server Enterprise (64-bit) 13.0.4001.0
Instance Settings: All settings are default, MAXDOP = 4, CTFP = 50, system memory=64 GB, SQL MAX_MEMORY = 58982

CONTENT OF DUMP.txt:

    Current time is 01:35:02 05/18/17.                                                                               
    
    BugCheck Dump                                                                                             
                                                                                                                     
    This file is generated by Microsoft SQL Server                                                                   
    version 13.0.4001.0                                                                                              
    upon detection of fatal unexpected error. Please return this file,                                               
    the query or program that produced the bugcheck, the database and                                                
    the error log, and any other pertinent information with a Service Request.                                       
                                                                                                                     
                                                                                                                     
    Computer type is Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz.                                                      
    Bios Version is INTEL  - 6040000                                                                                 
    PhoenixBIOS 4.0 Release 6.0                                                                                      
    4 X64 level 8664, 2 Mhz processor (s).                                                                           
    Windows NT 6.2 Build 9200 CSD .                                                                                  
                                         
    Memory                               
    MemoryLoad = 94%                     
    Total Physical = 65535 MB            
    Available Physical = 3317 MB         
    Total Page File = 73535 MB           
    Available Page File = 10601 MB       
    Total Virtual = 134217727 MB         
    Available Virtual = 134041378 MB     
    ***Unable to get thread context for spid 0                                                                       
    * *******************************************************************************                                
    *                                                                                                                
    * BEGIN STACK DUMP:                                                                                              
    *   05/18/17 01:35:02 spid 5416                                                                                  
    *                                                                                                                
    * Non-yielding Scheduler                                                                                         
    *                                                                                                                
    * *******************************************************************************                                


What can I do to prevent the system from running into trouble and getting all those error log entries? Is there configuration to check?








                                

Magier (4827 rep)

May 18, 2017, 09:55 AM • Last activity: Mar 3, 2023, 01:02 PM

5 votes

1 answers

454 views

NOLOCK or Read Uncommitted locking / latching behaviours

sql-server locking isolation-level latch

I've read many, many different takes on how NOLOCK or the read uncommitted isolation levels function in terms of locks / latches taken. When using a SELECT with NOLOCK or in the Read Uncommitted Isolation level, is the only lock taken out a Schema Stability Lock, or do shared locks get taken on a ro...

                                  I've read many, many different takes on how NOLOCK or the read uncommitted isolation levels function in terms of locks / latches taken.

When using a SELECT with NOLOCK or in the Read Uncommitted Isolation level, is the only lock taken out a Schema Stability Lock, or do shared locks get taken on a rolling basis as the query works through the rows? (Obviously these locks would need to be dropped straight away) 

What about latches? How are the pages in memory dealt with as I assume referencing an in-memory object that's in the middle of being modified isn't allowed?

George.Palacios (5591 rep)

Jul 5, 2019, 12:30 PM • Last activity: Feb 12, 2022, 05:38 PM

6 votes

1 answers

4493 views

DBCC CHECKDB on a user database: Time-out occurred while waiting for buffer latch type 2 for page (X:XXX), database ID 2

sql-server tempdb dbcc-checkdb timeout latch

Our Ola Hallengren IntegrityCheck job failed due to a buffer latch timeout while running DBCC CHECKDB on a user database. However, the buffer latch timeout reported was in TempDB (database ID 2). Output from the job: ``` Date and time: 2022-01-22 09:04:15 [SQLSTATE 01000] Database context: [master]...

Date and time: 2022-01-22 09:04:15 [SQLSTATE 01000]
Database context: [master] [SQLSTATE 01000]
Command: SET LOCK_TIMEOUT 600000; DBCC CHECKDB ([SentryOne]) WITH NO_INFOMSGS, ALL_ERRORMSGS, MAXDOP = 4 [SQLSTATE 01000]
Msg 845, Sev 17, State 1, Line 1 : Time-out occurred while waiting for buffer latch type 2 for page (6:222), database ID 2. [SQLSTATE 42000]
Outcome: Failed [SQLSTATE 01000]
Duration: 12:40:32 [SQLSTATE 01000]
Date and time: 2022-01-22 21:44:47 [SQLSTATE 01000]

Messages in the SQL Error Log:

Date 1/22/2022 9:35:22 PM
Log SQL Server (Archive #1 - 1/23/2022 12:00:00 AM)

Source spid777

Message
A time-out occurred while waiting for buffer latch -- type 2, bp 0000016D7A1DE340, page 6:222, stat 0x40d, database id: 2, allocation unit Id: 536870912/281475513581568, task 0x0000016CD624E4E8 : 2, waittime 300 seconds, flags 0x1a, owning task 0x0000016CD624E4E8. Not continuing to wait.

BobMgr::GetBuf: Sort Big Output Buffer write not complete after 60 seconds.

DBCC CHECKDB (SentryOne) WITH all_errormsgs, no_infomsgs, maxdop = 4 executed by NT SERVICE\SQLSERVERAGENT terminated abnormally due to error state 6. Elapsed time: 12 hours 39 minutes 57 seconds.

[INFO] Database ID: . Cleaning up StorageArray. LastClosedCheckpointEndTs: '1225176490'

[WARNING] ALTER or DROP TABLE could not clean up root row within 10 seconds.

The page in question is an IAM page in TempDB. DBCC PAGE output for it:

PAGE: (6:222)

BUFFER:

BUF @0x0000016D6B622C40

bpage = 0x00000161A6C52000 bPmmpage = 0x0000000000000000 bsort_r_nextbP = 0x0000016D6B622AD0
bsort_r_prevbP = 0x0000016D6B622B80 bhash = 0x0000000000000000 bpageno = (6:222)
bpart = 4 ckptGen = 0x0000000000000000 bDirtyRefCount = 0
bstat = 0x9 breferences = 3 berrcode = 0
bUse1 = 19321 bstat2 = 0x0 blog = 0x215a215a
bsampleCount = 1 bIoCount = 0 resPoolId = 0
bcputicks = 330 bReadMicroSec = 848 bDirtyContext = 0x0000000000000000
bDbPageBroker = 0x0000000000000000 bdbid = 2 bpru = 0x0000016D3A170040

PAGE HEADER:

Page @0x00000161A6C52000

m_pageId = (6:222) m_headerVersion = 1 m_type = 10
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 1837007664 m_indexId (AllocUnitId.idInd) = 28675
Metadata: AllocUnitId = 8071415347312328704
Metadata: PartitionId = 2162691495132069888 Metadata: IndexId = 0
Metadata: ObjectId = -1083263203 m_prevPage = (0:0) m_nextPage = (0:0)
pminlen = 90 m_slotCnt = 2 m_freeCnt = 6
m_freeData = 8182 m_reservedCnt = 0 m_lsn = (324:1435184:42)
m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0
m_tornBits = -551517712 DB Frag ID = 1

Allocation Status

GAM (6:2) = ALLOCATED SGAM (6:3) = ALLOCATED
PFS (6:1) = 0x70 IAM_PG MIXED_EXT ALLOCATED 0_PCT_FULL DIFF (6:6) = NOT CHANGED
ML (6:7) = NOT MIN_LOGGED

IAM: Header @0x00000036C4278064 Slot 0, Offset 96

sequenceNumber = 0 status = 0x0 objectId = 0
indexId = 0 page_count = 0 start_pg = (8:511232)

IAM: Single Page Allocations @0x00000036C427808E

Slot 0 = (0:0) Slot 1 = (0:0) Slot 2 = (0:0)
Slot 3 = (0:0) Slot 4 = (0:0) Slot 5 = (0:0)
Slot 6 = (0:0) Slot 7 = (0:0)

IAM: Extent Alloc Status Slot 1 @0x00000036C42780C2

(8:511232) - (8:592656) = NOT ALLOCATED
(8:592664) - = ALLOCATED
(8:592672) - (8:770424) = NOT ALLOCATED

I found this DBA StackExchange question which seems to be most closely relevant: https://dba.stackexchange.com/questions/117864/getting-error-time-out-occurred-while-waiting-for-buffer-latch-type-2-for-page However, in that scenario, the timed-out page reported is in a user database and not TempDB. Also, we're on the latest SQL Server version (2019-CU14 Enterprise) so this should not be happening anyway. I did notice that there was very high latency on TempDB data files around 6 PM and 9:40 PM. I'm unsure whether it's related, though, because the latency spike at 6 PM is higher and yet the job failed only around 9:44 PM.

These latency spikes do not correlate with any spikes in the Azure Monitor metrics VM Cached IOPS Consumed Percentage and VM Cached Bandwidth Consumed Percentage. Additional information: - The server is installed on an Azure VM: Standard E8s v3 (8 vcpus, 64 GiB memory) - The user database in question (SentryOne) has memory-optimized tables. - TempDB Memory-Optimized metadata is DISABLED. - Instant File Initialization is ENABLED. - Lock Pages in Memory is DISABLED. Any ideas why this happened and how to prevent it in the future? Thanks!

Eitan Blumin (483 rep)

Jan 23, 2022, 11:44 AM • Last activity: Jan 23, 2022, 03:51 PM

2 votes

1 answers

297 views

LATCH_EX and CX_PACKETS for a stored proc seen when timing out

sql-server performance sql-server-2014 latch

I am trying to troubleshoot an issue with below things noticed:- There is this stored proc which gets timed out on some random days while other days or after retry may complete faster. Time out setting 1 hour. When completes, it does under 15 mins. What we have noticed below for this DB server- - Th...

                                  I am trying to troubleshoot an issue with below things noticed:-

There is this stored proc which gets timed out on some random days while other days or after retry may complete faster. Time out setting 1 hour. When completes, it does under 15 mins.

What we have noticed below for this DB server-

 - The database in question is a LS standby readonly database which
   keeps getting restored every 4 hours from Primary.
   
 - Therefore the SP resides on secondary LS read only database and user 
   query once its accessible.
          
 - After restore is completed certain mornings we see this SP gets timed
   out with below as TOP 2 waits
 
       LATCH_EX and CXPACKETS    (wait time  
       showing accumulating upto 8 hours for that 1 hour run,    possibly   
       because MAXDOP is 8)

However when we retry the same proc after next cycle of restore of database, same SP just completes under 15 minutes.

When it Completes-

 - We do see CXPACKETS but that wait time is very less
    
 - We do not see any LATCH_EX waits

What possibly is going on here and how can i troubleshoot this further? We have already shared the code level tuning which can be done but DEVs are more interested why same SP gets completed in under 15 mins sometime and fails may be one random day in a week

Server level settings:

> MAXDOP=8 CTOP=5

Edit- We found within SP the particular statement which tends to be the slowest amongst all and spending 90% of that 1 hour time is insert into # (temptable) after selecting data from multiple tables via job going parallel.

 - When the issue is there above statement also waits on LATCH_EX in
   addition to CXPACKETS
   
 - When running faster CXPACKETS are there but no LATCH_EX waits

Newbie-DBA (804 rep)

Aug 10, 2021, 04:25 PM • Last activity: Aug 16, 2021, 03:15 PM

2 votes

1 answers

1810 views

SQL Server - High buffer time

sql-server waits latch

I'm noticing a high number of waitingRequestsCount on SQL server. The latch_class is buffer. See attached screenshots. Going through SQL server dashboard, I see high number (32221334) waits for Buffer Pool. Can someone please explain what could contribute to high number of waits?[![enter image descr...

                                  I'm noticing a high number of waitingRequestsCount on SQL server. The latch_class is buffer. See attached screenshots. Going through SQL server dashboard, I see high number (32221334) waits for Buffer Pool. Can someone please explain what could contribute to high number of waits?
                                

Adit Bhatia (21 rep)

Oct 15, 2019, 03:13 PM • Last activity: Feb 20, 2020, 11:00 PM

2 votes

1 answers

1800 views

Differences of latch and lock

sql-server locking latch

Recently when i was trying to find information about latch_ex wait type, i come across one blog as below which stated about latch and lock. [![enter image description here][1]][1] [1]: https://i.sstatic.net/2DdrR.png After reading this blog , i m just curious in one thing. When app submit a request,...

                                  Recently when i was trying to find information about latch_ex wait type, i come across one blog as below which stated about latch and lock.

After reading this blog , i m just curious in one thing. When app submit a request, SQL server will look for the information in buffer cache first and if the page not in buffer cache ,then only it will read from disk and put it in buffer cache before send info to app. My question based on the screenshot as above where it stated latch and lock need to avoid two thread updating same page. Basically all request come to SQL server will go to buffer cache first, if the page in buffer cache is busy updating, another thread will have to wait. It wont go back to disk because the page already in memory. Then what is the purpose of lock since every request will be done through memory and there is latch to protect the page

jack (79 rep)

Nov 9, 2019, 04:29 PM • Last activity: Nov 9, 2019, 07:00 PM

0 votes

1 answers

50 views

Wait Stats on Microsoft Azure test

sql-server wait-types latch

On a particular test we're doing with Microsoft Azure, we are seeing a wait_category called: PAGE LATCH (non-I/O) that we're unfamiliar with. In our test, CaptureSeconds = 1595 and for PAGE LATCH (non-I/O) it reports: wait_time_ms = 126666216 wait_time_ms_per_sec = 79364 max_waiting_tasks = 13086110...

                                  On a particular test we're doing with Microsoft Azure, we are seeing a wait_category called: PAGE LATCH (non-I/O) that we're unfamiliar with.

In our test, CaptureSeconds = 1595 and for PAGE LATCH (non-I/O) it reports:

wait_time_ms = 126666216

wait_time_ms_per_sec = 79364

max_waiting_tasks = 13086110

average_wait_time_ms = 9

max_Wait_time_ms = 2526

Can someone tell me what PAGE LATCH (non-I/O) wait category is and whether this is indicating a problem of some sort?

Thanks,

RW

Robert West (21 rep)

Oct 3, 2018, 03:06 PM • Last activity: Oct 3, 2018, 03:24 PM

0 votes

1 answers

2641 views

ACCESS_METHODS_DATASET_PARENT Problem

sql-server availability-groups high-availability wait-types latch

We had an AlwaysOn Async replica we had some problems so we deleted the availability group After that the database and CPU seem to work abnormally It's very slow and we have waittype = LATCH_EX , ACCESS_METHODS_DATASET_PARENT We didn't have such a problem till today Can anybody help ?????

                                  We had an AlwaysOn Async replica
we had some problems so we deleted the availability group
After that the database and CPU seem to work abnormally
It's very slow and we have waittype = LATCH_EX , ACCESS_METHODS_DATASET_PARENT
We didn't have such a problem till today
Can anybody help ?????


                                

Milad Firouzi (139 rep)

Jul 19, 2016, 11:42 AM • Last activity: Jul 15, 2018, 10:06 AM

5 votes

1 answers

678 views

Database ID of a non-existent database - time-out waiting for buffer latch on page appearing in error log

sql-server sql-server-2008-r2 snapshot dbcc-checkdb latch

**Background:** I've recently started at a new shop with lots of databases in various states. I arrived this morning (having just been added to the SQL DBA mailing group) to find an email about the CHECKDB job failing last night on one of the servers (one I've been warned has been problematic). I wo...

                                  **Background:**
I've recently started at a new shop with lots of databases in various states. I arrived this morning (having just been added to the SQL DBA mailing group) to find an email about the CHECKDB job failing last night on one of the servers (one I've been warned has been problematic).

I won't dwell on the details of this here, it appears to be down to a number of time-outs waiting for buffer latches - by itself it's hard to tell why this is, as there should've been (and appears to have been) very little system usage last night when this was run.

There are ongoing problems with I/O requests longer than 15 seconds in the logs, and the vmware guys say that there have been communication problems with storage (on-going network issues), so this goes someway to explain the problem I think. 

**Important Note:** I reran CHECKDB this morning and it completely quickly, finding 0 errors and as far as I'm aware nothing has happened since the failure last night, so (correct me if I'm wrong), I'm pretty confident there is no database problems, and last night was an anomaly, possibly triggered by network or storage problems (I'll know more about this when the network guys get back to me... if they do). 

**The actual point of this:**
In the logs, right at the point where last night's CHECKDB job failed, there is a time-out waiting for a buffer latch - the strange thing to me is that the database id of the page it's supposedly trying to latch is 11, and there is no database with that id on the instance - ids only go up to 10.

This is followed up by the CHECKDB line with 1 errors found, 0 repaired, pointing to an Internal Database snapshot having a split point (see below for error log)

**Specific Questions:**

1. Why would there be a reference to a database ID that doesn't exist (do Internal database snapshots get assigned a database ID at some point and that's what it was referring to)?
2. Is there anyway of finding out what the database snapshot in question is (is it something to do with the way CHECKDB works, or is it something else)?
3. Is there a way of finding what happened to the snapshot in question (as it obviously wasn't a problem when I ran CHECKDB again this morning).



**Error log:**

01/04/2017 23:50	spid101
Unknown	DBCC CHECKDB (zenworks_UAL) WITH no_infomsgs executed by ARTSLOCAL\svc_zcmsql found 1 errors and repaired 0 errors. Elapsed time: 0 hours 28 minutes 51 seconds.  Internal database snapshot has split point LSN = 00adf51f:00019bf4:0001 and first LSN = 00adf51f:00019bf3:0001.

01/04/2017 23:50	spid101	
Unknown	Unable to read and latch page (1:2067184) with latch type SH. Latch failed.

01/04/2017 23:50	spid101	
Unknown	Error: 8966 Severity: 16 State: 2.

01/04/2017 23:50	spid101	
Unknown	A time-out occurred while waiting for buffer latch -- type 2 bp 00000004FFFF4A80 page 1:2067184 stat 0xc2040d **database id: 11** allocation unit Id: 0/281474976710656 task 0x000000015BA6A088 : 0 waittime 300 flags 0x100000001a owning task 0x000000015BA6A088. Not continuing to wait.





**Note:**
*As far as a know, no one else was working the database or running anything significant between the failure last night and me running CHECKDB this morning. The system was in use, but not significantly. Nothing is showing in the logs.*

*To confirm, there is no DB of ID 11, and I don't think there ever was (and as certain as I can be not between last night and this morning*

Query:

    SELECT db_name(11)
Results:

    NULL

Query:

    SELECT * FROM sys.databases
Results:

    ╔════════════════════╦═════════════╗
    ║        name        ║ database_id ║
    ╠════════════════════╬═════════════╣
    ║ master             ║           1 ║
    ║ tempdb             ║           2 ║
    ║ model              ║           3 ║
    ║ msdb               ║           4 ║
    ║ ReportServer       ║           5 ║
    ║ ReportServerTempDB ║           6 ║
    ║ z_xxxxxx_L         ║           7 ║
    ║ z_xxxxxx_p         ║           8 ║
    ║ S_xxxxxx_t         ║           9 ║
    ║ z_xxxxxx_t         ║          10 ║
    ╚════════════════════╩═════════════╝



                                

Ian_H (1674 rep)

Jan 5, 2017, 03:28 PM • Last activity: Mar 29, 2018, 08:18 PM

5 votes

3 answers

17253 views

How to reduce huge CXPACKET & LATCH_EX (ACCESS_METHODS_DATASET_PARENT) wait times?

sql-server-2008-r2 parallelism timeout maxdop latch

**Problem** We have been experiencing high levels of user disruption due to SQL timeouts accross our systems since the beginning of the year. The SQL-Server instance in question has very high CPU usage (higher than 90% on all 16 cores all the time) during business hours. We have also noticed very hi...

                                  **Problem**

We have been experiencing high levels of user disruption due to SQL timeouts accross our systems since the beginning of the year.

The SQL-Server instance in question has very high CPU usage (higher than 90% on all 16 cores all the time) during business hours.

We have also noticed very high wait times: The combination of CXPACKET & LATCH_EX accounts for about 97% of all waits. This is split about 50/50 between CXPACKET & LATCH_EX.

The non-buffered latch wait accounting for the vast majority (>95%) of LATCH_EX is ACCESS_METHODS_DATASET_PARENT.

This suggests the problem is to do with parallelism.

An example of the scale of wait times is:

    CXPACKET : 332,301,799 ms
    LATCH_EX : 267,955,752 ms
    PAGEIOLATCH_SH : 2,955,160 ms

This was for the period between 08:00-16:24 on Jan 11th.

**Options under consideration**

1) Change MAXDOP from 0 to something between 4 and 8

2) Modify the cost threshold of parallelism from 50 to a higher number

Suggestions most welcome on how to ease the very high CPU load we are seeing, and reduce timeouts, in particular whether the proposed course of action is wise, and which numbers to change MAXDOP and cost threshold of parallellism to.

**Background information**

 - SQL-Server 2008 R2 running on AMD Opteron 6180 SE, of which 16 cores are given to this instance of SQL-Server.

 - Type of workload: something of the order of 800 connections at the same time during business hours; majority OLTP type workload with some OLAP mixed in.

 - Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64) ... Enterprise Edition (64-bit) on Windows NT 6.1  (Build 7601: Service Pack 1). Memory is appx 128 Gigs between 24 cores. 16 of the cores are available to this instance

matskm (53 rep)

Jan 13, 2016, 10:59 AM • Last activity: Jun 1, 2017, 10:14 AM

6 votes

1 answers

96 views

SELECT xyz INTO #Temp from SSMS is 5x longer in one environment than another

sql-server performance latch

First time posting, I hope I'm doing it right. I'm a "dba by accident" who is learning a lot but admittedly still has a lot more to learn. Here is my brain dump: Prod server has 5 million rows. I copied those rows to a Test server. Using SNAPSHOT or READ UNCOMMITTED produce different elapsed times,...

                                  First time posting, I hope I'm doing it right.

I'm a "dba by accident" who is learning a lot but admittedly still has a lot more to learn. Here is my brain dump:

Prod server has 5 million rows. I copied those rows to a Test server.
Using SNAPSHOT or READ UNCOMMITTED produce different elapsed times, but the ratio between Prod and Test is always ~5:1 (14 minutes:3 minutes with SNAPSHOT, 4 minutes:sub-1 with READ UNCOMMITTED)
Using STATISTICS IO ON reveals that most of the reads on Prod are logical (~90%) while none of the reads on Test are logical. I think that should mean Prod would be faster if anything.
Most of the time appears to be going to waits of type "PAGEIOLATCH_SH."
I don't think different statistics can matter, since no index should be used on an unfiltered, unjoined, unsorted query.
SELECTing the top n rows on both servers reveals a very linear scale... the ratio is always ~5:1.
The comparison is not apples-to-apples, but I think the only relevant differences are activity and storage.
I have performed the comparison when activity was next to nothing, and saw the same results.
I discovered that autogrowth was set to 1MB (ugh), which I have since changed but I wonder if this has led to fragmented storage at the file system level.
The database is on a separate volume from the Windows installation, but it does share a volume with tempdb and with a FILESTREAM store, which I assume could also lead to fragmented storage.
The "optimize drives" windows utility says the volume is "OK (98% space efficiency)" which seems to suggest that maybe file system fragmentation isn't a problem.


Questions:
Are my thoughts and assumptions above correct?
Is there anything I'm blatantly overlooking in my thought process?
What are good candidates for the next steps in this troubleshooting process?
What other information can/should I have provided?


Thank you for your attention and help!

EDIT: Added rest of time information in 2nd statement.
                                

Kyle Chandler (63 rep)

Dec 29, 2015, 10:45 PM • Last activity: Dec 30, 2015, 05:45 AM

2 votes

1 answers

221 views

Non Buffer Latch - Forefront Identity Manager

sql-server-2012 wait-types latch

We are using Forefront Identity Manger and we are doing an initial load to which we are experiencing slow performance. I always start with wait types for such things and can see `%30 CXPACKET, %25 LATCH_EX and %15 ASYNC_NETWORK_IO.` I understand that `LATCH_EX` is a non buffer wait type so I looked...

                                  We are using Forefront Identity Manger and we are doing an initial load to which we are experiencing slow performance.

I always start with wait types for such things and can see %30 CXPACKET, %25 LATCH_EX and %15 ASYNC_NETWORK_IO.

I understand that LATCH_EX is a non buffer wait type so I looked in sys.dm_os_latch_stats and can see that %87 of the latch waits is down to ACCESS_METHODS_DATASET_PARENT.

Is this a problem?  Even on Paul Randals blog I am unclear what my issue is if at all.

Thanks

Tom (1569 rep)

Aug 14, 2015, 03:04 PM • Last activity: Aug 14, 2015, 04:12 PM

Showing page 1 of 14 total questions