
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

2 votes
1 answer
29 views
SQL Server Estimates don't use AVG_RANGE_ROWS for Uniqueidentifier Parameter
I'm trying to debug a very weird query row estimation. The query is very simple. I have a table OrderItems that contains for each Order (column OrderId) the items of the order.
SELECT count(*)
FROM orders.OrderItem 
WHERE OrderId = '5a7e53c4-fc70-f011-8dca-000d3a3aa5e1'
According to the statistics from IX_OrderItem_FK_OrderId (that's just a normal unfiltered foreign key index: CREATE INDEX IX_OrderItem_FK_OrderId ON orders.OrderItem(OrderId)), the density is 1.2620972E-06 with 7423048 rows, so about ~9.3 items per order (if we ignore the items with OrderId = NULL; if we include them there are even fewer). The statistics were created with FULLSCAN and are only slightly out of date (around ~0.2% new rows since the last recompute).

| Name | Updated | Rows | Rows Sampled | Steps | Density | Average key length | String Index | Filter Expression | Unfiltered Rows | Persisted Sample Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IX_OrderItem_FK_OrderId | Aug 3 2025 4:36PM | 7423048 | 7423048 | 198 | 0.1649756 | 26.443027 | "NO " | NULL | 7423048 | 100 |

| All density | Average Length | Columns |
| --- | --- | --- |
| 1.2620972E-06 | 10.443027 | OrderId |
| 1.3471555E-07 | 26.443027 | OrderId, Id |

The query plan, however, expects the query to return 205.496 items. In reality there are actually 0 results, because the OrderId doesn't exist.

Detailed query plan: https://www.brentozar.com/pastetheplan/?id=hVKYNLmXSU

It probably uses the histogram to come up with the estimate. The value should fall into the following bucket with RANGE_HI_KEY = 'a39932d8-aa2c-f011-8b3d-000d3a440098', but that estimate should then be 6.87 according to AVG_RANGE_ROWS. It somehow looks like it uses the EQ_ROWS from the previous bucket (but 205 might also just be a coincidence).

| RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
| --- | --- | --- | --- | --- |
| 9d2e2bea-aa6e-f011-8dca-000d3a3aa5e1 | 12889 | 205 | 2412 | 5.343698 |
| a39932d8-aa2c-f011-8b3d-000d3a440098 | 21923 | 107 | 3191 | 6.8702602 |

OPTION (RECOMPILE) does not help.

Can somebody explain how SQL Server (in particular Azure SQL) is coming up with that number?

- Does it really think that the parameter is close enough to the bucket start, and just take the EQ_ROWS value even though the AVG_RANGE_ROWS is a lot smaller?
- Does it not understand the parameter because it's defined as VARCHAR? If I replace it with DECLARE @OrderId UNIQUEIDENTIFIER = '5a7e...'; WHERE OrderId = @OrderId, the estimate is down to 6. But if that's the reason, where does the estimate of 205 come from?
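As a quick sanity check of the figures quoted above, the density-based estimate and the histogram-based estimate can be compared with the 205.496 the plan shows. This is pure arithmetic on the numbers from DBCC SHOW_STATISTICS; no access to the real table is assumed:

```sql
DECLARE @rows      float = 7423048;        -- Rows from the statistics header
DECLARE @density   float = 1.2620972E-06;  -- "All density" for OrderId
DECLARE @avg_range float = 6.8702602;      -- AVG_RANGE_ROWS of the step the value falls under
DECLARE @eq_prev   float = 205;            -- EQ_ROWS of the preceding step

SELECT
    @density * @rows AS density_based_estimate,    -- ~9.37 items per OrderId
    @avg_range       AS expected_histogram_estimate,
    @eq_prev         AS eq_rows_of_previous_step;  -- suspiciously close to the plan's 205.496
```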
Jakube (121 rep)
Aug 5, 2025, 04:53 PM • Last activity: Aug 6, 2025, 04:39 PM
1 vote
1 answer
48 views
Postgres query planner join selectivity greater than 1?
I am using PostgreSQL 14.17. I am trying to debug a query planner failure in a bigger query, but I think I've narrowed down the problem to a self-join on a join table:
SELECT t2.item_id
  FROM item_sessions t1
  JOIN item_sessions t2
       ON t1.session_key = t2.session_key
 WHERE t1.item_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
After running ANALYZE on the table, EXPLAIN gives this plan (which matches the subplan in the larger query):
Nested Loop  (cost=1.12..119.60 rows=7398 width=16)
   ->  Index Only Scan using item_sessions_item_id_session_key_uniq on item_sessions t1  (cost=0.56..8.58 rows=1 width=33)
         Index Cond: (item_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'::uuid)
   ->  Index Only Scan using item_sessions_session_idx on item_sessions t2  (cost=0.56..110.11 rows=91 width=49)
         Index Cond: (session_key = (t1.session_key)::text)
**Why is the loop estimating 7398 rows when the two child nodes estimate 1 and 91 respectively?** I would have expected the loop total to be less than 1 * 91.

FWIW, the child estimates seem correct. item_id has n_distinct at -0.77649677, so the expected row count is 1.3, and session_key has n_distinct at 149555 out of an estimated 1.36e+07 tuples, which gives 90.9 expected tuples per session_key.

The indexes referenced in the plan are:

- item_sessions_session_idx btree (session_key, item_id)
- item_sessions_item_id_session_key_uniq UNIQUE CONSTRAINT, btree (item_id, session_key)

ETA: I created a minimal reproduction [here](https://github.com/felipeochoa/pg-plan-selectivity-gt1). The failure is visible [in the job logs](https://github.com/felipeochoa/pg-plan-selectivity-gt1/actions/runs/16359411460/job/46224463766) on 17.5, 16.9, and 15.13.
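For reference, the per-column figures quoted above can be pulled straight from pg_stats, and the loop estimate one would expect can be reproduced by hand. A sketch using the table and column names from the question:

```sql
-- Column-level statistics behind the two child estimates
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'item_sessions'
  AND attname IN ('item_id', 'session_key');

-- Hand calculation of the expected loop output:
--   ~1.36e7 tuples / 149555 distinct session_key values ≈ 91 matches per outer row,
--   so 1 outer row * 91 ≈ 91 rows, which is why a 7398-row loop estimate looks inconsistent.
```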
Felipe (317 rep)
Jul 17, 2025, 04:41 AM • Last activity: Jul 18, 2025, 09:14 AM
2 votes
1 answer
91 views
Why does FORCE_LEGACY_CARDINALITY_ESTIMATION not match ('ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES', 'ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS')?
Assume the StackOverflow2010 database under SQL Server 2022 and compatibility level 160. Consider the following two queries:
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p 
    ON (p.OwnerUserId = u.Id
        AND p.LastEditorUserId = u.Id)
WHERE
    u.DownVotes > 3 AND u.UpVotes > 1
OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p 
    ON (p.OwnerUserId = u.Id
        AND p.LastEditorUserId = u.Id)
WHERE
    u.DownVotes > 3 AND u.UpVotes > 1
OPTION(USE HINT('ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES', 'ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS'));
On my machine, I get the same estimated number of rows from the scan of the Users table (23,277.1) and the Posts table (372,920). However, the joins get different estimates: the legacy version estimates 178,865 and the double-hinted version estimates 372,920.

Why is this? I know that the legacy cardinality estimator used simple containment, so I presumed that OPTION(USE HINT('ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES', 'ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS')); and OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION')); would produce identical plans.

It is the first time that I've run either of these queries, so I presume that there is no intelligent optimization occurring in the background.
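One thing worth ruling out is that a hint did not actually take effect: the estimator version used for each plan is recorded in the plan XML. A sketch for checking it, assuming the two statements are still in the plan cache:

```sql
-- CardinalityEstimationModelVersion = 70 means the legacy estimator was used;
-- 120/130/.../160 are the newer models.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT st.text,
       qp.query_plan.value('(//StmtSimple/@CardinalityEstimationModelVersion)[1]',
                           'varchar(10)') AS ce_model_version
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%COUNT_BIG(*) AS records%';
```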
J. Mini (1225 rep)
Jun 8, 2025, 07:41 PM • Last activity: Jun 10, 2025, 10:36 AM
14 votes
1 answer
2948 views
How does SQL Server's optimizer estimate the number of rows in a joined table?
I am running this query in the AdventureWorks2012 database:

SELECT s.SalesOrderID, d.CarrierTrackingNumber, d.ProductID, d.OrderQty
FROM Sales.SalesOrderHeader s
JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.CustomerID = 11077

If I look at the estimated execution plan, the initial index seek (top right) is using the IX_SalesOrderHeader_CustomerID index and searching on the literal 11077. It has an estimate of 2.6192 rows.

If I use DBCC SHOW_STATISTICS ('Sales.SalesOrderHeader', 'IX_SalesOrderHeader_CustomerID') WITH HISTOGRAM, it shows that the value 11077 is between the two sampled keys 11019 and 11091. The average number of distinct rows between 11019 and 11091 is 2.619718, or rounded to 2.61972, which is the value of estimated rows shown for the index seek.

The part I don't understand is the estimated number of rows for the clustered index seek against the SalesOrderDetail table.

If I run DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID'), the density of SalesOrderID (which I am joining on) is 3.178134E-05. That means that 1/3.178134E-05 (31465) equals the number of unique SalesOrderID values in the SalesOrderDetail table. If there are 31465 unique SalesOrderIDs in SalesOrderDetail, then with an even distribution, the average number of rows per SalesOrderID is 121317 (the total number of rows) divided by 31465. The average is 3.85561.

So if the estimated number of rows to be looped through is 2.61972, and the average number to be returned is 3.85561, then I would think the estimated number of rows would be 2.61972 * 3.85561 = 10.10062. But the estimated number of rows is 11.4867.

I think my understanding of the second estimate is incorrect, and the differing numbers seem to indicate that. What am I missing?
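For reference, the question's own arithmetic can be reproduced from the quoted statistics (numbers only; the gap between ~10.1 and the plan's 11.4867 is exactly what is being asked about):

```sql
DECLARE @outer_estimate float = 2.61972;       -- rows expected from IX_SalesOrderHeader_CustomerID
DECLARE @detail_density float = 3.178134E-05;  -- density of SalesOrderID in SalesOrderDetail
DECLARE @detail_rows    float = 121317;        -- rows in SalesOrderDetail

SELECT
    1.0 / @detail_density                             AS distinct_salesorderids,      -- ~31465
    @detail_rows * @detail_density                    AS avg_rows_per_salesorderid,   -- ~3.85561
    @outer_estimate * @detail_rows * @detail_density  AS density_based_join_estimate; -- ~10.1 vs 11.4867 in the plan
```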
8kb (2639 rep)
Apr 2, 2015, 04:46 PM • Last activity: Jun 10, 2025, 08:26 AM
2 votes
1 answer
160 views
SQL Server Underestimating cardinality on a Filter operator in a Left Anti-Join
I am tuning a query which is slow. I have narrowed the root of the problem down to the very beginning of the execution plan, where SQL Server makes a bad estimate on a WHERE ... IS NULL filter that supports a left anti-join. SQL Server estimates 1 row and favours some index scans through a nested loop, thinking it will only execute them once, when in fact it happens several thousand times.

I've managed to create an MCVE to replicate the problem. Set up the test environment:
/* INSERT 35000 distinct random numbers into a table */
CREATE TABLE #TableA
(
	ID BIGINT NULL
)

INSERT INTO #TableA
SELECT	DISTINCT
		TOP 35000
		a.Random
FROM	(
			SELECT	TOP 50000
					ABS(CHECKSUM(NewId())) % 20000000 AS Random
			FROM	sys.messages
		) a
GO

/* add a further 15000 that already exist in the table. Use a loop to increase the possibility of duplicates */
INSERT INTO #TableA
SELECT	TOP	1000 
		ID
FROM	#TableA a
ORDER BY NEWID()
GO 15


/* Insert 10000 numbers into another table, that are in the first table  */
CREATE TABLE #TableB
(
	ID BIGINT NOT NULL
)

INSERT INTO #TableB
SELECT	TOP 10000
		*
FROM	#TableA

/* insert 80000 distinct random numbers that are not in the first table */
INSERT INTO #TableB
SELECT	DISTINCT
		TOP 80000
		a.Random
FROM	(
			SELECT	TOP 100000
					ABS(CHECKSUM(NewId())) % 2000000 AS Random
			FROM	sys.messages
		) a
		LEFT JOIN #TableA b
			ON a.Random = b.ID
WHERE	b.ID IS NULL
Then, the query which suffers the problem is
SELECT	a.ID
FROM	#TableA a
		LEFT JOIN #TableB b
			ON a.ID = b.ID
WHERE	b.ID IS NULL
Which is a fairly simple "show me all the IDs in TableA that are not in TableB".

The execution plan from my test environment is here. We can see a very similar thing happening to what we see in the above plan, in terms of the filter operator: SQL Server joins the two tables together and then filters down to those records that are in the left table but not the right table, and it massively underestimates the number of rows that match that predicate.

If I force legacy estimation, I get a much better estimate on the operator.

I believe one of the key differences between the old and new estimators is how they differ in their assumption of the correlation between two predicates - the old one assumes there is little correlation between two predicates, whereas the new estimator is more optimistic and assumes a higher correlation?

My questions are:

- What causes this underestimation on the newer cardinality estimator?
- Is there a way to fix it other than forcing the older compatibility model?
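For completeness, the older estimator can also be tested per statement rather than by changing the database compatibility level. A sketch against the MCVE above (USE HINT needs SQL Server 2016 SP1 or later):

```sql
SELECT a.ID
FROM   #TableA a
       LEFT JOIN #TableB b
           ON a.ID = b.ID
WHERE  b.ID IS NULL
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```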
SE1986 (2182 rep)
May 30, 2024, 01:45 PM • Last activity: Jun 9, 2025, 11:51 PM
4 votes
1 answer
448 views
Index scan when more than 35 correlated subqueries are used with default cardinality estimation
Recently, we updated the compatibility level of our SQL Server from 2012 to 2016, but after updating the compatibility level we ran into performance issues when a lot of sub queries are used. Especially when more than 35 subqueries are used. Here is a query with which I can reproduce it:
SELECT 
  [PK_R_ASSEMBLYCOSTING],
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  [PK_R_ASSEMBLYCOSTING]
FROM [R_ASSEMBLYCOSTING]
WHERE [FK_ASSEMBLY] = 309961
When there are fewer than 35 subqueries, the query plan shows index seeks being used for the subqueries. But for every additional subquery over 35, an index scan is used instead.

Does anybody have any explanation why this happens? If Legacy Cardinality Estimation is enabled, the query is fast and doesn't have this issue, but we want to disable that. I already tried rebuilding the indexes and updating the statistics, but that doesn't make any difference.
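If the legacy estimator fixes it, that behaviour can be scoped to just this statement instead of the whole database. A sketch on a shortened version of the query above (USE HINT requires SQL Server 2016 SP1 or later):

```sql
SELECT
  [PK_R_ASSEMBLYCOSTING],
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING])
FROM [R_ASSEMBLYCOSTING]
WHERE [FK_ASSEMBLY] = 309961
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```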
urk_forever (143 rep)
Mar 7, 2025, 04:27 PM • Last activity: Mar 10, 2025, 04:53 PM
11 votes
2 answers
1379 views
Why am I getting an implicit conversion of Int / Smallint to Varchar, and is it really impacting Cardinality Estimates?
I'm trying to trouble shoot a slow performing query using Show Plan Analysis (SSMS) on the actual execution plan. The Analysis tool points out that estimates for number of rows are off from returned results in a few places in the plan and further gives me some implicit conversion warnings. I don't understand these implicit conversions of int over to Varchar- The fields referenced are not part of any parameter/filter on the query and in all tables involved the column data types are the same: I get the below CardinalityEstimate Warnings: > Type conversion in expression > (CONVERT_IMPLICIT(varchar(12),[ccd].[profileid],0)) may affect > "CardinalityEstimate" in query plan choice --This field is an integer everywhere in my DB > > Type conversion in expression > (CONVERT_IMPLICIT(varchar(6),[ccd].[nodeid],0)) may affect > "CardinalityEstimate" in query plan choice --This field is an smallint everywhere in my DB > > Type conversion in expression > (CONVERT_IMPLICIT(varchar(6),[ccd].[sessionseqnum],0)) may affect > "CardinalityEstimate" in query plan choice --This field is an smallint everywhere in my DB > > Type conversion in expression > (CONVERT_IMPLICIT(varchar(41),[ccd].[sessionid],0)) may affect > "CardinalityEstimate" in query plan choice --This field is an decimal everywhere in my DB [EDIT] Here is the query and actual execution plan for reference https://www.brentozar.com/pastetheplan/?id=SysYt0NzN And table definitions.. /****** Object: Table [dbo].[agentconnectiondetail] Script Date: 1/10/2019 9:10:04 AM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[agentconnectiondetail]( [sessionid] [decimal](18, 0) NOT NULL, [sessionseqnum] [smallint] NOT NULL, [nodeid] [smallint] NOT NULL, [profileid] [int] NOT NULL, [resourceid] [int] NOT NULL, [startdatetime] [datetime2](7) NOT NULL, [enddatetime] [datetime2](7) NOT NULL, [qindex] [smallint] NOT NULL, [gmtoffset] [smallint] NOT NULL, [ringtime] [smallint] NULL, [talktime] [smallint] NULL, [holdtime] [smallint] NULL, [worktime] [smallint] NULL, [callwrapupdata] [varchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [callresult] [smallint] NULL, [dialinglistid] [int] NULL, [convertedStartDatetimelocal] [datetime2](7) NULL, [convertedEndDatetimelocal] [datetime2](7) NULL, CONSTRAINT [PK_agentconnectiondetail] PRIMARY KEY CLUSTERED ( [sessionid] ASC, [sessionseqnum] ASC, [nodeid] ASC, [profileid] ASC, [resourceid] ASC, [startdatetime] ASC, [qindex] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Table [dbo].[contactcalldetail] Script Date: 1/10/2019 9:10:04 AM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[contactcalldetail]( [sessionid] [decimal](18, 0) NOT NULL, [sessionseqnum] [smallint] NOT NULL, [nodeid] [smallint] NOT NULL, [profileid] [int] NOT NULL, [contacttype] [smallint] NOT NULL, [contactTypeDescription] [varchar](20) COLLATE Latin1_General_CI_AS NULL, [contactdisposition] [smallint] NOT NULL, [contactdispositionDescription] [varchar](20) COLLATE Latin1_General_CI_AS NULL, [dispositionreason] [varchar](100) COLLATE Latin1_General_CI_AS NULL, [originatortype] [smallint] NOT NULL, [originatorTypeDescription] [varchar](20) COLLATE Latin1_General_CI_AS NULL, [originatorid] [int] NULL, [originatordn] [varchar](30) COLLATE Latin1_General_CI_AS NULL, [destinationtype] [smallint] NULL, [destinationTypeDescription] [varchar](20) COLLATE Latin1_General_CI_AS NULL, 
[destinationid] [int] NULL, [destinationdn] [varchar](30) COLLATE Latin1_General_CI_AS NULL, [startdatetimeUTC] [datetime2](7) NOT NULL, [enddatetimeUTC] [datetime2](7) NOT NULL, [gmtoffset] [smallint] NOT NULL, [callednumber] [varchar](30) COLLATE Latin1_General_CI_AS NULL, [origcallednumber] [varchar](30) COLLATE Latin1_General_CI_AS NULL, [applicationtaskid] [decimal](18, 0) NULL, [applicationid] [int] NULL, [applicationname] [varchar](30) COLLATE Latin1_General_CI_AS NULL, [connecttime] [smallint] NULL, [customvariable1] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable2] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable3] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable4] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable5] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable6] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable7] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable8] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable9] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable10] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [accountnumber] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [callerentereddigits] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [badcalltag] [char](1) COLLATE Latin1_General_CI_AS NULL, [transfer] [bit] NULL, [NextSeqNum] [smallint] NULL, [redirect] [bit] NULL, [conference] [bit] NULL, [flowout] [bit] NULL, [metservicelevel] [bit] NULL, [campaignid] [int] NULL, [origprotocolcallref] [varchar](32) COLLATE Latin1_General_CI_AS NULL, [destprotocolcallref] [varchar](32) COLLATE Latin1_General_CI_AS NULL, [convertedStartDatetimelocal] [datetime2](7) NULL, [convertedEndDatetimelocal] [datetime2](7) NULL, [AltKey] AS (concat([sessionid],[sessionseqnum],[nodeid],[profileid]) collate database_default) PERSISTED NOT NULL, [PrvSeqNum] [smallint] NULL, CONSTRAINT [PK_contactcalldetail] PRIMARY KEY CLUSTERED ( [sessionid] ASC, [sessionseqnum] ASC, [nodeid] ASC, [profileid] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Table [dbo].[contactqueuedetail] Script Date: 1/10/2019 9:10:04 AM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[contactqueuedetail]( [sessionid] [decimal](18, 0) NOT NULL, [sessionseqnum] [smallint] NOT NULL, [profileid] [int] NOT NULL, [nodeid] [smallint] NOT NULL, [targetid] [int] NOT NULL, [targettype] [smallint] NOT NULL, [targetTypeDescription] [varchar](10) COLLATE Latin1_General_CI_AS NULL, [qindex] [smallint] NOT NULL, [queueorder] [smallint] NOT NULL, [disposition] [smallint] NULL, [dispositionDescription] [varchar](50) COLLATE Latin1_General_CI_AS NULL, [metservicelevel] [bit] NULL, [queuetime] [smallint] NULL, CONSTRAINT [PK_contactqueuedetail] PRIMARY KEY CLUSTERED ( [sessionid] ASC, [sessionseqnum] ASC, [profileid] ASC, [nodeid] ASC, [targetid] ASC, [targettype] ASC, [qindex] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Index [] Script Date: 1/10/2019 9:10:04 AM ******/ CREATE NONCLUSTERED INDEX [] ON [dbo].[contactcalldetail] ( [convertedStartDatetimelocal] ASC ) INCLUDE ( [sessionid], [sessionseqnum], [nodeid], [profileid]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE 
= OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO /****** Object: Index [idx_CCD_ContactType_DestType_StDtLocal] Script Date: 1/10/2019 9:10:04 AM ******/ CREATE NONCLUSTERED INDEX [idx_CCD_ContactType_DestType_StDtLocal] ON [dbo].[contactcalldetail] ( [destinationtype] ASC, [contacttype] ASC, [convertedStartDatetimelocal] ASC ) INCLUDE ( [sessionid], [sessionseqnum], [nodeid], [profileid], [convertedEndDatetimelocal]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO SET ANSI_PADDING ON GO /****** Object: Index [idx_CQD_Profile_Traget_TargetType] Script Date: 1/10/2019 9:10:04 AM ******/ CREATE NONCLUSTERED INDEX [idx_CQD_Profile_Traget_TargetType] ON [dbo].[contactqueuedetail] ( [profileid] ASC, [targetid] ASC, [targettype] ASC ) INCLUDE ( [targetTypeDescription], [queueorder], [disposition], [dispositionDescription], [queuetime]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO
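One observation worth testing: the persisted computed column AltKey in the DDL above is built with CONCAT(), and CONCAT converts every non-string argument to a string, which matches the int/smallint/decimal-to-varchar conversions listed in the warnings. A minimal expression-level check (standalone; no tables needed):

```sql
-- CONCAT() implicitly converts each argument; these CASTs mirror the column types
-- used in the AltKey computed column (sessionid, sessionseqnum, nodeid, profileid).
SELECT CONCAT(CAST(1 AS decimal(18, 0)),  -- sessionid
              CAST(1 AS smallint),        -- sessionseqnum
              CAST(1 AS smallint),        -- nodeid
              CAST(1 AS int))             -- profileid
       AS altkey_style_value;
```

Whether this is actually the source of the plan warnings would still need to be confirmed against the real execution plan.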
Voysinmyhead (257 rep)
Jan 8, 2019, 05:27 PM • Last activity: Mar 2, 2025, 07:03 AM
1 vote
2 answers
126 views
Execution Plan Estimates vs Actuals with Inequality Filters
I have the following SQL query:
declare @p1 INT = 20240703;
declare @p2 INT = 20240703;
declare @p3 NVARCHAR(50) = N'USA';

SELECT R.taxareaid, R.filtertypes
FROM region R
JOIN country C ON R.countryid = C.countryid 
where C.name = @p3
and  R.effdate = @p2
ORDER BY R.taxareaid 
OPTION (RECOMPILE);
https://www.brentozar.com/pastetheplan/?id=B1FTNkXHkx
CREATE TABLE [dbo].[Region](
	[regionId] [numeric](18, 0) NOT NULL,
	[taxAreaId] [numeric](18, 0) NOT NULL,
	[effDate] [numeric](8, 0) NOT NULL,
	[expDate] [numeric](8, 0) NOT NULL,
	[countryId] [numeric](18, 0) NOT NULL,
	[mainDivisionId] [numeric](18, 0) NOT NULL,
	[subDivisionId] [numeric](18, 0) NOT NULL,
	[cityId] [numeric](18, 0) NOT NULL,
	[postalCodeId] [numeric](18, 0) NOT NULL,
	[cityCompressedId] [numeric](18, 0) NOT NULL,
	[subDivCompressedId] [numeric](18, 0) NOT NULL,
	[filterTypes] [numeric](32, 0) NOT NULL,
	[updateId] [numeric](18, 0) NOT NULL,
 CONSTRAINT pk_region PRIMARY KEY CLUSTERED 
(
	[regionId] ASC
)
)
### Problem:

- **Execution Plan Estimates**: When I look at the execution plan, I notice that the estimates are much smaller than the actual rows processed. Although I'm using OPTION (RECOMPILE) to prevent parameter sniffing, I'm still not getting accurate estimates. I've also updated statistics on the region table using a full scan, but the estimates are still incorrect.
- **TempDB Spill**: The query is leading to a spill to TempDB during sorting.

### What I've Tried:

1. **Updated Statistics**: I performed a full scan to update statistics on the region table.
2. **Indexes**: I created an index on region(taxareaid) and a composite index on (countryid, effdate, expdate, taxareaid), but I am still seeing sorting in the execution plan.

### My Questions:

1. **How can I get more accurate execution plan estimates** to avoid the TempDB spill during sorting?
2. **How can I avoid the sorting operation entirely?** Are there other strategies I can try, given that indexing doesn't seem to resolve the issue?
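To quantify the misestimate and the undersized sort memory grant after a run, the statement's last grant and spill counters can be checked. A sketch (the grant/spill columns exist only on recent builds):

```sql
SELECT qs.last_rows, qs.last_grant_kb, qs.last_used_grant_kb, qs.last_spills, st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text LIKE '%FROM region R%';
```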
sebeid (1415 rep)
Dec 19, 2024, 11:32 PM • Last activity: Dec 21, 2024, 06:05 AM
26 votes
2 answers
1455 views
Why does a subquery reduce the row estimate to 1?
Consider the following contrived but simple query:

SELECT ID
     , CASE WHEN ID 0 THEN (SELECT TOP 1 ID FROM X_OTHER_TABLE)
            ELSE (SELECT TOP 1 ID FROM X_OTHER_TABLE_2)
       END AS ID2
FROM X_HEAP;

I would expect the final row estimate for this query to be equal to the number of rows in the X_HEAP table. Whatever I'm doing in the subquery shouldn't matter for the row estimate because it cannot filter out any rows. However, on SQL Server 2016 I see the row estimate reduced to 1 because of the subquery.

Why does this happen? What can I do about it?

It's very easy to reproduce this issue with the right syntax. Here is one set of table definitions that will do it:

CREATE TABLE dbo.X_HEAP (ID INT NOT NULL)

CREATE TABLE dbo.X_OTHER_TABLE (ID INT NOT NULL);

CREATE TABLE dbo.X_OTHER_TABLE_2 (ID INT NOT NULL);

INSERT INTO dbo.X_HEAP WITH (TABLOCK)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values;

CREATE STATISTICS X_HEAP__ID ON X_HEAP (ID) WITH FULLSCAN;

db fiddle link.
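A workaround sketch: because the two scalar subqueries are uncorrelated, they can be hoisted into variables so the outer statement's estimate depends only on X_HEAP. The comparison operator in the CASE below is a stand-in, since the original operator did not survive the formatting above:

```sql
DECLARE @id1 int = (SELECT TOP (1) ID FROM dbo.X_OTHER_TABLE);
DECLARE @id2 int = (SELECT TOP (1) ID FROM dbo.X_OTHER_TABLE_2);

SELECT ID,
       CASE WHEN ID > 0 THEN @id1 ELSE @id2 END AS ID2  -- "> 0" is a placeholder predicate
FROM dbo.X_HEAP;
```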
Joe Obbish (32976 rep)
Apr 21, 2017, 03:33 PM • Last activity: Nov 24, 2024, 01:24 PM
1 vote
1 answer
95 views
Why don't I see OptimizerStatsUsage in the execution plan?
SQL Server 2017 introduces a very helpful enhancement to the showplan to see which statistics were used to generate a plan: https://learn.microsoft.com/en-nz/archive/blogs/sql_server_team/sql-server-2017-showplan-enhancements However, I can't find it in my execution plan. I have the following query on the StackOverflow database:
Use StackOverflow2010;

DROP TABLE IF EXISTS #tempPosts;
CREATE TABLE #tempPosts(
	Id int
)

INSERT INTO #tempPosts
SELECT ID FROM dbo.Posts
WHERE OwnerUserId = 26837

SELECT Title, u.DisplayName, pt.Type FROM dbo.Posts p
INNER JOIN #tempPosts temp
ON p.Id = temp.Id
INNER JOIN dbo.Users u
ON p.OwnerUserId = u.Id
INNER JOIN dbo.PostTypes pt
ON p.PostTypeId = pt.Id
OPTION(RECOMPILE)
I turned on Include Actual Execution Plan to capture the plan and could not find the OptimizerStatsUsage field in the plan.

What could be the reason for OptimizerStatsUsage not showing up in my execution plan? Is there any additional configuration or step needed to see this property? Thank you for any insights!
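A way to check whether the element is present in the plan XML at all (OptimizerStatsUsage is only emitted by SQL Server 2017 and later); this is a sketch that assumes the plan is still in the cache:

```sql
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT st.text,
       qp.query_plan.exist('//OptimizerStatsUsage') AS has_optimizer_stats_usage
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%#tempPosts%';
```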
Tuyen Nguyen (343 rep)
Nov 11, 2024, 10:04 PM • Last activity: Nov 12, 2024, 07:08 AM
2 votes
3 answers
673 views
Why is my Nested Loops join showing inaccurate row estimates in SQL Server?
I have the following execution plan (screenshot: execution plan with inaccurate row estimates).

As you can see, the row estimates for the Clustered Index Scan and Index Seek operators are accurate. However, the Nested Loops join has a significant discrepancy: the actual row count is 6,420, while the estimated row count is only 72.

My questions are:

1. How is the row count estimated for a Nested Loops join in SQL Server?
2. What factors could lead to such an inaccurate row estimate in this case?
3. Is there anything I can do to improve or correct the estimate?

Thank you for any insights!
Tuyen Nguyen (343 rep)
Nov 6, 2024, 08:35 PM • Last activity: Nov 8, 2024, 09:28 AM
3 votes
1 answer
347 views
What Method / Formula does a Nested Loops Operator use for row estimation?
The following, simple query in AdventureWorks:

SELECT *
FROM Person.Person p
JOIN HumanResources.Employee e ON p.BusinessEntityID = e.BusinessEntityID

gives the following execution plans (screenshots: new estimator plan / old estimator plan).

If I look at the new estimator's plan, I can see the index scan and index seek both (correctly) estimate 290 rows; however, the nested loops operator that joins the two estimates 279 rows. The old estimator also correctly guesses 290 rows out of both the seek and the scan, but its nested loops operator estimates 289 rows, which in the case of this query is a better estimate.

Is it true then that, in the case of the new CE, the optimizer estimates that when it joins 290 rows from the index scan and 290 from the index seek, there will be 11 rows that do not match? What method / formula does it use to make this estimate? Am I correct in saying that, whatever said method is, it has changed from the earlier CE version, as that version made a different estimate?

I realise the "bad" estimate of the new CE is not significant enough to detriment performance; I am just trying to understand the estimator's processing.
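One way to see which calculator and selectivity the new CE used for this join is the undocumented optimizer trace flag 2363 (test environments only; requires sysadmin), which prints the estimation steps to the Messages tab:

```sql
SELECT *
FROM Person.Person p
JOIN HumanResources.Employee e
    ON p.BusinessEntityID = e.BusinessEntityID
OPTION (QUERYTRACEON 3604, QUERYTRACEON 2363, RECOMPILE);
```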
SE1986 (2182 rep)
Sep 9, 2020, 11:30 AM • Last activity: Nov 7, 2024, 09:10 AM
4 votes
4 answers
2096 views
Actual and Estimated rows differ greatly
The full Actual plan is here. Prior to executing the plan (because I'm debugging a poorly functioning plan) I have this block of variable assignments:

DECLARE @Days INT = 180
DECLARE @DateRangeFrom DateTime = DATEADD(d, -@Days, getDate())
DECLARE @DateRangeTo DateTime = getDate()
DECLARE @FacilityID INT = 1010
DECLARE @Answer0 INT = 1879
DECLARE @Answer1 INT = 1949
DECLARE @Answer1SetID INT = 1607
DECLARE @Answer2 INT = 1907
DECLARE @Answer2SetID INT = 1593

My first problem is with the lookup I'm performing on the IRItemAnswer_Info table (Node ID 19). It's spilling to tempdb, which already starts the query off on the wrong foot. It's referencing the IRItemAnswerInfo_DGItemID_AnswerSourceID index, which is the correct index, as I'm matching on DGItemID and AnswerSourceID, and getting back IncidentID. The index is created as

CREATE NONCLUSTERED INDEX IRItemAnswerInfo_DGItemID_AnswerSourceID
ON dbo.IRItemAnswer_Info (DGItemID, AnswerSourceID)
INCLUDE([IncidentID], [AnswerBoolean])

However, the Estimated Rows for the query is 53,459 and the Actual Rows is 969,812. I just finished forcing new statistics via

UPDATE STATISTICS IRItemAnswer_Info IRItemAnswerInfo_DGItemID_AnswerSourceID WITH FULLSCAN

and it made no difference.

DBCC SHOW_STATISTICS ('IRItemAnswer_Info', 'DGItemID') for DGItemID = 1949 has EQ_ROWS as 1,063,536, and DBCC SHOW_STATISTICS ('IRItemAnswer_Info', 'AnswerSourceID') for AnswerSourceID = 1607 has EQ_ROWS as 970,079.

The database is running Compatibility level 140 (SQL Server 2017). We would run 2019, but there are issues we need to correct in the stored procedures before we can do that. What should be the next thing I look at?

---

I chose the worst performing output, which is the most common values. IRItemAnswer_Info is a table containing user-defined answers to associate to an event, where DGItemID = 1949 is one of the most common questions (almost every event has one), and AnswerSourceID = 1607 is the most common answer. Given that there is a strong correlation between them, how should I reorder the query?

As it is a point of a little bit of confusion, there are two INNER JOINs to the same table, IRItemAnswer_Info. One is the answer I'm looking for (as identified by the question iria.DGItemID = 1879; its output iria.AnswerSourceID links to irai.AltLabel), and the second one is a limiting factor: I only want records where the question iiai1.DGItemID = 1949 has as its answer iiai1.AnswerSourceID = 1607.

I have explicitly removed the plan from the cache (using DBCC FREEPROCCACHE) and re-run it, with no change in the result - the Hash Match is still spilling.
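Given the strong correlation described between DGItemID and AnswerSourceID, one mitigation worth testing is a filtered statistic that captures the answer distribution for the common question directly. A sketch using the names from the question (a parameterized query may also need a recompile hint before the optimizer can use it):

```sql
CREATE STATISTICS st_IRItemAnswerInfo_AnswerSource_ForDGItem1949
ON dbo.IRItemAnswer_Info (AnswerSourceID)
WHERE DGItemID = 1949
WITH FULLSCAN;
```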
Daniel Bragg (183 rep)
Dec 20, 2021, 11:38 PM • Last activity: May 31, 2024, 02:45 PM
6 votes
1 answer
3657 views
Bitmap Creation in Execution Plan Causes bad Estimate on Clustered Index Scan
Given the following simple query on the StackOverflow2010 database:

SELECT u.DisplayName, u.Reputation
FROM Users u
JOIN Posts p ON u.id = p.OwnerUserId
WHERE u.DisplayName = 'alex'
AND p.CreationDate >= '2010-01-01'
AND p.CreationDate ='2010-01-01 00:00:00.000' AND [StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]= Scalar Operator('2010-01-01 00:00:00.000'), End: [StackOverflow2010].[dbo].[Posts].CreationDate <= Scalar Operator('2010-03-01 00:00:00.000')

So I can see Plan 2 is just going to use the histogram to find the number of rows between the two dates, but Plan 1 has a slightly more complicated predicate involving a bitmap probe. That (I think) explains why the estimate on the seek is more accurate, but I am now wondering: what is the bitmap probe? I can see in the plan that there is a bitmap created of the user IDs that match the Alex predicate, and that is what is being probed.

I wondered "without the index, why wouldn't Plan 1 be the same as Plan 2, the only difference being a CI scan instead of an index seek on CreationDate?" I did some further testing and found that if I run the query without the index but force the plan to go serial, using OPTION (MAXDOP 1), I get Plan 3, which has a better estimate on CreationDate despite now doing a CI Scan on Posts. If I look at the predicate, I can see that the probe is now gone and the bitmap is no longer in the plan, so this leads me to believe the bitmap is something to do with the plan going parallel.

So my question is - why is a bitmap created when the plan goes parallel, and why does it cause such a bad estimate on Posts.CreationDate?
SE1986 (2182 rep)
Jan 27, 2022, 11:11 PM • Last activity: Mar 26, 2024, 01:14 PM
1 vote
2 answers
625 views
How can I optimize a recursive CTE inside an ITVF?
I have a recursive CTE inside an inline table-valued function. The ITVF returns a list of IDs containing a long sequence of ancestors for a person; it usually loops back about 12 to 18 times before getting to the end. It's quite fast, but there's an error in the estimations that stacks when it is used on many people, so it becomes extremely slow. The CTE looks like this:

WITH ancestors AS
(
    SELECT IndID, AncestorID
    FROM dbo.persons

    UNION ALL

    SELECT a.IndID, p.AncestorID
    FROM ancestors a
    INNER JOIN dbo.persons p ON p.IndID = a.AncestorID
)
SELECT IndID, AncestorID
FROM ancestors

I have a dozen million rows, so it's quite a large table. When I ask for one IndID, the execution plan says that it estimated 7 rows but got 1,300 actual rows. For a single request it's acceptable (it runs in less than a second), but if I join it in another request so it gets called, let's say, 100 times, the speed drops to a crawl since the estimation gets worse and worse.

Just to be clear, the estimation error is present even outside of the ITVF. I only mention the function to make clear that I can't just use a temporary table. It needs to stay in an ITVF so I can join it in larger, more complex requests and it stays parallelisable.

What can I do to estimate the rows better?

Update: Paste The Plan

**Update 2: Less simplified**

I'm kind of stuck between two problems. Either I use an MSTVF and all my queries can't parallelise, or I use an ITVF and hope that the SQL gods are generous and don't horribly underestimate the row counts so everything now swaps to the hard drive instead of staying in RAM. I hope that it's just that I'm dumb and it's a stupid easy fix somewhere.

**Update 3**

To answer the questions asked, to the best of my knowledge:

uno) Updated to the latest cumulative update. Didn't change anything as expected, but it's good to be up to date as you said :)

dos) We are on Standard edition, but I do have a columnstore index and I can't remember why I created it. It's on IndID, FirstNameID, LastNameID. I'll try dropping it; we are only 2 users on the database today, so we can manage downtime if it breaks something else.

> After removing the columnstore, it did save about 30 seconds! Still slow, but it's better. I'll have to check my notes to find why I created that columnstore index.

dos, part 2) The "underpowered box" feeling you got is exactly what got me up to now. I thought our machine was underpowered, but after talking with the IT here, they said we weren't using more than 25% of the resources available, so the bottleneck was definitely at the SQL level. So I asked for an upgrade from SQL 2017 to 2022 last month and then, now that I saw that most of my heavy queries were always running serialized, started optimizing until I hit this one. I tried OPTION(USE HINT('DISALLOW_BATCH_MODE'), MAXDOP 8); and I don't see any change in speed.

tres) That request is indeed supposed to return about 14 million rows, so no worries on that side. But isn't the fact that only 8 rows were estimated in the resource reservation a reason why it's much slower than it should be?

more context) I was using an MSTVF before all my work this month; when I switched to an ITVF it is faster, but the curve of time spent vs rows asked is exponential instead of linear, if you get what I mean. I'm open to rethinking how all this is done. I work for a research group and part of my job is extracting datasets for researchers. I'm pretty much the only heavy user on the database; my colleagues are more in the "inserting and cleaning the data" part of the job. So I can pretty much do what I want with the indexes, functions, etc. as long as the table structure itself is not changed too much.

**Update 4 - What?**

I don't get it. I was trying to make a nice graphic to show the "time spent vs rows asked" exponential curve, so I changed my query to get nice square numbers:

select count(*)
FROM (SELECT TOP 10000 *
      FROM individus.Individus
      WHERE AnneeNaissance > 1901
        AND AnneeDeces < 1911) i
CROSS APPLY individus.GetAscendanceSimple(i.IndID) a

And that ran in 10 seconds... I even tried TOP 10,000,000 and it was still fast, so I just have to put an arbitrary large number so all my cases are covered and it runs as fast as I would have hoped (the TOP is important). Before putting that down as the solution, I must be wrong, no? That's a really dumb fix if it's all we need to do to fix the planning.

(Screenshots: plans without the subquery vs with the subquery.)
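For reference, a hypothetical sketch of how the recursive CTE might be wrapped as the inline TVF being described. The parameter filter in the anchor is an assumption (the question's simplified CTE doesn't show it), and OPTION clauses such as MAXRECURSION cannot be placed inside an inline function, so they would have to go on the calling query:

```sql
CREATE OR ALTER FUNCTION individus.GetAscendanceSimple (@IndID int)
RETURNS TABLE
AS
RETURN
(
    WITH ancestors AS
    (
        -- Anchor: the person we start from (the WHERE clause is an assumed detail)
        SELECT IndID, AncestorID
        FROM dbo.persons
        WHERE IndID = @IndID

        UNION ALL

        -- Recursive step: follow each ancestor's own ancestor link
        SELECT a.IndID, p.AncestorID
        FROM ancestors a
        INNER JOIN dbo.persons p ON p.IndID = a.AncestorID
    )
    SELECT IndID, AncestorID
    FROM ancestors
);
```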
James McGrath (121 rep)
Jan 24, 2024, 09:10 PM • Last activity: Jan 29, 2024, 07:45 PM
1 vote
1 answer
160 views
Wrong cardinality estimation after gathering statistics (Oracle)
We have a large table range partitioned by month. Incremental statistics are turned on. After the scheduled statistics gathering, cardinality estimation becomes weird, like:

select count(*) from my_table where date >= trunc(sysdate) - 30 and date

'ownname', tabname=> 'tabname' , estimate_percent=> DBMS_STATS.AUTO_SAMPLE_SIZE, cascade=> DBMS_STATS.AUTO_CASCADE, degree=> 4, no_invalidate=> DBMS_STATS.AUTO_INVALIDATE, granularity=> 'AUTO', method_opt=> 'FOR ALL COLUMNS SIZE AUTO' );

-- Manual
DBMS_STATS.GATHER_TABLE_STATS (
    ownname => '"ownname"',
    tabname => '"tabname"',
    partname => '"partname"',
    method_opt => 'FOR COLUMNS DATE SIZE 254',
    estimate_percent => 1
);

Other partitioned tables are OK. The differences between this table and the others are (as far as we know):

1. There were wrong inserts into this table. Most dates are between 2014 and 2023, but there are some rows with 1970 and 2024 (we can't change that). There is also an empty partition for 2045. We tried recreating this but didn't get the same behaviour.
2. We messed with the histograms: we removed some automatically created ones and manually created some useful function-based ones. But histograms for the DATE column were present in USER_TAB_COL_STATISTICS and USER_TAB_HISTOGRAMS.

What can cause such behaviour? How can we fix it?
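To see what the scheduled run actually changed, it can help to compare partition-level and global statistics before and after the job. A sketch (OWNNAME/TABNAME are placeholders for the real owner and table):

```sql
SELECT partition_name, num_rows, last_analyzed, stale_stats
FROM   dba_tab_statistics
WHERE  owner = 'OWNNAME'
AND    table_name = 'TABNAME'
ORDER  BY partition_position NULLS FIRST;
```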
Andy DB Analyst (110 rep)
Oct 24, 2023, 08:38 AM • Last activity: Oct 25, 2023, 01:50 AM
8 votes
1 answer
1569 views
Inserting with implicit type conversion causes warning for cardinality estimates
I noticed this while doing some performance testing recently. When I insert a value into a column that will require an implicit conversion (e.g. bigint into nvarchar), I get a warning:

> Type conversion in expression (CONVERT_IMPLICIT(nvarchar(50),[tempdb].[dbo].[#MyFunIntTable].[EvenCoolerColumn],0)) may affect "Cardinality Estimate" in query plan choice.

Being a concerned citizen, I checked all of the obvious suspects and eventually dug into the XML to confirm that it was actually warning about the insert into the table. The problem is, I can't figure out why this would ever affect cardinality estimates. If I were doing this in a join or somewhere with a little more logic it would make sense, but there shouldn't be a cardinality estimate mismatch for the actual insert operation, right?

I noticed that this happened when it was more than just a trivial query - as soon as more than one value is inserted, or we're pulling a value from a table, we hit this.

This question has attracted some potential duplicates, including:

- https://dba.stackexchange.com/q/226610/69545
- https://dba.stackexchange.com/q/36097/69545

I think it is different from these questions because I'm literally not doing anything with this column. I'm not using it in a filter, or a sort, or a grouping, or a join, or in a function - any of these things would make the scenario more complicated. All I'm doing is inserting a bigint into an nvarchar, which should never impact a meaningful cardinality estimate that I can think of.

What I'm specifically looking for out of an answer is:

1. An explanation of why I get this warning despite nothing of interest going on - is it just that SQL Server will be conservative and report even when it won't affect plan choice?
2. What cardinality estimate is actually at risk here, and what operation would change based off of inaccuracies in that cardinality estimate?
3. Is there a scenario where this could affect plan choice? Obviously if I start joining or filtering on the converted column it could, but as-is?
4. Is there anything that can be done to prevent it from warning, besides changing data types (assume this is a requirement of how the data models interact)?

I recreated it with the below simple example (paste the plan):

DROP TABLE IF EXISTS #MyFunStringTable;
DROP TABLE IF EXISTS #MyFunIntTable;

CREATE TABLE #MyFunStringTable
(
    SuperCoolColumn nvarchar(50) COLLATE DATABASE_DEFAULT NULL
);

CREATE TABLE #MyFunIntTable
(
    EvenCoolerColumn bigint NULL
);

INSERT INTO #MyFunIntTable (EvenCoolerColumn)
VALUES (1), (2), (3), (4), (5);

INSERT INTO #MyFunStringTable (SuperCoolColumn)
SELECT EvenCoolerColumn FROM #MyFunIntTable;

INSERT INTO #MyFunStringTable (SuperCoolColumn)
VALUES (1);

INSERT INTO #MyFunStringTable (SuperCoolColumn)
VALUES (1), (2);

INSERT INTO #MyFunStringTable (SuperCoolColumn)
SELECT 1;

INSERT INTO #MyFunStringTable (SuperCoolColumn)
SELECT 1 UNION ALL SELECT 2;

INSERT INTO #MyFunStringTable (SuperCoolColumn)
SELECT 1 FROM #MyFunIntTable;
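A workaround sketch worth testing: make the conversion explicit in the SELECT list so it is no longer reported as an implicit conversion (whether this silences the warning on a given build would need to be verified against the plan):

```sql
INSERT INTO #MyFunStringTable (SuperCoolColumn)
SELECT CAST(EvenCoolerColumn AS nvarchar(50))
FROM #MyFunIntTable;
```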
Dan Oberlam (183 rep)
Aug 30, 2019, 07:58 PM • Last activity: Oct 20, 2023, 06:18 AM
0 votes
1 answer
315 views
Cardinality miscalculations leads to ridiculous execution plans
I'm not talking about stale statistics, or just simply "bad"/"non-optimal" plans. We have a lot of complicated queries running in our database. Normally everything works as expected, but from time to time we have cases where the optimizer miscalculates cardinalities and chooses ridiculous execution plans. The worst cases are when the optimizer evaluates a subquery to have 1 row. Then we get plans with:

1. Wrong join order with MERGE JOIN CARTESIAN and thousands-of-rows tables/subqueries. The optimizer for some reason chooses plans like "SELECT * FROM TAB1... (CROSS) JOIN TAB3... JOIN TAB2 ON TAB1.COL11 = TAB3.COL31 AND TAB2.COL21 = TAB3.COL32" instead of joining "TAB1 JOIN TAB2 JOIN TAB3" as expected. With LEADING/ORDERED hints it starts working properly.
2. For no reason, MERGE JOIN CARTESIAN or NESTED LOOPS without indexes is used instead of HASH JOIN. USE_HASH or CARDINALITY hints solve this problem.
3. VIEW PUSHED PREDICATE is used with FULL TABLE SCAN. This leads to scanning small tables thousands of times, adding minutes/hours to query execution. CARDINALITY/MATERIALIZE/NO_PUSH_PRED hints solve this problem.

The question is: is there a way to globally force the optimizer not to use CARTESIAN/VIEW PUSHED PREDICATE if there is no guarantee of a 1-row result? Or at least decrease the probability of it being used? Like when we had problems with indexes being overused while doing analysis, "ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ = 200" partially solved it.
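For illustration, the kind of per-block hinting mentioned above (CARDINALITY, USE_HASH, NO_PUSH_PRED) can be combined on the problem subquery; table and column names here are hypothetical:

```sql
SELECT /*+ NO_MERGE(v) NO_PUSH_PRED(v) USE_HASH(v) CARDINALITY(v 10000) */
       t.col1, v.col2
FROM   tab1 t
       JOIN (SELECT col1, col2 FROM tab2 WHERE colx = 'Y') v
            ON v.col1 = t.col1;
```

If the SQL text cannot be changed, the same hints can usually be attached from the outside via SQL patches or plan baselines, but that is still per-query rather than the global switch being asked about.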
Andy DB Analyst (110 rep)
Jun 8, 2023, 08:42 AM • Last activity: Jun 8, 2023, 09:19 AM
0 votes
1 answer
175 views
SQL Server 2014 Cardinality Estimator estimates the final number of rows after an OUTER JOIN to be less than the number of rows from the initial table
I have a SQL query for SQL Server 2019. It works fine with OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')) and very badly without this hint. I found out that the SQL Server 2014 Cardinality Estimator estimates the final number of rows after an OUTER JOIN to be less than the number of rows from the initial table, without any WHERE predicates and with OPTION (RECOMPILE) (see the New CE plan). The Legacy Cardinality Estimator estimates the final number of rows after the OUTER JOIN to be equal to or more than the number of rows from the initial table, which is correct (see the Old CE plan).

Is this a bug in the SQL Server 2014 Cardinality Estimator, or am I doing something wrong?
Андрей Ерёмин (1 rep)
May 22, 2023, 08:05 AM • Last activity: May 31, 2023, 04:46 AM
3 votes
1 answer
364 views
How does SQL Server estimate cardinality on nested loops index seek
I am trying to understand how SQL Server estimates cardinality on the below Stack Overflow database query.

Firstly, I create the index:

CREATE INDEX IX_PostId ON dbo.Comments
(
    PostId
)
INCLUDE
(
    [Text]
)

And here is the query:

SELECT u.DisplayName, c.PostId, c.Text
FROM Users u
JOIN Comments c ON u.Reputation = c.PostId
WHERE u.AccountId = 22547

The execution plan is here.

First of all, SQL Server scans the clustered index on the Users table to return the users that match the AccountId predicate. I can see that it uses the statistic _WA_Sys_0000000E_09DE7BCC, and that this value doesn't have a RANGE_HI_KEY in the histogram, so SQL Server uses the AVG_RANGE_ROWS and estimates 1.

The seek predicate on the Comments index seek is Scalar Operator([StackOverflow2010].[dbo].[Users].[Reputation] as [u].[Reputation]), which represents the Reputation value of the user(s) in the Users table with the AccountId of 22547.

I can see three stats loaded in total:

- _WA_Sys_0000000E_09DE7BCC - Users.AccountId (used to estimate the clustered index seek predicate)
- IX_PostId - Comments.PostId (used to estimate the index seek predicate)
- _WA_Sys_0000000A_09DE7BCC - Users.Reputation (?)

How does SQL Server come up with the estimate on the index seek? It cannot know the Reputation of AccountId 22547 at compile time, as the AccountId stat does not show that, so it cannot perform a lookup on the histogram for IX_PostId. I can see that the Reputation stat is also loaded, so does it use both somehow?

This query was run against CE 150.
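The three statistics objects the plan reports loading can be inspected directly to follow the calculation (names copied from the question; the _WA_Sys_* names are auto-created statistics and will differ on another instance):

```sql
DBCC SHOW_STATISTICS ('dbo.Users',    '_WA_Sys_0000000E_09DE7BCC') WITH HISTOGRAM; -- Users.AccountId
DBCC SHOW_STATISTICS ('dbo.Users',    '_WA_Sys_0000000A_09DE7BCC') WITH HISTOGRAM; -- Users.Reputation
DBCC SHOW_STATISTICS ('dbo.Comments', 'IX_PostId')                 WITH HISTOGRAM; -- Comments.PostId
```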
SE1986 (2182 rep)
May 4, 2023, 03:40 PM • Last activity: May 16, 2023, 10:13 AM