
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

2 votes
1 answer
29 views
SQL Server Estimates don't use AVG_RANGE_ROWS for Uniqueidentifier Parameter
I'm trying to debug a very weird query row estimation. The query is very simple. I have a table OrderItems that contains for each Order (column OrderId) the items of the order.
SELECT count(*)
FROM orders.OrderItem 
WHERE OrderId = '5a7e53c4-fc70-f011-8dca-000d3a3aa5e1'
According to the statistics from IX_OrderItem_FK_OrderId (that's just a normal unfiltered foreign key index: CREATE INDEX IX_OrderItem_FK_OrderId ON orders.OrderItem(OrderId)), the density is 1.2620972E-06 with 7423048 rows, so about ~9.3 items per order (if we ignore the items with OrderId = NULL; if we include them there are even fewer). The statistics were created with FULLSCAN and are only slightly out of date (around ~0.2% new rows since the last recompute).

| Name | Updated | Rows | Rows Sampled | Steps | Density | Average key length | String Index | Filter Expression | Unfiltered Rows | Persisted Sample Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IX_OrderItem_FK_OrderId | Aug 3 2025 4:36PM | 7423048 | 7423048 | 198 | 0.1649756 | 26.443027 | "NO " | NULL | 7423048 | 100 |

| All density | Average Length | Columns |
| --- | --- | --- |
| 1.2620972E-06 | 10.443027 | OrderId |
| 1.3471555E-07 | 26.443027 | OrderId, Id |

The query plan, however, expects the query to return 205.496 items. In reality there are actually 0 results, because the OrderId doesn't exist.

Detailed query plan: https://www.brentozar.com/pastetheplan/?id=hVKYNLmXSU

It probably uses the histogram to come up with the estimate. The value should fall into the following bucket with RANGE_HI_KEY = 'a39932d8-aa2c-f011-8b3d-000d3a440098', but that estimate should then be 6.87 according to AVG_RANGE_ROWS. It somehow looks like it uses the EQ_ROWS from the previous bucket (but 205 might also just be a coincidence).

| RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
| --- | --- | --- | --- | --- |
| 9d2e2bea-aa6e-f011-8dca-000d3a3aa5e1 | 12889 | 205 | 2412 | 5.343698 |
| a39932d8-aa2c-f011-8b3d-000d3a440098 | 21923 | 107 | 3191 | 6.8702602 |

OPTION (RECOMPILE) does not help.

Can somebody explain how SQL Server (in particular Azure SQL) is coming up with that number?

- Does it really think that the parameter is close enough to the bucket start, and just take the EQ_ROWS value even though the AVG_RANGE_ROWS is a lot smaller?
- Does it not understand the parameter because it's defined as VARCHAR? If I replace it with DECLARE @OrderId UNIQUEIDENTIFIER = '5a7e...'; WHERE OrderId = @OrderId, the estimate is down to 6. But if that's the reason, where does the estimate of 205 come from?
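As a quick sanity check of the figures quoted above, the density-based estimate and the histogram-based estimate can be compared with the 205.496 the plan shows. This is pure arithmetic on the numbers from DBCC SHOW_STATISTICS; no access to the real table is assumed:

```sql
DECLARE @rows      float = 7423048;        -- Rows from the statistics header
DECLARE @density   float = 1.2620972E-06;  -- "All density" for OrderId
DECLARE @avg_range float = 6.8702602;      -- AVG_RANGE_ROWS of the step the value falls under
DECLARE @eq_prev   float = 205;            -- EQ_ROWS of the preceding step

SELECT
    @density * @rows AS density_based_estimate,    -- ~9.37 items per OrderId
    @avg_range       AS expected_histogram_estimate,
    @eq_prev         AS eq_rows_of_previous_step;  -- suspiciously close to the plan's 205.496
```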
Jakube (121 rep)
Aug 5, 2025, 04:53 PM • Last activity: Aug 6, 2025, 04:39 PM
1 vote
1 answer
48 views
Postgres query planner join selectivity greater than 1?
I am using PostgreSQL 14.17. I am trying to debug a query planner failure in a bigger query, but I think I've narrowed down the problem to a self-join on a join table:
SELECT t2.item_id
  FROM item_sessions t1
  JOIN item_sessions t2
       ON t1.session_key = t2.session_key
 WHERE t1.item_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
After running ANALYZE on the table, EXPLAIN gives this plan (which matches the subplan in the larger query):
Nested Loop  (cost=1.12..119.60 rows=7398 width=16)
   ->  Index Only Scan using item_sessions_item_id_session_key_uniq on item_sessions t1  (cost=0.56..8.58 rows=1 width=33)
         Index Cond: (item_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'::uuid)
   ->  Index Only Scan using item_sessions_session_idx on item_sessions t2  (cost=0.56..110.11 rows=91 width=49)
         Index Cond: (session_key = (t1.session_key)::text)
**Why is the loop estimating 7398 rows when the two child nodes estimate 1 and 91 respectively?** I would have expected the loop total to be less than 1 * 91.

FWIW, the child estimates seem correct. item_id has n_distinct at -0.77649677, so the expected row count is 1.3, and session_key has n_distinct at 149555 out of an estimated 1.36e+07 tuples, which gives 90.9 expected tuples per session_key.

The indexes referenced in the plan are:

- item_sessions_session_idx btree (session_key, item_id)
- item_sessions_item_id_session_key_uniq UNIQUE CONSTRAINT, btree (item_id, session_key)

ETA: I created a minimal reproduction [here](https://github.com/felipeochoa/pg-plan-selectivity-gt1). The failure is visible [in the job logs](https://github.com/felipeochoa/pg-plan-selectivity-gt1/actions/runs/16359411460/job/46224463766) on 17.5, 16.9, and 15.13.
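For reference, the per-column figures quoted above can be pulled straight from pg_stats, and the loop estimate one would expect can be reproduced by hand. A sketch using the table and column names from the question:

```sql
-- Column-level statistics behind the two child estimates
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'item_sessions'
  AND attname IN ('item_id', 'session_key');

-- Hand calculation of the expected loop output:
--   ~1.36e7 tuples / 149555 distinct session_key values ≈ 91 matches per outer row,
--   so 1 outer row * 91 ≈ 91 rows, which is why a 7398-row loop estimate looks inconsistent.
```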
Felipe (317 rep)
Jul 17, 2025, 04:41 AM • Last activity: Jul 18, 2025, 09:14 AM
2 votes
1 answer
91 views
Why does FORCE_LEGACY_CARDINALITY_ESTIMATION not match ('ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES', 'ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS')?
Assume the StackOverflow2010 database under SQL Server 2022 and compatibility level 160. Consider the following two queries:
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p 
    ON (p.OwnerUserId = u.Id
        AND p.LastEditorUserId = u.Id)
WHERE
    u.DownVotes > 3 AND u.UpVotes > 1
OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p 
    ON (p.OwnerUserId = u.Id
        AND p.LastEditorUserId = u.Id)
WHERE
    u.DownVotes > 3 AND u.UpVotes > 1
OPTION(USE HINT('ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES', 'ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS'));
On my machine, I get the same estimated number of rows from the scan of the Users table (23,277.1) and the Posts table (372,920). However, the joins get different estimates: the legacy version estimates 178,865 and the double-hinted version estimates 372,920.

Why is this? I know that the legacy cardinality estimator used simple containment, so I presumed that OPTION(USE HINT('ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES', 'ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS')); and OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION')); would produce identical plans.

It is the first time that I've run either of these queries, so I presume that there is no intelligent optimization occurring in the background.
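One thing worth ruling out is that a hint did not actually take effect: the estimator version used for each plan is recorded in the plan XML. A sketch for checking it, assuming the two statements are still in the plan cache:

```sql
-- CardinalityEstimationModelVersion = 70 means the legacy estimator was used;
-- 120/130/.../160 are the newer models.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT st.text,
       qp.query_plan.value('(//StmtSimple/@CardinalityEstimationModelVersion)[1]',
                           'varchar(10)') AS ce_model_version
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%COUNT_BIG(*) AS records%';
```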
J. Mini (1225 rep)
Jun 8, 2025, 07:41 PM • Last activity: Jun 10, 2025, 10:36 AM
14 votes
1 answer
2948 views
How does SQL Server's optimizer estimate the number of rows in a joined table?
I am running this query in the AdventureWorks2012 database:

SELECT s.SalesOrderID, d.CarrierTrackingNumber, d.ProductID, d.OrderQty
FROM Sales.SalesOrderHeader s
JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.CustomerID = 11077

If I look at the estimated execution plan, the initial index seek (top right) is using the IX_SalesOrderHeader_CustomerID index and searching on the literal 11077. It has an estimate of 2.6192 rows.

If I use DBCC SHOW_STATISTICS ('Sales.SalesOrderHeader', 'IX_SalesOrderHeader_CustomerID') WITH HISTOGRAM, it shows that the value 11077 is between the two sampled keys 11019 and 11091. The average number of distinct rows between 11019 and 11091 is 2.619718, or rounded to 2.61972, which is the value of estimated rows shown for the index seek.

The part I don't understand is the estimated number of rows for the clustered index seek against the SalesOrderDetail table.

If I run DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID'), the density of SalesOrderID (which I am joining on) is 3.178134E-05. That means that 1/3.178134E-05 (31465) equals the number of unique SalesOrderID values in the SalesOrderDetail table. If there are 31465 unique SalesOrderIDs in SalesOrderDetail, then with an even distribution, the average number of rows per SalesOrderID is 121317 (the total number of rows) divided by 31465. The average is 3.85561.

So if the estimated number of rows to be looped through is 2.61972, and the average number to be returned is 3.85561, then I would think the estimated number of rows would be 2.61972 * 3.85561 = 10.10062. But the estimated number of rows is 11.4867.

I think my understanding of the second estimate is incorrect, and the differing numbers seem to indicate that. What am I missing?
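For reference, the question's own arithmetic can be reproduced from the quoted statistics (numbers only; the gap between ~10.1 and the plan's 11.4867 is exactly what is being asked about):

```sql
DECLARE @outer_estimate float = 2.61972;       -- rows expected from IX_SalesOrderHeader_CustomerID
DECLARE @detail_density float = 3.178134E-05;  -- density of SalesOrderID in SalesOrderDetail
DECLARE @detail_rows    float = 121317;        -- rows in SalesOrderDetail

SELECT
    1.0 / @detail_density                             AS distinct_salesorderids,      -- ~31465
    @detail_rows * @detail_density                    AS avg_rows_per_salesorderid,   -- ~3.85561
    @outer_estimate * @detail_rows * @detail_density  AS density_based_join_estimate; -- ~10.1 vs 11.4867 in the plan
```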
8kb (2639 rep)
Apr 2, 2015, 04:46 PM • Last activity: Jun 10, 2025, 08:26 AM
2 votes
1 answer
160 views
SQL Server Underestimating cardinality on a Filter operator in a Left Anti-Join
I am tuning a query which is slow. I have narrowed the root of the problem down to the very beginning of the execution plan, where SQL Server makes a bad estimate on a WHERE ... IS NULL filter that supports a left anti-join. SQL Server estimates 1 row and favours some index scans through a nested loop, thinking it will only execute them once, when in fact it happens several thousand times.

I've managed to create an MCVE to replicate the problem. Set up the test environment:
/* INSERT 35000 distinct random numbers into a table */
CREATE TABLE #TableA
(
	ID BIGINT NULL
)

INSERT INTO #TableA
SELECT	DISTINCT
		TOP 35000
		a.Random
FROM	(
			SELECT	TOP 50000
					ABS(CHECKSUM(NewId())) % 20000000 AS Random
			FROM	sys.messages
		) a
GO

/* add a further 15000 that already exist in the table. Use a loop to increase the possibility of duplicates */
INSERT INTO #TableA
SELECT	TOP	1000 
		ID
FROM	#TableA a
ORDER BY NEWID()
GO 15


/* Insert 10000 numbers into another table, that are in the first table  */
CREATE TABLE #TableB
(
	ID BIGINT NOT NULL
)

INSERT INTO #TableB
SELECT	TOP 10000
		*
FROM	#TableA

/* insert 80000 distinct random numbers that are not in the first table */
INSERT INTO #TableB
SELECT	DISTINCT
		TOP 80000
		a.Random
FROM	(
			SELECT	TOP 100000
					ABS(CHECKSUM(NewId())) % 2000000 AS Random
			FROM	sys.messages
		) a
		LEFT JOIN #TableA b
			ON a.Random = b.ID
WHERE	b.ID IS NULL
Then, the query which suffers the problem is
SELECT	a.ID
FROM	#TableA a
		LEFT JOIN #TableB b
			ON a.ID = b.ID
WHERE	b.ID IS NULL
Which is a fairly simple "show me all the IDs in TableA that are not in TableB".

The execution plan from my test environment is here. We can see a very similar thing happening to what we see in the above plan, in terms of the filter operator: SQL Server joins the two tables together and then filters down to those records that are in the left table but not the right table, and it massively underestimates the number of rows that match that predicate.

If I force legacy estimation, I get a much better estimate on the operator.

I believe one of the key differences between the old and new estimators is how they differ in their assumption of the correlation between two predicates - the old one assumes there is little correlation between two predicates, whereas the new estimator is more optimistic and assumes a higher correlation?

My questions are:

- What causes this underestimation on the newer cardinality estimator?
- Is there a way to fix it other than forcing the older compatibility model?
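For completeness, the older estimator can also be tested per statement rather than by changing the database compatibility level. A sketch against the MCVE above (USE HINT needs SQL Server 2016 SP1 or later):

```sql
SELECT a.ID
FROM   #TableA a
       LEFT JOIN #TableB b
           ON a.ID = b.ID
WHERE  b.ID IS NULL
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```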
SE1986 (2182 rep)
May 30, 2024, 01:45 PM • Last activity: Jun 9, 2025, 11:51 PM
4 votes
1 answer
448 views
Index scan when more than 35 correlated subqueries are used with default cardinality estimation
Recently, we updated the compatibility level of our SQL Server from 2012 to 2016, but after updating the compatibility level we ran into performance issues when a lot of sub queries are used. Especially when more than 35 subqueries are used. Here is a query with which I can reproduce it:
SELECT 
  [PK_R_ASSEMBLYCOSTING],
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING]),
  [PK_R_ASSEMBLYCOSTING]
FROM [R_ASSEMBLYCOSTING]
WHERE [FK_ASSEMBLY] = 309961
When there are fewer than 35 subqueries, the query plan shows index seeks being used for the subqueries. But for every additional subquery over 35, an index scan is used instead.

Does anybody have any explanation why this happens? If Legacy Cardinality Estimation is enabled, the query is fast and doesn't have this issue, but we want to disable that. I already tried rebuilding the indexes and updating the statistics, but that doesn't make any difference.
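If the legacy estimator fixes it, that behaviour can be scoped to just this statement instead of the whole database. A sketch on a shortened version of the query above (USE HINT requires SQL Server 2016 SP1 or later):

```sql
SELECT
  [PK_R_ASSEMBLYCOSTING],
  (SELECT SUM([PRICEEXMARKUP]) FROM [R_ASSEMBLYCOSTINGITEM] WHERE [FK_ASSEMBLYCOSTING] = [PK_R_ASSEMBLYCOSTING])
FROM [R_ASSEMBLYCOSTING]
WHERE [FK_ASSEMBLY] = 309961
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```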
urk_forever (143 rep)
Mar 7, 2025, 04:27 PM • Last activity: Mar 10, 2025, 04:53 PM
11 votes
2 answers
1379 views
Why am I getting an implicit conversion of Int / Smallint to Varchar, and is it really impacting Cardinality Estimates?
I'm trying to trouble shoot a slow performing query using Show Plan Analysis (SSMS) on the actual execution plan. The Analysis tool points out that estimates for number of rows are off from returned results in a few places in the plan and further gives me some implicit conversion warnings. I don't understand these implicit conversions of int over to Varchar- The fields referenced are not part of any parameter/filter on the query and in all tables involved the column data types are the same: I get the below CardinalityEstimate Warnings: > Type conversion in expression > (CONVERT_IMPLICIT(varchar(12),[ccd].[profileid],0)) may affect > "CardinalityEstimate" in query plan choice --This field is an integer everywhere in my DB > > Type conversion in expression > (CONVERT_IMPLICIT(varchar(6),[ccd].[nodeid],0)) may affect > "CardinalityEstimate" in query plan choice --This field is an smallint everywhere in my DB > > Type conversion in expression > (CONVERT_IMPLICIT(varchar(6),[ccd].[sessionseqnum],0)) may affect > "CardinalityEstimate" in query plan choice --This field is an smallint everywhere in my DB > > Type conversion in expression > (CONVERT_IMPLICIT(varchar(41),[ccd].[sessionid],0)) may affect > "CardinalityEstimate" in query plan choice --This field is an decimal everywhere in my DB [EDIT] Here is the query and actual execution plan for reference https://www.brentozar.com/pastetheplan/?id=SysYt0NzN And table definitions.. /****** Object: Table [dbo].[agentconnectiondetail] Script Date: 1/10/2019 9:10:04 AM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[agentconnectiondetail]( [sessionid] [decimal](18, 0) NOT NULL, [sessionseqnum] [smallint] NOT NULL, [nodeid] [smallint] NOT NULL, [profileid] [int] NOT NULL, [resourceid] [int] NOT NULL, [startdatetime] [datetime2](7) NOT NULL, [enddatetime] [datetime2](7) NOT NULL, [qindex] [smallint] NOT NULL, [gmtoffset] [smallint] NOT NULL, [ringtime] [smallint] NULL, [talktime] [smallint] NULL, [holdtime] [smallint] NULL, [worktime] [smallint] NULL, [callwrapupdata] [varchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [callresult] [smallint] NULL, [dialinglistid] [int] NULL, [convertedStartDatetimelocal] [datetime2](7) NULL, [convertedEndDatetimelocal] [datetime2](7) NULL, CONSTRAINT [PK_agentconnectiondetail] PRIMARY KEY CLUSTERED ( [sessionid] ASC, [sessionseqnum] ASC, [nodeid] ASC, [profileid] ASC, [resourceid] ASC, [startdatetime] ASC, [qindex] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Table [dbo].[contactcalldetail] Script Date: 1/10/2019 9:10:04 AM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[contactcalldetail]( [sessionid] [decimal](18, 0) NOT NULL, [sessionseqnum] [smallint] NOT NULL, [nodeid] [smallint] NOT NULL, [profileid] [int] NOT NULL, [contacttype] [smallint] NOT NULL, [contactTypeDescription] [varchar](20) COLLATE Latin1_General_CI_AS NULL, [contactdisposition] [smallint] NOT NULL, [contactdispositionDescription] [varchar](20) COLLATE Latin1_General_CI_AS NULL, [dispositionreason] [varchar](100) COLLATE Latin1_General_CI_AS NULL, [originatortype] [smallint] NOT NULL, [originatorTypeDescription] [varchar](20) COLLATE Latin1_General_CI_AS NULL, [originatorid] [int] NULL, [originatordn] [varchar](30) COLLATE Latin1_General_CI_AS NULL, [destinationtype] [smallint] NULL, [destinationTypeDescription] [varchar](20) COLLATE Latin1_General_CI_AS NULL, 
[destinationid] [int] NULL, [destinationdn] [varchar](30) COLLATE Latin1_General_CI_AS NULL, [startdatetimeUTC] [datetime2](7) NOT NULL, [enddatetimeUTC] [datetime2](7) NOT NULL, [gmtoffset] [smallint] NOT NULL, [callednumber] [varchar](30) COLLATE Latin1_General_CI_AS NULL, [origcallednumber] [varchar](30) COLLATE Latin1_General_CI_AS NULL, [applicationtaskid] [decimal](18, 0) NULL, [applicationid] [int] NULL, [applicationname] [varchar](30) COLLATE Latin1_General_CI_AS NULL, [connecttime] [smallint] NULL, [customvariable1] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable2] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable3] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable4] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable5] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable6] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable7] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable8] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable9] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [customvariable10] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [accountnumber] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [callerentereddigits] [varchar](40) COLLATE Latin1_General_CI_AS NULL, [badcalltag] [char](1) COLLATE Latin1_General_CI_AS NULL, [transfer] [bit] NULL, [NextSeqNum] [smallint] NULL, [redirect] [bit] NULL, [conference] [bit] NULL, [flowout] [bit] NULL, [metservicelevel] [bit] NULL, [campaignid] [int] NULL, [origprotocolcallref] [varchar](32) COLLATE Latin1_General_CI_AS NULL, [destprotocolcallref] [varchar](32) COLLATE Latin1_General_CI_AS NULL, [convertedStartDatetimelocal] [datetime2](7) NULL, [convertedEndDatetimelocal] [datetime2](7) NULL, [AltKey] AS (concat([sessionid],[sessionseqnum],[nodeid],[profileid]) collate database_default) PERSISTED NOT NULL, [PrvSeqNum] [smallint] NULL, CONSTRAINT [PK_contactcalldetail] PRIMARY KEY CLUSTERED ( [sessionid] ASC, [sessionseqnum] ASC, [nodeid] ASC, [profileid] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Table [dbo].[contactqueuedetail] Script Date: 1/10/2019 9:10:04 AM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[contactqueuedetail]( [sessionid] [decimal](18, 0) NOT NULL, [sessionseqnum] [smallint] NOT NULL, [profileid] [int] NOT NULL, [nodeid] [smallint] NOT NULL, [targetid] [int] NOT NULL, [targettype] [smallint] NOT NULL, [targetTypeDescription] [varchar](10) COLLATE Latin1_General_CI_AS NULL, [qindex] [smallint] NOT NULL, [queueorder] [smallint] NOT NULL, [disposition] [smallint] NULL, [dispositionDescription] [varchar](50) COLLATE Latin1_General_CI_AS NULL, [metservicelevel] [bit] NULL, [queuetime] [smallint] NULL, CONSTRAINT [PK_contactqueuedetail] PRIMARY KEY CLUSTERED ( [sessionid] ASC, [sessionseqnum] ASC, [profileid] ASC, [nodeid] ASC, [targetid] ASC, [targettype] ASC, [qindex] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO /****** Object: Index [] Script Date: 1/10/2019 9:10:04 AM ******/ CREATE NONCLUSTERED INDEX [] ON [dbo].[contactcalldetail] ( [convertedStartDatetimelocal] ASC ) INCLUDE ( [sessionid], [sessionseqnum], [nodeid], [profileid]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE 
= OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO /****** Object: Index [idx_CCD_ContactType_DestType_StDtLocal] Script Date: 1/10/2019 9:10:04 AM ******/ CREATE NONCLUSTERED INDEX [idx_CCD_ContactType_DestType_StDtLocal] ON [dbo].[contactcalldetail] ( [destinationtype] ASC, [contacttype] ASC, [convertedStartDatetimelocal] ASC ) INCLUDE ( [sessionid], [sessionseqnum], [nodeid], [profileid], [convertedEndDatetimelocal]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO SET ANSI_PADDING ON GO /****** Object: Index [idx_CQD_Profile_Traget_TargetType] Script Date: 1/10/2019 9:10:04 AM ******/ CREATE NONCLUSTERED INDEX [idx_CQD_Profile_Traget_TargetType] ON [dbo].[contactqueuedetail] ( [profileid] ASC, [targetid] ASC, [targettype] ASC ) INCLUDE ( [targetTypeDescription], [queueorder], [disposition], [dispositionDescription], [queuetime]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO
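One observation worth testing: the persisted computed column AltKey in the DDL above is built with CONCAT(), and CONCAT converts every non-string argument to a string, which matches the int/smallint/decimal-to-varchar conversions listed in the warnings. A minimal expression-level check (standalone; no tables needed):

```sql
-- CONCAT() implicitly converts each argument; these CASTs mirror the column types
-- used in the AltKey computed column (sessionid, sessionseqnum, nodeid, profileid).
SELECT CONCAT(CAST(1 AS decimal(18, 0)),  -- sessionid
              CAST(1 AS smallint),        -- sessionseqnum
              CAST(1 AS smallint),        -- nodeid
              CAST(1 AS int))             -- profileid
       AS altkey_style_value;
```

Whether this is actually the source of the plan warnings would still need to be confirmed against the real execution plan.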
Voysinmyhead (257 rep)
Jan 8, 2019, 05:27 PM • Last activity: Mar 2, 2025, 07:03 AM
1 vote
2 answers
126 views
Execution Plan Estimates vs Actuals with Inequality Filters
I have the following SQL query:
declare @p1 INT = 20240703;
declare @p2 INT = 20240703;
declare @p3 NVARCHAR(50) = N'USA';

SELECT R.taxareaid, R.filtertypes
FROM region R
JOIN country C ON R.countryid = C.countryid 
where C.name = @p3
and  R.effdate = @p2
ORDER BY R.taxareaid 
OPTION (RECOMPILE);
https://www.brentozar.com/pastetheplan/?id=B1FTNkXHkx
CREATE TABLE [dbo].[Region](
	[regionId] [numeric](18, 0) NOT NULL,
	[taxAreaId] [numeric](18, 0) NOT NULL,
	[effDate] [numeric](8, 0) NOT NULL,
	[expDate] [numeric](8, 0) NOT NULL,
	[countryId] [numeric](18, 0) NOT NULL,
	[mainDivisionId] [numeric](18, 0) NOT NULL,
	[subDivisionId] [numeric](18, 0) NOT NULL,
	[cityId] [numeric](18, 0) NOT NULL,
	[postalCodeId] [numeric](18, 0) NOT NULL,
	[cityCompressedId] [numeric](18, 0) NOT NULL,
	[subDivCompressedId] [numeric](18, 0) NOT NULL,
	[filterTypes] [numeric](32, 0) NOT NULL,
	[updateId] [numeric](18, 0) NOT NULL,
 CONSTRAINT pk_region PRIMARY KEY CLUSTERED 
(
	[regionId] ASC
)
)
### Problem:

- **Execution Plan Estimates**: When I look at the execution plan, I notice that the estimates are much smaller than the actual rows processed. Although I'm using OPTION (RECOMPILE) to prevent parameter sniffing, I'm still not getting accurate estimates. I've also updated statistics on the region table using a full scan, but the estimates are still incorrect.
- **TempDB Spill**: The query is leading to a spill to TempDB during sorting.

### What I've Tried:

1. **Updated Statistics**: I performed a full scan to update statistics on the region table.
2. **Indexes**: I created an index on region(taxareaid) and a composite index on (countryid, effdate, expdate, taxareaid), but I am still seeing sorting in the execution plan.

### My Questions:

1. **How can I get more accurate execution plan estimates** to avoid the TempDB spill during sorting?
2. **How can I avoid the sorting operation entirely?** Are there other strategies I can try, given that indexing doesn't seem to resolve the issue?
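To quantify the misestimate and the undersized sort memory grant after a run, the statement's last grant and spill counters can be checked. A sketch (the grant/spill columns exist only on recent builds):

```sql
SELECT qs.last_rows, qs.last_grant_kb, qs.last_used_grant_kb, qs.last_spills, st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text LIKE '%FROM region R%';
```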
sebeid (1415 rep)
Dec 19, 2024, 11:32 PM • Last activity: Dec 21, 2024, 06:05 AM
26 votes
2 answers
1455 views
Why does a subquery reduce the row estimate to 1?
Consider the following contrived but simple query:

SELECT ID
     , CASE WHEN ID 0 THEN (SELECT TOP 1 ID FROM X_OTHER_TABLE)
            ELSE (SELECT TOP 1 ID FROM X_OTHER_TABLE_2)
       END AS ID2
FROM X_HEAP;

I would expect the final row estimate for this query to be equal to the number of rows in the X_HEAP table. Whatever I'm doing in the subquery shouldn't matter for the row estimate because it cannot filter out any rows. However, on SQL Server 2016 I see the row estimate reduced to 1 because of the subquery.

Why does this happen? What can I do about it?

It's very easy to reproduce this issue with the right syntax. Here is one set of table definitions that will do it:

CREATE TABLE dbo.X_HEAP (ID INT NOT NULL)

CREATE TABLE dbo.X_OTHER_TABLE (ID INT NOT NULL);

CREATE TABLE dbo.X_OTHER_TABLE_2 (ID INT NOT NULL);

INSERT INTO dbo.X_HEAP WITH (TABLOCK)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values;

CREATE STATISTICS X_HEAP__ID ON X_HEAP (ID) WITH FULLSCAN;

db fiddle link.
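A workaround sketch: because the two scalar subqueries are uncorrelated, they can be hoisted into variables so the outer statement's estimate depends only on X_HEAP. The comparison operator in the CASE below is a stand-in, since the original operator did not survive the formatting above:

```sql
DECLARE @id1 int = (SELECT TOP (1) ID FROM dbo.X_OTHER_TABLE);
DECLARE @id2 int = (SELECT TOP (1) ID FROM dbo.X_OTHER_TABLE_2);

SELECT ID,
       CASE WHEN ID > 0 THEN @id1 ELSE @id2 END AS ID2  -- "> 0" is a placeholder predicate
FROM dbo.X_HEAP;
```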
Joe Obbish (32976 rep)
Apr 21, 2017, 03:33 PM • Last activity: Nov 24, 2024, 01:24 PM
1 vote
1 answer
95 views
Why don't I see OptimizerStatsUsage in the execution plan?
SQL Server 2017 introduces a very helpful enhancement to the showplan to see which statistics were used to generate a plan: https://learn.microsoft.com/en-nz/archive/blogs/sql_server_team/sql-server-2017-showplan-enhancements However, I can't find it in my execution plan. I have the following query on the StackOverflow database:
Use StackOverflow2010;

DROP TABLE IF EXISTS #tempPosts;
CREATE TABLE #tempPosts(
	Id int
)

INSERT INTO #tempPosts
SELECT ID FROM dbo.Posts
WHERE OwnerUserId = 26837

SELECT Title, u.DisplayName, pt.Type FROM dbo.Posts p
INNER JOIN #tempPosts temp
ON p.Id = temp.Id
INNER JOIN dbo.Users u
ON p.OwnerUserId = u.Id
INNER JOIN dbo.PostTypes pt
ON p.PostTypeId = pt.Id
OPTION(RECOMPILE)
I turned on Include Actual Execution Plan to capture the plan and could not find the OptimizerStatsUsage field in the plan.

What could be the reason for OptimizerStatsUsage not showing up in my execution plan? Is there any additional configuration or step needed to see this property? Thank you for any insights!
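A way to check whether the element is present in the plan XML at all (OptimizerStatsUsage is only emitted by SQL Server 2017 and later); this is a sketch that assumes the plan is still in the cache:

```sql
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT st.text,
       qp.query_plan.exist('//OptimizerStatsUsage') AS has_optimizer_stats_usage
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%#tempPosts%';
```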
Tuyen Nguyen (343 rep)
Nov 11, 2024, 10:04 PM • Last activity: Nov 12, 2024, 07:08 AM
2 votes
3 answers
673 views
Why is my Nested Loops join showing inaccurate row estimates in SQL Server?
I have the following execution plan (screenshot: execution plan with inaccurate row estimates).

As you can see, the row estimates for the Clustered Index Scan and Index Seek operators are accurate. However, the Nested Loops join has a significant discrepancy: the actual row count is 6,420, while the estimated row count is only 72.

My questions are:

1. How is the row count estimated for a Nested Loops join in SQL Server?
2. What factors could lead to such an inaccurate row estimate in this case?
3. Is there anything I can do to improve or correct the estimate?

Thank you for any insights!
Tuyen Nguyen (343 rep)
Nov 6, 2024, 08:35 PM • Last activity: Nov 8, 2024, 09:28 AM
3 votes
1 answer
347 views
What Method / Formula does a Nested Loops Operator use for row estimation?
The following, simple query in AdventureWorks:

SELECT *
FROM Person.Person p
JOIN HumanResources.Employee e ON p.BusinessEntityID = e.BusinessEntityID

gives the following execution plans (screenshots: new estimator plan / old estimator plan).

If I look at the new estimator's plan, I can see the index scan and index seek both (correctly) estimate 290 rows; however, the nested loops operator that joins the two estimates 279 rows. The old estimator also correctly guesses 290 rows out of both the seek and the scan, but its nested loops operator estimates 289 rows, which in the case of this query is a better estimate.

Is it true then that, in the case of the new CE, the optimizer estimates that when it joins 290 rows from the index scan and 290 from the index seek, there will be 11 rows that do not match? What method / formula does it use to make this estimate? Am I correct in saying that, whatever said method is, it has changed from the earlier CE version, as that version made a different estimate?

I realise the "bad" estimate of the new CE is not significant enough to detriment performance; I am just trying to understand the estimator's processing.
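One way to see which calculator and selectivity the new CE used for this join is the undocumented optimizer trace flag 2363 (test environments only; requires sysadmin), which prints the estimation steps to the Messages tab:

```sql
SELECT *
FROM Person.Person p
JOIN HumanResources.Employee e
    ON p.BusinessEntityID = e.BusinessEntityID
OPTION (QUERYTRACEON 3604, QUERYTRACEON 2363, RECOMPILE);
```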
SE1986 (2182 rep)
Sep 9, 2020, 11:30 AM • Last activity: Nov 7, 2024, 09:10 AM
4 votes
4 answers
2096 views
Actual and Estimated rows differ greatly
The full Actual plan is here. Prior to executing the plan (because I'm debugging a poorly functioning plan) I have this block of variable assignments:

DECLARE @Days INT = 180
DECLARE @DateRangeFrom DateTime = DATEADD(d, -@Days, getDate())
DECLARE @DateRangeTo DateTime = getDate()
DECLARE @FacilityID INT = 1010
DECLARE @Answer0 INT = 1879
DECLARE @Answer1 INT = 1949
DECLARE @Answer1SetID INT = 1607
DECLARE @Answer2 INT = 1907
DECLARE @Answer2SetID INT = 1593

My first problem is with the lookup I'm performing on the IRItemAnswer_Info table (Node ID 19). It's spilling to tempdb, which already starts the query off on the wrong foot. It's referencing the IRItemAnswerInfo_DGItemID_AnswerSourceID index, which is the correct index, as I'm matching on DGItemID and AnswerSourceID, and getting back IncidentID. The index is created as

CREATE NONCLUSTERED INDEX IRItemAnswerInfo_DGItemID_AnswerSourceID
ON dbo.IRItemAnswer_Info (DGItemID, AnswerSourceID)
INCLUDE([IncidentID], [AnswerBoolean])

However, the Estimated Rows for the query is 53,459 and the Actual Rows is 969,812. I just finished forcing new statistics via

UPDATE STATISTICS IRItemAnswer_Info IRItemAnswerInfo_DGItemID_AnswerSourceID WITH FULLSCAN

and it made no difference.

DBCC SHOW_STATISTICS ('IRItemAnswer_Info', 'DGItemID') for DGItemID = 1949 has EQ_ROWS as 1,063,536, and DBCC SHOW_STATISTICS ('IRItemAnswer_Info', 'AnswerSourceID') for AnswerSourceID = 1607 has EQ_ROWS as 970,079.

The database is running Compatibility level 140 (SQL Server 2017). We would run 2019, but there are issues we need to correct in the stored procedures before we can do that. What should be the next thing I look at?

---

I chose the worst performing output, which is the most common values. IRItemAnswer_Info is a table containing user-defined answers to associate to an event, where DGItemID = 1949 is one of the most common questions (almost every event has one), and AnswerSourceID = 1607 is the most common answer. Given that there is a strong correlation between them, how should I reorder the query?

As it is a point of a little bit of confusion, there are two INNER JOINs to the same table, IRItemAnswer_Info. One is the answer I'm looking for (as identified by the question iria.DGItemID = 1879; its output iria.AnswerSourceID links to irai.AltLabel), and the second one is a limiting factor: I only want records where the question iiai1.DGItemID = 1949 has as its answer iiai1.AnswerSourceID = 1607.

I have explicitly removed the plan from the cache (using DBCC FREEPROCCACHE) and re-run it, with no change in the result - the Hash Match is still spilling.
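Given the strong correlation described between DGItemID and AnswerSourceID, one mitigation worth testing is a filtered statistic that captures the answer distribution for the common question directly. A sketch using the names from the question (a parameterized query may also need a recompile hint before the optimizer can use it):

```sql
CREATE STATISTICS st_IRItemAnswerInfo_AnswerSource_ForDGItem1949
ON dbo.IRItemAnswer_Info (AnswerSourceID)
WHERE DGItemID = 1949
WITH FULLSCAN;
```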
Daniel Bragg (183 rep)
Dec 20, 2021, 11:38 PM • Last activity: May 31, 2024, 02:45 PM
6 votes
1 answer
3657 views
Bitmap Creation in Execution Plan Causes bad Estimate on Clustered Index Scan
Given the following simple query on the StackOverflow2010 database:

SELECT u.DisplayName, u.Reputation
FROM Users u
JOIN Posts p ON u.id = p.OwnerUserId
WHERE u.DisplayName = 'alex'
AND p.CreationDate >= '2010-01-01'
AND p.CreationDate ='2010-01-01 00:00:00.000' AND [StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]= Scalar Operator('2010-01-01 00:00:00.000'), End: [StackOverflow2010].[dbo].[Posts].CreationDate <= Scalar Operator('2010-03-01 00:00:00.000')

So I can see Plan 2 is just going to use the histogram to find the number of rows between the two dates, but Plan 1 has a slightly more complicated predicate involving a bitmap probe. That (I think) explains why the estimate on the seek is more accurate, but I am now wondering: what is the bitmap probe? I can see in the plan that there is a bitmap created of the user IDs that match the Alex predicate, and that is what is being probed.

I wondered "without the index, why wouldn't Plan 1 be the same as Plan 2, the only difference being a CI scan instead of an index seek on CreationDate?" I did some further testing and found that if I run the query without the index but force the plan to go serial, using OPTION (MAXDOP 1), I get Plan 3, which has a better estimate on CreationDate despite now doing a CI Scan on Posts. If I look at the predicate, I can see that the probe is now gone and the bitmap is no longer in the plan, so this leads me to believe the bitmap is something to do with the plan going parallel.

So my question is - why is a bitmap created when the plan goes parallel, and why does it cause such a bad estimate on Posts.CreationDate?
SE1986 (2182 rep)
Jan 27, 2022, 11:11 PM • Last activity: Mar 26, 2024, 01:14 PM
1 vote
2 answers
625 views
How can I optimize a recursive CTE inside an ITVF?
I have a recursive CTE inside an inline table-valued function. The ITVF returns a list of IDs containing a long sequence of ancestors for a person; it usually loops back about 12 to 18 times before getting to the end. It's quite fast, but there's an error in the estimations that stacks when it is used on many people, so it becomes extremely slow. The CTE looks like this:

WITH ancestors AS
(
    SELECT IndID, AncestorID
    FROM dbo.persons

    UNION ALL

    SELECT a.IndID, p.AncestorID
    FROM ancestors a
    INNER JOIN dbo.persons p ON p.IndID = a.AncestorID
)
SELECT IndID, AncestorID
FROM ancestors

I have a dozen million rows, so it's quite a large table. When I ask for one IndID, the execution plan says that it estimated 7 rows but got 1,300 actual rows. For a single request it's acceptable (it runs in less than a second), but if I join it in another request so it gets called, let's say, 100 times, the speed drops to a crawl since the estimation gets worse and worse.

Just to be clear, the estimation error is present even outside of the ITVF. I only mention the function to make clear that I can't just use a temporary table. It needs to stay in an ITVF so I can join it in larger, more complex requests and it stays parallelisable.

What can I do to estimate the rows better?

Update: Paste The Plan

**Update 2: Less simplified**

I'm kind of stuck between two problems. Either I use an MSTVF and all my queries can't parallelise, or I use an ITVF and hope that the SQL gods are generous and don't horribly underestimate the row counts so everything now swaps to the hard drive instead of staying in RAM. I hope that it's just that I'm dumb and it's a stupid easy fix somewhere.

**Update 3**

To answer the questions asked, to the best of my knowledge:

uno) Updated to the latest cumulative update. Didn't change anything as expected, but it's good to be up to date as you said :)

dos) We are on Standard edition, but I do have a columnstore index and I can't remember why I created it. It's on IndID, FirstNameID, LastNameID. I'll try dropping it; we are only 2 users on the database today, so we can manage downtime if it breaks something else.

> After removing the columnstore, it did save about 30 seconds! Still slow, but it's better. I'll have to check my notes to find why I created that columnstore index.

dos, part 2) The "underpowered box" feeling you got is exactly what got me up to now. I thought our machine was underpowered, but after talking with the IT here, they said we weren't using more than 25% of the resources available, so the bottleneck was definitely at the SQL level. So I asked for an upgrade from SQL 2017 to 2022 last month and then, now that I saw that most of my heavy queries were always running serialized, started optimizing until I hit this one. I tried OPTION(USE HINT('DISALLOW_BATCH_MODE'), MAXDOP 8); and I don't see any change in speed.

tres) That request is indeed supposed to return about 14 million rows, so no worries on that side. But isn't the fact that only 8 rows were estimated in the resource reservation a reason why it's much slower than it should be?

more context) I was using an MSTVF before all my work this month; when I switched to an ITVF it is faster, but the curve of time spent vs rows asked is exponential instead of linear, if you get what I mean. I'm open to rethinking how all this is done. I work for a research group and part of my job is extracting datasets for researchers. I'm pretty much the only heavy user on the database; my colleagues are more in the "inserting and cleaning the data" part of the job. So I can pretty much do what I want with the indexes, functions, etc. as long as the table structure itself is not changed too much.

**Update 4 - What?**

I don't get it. I was trying to make a nice graphic to show the "time spent vs rows asked" exponential curve, so I changed my query to get nice square numbers:

select count(*)
FROM (SELECT TOP 10000 *
      FROM individus.Individus
      WHERE AnneeNaissance > 1901
        AND AnneeDeces < 1911) i
CROSS APPLY individus.GetAscendanceSimple(i.IndID) a

And that ran in 10 seconds... I even tried TOP 10,000,000 and it was still fast, so I just have to put an arbitrary large number so all my cases are covered and it runs as fast as I would have hoped (the TOP is important). Before putting that down as the solution, I must be wrong, no? That's a really dumb fix if it's all we need to do to fix the planning.

(Screenshots: plans without the subquery vs with the subquery.)
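For reference, a hypothetical sketch of how the recursive CTE might be wrapped as the inline TVF being described. The parameter filter in the anchor is an assumption (the question's simplified CTE doesn't show it), and OPTION clauses such as MAXRECURSION cannot be placed inside an inline function, so they would have to go on the calling query:

```sql
CREATE OR ALTER FUNCTION individus.GetAscendanceSimple (@IndID int)
RETURNS TABLE
AS
RETURN
(
    WITH ancestors AS
    (
        -- Anchor: the person we start from (the WHERE clause is an assumed detail)
        SELECT IndID, AncestorID
        FROM dbo.persons
        WHERE IndID = @IndID

        UNION ALL

        -- Recursive step: follow each ancestor's own ancestor link
        SELECT a.IndID, p.AncestorID
        FROM ancestors a
        INNER JOIN dbo.persons p ON p.IndID = a.AncestorID
    )
    SELECT IndID, AncestorID
    FROM ancestors
);
```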
James McGrath (121 rep)
Jan 24, 2024, 09:10 PM • Last activity: Jan 29, 2024, 07:45 PM
1 vote
1 answer
160 views
Wrong cardinality estimation after gathering statistics (Oracle)
We have a large table range partitioned by month. Incremental statistics are turned on. After the scheduled statistics gathering, cardinality estimation becomes weird, like:

select count(*) from my_table where date >= trunc(sysdate) - 30 and date

'ownname', tabname=> 'tabname' , estimate_percent=> DBMS_STATS.AUTO_SAMPLE_SIZE, cascade=> DBMS_STATS.AUTO_CASCADE, degree=> 4, no_invalidate=> DBMS_STATS.AUTO_INVALIDATE, granularity=> 'AUTO', method_opt=> 'FOR ALL COLUMNS SIZE AUTO' );

-- Manual
DBMS_STATS.GATHER_TABLE_STATS (
    ownname => '"ownname"',
    tabname => '"tabname"',
    partname => '"partname"',
    method_opt => 'FOR COLUMNS DATE SIZE 254',
    estimate_percent => 1
);

Other partitioned tables are OK. The differences between this table and the others are (as far as we know):

1. There were wrong inserts into this table. Most dates are between 2014 and 2023, but there are some rows with 1970 and 2024 (we can't change that). There is also an empty partition for 2045. We tried recreating this but didn't get the same behaviour.
2. We messed with the histograms: we removed some automatically created ones and manually created some useful function-based ones. But histograms for the DATE column were present in USER_TAB_COL_STATISTICS and USER_TAB_HISTOGRAMS.

What can cause such behaviour? How can we fix it?
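To see what the scheduled run actually changed, it can help to compare partition-level and global statistics before and after the job. A sketch (OWNNAME/TABNAME are placeholders for the real owner and table):

```sql
SELECT partition_name, num_rows, last_analyzed, stale_stats
FROM   dba_tab_statistics
WHERE  owner = 'OWNNAME'
AND    table_name = 'TABNAME'
ORDER  BY partition_position NULLS FIRST;
```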
Andy DB Analyst (110 rep)
Oct 24, 2023, 08:38 AM • Last activity: Oct 25, 2023, 01:50 AM
8 votes
1 answer
1569 views
Inserting with implicit type conversion causes warning for cardinality estimates
I noticed this while doing some performance testing recently. When I insert a value into a column that will require an implicit conversion (e.g. bigint into nvarchar), I get a warning:

> Type conversion in expression (CONVERT_IMPLICIT(nvarchar(50),[tempdb].[dbo].[#MyFunIntTable].[EvenCoolerColumn],0)) may affect "Cardinality Estimate" in query plan choice.

Being a concerned citizen, I checked all of the obvious suspects and eventually dug into the XML to confirm that it was actually warning about the insert into the table. The problem is, I can't figure out why this would ever affect cardinality estimates. If I were doing this in a join or somewhere with a little more logic it would make sense, but there shouldn't be a cardinality estimate mismatch for the actual insert operation, right?

I noticed that this happened when it was more than just a trivial query - as soon as more than one value is inserted, or we're pulling a value from a table, we hit this.

This question has attracted some potential duplicates, including:

- https://dba.stackexchange.com/q/226610/69545
- https://dba.stackexchange.com/q/36097/69545

I think it is different from these questions because I'm literally not doing anything with this column. I'm not using it in a filter, or a sort, or a grouping, or a join, or in a function - any of these things would make the scenario more complicated. All I'm doing is inserting a bigint into an nvarchar, which should never impact a meaningful cardinality estimate that I can think of.

What I'm specifically looking for out of an answer is:

1. An explanation of why I get this warning despite nothing of interest going on - is it just that SQL Server will be conservative and report even when it won't affect plan choice?
2. What cardinality estimate is actually at risk here, and what operation would change based off of inaccuracies in that cardinality estimate?
3. Is there a scenario where this could affect plan choice? Obviously if I start joining or filtering on the converted column it could, but as-is?
4. Is there anything that can be done to prevent it from warning, besides changing data types (assume this is a requirement of how the data models interact)?

I recreated it with the below simple example (paste the plan):

DROP TABLE IF EXISTS #MyFunStringTable;
DROP TABLE IF EXISTS #MyFunIntTable;

CREATE TABLE #MyFunStringTable
(
    SuperCoolColumn nvarchar(50) COLLATE DATABASE_DEFAULT NULL
);

CREATE TABLE #MyFunIntTable
(
    EvenCoolerColumn bigint NULL
);

INSERT INTO #MyFunIntTable (EvenCoolerColumn)
VALUES (1), (2), (3), (4), (5);

INSERT INTO #MyFunStringTable (SuperCoolColumn)
SELECT EvenCoolerColumn FROM #MyFunIntTable;

INSERT INTO #MyFunStringTable (SuperCoolColumn)
VALUES (1);

INSERT INTO #MyFunStringTable (SuperCoolColumn)
VALUES (1), (2);

INSERT INTO #MyFunStringTable (SuperCoolColumn)
SELECT 1;

INSERT INTO #MyFunStringTable (SuperCoolColumn)
SELECT 1 UNION ALL SELECT 2;

INSERT INTO #MyFunStringTable (SuperCoolColumn)
SELECT 1 FROM #MyFunIntTable;
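A workaround sketch worth testing: make the conversion explicit in the SELECT list so it is no longer reported as an implicit conversion (whether this silences the warning on a given build would need to be verified against the plan):

```sql
INSERT INTO #MyFunStringTable (SuperCoolColumn)
SELECT CAST(EvenCoolerColumn AS nvarchar(50))
FROM #MyFunIntTable;
```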
Dan Oberlam (183 rep)
Aug 30, 2019, 07:58 PM • Last activity: Oct 20, 2023, 06:18 AM
0 votes
1 answer
315 views
Cardinality miscalculations leads to ridiculous execution plans
I'm not talking about stale statistics, or just simply "bad"/"non-optimal" plans. We have a lot of complicated queries running in our database. Normally everything works as expected, but from time to time we have cases where the optimizer miscalculates cardinalities and chooses ridiculous execution plans. The worst cases are when the optimizer evaluates a subquery to have 1 row. Then we get plans with:

1. Wrong join order with MERGE JOIN CARTESIAN and thousands-of-rows tables/subqueries. The optimizer for some reason chooses plans like "SELECT * FROM TAB1... (CROSS) JOIN TAB3... JOIN TAB2 ON TAB1.COL11 = TAB3.COL31 AND TAB2.COL21 = TAB3.COL32" instead of joining "TAB1 JOIN TAB2 JOIN TAB3" as expected. With LEADING/ORDERED hints it starts working properly.
2. For no reason, MERGE JOIN CARTESIAN or NESTED LOOPS without indexes is used instead of HASH JOIN. USE_HASH or CARDINALITY hints solve this problem.
3. VIEW PUSHED PREDICATE is used with FULL TABLE SCAN. This leads to scanning small tables thousands of times, adding minutes/hours to query execution. CARDINALITY/MATERIALIZE/NO_PUSH_PRED hints solve this problem.

The question is: is there a way to globally force the optimizer not to use CARTESIAN/VIEW PUSHED PREDICATE if there is no guarantee of a 1-row result? Or at least decrease the probability of it being used? Like when we had problems with indexes being overused while doing analysis, "ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ = 200" partially solved it.
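For illustration, the kind of per-block hinting mentioned above (CARDINALITY, USE_HASH, NO_PUSH_PRED) can be combined on the problem subquery; table and column names here are hypothetical:

```sql
SELECT /*+ NO_MERGE(v) NO_PUSH_PRED(v) USE_HASH(v) CARDINALITY(v 10000) */
       t.col1, v.col2
FROM   tab1 t
       JOIN (SELECT col1, col2 FROM tab2 WHERE colx = 'Y') v
            ON v.col1 = t.col1;
```

If the SQL text cannot be changed, the same hints can usually be attached from the outside via SQL patches or plan baselines, but that is still per-query rather than the global switch being asked about.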
Andy DB Analyst (110 rep)
Jun 8, 2023, 08:42 AM • Last activity: Jun 8, 2023, 09:19 AM
0 votes
1 answer
175 views
SQL Server 2014 Cardinality Estimator estimates the final number of rows after an OUTER JOIN to be less than the number of rows from the initial table
I have a SQL query for SQL Server 2019. It works fine with OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')) and very badly without this hint. I found out that the SQL Server 2014 Cardinality Estimator estimates the final number of rows after an OUTER JOIN to be less than the number of rows from the initial table, without any WHERE predicates and with OPTION (RECOMPILE) (see the New CE plan). The Legacy Cardinality Estimator estimates the final number of rows after the OUTER JOIN to be equal to or more than the number of rows from the initial table, which is correct (see the Old CE plan).

Is this a bug in the SQL Server 2014 Cardinality Estimator, or am I doing something wrong?
Андрей Ерёмин (1 rep)
May 22, 2023, 08:05 AM • Last activity: May 31, 2023, 04:46 AM
3 votes
1 answer
364 views
How does SQL Server estimate cardinality on nested loops index seek
I am trying to understand how SQL Server estimates cardinality on the below Stack Overflow database query.

Firstly, I create the index:

CREATE INDEX IX_PostId ON dbo.Comments
(
    PostId
)
INCLUDE
(
    [Text]
)

And here is the query:

SELECT u.DisplayName, c.PostId, c.Text
FROM Users u
JOIN Comments c ON u.Reputation = c.PostId
WHERE u.AccountId = 22547

The execution plan is here.

First of all, SQL Server scans the clustered index on the Users table to return the users that match the AccountId predicate. I can see that it uses the statistic _WA_Sys_0000000E_09DE7BCC, and that this value doesn't have a RANGE_HI_KEY in the histogram, so SQL Server uses the AVG_RANGE_ROWS and estimates 1.

The seek predicate on the Comments index seek is Scalar Operator([StackOverflow2010].[dbo].[Users].[Reputation] as [u].[Reputation]), which represents the Reputation value of the user(s) in the Users table with the AccountId of 22547.

I can see three stats loaded in total:

- _WA_Sys_0000000E_09DE7BCC - Users.AccountId (used to estimate the clustered index seek predicate)
- IX_PostId - Comments.PostId (used to estimate the index seek predicate)
- _WA_Sys_0000000A_09DE7BCC - Users.Reputation (?)

How does SQL Server come up with the estimate on the index seek? It cannot know the Reputation of AccountId 22547 at compile time, as the AccountId stat does not show that, so it cannot perform a lookup on the histogram for IX_PostId. I can see that the Reputation stat is also loaded, so does it use both somehow?

This query was run against CE 150.
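The three statistics objects the plan reports loading can be inspected directly to follow the calculation (names copied from the question; the _WA_Sys_* names are auto-created statistics and will differ on another instance):

```sql
DBCC SHOW_STATISTICS ('dbo.Users',    '_WA_Sys_0000000E_09DE7BCC') WITH HISTOGRAM; -- Users.AccountId
DBCC SHOW_STATISTICS ('dbo.Users',    '_WA_Sys_0000000A_09DE7BCC') WITH HISTOGRAM; -- Users.Reputation
DBCC SHOW_STATISTICS ('dbo.Comments', 'IX_PostId')                 WITH HISTOGRAM; -- Comments.PostId
```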
SE1986 (2182 rep)
May 4, 2023, 03:40 PM • Last activity: May 16, 2023, 10:13 AM