Why does SQL Server sometimes estimate that joining onto an empty table will increase the row count?
6
votes
1
answer
589
views
I recently came across an issue where
There was one specific table involved in a lot of these joins and inserting a single row to that table was enough to stop this explosion for the joins that table was involved in (cardinality estimator output indicates that it is now using the stats histogram from that table)
The original behaviour seems weird to me. SQL Server knows that the table it is joining onto is empty and the plan caching white paper indicates that inserting any row to an empty table will cause the recompilation threshold to be reached so is there any good reason for it?
Repro showing the estimated row count growth (though without long compile times)
CREATE TABLE T1(C1 INT);
INSERT INTO T1 VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9);
CREATE TABLE T2(C1 INT, C2 VARCHAR(MAX));
SELECT *
FROM T1 LEFT OUTER JOIN T2 ON T1.C1 = T2.C1
LEFT OUTER JOIN T2 T3 ON T3.C1 = T2.C1
LEFT OUTER JOIN T2 T4 ON T4.C1 = T2.C1
LEFT OUTER JOIN T2 T5 ON T5.C1 = T2.C1
LEFT OUTER JOIN T2 T6 ON T6.C1 = T2.C1
LEFT OUTER JOIN T2 T7 ON T7.C1 = T2.C1
LEFT OUTER JOIN T2 T8 ON T8.C1 = T2.C1
LEFT OUTER JOIN T2 T9 ON T9.C1 = T2.C1
LEFT OUTER JOIN T2 T10 ON T10.C1 = T2.C1
LEFT OUTER JOIN T2 T11 ON T11.C1 = T2.C1
LEFT OUTER JOIN T2 T12 ON T12.C1 = T2.C1
LEFT OUTER JOIN T2 T13 ON T13.C1 = T2.C1
LEFT OUTER JOIN T2 T14 ON T14.C1 = T2.C1
LEFT OUTER JOIN T2 T15 ON T15.C1 = T2.C1
LEFT OUTER JOIN T2 T16 ON T16.C1 = T2.C1
LEFT OUTER JOIN T2 T17 ON T17.C1 = T2.C1
LEFT OUTER JOIN T2 T18 ON T18.C1 = T2.C1
LEFT OUTER JOIN T2 T19 ON T19.C1 = T2.C1
LEFT OUTER JOIN T2 T20 ON T20.C1 = T2.C1
LEFT OUTER JOIN T2 T21 ON T21.C1 = T2.C1
LEFT OUTER JOIN T2 T22 ON T22.C1 = T2.C1
LEFT OUTER JOIN T2 T23 ON T23.C1 = T2.C1
LEFT OUTER JOIN T2 T24 ON T24.C1 = T2.C1
LEFT OUTER JOIN T2 T25 ON T25.C1 = T2.C1
LEFT OUTER JOIN T2 T26 ON T26.C1 = T2.C1
LEFT OUTER JOIN T2 T27 ON T27.C1 = T2.C1
LEFT OUTER JOIN T2 T28 ON T28.C1 = T2.C1
LEFT OUTER JOIN T2 T29 ON T29.C1 = T2.C1
LEFT OUTER JOIN T2 T30 ON T30.C1 = T2.C1
LEFT OUTER JOIN T2 T31 ON T31.C1 = T2.C1
LEFT OUTER JOIN T2 T32 ON T32.C1 = T2.C1
LEFT OUTER JOIN T2 T33 ON T33.C1 = T2.C1
LEFT OUTER JOIN T2 T34 ON T34.C1 = T2.C1
LEFT OUTER JOIN T2 T35 ON T35.C1 = T2.C1
LEFT OUTER JOIN T2 T36 ON T36.C1 = T2.C1
LEFT OUTER JOIN T2 T37 ON T37.C1 = T2.C1
LEFT OUTER JOIN T2 T38 ON T38.C1 = T2.C1
LEFT OUTER JOIN T2 T39 ON T39.C1 = T2.C1
tSQLt
tests were taking a long time to run.
The procedure under test was doing a 38 table (!) join (with 37 faked tables and a table valued parameter).
Only two of the faked tables and the TVP had any rows inserted
Compilation times were extremely slow.
Trace flag 8675 showed
End of simplification, time: 0.002 net: 0.002 total: 0 net: 0.002
end exploration, tasks: 549 no total cost time: 0.013 net: 0.013 total: 0 net: 0.015
end search(0), cost: 13372.9 tasks: 3517 time: 0.012 net: 0.012 total: 0 net: 0.028
end exploration, tasks: 3983 Cost = 13372.9 time: 0 net: 0 total: 0 net: 0.028
end search(1), cost: 6706.79 tasks: 10187 time: 0.024 net: 0.024 total: 0 net: 0.052
end exploration, tasks: 10188 Cost = 6706.79 time: 0 net: 0 total: 0 net: 0.052
end search(1), cost: 6706.79 tasks: 61768 time: 0.165 net: 0.165 total: 0 net: 0.218
*** Optimizer time out abort at task 614400 ***
end search(2), cost: 6706.79 tasks: 614400 time: 12.539 net: 12.539 total: 12 net: 12.758
*** Optimizer time out abort at task 614400 ***
End of post optimization rewrite, time: 0.001 net: 0.001 total: 12 net: 12.759
End of query plan compilation, time: 0.003 net: 0.003 total: 12 net: 12.762
SQL Server parse and compile time:
CPU time = 12735 ms, elapsed time = 12770 ms.
It looks like the estimated rows grow exponentially for each join between empty tables until at the end the estimated row count was 135,601,000 and the query had a huge estimated cost justifying longer compile time.



Asked by Martin Smith
(88051 rep)
Jan 7, 2023, 02:05 PM
Last activity: Jan 7, 2023, 03:20 PM
Last activity: Jan 7, 2023, 03:20 PM