SQL Server Primary Key Column Statistics Histogram Suggests Duplicate Values
I have a statistic on a Primary Key column in a table. When I update the statistic with the default options:
UPDATE STATISTICS dbo.MyTable PK__MyTable__CB394B3946083350
I get a histogram as follows (abridged):
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------------------------------------------------------------
3400002201 0 1 0 1
3400009992 18103.04 1 7790 2.323882
3400040033 26083.68 1 26080 1.000144
3400050456 13029.09 1 10422 1.250153
3400087676 26083.68 1 26080 1.000144
3400103858 19556.38 1 16181 1.208602
3400126866 13029.09 1 13029 1
3400162832 39138.27 1 35965 1.088232
3400213115 45665.56 1 45641 1.000547
3400238444 26083.68 1 25328 1.029836
3400242626 13029.09 1 4181 3.116262
3400262174 19556.38 1 19547 1.00048
3400283983 26083.68 1 21808 1.19606
3400304837 19556.38 1 19556 1
3400316046 13029.09 1 11208 1.162481
3400346666 13029.09 1 13029 1
3400368443 19556.38 1 19556 1
3400385634 26083.68 1 17190 1.517375
3400390548 13029.09 1 4913 2.651962
3400398297 13029.09 1 7748 1.681607
3400417467 13029.09 1 13029 1
3400428728 13029.09 1 11260 1.157113
3400462206 32610.97 1 32600 1.000332
3400477978 13029.09 1 13029 1
3400492969 19556.38 1 14990 1.304629
3400507579 13029.09 1 13029 1
3400529627 32610.97 1 22047 1.479157
3400535909 13029.09 1 6281 2.074366
3400556632 26083.68 1 20722 1.258743
3400576037 19556.38 1 19404 1.007853
3400588565 19556.38 1 12527 1.561139
3400630507 39138.27 1 39120 1.000457
3400655236 19556.38 1 19556 1
3400670940 19556.38 1 15703 1.245392
3400691760 19556.38 1 19556 1
3400701959 19556.38 1 10198 1.917668
3400718913 19556.38 1 16953 1.153565
3400745176 19556.38 1 19556 1
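For anyone reproducing this, the following is a minimal sketch of one way to display such a histogram and to check how many rows the default update actually read. The question does not show which command produced the output above, so the use of `DBCC SHOW_STATISTICS` here is an assumption. The table and statistic names are taken from the `UPDATE STATISTICS` statement, and `sys.dm_db_stats_properties` is only available on versions of SQL Server that provide it.

```sql
-- Show only the histogram section of the primary key statistic.
DBCC SHOW_STATISTICS ('dbo.MyTable', 'PK__MyTable__CB394B3946083350')
    WITH HISTOGRAM;

-- Compare the table's row count with the rows read by the last statistics update.
-- rows_sampled < rows means the histogram was built from a sample.
SELECT s.name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.steps
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.MyTable')
  AND s.name = N'PK__MyTable__CB394B3946083350';
```

If `rows_sampled` is well below `rows`, the RANGE_ROWS and DISTINCT_RANGE_ROWS figures are estimates scaled up from the sample rather than exact counts.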
If we look at the RANGE_HI_KEY 3400009992, the histogram tells us:
There is one row equal to this value (EQ_ROWS = 1).
There are 18,103 rows where the value is > 3400002201 and < 3400009992; however, of these 18,103 rows, only 7,790 are distinct.
How can this be? A primary key must be unique.
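The arithmetic behind that reading can be checked directly. As documented for `DBCC SHOW_STATISTICS`, AVG_RANGE_ROWS is RANGE_ROWS divided by DISTINCT_RANGE_ROWS when DISTINCT_RANGE_ROWS is greater than 0, so any value above 1 implies duplicates inside the step. The sketch below recomputes that ratio for three steps copied from the sampled histogram; the column aliases are illustrative only.

```sql
-- Recompute AVG_RANGE_ROWS = RANGE_ROWS / DISTINCT_RANGE_ROWS
-- for three steps copied from the sampled histogram.
SELECT h.range_hi_key,
       h.range_rows,
       h.distinct_range_rows,
       -- NULLIF avoids a divide-by-zero error for steps with no range rows.
       h.range_rows / NULLIF(h.distinct_range_rows, 0) AS computed_avg_range_rows
FROM (VALUES
        (3400009992, 18103.04, 7790),
        (3400242626, 13029.09, 4181),
        (3400390548, 13029.09, 4913)
     ) AS h (range_hi_key, range_rows, distinct_range_rows);
```

For these three steps the computed ratios come out at about 2.32, 3.12 and 2.65, in line with the AVG_RANGE_ROWS column. Small differences in later decimal places are possible because the displayed values are rounded.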
If I update the statistic with FULLSCAN, I get the histogram below (complete), which seems to accurately represent the data:
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------------------------------------------------------------
3400000000 0 1 0 1
3401474769 1474758 1 1474758 1
600004687218 16383 1 16383 1
600005089447 65535 1 65535 1
600005665352 98303 1 98303 1
600006294532 81919 1 81919 1
600008729190 294911 1 294911 1
600012125564 425983 1 425983 1
600014952842 376831 1 376831 1
600017609236 344063 1 344063 1
600017776836 24575 1 24575 1
600022385710 598015 1 598015 1
600022698873 38234 1 38234 1
600022698878 0 1 0 1
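The exact statement used for the full-scan update is not shown in the question; a minimal sketch, assuming the same table and statistic names as in the default update, would be:

```sql
-- Rebuild the statistic by reading every row instead of a sample.
UPDATE STATISTICS dbo.MyTable PK__MyTable__CB394B3946083350
    WITH FULLSCAN;

-- Alternative: request an explicit sample size.
-- The 50 percent figure is purely illustrative.
UPDATE STATISTICS dbo.MyTable PK__MyTable__CB394B3946083350
    WITH SAMPLE 50 PERCENT;
```

With `FULLSCAN`, every row contributes to the histogram, which is consistent with RANGE_ROWS equalling DISTINCT_RANGE_ROWS in every step of the output above.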
Why doesn't SQL Server's sampled histogram represent the uniqueness of the primary key?
Asked by SE1986
(2182 rep)
Mar 21, 2023, 05:20 PM
Last activity: Mar 23, 2023, 04:34 PM