Sample Header Ad - 728x90

Sparse columns, cpu time & filtered indexes

11 votes
1 answer
652 views
Sparsing -------- When doing some tests on sparse columns, as you do, there was a performance setback that I would like to know the direct cause of. **DDL** I created two identical tables, one with 4 sparse columns and one with no sparse columns. --Non Sparse columns table & NC index CREATE TABLE dbo.nonsparse( ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL, charval char(20) NULL, varcharval varchar(20) NULL, intval int NULL, bigintval bigint NULL ); CREATE INDEX IX_Nonsparse_intval_varcharval ON dbo.nonsparse(intval,varcharval) INCLUDE(bigintval,charval); -- sparse columns table & NC index CREATE TABLE dbo.sparse( ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL, charval char(20) SPARSE NULL , varcharval varchar(20) SPARSE NULL, intval int SPARSE NULL, bigintval bigint SPARSE NULL ); CREATE INDEX IX_sparse_intval_varcharval ON dbo.sparse(intval,varcharval) INCLUDE(bigintval,charval); ---------- **DML** I then inserted about **2540 NON-NULL** values into both. INSERT INTO dbo.nonsparse WITH(TABLOCK) (charval, varcharval,intval,bigintval) SELECT 'Val1','Val2',20,19 FROM MASTER..spt_values; INSERT INTO dbo.sparse WITH(TABLOCK) (charval, varcharval,intval,bigintval) SELECT 'Val1','Val2',20,19 FROM MASTER..spt_values; Afterwards, I inserted **1M NULL** values into both tables INSERT INTO dbo.nonsparse WITH(TABLOCK) (charval, varcharval,intval,bigintval) SELECT TOP(1000000) NULL,NULL,NULL,NULL FROM MASTER..spt_values spt1 CROSS APPLY MASTER..spt_values spt2; INSERT INTO dbo.sparse WITH(TABLOCK) (charval, varcharval,intval,bigintval) SELECT TOP(1000000) NULL,NULL,NULL,NULL FROM MASTER..spt_values spt1 CROSS APPLY MASTER..spt_values spt2; ---------- **Queries** *Nonsparse table execution* When running this query twice on the newly created nonsparse table: SET STATISTICS IO, TIME ON; SELECT * FROM dbo.nonsparse WHERE 1= (SELECT 1) -- force non trivial plan OPTION(RECOMPILE,MAXDOP 1); The logical reads show **5257** pages (1002540 rows affected) Table 'nonsparse'. Scan count 1, logical reads 5257, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. And the cpu time is at **343 ms** SQL Server Execution Times: CPU time = 343 ms, elapsed time = 3850 ms. ---------- *sparse table execution* Running the same query twice on the sparse table: SELECT * FROM dbo.sparse WHERE 1= (SELECT 1) -- force non trivial plan OPTION(RECOMPILE,MAXDOP 1); The reads are lower, **1763** (1002540 rows affected) Table 'sparse'. Scan count 1, logical reads 1763, physical reads 3, read-ahead reads 1759, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. But the cpu time is higher, **547 ms**. SQL Server Execution Times: CPU time = 547 ms, elapsed time = 2406 ms. Sparse table execution plan non sparse table execution plan ---------- **Questions** **Original question** Since the **NULL** values are not stored directly in the sparse columns, could the increase in cpu time be due to returning the **NULL** values as a resultset? Or is it simply the behaviour as noted in the documentation ? > Sparse columns reduce the space requirements for null values at the > cost of more overhead to retrieve nonnull values Or is the overhead only related to reads & storage used? Even when running ssms with the discard results after execution option, the cpu time of the sparse select was higher (407 ms) in comparison to the non sparse (219 ms). **EDIT** It might have been the overhead of the non null values, even if there are only 2540 present, but I am still not convinced. This seems to be about the same performance, but the sparse factor was lost. CREATE INDEX IX_Filtered ON dbo.sparse(charval,varcharval,intval,bigintval) WHERE charval IS NULL AND varcharval IS NULL AND intval IS NULL AND bigintval IS NULL; CREATE INDEX IX_Filtered ON dbo.nonsparse(charval,varcharval,intval,bigintval) WHERE charval IS NULL AND varcharval IS NULL AND intval IS NULL AND bigintval IS NULL; SET STATISTICS IO, TIME ON; SELECT charval,varcharval,intval,bigintval FROM dbo.sparse WITH(INDEX(IX_Filtered)) WHERE charval IS NULL AND varcharval IS NULL AND intval IS NULL AND bigintval IS NULL OPTION(RECOMPILE,MAXDOP 1); SELECT charval,varcharval,intval,bigintval FROM dbo.nonsparse WITH(INDEX(IX_Filtered)) WHERE charval IS NULL AND varcharval IS NULL AND intval IS NULL AND bigintval IS NULL OPTION(RECOMPILE,MAXDOP 1); Seems to have about the same execution time: SQL Server Execution Times: CPU time = 297 ms, elapsed time = 292 ms. SQL Server Execution Times: CPU time = 281 ms, elapsed time = 319 ms. **But** why are the logical reads the same amount now? Shouldn't the filtered index for the sparse column not store anything except the included ID field and some other non-data pages? Table 'sparse'. Scan count 1, logical reads 5785, Table 'nonsparse'. Scan count 1, logical reads 5785 And the size of both indices: RowCounts Used_MB Unused_MB Total_MB 1000000 45.20 0.06 45.26 Why are these the same size? Was the sparse-ness lost? Both query plans when using the filtered index ---------- **Extra Info** select @@version > Microsoft SQL Server 2017 (RTM-CU16) (KB4508218) - > 14.0.3223.3 (X64) Jul 12 2019 17:43:08 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Windows Server > 2012 R2 Datacenter 6.3 (Build 9600: ) (Hypervisor) While running the queries and only selecting the *ID* field, the cpu time is comparable, with lower logical reads for the sparse table. Size of the tables SchemaName TableName RowCounts Used_MB Unused_MB Total_MB dbo nonsparse 1002540 89.54 0.10 89.64 dbo sparse 1002540 27.95 0.20 28.14 When forcing either the clustered or nonclustered index, the cpu time difference remains.
Asked by Randi Vertongen (16603 rep)
Sep 19, 2019, 02:08 PM
Last activity: Sep 19, 2019, 06:56 PM