Sparse columns, cpu time & filtered indexes

sql-server sql-server-2017 sparse-column
Sparsing
--------

While running some tests on sparse columns, as you do, I noticed a performance setback whose direct cause I would like to understand.

**DDL**

I created two identical tables, one with 4 sparse columns and one with no sparse columns.

    --Non Sparse columns table & NC index
    CREATE TABLE dbo.nonsparse( ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
    					  charval char(20) NULL,
    					  varcharval varchar(20) NULL,
    					  intval int NULL,
    					  bigintval bigint NULL
    					  );
    CREATE INDEX IX_Nonsparse_intval_varcharval
    ON dbo.nonsparse(intval,varcharval)
    INCLUDE(bigintval,charval);
    
    -- sparse columns table & NC index
    
    CREATE TABLE dbo.sparse( ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
    					  charval char(20) SPARSE NULL ,
    					  varcharval varchar(20) SPARSE NULL,
    					  intval int SPARSE NULL,
    					  bigintval bigint SPARSE NULL
    					  );
    
    CREATE INDEX IX_sparse_intval_varcharval
    ON dbo.sparse(intval,varcharval)
    INCLUDE(bigintval,charval);
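
To confirm that the `SPARSE` attribute was actually applied to the four columns, the catalog view `sys.columns` exposes an `is_sparse` flag. A minimal check, assuming both tables were created in the current database exactly as shown above:

```sql
-- List every column of both test tables together with its sparse flag.
-- is_sparse = 1 means the column was declared SPARSE.
SELECT  OBJECT_NAME(c.object_id)   AS table_name,
        c.name                     AS column_name,
        TYPE_NAME(c.user_type_id)  AS data_type,
        c.is_sparse
FROM    sys.columns AS c
WHERE   c.object_id IN (OBJECT_ID(N'dbo.sparse'), OBJECT_ID(N'dbo.nonsparse'))
ORDER BY table_name, c.column_id;
```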

----------

**DML**

I then inserted about **2540** rows with **NON-NULL** values into both tables.

    INSERT INTO dbo.nonsparse WITH(TABLOCK) (charval, varcharval,intval,bigintval)
    SELECT 'Val1','Val2',20,19
    FROM MASTER..spt_values;
    
    INSERT INTO dbo.sparse WITH(TABLOCK) (charval, varcharval,intval,bigintval)
    SELECT 'Val1','Val2',20,19
    FROM MASTER..spt_values;
    
Afterwards, I inserted **1M** rows with only **NULL** values into both tables.

    INSERT INTO dbo.nonsparse WITH(TABLOCK)  (charval, varcharval,intval,bigintval)
    SELECT TOP(1000000) NULL,NULL,NULL,NULL 
    FROM MASTER..spt_values spt1
    CROSS APPLY MASTER..spt_values spt2;
    
    INSERT INTO dbo.sparse WITH(TABLOCK) (charval, varcharval,intval,bigintval)
    SELECT TOP(1000000) NULL,NULL,NULL,NULL 
    FROM MASTER..spt_values spt1
    CROSS APPLY MASTER..spt_values spt2;
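
A quick sanity check of the data distribution after both inserts could look like the sketch below. It relies only on the standard behaviour that `COUNT(*)` counts all rows while `COUNT(column)` skips NULLs; the column aliases are made up for illustration.

```sql
-- total_rows counts every row; non_null_rows counts only rows where intval is not NULL.
SELECT  'nonsparse'    AS table_name,
        COUNT(*)       AS total_rows,
        COUNT(intval)  AS non_null_rows
FROM    dbo.nonsparse
UNION ALL
SELECT  'sparse',
        COUNT(*),
        COUNT(intval)
FROM    dbo.sparse;
```

Both tables should report the same pair of numbers; if they do not, the later timing comparison is not like for like.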

----------

**Queries**

*Nonsparse table execution*

When running this query twice on the newly created nonsparse table:

    SET STATISTICS IO, TIME ON;
    SELECT  * FROM dbo.nonsparse
    WHERE   1= (SELECT 1) -- force non trivial plan
    OPTION(RECOMPILE,MAXDOP 1);
    
The logical reads show **5257** pages

    (1002540 rows affected)
    Table 'nonsparse'. Scan count 1, logical reads 5257, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

And the CPU time is **343 ms**

     SQL Server Execution Times:
       CPU time = 343 ms,  elapsed time = 3850 ms.

----------

*sparse table execution*

Running the same query twice on the sparse table:

    SELECT  * FROM dbo.sparse
    WHERE   1= (SELECT 1) -- force non trivial plan
    OPTION(RECOMPILE,MAXDOP 1);

The reads are lower, **1763**

    (1002540 rows affected)
    Table 'sparse'. Scan count 1, logical reads 1763, physical reads 3, read-ahead reads 1759, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

But the CPU time is higher, **547 ms**.

     SQL Server Execution Times:
       CPU time = 547 ms,  elapsed time = 2406 ms.
    
Sparse table execution plan 

non sparse table execution plan 

----------

**Questions**

**Original question**

Since the **NULL** values are not stored directly in the sparse columns, could the increase in CPU time be due to returning the **NULL** values in the result set? Or is it simply the behaviour noted in the documentation?

> Sparse columns reduce the space requirements for null values at the
> cost of more overhead to retrieve nonnull values

Or is the overhead only related to reads & storage used?

Even when running SSMS with the *discard results after execution* option, the CPU time of the sparse select was higher (407 ms) than that of the non-sparse select (219 ms).
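
Another way to take the client out of the measurement, different from the SSMS option used above, is to assign every column to a local variable so that no rows are sent at all. This is only a sketch of that alternative technique; the variable names are hypothetical, and the assignment can itself change the plan slightly, so the numbers are not directly comparable with the ones in this question.

```sql
-- Hypothetical variables that swallow the output so no result set is returned.
DECLARE @id         int,
        @charval    char(20),
        @varcharval varchar(20),
        @intval     int,
        @bigintval  bigint;

SET STATISTICS TIME ON;

-- Every row is still read and every column is still materialized,
-- but nothing is sent to the client.
SELECT  @id         = ID,
        @charval    = charval,
        @varcharval = varcharval,
        @intval     = intval,
        @bigintval  = bigintval
FROM    dbo.sparse
WHERE   1 = (SELECT 1) -- force non trivial plan
OPTION  (RECOMPILE, MAXDOP 1);
```

Running the same batch against `dbo.nonsparse` gives the other half of the comparison.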

**EDIT**

It might have been the overhead of the non-NULL values, even though only 2540 are present, but I am still not convinced.

With the filtered indexes below, performance seems to be about the same for both tables, but the space saving from the sparse columns appears to be lost.

    CREATE INDEX IX_Filtered
    ON dbo.sparse(charval,varcharval,intval,bigintval)
    WHERE charval IS NULL  
    	  AND varcharval IS NULL
    	  AND intval  IS NULL
    	  AND bigintval  IS NULL;
    
    CREATE INDEX IX_Filtered
    ON dbo.nonsparse(charval,varcharval,intval,bigintval)
    WHERE charval IS NULL  
    	  AND varcharval IS NULL
    	  AND intval  IS NULL
    	  AND bigintval  IS NULL;
    
    SET STATISTICS IO, TIME ON;
    
    SELECT  charval, varcharval, intval, bigintval
    FROM    dbo.sparse WITH(INDEX(IX_Filtered))
    WHERE   charval IS NULL
      AND   varcharval IS NULL
      AND   intval IS NULL
      AND   bigintval IS NULL
    OPTION(RECOMPILE, MAXDOP 1);
    
    SELECT  charval, varcharval, intval, bigintval
    FROM    dbo.nonsparse WITH(INDEX(IX_Filtered))
    WHERE   charval IS NULL
      AND   varcharval IS NULL
      AND   intval IS NULL
      AND   bigintval IS NULL
    OPTION(RECOMPILE, MAXDOP 1);

Both queries seem to have about the same execution time:

     SQL Server Execution Times:
       CPU time = 297 ms,  elapsed time = 292 ms.

     SQL Server Execution Times:
       CPU time = 281 ms,  elapsed time = 319 ms.

**But** why are the logical reads the same now? Shouldn't the filtered index on the sparse columns store nothing except the included ID field and some other non-data pages?

    Table 'sparse'. Scan count 1, logical reads 5785,
    Table 'nonsparse'. Scan count 1, logical reads 5785

And the size of both indices:

    RowCounts	Used_MB	Unused_MB	Total_MB
    1000000	    45.20	0.06	    45.26

Why are these the same size? Was the sparse-ness lost?
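
One way to look into this is to compare the per-record sizes of each index, which `sys.dm_db_index_physical_stats` reports in `DETAILED` mode. The sketch below assumes the tables and indexes exist under the names used above; it only shows where to look and does not by itself explain the result.

```sql
-- Page counts and record sizes for every index on both tables, per B-tree level.
-- DETAILED mode reads all pages, so this is meant for a test system.
SELECT  OBJECT_NAME(i.object_id)      AS table_name,
        i.name                        AS index_name,
        ps.index_level,
        ps.page_count,
        ps.record_count,
        ps.min_record_size_in_bytes,
        ps.avg_record_size_in_bytes,
        ps.max_record_size_in_bytes
FROM    sys.indexes AS i
CROSS APPLY sys.dm_db_index_physical_stats(
                DB_ID(), i.object_id, i.index_id, NULL, 'DETAILED') AS ps
WHERE   i.object_id IN (OBJECT_ID(N'dbo.sparse'), OBJECT_ID(N'dbo.nonsparse'))
ORDER BY table_name, index_name, ps.index_level;
```

If the leaf-level (`index_level = 0`) record sizes of the two `IX_Filtered` indexes match while those of the clustered indexes differ, that would narrow the question down to how the index rows themselves are stored.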

Both query plans when using the filtered index 

----------

**Extra Info**

    select @@version

> Microsoft SQL Server 2017 (RTM-CU16) (KB4508218) -
> 14.0.3223.3 (X64)   Jul 12 2019 17:43:08   Copyright (C) 2017 Microsoft Corporation  Developer Edition (64-bit) on Windows Server
> 2012 R2 Datacenter 6.3  (Build 9600: ) (Hypervisor)

When running the queries and selecting only the *ID* field, the CPU time is comparable, with lower logical reads for the sparse table.
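
The exact statements for this test were not shown; presumably they were the earlier queries with the select list reduced to `ID`, along these lines:

```sql
SET STATISTICS IO, TIME ON;

-- Same shape as the earlier test queries, but returning only the clustering key.
SELECT  ID FROM dbo.nonsparse
WHERE   1 = (SELECT 1) -- force non trivial plan
OPTION  (RECOMPILE, MAXDOP 1);

SELECT  ID FROM dbo.sparse
WHERE   1 = (SELECT 1) -- force non trivial plan
OPTION  (RECOMPILE, MAXDOP 1);
```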

Size of the tables

    SchemaName	TableName	RowCounts	Used_MB	Unused_MB	Total_MB
    dbo	        nonsparse	1002540	    89.54	0.10	    89.64
    dbo	        sparse	    1002540	    27.95	0.20	    28.14
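
The query that produced these size figures is not part of the question. A rough equivalent, assuming 8 KB pages and using `sys.dm_db_partition_stats`, might look like this (the output column names simply mirror the table above):

```sql
-- Approximate per-table size, summed over all indexes of each table.
-- Row counts are taken from the heap or clustered index only (index_id 0 or 1).
SELECT  OBJECT_SCHEMA_NAME(ps.object_id) AS SchemaName,
        OBJECT_NAME(ps.object_id)        AS TableName,
        SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS RowCounts,
        CAST(SUM(ps.used_page_count) * 8 / 1024.0 AS decimal(18, 2))      AS Used_MB,
        CAST(SUM(ps.reserved_page_count - ps.used_page_count) * 8 / 1024.0
             AS decimal(18, 2))                                           AS Unused_MB,
        CAST(SUM(ps.reserved_page_count) * 8 / 1024.0 AS decimal(18, 2))  AS Total_MB
FROM    sys.dm_db_partition_stats AS ps
WHERE   ps.object_id IN (OBJECT_ID(N'dbo.sparse'), OBJECT_ID(N'dbo.nonsparse'))
GROUP BY ps.object_id;
```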

When forcing either the clustered or the nonclustered index, the CPU time difference remains.
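
The forced-index variants were not posted; a sketch of how they could be written with table hints, using the index names from the DDL above, follows. `INDEX(1)` is the documented way to reference a table's clustered index.

```sql
SET STATISTICS IO, TIME ON;

-- Force the clustered index (index_id 1).
SELECT  * FROM dbo.sparse WITH (INDEX(1))
WHERE   1 = (SELECT 1) -- force non trivial plan
OPTION  (RECOMPILE, MAXDOP 1);

-- Force the nonclustered index, which covers all columns of the table.
SELECT  * FROM dbo.sparse WITH (INDEX(IX_sparse_intval_varcharval))
WHERE   1 = (SELECT 1) -- force non trivial plan
OPTION  (RECOMPILE, MAXDOP 1);
```

The same pair against `dbo.nonsparse`, with `IX_Nonsparse_intval_varcharval` as the nonclustered index, completes the comparison.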
Asked by Randi Vertongen (16603 rep)
Sep 19, 2019, 02:08 PM
Last activity: Sep 19, 2019, 06:56 PM