
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 votes
2 answers
94 views
Given that they require schema modification locks, why are sparse columns considered an alternative to EAV?
_Pro SQL Server Relational Database Design and Implementation: Best Practices for Scalability and Performance_ is a very reliable source of good database design ideas. As an alternative to Entity-Attribute-Value it suggests using dynamic SQL to allow your users to add new sparse columns to a table. To me, this idea stinks. Adding new columns requires a schema modification lock. This is a very serious lock and I would rather my users not have the ability to obtain it. Is there any property of sparse columns that makes allowing users to take such a serious lock not as terrible as it would be for other types of column?
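For reference, the pattern under discussion boils down to something like the following sketch (table and attribute names are hypothetical, not from the book); the `ALTER TABLE ... ADD` is the statement that takes the schema modification (Sch-M) lock:

```sql
-- Hypothetical sketch of the "users add their own sparse columns" pattern.
DECLARE @attributeName sysname = N'WarrantyMonths';  -- supplied by the end user
DECLARE @sql nvarchar(max) =
      N'ALTER TABLE dbo.Product ADD '
    + QUOTENAME(@attributeName)
    + N' int SPARSE NULL;';  -- the ADD takes a Sch-M lock on dbo.Product
EXEC sys.sp_executesql @sql;
```

Adding a nullable column is a metadata-only change, but the Sch-M lock it takes is incompatible with every other lock on the table while it waits and runs.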
J. Mini (1237 rep)
Oct 19, 2024, 11:34 PM • Last activity: Oct 21, 2024, 12:32 PM
1 votes
1 answers
109 views
Database Schema design - Sparse matrix; when does it make sense?
I was exploring the reasoning behind different schema designs and came across the following comparison. Scenario 1 is a schema with many columns where the data is sparse, i.e. mostly NULLs (for whatever reason). Scenario 2 is the same data pivoted into fewer columns but many more rows. I couldn't work out a general rule for choosing between them. Does scenario 2 make sense for constantly changing schemas with new categories? What would make scenario 1 the right choice? There are a lot of NULLs, but this pattern comes up quite a bit.

Scenario 1:

| Product | Quantity | Number_of_Holes | Number_of_Legs |
|---------|----------|-----------------|----------------|
| Table   | 0        | null            | 4              |
| Chair   | 1        | null            | 4              |
| Glass   | 1        | 1               | null           |

Scenario 2:

| Product | Category        | Value |
|---------|-----------------|-------|
| Table   | Quantity        | 0     |
| Chair   | Quantity        | 1     |
| Glass   | Quantity        | 1     |
| Table   | Number of Holes | null  |
| Chair   | Number of Holes | null  |
| Glass   | Number of Holes | 1     |
| Table   | Number of Legs  | 4     |
| Chair   | Number of Legs  | 4     |
| Glass   | Number of Legs  | null  |

This is purely a theoretical question; it is something I've encountered in my work, but I've found no guidance on the reasoning.
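A minimal T-SQL sketch of the two designs (names are illustrative), which may make the trade-off easier to discuss:

```sql
-- Scenario 1: wide table, one nullable column per attribute (mostly NULLs).
CREATE TABLE dbo.ProductWide
(
    Product       nvarchar(50) NOT NULL PRIMARY KEY,
    Quantity      int NULL,
    NumberOfHoles int NULL,
    NumberOfLegs  int NULL
);

-- Scenario 2: pivoted (EAV-style) table, one row per attribute value.
CREATE TABLE dbo.ProductAttribute
(
    Product  nvarchar(50) NOT NULL,
    Category nvarchar(50) NOT NULL,  -- e.g. 'Quantity', 'Number of Holes'
    Value    int NULL,
    CONSTRAINT PK_ProductAttribute PRIMARY KEY (Product, Category)
);
```

Scenario 2 absorbs new categories without DDL changes; scenario 1 keeps each attribute strongly typed and easy to constrain, at the cost of many NULLs and a schema change per new attribute.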
Raymond Hoang
Jun 12, 2023, 09:46 AM • Last activity: Sep 7, 2023, 07:06 PM
3 votes
1 answers
747 views
Row greater than max size (8060) but I can't see why
I have a table that has 483 columns. A `date not null`, a `char(6) not null`, and then 481 x `float sparse null`. By my maths, even if every sparse float is non-null, the row should still only be:

```lang-none
date: 3 bytes
char(6): 6 bytes
sparse float: 12 bytes x 481
sparse update overhead: 2 bytes x 483
= 6747 bytes and way below the limit of 8060
  (even allowing lots for other random overheads)
```
In fact, the real-life row causing the issue still has 133 of the float columns as nulls and SQL Server claims this row has a row size of 8108 (and is too big). What is the ghost that is taking up my row space?
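A diagnostic sketch that can help narrow this down (the table name is a placeholder, not from the question): compare the declared column sizes in the metadata with the physical record sizes SQL Server reports.

```sql
-- Declared storage size and sparseness of each column (metadata only).
SELECT c.name, t.name AS type_name, c.max_length, c.is_sparse
FROM sys.columns AS c
JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.WideTable');  -- placeholder table name

-- Actual record sizes on the pages (the size columns are NULL in LIMITED mode).
SELECT index_id, min_record_size_in_bytes, max_record_size_in_bytes, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.WideTable'), NULL, NULL, 'DETAILED');
```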
tim654321 (133 rep)
Aug 30, 2023, 12:40 PM • Last activity: Sep 4, 2023, 10:12 AM
2 votes
1 answers
280 views
SQL Server JOIN vs sparse table
I need to query a relation between an entity called article (a product sold in a shop) and its characteristics (a set of attributes). The result will always be presented in a paged manner (so only a few article records are read each time). The goal of the query is to fetch a page of articles that have a given set of characteristics (a few dozen per query).

My first idea is to use a join between those two entities. I already have a table representing articles, a table representing characteristics, and a cross-join table that maps the relations between the two. The problem is that there may be many articles (up to 1 million), while characteristics may be quite numerous (a few hundred). That does not seem like a big deal, but I am concerned because the distribution is not uniform; for example, some characteristics may be common to almost all articles, while others may be barely used. Another aspect is that both the article and characteristics tables have a few JSON fields that may be quite big (a few KB of data each).

I am still in the design phase, but I am asking myself whether a join is the best approach in this scenario, or whether it is cheaper to build an auxiliary "index table" (with only searchable fields, so without the JSON fields) to use for the query. I mean programmatically building/rebuilding an auxiliary table, running one query over that table to fetch the article IDs of the items in the "current page", and then running another query over the article table (limited to that pool of IDs).

To achieve that I thought about a sparse table: a sparse table can have a few hundred columns, so maybe I can create/recreate (periodically) a search table with one row for each article and one column for each characteristic (each characteristic column is just a bit, and an application constraint guarantees there are fewer than one thousand characteristics). Given a specific convention (e.g. the characteristic with ID 345 becomes column 'C345' in the index table) I can build a query over such a sparse table dynamically. That way the "index table" is guaranteed to have only 1 million records in the worst case, while the join may have hundreds of millions of rows if I am particularly unlucky.

That said, I have never used sparse tables like this before and I am not sure they perform well, so I would like to ask whether this is a good idea according to your experience.
Here is the schema of the tables I already have (articles, characteristics, and the cross-join table):

```sql
/****** Object: Table [dbo].[tbl_ana_Articles] Script Date: 1/30/2023 6:37:55 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_ana_Articles](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [CODE1] [nvarchar](10) NOT NULL,    --optional searchable codes
    [CODE2] [nvarchar](10) NOT NULL,    --optional searchable codes
    [CODE3] [nvarchar](10) NOT NULL,    --optional searchable codes
    [CODE4] [nvarchar](10) NOT NULL,    --optional searchable codes
    [_JSON1] [nvarchar](max) NOT NULL,  --json fields, quite big (few KB each), not searchable
    [_JSON2] [nvarchar](max) NOT NULL,  --json fields, quite big (few KB each), not searchable
    [_JSON3] [nvarchar](max) NOT NULL,  --json fields, quite big (few KB each), not searchable
    CONSTRAINT [PK_tbl_ana_articles] PRIMARY KEY CLUSTERED
    (
        [ID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

/****** Object: Table [dbo].[tbl_ana_Characteristics] Script Date: 1/30/2023 6:37:55 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_ana_Characteristics](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [ID_FILTER] [int] NOT NULL,         --characteristics are grouped by "kind", e.g. the kind "size" may contain the set of characteristics {"big","medium","small"}; this ID keeps track of the one-to-many association (not relevant in this scenario)
    [CODE1] [nvarchar](10) NOT NULL,    --optional searchable codes
    [CODE2] [nvarchar](10) NOT NULL,    --optional searchable codes
    [_JSON1] [nvarchar](max) NOT NULL,  --json fields, quite big (few KB each), not searchable
    [_JSON2] [nvarchar](max) NOT NULL,  --json fields, quite big (few KB each), not searchable
    CONSTRAINT [PK_tbl_ana_characteristics] PRIMARY KEY CLUSTERED
    (
        [ID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

/****** Object: Table [dbo].[tbl_src_ArticlesCharacteristics] Script Date: 1/30/2023 6:37:55 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_src_ArticlesCharacteristics](
    [ID_CHARACTERISTIC] [int] NOT NULL,
    [ID_ARTICLE] [int] NOT NULL,
    [ID_FILTER] [int] NOT NULL,
    CONSTRAINT [PK_tbl_src_ArticlesCharacteristics] PRIMARY KEY CLUSTERED
    (
        [ID_CHARACTERISTIC] ASC,
        [ID_ARTICLE] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
```

Here is how I build the sparse table for the lookup (I take up to 10K columns just for testing purposes):

```sql
/****** Object: Table [dbo].[ArticleCarat] Script Date: 1/30/2023 6:53:11 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ArticleCarat](
    [ArtID] [int] NOT NULL,
    [CARAT_1] [bit] SPARSE NULL,
    [CARAT_2] [bit] SPARSE NULL,
    [CARAT_3] [bit] SPARSE NULL,
    [CARAT_4] [bit] SPARSE NULL,
    ...
    [CARAT_10132] [bit] SPARSE NULL,
    [CARAT_10133] [bit] SPARSE NULL,
    [CARAT_10134] [bit] SPARSE NULL,
    [SpecialPurposeColumns] [xml] COLUMN_SET FOR ALL_SPARSE_COLUMNS NULL,
    PRIMARY KEY CLUSTERED
    (
        [ArtID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
```

**Adding more context:**

As requested, I have provided more context here. The article table will hold data about items that can be purchased in a shop, so entries may be shoes, clothes, or other things. Each thing has several codes (their meaning is configurable in the installation phase) used during searches, and has a few pieces of non-searchable metadata attached (the JSON fields; those include descriptions in many languages and pricing information). The last thing about codes is that they should be unique, but I can't be sure (they are provided by the customer and sometimes the data is messy; showing two articles with the same code is not an issue). The JSON fields are not important here; I mentioned them because it seems to me they will affect search performance even when they are not in the select statement. For that reason, I rarely search over the article table directly and instead use a special table for searches over codes. Then there are characteristics, which are searchable attributes of an article. For example, shoes may have the characteristics "blue - 42EU - running". Characteristics work more like tags here on StackOverflow. Here is an example dataset (keep in mind the data and DB schema reflect my real case but are fictional; due to company policy I can't disclose real data): https://dbfiddle.uk/aBnv_GK2 . Here is the code to generate the example; I have also included a filter table, where filters are just sets of characteristics related to the same topic.
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Articles]( [ID] [int] IDENTITY(1,1) NOT NULL, [CODE1] [nvarchar](10) NOT NULL, [CODE2] [nvarchar](10) NOT NULL, [CODE3] [nvarchar](10) NOT NULL, [CODE4] [nvarchar](10) NOT NULL, [_JSON1] [nvarchar](max) NOT NULL, [_JSON2] [nvarchar](max) NOT NULL, [_JSON3] [nvarchar](max) NOT NULL, CONSTRAINT [PK_tbl_ana_articles] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Characteristics]( [ID] [int] IDENTITY(1,1) NOT NULL, [ID_FILTER] [int] NOT NULL, [CODE1] [nvarchar](10) NOT NULL, [CODE2] [nvarchar](10) NOT NULL, [_JSON1] [nvarchar](max) NOT NULL, [_JSON2] [nvarchar](max) NOT NULL, CONSTRAINT [PK_tbl_ana_characteristics] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Filter]( [ID] [int] IDENTITY(1,1) NOT NULL, [CODE] [nvarchar](20) NOT NULL, CONSTRAINT [PK_tbl_ana_filter] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_src_ArticlesCharacteristics]( [ID_CHARACTERISTIC] [int] NOT NULL, [ID_ARTICLE] [int] NOT NULL, [ID_FILTER] [int] NOT NULL, CONSTRAINT [PK_tbl_src_ArticlesCharacteristics] PRIMARY KEY CLUSTERED ( [ID_CHARACTERISTIC] ASC, [ID_ARTICLE] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO SET IDENTITY_INSERT [dbo].[tbl_ana_Articles] ON GO INSERT [dbo].[tbl_ana_Articles] ([ID], [CODE1], [CODE2], [CODE3], [CODE4], [_JSON1], [_JSON2], [_JSON3]) VALUES (2, N'C0001', N'EAN13_1', N'BATCH_1', N'HU_1', N'{ "description":"shoes for running"}', N'{"price" : "40€"}', N'{"other" : "other"}') GO INSERT [dbo].[tbl_ana_Articles] ([ID], [CODE1], [CODE2], [CODE3], [CODE4], [_JSON1], [_JSON2], [_JSON3]) VALUES (3, N'C0002', N'EAN13_2', N'BATCH_2', N'HU_2', N'{ "description":"t-shirt"}', N'{"price" : "10€"}', N'{"other" : "other"}') GO INSERT [dbo].[tbl_ana_Articles] ([ID], [CODE1], [CODE2], [CODE3], [CODE4], [_JSON1], [_JSON2], [_JSON3]) VALUES (4, N'C0003', N'EAN13_3', N'BATCH_3', N'HU_3', N'{ "description":"sun glasses"}', N'{"price" : "15€"}', N'{"other" : "other"}') GO SET IDENTITY_INSERT [dbo].[tbl_ana_Articles] OFF GO SET IDENTITY_INSERT [dbo].[tbl_ana_Characteristics] ON GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (1, 1, N'Size 38 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (2, 1, N'Size 39 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (3, 1, N'Size 40 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], 
[_JSON1], [_JSON2]) VALUES (4, 1, N'Size 41 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (5, 1, N'Size 42 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (6, 1, N'Size 43 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (7, 1, N'Size 44 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (9, 2, N'Classic', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (10, 2, N'Running', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (11, 2, N'Trekking', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (12, 3, N'Red', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (13, 3, N'Blue', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (14, 3, N'Green', N'-nd-', N'-nd-', N'-nd-') GO SET IDENTITY_INSERT [dbo].[tbl_ana_Characteristics] OFF GO SET IDENTITY_INSERT [dbo].[tbl_ana_Filter] ON GO INSERT [dbo].[tbl_ana_Filter] ([ID], [CODE]) VALUES (1, N'Size') GO INSERT [dbo].[tbl_ana_Filter] ([ID], [CODE]) VALUES (2, N'Pourpose') GO INSERT [dbo].[tbl_ana_Filter] ([ID], [CODE]) VALUES (3, N'Color') GO SET IDENTITY_INSERT [dbo].[tbl_ana_Filter] OFF GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (3, 2, 1) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (9, 3, 2) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (10, 2, 2) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (12, 4, 3) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (13, 2, 3) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (13, 3, 3) GO
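For comparison, a sketch of the plain join/EXISTS approach described above, against the tables in the question (the characteristic IDs and page size are made up); only the small searchable columns are selected, so the large JSON fields never need to be read:

```sql
DECLARE @PageNumber int = 1, @PageSize int = 50;

SELECT a.[ID], a.[CODE1], a.[CODE2], a.[CODE3], a.[CODE4]
FROM [dbo].[tbl_ana_Articles] AS a
WHERE EXISTS (SELECT 1 FROM [dbo].[tbl_src_ArticlesCharacteristics] AS ac
              WHERE ac.[ID_ARTICLE] = a.[ID] AND ac.[ID_CHARACTERISTIC] = 3)   -- Size 40 EU
  AND EXISTS (SELECT 1 FROM [dbo].[tbl_src_ArticlesCharacteristics] AS ac
              WHERE ac.[ID_ARTICLE] = a.[ID] AND ac.[ID_CHARACTERISTIC] = 13)  -- Blue
ORDER BY a.[ID]
OFFSET (@PageNumber - 1) * @PageSize ROWS FETCH NEXT @PageSize ROWS ONLY;
```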
Skary (368 rep)
Jan 30, 2023, 05:52 PM • Last activity: Feb 1, 2023, 08:58 AM
7 votes
3 answers
594 views
Is there some drawback to define SPARSE columns?
I have a table which has a lot of NULL values in its columns, but some columns don't contain NULLs at all (although they are nullable). Is there any drawback to declaring all of these columns as SPARSE?
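For context, sparseness is toggled per column with a small DDL change, so it is easy to trial and revert; a sketch with hypothetical names:

```sql
-- Mark an existing nullable column as sparse (hypothetical table/column).
ALTER TABLE dbo.Measurements ALTER COLUMN Reading3 ADD SPARSE;

-- Revert if the trade-off turns out to be unfavourable.
ALTER TABLE dbo.Measurements ALTER COLUMN Reading3 DROP SPARSE;

-- Compare reserved/used space before and after.
EXEC sp_spaceused N'dbo.Measurements';
```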
jrara (5393 rep)
Oct 3, 2011, 06:47 AM • Last activity: Sep 5, 2022, 10:36 AM
11 votes
1 answers
650 views
Sparse columns, cpu time & filtered indexes
Sparsing -------- When doing some tests on sparse columns, as you do, there was a performance setback that I would like to know the direct cause of. **DDL** I created two identical tables, one with 4 sparse columns and one with no sparse columns. --Non Sparse columns table & NC index CREATE TABLE dbo.nonsparse( ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL, charval char(20) NULL, varcharval varchar(20) NULL, intval int NULL, bigintval bigint NULL ); CREATE INDEX IX_Nonsparse_intval_varcharval ON dbo.nonsparse(intval,varcharval) INCLUDE(bigintval,charval); -- sparse columns table & NC index CREATE TABLE dbo.sparse( ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL, charval char(20) SPARSE NULL , varcharval varchar(20) SPARSE NULL, intval int SPARSE NULL, bigintval bigint SPARSE NULL ); CREATE INDEX IX_sparse_intval_varcharval ON dbo.sparse(intval,varcharval) INCLUDE(bigintval,charval); ---------- **DML** I then inserted about **2540 NON-NULL** values into both. INSERT INTO dbo.nonsparse WITH(TABLOCK) (charval, varcharval,intval,bigintval) SELECT 'Val1','Val2',20,19 FROM MASTER..spt_values; INSERT INTO dbo.sparse WITH(TABLOCK) (charval, varcharval,intval,bigintval) SELECT 'Val1','Val2',20,19 FROM MASTER..spt_values; Afterwards, I inserted **1M NULL** values into both tables INSERT INTO dbo.nonsparse WITH(TABLOCK) (charval, varcharval,intval,bigintval) SELECT TOP(1000000) NULL,NULL,NULL,NULL FROM MASTER..spt_values spt1 CROSS APPLY MASTER..spt_values spt2; INSERT INTO dbo.sparse WITH(TABLOCK) (charval, varcharval,intval,bigintval) SELECT TOP(1000000) NULL,NULL,NULL,NULL FROM MASTER..spt_values spt1 CROSS APPLY MASTER..spt_values spt2; ---------- **Queries** *Nonsparse table execution* When running this query twice on the newly created nonsparse table: SET STATISTICS IO, TIME ON; SELECT * FROM dbo.nonsparse WHERE 1= (SELECT 1) -- force non trivial plan OPTION(RECOMPILE,MAXDOP 1); The logical reads show **5257** pages (1002540 rows affected) Table 'nonsparse'. Scan count 1, logical reads 5257, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. And the cpu time is at **343 ms** SQL Server Execution Times: CPU time = 343 ms, elapsed time = 3850 ms. ---------- *sparse table execution* Running the same query twice on the sparse table: SELECT * FROM dbo.sparse WHERE 1= (SELECT 1) -- force non trivial plan OPTION(RECOMPILE,MAXDOP 1); The reads are lower, **1763** (1002540 rows affected) Table 'sparse'. Scan count 1, logical reads 1763, physical reads 3, read-ahead reads 1759, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. But the cpu time is higher, **547 ms**. SQL Server Execution Times: CPU time = 547 ms, elapsed time = 2406 ms. Sparse table execution plan non sparse table execution plan ---------- **Questions** **Original question** Since the **NULL** values are not stored directly in the sparse columns, could the increase in cpu time be due to returning the **NULL** values as a resultset? Or is it simply the behaviour as noted in the documentation ? > Sparse columns reduce the space requirements for null values at the > cost of more overhead to retrieve nonnull values Or is the overhead only related to reads & storage used? Even when running ssms with the discard results after execution option, the cpu time of the sparse select was higher (407 ms) in comparison to the non sparse (219 ms). **EDIT** It might have been the overhead of the non null values, even if there are only 2540 present, but I am still not convinced. 
This seems to be about the same performance, but the sparse factor was lost. CREATE INDEX IX_Filtered ON dbo.sparse(charval,varcharval,intval,bigintval) WHERE charval IS NULL AND varcharval IS NULL AND intval IS NULL AND bigintval IS NULL; CREATE INDEX IX_Filtered ON dbo.nonsparse(charval,varcharval,intval,bigintval) WHERE charval IS NULL AND varcharval IS NULL AND intval IS NULL AND bigintval IS NULL; SET STATISTICS IO, TIME ON; SELECT charval,varcharval,intval,bigintval FROM dbo.sparse WITH(INDEX(IX_Filtered)) WHERE charval IS NULL AND varcharval IS NULL AND intval IS NULL AND bigintval IS NULL OPTION(RECOMPILE,MAXDOP 1); SELECT charval,varcharval,intval,bigintval FROM dbo.nonsparse WITH(INDEX(IX_Filtered)) WHERE charval IS NULL AND varcharval IS NULL AND intval IS NULL AND bigintval IS NULL OPTION(RECOMPILE,MAXDOP 1); Seems to have about the same execution time: SQL Server Execution Times: CPU time = 297 ms, elapsed time = 292 ms. SQL Server Execution Times: CPU time = 281 ms, elapsed time = 319 ms. **But** why are the logical reads the same amount now? Shouldn't the filtered index for the sparse column not store anything except the included ID field and some other non-data pages? Table 'sparse'. Scan count 1, logical reads 5785, Table 'nonsparse'. Scan count 1, logical reads 5785 And the size of both indices: RowCounts Used_MB Unused_MB Total_MB 1000000 45.20 0.06 45.26 Why are these the same size? Was the sparse-ness lost? Both query plans when using the filtered index ---------- **Extra Info** select @@version > Microsoft SQL Server 2017 (RTM-CU16) (KB4508218) - > 14.0.3223.3 (X64) Jul 12 2019 17:43:08 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Windows Server > 2012 R2 Datacenter 6.3 (Build 9600: ) (Hypervisor) While running the queries and only selecting the *ID* field, the cpu time is comparable, with lower logical reads for the sparse table. Size of the tables SchemaName TableName RowCounts Used_MB Unused_MB Total_MB dbo nonsparse 1002540 89.54 0.10 89.64 dbo sparse 1002540 27.95 0.20 28.14 When forcing either the clustered or nonclustered index, the cpu time difference remains.
Randi Vertongen (16593 rep)
Sep 19, 2019, 02:08 PM • Last activity: Sep 19, 2019, 06:56 PM
0 votes
1 answers
639 views
how to find tables with either sparse or a column set column?
I got the error message below while rebuilding an index and changing its `data_compression` to page:

> A compressed index is not supported on table that contains sparse columns or a column set column

This was after making good use of the wonderful sparse column or compression data scripts.

**Question:** What are column set columns, and how can I find tables with either sparse columns or a column set column?
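A sketch of one way to list such tables from the catalog views (`sys.columns` exposes both an `is_sparse` and an `is_column_set` flag):

```sql
SELECT DISTINCT
    s.name AS schema_name,
    t.name AS table_name,
    c.name AS column_name,
    c.is_sparse,
    c.is_column_set
FROM sys.columns AS c
JOIN sys.tables  AS t ON t.object_id = c.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE c.is_sparse = 1 OR c.is_column_set = 1
ORDER BY schema_name, table_name, column_name;
```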
Marcello Miorelli (17274 rep)
Aug 8, 2019, 05:59 PM • Last activity: Aug 8, 2019, 06:12 PM
2 votes
0 answers
242 views
Quickest way to apply SPARSE on huge existing tables
I have a bunch (> 40) of columns in a huge table (> 27 m records) which I am attempting to set as SPARSE. Each column takes up to 40 minutes to be altered. Is there a way to do this more quickly? Is there some metadata I can modify to quickly set these columns as SPARSE and rebuild the table?
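For reference, the per-column approach being described looks roughly like this (table and column names are hypothetical); each ALTER rewrites the rows that hold non-NULL values in that column, which is presumably where the time goes:

```sql
-- One ALTER per column; each is a size-of-data operation on the 27M-row table.
ALTER TABLE dbo.BigTable ALTER COLUMN Col01 ADD SPARSE;  -- hypothetical names
ALTER TABLE dbo.BigTable ALTER COLUMN Col02 ADD SPARSE;
-- ... repeated for each of the 40+ columns
```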
sciron (21 rep)
Jul 30, 2019, 02:14 PM
2 votes
0 answers
182 views
Storing a large, sparse matrix in Apache Hbase?
I'm currently testing out Apache Hbase for our matrix analytics service. I'm using a managed cluster running on AWS EMR. The matrices are sparse; we have 50,000 columns and up to 10 million rows, and the values are integers. The main operation we'd like to support is random access of any 500 rows and 30 columns. We would like this operation to return in under a second, and ideally under half a second. Apache Hbase seemed like the ideal option, as it's advertised to support 'real-time' access of large sparse matrices with 'millions of columns and rows'. The instance I'm running is sizable, with 4 nodes, each with 16 cores and 256GB of memory.

I've tried both 'wide and short' and 'tall and thin' table formats. The 'wide and short' format has 50,000 columns and is simply the matrix represented in Hbase. The 'tall and thin' format uses a composite row key with the format '{row_name};{column_name}' - e.g. ';100507661' - and has a single column containing the value. For both formats, I've reduced the block size to 8192 bytes and turned on the Bloom filter, as well as adding SNAPPY compression.

The access pattern for rows and columns is completely random: any 500 rows and 30 cells can be requested at any time, and there is no easy way to group the rows and columns for faster access. I'm still seeing a latency of up to 5 seconds, which is much too slow. Is this too fast a response time to expect from such a large dataset? Or am I making some basic error? Should I try encoding and compressing the row key?
gacharya (21 rep)
Jul 2, 2019, 12:11 AM
2 votes
1 answers
1218 views
sparse column or DATA_COMPRESSION?
In one of my databases I have the following table:

```sql
CREATE TABLE [app].[applicantSkill](
    [ApplicantSkillID] [int] IDENTITY(1,1) NOT NULL,
    [applicantID] [int] NOT NULL,
    [skillID] [tinyint] NOT NULL,
    [skillDetails] [varchar](500) NULL,
    [skillLevelID] [tinyint] SPARSE NULL,
    [dateAdded] [datetime2](7) NOT NULL,
    [lastModified] [datetime2](7) NOT NULL,
    CONSTRAINT [PK_tbl_applicant_skill] PRIMARY KEY CLUSTERED
    (
        [ApplicantSkillID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [UserFG],
    CONSTRAINT [uc_appSkillID] UNIQUE NONCLUSTERED
    (
        [applicantID] ASC,
        [skillID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [UserFG]
) ON [UserFG]
GO
```

which has one sparse column - skillLevelID. When I have to re-create my index:

```sql
CREATE NONCLUSTERED INDEX I_applicantID
ON [app].[applicantSkill] ([applicantID] ASC, [dateAdded] ASC)
INCLUDE ([ApplicantSkillID], [skillDetails], [skillID], [skillLevelID])
WITH (PAD_INDEX = OFF, FILLFACTOR = 100, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
      STATISTICS_NORECOMPUTE = OFF, ONLINE = OFF, DROP_EXISTING = OFF,
      DATA_COMPRESSION = PAGE, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [NONCLUSTERED_INDEXES]
```

I get the following error message:

> Msg 10622, Level 16, State 1, Line 18
> The index 'I_applicantID' could not be created or rebuilt. A compressed index is not supported on table that contains sparse columns or a column set column.

How can I work out which one I should keep - the sparse column or the data compression?
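If the decision ends up favouring page compression, dropping the sparse attribute is a one-line change (a sketch; it changes how the column is stored, so check the size impact on a test copy first):

```sql
-- Stop storing skillLevelID as a sparse column so the compressed index can be built.
ALTER TABLE [app].[applicantSkill] ALTER COLUMN [skillLevelID] DROP SPARSE;

-- Then compare space used with and without DATA_COMPRESSION = PAGE on the index.
EXEC sp_spaceused N'app.applicantSkill';
```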
Marcello Miorelli (17274 rep)
Oct 18, 2018, 02:53 PM • Last activity: Oct 18, 2018, 03:05 PM
7 votes
3 answers
1654 views
How to query against exact values in XML column set
I have a table that contains 80+ sparse columns along with a column set column; this is a brief example:

```sql
DROP TABLE IF EXISTS #ColumnSet
GO
CREATE TABLE #ColumnSet
(
    Id INT NOT NULL
    , Value1 VARCHAR(100) SPARSE NULL
    , Value2 VARCHAR(100) SPARSE NULL
    , Value3 VARCHAR(100) SPARSE NULL
    , Value4 VARCHAR(100) SPARSE NULL
    , AllValues XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
GO
INSERT INTO #ColumnSet (Id, Value1, Value2, Value3, Value4)
VALUES (1, 'POSITIVE', NULL, NULL, NULL),
       (2, 'NEGATIVE', NULL, 'NEGATIVE', NULL),
       (3, NULL, NULL, 'NEGATIVE', 'POSITIVE'),
       (4, 'NEGATIVE', NULL, 'THIS IS NOT A POSITIVE RESULT', NULL)
GO
```

I want to query the column set to identify rows where any of the columns has a value of POSITIVE. Using the value method on the column set will concatenate all the values together into one string, and I could use LIKE, but I don't want results where the value is within another string.

```sql
SELECT *
FROM #ColumnSet
WHERE AllValues.value('/', 'nvarchar(4000)') LIKE '%POSITIVE%'
```

Are there alternative methods of querying a column set to achieve the above? Using APPLY along with the nodes method provides the same concatenated string output, though my syntax could be incorrect.

The required output:

| id |
|----|
| 1  |
| 3  |
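One possible approach (a sketch, not necessarily the only or best way): since the column set XML contains one element per non-NULL sparse column, `exist()` or `nodes()` can compare whole element values instead of searching the concatenated string.

```sql
-- Rows where any sparse column's entire value equals 'POSITIVE'.
SELECT Id
FROM #ColumnSet
WHERE AllValues.exist('/*[text() = "POSITIVE"]') = 1;

-- Or shred the column set to see which column matched.
SELECT c.Id,
       v.n.value('local-name(.)', 'nvarchar(128)')  AS column_name,
       v.n.value('(./text())[1]', 'nvarchar(4000)') AS column_value
FROM #ColumnSet AS c
CROSS APPLY c.AllValues.nodes('/*') AS v(n)
WHERE v.n.value('(./text())[1]', 'nvarchar(4000)') = 'POSITIVE';
```

On the sample rows above this should return Ids 1 and 3 only, since row 4's value merely contains the word POSITIVE inside a longer string.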
mhep (750 rep)
Aug 21, 2018, 02:56 PM • Last activity: Aug 23, 2018, 11:22 AM
1 votes
1 answers
124 views
Can I use SPARSE somehow on a non-nullable bit column with mostly false values?
I have a table that stores the results of queries that are run at least once a day. There's a bit column that represents whether the row is from the *most recent* run of a particular query with a particular set of arguments. (Yes, it's a functional dependency, but a necessary denormalization for performance, since most queries on this table are only interested in the most recent result.) Since this bit column is almost always a false value, I'm looking for the best way to tune queries returning only the true values. Partitioning isn't an option (Standard Edition). It seems like making the column SPARSE would be an interesting solution, but I believe that would require me to change the column to nullable and use NULL rather than 0 for false values. Seems a little kludgy. Is there an option similar to SPARSE that would optimize space/performance for a non-null bit column with mostly (well over 99%) false values? Pinal Dave's article indicates that both zero and null values are optimized, but this doesn't seem right to me, since these are different values -- unless MSSQL is using the same mechanism for non-null columns to indicate the default value. This would be great if it were true, but the BOL doesn't mention this.
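To make the kludge concrete, it would look roughly like this (hypothetical table/column names); a filtered index is also shown, since it is the other common way to keep the rare "true" rows cheap to find:

```sql
-- The kludge being considered: nullable SPARSE bit, with NULL standing in for "false".
ALTER TABLE dbo.QueryRunResult ADD IsLatestSparse bit SPARSE NULL;

-- Alternative that keeps the NOT NULL bit: a small filtered index over the true rows only.
CREATE INDEX IX_QueryRunResult_IsLatest
    ON dbo.QueryRunResult (QueryId)   -- hypothetical key column
    WHERE IsLatest = 1;
```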
richardtallent (352 rep)
Jan 25, 2015, 03:44 AM • Last activity: Aug 23, 2018, 09:20 AM
9 votes
1 answers
1199 views
Adding SPARSE made table much bigger
I have a generic log table, about 5m rows. There's a "strongly typed" field that stores the event type, and a bunch of "loosely typed" columns that contain data relevant to the event. That is, the meaning of those "loosely typed" columns depends on the type of the event. These columns are defined as:

```sql
USER_CHAR1 nvarchar(150) null,
USER_CHAR2 nvarchar(150) null,
USER_CHAR3 nvarchar(150) null,
USER_CHAR4 nvarchar(150) null,
USER_CHAR5 nvarchar(150) null,
USER_INTEGER1 int null,
USER_INTEGER2 int null,
USER_INTEGER3 int null,
USER_INTEGER4 int null,
USER_INTEGER5 int null,
USER_FLAG1 bit null,
USER_FLAG2 bit null,
USER_FLAG3 bit null,
USER_FLAG4 bit null,
USER_FLAG5 bit null,
USER_FLOAT1 float null,
USER_FLOAT2 float null,
USER_FLOAT3 float null,
USER_FLOAT4 float null,
USER_FLOAT5 float null
```

Columns 1 and 2 in each type are heavily used, but starting from number 3, very few event types provide that much information. I therefore decided to mark columns 3-5 in each type as SPARSE. I did some analysis first and saw that, indeed, at least 80% of the data in each of those columns is null, and in some of them 100% is null. According to the 40% savings threshold table, SPARSE would be a huge win on them.

So I went and applied SPARSE to columns 3-5 in each group. Now my table takes about 1.8Gb in data space as reported by sp_spaceused, whereas before sparsing it was 1Gb. I tried dbcc cleantable, but it had no effect. Then dbcc shrinkdatabase; no effect either. Puzzled, I removed SPARSE and repeated the dbccs. The size of the table remained at 1.8Gb.

What gives?
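For what it's worth, a step commonly needed before size comparisons like this (a sketch; the table name is a placeholder) is a rebuild, since ALTER COLUMN changes often leave the old storage in place until the table is rebuilt:

```sql
-- Rebuild the table so space from converted or dropped column storage is actually reclaimed.
ALTER TABLE dbo.GenericLog REBUILD;  -- placeholder table name

-- Re-check the footprint afterwards.
EXEC sp_spaceused N'dbo.GenericLog';
```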
GSerg (1353 rep)
Feb 4, 2012, 03:49 PM • Last activity: Aug 23, 2018, 09:19 AM
3 votes
2 answers
7011 views
Should I used varchar(max) or varchar(4000) SPARSE?
I have a "comment"-type column that is rarely used -- around 6% non-null in a population of 3 million records. The average length (when used) is 6 characters, and the max so far is around 3KB. A maximum of 4000 characters is reasonable for this field. I have two options:

```sql
comments varchar(max) NULL        -- this is the current column definition
comments varchar(4000) SPARSE NULL
```

My current understanding is that in *both* cases, a NULL value would require no storage -- just the column's NULL bit set and a length of 0 in the row metadata. But for the **non-null** cases, does one have a clear advantage over the other? The extra 4-byte pointer for sparse columns with values suggests they are *always* stored off-row like text or very large varchar(max) fields. Is that the case? If so, I'd lean toward using varchar(max), since it *only* stores values off-row if the total row length exceeds 8KB, and the majority of my values are short and unlikely to put a row over the limit.

I haven't seen this particular situation addressed in the BOL, so I'm hoping someone here knows enough about the innards of MSSQL to give some insight. (If it matters, I'm currently using 2008R2, but hoping to upgrade soon to 2014.)
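A quick sanity check of those usage numbers, in case it helps frame the decision (a sketch; the table name is a placeholder):

```sql
SELECT COUNT(*)           AS total_rows,
       COUNT(comments)    AS non_null_comments,  -- COUNT(col) ignores NULLs
       AVG(LEN(comments)) AS avg_len_when_used,
       MAX(LEN(comments)) AS max_len
FROM dbo.Ticket;  -- placeholder table name
```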
richardtallent (352 rep)
Jan 28, 2015, 11:22 PM • Last activity: Aug 23, 2018, 09:18 AM
1 votes
1 answers
225 views
SPARSE Column in SQL Server
I have just learned about SPARSE columns in SQL Server. I have never used them; I've just read about them on the Internet. Can I ALTER an existing nullable column in large transactional tables to take advantage of the SPARSE property? Would it be beneficial for space and performance?
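Before altering anything, it may be worth measuring how NULL-dense the candidate column actually is, since the documented space savings only kick in at fairly high NULL percentages; a sketch with placeholder names:

```sql
-- Percentage of NULLs in a candidate column (placeholder table/column names).
SELECT 100.0 * SUM(CASE WHEN SomeColumn IS NULL THEN 1 ELSE 0 END) / COUNT(*) AS pct_null
FROM dbo.BigTransactionalTable;
```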
Ravi Chawla (11 rep)
Jul 23, 2014, 01:22 PM • Last activity: Aug 23, 2018, 09:17 AM