Sample Header Ad - 728x90

SQL Server JOIN vs sparse table

2 votes
1 answer
280 views
I am in the need to query a relation between an entity called article (it's a product sold in a shop) and it's characterstics (a set of attributes). The result will always presented in a paged manner (so only few articles record are read each time). The goal of the query is to fetch a page of articles that have a set of characteristics (a few dozen per query). My first idea is to use a join among those two entities, I already have a table representing articles, a table representing characteristics, and a cross-join table that map the relations of the previous two. Problem is that articles may be many(up to 1 million), while characteristics may be quite numerous (a few hundred). That does not seem a big deal, but I am afraid because distribution is not uniform, for example, may happen that some characteristics are common to almost all articles (while other may be barely used). Another aspect is that both article and characteristics tables have few json fields that may be quite big (a few KB of data each). I am still in the design phase, but I am asking to myself if a join is the best approach in this scenario or if it's cheaper to build an auxiliary "index table" (with only a searchable field, so without json field) to use for the query. I mean programmatically build/rebuild an auxiliary table, do one query over that table to fetch the article IDs of items in the "current page" and then do another query over article tables (limited to that page for that pool of IDs). To achieve that I thought about sparse table, I mean a sparse table could have a few hundred columns, so maybe I can create/recreate (periodically) such a search table in which I have one row for each article, and one column for each characteristic (each characteristic column is just a bit, and I have application constraint so I am guaranteed that the characteristics are less than one thousand). Given a specific convention (eg: characteristic with ID 345 will become column 'C345' in the index table) I can create dynamically a query over such a sparse table. In that way, I have a guarantee that this "index table" will have only 1 million records in the worst-case scenario, while the join may have hundred of millions if I am particularly unlucky. That said, I have never used sparse tables like that before, and I am not sure they perform well, so I would ask if this is a good idea accordingly to your experience. Here is the schema of tables I already have (articles, characteristics, and the cross-join table) : /****** Object: Table [dbo].[tbl_ana_Articles] Script Date: 1/30/2023 6:37:55 PM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Articles]( [ID] [int] IDENTITY(1,1) NOT NULL, [CODE1] [nvarchar](10) NOT NULL, --optional searchable codes [CODE2] [nvarchar](10) NOT NULL, --optional searchable codes [CODE3] [nvarchar](10) NOT NULL, --optional searchable codes [CODE4] [nvarchar](10) NOT NULL, --optional searchable codes [_JSON1] [nvarchar](max) NOT NULL, --json fields, quite big (few KB each) not searchable [_JSON2] [nvarchar](max) NOT NULL, --json fields, quite big (few KB each) not searchable [_JSON3] [nvarchar](max) NOT NULL, --json fields, quite big (few KB each) not searchable CONSTRAINT [PK_tbl_ana_articles] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO /****** Object: Table [dbo].[tbl_ana_Characteristics] Script Date: 1/30/2023 6:37:55 PM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Characteristics]( [ID] [int] IDENTITY(1,1) NOT NULL, [ID_FILTER] [int] NOT NULL, --characteristics are grouped by "kind" eg : the kind "size" may contain the set of characteristic {"big","medium","small"} this ID keep track of the one to many association (not relevant in this scenario) [CODE1] [nvarchar](10) NOT NULL, --optional searchable codes [CODE2] [nvarchar](10) NOT NULL, --optional searchable codes [_JSON1] [nvarchar](max) NOT NULL, --json fields, quite big (few KB each) not searchable [_JSON2] [nvarchar](max) NOT NULL, --json fields, quite big (few KB each) not searchable CONSTRAINT [PK_tbl_ana_characteristics] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO /****** Object: Table [dbo].[tbl_src_ArticlesCharacteristics] Script Date: 1/30/2023 6:37:55 PM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_src_ArticlesCharacteristics]( [ID_CHARACTERISTIC] [int] NOT NULL, [ID_ARTICLE] [int] NOT NULL, [ID_FILTER] [int] NOT NULL, CONSTRAINT [PK_tbl_src_ArticlesCharacteristics] PRIMARY KEY CLUSTERED ( [ID_CHARACTERISTIC] ASC, [ID_ARTICLE] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO Here how i build the sparse table for the lookup (i take up to 10K column just for testing pourpose) /****** Object: Table [dbo].[ArticleCarat] Script Date: 1/30/2023 6:53:11 PM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[ArticleCarat]( [ArtID] [int] NOT NULL, [CARAT_1] [bit] SPARSE NULL, [CARAT_2] [bit] SPARSE NULL, [CARAT_3] [bit] SPARSE NULL, [CARAT_4] [bit] SPARSE NULL, ... .. . [CARAT_10132] [bit] SPARSE NULL, [CARAT_10133] [bit] SPARSE NULL, [CARAT_10134] [bit] SPARSE NULL, [SpecialPurposeColumns] [xml] COLUMN_SET FOR ALL_SPARSE_COLUMNS NULL, PRIMARY KEY CLUSTERED ( [ArtID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO **---Adding more context :** As requested I have provided more context here. The article table will hold data about items that can be purchased in a shop. So entries may be shoes, clothes, or another thing. Each thing has many codes (the meaning is configurable in the installation phase) used during search and has attached few not searchable metadata (the json fields, those data include descriptions in many languages and pricing information). The last thing about codes is that they should be unique, but I really don't know (they are provided by the customer and sometimes data is messy, showing two articles with the same code is not an issue). Json fields are not important here, I mentioned because seems to me that they will affect search performance even when not in the select statement. For that reason, I rarely do a search over the article table but use a special table for searches over codes. That said there are Characteristics, that are searchable attributes of the article. For example, shoes may have the following characteristics "blue - 42EU - running". Characteristics are more like tags here in StackOverflow. Here is a dataset for example (keep in mind data and DB schema reflect my real case, but are fictional, due to company policy I can't disclose real data) : https://dbfiddle.uk/aBnv_GK2 Here the code to generate the example, i have integrated information also about a filter table, filters are just a set of characteristics related to the same topic. SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Articles]( [ID] [int] IDENTITY(1,1) NOT NULL, [CODE1] [nvarchar](10) NOT NULL, [CODE2] [nvarchar](10) NOT NULL, [CODE3] [nvarchar](10) NOT NULL, [CODE4] [nvarchar](10) NOT NULL, [_JSON1] [nvarchar](max) NOT NULL, [_JSON2] [nvarchar](max) NOT NULL, [_JSON3] [nvarchar](max) NOT NULL, CONSTRAINT [PK_tbl_ana_articles] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Characteristics]( [ID] [int] IDENTITY(1,1) NOT NULL, [ID_FILTER] [int] NOT NULL, [CODE1] [nvarchar](10) NOT NULL, [CODE2] [nvarchar](10) NOT NULL, [_JSON1] [nvarchar](max) NOT NULL, [_JSON2] [nvarchar](max) NOT NULL, CONSTRAINT [PK_tbl_ana_characteristics] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Filter]( [ID] [int] IDENTITY(1,1) NOT NULL, [CODE] [nvarchar](20) NOT NULL, CONSTRAINT [PK_tbl_ana_filter] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_src_ArticlesCharacteristics]( [ID_CHARACTERISTIC] [int] NOT NULL, [ID_ARTICLE] [int] NOT NULL, [ID_FILTER] [int] NOT NULL, CONSTRAINT [PK_tbl_src_ArticlesCharacteristics] PRIMARY KEY CLUSTERED ( [ID_CHARACTERISTIC] ASC, [ID_ARTICLE] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO SET IDENTITY_INSERT [dbo].[tbl_ana_Articles] ON GO INSERT [dbo].[tbl_ana_Articles] ([ID], [CODE1], [CODE2], [CODE3], [CODE4], [_JSON1], [_JSON2], [_JSON3]) VALUES (2, N'C0001', N'EAN13_1', N'BATCH_1', N'HU_1', N'{ "description":"shoes for running"}', N'{"price" : "40€"}', N'{"other" : "other"}') GO INSERT [dbo].[tbl_ana_Articles] ([ID], [CODE1], [CODE2], [CODE3], [CODE4], [_JSON1], [_JSON2], [_JSON3]) VALUES (3, N'C0002', N'EAN13_2', N'BATCH_2', N'HU_2', N'{ "description":"t-shirt"}', N'{"price" : "10€"}', N'{"other" : "other"}') GO INSERT [dbo].[tbl_ana_Articles] ([ID], [CODE1], [CODE2], [CODE3], [CODE4], [_JSON1], [_JSON2], [_JSON3]) VALUES (4, N'C0003', N'EAN13_3', N'BATCH_3', N'HU_3', N'{ "description":"sun glasses"}', N'{"price" : "15€"}', N'{"other" : "other"}') GO SET IDENTITY_INSERT [dbo].[tbl_ana_Articles] OFF GO SET IDENTITY_INSERT [dbo].[tbl_ana_Characteristics] ON GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (1, 1, N'Size 38 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (2, 1, N'Size 39 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (3, 1, N'Size 40 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (4, 1, N'Size 41 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (5, 1, N'Size 42 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (6, 1, N'Size 43 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (7, 1, N'Size 44 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (9, 2, N'Classic', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (10, 2, N'Running', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (11, 2, N'Trekking', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (12, 3, N'Red', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (13, 3, N'Blue', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (14, 3, N'Green', N'-nd-', N'-nd-', N'-nd-') GO SET IDENTITY_INSERT [dbo].[tbl_ana_Characteristics] OFF GO SET IDENTITY_INSERT [dbo].[tbl_ana_Filter] ON GO INSERT [dbo].[tbl_ana_Filter] ([ID], [CODE]) VALUES (1, N'Size') GO INSERT [dbo].[tbl_ana_Filter] ([ID], [CODE]) VALUES (2, N'Pourpose') GO INSERT [dbo].[tbl_ana_Filter] ([ID], [CODE]) VALUES (3, N'Color') GO SET IDENTITY_INSERT [dbo].[tbl_ana_Filter] OFF GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (3, 2, 1) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (9, 3, 2) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (10, 2, 2) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (12, 4, 3) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (13, 2, 3) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (13, 3, 3) GO
Asked by Skary (368 rep)
Jan 30, 2023, 05:52 PM
Last activity: Feb 1, 2023, 08:58 AM