
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
54 views
MySQL: How to optimize a JOIN on a primary key with Indexes
I am looking to optimize a JOIN between tables that uses a small subset of columns in the second table.

CREATE TABLE employees (
  employee_id bigint NOT NULL,
  manager_id bigint NOT NULL,
  org_id bigint NOT NULL,
  union_id bigint NOT NULL,
  ...
  PRIMARY KEY (employee_id),
  INDEX (union_id)
);

CREATE TABLE managers (
  manager_id bigint NOT NULL,
  org_id bigint NOT NULL,
  some_condition boolean NOT NULL,
  PRIMARY KEY (manager_id)
);

Now I want to optimize two types of queries. Both join the tables on manager_id and org_id, and one of them also filters on the some_condition column.

SELECT employees.* FROM employees
JOIN managers ON (employees.manager_id = managers.manager_id AND employees.org_id = managers.org_id)
WHERE (employees.union_id = ? AND managers.some_condition);

SELECT employees.* FROM employees
JOIN managers ON (employees.manager_id = managers.manager_id AND employees.org_id = managers.org_id)
WHERE (employees.union_id = ?);

Assume these are very large tables and employees has far more rows than managers. I am trying to create an index on managers that speeds up the join. Right now the query is slow because for each joined row it has to read org_id and some_condition, and I want to avoid going to disk if possible. So far I have two candidate indexes:

INDEX join_index (org_id, some_condition)
INDEX id_join_index (manager_id, org_id, some_condition)

My main issue is that MySQL does not use either index in the EXPLAIN output unless I force it with USE INDEX (...). Which index (if either) will speed up my query, and do I need manager_id in the index to speed up the join when I do not filter on some_condition?
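For illustration, a minimal sketch of the two candidate indexes and an EXPLAIN check, reusing the table and column names from the question (the literal 42 merely stands in for the ? placeholder); whether the optimizer actually picks either index depends on the data distribution:

```
-- Candidate 1: join column org_id plus the filter column
ALTER TABLE managers ADD INDEX join_index (org_id, some_condition);

-- Candidate 2: both join columns in order, then the filter column
ALTER TABLE managers ADD INDEX id_join_index (manager_id, org_id, some_condition);

-- Compare plans; the index actually chosen shows up in the "key" column
EXPLAIN
SELECT employees.*
FROM employees
JOIN managers
  ON employees.manager_id = managers.manager_id
 AND employees.org_id = managers.org_id
WHERE employees.union_id = 42
  AND managers.some_condition;
```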
kingrich123 (1 rep)
Nov 20, 2024, 09:36 PM • Last activity: Nov 21, 2024, 10:00 AM
0 votes
2 answers
63 views
How to have a covering index used for the updates?
I have a db table with:

- id bigint NOT NULL AUTO_INCREMENT (primary key)
- customer_ref varchar(64) NOT NULL
- customer_counter int NOT NULL
- description varchar(255) NOT NULL

I also have the following indexes:

- PRIMARY KEY(id)
- UNIQUE KEY c_ref (customer_ref)
- KEY c_c (customer_counter)
- KEY c_desc (customer_ref, description, customer_counter)

I have the following update statements:

UPDATE CustomerReferences SET customer_counter=100, description='invalidate due to X123' WHERE customer_ref='1000001' AND description <> 'immutable'

UPDATE CustomerReferences SET customer_counter=100, description='invalidate due to X123' WHERE id=2 AND description <> 'immutable'

Problem: using EXPLAIN, I can't find a way to make the query filter the rows directly from the index; it always uses the primary key index or the unique key (c_ref). How can I index the table so that these updates filter/update directly via the index? Note that these particular updates should be no-ops: if I run SELECT COUNT(*) with the same WHERE clause I get 0 rows, due to the condition description <> 'immutable'.
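As an illustrative sketch, reusing the column names above and assuming the missing operator really is <>: an index whose key columns match both predicates lets the filter be evaluated from the index, although the existing unique key on customer_ref already narrows the first UPDATE to at most one row, so the practical gain may be small:

```
-- Key columns match both WHERE predicates of the first UPDATE
CREATE INDEX c_ref_desc ON CustomerReferences (customer_ref, description);

EXPLAIN
UPDATE CustomerReferences
SET customer_counter = 100, description = 'invalidate due to X123'
WHERE customer_ref = '1000001' AND description <> 'immutable';
```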
Jim (123 rep)
Oct 20, 2024, 05:41 PM • Last activity: Oct 21, 2024, 07:02 PM
5 votes
4 answers
862 views
MySQL using a multi-column index even when the first column isn't being queried
I have MySQL version 8.0.37. From what I understand about a multi-column index in this version, MySQL will use it ONLY if the query references a leftmost prefix of its columns. For example, I have this index on my InnoDB table:
mysql> show indexes from my_table;
+------------------------+------------+------------------------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name                        | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------------------+------------+------------------------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| my_table |          0 | PRIMARY                         |            1 | id          | A         |       32643 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| my_table |          1 | my_table_entity_id              |            1 | entity_id   | A         |       20160 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| my_table |          1 | my_table_entity_id_sub_id_value |            1 | entity_id   | A         |       18222 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| my_table |          1 | my_table_entity_id_sub_id_value |            2 | sub_id      | A         |       32985 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| my_table |          1 | my_table_entity_id_sub_id_value |            3 | value       | A         |       32545 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+------------------------+------------+------------------------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
With the my_table_entity_id_sub_id_value index, I can run queries over entity_id, over both entity_id and sub_id, or over all three columns. This is also what the MySQL [documentation](https://dev.mysql.com/doc/refman/8.4/en/multiple-column-indexes.html) says. However, here is the EXPLAIN ANALYZE output of a query over only the 2nd and 3rd columns, i.e. sub_id and value, and the index is still being used.
mysql> explain analyze select distinct entity_id from my_table where sub_id = 107 and value  = 'd90e7a26-2fc5-4e16-87c5-a2e9da5a26f7';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Group (no aggregates)  (cost=3552 rows=330) (actual time=3.52..14.7 rows=3103 loops=1)
    -> Filter: ((my_table.value = 'd90e7a26-2fc5-4e16-87c5-a2e9da5a26f7') and (my_table.sub_id = 107))  (cost=3519 rows=330) (actual time=3.44..14.3 rows=3103 loops=1)
        -> Covering index scan on my_table using my_table_entity_id_sub_id_value  (cost=3519 rows=32985) (actual time=0.0741..10.4 rows=33202 loops=1)
 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.39 sec)
I realise it's a "covering index" scan. My understanding is that these retrieve values directly from the index, and entity_id, which I'm selecting, is indeed in that index. However, the WHERE clause still references only the 2nd and 3rd columns, and that is the filter criteria. Am I missing something here? What am I not understanding about covering index scans?
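For comparison, a hedged sketch (index name hypothetical): an index that leads with the filtered columns would let this query do a range lookup instead of the full covering-index scan shown above:

```
-- Leading columns match the WHERE clause, trailing column covers the SELECT
ALTER TABLE my_table
  ADD INDEX my_table_sub_id_value_entity_id (sub_id, value, entity_id);

EXPLAIN ANALYZE
SELECT DISTINCT entity_id
FROM my_table
WHERE sub_id = 107
  AND value = 'd90e7a26-2fc5-4e16-87c5-a2e9da5a26f7';
```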
Sidharth Samant (203 rep)
Oct 6, 2024, 05:41 AM • Last activity: Oct 7, 2024, 06:30 PM
2 votes
2 answers
306 views
Any pitfalls/benefits of creating PK with include(..) in PostgreSQL?
The application aggressively caches data in memory and, to keep it consistent (i.e. to prevent persisting stale data), it does something like:
-- typical table structure:
create table t1 (
    id            varchar(16) primary key,
    version_stamp int4,
    ....
)
-- typical update statement
update t1 set
   version_stamp = version_stamp + 1,
   col1 = ?,
   col2 = ?,
...
where id = ? and version_stamp = ?
If the update above reports that no rows were updated, the application has attempted to persist stale data and an exception is thrown; the main idea is to prevent, or at least minimise, such cases. To do so, the application performs the following query (per request, transaction or method call):
select version_stamp from t1
  where id = ?
If no rows are returned, the row has been deleted; if the returned version_stamp differs from the version_stamp kept in memory, we are dealing with stale data. The question is: is it worth defining the primary key as:
create unique index on t1(id) include(version_stamp)
Or not, in this case. The typical rate for such queries is about 10k requests per second.
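A minimal sketch of the alternative being weighed, reusing the column names from the question; PostgreSQL 11+ accepts INCLUDE directly in the PRIMARY KEY constraint, which gives the covering behaviour without a second index:

```
-- Primary-key index that also stores version_stamp, so the
-- "SELECT version_stamp ... WHERE id = ?" check can be an index-only scan
-- (subject to the visibility map being reasonably up to date)
CREATE TABLE t1 (
    id            varchar(16) NOT NULL,
    version_stamp int4,
    -- ... other columns ...
    PRIMARY KEY (id) INCLUDE (version_stamp)
);

EXPLAIN (ANALYZE, BUFFERS)
SELECT version_stamp FROM t1 WHERE id = 'some-id';
```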
Andrey B. Panfilov (231 rep)
Aug 7, 2023, 01:42 AM • Last activity: Aug 9, 2023, 01:35 AM
0 votes
1 answer
787 views
Adding a non-clustered, covering index, on a primary key, when clustered index exists
I went through the related articles suggested by SO but didn't really find all the answers, so I'll try to be specific here. Presume we have a (MSSQL) table with a simple ID (int) primary key, a clustered index only on that column, and 20 columns in total. Now, I have a query that looks rows up only by that ID and always needs only 2 of those 20 columns. Is it beneficial to create a new non-clustered index on the same primary key column and INCLUDE the 2 needed columns as non-keys? Something like this (example taken from https://stackoverflow.com/q/4168634/261332):

CREATE NONCLUSTERED INDEX MyIndex ON MyTable(ID) INCLUDE (Name, Address)

I think it is, and I believe it is only because this new index would be smaller (its leaf rows hold 2 instead of 20 columns). But it seems illogical to me that indexes would be organized so "poorly" that when the DB reads the index it reads the keys and the data together, so that the size of the leaves matters. Or maybe the index keys are kept "separate", so the key search is always equally fast and the slowdown comes from reading the proper leaf once the key is known? If so, are the benefits of such NC indexes exaggerated, since it's about one page read? Thanks for clarifying this topic for me (and hopefully some others as well).
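As a sketch of the scenario described, reusing the hypothetical names from the example above: the narrow index can answer the two-column lookup without reading the wide clustered-index rows:

```
-- Narrow nonclustered index: key is ID, leaf rows carry only Name and Address
CREATE NONCLUSTERED INDEX MyIndex ON MyTable (ID) INCLUDE (Name, Address);

-- Satisfiable entirely from MyIndex; without it, the seek on the clustered
-- index reads the full 20-column row at the leaf level
SELECT Name, Address FROM MyTable WHERE ID = 42;
```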
userfuser (147 rep)
Feb 8, 2023, 11:26 AM • Last activity: Feb 8, 2023, 06:13 PM
8 votes
3 answers
6944 views
Advantage of using INCLUDE as against adding the column in INDEX for covering index
Postgres docs state the following about *Index-Only Scans and Covering Indexes*:

> if you commonly run queries like
>
> SELECT y FROM tab WHERE x = 'key';
>
> the traditional approach to speeding up such queries would be to create an index on x only. However, an index defined as
>
> CREATE INDEX tab_x_y ON tab(x) INCLUDE (y);
>
> could handle these queries as index-only scans, because y can be obtained from the index without visiting the heap.
>
> Because column y is not part of the index's search key, it does not have to be of a data type that the index can handle; it's merely stored in the index and is not interpreted by the index machinery. Also, if the index is a unique index, that is
>
> CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
>
> the uniqueness condition applies to just column x, not to the combination of x and y. (An INCLUDE clause can also be written in UNIQUE and PRIMARY KEY constraints, providing alternative syntax for setting up an index like this.)

**Question 1:** If y's data type can be handled by the index and there is no uniqueness requirement, is there any advantage of using CREATE INDEX tab_x_y ON tab(x) INCLUDE (y) over CREATE INDEX tab_x_y ON tab(x, y) for queries like SELECT y FROM tab WHERE x = 'key';?

> It's wise to be conservative about adding non-key payload columns to an index, especially wide columns. If an index tuple exceeds the maximum size allowed for the index type, data insertion will fail. In any case, non-key columns duplicate data from the index's table and bloat the size of the index, thus potentially slowing searches.

**Question 2:** Can someone explain with an example what wide columns means here?

**Question 3:** Can someone explain the statement below in the context of INCLUDE(y)? If INCLUDE supports index-only scans, then y also has to be stored in the index, so how does the statement below not hold for INCLUDE(y)?

> In any case, non-key columns duplicate data from the index's table and bloat the size of the index
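For Question 1, a side-by-side sketch of the two definitions from the quoted documentation (index names are just labels); both can serve SELECT y FROM tab WHERE x = 'key' as index-only scans, the difference being whether y participates in the B-tree key:

```
-- y stored only in leaf tuples: not searchable or sortable via the index,
-- and a UNIQUE variant would enforce uniqueness on x alone
CREATE INDEX tab_x_include_y ON tab (x) INCLUDE (y);

-- y as a trailing key column: ORDER BY x, y and predicates on y can use the
-- index, at the cost of y also appearing in pivot (upper-level) tuples
CREATE INDEX tab_x_y_key ON tab (x, y);
```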
tuk (1273 rep)
Jun 12, 2022, 06:20 PM • Last activity: Jun 13, 2022, 06:51 AM
2 votes
2 answers
2584 views
Non-clustered Primary Key and Clustered Index
It is my understanding that in SQL Server you can have a Primary Key that is non-clustered and another index that is the clustered one. To me, this seems the same as just having a Primary Key plus an extra UNIQUE key. So I have two questions: 1) If a Primary Key is non-clustered, does it store all the columns with it, or only the Primary Key columns plus the clustered index key columns? 2) I've just read that if the PK isn't the clustered index, then the clustered index does NOT have to be UNIQUE (though it's highly encouraged). Does this mean that rows with the same clustered key could be "randomly sorted" among themselves?
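A minimal sketch of the arrangement being asked about, with hypothetical table and column names: the primary key is enforced by its own nonclustered index while a different, non-unique column defines the physical row order:

```
CREATE TABLE dbo.Orders (
    OrderId    int          NOT NULL,
    OrderDate  datetime2(0) NOT NULL,
    CustomerId int          NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderId)
);

-- A non-unique clustered index is allowed; SQL Server appends a hidden
-- "uniqueifier" to duplicate OrderDate values so every row stays addressable
CREATE CLUSTERED INDEX CX_Orders_OrderDate ON dbo.Orders (OrderDate);
```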
Nuno (829 rep)
May 16, 2022, 10:07 AM • Last activity: May 17, 2022, 11:23 AM
1 vote
0 answers
31 views
How to create Non-clustered Covering index on PK while creating a table
I am trying to create a table where the PK is not the clustered index; the PK will be non-clustered, with an INCLUDE clause to make it covering.

CREATE TABLE [Ref].[User](
  [UserRegisteredId] [varchar](100) NOT NULL,
  [UserName] [varchar](500) NOT NULL,
  [FirstName] [varchar](500) NULL,
  [LastName] [varchar](500) NULL,
  [PhoneNumber] [varchar](20) NULL,
  [Email] [nvarchar](500) NOT NULL,
  [CreatedDate] [datetime2](2) NOT NULL,
  [ModifiedDate] [datetime2](2) NULL,
  CONSTRAINT [UCI_UserName] UNIQUE CLUSTERED ([UserName] ASC),
  CONSTRAINT [PK_User] PRIMARY KEY ([UserRegisteredId])
)

As you can see above, UserName is the unique clustered index. I then created a non-clustered index on the PK with INCLUDE, as below:

CREATE UNIQUE NONCLUSTERED INDEX NCI_UserId ON [Ref].[User] ([UserRegisteredId]) INCLUDE ([FirstName],[LastName])

After creating the table and the covering NCI on the PK, I noticed that a non-clustered index on the PK had already been created automatically at table-creation time (highlighted in the attached image). How can I avoid having two NCIs on the same column (the PK)? I only need the NCI that was created as a covering index (NCI_UserId). Is there any way to create a covering non-clustered index on the PK while creating the table?
Heta Desai (21 rep)
Apr 23, 2020, 03:05 PM
2 votes
1 answer
230 views
Key lookup still happening after creating covering index
I've implemented a covering index to avoid a key lookup: CREATE INDEX IX_StatusHistory_Covering ON StatusHistory(ID) INCLUDE (Status_ID, StatusComment, StatusReason_ID, StatusReasonComment, UserEnteredStatusDateTime, ChangeDateTime, ChangedBy_UserName, IMWBWagonMass) WITH (ONLINE= ON) But the key lookup is still happening. Key Lookup Execution Plan Here is the query causing this: DECLARE @default_tare DECIMAL(18,2) = 19.94 SELECT TOP 100 *, [Gross (WI)] - ISNULL([Tare (WX)],@default_tare) AS [Arrived Nett (WI-WX)], [Gross (WN)] - ISNULL([Tare (WT)],@default_tare) AS [Tipped Nett (WN-WT)], [Client Weight] - ([Gross (WI)] - ISNULL([Tare (WX)],@default_tare)) AS [Arrived Variance], [Client Weight] - ([Gross (WN)] - ISNULL([Tare (WT)],@default_tare)) AS [Tipped Variance] FROM (SELECT CASE WHEN LoadedWeight > 9999 THEN LoadedWeight/1000 ELSE LoadedWeight END [Client Weight] ,CASE WHEN his.Status_Code IN ('WH','WI') THEN his.IMWBWagonMass END AS [Gross (WI)] ,CASE WHEN his.Status_Code ='WN' THEN his.IMWBWagonMass END AS [Gross (WN)] ,CASE WHEN his.Status_Code = 'WX' THEN his.[IMWBWagonMass] ELSE NULL END AS [Tare (WX)] ,CASE WHEN his.Status_Code = 'WT' THEN his.[IMWBWagonMass] ELSE NULL END AS [Tare (WT)] ,dp.Description AS Commodity ,dp.Grade ,Header_Client_Name AS Client ,Header_Destination_Name AS Destination ,Header_Origin_Name AS Origin ,Header_Product_Name AS [Commodity Name] ,Header_ProductSubCategory_Name AS [Commodity Group] ,his.StatusComment AS [Status Comment] ,LEFT(LOWER(Rcd.Header_LastUpdatedByUser_UserName),CASE WHEN Header_LastUpdatedByUser_UserName 'System' THEN CHARINDEX( '@',Rcd.Header_LastUpdatedByUser_UserName)-1 ELSE 999 END) AS [Last updated by] ,ISNULL(Header_StatusDateTime, Header_CreatedDateTime) AS [Last Updated] ,st.Name AS [Status Name] ,StatusReasonComment AS [Reason Comment] ,StatusReasonName AS [Reason Name] ,UserEnteredStatusDateTime AS [Status Actual Event Time] ,ChangeDateTime AS [Status System Change Time] ,ChangedBy_UserName AS [Status Changed by] ,st.Code AS [Status Code] ,sr.Code AS [Reason Code] ,his.ID AS [Status History ID] ,his.StatusReasonDateTime AS [Reason Time] ,his.IMWBWagonMass AS [Wagon Status History Mass] ,rcd.ID AS [Wagon Line ID] ,Wagon_Number AS [Wagon Number] ,GrossWeight AS [Gross (Most Recent)] ,TareWeight AS [Tare (Most Recent)] ,Tarps ,DelayDescription AS [Delay Description] ,rcd.StatusComment AS [Current Wagon Status Comment] ,rcd.StatusDateTime AS [Current Wagon Status Date Time] ,rcd.Status_Code AS [Current Wagon Status Code] ,rcd.Status_Name AS [Current Wagon Status] ,RailConsignment_ID ,Header_Number AS [Consignment] ,Header_DepartureFromClientDateTime AS [Departure Time] ,Header_EstimatedArrivalDateTime AS [Estimated Arrival Time] ,Header_IsInbound ,Header_PermitNumber AS [Permit Nr] ,Header_RailAccountNumber AS [Rail Account Nr] ,Header_Sender_Name AS [Sender] ,Header_SenderSidingNumber AS [Sender Siding Nr] ,Header_Status_Code AS [Train Current Status Code] ,Header_Status_Name AS [Train Current Status] ,Header_StatusDateTime AS [Train Current Status Actual Event Time] ,Header_Train AS [Train] ,Header_CreatedDateTime AS [Created On] ,rcd.dw_timestamp ,CreatedDateTime AS [Wagon Created On] ,Header_Contract_Number AS [Contract Nr] ,rcd.IMWBWagonMass AS [Wagon Mass (Most Recent)] ,Header_WeeklyRailPlan_ID AS [NWB ID] ,NettWeight AS [Nett (Most Recent)] ,CASE WHEN rcd.Status_Code = 'WA' THEN rcd.Header_StatusDateTime ELSE Header_WAChangeDateTime END AS [Wagons Uncoupled From Locomotive] ,CASE WHEN rcd.Status_Code = 'WF' THEN 
rcd.Header_StatusDateTime ELSE Header_WFChangeDateTime END AS [Consignment Finalised] ,CASE WHEN rcd.Status_Code = 'WG' THEN rcd.Header_StatusDateTime ELSE Header_WGChangeDateTime END AS [Consignment at the Gate] FROM RailConsignmentDetails AS rcd WITH(NOLOCK) JOIN FullStatusHistoryRailLine AS his WITH(NOLOCK) on his.ID = rcd.ID JOIN DimProduct AS dp WITH(NOLOCK) ON dp.ID = rcd.Header_Product_ID LEFT JOIN DimConsignmentStatus AS st WITH(NOLOCK)ON st.ID = his.Status_ID LEFT JOIN DimStatusReasons AS sr WITH(NOLOCK) ON st.ID = his.StatusReason_ID WHERE Header_Deleted = 0 AND rcd.Deleted = 0 AND ISNULL(Header_StatusDateTime, Header_CreatedDateTime) BETWEEN CAST(DATEADD(MONTH,-2,GETDATE()) AS DATE) AND GETDATE()+ 1) AS t The FullStatusHistoryLine view is: CREATE VIEW [dbo].[FullStatusHistoryRailLine] AS SELECT sh.PK, sh.ID, CAST(sh.Number AS VARCHAR(255)) AS Number, sh.Status_ID, sh.Status_Name, sh.StatusComment, sh.StatusReason_ID, sh.StatusReasonComment, dsr.Name AS StatusReasonName, sh.UserEnteredStatusDateTime, sh.ChangeDateTime, sh.ChangedBy_Id, sh.ChangedBy_UserName, sh.dw_timestamp, dcs.code AS Status_Code, sh.StatusReason_Code, prev.StatusReasonDateTime AS FromStatusReasonDateTime, sh.StatusReasonDateTime, sh.IMWBWagonMass FROM StatusHistory AS sh INNER JOIN DimConsignmentStatus AS dcs ON sh.Status_ID = dcs.ID LEFT JOIN DimStatusReasons AS dsr ON dsr.ID = sh.StatusReason_ID OUTER APPLY (SELECT TOP 1 StatusReasonDateTime FROM StatusHistory WHERE ID = sh.ID AND Number = sh.Number AND IsHeader = sh.IsHeader AND TransportMode_Name = sh.TransportMode_Name AND StatusReasonDateTime < sh.StatusReasonDateTime ORDER BY StatusReasonDateTime DESC) AS prev WHERE sh.TransportMode_Name = 'Rail' AND sh.IsHeader = 0 UNION ALL SELECT CAST(rcd.ID AS VARCHAR) + '_' + CAST(ISNULL(CONVERT(VARCHAR(64),rcd.StatusReasonDateTime,126),CONVERT(VARCHAR(64),rcd.LastUpdatedDateTime,126)) AS VARCHAR) + '_L' AS PK, rcd.ID, CAST(rcd.Wagon_Number AS VARCHAR(255)) AS Number, rcd.Status_ID, rcd.Status_Name, rcd.StatusComment, rcd.StatusReason_ID, '' AS StatusReasonComment, rcd.StatusReason_Name AS StatusReasonName, rcd.StatusDateTime AS UserEnteredStatusDateTime, rcd.StatusSavedDateTime AS ChangeDateTime, rcd.LastUpdatedByUser_Id AS ChangedBy_Id, rcd.LastUpdatedByUser_UserName AS ChangedBy_UserName, rcd.dw_timestamp, rcd.Status_Code, rcd.StatusReason_Code, prev.StatusReasonDateTime AS FromStatusReasonDateTime, rcd.StatusReasonDateTime, rcd.IMWBWagonMass FROM RailConsignmentDetails AS rcd OUTER APPLY (SELECT TOP 1 StatusReasonDateTime FROM StatusHistory AS sh WHERE sh.ID = rcd.ID AND Number = CAST(rcd.Wagon_Number AS VARCHAR(255)) AND IsHeader = 0 AND TransportMode_Name = 'Rail' AND StatusReasonDateTime < rcd.Header_StatusReasonDateTime ORDER BY StatusReasonDateTime DESC) AS prev And finally the DDL: CREATE TABLE [dbo].[StatusHistory]( [PK] [varchar](99) NOT NULL, [ID] [int] NOT NULL, [Number] [varchar](255) NULL, [Status_ID] [int] NULL, [Status_Name] [nvarchar](max) NOT NULL, [StatusComment] [varchar](255) NULL, [StatusReason_ID] [int] NULL, [StatusReasonComment] [varchar](255) NULL, [UserEnteredStatusDateTime] [datetime] NULL, [ChangeDateTime] [datetime] NULL, [ChangedBy_Id] [nvarchar](128) NOT NULL, [ChangedBy_UserName] [nvarchar](256) NOT NULL, [IsHeader] [bit] NOT NULL, [dw_timestamp] [datetime] NOT NULL, [EventTime] [datetime] NOT NULL, [StatusReason_Code] [nvarchar](max) NULL, [StatusReasonDateTime] [datetime] NULL, [TransportMode_Name] [varchar](5) NULL, [IMWBWagonMass] [decimal](18, 2) NULL, [RoadFirstWeight] 
[decimal](18, 2) NULL, [RoadSecondWeight] [decimal](18, 2) NULL, [DraftSurveyTons] [decimal](18, 2) NULL, [StatusHistory_key] [int] IDENTITY(1,1) NOT NULL, CONSTRAINT [XPK_StatusHistory] PRIMARY KEY CLUSTERED ( [StatusHistory_key] ASC )WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
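For what it's worth, the first branch of the view also reads StatusHistory columns that are not in the index key or INCLUDE list (PK, Number, IsHeader, TransportMode_Name, StatusReasonDateTime, ChangedBy_Id, dw_timestamp, StatusReason_Code, Status_Name), which is one reason the plan can still need the clustered index. The following is a sketch of a wider rebuild only, not a definitive fix; LOB columns such as Status_Name and StatusReason_Code would bloat the index considerably:

```
CREATE INDEX IX_StatusHistory_Covering ON dbo.StatusHistory (ID)
INCLUDE (PK, Number, Status_ID, Status_Name, StatusComment, StatusReason_ID,
         StatusReasonComment, UserEnteredStatusDateTime, ChangeDateTime,
         ChangedBy_Id, ChangedBy_UserName, IsHeader, TransportMode_Name,
         StatusReason_Code, StatusReasonDateTime, dw_timestamp, IMWBWagonMass)
WITH (DROP_EXISTING = ON, ONLINE = ON);
```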
Evan Barke (33 rep)
Feb 27, 2020, 08:52 AM • Last activity: Feb 27, 2020, 01:04 PM
1 vote
1 answer
307 views
Postgres 11+: are covering indices (INCLUDE) useful for join/where conditions?
I'd like to better understand when covering indices can be useful to make index-only scans possible in Postgres 11+. As the [documentation](https://www.postgresql.org/docs/current/indexes-index-only-scans.html) says, given the covering index
CREATE INDEX tab_x_y ON tab(x) INCLUDE (y);
queries like this can use it for index-only scans:
SELECT y FROM tab WHERE x = 'key';
Now I am wondering if such a covering index could also allow index-only scans when the covering columns appear as conditions. For instance, assume a covering index:
CREATE INDEX tab_x_y_z ON tab(x) INCLUDE (y, z);
Would this allow for index-only scans for the following queries?
SELECT z FROM tab WHERE x = 'key' AND y = 1;

SELECT x, y, z FROM (VALUES ('key1'),('key2'),('key3')) sub(id)
JOIN tab ON tab.x = sub.id WHERE y = 1;
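One way to check empirically, assuming the tab_x_y_z definition above (and that the table has been vacuumed recently, since index-only scans also depend on the visibility map):

```
CREATE INDEX tab_x_y_z ON tab (x) INCLUDE (y, z);
VACUUM tab;

EXPLAIN (ANALYZE, BUFFERS)
SELECT z FROM tab WHERE x = 'key' AND y = 1;
-- Look for "Index Only Scan using tab_x_y_z" and a low "Heap Fetches" count;
-- the y = 1 condition is applied as a filter on the returned index tuples,
-- not as an index search condition
```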
tomka (967 rep)
Jan 16, 2020, 11:23 PM • Last activity: Jan 17, 2020, 06:26 PM
0 votes
2 answers
82 views
non clustered index coverage with INCLUDED columns
If I have two indexes:

IDX_1 = (COL1), (COL2) DESC
IDX_2 = (COL1) INCLUDE (COL3)

will the single index defined below cover both queries? Does the DESC keyword have any bearing on whether the INCLUDE clause can cover both queries?

IDX_3 = (COL1) INCLUDE (COL2), (COL3)
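A hedged sketch with hypothetical table/column names: to cover both shapes with one index, COL2 generally has to stay a key column (its DESC ordering only matters if queries rely on that sort order), while COL3 can remain an included column:

```
-- Single index intended to serve the queries that used IDX_1 and IDX_2
CREATE NONCLUSTERED INDEX IDX_COMBINED
    ON dbo.MyTable (COL1, COL2 DESC)
    INCLUDE (COL3);
```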
DamagedGoods (2591 rep)
Dec 9, 2019, 11:38 PM • Last activity: Dec 10, 2019, 03:12 PM
1 vote
1 answer
177 views
Indexing Strategy for the query performance
I have a question on indexing strategy, as I am trying to index a database. I understand the basic strategy, which is:

- Clustered index on the primary key
- Add non-clustered indexes for other columns as required
- Have indexes for WHERE/JOIN/GROUP BY/ORDER BY columns
- Covering indexes to avoid key lookups
- Check the workload and avoid adding too many indexes
- Index foreign keys
- Index any other unique keys, etc.

I still have some questions and couldn't find an exact answer to them. Here is an example of a table that I have.

USE [DEMODB]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Employee_Header](
  [Id] [uniqueidentifier] NOT NULL,
  [PageId] [uniqueidentifier] NOT NULL,
  [Dept] [varchar](256) NULL,
  [DeptName] [varchar](256) NULL,
  [SubUnit] [varchar](256) NULL,
  [ShiftType] [varchar](256) NULL,
  [ReportDate] [varchar](256) NULL,
  [ReportTime] [varchar](256) NULL,
  [AccessNumber] [varchar](256) NULL,
  [AccessOnAfterDate] [varchar](256) NULL,
  [AccessOnAfterTime] [varchar](256) NULL,
  [EmpBand] [varchar](256) NULL,
  [WorkSite] [varchar](256) NULL,
  [Location] [varchar](256) NULL,
  [Location_FieldValue] [varchar](256) NULL,
  [Location_Description] [varchar](256) NULL,
  CONSTRAINT [PK_Employee_Header] PRIMARY KEY CLUSTERED ([Id] ASC)
    WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO

As you can see, [Id] is the primary key here. [PageId] is a foreign key and is not unique. A few queries use a WHERE clause on [Dept], [SubUnit], [ShiftType] and [ReportDate]. One query displays columns [Dept] and [ShiftType]; another displays [Dept], [ReportDate], [ReportTime] and [EmpBand].

1. I am confused about how to create non-clustered indexes with INCLUDE columns so that the same indexes can serve both queries. Or
2. should I create separate non-clustered indexes for each WHERE column,
3. and then create a single separate non-clustered index with the primary key column as the unique key column and all the display columns in the INCLUDE section?
4. Or should I just keep the display columns in the key-column section?

Most of the available examples show querying a single table by an Id with one or more select columns. Can someone help me understand how you approach similar situations? I created two indexes as shown below. The first one is for the WHERE clauses; all the WHERE-clause fields are key columns in this non-clustered index:

CREATE NONCLUSTERED INDEX [ix_Employee_Filters] ON [dbo].[Employee_Header]
(
  [Dept] ASC,
  [SubUnit] ASC,
  [ShiftType] ASC,
  [ReportDate] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

The second one is for the select columns. Some of the select columns are already part of the first index, so only two columns need to be covered; I added them as INCLUDE fields, with Id (the primary key field) as the key column:

CREATE NONCLUSTERED INDEX [ix_Employee_Include_Fields] ON [dbo].[Employee_Header]
(
  [Id] ASC
)
INCLUDE ([ReportTime], [EmpBand])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Still, when I run the query and look at the execution plan, I get the message that there is a missing index.
CREATE NONCLUSTERED INDEX [] ON [dbo].[Employee_Header] ([ReportDate]) INCLUDE ([Id],[Dept],[ShiftType],[ReportTime],[EmpBand])

I don't understand why the query optimizer suggests a date field as the key column with the other fields in the INCLUDE section; I already have a non-clustered index with ReportDate among the other WHERE fields. I appreciate you sharing your approach, but can anyone answer or share your perspective on this specific example?
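One possible sketch that reconciles the two indexes above, using only the columns from the question (the index name is made up): keep the WHERE columns as keys and move the remaining display columns into INCLUDE so a single index can cover both queries; whether this beats the optimizer's own suggestion depends on which predicates are most selective:

```
CREATE NONCLUSTERED INDEX ix_Employee_Header_Filters_Covering
    ON dbo.Employee_Header (Dept, SubUnit, ShiftType, ReportDate)
    INCLUDE (ReportTime, EmpBand);
```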
user9516827 (1345 rep)
Oct 16, 2019, 06:54 PM • Last activity: Oct 17, 2019, 07:42 PM
0 votes
1 answer
1186 views
Column as an Included column or as a key column
I have a query which has an optional WHERE condition on the column classID (not unique). The same classID is displayed as a select column in various other queries. Currently I have classID as an included column in an index idx, so I think idx will be used when classID appears in the select list. Do I have to create a non-clustered index with classID as a key column for the query that has a WHERE condition on classID? I'm confused about whether to create separate non-clustered indexes for all the WHERE-clause columns.
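For illustration, the two shapes being weighed, with a hypothetical table dbo.SomeTable and key Id: the first only helps return classID from the index, the second lets the optimizer seek on classID when the optional WHERE condition is present:

```
-- classID as an included (non-key) column: can cover classID in a SELECT
-- list, but gives no seek on WHERE classID = ...
CREATE NONCLUSTERED INDEX idx ON dbo.SomeTable (Id) INCLUDE (classID);

-- classID as a key column: supports seeks for the optional WHERE condition
CREATE NONCLUSTERED INDEX idx_classID ON dbo.SomeTable (classID);
```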
user9516827 (1345 rep)
Oct 14, 2019, 07:12 PM • Last activity: Oct 15, 2019, 04:56 AM
8 votes
2 answers
395 views
Covering index used despite missing column
I have the following query, using MariaDB 10 / InnoDB:

SELECT id, sender_id, receiver_id, thread_id, date_created, content
FROM user_message
WHERE thread_id = 12345 AND placeholder = FALSE
ORDER BY date_created DESC
LIMIT 20

This query fetches messages matching the given conditions and sorts them by date created. I have a covering index over (thread_id, date_created). When running EXPLAIN, the correct index is used and I get the output "Using where", although the query uses a column in the middle of the statement that is not in the index. I can use any value for "placeholder = x" and the result is the same. If I change the sorting to use another column, EXPLAIN correctly indicates "Using where. Using filesort." I'm having a head-scratching moment. Could anyone please shed light on this? What I would expect to see is that an additional filesort is needed, since the covering index cannot be completely used due to the additional column.
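For comparison, a sketch only (index name hypothetical): putting placeholder between the two existing key columns would let an index satisfy the equality filter as well as the ORDER BY, which can then be checked with EXPLAIN:

```
ALTER TABLE user_message
  ADD INDEX idx_thread_placeholder_created (thread_id, placeholder, date_created);

EXPLAIN
SELECT id, sender_id, receiver_id, thread_id, date_created, content
FROM user_message
WHERE thread_id = 12345 AND placeholder = FALSE
ORDER BY date_created DESC
LIMIT 20;
```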
Tom (205 rep)
Jun 17, 2019, 05:53 PM • Last activity: Jun 18, 2019, 08:31 AM
0 votes
1 answer
91 views
Why a query is not covered when using a range operator on documents?
I have created the following 1000 documents: for(i=0;i<1000;i++){db.doc.insert({"doc":{"k1":"v_"+i,"k2":i},"scal":i})} So a document looks like this: var d1 = db.doc.findOne() { "_id" : ObjectId("5ce25f8920b0be2428648e38"), "doc" : { "k1" : "v_0", "k2" : 0 }, "scal" : 0 } I've created the following two indexes: db.doc.createIndex({"doc":1}) db.doc.createIndex({"scal":1}) I wonder why the following query is not covered: db.doc.explain(true).find({doc:{$gt:d1.doc}},{_id:0,doc:1}) { ... "executionStats" : { "executionSuccess" : true, "nReturned" : 999, "executionTimeMillis" : 1, "totalKeysExamined" : 999, "totalDocsExamined" : 999, ... } However, the same query using a scalar instead of a document is covered: db.doc.explain(true).find({scal:{$gt:d1.scal}},{_id:0,scal:1}) { ... "executionStats" : { "executionSuccess" : true, "nReturned" : 999, "executionTimeMillis" : 0, "totalKeysExamined" : 999, "totalDocsExamined" : 0, ... } I only get the query covered on documents, when I'm **not** using a range operator: db.doc.explain(true).find({doc:d1.doc},{_id:0,doc:1}) { ... "executionStats" : { "executionSuccess" : true, "nReturned" : 1, "executionTimeMillis" : 0, "totalKeysExamined" : 1, "totalDocsExamined" : 0, ... } Even $in would result in a covered query: var dis = db.doc.distinct("doc") db.doc.explain(true).find({doc:{$in:dis}},{_id:0,doc:1}) { ... "executionStats" : { "executionSuccess" : true, "nReturned" : 1000, "executionTimeMillis" : 8, "totalKeysExamined" : 1000, "totalDocsExamined" : 0, ... } However range operators such as $gt seem not to be able to result in a covered query when applied to documents. Is this an expected behaviour which is documented anywhere or is it a bug? I'm using a replSet running with mongodb v3.6 Linux 64 Bit.
Kay (211 rep)
May 20, 2019, 09:12 AM • Last activity: May 20, 2019, 06:10 PM
-2 votes
2 answers
121 views
Can a non-clustered index be transformed to a covering index in SQL Server?
I have a non-clustered index with one column. Now I want to alter the index and add two additional columns as covering (included) columns. Is this possible? If yes, can you please provide the T-SQL script?
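There is no ALTER INDEX ... ADD COLUMN in SQL Server; the usual approach is to re-create the index over the top of the old one with DROP_EXISTING. A sketch with hypothetical object names:

```
-- Re-create the one-column index with two included columns; DROP_EXISTING
-- replaces the old index in a single operation (no separate DROP needed).
-- ONLINE = ON only works on editions that support online index builds.
CREATE NONCLUSTERED INDEX IX_MyTable_Col1
    ON dbo.MyTable (Col1)
    INCLUDE (Col2, Col3)
    WITH (DROP_EXISTING = ON, ONLINE = ON);
```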
igelr (2162 rep)
Dec 12, 2018, 08:02 AM • Last activity: Dec 12, 2018, 08:18 AM