Index Strategy for MariaDB Query that matches on one column and between two others

2 votes
1 answer
219 views
                          I have a table that contains a set of measurements for a continuous stream of processes. Although each process is individual, they are categorized into groups. The events have a start and end timestamp and a process group identifier.

The table structure is as follows (InnoDB, MariaDB 10):

**Table Name: measurements**
 
     CREATE TABLE measurements (
     row_id int(11) NOT NULL AUTO_INCREMENT,
     process_name varchar(100) COLLATE utf8_bin NOT NULL,
     process_id int(11) NOT NULL,
     process_group_id tinyint(4) NOT NULL,
     measurement_1 float NOT NULL,
     measurement_2 float NOT NULL,
     measurement_3 float NOT NULL,
     measurement_4 float NOT NULL,
     start_timestamp int(11) NOT NULL,
     end_timestamp int(11) NOT NULL,
     PRIMARY KEY (row_id),
     KEY process_group_id (process_group_id,
         start_timestamp,end_timestamp),
     KEY process_id (process_id)
    ) ENGINE=InnoDB 
      AUTO_INCREMENT=7294932 
      DEFAULT CHARSET=utf8 COLLATE=utf8_bin

I'm designing a query to obtain the sum of measurements 1,2,3 & 4 for all processes running within a group at a particular point in time so that the app can express each measurement for a specific process as a percentage of the total measurements in the group at that time. The start and end times of processes within a group are not synchronized and they are of variable length. 

So for a process running in Group 5, at timestamp 1431388800

    SELECT  SUM(measurement_1), 
            SUM(measurement_2), 
            SUM(measurement_3),
            SUM(measurement_4)
        FROM  measurements
        WHERE  process_group_id = 5
          AND  1431388800 
               BETWEEN start_timestamp 
                   AND end_timestamp 

This query runs, but takes around 0.5s. The table has 8m records and grows by about 30,000 a day.

I have an index on process_group_id, start_timestamp, end_timestamp. However, the query does not appear to use anything but the process_group_id part of the index. I created an additional index on process_group_id alone to check this, and once created EXPLAIN showed it using this index.

After some searching, I saw a suggestion to modify the query and add an ORDER BY clause. Having done this the query is accelerated to around 0.06s and it seems to use the full index. However, I'm unsure as to why:

    SELECT  process_group_id, 
            SUM(measurement_1), 
            SUM(measurement_2),
            SUM(measurement_3), 
            SUM(measurement_4)
        FROM  measurements
        WHERE  process_group_id = 5
          AND  1431388800 
      BETWEEN start_timestamp 
          AND end_timestamp
        ORDER BY  process_group_id ASC 

With 30,000 new records a day that requires their shares to be calculated, 0.06s is still not particularly fast. Is there a better way of structuring either the table or designing the query to get a few orders of magnitude quicker, or is a query which matches on one column and then a range query on two others always going to be fairly slow to run?
                        
Asked by Esteban (21 rep)
May 12, 2015, 08:51 PM
Last activity: Aug 1, 2025, 06:04 PM
Index Strategy for MariaDB Query that matches on one column and between two others

Related Questions