Sample Header Ad - 728x90

Index Strategy for MariaDB Query that matches on one column and between two others

2 votes
1 answer
219 views
I have a table that contains a set of measurements for a continuous stream of processes. Although each process is individual, they are categorized into groups. The events have a start and end timestamp and a process group identifier. The table structure is as follows (InnoDB, MariaDB 10): **Table Name: measurements** CREATE TABLE measurements ( row_id int(11) NOT NULL AUTO_INCREMENT, process_name varchar(100) COLLATE utf8_bin NOT NULL, process_id int(11) NOT NULL, process_group_id tinyint(4) NOT NULL, measurement_1 float NOT NULL, measurement_2 float NOT NULL, measurement_3 float NOT NULL, measurement_4 float NOT NULL, start_timestamp int(11) NOT NULL, end_timestamp int(11) NOT NULL, PRIMARY KEY (row_id), KEY process_group_id (process_group_id, start_timestamp,end_timestamp), KEY process_id (process_id) ) ENGINE=InnoDB AUTO_INCREMENT=7294932 DEFAULT CHARSET=utf8 COLLATE=utf8_bin I'm designing a query to obtain the sum of measurements 1,2,3 & 4 for all processes running within a group at a particular point in time so that the app can express each measurement for a specific process as a percentage of the total measurements in the group at that time. The start and end times of processes within a group are not synchronized and they are of variable length. So for a process running in Group 5, at timestamp 1431388800 SELECT SUM(measurement_1), SUM(measurement_2), SUM(measurement_3), SUM(measurement_4) FROM measurements WHERE process_group_id = 5 AND 1431388800 BETWEEN start_timestamp AND end_timestamp This query runs, but takes around 0.5s. The table has 8m records and grows by about 30,000 a day. I have an index on process_group_id, start_timestamp, end_timestamp. However, the query does not appear to use anything but the process_group_id part of the index. I created an additional index on process_group_id alone to check this, and once created EXPLAIN showed it using this index. After some searching, I saw a suggestion to modify the query and add an ORDER BY clause. Having done this the query is accelerated to around 0.06s and it seems to use the full index. However, I'm unsure as to why: SELECT process_group_id, SUM(measurement_1), SUM(measurement_2), SUM(measurement_3), SUM(measurement_4) FROM measurements WHERE process_group_id = 5 AND 1431388800 BETWEEN start_timestamp AND end_timestamp ORDER BY process_group_id ASC With 30,000 new records a day that requires their shares to be calculated, 0.06s is still not particularly fast. Is there a better way of structuring either the table or designing the query to get a few orders of magnitude quicker, or is a query which matches on one column and then a range query on two others always going to be fairly slow to run?
Asked by Esteban (21 rep)
May 12, 2015, 08:51 PM
Last activity: Aug 1, 2025, 06:04 PM