Index Strategy for MariaDB Query that matches on one column and between two others
2
votes
1
answer
219
views
I have a table that contains a set of measurements for a continuous stream of processes. Although each process is individual, they are categorized into groups. The events have a start and end timestamp and a process group identifier.
The table structure is as follows (InnoDB, MariaDB 10):
**Table Name: measurements**
CREATE TABLE
measurements
(
row_id
int(11) NOT NULL AUTO_INCREMENT,
process_name
varchar(100) COLLATE utf8_bin NOT NULL,
process_id
int(11) NOT NULL,
process_group_id
tinyint(4) NOT NULL,
measurement_1
float NOT NULL,
measurement_2
float NOT NULL,
measurement_3
float NOT NULL,
measurement_4
float NOT NULL,
start_timestamp
int(11) NOT NULL,
end_timestamp
int(11) NOT NULL,
PRIMARY KEY (row_id
),
KEY process_group_id
(process_group_id
,
start_timestamp
,end_timestamp
),
KEY process_id
(process_id
)
) ENGINE=InnoDB
AUTO_INCREMENT=7294932
DEFAULT CHARSET=utf8 COLLATE=utf8_bin
I'm designing a query to obtain the sum of measurements 1,2,3 & 4 for all processes running within a group at a particular point in time so that the app can express each measurement for a specific process as a percentage of the total measurements in the group at that time. The start and end times of processes within a group are not synchronized and they are of variable length.
So for a process running in Group 5, at timestamp 1431388800
SELECT SUM(measurement_1),
SUM(measurement_2),
SUM(measurement_3),
SUM(measurement_4)
FROM measurements
WHERE process_group_id = 5
AND 1431388800
BETWEEN start_timestamp
AND end_timestamp
This query runs, but takes around 0.5s. The table has 8m records and grows by about 30,000 a day.
I have an index on process_group_id, start_timestamp, end_timestamp. However, the query does not appear to use anything but the process_group_id part of the index. I created an additional index on process_group_id alone to check this, and once created EXPLAIN showed it using this index.
After some searching, I saw a suggestion to modify the query and add an ORDER BY clause. Having done this the query is accelerated to around 0.06s and it seems to use the full index. However, I'm unsure as to why:
SELECT process_group_id,
SUM(measurement_1),
SUM(measurement_2),
SUM(measurement_3),
SUM(measurement_4)
FROM measurements
WHERE process_group_id = 5
AND 1431388800
BETWEEN start_timestamp
AND end_timestamp
ORDER BY process_group_id ASC
With 30,000 new records a day that requires their shares to be calculated, 0.06s is still not particularly fast. Is there a better way of structuring either the table or designing the query to get a few orders of magnitude quicker, or is a query which matches on one column and then a range query on two others always going to be fairly slow to run?
Asked by Esteban
(21 rep)
May 12, 2015, 08:51 PM
Last activity: Aug 1, 2025, 06:04 PM
Last activity: Aug 1, 2025, 06:04 PM