optimization with subquery not working as expected
0
votes
1
answer
136
views
I have a MySQL InnoDB database. One table called
affymetrixProbeset
contains more than 300 million rows.
Querying this table with INNER JOIN
to other tables, with an ORDER BY
and offset/limit takes almost 4 minutes. I saw that creating a subquery could be more optimized. I tried to move from my initial query with INNER JOIN
to an approach using a subquery
The query is:
SELECT affymetrixProbeset.*
FROM affymetrixProbeset
WHERE affymetrixProbeset.bgeeAffymetrixChipId IN(
SELECT affymetrixChip.bgeeAffymetrixChipId
FROM affymetrixChip INNER JOIN cond ON affymetrixChip.conditionId = cond.conditionId
WHERE cond.speciesId = 9606)
order by affymetrixProbeset.affymetrixProbesetId, affymetrixProbeset.bgeeGeneId
limit 10;
The columns used in the ORDER BY
correspond to the primary key of the table affymetrixProbeset
.
The explain
related to this query is
+----+-------------+--------------------+------------+--------+------------------------------+-------------+---------+--------------------------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------+------------+--------+------------------------------+-------------+---------+--------------------------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | affymetrixChip | NULL | index | PRIMARY,conditionId | conditionId | 3 | NULL | 12990 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | cond | NULL | eq_ref | PRIMARY,speciesId | PRIMARY | 3 | bgee_v15_dev.affymetrixChip.conditionId | 1 | 10.99 | Using where |
| 1 | SIMPLE | affymetrixProbeset | NULL | ref | PRIMARY,bgeeAffymetrixChipId | PRIMARY | 3 | bgee_v15_dev.affymetrixChip.bgeeAffymetrixChipId | 175 | 100.00 | NULL |
+----+-------------+--------------------+------------+--------+------------------------------+-------------+---------+--------------------------------------------------+-------+----------+----------------------------------------------+
The explain is exactly the same as my initial query. The query takes the same time (~4 minutes)
It looks like the ORDER BY
is not done on the affymetrixProbeset
table but directly on the affymetrixChip
table that is part of the subquery.
I was expecting MySQL to run the subquery and then order using the affymetrixProbeset
table. Could you explain to me why it is not done as I expected?
I checked also the performance of separating this query in two.
The first one is:
SELECT affymetrixChip.bgeeAffymetrixChipId
FROM affymetrixChip INNER JOIN cond ON affymetrixChip.conditionId = cond.conditionId
WHERE cond.speciesId = 9606;
I then passed the bgeeAffymetrixChipId
s to the query
SELECT affymetrixProbeset.*
FROM affymetrixProbeset
WHERE affymetrixProbeset.bgeeAffymetrixChipId IN(.....)
order by affymetrixProbeset.affymetrixProbesetId, affymetrixProbeset.bgeeGeneId
limit 10;
I wrote .....
in the query as I had more than 5000 IDs.
The query run almost instantaneous as it order using the primary key.
Could you please explain why the subquery did not perform as I expected?
Is there an other option to optimize the query than creating 2 queries?
UPDATE:
After adding some indexes proposed by @Rick, the explain
now looks like :
+----+-------------+--------------------+------------+--------+-----------------------------------------------------+-------------+---------+------------------------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------+------------+--------+-----------------------------------------------------+-------------+---------+------------------------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | affymetrixChip | NULL | index | PRIMARY,conditionId | conditionId | 6 | NULL | 12561 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | cond | NULL | eq_ref | PRIMARY,speciesId | PRIMARY | 3 | bgee_v15_0.affymetrixChip.conditionId | 1 | 10.94 | Using where |
| 1 | SIMPLE | affymetrixProbeset | NULL | ref | PRIMARY,bgeeAffymetrixChipId_2,bgeeAffymetrixChipId | PRIMARY | 3 | bgee_v15_0.affymetrixChip.bgeeAffymetrixChipId | 174 | 100.00 | NULL |
+----+-------------+--------------------+------------+--------+-----------------------------------------------------+-------------+---------+------------------------------------------------+-------+----------+----------------------------------------------+
The query still takes 3 minutes to run.
The SHOW CREATE TABLE
of the PK/FK/indexes of these tables are :
# table cond
PRIMARY KEY (conditionId
),
UNIQUE KEY speciesId
(speciesId
,conditionId
),
UNIQUE KEY anatEntityId
(anatEntityId
,cellTypeId
,stageId
,speciesId
,sex
,sexInferred
,strain
),
KEY exprMappedConditionId
(exprMappedConditionId
),
KEY cellTypeId
(cellTypeId
),
KEY stageId
(stageId
),
CONSTRAINT cond_ibfk_1
FOREIGN KEY (exprMappedConditionId
) REFERENCES cond
(conditionId
) ON DELETE CASCADE,
CONSTRAINT cond_ibfk_2
FOREIGN KEY (anatEntityId
) REFERENCES anatEntity
(anatEntityId
) ON DELETE CASCADE,
CONSTRAINT cond_ibfk_3
FOREIGN KEY (cellTypeId
) REFERENCES anatEntity
(anatEntityId
) ON DELETE CASCADE,
CONSTRAINT cond_ibfk_4
FOREIGN KEY (stageId
) REFERENCES stage
(stageId
) ON DELETE CASCADE,
CONSTRAINT cond_ibfk_5
FOREIGN KEY (speciesId
) REFERENCES species
(speciesId
) ON DELETE CASCADE
# table affymetrixChip
PRIMARY KEY (bgeeAffymetrixChipId
),
UNIQUE KEY affymetrixChipId
(affymetrixChipId
,microarrayExperimentId
),
UNIQUE KEY conditionId
(conditionId
,bgeeAffymetrixChipId
),
KEY microarrayExperimentId
(microarrayExperimentId
),
KEY chipTypeId
(chipTypeId
),
CONSTRAINT affymetrixChip_ibfk_1
FOREIGN KEY (microarrayExperimentId
) REFERENCES microarrayExperiment
(microarrayExperimentId
) ON DELETE CASCADE,
CONSTRAINT affymetrixChip_ibfk_2
FOREIGN KEY (chipTypeId
) REFERENCES chipType
(chipTypeId
) ON DELETE SET NULL,
CONSTRAINT affymetrixChip_ibfk_3
FOREIGN KEY (conditionId
) REFERENCES cond
(conditionId
) ON DELETE CASCADE
#table affymetrixProbeset
PRIMARY KEY (bgeeAffymetrixChipId
,affymetrixProbesetId
),
UNIQUE KEY bgeeAffymetrixChipId_2
(bgeeAffymetrixChipId
,affymetrixProbesetId
,bgeeGeneId
),
KEY bgeeAffymetrixChipId
(bgeeAffymetrixChipId
,expressionId
,bgeeGeneId
,normalizedSignalIntensity
),
KEY expressionId
(expressionId
),
KEY bgeeGeneId
(bgeeGeneId
,bgeeAffymetrixChipId
,affymetrixProbesetId
),
CONSTRAINT affymetrixProbeset_ibfk_1
FOREIGN KEY (bgeeAffymetrixChipId
) REFERENCES affymetrixChip
(bgeeAffymetrixChipId
) ON DELETE CASCADE,
CONSTRAINT affymetrixProbeset_ibfk_2
FOREIGN KEY (bgeeGeneId
) REFERENCES gene
(bgeeGeneId
) ON DELETE CASCADE,
CONSTRAINT affymetrixProbeset_ibfk_3
FOREIGN KEY (expressionId
) REFERENCES expression
(expressionId
) ON DELETE SET NULL
UPDATE 2 :
Sizes of the tables as result of SHOW TABLE STATUS
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment
| affymetrixChip | InnoDB | 10 | Dynamic | 13020 | 122 | 1589248 | 0 | 1097728 | 2097152 | 104523 | 2022-12-02 13:23:15 | NULL | NULL | utf8_general_ci | NULL | | |
| affymetrixProbeset | InnoDB | 10 | Dynamic | 312237020 | 93 | 29169287168 | 0 | 41072721920 | 6291456 | NULL | 2022-10-31 09:34:15 | NULL | NULL | utf8_general_ci | NULL | |
| cond | InnoDB | 10 | Dynamic | 44578 | 106 | 4734976 | 0 | 6815744 | 2097152 | 44663 | 2022-12-02 13:19:09 | NULL | NULL | utf8_general_ci | NULL | |
The query below gives 4889 results
SELECT COUNT(*) FROM cond WHERE speciesId = 9606;
The query below gives 5452 results
SELECT COUNT(bgeeAffymetrixChipId) FROM affymetrixChip INNER JOIN cond ON cond.conditionId = affymetrixChip.conditionId WHERE cond.speciesId = 9606;
Asked by jwollbrett
(1 rep)
Nov 25, 2022, 05:18 PM
Last activity: Aug 10, 2025, 03:05 PM
Last activity: Aug 10, 2025, 03:05 PM