含有IN的子查询

本文主要探讨MySQL5.5和MYSQL5.6对包含in子查询的优化情况。

  • MySQL5.5

首先来看一下,在MySQL5.5及以下版本是如何处理的,比如下面这条语句:

select name from search_hash where info_hash in (select info_hash from search_filelist);

这种语句我们见的很多,熟悉mysql的人都会建议对其改写,但为什么呢?先看它的执行计划:

mysql> explain extended select name from search_hash where info_hash in (select info_hash from search_filelist);
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type        | table           | type            | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | PRIMARY            | search_hash     | ALL             | NULL          | NULL    | NULL    | NULL | 604435 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | search_filelist | unique_subquery | PRIMARY       | PRIMARY | 122     | func |      1 |   100.00 | Using index |
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+----------+-------------+

很奇怪吧,第二行显示查询的类型是相关子查询。这怎么就扯上相关子查询呢?再看一下优化器的执行情况:

mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                 |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `zsky`.`search_hash`.`name` AS `name` from `zsky`.`search_hash` where <in_optimizer>(`zsky`.`search_hash`.`info_hash`,<exists>(<primary_index_lookup>(<cache>(`zsky`.`search_hash`.`info_hash`) in search_filelist on PRIMARY))) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

MYSQL优化器会把这个查询改写为下面的相关子查询:
select name from search_hash where exists (select 1 from search_filelist where search_hash.info_hash = search_filelist.info_hash);
不信看下面,两者的执行计划极其相似。

mysql> explain select name from search_hash where exists (select 1 from search_filelist where search_hash.info_hash = search_filelist.info_hash);
+----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+
| id | select_type        | table           | type   | possible_keys | key     | key_len | ref                        | rows   | Extra       |
+----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+
|  1 | PRIMARY            | search_hash     | ALL    | NULL          | NULL    | NULL    | NULL                       | 604435 | Using where |
|  2 | DEPENDENT SUBQUERY | search_filelist | eq_ref | PRIMARY       | PRIMARY | 122     | zsky.search_hash.info_hash |      1 | Using index |
+----+--------------------+-----------------+--------+---------------+---------+---------+----------------------------+--------+-------------+

改写后的查询会对search_hash表做全表扫描,然后用每一个info_hash值再去 search_filelist表查,如果外面这个表很大的话,就会很慢。比如例子中的表有60万行,也就是子查询要执行60万次。一般别人会建议改写为表连接,比如下面这样:
select name from search_hash,search_filelist where search_hash.info_hash = search_filelist.info_hash;

mysql> explain select name from search_hash,search_filelist where search_hash.info_hash = search_filelist.info_hash ;
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
| id | select_type | table           | type  | possible_keys | key       | key_len | ref                            | rows   | Extra       |
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
|  1 | SIMPLE      | search_filelist | index | PRIMARY       | PRIMARY   | 122     | NULL                           | 144657 | Using index |
|  1 | SIMPLE      | search_hash     | ref   | info_hash     | info_hash | 123     | zsky.search_filelist.info_hash |      1 | Using where |
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+

这个时候是扫描整个search_filelist表,然后对每一个info_hash值去search_hash表中查。但为什么改写为表连接后就快些呢?
当然左表能够使用索引是一部分原因。此外需要注意到的是,优化器自动会选择记录较少的search_filelist作为左表。虽然都需要扫描整张表,但后者需要扫描的数据量少很多,执行的次数也少。一般情况下,in里面的子查询返回的结果集都会是主查询的子集,或者记录少于主查询,那么改写后的查询需要扫描的数量也就更少。

 

  • MySQL5.6

而在MySQL5.6中,优化器会自动将in 子查询改写为表连接。也就是说在MYSQL5.6及以上版本已经不需要手工优化包含in的子查询。下面是在MYSQL5.6中的的执行计划:

mysql> explain select name from search_hash where info_hash in (select info_hash from search_filelist);
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
| id | select_type | table           | type  | possible_keys | key       | key_len | ref                            | rows   | Extra       |
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
|  1 | SIMPLE      | search_filelist | index | PRIMARY       | PRIMARY   | 122     | NULL                           | 110474 | Using index |
|  1 | SIMPLE      | search_hash     | ref   | info_hash     | info_hash | 123     | zsky.search_filelist.info_hash |      1 | NULL        |
+----+-------------+-----------------+-------+---------------+-----------+---------+--------------------------------+--------+-------------+
转载请注明出处:http://www.cnblogs.com/ayard/
原文地址:https://www.cnblogs.com/ayard/p/9317507.html