SQL Optimization (3): Using EXPLAIN to Understand SQL Performance, Part 2

This continues the previous post, which walked through the type column with examples. This post covers the Extra column.

The Extra column

Using filesort

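As noted in the previous post, Using filesort means the server has to sort the query result in an extra pass, for example when ORDER BY is used. If the query can read rows through an index, however, ORDER BY may not need a sort at all: index entries are stored in sorted order, so reading them in index order already yields sorted output.

Sorting by a single non-primary-key column: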

mysql> EXPLAIN SELECT * FROM t1 ORDER BY c_str_value;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 4915 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+

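Sorting by the primary key: no filesort appears in this case. In InnoDB the row data lives in the leaf nodes of the primary key index, so the rows are effectively stored in primary key order and no sort is needed.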

mysql> EXPLAIN SELECT * FROM t1 order by c_primary_key;
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------+
|  1 | SIMPLE      | t1    | index | NULL          | PRIMARY | 4       | NULL | 10034 | NULL  |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------+

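If we create an index on c_str_value, the table gains a secondary index (each entry stores the indexed column value plus the corresponding primary key). Running EXPLAIN again, shown below, still reports filesort. Why does MySQL sort instead of reading the rows in c_str_value index order? Because reading through the secondary index goes index value -> primary key value -> full row. The primary key -> row step is a random disk read for every row, and for a query that returns the whole table this is estimated to be slower than scanning the table once and running a filesort.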

mysql> CREATE INDEX c_str_value_idx ON t1(c_str_value);
mysql> EXPLAIN SELECT * FROM t1 order by c_str_value;
+----+-------------+-------+------+---------------+------+---------+------+-------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 10034 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------------+
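
To look at the plan the optimizer passed over, one option is to ask for it explicitly. The sketch below assumes the c_str_value_idx index created above still exists. FORCE INDEX and LIMIT are standard MySQL syntax, but which plan each statement actually produces depends on the server version and the table statistics, so no output is shown.

```sql
-- Ask the optimizer to prefer the secondary index for this query.
-- Rows would then be read in c_str_value order, at the cost of one
-- primary key lookup per index entry.
EXPLAIN SELECT * FROM t1 FORCE INDEX (c_str_value_idx) ORDER BY c_str_value;

-- With a small LIMIT only a few lookups are needed, so reading in
-- index order may become the cheaper plan without any hint.
EXPLAIN SELECT * FROM t1 ORDER BY c_str_value LIMIT 10;
```

Timing the hinted and unhinted versions of the first query on real data is the simplest way to check whether the optimizer's choice of filesort is justified for a given table.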

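If the query selects only the two columns (primary key, c_str_value), no filesort is needed, because the c_str_value index already contains everything the query asks for.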

mysql> EXPLAIN SELECT c_primary_key,c_str_value FROM t1 order by c_str_value;
+----+-------------+-------+-------+---------------+-----------------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys | key             | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------+-----------------+---------+------+-------+-------------+
|  1 | SIMPLE      | t1    | index | NULL          | c_str_value_idx | 65      | NULL | 10034 | Using index |
+----+-------------+-------+-------+---------------+-----------------+---------+------+-------+-------------+
mysql> DROP INDEX c_str_value_idx ON t1;

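If the primary key plus one indexed column is not enough, consider a composite index. For example, to query the three columns (c_primary_key, c_multi_key_part1, c_multi_key_part2), add an index on (c_multi_key_part1, c_multi_key_part2). That index holds everything the query and its sort need, so there is no filesort.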

mysql> explain select c_primary_key,c_multi_key_part1,c_multi_key_part2 from t2 order by c_multi_key_part1;
+----+-------------+-------+-------+---------------+-------------------+---------+------+---------+-------------+
| id | select_type | table | type  | possible_keys | key               | key_len | ref  | rows    | Extra       |
+----+-------------+-------+-------+---------------+-------------------+---------+------+---------+-------------+
|  1 | SIMPLE      | t2    | index | NULL          | c_multi_key_part1 | 130     | NULL | 1179542 | Using index |
+----+-------------+-------+-------+---------------+-------------------+---------+------+---------+-------------+
mysql> explain select c_primary_key,c_multi_key_part1,c_multi_key_part2 from t2 where c_multi_key_part1='1' order by c_multi_key_part2;
+----+-------------+-------+------+-------------------+-------------------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys     | key               | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+-------------------+-------------------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | t2    | ref  | c_multi_key_part1 | c_multi_key_part1 | 65      | const |    1 | Using where; Using index |
+----+-------------+-------+------+-------------------+-------------------+---------+-------+------+--------------------------+

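With a composite index, pay attention to the column order. If the query above is changed to (where c_multi_key_part2='1' order by c_multi_key_part1), the condition no longer matches the leftmost prefix of the index, so the index cannot be used to look up the matching rows directly.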
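
A minimal sketch of the two cases, on a hypothetical table. The column names mirror the article's t2, but the table name demo_t2, the index name, and all column types are assumptions made for illustration. Output is omitted because it depends on the data and server version.

```sql
-- Hypothetical table for illustration; types and sizes are assumptions.
CREATE TABLE demo_t2 (
  c_primary_key     INT NOT NULL AUTO_INCREMENT,
  c_multi_key_part1 VARCHAR(20) NOT NULL,
  c_multi_key_part2 VARCHAR(20) NOT NULL,
  c_other           VARCHAR(100),
  PRIMARY KEY (c_primary_key),
  KEY idx_part1_part2 (c_multi_key_part1, c_multi_key_part2)
) ENGINE=InnoDB;

-- Leftmost column fixed by equality, sort on the next column:
-- the index can serve both the lookup and the ordering.
EXPLAIN SELECT c_primary_key, c_multi_key_part1, c_multi_key_part2
FROM demo_t2
WHERE c_multi_key_part1 = '1'
ORDER BY c_multi_key_part2;

-- Condition on the second column only: there is no leftmost prefix
-- to seek on, so a direct lookup through the index is not possible.
EXPLAIN SELECT c_primary_key, c_multi_key_part1, c_multi_key_part2
FROM demo_t2
WHERE c_multi_key_part2 = '1'
ORDER BY c_multi_key_part1;
```

Comparing the type and rows columns of the two plans on a populated table shows how much of the index each query has to read.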

With a suitable index, the sorting work is done at insert time instead of at query time.

Using index

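Using index appears in Extra when the query can be answered from the index data alone, without reading the full rows.

Also check whether Using where is present. If it is not, every entry in the index is likely being traversed, so performance may still be poor.

For concrete examples, see the queries above.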

Using temporary

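As the name suggests, a temporary table is used.

Here t1 is left-joined to t2 and the result is ordered by a t2 column. The plan shows a temporary table plus a sort. Reading the output literally, each row from t1 matches exactly one row in t2 (eq_ref). But t1 is read first, and it has no way of knowing that t2.c_primary_key is an index (and therefore ordered), so once the scan of t1 has produced the result, a filesort is still required.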

mysql> explain select t1.c_primary_key, t1.c_unique_key, t2.c_primary_key from t1 left join t2 on t1.c_primary_key=t2.c_primary_key order by t2.c_primary_key;
+----+-------------+-------+--------+---------------+--------------+---------+-------------------------+---------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key          | key_len | ref                     | rows    | Extra                                        |
+----+-------------+-------+--------+---------------+--------------+---------+-------------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | t1    | index  | NULL          | c_unique_key | 65      | NULL                    | 1181681 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY      | 4       | dbTest.t1.c_primary_key |       1 | Using index                                  |
+----+-------------+-------+--------+---------------+--------------+---------+-------------------------+---------+----------------------------------------------+

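If we swap t1 and t2 in the query above, each row of t2 now has exactly one matching row in t1. t2 is read first, in PRIMARY key order, so the matched rows already come out ordered by t2.c_primary_key: no temporary table to hold intermediate results and no extra sort. Note that swapping the tables of a LEFT JOIN changes which side keeps its unmatched rows, so the two queries return the same result only when every row has a match.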

mysql> explain select t1.c_primary_key, t1.c_unique_key, t2.c_primary_key from t2 left join t1 on t1.c_primary_key=t2.c_primary_key order by t2.c_primary_key;
+----+-------------+-------+--------+---------------+---------+---------+-------------------------+---------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                     | rows    | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-------------------------+---------+-------------+
|  1 | SIMPLE      | t2    | index  | NULL          | PRIMARY | 4       | NULL                    | 1179542 | Using index |
|  1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | dbTest.t2.c_primary_key |       1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+-------------------------+---------+-------------+

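RIGHT JOIN is the mirror image of LEFT JOIN: LEFT JOIN is driven by the left table, RIGHT JOIN by the right table.

In practice, what matters is which table comes first in the EXPLAIN output.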
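
When the join has to start from t1, another option is to order by a key of the table that is read first. The following is only a sketch: it matches the original query only if every t1 row has a match in t2 (otherwise the NULLs in t2.c_primary_key sort differently), and the optimizer may still choose a different index, so the actual EXPLAIN output needs to be checked.

```sql
-- ORDER BY now refers to the first table in the join, so the server
-- can satisfy it by reading t1 in primary key order instead of
-- sorting the joined result.
EXPLAIN SELECT t1.c_primary_key, t1.c_unique_key, t2.c_primary_key
FROM t1 LEFT JOIN t2 ON t1.c_primary_key = t2.c_primary_key
ORDER BY t1.c_primary_key;
```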

The GROUP BY clause:

GROUP BY can use an index for grouping and aggregation. Here the index (c_multi_key_part1, c_multi_key_part2) covers everything the query needs.

mysql> explain select max(c_multi_key_part2),c_multi_key_part1 from t1 group by c_multi_key_part1;
+----+-------------+-------+-------+-------------------+-------------------+---------+------+---------+-------------+
| id | select_type | table | type  | possible_keys     | key               | key_len | ref  | rows    | Extra       |
+----+-------------+-------+-------+-------------------+-------------------+---------+------+---------+-------------+
|  1 | SIMPLE      | t1    | index | c_multi_key_part1 | c_multi_key_part1 | 130     | NULL | 1181681 | Using index |
+----+-------------+-------+-------+-------------------+-------------------+---------+------+---------+-------------+

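GROUP BY can use an index in two ways: loose index scan and tight index scan. A loose index scan reads only as many keys as there are groups, which is far fewer than the number of keys that actually exist. A tight index scan reads every index entry that satisfies the conditions. The query below counts the distinct values of c_key; Using index for group-by in Extra indicates a loose scan.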

mysql> explain select count(distinct c_key) from t1;
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------------------------------------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows    | Extra                               |
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------------------------------------+
|  1 | SIMPLE      | t1    | range | c_key         | c_key | 65      | NULL | 1181682 | Using index for group-by (scanning) |
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------------------------------------+

An EXPLAIN example of a tight scan is not given here.
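
For reference, the query shapes that the MySQL manual's GROUP BY optimization section describes for the two scan types can be sketched as follows. The table demo_g, its columns, and the index name are hypothetical, and whether the optimizer really picks a loose or tight scan depends on the data volume and statistics, so no output is shown.

```sql
-- Hypothetical table; names and types are assumptions for illustration.
CREATE TABLE demo_g (
  id INT NOT NULL AUTO_INCREMENT,
  c1 INT NOT NULL,
  c2 INT NOT NULL,
  c3 INT NOT NULL,
  c4 INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_c1_c2_c3 (c1, c2, c3)
) ENGINE=InnoDB;

-- Candidate for a loose index scan: GROUP BY uses a leftmost prefix of
-- the index and the only aggregate is MIN() on the next index column.
EXPLAIN SELECT c1, MIN(c2) FROM demo_g GROUP BY c1;

-- Candidate for a tight index scan: the GROUP BY columns (c1, c3) skip
-- c2, but the equality condition on c2 fills the gap.
EXPLAIN SELECT c1, c3 FROM demo_g WHERE c2 = 5 GROUP BY c1, c3;
```

On a populated table, the first query would be expected to show Using index for group-by in Extra when the loose scan is chosen, while the second would show an ordinary index access without a temporary table.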

Without a suitable index, GROUP BY stores intermediate results in a temporary table and then sorts that table before producing the final result.

mysql> explain select c_str_value from t1 group by c_str_value;
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1181681 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+

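In fact, if you only need grouping and not sorting, you can add order by null to tell the server not to sort the result. As shown below, Using filesort then disappears. (This applies to server versions in which GROUP BY sorts implicitly, as in the output here; MySQL 8.0 removed that implicit sort.)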

mysql> explain select c_str_value from t1 group by c_str_value order by null;
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra           |
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1181681 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
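
The temporary table itself comes from the missing index. As a rough sketch, recreating the c_str_value_idx index used earlier (it was dropped after the filesort examples) gives GROUP BY already-sorted input to work with. The plan would then be expected to lose Using temporary, but the exact Extra value depends on the data and the server version, so the output is left out.

```sql
-- Recreate the secondary index from the filesort section.
CREATE INDEX c_str_value_idx ON t1 (c_str_value);

-- Same grouping query as above, now with an index on the grouped column.
EXPLAIN SELECT c_str_value FROM t1 GROUP BY c_str_value;

-- Clean up so that later experiments start from the same state.
DROP INDEX c_str_value_idx ON t1;
```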

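Summary

When Using temporary or Using filesort shows up in Extra, performance is generally not going to be good, so these two deserve the most attention. Getting every aspect straight in a short time, or in a handful of articles, is not realistic; it is better to build up experience through practice.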