mysql-explain

mysql query optimizer

在前面的mysql架构中提到了mysql数据库中有一个专门负责优化 select 语句的优化器模块。它的主要功能就是通过计算分析系统中收集的统计信息，为客户端的 Query 提供系统认为最优的执行计划（他认为最优的数据检索方式，但不见得是DBA认为是最优的，这部分最耗费时间）。

当客户端向mysql请求一条 Query ，命令解析器模块完成分类，区别出是select 并转发给mysql query optimizer时，mysql query optimizer 首先会对整条 query 进行优化，处理掉一些常量表达式的预算，然后分析 Query 中 Hint 信息（如果有），看显示Hint信息是否可以完全确定该 Query 的执行计划。如果没有 Hint 或 Hint 信息还不足以完全确定执行计划，则会读取所涉及对象的统计信息，根据Query 进行写相应的计算分析，然后再得到最后的执行计划。

我们没有能力修改mysql底层源码，那么只能写出尽量满足优化器所期望的sql语句来提高查询效率。

sql慢以及性能下降的原因

从程序员本身找原因，无非以下几点。

查询语句写的烂
索引失效
关联查询太多join(设计缺陷或不得已的需求)
服务器相关参数设置不合理(缓冲线程数等)

要解决前两个问题，就需要使用到一个查询优化神器-explain工具，下面是explain的总结。

explain

简单定义：

使用explain关键字可以模拟优化器执行sql语句，从而知道mysql是如何处理sql语句的, 分析使用到的和可能使用到的索引列，及最终查询的rows数，通过降低核心指标rows数起到对mysql查询语句优化的结果，

使用方式：explain + sql语句

结果的表头：

+----+-------------+----------+------------+-------+------------------+------------------+---------+------+------+----------+-----------------------+
| id | select_type | table    | partitions | type  | possible_keys    | key              | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+----------+------------+-------+------------------+------------------+---------+------+------+----------+-----------------------+

explain结果的字段信息有点多，下面分别来进行介绍。当然也可以参考官网的解释。

sql准备

建表sql

CREATE TABLE `t_classes` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `classes_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL COMMENT '班级名称',
 `grade` int(11) NOT NULL COMMENT '年级',
 `student_num` int(11) NOT NULL COMMENT '学生数',
 `head_teacher_id` int(11) NOT NULL COMMENT '班主任',
 `status` tinyint(4) NOT NULL COMMENT '状态',
 `create_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL COMMENT '创建时间',
 `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '最后更新时间',
 PRIMARY KEY (`id`),
 KEY `idx_grade` (`grade`) COMMENT '年级索引',
 KEY `idx_name_grade` (`grade`,`classes_name`) USING BTREE,
 KEY `idx_classes_name` (`classes_name`) USING BTREE COMMENT '班级名索引',
 KEY `idx_name_grade_student_num_teacher` (`grade`,`classes_name`,`student_num`,`head_teacher_id`),
 KEY `idx_create_time` (`create_time`),
 KEY `idx_student_num` (`student_num`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `t_score` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `student_id` int(11) NOT NULL COMMENT '学生id',
 `score` int(255) NOT NULL COMMENT '分数',
 `subject_id` int(11) NOT NULL COMMENT '学科id',
 `teacher_id` int(11) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `idx_user` (`student_id`),
 KEY `idx_score` (`score`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `t_student` (
 `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '学生id',
 `classes_id` int(11) DEFAULT NULL COMMENT '班级',
 `student_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL COMMENT '学生姓名',
 `age` int(11) NOT NULL COMMENT '年龄',
 `id_card` varchar(255) COLLATE utf8_unicode_ci NOT NULL COMMENT '身份证号',
 `address` varchar(255) COLLATE utf8_unicode_ci NOT NULL COMMENT '地址',
 `no` int(11) NOT NULL COMMENT '学号',
 `status` tinyint(4) NOT NULL COMMENT '状态',
 PRIMARY KEY (`id`),
 UNIQUE KEY `idx_id_card` (`id_card`),
 KEY `idx_classes_id` (`classes_id`),
 KEY `idx_classes_name_card_no` (`classes_id`,`student_name`,`id_card`,`no`),
 KEY `idx_student_name` (`student_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `t_student_copy1` (
 `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '学生id',
 `classes_id` int(11) NOT NULL COMMENT '班级',
 `student_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL COMMENT '学生姓名',
 `age` int(11) NOT NULL COMMENT '年龄',
 `id_card` varchar(255) COLLATE utf8_unicode_ci NOT NULL COMMENT '身份证号',
 `address` varchar(255) COLLATE utf8_unicode_ci NOT NULL COMMENT '地址',
 `no` int(11) NOT NULL COMMENT '学号',
 `status` tinyint(4) NOT NULL COMMENT '状态',
 PRIMARY KEY (`id`),
 UNIQUE KEY `idx_id_card` (`id_card`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `t_subject` (
 `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '学科id',
 `subject_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL COMMENT '学科名',
 `classes_id` int(11) NOT NULL COMMENT '班级id',
 `teacher_id` int(11) DEFAULT NULL COMMENT '教师id',
 PRIMARY KEY (`id`),
 KEY `idx_teacher_id` (`teacher_id`),
 KEY `idx_classes_id` (`classes_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `t_teacher` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `teacher_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL COMMENT '教师姓名',
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `t_user` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(2550) COLLATE utf8_unicode_ci NOT NULL,
 `birthday` date NOT NULL,
 `sex` tinyint(255) NOT NULL,
 `address` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 PRIMARY KEY (`id`),
 KEY `idx_name` (`name`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

添加数据

insert into t_classes(classes_name,grade,student_num,head_teacher_id,status) values
	('1班', 1, 50, 1, 0),
	('2班', 1, 47, 2, 1),
	('1班', 2, 60, 3, 2),
	('2班', 2, 32, 4, 0),
	('3班', 2, 50, 5, 0),
	('1班', 3, 48, 6, 0);


insert into t_score(student_id, score, subject_id, teacher_id) values
	(1, 78, 1, 1),
	(2, 89, 2, 3),
	(3, 97, 2, 4),
	(1, 68, 3, 5),
	(2, 89, 2, 1),
	(2, 65, 5, 3),
	(3, 99, 4, 1),
	(4, 89, 5, 2),
	(4, 78, 2, 2),
	(5, 78, 5, 3),
	(5, 90, 6, 4),
	(5, 49, 8, 2),
	(6, 78, 7, 5),
	(7, 78, 1, 6);
	(8, 78, 1, 6);
	(9, 78, 1, 6);
	(10, 78, 1, 6);
	(11, 78, 1, 6);
	(12, 78, 1, 6);


insert into t_student_copy1(classes_id, student_name, age, id_card, address, no, status) values 
	(1, 'a', 13, '1', '上海', 1, 0),
	(2, 'ab', 13, '2', '上海', 2, 0),
	(3, 'abc', 13, '3', '上海', 3,0),
	(4, 'c', 13, '4', '上海', 4, 0),
	(5, 'aa', 13, '5', '上海', 5, 0),
	(6, 'xfasf', 13, '6', '上海', 6, 0),
	(1, 'asfa', 13, '7', '上海', 7, 0),
	(2, 'addfs', 13, '8', '上海', 8, 0),
	(3, 'aeqwr', 13, '9', '上海', 9, 0),
	(4, 'afasd', 13, '10', '上海', 10, 0),
	(5, 'acc', 13, '11', '上海', 11, 0),
	(6, 'aadfg', 13, '12', '上海', 12, 0 );


insert into t_subject(subject_name, classes_id, teacher_id) values
	('数学', 1, 1),
	('语文', 5, 2),
	('英语', 6, 3),
	('化学', 4, 4),
	('美术', 3, 5),
	('物理', 2, 6);

insert into t_teacher(teacher_name) values
	('李平'),
	('张三'),
	('李四'),
	('赵柳'),
	('孙乾'),
	('王五');

insert into t_user(name, birthday, sex, address) values
	('abc', '1998-09-01', 0, '南京'),
	('ba', '1998-09-01', 0, '南京'),
	('abasfc', '1998-09-01', 0, '南京'),
	('abasdfc', '1998-09-01', 0, '南京'),
	('abasdfc', '1998-09-01', 0, '南京'),
	('134abc', '1998-09-01', 0, '南京'),
	('a234bc', '1998-09-01', 0, '南京'),
	('abasfdc', '1998-09-01', 0, '南京'),
	('sdfadbc', '1998-09-01', 0, '南京'),
	('vasdfqwer', '1998-09-01', 0, '南京'),
	('aafsdf', '1998-09-01', 0, '南京');

id

select查询的序列号，包含一组数字，表示查询中执行select子句或操作表的顺序, 有几个 select 就有几个id，总共的情况可以以下三种情况:

id相同，执行顺序由上至下
id不同，如果是子查询，id的序号会递增，id值越大优先级越高，越先被执行
id为null, 最后执行

mysql> EXPLAIN SELECT *,(select classes_name from t_classes) classes_name FROM t_student WHERE classes_id = 1;
+----+-------------+-----------+------------+-------+-----------------------------------------+------------------+---------+-------+------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys                           | key              | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------+------------+-------+-----------------------------------------+------------------+---------+-------+------+----------+-------------+
|  1 | PRIMARY     | t_student | NULL       | ref   | idx_classes_id,idx_classes_name_card_no | idx_classes_id   | 5       | const |    2 |   100.00 | NULL        |
|  2 | SUBQUERY    | t_classes | NULL       | index | NULL                                    | idx_classes_name | 767     | NULL  |    6 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+-----------------------------------------+------------------+---------+-------+------+----------+-------------+
2 rows in set (0.03 sec)

select_type

表示查询的类型, 大概有以下一些值

① simple：表示不需要union操作或者不包含子查询的简单select查询或连接查询

② primary: 一个需要union操作或含有子查询的select，位于最外层的单位查询的select_type即为primary。

③ union：连接两个select查询

④ union result：包含union的结果集，在union和union all语句中,因为它不需要参与查询，所以id字段为null

⑤ dependent subquery：表示这个subquery的查询要受到外部表查询的影响

⑥ dependent union：与union一样，出现在union 或union all语句中，但是这个查询要受到外部查询的影响

⑦ subquery : 除了from字句中包含的子查询外，其他地方出现的子查询都可能是subquery

⑧ from字句中出现的子查询，也叫做派生表。

mysql>  EXPLAIN select * from t_classes;
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | t_classes | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    6 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set (0.04 sec)

table

显示这一行的数据是关于哪张表的.

table列包含以下四种值：

NULL：比如select一些与数据库表无关的内容，如select now()
<union M, N>：表示由UNION操作产生的临时表，M和N表示产生临时表的源表
：表示是由id为M的表派生而来的临时表
：表示是由id为M的子查询物化而来的临时表

type

功能是显示查询使用了何种类型.

type一共有12种方式, 详情参考官网, 这里只列出比较重要的几种.

system: 表只有一行记录（等于系统表），这是const类型的特例，平时不会出现，这个也可以忽略不计
constant: 表示通过索引一次就找到了，const用于比较primary key或者unique索引。因为只匹配一行数据，所以很快。如将主键至于where列表中，MySQL就能将该查询转换为一个常量
eq_ref: 唯一性索引，对于每个索引键，表中只有一条记录与之匹配，常见于主键或唯一索引扫描
ref: 非唯一索引扫描，返回匹配某个单独值的所有行。
本质上也是一种索引访问，它返回所有匹配某个单独值的行，然而，
它可能会找到多个符合条件的行，所以他应该属于查找和扫描的混合体
range: 只检索给定范围的行，使用一个索引来选择行。key列显示使用了哪个索引
一般就是在你的where语句中出现了between、<、>、in等的查询
这种范围扫描索引扫描比全表扫描要好，因为他只需要开始索引的某一点，而结束语另一点，不用扫描全部索引
index: Full Index Scan, index与ALL区别为index类型只遍历索引树。这通常比ALL快，因为索引文件通常比数据文件小。（也就是说虽然all和index都是读全表，但index是从索引中读取的，而all是从硬盘中读的）
all: Full Table Scan, 将遍历全表以找到匹配的行

上述几种性能从好到坏分别 system > const > eq_ref > ref > range > index > ALL , 一般来说，得保证查询至少达到range级别，最好达到ref.

mysql> explain select * from t_subject where teacher_id = 1; 
+----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys  | key            | key_len | ref   | rows | filtered | Extra |
+----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | t_subject | NULL       | ref  | idx_teacher_id | idx_teacher_id | 5       | const |    1 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+----------------+----------------+---------+-------+------+----------+-------+
1 row in set (0.06 sec)

mysql> EXPLAIN select * from t_classes where id in (1,2,3);
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t_classes | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |    3 |   100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set (0.07 sec)

possible_keys

显示可能应用在这张表中的索引,一个或多个。查询涉及的字段上若存在索引，则该索引将被列出，但不一定被查询实际使用。

mysql> explain select id from t_student where classes_id = 1 ;
+----+-------------+-----------+------------+------+-----------------------------------------+----------------+---------+-------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys                           | key            | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+-----------------------------------------+----------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t_student | NULL       | ref  | idx_classes_id,idx_classes_name_card_no | idx_classes_id | 5       | const |    2 |   100.00 | Using index |
+----+-------------+-----------+------------+------+-----------------------------------------+----------------+---------+-------+------+----------+-------------+
1 row in set (0.06 sec)

key

key列表示经过查询优化器计算使用不同索引成本后决定使用的索引名，即实际使用的索引列。如果为null则没有使用索引。

查询中若使用了覆盖索引，则索引和查询的select字段重叠

key_len

key_len列表示当优化器决定使用某个索引执行查询时，该索引记录的最大长度，它是由这三个部分构成的：

① 对于使用固定长度类型的索引列来说，它实际占用的存储空间的最大长度就是该固定值，对于指定字符集的变长类型的索引列来说，比如某个索引列的类型是varchar(255)，使用的字符集是utf8（MySQL utf8字符集占用3个字节长度），那么该列实际占用的最大存储空间就是255 × 3 + 2= 767个字节。

② 如果该索引列可以存储NULL值，则key_len比不可以存储NULL值时多1个字节。

③ 对于变长字段来说，都会有2个字节的空间来存储该变长列的实际长度。

# classes_id字段是int类型，可以为空，占用了四个字节 + 1可以为空的字节，长度是5
mysql> EXPLAIN select id from t_student where classes_id = 1;
+----+-------------+-----------+------------+------+-----------------------------------------+----------------+---------+-------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys                           | key            | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+-----------------------------------------+----------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t_student | NULL       | ref  | idx_classes_id,idx_classes_name_card_no | idx_classes_id | 5       | const |    2 |   100.00 | Using index |
+----+-------------+-----------+------------+------+-----------------------------------------+----------------+---------+-------+------+----------+-------------+
1 row in set (0.06 sec)

ref

当使用索引列等值匹配的条件去执行查询时，也就是在访问方法是const、eq_ref、ref、ref_or_null、unique_subquery、index_subquery其中之一时，ref列展示的就是与索引列作等值匹配的条件，如常数或某个列。

# 这里与索引列student_name做等值匹配的是一个具体的字符串,是常数, 所以ref这列显示const
mysql> explain select id from t_student where student_name = '';
+----+-------------+-----------+------------+------+------------------+------------------+---------+-------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys    | key              | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+------------------+------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t_student | NULL       | ref  | idx_student_name | idx_student_name | 767     | const |    1 |   100.00 | Using index |
+----+-------------+-----------+------------+------+------------------+------------------+---------+-------+------+----------+-------------+
1 row in set (0.07 sec)

rows

根据表统计信息及索引选用情况，大致估算出找到所需的记录所需要读取的行数. 这个可以算是最重要的指标了,因为所有的优化的最终目的都是为了减少最后扫描的行数, 也就是减少rows这个值.

使用了索引, 最后的扫描的行数为12

mysql> EXPLAIN select t_student.id from t_student ; 
+----+-------------+-----------+------------+-------+---------------+----------------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key            | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+----------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t_student | NULL       | index | NULL          | idx_classes_id | 5       | NULL |   12 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+----------------+---------+------+------+----------+-------------+
1 row in set (0.06 sec)

filtered

filtered列在单表查询时没什么意义，但是在连接查询时意义重大，凭借该列和rows列可以粗略估算出当前连接的查询成本，它指返回结果的行占需要读到的行(rows列的值)的百分比。

Extra

Extra列是用来说明一些额外信息的，我们可以通过这些额外信息来更准确的理解MySQL到底将如何执行给定的查询语句。MySQL提供的额外信息有好几十个，就不一个一个介绍了，在这只介绍常见的一些额外信息说明 .

① Using filesort: 如果根据索引列进行排序（order by 索引列）是可以用到索引的，SQL查询引擎会先根据索引列进行排序，然后获取对应记录的主键id执行回表操作，如果排序字段用不到索引则只能在内存中或磁盘中进行排序操作，MySQL把这种在内存或者磁盘上进行排序的方式统称为文件排序（英文名：filesort），如果某个查询需要使用文件排序的方式执行查询，就会在执行计划的Extra列中显示Using filesort

② Using temporary: 许多查询的执行过程中，MySQL会借助临时表来完成一些功能，比如去重、排序之类的，比如我们在执行许多包含distinct、group by、union等子句的查询过程中，如果不能有效利用索引来完成查询，MySQL很有可能寻求通过建立内部的临时表来执行查询。如果查询中使用到了内部的临时表，在执行计划的Extra列将会显示Using temporary提示.

③ USING index: 表示相应的select操作中使用了覆盖索引（Covering Index）,避免回表操作，效率不错！
如果同时出现using where，表明索引被用来执行索引键值的查找；如果没有同时出现using where，表名索引用来读取数据而非执行查找动作。

④ Using where: 使用了where过滤

⑤ using join buffer: 在连接查询执行过程中，当被驱动表不能有效的利用索引加快访问速度，MySQL一般会为其分配一块名叫join buffer的内存块来加快查询速度

⑥ impossible where: where子句的值总是false，不能用来获取任何元组

⑦ select tables optimized away: 在没有GROUPBY子句的情况下，基于索引优化MIN/MAX操作或者对于MyISAM存储引擎优化COUNT(*)操作，不必等到执行阶段再进行计算，查询执行计划生成的阶段即完成优化。

⑧ distinct: 优化distinct，在找到第一匹配的元组后即停止找同样值的工作

⑨ Using index condition：查找使用了索引，但是需要回表查询数据

mysql> EXPLAIN select * from t_student where classes_id > 10;
+----+-------------+-----------+------------+-------+-----------------------------------------+----------------+---------+------+------+----------+-----------------------+
| id | select_type | table     | partitions | type  | possible_keys                           | key            | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-----------+------------+-------+-----------------------------------------+----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | t_student | NULL       | range | idx_classes_id,idx_classes_name_card_no | idx_classes_id | 5       | NULL |    1 |   100.00 | Using index condition |
+----+-------------+-----------+------------+-------+-----------------------------------------+----------------+---------+------+------+----------+-----------------------+
1 row in set (0.06 sec)