How to Find Queries Worth Optimizing

Original article: http://www.mysqlperformanceblog.com/2012/09/11/how-to-find-mysql-queries-worth-optimizing/

One question I often get is how to find the queries that should be optimized. A pt-query-digest report makes it easy to find slow queries, or queries that cause a large portion of the load on the system, but how do we know whether there is any possibility of making such a query run better? A full answer to this question requires complex analysis, as there are many possible ways a query can be optimized. There is, however, one extremely helpful metric you can use: the ratio between rows sent and rows examined. Let's look at this example:

# Time: 120911 17:09:44
# User@Host: root[root] @ localhost []
# Thread_id: 64914  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 9.031233  Lock_time: 0.000086  Rows_sent: 0  Rows_examined: 10000000  Rows_affected: 0  Rows_read: 0
# Bytes_sent: 213  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F03
use sbtest;
SET timestamp=1347397784;
select * from sbtest where pad='abc';
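
To make the metric concrete, here is a minimal Python (3.9+) sketch that pulls the numeric fields out of a slow log header line like the `# Query_time:` line above and computes rows examined per row sent. The function names are hypothetical, and treating zero rows sent as one (to avoid dividing by zero) is an assumption of this sketch, not something MySQL or pt-query-digest does.

```python
import re

# Matches "Name: value" pairs such as "Rows_sent: 0" in a slow log header line.
FIELD_RE = re.compile(r"(\w+): ([\d.]+)")


def parse_header(line: str) -> dict[str, float]:
    """Return the numeric fields of a '# Query_time: ...' slow log line."""
    return {name: float(value) for name, value in FIELD_RE.findall(line)}


def examined_per_sent(fields: dict[str, float]) -> float:
    """Rows examined per row sent; zero rows sent is treated as one (assumption)."""
    return fields["Rows_examined"] / max(fields["Rows_sent"], 1)


line = ("# Query_time: 9.031233  Lock_time: 0.000086  Rows_sent: 0  "
        "Rows_examined: 10000000  Rows_affected: 0  Rows_read: 0")
fields = parse_header(line)
print(examined_per_sent(fields))  # 10000000.0
```

A ratio of ten million rows examined for nothing sent is exactly the kind of gap this metric is meant to expose.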

In this case the query sent zero rows (as there are no matches), but it had to examine 10 million rows to produce that result. What would a good scenario be? A query that examines the same number of rows as it ends up sending. If I index the pad column, I get the following record in the slow query log:

# Time: 120911 17:18:05
# User@Host: root[root] @ localhost []
# Thread_id: 65005  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.000323  Lock_time: 0.000095  Rows_sent: 0  Rows_examined: 0  Rows_affected: 0  Rows_read: 0
# Bytes_sent: 213  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F14
SET timestamp=1347398285;
select * from sbtest where pad='abc';

Rows_examined=0, the same as Rows_sent, which means this query is optimized quite well. You may be thinking that in this case no data access happened at all; you would be wrong. The index lookup is still performed, but only rows that are actually found and passed up to the upper level of MySQL for processing are counted, so Rows_examined remains zero.

It looks simple so far, but it is also a huge oversimplification. You can apply such simple math only to queries without aggregate functions or GROUP BY, and only to queries that access a single table. What about queries that access more than one table?

# Time: 120911 17:25:22
# User@Host: root[root] @ localhost []
# Thread_id: 65098  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.000234  Lock_time: 0.000063  Rows_sent: 1  Rows_examined: 1  Rows_affected: 0  Rows_read: 1
# Bytes_sent: 719  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F1D
SET timestamp=1347398722;
select * from sbtest a,sbtest b where a.id=5 and b.id=a.k;

mysql> explain select * from sbtest a,sbtest b where a.id=5 and b.id=a.k;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | a     | const | PRIMARY,k     | PRIMARY | 4       | const |    1 |       |
|  1 | SIMPLE      | b     | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
2 rows in set (0.00 sec)

In this case we actually join 2 tables, but because the access type for both tables is “const”, MySQL does not count it as access to two tables. When the data is “really” accessed, it does:

# Time: 120911 17:28:12
# User@Host: root[root] @ localhost []
# Thread_id: 65099  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.000273  Lock_time: 0.000052  Rows_sent: 1  Rows_examined: 2  Rows_affected: 0  Rows_read: 1
# Bytes_sent: 719  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F23
SET timestamp=1347398892;
select * from sbtest a,sbtest b where a.k=2 and b.id=a.id;

+----+-------------+-------+--------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref         | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-------------+------+-------+
|  1 | SIMPLE      | a     | ref    | PRIMARY,k     | k       | 4       | const       |    1 |       |
|  1 | SIMPLE      | b     | eq_ref | PRIMARY       | PRIMARY | 4       | sbtest.a.id |    1 |       |
+----+-------------+-------+--------+---------------+---------+---------+-------------+------+-------+
2 rows in set (0.00 sec)

In this case 2 rows are examined for each row sent, which is expected, as the query uses 2 (logical) tables.
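
The per-table adjustment can be written as a small helper. This is a sketch with a hypothetical function name: `tables_in_join` is something you supply by reading the query or its EXPLAIN output (it is not a slow log field), and, as the “const” example above shows, tables MySQL resolves with the const access type may not show up in Rows_examined at all.

```python
def examined_per_table(rows_examined: int, tables_in_join: int) -> float:
    """Rows examined divided by the number of (logical) tables in the join."""
    if tables_in_join < 1:
        raise ValueError("a query touches at least one table")
    return rows_examined / tables_in_join


# The ref/eq_ref join above: Rows_examined: 2, Rows_sent: 1, two tables.
per_table = examined_per_table(rows_examined=2, tables_in_join=2)
print(per_table, per_table <= 1)  # 1.0 True
```

One row examined per table for one row sent is the “ideal” case for a join.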

The rule also breaks down if the query has a GROUP BY:

# Time: 120911 17:31:48
# User@Host: root[root] @ localhost []
# Thread_id: 65144  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 5.391612  Lock_time: 0.000121  Rows_sent: 2  Rows_examined: 10000000  Rows_affected: 0  Rows_read: 2
# Bytes_sent: 75  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F24
SET timestamp=1347399108;
select count(*) from sbtest group by k;

This query sends only 2 rows while scanning 10 million, yet we can't really optimize it in a simple way, because scanning all those rows is actually needed to produce the GROUP BY result. What you can do in this case is mentally remove the GROUP BY and the aggregate functions. The query then becomes “select * from sbtest”, which would send all 10M rows, so there is no simple way to optimize it.
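
The numbers from this GROUP BY example show how far apart the raw ratio and the adjusted one can be. In this sketch, `rows_without_group_by` is a value you have to supply yourself, for example by checking how many rows the query returns with GROUP BY and the aggregates stripped; it is not something the slow log entry records.

```python
rows_examined = 10_000_000          # Rows_examined from the slow log entry above
rows_sent = 2                       # Rows_sent: two groups
rows_without_group_by = 10_000_000  # "select * from sbtest" would send every row

naive_ratio = rows_examined / rows_sent
adjusted_ratio = rows_examined / rows_without_group_by

print(f"naive: {naive_ratio:,.0f}x, adjusted: {adjusted_ratio:.1f}x")
# naive: 5,000,000x, adjusted: 1.0x
```

The naive ratio suggests a huge opportunity; the adjusted one correctly shows there is nothing simple to gain.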

This method does not just give you a “yes or no” answer; it also helps you understand how much optimization is possible. For example, I might have a query that uses some index, scans 1000 rows and sends 10. I may still have an opportunity to reduce the number of rows it scans 100x, for example by adding a combined index.

So what is an easy way to see whether a query is worth optimizing?
- See how many rows the query sends after GROUP BY, DISTINCT and aggregate functions are removed (A).
- Take the number of rows examined divided by the number of tables in the join (B).
- If B is less than or equal to A, your query is “perfect”.
- If B/A is 10x or more, the query is a very serious candidate for optimization.
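
The checklist above can be sketched as a small Python function. The function name and the labels are hypothetical; A has to be obtained by you (the slow log only records rows sent by the query as written), and the 10x threshold is this article's rule of thumb rather than a hard limit. The article does not name the range between 1x and 10x, so the sketch simply calls it “in between”, and treating A=0 as 1 when computing B/A is an assumption.

```python
def classify(a_rows_sent: int, rows_examined: int, tables_in_join: int = 1) -> str:
    """Rule of thumb from the article.

    a_rows_sent (A): rows sent once GROUP BY, DISTINCT and aggregates are removed.
    B: rows_examined divided by the number of tables in the join.
    """
    b = rows_examined / tables_in_join
    if b <= a_rows_sent:
        return "perfect"
    # Assumption: treat A == 0 as 1 so that B/A is defined.
    if b / max(a_rows_sent, 1) >= 10:
        return "serious candidate"
    return "in between"


# (A, Rows_examined, tables in join) for the examples in this article.
cases = {
    "pad='abc', no index": (0, 10_000_000, 1),
    "pad='abc', indexed": (0, 0, 1),
    "join on a.k=2": (1, 2, 2),
    "count(*) group by k": (10_000_000, 10_000_000, 1),
    "index scan 1000 rows, send 10": (10, 1000, 1),
}
for name, args in cases.items():
    print(f"{name}: {classify(*args)}")
```

Only the unindexed lookup and the 1000-rows-for-10 index scan come out as serious candidates; the indexed lookup, the join and the GROUP BY query are all “perfect” by this measure.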

This is a simple method, and it works very well with pt-query-digest, which reports not only average numbers but also the outliers.
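
pt-query-digest does this aggregation for you. Purely to illustrate why the outliers matter, here is a toy Python sketch (not how pt-query-digest works internally; it groups by exact query text, whereas pt-query-digest groups by a normalized fingerprint) that reports both the average and the worst rows-examined-per-row-sent ratio. The two samples are the before and after slow log entries for the pad='abc' query shown earlier; treating zero rows sent as one is an assumption of the sketch.

```python
from collections import defaultdict
from statistics import mean

# (query text, Rows_sent, Rows_examined) from the two pad='abc' slow log
# entries earlier in this article: before and after adding the index.
samples = [
    ("select * from sbtest where pad='abc'", 0, 10_000_000),
    ("select * from sbtest where pad='abc'", 0, 0),
]

ratios = defaultdict(list)
for query, sent, examined in samples:
    # Zero rows sent is treated as one to avoid division by zero (assumption).
    ratios[query].append(examined / max(sent, 1))

for query, values in ratios.items():
    print(f"{query}: avg {mean(values):,.0f}, max {max(values):,.0f}")
```

The average (5,000,000) already hides half of the story; the maximum (10,000,000) shows what the worst execution actually cost.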

[End] 2012-09-26 16:11:17