Index Principles and Slow Query Optimization

I. MySQL Index Management

1. Purpose

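    (1) The purpose of an index is to speed up lookups.
    (2) In MySQL, PRIMARY KEY, UNIQUE, and composite unique keys are also indexes. Besides speeding up lookups, these indexes enforce constraints.

    Ordinary index INDEX: speeds up lookups
    Unique indexes:
        - Primary key index PRIMARY KEY: speeds up lookups + constraint (not null, no duplicates)
        - Unique index UNIQUE: speeds up lookups + constraint (no duplicates)
    Composite indexes:
        - PRIMARY KEY(id,name): composite primary key index
        - UNIQUE(id,name): composite unique index
        - INDEX(id,name): composite ordinary index

II. Index Data Structure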

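1. Keep indexed columns as small as possible. The number of disk I/O operations depends on the
   height h of the B+ tree. If the table holds N rows and each disk block holds m data items,
   then h = log_(m+1) N, the logarithm of N to base m+1. For a fixed N, the larger m is, the
   smaller h is. Since m = block size / data item size, and the block size is the size of one
   data page and is fixed, smaller data items mean more items per block and a shorter tree.
   This is why each data item, that is, each indexed column, should be as small as possible:
   an int takes 4 bytes, half of the 8 bytes a bigint takes. It is also why a B+ tree keeps the
   actual row data in the leaf nodes rather than in the inner nodes: storing rows in inner nodes
   would sharply reduce the number of items per block and make the tree taller. In the extreme
   case of one data item per block, the tree degenerates into a linear list.
2. The leftmost-match property of indexes. When a data item in the B+ tree is a composite
   structure such as (name,age,sex), the search tree is built by comparing the fields from left
   to right. For a lookup with (Zhang San,20,F), the B+ tree first compares name to decide the
   search direction; if name is equal, it compares age and then sex, and finally reaches the
   data. For a lookup with (20,F), which has no name, the B+ tree does not know which node to
   visit next, because name was the first comparison key when the tree was built, and the search
   has to start from name to know where to go. For a lookup with (Zhang San,F), the B+ tree can
   use name to pick the direction, but the next field, age, is missing, so it can only find all
   entries whose name is Zhang San and then filter them for sex F. This is a very important
   property: the leftmost-match property of indexes.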
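
The height estimate in item 1 can be checked with a little arithmetic. The sketch below is a rough model, not a description of the real page layout: it assumes a 16 KB page (InnoDB's default page size) and a hypothetical 8 bytes of per-entry overhead for the child pointer, and the helper names items_per_block and tree_height are made up for illustration.

```python
PAGE_SIZE = 16 * 1024   # assumption: 16 KB page, InnoDB's default
POINTER_SIZE = 8        # assumption: hypothetical per-entry overhead in bytes


def items_per_block(key_size: int) -> int:
    """m = block size / data item size (key plus the assumed pointer)."""
    return PAGE_SIZE // (key_size + POINTER_SIZE)


def tree_height(n_rows: int, m: int) -> int:
    """Smallest integer h with (m + 1) ** h >= n_rows, i.e. ceil(log_(m+1) N)."""
    h, capacity = 0, 1
    while capacity < n_rows:
        capacity *= m + 1
        h += 1
    return h


if __name__ == "__main__":
    n = 100_000_000
    for name, size in (("int", 4), ("bigint", 8), ("char(255)", 255)):
        m = items_per_block(size)
        print(f"{name:10s} m={m:5d} h={tree_height(n, m)}")
```

Under these assumptions and 100 million rows, int and bigint keys both give a height of 3 (m is 1365 and 1024), while a 255-byte key drops m to 62 and raises the height to 5. Small differences in key size do not always change the height, but wide keys clearly do.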

III. Syntax for Creating and Dropping Indexes

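# Method 1: when creating the table
        CREATE TABLE table_name (
                column_name1  data_type [constraints…],
                column_name2  data_type [constraints…],
                [UNIQUE | FULLTEXT | SPATIAL]  INDEX | KEY
                [index_name]  (column_name[(length)]  [ASC | DESC])
                );


# Method 2: CREATE INDEX on an existing table
        CREATE  [UNIQUE | FULLTEXT | SPATIAL]  INDEX  index_name
                     ON table_name (column_name[(length)]  [ASC | DESC]);


# Method 3: ALTER TABLE on an existing table
        ALTER TABLE table_name ADD  [UNIQUE | FULLTEXT | SPATIAL]  INDEX
                             index_name (column_name[(length)]  [ASC | DESC]);

# Dropping an index: DROP INDEX index_name ON table_name;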

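1 Creating indexes
- At table creation time
create table s1(
id int,
name char(6),
age int,
email varchar(30),
index(id)
);
- After the table has been created
create index name on s1(name); # add an ordinary index
create unique index age on s1(age); # add a unique index
alter table s1 add primary key(id); # add a primary key index
create index id_name on s1(id,name); # add a composite ordinary index (the name "name" is already taken above)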

2 Dropping indexes
drop index id on s1;
drop index name on s1;
alter table s1 drop primary key; # drop the primary key index

IV. Testing Indexes

1. Prepare the data

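#1. Prepare the table (if the s1 from section III still exists, drop it first)
create table s1(
id int,
name varchar(20),
gender char(6),
email varchar(50)
);

#2. Create a stored procedure that inserts records in bulk
delimiter $$   # set the statement delimiter to $$ while the procedure is defined
create procedure auto_insert1()
BEGIN
    declare i int default 1;
    while(i<300000)do
        insert into s1 values(i,concat('egon',i),'male',concat('egon',i,'@oldboy'));
        set i=i+1;
    end while;
END$$
delimiter ; # restore the semicolon as the statement delimiter

#3. Inspect the stored procedure
show create procedure auto_insert1\G

#4. Call the stored procedure
call auto_insert1();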

2. Test query speed without an index

# No index: the table is scanned from start to finish, so the query is slow

With an index added
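
The difference between the two cases can be made visible without timing anything, by asking the database for its query plan. The sketch below is only an analogy: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN rather than MySQL, so the plan wording differs from MySQL's EXPLAIN output. The index name idx_id and the helper names are made up for illustration.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


def build_table(n_rows: int = 10_000) -> sqlite3.Connection:
    """Create an in-memory s1 table shaped like the one prepared in this section."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE s1 (id INT, name VARCHAR(20), gender CHAR(6), email VARCHAR(50))"
    )
    conn.executemany(
        "INSERT INTO s1 VALUES (?, ?, 'male', ?)",
        ((i, f"egon{i}", f"egon{i}@oldboy") for i in range(1, n_rows)),
    )
    return conn


if __name__ == "__main__":
    conn = build_table()
    sql = "SELECT count(*) FROM s1 WHERE id = 1000"
    print("no index  :", query_plan(conn, sql))
    conn.execute("CREATE INDEX idx_id ON s1(id)")  # idx_id is a made-up name
    print("with index:", query_plan(conn, sql))
```

Before the index exists the plan is a scan of the whole table; afterwards it becomes a search through the index. In MySQL the same check would be done by prefixing the query with EXPLAIN.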

V. Using Indexes Correctly

1. To get the expected speed-up from an index, follow these principles when adding one

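#1. The leftmost-prefix matching principle, a very important one
create index ix_name_email on s1(name,email);
- Leftmost-prefix matching: columns must be matched in order from left to right
select * from s1 where name='egon'; # can use the index
select * from s1 where name='egon' and email='asdf'; # can use the index
select * from s1 where email='alex@oldboy.com'; # cannot use the index
MySQL keeps matching columns to the right until it meets a range condition (>, <, between, like) and then stops. For example, with a = 1 and b = 2 and c > 3 and d = 4, an index built in the order (a,b,c,d) cannot use d, whereas an index on (a,b,d,c) can use all four columns; the order of a, b, and d can be arranged freely.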

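#2. = and in conditions may be written in any order. For example, a = 1 and b = 2 and c = 3 can use an (a,b,c) index whatever order the conditions appear in; the MySQL query optimizer rewrites them into a form the index can match.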

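#3. Prefer columns with high selectivity for indexes. Selectivity is count(distinct col)/count(*), the proportion of distinct values in the column; the higher it is, the fewer rows have to be scanned. A unique key has selectivity 1, while status or gender columns can have selectivity close to 0 on large tables. Is there a rule-of-thumb threshold? It depends on the use case and is hard to pin down; for columns used in joins we generally require 0.1 or higher, that is, about 10 rows scanned on average per lookup.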

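#4. Indexed columns must not take part in computation; keep the column "clean". For example, from_unixtime(create_time) = '2014-05-29' cannot use the index. The reason is simple: the B+ tree stores the column values as they are in the table, so the function would have to be applied to every element before comparing, which is clearly too expensive. Write the condition as create_time = unix_timestamp('2014-05-29') instead.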
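
Rules #1 and #2 above can be read as a small matching procedure. The sketch below is a simplified model of that procedure, not MySQL's actual optimizer: the function name usable_columns and the way predicates are passed in are made up for illustration, and optimizer features such as index condition pushdown are ignored.

```python
from typing import Dict, List

RANGE_OPS = {">", "<", ">=", "<=", "between", "like"}


def usable_columns(index_cols: List[str], predicates: Dict[str, str]) -> List[str]:
    """Return the index columns a query can use under the leftmost-prefix rule.

    predicates maps column name -> operator ("=", "in", or a range operator).
    Matching walks the index left to right, stops at the first column that has
    no predicate, and stops after the first range predicate.
    """
    used = []
    for col in index_cols:
        op = predicates.get(col)
        if op is None:
            break                # gap in the prefix: nothing further matches
        used.append(col)
        if op in RANGE_OPS:
            break                # a range column is the last one matched
    return used


if __name__ == "__main__":
    # a = 1 and b = 2 and c > 3 and d = 4, written in arbitrary order (rule #2)
    preds = {"d": "=", "c": ">", "b": "=", "a": "="}
    print(usable_columns(["a", "b", "c", "d"], preds))
    print(usable_columns(["a", "b", "d", "c"], preds))
    print(usable_columns(["name", "email"], {"email": "="}))
```

For the predicates in rule #1, the model returns a, b, c for an (a,b,c,d) index and all four columns for (a,b,d,c); a query on email alone gets nothing from an index on (name,email). Because the predicates are looked up by column name, the order in which they are written has no effect, which is the point of rule #2.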

2. Leftmost-prefix demonstration

1 Speed-up from an index: range queries
mysql> select count(*) from s1 where id=1000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.12 sec)

mysql> select count(*) from s1 where id>1000;
+----------+
| count(*) |
+----------+
|   298999 |
+----------+
1 row in set (0.12 sec)

mysql> create index a on s1(id)
    -> ;
Query OK, 0 rows affected (3.21 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from s1 where id=1000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where id>1000;
+----------+
| count(*) |
+----------+
|   298999 |
+----------+
1 row in set (0.12 sec)

mysql> select count(*) from s1 where id>1000 and id < 2000;
+----------+
| count(*) |
+----------+
|      999 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where id>1000 and id < 300000;
+----------+
| count(*) |
+----------+
|   298999 |
+----------+
1 row in set (0.13 sec)



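3 Columns with low selectivity should not be indexed
(Note: these transcripts were recorded on a variant of s1 that has an age column in place of gender, see the desc s1 output below, and in which every row has name='egon' and age=10.)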
mysql> select count(*) from s1 where name='xxx';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where name='egon';
+----------+
| count(*) |
+----------+
|   299999 |
+----------+
1 row in set (0.19 sec)


mysql> select count(*) from s1 where name='egon' and age=123123123123123;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.45 sec)

mysql> create index c on s1(age);
Query OK, 0 rows affected (3.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from s1 where name='egon' and age=123123123123123;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where name='egon' and age=10;
+----------+
| count(*) |
+----------+
|   299999 |
+----------+
1 row in set (0.35 sec)


mysql> select count(*) from s1 where name='egon' and age=10 and id>3000 and id < 4000;
+----------+
| count(*) |
+----------+
|      999 |
+----------+
1 row in set (0.00 sec)


mysql> select count(*) from s1 where name='egon' and age=10 and id>3000 and email='xxxx';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.47 sec)

mysql> create index d on s1(email);
Query OK, 0 rows affected (4.83 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from s1 where name='egon' and age=10 and id>3000 and email='xxxx';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> drop index a on s1;
Query OK, 0 rows affected (0.10 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> drop index b on s1;
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> drop index c on s1;
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc s1;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | NO   |     | NULL    |       |
| name  | char(20)    | YES  |     | NULL    |       |
| age   | int(11)     | YES  |     | NULL    |       |
| email | varchar(30) | YES  | MUL | NULL    |       |
+-------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> select count(*) from s1 where name='egon' and age=10 and id>3000 and email='xxxx';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
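
The selectivity formula count(distinct col)/count(*) is easy to try out on sample data. The sketch below computes it over plain Python sequences standing in for table columns; the helper name selectivity and the bucket column are made up for illustration, and the sample data only imitates the shape of the table used here.

```python
from typing import Hashable, Iterable


def selectivity(values: Iterable[Hashable]) -> float:
    """count(distinct col) / count(*) for a column given as a Python iterable."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


if __name__ == "__main__":
    n = 299_999
    ids = range(1, n + 1)                      # unique, like the id column
    names = ["egon"] * n                       # one repeated value
    buckets = [i % 30_000 for i in range(n)]   # hypothetical column, about 10 rows per value
    for label, col in (("id", ids), ("name", names), ("bucket", buckets)):
        print(f"{label:7s} selectivity = {selectivity(col):.6f}")
```

A column where every row holds the same value has a selectivity close to 0, so an index on it cannot narrow the search: a lookup for that value still has to visit every row, as the name='egon' queries above show.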

5 Add a composite index, with range-query columns placed last
 select count(*) from s1 where name='egon' and age=10 and id>3000 and email='xxxx';
index(name,email,age,id)

 select count(*) from s1 where name='egon' and age> 10 and id=3000 and email='xxxx';
index(name,email,id,age)

 select count(*) from s1 where name like 'egon' and age= 10 and id=3000 and email='xxxx';
index(email,id,age,name)
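
The three index choices above follow one pattern: equality columns first, the range column last. A minimal sketch of that ordering rule is shown below; the function name suggest_column_order and the (column, operator) input format are made up for illustration, and the sketch does not weigh selectivity or any other factor.

```python
from typing import List, Tuple

EQUALITY_OPS = {"=", "in"}


def suggest_column_order(predicates: List[Tuple[str, str]]) -> List[str]:
    """Order columns for a composite index: equality columns first, range last.

    predicates is a list of (column, operator) pairs taken from the WHERE clause.
    The relative order inside each group is kept as written in the query.
    """
    equality = [col for col, op in predicates if op in EQUALITY_OPS]
    ranges = [col for col, op in predicates if op not in EQUALITY_OPS]
    return equality + ranges


if __name__ == "__main__":
    queries = [
        [("name", "="), ("age", "="), ("id", ">"), ("email", "=")],
        [("name", "="), ("age", ">"), ("id", "="), ("email", "=")],
        [("name", "like"), ("age", "="), ("id", "="), ("email", "=")],
    ]
    for preds in queries:
        print(suggest_column_order(preds))
```

The equality columns come out in query order rather than in the exact order listed above, which makes no difference: equality columns can be arranged freely, and in each case the range column (id, age, and name respectively) ends up last.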


mysql> desc s1;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | NO   |     | NULL    |       |
| name  | char(20)    | YES  |     | NULL    |       |
| age   | int(11)     | YES  |     | NULL    |       |
| email | varchar(30) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> create index xxx on s1(age,email,name,id);
Query OK, 0 rows affected (6.89 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from s1 where name='egon' and age=10 and id>3000 and email='xxxx';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

6. Leftmost-prefix matching
index(id,age,email,name)
# id must appear in the condition
id
id age
id email
id name

email # does not work
mysql> select count(*) from s1 where id=3000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.11 sec)

mysql> create index xxx on s1(id,name,age,email);
Query OK, 0 rows affected (6.44 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql>  select count(*) from s1 where id=3000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql>  select count(*) from s1 where name='egon';
+----------+
| count(*) |
+----------+
|   299999 |
+----------+
1 row in set (0.16 sec)

mysql>  select count(*) from s1 where email='egon3333@oldboy.com';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.15 sec)

mysql>  select count(*) from s1 where id=1000 and email='egon3333@oldboy.com';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql>  select count(*) from s1 where email='egon3333@oldboy.com' and id=3000;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
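
Why the leading column matters can be imitated with a sorted list. The sketch below is a stand-in, not how InnoDB stores anything: a Python list of tuples sorted by (id, name, age, email) plays the role of the composite index, and binary search from the bisect module plays the role of descending the B+ tree. All function names are made up for illustration.

```python
import bisect
from typing import List, Tuple

Row = Tuple[int, str, int, str]  # (id, name, age, email), same order as the index


def build_index(rows: List[Row]) -> List[Row]:
    """Model the composite index as the rows sorted by (id, name, age, email)."""
    return sorted(rows)


def find_by_id(index: List[Row], wanted_id: int) -> List[Row]:
    """Leading column known: binary search jumps straight to the matching run."""
    lo = bisect.bisect_left(index, (wanted_id,))
    hi = bisect.bisect_left(index, (wanted_id + 1,))
    return index[lo:hi]


def find_by_email(index: List[Row], email: str) -> List[Row]:
    """Leading column missing: sort order is no help, so every entry is checked."""
    return [row for row in index if row[3] == email]


def find_by_id_and_email(index: List[Row], wanted_id: int, email: str) -> List[Row]:
    """id narrows the range first; email is then filtered inside that small run."""
    return [row for row in find_by_id(index, wanted_id) if row[3] == email]


if __name__ == "__main__":
    rows = [(i, "egon", 10, f"egon{i}@oldboy.com") for i in range(1, 10_000)]
    index = build_index(rows)
    print(find_by_id(index, 3000))
    print(find_by_email(index, "egon3333@oldboy.com"))
    print(find_by_id_and_email(index, 1000, "egon3333@oldboy.com"))
```

This mirrors the transcript: conditions that include id are answered from a narrow slice of the sorted data, regardless of the order in which id and email are written, while a condition on email alone has to look at every entry.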







7. Indexed columns must not take part in computation; keep the column "clean"
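
The cost difference between the two forms of the create_time condition can be sketched outside the database. In the model below, a sorted Python list of Unix timestamps stands in for an index on create_time. The helper names are made up, and UTC is assumed throughout, whereas MySQL's from_unixtime() and unix_timestamp() use the session time zone.

```python
import bisect
from datetime import datetime, timezone
from typing import List


def day_start(date_text: str) -> int:
    """Convert 'YYYY-MM-DD' to a Unix timestamp at UTC midnight, like unix_timestamp()."""
    dt = datetime.strptime(date_text, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def to_datetime_text(ts: int) -> str:
    """Convert a Unix timestamp to 'YYYY-MM-DD HH:MM:SS' in UTC, like from_unixtime()."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def match_with_function_on_column(create_times: List[int], date_text: str) -> List[int]:
    """from_unixtime(create_time) = date: every stored value must be converted."""
    wanted = date_text + " 00:00:00"
    return [ts for ts in create_times if to_datetime_text(ts) == wanted]


def match_with_clean_column(create_times: List[int], date_text: str) -> List[int]:
    """create_time = unix_timestamp(date): convert the constant once, then binary search."""
    target = day_start(date_text)
    lo = bisect.bisect_left(create_times, target)
    hi = bisect.bisect_right(create_times, target)
    return create_times[lo:hi]
```

Both functions return the same rows, but the first calls the conversion once per stored value, while the second converts only the constant and then uses the sort order of the stored values directly. That is the sense in which the column has to stay "clean" for the index to help.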


Other notes

- Avoid select *
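- count(1) or count(column) is often suggested in place of count(*); note that InnoDB handles count(*) and count(1) the same way, and count(column) skips NULL values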
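- Prefer char over varchar when creating tables
- Put fixed-length columns first in the table's column order
- Use a composite index instead of several single-column indexes (when queries often filter on several conditions)
- Keep indexes as short as possible
- Use joins (JOIN) instead of subqueries (Sub-Queries)
- When joining tables, make sure the types of the join conditions match
- Columns whose values are mostly repeated (low selectivity) are poor candidates for an index, e.g. gender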
