High Performance MySQL (Part 2): Creating High-Performance Indexes

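-- Demo table that holds only city names
create table city_demo(city varchar(50) not null);

-- Copy every city from the existing city table once
insert into city_demo(city) select city from city;

-- Run repeatedly: each run doubles the number of rows
insert into city_demo(city) select city from city_demo;

-- Reassign every row a randomly chosen city to randomize the distribution
update city_demo set city = (select city from city order by rand() limit 1);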
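To try the setup steps above without a MySQL server, here is a minimal sketch using Python's standard-library sqlite3 module. The seed city names are hypothetical stand-ins for the real city table. SQLite has no RAND() (its function is random()), so the random reassignment is done here with a seeded Python generator rather than the ORDER BY rand() subquery.

```python
import random
import sqlite3

# Hypothetical stand-ins for the rows of the real city table.
SEED_CITIES = ["London", "Londrina", "Athenai", "Abu Dhabi", "Abha", "Cape Coral", "Cape Town"]


def build_city_demo(doublings: int = 3, seed: int = 42) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("create table city(city varchar(50) not null)")
    conn.executemany("insert into city(city) values (?)", [(c,) for c in SEED_CITIES])
    conn.execute("create table city_demo(city varchar(50) not null)")
    # Copy every city once.
    conn.execute("insert into city_demo(city) select city from city")
    # Each self-insert doubles the row count.
    for _ in range(doublings):
        conn.execute("insert into city_demo(city) select city from city_demo")
    # Reassign every row a random city, using a seeded Python RNG instead of
    # MySQL's ORDER BY rand() subquery so the result is reproducible.
    rng = random.Random(seed)
    rowids = [row[0] for row in conn.execute("select rowid from city_demo")]
    conn.executemany(
        "update city_demo set city = ? where rowid = ?",
        [(rng.choice(SEED_CITIES), rowid) for rowid in rowids],
    )
    conn.commit()
    return conn


conn = build_city_demo()
print(conn.execute("select count(*) from city_demo").fetchone()[0])  # 7 * 2**3 = 56
```

Each pass of the self-insert doubles the table, so the row count grows as (number of cities) × 2^passes; on a real server, repeating it a few more times yields a table large enough for the selectivity figures to be meaningful.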

select * from city;
select * from city_demo;
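-- Most frequent full city values
select count(*) as cnt, city from city_demo group by city order by cnt desc limit 10;

-- Most frequent 3-character prefixes: a short prefix merges different cities, so the counts are higher
select count(*) as cnt, left(city, 3) as pref from city_demo group by pref order by cnt desc limit 10;

-- Most frequent 7-character prefixes: check whether this distribution approaches that of the full values
select count(*) as cnt, left(city, 7) as pref from city_demo group by pref order by cnt desc limit 10;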
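The three frequency queries differ only in what they group by. A plain-Python sketch of the same GROUP BY ... ORDER BY cnt DESC LIMIT 10 pattern follows; the list of city names and the helper top_counts are hypothetical, and slicing v[:n] plays the role of LEFT(city, n).

```python
from collections import Counter


def top_counts(values, prefix_len=None, limit=10):
    """Mimic: select count(*) as cnt, left(city, n) as pref ... group by pref
    order by cnt desc limit 10. With prefix_len=None, group by the full value."""
    keys = (v if prefix_len is None else v[:prefix_len] for v in values)
    return Counter(keys).most_common(limit)


# Hypothetical sample: several distinct cities share the prefixes "Lon" and "Cap".
cities = ["London"] * 3 + ["Londrina"] * 2 + ["Longueuil"] * 2 + ["Cape Town"] * 2 + ["Cape Coral"]

print(top_counts(cities))     # top entry: ('London', 3)
print(top_counts(cities, 3))  # [('Lon', 7), ('Cap', 3)]: three cities collapse into "Lon"
print(top_counts(cities, 7))  # same counts as the full values for this sample
```

A prefix that is too short inflates the top counts, which means many rows share one index entry; once the prefix is long enough, the top counts match those of the full column.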

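-- Selectivity of the full column: distinct values / total rows
select count(distinct city)/count(*) from city_demo;

-- Number of distinct cities. Selecting city alongside the aggregate without GROUP BY
-- returns the value of an arbitrary row (here Athenai) and fails under ONLY_FULL_GROUP_BY.
select count(distinct city) from city_demo;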

-- Selectivity of prefixes of length 3 through 8
select count(distinct left(city, 3))/count(*) as sel3,
count(distinct left(city, 4))/count(*) as sel4,
count(distinct left(city, 5))/count(*) as sel5,
count(distinct left(city, 6))/count(*) as sel6,
count(distinct left(city, 7))/count(*) as sel7,
count(distinct left(city, 8))/count(*) as sel8
from city_demo;  -- the longer prefixes approach the full-column selectivity (about 0.031)
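The selectivity comparison above amounts to computing count(distinct left(city, n)) / count(*) for increasing n and stopping at the shortest n that comes close to the full-column value. A minimal sketch of that search in Python follows; the helper names, the sample list, the tolerance parameter and the max_len default of 50 (matching varchar(50)) are all hypothetical.

```python
def selectivity(values, prefix_len=None):
    """count(distinct left(city, n)) / count(*); full column when prefix_len is None."""
    if not values:
        raise ValueError("selectivity of an empty column is undefined")
    keys = {v if prefix_len is None else v[:prefix_len] for v in values}
    return len(keys) / len(values)


def shortest_prefix(values, tolerance=0.0, max_len=50):
    """Smallest n whose prefix selectivity is within `tolerance` of the full column's."""
    full = selectivity(values)
    for n in range(1, max_len + 1):
        if full - selectivity(values, n) <= tolerance:
            return n
    return max_len


# Hypothetical sample data (10 rows, 5 distinct cities).
cities = ["London"] * 3 + ["Londrina"] * 2 + ["Longueuil"] * 2 + ["Cape Town"] * 2 + ["Cape Coral"]

for n in range(3, 9):  # mirrors sel3 .. sel8
    print(f"sel{n}", selectivity(cities, n))
print(shortest_prefix(cities))  # 6
```

On real data a small nonzero tolerance is usually more practical than an exact match. Once a length is chosen, MySQL creates the prefix index with the col_name(length) syntax, for example alter table city_demo add key (city(7)); where 7 is only illustrative. The average selectivity can still hide skewed values, which is why the per-prefix frequency queries are worth checking too.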

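Clustering means that when a group of tables shares some common columns, those tables are stored in the same database blocks; more generally, clustering means storing related data in the same block. With clustering, a single block may contain data from several tables. Conceptually, if two or more tables are frequently joined, the rows the join needs can be stored together in advance. Clustering can also be applied to a single table, grouping its rows by the value of some column.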
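Put more simply, take the EMP and DEPT tables. They are stored in different segments, possibly even in different tablespaces, so their data never shares a block. Yet we join these two tables all the time, for example: select * from emp, dept where emp.deptno = dept.deptno. A query works mostly by reading blocks, and the more blocks it reads, the more I/O it costs. If the rows of both tables were gathered into a small number of blocks, the join would read fewer blocks and should run noticeably faster.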
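For example, take all employees with deptno = 10 and store them, together with the matching department row, in the same block (if they do not fit, further blocks are chained to the original one). This is how an index cluster table works. [1]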
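To make the block-count argument concrete, here is a toy model in Python that counts the blocks a query for one department touches with and without clustering. It is not how Oracle actually lays out an index cluster: the capacity of four rows per block, the 12 round-robin employees, storing EMP rows in emp_id order, and giving each cluster key its own block chain are all hypothetical assumptions.

```python
ROWS_PER_BLOCK = 4  # hypothetical block capacity, in rows

# Hypothetical data: 12 employees assigned round-robin to 3 departments.
DEPTS = [10, 20, 30]
EMPS = [(emp_id, DEPTS[emp_id % len(DEPTS)]) for emp_id in range(12)]


def blocks_read_separate(deptno: int) -> int:
    """EMP and DEPT in separate segments; EMP rows are stored in emp_id order."""
    emp_blocks = {
        ("emp", pos // ROWS_PER_BLOCK)
        for pos, (_, d) in enumerate(EMPS)
        if d == deptno
    }
    dept_block = ("dept", DEPTS.index(deptno) // ROWS_PER_BLOCK)
    return len(emp_blocks | {dept_block})


def blocks_read_clustered(deptno: int) -> int:
    """Clustered by deptno: the department row and its employees share one block
    chain (assumption: each cluster key starts its own chain)."""
    rows = 1 + sum(1 for _, d in EMPS if d == deptno)  # dept row + its employees
    return -(-rows // ROWS_PER_BLOCK)  # ceiling division = blocks in the chain


print(blocks_read_separate(10))   # 4: three EMP blocks plus one DEPT block
print(blocks_read_clustered(10))  # 2: five rows chained across two blocks
```

Without clustering, the four employees of department 10 are scattered across three EMP blocks and the department row lives in yet another segment; with clustering, the same five rows fit in a two-block chain, which is the I/O saving the paragraph describes.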