链接MyISAM文件

原文链接：http://www.mysqlperformanceblog.com/2012/11/19/concatenating-myisam-files/

（读完这篇文章，我很开心的笑了，原来MYSQL还能这么玩，作者你真牛逼，我服你了）

Recently, I found myself involved in the migration of a large read-only InnoDB database to MyISAM (eventually packed). The only issue was that for one of the table, we were talking of 5 TB of data, 23B rows. Not small… I calculated that with something like insert into MyISAM_table… select * from Innodb_table…would take about 10 days. The bottleneck was clearly the lack of concurrency on the read part from InnoDB and then the key management for MyISAM. The server has many dozen drives so it was easy to add more concurrency so I kicked off, from a script, insertions into 16 identical MyISAM files for distinct parts of the table. That was much faster and would complete within a day.

最近我在参与一个把很大只读的INNODB数据库迁移到MYISAM的工作。发现的唯一问题是其中的一个表有5T的数据，23亿行。真不小。。。我算了一下，如果用insert into MyISAM_table… select * from Innodb_table…这样的方法将花费10天。明显的瓶颈就是在INNODB这边的读不是并发的，然后就是在MYISAM那边的索引维护。这个服务器有很多的磁盘，所以很容易增加更多的并发。通过一个脚本，把数据的不同部分写入到16个相同结构的MYISAM表。这快了非常多，可以在一天内完成。

Then, while the Innodb extraction was running at a nice pace, I thought about the next phase. My first idea was simply to do insert into MyISAM_table select * from MyISAM_table1 and so on for the 16 files. Since MyISAM are flat files, that should be faster, especially with the keys disabled. At that point, I remembered, from a previous disaster recovery work where a database directory has been wiped out that the MyISAM files have no headers which make them difficult (read almost impossible) to locate on a drive with tools like ext3grep. No headers… that means the first byte of byte of a file is the first byte of the first row… So we should be able to concatenate these files. Let’s see.

INNODB数据提取的速度挺不错的，我考虑一下一步该怎么做。我最初的想法简单的对16个文件做insert into MyISAM_table select * from MyISAM_table1操作。因为MYISAM是堆文件，这个操作应该比较快，尤其是在关闭索引的情况下。（后面一截没看懂，为什么没有文件头啥的？重点看怎么样连接MYD文件吧）

mysql> create table test_concat(id int unsigned not null, primary key (id)) engine=myisam;
 Query OK, 0 rows affected (0.01 sec)
 
mysql> create table test_concat_part like test_concat;
 Query OK, 0 rows affected (0.01 sec)
 
mysql> insert into test_concat (id) value (1),(2),(3);
 Query OK, 3 rows affected (0.00 sec)
 Records: 3 Duplicates: 0 Warnings: 0
 
mysql> insert into test_concat_part (id) value (4),(5),(6);
 Query OK, 3 rows affected (0.00 sec)
 Records: 3 Duplicates: 0 Warnings: 0
 
mysql> flush tables;
 Query OK, 0 rows affected (0.01 sec)

Then, at the shell command line:

root@django:/var/lib/mysql/test# ls
 test_concat.frm test_concat.MYD test_concat.MYI test_concat_part.frm test_concat_part.MYD test_concat_part.MYI
 root@django:/var/lib/mysql/test#
 root@django:/var/lib/mysql/test# cat test_concat_part.MYD >> test_concat.MYD
 root@django:/var/lib/mysql/test# myisamchk -rq test_concat
 - check record delete-chain
 - recovering (with sort) MyISAM-table 'test_concat'
 Data records: 3
 - Fixing index 1
 Data records: 6

And then, back in mysql:

mysql> use test
 Database changed
 mysql> flush tables;
 Query OK, 0 rows affected (0.00 sec)
 
mysql> select * from test_concat;
 +----+
 | id |
 +----+
 | 1 |
 | 2 |
 | 3 |
 | 4 |
 | 5 |
 | 6 |
 +----+
 6 rows in set (0.00 sec)

So, yes, you can concatenate MyISAM files, even when multiple keys are defined. Not for everyday use but still pretty cool.

呵呵，你能链接MYISAM文件，即使定义了多个索引时，也是可以链接的。不太会每天用到，但是还是很酷的，对吧？