cvm母机宕机重启后数据库修复

下午正在开周会，然后收到短信，说是X.X.X.X的机器ping不通了，一轮测试过后，发现是某台数据库服务器挂了，先不急着重启，问下tencent客服。。。

乖乖的好家伙，母机的主板坏了。。。。一个小时候，母机起来了，看下数据库起来么。。

[root@VM_145_57_tlinux ~]# mysql -uroot -p1234
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[root@VM_145_57_tlinux ~]# mysql -uroot -p    
Enter password: 
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[root@VM_145_57_tlinux ~]# ps -ef |grep mysql
root      2016     1  0 17:50 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql     2177  2016 98 17:50 ?        00:11:21 /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=3306
root     13643 13364  0 18:02 pts/0    00:00:00 grep mysql
[root@VM_145_57_tlinux ~]# mysql -uroot -p1234 -S /tmp/mysql.sock
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[root@VM_145_57_tlinux ~]# ls /tmp/mys^C
[root@VM_145_57_tlinux ~]# /etc/init.d/mysqld restart
Stopping mysqld:                                           [  OK  ]
Starting mysqld:                                           [  OK  ]
[root@VM_145_57_tlinux ~]# ps -ef | msyql
-bash: msyql: command not found
[root@VM_145_57_tlinux ~]# ps -ef |grep mysql
root      2016     1  0 17:50 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql     2177  2016 99 17:50 ?        00:13:45 /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=3306
root     13942     1  0 18:03 pts/0    00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql    14100 13942  0 18:03 pts/0    00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=3306
root     14908 13364  0 18:04 pts/0    00:00:00 grep mysql

这下亮了，出现了double进程，没发解释啊。。。【下面这步觉得error】

[root@VM_145_57_tlinux ~]# kill -9 2016
[root@VM_145_57_tlinux ~]# kill -9 2177
[root@VM_145_57_tlinux ~]# ps -ef |grep mysql
root     13942     1  0 18:03 pts/0    00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql    14100 13942  0 18:03 pts/0    00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=3306
root     15240 13364  0 18:05 pts/0    00:00:00 grep mysql

之前起来的进程被我kill -9之后，发现能连上数据库了，可是innodb存储引擎没起来。

mysql> show engines;
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                   | Transactions | XA   | Savepoints |
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                     | NO           | NO   | NO         |
| CSV        | YES     | CSV storage engine                                        | NO           | NO   | NO         |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance    | NO           | NO   | NO         |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables | NO           | NO   | NO         |
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
4 rows in set (0.00 sec)

似乎能看出问题了，进程起来了，socket没有建立，存储引擎没有启动-->innodb正在后台线程操作！！查下errorlog一探究竟：

150126 17:45:52 mysqld_safe Starting mysqld daemon with databases from /data/mysql/var
150126 17:45:53  InnoDB: Initializing buffer pool, size = 28.0G
150126 17:45:55  InnoDB: Completed initialization of buffer pool
InnoDB: Log scan progressed past the checkpoint lsn 1808 1368235506
150126 17:45:55  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 1808 1373477888

事实上，那个还没创建socket的进程是正在执行double write的回滚工作，继续往下翻页：

InnoDB: Doing recovery: scanned up to log sequence number 1808 3523058688
150126 18:03:02 mysqld_safe Starting mysqld daemon with databases from /data/mysql/var
150126 18:03:03  InnoDB: Initializing buffer pool, size = 28.0G
150126 18:03:03  InnoDB: Error: cannot allocate 30064787456 bytes of
InnoDB: memory with malloc! Total allocated memory
InnoDB: by InnoDB 37927312 bytes. Operating system errno: 12
InnoDB: Check if you should increase the swap file or
InnoDB: ulimits of your operating system.
InnoDB: On FreeBSD check you have compiled the OS with
InnoDB: a big enough maximum process size.
InnoDB: Note that in most 32-bit computers the process
InnoDB: memory space is limited to 2 GB or 4 GB.
InnoDB: We keep retrying the allocation for 60 seconds...
InnoDB: Doing recovery: scanned up to log sequence number 1808 3528301568

这个是我在innodb引擎正在执行recovery的时候强行启动mysql的报错，提示内存不足。。继续往下翻页：

InnoDB: Doing recovery: scanned up to log sequence number 1808 3606944768
150126 18:04:03InnoDB: Fatal error: cannot allocate the memory for the buffer pool
150126 18:04:03 [ERROR] Plugin 'InnoDB' init function returned error.
150126 18:04:03 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
150126 18:04:03 [Note] Event Scheduler: Loaded 0 events
150126 18:04:03 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.61'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
InnoDB: Doing recovery: scanned up to log sequence number 1808 3612187648
 [此处省略若干行]
InnoDB: Doing recovery: scanned up to log sequence number 1808 3711802368
150126 18:16:28 [Note] /usr/libexec/mysqld: Normal shutdown

150126 18:16:28 [Note] Event Scheduler: Purging the queue. 0 events
150126 18:16:30 [Note] /usr/libexec/mysqld: Shutdown complete

150126 18:16:30 mysqld_safe mysqld from pid file /data/mysql/mysqld/mysqld.pid ended
150126 18:16:30 mysqld_safe Starting mysqld daemon with databases from /data/mysql/var
150126 18:16:30  InnoDB: Initializing buffer pool, size = 28.0G
150126 18:16:32  InnoDB: Completed initialization of buffer pool
InnoDB: Log scan progressed past the checkpoint lsn 1808 1368235506 [kill -9的结果]
150126 18:16:32  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 1808 1373477888

看到上面的错误，才恍然大悟，前面的操作是有多危险，要不是mysql的recovery不那么强悍的话，恐怕数据就被我这样弄没了。。。好悬

InnoDB: Doing recovery: scanned up to log sequence number 1808 4183661568
InnoDB: Doing recovery: scanned up to log sequence number 1808 4185407155
150126 18:38:51  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed
150126 18:40:59  InnoDB: Started; log sequence number 1808 4185407155
150126 18:40:59 [Note] Event Scheduler: Loaded 0 events
150126 18:40:59 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.61'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution

回滚完成，happy启动服务。

下面来讲下innodb_log_file_size在innodb异常时回滚机制：

下面我们参考 http://www.cnblogs.com/zuoxingyu/archive/2012/10/25/2738864.html 大牛的博客，算一下innodb_log_file_size到底多大为最合适：【作为一个粗略的规则，你可以让这个日志足够大到能容纳最多一小时左右的日志】

mysql> pager grep sequence
PAGER set to 'grep sequence'
mysql> show engine innodb statusG select sleep(60); show engine innodb statusG
Log sequence number 1818 411580730
1 row in set (0.00 sec)

1 row in set (1 min 0.00 sec)

Log sequence number 1818 446445181
1 row in set (0.00 sec)

(411580730-446445181)/1024/1024/60=1999M，向上取整2G，由于默认有2个日志文件，所以按照当前高峰期计算，设为1G为最佳值。