The Mysterious Disappearance of MySQL InnoDB

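Problem description:

This morning a Zabbix alert reported that MySQL master-slave replication had failed on Host: 10.10.1.2.

I logged in to the server to see what was going on.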

shell > mysql

mysql> show slave status\G

Slave I/O thread : YES
Slave SQL thread : NO

Slave SQL: Error 'Unknown storage engine 'InnoDB'' on query. Default database: 'baeng_tv'. 

Query: 'UPDATE std_tv_card SET uid='135601920029371878', devicetoken='60000AM1500D16972129_569C' WHERE id='220888'', Error_code: 1286

Slave: Unknown storage engine 'InnoDB' Error_code: 1286

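# This is part of the status shown by show slave status\G. It claims the InnoDB engine is not supported. That has to be a joke; it is not as if this server is running for the first time.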

shell > vim /data/mysql_data/hostname.err

Version: '5.5.28-log'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
160930 07:20:36 mysqld_safe Number of processes running now: 0
160930 07:20:36 mysqld_safe mysqld restarted

160930  7:20:37 InnoDB: The InnoDB memory heap is disabled
160930  7:20:37 InnoDB: Mutexes and rw_locks use GCC atomic builtins
160930  7:20:37 InnoDB: Compressed tables use zlib 1.2.3
160930  7:20:38 InnoDB: Initializing buffer pool, size = 1.0G
InnoDB: mmap(1098907648 bytes) failed; errno 12
160930  7:20:38 InnoDB: Completed initialization of buffer pool

160930  7:20:38 InnoDB: Fatal error: cannot allocate memory for the buffer pool
160930  7:20:38 [ERROR] Plugin 'InnoDB' init function returned error.
160930  7:20:38 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

160930  7:20:38 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
160930  7:20:38 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
160930  7:20:38 [Note] Server socket created on IP: '0.0.0.0'.

160930  7:20:38 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.003702' at position 287099739, relay log './hostname-relay-bin.010674' position: 2597075
160930  7:20:38 [Note] Slave I/O thread: connected to master 'repl@10.10.1.8:3306',replication started in log 'mysql-bin.003702' at position 287261852
160930  7:20:39 [Note] Event Scheduler: Loaded 0 events
160930  7:20:39 [Note] /usr/local/mysql-5.5/bin/mysqld: ready for connections.
Version: '5.5.28-log'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution

160930  7:23:04 [ERROR] Slave SQL: Error 'Unknown storage engine 'InnoDB'' on query. Default database: 'baeng_tv'. Query: 'UPDATE std_tv_card SET uid='135601920029371878', devicetoken='60000AM1500D16972129_569C' WHERE id='220888'', Error_code: 1286
160930  7:23:04 [Warning] Slave: Unknown storage engine 'InnoDB' Error_code: 1286
160930  7:23:04 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.003702' position 288077147

# mysql.err

InnoDB: Initializing buffer pool, size = 1.0G

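# The buffer pool is being initialized with a size of 1G.

InnoDB: Fatal error: cannot allocate memory for the buffer pool

# Memory for the InnoDB buffer pool could not be allocated.

Plugin 'InnoDB' init function returned error.

# The InnoDB plugin's init function returned an error.

Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

# The InnoDB plugin could not be registered as a storage engine.

# After that, the Slave SQL thread is initialized and starts replicating, and the I/O thread connects to the master.
# Finally, the Slave SQL thread reports an error: the storage engine is unknown, so the UPDATE query fails.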
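
# The same check can be scripted instead of reading the log by eye. A minimal sketch, assuming the error log
# format shown above; innodb_init_failed is a made-up helper name, not a MySQL tool.

```shell
#!/bin/bash
# Return 0 if the given MySQL error log records a failed InnoDB plugin
# registration, non-zero otherwise. (Hypothetical helper for illustration.)
innodb_init_failed() {
  local logfile=$1
  grep -q "Plugin 'InnoDB' registration as a STORAGE ENGINE failed" "$logfile"
}

# Example: print the log lines that explain why InnoDB went missing.
# The path is the one used earlier in this post.
# innodb_init_failed /data/mysql_data/hostname.err &&
#   grep -E "InnoDB: (Fatal error|mmap)" /data/mysql_data/hostname.err
```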

Solution:

shell > free -m

# At the time only 936M of memory was free, while the buffer pool being initialized needed 1G.
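
# The arithmetic behind that comparison, as a rough sketch. The two numbers come from the mmap() line in the
# error log and from free -m; the helper names bytes_to_mib and pool_fits are made up for illustration.

```shell
#!/bin/bash
# Convert a byte count to whole MiB (rounded down).
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Return 0 if the free memory (in MiB) covers an allocation of the given
# size in bytes. (Hypothetical helper for illustration.)
pool_fits() {
  local free_mib=$1 pool_bytes=$2
  [ "$free_mib" -ge "$(bytes_to_mib "$pool_bytes")" ]
}

# Numbers from this incident.
pool_bytes=1098907648   # size of the failed mmap() request in the error log
free_mib=936            # free memory reported by free -m at the time

echo "buffer pool request: $(bytes_to_mib "$pool_bytes") MiB, free: ${free_mib} MiB"
if pool_fits "$free_mib" "$pool_bytes"; then
  echo "allocation fits"
else
  echo "allocation does not fit"
fi
```

# The request works out to 1048 MiB, so 936 MiB of free memory falls short.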

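# At this point it is clear that the failure was caused by the system running out of memory. The machine has 24G of RAM and runs nothing but one MySQL slave and some scheduled jobs.

# The scheduled jobs are PHP scripts that query data and write it to ElasticSearch to build indexes.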

shell > ps aux | grep php | wc -l
2000

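# Whoa!!!

# The jobs are scheduled every 5, 10, and 15 minutes, each running a different PHP script. A script is expected to finish before its next cycle.
# In other words, if a run has not finished by the next scheduled time, another copy of the PHP script is started anyway... so the piled-up copies ate a large amount of system memory.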
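
# To put a number on how much memory the piled-up processes hold, their resident set sizes can be added up.
# A rough sketch: sum_rss_mib is a made-up helper, and RSS counts shared pages once per process, so the total is an upper estimate.

```shell
#!/bin/bash
# Read resident set sizes in KiB, one per line, on stdin and print the
# total in whole MiB. (Hypothetical helper for illustration.)
sum_rss_mib() {
  awk '{ total += $1 } END { printf "%d\n", total / 1024 }'
}

# Example with procps ps: -C selects by command name, and "rss=" prints
# the RSS column without a header line.
# ps -C php -o rss= | sum_rss_mib
```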

shell > ps aux | grep -v grep | grep php | awk '{print $2}' | xargs -I {} kill {}

# First, kill all of these stuck processes.

shell > free -m
             total       used       free     shared    buffers     cached
Mem:         24020       4933      19086          0        140       2880
-/+ buffers/cache:       1912      22107
Swap:         8191          9       8182

# The memory has been freed.

shell > /etc/init.d/mysql.server restart

shell > mysql

mysql> show slave status\G

             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

        Seconds_Behind_Master: 1340

# Replication is working again. The lag can catch up on its own; it will be fine after a while.
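
# To watch the lag shrink without reading the full status output each time, a single field can be pulled out.
# A minimal sketch; slave_field is a made-up helper name.

```shell
#!/bin/bash
# Print the value of one field from "show slave status\G" output read on
# stdin. (Hypothetical helper for illustration.)
slave_field() {
  awk -v name="$1:" '$1 == name { print $2 }'
}

# Example:
# mysql -e 'show slave status\G' | slave_field Seconds_Behind_Master
```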

# The failure is resolved.
# Next, make sure it does not happen again.

shell > vim script/yiic_index.sh
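#!/bin/bash

logfile='/root/script/logs/yiic.log'
filepath='/data/git-webroot/yiic index'

# Count running 'yiic index' processes, excluding the grep itself.
line=$(ps aux | grep -v grep | grep 'yiic index' | wc -l)

if [ "$line" -eq 0 ]; then
  # No previous run is active: start the PHP job in the background.
  # $filepath is deliberately unquoted so it splits into script path and argument.
  /usr/local/php/bin/php $filepath >/dev/null &
else
  # A previous run is still active: log the skipped run and exit.
  echo "$(date "+%F %T") $filepath" >> "$logfile"
  exit 0
fi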

# End

shell > crontab -e

*/5 * * * * /root/script/logs/yiic_index.sh

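# This keeps the number of PHP processes under control.

# However, a script written this way leaves zombie processes behind.
# The sh script runs first, puts the PHP process in the background, and then exits. sh is the parent of the PHP process and exits before it, which is where the zombie comes from.
# This is manageable, though: if PHP runs too long, the next time crond calls the sh script, PHP is not started.
# The zombie is nothing to worry about either: once PHP finishes, the zombie disappears on its own.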
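
# Another way to skip a run while the previous one is still going is a lock file held by flock (from util-linux).
# This is a sketch, not what was deployed here: run_exclusive and the lock file path are made up. Nothing is backgrounded, so the wrapper stays alive until PHP exits.

```shell
#!/bin/bash
# Run a command only if nobody else holds the lock file; otherwise skip it.
# flock -n fails immediately instead of waiting when the lock is taken.
run_exclusive() {
  local lockfile=$1
  shift
  flock -n "$lockfile" "$@"
}

# Example crontab entry (the lock file path is an assumption):
# */5 * * * * flock -n /tmp/yiic_index.lock /usr/local/php/bin/php /data/git-webroot/yiic index >/dev/null
```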

shell > crontab -e

*/5 * * * * timeout 2000 /usr/local/php/bin/php /data/git-webroot/yiic index >/dev/null

# Alternatively, limit how long the PHP process may run this way. Take your pick.
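
# With timeout, a killed run can also be made visible, since GNU timeout exits with status 124 when the limit is hit.
# A minimal sketch; run_with_limit is a made-up helper name. Note that 2000 seconds is longer than the 5-minute interval, so a few runs can still overlap; the limit only bounds how long each one lives.

```shell
#!/bin/bash
# Run a command under a time limit and report on stderr when the limit
# was hit. (Hypothetical helper for illustration.)
run_with_limit() {
  local seconds=$1
  shift
  timeout "$seconds" "$@"
  local status=$?
  if [ "$status" -eq 124 ]; then
    echo "$(date '+%F %T') timed out after ${seconds}s: $*" >&2
  fi
  return "$status"
}

# Example:
# run_with_limit 2000 /usr/local/php/bin/php /data/git-webroot/yiic index >/dev/null
```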