mysql-mmm故障整理

Auth: Jin
Date: 20140414

1.master-slave同步问题
1)故障描述和错误代码:
监控报警slave故障
登录slave服务器查看
mysql> show slave statusG
发现有Errno
详细的记录在另外电脑,电脑硬盘坏了,没有具体的报错了
2)排查步骤:
(1)查看错误日志
tail -n 100 /var/log/mysql/mysql_7.error
没有发现问题
cat /mysql/data7/master.info
18
mysql-bin.000106
847011590
查看master信息
# cat /mysql/data7/relay-log.info
binlogs/mysql-relay-bin.000022
846786796
mysql-bin.000106
846786650
8
从master读取到slave的信息
ll /mysql/data7/binlogs/
total 826996
-rw-rw----. 1 mysql mysql 199 Apr 8 17:39 mysql-relay-bin.000021
-rw-rw----. 1 mysql mysql 846828651 Apr 14 14:31 mysql-relay-bin.000022
-rw-rw----. 1 mysql mysql 62 Apr 8 17:39 mysql-relay-bin.index
tail -f /mysql/data7/binlogs/mysql-relay-bin.000022
都正常
初步判断IO线程正常
(2)仔细看slave status
mysql> show slave statusG
可以看到IO线程是正常的,SQL线程有问题
3)解决办法
重新按最后的position重新同步

4)总结:
(1)需要熟悉repliction的原理
(2)一般情况之需要看slave的status即可,info文件信息都读取进status了,根据status判断是IO的问题,还是SQL的问题


2,master当机,无法自动恢复问题
1)故障秒素和错误代码
监控报警master服务器宕了
登录mmm_control查看,发现writer切换到backup master上去了。原来的master处于正在恢复总
2)排查步骤
查看master服务-正常
slave status也OK,没有找到自动切换的原因
后查资料说一个bug,把检测时间修改为了10可避免,我设置的是60
3)解决办法
(1)手动把“出故障的”master为online
(2)将writer角色mv给master

Auth: Jin
Date: 20140505
Title: SQL注入攻击锁表,引起MMM slave无法复制

故障描述:slave复制不正常
root@JQ-pdt-host-3 ~]# mmm_control @3307 show
db2(192.168.201.2) slave/REPLICATION_DELAY. Roles:
db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.21), reader(192.168.201.22), reader(192.168.201.23)
db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
db217(192.168.201.217) slave/REPLICATION_DELAY. Roles:
db3(192.168.201.3) master/ONLINE. Roles:

[root@JQ-pdt-host-2 ~]# mysql -h 127.0.0.1 -P 3307 -u root -ppassword
Welcome to the MySQL monitor. Commands end with ; or g.
Your MySQL connection id is 2155233
Server version: 5.5.30-log MySQL Community Server (GPL)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.

mysql> show slave statusG;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.201.216
Master_User: dbslave
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: mysql-bin.000109
Read_Master_Log_Pos: 671221507
Relay_Log_File: mysql-relay-bin.000031
Relay_Log_Pos: 668578820
Relay_Master_Log_File: mysql-bin.000109
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql,test,performance_schema
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 668578674
Relay_Log_Space: 671221852
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 1503
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 33071


mysql> show slave statusG
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.201.216
Master_User: dbslave
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: mysql-bin.000109
Read_Master_Log_Pos: 671759015
Relay_Log_File: mysql-relay-bin.000031
Relay_Log_Pos: 668578820
Relay_Master_Log_File: mysql-bin.000109
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql,test,performance_schema
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 668578674
Relay_Log_Space: 671759360
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 1724
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 33071
1 row in set (0.00 sec)


原因分析1:SQL线程阻塞了
Seconds_Behind_Master: 在增加
Exec_Master_Log_Pos: 668578674 没有变化
根据网上的资料,
http://blog.csdn.net/zbszhangbosen/article/details/8494921
Seconds_Behind_Master表示slave上SQL thread与IO thread之间的延迟,我们都知道在MySQL的复制环境中,slave先从master上将
binlog拉取到本地(通过IO thread),
然后通过SQLthread将binlog重放,而Seconds_Behind_Master表示本地relaylog中未被执行完的那部分的差值

原因分析2:查询slave没有将relaylog应用到数据库的原因

操作
1.mmm剔除db2
[root@JQ-pdt-host-3 ~]# mmm_control @3307 set_offline db2
无法关闭

2.查看错误日志
[root@JQ-pdt-host-217 ~]# tail -f /var/log/mysql/mysql_7.error
140501 19:32:05 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
end_log_pos 117120783, Error_code: 1032
140502 17:02:17 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
end_log_pos 233682298, Error_code: 1032
140502 17:55:03 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
end_log_pos 244489088, Error_code: 1032
140503 3:23:02 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
end_log_pos 308850228, Error_code: 1032
140503 4:15:10 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
end_log_pos 313199736, Error_code: 1032
140503 8:29:47 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
end_log_pos 332919503, Error_code: 1032
140503 18:47:15 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
end_log_pos 391309758, Error_code: 1032
140504 1:38:20 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
end_log_pos 431859737, Error_code: 1032
140505 9:30:08 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
end_log_pos 606849321, Error_code: 1032
140505 14:17:00 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
end_log_pos 638173055, Error_code: 1032
master
140328 9:32:19 [Note] Event Scheduler: Loaded 0 events
140328 9:32:19 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.5.30-log' socket: '/mysql/data7/mysql.sock' port: 3307 MySQL Community Server (GPL)
140328 9:44:30 [Note] Error reading relay log event: slave SQL thread was killed
140328 9:44:30 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (
server_errno=2013)
140328 9:44:30 [Note] Slave I/O thread killed while reading event
140328 9:44:30 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000153', position 496435278
140328 9:44:43 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000153' at position
496435278, relay log 'binlogs/mysql-relay-bin.000254' position: 959809
140328 9:44:43 [Note] Slave I/O thread: connected to master 'dbslave@192.168.201.3:3307',replication started in log
'mysql-bin.000153' at position 496435278
140328 15:04:47 [Note] Slave: received end packet from server, apparent master shutdown:
140328 15:04:47 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000153' at
position 532390356
140328 15:04:47 [ERROR] Slave I/O: error reconnecting to master 'dbslave@192.168.201.3:3307' - retry-time: 60
retries: 86400, Error_code: 2003
140328 15:07:47 [Note] Slave: connected to master 'dbslave@192.168.201.3:3307',replication resumed in log 'mysql-
bin.000153' at position 532390356
140328 15:44:47 [Note] Error reading relay log event: slave SQL thread was killed
140328 15:44:47 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (
server_errno=2013)
140328 15:44:47 [Note] Slave I/O thread killed while reading event
140328 15:44:47 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000154', position 4451889
140328 15:45:02 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000154' at position
4451889, relay log 'binlogs/mysql-relay-bin.000257' position: 253
140328 15:45:02 [Note] Slave I/O thread: connected to master 'dbslave@192.168.201.3:3307',replication started in log
'mysql-bin.000154' at position 4451889
报错信息都是比较老的

3.查看线程
mysql> show processlist |
| 2 | system user | | NULL | Connect | 2438 | Waiting for table level lock | NULL
|
| 30 | dbproxy | 192.168.201.218:36994 | ask | Query | 2431 | Waiting for table level lock |
SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
| 33 | dbproxy | 192.168.201.218:37000 | ask | Query | 2435 | Waiting for table level lock |
SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
| 37 | dbproxy | 192.168.201.243:55050 | ask | Query | 2420 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 41 | dbproxy | 192.168.201.243:55053 | ask | Query | 2408 | Waiting for table level lock |
select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 42 | dbproxy | 192.168.201.243:55059 | ask | Query | 2390 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 46 | dbproxy | 192.168.201.243:55062 | ask | Query | 2437 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 50 | dbproxy | 192.168.201.243:55065 | ask | Query | 2407 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 51 | dbproxy | 192.168.201.243:55069 | ask | Query | 2393 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 55 | dbproxy | 192.168.201.218:37714 | ask | Query | 2434 | Waiting for table level lock |
SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
| 82 | dbproxy | 192.168.201.243:55224 | ask | Query | 2425 | Waiting for table level lock |
select q.question_id,q.question_cate_id,q.question_title,q.question_subtitle,q.question_addtime from |
| 120 | dbproxy | 192.168.201.243:55455 | ask | Query | 2439 | User sleep |
select * from ask_question where question_id=''+(select(sleep(3)))+'' |
| 258 | dbproxy | 192.168.201.243:54612 | ask | Query | 2409 | Waiting for table level lock |
select * from ask_question where question_id=''+(select(sleep(3)))+'' |
| 259 | dbproxy | 192.168.201.243:54615 | ask | Query | 2436 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 1269 | dbproxy | 192.168.201.243:39716 | ask | Query | 2412 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 1270 | dbproxy | 192.168.201.243:39720 | ask | Query | 2416 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 1610 | dbproxy | 192.168.201.243:59369 | ask | Query | 2435 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 1614 | dbproxy | 192.168.201.243:59372 | ask | Query | 2414 | Waiting for table level lock |
select * from `ask_question` where question_id = '618401' |
| 2080 | dbproxy | 192.168.201.243:37535 | ask | Query | 2406 | Waiting for table level lock |
select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 2081 | dbproxy | 192.168.201.243:37538 | ask | Query | 2407 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4866 | dbproxy | 192.168.201.243:37556 | ask | Query | 2436 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4867 | dbproxy | 192.168.201.243:37559 | ask | Query | 2430 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4868 | dbproxy | 192.168.201.243:37562 | ask | Query | 2426 | Waiting for table level lock |
SELECT COUNT(*) AS `numrows`
FROM (`ask_question`)
WHERE `question_uid` = '1112844184'
AND `question |
| 4869 | dbproxy | 192.168.201.243:37565 | ask | Query | 2398 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4870 | dbproxy | 192.168.201.243:37568 | ask | Query | 2435 | Waiting for table level lock |
select q.question_id tid,q.question_username author,q.question_uid authorid,q.question_title subject |
| 4871 | dbproxy | 192.168.201.243:37571 | ask | Query | 2413 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4872 | dbproxy | 192.168.201.243:37574 | ask | Query | 2415 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4873 | dbproxy | 192.168.201.243:37577 | ask | Query | 2404 | Waiting for table level lock |
SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 4874 | dbproxy | 192.168.201.243:43949 | ask | Query | 2421 | Waiting for table level lock |
SELECT COUNT(1) AS count FROM ask_question WHERE (question_uid=793 OR question_expert_id=43) AND que |
| 4876 | dbproxy | 192.168.201.243:43971 | ask | Query | 2419 | Waiting for table level lock |
SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 4877 | dbproxy | 192.168.201.243:43974 | ask | Query | 2400 | Waiting for table level lock |
select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 4878 | dbproxy | 192.168.201.243:43977 | ask | Query | 2393 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 9496 | dbproxy | 192.168.201.218:34009 | ask | Query | 2430 | Waiting for table level lock |
SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
| 15948 | dbproxy | 192.168.201.243:45947 | ask | Query | 2433 | Waiting for table level lock |
SELECT COUNT(*) AS `numrows`
FROM (`ask_question`)
WHERE `question_uid` = '43606'
AND `question_cate |
| 15949 | dbproxy | 192.168.201.243:45952 | ask | Query | 2438 | Waiting for table level lock |
select q.*,c.category_name,c.category_parent_id from ask_question as q left join ask_category as c o |
| 15950 | dbproxy | 192.168.201.243:45956 | ask | Query | 2425 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 932995 | dbproxy | 192.168.201.243:33088 | ask | Query | 2401 | Waiting for table level lock |
select q.question_id,q.question_cate_id,q.question_title,q.question_subtitle,q.question_addtime from |
| 2055023 | dbproxy | 192.168.201.243:41700 | ask | Query | 2390 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055027 | dbproxy | 192.168.201.243:41717 | ask | Query | 2392 | Waiting for table level lock |
SELECT COUNT(1) AS count FROM ask_question WHERE question_uid=745 |
| 2055028 | dbproxy | 192.168.201.243:41721 | ask | Query | 2392 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055029 | dbproxy | 192.168.201.243:41724 | ask | Query | 2392 | Waiting for table level lock |
select q.question_id,q.question_cate_id,q.question_title,q.question_subtitle,q.question_addtime from |
| 2055030 | dbproxy | 192.168.201.243:41727 | ask | Query | 2390 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055031 | dbproxy | 192.168.201.243:41730 | ask | Query | 2378 | Waiting for table level lock |
select * from ask_question where question_id=''+(select(sleep(3)))+'' |
| 2055032 | dbproxy | 192.168.201.243:41733 | ask | Query | 2389 | Waiting for table level lock |
select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 2055033 | dbproxy | 192.168.201.243:41736 | ask | Query | 2378 | Waiting for table level lock |
SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 2055034 | dbproxy | 192.168.201.243:41739 | ask | Query | 2388 | Waiting for table level lock |
select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 2055036 | dbproxy | 192.168.201.243:41742 | ask | Query | 2388 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055039 | dbproxy | 192.168.201.243:41747 | ask | Query | 2383 | Waiting for table level lock |
select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 2055040 | dbproxy | 192.168.201.243:41752 | ask | Query | 2381 | Waiting for table level lock |
SELECT COUNT(*) AS `numrows`
FROM (`ask_question` as q)
LEFT JOIN `ask_question_attention` as qa ON |
| 2055044 | dbproxy | 192.168.201.243:41764 | ask | Query | 2379 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055047 | dbproxy | 192.168.201.243:41789 | ask | Query | 2377 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055051 | dbproxy | 192.168.201.243:41792 | ask | Query | 2371 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055052 | dbproxy | 192.168.201.243:41799 | ask | Query | 2376 | Waiting for table level lock |
SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 2055053 | dbproxy | 192.168.201.243:41804 | ask | Query | 2367 | Waiting for table level lock |
SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 2055054 | dbproxy | 192.168.201.243:41810 | ask | Query | 2373 | Waiting for table level lock |
SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055055 | dbproxy | 192.168.201.243:41819 | ask | Query | 2368 | Waiting for table level lock |
select * from `ask_question` where question_id = '626326' |
| 2055063 | dbproxy | 192.168.201.243:41843 | ask | Query | 2367 | Waiting for table level lock |
SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 2056431 | root | 127.0.0.1:60233 | NULL | Query | 0 | NULL | show
processlist |
+---------+-------------+-----------------------+------+---------+---------+----------------------------------

引起原因:锁表(全局表读锁)

4.kill线程
kill不掉

5.重启mysql
[root@JQ-pdt-host-2 ~]# mysqld_multi stop 7 --password=password
[root@JQ-pdt-host-2 ~]# mysqld_multi report 7
Reporting MySQL servers
MySQL server from group: mysqld7 is not running
[root@JQ-pdt-host-2 ~]# mysqld_multi start 7
[root@JQ-pdt-host-2 ~]# mysqld_multi report 7
Reporting MySQL servers
MySQL server from group: mysqld7 is running

6.确认复制
mysql> show slave statusG
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.201.216
Master_User: dbslave
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: mysql-bin.000109
Read_Master_Log_Pos: 674890825
Relay_Log_File: mysql-relay-bin.000033
Relay_Log_Pos: 66526
Relay_Master_Log_File: mysql-bin.000109
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql,test,performance_schema
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 674890825
Relay_Log_Space: 66682
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 33071
1 row in set (0.00 sec)
正常

7.确认mmm
[root@JQ-pdt-host-3 ~]# mmm_control @3307 show
db2(192.168.201.2) slave/ADMIN_OFFLINE. Roles:
db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.21), reader(192.168.201.22), reader(192.168.201.23)
db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
db217(192.168.201.217) slave/REPLICATION_DELAY. Roles:
db3(192.168.201.3) master/ONLINE. Roles:

[root@JQ-pdt-host-3 ~]# mmm_control @3307 set_online db2
OK: State of 'db2' changed to ONLINE. Now you can wait some time and check its new roles!
[root@JQ-pdt-host-3 ~]# mmm_control @3307 show
db2(192.168.201.2) slave/ONLINE. Roles: reader(192.168.201.22)
db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.21), reader(192.168.201.23)
db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
db217(192.168.201.217) slave/REPLICATION_DELAY. Roles:
db3(192.168.201.3) master/ONLINE. Roles:


8.另外一台场一样操(不用se offline)
[root@JQ-pdt-host-217 ~]# mysqld_multi stop 7 --password=password
[root@JQ-pdt-host-217 ~]# mysqld_multi report 7
Reporting MySQL servers
MySQL server from group: mysqld7 is not running
[root@JQ-pdt-host-217 ~]# mysqld_multi start 7
[root@JQ-pdt-host-3 ~]# mmm_control @3307 show
db2(192.168.201.2) slave/ONLINE. Roles: reader(192.168.201.22)
db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.23)
db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
db217(192.168.201.217) slave/ONLINE. Roles: reader(192.168.201.21)
db3(192.168.201.3) master/ONLINE. Roles:

9.联系开发 提交slow-log

在这个过程中后三台slave搞锁了


10.查询前端错误和访问日志
grep 'share.html' /var/log/nginx/ask_access.log
183.60.196.29 - - [05/May/2014:16:47:19 +0800] "GET /question/share.html?
question_id=2000000000%27%28%22%29%29%28%28%29%27%27 HTTP/1.1" 500 5 "-" "sqlmap/0.9 (http://sqlmap.sourceforge.net)"

183.60.196.29 - - [05/May/2014:16:47:19 +0800] "GET /question/share.html?question_id=2000000000%29%20AND
%20993%3D1067%20AND%20%289822%3D9822 HTTP/1.1" 500 1293 "-" "sqlmap/0.9 (http://sqlmap.sourceforge.net)"

183.60.196.29 - - [05/May/2014:16:47:19 +0800] "GET /question/share.html?question_id=2000000000%20AND%205341%3D5341
HTTP/1.1" 500 1293 "-" "sqlmap/0.9 (http://sqlmap.sourceforge.net)"
14.17.87.22 - - [05/May/2014:16:49:39 +0800] "GET /question/share.html?question_id=999999.9+union+all+select
+0x31303235343830303536%2C0x31303235343830303536%2C0x31303235343830303536%2C0x31303235343830303536%2C0x31303235343830
303536-- HTTP/1.1" 500 1293 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"
14.17.87.22 - - [05/May/2014:16:49:45 +0800] "GET /question/share.html?question_id=2000000000%3B+if+
%281%3D1%29+waitfor+delay+%2700%3A00%3A01%27-- HTTP/1.1" 500 5 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT
5.1; SV1; .NET CLR 2.0.50727) Havij"

14.17.87.22 - - [05/May/2014:16:49:45 +0800] "GET /question/share.html?question_id=2000000000+and+if
%281%3D1%2CBENCHMARK%28218400%2CMD5%280x41%29%29%2C0%29 HTTP/1.1" 500 1293 "-" "Mozilla/4.0 (compatible; MSIE 7.0;
Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij
14.17.87.22 - - [05/May/2014:16:49:46 +0800] "GET /question/share.html?question_id=2000000000%27+and+if%28Length
%28%28database%28%29%29%29%3C32%2CBENCHMARK%28105380%2CMD5%280x41%29%29%2C0%29+and+%27x%27%3D%27x HTTP/1.1" 500 1293
"-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"
14.17.87.22 - - [05/May/2014:16:49:46 +0800] "GET /question/share.html?question_id=2000000000%27+and+if%28Length
%28%28database%28%29%29%29%3C16%2CBENCHMARK%28105380%2CMD5%280x41%29%29%2C0%29+and+%27x%27%3D%27x HTTP/1.1" 500 1293
"-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"

14.17.87.22 - - [05/May/2014:16:49:46 +0800] "GET /question/share.html?question_id=2000000000%27+and+if%28Length
%28%28database%28%29%29%29%3C4%2CBENCHMARK%28105380%2CMD5%280x41%29%29%2C0%29+and+%27x%27%3D%27x HTTP/1.1" 500 1293
"-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"
121.8.210.24 - - [05/May/2014:18:43:53 +0800] "GET /question/share.html?question_id='and 1=(if(2>1,1,0)) and '
HTTP/1.1" 500 1293 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:43:57 +0800] "GET /question/share.html?
question_id='and 1=(if(2>1,1,')) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:44:01
+0800] "GET /question/share.html?question_id='and 1=(if(2>1,1,')) and ' HTTP/1.1" 500 5 "http://ask.qq.com"
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:44:03 +0800] "GET /question/share.html?question_id='and 1=(if(2>1,1,)) and '
HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:44:25 +0800] "GET /question/share.html?
question_id='and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1;
WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - -
[05/May/2014:18:44:26 +0800] "GET /question/share.html?question_id='and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1"
500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:44:27 +0800] "GET /question/share.html?
question_id='and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1;
WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - -
[05/May/2014:18:44:34 +0800] "GET /question/share.html?question_id=' and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1"
500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:44:35 +0800] "GET /question/share.html?
question_id=' and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT
6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - -
[05/May/2014:18:46:58 +0800] "GET /question/share.html?question_id=' union select(0)from(select(sleep(9))) HTTP/1.1"
500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:46:58 +0800] "GET /question/share.html?
question_id=' union select(0)from(select(sleep(9))) HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1;
WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - -
[05/May/2014:18:46:59 +0800] "GET /question/share.html?question_id=' union select(0)from(select(sleep(9))) HTTP/1.1"
500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:47:37 +0800] "GET /question/share.html?
question_id='%2b(select(0)from(select(sleep(0)))v)%2b' HTTP/1.1" 500 1293 "http://ask.qq.com" "Mozilla/5.0 (Windows
NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"


11.确定原因
SQL注入引起的, 日志内可见如下访问:
"GET /question/share.html?question_id=620779%20AND%20100%3E(SELECT%20COUNT(*)%20FROM
%20information_schema.columns,information_schema.columns%20T2,information_schema.columns
%20T3,information_schema.columns%20T4,information_schema.columns%20T5,information_schema.columns
%20T6,information_schema.columns%20T7,information_schema.columns%20T8,information_schema.columns
%20T9,information_schema.columns%20T10%20WHERE%207551=7551)%20AND%20(SELECT%20LENGTH(CURRENT_USER))%20%3C
%202147483647 HTTP/1.1" 200 5 "-" "-"
"GET /question/share.html?question_id='%2b(select(0)from(select(sleep(3)))v)%2b'%22%2b(select(0)from(select(sleep
(0)))v)%2b%22*/ HTTP/1.1" 500

排查 web_code/application/controllers/quesiton.php::share 对question_id未做SQL注入防范,已修正

12.解决办法
开发patch补丁


总结:
本次故障原因
1.开发未做SQL注入防范,漏洞被别人抓住
2.SQL注入引起slave锁表,进一步造成slave SQL线程无法正常工作,脱离MMM

MMM故障排查总结
1.查看MMM状态
2.查看MYSQL服务状态
3.查看单个mysql 同步状态和proceslist状态
4.查看单个mysql errorlog和slow log
5.前端access 和 error log

原文地址:https://www.cnblogs.com/diege/p/3664078.html