System outage caused by a slow query blocking xtrabackup, which in turn blocked all subsequent SQL

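Developers reported that SQL statements on one database were timing out frequently and that the system was almost paralyzed, unable to perform any operation. I logged in to the database and first looked at the current threads, and found a large number of them in the state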

 Waiting for table flush

Checking the current transactions

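One transaction had been running since the previous day and still had not finished by this morning. I have not yet dug into the root cause. I first killed its thread, after which the backup's FLUSH TABLES succeeded, the backup completed, and the whole series of blocked SQL statements behind it ran normally.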

mysql> select * from information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 192611452
                 trx_state: RUNNING
               trx_started: 2017-11-30 18:33:58
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 3688
       trx_mysql_thread_id: 352932171
                 trx_query: DELETE FROM xx WHERE xx IN(SELECT xx
                                                FROM xx WHERE Remarks LIKE xx)
       trx_operation_state: unlock_row
         trx_tables_in_use: 2
         trx_tables_locked: 2
          trx_lock_structs: 3688
     trx_lock_memory_bytes: 368848
           trx_rows_locked: 4
         trx_rows_modified: 0
   trx_concurrency_tickets: 0
       trx_isolation_level: READ COMMITTED
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
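Spotting such a transaction can be automated. Below is a minimal sketch, assuming the rows of information_schema.innodb_trx have already been fetched into dictionaries keyed by the column names shown above. The function name long_running, the one-hour threshold, and the value of now are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def long_running(rows, now, threshold=timedelta(hours=1)):
    """Return (thread_id, age) pairs for transactions at least `threshold` old, oldest first."""
    found = []
    for row in rows:
        # trx_started uses the format shown in the output above.
        started = datetime.strptime(row["trx_started"], "%Y-%m-%d %H:%M:%S")
        age = now - started
        if age >= threshold:
            found.append((row["trx_mysql_thread_id"], age))
    return sorted(found, key=lambda item: item[1], reverse=True)


# The one row from the output above, reduced to the two columns used here.
rows = [{"trx_started": "2017-11-30 18:33:58", "trx_mysql_thread_id": 352932171}]
now = datetime(2017, 12, 1, 9, 36, 0)  # hypothetical "current time" for the example

for thread_id, age in long_running(rows, now):
    print(thread_id, age)
```

The thread id it reports is the value of trx_mysql_thread_id, which is the id to pass to KILL when releasing the thread by hand.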

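Then I remembered that a physical backup runs every day at 2 a.m., so I checked the backup log and found that the transaction above had indeed blocked the physical backup.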

The full flow of the physical backup

First, record the current redo log sequence number (LSN)
171201 02:00:02 >> log scanned up to (54138135415)
xtrabackup: Generating a list of tablespaces
xtrabackup: using the full scan for incremental backup
xtrabackup: Starting 4 threads for parallel data files transfer
Then back up the InnoDB tables
171201 02:00:12 [01] Copying .
Once that is done, run FLUSH TABLES. It was blocked, so it only succeeded after the transaction had been released
171201 02:00:17 Executing FLUSH NO_WRITE_TO_BINLOG TABLES...
Next, back up the non-transactional tables
171201 09:36:13 Executing FLUSH TABLES WITH READ LOCK...
171201 09:36:13 >> log scanned up to (54147795188)
171201 09:36:14 Starting to backup non-InnoDB tables and files
171201 09:36:14 [01] Copying ....
xtrabackup: The latest check point (for incremental): '54138858140'
xtrabackup: Stopping log copying thread.
.171201 09:36:14 >> log scanned up to (54147795198)
171201 09:36:14 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
When the backup is finished, release the table lock
171201 09:36:14 Executing UNLOCK TABLES
171201 09:36:14 All tables unlocked
171201 09:36:14 [00] Copying ib_buffer_pool to xxx
171201 09:36:14 [00] ...done
171201 09:36:14 Backup created in directory xxxx
MySQL binlog position: xxx
171201 09:36:14 [00] Writing backup-my.cnf
171201 09:36:14 [00] ...done
171201 09:36:14 [00] Writing xtrabackup_info
171201 09:36:14 [00] ...done
xtrabackup: Transaction log of lsn (54138129801) to (54147795198) was copied.
171201 09:36:15 completed OK!

The blocked statement was FLUSH NO_WRITE_TO_BINLOG TABLES: it was issued at 02:00:17, and the next log entry does not appear until 09:36:13.
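The stalled step can also be located mechanically by looking for the longest pause between consecutive timestamped log lines. The following is a rough sketch, assuming the "YYMMDD HH:MM:SS message" line format seen in this log. The function name largest_gap is hypothetical, and lines without a leading timestamp are simply skipped.

```python
import re
from datetime import datetime

# Matches lines such as "171201 02:00:17 Executing FLUSH NO_WRITE_TO_BINLOG TABLES..."
STAMP = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) (.*)$")


def largest_gap(lines):
    """Return (seconds, message) for the timestamped line followed by the longest pause."""
    events = []
    for line in lines:
        match = STAMP.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%y%m%d %H:%M:%S")
            events.append((when, match.group(2)))
    best = (0.0, None)
    # Compare each timestamped line with the next one.
    for (start, message), (end, _) in zip(events, events[1:]):
        gap = (end - start).total_seconds()
        if gap > best[0]:
            best = (gap, message)
    return best


# A few lines taken from the backup log above.
log = [
    "171201 02:00:02 >> log scanned up to (54138135415)",
    "171201 02:00:12 [01] Copying .",
    "171201 02:00:17 Executing FLUSH NO_WRITE_TO_BINLOG TABLES...",
    "171201 09:36:13 Executing FLUSH TABLES WITH READ LOCK...",
    "171201 09:36:14 Executing UNLOCK TABLES",
]

seconds, message = largest_gap(log)
print(seconds, message)
```

On these lines the longest pause follows the FLUSH NO_WRITE_TO_BINLOG TABLES entry and lasts 27356 seconds, roughly seven and a half hours.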

The official documentation describes FLUSH TABLES as follows

Closes all open tables, forces all tables in use to be closed, and flushes the query cache and prepared statement cache.

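Nothing here mentions locking. However, testing shows that while a query or a data change is still running, a FLUSH TABLES issued from another session is blocked. This is consistent with "forces all tables in use to be closed": the flush cannot finish until the running statement releases its tables.

From that point on, any SQL that touches a table used by the slow query is blocked as well.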
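The chain of waits can be illustrated with a toy model. This is only a simplified sketch of the behaviour observed in the test above, not MySQL's actual implementation. The function thread_states, the statement names, and the table name yy are all made up for the example.

```python
def thread_states(statements):
    """Assign a state to each statement, given in start order.

    Each statement is (name, kind, table, finished), where kind is
    "query" or "flush" and table is None for a flush.
    """
    states = {}
    busy_tables = set()   # tables held open by statements that are still running
    stale_tables = set()  # tables a pending FLUSH TABLES is waiting to close
    for name, kind, table, finished in statements:
        if kind == "flush":
            if busy_tables:
                # The flush must wait for every table still in use.
                stale_tables |= busy_tables
                states[name] = "Waiting for table flush"
            else:
                states[name] = "done"
        elif table in stale_tables:
            # A later statement on a table with a pending flush queues behind it.
            states[name] = "Waiting for table flush"
        elif finished:
            states[name] = "done"
        else:
            busy_tables.add(table)
            states[name] = "running"
    return states


statements = [
    ("slow_delete", "query", "xx", False),   # the long-running DELETE
    ("backup_flush", "flush", None, False),  # FLUSH TABLES from the backup
    ("app_select", "query", "xx", False),    # later SQL on the same table
    ("other_select", "query", "yy", True),   # SQL on an unrelated table
]

for name, state in thread_states(statements).items():
    print(name, "->", state)
```

In this model only statements that touch a table held by the slow query pile up behind the flush, which matches the large number of threads in the Waiting for table flush state seen at the start.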

          

Original article: https://www.cnblogs.com/Bccd/p/7940809.html