MySQL死锁

https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks.html

什么是mysql的死锁?

A deadlock is a situation where different transactions are unable to proceed because each holds a lock that the other needs. Because both transactions are waiting for a resource to become available, neither ever release the locks it holds. 

简单来说可以提炼出2个词:环路等待(each holds a lock that the other needs)和不可剥夺(neither ever release the locks it holds)。

其实广泛意义上死锁的四个必要条件也可以直接简化为上述两个条件,剩下的互斥和请求保持条件只是两个众所周知的补充。

一、一个简单的死锁示例:

会话A:

mysql> CREATE TABLE t (i INT) ENGINE = InnoDB;
Query OK, 0 rows affected (1.07 sec)

mysql> INSERT INTO t (i) VALUES(1);
Query OK, 1 row affected (0.09 sec)

mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t WHERE i = 1 LOCK IN SHARE MODE;
+------+
| i    |
+------+
| 1    |
+------+

会话B:

mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

mysql> DELETE FROM t WHERE i = 1;

此时会话B会被阻塞(直到锁请求超时)。

此时会话A继续执行:
DELETE FROM t WHERE i = 1;

会话B会被立马rollback,因为产生了死锁,最近的死锁信息可以通过show engine innodb statusG看到。

打开innodb_print_all_deadlocks参数之后,死锁信息还会在error日志里打印。鉴于本例过于简单就不占用篇幅分析死锁信息了。

set @@global.innodb_print_all_deadlocks=on;

innodb会选择耗费资源较少的事务进行回滚(取决于DML涉及的行数和size)。

二、一个实际的死锁示例:

error日志里显示的死锁日志为:

InnoDB: transactions deadlock detected, dumping detailed information.
*** (1) TRANSACTION:
TRANSACTION 209262583957, ACTIVE 1 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 129183854, OS thread handle 0x7f1aeae7a700, query id 68320628504 <服务器A信息> updating
update  tb_authorize_info set account_balance=account_balance-  100.00 
     where (SELECT a.account_balance from 
(select account_balance from tb_authorize_info a where appId =  '49E5BD695F853DC3' )a)  -  100.00 > 0 
 and appId = '49E5BD695F853DC3'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1845 page no 4 n bits 96 index `PRIMARY` of table `xxx`.`tb_authorize_info` trx id 209262583957 lock_mode X locks rec but not gap waiting
Record lock, heap no 18 PHYSICAL RECORD: n_fields 32; compact format; info bits 0
......

*** (2) TRANSACTION:
TRANSACTION 209262584968, ACTIVE 1 sec starting index read
mysql tables in use 2, locked 2
4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 129183879, OS thread handle 0x7f198b208700, query id 68320632234 <服务器B信息> updating
update  tb_authorize_info set account_balance=account_balance-  100.00 
     where (SELECT a.account_balance from 
(select account_balance from tb_authorize_info a where appId =  '49E5BD695F853DC3' )a)  -  100.00 > 0 
 and appId = '49E5BD695F853DC3'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1845 page no 4 n bits 96 index `PRIMARY` of table `xxx`.`tb_authorize_info` trx id 209262584968 lock mode S locks rec but not gap
Record lock, heap no 18 PHYSICAL RECORD: n_fields 32; compact format; info bits 0
......

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1845 page no 4 n bits 96 index `PRIMARY` of table `xxx`.`tb_authorize_info` trx id 209262584968 lock_mode X locks rec but not gap waiting
Record lock, heap no 18 PHYSICAL RECORD: n_fields 32; compact format; info bits 0
......

*** WE ROLL BACK TRANSACTION (2)

这个死锁属于简单的死锁,由于网络或其他延迟导致应用请求发送到了2台负载均衡的应用服务器,两个应用程序同时请求数据库执行SQL,两者都根据where条件先获取到了S锁,然后准备升级为(或新增)X锁以便更新,但是各自被对方的S锁阻塞,因此形成死锁,不过死锁很快被mysql杀掉,事务1正常执行完毕,事务二回滚,前台业务除了一点点延迟基本没啥影响。

三、stackoverflow上另一个死锁:

有人在stackoverflow上发了一个死锁的信息,尝试直接解析此类信息对分析高并发下的SQL卡慢会有帮助因此尝试自己解析。

https://dba.stackexchange.com/questions/39550/when-and-why-can-this-kind-of-deadlock-occur

LATEST DETECTED DEADLOCK
------------------------
130409  0:40:58
*** (1) TRANSACTION:
TRANSACTION 3D61D41F, ACTIVE 3 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 43 lock struct(s), heap size 6960, 358 row lock(s), undo log entries 43
MySQL thread id 17241690, OS thread handle 0x7ffd3469a700, query id 860259163 localhost root update
#############
INSERT INTO `notification` (`other_grouped_notifications_count`, `user_id`, `notifiable_type`, `action_item`, `action_id`, `created_at`, `status`, `updated_at`) 
VALUES (0, 4442, 'MATCH', 'MATCH', 224716, 1365448255, 1, 1365448255)
#############
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 272207 n bits 1272 index `user_id` of table `notification` trx id 3D61D41F lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 69 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 8000115b; asc    [;;
 1: len 4; hex 0005e0bb; asc     ;;
-- 事务1欲插入数据user_id=4442,因此首先获取了对应主键(lower_bound,4443]范围上的插入意向锁,然后想要在辅助索引(lower_bound,4443]的范围上加insert intention lock,但被阻塞,推断这个范围上已经有了其他事务的行锁
-- 事务1需要获取2个插入意向锁后才会开始插入操作,这两个锁的获取是不可分割的
*** (2) TRANSACTION:
TRANSACTION 3D61C472, ACTIVE 15 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1248, 2 row lock(s)
MySQL thread id 17266704, OS thread handle 0x7ffd34b01700, query id 860250374 localhost root Updating
#############
UPDATE `notification` SET `status`=0 WHERE user_id = 4443 and status=1
#############
*** (2) HOLDS THE LOCK(S):
-- 事务2的update语句要更新user_id=4443的记录,因此首先在user_id索引的(lower_bound,4443]范围添加了X模式的next-key行锁,事务1就是被这个next-key行锁阻塞的
RECORD LOCKS space id 0 page no 272207 n bits 1272 index `user_id` of table `notification` trx id 3D61C472 lock_mode X
Record lock, heap no 69 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 8000115b; asc    [;;
 1: len 4; hex 0005e0bb; asc     ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
-- 当事务2尝试更新主键数据时要获取user_id=4443对应主键的行锁,但是发现主键的(lower_bound,4443]范围上已经被事务1加了insert intention lock,因此被阻塞
-- 同样事务2获取辅助索引的next-key和主键的record锁也是不可分割的,只有都获取完毕才能进行update
RECORD LOCKS space id 0 page no 261029 n bits 248 index `PRIMARY` of table `notification` trx id 3D61C472 lock_mode X locks rec but not gap waiting
Record lock, heap no 161 PHYSICAL RECORD: n_fields 16; compact format; info bits 0
 0: len 4; hex 0005e0bb; asc     ;;
 1: len 6; hex 00000c75178f; asc    u  ;;
 2: len 7; hex 480007c00c1d10; asc H      ;;
 3: len 4; hex 8000115b; asc    [;;
 4: len 8; hex 5245474953544552; asc REGISTER;;
 5: SQL NULL;
 6: SQL NULL;
 7: SQL NULL;
 8: len 4; hex d117dd91; asc     ;;
 9: len 4; hex d117dd91; asc     ;;
 10: len 1; hex 80; asc  ;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;
 14: SQL NULL;
 15: len 4; hex 80000000; asc     ;;

*** WE ROLL BACK TRANSACTION (2)

所以这个死锁的出现就很容易理解了,事务1先获取了4442位置主键的插入意向锁,在获取辅助索引上的插入意向锁时被事务2 update语句的next-key行锁阻塞导致插入意向锁获取失败,而事务2的update获取了索引的next-key行锁后尝试更新主键(即在主键上加非gap行锁)却被事务1的插入意向锁阻塞。

两个事务都不能放弃自己已有的资源,都请求与对方不兼容的锁,不可剥夺且形成环路等待因此死锁。
这个死锁的根源就在于事务2的update语句持续的时间过长,导致后继insert语句卡死。

四、如何避免死锁?

其实官网有一篇完整的介绍:https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks-handling.html

但是内容有点多,我还是习惯用几句话总结下:

1、尽可能优化SQL的查询性能使得事务尽可能的短小。

2、如果不介意幻读可以使用read committed隔离级别以禁止范围锁。

3、如果前两者都做不到或者SQL优化的空间比较小,那么尽量分表分库,通过增加资源(或者叫分散资源)减少资源冲突的几率。

五、总结:

由于mysql innodb特殊的行锁机制,死锁通常都是涉及到插入意向锁和next-key锁的,因为这两个锁是范围锁,范围锁设计的目的就是为避免幻读,这会锁定一些自己不需要操作的记录。

不过在mysql中死锁从来都不是大问题,死锁通常都是数据库卡慢的果,而非因。而且由于数据库中普遍存在的死锁查杀机制,死锁产生后会很快被查杀。

真正可能引发数据库性能问题的,是高并发下的长事务,这种事务会导致undo等资源的争用,会占用binlog的提交队列导致后继事务处于commit阶段无法提交,即便强制kill也会引发长时间的rollback操作。

因此高并发下的长事务和低性能SQL才是死锁的主因,因为他们慢且作为一个整体在完成之前不会释放资源产生环路等待。

原文地址:https://www.cnblogs.com/leohahah/p/12067100.html