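Simulating a "Deadlock" in Oracle

In the spirit of putting the experiment first, we simulate a deadlock, then list the four necessary conditions for deadlock and the general strategies for handling it.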

1. Create two simple tables, t1_deadlock and t2_deadlock, each containing a single column a
sys@ora10g> conn sec/sec
Connected.
sec@ora10g> create table t1_deadlock (a int);

Table created.

sec@ora10g> create table t2_deadlock (a int);

Table created.

2. Insert a single row into each table
sec@ora10g> insert into t1_deadlock values (1);

1 row created.

sec@ora10g> insert into t2_deadlock values (2);

1 row created.

sec@ora10g> commit;

Commit complete.

3. In the first session (session1), update the row in t1_deadlock from 1 to 1000, without committing
sec@ora10g> update t1_deadlock set a = 1000 where a = 1;

1 row updated.

4. In the second session (session2), update the row in t2_deadlock from 2 to 2000, without committing
sec@ora10g> update t2_deadlock set a = 2000 where a = 2;

1 row updated.

5. So far nothing has gone wrong. Now watch what happens next: back in session1, update the row in t2_deadlock
sec@ora10g> update t2_deadlock set a = 2000 where a = 2;
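The statement hangs: this is a "lock wait" (blocking). The reason is simple: session2 has already run this update on the same row and still holds an uncommitted row-level lock on it.
Note that this is a lock wait, not a deadlock. Keep the two concepts distinct!

How to detect a lock wait was covered earlier; see "[Experiment] [LOCK] Simulating, Diagnosing and Handling Lock Waits" http://space.itpub.net/519536/viewspace-605526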
sec@ora10g> @lock

holder   holder         lock lock   lock lock request       blocked
username sessid SERIAL# type id1    id2  mode mode    BLOCK sessid
-------- ------ ------- ---- ------ ---- ---- ------- ----- -------
SEC         141    6921 TM    15160    0    3       0     0
SEC         141    6921 TX   393231 1672    6       0     1     145
SEC         145    7402 TM    15159    0    3       0     0
SEC         145    7402 TM    15160    0    3       0     0
SEC         145    7402 TX   131077 1675    6       0     0
            164       1 TS        3    1    3       0     0
            165       1 CF        0    0    2       0     0
            165       1 RS       25    1    2       0     0
            165       1 XR        4    0    1       0     0
            166       1 RT        1    0    6       0     0
            167       1 PW        1    0    3       0     0

11 rows selected.

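6. Now the deadlock we have been waiting for makes its entrance: in session2, update the row in t1_deadlock (at this point all four necessary conditions for deadlock hold; take a moment to see why)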
sec@ora10g> update t1_deadlock set a = 1000 where a = 1;
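This statement also hangs for a long time, but this time a deadlock has occurred!
Now look back at session1: the SQL statement that had been waiting there has failed with the following error: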
sec@ora10g> update t2_deadlock set a = 2000 where a = 2;
update t2_deadlock set a = 2000 where a = 2
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

Going a step further: the alert log contains the following record
Mon Aug 10 11:24:29 2009
ORA-00060: Deadlock detected. More info in file /oracle/app/oracle/admin/ora10g/udump/ora10g_ora_25466.trc.

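One step further still: let's see what the automatically generated trace file recorded.
The file contains 5721 lines; below is the leading portion that we care about (compare it with the output of the lock-wait detection script above and see what you can learn):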
/oracle/app/oracle/admin/ora10g/udump/ora10g_ora_25466.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options
ORACLE_HOME = /oracle/app/oracle/product/10.2.0/db_1
System name: Linux
Node name: testdb
Release: 2.6.18-53.el5xen
Version: #1 SMP Wed Oct 10 16:48:44 EDT 2007
Machine: x86_64
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 14
Unix process pid: 25466, image: oracle@testdb (TNS V1-V3)

*** 2009-08-10 11:24:29.541
*** ACTION NAME:() 2009-08-10 11:24:29.540
*** MODULE NAME:(SQL*Plus) 2009-08-10 11:24:29.540
*** SERVICE NAME:(SYS$USERS) 2009-08-10 11:24:29.540
*** SESSION ID:(145.7402) 2009-08-10 11:24:29.540
DEADLOCK DETECTED
[Transaction Deadlock]
Current SQL statement for this session:
update t2_deadlock set a = 2000 where a = 2
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00020005-0000068b        14     145     X             15     141           X
TX-0006000f-00000688        15     141     X             14     145           X
session 145: DID 0001-000E-0000037D session 141: DID 0001-000F-0000013D
session 141: DID 0001-000F-0000013D session 145: DID 0001-000E-0000037D
Rows waited on:
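
To line the deadlock graph up with the @lock output, it helps to know that the two hexadecimal fields in a resource name such as TX-00020005-0000068b are the same id1 and id2 values that the lock script prints in decimal. The small Python sketch below does the conversion. The function name decode_tx_resource is made up for this illustration. The split of id1 into an undo segment number (high 16 bits) and a slot number (low 16 bits) follows the usual description of TX locks and is included only as a reading aid.

```python
def decode_tx_resource(name):
    """Split a trace-file resource name such as 'TX-00020005-0000068b'.

    Hypothetical helper for this article, not part of any Oracle tool.
    """
    lock_type, id1_hex, id2_hex = name.split("-")
    id1 = int(id1_hex, 16)  # shown as "id1" in the lock script output
    id2 = int(id2_hex, 16)  # shown as "id2" in the lock script output
    return {
        "type": lock_type,
        "id1": id1,
        "id2": id2,
        # Usual interpretation of a TX id1: undo segment number and slot.
        "undo_segment": id1 >> 16,
        "slot": id1 & 0xFFFF,
    }


for name in ("TX-00020005-0000068b", "TX-0006000f-00000688"):
    info = decode_tx_resource(name)
    print(name, "-> id1 =", info["id1"], ", id2 =", info["id2"])
```

The first resource decodes to id1 = 131077, id2 = 1675, which is the TX lock held by session 145 in the @lock output. The second decodes to id1 = 393231, id2 = 1672, the TX lock held by session 141. So the graph says exactly what we did by hand: each session holds its own TX lock in X mode and waits for the other's.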

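7. What does all this tell us?
It shows that Oracle actively handles deadlocks instead of ignoring them in the manner of the "ostrich algorithm" mentioned below.
Note the following message in the trace file, which says that deadlocks are generally caused by the application or by what users do, and are not Oracle's fault: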
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL.

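8. The above demonstrates one way a deadlock can occur; there are of course many other situations that lead to one. So think the locking behaviour through carefully when designing an application.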

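9. [Further reading]
The four necessary conditions for deadlock
1) Mutual exclusion: a resource cannot be shared; only one process can use it at a time.
2) Hold and wait: a process that already holds resources may request new ones while keeping those it holds.
3) No pre-emption: a resource that has been allocated cannot be forcibly taken away from the process holding it.
4) Circular wait: a set of processes forms a cycle in which each process waits for a resource held by the next process in the cycle.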
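
The circular wait condition is the easiest one to check mechanically. The Python sketch below records who is waiting for whom in steps 5 and 6, using the session IDs from the @lock output (145 is session1, 141 is session2), and looks for a cycle. It only illustrates the idea of a wait-for graph; it makes no claim about how Oracle implements its own detection, and find_cycle is a name invented for this example.

```python
def find_cycle(waits_for):
    """Return the sessions that form a waiting cycle, or None.

    waits_for maps a waiting session to the session it is blocked by.
    Each session waits for at most one other session in this model,
    so following the chain from every starting point is sufficient.
    """
    for start in waits_for:
        seen = []
        node = start
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            # We came back to a session already on the chain: a cycle.
            return seen[seen.index(node):]
    return None


# Step 5: session1 (145) waits for the row locked by session2 (141).
after_step5 = {145: 141}
# Step 6: session2 (141) now also waits for the row locked by session1 (145).
after_step6 = {145: 141, 141: 145}

print(find_cycle(after_step5))  # None: a plain lock wait
print(find_cycle(after_step6))  # [145, 141]: a deadlock
```

This is also the difference between a lock wait and a deadlock in one picture: a lock wait is a chain that ends at a session which can still make progress, while a deadlock is a chain that loops back on itself.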

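General strategies for handling deadlock
1) Ignore the problem (the ostrich algorithm)
2) Detect deadlocks and recover from them
3) Avoid deadlock by allocating resources dynamically and carefully
4) Prevent deadlock by breaking one of the four necessary conditions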
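
A common way to apply strategy 4 is to break the circular wait condition by making every transaction acquire its locks in the same fixed order. The Python sketch below is only an analogy: two threads stand in for the two sessions and two threading.Lock objects stand in for the row locks on t1_deadlock and t2_deadlock. All names are chosen for this illustration.

```python
import threading

# One lock per table row; these stand in for Oracle's row-level locks.
locks = {
    "t1_deadlock": threading.Lock(),
    "t2_deadlock": threading.Lock(),
}
finished = []


def update_both(session, first, second):
    """Lock both resources, always in sorted name order."""
    # The caller's preferred order is ignored on purpose: because every
    # session locks t1_deadlock before t2_deadlock, no cycle can form.
    ordered = sorted([first, second])
    with locks[ordered[0]]:
        with locks[ordered[1]]:
            finished.append(session)


# session1 wants t1 then t2, session2 wants t2 then t1, as in the experiment.
s1 = threading.Thread(target=update_both,
                      args=("session1", "t1_deadlock", "t2_deadlock"))
s2 = threading.Thread(target=update_both,
                      args=("session2", "t2_deadlock", "t1_deadlock"))
s1.start()
s2.start()
s1.join(timeout=5)
s2.join(timeout=5)

print(sorted(finished))
```

Both threads finish, because whichever one gets the first lock can always get the second. In the experiment above, the equivalent fix would be for both sessions to update t1_deadlock before t2_deadlock; one of them would then simply wait for the other to commit or roll back.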

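10. Summary
Deadlocks can do serious damage in a database, so watch out for them!
To clean up after the demonstration above, run rollback in session1. Oracle has already broken the deadlock by rolling back only the statement that failed with ORA-00060; session1's first update is still uncommitted and still holds the row lock that session2 is waiting for, and the rollback releases it.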

Reposted from: http://www.cnblogs.com/tracy/archive/2011/08/24/2151594.html