ORACLE CBC LATCH 检查

###############
1.DB meet latch: cache buffers chains event

from awr report ,check
latch: cache buffers chains
point to "kcbgtcr: fast path
As far as I remember, this code path only happens for indexe.since it is a hot block:

2.https://blog.pythian.com/tuning-latch-contention-cache-buffers-chain-latches/ (重要)
following : no use ,beacuse sql_id can't point sql_text.it is starnge ,no sql_text in v$sql.

select /*+ materialize*/* from DBA_HIST_ACTIVE_SESS_HISTORY
where
sample_time between to_date('2018-12-14 16:00:00','yyyy-mm-dd hh24:mi:ss')
and to_date('2018-12-14 16:30:00','yyyy-mm-dd hh24:mi:ss')
and event='latch: cache buffers chains'

2.1.following : CBC point to RANGE SCAN INDEX

p1:504403164211105256
p2:155
PLAN:INDEX
SQL_PLAN:RANGE SCAN

http://tool.oschina.net/hexconvert/

将504403164211105256 从10进制转化成 16 机制70000016262ce00

select hladdr, file#, dbablk, decode(state,1,'cur ',3,'CR',state) ST, tch
from x$bh where hladdr in
(select addr from (select addr from v$latch_children where addr='70000016262ce00'
order by sleeps, misses,immediate_misses desc )where rownum <2)

4.no time to check hot block,because it is prod, we first kill all session to make app normal.
MOSC has this script to locate a hot block:

select /*+ RULE */
e.owner ||'.'|| e.segment_name segment_name,
e.extent_id extent#,
x.dbablk - e.block_id + 1 block#,
x.tch,
l.child#
from
sys.v$latch_children l,
sys.x$bh x,
sys.dba_extents e
where
x.hladdr = 'ADDR' and
e.file_id = x.file# and
x.hladdr = l.addr and
x.dbablk between e.block_id and e.block_id + e.blocks -1

##########change to awr find the suspect index segment, find the oorder by logicel read segment index
Segments by Logical Reads DB/Inst: EMS/ems Snaps: 51672-51673
EMSOPR EMS_DATA FND_ATM_ATTACHMENT_M INDEX 312,744,640 56.88

suspect the hot block is FND_ATM_ATTACHMENT_MULTI_N1
I_name:FND_ATM_ATTACHMENT_MULTI_N1
T_name:FND_ATM_ATTACHMENT_MULTI

##########ps

Collection commands for Hanganalyze and Systemstate: Non-RAC:
Sometimes, database may actually just be very slow and not actually hanging. It is therefore recommended, where possible to get 2 hanganalyze and 2 systemstate dumps in order to determine whether processes are moving at all or whether they are "frozen".

Hanganalyze
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait one minute before getting the second hanganalyze
oradebug hanganalyze 3
oradebug tracefile_name
exit

Systemstate
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
oradebug dump systemstate 266
oradebug tracefile_name
exit

Collection commands for Hanganalyze and Systemstate: RAC
There are 2 bugs affecting RAC that without the relevant patches being applied on your system, make using level 266 or 267 very costly. Therefore without these fixes in place it highly unadvisable to use these level

For information on these patches see:
Document 11800959.8 Bug 11800959 - A SYSTEMSTATE dump with level >= 10 in RAC dumps huge BUSY GLOBAL CACHE ELEMENTS - can hang/crash instances
Document 11827088.8 Bug 11827088 - Latch 'gc element' contention, LMHB terminates the instance

Note: both bugs are fixed in 11.2.0.3.

Collection commands for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088
For 11g:
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266
oradebug -g all dump systemstate 266
exit
Collection commands for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088
sqlplus '/ as sysdba'
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit
For 10g, run oradebug setmypid instead of oradebug setorapname reco:
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 258
oradebug -g all dump systemstate 258
exit
In RAC environment, a dump will be created for all RAC instances in the DIAG trace file for each instance.

#########sample 1:

刚开始以为是CBC LATCH . 后面发现还是SQL 性能问题，需要调优SQL ,因为 statuts 是"WAITED SHORT TIME", so "latch: cache buffers chains" is NOT current want event, it is last wait event

<==== session "State" column shows "WAITED SHORT TIME", so "latch: cache buffers chains" is NOT current want event, it is last wait event

V$SESSION_WAIT中的常用列

SID: session标识 EVENT: session当前等待的事件，或者最后一次等待事件。 WAIT_TIME: session等待事件的时间(单位，百分之一秒)如果本列为0，说明session当前session还未有任何等待。 SEQ#: session等待事件将触发其值自增长 P1, P2, P3: 等待事件中等待的详细资料 P1TEXT, P2TEXT, P3TEXT: 解释说明p1,p2,p3事件

附注： 1.State字段有四种含义﹕ Waiting：SESSION正等待这个事件。 Waited unknown time：由于设置了timed_statistics值为false，导致不能得到时间信息。表示发生了等待，但时间很短。 Wait short time：表示发生了等待，但由于时间非常短不超过一个时间单位，所以没有记录。 Waited knnow time：如果session等待然后得到了所需资源，那么将从waiting进入本状态。

Wait_time值也有四种含义：值>0：最后一次等待时间(单位：10ms)，当前未在等待状态。值=0：session正在等待当前的事件。值=-1：最后一次等待时间小于1个统计单位，当前未在等待状态。值=-2：时间统计状态未置为可用，当前未在等待状态。

3.Wait_time和Second_in_wait字段值与state相关: 如果state值为Waiting，那么wait_time值无用。Second_in_wait值

<==== session "State" column shows "WAITED SHORT TIME", so "latch: cache buffers chains" is NOT current want event, it is last wait event