High waits on control file sequential read

High waits on control file sequential read (文档 ID 2277867.1)

In case we run into an issue where control file sequential read is the top event on the AWR report and we cannot see I/O issues on tablespace stats.

Example:

Tablespace IO Stats
ordered by IOs (Reads + Writes) desc
Tablespace Reads Av Rds/s Av Rd(ms) Av Blks/Rd 1-bk Rds/s Av 1-bk Rd(ms) Writes Writes avg/s Buffer Waits Av Buf Wt(ms)
SYSAUX 23,798 3 0.33 1.00 1,247 3.30 0 0 0 0.00
SYSTEM 13,242 2 0.27 1.00 155 1.84 0 0 0 0.00

Top 10 Foreground Events by Total Wait Time

Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
DB CPU 172.9 46.3
control file sequential read 91,690 48.5 1 13.0 System I/O >>>>> almost 50% of db time!
gc current block busy 13 35 2692 9.4 Cluster

The control file sequential read Oracle metric indicates the process is waiting for blocks to be read from a control file. This happens in many cases. For example, while:

making a backup of the controlfiles -- rman in process
sharing information (between instances) from the controlfile -- RAC system
reading other blocks from the controlfiles
reading the header block

Because the controlfile contains the SCN's for the last transactions, the controlfiles are constantly being updated. Normally, the I/O associated with the controlfile is small, but you can see high controlfile waits in the following circumstances:

High DML periods - The controlfiles are continuously updated during DML (update) activity.
Conflicting DML - One common cause of high controlfile write waits is running batch updates while running an Oracle backup.
Bad disk placement - Placing several copies of a controlfile on the same disk platter can cause disk enqueues and a system-wide slowdown.
Too many control files - Remember, you have to write to all controlfiles.
Manual COMMITS and log switches - Custom code that issues frequent COMMIT processing must write to the controlfile, as also when switching log files.

Looking at the AWR report we can see:

IOStat by Filetype summary

'Data' columns suffixed with M,G,T,P are in multiples of 1024 other columns suffixed with K,M,G,T,P are in multiples of 1000
Small Read and Large Read are average service times, in milliseconds
Ordered by (Data Read + Write) desc

Filetype Name Reads: Data Reqs per sec Data per sec Writes: Data Reqs per sec Data per sec Small Read Large Read
Control File 71.1G 23.40 3.369M 1.2G 2.82 .056M 2.60 17.80

This means 71.1G on controlfile reads which is very huge.

rman@apsrp0189 (TNS V1-V3) begin dbms_rcvman.getRequiredS...
29,781 29,781 167 178.33 0.00 12.41 5mw55dp9p6nyt rman@apsrp0189 (TNS V1-V3) select nvl(max(al.recid), '0')...
11,022 11,022 167 66.00 0.00 4.59 gf87bnxzf9raf rman@apsrp0189 (TNS V1-V3) select db.version_time, db.con...
9,872 9,872 54 182.81 0.00 4.11 7u9ndtp0gcmu8 rman@apsrp0189 (TNS V1-V3) SELECT NVL(MAX(AL.NEXT_CHANGE#...

This means a backup is going on during that period this could be the cause of the high number of reads.

We need to examine:

1. Where the controlfiles are placed:

Are all in the same disk?

control_files +REDO01/<instance_name>/controlfile/current.256.856619303, +REDO02/spppr01/controlfile/current.256.856619303

2. Size of the control files

SQL> SELECT name FROM gv$controlfile;

SQL> SELECT * FROM gv$system_event WHERE event LIKE '%control%

SQL> SELECT event, wait_time, p1, p2, p3 FROM gv$session_wait WHERE event LIKE '%control%';

SQL> ALTER SESSION set events 'immediate trace name controlf level 3';

3. Check "Control File" for "IOStat by Filetype summary" of AWR report and compare the figure for both good and bad time: is there a big difference between good & bad time?

5. If log switch is slow (you can observe it in alert log), and you may also see excessive controlfile reads.

sampe 1:


IOStat by Function summary
'Data' columns suffixed with M,G,T,P are in multiples of 1024 other columns suffixed with K,M,G,T,P are in multiples of 1000
ordered by (Data Read + Write) desc
Function Name Reads: Data Reqs per sec Data per sec Writes: Data Reqs per sec Data per sec Waits: Count Avg Tm(ms)
RMAN 243G 106.15 70.11M 1M 0.02 0M 386           42.13
Others 530M 9.47 .149M 36M 0.62 .01M 34.7K          126.45   <-rman 备份调用了大部分资源,导致AVG time 只有126  ms 和 134 ms
LGWR 63M 1.13 .018M 32M 0.63 .009M 5270      134.38
DBWR 0M 0.00 0M 5M 0.18 .001M 0
Direct Reads 1M 0.02 0M 1M 0.02 0M 0
Buffer Cache Reads 1M 0.03 0M 0M 0.00 0M 104 120.76
Direct Writes 0M 0.00 0M 0M 0.01 0M 0
TOTAL: 243.6G 116.79 70.277M 75M 1.48 .021M 40.5K 126.66

Top 10 Foreground Events by Total Wait Time


Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
control file sequential read 28,888 3715.7 129 99.1 System I/O                             <- 间接导致控制文件读 I/O 达到  129 ms.
log file sync 179 106.7 596 2.8 Commit

查看使用到大量control file sequential read 的sql 消耗时间达到了10s 以上。

这是正常现象。

原文地址:https://www.cnblogs.com/feiyun8616/p/8159215.html