log file sync

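<pre name="code" class="sql"><pre name="code" class="sql"><pre name="code" class="sql">What is the log file sync wait event? It typically appears in databases that commit very frequently. When it shows up in the Top 5 wait events, it should be analyzed and tuned promptly: if the average log file sync wait climbs from a few milliseconds to more than 20 ms, system performance can degrade sharply and the system may even hang briefly.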

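When a session commits a transaction, its redo must be flushed to the redo log file. The user session posts LGWR to write the log buffer to the redo log file, and LGWR posts the user session back once the write completes.

Wait time: includes the log buffer write and the post (notification) between the processes.

log file sync waits are normally very short, 1-5 ms. An average wait below 5 ms is not a concern, but once a problem does appear it is often hard to resolve.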

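1. What is the real culprit behind log file sync?
Frequent commit/rollback, or a disk I/O problem with heavy contention from physical reads and writes.

2. Is an average log file sync wait of 7 ms normal? Which metrics are used to evaluate the log file sync wait event?
For OLTP it is still acceptable, but for batch processing it is somewhat slow.
The metrics are the average wait time and the Wait Event Histogram section further down in the AWR report.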
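
Both metrics can also be read directly from the dynamic performance views. The following is a minimal sketch, assuming the session has SELECT privilege on V$SYSTEM_EVENT and V$EVENT_HISTOGRAM. The figures are cumulative since instance startup, so they will not match an AWR interval exactly.

```sql
-- Average log file sync wait since instance startup, in milliseconds.
select event,
       total_waits,
       round(time_waited_micro / 1000 / nullif(total_waits, 0), 2) as avg_wait_ms
  from v$system_event
 where event = 'log file sync';

-- Wait time distribution: each row counts the waits that fell into the
-- bucket whose upper bound is wait_time_milli milliseconds.
select wait_time_milli,
       wait_count
  from v$event_histogram
 where event = 'log file sync'
 order by wait_time_milli;
```

A histogram with most waits in the 1-4 ms buckets and a long tail of slow waits tells a different story from one that is uniformly slow, which is why the average alone is not enough.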


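Common optimizations:
(1) Improve redo log I/O performance: use fast disks where possible, and do not place redo log files on RAID 5 disks;
(2) Increase the log buffer;
(3) Use batch commits to reduce the number of commits;
(4) Switch some frequently committing transactions to asynchronous commit;
(5) Use options such as NOLOGGING/UNRECOVERABLE where appropriate;
(6) Use a dedicated network and set the network UDP buffer parameters correctly;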
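
Item (4) can be sketched with the WRITE clause of COMMIT, available since Oracle 10g Release 2. The table `app_log` and its columns are hypothetical and used only for illustration. Note that asynchronous commit weakens durability: a transaction reported as committed could be lost if the instance fails before LGWR writes its redo, so this is only an option for data that can tolerate that.

```sql
-- Hypothetical table, for illustration only.
insert into app_log (id, msg) values (1, 'non-critical event');

-- NOWAIT: return to the session without waiting for LGWR to finish the write.
-- BATCH:  let the redo be buffered and written together with other redo.
commit write batch nowait;
```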

The wait is caused either by slow disk writes or by overly frequent commits.

Top 5 Timed Events

Event                      Waits       Time(s)   Avg Wait(ms)   % Total Call Time   Wait Class
log file sync              1,399,500   91,457    65             67.7                Commit
CPU time                               39,269                   29.1
db file sequential read    966,426     4,649     5              3.4                 User I/O
log file parallel write    113,657     2,614     23             1.9                 System I/O
db file scattered read     242,833     1,050     4              0.8                 User I/O


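When a user commits or rolls back, the session's redo must be written out to the redo log file.
The user process notifies LGWR to perform the write, and LGWR notifies the user process when it has finished.
The log file sync wait event is the time the user process spends waiting for that completion notice from LGWR.

For a rollback, the event records the time from when the user issues the rollback command until the rollback completes.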

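If this wait is excessive, it may mean that LGWR writes are inefficient or that the system commits too often.
To investigate, look at:
the log file parallel write wait event
the user commits and user rollbacks statistics, which show the number of commits and rollbacks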
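
The two checks above can be sketched as queries against V$SYSTEM_EVENT and V$SYSSTAT. This is an illustrative sketch rather than a complete diagnostic script. The values are cumulative since instance startup, so take two samples and compare the deltas to get rates.

```sql
-- Compare what sessions wait for (log file sync) with what LGWR itself
-- spends on the physical write (log file parallel write).
select event,
       total_waits,
       round(time_waited_micro / 1000 / nullif(total_waits, 0), 2) as avg_wait_ms
  from v$system_event
 where event in ('log file sync', 'log file parallel write');

-- Commit and rollback counts.
select name, value
  from v$sysstat
 where name in ('user commits', 'user rollbacks');
```

If the log file parallel write average is close to the log file sync average, the time is mostly going to redo I/O. A large gap between the two suggests looking beyond the disks, for example at CPU starvation of LGWR or at the commit rate.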

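Solutions:
1. Improve LGWR performance
Use fast disks where possible, and do not place redo log files on RAID 5 disks
2. Use batch commits
3. Use options such as NOLOGGING/UNRECOVERABLE where appropriate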
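
Batch commit (solution 2) can be sketched in PL/SQL as follows. The tables `src_orders` and `dst_orders`, their columns, and the batch size of 1000 are all assumptions made for illustration. Only batch rows that the application can safely commit together.

```sql
declare
  c_batch_size constant pls_integer := 1000;  -- assumed batch size
  v_pending    pls_integer := 0;              -- rows inserted since the last commit
begin
  -- src_orders and dst_orders are hypothetical tables.
  for r in (select id, payload from src_orders) loop
    insert into dst_orders (id, payload) values (r.id, r.payload);
    v_pending := v_pending + 1;

    -- Commit once per batch instead of once per row.
    if v_pending >= c_batch_size then
      commit;
      v_pending := 0;
    end if;
  end loop;

  commit;  -- commit the final, possibly partial, batch
end;
/
```

Compared with committing after every row, this issues roughly one commit per 1000 rows, so sessions wait on log file sync far less often. For a plain copy like this, a single `insert ... select` followed by one commit would be simpler still.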

Statistic             Total       per Second   per Trans
redo blocks written   4,280,179   1,182.32     4.71
redo writes           113,652     31.39        0.13
user commits          907,876     250.78       1.00
That is about 250 commits per second.
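
The same statistics also give two derived ratios. The arithmetic below uses only the totals from the table above. Reading the first ratio as a sign that LGWR is servicing several commits per write (group commit) is an interpretation, not something the report states.

```sql
-- Totals taken from the statistics table above.
select round(907876 / 113652, 2)  as commits_per_redo_write,  -- roughly 8
       round(4280179 / 113652, 2) as redo_blocks_per_write
  from dual;
```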


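Slow LGWR flushing can cause this problem, and LGWR can be slow for several reasons:
  1. The I/O subsystem is too slow
  2. LGWR cannot get enough CPU
  3. Large transactions are running (expdp, insert /*+ append */ ... select, imp, create table ... as select)
The log buffer may also be too small, although that is unlikely nowadays because the default size is already large.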
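
Whether the log buffer is a factor can be checked roughly with the statistics below. This is a sketch, assuming SELECT privilege on V$SYSSTAT and V$PARAMETER. The counters are cumulative since instance startup.

```sql
-- 'redo buffer allocation retries' growing quickly relative to 'redo entries'
-- indicates that sessions often had to wait for space in the log buffer.
select name, value
  from v$sysstat
 where name in ('redo buffer allocation retries',
                'redo log space requests',
                'redo entries');

-- Current log buffer size in bytes.
select name, value
  from v$parameter
 where name = 'log_buffer';
```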

It later turned out that the following inefficient SQL statement was the cause:
update hb_work_order
   set state = '10F', return_data = sysdate
 where order_serial in (
         select distinct out_serial
           from tf_ne_order mm
          where mm.state_code not in ('10I', '10D', '10E')
            and mm.out_serial in (select order_serial from hb_work_order a
                                   where a.create_date > sysdate - 1/6 and state = '10D'))
   and order_serial not in (
         select distinct out_serial
           from tf_ne_order mm
          where mm.state_code in ('10I', '10D', '10E')
            and mm.out_serial in (select order_serial from hb_work_order a
                                   where a.create_date > sysdate - 1/6 and state = '10D'))
   and (select count(1) from tf_ne_order a where a.out_serial = order_serial) > 0

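Summary:
In general, if log file parallel write waits are long, or account for a relatively large share of wait time across the system, the I/O subsystem that holds the redo log files has a performance problem.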