[Case Study] Oracle 11g RAC: RAC Could Not Provide Service After Restarting the Instance on Node 2 (rac2)

Oracle: 11.2.0.4

Linux: RHEL 6.8

2 nodes: rac1, rac2

Hostnames: ze02db01, ze02db02

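Incident review:

On node 2 (ze02db02), I stopped the instance on rac2 and then started it again:

/u01/app/11.2.0/grid/bin/srvctl stop instance -d orcl -i orcl2

/u01/app/11.2.0/grid/bin/srvctl start instance -d orcl -i orcl2

At this point, checked from node 1 (ze02db01), the CRS status of the database was abnormal: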

ora.orcl.db
1 ONLINE ONLINE ze02db01 Open
2 ONLINE ONLINE ze02db02  starting ...
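
A state like the one above can also be detected from a script. The following is a minimal sketch, assuming the status lines have the shape shown above (instance number, target, state, server, state details). The function name find_not_open is hypothetical, and the function only reads text from standard input, so it can be fed from a saved copy of the status output.

```shell
#!/bin/sh
# Print "<server>: <state details>" for every ONLINE-target instance
# whose state details are anything other than "Open".
# Expected input lines look like: "2 ONLINE ONLINE ze02db02 starting ..."
find_not_open() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "ONLINE" {
        details = ""
        # Everything from column 5 onward is the free-text state details.
        for (i = 5; i <= NF; i++) details = details (i > 5 ? " " : "") $i
        if (details != "Open") print $4 ": " details
    }'
}
```

Fed the two status lines above, it reports only ze02db02. An instance that stays in "starting ..." for a long time is the symptom that matters in this case.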

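Next, on node 2 (ze02db02), I started the instance manually from SQL*Plus:

$ sqlplus / as sysdba

SQL> startup

The CRS status of the database was still abnormal, so from node 1 I tried to restart the instance on node 2:

/u01/app/11.2.0/grid/bin/srvctl stop instance -d orcl -i orcl2

At this point, RAC could no longer provide service to clients.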

[/u01/app/11.2.0/grid/bin/orarootagent.bin(9878)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsregistrymount" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/ze02db01/agent/crsd/orarootagent_root/orarootagent_root.log"
2020-07-17 12:12:29.432: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(9878)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsregistrymount" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/ze02db01/agent/crsd/orarootagent_root/orarootagent_root.log"
2020-07-17 12:12:29.636: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(9878)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsregistrymount" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/ze02db01/agent/crsd/orarootagent_root/orarootagent_root.log"
2020-07-17 12:12:29.839: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(9878)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsregistrymount" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/ze02db01/agent/crsd/orarootagent_root/orarootagent_root.log"
2020-07-17 12:12:30.043: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(9878)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsregistrymount" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/ze02db01/agent/crsd/orarootagent_root/orarootagent_root.log"
2020-07-17 18:09:01.504: 
[crsd(9762)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.orcl.db'. Details at (:CRSPE00111:) {2:9343:228} in /u01/app/11.2.0/grid/log/ze02db01/crsd/crsd.log.
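
When a log is flooded with one repeated message, counting entries per CRS code helps the rarer ones stand out, such as the single CRS-2757 start timeout among the repeated CRS-5016 check failures above. The following is a minimal sketch, assuming the messages contain codes of the form CRS-NNNN as in the excerpt. The function name count_crs_codes is hypothetical, and the log is read from standard input.

```shell
#!/bin/sh
# Count how often each CRS-NNNN message code appears in the input.
# Output: one "<code> <count>" pair per line, sorted by code.
count_crs_codes() {
    awk '{
        line = $0
        # Consume the line code by code so several codes on one line are all counted.
        while (match(line, /CRS-[0-9]+/)) {
            count[substr(line, RSTART, RLENGTH)]++
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END { for (code in count) print code, count[code] }' | sort
}
```

A code that appears only once or twice next to hundreds of identical entries is usually the better starting point for diagnosis.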
Node 1 alert log information
2020-07-17 18:50:45.700: [UiServer][2427864832]{1:2032:53986} Sending message to PE. ctx= 0x7f270000b850, Client PID: 12554
2020-07-17 18:50:45.700: [   CRSPE][2429966080]{1:2032:53986} Cmd : 0x7f270c113cf0 : flags: FORCE_TAG | HOST_TAG | QUEUE_TAG
2020-07-17 18:50:45.700: [   CRSPE][2429966080]{1:2032:53986} Processing PE command id=119569. Description: [Start Resource : 0x7f270c113cf0]
2020-07-17 18:50:45.702: [   CRSPE][2429966080]{1:2032:53986} Filtering duplicate ops: server [ze02db02] state [ONLINE]
2020-07-17 18:50:45.702: [   CRSPE][2429966080]{1:2032:53986} Op 0x7f270c00db10 has 16 WOs
2020-07-17 18:50:45.702: [   CRSPE][2429966080]{1:2032:53986} ICE has queued an operation. Details: Operation [START of [ora.orcl.db 2 1] on [ze02db02] : local=0, unplanned=00x7f270c00db10] cannot run cause it needs W lock for: WO for Placement Path RI:[ora.orcl.db 2 1] server [ze02db02] target states [ONLINE INTERMEDIATE ], locked by op [START of [ora.orcl.db 2 1] on [ze02db02] : local=0, unplanned=00x7f270c0df540]. 
Owner: CRS-2682: It is locked by 'grid' for command 'Start Resource' issued from 'ze02db02'
2020-07-17 18:50:49.490: [   CRSPE][2429966080]{2:9343:273} Processing PE command id=323. Description: [Stat Resource : 0x7f270c00d8a0]
2020-07-17 18:50:51.506: [   CRSPE][2429966080]{2:9343:274} Processing PE command id=324. Description: [Stat Resource : 0x7f270c145e00]
2020-07-17 18:50:52.410: [   CRSPE][2429966080]{2:9343:275} Processing PE command id=325. Description: [Stat Resource : 0x7f270c145e00]
2020-07-17 18:50:53.070: [   CRSPE][2429966080]{2:9343:276} Processing PE command id=326. Description: [Stat Resource : 0x7f270c145e00]
2020-07-17 18:51:35.517: [UiServer][2425763584] CS(0x7f270400a270)set Properties ( grid,0x7f273c0dac90)
2020-07-17 18:51:35.527: [UiServer][2427864832]{1:2032:53987} Sending message to PE. ctx= 0x7f270000ac30, Client PID: 9882
2020-07-17 18:51:35.528: [   CRSPE][2429966080]{1:2032:53987} Processing PE command id=119570. Description: [Stat Resource : 0x7f270c00d8a0]
2020-07-17 18:51:35.528: [   CRSPE][2429966080]{1:2032:53987} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == ze02db01))
2020-07-17 18:51:35.529: [UiServer][2427864832]{1:2032:53987} Done for ctx=0x7f270000ac30
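
The CRS-2682 line in the excerpt above names who holds the lock: the user, the command, and the node that issued it. The following is a minimal sketch for pulling those three fields out of a log, assuming the message wording is exactly as shown in the excerpt. The function name lock_holders is hypothetical.

```shell
#!/bin/sh
# Summarise CRS-2682 lock messages as "<user> <node> <command>", one unique line each.
# Matches text such as:
#   CRS-2682: It is locked by 'grid' for command 'Start Resource' issued from 'ze02db02'
lock_holders() {
    sed -n "s/.*CRS-2682: It is locked by '\([^']*\)' for command '\([^']*\)' issued from '\([^']*\)'.*/\1 \3 \2/p" | sort -u
}
```

It can be run against the crsd.log named in the CRS-2757 message, for example lock_holders < /u01/app/11.2.0/grid/log/ze02db01/crsd/crsd.log. In this incident the holder is the earlier Start Resource command from ze02db02 that never completed.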

At this point, I logged in to SQL*Plus and shut the instance down:

$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Mon Jul 20 16:13:46 2020

Copyright (c) 1982, 2013, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SYS@orcl2> shutdown immediate

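With the RAC resources running on node 1, service was available again.

Root cause, as finally identified:

multipath -ll showed that the multipath software on the system could not read the shared disks. I restarted multipathd and re-ran udev: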

service multipathd restart
start_udev
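
Before restarting any cluster resources, it is worth confirming that the paths really are back. The following is a minimal sketch, assuming the path lines printed by multipath -ll carry the usual status words "active ready" for healthy paths and "failed faulty" for broken ones. The function name summarize_paths is hypothetical, and the sample path lines used to exercise it are illustrative, not taken from this system.

```shell
#!/bin/sh
# Count healthy and failed paths in `multipath -ll` style output read from stdin.
# Prints a single line: "ready=<n> failed=<m>".
summarize_paths() {
    awk '/active ready/  { ok++ }
         /failed faulty/ { bad++ }
         END { printf "ready=%d failed=%d\n", ok, bad }'
}
```

Typical use would be multipath -ll | summarize_paths. A result of ready=0 means no shared disk is reachable through multipath, which matches the situation in this case.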

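Once multipath was working again, I restarted CRS, the instance, the nodeapps, and the listener on node 2. The RAC CRS status was still abnormal, and the lock problem appeared again.

In the end, rebooting the node 2 server brought RAC back to normal.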

Original article: https://www.cnblogs.com/elontian/p/13345742.html