ORA-01078, ORA-01565, ORA-17503, ORA-29701

OS:

Oracle Linux Server release 5.7

DB:

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

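Problem:

In a RAC test environment, a colleague on the test team rebooted the server of one node. Afterwards the database instance on that node would not start and reported the following errors:

SQL> startup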

ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file "+DATA/ofcdb/spfileofcdb.ora"
ORA-17503: ksfdopn:2 Failed to open file "+DATA/ofcdb/spfileofcdb.ora"

ORA-29701: unable to connect to Cluster Synchronization Service

1. Check the CRS status

[root@ofc_node1 ~]# /home/oracle/app/11.2.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

2. Check the startup state of the CRS init resources
[root@ofc_node1 ~]# /home/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       ofc_node1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       ofc_node1
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       ofc_node1
ora.gpnpd
      1        ONLINE  ONLINE       ofc_node1
ora.mdnsd
      1        ONLINE  ONLINE       ofc_node1

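As shown above, ora.cssd is stuck in the STARTING state and never comes ONLINE, so the problem lies in starting the CSS daemon.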
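With this many resources it is easy to miss the one that matters. The following is a minimal sketch, not an Oracle tool, that filters the `crsctl stat res -t -init` output down to resources whose TARGET is ONLINE but whose STATE is not. The function name `not_online_resources` is made up for illustration, and the parsing assumes the 11.2 layout shown above (a resource name line followed by one numbered status line).

```shell
# Hypothetical helper: reads `crsctl stat res -t -init` output on stdin and
# prints every resource whose TARGET is ONLINE but whose STATE is not ONLINE.
not_online_resources() {
    awk '
        # A line starting with "ora." names the resource for the next status line.
        /^ora\./ { name = $1; next }
        # Status line: <instance number> <TARGET> <STATE> [rest of the line]
        name != "" && $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" {
            print name ": " $3 ($4 != "" ? " " $4 : "")
        }
    '
}
```

It could be used as `/home/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init | not_online_resources`; on the output above, the ora.cssd entry would be the only one carrying a STARTING detail.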

3. Check the ocssd log
[oracle@ofc_node1 cssd]$ tail -20f /home/oracle/app/11.2.0/grid/log/ofc_node1/cssd/ocssd.log

2013-11-13 17:44:07.696: [ CSSD][1091463488]clssnmvDHBValidateNCopy: node 2, ofc_node2, has a disk HB, but no network HB, DHB has rcfg 230109004, wrtcnt, 44243250, LATS 24064574, lastSeqNo 44243249, uniqueness 1361347113, timestamp 1384335843/1507552134
2013-11-13 17:44:08.697: [ CSSD][1091463488]clssnmvDHBValidateNCopy: node 2, ofc_node2, has a disk HB, but no network HB, DHB has rcfg 230109004, wrtcnt, 44243251, LATS 24065574, lastSeqNo 44243250, uniqueness 1361347113, timestamp 1384335844/1507553134

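The log is full of messages like these: node 2 (ofc_node2) is still writing its disk heartbeat, but no network heartbeat from it is arriving. That points to the private interconnect rather than to the shared storage.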
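To get a quick sense of how often this message occurs and for which node, the log can be summarized. The sketch below assumes the message layout seen in the excerpt above ("node <number>, <name>, has a disk HB, but no network HB"); `count_missing_network_hb` is a hypothetical name chosen for this example.

```shell
# Hypothetical helper: reads ocssd.log text on stdin and prints, per remote
# node name, how many "disk HB, but no network HB" messages were logged.
count_missing_network_hb() {
    awk '
        /has a disk HB, but no network HB/ {
            # The node name is the second field after the word "node".
            for (i = 1; i < NF; i++) {
                if ($i == "node") {
                    host = $(i + 2)
                    sub(/,$/, "", host)   # drop the trailing comma
                    count[host]++
                    break
                }
            }
        }
        END { for (h in count) print h, count[h] }
    '
}
```

For example: `count_missing_network_hb < /home/oracle/app/11.2.0/grid/log/ofc_node1/cssd/ocssd.log`. A count that keeps growing by roughly one per second, as in the excerpt, suggests the network heartbeat is not getting through at all.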

4. Search Metalink (My Oracle Support) to identify the error

5. Check the network configuration; the private interface eth1 does turn out to be faulty
[root@ofc_node1 ~]# /sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:1C:C4:94:9C:A6 
inet addr:192.168.12.179 Bcast:192.168.12.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:308822 errors:0 dropped:0 overruns:0 frame:0
TX packets:14067 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:20886515 (19.9 MiB) TX bytes:2593284 (2.4 MiB)
Interrupt:16 Memory:f8000000-f8012800

eth1 Link encap:Ethernet HWaddr 00:1C:C4:93:7D:EC 
inet addr:1.1.1.179 Bcast:1.1.1.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:78 errors:0 dropped:0 overruns:0 frame:0
TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7041 (6.8 KiB) TX bytes:12444 (12.1 KiB)
Interrupt:17 Memory:fa000000-fa012800

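Unlike eth0, eth1 is UP but lacks the RUNNING flag, and the two nodes cannot ping each other over the private network: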

[root@ofc_node1 ~]# ping 1.1.1.180
PING 1.1.1.180 (1.1.1.180) 56(84) bytes of data.
From 1.1.1.179 icmp_seq=1 Destination Host Unreachable
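The missing RUNNING flag on eth1 is easy to overlook in a long ifconfig listing. As a rough sketch, the check can be automated by scanning the flags line of each interface. This assumes the classic net-tools output format shown above ("Link encap:" header, flags on the line containing "MTU:"); newer ifconfig versions print flags differently and would need another pattern. `interfaces_without_link` is a hypothetical name.

```shell
# Hypothetical helper: reads classic `ifconfig` output on stdin and prints
# the interfaces that are UP but not RUNNING, which usually means no link.
interfaces_without_link() {
    awk '
        # Header line of an interface block, e.g. "eth1 Link encap:Ethernet ..."
        /Link encap:/ { iface = $1 }
        # Flags line, e.g. "UP BROADCAST MULTICAST MTU:1500 Metric:1"
        /MTU:/ {
            up = 0; running = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "UP") up = 1
                if ($i == "RUNNING") running = 1
            }
            if (up && !running) print iface
        }
    '
}
```

Run as `/sbin/ifconfig | interfaces_without_link`; for the listing above it would report eth1.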

6. Restart the network service
[root@ofc_node1 ~]# /etc/init.d/network restart

7. Restart CRS
[root@ofc_node1 ~]# /home/oracle/app/11.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ofc_node1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'ofc_node1'
CRS-2673: Attempting to stop 'ora.crf' on 'ofc_node1'
CRS-2677: Stop of 'ora.crf' on 'ofc_node1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'ofc_node1'
CRS-2677: Stop of 'ora.mdnsd' on 'ofc_node1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'ofc_node1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'ofc_node1'
CRS-2677: Stop of 'ora.gpnpd' on 'ofc_node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ofc_node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@ofc_node1 ~]# /home/oracle/app/11.2.0/grid/bin/crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services

[root@ofc_node1 ~]# /home/oracle/app/11.2.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
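CRS-4123 only confirms that Oracle High Availability Services has been started; the rest of the stack typically needs some time to come up afterwards. A minimal sketch for waiting until everything is online is to test the `crsctl check crs` output for the four "is online" message codes (CRS-4638, CRS-4537, CRS-4529, CRS-4533). The function name `crs_fully_online` and the 10-second polling interval are assumptions for illustration.

```shell
# Hypothetical helper: reads `crsctl check crs` output on stdin and succeeds
# only if all four "is online" messages are present.
crs_fully_online() {
    status=$(cat)
    for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
        printf '%s\n' "$status" | grep -q "^$code:" || return 1
    done
}

# Possible usage (needs root and a Grid Infrastructure installation):
# until /home/oracle/app/11.2.0/grid/bin/crsctl check crs | crs_fully_online; do
#     sleep 10
# done
```

In practice a polling loop like this should also have an upper time limit, so that a stack which never comes up does not make it hang indefinitely.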

8. Check the CRS status
[root@ofc_node1 ~]# /home/oracle/app/11.2.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[oracle@ofc_node1 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    ofc_node1
ora....ER.lsnr ora....er.type ONLINE    ONLINE    ofc_node1
ora....N1.lsnr ora....er.type ONLINE    ONLINE    ofc_node2
ora.asm        ora.asm.type   ONLINE    ONLINE    ofc_node1
ora.cvu        ora.cvu.type   ONLINE    ONLINE    ofc_node2
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    ofc_node1
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    ofc_node2
ora....SM1.asm application    ONLINE    ONLINE    ofc_node1
ora....E1.lsnr application    ONLINE    ONLINE    ofc_node1
ora....de1.gsd application    OFFLINE   OFFLINE
ora....de1.ons application    ONLINE    ONLINE    ofc_node1
ora....de1.vip ora....t1.type ONLINE    ONLINE    ofc_node1
ora....SM2.asm application    ONLINE    ONLINE    ofc_node2
ora....E2.lsnr application    ONLINE    ONLINE    ofc_node2
ora....de2.gsd application    OFFLINE   OFFLINE
ora....de2.ons application    ONLINE    ONLINE    ofc_node2
ora....de2.vip ora....t1.type ONLINE    ONLINE    ofc_node2
ora.ofcdb.db   ora....se.type ONLINE    ONLINE    ofc_node2
ora.ons        ora.ons.type   ONLINE    ONLINE    ofc_node1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    ofc_node2

[oracle@ofc_node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Wed Nov 13 17:56:23 2013

Copyright (c) 1982, 2010, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL>

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DATA/ofcdb/datafile/system.256.780865119
+DATA/ofcdb/datafile/sysaux.257.780865121
+DATA/ofcdb/datafile/undotbs1.258.780865121
+DATA/ofcdb/datafile/users.259.780865121
+DATA/ofcdb/datafile/undotbs2.267.780865281

Reference documents:
Troubleshoot Grid Infrastructure Startup Issues (Doc ID 1050908.1)
How to Validate Network and Name Resolution Setup for the Clusterware and RAC (Doc ID 1054902.1)

Original article: https://www.cnblogs.com/hankyoon/p/5174542.html