转 禁用HAIP,cluster_interconnects配错了集群无法启动

简介:

在Oracle 11.2.0.2之前,私网的冗余一般是通过在OS上做网卡绑定(如Bond等)来实现的,从Oracle 11.2.0.2版本开始推出HAIP(Highly Available Virtual IP)技术替代了操作系统层面的网卡绑定技术,功能更强大、更兼容。HAIP通过其提供的独特的169.254.*网段的IP地址实现集群内部链接的高可用及负载均衡。所以,在11.2.0.2或更高版本安装RAC的时候需要注意169.254.*的IP地址不能被占用。有了HAIP技术则可以不再需要借助任何第三方的冗余技术来实现私网网卡的冗余。

资源ora.cluster_interconnect.haip将会启动一个到四个本地HAIP地址附在Private网络适配器上。通过HAIP完成Oracle RAC和ASM等内部通讯。如果某一个私有网卡物理损坏,那么该网卡上的HAIP地址会漂移到其它的可用的私有网络上。多个私网网卡可以在安装阶段定义,也可以在GRID配置完成之后,通过调用$GRID_HOME/bin/oifcfg setif工具(命令为:oifcfg setif -global eth2/192.168.1.0:cluster_interconnect)来配置HAIP。

HAIP的个数取决于GRID激活的私网网卡的个数。如果只有1块私网网卡,那么GRID将会创建1个HAIP。如果有两块私网网卡,那么GRID将会创建两个HAIP。若超过两块私网网卡则GRID创建4个HAIP。GRID最多支持4块私网网卡,而集群实际上使用的HAIP地址数则取决于集群中最先启动的节点中激活的私网网卡数目。如果选中更多的私网网卡作为Oracle的私有网络,那么多余4个的不能被激活。

管理ora.cluster_interconnect.haip这个资源的是ohasd.bin进程。其对应的log位于$GRID_HOME/log/<nodename>/ohasd/ohasd.log以及$GRID_HOME/log/<nodename>/agent/ohasd/orarootagent_root/orarootagent_root.log这两个位置。在HAIP资源online以后,通过操作系统命令ifconfig -a就能查看到多了类似于eth0:1的虚拟网卡,HAIP地址为169.254.X.X。当然也可以在数据库级别通过GV$CLUSTER_INTERCONNECTS视图查看HAIP的地址。HAIP对应的地址由系统自动分配,无法由用户手工进行指定。

Oracle数据库和ASM实例可以通过HAIP来实现私网通讯的高可用性和负载均衡。私网的流量会在这些私网网卡上实现负载均衡,如果某个网卡出现了故障,它上面的HAIP会自动切换到别的可用的私网网卡上,从而不影响私网的通讯。Windows平台目前不支持HAIP技术。

在有些客户环境下,私网是通过VLAN划出来的,而出于网络管理要求,VLAN的IP地址与网卡必须是绑定的,私网IP也必须是固定的IP地址(虽然按Oracle RAC的安装要求,私网应该是独立隔离的网络),这时HAIP会无法分配,导致依赖它的ASM资源无法启动。HAIP存在不少Bug,若不幸碰到,则可以将HAIP功能禁用掉。如果用户使用的是操作系统级别的绑定或者没有使用私网的绑定,那么可以通过在RDBMS和ASM的参数文件中设置CLUSTER_INTECONNECTS指定私网地址将HAIP覆盖(如果有多个私网地址,请用英文冒号分隔)。虽然说HAIP本身依然存在,但是ASM实例和RDBMS实例以后就不会使用HAIP。

以下步骤就是通过在RDBMS和ASM的参数文件中设置CLUSTER_INTECONNECTS指定私网地址将HAIP覆盖(如果有多个私网地址,请用英文冒号分隔)。虽然说HAIP本身依然存在,但是ASM实例和RDBMS实例以后就不会使用HAIP。

#######sample 0

cluster_interconnects

SQL> select * from v$cluster_interconnects;

NAME
---------------------------------------------
IP_ADDRESS IS_PUBLIC
------------------------------------------------ ---------
SOURCE
--------------------------------------------------------------------------------
en8
169.254.251.241 NO


SQL> show parameter spfile

NAME TYPE
------------------------------------ ---------------------------------
VALUE
------------------------------
spfile string
+db_OCR/db-cluster/asmpara
meterfile/registry.253.1006541


[grid@pdbdb02:/home/grid]$ asmcmd find --type ASMPARAMETERFILE +db_OCR "*"
+db_OCR/db-cluster/ASMPARAMETERFILE/REGISTRY.253.1006541113

[grid@pdbdb02:/home/grid]$ asmcmd lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED HIGH N 512 4096 1048576 5125 3869 2050 606 0 Y db_OCR/


sqllus / as sysasm
create pfile='/tmp/dba/pfile_asm.ora' from spfile='+db_OCR/db-cluster/asmparameterfile/registry.253.1006541113';

modify pfile_asm.ora
processes=170
sessions=180
shared_pool_size = 5G
large_pool_size = 1G
db_cache_size = 1G
sga_max_size=8192M

sqllus / as sysasm
shutdown abort
startup pfile='/tmp/dba/pfile_asm.ora';

create spfile='+NEW_DATA/spfileASM.ora' from pfile='/tmp/dba/pfile_asm.ora';

##We now see the ASM spfile itself (REGISTRY.253.843597139) and its alias (spfileASM.ora)
asmcmd ls -l +NEW_DATA/spfileASM.ora
Type Redund Striped Time Sys Name
N spfileASM.ora => +NEW_DATA/db-cluster/ASMPARAMETERFILE/REGISTRY.253.1006958153

/db/db/grid/11.2.0/bin/crsctl stop crs

/db/db/grid/11.2.0/bin/crsctl start crs


alter system set cluster_interconnects='190.0.1.201' sid='+ASM1' scope=spfile;
alter system set cluster_interconnects='190.0.1.202' sid='+ASM2' scope=spfile;

/db/db/grid/11.2.0/bin/crsctl stop crs

/db/db/grid/11.2.0/bin/crsctl start crs

SQL> select * from v$cluster_interconnects;

NAME
---------------------------------------------
IP_ADDRESS IS_PUBLIC
------------------------------------------------ ---------
SOURCE
--------------------------------------------------------------------------------
en8
190.0.1.201 NO
cluster_interconnects parameter

###sample1

ASM spfile discovery

So, how can the ASM instance read the spfile on startup, if the spfile is in a disk group that is not mounted yet? Not only that - the ASM doesn't really know which disk group has the spfile, or even if the spfile is in a disk group. And what is the value of the ASM discovery string?

The ASM Administration guide says this on the topic:

When an Oracle ASM instance searches for an initialization parameter file, the search order is:

  1. The location of the initialization parameter file specified in the Grid Plug and Play (GPnP) profile.
  2. If the location has not been set in the GPnP profile, then the search order changes to:
    1. SPFILE in the Oracle ASM instance home (e.g. $ORACLE_HOME/dbs/spfile+ASM.ora)
    2. PFILE in the Oracle ASM instance home


This does not tell us anything about the ASM discovery string, but at least it tells us about the spfile and the GPnP profile. It turns out the ASM discovery string is also in the GPnP profile. Here are the values from an Exadata environment:

$ gpnptool getpval -p=profile.xml -asm_dis -o-
o/*/*
$ gpnptool getpval -p=profile.xml -asm_spf -o-
+DBFS_DG/spfileASM.ora

There is no GPnP profile in a single instance set up, so this information is in the ASM resource (ora.asm), stored in the Oracle Local Repository (OLR). Here are the values from a single instance environment:

$ crsctl stat res ora.asm -p | egrep "ASM_DISKSTRING|SPFILE"
ASM_DISKSTRING=
SPFILE=+DATA/ASM/ASMPARAMETERFILE/registry.253.822856169

So far so good. Now the ASM knows where to look for ASM disks and where the spfile is. But the disk group is not mounted yet, as the ASM instance still hasn't started up, so how can ASM read the spfile?

The trick is in the ASM disk headers. To support the ASM spfile in a disk group, two new fields were added to the ASM disk header:

  • kfdhdb.spfile - Allocation unit number of the ASM spfile.
  • kfdhdb.spfflg - ASM spfile flag. If this value is 1, the ASM spfile is on this disk in allocation unit kfdhdb.spfile.

As part of the disk discovery process, the ASM instance reads the disk headers and looks for the spfile information. Once it finds the disks that have the spfile, it can read the actual initialization parameters.

Let's have a look at my disk group DATA. First check the disk group state and redundancy

$ asmcmd lsdg -g DATA | cut -c1-26
Inst_ID  State    Type
      1  MOUNTED  NORMAL

The disk group is mounted and the redundancy is normal. This means the ASM spfile will be mirrored, so we should see two disks with kfdhdb.spfile and kfdhdb.spfflg values set. Let's have a look:

$ for disk in `asmcmd lsdsk -G DATA --suppressheader`
> do
> echo $disk
> kfed read $disk | grep spf
> done
/dev/sdc1
kfdhdb.spfile:                       46 ; 0x0f4: 0x0000002e
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
/dev/sdd1
kfdhdb.spfile:                     2212 ; 0x0f4: 0x000008a4
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
/dev/sde1
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000

As we can see, two disks have the ASM spfile.

Let's check the contents of the Allocation Unit 46 on disk /dev/sdc1:

$ dd if=/dev/sdc1 bs=1048576 skip=46 count=1 | strings
+ASM.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM.asm_diskgroups='RECO','ACFS'#Manual Mount
*.asm_power_limit=1
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0352732 s, 29.7 MB/s

The AU 46 on disk /dev/sdc1 indeed contains the ASM spfile.

ASM spfile alias block

In addition to the new ASM disk header fields, there is a new ASM metadata block type - KFBTYP_ASMSPFALS - that describes the ASM spfile alias. The ASM spfile alias block will be the last block in the ASM spfile.

Let's have a look at the last block of the Allocation Unit 46:

$ kfed read /dev/sdc1 aun=46 blkn=255
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           27 ; 0x002: KFBTYP_ASMSPFALS
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                     255 ; 0x004: blk=255
kfbh.block.obj:                     253 ; 0x008: file=253
kfbh.check:                   806373865 ; 0x00c: 0x301049e9
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfspbals.incarn:              822856169 ; 0x000: 0x310bc9e9
kfspbals.blksz:                     512 ; 0x004: 0x00000200
kfspbals.size:                        3 ; 0x008: 0x0003
kfspbals.path.len:                    0 ; 0x00a: 0x0000
kfspbals.path.buf:                      ; 0x00c: length=0

There is not much in this metadata block. Most of the entries have the block header info (fields kfbh.*). The actual ASM spfile alias data (fields kfspbals.*) has only few entries. The spfile file incarnation (822856169) is part of the file name (REGISTRY.253.822856169), the block size is 512 (bytes) and the file size is 3 blocks. The path info is empty, meaning I don't actually have the ASM spfile alias.

Let's create one. I will first create a pfile from the existing spfile and then create the spfile alias from that pfile.

$ sqlplus / as sysasm

SQL> create pfile='/tmp/pfile+ASM.ora' from spfile;

File created.

SQL> shutdown abort;
ASM instance shutdown

SQL> startup pfile='/tmp/pfile+ASM.ora';
ASM instance started

Total System Global Area 1135747072 bytes
Fixed Size                  2297344 bytes
Variable Size            1108283904 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

SQL> create spfile='+DATA/spfileASM.ora' from pfile='/tmp/pfile+ASM.ora';

File created.

SQL> exit

Looking for the ASM spfile again shows two entries:

$ asmcmd find --type ASMPARAMETERFILE +DATA "*"
+DATA/ASM/ASMPARAMETERFILE/REGISTRY.253.843597139
+DATA/spfileASM.ora

We now see the ASM spfile itself (REGISTRY.253.843597139) and its alias (spfileASM.ora). Having a closer look at spfileASM.ora confirms this is indeed the alias for the registry file:

$ asmcmd ls -l +DATA/spfileASM.ora
Type              Redund  Striped  Time             Sys  Name
ASMPARAMETERFILE  MIRROR  COARSE   MAR 30 20:00:00  N    spfileASM.ora => +DATA/ASM/ASMPARAMETERFILE/REGISTRY.253.843597139


Check the ASM spfile alias block now:

$ kfed read /dev/sdc1 aun=46 blkn=255
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           27 ; 0x002: KFBTYP_ASMSPFALS
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                     255 ; 0x004: blk=255
kfbh.block.obj:                     253 ; 0x008: file=253
kfbh.check:                  2065104480 ; 0x00c: 0x7b16fe60
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfspbals.incarn:              843597139 ; 0x000: 0x32484553
kfspbals.blksz:                     512 ; 0x004: 0x00000200
kfspbals.size:                        3 ; 0x008: 0x0003
kfspbals.path.len:                   13 ; 0x00a: 0x000d
kfspbals.path.buf:        spfileASM.ora ; 0x00c: length=13

Now we see that the alias file name appears in the ASM spfile alias block. Note the new incarnation number, as this is a new ASM spfile, created from the pfile.

Conclusion

Starting with ASM version 11.2, the ASM spfile can be stored in an ASM disk group. To support this feature, we now have new ASMCMD commands and new ASM metadata structures.

########

http://blog.itpub.net/7590112/viewspace-1410355/ 

挫挫把vip和priv ip傻傻分不清,  扔来这么一段
-------------------------begin-------------------
禁用HAIP,其中的将169.254.x.x修改为两个节点的private IP后
将所有节点的ASM实例及所有数据库实例的cluster_interconnects参数值修改为对应节点的private IP的值:
SQL> alter system set cluster_interconnects='83.16.193.38' sid='+ASM1' scope=spfile;
SQL> alter system set cluster_interconnects='83.16.193.40' sid='+ASM2' scope=spfile;
SQL> alter system set cluster_interconnects='83.16.193.38' sid='orcl1' scope=spfile;
SQL> alter system set cluster_interconnects='83.16.193.40' sid='orcl2' scope=spfile;

启动集群时报错,看报错是找不到private IP,83.16.193.138和83.16.193.140,现在重启了机器,ifconfig -a又能看到169.254.70.237这个ip了

eth1:1    Link encap:Ethernet  HWaddr 18:C5:8A:1A:3A:70  
          inet addr:169.254.70.237  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:44 
启集群抱错
CRS-2672: Attempting to start 'ora.asm' on 'aas20150114l2'
CRS-5017: The resource action "ora.asm start" encountered the following error: 
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 83.16.193.138 not found. Check output from ifconfig command
. For details refer to "(:CLSN00107:)" in "/oracle/app/11.2.0/grid/log/aas20150114l1/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on 'aas20150114l1' failed
-------------------------扔来这么一段end-----------------------
开始排查

grid@AAS20150114L1:~> sqlplus '/as sysasm'
SQL*Plus: Release 11.2.0.3.0 Production on Tue Mar 3 14:09:46 2015
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 83.16.193.138 not found. Check output from ifconfig command
SQL> !

asm启不来,在pfile后面加上正确的cluster_interconnects
+ASM1.asm_diskgroups='DATDG'#Manual Mount
+ASM2.asm_diskgroups='DATDG'#Manual Mount
*.asm_diskstring='/dev/asmdisk*'
*.asm_power_limit=1
*.diagnostic_dest='/oracle/app/grid'
*.instance_type='asm'
*.large_pool_size=12M
*.memory_max_target=1572864000
*.memory_target=0
*.pga_aggregate_target=524288000
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=1048576000
+ASM1.cluster_interconnects='83.5.227.200'
+ASM2.cluster_interconnects='83.5.227.201'
用pfile启动
sqlplus '/as sysasm'
SQL> startup pfile='/oracle/app/11.2.0/grid/dbs/init+ASM1.ora';
ASM instance started

Total System Global Area 1603411968 bytes
Fixed Size                  2228784 bytes
Variable Size            1567628752 bytes
ASM Cache                  33554432 bytes
ASM diskgroups mounted

随后实例自动起来了。
SQL> select name,ip_address from v$cluster_interconnects;
NAME            IP_ADDRESS
--------------- ----------------
eth1:1          169.254.204.52

SQL> alter system set cluster_interconnects='83.5.227.200' sid='orcl1' scope=spfile;
System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 2.2556E+11 bytes
Fixed Size                  2241744 bytes
Variable Size            1.1489E+11 bytes
Database Buffers         1.1006E+11 bytes
Redo Buffers              604123136 bytes
Database mounted.
Database opened.

再到gi里
SQL> create spfile from pfile;
create spfile from pfile
*
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file
+DGOCR/aas-cluster/asmparameterfile/registry.253.873210539
ORA-15177: cannot operate on system aliases
缺少目录吧,进去建一个。
ASMCMD>cd +DGOCR/aas-cluster
ASMCMD> mkdir asmparameterfile

sqlplus '/as sysasm'
SQL> create spfile from pfile;
File created.

SQL> shutdown immediate;
ORA-15097: cannot SHUTDOWN ASM instance with connected client (process 17079)
直接重启整个集群正常
AAS20150114L1:/ # /oracle/app/11.2.0/grid/bin/crsctl stop cluster -all 
CRS-2673: Attempting to stop 'ora.crsd' on 'aas20150114l1'
CRS-2673: Attempting to stop 'ora.crsd' on 'aas20150114l2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'aas20150114l1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'aas20150114l1'
CRS-2673: Attempting to stop 'ora.oc4j' on 'aas20150114l1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'aas20150114l1'
CRS-2673: Attempting to stop 'ora.cvu' on 'aas20150114l1'
...。。。

AAS20150114L1:/ # /oracle/app/11.2.0/grid/bin/crsctl start cluster -all 
......。。。
CRS-2672: Attempting to start 'ora.crsd' on 'aas20150114l2'
CRS-2676: Start of 'ora.crsd' on 'aas20150114l1' succeeded
CRS-2676: Start of 'ora.crsd' on 'aas20150114l2' succeeded

###sample debug 方法:

os/db1/grid/11.2.0/log/sdb1db01/ohasd/ohasd.log.
/os/db1/grid/11.2.0/log/sdb1db01/cssd/ocssd.log

/os/db1/app/grid/diag/asm/+asm/+ASM1/trace
/os/db1/grid/11.2.0/log/sdb1db01/agent/ohasd/orarootagent_root/orarootagent_root.log


/os/db1/grid/11.2.0/bin/oifcfg getif

/os/db1/grid/11.2.0/bin/crsctl stop crs -f

/os/db1/grid/11.2.0/bin/crsctl start crs


/os/db1/grid/11.2.0/bin/crsctl stat res -t -init


/os/db1/app/grid/diag/asm/+asm/+ASM2/trace


alter system set cluster_interconnects='190.0.2.31' sid='+ASM1' scope=spfile;
alter system set cluster_interconnects='190.0.2.32' sid='+ASM2' scope=spfile;

/os/db1/grid/11.2.0/log/sdb1db02/

vi /os/db1/grid/11.2.0/log/sdb1db02/agent/ohasd/oraagent_grid/oraagent_grid.log

/os/db1/app/grid/diag/asm/+asm/+ASM2/trace

kfod disks=all
kfod disks=all status=ture asm_diskstrings='/dev/*_disk*'


/os/db1/grid/11.2.0/log/sdb1db02/agent/ohasd/oraagent_grid//oraagent_grid.log


kfed read /dev/ocr_disk1
kfed read '/dev/vote_disk1'

原文地址:https://www.cnblogs.com/feiyun8616/p/10602632.html