ORA-00603 ORA-27504 ORA-27300 ORA-27301 ORA-27302 study notes

--Node 1 alert log

Thu Jun 20 03:02:18 2019
Thread 1 advanced to log sequence 2092 (LGWR switch)
Current log# 11 seq# 2092 mem# 0: +DATA/test/onlinelog/redo11_1.log
Current log# 11 seq# 2092 mem# 1: +DATA/test/onlinelog/redo11_2.log
Thu Jun 20 03:02:18 2019
LNS: Standby redo logfile selected for thread 1 sequence 2092 for destination LOG_ARCHIVE_DEST_2
Thu Jun 20 03:02:20 2019
Archived Log entry 5703 added for thread 1 sequence 2091 ID 0x9b8db1ed dest 1:
Thu Jun 20 03:03:53 2019
skgxpvfynet: mtype: 61 process 316751 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /u01/app/oracle/diag/rdbms/test/test1/trace/test1_ora_316751.trc (incident=560025):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /u01/app/oracle/diag/rdbms/test/test1/incident/incdir_560025/test1_ora_316751_i560025.trc
Thu Jun 20 03:03:54 2019
Dumping diagnostic data in directory=[cdmp_20190620030354], requested by (instance=1, osid=316751), summary=[incident=560025].
opiodr aborting process unknown ospid (316751) as a result of ORA-603
Thu Jun 20 03:04:03 2019
Sweep [inc][560025]: completed
Sweep [inc2][560025]: completed
Thu Jun 20 03:06:15 2019
Thread 1 advanced to log sequence 2093 (LGWR switch)
Current log# 5 seq# 2093 mem# 0: +DATA/test/onlinelog/redo05_1.log
Current log# 5 seq# 2093 mem# 1: +DATA/test/onlinelog/redo05_2.log
Thu Jun 20 03:06:16 2019

---trc file: a first look suggests the problem is related to the MTU value of the server's network interface
*** 2019-06-20 03:03:53.395
*** CLIENT ID:() 2019-06-20 03:03:53.395
*** SERVICE NAME:() 2019-06-20 03:03:53.395
*** MODULE NAME:() 2019-06-20 03:03:53.395
*** ACTION NAME:() 2019-06-20 03:03:53.395

SKGXP:[7f5ab0ef57c0.0]{0}: SKGXPVFYNET: Socket self-test could not verify successful transmission of 32768 bytes (mtype 61).
SKGXP:[7f5ab0ef57c0.1]{0}: The network is required to support UDP protocol sends of this size. Socket is bound to 169.254.16.188.
SKGXP:[7f5ab0ef57c0.2]{0}: phase 'send', 0 tries, 100 loops, 32354 ms (last)
struct ksxpp * ksxppg_ [0xc11fde0, 0x7f5aadaf5588) = 0x7f5aadaf5580
Dump of memory from 0x00007F5AADAF5580 to 0x00007F5AADAF6AB0
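
The trace reports that the self-test socket is bound to 169.254.16.188 and could not send 32768 bytes over UDP. A minimal sketch for finding which interface carries that address and reading its MTU follows. It assumes a Linux host with the iproute2 `ip` command; `iface_for_ip` is a hypothetical helper name, not an Oracle or OS tool.

```shell
#!/usr/bin/env bash
# Print the device that owns the given IPv4 address.
# Input: the one-line-per-address output of `ip -o -4 addr show` on stdin,
# where field 2 is the device, field 3 is "inet" and field 4 is addr/prefix.
iface_for_ip() {
    local ip=$1
    awk -v ip="$ip" '
        $3 == "inet" {
            split($4, a, "/")
            if (a[1] == ip) { print $2; exit }
        }'
}

# Look up the address from the trace file and show the MTU of its interface.
if command -v ip >/dev/null 2>&1; then
    dev=$(ip -o -4 addr show | iface_for_ip 169.254.16.188)
    if [ -n "$dev" ] && [ -r "/sys/class/net/${dev}/mtu" ]; then
        echo "${dev} mtu=$(cat "/sys/class/net/${dev}/mtu")"
    else
        echo "no interface with 169.254.16.188 found (or its MTU is not readable)"
    fi
fi
```

The same lookup with 127.0.0.1 should report the loopback device, which is the interface whose MTU the support note changes.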

---Database status
select INST_ID,INSTANCE_NAME,STARTUP_TIME from gv$instance;

--Add to CRT
set lines 1000
alter session set nls_date_format='yyyymmdd hh24:mi:ss';
set time on
select INST_ID,INSTANCE_NAME,STARTUP_TIME from gv$instance;

09:42:29 SYS@test1(test1)> set lines 1000
09:43:30 SYS@test1(test1)> alter session set nls_date_format='yyyymmdd hh24:mi:ss';

Session altered.

09:43:31 SYS@test1(test1)> set time on
09:43:31 SYS@test1(test1)> select INST_ID,INSTANCE_NAME,STARTUP_TIME from gv$instance;

   INST_ID INSTANCE_NAME    STARTUP_TIME
---------- ---------------- -----------------
         1 test1            20190509 15:30:08
         2 test2            20190507 21:37:04

---Node 2 log
Thu Jun 20 03:03:54 2019
Dumping diagnostic data in directory=[cdmp_20190620030354], requested by (instance=1, osid=316751), summary=[incident=560025].
Thu Jun 20 03:10:10 2019
Thread 2 advanced to log sequence 912 (LGWR switch)
Current log# 6 seq# 912 mem# 0: +DATA/test/onlinelog/redo06_1.log
Current log# 6 seq# 912 mem# 1: +DATA/test/onlinelog/redo06_2.log

---Official documentation reference

My Oracle Support (MOS) Doc ID 2041723.1

CAUSE
This happens due to less space available for network buffer reservation.

SOLUTION
1. On servers with High Physical Memory, the parameter vm.min_free_kbytes should be set on the order of 0.4% of total Physical Memory. This helps in keeping a larger range of defragmented memory pages available for network buffers, reducing the probability of a low-buffer-space condition.

*** For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1073742 ***

On NUMA Enabled Systems, the value of vm.min_free_kbytes should be multiplied by the number of NUMA nodes since the value is to be split across all the nodes.

On NUMA Enabled Systems, the value of vm.min_free_kbytes = n * 0.4% of total Physical Memory. Here 'n' is the number of NUMA nodes.

2. Additionally, the MTU value should be modified as below

#ifconfig lo mtu 16436

To make the change persistent over reboot add the following line in the file /etc/sysconfig/network-scripts/ifcfg-lo :

MTU=16436
Save the file and restart the network service to load the changes

#service network restart

Note: When making the changes on CRS nodes, restarting the network while CRS is up can hang CRS, so cluster services should be stopped prior to the network restart.

---Unofficial blog post
http://ju.outofmemory.cn/entry/76102

---Actual steps, run on both nodes

[root@test1 bin]# ifconfig lo mtu 16436
[root@test1 bin]#

[root@test1 bin]# vi /etc/sysconfig/network-scripts/ifcfg-lo
DEVICE=lo
IPADDR=127.0.0.1
NETMASK=255.0.0.0
NETWORK=127.0.0.0
# If you're having problems with gated making 127.0.0.0/8 a martian,
# you can change this to something else (255.255.255.255, for example)
BROADCAST=127.255.255.255
ONBOOT=yes
NAME=loopback
MTU=16436
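
Editing the file by hand on every node is easy to get wrong, so the change can also be scripted. The sketch below sets the MTU line idempotently: it removes any existing `MTU=` entry and appends the desired one. `set_ifcfg_mtu` is a hypothetical helper name, and the demo runs against a scratch file rather than the real /etc/sysconfig/network-scripts/ifcfg-lo.

```shell
#!/usr/bin/env bash
# Ensure an ifcfg-style file contains exactly one MTU=<value> line.
# Usage: set_ifcfg_mtu <file> <mtu>
set_ifcfg_mtu() {
    local file=$1 mtu=$2 tmp
    tmp=$(mktemp) || return 1
    # Copy everything except an existing MTU= line, then append the new one.
    grep -v '^MTU=' "$file" > "$tmp"
    echo "MTU=${mtu}" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch copy (the starting MTU value here is illustrative only).
demo=$(mktemp)
printf 'DEVICE=lo\nONBOOT=yes\nMTU=65536\n' > "$demo"
set_ifcfg_mtu "$demo" 16436
cat "$demo"
rm -f "$demo"
```

To apply it for real, run it as root with the ifcfg-lo path as the first argument. Running it a second time leaves the file unchanged.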

---Restart the network service (per the note above, stop cluster services first)
# systemctl stop network
# systemctl start network

2. Set vm.min_free_kbytes to 0.4% of physical memory

*** For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1073742 ***

---The host being changed this time has 512 GB of RAM, so the value is 1073742*2=2147484

vm.min_free_kbytes=2147484
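
The arithmetic behind these numbers can be checked with a short script. This is a sketch of the rule quoted from the support note (0.4% of physical memory, multiplied by the number of NUMA nodes); `recommended_min_free_kbytes` is a hypothetical helper name, and the two-node case is an illustrative assumption, not a setting taken from this host.

```shell
#!/usr/bin/env bash
# vm.min_free_kbytes = n * 0.4% of physical memory, expressed in kB.
# Usage: recommended_min_free_kbytes <ram_gb> [numa_nodes]   (numa_nodes defaults to 1)
recommended_min_free_kbytes() {
    local ram_gb=$1
    local numa_nodes=${2:-1}
    local total_kb=$(( ram_gb * 1024 * 1024 ))
    # 0.4% is 4/1000; adding 500 before the integer division rounds to the nearest kB.
    echo $(( (total_kb * 4 * numa_nodes + 500) / 1000 ))
}

recommended_min_free_kbytes 256      # prints 1073742, the 256 GB example from the note
recommended_min_free_kbytes 512      # prints 2147484, the value used for this 512 GB host
recommended_min_free_kbytes 512 2    # prints 4294967, hypothetical 512 GB host with 2 NUMA nodes
```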

[oracle@test1 ~]$ cat /proc/sys/vm/min_free_kbytes
65536
[oracle@test1 ~]$

The purpose of tuning vm.min_free_kbytes is to keep enough physical memory free and so prevent sudden bursts of paging.

vi /etc/sysctl.conf
vm.min_free_kbytes=2147484

--Apply the change
sysctl -p
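
After reloading, it is worth confirming that the running kernel value matches what was written to the file. The sketch below compares the two. `sysctl_conf_value` is a hypothetical helper name; it assumes the setting lives in /etc/sysctl.conf as above and that the last matching line wins, since settings are applied in file order.

```shell
#!/usr/bin/env bash
# Print the value configured for a key in sysctl.conf-style text read from stdin.
# Accepts "key=value" and "key = value", and skips comment lines (# or ;).
sysctl_conf_value() {
    local key=$1
    awk -F= -v key="$key" '
        /^[ \t]*[#;]/ { next }
        {
            k = $1; v = $2
            gsub(/[ \t]/, "", k)
            gsub(/[ \t]/, "", v)
            if (k == key) found = v
        }
        END { if (found != "") print found }'
}

# Compare the configured value with the one the kernel is currently using.
if [ -r /etc/sysctl.conf ] && [ -r /proc/sys/vm/min_free_kbytes ]; then
    configured=$(sysctl_conf_value vm.min_free_kbytes < /etc/sysctl.conf)
    live=$(cat /proc/sys/vm/min_free_kbytes)
    echo "configured=${configured:-unset} live=${live}"
fi
```

If the two values differ after `sysctl -p`, the setting may be overridden by another file, for example one under /etc/sysctl.d.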