ORA-00445: background process "J000" did not start after 120 seconds

The customer reported that the database was down:

Checking the alert log:

Mon Dec 30 08:56:01 2019
WARNING: inbound connection timed out (ORA-3136)
Mon Dec 30 08:56:04 2019
Errors in file /u01/app/oracle/diag/rdbms/ecology/ecology/trace/ecology_cjq0_25270.trc (incident=300282):
ORA-00445: background process "J001" did not start after 30 seconds
Incident details in: /u01/app/oracle/diag/rdbms/ecology/ecology/incident/incdir_300282/ecology_cjq0_25270_i300282.trc
Mon Dec 30 08:56:05 2019
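
When an alert log is long, it can help to pull out every ORA-00445 entry along with the process name and timeout. The following is a minimal sketch, not an Oracle-provided tool: the function name `find_spawn_failures` and the `SAMPLE` variable are hypothetical, and the regular expression assumes the message format shown in the excerpt above.

```python
import re

# Matches the ORA-00445 message format seen in the alert log excerpt.
ORA_00445 = re.compile(
    r'ORA-00445: background process "(\w+)" did not start after (\d+) seconds'
)


def find_spawn_failures(log_text):
    """Return (process_name, timeout_seconds) for each ORA-00445 message found."""
    return [(name, int(secs)) for name, secs in ORA_00445.findall(log_text)]


# Lines copied from the alert log excerpt above.
SAMPLE = """\
Mon Dec 30 08:56:01 2019
WARNING: inbound connection timed out (ORA-3136)
Mon Dec 30 08:56:04 2019
ORA-00445: background process "J001" did not start after 30 seconds
"""

print(find_spawn_failures(SAMPLE))  # [('J001', 30)]
```

To scan a real alert log, the file contents can be read into a string and passed to the same function.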

Checking the trace (.trc) file:

/u01/app/oracle/diag/rdbms/ecology/ecology/trace/ecology_cjq0_25270.trc
 
 
*** 2019-12-31 08:49:26.444
Process diagnostic dump for J000, OS id=23742
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 08:49:21 ]
NOTE: scheduling delay has not been sampled for 5.062184 secs
0.000000 secs from [ 08:49:21 - 08:49:26 ], 5 sec avg
0.000000 secs from [ 08:49:21 - 08:49:26 ], 1 min avg
 
*** 2019-12-31 08:49:28.330
0.000000 secs from [ 08:45:08 - 08:49:28 ], 5 min avg
 
*** 2019-12-31 08:49:43.789
loadavg : 153.96 132.74 76.11
Memory (Avail / Total) = 289.81M / 64411.24M
Swap (Avail / Total) = 35820.70M / 64767.98M
skgpgcmdout: read() for cmd /bin/ps -elf | /bin/egrep 'PID | 23742' | /bin/grep -v grep timed out after 13.740 seconds
 
*** 2019-12-31 08:49:56.451
Stack:
skgpgcmdout: read() for cmd /usr/bin/gdb --batch -quiet -x /tmp/stackTcHuSK /proc/23742/exe 23742 < /dev/null 2>&1 timed out after 12.660 seconds
 
-------------------------------------------------------------------------------
Process diagnostic dump actual duration=30.000000 sec
(max dump time=30.000000 sec)
 
*** 2019-12-31 08:49:56.451
Waited for process J000 to initialize for 120 seconds
 
*** 2019-12-31 08:49:56.451
Process diagnostic dump for J000, OS id=23742
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 08:49:21 ]
NOTE: scheduling delay has not been sampled for 35.069379 secs
0.000000 secs from [ 08:49:21 - 08:49:56 ], 5 sec avg
0.000000 secs from [ 08:49:21 - 08:49:56 ], 1 min avg
0.000000 secs from [ 08:45:08 - 08:49:56 ], 5 min avg
 
*** 2019-12-31 08:50:12.312
loadavg : 154.88 134.93 78.63
Memory (Avail / Total) = 288.15M / 64411.24M
Swap (Avail / Total) = 35665.90M / 64767.98M
skgpgcmdout: read() for cmd /bin/ps -elf | /bin/egrep 'PID | 23742' | /bin/grep -v grep timed out after 15.000 seconds
 
*** 2019-12-31 08:50:26.454
Stack:
skgpgcmdout: read() for cmd /usr/bin/gdb --batch -quiet -x /tmp/stackd1W3Ol /proc/23742/exe 23742 < /dev/null 2>&1 timed out after 14.140 seconds
 
-------------------------------------------------------------------------------
Process diagnostic dump actual duration=30.000000 sec
(max dump time=30.000000 sec)
 
*** 2019-12-31 08:50:26.454
 
*** 2019-12-31 08:52:17.853
Killing process (ospid 23742): (reason=KSOREQ_WAIT_CANCELLED error=0)
... and the process is still alive after kill!
 
*** 2019-12-31 08:53:07.555
Incident 713 created, dump file: /u01/app/database/diag/rdbms/feilioa/feilioa_1/incident/incdir_713/feilioa_1_cjq0_1370_i713.trc
ORA-00445: background process "J000" did not start after 120 seconds
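
The figures that matter most in this trace are the `loadavg`, `Memory` and `Swap` lines: the load average is above 150 and less than 1% of physical memory is available. As a rough sketch of how these values could be extracted from a process diagnostic dump, the snippet below parses them with regular expressions. The helper name `parse_resources` and the result layout are hypothetical, and the patterns assume the exact line format shown in the trace above.

```python
import re

# Lines copied from the first process diagnostic dump in the trace.
TRACE = """\
loadavg : 153.96 132.74 76.11
Memory (Avail / Total) = 289.81M / 64411.24M
Swap (Avail / Total) = 35820.70M / 64767.98M
"""


def parse_resources(text):
    """Extract load averages and memory/swap figures (in MB) from dump text."""
    result = {}
    m = re.search(r"loadavg\s*:\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)", text)
    if m:
        result["loadavg"] = tuple(float(x) for x in m.groups())
    for key in ("Memory", "Swap"):
        m = re.search(key + r" \(Avail / Total\) = ([\d.]+)M / ([\d.]+)M", text)
        if m:
            avail, total = float(m.group(1)), float(m.group(2))
            result[key.lower()] = {
                "avail_mb": avail,
                "total_mb": total,
                "avail_pct": 100.0 * avail / total,
            }
    return result


res = parse_resources(TRACE)
print("1-min load average:", res["loadavg"][0])
print("memory available: %.2f%%" % res["memory"]["avail_pct"])
print("swap available:   %.2f%%" % res["swap"]["avail_pct"])
```

With so little free memory and nearly half of swap in use, the `ps` and `gdb` commands that Oracle runs for diagnostics time out as well, which is consistent with the `skgpgcmdout: read() ... timed out` lines in the trace.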

Oracle Support note [ID 1379200.1] describes this error as follows:

What does this message mean?

The message indicates that we failed to spawn a new process at the Operating System level to serve the request. There are various causes for this issue.

This typically occurs when there is a shortage or misconfiguration in Operating System Resources, and thereby the problem should be investigated from an OS perspective. However there are a few causes related to the Oracle Database as well.
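
Following the note's advice to start from the OS side, a quick check of load and available memory on a Linux host might look like the sketch below. It is only an illustration: the function names and both thresholds (`max_load_per_cpu`, `min_avail_pct`) are assumptions chosen for the example, not Oracle recommendations. It reads the standard `/proc/loadavg` and `/proc/meminfo` files and falls back to `MemFree + Buffers + Cached` on older kernels that do not report `MemAvailable`.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of {field: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[name.strip()] = int(parts[0])
    return info


def check_os_resources(loadavg_text, meminfo_text, cpu_count,
                       max_load_per_cpu=2.0, min_avail_pct=5.0):
    """Return a list of warnings; thresholds are illustrative assumptions."""
    warnings = []
    load1 = float(loadavg_text.split()[0])
    if load1 > max_load_per_cpu * cpu_count:
        warnings.append(
            "1-min load average %.2f is high for %d CPUs" % (load1, cpu_count))
    mem = parse_meminfo(meminfo_text)
    # Older kernels lack MemAvailable, so approximate it from other fields.
    fallback = mem.get("MemFree", 0) + mem.get("Buffers", 0) + mem.get("Cached", 0)
    avail_kb = mem.get("MemAvailable", fallback)
    avail_pct = 100.0 * avail_kb / mem["MemTotal"]
    if avail_pct < min_avail_pct:
        warnings.append("only %.2f%% of memory is available" % avail_pct)
    return warnings


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/loadavg") as f:
        loadavg_text = f.read()
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    for w in check_os_resources(loadavg_text, meminfo_text, os.cpu_count() or 1):
        print("WARNING:", w)
```

Keeping the parsing separate from the file reads means the same check can be run against text captured from another host.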