Checking a Server for ECC Errors

This requires the ipmitool utility.

[root@Resource ~]# yum install ipmitool

First, check whether any ECC errors have been recorded in the System Event Log (SEL), as shown below:

[root@Resource ~]# ipmitool sel list
   1 | 11/26/2016 | 05:21:07 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
   2 | 11/26/2016 | 05:30:54 | OS Boot | C: boot completed | Asserted
   3 | 11/26/2016 | 05:30:54 | OEM record dc | 000137 | 00001e395800
   4 | 02/14/2017 | 16:58:06 | Physical Security #0x73 | General Chassis intrusion () | Asserted
   5 | 02/14/2017 | 16:58:11 | Physical Security #0x73 | General Chassis intrusion () | Deasserted
   6 | 02/14/2017 | 16:58:15 | Power Supply #0x74 | Redundancy Lost | Asserted
   7 | 02/14/2017 | 17:24:43 | Physical Security #0x73 | General Chassis intrusion () | Asserted
   8 | 02/14/2017 | 17:29:56 | Physical Security #0x73 | General Chassis intrusion () | Asserted
   9 | 02/14/2017 | 17:40:14 | Physical Security #0x73 | General Chassis intrusion () | Asserted
   a | 02/14/2017 | 17:40:40 | Unknown #0x2e |  | Asserted
   b | 02/14/2017 | 17:40:40 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMB3) | Asserted
   c | 02/14/2017 | 17:40:40 | Unknown #0x2e |  | Asserted
   d | 02/14/2017 | 17:40:40 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMB3) | Asserted
   e | 02/14/2017 | 17:42:26 | Physical Security #0x73 | General Chassis intrusion () | Asserted
   f | 02/14/2017 | 17:42:56 | Unknown #0x2e |  | Asserted
  10 | 02/14/2017 | 17:42:56 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMB3) | Asserted
  11 | 02/14/2017 | 17:42:56 | Unknown #0x2e |  | Asserted
  12 | 02/14/2017 | 17:42:56 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMB3) | Asserted
  13 | 02/14/2017 | 17:44:24 | Physical Security #0x73 | General Chassis intrusion () | Asserted
  14 | 02/14/2017 | 17:44:49 | Unknown #0x2e |  | Asserted
  15 | 02/14/2017 | 17:44:49 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMB3) | Asserted
  16 | 02/14/2017 | 17:44:49 | Unknown #0x2e |  | Asserted
  17 | 02/14/2017 | 17:44:49 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMB3) | Asserted
  18 | 02/14/2017 | 17:48:39 | Physical Security #0x73 | General Chassis intrusion () | Asserted
  19 | 02/15/2017 | 11:37:24 | Physical Security #0x73 | General Chassis intrusion () | Asserted
  1a | 02/15/2017 | 11:37:29 | Physical Security #0x73 | General Chassis intrusion () | Deasserted
  1b | 02/16/2017 | 01:41:13 | Drive Slot #0xa1 | Drive Present () | Deasserted
  1c | 02/16/2017 | 01:41:14 | Drive Slot #0xa3 | Drive Present () | Deasserted
  1d | 02/16/2017 | 01:41:15 | Drive Slot #0xa2 | Drive Present () | Deasserted
  1e | 02/16/2017 | 04:23:43 | Drive Slot #0xa1 | Drive Present () | Asserted
  1f | 02/16/2017 | 04:23:43 | Drive Slot #0xa3 | Drive Present () | Asserted
  20 | 02/16/2017 | 04:23:45 | Drive Slot #0xa0 | Drive Present () | Deasserted
  21 | 02/16/2017 | 04:23:45 | Drive Slot #0xa2 | Drive Present () | Asserted
  22 | 02/16/2017 | 04:25:49 | Drive Slot #0xa0 | Drive Present () | Asserted
  23 | 07/10/2017 | 07:27:14 | Temperature #0x04 | Upper Non-critical going high | Asserted
  24 | 07/10/2017 | 10:00:12 | Temperature #0x04 | Upper Non-critical going high | Deasserted
  25 | 07/10/2017 | 10:01:37 | Temperature #0x04 | Upper Non-critical going high | Asserted
  26 | 07/10/2017 | 10:26:07 | Temperature #0x04 | Upper Non-critical going high | Deasserted
  27 | 11/09/2017 | 06:09:42 | Physical Security #0x73 | General Chassis intrusion () | Asserted
  28 | 11/09/2017 | 06:12:32 | Physical Security #0x73 | General Chassis intrusion () | Deasserted
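
In a long log the ECC entries are easy to miss among the chassis-intrusion and drive events. The following is a minimal sketch, not part of the original procedure, that filters the listing and counts ECC events per DIMM label. The function names `filter_ecc` and `count_ecc_by_dimm` are hypothetical, and the `DIMM...` label pattern is an assumption taken from the sample output above; other vendors may label slots differently.

```shell
# Keep only SEL lines that mention ECC (case-insensitive). Reads stdin.
filter_ecc() {
    grep -i 'ecc'
}

# Count ECC events per DIMM label. Reads stdin.
# Assumes the label looks like "DIMMB3", as in the sample log.
count_ecc_by_dimm() {
    grep -i 'ecc' \
        | sed -n 's/.*\(DIMM[A-Za-z0-9_]*\).*/\1/p' \
        | sort | uniq -c
}

# Typical use on a live system (requires ipmitool and BMC access):
#   ipmitool sel list | filter_ecc
#   ipmitool sel list | count_ecc_by_dimm
```

A count that keeps growing for a single label is a reasonable hint that one specific module is failing rather than the whole memory channel.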

Save the event log to a text file, as follows:

[root@Resource ~]# ipmitool sel save SN12345.txt
1 | 11/26/2016 | 05:21:07 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
2 | 11/26/2016 | 05:30:54 | OS Boot | C: boot completed | Asserted
3 | 11/26/2016 | 05:30:54 | OEM record dc | 000137 | 00001e395800
4 | 02/14/2017 | 16:58:06 | Physical Security #0x73 | General Chassis intrusion () | Asserted
5 | 02/14/2017 | 16:58:11 | Physical Security #0x73 | General Chassis intrusion () | Deasserted
6 | 02/14/2017 | 16:58:15 | Power Supply #0x74 | Redundancy Lost | Asserted
7 | 02/14/2017 | 17:24:43 | Physical Security #0x73 | General Chassis intrusion () | Asserted
8 | 02/14/2017 | 17:29:56 | Physical Security #0x73 | General Chassis intrusion () | Asserted
9 | 02/14/2017 | 17:40:14 | Physical Security #0x73 | General Chassis intrusion () | Asserted
a | 02/14/2017 | 17:40:40 | Unknown #0x2e | | Asserted
b | 02/14/2017 | 17:40:40 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMB3) | Asserted
c | 02/14/2017 | 17:40:40 | Unknown #0x2e | | Asserted
d | 02/14/2017 | 17:40:40 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMB3) | Asserted
e | 02/14/2017 | 17:42:26 | Physical Security #0x73 | General Chassis intrusion () | Asserted
f | 02/14/2017 | 17:42:56 | Unknown #0x2e | | Asserted
10 | 02/14/2017 | 17:42:56 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMB3) | Asserted
11 | 02/14/2017 | 17:42:56 | Unknown #0x2e | | Asserted
12 | 02/14/2017 | 17:42:56 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMB3) | Asserted
13 | 02/14/2017 | 17:44:24 | Physical Security #0x73 | General Chassis intrusion () | Asserted
14 | 02/14/2017 | 17:44:49 | Unknown #0x2e | | Asserted
15 | 02/14/2017 | 17:44:49 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMB3) | Asserted
16 | 02/14/2017 | 17:44:49 | Unknown #0x2e | | Asserted
17 | 02/14/2017 | 17:44:49 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC | DIMMB3) | Asserted
18 | 02/14/2017 | 17:48:39 | Physical Security #0x73 | General Chassis intrusion () | Asserted
19 | 02/15/2017 | 11:37:24 | Physical Security #0x73 | General Chassis intrusion () | Asserted
1a | 02/15/2017 | 11:37:29 | Physical Security #0x73 | General Chassis intrusion () | Deasserted
1b | 02/16/2017 | 01:41:13 | Drive Slot #0xa1 | Drive Present () | Deasserted
1c | 02/16/2017 | 01:41:14 | Drive Slot #0xa3 | Drive Present () | Deasserted
1d | 02/16/2017 | 01:41:15 | Drive Slot #0xa2 | Drive Present () | Deasserted
1e | 02/16/2017 | 04:23:43 | Drive Slot #0xa1 | Drive Present () | Asserted
1f | 02/16/2017 | 04:23:43 | Drive Slot #0xa3 | Drive Present () | Asserted
20 | 02/16/2017 | 04:23:45 | Drive Slot #0xa0 | Drive Present () | Deasserted
21 | 02/16/2017 | 04:23:45 | Drive Slot #0xa2 | Drive Present () | Asserted
22 | 02/16/2017 | 04:25:49 | Drive Slot #0xa0 | Drive Present () | Asserted
23 | 07/10/2017 | 07:27:14 | Temperature #0x04 | Upper Non-critical going high | Asserted
24 | 07/10/2017 | 10:00:12 | Temperature #0x04 | Upper Non-critical going high | Deasserted
25 | 07/10/2017 | 10:01:37 | Temperature #0x04 | Upper Non-critical going high | Asserted
26 | 07/10/2017 | 10:26:07 | Temperature #0x04 | Upper Non-critical going high | Deasserted
27 | 11/09/2017 | 06:09:42 | Physical Security #0x73 | General Chassis intrusion () | Asserted
28 | 11/09/2017 | 06:12:32 | Physical Security #0x73 | General Chassis intrusion () | Deasserted
[root@Resource ~]# ll
total 64
-rw-r--r--. 1 root root 3220 Nov 17 14:32 SN12345.txt

View the contents of the saved file:

[root@Resource ~]# cat SN12345.txt 
0x04 0x10 0x72 0x6f 0x02 0xff 0xff # Event Logging Disabled #0x72 Log area reset/cleared
0x04 0x1f 0x00 0x6f 0x01 0xff 0xff # OS Boot #0x00 C: boot completed
0x37 0x00 0x00 0x1e 0x39 0x58 0x00 # Reserved #0x00 Unknown
0x04 0x05 0x73 0x6f 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0x05 0x73 0xef 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0x08 0x74 0x0b 0x01 0xff 0xff # Power Supply #0x74 Redundancy Lost
0x04 0x05 0x73 0x6f 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0x05 0x73 0x6f 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0x05 0x73 0x6f 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0xc1 0x2e 0x72 0xa2 0x04 0x00 # Unknown #0x2e Unknown
0x04 0x0c 0x02 0x6f 0xa1 0xc1 0x40 # Memory #0x02 Uncorrectable ECC (UnCorrectable ECC | DIMMB3)
0x04 0xc1 0x2e 0x72 0xa2 0x04 0x00 # Unknown #0x2e Unknown
0x04 0x0c 0x02 0x6f 0xa1 0xc1 0x40 # Memory #0x02 Uncorrectable ECC (UnCorrectable ECC | DIMMB3)
0x04 0x05 0x73 0x6f 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0xc1 0x2e 0x72 0xa2 0x04 0x00 # Unknown #0x2e Unknown
0x04 0x0c 0x02 0x6f 0xa1 0xc1 0x40 # Memory #0x02 Uncorrectable ECC (UnCorrectable ECC | DIMMB3)
0x04 0xc1 0x2e 0x72 0xa2 0x04 0x00 # Unknown #0x2e Unknown
0x04 0x0c 0x02 0x6f 0xa1 0xc1 0x40 # Memory #0x02 Uncorrectable ECC (UnCorrectable ECC | DIMMB3)
0x04 0x05 0x73 0x6f 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0xc1 0x2e 0x72 0xa2 0x04 0x00 # Unknown #0x2e Unknown
0x04 0x0c 0x02 0x6f 0xa1 0xc1 0x40 # Memory #0x02 Uncorrectable ECC (UnCorrectable ECC | DIMMB3)
0x04 0xc1 0x2e 0x72 0xa2 0x04 0x00 # Unknown #0x2e Unknown
0x04 0x0c 0x02 0x6f 0xa1 0xc1 0x40 # Memory #0x02 Uncorrectable ECC (UnCorrectable ECC | DIMMB3)
0x04 0x05 0x73 0x6f 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0x05 0x73 0x6f 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0x05 0x73 0xef 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0x0d 0xa1 0xef 0xa0 0x01 0x01 # Drive Slot #0xa1 Drive Present ()
0x04 0x0d 0xa3 0xef 0xa0 0x01 0x03 # Drive Slot #0xa3 Drive Present ()
0x04 0x0d 0xa2 0xef 0xa0 0x01 0x02 # Drive Slot #0xa2 Drive Present ()
0x04 0x0d 0xa1 0x6f 0xa0 0x01 0x01 # Drive Slot #0xa1 Drive Present ()
0x04 0x0d 0xa3 0x6f 0xa0 0x01 0x03 # Drive Slot #0xa3 Drive Present ()
0x04 0x0d 0xa0 0xef 0xa0 0x01 0x00 # Drive Slot #0xa0 Drive Present ()
0x04 0x0d 0xa2 0x6f 0xa0 0x01 0x02 # Drive Slot #0xa2 Drive Present ()
0x04 0x0d 0xa0 0x6f 0xa0 0x01 0x00 # Drive Slot #0xa0 Drive Present ()
0x04 0x01 0x04 0x01 0x57 0xaa 0xaa # Temperature #0x04 Upper Non-critical going high
0x04 0x01 0x04 0x81 0x57 0xa7 0xaa # Temperature #0x04 Upper Non-critical going high
0x04 0x01 0x04 0x01 0x57 0xaa 0xaa # Temperature #0x04 Upper Non-critical going high
0x04 0x01 0x04 0x81 0x57 0xa7 0xaa # Temperature #0x04 Upper Non-critical going high
0x04 0x05 0x73 0x6f 0x80 0x02 0xff # Physical Security #0x73 General Chassis intrusion ()
0x04 0x05 0x73 0xef 0x80 0x01 0xff # Physical Security #0x73 General Chassis intrusion ()
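
In the saved file, the second byte of each record is the IPMI sensor type, and `0x0c` is the code for Memory. As a rough sketch (the helper name `memory_events` is hypothetical), the memory records can be picked out by that column instead of reading the whole file:

```shell
# Print only the records whose sensor-type byte (second column) is 0x0c (Memory).
# $1 is the path of a file written by `ipmitool sel save`.
memory_events() {
    awk '$2 == "0x0c"' "$1"
}

# Typical use:
#   memory_events SN12345.txt
```

Matching on the byte rather than on the word "ECC" also catches memory events whose description text ipmitool could not decode.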

Find the ECC error entries (highlighted in red in the original post) and look at the hex codes in front of them:

0x04 0x0c 0x02 0x6f 0xa1 0xc1 0x40 # Memory #0x02 Uncorrectable ECC (UnCorrectable ECC | DIMMB3)

Use these hexadecimal codes to locate the faulty memory module, then replace it.

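For the exact slot location, follow the technical documentation provided by the server vendor, because the layout differs from model to model.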
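
The generic part of the record can still be decoded without vendor documentation. The sketch below assumes the seven bytes follow the layout ipmitool uses for its event files: event message revision, sensor type, sensor number, event direction and type, then three event data bytes. The function name `decode_sel_line` is hypothetical. The meaning of the last two data bytes for a DIMM is vendor-specific, so the sketch only prints them.

```shell
# Decode one line of an `ipmitool sel save` file into named fields.
decode_sel_line() {
    # Drop the trailing "# description" comment, then split the seven bytes.
    set -- ${1%%#*}
    evm_rev=$1 sensor_type=$2 sensor_num=$3 type_dir=$4
    data1=$5 data2=$6 data3=$7

    # Bit 7 of the fourth byte is the direction (1 = deassertion);
    # the low 7 bits are the event/reading type code.
    if [ $(( $type_dir & 0x80 )) -ne 0 ]; then
        dir=Deasserted
    else
        dir=Asserted
    fi
    event_type=$(( $type_dir & 0x7f ))

    # For sensor-specific events (type 0x6f) the low nibble of data1 is the
    # event offset; for Memory sensors, offset 0x01 is Uncorrectable ECC.
    offset=$(( $data1 & 0x0f ))

    printf 'sensor_type=%s sensor_num=%s event_type=0x%02x dir=%s offset=0x%02x data2=%s data3=%s\n' \
        "$sensor_type" "$sensor_num" "$event_type" "$dir" "$offset" "$data2" "$data3"
}

# Typical use:
#   decode_sel_line '0x04 0x0c 0x02 0x6f 0xa1 0xc1 0x40 # Memory #0x02 Uncorrectable ECC'
```

For the sample record, the high nibble of `0xa1` indicates that both `0xc1` and `0x40` carry OEM codes, which is why the vendor documentation is needed to translate them into a physical slot.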

Original article: https://www.cnblogs.com/syavingcs/p/7851526.html