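Troubleshooting PCIe SSD Issues

I. Preface

1. Background

In high-performance computing scenarios we often use high-performance SSDs, such as PCIe SSDs, as a cache or acceleration tier. This article records the problems we ran into when using PCIe SSDs as Ceph OSDs, and how we resolved each one.

2. Hardware

2.1. Shannon Direct-IO G3i 1600GB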

[root@node113 redhat7]# shannon-status -a
Found Shannon PCIE Flash card /dev/scta:

Basic Information:
Control Device Node:        /dev/scta
Driver Mode:                Block
Block Device Node:            /dev/dfa
Device State:                Attached
Access Mode:                ReadWrite
Product Model:                Direct-IO G3i 1600GB
Serial Number:                SH17705K7320343
Part Number:                MT29F512G08CMCCB
UDID:                               1CB00275-1CB00032-AB17705E-73203430
PCI VendorID:                1CB0
PCI DeviceID:                0275
PCI Bus Address:            04:00:0
PCI Link Speed:                pcie 2.0 x 8 
Firmware Version:            3.3
Firmware Build:                3688e1ff
Driver Version:                3.2.2.10
FPGA Reconfig Support:              Yes
Logical Sector:                512
Physical Sector:            4096
Disk Capacity:                1600.00 GB
Physical Capacity:            2115.52 GB
Overprovision:                24.37%
Max Write Band                0 MB/s
Atomic Write:                       Disabled
Prioritize Write:                   Disabled

2.2. Environment

[root@node113 ~]# cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core) 
[root@node113 ~]# uname -a
Linux node113 4.14.113 #1 SMP Thu Jul 30 14:55:45 CST 2020 x86_64 x86_64 x86_64 GNU/Linux

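3. Troubleshooting

3.1. Shannon Direct-IO G3i 1600GB

3.1.1. The system does not detect the PCIe SSD

  • Problem:
    After a Shannon Systems PCIe SSD was installed in the server, the operating system did not detect the drive.
  • Cause:
    Shannon PCIe SSDs require the shannon-module driver, which is tightly coupled to the Linux kernel version. For RHEL 7 the vendor only ships prebuilt driver RPMs for the stock 3.10.x kernel, while the test environment runs kernel 4.14.113, so the driver RPM has to be rebuilt from the source RPM.
  • Fix: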

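1. Download the Linux driver package Shannon_Linux_Driver_Package_3.2.2.10.
2. Following the user manual, rebuild the RPM from the source package, install it, and load the kernel module: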

[root@node119 ~]# tar -zxvf Shannon_Linux_Driver_Package_3.2.2.10.tar.gz 
[root@node119 ~]# cd Shannon_Linux_Driver_Package_3.2.2.10
[root@node119 Shannon_Linux_Driver_Package_3.2.2.10]# cd redhat7/
[root@node119 redhat7]# rpmbuild --rebuild shannon-module-3.2.2-10.src.rpm 
Wrote: /root/rpmbuild/RPMS/x86_64/shannon-module-4.14.113-3.2.2-10.x86_64.rpm
[root@node112 redhat7]# cd /root/rpmbuild/RPMS/x86_64/
[root@node112 x86_64]# rpm -ivh shannon-module-4.14.113-3.2.2-10.x86_64.rpm 
[root@node112 x86_64]# modprobe shannon
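Because the driver is tied to the kernel version, a quick sanity check after installing the rebuilt RPM is whether the module's vermagic matches `uname -r`. The sketch below is illustrative only: the function names are hypothetical, and it assumes the module is named shannon, as in the modprobe call above.

```shell
#!/bin/bash
# Sketch: confirm the installed shannon module was built for the running
# kernel. The first field of a module's vermagic string is the kernel
# release it was compiled against.

# module_matches_kernel VERMAGIC RELEASE (hypothetical helper)
# Returns 0 if the module was built for RELEASE, 1 otherwise.
module_matches_kernel() {
    local vermagic="$1" release="$2"
    [ "${vermagic%% *}" = "$release" ]   # compare text before the first space
}

# check_shannon_module (hypothetical helper): compare with `uname -r`.
check_shannon_module() {
    local release vermagic
    release="$(uname -r)"
    # modinfo looks in /lib/modules/<running release>, so it fails when
    # no shannon module was installed for this kernel at all.
    if ! vermagic="$(/sbin/modinfo -F vermagic shannon 2>/dev/null)"; then
        echo "no shannon module installed for kernel $release" >&2
        return 2
    fi
    if module_matches_kernel "$vermagic" "$release"; then
        echo "shannon module matches kernel $release"
    else
        echo "shannon module built for ${vermagic%% *}, running $release" >&2
        return 1
    fi
}
```

On a node like the one above, running `check_shannon_module` after `rpm -ivh` should report a match for 4.14.113; a "no shannon module installed" message means the RPM was built or installed for a different kernel.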

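3.1.2. LVM cannot be created on the PCIe SSD, so adding the OSD fails

  • Problem:
    When the PCIe SSD was used for an OSD, creating the LVM volumes failed, so the OSD could not be added.
  • Cause:
    Creating a PV on the PCIe SSD by hand also fails; the error below reports that /dev/dfa cannot be found. LVM only accepts block devices whose device type it recognizes. The shannon driver registers its own device type, which is not on LVM's built-in list, so /dev/dfa is filtered out.
[root@node111 ~]# pvcreate /dev/dfa
Device /dev/dfa not found (or ignored by filtering).
  • Fix: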

1. Look up the device type name and major number that the shannon driver registered in /proc/devices:

[root@node111 ~]# cat /proc/devices | grep shannon
252 shannon

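2. Edit /etc/lvm/lvm.conf and add the shannon device type to devices/types. The name must match the entry in /proc/devices. According to the comments in lvm.conf, the number after the name is the maximum number of partitions for that device type rather than its major number. The value 252 below is what was used here, but 64 would match the minors=64 that the driver reports in its kernel log (see 3.1.3).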

[root@node111 ~]# cat /etc/lvm/lvm.conf | grep types
    # Configuration option devices/types.
    # List of additional acceptable block device types.
    types = [ "shannon", 252 ]
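Step 1 matters because the name in types has to be the block device type the driver registers in /proc/devices. The sketch below, using a hypothetical function name, reads only the "Block devices:" section of a /proc/devices-format file and prints the major number registered for a given name, so a character device with the same name is not picked up by mistake.

```shell
#!/bin/bash
# find_block_major FILE NAME (hypothetical helper)
# Print the major number registered for NAME in the "Block devices:"
# section of a /proc/devices-format FILE. Returns 1 if NAME is absent.
find_block_major() {
    local file="$1" name="$2"
    awk -v name="$name" '
        /^Block devices:/     { in_block = 1; next }   # start of block section
        /^Character devices:/ { in_block = 0; next }   # char section: ignore
        in_block && $2 == name { print $1; found = 1; exit }
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

Running `find_block_major /proc/devices shannon` on the node above should print 252, matching the grep output in step 1; a non-zero exit status means the driver is not loaded or registers under a different name.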

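3.1.3. After a reboot, the OSD on the PCIe SSD does not start automatically

  • Problem:
    After the PCIe SSD was added to the Ceph cluster as an OSD and the node was rebooted, the OSDs on ordinary disks started normally, but the OSD on the PCIe SSD failed to start.
  • Cause:

1. After the node reboots, the OSD can be started by hand with systemctl restart ceph-volume@lvm-{osd-id}-{osd-fsid}, where the OSD id and fsid come from ceph-volume lvm list, as shown below. This pointed to a startup-ordering problem between the OSD and the shannon module: when the OSD starts, the shannon driver has not been loaded yet, so the disk cannot be found and the OSD fails to start.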

[root@node111 ~]# ceph-volume lvm list
====== osd.41 ======
  [block]    /dev/ceph-5a07b4d3-cc9a-4d4c-a29b-877c3b5d875e/osd-block-ad380cf6-774f-4f36-8328-f5f388b9740f



      type                      block
      osd id                    41
      cluster fsid              469729e5-af75-4c18-a58e-28ebe3690e4c
      cluster name              ceph
      osd fsid                  ad380cf6-774f-4f36-8328-f5f388b9740f
      encrypted                 0
      cephx lockbox secret      
      block uuid                Mzoqvr-StiC-8FeI-qUrl-KRBB-onKM-tf7x9l
      block device              /dev/ceph-5a07b4d3-cc9a-4d4c-a29b-877c3b5d875e/osd-block-ad380cf6-774f-4f36-8328-f5f388b9740f
      vdo                       0
      crush device class        None
      devices                   /dev/dfa13
[root@node111 ~]# systemctl restart ceph-volume@lvm-41-ad380cf6-774f-4f36-8328-f5f388b9740f

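2. The boot messages in /var/log/messages show the OSD daemons being started at 09:35:28, while the shannon driver only attaches /dev/dfa at 09:35:44. The OSDs start before the driver is loaded, which confirms the hypothesis in step 1: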

Oct 13 09:35:28 node111 systemd: Started LSB: Starts and stops the generic storage target daemon.
Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.18.
Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.1.
Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.36.
Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.24.
Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.12.
Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.15.
Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.6.


Oct 13 09:35:44 node111 kernel: <3>shn_info: scta: readwrite, readonly_reason= 0, reduced_write_reason= 0.
Oct 13 09:35:44 node111 kernel: shn_dbg: shannon_init_gendisk(): disk_name=dfa, major=252, minors=64, first_minor=0.
Oct 13 09:35:44 node111 kernel: shn_dbg: shannon_init_gendisk(): disk_name=dfa, major=252, minors=64, first_minor=0.
Oct 13 09:35:44 node111 kernel: shn_dbg: shannon_init_gendisk(): disk_name=dfa, major=252, minors=64, first_minor=0.
Oct 13 09:35:44 node111 kernel: dfa: dfa1 dfa2 dfa3 dfa4 dfa5 dfa6 dfa7 dfa8 dfa9 dfa10 dfa11 dfa12 dfa13
Oct 13 09:35:44 node111 kernel: dfa: dfa1 dfa2 dfa3 dfa4 dfa5 dfa6 dfa7 dfa8 dfa9 dfa10 dfa11 dfa12 dfa13
Oct 13 09:35:44 node111 kernel: <3>shn_info: Attached Direct-IO PCIe Flash /dev/scta as block device /dev/dfa:
Oct 13 09:35:44 node111 kernel: <3>shn_info: Attached Direct-IO PCIe Flash /dev/scta as block device /dev/dfa:
Oct 13 09:35:44 node111 kernel: <3>shn_info: sector size: logical 512 / physical 4096, capacity: 1600 GB, overprovision: 24.37%.
Oct 13 09:35:44 node111 kernel: <3>shn_info: sector size: logical 512 / physical 4096, capacity: 1600 GB, overprovision: 24.37%.
Oct 13 09:35:44 node111 kernel: <3>shn_info: scta: readwrite, readonly_reason= 0, reduced_write_reason= 0.
Oct 13 09:35:44 node111 kernel: <3>shn_info: scta: readwrite, readonly_reason= 0, reduced_write_reason= 0.
Oct 13 09:35:44 node111 kernel: <3>shn_info: Probed Direct-IO PCIe Flash /dev/scta: model: Direct-IO G3i 1600GB, sn: SH17705K7320327
Oct 13 09:35:44 node111 kernel: <3>shn_info: Probed Direct-IO PCIe Flash /dev/scta: model: Direct-IO G3i 1600GB, sn: SH17705K7320327
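  • Fix:
    1. Add the following script under /etc/sysconfig/modules/ and make it executable. It runs as part of kernel module loading at boot, so the shannon module is loaded first and the OSD services start afterwards: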
[root@node111 ~]# cat /etc/sysconfig/modules/shannon.modules 
#!/bin/bash
/sbin/modinfo -F filename shannon > /dev/null 2>&1

if [ $? -eq 0 ]; then
    /sbin/modprobe shannon
fi
[root@node111 ~]# ll /etc/sysconfig/modules/
total 4
-rwxr-xr-x 1 root root 116 Oct 13 10:54 shannon.modules
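The diagnosis in this section can be scripted with two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the first parses the `ceph-volume lvm list` text layout shown above to print the `ceph-volume@lvm-<osd id>-<osd fsid>` unit names used for the manual restart, and the second assumes its log file holds messages from a single boot and simply compares which of the two log lines appears first.

```shell
#!/bin/bash
# osd_units_from_lvm_list (hypothetical helper): read `ceph-volume lvm list`
# output on stdin and print one ceph-volume@lvm-<osd id>-<osd fsid> unit
# name per OSD. An OSD with several LVs (block, db, wal) is printed once.
osd_units_from_lvm_list() {
    awk '
        $1 == "osd" && $2 == "id"   { id = $3 }
        $1 == "osd" && $2 == "fsid" && !seen[id "-" $3]++ {
            print "ceph-volume@lvm-" id "-" $3
        }
    '
}

# boot_order_check LOGFILE (hypothetical helper): report whether the first
# "Started Ceph object storage daemon" line comes before the first shannon
# "Attached Direct-IO PCIe Flash" line. Assumes a single boot per file.
boot_order_check() {
    local log="$1" osd_line drv_line
    osd_line="$(grep -n -m1 'Started Ceph object storage daemon' "$log" | cut -d: -f1)"
    drv_line="$(grep -n -m1 'Attached Direct-IO PCIe Flash' "$log" | cut -d: -f1)"
    if [ -z "$osd_line" ] || [ -z "$drv_line" ]; then
        echo "pattern not found" >&2
        return 2
    fi
    if [ "$osd_line" -lt "$drv_line" ]; then
        echo "osd-before-driver"
    else
        echo "driver-before-osd"
    fi
}
```

For example, `ceph-volume lvm list | osd_units_from_lvm_list` lists the units to restart by hand. `boot_order_check` can be pointed at an excerpt of /var/log/messages covering one boot. With the fix in place, it should report driver-before-osd.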
Original article: https://www.cnblogs.com/luxf0/p/13900470.html