Installing Rook Ceph 1.5 on Kubernetes

1. Introduction to Rook


Rook is a self-managing distributed storage orchestration system that provides a convenient storage solution for Kubernetes. Rook does not provide storage itself; instead it acts as an adaptation layer between Kubernetes and the storage system, simplifying deployment and maintenance. The storage systems Rook currently supports include Ceph, CockroachDB, Cassandra, EdgeFS, Minio, and NFS; Ceph is at Stable status while the rest are Alpha. This article covers only Ceph.

Rook consists of two parts, the Operator and the Cluster:

  • Operator: built from a set of CRDs and an all-in-one image, it contains all the functionality for starting and monitoring the storage system.
  • Cluster: responsible for creating the CRD objects and specifying the relevant parameters, including the Ceph image, metadata persistence location, disk locations, dashboard, and so on.

Rook's architecture works as follows. After the Operator starts, it first creates the Agent and Discover containers, which monitor and manage the storage resources on each node. It then creates the Cluster, a CRD defined when the Operator was deployed, and starts the Ceph containers according to the Cluster's configuration. Once the storage cluster is running, Kubernetes primitives are used to create PVCs for application containers.

1.1 System Requirements

Environment used for this installation:

  • kubernetes 1.16
  • centos7.7
  • kernel 5.4.65-200.el7.x86_64
  • flannel v0.12.0
The lvm2 package is required:
 yum install -y lvm2

1.2 Kernel Requirements

RBD

Most distribution kernels ship with the RBD module compiled in, but it is best to confirm:

foxchan@~$ lsmod|grep rbd
rbd                   114688  0 
libceph               368640  1 rbd

You can add it to the modules loaded at boot with the following command:

cat > /etc/sysconfig/modules/rbd.modules << EOF
modprobe rbd
EOF
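The heredoc above only creates the file; as a small follow-up sketch (assuming the CentOS 7 convention that files under /etc/sysconfig/modules/ must be executable to be picked up at boot), make it executable and load the module right away:

chmod +x /etc/sysconfig/modules/rbd.modules
bash /etc/sysconfig/modules/rbd.modules   # load the module now, without rebooting
lsmod | grep rbd                          # confirm rbd and libceph are loaded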

CephFS

If you want to use CephFS, the minimum required kernel version is 4.17.
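A quick check sketch (the CephFS kernel client module is simply named ceph):

uname -r              # should report 4.17 or newer for CephFS quota support
modprobe ceph         # load the CephFS kernel client
lsmod | grep ceph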

2. Rook Deployment

2.1 Environment

[root@k8s-master ceph]# kubectl get node
NAME         STATUS     ROLES    AGE   VERSION
k8s-master   NotReady   master   92m   v1.16.2
k8s-node1    Ready      <none>   92m   v1.16.2
k8s-node2    Ready      <none>   90m   v1.16.2
[root@k8s-node1 ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   20G  0 disk
├─sda1            8:1    0  200M  0 part /boot
└─sda2            8:2    0 19.8G  0 part
  ├─centos-root 253:0    0 15.8G  0 lvm  /
  └─centos-swap 253:1    0    4G  0 lvm
sdb               8:16   0   20G  0 disk
sr0              11:0    1 10.3G  0 rom

On each node, sdb is used as the Ceph data disk.

2.2 Clone the Project

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git

2.3 Deploy the Rook Operator

Pull the images

Because of domestic network restrictions, the images may fail to pull during deployment; it is recommended to pull the following images in advance:

docker pull ceph/ceph:v15.2.5
docker pull rook/ceph:v1.5.1
docker pull registry.aliyuncs.com/it00021hot/cephcsi:v3.1.2
docker pull registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.0.1
docker pull registry.aliyuncs.com/it00021hot/csi-attacher:v3.0.0
docker pull registry.aliyuncs.com/it00021hot/csi-provisioner:v2.0.0
docker pull registry.aliyuncs.com/it00021hot/csi-snapshotter:v3.0.0
docker pull registry.aliyuncs.com/it00021hot/csi-resizer:v1.0.0

docker tag registry.aliyuncs.com/it00021hot/csi-snapshotter:v3.0.0 k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.0
docker tag registry.aliyuncs.com/it00021hot/csi-resizer:v1.0.0 k8s.gcr.io/sig-storage/csi-resizer:v1.0.0
docker tag registry.aliyuncs.com/it00021hot/cephcsi:v3.1.2 quay.io/cephcsi/cephcsi:v3.1.2
docker tag registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.0.1 k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.1
docker tag registry.aliyuncs.com/it00021hot/csi-attacher:v3.0.0 k8s.gcr.io/sig-storage/csi-attacher:v3.0.0
docker tag registry.aliyuncs.com/it00021hot/csi-provisioner:v2.0.0 k8s.gcr.io/sig-storage/csi-provisioner:v2.0.0


You can retag these images and push them to a private registry, or retag them to the names referenced in the YAML files; a local private registry is recommended.
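As a sketch, assuming the private registry at 10.2.55.8:5000 that the YAML below points to, retagging and pushing one image would look like this (repeat for the other CSI images):

docker tag registry.aliyuncs.com/it00021hot/cephcsi:v3.1.2 10.2.55.8:5000/kubernetes/cephcsi:v3.1.2
docker push 10.2.55.8:5000/kubernetes/cephcsi:v3.1.2
docker tag ceph/ceph:v15.2.5 10.2.55.8:5000/kubernetes/ceph:v15.2.5
docker push 10.2.55.8:5000/kubernetes/ceph:v15.2.5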

Modify the image names in operator.yaml to point to the private registry:

  ROOK_CSI_CEPH_IMAGE: "10.2.55.8:5000/kubernetes/cephcsi:v3.1.2"
  ROOK_CSI_REGISTRAR_IMAGE: "10.2.55.8:5000/kubernetes/csi-node-driver-registrar:v2.0.1"
  ROOK_CSI_RESIZER_IMAGE: "10.2.55.8:5000/kubernetes/csi-resizer:v1.0.0"
  ROOK_CSI_PROVISIONER_IMAGE: "10.2.55.8:5000/kubernetes/csi-provisioner:v2.0.0"
  ROOK_CSI_SNAPSHOTTER_IMAGE: "10.2.55.8:5000/kubernetes/csi-snapshotter:v3.0.0"
  ROOK_CSI_ATTACHER_IMAGE: "10.2.55.8:5000/kubernetes/csi-attacher:v3.0.0"

ROOK_CSI_KUBELET_DIR_PATH: "/data/k8s/kubelet" ### If you previously changed the kubelet data directory, update this value accordingly

Apply operator.yaml:

cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml 
kubectl create -f operator.yaml
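Before moving on, it is worth confirming that the operator is up (a simple check, not part of the original steps):

kubectl -n rook-ceph get pods
# Expect the rook-ceph-operator pod to be Running (plus rook-discover pods if the
# discovery daemon is enabled) before creating the cluster.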

2.4 Configure the Cluster

The contents of cluster.yaml must be modified to match your own hardware. Read the comments in the configuration file carefully to avoid the pitfalls I ran into.

Apart from adding or removing OSD devices, every other change to this file requires reinstalling the Ceph cluster to take effect, so plan the cluster ahead of time. If you modify the file and apply it without uninstalling Ceph first, a cluster reinstall is triggered and the cluster can crash.

Modify the following:

vi cluster.yaml

  • The changes are listed below; see the official documentation for more configuration options.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
# Name of the cluster; only one cluster is supported per namespace
  name: rook-ceph
  namespace: rook-ceph
spec:
# Ceph release to deploy:
# v13 is mimic, v14 is nautilus, and v15 is octopus.
  cephVersion:
# Use a local Ceph image to speed up deployment
    image: 10.2.55.8:5000/kubernetes/ceph:v15.2.5
# Whether to allow unsupported Ceph versions
    allowUnsupported: false
# Host path on each node where Rook stores its data
  dataDirHostPath: /data/k8s/rook
# Whether to continue an upgrade if the checks fail
  skipUpgradeChecks: false
# Since 1.5, the number of mons must be odd
  mon:
    count: 3
# Whether to allow multiple mon pods on a single node
    allowMultiplePerNode: false
  mgr:
    modules:
    - name: pg_autoscaler
      enabled: true
# Enable the dashboard, disable SSL, and set the port to 7000; the default HTTPS configuration also works. I disabled SSL to simplify the ingress setup.
  dashboard:
    enabled: true
    port: 7000
    ssl: false
# Enable PrometheusRule
  monitoring:
    enabled: true
# Namespace in which to deploy the PrometheusRule; defaults to the namespace of this CR
    rulesNamespace: rook-ceph
# Use host networking to work around a bug that prevents CephFS PVCs from being used
  network:
    provider: host
# Enable the crash collector; a crash collector pod is created on every node that runs a Ceph daemon
  crashCollector:
    disable: false
  placement:
    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-osd
              operator: In
              values:
              - enabled
# Storage settings; by default both are true, meaning every device on every node in the cluster would be wiped and initialized.
  storage: # cluster level storage configuration and selection
    useAllNodes: false     # do not use all nodes
    useAllDevices: false   # do not use all devices
    nodes:
    - name: "k8s-node1"  # storage node hostname
      devices:
      - name: "sdb"    # use /dev/sdb as the disk
    - name: "k8s-node2"
      devices:
      - name: "sdb"

For more CephCluster CRD configuration options, refer to the official Rook documentation.

Add a label to the OSD nodes:

[root@k8s-master ceph]# kubectl label nodes k8s-node1 ceph-osd=enabled
node/k8s-node1 labeled
[root@k8s-master ceph]# kubectl label nodes k8s-node2 ceph-osd=enabled
node/k8s-node2 labeled

Run the installation:

kubectl apply -f cluster.yaml
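A hedged way to watch the cluster come up is to check the CephCluster resource and the pods (column names may differ slightly between Rook versions):

kubectl -n rook-ceph get cephcluster rook-ceph   # wait for PHASE Ready / HEALTH_OK
kubectl -n rook-ceph get pods                    # mon, mgr, and one osd pod per labeled disk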

2.5 Adding and Removing OSDs

2.5.1 Add the label

kubectl label nodes k8s-master ceph-osd=enabled

2.5.2 Modify cluster.yaml

  storage: # cluster level storage configuration and selection
    useAllNodes: false     # do not use all nodes
    useAllDevices: false   # do not use all devices
    nodes:
    - name: "k8s-node1"  # storage node hostname
      devices:
      - name: "sdb"    # use /dev/sdb as the disk
    - name: "k8s-node2"
      devices:
      - name: "sdb"
    - name: "k8s-master"
      devices:
      - name: "sdb"

2.5.3 apply cluster.yaml

kubectl apply -f cluster.yaml
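To confirm the new OSD was picked up, the usual Rook labels can be used (a sketch, assuming the default app=rook-ceph-osd label):

kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide
# A new rook-ceph-osd pod should appear on k8s-master once the osd-prepare job finishes.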

If you delete and then reinstall rook-ceph, an error may be reported; you can run the following command to generate the secret:

kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring

2.6 Install the Toolbox

The Rook toolbox is a container that includes common tools for debugging and testing Rook. The toolbox is based on CentOS, so additional tools of your choice can easily be installed with yum.

kubectl apply -f toolbox.yaml

Test Rook

Once the toolbox pod is running, we can enter it with the following command:

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

For example, to check the state of the cluster, run ceph status inside the toolbox. The cluster is considered healthy when the following conditions are met (a few example commands are sketched after the list):

    • All mons should be in quorum
    • The mgr should be active
    • At least one OSD should be active
    • If the status is not HEALTH_OK, check the warnings or error messages
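A few common checks run from inside the toolbox (standard Ceph CLI commands, shown here only as examples):

ceph status       # overall health, mon quorum, mgr state, OSD summary
ceph osd status   # per-OSD state and utilization
ceph df           # cluster and pool capacity
rados df          # per-pool object statistics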

2.7 Access the Dashboard

[root@k8s-master ceph]# kubectl get svc -n rook-ceph
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
csi-cephfsplugin-metrics   ClusterIP   10.103.103.152   <none>        8080/TCP,8081/TCP   9m39s
csi-rbdplugin-metrics      ClusterIP   10.109.21.95     <none>        8080/TCP,8081/TCP   9m41s
rook-ceph-mgr              ClusterIP   10.103.36.44     <none>        9283/TCP            8m50s
rook-ceph-mgr-dashboard    NodePort    10.104.55.171    <none>        7000:30112/TCP      8m50s
rook-ceph-mon-a            ClusterIP   10.103.40.41     <none>        6789/TCP,3300/TCP   9m36s
rook-ceph-mon-b            ClusterIP   10.96.138.43     <none>        6789/TCP,3300/TCP   9m14s
rook-ceph-mon-c            ClusterIP   10.108.169.68    <none>        6789/TCP,3300/TCP   9m3s

The dashboard login user is admin. The password can be obtained as follows:

[root@k8s-master ceph]# kubectl get secrets -n rook-ceph rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d
bagfSJpb/3Nj0DN5I#7Z
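Because rook-ceph-mgr-dashboard is a NodePort service (30112 in the output above; the port is assigned at random), the dashboard can be reached on any node's IP, for example (the node IP below is illustrative):

curl -I http://192.168.130.131:30112
# Or open http://<node-ip>:30112 in a browser and log in as admin with the password above.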

After logging in, you will see the Ceph dashboard.

3. Deploying Block Storage

3.1 Create a Pool and StorageClass

The pool and StorageClass are defined in ceph-storageclass.yaml (based on csi/rbd/storageclass.yaml from the examples directory); apply it with:

kubectl apply -f csi/rbd/storageclass.yaml

The annotated contents are:
# Define a block storage pool
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  # Each data replica must be spread across a different failure domain; host guarantees that each replica lands on a different machine
  failureDomain: host
  # Number of replicas
  replicated:
    size: 3
    # Disallow setting pool with replica 1, this could lead to data loss without recovery.
    # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
    requireSafeReplicaSize: true
    # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
    # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
    #targetSizeRatio: .5
---
# Define a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: rook-ceph-block
# Provisioner for this StorageClass; the rook-ceph prefix is the namespace of the operator
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
    # clusterID is the namespace where the Ceph cluster is deployed
    # If you change this namespace, also change the namespace below where the secret namespaces are defined
    clusterID: rook-ceph

    # If you want to use erasure coded pool with RBD, you need to create
    # two pools. one erasure coded and one replicated.
    # You need to specify the replicated pool here in the `pool` parameter, it is
    # used for the metadata of the images.
    # The erasure coded pool must be set as the `dataPool` parameter below.
    #dataPool: ec-data-pool
    # Pool in which RBD images are created
    pool: replicapool

    # RBD image format. Defaults to "2".
    imageFormat: "2"

    # RBD image features; CSI RBD currently supports only layering
    imageFeatures: layering

    # Ceph admin credentials, generated automatically by the operator
    # in the same namespace as the cluster.
    csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
    csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
    csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
    csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
    # Filesystem type of the volume, ext4 by default; xfs is not recommended because of a potential deadlock when the volume is mounted on the same node that runs an OSD (hyperconverged setups)
    csi.storage.k8s.io/fstype: ext4
# uncomment the following to use rbd-nbd as mounter on supported nodes
# **IMPORTANT**: If you are using rbd-nbd as the mounter, during upgrade you will be hit a ceph-csi
# issue that causes the mount to be disconnected. You will need to follow special upgrade steps
# to restart your application pods. Therefore, this option is not recommended.
#mounter: rbd-nbd
allowVolumeExpansion: true
reclaimPolicy: Delete
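After applying the manifest, a quick check that the pool and StorageClass were created (simple verification, not from the original guide):

kubectl -n rook-ceph get cephblockpool replicapool
kubectl get storageclass rook-ceph-block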

3.2 Demo

It is recommended to put the PVC and the application in a single YAML file.

vim ceph-demo.yaml

# Create the PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-demo-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: csirbd-demo-pod
  labels:
    test-cephrbd: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      test-cephrbd: "true"
  template:
    metadata:
      labels:
        test-cephrbd: "true"
    spec:
      containers:
       - name: web-server-rbd
         image: 10.2.55.8:5000/library/nginx:1.18.0
         volumeMounts:
           - name: mypvc
             mountPath: /usr/share/nginx/html
      volumes:
       - name: mypvc
         persistentVolumeClaim:
           claimName: rbd-demo-pvc
           readOnly: false
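Apply the manifest and verify that the PVC binds and the pod mounts it (usage sketch):

kubectl apply -f ceph-demo.yaml
kubectl get pvc rbd-demo-pvc                      # STATUS should become Bound
kubectl get pods -l test-cephrbd=true -o wide     # nginx pod with the RBD volume mounted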

4. Deploying a Shared Filesystem

4.1 Create a CephFS Filesystem

The CephFS CSI driver uses quotas to enforce the size declared by a PVC; only kernels 4.17 and newer support CephFS quotas.

If your kernel does not support them and you still need quota enforcement, set the Operator environment variable CSI_FORCE_CEPHFS_KERNEL_CLIENT: false to enable the FUSE client.

When the FUSE client is used, application pods lose their mounts during a Ceph cluster upgrade and must be restarted before the PV can be used again.
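As a sketch of where that setting lives (assuming the rook-ceph-operator-config ConfigMap shipped in operator.yaml), it would look like this, mirroring the operator.yaml excerpts shown earlier:

  # In the rook-ceph-operator-config ConfigMap (assumed location):
  # force the CephFS FUSE client when the kernel is older than 4.17
  CSI_FORCE_CEPHFS_KERNEL_CLIENT: "false"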

cd rook/cluster/examples/kubernetes/ceph
kubectl apply -f filesystem.yaml
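Once applied, the MDS pods should come up; a quick check (the label follows Rook's usual naming, verify against your deployment):

kubectl -n rook-ceph get pod -l app=rook-ceph-mds
# Inside the toolbox, ceph status should now also report an active mds.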

5. Deleting the Ceph Cluster

Before deleting the Ceph cluster, first clean up the pods that use its storage.

Delete the block storage and file storage:

kubectl delete -n rook-ceph cephblockpool replicapool
kubectl delete storageclass rook-ceph-block
kubectl delete -f csi/cephfs/filesystem.yaml
kubectl delete storageclass csi-cephfs
kubectl -n rook-ceph delete cephcluster rook-ceph

Delete the operator and the related CRDs:

kubectl delete -f cluster.yaml
kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml

Clean up data on the hosts

After the Ceph cluster is deleted, its configuration data is left behind in the /data/k8s/rook/ directory (the dataDirHostPath) on every node that ran Ceph components.

rm -rf  /data/k8s/rook/*

If you deploy a new Ceph cluster later, delete this leftover data first, otherwise the monitors will fail to start. The script below clears it on every node:

# cat clean-rook-dir.sh
hosts=(
  192.168.130.130
  192.168.130.131
  192.168.130.132
)
for host in ${hosts[@]} ; do
  ssh $host "rm -rf /data/k8s/rook/*"
done

Wipe the data disks

yum install gdisk -y
export DISK="/dev/sdb"
sgdisk --zap-all $DISK
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
blkdiscard $DISK
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
rm -rf /dev/ceph-*

If deleting the Ceph cluster hangs for some reason, run the following command first; the deletion will then complete without hanging:

kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge

Reposted from: https://my.oschina.net/u/4346166/blog/4752651
and https://blog.csdn.net/qq_40592377/article/details/110292089

Original article: https://www.cnblogs.com/deny/p/14229987.html