Kubernetes进阶实战 reading notes: the Pod object lifecycle (probes)

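I. Liveness probing (exec probe)

The exec probe has a single attribute, "command", which specifies the command to run. The manifest liveness-exec.yaml below defines such a probe.

1. Manifest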

[root@master chapter4]# cat liveness-exec.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness-exec
  name: liveness-exec
spec:
  containers:
  - name: liveness-demo
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - test
        - -e
        - /tmp/healthy

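The manifest above defines a Pod object that starts a container from the busybox image running the command "touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600".
This command creates the file /tmp/healthy when the container starts and deletes it 60 seconds later. The liveness probe runs "test -e /tmp/healthy" to check whether the file exists. If it does, the command returns status code 0 and the probe passes; any non-zero status counts as a failure.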
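
The probe's verdict is simply the exit status of the command. A minimal local sketch of that behavior, assuming a POSIX shell and using a temporary file as a stand-in for /tmp/healthy (the variable names are illustrative and do not come from the book):

```shell
# Stand-in for /tmp/healthy; mktemp creates the file.
f=$(mktemp)

# File exists: test -e exits 0, which a liveness probe treats as success.
present=0
test -e "$f" || present=$?

# File removed: test -e exits non-zero, which the probe treats as failure.
rm -f "$f"
missing=0
test -e "$f" || missing=$?

echo "file present: exit status $present"
echo "file removed: exit status $missing"
```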

2. Run

First, run the following command to create the Pod object liveness-exec:

[root@master chapter4]# kubectl apply -f liveness-exec.yaml 
pod/liveness-exec created
[root@master chapter4]# kubectl get pods liveness-exec
NAME            READY   STATUS    RESTARTS   AGE
liveness-exec   1/1     Running   0          42s

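3. Verify

Within the first 60 seconds, "kubectl describe pod liveness-exec" shows no liveness probe errors. After 60 seconds, running it again shows that the liveness probe has failed, and after a longer interval the output also shows that the container has been restarted.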

[root@master chapter4]# kubectl describe pod liveness-exec
Name:         liveness-exec

  PodScheduled      True 
......
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Killing    3m12s (x3 over 7m32s)  kubelet, node2     Container liveness-demo failed liveness probe, will be restarted
  Normal   Pulling    2m41s (x4 over 9m16s)  kubelet, node2     Pulling image "busybox"
  Normal   Pulled     2m26s (x4 over 8m58s)  kubelet, node2     Successfully pulled image "busybox"

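In addition, the container status fields that follow the "Conditions" block in the output ("State" and "Last State") show the container's health status and state transitions. The container is currently reported as "Running", but its previous state was "Terminated" with exit code 137, which means the process was terminated by an external signal. The value 137 is the sum 128 + signum, where signum is the number of the signal that terminated the process. Here signum is 9, SIGKILL, so the process was forcibly killed.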

[root@master chapter4]# kubectl describe pod liveness-exec
Name:         liveness-exec
......

Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
    State:          Running
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 09 Jun 2020 11:35:32 +0800
      Finished:     Tue, 09 Jun 2020 11:37:26 +0800
    Ready:          False
    Restart Count:  26
    Liveness:       exec [test -e /tmp/healthy] delay=0s timeout=1s period=10s #success=1 #failure=3
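
The 128 + signum rule behind exit code 137 can be reproduced in any POSIX shell, independently of Kubernetes. A small sketch, assuming a Linux system where SIGKILL is signal 9:

```shell
# Run a child shell that sends SIGKILL (signal 9) to itself.
# "|| status=$?" captures the child's exit status without aborting under "set -e".
status=0
sh -c 'kill -9 $$' || status=$?

echo "exit status:   $status"
echo "signal number: $((status - 128))"
```

The parent shell reports 137 for the killed child, and subtracting 128 recovers the signal number 9.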

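Once the restart completes, the container runs normally again until the file is deleted once more, the liveness probe fails, and the container is restarted. The output below shows that it has already been restarted several times (RESTARTS is 4 after about 9 minutes).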

[root@master chapter4]# kubectl get pods liveness-exec
NAME            READY   STATUS    RESTARTS   AGE
liveness-exec   1/1     Running   4          9m14s

[root@master chapter4]# kubectl describe pod liveness-exec
Name:         liveness-exec
......
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  <unknown>              default-scheduler  Successfully assigned default/liveness-exec to node2
  Normal   Created    4m36s (x3 over 8m58s)  kubelet, node2     Created container liveness-demo
  Normal   Started    4m35s (x3 over 8m57s)  kubelet, node2     Started container liveness-demo
  Warning  Unhealthy  3m12s (x9 over 7m52s)  kubelet, node2     Liveness probe failed:
  Normal   Killing    3m12s (x3 over 7m32s)  kubelet, node2     Container liveness-demo failed liveness probe, will be restarted
  Normal   Pulling    2m41s (x4 over 9m16s)  kubelet, node2     Pulling image "busybox"
  Normal   Pulled     2m26s (x4 over 8m58s)  kubelet, node2     Successfully pulled image "busybox"
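
The event ages above are consistent with the default probe settings: the first Unhealthy event is 7m52s old and the first Killing event is 7m32s old, 20 seconds apart, which is two further probe periods after the first failure. A rough sketch of that arithmetic, assuming the defaults period=10s and #failure=3 and ignoring probe timeouts and kubelet scheduling jitter (the variable names are illustrative):

```shell
period=10              # periodSeconds (default)
failure_threshold=3    # failureThreshold (default)

# Time from the first failed probe to the restart decision:
# the remaining (threshold - 1) failures each arrive one period later.
first_failure_to_kill=$(( (failure_threshold - 1) * period ))

# Approximate upper bound measured from the moment the fault occurs,
# since the fault can happen just after a successful probe.
fault_to_kill_max=$(( failure_threshold * period ))

echo "first failure -> kill: ${first_failure_to_kill}s"
echo "fault -> kill (at most, approx.): ${fault_to_kill_max}s"
```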

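Note that the command specified by exec runs inside the container and consumes the container's resource quota. For that reason, and for the sake of probe efficiency, the probe command should be simple and lightweight.

II. Liveness probing (HTTP probe)

1. Official reference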

[root@master chapter4]# kubectl explain pod.spec.containers.livenessProbe.httpGet
KIND:     Pod
VERSION:  v1

RESOURCE: httpGet <Object>

DESCRIPTION:
     HTTPGet specifies the http request to perform.

     HTTPGetAction describes an action based on HTTP Get requests.

FIELDS:
   host	<string> #host to send the request to, defaults to the Pod IP; can also be set with a "Host:" entry in httpHeaders
     Host name to connect to, defaults to the pod IP. You probably want to set
     "Host" in httpHeaders instead.

   httpHeaders	<[]Object> #custom request headers
     Custom headers to set in the request. HTTP allows repeated headers.

   path	<string>  #path of the HTTP resource to request, i.e. the URL path
     Path to access on the HTTP server.

   port	<string> -required-  #port to request, required field
     Name or number of the port to access on the container. Number must be in
     the range 1 to 65535. Name must be an IANA_SVC_NAME.

   scheme	<string>  #protocol used for the connection, either HTTP or HTTPS, defaults to HTTP
     Scheme to use for connecting to the host. Defaults to HTTP.
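
As a sketch of how these fields fit together, the following writes a hypothetical manifest that sets the httpGet fields explicitly. The file name liveness-http-fields.yaml, the Pod name, and the X-Probe header are made up for illustration. The host field is left unset so that the Pod IP is used, and the path is "/" because nginx serves a default page there.

```shell
# Write a hypothetical manifest; nothing here contacts a cluster.
cat > liveness-http-fields.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http-fields
spec:
  containers:
  - name: liveness-demo
    image: nginx:1.12-alpine
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      httpGet:
        scheme: HTTP          # HTTP (default) or HTTPS
        path: /               # URL path to request
        port: http            # port number or named container port
        httpHeaders:          # optional custom request headers
        - name: X-Probe       # hypothetical header name
          value: liveness
EOF

# To try it on a cluster (assumes kubectl is configured):
# kubectl apply -f liveness-http-fields.yaml
```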

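2. Manifest

The manifest below uses a postStart hook to create a file named healthz that serves as a dedicated test page for the httpGet probe: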

[root@master chapter4]# cat liveness-http.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness-demo
    image: nginx:1.12-alpine
    ports:
    - name: http
      containerPort: 80
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - -c
          - 'echo Healthy > /usr/share/nginx/html/healthz'
    livenessProbe:
      httpGet:
        path: /healthz
        port: http

3. Create and run

First, create the Pod object:

[root@master chapter4]# kubectl apply -f liveness-http.yaml 
pod/liveness-http created

4. Verify

Then check the health-check information. While the health check passes, the container keeps running normally.

[root@master chapter4]# kubectl describe pod liveness-http
Name:         liveness-http
......
Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  <unknown>  default-scheduler  Successfully assigned default/liveness-http to node2
  Normal  Pulling    55s        kubelet, node2     Pulling image "nginx:1.12-alpine"
  Normal  Pulled     21s        kubelet, node2     Successfully pulled image "nginx:1.12-alpine"
  Normal  Created    21s        kubelet, node2     Created container liveness-demo
  Normal  Started    21s        kubelet, node2     Started container liveness-demo

Next, use "kubectl exec" to delete the test page healthz that the postStart hook created:

[root@master chapter4]# kubectl exec liveness-http rm /usr/share/nginx/html/healthz
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.

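Run "kubectl get pods liveness-http" and "kubectl describe pod liveness-http" again. The restart count and the events show that the probe failed and that the container was killed and re-created.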

[root@master chapter4]# kubectl get pods liveness-http
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   2          5m11s

[root@master chapter4]# kubectl describe pod liveness-http
Name:         liveness-http
......
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  <unknown>              default-scheduler  Successfully assigned default/liveness-http to node2
  Normal   Pulling    5m58s                  kubelet, node2     Pulling image "nginx:1.12-alpine"
  Normal   Pulled     5m24s                  kubelet, node2     Successfully pulled image "nginx:1.12-alpine"
  Warning  Unhealthy  2m12s (x6 over 3m2s)   kubelet, node2     Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    2m12s (x2 over 2m42s)  kubelet, node2     Container liveness-demo failed liveness probe, will be restarted
  Normal   Created    2m11s (x3 over 5m24s)  kubelet, node2     Created container liveness-demo
  Normal   Started    2m11s (x3 over 5m24s)  kubelet, node2     Started container liveness-demo
  Normal   Pulled     2m11s (x2 over 2m41s)  kubelet, node2     Container image "nginx:1.12-alpine" already present on machine

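In general, an HTTP probe should target a dedicated URL path, for example /healthz.

In addition, the web resource behind this URL path should run a lightweight internal check of all the application's key components to confirm that it can provide complete service to clients.

Note that this kind of check is only effective for the front tier in a layered architecture. Restarting the container cannot fix a failure caused by a back-end service (such as a database or cache), so the container may be restarted again and again until the back-end service recovers. The other two probe types have the same problem.

III. Liveness probing (TCP probe)

1. Official reference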

[root@master chapter4]# kubectl explain pod.spec.containers.livenessProbe.tcpSocket
KIND:     Pod
VERSION:  v1

RESOURCE: tcpSocket <Object>

DESCRIPTION:
     TCPSocket specifies an action involving a TCP port. TCP hooks not yet
     supported

     TCPSocketAction describes an action based on opening a socket

FIELDS:
   host	<string>  #target IP address to connect to, defaults to the Pod IP
     Optional: Host name to connect to, defaults to the pod IP.

   port	<string> -required-  #target port to connect to, required field
     Number or name of the port to access on the container. Number must be in
     the range 1 to 65535. Name must be an IANA_SVC_NAME.

2. Example manifest

cat nginx_pod_tcpSocket.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tcp-socket  # Pod names must be lowercase (RFC 1123), so "tcpSocket" would be rejected
spec:
  containers:
    - name: nginx
      image: 10.0.0.11:5000/nginx:1.13
      ports:
        - containerPort: 80
      livenessProbe:
        tcpSocket:
          port: 80
        initialDelaySeconds: 3
        periodSeconds: 3

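IV. Liveness probe behavior attributes

1. Viewing the details of a Pod with a liveness probe

When "kubectl describe" is used on a Pod configured with a liveness probe, the container section of the output includes a line like the following: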

[root@master chapter4]# kubectl describe pod liveness-exec
Name:         liveness-exec
......
    Ready:          False
    Restart Count:  10
    Liveness:       exec [test -e /tmp/healthy] delay=0s timeout=1s period=10s #success=1 #failure=3

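This line shows the probe type together with its additional attributes delay, timeout, period, #success and #failure and their values.

When these attributes are not set explicitly, they take their default values, as in the output above. They can be set through the following fields of "pod.spec.containers.livenessProbe":

2. Official reference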

kubectl explain pod.spec.containers.livenessProbe

[root@master chapter4]# kubectl explain pod.spec.containers.livenessProbe
KIND:     Pod
VERSION:  v1

RESOURCE: livenessProbe <Object>

DESCRIPTION:
     Periodic probe of container liveness. Container will be restarted if the
     probe fails. Cannot be updated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

     Probe describes a health check to be performed against a container to
     determine whether it is alive or ready to receive traffic.

FIELDS:
   exec	<Object>
     One and only one of the following should be specified. Exec specifies the
     action to take.

   failureThreshold	<integer> #while in the success state, the minimum number of consecutive failures before the probe is considered failed; shown as #failure; default 3, minimum 1
     Minimum consecutive failures for the probe to be considered failed after
     having succeeded. Defaults to 3. Minimum value is 1.

   httpGet	<Object>
     HTTPGet specifies the http request to perform.

   initialDelaySeconds	<integer> #delay before the first liveness probe after the container starts; shown as delay; defaults to 0s, i.e. probing begins as soon as the container starts
     Number of seconds after the container has started before liveness probes
     are initiated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

   periodSeconds	<integer> #probe frequency; shown as period; default 10s, minimum 1s; probing too often adds noticeable overhead to the Pod, probing too rarely delays the reaction to errors
     How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
     value is 1.

   successThreshold	<integer> #while in the failed state, the minimum number of consecutive successes before the probe is considered passed; shown as #success; default 1, minimum 1
     Minimum consecutive successes for the probe to be considered successful
     after having failed. Defaults to 1. Must be 1 for liveness and startup.
     Minimum value is 1.

   tcpSocket	<Object>
     TCPSocket specifies an action involving a TCP port. TCP hooks not yet
     supported

   timeoutSeconds	<integer> #probe timeout; shown as timeout; default 1s, minimum 1s
     Number of seconds after which the probe times out. Defaults to 1 second.
     Minimum value is 1. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
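
A manifest that sets these fields might look like the following. This is a plausible reconstruction rather than the book's listing: it assumes the exec example from section I with three timing fields added so that the probe is reported as delay=5s timeout=2s period=5s, and the file name liveness-exec-timing.yaml is hypothetical.

```shell
# Write a hypothetical manifest; nothing here contacts a cluster.
cat > liveness-exec-timing.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness-exec
  name: liveness-exec
spec:
  containers:
  - name: liveness-demo
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command: ["test", "-e", "/tmp/healthy"]
      initialDelaySeconds: 5   # shown as delay=5s
      timeoutSeconds: 2        # shown as timeout=2s
      periodSeconds: 5         # shown as period=5s
      # successThreshold and failureThreshold keep their defaults (1 and 3)
EOF

# The probe "Cannot be updated" on an existing Pod, so on a cluster the old
# Pod would have to be deleted first (assumes kubectl is configured):
# kubectl delete pod liveness-exec
# kubectl apply -f liveness-exec-timing.yaml
```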

After re-creating the Pod object from a manifest modified to set these fields and testing it again, the detailed output shows that the custom values have taken effect:

[root@master chapter4]# kubectl describe pod liveness-exec
Name:         liveness-exec
......
    Ready:          False
    Restart Count:  10
    Liveness:       exec [test -e /tmp/healthy] delay=5s timeout=2s period=5s #success=1 #failure=3

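V. Readiness probing

1. Purpose of readiness probes

A readiness probe is a periodic check that determines whether a container is ready, that is, whether it has finished initializing and can serve client requests. When the probe returns "success", it signals that the container is "ready".

When the probe fails, the container is not killed or restarted to restore its health. Instead, the container is reported as not ready, which triggers the operations that depend on its readiness (for example, removing the Pod from the Service object) so that client requests are not routed to this Pod.

2. Value

When Pod B, which Pod A depends on, becomes unavailable because of a network failure or a similar problem, the service on Pod A should switch to the not-ready state so that it does not return incomplete responses to clients.

Replacing the field name livenessProbe with readinessProbe in the container definition yields a readiness probe configuration. The manifest below (readiness-exec) is a simple example: 5 seconds after the container starts, it begins running test -e /tmp/ready to probe the container's readiness. The container is ready when the command succeeds, and the probe period is 5 seconds.

3. Manifest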

[root@master chapter4]# cat readiness-exec.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness-exec
  name: readiness-exec
spec:
  containers:
  - name: readiness-demo
    image: busybox
    args: ["/bin/sh", "-c", "while true; do rm -f /tmp/ready; sleep 30; touch /tmp/ready; sleep 300; done"] 
    readinessProbe:
      exec:
        command: ["test", "-e", "/tmp/ready"]
      initialDelaySeconds: 5
      periodSeconds: 5

4. Create and run

First, use "kubectl create" to create the resource defined in the manifest on the cluster:

[root@master chapter4]# kubectl create -f readiness-exec.yaml 
pod/readiness-exec created

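5. Verify

Next, run "kubectl get -w" to watch the resource for changes. The output below shows that although the Pod is in the Running state, it only becomes "ready" once the readiness probe command succeeds.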

[root@master chapter4]# kubectl get pods -l test=readiness-exec -w 
NAME             READY   STATUS    RESTARTS   AGE
readiness-exec   0/1     Running   0          22s
readiness-exec   1/1     Running   0          50s

The Pod's detailed information also contains lines like the following, which show that it is now ready:

[root@master chapter4]# kubectl describe pod readiness-exec
Name:         readiness-exec
.......
    Ready:          True
    Restart Count:  0
    Readiness:      exec [test -e /tmp/ready] delay=5s timeout=1s period=5s #success=1 #failure=3

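Note:

A Pod object without a readiness probe is considered ready as soon as it enters the "Running" state. When a container needs time to initialize, it cannot respond to client requests properly until the application is actually ready. In production, therefore, readiness probes must be defined for the containers in critical Pod resources. See Section 4.6 for how to define the probe.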
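
As an illustration of this advice, here is a sketch of a Pod that defines both probes on an HTTP server. The Pod name, the file name readiness-http.yaml, and the timing values are hypothetical choices and are not taken from the book.

```shell
# Write a hypothetical manifest; nothing here contacts a cluster.
cat > readiness-http.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: readiness-http
  labels:
    test: readiness-http
spec:
  containers:
  - name: readiness-demo
    image: nginx:1.12-alpine
    ports:
    - name: http
      containerPort: 80
    readinessProbe:            # on failure the Pod is marked not ready, no restart
      httpGet:
        path: /
        port: http
      initialDelaySeconds: 5
      periodSeconds: 5
    livenessProbe:             # on failure the container is restarted
      httpGet:
        path: /
        port: http
      periodSeconds: 10
EOF

# To try it on a cluster (assumes kubectl is configured):
# kubectl apply -f readiness-http.yaml
```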