k8s: Job

Job Controller

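The Job controller creates Pods from the Job spec and keeps monitoring them until they finish successfully. If a Pod fails, the restartPolicy (only OnFailure and Never are supported; Always is not) determines how the task is retried: with Never the Job controller creates a new Pod, and with OnFailure the failed container is restarted inside the same Pod.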

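What Jobs are for
By how long they run, containers fall into two groups: service containers and work containers.
Service containers provide a service continuously and need to stay up, for example HTTP servers and daemons. Work containers run a one-off task, such as a batch program, and exit when the task is done.
Kubernetes Deployments, ReplicaSets and DaemonSets all manage service containers; for work containers we use a Job.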

root@ubuntu:~/tenant# cat job.yaml 
apiVersion: batch/v1 
kind: Job 
metadata:
 name: myjob
spec:
 template:
  metadata:
    name: myjob
  spec:
   containers:
   - name: hello
     image: busybox
     command: ["echo","hello k8s job !"]
   restartPolicy: Never

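restartPolicy specifies when a container should be restarted. For a Job it can only be Never (a failed container is not restarted; the Job controller creates a new Pod instead) or OnFailure (the failed container is restarted inside the same Pod, so no new Pod is created, which saves resources). Other controllers such as Deployment use Always.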

root@ubuntu:~/tenant#  kubectl get pods  -o wide
NAME                             READY   STATUS              RESTARTS   AGE     IP               NODE      NOMINATED NODE   READINESS GATES
busybox                          1/1     Running             0          46m     10.244.129.145   centos7   <none>           <none>
example-foo-54dc4db9fc-lqz9j     1/1     Running             0          19d     10.244.29.26     bogon     <none>           <none>
job-1-nginx-0                    0/1     Completed           0          22d     10.244.29.19     bogon     <none>           <none>
myjob-pl75c                      0/1     ContainerCreating   0          8s      <none>           centos7   <none>           <none>
nginx-ds-f7sjm                   1/1     Running             0          10m     10.244.29.23     bogon     <none>           <none>
nginx-ds-ldlrq                   1/1     Running             0          10m     10.244.41.1      cloud     <none>           <none>
nginx-ds-p8nqz                   1/1     Running             0          10m     10.244.243.195   ubuntu    <none>           <none>
nginx-ds-xrt8b                   1/1     Running             0          10m     10.244.129.146   centos7   <none>           <none>
test-job-default-nginx-0         1/1     Running             0          15d     10.244.29.3      bogon     <none>           <none>
test-job-default-nginx-1         1/1     Running             0          15d     10.244.29.9      bogon     <none>           <none>
test-job-default-nginx-2         1/1     Running             0          15d     10.244.29.19     bogon     <none>           <none>
test-job-default-nginx-3         1/1     Running             0          15d     10.244.29.63     bogon     <none>           <none>
test-job-default-nginx-4         1/1     Running             0          15d     10.244.29.1      bogon     <none>           <none>
test-job-default-nginx-5         1/1     Running             0          15d     10.244.29.2      bogon     <none>           <none>
test-job-v2-default-nginx-v2-0   1/1     Running             0          14d     10.244.29.20     bogon     <none>           <none>
web-0                            1/1     Running             0          3h22m   10.244.129.142   centos7   <none>           <none>
web-1                            1/1     Running             0          3h16m   10.244.129.143   centos7   <none>           <none>
root@ubuntu:~/tenant# kubectl get job
NAME    COMPLETIONS   DURATION   AGE
myjob   1/1           12s        8m3s
root@ubuntu:~/tenant#

root@ubuntu:~/tenant# kubectl logs  myjob-pl75c 
hello k8s job !
root@ubuntu:~/tenant#

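That was the case where the Pod succeeds. What happens if the Pod fails?
Delete the previous Job, then edit job.yaml and deliberately introduce an error: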

root@ubuntu:~/tenant# vi job.yaml 
apiVersion: batch/v1
kind: Job
metadata:
 name: myjob
spec:
 template:
  metadata:
    name: myjob
  spec:
   containers:
   - name: hello
     image: busybox
     command: ["invalid cmd","hello k8s job !"]
   restartPolicy: Never

root@ubuntu:~/tenant# kubectl create -f job.yaml 
job.batch/myjob created
root@ubuntu:~/tenant# kubectl get pod
NAME                             READY   STATUS              RESTARTS   AGE
busybox                          1/1     Running             0          56m
example-foo-54dc4db9fc-lqz9j     1/1     Running             0          19d
job-1-nginx-0                    0/1     Completed           0          22d
myjob-j6mtv                      0/1     ContainerCreating   0          9s
 
root@ubuntu:~/tenant# kubectl get job
NAME    COMPLETIONS   DURATION   AGE
myjob   0/1           23s        23s
root@ubuntu:~/tenant# kubectl get pod
NAME                             READY   STATUS               RESTARTS   AGE
busybox                          1/1     Running              0          57m
example-foo-54dc4db9fc-lqz9j     1/1     Running              0          19d
job-1-nginx-0                    0/1     Completed            0          22d
myjob-j6mtv                      0/1     ContainerCannotRun   0          27s
myjob-zrgmk                      0/1     ContainerCannotRun   0          15s
 
root@ubuntu:~/tenant# kubectl get pod
NAME                             READY   STATUS               RESTARTS   AGE
busybox                          1/1     Running              0          57m
example-foo-54dc4db9fc-lqz9j     1/1     Running              0          19d
job-1-nginx-0                    0/1     Completed            0          22d
myjob-j6mtv                      0/1     ContainerCannotRun   0          32s
myjob-zrgmk                      0/1     ContainerCannotRun   0          20s
 
root@ubuntu:~/tenant# kubectl get pod
NAME                             READY   STATUS               RESTARTS   AGE
busybox                          1/1     Running              0          57m
example-foo-54dc4db9fc-lqz9j     1/1     Running              0          19d
job-1-nginx-0                    0/1     Completed            0          22d
myjob-j6mtv                      0/1     ContainerCannotRun   0          38s
myjob-mdfpz                      0/1     ContainerCreating    0          4s
myjob-zrgmk                      0/1     ContainerCannotRun   0          26s
 
root@ubuntu:~/tenant#

root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-6kfq8                      0/1     ContainerCannotRun   0          65s
myjob-j6mtv                      0/1     ContainerCannotRun   0          119s
myjob-mdfpz                      0/1     ContainerCannotRun   0          85s
myjob-zrgmk                      0/1     ContainerCannotRun   0          107s
root@ubuntu:~/tenant#

root@ubuntu:~/tenant# kubectl describe pods myjob-mdfpz
Name:         myjob-mdfpz
Namespace:    default
Priority:     0
Node:         centos7/10.10.16.251
Start Time:   Thu, 29 Jul 2021 15:29:41 +0800
Labels:       controller-uid=2fab27c7-2c65-425b-a698-1a4ffaa24448
              job-name=myjob
Annotations:  cni.projectcalico.org/podIP: 
              cni.projectcalico.org/podIPs: 
Status:       Failed
IP:           10.244.129.150
IPs:
  IP:           10.244.129.150
Controlled By:  Job/myjob
Containers:
  hello:
    Container ID:  docker://0b71696f5d71fb7c4ddb7fcb408c2141e890298092ef2701ce695f82d1ff242e
    Image:         busybox
    Image ID:      docker-pullable://docker.io/busybox@sha256:0f354ec1728d9ff32edcd7d1b8bbdfc798277ad36120dc3dc683be44524c8b60
    Port:          <none>
    Host Port:     <none>
    Command:
      invalid cmd
      hello k8s job !
    State:      Terminated
      Reason:   ContainerCannotRun
      Message:  oci runtime error: container_linux.go:235: starting container process caused "exec: "invalid cmd": executable file not found in $PATH"

      Exit Code:    127
      Started:      Thu, 29 Jul 2021 15:29:49 +0800
      Finished:     Thu, 29 Jul 2021 15:29:49 +0800
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cfr6q (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-cfr6q:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-cfr6q
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age        From               Message
  ----     ------     ----       ----               -------
  Normal   Scheduled  <unknown>  default-scheduler  Successfully assigned default/myjob-mdfpz to centos7
  Normal   Pulling    2m21s      kubelet, centos7   Pulling image "busybox"
  Normal   Pulled     2m17s      kubelet, centos7   Successfully pulled image "busybox"
  Normal   Created    2m16s      kubelet, centos7   Created container hello
  Warning  Failed     2m15s      kubelet, centos7   Error: failed to start container "hello": Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "exec: "invalid cmd": executable file not found in $PATH"
root@ubuntu:~/tenant#

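This explains why kubectl get pod shows so many failed Pods.
When the first Pod starts, its container fails and exits. With restartPolicy: Never the failed container is not restarted, but the Job still needs 1 successful completion and has 0 (COMPLETIONS shows 0/1), so the Job controller creates a new Pod. In this example the command can never succeed, so replacement Pods keep appearing, spaced out by an increasing back-off delay. The retries are not unbounded: once the number of failures reaches spec.backoffLimit (6 by default), the controller stops creating Pods and marks the Job as failed. To stop the retries sooner, delete the Job.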
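
The number of retries can also be bounded in the Job spec itself. The fragment below is a minimal sketch that is not part of the original walkthrough: it reuses the deliberately broken myjob manifest and adds backoffLimit and activeDeadlineSeconds, two fields of the batch/v1 Job spec. The values 3 and 60 are arbitrary choices for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  backoffLimit: 3              # assumed value: give up after 3 retries (the default is 6)
  activeDeadlineSeconds: 60    # assumed value: fail the Job if it is still active after 60s
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["invalid cmd", "hello k8s job !"]   # still the deliberately broken command
      restartPolicy: Never
```

Once either limit is reached, the Job should be marked failed and no further Pods are created; kubectl describe job myjob is expected to report the reason (BackoffLimitExceeded or DeadlineExceeded) in the Job's conditions.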

What if restartPolicy is set to OnFailure? Let's try it: update job.yaml, delete the old Job, and create it again.

apiVersion: batch/v1
kind: Job 
metadata:
 name: myjob
spec:
 template:
  metadata:
    name: myjob
  spec:
   containers:
   - name: hello
     image: busybox
     command: ["invalid cmd","hello k8s job !"]
   restartPolicy: OnFailure

root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-f5kvm                      0/1     ContainerCreating   0          9s
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-f5kvm                      0/1     ContainerCreating   0          11s
root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-f5kvm                      0/1     RunContainerError   0          12s

root@ubuntu:~/tenant# kubectl get pod | grep myjob
myjob-f5kvm                      0/1     RunContainerError   2          45s
root@ubuntu:~/tenant#

# RESTARTS is 2 and keeps increasing: OnFailure is in effect, so the failed container is restarted inside the same Pod and no new Pod is created

Job parallelism

Sometimes we want several Pods to run at the same time to speed up a Job. This is controlled with parallelism:

root@ubuntu:~/tenant# cat job.yaml 
apiVersion: batch/v1
kind: Job 
metadata:
 name: myjob
spec:
 parallelism: 2 # run two Pods at the same time
 template:
  metadata:
    name: myjob
  spec:
   containers:
   - name: hello
     image: busybox
     command: ["echo","hello k8s job !"]
   restartPolicy: OnFailure

root@ubuntu:~/tenant#  kubectl delete  -f job.yaml
job.batch "myjob" deleted
root@ubuntu:~/tenant#  kubectl create  -f job.yaml
job.batch/myjob created
root@ubuntu:~/tenant#  kubectl get pod | grep myjob
myjob-nxw5t                      0/1     Completed           0          11s
myjob-qsc9r                      0/1     ContainerCreating   0          11s
root@ubuntu:~/tenant#  kubectl get jobs.batch 
NAME    COMPLETIONS   DURATION   AGE
myjob   2/1 of 2      14s        19s
root@ubuntu:~/tenant#  kubectl get jobs
NAME    COMPLETIONS   DURATION   AGE
myjob   2/1 of 2      14s        26s
root@ubuntu:~/tenant#  kubectl get pod | grep myjob
myjob-nxw5t                      0/1     Completed   0          30s
myjob-qsc9r                      0/1     Completed   0          30s
root@ubuntu:~/tenant#  kubectl get pod | grep myjob
myjob-nxw5t                      0/1     Completed   0          36s
myjob-qsc9r                      0/1     Completed   0          36s
root@ubuntu:~/tenant#  kubectl get pod | grep myjob
myjob-nxw5t                      0/1     Completed   0          43s
myjob-qsc9r                      0/1     Completed   0          43s
root@ubuntu:~/tenant# kubectl logs  myjob-nxw5t
hello k8s job !
root@ubuntu:~/tenant# kubectl logs   myjob-qsc9r 
hello k8s job !
root@ubuntu:~/tenant#

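We can also use completions to set the total number of Pods that must finish successfully. In the previous run completions was left unset, which is why kubectl printed COMPLETIONS as 2/1 of 2. With parallelism: 2 and completions: 4, the Job runs at most two Pods at a time until four have succeeded: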

root@ubuntu:~/tenant# cat job.yaml 
apiVersion: batch/v1
kind: Job 
metadata:
 name: myjob
spec:
 parallelism: 2 # run two Pods at the same time
 completions: 4
 template:
  metadata:
    name: myjob
  spec:
   containers:
   - name: hello
     image: busybox
     command: ["echo","hello k8s job !"]
   restartPolicy: OnFailure

 
root@ubuntu:~/tenant#  kubectl create  -f job.yaml
job.batch/myjob created
root@ubuntu:~/tenant#  kubectl get pod | grep myjob
myjob-27fss                      0/1     Completed           0          19s
myjob-dqfgw                      0/1     Completed           0          19s
myjob-m954l                      0/1     ContainerCreating   0          4s
myjob-x9bps                      0/1     ContainerCreating   0          9s
root@ubuntu:~/tenant#  kubectl get pod | grep myjob
myjob-27fss                      0/1     Completed           0          25s
myjob-dqfgw                      0/1     Completed           0          25s
myjob-m954l                      0/1     ContainerCreating   0          10s
myjob-x9bps                      0/1     Completed           0          15s
root@ubuntu:~/tenant#  kubectl get pod | grep myjob
myjob-27fss                      0/1     Completed   0          28s
myjob-dqfgw                      0/1     Completed   0          28s
myjob-m954l                      0/1     Completed   0          13s
myjob-x9bps                      0/1     Completed   0          18s
root@ubuntu:~/tenant# kubectl logs  myjob-m954l
hello k8s job !
root@ubuntu:~/tenant# kubectl logs   myjob-dqfgw 
hello k8s job !
root@ubuntu:~/tenant#
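
The two fields can be read together as "how many successes are needed" (completions) and "how many Pods may run at once" (parallelism). The fragment below is only a sketch under assumed values: the name myjob-batch, the numbers 6 and 3, and the added sleep are all hypothetical and are there only to make the parallel waves easier to observe.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob-batch            # hypothetical name, not from the walkthrough
spec:
  completions: 6               # the Job is complete once 6 Pods have succeeded
  parallelism: 3               # at most 3 Pods run at the same time
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        # sleep is an assumption, added so the Pods stay Running long enough to watch
        command: ["sh", "-c", "echo 'hello k8s job !' && sleep 5"]
      restartPolicy: OnFailure
```

With these values the Pods would be expected to run in roughly two waves of three, and kubectl get job should show COMPLETIONS climbing towards 6/6.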