K8s core resources: namespace and pod


1. Kubernetes official documentation

K8s official documentation: https://kubernetes.io/
K8s documentation in Chinese: https://kubernetes.io/zh/
K8s GitHub repository: https://github.com/kubernetes/

2. The namespace resource

2.1 What is a namespace?

Kubernetes supports multiple virtual clusters backed by the same physical cluster. These virtual clusters are called namespaces.
A namespace is a cluster-level resource in k8s. You can create separate namespaces for different users, tenants, environments, or projects; for example, you can create dedicated namespaces for the test, development, and production environments.

[root@k8s-master1 yaml]# kubectl get namespaces 
NAME                   STATUS   AGE
default                Active   13h
kube-node-lease        Active   13h
kube-public            Active   13h
kube-system            Active   13h
kubernetes-dashboard   Active   12h

2.2 Namespace use cases

Namespaces are intended for scenarios with many users spread across multiple teams or projects. For clusters with only a few to a few dozen users, you generally do not need to create or even think about namespaces.

1) Viewing namespaces and their resource objects
A k8s cluster ships with several namespaces reserved for specific purposes: kube-system mainly runs system-level resources and holds the cluster's own components, while default is where resource operations land when no namespace is specified.

Use kubectl get namespace to list the namespaces, and kubectl describe namespace $NAME to see the details of a particular one.

2) Managing namespace resources
A namespace has few attributes; usually supplying a name is enough to create one, e.g. kubectl create namespace qa. The name may only contain letters, digits, underscores, and hyphens. Deleting a namespace cascades, deleting every resource object it contains, as sketched below.
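
A minimal sketch of this lifecycle, assuming a scratch namespace named qa and any small image (nginx here is purely illustrative):

# Create a throwaway namespace, inspect it, and run a pod inside it
kubectl create namespace qa
kubectl describe namespace qa
kubectl run nginx-demo --image=nginx -n qa

# Deleting the namespace cascades to everything it contains
kubectl delete namespace qa
kubectl get pods -n qa    # the pod is gone along with the namespace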

2.3 Namespace usage examples

# Create a namespace named test
[root@k8s-master1 ~]# kubectl create ns test

# Switch the current context to another namespace
[root@k8s-master1 ~]# kubectl  config set-context --current --namespace=kube-system

# After switching, kubectl get pods without -n now shows the resources of the kube-system namespace.
[root@k8s-master1 ~]# kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-6949477b58-9t9k8   1/1     Running   2          13h
calico-node-66b47                          1/1     Running   2          13h
calico-node-6svrr                          1/1     Running   2          13h
calico-node-zgnkl                          1/1     Running   2          13h
coredns-7f89b7bc75-4jvmv                   1/1     Running   1          13h
coredns-7f89b7bc75-zr5mf                   1/1     Running   2          13h
etcd-k8s-master1                           1/1     Running   2          13h
kube-apiserver-k8s-master1                 1/1     Running   2          11h
kube-controller-manager-k8s-master1        1/1     Running   2          11h
kube-proxy-8fzc4                           1/1     Running   2          13h
kube-proxy-n2v4j                           1/1     Running   2          13h
kube-proxy-r9ccp                           1/1     Running   2          13h
kube-scheduler-k8s-master1                 1/1     Running   2          11h
metrics-server-6595f875d6-dx8w6            2/2     Running   3          11h
[root@k8s-master1 ~]# kubectl  config set-context --current --namespace=default
[root@k8s-master1 ~]# kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
tomcat   1/1     Running   0          21m

# List which resource types are namespace-scoped
[root@k8s-master1 ~]# kubectl api-resources --namespaced=true
NAME                        SHORTNAMES   APIVERSION                     NAMESPACED   KIND
bindings                                 v1                             true         Binding
configmaps                  cm           v1                             true         ConfigMap
endpoints                   ep           v1                             true         Endpoints
events                      ev           v1                             true         Event
limitranges                 limits       v1                             true         LimitRange
persistentvolumeclaims      pvc          v1                             true         PersistentVolumeClaim
pods                        po           v1                             true         Pod
podtemplates                             v1                             true         PodTemplate
replicationcontrollers      rc           v1                             true         ReplicationController
resourcequotas              quota        v1                             true         ResourceQuota
secrets                                  v1                             true         Secret
serviceaccounts             sa           v1                             true         ServiceAccount
services                    svc          v1                             true         Service
controllerrevisions                      apps/v1                        true         ControllerRevision
daemonsets                  ds           apps/v1                        true         DaemonSet
deployments                 deploy       apps/v1                        true         Deployment
replicasets                 rs           apps/v1                        true         ReplicaSet
statefulsets                sts          apps/v1                        true         StatefulSet
localsubjectaccessreviews                authorization.k8s.io/v1        true         LocalSubjectAccessReview
horizontalpodautoscalers    hpa          autoscaling/v1                 true         HorizontalPodAutoscaler
cronjobs                    cj           batch/v1beta1                  true         CronJob
jobs                                     batch/v1                       true         Job
leases                                   coordination.k8s.io/v1         true         Lease
networkpolicies                          crd.projectcalico.org/v1       true         NetworkPolicy
networksets                              crd.projectcalico.org/v1       true         NetworkSet
endpointslices                           discovery.k8s.io/v1beta1       true         EndpointSlice
events                      ev           events.k8s.io/v1               true         Event
ingresses                   ing          extensions/v1beta1             true         Ingress
pods                                     metrics.k8s.io/v1beta1         true         PodMetrics
ingresses                   ing          networking.k8s.io/v1           true         Ingress
networkpolicies             netpol       networking.k8s.io/v1           true         NetworkPolicy
poddisruptionbudgets        pdb          policy/v1beta1                 true         PodDisruptionBudget
rolebindings                             rbac.authorization.k8s.io/v1   true         RoleBinding
roles                                    rbac.authorization.k8s.io/v1   true         Role

2.4 Namespace resource quotas

A namespace can hold many resources, so we can put a quota on the namespace to keep the workloads deployed in it from exceeding a set limit.

[root@k8s-master1 ~]# vim namespace-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-quota
  namespace: test
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 2Gi
    limits.cpu: "4"
    limits.memory: 4Gi
    
[root@k8s-master1 ~]# kubectl apply -f namespace-quota.yaml 
resourcequota/mem-cpu-quota created
[root@k8s-master1 ~]# kubectl describe ns test
Name:         test
Labels:       <none>
Annotations:  <none>
Status:       Active

Resource Quotas
 Name:            mem-cpu-quota
 Resource         Used  Hard
 --------         ---   ---
 limits.cpu       0     4
 limits.memory    0     4Gi
 requests.cpu     0     2
 requests.memory  0     2Gi

No LimitRange resource.
# Note: the ResourceQuota object adds the following constraints to the test namespace:
Every container must set a memory request, a memory limit, a cpu request, and a cpu limit.
    The memory requests of all containers combined must not exceed 2 GiB.
    The memory limits of all containers combined must not exceed 4 GiB.
    The cpu requests of all containers combined must not exceed 2 CPUs.
    The cpu limits of all containers combined must not exceed 4 CPUs.
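
Because of the first rule, a pod whose containers declare no requests/limits is rejected in this namespace. A LimitRange is the usual companion to a ResourceQuota: it injects defaults into such containers. A minimal sketch, assuming illustrative values and the hypothetical name default-limits:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits    # hypothetical name
  namespace: test
spec:
  limits:
  - type: Container
    default:              # default limits injected into containers that declare none
      cpu: 500m
      memory: 512Mi
    defaultRequest:       # default requests injected into containers that declare none
      cpu: 250m
      memory: 256Mi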

3. The pod resource in depth

3.1 What is a pod?

Official documentation: https://kubernetes.io/docs/concepts/workloads/pods/

A Pod is the smallest schedulable unit in Kubernetes. In k8s you define a Pod resource and run containers inside it; each container specifies an image, which is what actually runs the service. A Pod wraps one container (or several), and the containers in a Pod share storage, network, and so on. In other words, think of the whole pod as a virtual machine, with each container acting like a process running inside that machine.


Pods are scheduled onto the cluster's worker nodes to run; which node a pod lands on is decided by the scheduler.

You can picture a pod as a pea pod holding several peas (containers). The peas in one pod absorb the same nutrients, fertilizer, and water; likewise, the containers inside a Pod share the pod's network, storage, and other resources.


A pod acts as a logical host. Say we want to deploy a tomcat application: without containers we might deploy it on a physical machine, a VM, or a cloud host. With k8s, we instead define a pod resource and run a tomcat container inside it, so the pod plays the role the logical host used to.

3.2 How a pod manages multiple containers


A Pod can run multiple containers at once. Containers in the same Pod are automatically placed on the same node, share resources and the same network environment, and are always scheduled together. Running multiple containers in one Pod is a relatively advanced pattern; consider it only when your containers genuinely need to cooperate tightly. For example, one container runs as a web server serving files from a shared volume, while a sidecar container fetches updates for those files from a remote source.
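
A minimal sketch of that web-server-plus-sidecar pattern, assuming nginx and busybox images (the pod name and the update loop are illustrative). Both containers mount the same emptyDir volume, and since they share the pod's network namespace, the sidecar could equally reach nginx via localhost:

apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar    # hypothetical name
spec:
  volumes:
  - name: html
    emptyDir: {}            # scratch volume shared by both containers
  containers:
  - name: web
    image: nginx
    volumeMounts:
    - name: html
      mountPath: /usr/share/nginx/html
  - name: content-sidecar
    image: busybox
    volumeMounts:
    - name: html
      mountPath: /work
    command: ["sh", "-c", "while true; do date > /work/index.html; sleep 10; done"]

The sidecar rewrites index.html every 10 seconds and nginx serves the new content immediately, because both containers see the same volume.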

Some Pods have init containers in addition to application containers: the init containers run to completion before the application containers start.

3.3 Pod networking

Every pod has an IP address, assigned uniquely by the network plugin (calico, flannel, weave, and so on). The containers in a pod share one network namespace, including the IP address and network ports, so containers within a Pod can talk to each other over localhost. Containers in a pod can also reach pods on other nodes through the network plugin, e.g. calico.

3.4 Pod storage

When creating a Pod you can specify volumes to mount. All containers in the Pod can access the shared volumes, which lets them share data. As long as a Pod mounts a persistent volume, its data survives pod restarts.
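
A minimal sketch of node-local persistence, assuming a hostPath volume (the pod name and path are illustrative): whatever the container writes under /data lands on the node's disk and survives container restarts on that node:

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-volume    # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "date >> /data/log.txt; sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    hostPath:
      path: /var/lib/demo-data    # directory on the node's filesystem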

3.5 How pods are run

In K8s every resource can be created from a yaml file, and Pods are no exception. You can also create a Pod from the command line with kubectl run (less commonly used).

3.5.1 Self-managed pods

A self-managed Pod is one you define directly as a Pod resource:

[root@k8s-master1 ~]# vim pod-tomcat.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tomcat-test
  namespace: default
  labels:
    app:  tomcat
spec:
  containers:
  - name:  tomcat-java
    ports:
    - containerPort: 8080
    image: tomcat:8.5-jre8-alpine
    imagePullPolicy: IfNotPresent
    
# Apply the resource manifest
[root@k8s-master1 ~]# kubectl apply -f pod-tomcat.yaml 

# Check whether the pod was created successfully
[root@k8s-master1 yaml]# kubectl get pods -o wide -l app=tomcat
NAME          READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
tomcat-test   1/1     Running   0          12s   10.244.36.69   k8s-node1   <none>           <none> 

# But self-managed Pods have a problem. Suppose we accidentally delete the pod:
[root@k8s-master1 ~]# kubectl delete pods tomcat-test
# Check whether the pod still exists
[root@k8s-master1 ~]# kubectl get pods -l app=tomcat  # empty output: the pod has been deleted

# As shown above, when a Pod resource is defined directly, deleting the Pod removes it for good; no replacement is created. That is a very real risk in production, so from now on the Pods we work with will all be managed by controllers.

3.5.2 Controller-managed Pods

Common controllers that manage Pods: Replicaset, Deployment, Job, CronJob, Daemonset, Statefulset. A controller-managed Pod is kept running at the specified replica count at all times.

# Create a resource manifest
[root@k8s-master1 ~]# vim nginx-deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
  labels:
    app: nginx-deploy
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80

# Apply the resource manifest
[root@k8s-master1 ~]# kubectl apply -f nginx-deploy.yaml

# View the Deployment
[root@k8s-master1 ~]# kubectl get deploy -l app=nginx-deploy
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
nginx-test   2/2     2            2           7m40s

# View the Replicaset
[root@k8s-master1 ~]# kubectl get rs -l app=nginx
NAME                    DESIRED   CURRENT   READY   AGE
nginx-test-85999d4fcb   2         2         2       99s

# View the pods
[root@k8s-master1 ~]# kubectl get pods -o wide -l app=nginx
NAME                          READY   STATUS    RESTARTS   AGE    IP               NODE        NOMINATED NODE   READINESS GATES
nginx-test-85999d4fcb-gv5kr   1/1     Running   0          114s   10.244.169.140   k8s-node2   <none>           <none>
nginx-test-85999d4fcb-rc8sz   1/1     Running   0          114s   10.244.36.73     k8s-node1   <none>           <none> 

# Delete the pod nginx-test-85999d4fcb-gv5kr
[root@k8s-master1 ~]# kubectl delete pods nginx-test-85999d4fcb-gv5kr
[root@k8s-master1 ~]# kubectl get pods -o wide -l app=nginx
NAME                          READY   STATUS    RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
nginx-test-85999d4fcb-k6hsm   1/1     Running   0          10s     10.244.36.74   k8s-node1   <none>           <none>
nginx-test-85999d4fcb-rc8sz   1/1     Running   0          3m21s   10.244.36.73   k8s-node1   <none>           <none> 

# A new pod, nginx-test-85999d4fcb-k6hsm, was created to replace it
# This shows that pods managed by a deployment are always kept at the specified replica count

3.6 How to create a Pod resource

3.6.1 Pod creation flow


# Master node: kubectl -> kube-apiserver -> kubelet -> CRI container environment initialization

# Step 1:
	The client submits a Pod-creation request, either by calling the API Server's REST API directly or via the kubectl command-line tool, e.g. kubectl apply -f filename.yaml (a resource manifest).
# Step 2:
	On receiving the creation request, the apiserver writes the pod's attributes (metadata) from the yaml into etcd.
# Step 3:
	The apiserver's watch mechanism kicks in and the pending pod is handed to the scheduler. The scheduler runs its scheduling algorithm to pick a node, reports the chosen node back to the apiserver, and the apiserver writes the binding into etcd. The scheduler filters out unsuitable hosts with a set of rules; for example, if the Pod declares resource requests, hosts with less free capacity than requested are filtered out.
# Step 4:
	Again via the watch mechanism, the kubelet on the chosen node picks up the pod spec and calls the container runtime (e.g. the Docker API) to create and start the pod's containers.
# Step 5:
	When creation completes, the result is reported back to the kubelet; the kubelet reports the pod's status to the apiserver, and the apiserver writes the status into etcd.
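
One way to watch this flow from the outside, assuming any manifest such as the earlier pod-tomcat.yaml: the pod's event stream records the scheduling, image-pull, and container-start steps in order.

kubectl apply -f pod-tomcat.yaml
kubectl get pods -w                 # watch the pod go Pending -> ContainerCreating -> Running
kubectl describe pod tomcat-test    # the Events section lists Scheduled, Pulled, Created, Started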

3.6.2 Writing the resource manifest YAML

1) Use kubectl explain to see which fields a Pod resource definition contains.

[root@k8s-master1 ~]# kubectl explain pod
KIND:     Pod
VERSION:  v1

DESCRIPTION:
     Pod is a collection of containers that can run on a host. This resource is
     created by clients and scheduled onto hosts.

FIELDS:
   apiVersion	<string>
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind	<string>
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata	<Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec	<Object>
     Specification of the desired behavior of the pod. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status	<Object>
     Most recently observed status of the pod. This data may not be up to date.
     Populated by the system. Read-only. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

2) See how the pod.metadata field is defined

[root@k8s-master1 ~]# kubectl explain pod.metadata
KIND:     Pod
VERSION:  v1

RESOURCE: metadata <Object>

DESCRIPTION:
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

     ObjectMeta is metadata that all persisted resources must have, which
     includes all objects users must create.

FIELDS:
   annotations	<map[string]string>
     Annotations is an unstructured key value map stored with a resource that
     may be set by external tools to store and retrieve arbitrary metadata. They
     are not queryable and should be preserved when modifying objects. More
     info: http://kubernetes.io/docs/user-guide/annotations

   clusterName	<string>
     The name of the cluster which the object belongs to. This is used to
     distinguish resources with same name and namespace in different clusters.
     This field is not set anywhere right now and apiserver is going to ignore
     it if set in create or update request.

   creationTimestamp	<string>
     CreationTimestamp is a timestamp representing the server time when this
     object was created. It is not guaranteed to be set in happens-before order
     across separate operations. Clients may not set this value. It is
     represented in RFC3339 form and is in UTC.

     Populated by the system. Read-only. Null for lists. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   deletionGracePeriodSeconds	<integer>
     Number of seconds allowed for this object to gracefully terminate before it
     will be removed from the system. Only set when deletionTimestamp is also
     set. May only be shortened. Read-only.

   deletionTimestamp	<string>
     DeletionTimestamp is RFC 3339 date and time at which this resource will be
     deleted. This field is set by the server when a graceful deletion is
     requested by the user, and is not directly settable by a client. The
     resource is expected to be deleted (no longer visible from resource lists,
     and not reachable by name) after the time in this field, once the
     finalizers list is empty. As long as the finalizers list contains items,
     deletion is blocked. Once the deletionTimestamp is set, this value may not
     be unset or be set further into the future, although it may be shortened or
     the resource may be deleted prior to this time. For example, a user may
     request that a pod is deleted in 30 seconds. The Kubelet will react by
     sending a graceful termination signal to the containers in the pod. After
     that 30 seconds, the Kubelet will send a hard termination signal (SIGKILL)
     to the container and after cleanup, remove the pod from the API. In the
     presence of network partitions, this object may still exist after this
     timestamp, until an administrator or automated process can determine the
     resource is fully terminated. If not set, graceful deletion of the object
     has not been requested.

     Populated by the system when a graceful deletion is requested. Read-only.
     More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   finalizers	<[]string>
     Must be empty before the object is deleted from the registry. Each entry is
     an identifier for the responsible component that will remove the entry from
     the list. If the deletionTimestamp of the object is non-nil, entries in
     this list can only be removed. Finalizers may be processed and removed in
     any order. Order is NOT enforced because it introduces significant risk of
     stuck finalizers. finalizers is a shared field, any actor with permission
     can reorder it. If the finalizer list is processed in order, then this can
     lead to a situation in which the component responsible for the first
     finalizer in the list is waiting for a signal (field value, external
     system, or other) produced by a component responsible for a finalizer later
     in the list, resulting in a deadlock. Without enforced ordering finalizers
     are free to order amongst themselves and are not vulnerable to ordering
     changes in the list.

   generateName	<string>
     GenerateName is an optional prefix, used by the server, to generate a
     unique name ONLY IF the Name field has not been provided. If this field is
     used, the name returned to the client will be different than the name
     passed. This value will also be combined with a unique suffix. The provided
     value has the same validation rules as the Name field, and may be truncated
     by the length of the suffix required to make the value unique on the
     server.

     If this field is specified and the generated name exists, the server will
     NOT return a 409 - instead, it will either return 201 Created or 500 with
     Reason ServerTimeout indicating a unique name could not be found in the
     time allotted, and the client should retry (optionally after the time
     indicated in the Retry-After header).

     Applied only if Name is not specified. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#idempotency

   generation	<integer>
     A sequence number representing a specific generation of the desired state.
     Populated by the system. Read-only.

   labels	<map[string]string>
     Map of string keys and values that can be used to organize and categorize
     (scope and select) objects. May match selectors of replication controllers
     and services. More info: http://kubernetes.io/docs/user-guide/labels

   managedFields	<[]Object>
     ManagedFields maps workflow-id and version to the set of fields that are
     managed by that workflow. This is mostly for internal housekeeping, and
     users typically shouldn't need to set or understand this field. A workflow
     can be the user's name, a controller's name, or the name of a specific
     apply path like "ci-cd". The set of fields is always in the version that
     the workflow used when modifying the object.

   name	<string>
     Name must be unique within a namespace. Is required when creating
     resources, although some resources may allow a client to request the
     generation of an appropriate name automatically. Name is primarily intended
     for creation idempotence and configuration definition. Cannot be updated.
     More info: http://kubernetes.io/docs/user-guide/identifiers#names

   namespace	<string>
     Namespace defines the space within which each name must be unique. An empty
     namespace is equivalent to the "default" namespace, but "default" is the
     canonical representation. Not all objects are required to be scoped to a
     namespace - the value of this field for those objects will be empty.

     Must be a DNS_LABEL. Cannot be updated. More info:
     http://kubernetes.io/docs/user-guide/namespaces

   ownerReferences	<[]Object>
     List of objects depended by this object. If ALL objects in the list have
     been deleted, this object will be garbage collected. If this object is
     managed by a controller, then an entry in this list will point to this
     controller, with the controller field set to true. There cannot be more
     than one managing controller.

   resourceVersion	<string>
     An opaque value that represents the internal version of this object that
     can be used by clients to determine when objects have changed. May be used
     for optimistic concurrency, change detection, and the watch operation on a
     resource or set of resources. Clients must treat these values as opaque and
     passed unmodified back to the server. They may only be valid for a
     particular resource or set of resources.

     Populated by the system. Read-only. Value must be treated as opaque by
     clients and . More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency

   selfLink	<string>
     SelfLink is a URL representing this object. Populated by the system.
     Read-only.

     DEPRECATED Kubernetes will stop propagating this field in 1.20 release and
     the field is planned to be removed in 1.21 release.

   uid	<string>
     UID is the unique in time and space value for this object. It is typically
     generated by the server on successful creation of a resource and is not
     allowed to change on PUT operations.

     Populated by the system. Read-only. More info:
     http://kubernetes.io/docs/user-guide/identifiers#uids

3.6.3 Creating a pod from a resource manifest

[root@k8s-master1 ~]# vim pod-first.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  namespace: default
  labels:
    app:  tomcat-pod-first
spec:
  containers:
  - name:  tomcat-first
    ports:
    - containerPort: 8080
    image: tomcat:8.5-jre8-alpine
    imagePullPolicy: IfNotPresent
    
# Apply the resource manifest
[root@k8s-master1 ~]# kubectl apply -f pod-first.yaml 

# Check whether the pod was created successfully
[root@k8s-master1 yaml]# kubectl get pods -o wide -l app=tomcat-pod-first 
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
pod-first   1/1     Running   0          15s   10.244.36.75   k8s-node1   <none>           <none> 

# View the pod's logs
[root@k8s-master1 ~]# kubectl logs pod-first

# View the logs of a specific container in the pod
[root@k8s-master1 ~]# kubectl logs pod-first  -c tomcat-first

# Exec into the pod we just created
[root@k8s-master1 ~]# kubectl exec -it pod-first  -- /bin/bash

# If the pod has multiple containers, enter a specific one like this:
[root@k8s-master1 ~]# kubectl exec -it pod-first  -c  tomcat-first -- /bin/bash 

3.6.4 Creating a Pod with kubectl run

[root@k8s-master1 yaml]# kubectl run tomcat --image=tomcat:8.5-jre8-alpine --image-pull-policy='IfNotPresent'  --port=8080
pod/tomcat created
[root@k8s-master1 yaml]# kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
tomcat   1/1     Running   0          5s

3.7 Labels

3.7.1 What is a label?

A label is just a key/value pair attached to an object, such as a Pod. Labels are meant to capture identifying attributes of an object, so you can tell at a glance what a Pod does, and they can be used to partition objects along specific dimensions (version, service tier, and so on). Labels can be defined when an object is created and can also be changed at any time afterwards; an object can carry many labels, but each key must be unique on that object. Labeling resources also makes it easy to manage them in groups: once pods are labeled, you can view or delete exactly the pods a label selects.
In k8s, most resources can be labeled.

3.7.2 Labeling a pod

# Label an existing pod
[root@k8s-master1 ~]# kubectl get pods pod-first --show-labels
NAME        READY   STATUS    RESTARTS   AGE   LABELS
pod-first   1/1     Running   0          12m   app=tomcat-pod-first
[root@k8s-master1 ~]# kubectl label pods pod-first  release=v1

# Check that the label was applied:
[root@k8s-master1 ~]# kubectl get pods pod-first --show-labels
NAME        READY   STATUS    RESTARTS   AGE   LABELS
pod-first   1/1     Running   0          12m   app=tomcat-pod-first,release=v1

3.7.3 Viewing resource labels

# Show the labels of all pods in the default namespace
[root@k8s-master1 ~]# kubectl get pods --show-labels 

# Show all the labels of a specific pod in the default namespace
[root@k8s-master1 ~]# kubectl get pods pod-first --show-labels

# List pods in the default namespace that have a release label key, without showing the labels
[root@k8s-master1 ~]# kubectl get pods -l release

# List pods in the default namespace whose release label equals v1, without showing the labels
[root@k8s-master1 ~]# kubectl get pods -l release=v1

# List all pods in the default namespace that have a release label key, printing the label value in its own column
[root@k8s-master1 ~]# kubectl get pods -L release

# Show the labels of all pods in all namespaces
[root@k8s-master1 ~]# kubectl get pods --all-namespaces --show-labels

[root@k8s-master1 ~]# kubectl get pods -l release=v1 -L release
NAME        READY   STATUS    RESTARTS   AGE   RELEASE
pod-first   1/1     Running   0          14m   v1

3.8 The pod resource manifest in detail

apiVersion: v1       # API version, e.g. v1
kind: Pod       # resource type, e.g. Pod
metadata:       # metadata
  name: string       # Pod name
  namespace: string    # namespace the Pod belongs to
  labels:      # custom labels (a key/value map)
    key: string
  annotations:       # custom annotations (a key/value map)
    key: string
spec:         # detailed definition of the containers in the Pod
  containers:      # list of containers in the Pod
  - name: string     # container name
    image: string    # container image name
    imagePullPolicy: [Always | Never | IfNotPresent] # image pull policy: Always means always pull the image, IfNotPresent means prefer the local image and pull only if it is missing, Never means only ever use the local image
    command: [string]    # startup command list; if unset, the startup command baked into the image is used
    args: [string]     # argument list for the startup command
    workingDir: string     # the container's working directory
    volumeMounts:    # storage volumes mounted inside the container
    - name: string     # name of a shared volume defined by the pod; must match a volume name from the volumes[] section
      mountPath: string    # absolute mount path inside the container; should be shorter than 512 characters
      readOnly: boolean    # whether the mount is read-only
    ports:       # ports to expose
    - name: string     # port name
      containerPort: int   # port the container listens on
      hostPort: int    # port the host listens on; defaults to the same as containerPort
      protocol: string     # port protocol, TCP or UDP; default TCP
    env:       # environment variables to set before the container runs
    - name: string     # variable name
      value: string    # variable value
    resources:       # resource limits and requests
      limits:      # resource limits
        cpu: string    # CPU limit, in cores
        memory: string     # memory limit, in units such as Mib/Gib
      requests:      # resource requests
        cpu: string    # CPU request, the initial amount available at container start
        memory: string     # memory request, the initial memory available at container start
    livenessProbe:     # health check for the containers in the Pod; when the probe gets no response several times, the container is restarted automatically. The methods are exec, httpGet, and tcpSocket; set only one of them per container
      exec:      # exec-style check
        command: [string]  # command or script the exec check runs
      httpGet:       # HttpGet-style check; requires path and port
        path: string
        port: number
        host: string
        scheme: string
        httpHeaders:
        - name: string
          value: string
      tcpSocket:     # tcpSocket-style check
        port: number
      initialDelaySeconds: 0  # seconds to wait after container start before the first probe
      timeoutSeconds: 0   # seconds to wait for a probe response before timing out; default 1
      periodSeconds: 0    # seconds between periodic probes; default 10
      successThreshold: 0
      failureThreshold: 0
    securityContext:
      privileged: false
  restartPolicy: [Always | Never | OnFailure] # the Pod's restart policy: Always restarts the containers however they terminate, OnFailure restarts only when the Pod exits with a non-zero code, Never never restarts the Pod
  nodeSelector: object  # schedule the Pod onto nodes carrying these labels, given as key: value pairs
  imagePullSecrets:    # secrets used when pulling images, given by name
  - name: string
  hostNetwork: false      # whether to use host networking; default false; true means use the host's network
  volumes:       # list of shared volumes defined on the pod
  - name: string     # shared volume name (there are many volume types)
    emptyDir: {}     # emptyDir volume: a temporary directory sharing the Pod's lifetime; value is empty
    hostPath:      # hostPath volume: mounts a directory from the Pod's host
      path: string     # directory on the Pod's host that will be mounted into the container
    secret:      # secret volume: mounts a predefined secret object into the container
      secretName: string
      items:
      - key: string
        path: string
    configMap:     # configMap volume: mounts a predefined configMap object into the container
      name: string
      items:
      - key: string
        path: string

3.9 Node selectors

When we create a pod, the scheduler places it on a random eligible worker node by default. What if we want the pod to land on a specific node, or on nodes sharing some trait? We can use the pod's nodeName or nodeSelector field to specify where it should be scheduled.

3.9.1 nodeName

Pins the pod to one specific node:

[root@k8s-master1 ~]# cat pod-node.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  namespace: default
  labels:
    app: myapp
    env: dev
spec:
  nodeName: k8s-node1
  containers:
  - name:  tomcat-pod-java
    ports:
    - containerPort: 8080
    image: tomcat:8.5-jre8-alpine
    imagePullPolicy: IfNotPresent
  - name: busybox
    image: busybox:latest
    command:
    - "/bin/sh"
    - "-c"
    - "sleep 3600"

[root@k8s-master1 ~]# kubectl apply -f pod-node.yaml
# Check which node the pod was scheduled to
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
demo-pod    2/2     Running   0          52s   10.244.36.78   k8s-node1   <none>           <none>
pod-first   1/1     Running   0          36m   10.244.36.77   k8s-node1   <none>           <none>

3.9.2 nodeSelector

Schedules the pod onto nodes carrying the given labels:

# Label a node; here we add disk=ceph
[root@k8s-master1 ~]# kubectl label nodes k8s-node2 disk=ceph
node/k8s-node2 labeled

# In the pod definition, require scheduling onto a node labeled disk=ceph
[root@k8s-master1 ~]# cat pod-1.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod-1
  namespace: default
  labels:
    app: myapp
    env: dev
spec:
  nodeSelector:
    disk: ceph
  containers:
  - name:  tomcat-pod-java
    ports:
    - containerPort: 8080
    image: tomcat:8.5-jre8-alpine
    imagePullPolicy: IfNotPresent

[root@k8s-master1 ~]# kubectl apply -f pod-1.yaml
# Check which node the pod was scheduled to
[root@k8s-master1 ~]# kubectl get pods  -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP               NODE        NOMINATED NODE   READINESS GATES
demo-pod     2/2     Running   0          3m38s   10.244.36.78     k8s-node1   <none>           <none>
demo-pod-1   1/1     Running   0          7s      10.244.169.142   k8s-node2   <none>           <none>
pod-first    1/1     Running   0          38m     10.244.36.77     k8s-node1   <none>           <none>

3.10 Node affinity (nodeAffinity)

Node affinity governs the relationship between pods and nodes: the conditions matched when a Pod is scheduled onto a node.

1) Field documentation

[root@k8s-master1 ~]# kubectl explain  pods.spec.affinity.nodeAffinity
KIND:     Pod
VERSION:  v1

RESOURCE: nodeAffinity <Object>

DESCRIPTION:
     Describes node affinity scheduling rules for the pod.

     Node affinity is a group of node affinity scheduling rules.

FIELDS:
   # Soft affinity: the scheduler tries to place the pod according to this preference, but it is not mandatory
   preferredDuringSchedulingIgnoredDuringExecution	<[]Object>
     The scheduler will prefer to schedule pods to nodes that satisfy the
     affinity expressions specified by this field, but it may choose a node that
     violates one or more of the expressions. The node that is most preferred is
     the one with the greatest sum of weights, i.e. for each node that meets all
     of the scheduling requirements (resource request, requiredDuringScheduling
     affinity expressions, etc.), compute a sum by iterating through the
     elements of this field and adding "weight" to the sum if the node matches
     the corresponding matchExpressions; the node(s) with the highest sum are
     the most preferred.

   # Hard affinity: a node must satisfy this placement rule; it is a strict requirement
   requiredDuringSchedulingIgnoredDuringExecution	<Object>
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to an update), the system may or may not try
     to eventually evict the pod from its node.

2) Hard affinity with requiredDuringSchedulingIgnoredDuringExecution

# If any node has a zone label whose value is foo or bar, the pod may be scheduled onto a node carrying that label
[root@k8s-master1 ~]# cat pod-nodeaffinity-required.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
            
[root@k8s-master1 ~]# kubectl apply -f pod-nodeaffinity-required.yaml
[root@k8s-master1 ~]# kubectl get pods -o wide | grep pod-node
pod-node-affinity-demo   0/1     Pending   0          9s    <none>           <none>      <none>           <none>
# The status is Pending: scheduling could not complete, because no node has a zone label valued foo or bar, and hard affinity requires the condition to be satisfied before the pod can be placed

# Label k8s-node1 with zone=foo, then check again
[root@k8s-master1 ~]# kubectl label nodes k8s-node1 zone=foo
[root@k8s-master1 ~]#  kubectl get pods -o wide | grep pod-node
pod-node-affinity-demo   1/1     Running   0          2m7s   10.244.36.79     k8s-node1   <none>           <none>

3) Soft affinity with preferredDuringSchedulingIgnoredDuringExecution

[root@k8s-master1 ~]# cat pod-nodeaffinity-preferred.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-preferred
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
         matchExpressions:
         - key: zone1
           operator: In
           values:
           - foo1
           - bar1
        weight: 60
        
# With soft affinity the pod still runs, even though no node carries the zone1 label the pod prefers
[root@k8s-master1 ~]# kubectl apply -f pod-nodeaffinity-preferred.yaml 
[root@k8s-master1 ~]# kubectl get pods -o wide |grep pod-node-affinity-preferred
pod-node-affinity-preferred   1/1     Running   0          39s     10.244.169.143   k8s-node2   <none>           <none>

3.11 Pod affinity and anti-affinity

3.11.1 Pod affinity (podAffinity)

# Define two pods: the first serves as the anchor, the second follows it
[root@k8s-master1 ~]# kubectl delete pods pod-first
[root@k8s-master1 ~]# cat pod-required-affinity-demo.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app2: myapp2
    tier: frontend
spec:
    containers:
    - name: myapp
      image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
    containers:
    - name: busybox
      image: busybox:latest
      imagePullPolicy: IfNotPresent
      command: ["sh","-c","sleep 3600"]
    affinity:
      podAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
         - labelSelector:
              matchExpressions:
              - {key: app2, operator: In, values: ["myapp2"]}
           topologyKey: kubernetes.io/hostname

# The above requires the created pod to be on the same node as a pod labeled app2=myapp2
[root@k8s-master1 ~]# kubectl apply -f pod-required-affinity-demo.yaml 
pod/pod-first created
pod/pod-second created
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          24s   10.244.36.80   k8s-node1   <none>           <none>
pod-second   1/1     Running   0          23s   10.244.36.81   k8s-node1   <none>           <none>

# Wherever the first pod is scheduled, the second follows; that is pod affinity
[root@k8s-master1 ~]# kubectl delete -f pod-required-affinity-demo.yaml

# Scheduling is keyed on topologyKey: kubernetes.io/hostname
[root@k8s-master1 ~]# kubectl get nodes --show-labels
NAME          STATUS   ROLES                  AGE   VERSION   LABELS
k8s-master1   Ready    control-plane,master   16h   v1.20.6   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
k8s-node1     Ready    worker                 16h   v1.20.6   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=worker,zone=foo
k8s-node2     Ready    worker                 16h   v1.20.6   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ceph,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=worker

3.11.2 Pod anti-affinity

# Define two pods: the first serves as the anchor, the second is scheduled onto the opposite node
[root@k8s-master1 ~]# cat pod-required-anti-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app1: myapp1
    tier: frontend
spec:
    containers:
    - name: myapp
      image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
    containers:
    - name: busybox
      image: busybox:latest
      imagePullPolicy: IfNotPresent
      command: ["sh","-c","sleep 3600"]
    affinity:
      podAntiAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
         - labelSelector:
              matchExpressions:
              - {key: app1, operator: In, values: ["myapp1"]}
           topologyKey: kubernetes.io/hostname
           
[root@k8s-master1 ~]# kubectl apply -f pod-required-anti-affinity-demo.yaml
# The two pods land on different nodes; that is pod anti-affinity
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP               NODE        NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          33s   10.244.36.82     k8s-node1   <none>           <none>
pod-second   1/1     Running   0          33s   10.244.169.144   k8s-node2   <none>           <none>

[root@k8s-master1 ~]# kubectl delete -f pod-required-anti-affinity-demo.yaml

3.11.3 topologyKey

# Using a different topologyKey
[root@k8s-master1 ~]# kubectl label nodes  k8s-node2  zone=foo
[root@k8s-master1 ~]# kubectl label nodes  k8s-node1  zone=foo --overwrite
[root@k8s-master1 affinity]# cat pod-first-required-anti-affinity-demo-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app3: myapp3
    tier: frontend
spec:
    containers:
    - name: myapp
      image: ikubernetes/myapp:v1

[root@k8s-master1 affinity]# cat pod-second-required-anti-affinity-demo-1.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
    containers:
    - name: busybox
      image: busybox:latest
      imagePullPolicy: IfNotPresent
      command: ["sh","-c","sleep 3600"]
    affinity:
      podAntiAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
         - labelSelector:
              matchExpressions:
              - {key: app3 ,operator: In, values: ["myapp3"]}
           topologyKey:  zone

[root@k8s-master1 affinity]# kubectl apply -f pod-first-required-anti-affinity-demo-1.yaml
[root@k8s-master1 affinity]# kubectl apply -f pod-second-required-anti-affinity-demo-1.yaml

[root@k8s-master1 ~]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          15s   10.244.36.83   k8s-node1   <none>           <none>
pod-second   0/1     Pending   0          7s    <none>         <none>      <none>           <none>

# The second pod is now Pending: both nodes carry the zone label, so they count as the same topology location, and the anti-affinity rule forbids scheduling into the same location, hence Pending. If required is changed to preferred in the anti-affinity rule, the pod will run anyway
[root@k8s-master1 affinity]# cat pod-second-required-anti-affinity-demo-1.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app3
              operator: In
              values:
              - myapp3
          topologyKey: zone
[root@k8s-master1 affinity]# kubectl apply -f pod-second-required-anti-affinity-demo-1.yaml
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          26m   10.244.36.83   k8s-node1   <none>           <none>
pod-second   1/1     Running   0          7s    10.244.36.85   k8s-node1   <none>           <none>

3.12 Taints and tolerations

3.12.1 What are taints and tolerations?

Taints give the node the initiative in selection: we put a taint on a node, and pods that do not tolerate it cannot run there. A taint is key/value attribute data defined on a node that can be used to decide which pods to reject;

  • taints: key/value data set on nodes, defining taints;

  • tolerations: key/value data set on pods, defining which taints the pod can tolerate

Pod affinity is a property of pods; taints, by contrast, are a property of nodes.

# The taint on the control-plane node
[root@k8s-master1 ~]# kubectl describe nodes k8s-master1|grep "Taints"
Taints:             node-role.kubernetes.io/master:NoSchedule

# The taints field definition
[root@k8s-master1 ~]# kubectl explain node.spec.taints
KIND:     Node
VERSION:  v1

RESOURCE: taints <[]Object>

DESCRIPTION:
     If specified, the node's taints.

     The node this Taint is attached to has the "effect" on any pod that does
     not tolerate the Taint.

FIELDS:
   effect	<string> -required-
     Required. The effect of the taint on pods that do not tolerate the taint.
     Valid effects are NoSchedule, PreferNoSchedule and NoExecute.

   key	<string> -required-
     Required. The taint key to be applied to a node.

   timeAdded	<string>
     TimeAdded represents the time at which the taint was added. It is only
     written for NoExecute taints.

   value	<string>
     The taint value corresponding to the taint key.

The effect field of a taint defines the level (effect) of repulsion applied to pod objects:

# NoSchedule:
	Affects only the scheduling process. A pod that tolerates the node's taints can be scheduled onto it. If the node's taints change later and a newly added taint is not tolerated by pods scheduled earlier, nothing happens to them: existing pod objects are unaffected
# NoExecute:
	Affects both scheduling and existing pod objects. If a running pod cannot tolerate a taint added to the node later, the pod is evicted
# PreferNoSchedule:
	"Preferably not, but acceptable": the soft version of NoSchedule
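
A quick sketch of the taint syntax for each effect, using a throwaway key/value pair (a trailing - removes the taint again):

kubectl taint node k8s-node1 role=dev:NoSchedule          # keep new pods without a matching toleration away
kubectl taint node k8s-node1 role=dev:PreferNoSchedule    # the scheduler avoids the node when it can
kubectl taint node k8s-node1 role=dev:NoExecute           # also evicts running pods that do not tolerate it
kubectl taint node k8s-node1 role=dev:NoSchedule-         # remove the NoSchedule taint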

Tolerations defined on pod objects:

# Equality match:
	key and value must match exactly
# Existence check:
	key and effect must match at the same time; value may be empty
A pod may define more than one toleration, and a node may carry more than one taint. They have to be checked one by one: every taint must be tolerated for scheduling to complete; if any taint is not tolerated, the pod cannot be placed on that node.

3.12.2 Control-plane taints

# Given the control-plane taint, none of the pods we create are scheduled onto the master, because our pods define no toleration
[root@k8s-master1 ~]# kubectl describe nodes k8s-master1|grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule

# The apiserver pod
[root@k8s-master1 ~]# kubectl describe pods kube-apiserver-k8s-master1 -n kube-system
...
Node-Selectors:    <none>
Tolerations:       :NoExecute op=Exists	# an empty key with op=Exists tolerates any taint, so this pod can run on k8s-master1
Events:            <none>

# Managing node taints
[root@k8s-master1 ~]# kubectl taint --help

3.12.3 Testing taints and tolerations

1) Treat k8s-node2 as dedicated to production, and the other nodes as test machines

# Taint k8s-node2; pods that cannot tolerate the taint will not be scheduled onto it
[root@k8s-master1 ~]# kubectl taint node k8s-node2 node-type=production:NoSchedule

[root@k8s-master1 ~]# cat pod-taint.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: taint-pod
  namespace: default
  labels:
    tomcat:  tomcat-pod
spec:
  containers:
  - name:  taint-pod
    ports:
    - containerPort: 8080
    image: tomcat:8.5-jre8-alpine
    imagePullPolicy: IfNotPresent 

[root@k8s-master1 ~]# kubectl apply -f pod-taint.yaml
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
taint-pod   1/1     Running   0          7s    10.244.36.86   k8s-node1   <none>           <none> 
# Everything is scheduled onto k8s-node1: k8s-node2 is tainted, and the pod we created has no toleration, so no pod lands on k8s-node2

2) Taint k8s-node1 as well

[root@k8s-master1 ~]# kubectl taint node k8s-node1 node-type=dev:NoExecute
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME        READY   STATUS        RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
taint-pod   0/1     Terminating   0          3m15s   10.244.36.86   k8s-node1   <none>           <none>
# The pod that was already running on the node has been evicted

3) Create a pod with a toleration

[root@k8s-master1 ~]# cat pod-demo-1.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: myapp-deploy
  namespace: default
  labels:
    app: myapp
    release: canary
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: http
      containerPort: 80
  tolerations:
  - key: "node-type"
    operator: "Equal"
    value: "production"
    effect: "NoExecute"
    tolerationSeconds: 3600
        
[root@k8s-master1 ~]# kubectl apply -f pod-demo-1.yaml
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
myapp-deploy   0/1     Pending   0          4s    <none>   <none>   <none>           <none>

# Still Pending: with operator Equal, the key, value, and effect must all exactly match the taint defined on the node
# Change effect: "NoExecute" to effect: "NoSchedule" and remove tolerationSeconds: 3600
[root@k8s-master1 ~]# cat pod-demo-1.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: myapp-deploy
  namespace: default
  labels:
    app: myapp
    release: canary
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: http
      containerPort: 80
  tolerations:
  - key: "node-type"
    operator: "Equal"
    value: "production"
    effect: "NoSchedule"
[root@k8s-master1 ~]# kubectl delete -f pod-demo-1.yaml
[root@k8s-master1 ~]# kubectl apply -f pod-demo-1.yaml
# Now it can be scheduled onto k8s-node2, because the toleration defined in the pod matches the node's taint
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP               NODE        NOMINATED NODE   READINESS GATES
myapp-deploy   1/1     Running   0          22s   10.244.169.145   k8s-node2   <none>           <none>

4) Modify the pod definition, variant one

[root@k8s-master1 ~]# cat pod-demo-1.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: myapp-deploy
  namespace: default
  labels:
    app: myapp
    release: canary
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: http
      containerPort: 80
  tolerations:
  - key: "node-type"
    operator: "Exists"
    value: ""
    effect: "NoSchedule"

# With Exists, the key merely has to be present; the value is implicitly treated as a wildcard
[root@k8s-master1 ~]# kubectl delete -f pod-demo-1.yaml 
pod "myapp-deploy" deleted
[root@k8s-master1 ~]# kubectl apply -f pod-demo-1.yaml 
pod/myapp-deploy created
# Still scheduled onto k8s-node2
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP               NODE        NOMINATED NODE   READINESS GATES
myapp-deploy   1/1     Running   0          5s    10.244.169.146   k8s-node2   <none>           <none>

5) Modify the pod definition, variant two

[root@k8s-master1 ~]# cat pod-demo-1.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: myapp-deploy
  namespace: default
  labels:
    app: myapp
    release: canary
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: http
      containerPort: 80
  tolerations:
  - key: "node-type"
    operator: "Exists"
    value: ""
    effect: ""

# Any taint with the node-type key is tolerated, whatever its value and whatever its effect
[root@k8s-master1 ~]# kubectl delete -f pod-demo-1.yaml 
pod "myapp-deploy" deleted
[root@k8s-master1 ~]# kubectl apply -f pod-demo-1.yaml 
pod/myapp-deploy created
# The pod may now be scheduled onto either k8s-node1 or k8s-node2
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
myapp-deploy   1/1     Running   0          2s    10.244.36.87   k8s-node1   <none>           <none>

6) Remove the taints

[root@k8s-master1 ~]# kubectl taint nodes k8s-node1 node-type:NoExecute-
[root@k8s-master1 ~]# kubectl taint nodes k8s-node2 node-type-

3.13 Common pod states and the restart policy

3.13.1 Common pod states

A Pod's status is defined in the PodStatus object, whose phase field briefly describes where the Pod is in its lifecycle. Being familiar with the various pod states is essential to understanding how to set a Pod's scheduling policy and restart policy. The possible phase values, i.e. the common pod states, are:

# Pending:
	The pod-creation request was accepted, but some condition is unmet: scheduling has not completed and no node can currently satisfy the scheduling constraints, so the pod exists without a node to run on. A pod can stay Pending for a while; this phase covers both the time to schedule the Pod and the time to download its images over the network.

# Running:
	The Pod has been bound to a node and all of its containers have been created. At least one container is running, or is in the process of starting or restarting.

# Succeeded:
	All containers in the Pod terminated successfully and will not be restarted.

# Failed:
	All containers in the Pod have terminated, and at least one terminated in failure, i.e. it exited with a non-zero status or was killed by the system.

# Unknown:
	The pod's state cannot be determined. The apiserver learns a pod's state by talking to the kubelet on the pod's node; if that kubelet itself fails, the apiserver cannot reach it, gets no information, and reports Unknown.

# Evicted:
	Usually seen when the node runs short of memory or disk. Run df -h on the directory backing docker storage; if usage is above 85%, clean up promptly, especially large files and docker images.

# CrashLoopBackOff:
	The container has started before but keeps exiting abnormally.

# Error:
	An error occurred while the Pod was starting.

3.13.2 Pod restart policy

A Pod's restart policy (RestartPolicy) applies to all containers in the Pod, and is evaluated and enforced solely by the kubelet on the Pod's node. When a container exits abnormally or fails its health check, the kubelet acts according to the RestartPolicy.

The restart policies are Always, OnFailure, and Never; the default is Always.

  • Always: whenever the container fails, the kubelet restarts it automatically.

  • OnFailure: the kubelet restarts the container only when it terminates with a non-zero exit code.

  • Never: the kubelet never restarts the container, whatever state it is in.

[root@k8s-master1 ~]# vim pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  namespace: default
  labels:
    app: myapp
spec:
  restartPolicy: Always	# define the restart policy
  containers:
  - name:  tomcat-pod-java
    ports:
    - containerPort: 8080
    image: tomcat:8.5-jre8-alpine
    imagePullPolicy: IfNotPresent

3.14 Pod lifecycle

3.14.1 Lifecycle overview


A pod's lifecycle involves many user-visible behaviors:
1. Init containers run initialization to completion
2. After the main container starts, a post-start hook can run
3. Before the main container ends, a pre-stop hook can run
4. While the main container runs, health checks such as the liveness probe and readiness probe can be performed

3.14.2 Init containers

A Pod can hold one or more containers, and the one that runs the application is the main container. When a Pod is created, it may also define one or more Init containers that start before the main container. An init container runs its initialization code to completion and then exits; it does not stay around, so the initialization always happens before the main container starts. There can be several init containers, and they run serially: init container 1 runs, then init container 2, and so on; once they have all finished their initialization and exited, the main container runs. When the main container exits, the pod ends; the main container's exit marks the end of the pod's life, the two share one timeline.

Init containers exist to do initialization work. There can be one or several, executed in the order they are defined; the main container starts only after all of them have completed. Because the volumes in a Pod are shared, data produced by an Init Container can be consumed by the main container. Init Containers can be used with many K8S resources, such as Deployment, DaemonSet, StatefulSet, and Job, and in all cases they run at Pod startup, before the main containers, doing setup work.

Differences between Init containers and regular containers:
1. Init containers do not support readiness probes, because they must run to completion before the Pod can be ready
2. Each Init container must succeed before the next one may run
3. If a Pod's Init container fails, Kubernetes restarts the Pod repeatedly until the Init container succeeds; however, if the Pod's restartPolicy is Never, the Pod is not restarted.

Official documentation on init containers: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#init-containers-in-use

Testing init containers:

[root@k8s-master1 init]# cat init.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]

[root@k8s-master1 init]# kubectl apply -f init.yaml
# The pod stays in the Init state because myservice cannot be resolved
[root@k8s-master1 ~]# kubectl get pods
NAME        READY   STATUS     RESTARTS   AGE
myapp-pod   0/1     Init:0/2   0          3s

[root@k8s-master1 init]# cat service.yaml 
---
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376
---
apiVersion: v1
kind: Service
metadata:
  name: mydb
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9377

[root@k8s-master1 init]# kubectl apply -f service.yaml
[root@k8s-master1 ~]# kubectl get pods
NAME        READY   STATUS    RESTARTS   AGE
myapp-pod   1/1     Running   0          110s
[root@k8s-master1 ~]# kubectl logs myapp-pod 
The app is running!

3.14.3 The main container

1) Container hooks
Once the init containers finish, the main container starts. Around the main container there are two hooks: post start (runs right after the container starts) and pre stop (runs right before the container ends). Work that must happen just after startup or just before shutdown can be hung on these hooks; a hook lets the user "hook" commands onto those moments and run them, doing opening setup or closing cleanup, much like awk's BEGIN and END blocks.

  • postStart: fires immediately after the container is created, notifying the container that it has been created. If the hook's handler fails, the container is killed, and the container's restart policy decides whether to restart it. This hook takes no arguments.
  • preStop: fires before the container is deleted; its hook handler must finish before the request to delete the container is sent to the Docker daemon. After the handler completes, whatever its result, the Docker daemon sends a SIGTERM signal to the container to delete it. This hook takes no arguments.

2) Container probes

k8s supports two kinds of pod probes:

  • livenessprobe (liveness check): checks, by the configured method, whether the application in the pod's container is running normally. If the check fails, the container is considered unhealthy and the Kubelet consults the Pod's restartPolicy to decide whether the Pod should be restarted; if a container configures no livenessProbe, the Kubelet treats the liveness probe as permanently successful.
  • readinessprobe (readiness check): determines whether the application in the container has finished starting up. Only once the probe succeeds is the Pod opened to network traffic, with the container's Ready condition set to true; if the probe fails, the Ready condition is set to false.

3.14.4 Container hooks

postStart: a task run after the container is created and before it serves, used for resource provisioning, environment preparation, and the like.
preStop: a task run before the container is terminated, used for shutting the application down gracefully, notifying other systems, and the like.

Usage example:

......
containers:
- image: sample:v2
  name: war
  lifecycle:
    postStart:
      exec:
        command:
        - "cp"
        - "/sample.war"
        - "/app"
    preStop:
      httpGet:
        host: monitor.com
        path: /waring
        port: 8080
        scheme: HTTP
......

# The example above defines a Pod containing a Java web application container with PostStart and PreStop callbacks set: after the container is created, /sample.war is copied into the /app directory; before the container terminates, an HTTP request is sent to http://monitor.com:8080/waring, warning the monitoring system

Hook use case: gracefully deleting resource objects

When a user asks to delete a resource object that contains pods (an RC, a deployment, etc.), K8S lets the application shut down gracefully (i.e. finish the requests it is handling before exiting). K8S provides two forms of notification:
1) By default, K8S tells the node to run docker stop: docker first sends the SIGTERM system signal to the process with PID 1 inside the container, then waits for the application to terminate; if the wait reaches the configured timeout, or the default timeout (30s), it follows up with the SIGKILL system signal to kill the process outright.
2) Through the pod lifecycle (the PreStop callback), which runs before the termination signal is sent.
By default, every delete operation has a graceful-exit window of up to 30 seconds. The kubectl delete command supports a --grace-period= option that lets the user override the default. 0 means the deletion executes immediately and the pod is removed from the API at once; on the node, a pod set to terminate immediately still gets a very short grace period before it is force-killed. For example:
    spec:
      containers:
      - name: nginx-demo
        image: centos:nginx
        lifecycle:
          preStop:
            exec:
              # nginx -s quit gracefully terminate while SIGTERM triggers a quick exit
              command: ["/usr/local/nginx/sbin/nginx","-s","quit"]
        ports:
          - name: http
            containerPort: 80

3.14.5 Container probes

Probe types

1) livenessProbe: liveness probing

Many applications, after running for a long time, eventually degrade into a state they cannot recover from except by restarting. Normally K8S notices that the application has terminated and restarts its pod. But sometimes an application becomes temporarily unable to serve (a backend dependency fails, say) without terminating, so K8S cannot quarantine the faulty pod; callers may still reach it, and the service becomes unstable. K8S provides livenessProbe to detect whether a container is actually running properly and to take the corresponding remedial action.

2) readinessProbe: readiness probing

Without a readinessProbe configured, as soon as a pod's containers have started, the application is assumed ready to serve, the pod joins its corresponding service, and it begins receiving traffic. But some applications need a long warm-up after starting before they can actually serve; sending them traffic during that window inevitably falls short of expectations and hurts the user experience. A tomcat application, for instance, is not ready just because tomcat has started: it still has to initialize the spring context, connect to the database, and so on.

Probe methods

Both the LivenessProbe and the ReadinessProbe currently support three probe methods:

1. ExecAction: run a command inside the container; an exit code of 0 means the probe succeeded.

2. TCPSocketAction: perform a TCP check against the container's IP address and port; if a TCP connection can be established, the container is considered healthy.

3. HTTPGetAction: issue an HTTP GET against the container's IP address, port, and path; a response status code of at least 200 and below 400 means the container is healthy

A probe returns one of the following results:

1. Success: the check passed.

2. Failure: the check did not pass.

3. Unknown: the check could not be carried out properly.

Pod probe properties

A probe has several optional fields for controlling the behavior of the liveness and readiness checks more precisely:

  • initialDelaySeconds: how long to wait after the Pod starts before the first check, in seconds.

  • periodSeconds: the interval between checks, in seconds; default 10s.

  • timeoutSeconds: how long the probe waits for a response after issuing a check before timing out, in seconds; default 1s.

  • successThreshold: how many consecutive successes are required before the probe counts as successful; default 1, must be 1 for a liveness probe, minimum 1.

  • failureThreshold: how many times a failed probe is retried before giving up; for a readiness probe the Pod is then marked not ready; default 3, minimum 1

Differences between the two probes:
  • ReadinessProbe and livenessProbe can use the same probe methods; they differ in how the Pod is treated afterwards

  • readinessProbe: on failure, the Pod's IP:Port is removed from the corresponding EndPoint list.

  • livenessProbe: on failure, the container is killed and the Pod's restart policy decides what action to take.

LivenessProbe usage examples

1) Health checking via exec

[root@k8s-master1 ~]# cat liveness-exec.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  labels:
    app: liveness
spec:
  containers:
  - name: liveness
    image: busybox
    args:                       # create the file the probe will check
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      initialDelaySeconds: 10   # delay before the first check
      periodSeconds: 5          # interval between checks
      exec:
        command:
        - cat
        - /tmp/healthy
        
# After initializing, the container first creates the /tmp/healthy file, then sleeps for 30 seconds, and when the time is up deletes /tmp/healthy and keeps sleeping. The liveness probe's method is a shell command: cat the healthy file. If that command runs successfully, the probe passes; otherwise it fails. During the first 30 seconds the file exists, so cat /tmp/healthy succeeds whenever the probe runs. After 30 seconds the healthy file has been deleted, the command fails, and Kubernetes consults the Pod's restart policy to decide whether to restart the Pod

2) Health checking via HTTP

[root@k8s-master1 ~]# cat liveness-http.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
  labels:
    test: liveness
spec:
  containers:
  - name: liveness
    image: mydlqclub/springboot-helloworld:0.0.1
    livenessProbe:
      initialDelaySeconds: 20   # delay before the first check
      periodSeconds: 5          # retry interval
      timeoutSeconds: 10        # probe timeout
      httpGet:
        scheme: HTTP
        port: 8081
        path: /actuator/health
        
# The container started in the Pod above is a SpringBoot application that includes the Actuator component, which exposes the /actuator/health health-check endpoint. The liveness probe issues an HTTPGet request to the /actuator/health path on port 8081 to judge liveness: http://podip:8081/actuator/health

# Any status code of at least 200 and below 400 indicates probe success; any other code indicates failure. If the probe fails, the Pod is killed and restarted.

# The httpGet probe supports the following optional fields:
    scheme: protocol used to connect to the host; default HTTP.
    host: hostname to connect to; defaults to the Pod IP; alternatively set a Host header in the http request headers.
    port: port number or name to access on the container.
    path: URI to request on the http server.
    httpHeaders: custom HTTP request headers; HTTP allows repeated headers.

3) Health checking via TCP

[root@k8s-master1 ~]# cat liveness-tcp.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp
  labels:
    app: liveness
spec:
  containers:
  - name: liveness
    image: nginx
    livenessProbe:
      initialDelaySeconds: 15
      periodSeconds: 20
      tcpSocket:
        port: 80
        
# The TCP check is very similar to the HTTP check: after the container has been up for the time set by initialDelaySeconds, the kubelet sends the first livenessProbe, attempting to connect to the container's port 80; if the connection fails, the Pod is killed and the container restarted.

ReadinessProbe usage example

A Pod's ReadinessProbe supports the same three methods as the LivenessProbe; the difference is that one checks whether the application is alive, while the other decides whether the Pod should receive traffic. Here we take a Springboot project and point a ReadinessProbe at the /actuator/health endpoint on port 8081: if the probe succeeds, the internal application has started and the service opens external access; until then, the application is considered not yet started and receives no traffic.

[root@k8s-master1 ~]# cat readiness-exec.yaml
apiVersion: v1
kind: Service
metadata:
  name: springboot
  labels:
    app: springboot
spec:
  type: NodePort
  ports:
  - name: server
    port: 8080
    targetPort: 8080
    nodePort: 31180
  - name: management
    port: 8081
    targetPort: 8081
    nodePort: 31181
  selector:
    app: springboot
---
apiVersion: v1
kind: Pod
metadata:
  name: springboot
  labels:
    app: springboot
spec:
  containers:
  - name: springboot
    image: mydlqclub/springboot-helloworld:0.0.1
    ports:
    - name: server
      containerPort: 8080
    - name: management
      containerPort: 8081
    readinessProbe:
      initialDelaySeconds: 20   
      periodSeconds: 5          
      timeoutSeconds: 10   
      httpGet:
        scheme: HTTP
        port: 8081
        path: /actuator/health

ReadinessProbe + LivenessProbe combined example

apiVersion: v1
kind: Service
metadata:
  name: springboot
  labels:
    app: springboot
spec:
  type: NodePort
  ports:
  - name: server
    port: 8080
    targetPort: 8080
    nodePort: 31180
  - name: management
    port: 8081
    targetPort: 8081
    nodePort: 31181
  selector:
    app: springboot
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: springboot
  labels:
    app: springboot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: springboot
  template:
    metadata:
      name: springboot
      labels:
        app: springboot
    spec:
      containers:
      - name: readiness
        image: mydlqclub/springboot-helloworld:0.0.1
        ports:
        - name: server 
          containerPort: 8080
        - name: management
          containerPort: 8081
        readinessProbe:
          initialDelaySeconds: 20 
          periodSeconds: 5      
          timeoutSeconds: 10        
          httpGet:
            scheme: HTTP
            port: 8081
            path: /actuator/health
        livenessProbe:
           initialDelaySeconds: 30 
           periodSeconds: 10 
           timeoutSeconds: 5 
           httpGet:
             scheme: HTTP
             port: 8081
             path: /actuator/health
Author: Lawrence
Original post (in Chinese): https://www.cnblogs.com/hujinzhong/p/14991900.html