Kubernetes Part 5 ---- Monitoring Kubernetes with Prometheus + Grafana

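Introduction to Prometheus

Prometheus is a monitoring system originally built at SoundCloud. It started in 2012 as a community open-source project and has a very active developer and user community. To emphasize its open-source, independently maintained status, Prometheus joined the Cloud Native Computing Foundation (CNCF) in 2016, becoming the second hosted project after Kubernetes. Prometheus has clearly become standard equipment on K8S platforms, and many private-cloud K8S deployments choose it as their monitoring and alerting platform (because it is free).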

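So does Prometheus have any drawbacks? It does. Although it is free, it is relatively hard to get started with: it is still growing quickly, so users need some trial and error to adapt and tune it, and operations staff need to learn its query language, PromQL (Prometheus Query Language), before they can really use it.

On both counts Prometheus still needs time to prove itself: first, the product has to mature; second, users have to become familiar with it.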

Prometheus Architecture

 

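Prometheus Server: collects metrics, stores them as time series data, and provides a query interface

Push Gateway: short-term storage for metrics, mainly used by short-lived jobs

Exporters: collect monitoring metrics from existing third-party services and expose them as metrics

Alertmanager: alerting

Web UI: a simple web console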

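Summary:

Prometheus actively pulls (scrapes) performance data from the monitored targets (an agent, i.e. an exporter, has to be deployed on each monitored node)

Short-lived jobs that cannot be scraped directly can push their metrics to the Push Gateway, and Prometheus then scrapes them from there

Through the Alertmanager component, Prometheus alerts can be delivered to users by e-mail, SMS platforms and so on

The built-in Prometheus Web UI is very basic and is not meant for dashboards, so Prometheus is usually integrated with Grafana for visualization and configuration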
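To make the Push Gateway's role more concrete, here is a minimal sketch of a short-lived job pushing a single sample, which Prometheus would later scrape from the gateway. The gateway address, the job name and the metric name are all illustrative assumptions; no Push Gateway is deployed elsewhere in this article.

```shell
#!/bin/sh
# Sketch: push one sample to a Push Gateway from a short-lived job.
# PUSHGATEWAY_URL, the job name and the metric name are assumptions.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://pushgateway.example:9091}"

# push_metric JOB METRIC VALUE
# Sends "METRIC VALUE" in the text exposition format to /metrics/job/JOB.
push_metric() {
  job="$1"; metric="$2"; value="$3"
  printf '%s %s\n' "$metric" "$value" |
    curl -s --data-binary @- "$PUSHGATEWAY_URL/metrics/job/$job"
}

# Example call (commented out so that sourcing this file has no side effects):
# push_metric nightly_backup backup_last_success_timestamp_seconds "$(date +%s)"
```

Prometheus itself never receives the push; it only scrapes the gateway's own /metrics endpoint like any other target.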

Using Prometheus

Deploying Prometheus with Docker

### Deploy Prometheus
docker run -d --name=prometheus -p 9090:9090 prom/prometheus
Access URL: http://IP:9090

### Deploy Grafana
docker run -d --name=grafana -p 3000:3000 grafana/grafana
Access URL: http://IP:3000

 

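Adding the Prometheus data source in Grafana

Enter the IP address of the Prometheus endpoint

Added successfully

Enter a metric expression to display the data

Monitoring a Linux machine with Prometheus

Deploy node_exporter on the monitored host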

wget https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
tar xvfz node_exporter-*.*-amd64.tar.gz
cd node_exporter-*.*-amd64
./node_exporter

Access the monitored host at http://172.16.0.12:9100
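The same check can be scripted instead of done in a browser. The sketch below assumes node_exporter's default load-average collector is enabled, so that a `node_load1` sample appears under /metrics; the function name is made up for this example.

```shell
#!/bin/sh
# node_exporter_up HOST:PORT
# Succeeds if /metrics answers and exposes the node_load1 gauge.
node_exporter_up() {
  curl -sf --max-time 5 "http://$1/metrics" | grep -q '^node_load1 '
}

# Example (commented out):
# node_exporter_up 172.16.0.12:9100 && echo "exporter reachable"
```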

Enter the Prometheus container and edit the configuration file to add the endpoint

[root@k8s-master01 ~]# docker exec -it prometheus sh
/prometheus $ vi /etc/prometheus/prometheus.yml

 

  Add the monitored target (under scrape_configs)

  - job_name: 'linux server'
    static_configs:
    - targets: ['172.16.0.12:9100']   
[root@k8s-master01 ~]# docker restart prometheus
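A file edited with vi inside the container is lost as soon as the container is removed and recreated. One possible alternative, sketched here, is to keep prometheus.yml on the host and bind-mount it over /etc/prometheus/prometheus.yml. The host directory /opt/prometheus, the 15s scrape interval and the function name are assumptions made for this example.

```shell
#!/bin/sh
# write_prometheus_config DIR
# Writes a minimal prometheus.yml containing the 'linux server' job into DIR.
write_prometheus_config() {
  mkdir -p "$1" &&
  cat > "$1/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'linux server'
    static_configs:
      - targets: ['172.16.0.12:9100']
EOF
}

# Example (commented out; /opt/prometheus is a hypothetical host path):
# write_prometheus_config /opt/prometheus
# docker rm -f prometheus
# docker run -d --name=prometheus -p 9090:9090 \
#   -v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
#   prom/prometheus
```

After later edits to the host file, `docker restart prometheus` picks up the change in the same way as above.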

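Log in to Prometheus again; the new endpoint has been added

Import a dashboard into Grafana

Change the group to 'linux server'

Monitoring K8S with Prometheus

Prometheus monitors K8S mainly along two dimensions: the cluster itself and the applications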

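Monitoring Kubernetes itself

•Node resource utilization

•Number of Nodes

•Number of Pods running on each Node

•Status of resource objects

Pod monitoring

•Total number of Pods and the expected number for each controller

•Pod status

•Container resource utilization: CPU, memory, network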

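Monitoring architecture

Pod

The kubelet on each node uses the metrics endpoint provided by its built-in cAdvisor to expose performance metrics for all Pods and containers on that node.

Metrics endpoint: https://NodeIP:10250/metrics/cadvisor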
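The kubelet port requires authentication, so a plain browser request to it is usually rejected. A convenient way to look at the same data, sketched below, is to go through the API server's node proxy with kubectl's existing credentials. The node name in the example is hypothetical, and the function name is made up for illustration.

```shell
#!/bin/sh
# cadvisor_metrics NODE
# Fetches the kubelet's cAdvisor metrics for NODE through the API server proxy.
cadvisor_metrics() {
  kubectl get --raw "/api/v1/nodes/$1/proxy/metrics/cadvisor"
}

# Example (commented out; k8s-node01 is a hypothetical node name):
# cadvisor_metrics k8s-node01 | grep '^container_cpu_usage_seconds_total' | head
```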

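Node

node_exporter is used to collect node resource utilization.

Project URL: https://github.com/prometheus/node_exporter

K8s resource objects

kube-state-metrics collects status information for the various resource objects in k8s.

Project URL: https://github.com/kubernetes/kube-state-metrics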
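Once these three sources are being scraped, the monitoring items listed earlier map onto PromQL expressions. The sketch below sends instant queries to the Prometheus HTTP API; the default address is the NodePort address used in this article, and the expressions are examples whose metric and label names depend on the exporter versions actually deployed.

```shell
#!/bin/sh
# Address of the Prometheus server (assumed; override via the environment).
PROM_URL="${PROM_URL:-http://172.16.0.21:30090}"

# prom_query EXPR
# Runs an instant query against /api/v1/query.
# -G with --data-urlencode lets curl URL-encode the PromQL expression.
prom_query() {
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Example expressions (commented out):
# prom_query 'count(kube_node_info)'                   # number of Nodes
# prom_query 'count by (node) (kube_pod_info)'         # Pods per Node
# prom_query 'sum by (phase) (kube_pod_status_phase)'  # Pods per phase
# prom_query 'sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))'  # CPU per Pod
```

The response is JSON; the same expressions can also be typed directly into the Prometheus Web UI or used in Grafana panels.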

Prepare the YAML files

-rw-r--r-- 1 root root  550 Jan  1 16:26 alertmanager-configmap.yaml
-rw-r--r-- 1 root root 2518 Jan  1 16:26 alertmanager-deployment.yaml
drwxr-xr-x 2 root root  129 Jan  1 16:27 dashboard
-rw-r--r-- 1 root root 1262 Jan  1 16:26 grafana.yaml
-rw-r--r-- 1 root root 4358 Jan  1 16:26 kube-state-metrics.yaml
-rw-r--r-- 1 root root 1648 Jan  1 16:25 node-exporter.yml
-rw-r--r-- 1 root root 4815 Jan  1 16:25 prometheus-configmap.yaml
-rw-r--r-- 1 root root 3922 Jan  1 16:25 prometheus-deployment.yaml
-rw-r--r-- 1 root root 4840 Jan  1 16:25 prometheus-rules.yaml
kubectl create namespace ops
kubectl apply -f prometheus-*
[root@k8s-master03 promethues]# kubectl get pod -n ops
NAME                          READY   STATUS    RESTARTS   AGE
prometheus-859dbbc5f7-qgpwb   2/2     Running   0          76s
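A note on the apply step: `-f` takes one path per flag, so when the shell expands an unquoted `prometheus-*` into several file names, the extra names may be rejected as unexpected arguments, depending on the kubectl version. A small loop that applies each file separately sidesteps this; the helper name below is made up for this sketch.

```shell
#!/bin/sh
# apply_each FILE...
# Applies manifests one at a time so that each file gets its own -f flag.
# Stops at the first failure.
apply_each() {
  for f in "$@"; do
    kubectl apply -f "$f" || return 1
  done
}

# Example (commented out):
# apply_each prometheus-*.yaml
```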

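Access the Prometheus UI through the NodePort to check that the deployment succeeded

Node information is already being collected through cAdvisor

Deploying Grafana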

[root@k8s-master03 promethues]# kubectl apply -f grafana.yaml 
deployment.apps/grafana created
persistentvolumeclaim/grafana created
service/grafana created
[root@k8s-master03 promethues]# kubectl get pod -n ops
NAME                          READY   STATUS    RESTARTS   AGE
grafana-757fcd5f7c-wdnt4      1/1     Running   0          78s
prometheus-859dbbc5f7-qgpwb   2/2     Running   0          15m
[root@k8s-master03 promethues]# kubectl get svc -n ops
NAME         TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
grafana      NodePort   10.100.91.227    <none>        80:30030/TCP     86s
prometheus   NodePort   10.108.216.228   <none>        9090:30090/TCP   15m

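Grafana can also be reached and logged in to through its NodePort

First add the Prometheus data source: http://172.16.0.21:30090

Import the cluster monitoring dashboard

Deploy a DaemonSet to monitor the nodes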

[root@k8s-master03 promethues]# kubectl apply -f node-exporter.yml 
daemonset.apps/node-exporter created
service/node-exporter created

[root@k8s-master03 promethues]# kubectl get pod -n ops 
NAME                          READY   STATUS    RESTARTS   AGE
grafana-757fcd5f7c-wdnt4      1/1     Running   0          12m
node-exporter-rsbzk           1/1     Running   0          44s
node-exporter-wmd65           1/1     Running   0          44s
prometheus-859dbbc5f7-qgpwb   2/2     Running   0          26m

Import the Node monitoring dashboard

Deploy kube-state-metrics to obtain metrics for Kubernetes API resources

[root@k8s-master03 promethues]# kubectl apply -f kube-state-metrics.yaml 
deployment.apps/kube-state-metrics created
configmap/kube-state-metrics-config created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics-resizer created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created

[root@k8s-master03 promethues]# kubectl get svc -n ops 
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
grafana              NodePort    10.100.91.227    <none>        80:30030/TCP        19m
kube-state-metrics   ClusterIP   10.106.88.221    <none>        8080/TCP,8081/TCP   21s
node-exporter        ClusterIP   None             <none>        9100/TCP            7m33s
prometheus           NodePort    10.108.216.228   <none>        9090:30090/TCP      32m

[root@k8s-master03 promethues]# kubectl get pod -n ops
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-757fcd5f7c-wdnt4              1/1     Running   0          19m
kube-state-metrics-667bc48f47-b5hb4   2/2     Running   0          47s
node-exporter-rsbzk                   1/1     Running   0          8m
node-exporter-wmd65                   1/1     Running   0          8m
prometheus-859dbbc5f7-qgpwb           2/2     Running   0          33m

Import the kube-state-metrics dashboard

Configuring alerts

[root@k8s-master03 promethues]# kubectl apply -f alertmanager-configmap.yaml 
configmap/alertmanager-config created
[root@k8s-master03 promethues]# kubectl apply -f alertmanager-deployment.yaml 
deployment.apps/alertmanager created
persistentvolumeclaim/alertmanager created
service/alertmanager created

[root@k8s-master03 promethues]# kubectl get pod -n ops 
NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-7d5fb96b7b-8zfdk         1/2     Running   0          34s
grafana-757fcd5f7c-wdnt4              1/1     Running   0          37m
kube-state-metrics-667bc48f47-b5hb4   2/2     Running   0          18m
node-exporter-rsbzk                   1/1     Running   0          25m
node-exporter-wmd65                   1/1     Running   0          25m
prometheus-859dbbc5f7-qgpwb           2/2     Running   0          51m
[root@k8s-master03 promethues]# kubectl get svc -n ops 
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager         NodePort    10.108.223.228   <none>        80:30093/TCP        41s
grafana              NodePort    10.100.91.227    <none>        80:30030/TCP        37m
kube-state-metrics   ClusterIP   10.106.88.221    <none>        8080/TCP,8081/TCP   18m
node-exporter        ClusterIP   None             <none>        9100/TCP            25m
prometheus           NodePort    10.108.216.228   <none>        9090:30090/TCP      51m

Log in to the recipient's mailbox to check

All alerting rules are defined in the prometheus-rules.yaml file
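The contents of prometheus-rules.yaml are not shown in this article. Purely as an illustration of the rule format, the sketch below writes a hypothetical alerting rule for Pods stuck in Pending to a scratch file. The group name, alert name, duration and severity label are all assumptions, and in a cluster deployment the rule text would still have to be embedded in whatever object (for example a ConfigMap) the Prometheus Deployment mounts.

```shell
#!/bin/sh
# write_example_rule FILE
# Writes a hypothetical alerting rule; this is NOT the article's actual
# prometheus-rules.yaml. The quoted 'EOF' keeps $labels from being expanded
# by the shell.
write_example_rule() {
  cat > "$1" <<'EOF'
groups:
  - name: example-pod-rules
    rules:
      - alert: PodPending
        expr: kube_pod_status_phase{phase="Pending"} == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been Pending for 1 minute"
EOF
}

# Example (commented out):
# write_example_rule /tmp/example-rules.yml
# promtool check rules /tmp/example-rules.yml   # if promtool is installed
```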

They can also be viewed in the Prometheus UI

Manually create a Pending pod

[root@k8s-master03 promethues]# cat pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    resources:
      requests:
        cpu: 8


[root@k8s-master03 promethues]# kubectl  get pod
NAME                                           READY   STATUS    RESTARTS   AGE
delightful-produce-mariadb-0                   1/1     Running   0          3h34m
delightful-produce-wordpress-cf45b6d99-tptzf   1/1     Running   0          3h34m
nfs-client-provisioner-7b87dc5c48-rx7zd        1/1     Running   0          4h2m
nginx                                          0/1     Pending   0          3s
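The nginx Pod requests 8 CPUs, which presumably exceeds what any node in this cluster can allocate, so the scheduler cannot place it and it stays Pending. A small sketch for listing only Pending Pods with a field selector follows; the helper name is made up for this example.

```shell
#!/bin/sh
# pending_pods [NAMESPACE]
# Lists Pods whose phase is Pending (namespace defaults to "default").
pending_pods() {
  kubectl get pods -n "${1:-default}" --field-selector=status.phase=Pending
}

# Example (commented out):
# pending_pods default
# kubectl describe pod nginx   # the Events section shows why scheduling failed
```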

The alert is triggered

The alert e-mail has arrived as well

 


Original article: https://www.cnblogs.com/houcong24/p/14220121.html