Prometheus Monitoring on Kubernetes


Container Monitoring and Alerting

Compared with virtual machines or physical servers, container monitoring works quite differently. In a Kubernetes environment containers can be scaled out and in at any time, so the monitoring service must automatically pick up newly created containers and promptly remove deleted ones from monitoring. The traditional Zabbix approach requires installing and starting an agent in every container, and it has no good mechanism for automatic container discovery, registration, and template association.

role               host                        port
Prometheus         master2 (10.203.104.21)     9090
node exporter      master / node               9100
Grafana            master3 (10.203.104.22)     3000
cadvisor           node                        8080
alertmanager       master3                     9093
haproxy_exporter   HA1 (10.203.104.30)         9101

Prometheus

Early Kubernetes versions monitored Pods and nodes with the heapster component. Starting with Kubernetes 1.8 monitoring moved to the metrics API, and heapster was formally replaced in 1.11. Since then, core metrics such as node CPU and memory usage are provided by metrics-server, while the rest of the monitoring is handled by a separate component, Prometheus.

Introduction to Prometheus

https://prometheus.io/docs/ # official documentation

https://github.com/prometheus # GitHub

Prometheus is an open-source combination of monitoring, alerting, and a time-series database, written in Go. Originally developed at SoundCloud, it was the second project (after Kubernetes) to graduate from the CNCF (Cloud Native Computing Foundation), and it is widely used in the container and microservice space. Its main characteristics:

Stores data in a multi-dimensional key-value format
Uses a time-series database (its built-in TSDB) rather than a traditional database such as MySQL
Supports third-party dashboards such as Grafana (version 2.5.0 and above) for richer graphing
Componentized functionality
No hard dependency on external storage; data can be kept locally or remotely
Automatic service discovery
A powerful query language, PromQL (Prometheus Query Language); see the sample queries below
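
As a quick taste of PromQL, two illustrative queries that can be run in the Prometheus expression browser (these use standard Prometheus self-metrics and are shown only as examples):

# all scrape targets that are currently reachable
up == 1

# per-second rate of HTTP requests handled by Prometheus itself over the last 5 minutes
rate(prometheus_http_requests_total[5m])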

Prometheus architecture

prometheus server: the main service; accepts external HTTP requests and collects, stores, and queries data
prometheus targets: statically configured targets whose data is scraped
service discovery: dynamic discovery of targets
prometheus alerting: alert notification
pushgateway: metrics collection proxy (similar in role to a Zabbix proxy)
data visualization and export: data visualization and export (client access)

Prometheus installation options

https://prometheus.io/download/ # official binary downloads; the prometheus server listens on port 9090
https://prometheus.io/docs/prometheus/latest/installation/ # run directly from the Docker image
https://github.com/coreos/kube-prometheus # operator-based deployment
Installing Prometheus as a container

In this environment Prometheus is installed on master2 (10.203.104.21).

Run the Prometheus container

root@master2:~# docker run \
    -p 9090:9090 \
    prom/prometheus

Test Prometheus by browsing to port 9090 on the master2 node.
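
To run with a custom configuration, the local prometheus.yml can be mounted over the image's default config (a sketch; the host path /data/prometheus/prometheus.yml is an assumption, the in-container path is the image default):

root@master2:~# docker run -d \
    -p 9090:9090 \
    -v /data/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus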

Operator deployment

https://github.com/coreos/kube-prometheus

Clone the project
root@master1:/usr/local/src# git clone https://github.com/coreos/kube-prometheus.git
root@master1:/usr/local/src# cd kube-prometheus-release-0.4/
root@master1:/usr/local/src/kube-prometheus-release-0.4# ls
build.sh            DCO   example.jsonnet  experimental  go.sum  jsonnet           jsonnetfile.lock.json  LICENSE   manifests  OWNERS     scripts                            tests
code-of-conduct.md  docs  examples         go.mod        hack    jsonnetfile.json  kustomization.yaml     Makefile  NOTICE     README.md  sync-to-internal-registry.jsonnet  test.sh

root@master1:/usr/local/src/kube-prometheus-release-0.4# cd manifests/
root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# ls

Create the namespace, CRDs, and RBAC rules
root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f setup/
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
Deploy Prometheus
root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f .
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
Set up port forwarding
$ kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/grafana 3000:3000
$ kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-k8s 9090:9090

Test by browsing to port 3000 on the master1 node (http://10.203.104.20:3000)

Exposing the services via NodePort

grafana

root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# cat grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: 3000
    nodePort: 33000
  selector:
    app: grafana
    
root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f grafana-service.yaml

Test by browsing to port 33000 on the master1 node (http://10.203.104.20:33000)

prometheus

root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# cat prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 39090
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP

root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f prometheus-service.yaml
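
After applying both services, the NodePort assignments can be verified; a sketch of the expected output (cluster IPs and ages will differ):

root@master1:~# kubectl get svc -n monitoring grafana prometheus-k8s
NAME             TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
grafana          NodePort   10.68.x.x      <none>        3000:33000/TCP   1m
prometheus-k8s   NodePort   10.68.x.x      <none>        9090:39090/TCP   1m
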
Binary installation

In this environment Prometheus is installed on master2.

Extract the binary tarball
root@master2:/usr/local/src# ls
prometheus-2.17.1.linux-amd64.tar.gz

root@master2:/usr/local/src# tar -zxvf prometheus-2.17.1.linux-amd64.tar.gz
prometheus-2.17.1.linux-amd64/
prometheus-2.17.1.linux-amd64/NOTICE
prometheus-2.17.1.linux-amd64/LICENSE
prometheus-2.17.1.linux-amd64/prometheus.yml
prometheus-2.17.1.linux-amd64/prometheus
prometheus-2.17.1.linux-amd64/promtool
prometheus-2.17.1.linux-amd64/console_libraries/
prometheus-2.17.1.linux-amd64/console_libraries/menu.lib
prometheus-2.17.1.linux-amd64/console_libraries/prom.lib
prometheus-2.17.1.linux-amd64/consoles/
prometheus-2.17.1.linux-amd64/consoles/prometheus-overview.html
prometheus-2.17.1.linux-amd64/consoles/index.html.example
prometheus-2.17.1.linux-amd64/consoles/node-cpu.html
prometheus-2.17.1.linux-amd64/consoles/node-overview.html
prometheus-2.17.1.linux-amd64/consoles/node.html
prometheus-2.17.1.linux-amd64/consoles/node-disk.html
prometheus-2.17.1.linux-amd64/consoles/prometheus.html
prometheus-2.17.1.linux-amd64/tsdb
Create a symlink to the prometheus directory
root@master2:/usr/local/src# ln -sv /usr/local/src/prometheus-2.17.1.linux-amd64 /usr/local/prometheus
'/usr/local/prometheus' -> '/usr/local/src/prometheus-2.17.1.linux-amd64'
root@master2:/usr/local/src# cd /usr/local/prometheus
root@master2:/usr/local/prometheus# ls
console_libraries  consoles  LICENSE  NOTICE  prometheus  prometheus.yml  promtool  tsdb
Create the Prometheus systemd unit
root@master2:/usr/local/prometheus# vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/usr/local/prometheus/
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
Start the Prometheus service
root@master2:/usr/local/prometheus# systemctl start prometheus
root@master2:/usr/local/prometheus# systemctl status prometheus
root@master2:/usr/local/prometheus# systemctl enable prometheus
Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /etc/systemd/system/prometheus.service.
Access the Prometheus web UI

Browse to port 9090 on the Prometheus node.
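
The service can also be checked from the command line via Prometheus' built-in health endpoints:

root@master2:~# curl http://10.203.104.21:9090/-/healthy
Prometheus is Healthy.
root@master2:~# curl http://10.203.104.21:9090/-/ready
Prometheus is Ready.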

node exporter

Collects monitoring metrics from each Kubernetes node (master and node); listens on port 9100.

Installing node exporter from the binary release (master/node)

Extract the binary tarball

root@node1:/usr/local/src# ls
node_exporter-0.18.1.linux-amd64.tar.gz

root@node1:/usr/local/src# tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz 
node_exporter-0.18.1.linux-amd64/
node_exporter-0.18.1.linux-amd64/node_exporter
node_exporter-0.18.1.linux-amd64/NOTICE
node_exporter-0.18.1.linux-amd64/LICENSE

Create a symlink to the node_exporter directory

root@node1:/usr/local/src# ln -sv /usr/local/src/node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
'/usr/local/node_exporter' -> '/usr/local/src/node_exporter-0.18.1.linux-amd64'

root@node1:/usr/local/src# cd /usr/local/node_exporter
root@node1:/usr/local/node_exporter# ls
LICENSE  node_exporter  NOTICE
Create the node exporter systemd unit
root@node1:/usr/local/node_exporter# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target
Start the node exporter service
root@node1:/usr/local/node_exporter# systemctl start node-exporter
root@node1:/usr/local/node_exporter# systemctl status node-exporter
root@node1:/usr/local/node_exporter# systemctl enable node-exporter
Created symlink /etc/systemd/system/multi-user.target.wants/node-exporter.service → /etc/systemd/system/node-exporter.service.
Access the node exporter web interface

Test access to port 9100 on each Kubernetes master and node.
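
A quick command-line spot check of the exporter output (node_load1 is a standard node exporter metric; the values shown are illustrative):

root@node1:~# curl -s http://10.203.104.26:9100/metrics | grep '^node_load'
node_load1 0.26
node_load5 0.31
node_load15 0.29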

Prometheus scraping node metrics

Configure Prometheus to scrape monitoring metrics from the node exporters.

Prometheus configuration file

The prometheus.yml file on the prometheus server

root@master2:/usr/local/prometheus# cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  #- job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    #static_configs:
    #- targets: ['localhost:9090']

  # node exporter scrape targets (IP:port)
  - job_name: 'prometheus-node'
    static_configs:
    - targets: ['10.203.104.26:9100','10.203.104.27:9100','10.203.104.28:9100']

  - job_name: 'prometheus-master'
    static_configs:
    - targets: ['10.203.104.20:9100','10.203.104.21:9100','10.203.104.22:9100']
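
Before restarting, the edited configuration can be validated with the bundled promtool (no rule files are configured yet at this point):

root@master2:/usr/local/prometheus# ./promtool check config prometheus.yml
Checking prometheus.yml
  SUCCESS: 0 rule files found
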
Restart the Prometheus service
root@master2:/usr/local/prometheus# systemctl restart prometheus
Verify node target status in Prometheus

Verify node metrics in Prometheus
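
Example expressions to try in the Prometheus graph page (standard node exporter metric names):

# per-node CPU usage in percent over the last 5 minutes
100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

# available memory per node, in bytes
node_memory_MemAvailable_bytes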

Grafana

https://grafana.com/docs/ # official installation docs

Pulls data from Prometheus for more polished visualization.

Install Grafana

Install Grafana v6.7.2 on master3 (10.203.104.22).

root@master3:/usr/local/src# apt-get install -y adduser libfontconfig1
root@master3:/usr/local/src# wget https://dl.grafana.com/oss/release/grafana_6.7.2_amd64.deb
root@master3:/usr/local/src# dpkg -i grafana_6.7.2_amd64.deb

Configuration file

root@master3:~# vim /etc/grafana/grafana.ini
[server]
# Protocol (http, https, socket)
protocol = http

# The ip address to bind to, empty will bind to all interfaces
http_addr = 0.0.0.0

# The http port to use
http_port = 3000

Start Grafana

root@master3:~# systemctl start grafana-server.service
root@master3:~# systemctl enable grafana-server.service
Synchronizing state of grafana-server.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable grafana-server
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /usr/lib/systemd/system/grafana-server.service.

Grafana web interface

Login page

Add the Prometheus data source

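The data source can also be created through Grafana's HTTP API instead of the web form (a sketch assuming the default admin:admin credentials and the addresses used in this environment):

root@master3:~# curl -s -u admin:admin -H 'Content-Type: application/json' \
    -X POST http://10.203.104.22:3000/api/datasources \
    -d '{"name":"prometheus","type":"prometheus","url":"http://10.203.104.21:9090","access":"proxy","isDefault":true}'
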
Import a dashboard template

Template download address

https://grafana.com/grafana/dashboards

Click the desired template

Download the template

Import by template ID

Confirm the template details

Verify the graphs

The pie chart plugin is not installed by default and must be installed first:
https://grafana.com/grafana/plugins/grafana-piechart-panel

Online installation:
# grafana-cli plugins install grafana-piechart-panel

Offline installation:
root@master3:/var/lib/grafana/plugins# pwd
/var/lib/grafana/plugins

root@master3:/var/lib/grafana/plugins# ls
grafana-piechart-panel-v1.5.0-0-g3234d63.zip

root@master3:/var/lib/grafana/plugins# unzip grafana-piechart-panel-v1.5.0-0-g3234d63.zip
root@master3:/var/lib/grafana/plugins# mv grafana-piechart-panel-3234d63/ grafana-piechart-panel
root@master3:/var/lib/grafana/plugins# systemctl restart grafana-server

Monitoring Pod resources

cAdvisor must be installed on every node.

cAdvisor, open-sourced by Google, not only collects information about every container running on a machine but also provides a basic query UI and an HTTP interface that other components such as Prometheus can scrape. It monitors the node's resources and containers in real time, collecting CPU usage, memory usage, network throughput, and filesystem usage.

Before Kubernetes 1.12, cAdvisor was integrated into the kubelet service on each node; starting with 1.12 the two were split into separate components, so cAdvisor needs to be deployed on each node separately.

https://github.com/google/cadvisor

Prepare the cAdvisor image

# docker load -i cadvisor_v0.36.0.tar.gz
# docker tag gcr.io/google-containers/cadvisor:v0.36.0 harbor.linux.com/baseimages/cadvisor:v0.36.0
# docker push harbor.linux.com/baseimages/cadvisor:v0.36.0

Start the cAdvisor container

# docker run \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --volume=/dev/disk/:/dev/disk:ro \
    --publish=8080:8080 \
    --detach=true \
    --name=cadvisor \
    harbor.linux.com/baseimages/cadvisor:v0.36.0

Verify the cAdvisor web interface:

Browse to the cAdvisor port on a node: http://10.203.104.26:8080/
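
cAdvisor also exposes Prometheus-format metrics on the same port under /metrics, which is what the scrape job below consumes:

root@node1:~# curl -s http://10.203.104.26:8080/metrics | grep '^container_cpu_usage_seconds_total' | head -n 2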

Prometheus scraping cAdvisor data

root@master2:~# cat /usr/local/prometheus/prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  #- job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    #static_configs:
    #- targets: ['localhost:9090']

  - job_name: 'prometheus-node'
    static_configs:
    - targets: ['10.203.104.26:9100','10.203.104.27:9100','10.203.104.28:9100']

  - job_name: 'prometheus-master'
    static_configs:
    - targets: ['10.203.104.20:9100','10.203.104.21:9100','10.203.104.22:9100']

  - job_name: 'prometheus-pod-cadvisor'
    static_configs:
    - targets: ['10.203.104.26:8080','10.203.104.27:8080','10.203.104.28:8080']

Restart Prometheus

root@master2:~# systemctl restart prometheus
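
Once the cAdvisor targets show as UP, per-container CPU usage can be queried in the Prometheus expression browser (the same expression the alert rule in a later section builds on):

sum by (name) (rate(container_cpu_usage_seconds_total{image!=""}[5m])) * 100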

Add a Pod monitoring dashboard in Grafana


Prometheus alerting setup

How Prometheus fires an alert:

prometheus ---> threshold crossed ---> holds for the configured duration ---> alertmanager ---> grouping | inhibition | silencing ---> notification medium ---> email | DingTalk | WeChat, etc.

Grouping (group): merges alerts of a similar nature into a single notification.
Silences: a simple mechanism to mute alerts for a specific period; for example, a silence can be set for a planned server maintenance window.
Inhibition: once an alert has fired, stops repeatedly sending other alerts triggered by it; i.e. the multiple alerts caused by a single fault are merged, eliminating redundant notifications.
  • The alertmanager host IP is 10.203.104.22, hostname master3

Download and extract the alerting component alertmanager

root@master3:/usr/local/src# ls
alertmanager-0.20.0.linux-amd64.tar.gz  grafana_6.7.2_amd64.deb  node_exporter-0.18.1.linux-amd64.tar.gz

root@master3:/usr/local/src# tar -zxvf alertmanager-0.20.0.linux-amd64.tar.gz 
alertmanager-0.20.0.linux-amd64/
alertmanager-0.20.0.linux-amd64/LICENSE
alertmanager-0.20.0.linux-amd64/alertmanager
alertmanager-0.20.0.linux-amd64/amtool
alertmanager-0.20.0.linux-amd64/NOTICE
alertmanager-0.20.0.linux-amd64/alertmanager.yml

root@master3:/usr/local/src# ln -sv /usr/local/src/alertmanager-0.20.0.linux-amd64 /usr/local/alertmanager
'/usr/local/alertmanager' -> '/usr/local/src/alertmanager-0.20.0.linux-amd64'

root@master3:/usr/local/src# cd /usr/local/alertmanager
root@master3:/usr/local/alertmanager# ls
alertmanager  alertmanager.yml  amtool  LICENSE  NOTICE
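
The amtool binary shipped in the tarball can manage the silences described above from the command line; for example, muting a specific alert for a maintenance window (the alert name and duration are illustrative):

root@master3:/usr/local/alertmanager# ./amtool silence add alertname=Pod_all_cpu_usage \
    --comment="planned maintenance" --duration=2h \
    --alertmanager.url=http://10.203.104.22:9093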

Configure alertmanager

https://prometheus.io/docs/alerting/configuration/ # official configuration docs

root@master3:/usr/local/alertmanager# cat alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '2973707860@qq.com'
  smtp_auth_username: '2973707860@qq.com'
  smtp_auth_password: 'udwthyyxtstcdhcj'
  smtp_hello: '@qq.com'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  #webhook_configs:
  #- url: 'http://127.0.0.1:5001/'
  email_configs:
    - to: '2973707860@qq.com'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
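
The configuration can be syntax-checked with the bundled amtool before starting the service (output sketch for this config):

root@master3:/usr/local/alertmanager# ./amtool check-config alertmanager.yml
Checking 'alertmanager.yml'  SUCCESS
Found:
 - global config
 - route
 - 1 inhibit rules
 - 1 receivers
 - 0 templates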

Start the alertmanager service

Run the binary directly

root@master3:/usr/local/alertmanager# ./alertmanager --config.file=./alertmanager.yml

Systemd unit file

root@master3:/usr/local/alertmanager# cat /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus Alertmanager
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target

Start the service

root@master3:/usr/local/alertmanager# systemctl start alertmanager.service
root@master3:/usr/local/alertmanager# systemctl enable alertmanager.service
Created symlink /etc/systemd/system/multi-user.target.wants/alertmanager.service → /etc/systemd/system/alertmanager.service.

Test access to port 9093 in a web browser

Configure Prometheus alert rules

root@master2:/usr/local/prometheus# cat prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 10.203.104.22:9093   # alertmanager address

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus/danran_rule.yml"   #指定规则文件
  # - "first_rules.yml"
  # - "second_rules.yml"

Create the alert rule file

root@master2:/usr/local/prometheus# cat danran_rule.yml 
groups:
  - name: danran_pod.rules
    rules:
    - alert: Pod_all_cpu_usage
      expr: (sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100) > 75
      for: 5m
      labels:
        severity: critical
        service: pods
      annotations:
        description: Container {{ $labels.name }} CPU usage is above 75% (current value is {{ $value }})
        summary: Dev CPU load alert

    - alert: Pod_all_memory_usage
      expr: avg by(name)(container_memory_usage_bytes{name!=""}) > 2*1024^3
      for: 10m
      labels:
        severity: critical
      annotations:
        description: Container {{ $labels.name }} memory usage is above 2G (current value is {{ $value }})
        summary: Dev memory load alert

    - alert: Pod_all_network_receive_usage
      expr: sum by (name)(irate(container_network_receive_bytes_total{container_name="POD"}[1m])) > 1024*1024*50
      for: 10m
      labels:
        severity: critical
      annotations:
        description: Container {{ $labels.name }} network receive rate is above 50M (current value is {{ $value }})

Validate the alert rules

root@master2:/usr/local/prometheus# ./promtool check rules danran_rule.yml
Checking danran_rule.yml
  SUCCESS: 3 rules found

Restart Prometheus

root@master2:/usr/local/prometheus# systemctl restart prometheus

Verify alert rule matching

10.203.104.22 is the alertmanager host

root@master3:/usr/local/alertmanager# ./amtool alert --alertmanager.url=http://10.203.104.22:9093

Prometheus home page status

Verify the alert rules in the Prometheus web UI

Monitoring HAProxy with Prometheus

haproxy_exporter is installed on the HA1 node (10.203.104.30).

Deploy haproxy_exporter

root@ha1:/usr/local/src# ls
haproxy_exporter-0.10.0.linux-amd64.tar.gz
root@ha1:/usr/local/src# tar -zxvf haproxy_exporter-0.10.0.linux-amd64.tar.gz 
haproxy_exporter-0.10.0.linux-amd64/
haproxy_exporter-0.10.0.linux-amd64/LICENSE
haproxy_exporter-0.10.0.linux-amd64/NOTICE
haproxy_exporter-0.10.0.linux-amd64/haproxy_exporter

root@ha1:/usr/local/src# ln -sv /usr/local/src/haproxy_exporter-0.10.0.linux-amd64 /usr/local/haproxy_exporter
'/usr/local/haproxy_exporter' -> '/usr/local/src/haproxy_exporter-0.10.0.linux-amd64'
root@ha1:/usr/local/src# cd /usr/local/haproxy_exporter

Start haproxy_exporter

root@ha1:/usr/local/haproxy_exporter# ./haproxy_exporter  --haproxy.scrape-uri=unix:/run/haproxy/admin.sock
Or start it against the HAProxy status page:
root@ha1:/usr/local/haproxy_exporter# ./haproxy_exporter --haproxy.scrape-uri="http://haadmin:danran@10.203.104.30:9999/haproxy-status;csv"

Check the HAProxy status page configuration
root@ha1:/usr/local/src# cat /etc/haproxy/haproxy.cfg
listen stats
    mode http
    bind 0.0.0.0:9999
    stats enable
    log global
    stats uri /haproxy-status
    stats auth haadmin:danran
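
With the exporter running, its metrics endpoint can be spot-checked; haproxy_up is 1 when the exporter can reach HAProxy:

root@ha1:~# curl -s http://10.203.104.30:9101/metrics | grep '^haproxy_up'
haproxy_up 1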

Create the systemd unit

root@ha1:~# cat /etc/systemd/system/haproxy-exporter.service
[Unit]
Description=Prometheus Haproxy Exporter
After=network.target

[Service]
ExecStart=/usr/local/haproxy_exporter/haproxy_exporter  --haproxy.scrape-uri=unix:/run/haproxy/admin.sock
[Install]
WantedBy=multi-user.target

root@ha1:~# systemctl restart haproxy-exporter.service

Verify the exporter's web data

Add the HAProxy scrape job on the Prometheus server

root@master2:/usr/local/prometheus# cat prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  #- job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    #static_configs:
    #- targets: ['localhost:9090']

  - job_name: 'prometheus-node'
    static_configs:
    - targets: ['10.203.104.26:9100','10.203.104.27:9100','10.203.104.28:9100']

  - job_name: 'prometheus-master'
    static_configs:
    - targets: ['10.203.104.20:9100','10.203.104.21:9100','10.203.104.22:9100']

  - job_name: 'prometheus-pod'
    static_configs:
    - targets: ['10.203.104.26:8080','10.203.104.27:8080','10.203.104.28:8080']

  - job_name: 'prometheus-haproxy'
    static_configs:
    - targets: ['10.203.104.30:9101']

Restart Prometheus

root@master2:~# systemctl restart prometheus

Add the dashboard template in Grafana

Get the template:
https://grafana.com/grafana/dashboards?dataSource=prometheus&direction=asc&orderBy=name&search=haproxy

In Grafana, import the downloaded template by ID or JSON file.

Verify the HAProxy monitoring data

Original article (Chinese): https://www.cnblogs.com/JevonWei/p/13188038.html