Prometheus alerting with Alertmanager: sending email notifications

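Important: do not let notifications go out too quickly or too often, or you will be bombarded with alert emails. The group_wait, group_interval and repeat_interval settings in the route section of alertmanager.yml control how often emails are sent.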
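As a rough illustration of why this matters, the Python sketch below estimates how many emails one alert group that keeps firing would produce in a given time window. It is a simplified model, not Alertmanager's actual code: it assumes the first email goes out after group_wait, the group is then re-checked every group_interval, and an unchanged alert is re-sent only once more than repeat_interval has passed since the previous email. The function name notification_times is hypothetical.

```python
def notification_times(group_wait, group_interval, repeat_interval, window):
    """Return the times (seconds after the alert starts firing) at which an email
    would be sent for one alert group that keeps firing, under a simplified model
    of Alertmanager's grouping and repeat behaviour (all values in seconds)."""
    sent = []
    t = group_wait  # first flush of a new group happens after group_wait
    while t <= window:
        # An unchanged alert is re-sent only once more than repeat_interval has passed
        if not sent or t - sent[-1] > repeat_interval:
            sent.append(t)
        t += group_interval  # later flushes happen every group_interval
    return sent
```

With the values used in alertmanager.yml in this article (5s, 5s, 5m), `len(notification_times(5, 5, 300, 3600))` gives 12 emails per hour for a single alert that stays active; with Alertmanager's built-in defaults (30s, 5m, 4h), `notification_times(30, 300, 14400, 3600)` gives a single email in the same hour.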

Download Alertmanager

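Download page: https://prometheus.io/download/
After downloading, extract the archive and double-click the exe file to start Alertmanager, then open http://localhost:9093 to confirm it is running. Once the configuration changes below are done, restart Alertmanager (and Prometheus) so they take effect.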
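To check from a script that Alertmanager is up (for example after a restart), a minimal sketch can call its /-/healthy endpoint, which answers with HTTP 200 while the process is running. The function name alertmanager_healthy is hypothetical, and the default URL assumes Alertmanager listens on http://localhost:9093 as above.

```python
import urllib.request


def alertmanager_healthy(base_url="http://localhost:9093", timeout=3.0):
    """Return True if Alertmanager answers GET /-/healthy with HTTP 200."""
    url = base_url.rstrip("/") + "/-/healthy"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so this covers
        # connection refused, timeouts and non-2xx responses alike
        return False


if __name__ == "__main__":
    print("Alertmanager healthy:", alertmanager_healthy())
```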

Edit alertmanager.yml

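global:
  resolve_timeout: 5m
  smtp_from: 'xxxxxxxx@qq.com'              # sender; QQ Mail requires this to be the same account as smtp_auth_username
  smtp_smarthost: 'smtp.qq.com:465'         # QQ Mail SMTP server; port 465 uses implicit TLS (SSL)
  smtp_auth_username: 'xxxxxxxxxxx@qq.com'
  smtp_auth_password: 'xxxxxxxxxxxxxxx'     # the SMTP authorization code generated in QQ Mail settings, not the login password
  smtp_require_tls: false                   # no STARTTLS; the connection on port 465 is already encrypted
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  # Short intervals suit testing; Alertmanager's defaults are 30s / 5m / 4h
  group_wait: 5s          # wait before the first email for a new alert group
  group_interval: 5s      # wait before emailing about new alerts added to an existing group
  repeat_interval: 5m     # resend an unchanged, still-firing alert after this long
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: 'xxxxxxxxxx@qq.com'
    send_resolved: true   # also send an email when the alert is resolved
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']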
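Rather than waiting for a real outage, you can test the SMTP settings by pushing a hand-made alert straight into Alertmanager through its HTTP API (POST /api/v2/alerts). The sketch below assumes Alertmanager listens on http://localhost:9093; the alert name TestEmail and the helper names build_test_alert and send_alerts are hypothetical. Every alert matches the single route in this config, so it should be delivered to the 'email' receiver, and because it carries an endsAt time it should resolve on its own, which with send_resolved: true should trigger a second email.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(alertname="TestEmail", severity="critical", minutes=5):
    """Build a one-element alert list in the JSON shape accepted by
    Alertmanager's POST /api/v2/alerts endpoint."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"description": "Test alert sent by hand to check email delivery."},
        "startsAt": now.isoformat(),
        # After endsAt passes, Alertmanager treats the alert as resolved
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_alerts(alerts, base_url="http://localhost:9093", timeout=5.0):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `send_alerts(build_test_alert())` should return 200 when Alertmanager accepts the alert; the email then arrives after group_wait.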

Edit prometheus.yml

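global:
  scrape_interval:     15s   # how often targets are scraped
  evaluation_interval: 15s   # how often alerting rules are evaluated
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 127.0.0.1:9093      # the Alertmanager started above
rule_files:
    - "machine_alert_rules.yml"   # relative paths are resolved against the directory of prometheus.yml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'node_liux_70'
    static_configs:
    - targets: ['10.0.0.70:9100']   # node_exporter on the monitored host (9100 is its default port)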
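After restarting Prometheus, its /api/v1/alertmanagers endpoint lists the Alertmanager instances it is sending alerts to, which is a quick way to confirm the alerting section was picked up. A small sketch, assuming Prometheus at http://localhost:9090; the helper names are hypothetical, and only the activeAlertmanagers field of the JSON reply is used.

```python
import json
import urllib.request


def active_alertmanagers(response):
    """Extract the active Alertmanager URLs from a decoded /api/v1/alertmanagers reply."""
    if response.get("status") != "success":
        return []
    return [am["url"] for am in response.get("data", {}).get("activeAlertmanagers", [])]


def fetch_alertmanagers(prometheus_url="http://localhost:9090", timeout=5.0):
    """Query Prometheus and return the Alertmanager URLs it currently sends alerts to."""
    url = prometheus_url.rstrip("/") + "/api/v1/alertmanagers"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return active_alertmanagers(json.load(resp))
```

An empty list from `fetch_alertmanagers()` suggests Prometheus did not load the alerting section or cannot reach 127.0.0.1:9093.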

Create machine_alert_rules.yml (in the same directory as prometheus.yml)

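groups:
- name: simulator-alert-rule
  rules:
  - alert: check_node_liux_70
    expr: sum(up{job="node_liux_70"}) == 0   # true when no target of this job can be scraped
    for: 1m                                  # must stay true for 1 minute before the alert fires
    labels:
      severity: critical
    annotations:
      description: "Has been down or offline for more than 1 minute."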
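The `for: 1m` clause means the expression must stay true across consecutive rule evaluations for at least one minute: at the first evaluation where it is true the alert becomes pending, it turns firing only once it has been pending for the whole duration, and if the expression stops returning a result in between, the pending alert is dropped. The sketch below models just that state machine, using the evaluation_interval of 15s from prometheus.yml. It is a simplification (it ignores scrape timing, and the function name firing_time is hypothetical), not Prometheus's own code.

```python
def firing_time(condition, evaluation_interval, hold):
    """condition[i] is True when the rule's expr returned a result at evaluation i.
    Return the time in seconds of the evaluation at which the alert first becomes
    firing, or None if it never does within the given evaluations."""
    active_at = None
    for i, cond in enumerate(condition):
        now = i * evaluation_interval
        if not cond:
            active_at = None  # condition cleared: any pending alert is dropped
            continue
        if active_at is None:
            active_at = now   # first true evaluation: alert becomes pending
        if now - active_at >= hold:
            return now        # pending for at least `hold` seconds: alert fires
    return None
```

For example, `firing_time([True] * 10, 15, 60)` returns 60, the fifth evaluation. Adding up to one scrape_interval before the node is first seen as down, plus group_wait before the email goes out, gives a rough idea of the end-to-end delay.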

Original article: https://www.cnblogs.com/daikainan/p/14443973.html