Aggregating data with Prometheus federation

http://news.sohu.com/a/501577646_121124376

https://www.jianshu.com/p/952eca77dbf3

Federation allows a Prometheus server to scrape selected time series from another Prometheus server.

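1. Use cases

Federation has several use cases. It is most commonly used either to build a scalable Prometheus monitoring setup or to pull related metrics from one service's Prometheus server into another's.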

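1.1 Hierarchical federation

Hierarchical federation allows Prometheus to scale to environments with tens of data centers and millions of nodes. In this use case the federation topology resembles a tree: higher-level Prometheus servers collect aggregated time series data from a larger number of subordinate servers.

For example, a setup might consist of many per-datacenter Prometheus servers that collect data in high detail (instance-level drill-down), plus a set of global Prometheus servers that collect and store only aggregated data (job-level drill-down) from those local servers. This provides an aggregated global view alongside detailed local views.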
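The job-level aggregates that a global server collects are typically produced by recording rules on the per-datacenter servers. The rule file below is a minimal sketch under assumptions: the counter http_requests_total, the group name, and the recorded series name are hypothetical and do not come from the original notes.

```yaml
# Rule file loaded by each per-datacenter Prometheus server (sketch).
# http_requests_total is a hypothetical instance-level counter.
groups:
  - name: job_level_aggregation          # hypothetical group name
    rules:
      # Collapse the per-instance series into one series per job.
      # The "job:" prefix in the recorded name is what a selector such as
      # '{__name__=~"job:.*"}' on the global server would match.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```

With rules like this in place, the global servers only need to federate the recorded job: series, and the instance-level detail stays on the local servers.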

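1.2 Cross-service federation

In cross-service federation, the Prometheus server of one service is configured to scrape selected data from another service's Prometheus server, so that alerting and queries can run against both datasets on a single server.

For example, a cluster scheduler running multiple services might expose resource usage information (such as memory and CPU usage) about the service instances running on the cluster. The services running on that cluster, on the other hand, expose only application-specific service metrics. These two sets of metrics are often scraped by separate Prometheus servers. With federation, the Prometheus server holding the service-level metrics can pull the cluster resource usage metrics for its specific service from the cluster Prometheus, so that both sets of metrics are available on that one server.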
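As a rough sketch of this scenario, the service-level Prometheus server could carry a scrape job like the one below. It relies on the /federate endpoint, honor_labels, and match[] options described in the configuration section. Every name here is an assumption made for illustration: the job name, the selector labels job="cluster-scheduler" and service="my-service", and the address cluster-prometheus:9090.

```yaml
scrape_configs:
  - job_name: 'federate-cluster-resources'   # hypothetical job name
    honor_labels: true                       # keep the labels set by the cluster Prometheus
    metrics_path: '/federate'
    params:
      'match[]':
        # Hypothetical selector: only the resource-usage series that the
        # cluster Prometheus holds for this one service.
        - '{job="cluster-scheduler", service="my-service"}'
    static_configs:
      - targets:
          - 'cluster-prometheus:9090'        # hypothetical address of the cluster Prometheus
```

Keeping the selector narrow limits the federated data to what the service actually needs for its own alerts and dashboards.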

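2. Configuring federation

On any given Prometheus server, the /federate endpoint returns the current values of a selected set of time series on that server. At least one match[] URL parameter must be specified to select the series to expose. Each match[] argument must be an instant vector selector such as up or {job="api-server"}. If multiple match[] parameters are provided, the union of all matched series is selected.

To federate metrics from one server to another, configure the destination Prometheus server to scrape the /federate endpoint of the source server, enable the honor_labels scrape option (so that labels exposed by the source server are not overwritten), and pass the desired match[] parameters. For example, the following scrape_config federates any series with the label job="prometheus", or with a metric name starting with job:, from the Prometheus servers at source-prometheus-{1,2,3}:9090 into the scraping Prometheus: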

- job_name: 'federate'
  scrape_interval: 15s

  honor_labels: true
  metrics_path: '/federate'

  params:
    'match[]':
      - '{job="prometheus"}'
      - '{__name__=~"job:.*"}'

  static_configs:
    - targets:
      - 'source-prometheus-1:9090'
      - 'source-prometheus-2:9090'
      - 'source-prometheus-3:9090'

https://www.iyunw.cn/archives/prometheus-federation-jiang-duo-ge-prometheus-jian-kong-ju-he/

Official documentation:

https://prometheus.io/docs/prometheus/latest/federation/

What it provides:

Aggregated monitoring across multiple Prometheus servers
Centralized storage of monitoring data
Unified alerting

# my global config
global:
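  scrape_interval:     30s # scrape targets every 30 seconds
  # Rules are evaluated every 2 minutes, so alert state changes
  # (inactive -> pending -> firing) happen only at these evaluations;
  # if the data returns to normal before the alert fires, no alert is sent.
  evaluation_interval: 2m
  scrape_timeout: 30s      # per-scrape timeout of 30 seconds (the default is 10s)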

# Alertmanager configuration
alerting:  # where to send alerts
  alertmanagers:
  - static_configs:
    - targets: ["172.22.1.14:8080"]

# Alerting rule files
rule_files:
  - "/etc/prometheus/rule.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"


# Scrape job configuration
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'   # regular scrape job (the local Prometheus server)
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['127.0.0.1:9090']

  - job_name: "alertmanager"
    static_configs:
    - targets: ['172.22.1.14:8080']

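  - job_name: 'federate-sdorica'  # federation job: pulls data from another Prometheus server
    scrape_interval: 30s          # scrape every 30 seconds
    honor_labels: true            # keep the labels from the scraped data instead of overwriting them
    metrics_path: '/federate'     # federation endpoint; leave this path unchanged
    params:
      'match[]':                  # selectors for the series to pull from the source Prometheus
        - '{job=~"kubernetes-.*"}'
        - '{job=~"traefik.*"}'
    static_configs:               # address of the source Prometheus server
      - targets: ['1.1.1.1:30090']
        labels:                   # extra label naming the source cluster, so series stay traceable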
          k8scluster: sdorica-k8s

  - job_name: 'federate-soe.demon.hj5'
    scrape_interval: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~"kubernetes-.*"}'
    static_configs:
      - targets: ['3.3.3.3:30090']
        labels:
          k8scluster: soe-demon-hj5-k8s

  - job_name: 'federate-jcyfb.av'
    scrape_interval: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~"kubernetes-.*"}'
        - '{job=~"traefik.*"}'
    static_configs:
      - targets: ['2.2.2.2:30090']
        labels:
          k8scluster: jcyfb-av-k8s
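The configuration above loads /etc/prometheus/rule.yml, but the notes do not show its contents. As an illustration of unified alerting over the federated clusters, a rule file could look like the sketch below. The group name, alert name, severity label, and summary text are all hypothetical; only the federate- job names and the k8scluster label come from the configuration above.

```yaml
groups:
  - name: federation_health               # hypothetical group name
    rules:
      # "up" is recorded by the federating server for each of its own scrape
      # jobs, so it drops to 0 when a source Prometheus cannot be reached.
      - alert: FederationSourceDown       # hypothetical alert name
        expr: up{job=~"federate-.*"} == 0
        for: 2m                           # stay pending for 2 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "Federation source {{ $labels.instance }} (cluster {{ $labels.k8scluster }}) is unreachable"
```

Because each federation job attaches a k8scluster label to its target, a single rule covers every cluster and the notification still says which one is affected.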

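A few settings in the configuration above are especially important.

The first is the job-level setting honor_labels: true.

honor_labels controls how conflicts are resolved between the labels attached by the Prometheus server and the labels already present in the scraped data, such as user-defined labels on an exporter. With honor_labels: true, when a label from the collection layer conflicts with a label the federation layer would attach, the value from the collection layer (the scraped data) is kept and the federation layer's conflicting label is ignored.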

Official description:

# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels. This is useful for use cases such as federation, where all labels
# specified in the target should be preserved.
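The last sentence of the quoted comment is easy to misread: for federation it is the "true" setting that preserves the source labels. The annotated fragment below sketches the difference using the federate-sdorica job from the configuration above. The example series and its label values (kubernetes-nodes, 10.0.0.5:9100) are hypothetical.

```yaml
# Hypothetical series as exposed by the source server at 1.1.1.1:30090:
#   up{job="kubernetes-nodes", instance="10.0.0.5:9100"}
#
# Stored by the federating server with honor_labels: true
# (scraped job/instance are kept; the non-conflicting target label is added):
#   up{job="kubernetes-nodes", instance="10.0.0.5:9100", k8scluster="sdorica-k8s"}
#
# Stored by the federating server with honor_labels: false
# (scraped labels are renamed, then the server-side labels are attached):
#   up{job="federate-sdorica", instance="1.1.1.1:30090",
#      exported_job="kubernetes-nodes", exported_instance="10.0.0.5:9100",
#      k8scluster="sdorica-k8s"}
- job_name: 'federate-sdorica'
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
      - '{job=~"kubernetes-.*"}'
  static_configs:
    - targets: ['1.1.1.1:30090']
      labels:
        k8scluster: sdorica-k8s
```

With "false", every federated series would appear to belong to the federation job itself, which is rarely what dashboards and alerts written against the original job names expect.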

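The next is the match[] parameter:

params:
  'match[]':
    - '{job=~"kubernetes-.*"}'

This tells the federation layer to pull only those series from the collection layer whose job name starts with kubernetes-.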