Collecting HTTP Request Performance Metrics with Prometheus

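This post looks at how HTTP request metrics are collected for Prometheus. We start with two definitions that the collection code builds on: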

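(1) The metric types used:

The characteristics of these three metric types are covered in detail in the Prometheus introduction recommended in an earlier post, so here is only a brief recap. A gauge can be incremented and decremented and reflects the current state of a metric. A counter can only increase and reflects a running count, such as the number of errors. A summary records the number of observations and the sum of their values, and reflects how the metric is distributed.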

public enum MetricsType {
    GAUGE("gauge"),
    SUMMARY("summary"),
    COUNTER("counter");

    private final String name;

    MetricsType(String name) {
        this.name = name;
    }

    public String getName() {
        return this.name;
    }
}
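
To make the difference between the three types concrete, here is a minimal sketch that uses only the JDK. The class names (MetricTypeSketch, SimpleGauge, SimpleCounter, SimpleSummary) are made up for illustration and are not part of the project or of any metrics library. A real summary would also track quantiles, which this sketch leaves out.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class MetricTypeSketch {

    /** Gauge: a current value that may go up or down. */
    public static class SimpleGauge {
        private final AtomicLong value = new AtomicLong();
        public void inc() { value.incrementAndGet(); }
        public void dec() { value.decrementAndGet(); }
        public long get() { return value.get(); }
    }

    /** Counter: only ever increases. */
    public static class SimpleCounter {
        private final LongAdder total = new LongAdder();
        public void increment() { total.increment(); }
        public long count() { return total.sum(); }
    }

    /** Summary: number of observations plus the sum of the observed values. */
    public static class SimpleSummary {
        private final LongAdder count = new LongAdder();
        private final LongAdder sum = new LongAdder();
        public void observe(long value) { count.increment(); sum.add(value); }
        public long count() { return count.sum(); }
        public long sum() { return sum.sum(); }
    }

    public static void main(String[] args) {
        SimpleGauge inFlight = new SimpleGauge();
        inFlight.inc();   // a request starts
        inFlight.inc();   // a second request starts
        inFlight.dec();   // the first request finishes

        SimpleCounter errors = new SimpleCounter();
        errors.increment();

        SimpleSummary durations = new SimpleSummary();
        durations.observe(120);
        durations.observe(80);

        System.out.println("in flight: " + inFlight.get());
        System.out.println("errors: " + errors.count());
        System.out.println("observations: " + durations.count() + ", total ms: " + durations.sum());
    }
}
```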

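(2) The metric object. It defines how metrics are categorized (a type and a subType together identify one kind of metric to collect, and each kind maps to exactly one metric name) and also wraps the operations on those metrics.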

public class Stats {
    private MeterRegistry registry;
    private String type;
    private String namespace;
    private String subType;
    private String concurrentGaugeName; // gauge name, e.g. type=http, subType=in gives pepper.gauge.http.in.concurrent
    private String durationSummaryName; // summary name, e.g. type=http, subType=in gives pepper.summary.http.in.duration
    private String errCounterName; // counter name, e.g. type=http, subType=in gives pepper.counter.http.in.err

    // Metric objects of each kind, keyed by their tag list
    private final ConcurrentMap<List<String>, Counter> errCollector = new ConcurrentHashMap<>();
    private final ConcurrentMap<List<String>, AtomicLong> gaugeCollector = new ConcurrentHashMap<>();
    private final ConcurrentMap<List<String>, Timer> summaryCollector = new ConcurrentHashMap<>();


    public ConcurrentMap<List<String>, Counter> getErrCollector() {
        return errCollector;
    }

    public ConcurrentMap<List<String>, AtomicLong> getGaugeCollector() {
        return gaugeCollector;
    }

    public ConcurrentMap<List<String>, Timer> getSummaryCollector() {
        return summaryCollector;
    }

    public String getType() {
        return type;
    }

    public String getSubType() {
        return subType;
    }

    public String getNamespace() {
        return namespace;
    }

    public Stats(MeterRegistry registry, String type, String namespace, String subType) {
        this.registry = registry;
        this.type = type;
        this.namespace = namespace;
        this.subType = subType;
        concurrentGaugeName = MetricsNameBuilder.builder()
                .setName("concurrent")
                .setType(type)
                .setSubType(subType)
                .setMetricsType(MetricsType.GAUGE)
                .build();

        durationSummaryName = MetricsNameBuilder.builder()
                .setName("duration")
                .setType(type)
                .setSubType(subType)
                .setMetricsType(MetricsType.SUMMARY)
                .build();

        errCounterName = MetricsNameBuilder.builder()
                .setName("err")
                .setType(type)
                .setSubType(subType)
                .setMetricsType(MetricsType.COUNTER)
                .build();
    }

    public void error(String... tags) {
        getOrInitCounter(errCollector, errCounterName, tags).increment();
    }

    public void incConc(String...tags) {
        getOrInitGauge(concurrentGaugeName, tags).incrementAndGet();
    }

    public void decConc(String...tags) {
        getOrInitGauge(concurrentGaugeName, tags).decrementAndGet();
    }

    public void observe(long elapse, String...tags) {
        getOrInitSummary(durationSummaryName, tags).record(elapse, TimeUnit.MILLISECONDS);
    }


    public void observe(long elapse, TimeUnit timeUnit, String...tags) {
        getOrInitSummary(durationSummaryName, tags).record(elapse, timeUnit);
    }

    private Timer getOrInitSummary(String sName, String... tags) {
        final List<String> asList = Arrays.asList(tags);
        Timer timer = summaryCollector.get(asList);
        if (timer != null) {
            return timer;
        }
        timer = Timer.builder(sName)
                .distributionStatisticExpiry(Duration.ofSeconds(60))
                // percentiles of the request response time to publish
                .publishPercentiles(0.9, 0.99, 0.999, 0.99999)
                .publishPercentileHistogram(false)
                .tags(tags)
                .register(registry);
        summaryCollector.putIfAbsent(asList, timer);
        return timer;
    }

    private Counter getOrInitCounter(ConcurrentMap<List<String>, Counter> collector, String counterName, String... tags) {
        final List<String> asList = Arrays.asList(tags);
        final Counter c = collector.get(asList);
        if (c != null) {
            return c;
        }
        Counter counter = registry.counter(counterName, tags);
        collector.putIfAbsent(asList, counter);
        return counter;
    }

    protected AtomicLong getOrInitGauge(String gaugeName, String... tags) {
        final List<String> asList = Arrays.asList(tags);
        final AtomicLong g = gaugeCollector.get(asList);
        if (g != null) return g;
        synchronized (gaugeCollector) {
            if (gaugeCollector.get(asList) == null) {
                final AtomicLong obj = new AtomicLong();
                Gauge.builder(gaugeName, obj, AtomicLong::get).tags(tags).register(registry);
                gaugeCollector.putIfAbsent(asList, obj);
            }
        }
        return gaugeCollector.get(asList);
    }
}
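
Two ideas in Stats can be sketched in isolation: how a metric name is composed, and how one metric object is kept per distinct tag list. The listing below is a minimal sketch with JDK classes only. MetricsNameBuilder is not shown in this post, so the naming scheme in metricName is an assumption inferred from the example names in the field comments (pepper.gauge.http.in.concurrent and so on). TaggedGaugeSketch and gaugeFor are hypothetical names, and computeIfAbsent is swapped in here for the get, check, putIfAbsent sequence used above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class TaggedGaugeSketch {

    // Assumed naming scheme: pepper.<metricsType>.<type>.<subType>.<name>
    public static String metricName(String metricsType, String type, String subType, String name) {
        return String.join(".", "pepper", metricsType, type, subType, name);
    }

    // One value holder per distinct tag list
    private final ConcurrentMap<List<String>, AtomicLong> gauges = new ConcurrentHashMap<>();

    public AtomicLong gaugeFor(String... tags) {
        // Copy the tags so that later changes to the caller's array cannot alter the map key
        List<String> key = new ArrayList<>(Arrays.asList(tags));
        // computeIfAbsent runs the factory at most once per key, even under concurrent calls;
        // in Stats, this is the place where the gauge would also be registered with the registry
        return gauges.computeIfAbsent(key, k -> new AtomicLong());
    }

    public int size() {
        return gauges.size();
    }

    public static void main(String[] args) {
        TaggedGaugeSketch sketch = new TaggedGaugeSketch();
        sketch.gaugeFor("method", "GET", "url", "/get/user").incrementAndGet();
        sketch.gaugeFor("method", "GET", "url", "/get/user").incrementAndGet();
        sketch.gaugeFor("method", "POST", "url", "/get/user").incrementAndGet();
        System.out.println(metricName("gauge", "http", "in", "concurrent"));
        System.out.println("distinct tag lists: " + sketch.size());
    }
}
```

Because the factory passed to computeIfAbsent runs while the map holds a lock on that key, it should stay short and must not modify the same map.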

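With these two key definitions in place, let's see how the HTTP request metrics are actually collected.

The collection is done in a servlet filter, defined as follows: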

public class PerfFilter implements Filter {
    // Metrics for incoming requests
    private static final Stats PROFILER_STAT = Profiler.Builder
            .builder()
            .type("http")
            .subType("in")
            .namespace("default")
            .build();

    // Metrics for incoming requests, broken down by response status
    private static final Stats PROFILER_STAT_HTTPSTATUS = Profiler.Builder
            .builder()
            .type("http-status")
            .subType("in")
            .namespace("default")
            .build();

    @Override
    public void init(FilterConfig filterConfig) throws ServletException { }

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException {
        HttpServletRequest sRequest = (HttpServletRequest) servletRequest;
        HttpServletResponse sResponse = (HttpServletResponse) servletResponse;

        String url = HttpUtil.getPatternUrl(sRequest.getRequestURI());

        long begin = System.currentTimeMillis();

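        // For an HTTP request the tags carry the request method (POST, GET, ...) and the URL pattern of the endpoint (e.g. /get/user)
        String[] tags = {"method", sRequest.getMethod(), "url", url, "type", "exception"};

        // Gauge: one more request in flight for this endpoint
        PROFILER_STAT.incConc(tags);
        try {
            // doFilter passes the request down the chain and returns once it has been fully processed
            filterChain.doFilter(servletRequest, servletResponse);
        } catch (IOException | ServletException e) {
            // Count system-level errors (IOException, ServletException); custom business exceptions such as BusinessException are not counted
            PROFILER_STAT.error(tags);
            throw e;
        } finally {
            // The request is done: one fewer request in flight for this endpoint
            PROFILER_STAT.decConc(tags);
            // Measure once so both summaries record the same elapsed time
            long elapsed = System.currentTimeMillis() - begin;
            String httpStatus = String.valueOf(sResponse.getStatus());
            // Tags for the response-status metrics
            String[] httpStatusTags = {"method", sRequest.getMethod(), "url", url, "type", "status", "status", httpStatus};
            // Record the response time under the request tags; this feeds the published percentiles such as p99 and p999
            PROFILER_STAT.observe(elapsed, TimeUnit.MILLISECONDS, tags);
            // Record the response time under the status tags, so the distribution can be queried per response status
            PROFILER_STAT_HTTPSTATUS.observe(elapsed, TimeUnit.MILLISECONDS, httpStatusTags);
        }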
        }
    }

    @Override
    public void destroy() { }
}
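
Stripped of the servlet API, the filter follows one pattern: increment the in-flight gauge, run the work, count the error if it throws, and in finally decrement the gauge and record the elapsed time. The sketch below shows that pattern with JDK classes only. InstrumentSketch and its fields are hypothetical names, the work is modelled as a Supplier rather than a filter chain, and System.nanoTime() is used instead of System.currentTimeMillis() because it is monotonic, which is a deliberate difference from the filter above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class InstrumentSketch {
    public final AtomicLong inFlight = new AtomicLong();      // gauge: requests currently running
    public final AtomicLong errors = new AtomicLong();        // counter: failed requests
    public final AtomicLong observations = new AtomicLong();  // summary: number of timings recorded
    public final AtomicLong totalNanos = new AtomicLong();    // summary: sum of recorded timings

    public <T> T instrument(Supplier<T> work) {
        long begin = System.nanoTime();
        inFlight.incrementAndGet();
        try {
            return work.get();
        } catch (RuntimeException e) {
            // Count the failure, then let the caller see the original exception
            errors.incrementAndGet();
            throw e;
        } finally {
            // Runs on success and on failure, so the gauge always returns to its previous value
            inFlight.decrementAndGet();
            observations.incrementAndGet();
            totalNanos.addAndGet(System.nanoTime() - begin);
        }
    }

    public static void main(String[] args) {
        InstrumentSketch stats = new InstrumentSketch();
        String body = stats.instrument(() -> "ok");
        try {
            stats.instrument(() -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            // the error has been counted and rethrown
        }
        System.out.println(body + ", errors=" + stats.errors.get()
                + ", observations=" + stats.observations.get()
                + ", inFlight=" + stats.inFlight.get());
    }
}
```

Placing the decrement in finally is what keeps the concurrency gauge from drifting upward when requests fail.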

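The comments spell out what happens: all metric collection takes place in this filter, so using it only requires registering the filter with the application, as follows: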

@Configuration
@ConditionalOnClass(HttpServletRequest.class)
@AutoConfigureOrder(Ordered.HIGHEST_PRECEDENCE)
@ConditionalOnWebApplication
public class WebAutoConfiguration {
    
    @Bean
    public FilterRegistrationBean<Filter> profilerFilterRegistration() {
        FilterRegistrationBean<Filter> registration = new FilterRegistrationBean<>();
        registration.setFilter(new PerfFilter());
        registration.addUrlPatterns("/*");
        registration.setName("profilerHttpFilter");
        registration.setOrder(1);

        return registration;
    }

    @Bean
    public ProjectInfoController projectInfoController() {
        return new ProjectInfoController();
    }
}

That covers how HTTP request metrics are collected. To finish, here is what some of the collected metrics show.

For the gauge metrics, you can view the application's current QPS and concurrency.

              

For the summary metrics, you can view the distribution of request response times (p99, p999, and so on) as well as the breakdown by response status.
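
To make a value like p99 concrete, the following sketch computes a nearest-rank percentile over a fixed set of response times. It is only an illustration of what the number means: PercentileSketch is a hypothetical name, and a metrics library would typically estimate percentiles over a sliding time window instead of sorting every raw sample.

```java
import java.util.Arrays;

public class PercentileSketch {

    // Nearest-rank percentile: the smallest sample such that at least
    // a fraction p (0 < p <= 1) of all samples are less than or equal to it.
    public static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 1) {
            throw new IllegalArgumentException("p must be in (0, 1]");
        }
        long[] sorted = samples.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // 100 requests taking 1 ms, 2 ms, ..., 100 ms
        long[] durationsMs = new long[100];
        for (int i = 0; i < durationsMs.length; i++) {
            durationsMs[i] = i + 1;
        }
        System.out.println("p90 = " + percentile(durationsMs, 0.9) + " ms");
        System.out.println("p99 = " + percentile(durationsMs, 0.99) + " ms");
    }
}
```

Read this way, a p99 of 99 ms means that 99 percent of the recorded requests completed in 99 ms or less.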

              

Original post: https://www.cnblogs.com/jing-yi/p/14383063.html