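This post looks at how HTTP request metrics are collected for Prometheus. We start with two definitions:
(1) The metric types used:
The characteristics of these three metric types are covered in detail in the Prometheus introduction recommended earlier, so here is only a brief recap. A gauge can be incremented and decremented and reflects the current state of a metric. A counter can only increase and reflects a running count, such as the number of errors. A summary records both the count and the sum of observed values and reflects the distribution of a metric.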
    public enum MetricsType {
        GAUGE("gauge"),
        SUMMARY("summary"),
        COUNTER("counter");

        private final String name;

        MetricsType(String name) {
            this.name = name;
        }

        public String getName() {
            return this.name;
        }
    }
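The three semantics recapped above can be made concrete with a small, stdlib-only sketch. The classes `SimpleGauge`, `SimpleCounter` and `SimpleSummary` are hypothetical names introduced for illustration; they are not part of Micrometer or of the project code, and they only mimic the behaviour described in the text.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Stdlib-only illustration of the three metric semantics; not Micrometer code.
final class MetricTypesDemo {

    // Gauge: a value that can go up and down, e.g. requests currently in flight.
    static final class SimpleGauge {
        private final AtomicLong value = new AtomicLong();
        long inc() { return value.incrementAndGet(); }
        long dec() { return value.decrementAndGet(); }
        long get() { return value.get(); }
    }

    // Counter: exposes no way to decrease, e.g. total number of errors.
    static final class SimpleCounter {
        private final LongAdder total = new LongAdder();
        void inc() { total.increment(); }
        long get() { return total.sum(); }
    }

    // Summary: tracks how many observations were made and their sum.
    static final class SimpleSummary {
        private final LongAdder count = new LongAdder();
        private final LongAdder sum = new LongAdder();
        void observe(long value) { count.increment(); sum.add(value); }
        long count() { return count.sum(); }
        long sum() { return sum.sum(); }
        double mean() {
            long c = count();
            return c == 0 ? 0.0 : (double) sum() / c;
        }
    }

    public static void main(String[] args) {
        SimpleGauge inFlight = new SimpleGauge();
        inFlight.inc();
        inFlight.inc();
        inFlight.dec();

        SimpleCounter errors = new SimpleCounter();
        errors.inc();

        SimpleSummary latencyMs = new SimpleSummary();
        latencyMs.observe(10);
        latencyMs.observe(30);

        System.out.println("in flight: " + inFlight.get());
        System.out.println("errors: " + errors.get());
        System.out.println("mean latency (ms): " + latencyMs.mean());
    }
}
```

A real summary additionally publishes quantiles; this sketch keeps only the count and the sum to show the core idea.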
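(2) The metric object. It defines how metrics are classified (type and subType together identify one kind of metric to collect, and each kind maps to exactly one metric name) and also encapsulates the operations on those metrics.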
    import io.micrometer.core.instrument.Counter;
    import io.micrometer.core.instrument.Gauge;
    import io.micrometer.core.instrument.MeterRegistry;
    import io.micrometer.core.instrument.Timer;

    import java.time.Duration;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class Stats {
        private final MeterRegistry registry;
        private final String type;
        private final String namespace;
        private final String subType;
        // Gauge name; for type=http, subType=in: pepper.gauge.http.in.concurrent
        private final String concurrentGaugeName;
        // Summary name; for type=http, subType=in: pepper.summary.http.in.duration
        private final String durationSummaryName;
        // Counter name; for type=http, subType=in: pepper.counter.http.in.err
        private final String errCounterName;

        // Map each tag list to the metric object of the corresponding type
        private final ConcurrentMap<List<String>, Counter> errCollector = new ConcurrentHashMap<>();
        private final ConcurrentMap<List<String>, AtomicLong> gaugeCollector = new ConcurrentHashMap<>();
        private final ConcurrentMap<List<String>, Timer> summaryCollector = new ConcurrentHashMap<>();

        public ConcurrentMap<List<String>, Counter> getErrCollector() {
            return errCollector;
        }

        public ConcurrentMap<List<String>, AtomicLong> getGaugeCollector() {
            return gaugeCollector;
        }

        public ConcurrentMap<List<String>, Timer> getSummaryCollector() {
            return summaryCollector;
        }

        public String getType() {
            return type;
        }

        public String getSubType() {
            return subType;
        }

        public String getNamespace() {
            return namespace;
        }

        public Stats(MeterRegistry registry, String type, String namespace, String subType) {
            this.registry = registry;
            this.type = type;
            this.namespace = namespace;
            this.subType = subType;
            concurrentGaugeName = MetricsNameBuilder.builder()
                    .setName("concurrent")
                    .setType(type)
                    .setSubType(subType)
                    .setMetricsType(MetricsType.GAUGE)
                    .build();
            durationSummaryName = MetricsNameBuilder.builder()
                    .setName("duration")
                    .setType(type)
                    .setSubType(subType)
                    .setMetricsType(MetricsType.SUMMARY)
                    .build();
            errCounterName = MetricsNameBuilder.builder()
                    .setName("err")
                    .setType(type)
                    .setSubType(subType)
                    .setMetricsType(MetricsType.COUNTER)
                    .build();
        }

        public void error(String... tags) {
            getOrInitCounter(errCollector, errCounterName, tags).increment();
        }

        public void incConc(String... tags) {
            getOrInitGauge(concurrentGaugeName, tags).incrementAndGet();
        }

        public void decConc(String... tags) {
            getOrInitGauge(concurrentGaugeName, tags).decrementAndGet();
        }

        public void observe(long elapse, String... tags) {
            getOrInitSummary(durationSummaryName, tags).record(elapse, TimeUnit.MILLISECONDS);
        }

        public void observe(long elapse, TimeUnit timeUnit, String... tags) {
            getOrInitSummary(durationSummaryName, tags).record(elapse, timeUnit);
        }

        private Timer getOrInitSummary(String sName, String... tags) {
            final List<String> asList = Arrays.asList(tags);
            Timer timer = summaryCollector.get(asList);
            if (timer != null) {
                return timer;
            }
            timer = Timer.builder(sName)
                    .distributionStatisticExpiry(Duration.ofSeconds(60))
                    // Percentiles published for the request response-time distribution
                    .publishPercentiles(0.9, 0.99, 0.999, 0.99999)
                    .publishPercentileHistogram(false)
                    .tags(tags)
                    .register(registry);
            // If another thread stored a Timer first, return that one
            final Timer existing = summaryCollector.putIfAbsent(asList, timer);
            return existing != null ? existing : timer;
        }

        private Counter getOrInitCounter(ConcurrentMap<List<String>, Counter> collector,
                                         String counterName, String... tags) {
            final List<String> asList = Arrays.asList(tags);
            final Counter c = collector.get(asList);
            if (c != null) {
                return c;
            }
            final Counter counter = registry.counter(counterName, tags);
            // If another thread stored a Counter first, return that one
            final Counter existing = collector.putIfAbsent(asList, counter);
            return existing != null ? existing : counter;
        }

        protected AtomicLong getOrInitGauge(String gaugeName, String... tags) {
            final List<String> asList = Arrays.asList(tags);
            final AtomicLong g = gaugeCollector.get(asList);
            if (g != null) {
                return g;
            }
            // Register the gauge only once per tag list
            synchronized (gaugeCollector) {
                if (gaugeCollector.get(asList) == null) {
                    final AtomicLong obj = new AtomicLong();
                    Gauge.builder(gaugeName, obj, AtomicLong::get).tags(tags).register(registry);
                    gaugeCollector.putIfAbsent(asList, obj);
                }
            }
            return gaugeCollector.get(asList);
        }
    }
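To make the naming and the per-tag lookup in Stats easier to follow, here is a stripped-down, stdlib-only sketch. `MiniStats` and its `metricName` helper are hypothetical: `MetricsNameBuilder` is not shown in this post, so the layout `pepper.<metricsType>.<type>.<subType>.<name>` is an assumption inferred from the example names in the field comments. The sketch keeps gauges only, registers nothing with a `MeterRegistry`, and uses `computeIfAbsent` as one possible alternative to the check-then-`putIfAbsent` pattern. It relies on `List.of`, so it assumes Java 9 or later.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, stdlib-only stand-in for Stats: no MeterRegistry, gauges only.
final class MiniStats {
    private final String concurrentGaugeName;
    private final ConcurrentMap<List<String>, AtomicLong> gaugeCollector =
            new ConcurrentHashMap<>();

    MiniStats(String type, String subType) {
        this.concurrentGaugeName = metricName("gauge", type, subType, "concurrent");
    }

    // Assumed layout, inferred from the example names:
    // pepper.<metricsType>.<type>.<subType>.<name>
    static String metricName(String metricsType, String type, String subType, String name) {
        return String.join(".", "pepper", metricsType, type, subType, name);
    }

    String concurrentGaugeName() {
        return concurrentGaugeName;
    }

    long incConc(String... tags) {
        return getOrInitGauge(tags).incrementAndGet();
    }

    long decConc(String... tags) {
        return getOrInitGauge(tags).decrementAndGet();
    }

    // List.of copies the varargs array, so the key is immutable.
    // computeIfAbsent creates the value at most once per distinct tag list.
    AtomicLong getOrInitGauge(String... tags) {
        return gaugeCollector.computeIfAbsent(List.of(tags), key -> new AtomicLong());
    }

    public static void main(String[] args) {
        MiniStats stats = new MiniStats("http", "in");
        stats.incConc("method", "GET", "url", "/get/user");
        stats.incConc("method", "GET", "url", "/get/user");
        stats.decConc("method", "GET", "url", "/get/user");
        System.out.println(stats.concurrentGaugeName());
        System.out.println(stats.getOrInitGauge("method", "GET", "url", "/get/user").get());
    }
}
```

In a version backed by a real registry, the function passed to `computeIfAbsent` would also be the place to register the gauge, which would remove the need for a separate `synchronized` block.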
With these two key definitions covered, let's see how HTTP request metrics are actually collected.
HTTP metrics are collected in a servlet filter, defined as follows:
    public class PerfFilter implements Filter {

        // Metrics for incoming requests
        private static final Stats PROFILER_STAT = Profiler.Builder
                .builder()
                .type("http")
                .subType("in")
                .namespace("default")
                .build();

        // Metrics broken down by response status
        private static final Stats PROFILER_STAT_HTTPSTATUS = Profiler.Builder
                .builder()
                .type("http-status")
                .subType("in")
                .namespace("default")
                .build();

        @Override
        public void init(FilterConfig filterConfig) throws ServletException {
        }

        @Override
        public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse,
                             FilterChain filterChain) throws IOException, ServletException {
            HttpServletRequest sRequest = (HttpServletRequest) servletRequest;
            HttpServletResponse sResponse = (HttpServletResponse) servletResponse;
            String url = HttpUtil.getPatternUrl(sRequest.getRequestURI());
            long begin = System.currentTimeMillis();
            // For HTTP requests, the tags carry the request method (POST, GET, ...)
            // and the endpoint URI (e.g. /get/user)
            String[] tags = {"method", sRequest.getMethod(), "url", url, "type", "exception"};
            // Gauge: increment the number of in-flight requests for this endpoint
            PROFILER_STAT.incConc(tags);
            try {
                // Pass the request down the chain; this returns once processing has finished
                filterChain.doFilter(servletRequest, servletResponse);
            } catch (IOException | ServletException e) {
                // Count system-level errors only; custom business exceptions
                // such as BusinessException are not counted
                PROFILER_STAT.error(tags);
                throw e;
            } finally {
                // Request finished: decrement the in-flight count for this endpoint
                PROFILER_STAT.decConc(tags);
                String httpStatus = String.valueOf(sResponse.getStatus());
                // Tags for the response-status metric
                String[] httpStatusTags = {"method", sRequest.getMethod(), "url", url,
                        "type", "status", "status", httpStatus};
                // Read the clock once so both metrics record the same duration
                long elapsed = System.currentTimeMillis() - begin;
                // Record response time by request tags (feeds p99, p999, ...)
                PROFILER_STAT.observe(elapsed, TimeUnit.MILLISECONDS, tags);
                // Record response time by status tags, so the distribution can be
                // queried per response status
                PROFILER_STAT_HTTPSTATUS.observe(elapsed, TimeUnit.MILLISECONDS, httpStatusTags);
            }
        }

        @Override
        public void destroy() {
        }
    }
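The core of the filter is a try/catch/finally pattern: increment the in-flight gauge before the work, count an error if the work throws, and in `finally` decrement the gauge and record the elapsed time. The following stdlib-only sketch isolates that pattern without the servlet API. `InstrumentedCall` is a hypothetical class introduced for illustration. It uses `System.nanoTime()` rather than `System.currentTimeMillis()`, which is a deliberate swap: `nanoTime` is intended for measuring elapsed time, and the sketch reads it once per call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, stdlib-only illustration of the filter's instrumentation pattern.
final class InstrumentedCall {
    final AtomicLong inFlight = new AtomicLong();   // gauge: calls currently running
    final LongAdder errors = new LongAdder();       // counter: calls that threw
    final LongAdder count = new LongAdder();        // summary: number of calls
    final LongAdder totalNanos = new LongAdder();   // summary: total elapsed time

    <T> T call(Callable<T> work) throws Exception {
        long begin = System.nanoTime();
        inFlight.incrementAndGet();
        try {
            return work.call();
        } catch (Exception e) {
            // Count the failure, then let the caller see the original exception.
            errors.increment();
            throw e;
        } finally {
            // Runs on both success and failure, so the gauge cannot leak.
            inFlight.decrementAndGet();
            long elapsed = System.nanoTime() - begin;
            count.increment();
            totalNanos.add(elapsed);
        }
    }

    public static void main(String[] args) {
        InstrumentedCall metrics = new InstrumentedCall();
        try {
            metrics.call(() -> "ok");
            metrics.call(() -> { throw new IllegalStateException("boom"); });
        } catch (Exception e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("in flight: " + metrics.inFlight.get());
        System.out.println("errors: " + metrics.errors.sum());
        System.out.println("calls: " + metrics.count.sum());
    }
}
```

Unlike the filter, which only catches `IOException` and `ServletException`, this sketch counts every `Exception`. Which exceptions count as errors is a design choice that depends on the application.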
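The comments make the flow clear: all metric collection happens in this filter. To use it, simply register the filter with the application, as follows: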
    @Configuration
    @ConditionalOnClass(HttpServletRequest.class)
    @AutoConfigureOrder(Ordered.HIGHEST_PRECEDENCE)
    @ConditionalOnWebApplication
    public class WebAutoConfiguration {

        @Bean
        public FilterRegistrationBean<Filter> profilerFilterRegistration() {
            FilterRegistrationBean<Filter> registration = new FilterRegistrationBean<>();
            registration.setFilter(new PerfFilter());
            registration.addUrlPatterns("/*");
            registration.setName("profilerHttpFilter");
            registration.setOrder(1);
            return registration;
        }

        @Bean
        public ProjectInfoController projectInfoController() {
            return new ProjectInfoController();
        }
    }
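That covers how HTTP request metrics are collected. Finally, let's look at how some of the collected metrics are displayed.
With the gauge metric, you can view the application's current QPS and concurrency:
With the summary metrics, you can view the response-time distribution (p99, p999, and so on) and the breakdown by response status: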
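As a reminder of what a value such as p99 means, the sketch below computes a percentile with the textbook nearest-rank definition over a fixed set of samples. `PercentileSketch` is a hypothetical helper for illustration only. The percentiles published by the Timer are estimated by the metrics library over a time window, so its values will not generally match an exact computation like this one.

```java
import java.util.Arrays;

// Hypothetical helper: exact nearest-rank percentile over a fixed sample set.
final class PercentileSketch {

    // Returns the smallest sample such that at least a fraction q of all
    // samples are less than or equal to it (nearest-rank definition).
    static long percentile(long[] samples, double q) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (q <= 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        long[] sorted = samples.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length);   // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Response times in milliseconds: 1, 2, ..., 100
        long[] durations = new long[100];
        for (int i = 0; i < durations.length; i++) {
            durations[i] = i + 1;
        }
        System.out.println("p50 = " + percentile(durations, 0.5));
        System.out.println("p99 = " + percentile(durations, 0.99));
    }
}
```

Read this way, a p99 of 99 ms means that 99% of the sampled requests completed in 99 ms or less.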