[Translation] Prometheus Query Examples

This article is translated from the official documentation: https://prometheus.io/docs/prometheus/latest/querying/examples/

Simple time series selection

Return all time series with the metric http_requests_total:

```promql
http_requests_total
```

Return all time series with the metric http_requests_total and the given job and handler labels:

```promql
http_requests_total{job="apiserver", handler="/api/comments"}
```

Return a whole range of time (in this case 5 minutes) for the same vector, making it a range vector:

```promql
http_requests_total{job="apiserver", handler="/api/comments"}[5m]
```

Note that an expression resulting in a range vector cannot be graphed directly, but it can be viewed in the tabular ("Console") view of the expression browser.

Using regular expressions, you can select time series only for jobs whose names match a certain pattern, in this case all jobs whose names end with server:

```promql
http_requests_total{job=~".*server"}
```

All regular expressions in Prometheus use RE2 syntax (https://github.com/google/re2/wiki/Syntax).

To select all HTTP status codes except 4xx ones, you could run:

```promql
http_requests_total{status!~"4.."}
```
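The two regex selectors above rely on the fact that Prometheus regex matchers are fully anchored: the pattern must match the entire label value, not a substring. The following is a minimal Python sketch of that behavior, not Prometheus code. The `matches` helper and the sample label sets are hypothetical, and Python's `re` module stands in for RE2, which is fine for these simple patterns but is not equivalent in general.

```python
import re

def matches(labels, name, op, value):
    """Apply one label matcher; regex matchers must match the whole value."""
    actual = labels.get(name, "")  # a missing label behaves like an empty value
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    hit = re.fullmatch(value, actual) is not None  # fully anchored match
    if op == "=~":
        return hit
    if op == "!~":
        return not hit
    raise ValueError(f"unknown matcher operator: {op}")

# Hypothetical label sets standing in for http_requests_total series.
series = [
    {"job": "apiserver", "status": "200"},
    {"job": "webserver", "status": "404"},
    {"job": "server-proxy", "status": "500"},
]

ends_with_server = [s["job"] for s in series if matches(s, "job", "=~", ".*server")]
not_4xx = [s["status"] for s in series if matches(s, "status", "!~", "4..")]
print(ends_with_server)
print(not_4xx)
```

Because of the anchoring, `server-proxy` is not selected by `.*server` even though it contains the substring `server`.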

Subquery

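Return the 5-minute rate of the http_requests_total metric for the past 30 minutes, with a resolution of 1 minute.

Note: the inner expression is evaluated once per minute across the past 30 minutes, and each evaluation looks back over its own 5-minute window, so consecutive windows overlap and slide forward by one minute.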

```promql
rate(http_requests_total[5m])[30m:1m]
```
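To make the evaluation points of `[30m:1m]` concrete, here is a small Python sketch that lists the 5-minute window used at each step. The function name and parameters are hypothetical. It assumes that step timestamps are aligned to multiples of the step and that the range is left-open, which matches my understanding of Prometheus 3.x; older versions also include the left boundary, so treat the exact count as an assumption.

```python
def subquery_windows(now_s, range_s, step_s, lookback_s):
    """Return (window_start, window_end) pairs, one per subquery evaluation step."""
    # First step-aligned timestamp strictly after (now - range).
    first = ((now_s - range_s) // step_s + 1) * step_s
    # Each evaluation at time t reads the inner range selector over (t - lookback, t].
    return [(t - lookback_s, t) for t in range(first, now_s + 1, step_s)]

# Hypothetical evaluation time of 3630 seconds; [30m:1m] around rate(...[5m]).
windows = subquery_windows(now_s=3630, range_s=30 * 60, step_s=60, lookback_s=5 * 60)
for start, end in windows[:3]:
    print(f"rate over ({start}s, {end}s]")
```

Note that the evaluation timestamps depend on the step alignment, not on the exact moment the query is issued.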

This is an example of a nested subquery. The subquery for the deriv function uses the default resolution. Note that using subqueries unnecessarily is unwise.

```promql
max_over_time(deriv(rate(distance_covered_total[5s])[30s:5s])[10m:])
```

Using functions, operators, etc.

Return the per-second rate for all time series with the http_requests_total metric name, as measured over the last 5 minutes:

```promql
rate(http_requests_total[5m])
```
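As a rough intuition for what a per-second rate of a counter means, the sketch below divides the total increase by the elapsed time and treats any drop in value as a counter reset. This is a deliberate simplification and not Prometheus's implementation: the real `rate()` also extrapolates toward the window boundaries. The function name and sample data are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter, given (timestamp_s, value) samples in time order."""
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value is treated as a reset: the counter restarted from zero.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

steady = simple_rate([(0, 0), (60, 60), (120, 120)])
with_reset = simple_rate([(0, 100), (60, 160), (120, 20)])
print(steady, with_reset)
```

In the second call the counter falls from 160 to 20, so the sketch counts an increase of 60 before the reset plus 20 after it.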

Assuming that the http_requests_total time series all have the labels job (fanout by job name) and instance (fanout by instance of the job), we might want to sum over the rate of all instances, so we get fewer output time series, but still preserve the job dimension:

```promql
sum by (job) (
  rate(http_requests_total[5m])
)
```
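The grouping behind `sum by (job)` can be sketched in a few lines of Python: every label except the ones listed is dropped, and samples that end up with the same remaining labels are added together. The `sum_by` helper and the rate values are hypothetical illustrations, not Prometheus internals.

```python
from collections import defaultdict

def sum_by(samples, keep):
    """Sum values of samples that share the same values for the labels in `keep`."""
    totals = defaultdict(float)
    for labels, value in samples:
        # The group key holds only the preserved labels; all others are discarded.
        key = tuple((name, labels.get(name, "")) for name in keep)
        totals[key] += value
    return dict(totals)

# Hypothetical per-instance request rates.
rates = [
    ({"job": "apiserver", "instance": "a:9090"}, 1.5),
    ({"job": "apiserver", "instance": "b:9090"}, 2.5),
    ({"job": "frontend", "instance": "c:9090"}, 3.0),
]
per_job = sum_by(rates, keep=["job"])
print(per_job)
```

Three input series collapse into two output series, one per job, which is the reduction described above.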

If we have two different metrics with the same dimensional labels, we can apply binary operators to them, and elements on both sides with the same label set will get matched and propagated to the output. For example, this expression returns the unused memory in MiB for every instance (on a fictional cluster scheduler exposing these metrics about the instances it runs):

```promql
(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024
```
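The matching rule described above (pair up elements whose label sets are identical, ignoring the metric name, and drop anything without a partner) can be sketched as follows. The `subtract` helper and the sample values are hypothetical, and only the default one-to-one matching is modeled.

```python
MIB = 1024 * 1024

def subtract(lhs, rhs):
    """Subtract rhs from lhs for elements whose label sets are identical."""
    right = {frozenset(labels.items()): value for labels, value in rhs}
    out = []
    for labels, value in lhs:
        key = frozenset(labels.items())
        if key in right:  # elements without a match on the other side are dropped
            out.append((labels, value - right[key]))
    return out

# Hypothetical samples for instance_memory_limit_bytes and instance_memory_usage_bytes.
limit = [({"instance": "a"}, 4 * MIB), ({"instance": "b"}, 2 * MIB)]
usage = [({"instance": "a"}, 1 * MIB), ({"instance": "c"}, 1 * MIB)]

unused_mib = [(labels, v / 1024 / 1024) for labels, v in subtract(limit, usage)]
print(unused_mib)
```

Only instance `a` appears in the result, because `b` and `c` each exist on just one side of the operator.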

The same expression, but summed by application, could be written like this:

```promql
sum by (app, proc) (
  instance_memory_limit_bytes - instance_memory_usage_bytes
) / 1024 / 1024
```

If the same fictional cluster scheduler exposed CPU usage metrics like the following for every instance:

```text
instance_cpu_time_ns{app="lion", proc="web", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="elephant", proc="worker", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="turtle", proc="api", rev="4d3a513", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="fox", proc="widget", rev="4d3a513", env="prod", job="cluster-manager"}
...
```

...we could get the top 3 CPU users grouped by application (app) and process type (proc) like this:

```promql
topk(3, sum by (app, proc) (rate(instance_cpu_time_ns[5m])))
```
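The outer `topk(3, ...)` step simply keeps the three largest values from the already aggregated vector. A minimal Python sketch, assuming the inner `sum by (app, proc)` has already produced one value per group; the `topk` helper and the numbers are made up for illustration.

```python
import heapq

def topk(k, samples):
    """Keep the k samples with the largest values, largest first."""
    return heapq.nlargest(k, samples, key=lambda sample: sample[1])

# Hypothetical aggregated CPU rates, keyed by (app, proc).
cpu_by_app_proc = [
    (("lion", "web"), 120.0),
    (("elephant", "worker"), 340.0),
    (("turtle", "api"), 75.0),
    (("fox", "widget"), 210.0),
]
top3 = topk(3, cpu_by_app_proc)
print(top3)
```

Unlike `sum`, this step does not combine series; it only filters the input down to the k highest-valued ones.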

Assuming this metric contains one time series per running instance, you could count the number of running instances per application like this:

```sql