【Best Practice】Monitoring SQL Server with Prometheus (using sql_exporter)

【0】Key references

sql_exporter usage and collector source code: https://github.com/free/sql_exporter

A simple MSSQL dashboard: https://grafana.com/grafana/dashboards/9336

More, and more advanced, SQL Server metrics: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/sqlserver?tdsourcetag=s_pctim_aiomsg

SQL Server object monitoring reference: https://docs.microsoft.com/zh-cn/sql/relational-databases/performance-monitor/use-sql-server-objects?view=sql-server-ver15

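【Overview】

There is no official exporter for MSSQL, so monitoring has to rely on another program. This article uses sql_exporter.

Think of it as a tool that connects to databases remotely: it can connect to SQL Server, MySQL and other databases, runs SQL queries against them, and exposes the query results as metrics.

sql_exporter is centralized: the exporter processes for many different instances can all run on one Linux server, which makes them easy to manage and change. If you update the collected metrics, you only need to restart the sql_exporter processes on that central server for the change to take effect.

Other exporters (MySQL, Linux, Windows) are different: they are deployed on the monitored servers themselves. They are well built and rarely need updating, but they do not let you add custom metrics or collection items, and when you do need to update one you have to overwrite it on every monitored machine, which is painful.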

【1】Install and configure sql_exporter

【1.1】Download and extract sql_exporter

Download: https://github.com/free/sql_exporter/releases

mkdir  /soft
cd /soft
wget https://github.com/free/sql_exporter/releases/download/0.5/sql_exporter-0.5.linux-amd64.tar.gz
tar -zxf sql_exporter-0.5.linux-amd64.tar.gz 
ln -s sql_exporter-0.5.linux-amd64 sql_exporter
cd sql_exporter

【1.2】Edit the configuration file

# Global defaults.
global:
  # Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent Prometheus from timing out first.
  scrape_timeout_offset: 500ms
  # Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.
  min_interval: 0s
  # Maximum number of open connections to any one target. Metric queries will run concurrently on multiple connections,
  # as will concurrent scrapes.
  max_connections: 10
  # Maximum number of idle connections to any one target. Unless you use very long collection intervals, this should
  # always be the same as max_connections.
  max_idle_connections: 5

# The target to monitor and the collectors to execute on it.
target:
  # Data source name always has a URI schema that matches the driver name. In some cases (e.g. MySQL)
  # the schema gets dropped or replaced to match the driver expected DSN format.
  # data_source_name: 'sqlserver://sql_exporter:a123456!@192.168.191.81:1433/?encrypt=disable'
  data_source_name: 'sqlserver://sa:a123456!@192.168.191.81:1433'

  # Collectors (referenced by name) to execute on the target.
  collectors: [mssql_standard]

# Collector files specifies a list of globs. One collector definition is read from each matching file.
collector_files:
  - "*.collector.yml"

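Explanation:

(1)global

These settings bound how long the slowest SQL query in a collector may run; note that this limit must be shorter than scrape_timeout in Prometheus.
scrape_timeout_offset: 500ms # Subtracted from Prometheus' scrape_timeout to leave some headroom, so that Prometheus does not time out first.
min_interval: 0s  # Minimum interval between collector runs; with the default (0s), collectors run on every scrape.
max_connections: 10 # Maximum number of open connections to any one target. Metric queries run concurrently on multiple connections, as do concurrent scrapes.
max_idle_connections: 5 # Maximum number of idle connections to any one target. Unless you use very long collection intervals, this should always equal max_connections.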
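
To make the relationship between scrape_timeout and scrape_timeout_offset concrete, here is a minimal Python sketch of the arithmetic described above. The function name and the 10s scrape_timeout are illustrative assumptions; this is not code from sql_exporter.

```python
def query_budget_seconds(scrape_timeout: float, scrape_timeout_offset: float) -> float:
    """Time left for the SQL queries of one scrape, in seconds."""
    budget = scrape_timeout - scrape_timeout_offset
    if budget <= 0:
        # An offset this large would leave no time for the queries at all.
        raise ValueError("scrape_timeout_offset must be smaller than scrape_timeout")
    return budget


# Assumed Prometheus scrape_timeout of 10s, with the 500ms offset from the config.
print(query_budget_seconds(10.0, 0.5))
```

In other words, a query that takes longer than this budget is likely to be cut off before Prometheus itself gives up on the scrape.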

(2)target

# Data source

data_source_name: 'sqlserver://sa:a123456!@192.168.191.81:1433'

(3)collector

# Collector files to load

collector_files:
  - "*.collector.yml"

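【1.3】The bundled SQL Server collector

The configuration file already references "*.collector.yml" in its own directory, so the bundled collector file is loaded as well.

【2】Integrating Prometheus with sql_exporter

This thing is really just an exporter, so why can't it run on Windows? Never mind: there does not seem to be a good Windows exporter, and I cannot write one myself, so this will do for now.

【2.1】Edit the prometheus.yml configuration file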

  

【2.2】Start sql_exporter

(1)Wrap it as a system service

[Unit]
Description=sql_exporter

[Service]
Type=simple
ExecStart=/soft/sql_exporter/sql_exporter -config.file /soft/sql_exporter/sql_exporter.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

(2)Start and check

systemctl daemon-reload
systemctl start sql_exporter
systemctl status sql_exporter -l

It started successfully; the default port is 9399.

  

【2.3】Verify

http://192.168.175.131:9399/metrics
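
Besides opening the URL in a browser, the check can be scripted. The following is a minimal Python sketch that parses the Prometheus text format; the function names and the sample text are made up for illustration, and it assumes samples carry no trailing timestamp.

```python
import urllib.request


def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes no trailing timestamp, so the value is the last field.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url: str, timeout: float = 5.0) -> dict:
    """Download a /metrics page and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Made-up sample text, only to exercise the parser; real output has many more series.
SAMPLE = '# TYPE up gauge\nup 1\nmssql_connections{db="master"} 3\n'
print(parse_metrics(SAMPLE))
```

Against a running exporter, fetch_metrics("http://192.168.175.131:9399/metrics") should return a non-empty dictionary if the setup works.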

If the page shows a list of metrics, the setup works.

  

【3】Display in Grafana

【3.1】Import the MSSQL dashboard template

https://grafana.com/grafana/dashboards?dataSource=prometheus&search=mssql

   

  

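Import dashboard template 9336.

【3.2】View the dashboard

Result: it mostly works, but many panels show "No data". This template does not match the collector very well, so adjust it yourself when you have time.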

  

【4】Custom MSSQL monitoring

【4.0】MSSQL permissions and the monitoring account

DECLARE @sql VARCHAR(max)
SET @sql=CAST('use master;CREATE LOGIN [sql_exporter] WITH PASSWORD=N''qICJEasdqwDiOSrdT96'', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF; GRANT VIEW SERVER STATE TO [sql_exporter];
GRANT VIEW ANY DEFINITION TO [sql_exporter];' AS VARCHAR(max))

select @sql=@sql+CAST('use '+name+';CREATE USER [sql_exporter] FOR LOGIN [sql_exporter];
EXEC sp_addrolemember N''db_datareader'', N''sql_exporter'';'+CHAR(10) AS VARCHAR(max)) 
from master.sys.databases where state=0
EXEC(@sql)

The firewall has to allow the connection too, but that goes without saying.
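
A blocked port is an easy thing to overlook, so here is a minimal Python sketch for checking from the exporter host whether the SQL Server port is reachable. The function name is hypothetical; 1433 is the port used in the data source names in this article.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, tcp_port_open("10.112.5.106", 1433) returning False suggests that a firewall rule or the SQL Server network configuration is in the way.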

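【4.1】Custom collector

【4.2】Starting the exporters

Deploy them on an intermediate Linux node, or directly on the Prometheus node.

If the password contains special characters and the URL fails because of them, see the appendix and replace them with their percent-encoded form.

For example, the password !@#$%^qwe123 is escaped as %21%40%23%24%25%5Eqwe123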
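
The escaping does not have to be done by hand. As a minimal sketch, Python's standard urllib.parse.quote produces the same result; build_dsn is a hypothetical helper name, and the DSN layout simply mirrors the data source names used in this article.

```python
from urllib.parse import quote


def build_dsn(user: str, password: str, host: str, port: int = 1433) -> str:
    """Build a sqlserver:// DSN with user and password percent-encoded."""
    # safe="" makes quote() escape every reserved character, including "/" and "@".
    return (
        f"sqlserver://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/?encrypt=disable"
    )


print(quote("!@#$%^qwe123", safe=""))  # %21%40%23%24%25%5Eqwe123
```

Calling build_dsn("sql_exporter", "!@#$%^qwe123", "10.112.5.106") then gives a string that can be passed to -config.data-source-name.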

nohup /data/mssql/sql_exporter -config.data-source-name='sqlserver://sql_exporter:qICJEasdqwDiOSrdT96@10.112.5.106:1433/?encrypt=disable' -config.file=/data/mssql/sql_exporter.yml -web.listen-address=127.0.0.1:9400 -log_dir=/data/mssql_log &

nohup /data/mssql/sql_exporter -config.data-source-name='sqlserver://sql_exporter:qICJEasdqwDiOSrdT96@10.112.5.105:1433/?encrypt=disable' -config.file=/data/mssql/sql_exporter.yml -web.listen-address=127.0.0.1:9401 -log_dir=/data/mssql_log &

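【4.3】Prometheus configuration

Because this setup is custom, the job name deliberately contains the keyword mssql, so that the variable in 【4.4】 can pick up every mssql job for filtering.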

  - job_name: '大连娱网_mssql'
    static_configs:
      - targets: ['127.0.0.1:9400']
        labels:
          name: 'First machine DB[10.112.5.106]'
      - targets: ['127.0.0.1:9401']
        labels:
          name: 'Second machine DB[10.112.5.105]'

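This configuration must match 【4.2】 (each target is the listen address of one exporter process); otherwise scraping will go wrong.

Why does name contain the IP? It identifies which machine the data comes from, and it lets the dashboard in 【4.4】 display the machine IP. Because this setup is custom, the instance label only holds the exporter's listen address (for example 127.0.0.1:9400), not the database host, so it cannot behave the way the official exporters do. In that respect it is a bit like pushgateway.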
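
One way to keep 【4.2】 and 【4.3】 in sync is to derive both from a single list. The following Python sketch illustrates the idea under stated assumptions: the Instance class and both helper functions are hypothetical, and the paths and flags are copied from the commands in 【4.2】.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    db_host: str      # SQL Server address
    listen_port: int  # local port the exporter process listens on


def exporter_command(inst: Instance, user: str, password: str) -> str:
    """Shell command that starts one exporter process for one instance."""
    dsn = f"sqlserver://{user}:{password}@{inst.db_host}:1433/?encrypt=disable"
    return (
        f"nohup /data/mssql/sql_exporter -config.data-source-name='{dsn}' "
        f"-config.file=/data/mssql/sql_exporter.yml "
        f"-web.listen-address=127.0.0.1:{inst.listen_port} -log_dir=/data/mssql_log &"
    )


def scrape_target(inst: Instance) -> dict:
    """Matching static_configs entry for prometheus.yml, as a plain dict."""
    return {
        "targets": [f"127.0.0.1:{inst.listen_port}"],
        "labels": {"name": f"DB[{inst.db_host}]"},
    }


instances = [Instance("10.112.5.106", 9400), Instance("10.112.5.105", 9401)]
for inst in instances:
    print(exporter_command(inst, "sql_exporter", "<password>"))
    print(scrape_target(inst))
```

Since the port and the database host come from the same record, a target can never point at the wrong exporter process.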

【4.4】Custom dashboard

Key variables

Dashboard preview

  

【4.5】Alert rules

groups:
- name: MSSQL alert rules
  rules:

  - alert: mssql engine service down
    expr: windows_service_state{state="running",exported_name="mssqlserver"} != 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Details: {{ $labels }}"

  - alert: mssql agent service down
    expr: windows_service_state{exported_name="sqlserveragent",state="running"} != 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Details: {{ $labels }}"

  - alert: mssql engine service restarted
    expr: mssql_db_uptime < 3600
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Details: {{ $labels }}"
      description: "The mssql engine service restarted within the last hour; it has been up for {{ $value }} seconds"

  - alert: mssql database unavailable or inaccessible
    expr: mssql_current_state_dbState != 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Details: {{ $labels }}"
      description: "db:{{ $labels.db }} value:{{ $labels.value }}={{ $value }}"

  - alert: mssql blocking
    expr: sum(mssql_current_state_blocking) > 5
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Details: {{ $labels }}"
      description: "mssql blocked requests > 5, current: {{ $value }}"

  - alert: mssql too many requests
    expr: sum(mssql_current_state_requests) > 100
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Details: {{ $labels }}"
      description: "mssql requests > 100, current: {{ $value }}"

  - alert: mssql deadlock detected
    expr: increase(mssql_counter{type_object="SQLServer:Locks",type_counter="Number of Deadlocks/sec",type_instance="_Total"}[5m]) > 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Details: {{ $labels }}"
      description: "mssql deadlocks in the last 5 minutes: {{ $value }}"

  - alert: mssql job failed
    expr: increase(mssql_job_state_today[5m]) > 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Details: {{ $labels }}"
      description: "mssql job failures today: {{ $value }}"

  - alert: mssql mirroring state changed
    expr: increase(mssql_mirror_sync{value="status"}[5m]) != 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Details: {{ $labels }}"
      description: "db:{{ $labels.db }} value:{{ $labels.value }}={{ $value }}"


【4.6】Alert templates

email

{{ define "email.html" }}
{{- if gt (len .Alerts.Firing) 0 -}}{{ range .Alerts.Firing }}
Alert: {{ .Labels.alertname }} <br>
Project: {{ .Labels.job }} <br>
Instance: {{ .Labels.name }}-{{ .Labels.instance }}  <br>
Details:  {{ .Annotations.description }} <br>
Severity:  {{ .Labels.severity }}  <br>
Start time:  {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}<br>
++++++++++++++++++++++++++++++++++++<br>
+++++++++++++++++++++++++++++++++++++<br>
{{ end }}{{ end -}}
{{- if gt (len .Alerts.Resolved) 0 -}}{{ range .Alerts.Resolved }}
Resolved<br>
Alert: {{ .Labels.alertname }} <br>
Project: {{ .Labels.job }} <br>
Instance: {{ .Labels.name }}-{{ .Labels.instance }}  <br>
Details:  {{ .Annotations.description }} <br>
Severity:  {{ .Labels.severity }}  <br>
Start time:  {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}<br>
Resolved at:  {{ (.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}<br>
++++++++++++++++++++++++++++++++++++<br>
+++++++++++++++++++++++++++++++++++++<br>
{{ end }}{{ end -}}
{{- end }}

WeCom (WeChat Work)

{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}{{ range .Alerts.Firing }}
Alert: {{ .Labels.alertname }}
Project: {{ .Labels.job }}
Instance: {{ .Labels.name }}-{{ .Labels.instance }}
Details:  {{ .Annotations.description }}
Severity:  {{ .Labels.severity }}
Start time:  {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
------------------------------------
------------------------------------
{{ end }}{{ end -}}
{{- if gt (len .Alerts.Resolved) 0 -}}{{ range .Alerts.Resolved }}
Resolved
Alert: {{ .Labels.alertname }}
Project: {{ .Labels.job }}
Instance: {{ .Labels.name }}-{{ .Labels.instance }}
Details:  {{ .Annotations.description }}
Severity:  {{ .Labels.severity }}
Start time:  {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
Resolved at:  {{ (.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
------------------------------------
------------------------------------
{{ end }}{{ end -}}
{{- end }}
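
The constant 28800e9 in both templates deserves a note. Go durations count nanoseconds, so adding 28800e9 shifts the timestamp by 8 hours, which turns UTC into UTC+8 (assuming the alert timestamps are in UTC). The Python sketch below only reproduces that arithmetic; the function name and the sample date are illustrative.

```python
from datetime import datetime, timedelta, timezone

OFFSET_NS = 28800e9  # the constant used in the templates, in nanoseconds


def to_utc8(ts_utc: datetime) -> str:
    """Shift a UTC timestamp by the template offset and format it the same way."""
    shifted = ts_utc + timedelta(seconds=OFFSET_NS / 1e9)
    return shifted.strftime("%Y-%m-%d %H:%M:%S")


print(OFFSET_NS / 1e9 / 3600)  # 8.0 (hours)
print(to_utc8(datetime(2020, 8, 20, 1, 30, 0, tzinfo=timezone.utc)))  # 2020-08-20 09:30:00
```

For a different time zone, the constant would be the offset in hours times 3600e9.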

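【Best Practice】Installing the exporter and configuring SQL Server permissions

【1】Windows exporter

【1.1】Upload the windows_exporter file

【1.2】Double-click to run it

A window may pop up, it may flash by, or nothing visible may happen at all.

【1.3】Verify

Press Win+R and run services.msc.

If windows_exporter appears in the service list, the deployment succeeded.

【2】MSSQL access configuration

【2.1】Firewall and MSSQL permission configuration (run in a cmd window)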

netsh advfirewall firewall add rule name="prometheus_monitor" dir=in action=allow remoteip="192.168.1.1,192.168.1.2" protocol=TCP localport="1433,9182"
 
net stop wuauserv
sc config wuauserv start= disabled
sc config TrustedInstaller start= disabled
sc config windows_exporter start= delayed-auto
sc config MSSQLSERVER start= delayed-auto
sc config SQLSERVERAGENT start= delayed-auto


sqlcmd -E
USE [master]
GO

CREATE LOGIN [sql_exporter] WITH PASSWORD=N'qwer1234qwer', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
GO

ALTER SERVER ROLE [sysadmin] ADD MEMBER [sql_exporter]
GO

------------- Do not run what follows; it is for reference only! --------

Reference:

DECLARE @sql VARCHAR(max)
SET @sql=CAST('use master;CREATE LOGIN [sql_exporter] WITH PASSWORD=N''qwer1234qwer'', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF; GRANT VIEW SERVER STATE TO [sql_exporter];
GRANT VIEW ANY DEFINITION TO [sql_exporter];' AS VARCHAR(max))
select @sql=@sql+CAST('use '+name+';CREATE USER [sql_exporter] FOR LOGIN [sql_exporter];

EXEC sp_addrolemember N''db_datareader'', N''sql_exporter'';'+CHAR(10) AS VARCHAR(max))
from master.sys.databases where state=0 and is_read_only=0
EXEC(@sql)

Go

ALTER SERVER ROLE [sysadmin] ADD MEMBER [sql_exporter]

GO

Reference:

Declare @login varchar(200),@role varchar(200), @login_pwd varchar(200)

Set @login='business_query'
Set @login_pwd='qwer1234qwer'
SET @role='db_datareader'

DECLARE @sql VARCHAR(max)

SET @sql=CAST('use master;CREATE LOGIN '+@login+' WITH PASSWORD=N'''+@login_pwd +''', DEFAULT_DATABASE=[master], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF; GRANT VIEW SERVER STATE TO '+@login+';
GRANT VIEW ANY DEFINITION TO '+@login+';' AS VARCHAR(max))
select @sql=@sql+CAST('use '+name+';CREATE USER '+@login+' FOR LOGIN '+@login+';
EXEC sp_addrolemember N'''+@role+''', N'''+@login+''';'+CHAR(10) AS VARCHAR(max))
from master.sys.databases where state=0 and is_read_only=0
EXEC(@sql)

Go

【3】Exporter server configuration reference

Log in to 115.230.30.138 / 10.20.53.12

cd /data/mssql/

Edit mssql_agent.sh

Also update the Prometheus configuration file. For reference:

nohup /data/mssql/sql_exporter -config.data-source-name='sqlserver://sql_exporter:qwer1234qwer@10.20.54.59:1433/?encrypt=disable' \
  -config.file=/data/mssql/sql_exporter.yml -web.listen-address=127.0.0.1:9402 -log_dir=/data/mssql_log &

【References】

Reference: https://www.bilibili.com/read/cv7134580/

Project page: https://github.com/free/sql_exporter

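【Appendix】

Special characters in a URL either carry a special meaning or are considered unsafe, so they should be replaced with their percent-encoded form when building a URL.

For example:

var x = "2# 前缘肋"

var rp = x.replace('#', '%23'); // %23 is the URL encoding of #; use it in place of the original #

The w3schools site lists a reference for these encodings:

Character  From Windows-1252  From UTF-8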
space %20 %20
! %21 %21
" %22 %22
# %23 %23
$ %24 %24
% %25 %25
& %26 %26
' %27 %27
( %28 %28
) %29 %29
* %2A %2A
+ %2B %2B
, %2C %2C
- %2D %2D
. %2E %2E
/ %2F %2F
0 %30 %30
1 %31 %31
2 %32 %32
3 %33 %33
4 %34 %34
5 %35 %35
6 %36 %36
7 %37 %37
8 %38 %38
9 %39 %39
: %3A %3A
; %3B %3B
< %3C %3C
= %3D %3D
> %3E %3E
? %3F %3F
@ %40 %40
A %41 %41
B %42 %42
C %43 %43
D %44 %44
E %45 %45
F %46 %46
G %47 %47
H %48 %48
I %49 %49
J %4A %4A
K %4B %4B
L %4C %4C
M %4D %4D
N %4E %4E
O %4F %4F
P %50 %50
Q %51 %51
R %52 %52
S %53 %53
T %54 %54
U %55 %55
V %56 %56
W %57 %57
X %58 %58
Y %59 %59
Z %5A %5A
[ %5B %5B
\ %5C %5C
] %5D %5D
^ %5E %5E
_ %5F %5F
` %60 %60
a %61 %61
b %62 %62
c %63 %63
d %64 %64
e %65 %65
f %66 %66
g %67 %67
h %68 %68
i %69 %69
j %6A %6A
k %6B %6B
l %6C %6C
m %6D %6D
n %6E %6E
o %6F %6F
p %70 %70
q %71 %71
r %72 %72
s %73 %73
t %74 %74
u %75 %75
v %76 %76
w %77 %77
x %78 %78
y %79 %79
z %7A %7A
{ %7B %7B
| %7C %7C
} %7D %7D
~ %7E %7E
  %7F %7F
€ %80 %E2%82%AC
 %81 %81
‚ %82 %E2%80%9A
ƒ %83 %C6%92
„ %84 %E2%80%9E
… %85 %E2%80%A6
† %86 %E2%80%A0
‡ %87 %E2%80%A1
ˆ %88 %CB%86
‰ %89 %E2%80%B0
Š %8A %C5%A0
‹ %8B %E2%80%B9
Œ %8C %C5%92
 %8D %C5%8D
Ž %8E %C5%BD
 %8F %8F
 %90 %C2%90
‘ %91 %E2%80%98
’ %92 %E2%80%99
“ %93 %E2%80%9C
” %94 %E2%80%9D
• %95 %E2%80%A2
– %96 %E2%80%93
— %97 %E2%80%94
˜ %98 %CB%9C
™ %99 %E2%84%A2
š %9A %C5%A1
› %9B %E2%80%BA
œ %9C %C5%93
 %9D %9D
ž %9E %C5%BE
Ÿ %9F %C5%B8
  %A0 %C2%A0
¡ %A1 %C2%A1
¢ %A2 %C2%A2
£ %A3 %C2%A3
¤ %A4 %C2%A4
¥ %A5 %C2%A5
¦ %A6 %C2%A6
§ %A7 %C2%A7
¨ %A8 %C2%A8
© %A9 %C2%A9
ª %AA %C2%AA
« %AB %C2%AB
¬ %AC %C2%AC
  %AD %C2%AD
® %AE %C2%AE
¯ %AF %C2%AF
° %B0 %C2%B0
± %B1 %C2%B1
² %B2 %C2%B2
³ %B3 %C2%B3
´ %B4 %C2%B4
µ %B5 %C2%B5
¶ %B6 %C2%B6
· %B7 %C2%B7
¸ %B8 %C2%B8
¹ %B9 %C2%B9
º %BA %C2%BA
» %BB %C2%BB
¼ %BC %C2%BC
½ %BD %C2%BD
¾ %BE %C2%BE
¿ %BF %C2%BF
À %C0 %C3%80
Á %C1 %C3%81
 %C2 %C3%82
à %C3 %C3%83
Ä %C4 %C3%84
Å %C5 %C3%85
Æ %C6 %C3%86
Ç %C7 %C3%87
È %C8 %C3%88
É %C9 %C3%89
Ê %CA %C3%8A
Ë %CB %C3%8B
Ì %CC %C3%8C
Í %CD %C3%8D
Î %CE %C3%8E
Ï %CF %C3%8F
Ð %D0 %C3%90
Ñ %D1 %C3%91
Ò %D2 %C3%92
Ó %D3 %C3%93
Ô %D4 %C3%94
Õ %D5 %C3%95
Ö %D6 %C3%96
× %D7 %C3%97
Ø %D8 %C3%98
Ù %D9 %C3%99
Ú %DA %C3%9A
Û %DB %C3%9B
Ü %DC %C3%9C
Ý %DD %C3%9D
Þ %DE %C3%9E
ß %DF %C3%9F
à %E0 %C3%A0
á %E1 %C3%A1
â %E2 %C3%A2
ã %E3 %C3%A3
ä %E4 %C3%A4
å %E5 %C3%A5
æ %E6 %C3%A6
ç %E7 %C3%A7
è %E8 %C3%A8
é %E9 %C3%A9
ê %EA %C3%AA
ë %EB %C3%AB
ì %EC %C3%AC
í %ED %C3%AD
î %EE %C3%AE
ï %EF %C3%AF
ð %F0 %C3%B0
ñ %F1 %C3%B1
ò %F2 %C3%B2
ó %F3 %C3%B3
ô %F4 %C3%B4
õ %F5 %C3%B5
ö %F6 %C3%B6
÷ %F7 %C3%B7
ø %F8 %C3%B8
ù %F9 %C3%B9
ú %FA %C3%BA
û %FB %C3%BB
ü %FC %C3%BC
ý %FD %C3%BD
þ %FE %C3%BE
ÿ %FF %C3%BF
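
The two columns of the table differ only in which byte encoding is applied before percent-encoding. As a small illustration (a sketch using Python's standard urllib.parse.quote, not something taken from the w3schools page), the row for é can be reproduced like this:

```python
from urllib.parse import quote

ch = "é"
# "From Windows-1252" column: encode the character as a single cp1252 byte first.
print(quote(ch, safe="", encoding="cp1252"))  # %E9
# "From UTF-8" column: encode the character as UTF-8 bytes first (the default).
print(quote(ch, safe=""))                     # %C3%A9
```

For ASCII characters both columns are identical, which is why the password example in 【4.2】 does not depend on the encoding.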

URL Encoding Reference

The ASCII control characters %00-%1F were originally designed to control hardware devices.

Control characters have nothing to do inside a URL.

ASCII Character  Description  URL-encoding
NUL null character %00
SOH start of header %01
STX start of text %02
ETX end of text %03
EOT end of transmission %04
ENQ enquiry %05
ACK acknowledge %06
BEL bell (ring) %07
BS backspace %08
HT horizontal tab %09
LF line feed %0A
VT vertical tab %0B
FF form feed %0C
CR carriage return %0D
SO shift out %0E
SI shift in %0F
DLE data link escape %10
DC1 device control 1 %11
DC2 device control 2 %12
DC3 device control 3 %13
DC4 device control 4 %14
NAK negative acknowledge %15
SYN synchronize %16
ETB end transmission block %17
CAN cancel %18
EM end of medium %19
SUB substitute %1A
ESC escape %1B
FS file separator %1C
GS group separator %1D
RS record separator %1E
US unit separator %1F

 

Original article: https://www.cnblogs.com/gered/p/13535212.html