Redis Monitoring with redis_exporter + Prometheus + Grafana + Alertmanager

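The raw data that redis_exporter returns after installation is cluttered and hard to read, so it needs to be paired with Prometheus and Grafana.

The operating system is CentOS Linux 7.

Unless something is off, wherever a username and password are required, the defaults are admin/admin.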

Deploying redis_exporter

Download: https://github.com/oliver006/redis_exporter/releases/tag/v1.24.0

Further references:

https://docs.gitlab.com/ee/administration/monitoring/prometheus/redis_exporter.html
https://github.com/oliver006/redis_exporter

Downloaded file: redis_exporter-v1.24.0.linux-amd64.tar.gz

Extract and install:

tar -zxvf redis_exporter-v1.24.0.linux-amd64.tar.gz -C /
mv /redis_exporter-v1.24.0.linux-amd64/ /redis_exporter

Start redis_exporter:

[root@node1 soft]# cd /redis_exporter/
[root@node1 redis_exporter]# ./redis_exporter -redis.addr 192.168.1.214:6380 -web.listen-address 192.168.1.178:9121
INFO[0000] Redis Metrics Exporter v1.24.0    build date: 2021-06-09-01:40:46    sha1: b95cf3b5ce7543119b303766662d1f0400caea94    Go: go1.16.5    GOOS: linux    GOARCH: amd64 
INFO[0000] Providing metrics at 192.168.1.178:9121/metrics 
ERRO[0015] Couldn't connect to redis instance

The approach often seen online of passing several addresses at once is not advisable, for example:

-redis.addr 192.168.1.214:6380,192.168.1.214:6379,192.168.1.214:6381

Every refresh then logs ERRO[0001], as shown here:

[root@node1 redis_exporter]# ./redis_exporter -redis.addr 192.168.1.214:6380,192.168.1.214:6379,192,168.1.214:6381 -web.listen-address 192.168.1.178:9121
INFO[0000] Redis Metrics Exporter v1.24.0    build date: 2021-06-09-01:40:46    sha1: b95cf3b5ce7543119b303766662d1f0400caea94    Go: go1.16.5    GOOS: linux    GOARCH: amd64 
INFO[0000] Providing metrics at 192.168.1.178:9121/metrics 
ERRO[0001] Couldn't connect to redis instance 

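Here I pass only the address of a single master node, 192.168.1.214:6380. Sources online say the exporter can discover the other nodes of a cluster automatically; my setup is master-replica rather than a cluster, and so far the replica information appears to be picked up automatically as well.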
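Since only one address goes into -redis.addr, other instances can be reached through the exporter's /scrape endpoint with a target parameter, which is the same mechanism the Prometheus configuration further down relies on. A minimal sketch, assuming the exporter listens on 192.168.1.178:9121 as above; the helper name scrape_url is made up for illustration:

```shell
# Hypothetical helper: build the exporter URL that scrapes one Redis target.
# $1 = exporter host:port, $2 = Redis host:port
scrape_url() {
  printf 'http://%s/scrape?target=redis://%s\n' "$1" "$2"
}

# Print the scrape URL for each replica of the master used above.
for redis in 192.168.1.214:6379 192.168.1.214:6381; do
  scrape_url 192.168.1.178:9121 "$redis"
done
```

Passing one of the printed URLs to curl -s should return that instance's metrics, provided the exporter can reach it.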

Open 192.168.1.178:9121/metrics to see the collected metrics.

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 4.4411e-05
go_gc_duration_seconds{quantile="0.25"} 9.8068e-05
go_gc_duration_seconds{quantile="0.5"} 0.000130716
go_gc_duration_seconds{quantile="0.75"} 0.000174814
go_gc_duration_seconds{quantile="1"} 0.000622031
go_gc_duration_seconds_sum 0.047733795
go_gc_duration_seconds_count 326
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 10
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.16.5"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 3.17684e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.85939608e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.499842e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 4.416845e+06
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 4.7848542653098556e-05
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 5.065448e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 3.17684e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.1833216e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 4.620288e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 4394
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.1087744e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6453504e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.62787137367609e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 4.421239e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 4800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 78744
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 114688
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 6.200176e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 988766
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 655360
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 655360
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.4793992e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 7
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 18.19
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 13
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.1882496e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.62786724615e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.30558464e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP redis_active_defrag_running active_defrag_running metric
# TYPE redis_active_defrag_running gauge
redis_active_defrag_running 0
# HELP redis_aof_current_rewrite_duration_sec aof_current_rewrite_duration_sec metric
# TYPE redis_aof_current_rewrite_duration_sec gauge
redis_aof_current_rewrite_duration_sec -1
# HELP redis_aof_enabled aof_enabled metric
# TYPE redis_aof_enabled gauge
redis_aof_enabled 0
# HELP redis_aof_last_bgrewrite_status aof_last_bgrewrite_status metric
# TYPE redis_aof_last_bgrewrite_status gauge
redis_aof_last_bgrewrite_status 1
# HELP redis_aof_last_cow_size_bytes aof_last_cow_size_bytes metric
# TYPE redis_aof_last_cow_size_bytes gauge
redis_aof_last_cow_size_bytes 0
# HELP redis_aof_last_rewrite_duration_sec aof_last_rewrite_duration_sec metric
# TYPE redis_aof_last_rewrite_duration_sec gauge
redis_aof_last_rewrite_duration_sec -1
# HELP redis_aof_last_write_status aof_last_write_status metric
# TYPE redis_aof_last_write_status gauge
redis_aof_last_write_status 1
# HELP redis_aof_rewrite_in_progress aof_rewrite_in_progress metric
# TYPE redis_aof_rewrite_in_progress gauge
redis_aof_rewrite_in_progress 0
# HELP redis_aof_rewrite_scheduled aof_rewrite_scheduled metric
# TYPE redis_aof_rewrite_scheduled gauge
redis_aof_rewrite_scheduled 0
# HELP redis_blocked_clients blocked_clients metric
# TYPE redis_blocked_clients gauge
redis_blocked_clients 0
# HELP redis_client_biggest_input_buf client_biggest_input_buf metric
# TYPE redis_client_biggest_input_buf gauge
redis_client_biggest_input_buf 0
# HELP redis_client_longest_output_list client_longest_output_list metric
# TYPE redis_client_longest_output_list gauge
redis_client_longest_output_list 0
# HELP redis_cluster_enabled cluster_enabled metric
# TYPE redis_cluster_enabled gauge
redis_cluster_enabled 0
# HELP redis_commands_duration_seconds_total Total amount of time in seconds spent per command
# TYPE redis_commands_duration_seconds_total counter
redis_commands_duration_seconds_total{cmd="auth"} 1.6e-05
redis_commands_duration_seconds_total{cmd="client"} 0.119519
redis_commands_duration_seconds_total{cmd="command"} 0.000553
redis_commands_duration_seconds_total{cmd="config"} 0.560761
redis_commands_duration_seconds_total{cmd="del"} 40.078852
redis_commands_duration_seconds_total{cmd="eval"} 0.000648
redis_commands_duration_seconds_total{cmd="evalsha"} 52.593835
redis_commands_duration_seconds_total{cmd="exists"} 0.002163
redis_commands_duration_seconds_total{cmd="expire"} 4.639735
redis_commands_duration_seconds_total{cmd="get"} 39.35076
redis_commands_duration_seconds_total{cmd="hdel"} 0.032488
redis_commands_duration_seconds_total{cmd="hget"} 4.143723
redis_commands_duration_seconds_total{cmd="hgetall"} 52.309559
redis_commands_duration_seconds_total{cmd="hincrby"} 6.253747
redis_commands_duration_seconds_total{cmd="hlen"} 0.000279
redis_commands_duration_seconds_total{cmd="hmset"} 97.246473
redis_commands_duration_seconds_total{cmd="host"} 0.002547
redis_commands_duration_seconds_total{cmd="hscan"} 0.027941
redis_commands_duration_seconds_total{cmd="hset"} 0.111718
redis_commands_duration_seconds_total{cmd="incr"} 0.081717
redis_commands_duration_seconds_total{cmd="incrby"} 0.790273
redis_commands_duration_seconds_total{cmd="info"} 472.399096
redis_commands_duration_seconds_total{cmd="keys"} 0.011277
redis_commands_duration_seconds_total{cmd="latency"} 0.011697
redis_commands_duration_seconds_total{cmd="lindex"} 0.003309
redis_commands_duration_seconds_total{cmd="llen"} 0.000243
redis_commands_duration_seconds_total{cmd="lrange"} 0.714049
redis_commands_duration_seconds_total{cmd="lrem"} 0.002257
redis_commands_duration_seconds_total{cmd="ltrim"} 0.081033
redis_commands_duration_seconds_total{cmd="pexpire"} 0.053587
redis_commands_duration_seconds_total{cmd="ping"} 33.619505
redis_commands_duration_seconds_total{cmd="psync"} 0.010975
redis_commands_duration_seconds_total{cmd="publish"} 47.437203
redis_commands_duration_seconds_total{cmd="replconf"} 24.135835
redis_commands_duration_seconds_total{cmd="rpush"} 0.724147
redis_commands_duration_seconds_total{cmd="sadd"} 9.122367
redis_commands_duration_seconds_total{cmd="scan"} 183.549755
redis_commands_duration_seconds_total{cmd="scard"} 1.271612
redis_commands_duration_seconds_total{cmd="select"} 12.112273
redis_commands_duration_seconds_total{cmd="set"} 59.943641
redis_commands_duration_seconds_total{cmd="setex"} 0.390939
redis_commands_duration_seconds_total{cmd="setnx"} 5.509553
redis_commands_duration_seconds_total{cmd="slowlog"} 0.062131
redis_commands_duration_seconds_total{cmd="smembers"} 0.108663
redis_commands_duration_seconds_total{cmd="spop"} 0.6798
redis_commands_duration_seconds_total{cmd="srem"} 0.014079
redis_commands_duration_seconds_total{cmd="sscan"} 0.002472
redis_commands_duration_seconds_total{cmd="subscribe"} 1.2e-05
redis_commands_duration_seconds_total{cmd="ttl"} 0.002117
redis_commands_duration_seconds_total{cmd="type"} 0.003339
redis_commands_duration_seconds_total{cmd="unlink"} 0.020745
# HELP redis_commands_processed_total commands_processed_total metric
# TYPE redis_commands_processed_total counter
redis_commands_processed_total 1.27407536e+08
# HELP redis_commands_total Total number of calls per command
# TYPE redis_commands_total counter
redis_commands_total{cmd="auth"} 9
redis_commands_total{cmd="client"} 79475
redis_commands_total{cmd="command"} 1
redis_commands_total{cmd="config"} 4578
redis_commands_total{cmd="del"} 137331
redis_commands_total{cmd="eval"} 3
redis_commands_total{cmd="evalsha"} 1.528261e+06
redis_commands_total{cmd="exists"} 622
redis_commands_total{cmd="expire"} 2.031993e+06
redis_commands_total{cmd="get"} 1.195089e+07
redis_commands_total{cmd="hdel"} 3209
redis_commands_total{cmd="hget"} 998016
redis_commands_total{cmd="hgetall"} 5.695487e+06
redis_commands_total{cmd="hincrby"} 654030
redis_commands_total{cmd="hlen"} 76
redis_commands_total{cmd="hmset"} 6.570541e+06
redis_commands_total{cmd="host"} 52
redis_commands_total{cmd="hscan"} 76
redis_commands_total{cmd="hset"} 6202
redis_commands_total{cmd="incr"} 7435
redis_commands_total{cmd="incrby"} 121021
redis_commands_total{cmd="info"} 3.791154e+06
redis_commands_total{cmd="keys"} 78
redis_commands_total{cmd="latency"} 4444
redis_commands_total{cmd="lindex"} 46
redis_commands_total{cmd="llen"} 52
redis_commands_total{cmd="lrange"} 170093
redis_commands_total{cmd="lrem"} 46
redis_commands_total{cmd="ltrim"} 3808
redis_commands_total{cmd="pexpire"} 13934
redis_commands_total{cmd="ping"} 2.7573152e+07
redis_commands_total{cmd="psync"} 4
redis_commands_total{cmd="publish"} 7.048611e+06
redis_commands_total{cmd="replconf"} 1.4497687e+07
redis_commands_total{cmd="rpush"} 10005
redis_commands_total{cmd="sadd"} 559362
redis_commands_total{cmd="scan"} 1.2812383e+07
redis_commands_total{cmd="scard"} 258338
redis_commands_total{cmd="select"} 1.0435721e+07
redis_commands_total{cmd="set"} 1.8583699e+07
redis_commands_total{cmd="setex"} 42367
redis_commands_total{cmd="setnx"} 1.535913e+06
redis_commands_total{cmd="slowlog"} 8888
redis_commands_total{cmd="smembers"} 22600
redis_commands_total{cmd="spop"} 236576
redis_commands_total{cmd="srem"} 1752
redis_commands_total{cmd="sscan"} 33
redis_commands_total{cmd="subscribe"} 2
redis_commands_total{cmd="ttl"} 670
redis_commands_total{cmd="type"} 677
redis_commands_total{cmd="unlink"} 6133
# HELP redis_config_maxclients config_maxclients metric
# TYPE redis_config_maxclients gauge
redis_config_maxclients 10000
# HELP redis_config_maxmemory config_maxmemory metric
# TYPE redis_config_maxmemory gauge
redis_config_maxmemory 0
# HELP redis_connected_clients connected_clients metric
# TYPE redis_connected_clients gauge
redis_connected_clients 86
# HELP redis_connected_slave_lag_seconds Lag of connected slave
# TYPE redis_connected_slave_lag_seconds gauge
redis_connected_slave_lag_seconds{slave_ip="192.168.1.214",slave_port="6379",slave_state="online"} 1
redis_connected_slave_lag_seconds{slave_ip="192.168.1.214",slave_port="6381",slave_state="online"} 1
# HELP redis_connected_slave_offset_bytes Offset of connected slave
# TYPE redis_connected_slave_offset_bytes gauge
redis_connected_slave_offset_bytes{slave_ip="192.168.1.214",slave_port="6379",slave_state="online"} 2.1943761833e+10
redis_connected_slave_offset_bytes{slave_ip="192.168.1.214",slave_port="6381",slave_state="online"} 2.1943761833e+10
# HELP redis_connected_slaves connected_slaves metric
# TYPE redis_connected_slaves gauge
redis_connected_slaves 2
# HELP redis_connections_received_total connections_received_total metric
# TYPE redis_connections_received_total counter
redis_connections_received_total 4.7644e+06
# HELP redis_cpu_sys_children_seconds_total cpu_sys_children_seconds_total metric
# TYPE redis_cpu_sys_children_seconds_total counter
redis_cpu_sys_children_seconds_total 1195.64
# HELP redis_cpu_sys_seconds_total cpu_sys_seconds_total metric
# TYPE redis_cpu_sys_seconds_total counter
redis_cpu_sys_seconds_total 12650.77
# HELP redis_cpu_user_children_seconds_total cpu_user_children_seconds_total metric
# TYPE redis_cpu_user_children_seconds_total counter
redis_cpu_user_children_seconds_total 8929.86
# HELP redis_cpu_user_seconds_total cpu_user_seconds_total metric
# TYPE redis_cpu_user_seconds_total counter
redis_cpu_user_seconds_total 8919.24
# HELP redis_db_avg_ttl_seconds Avg TTL in seconds
# TYPE redis_db_avg_ttl_seconds gauge
redis_db_avg_ttl_seconds{db="db11"} 1825.3
redis_db_avg_ttl_seconds{db="db12"} 71020.336
redis_db_avg_ttl_seconds{db="db13"} 84212.367
redis_db_avg_ttl_seconds{db="db14"} 36.304
redis_db_avg_ttl_seconds{db="db15"} 0
redis_db_avg_ttl_seconds{db="db4"} 2306.138
redis_db_avg_ttl_seconds{db="db5"} 0
redis_db_avg_ttl_seconds{db="db6"} 0
redis_db_avg_ttl_seconds{db="db7"} 1.422106525e+06
redis_db_avg_ttl_seconds{db="db9"} 82129.002
# HELP redis_db_keys Total number of keys by DB
# TYPE redis_db_keys gauge
redis_db_keys{db="db0"} 0
redis_db_keys{db="db1"} 0
redis_db_keys{db="db10"} 0
redis_db_keys{db="db11"} 102
redis_db_keys{db="db12"} 83
redis_db_keys{db="db13"} 56
redis_db_keys{db="db14"} 232
redis_db_keys{db="db15"} 3
redis_db_keys{db="db16"} 0
redis_db_keys{db="db17"} 0
redis_db_keys{db="db18"} 0
redis_db_keys{db="db19"} 0
redis_db_keys{db="db2"} 0
redis_db_keys{db="db3"} 0
redis_db_keys{db="db4"} 8
redis_db_keys{db="db5"} 3
redis_db_keys{db="db6"} 6
redis_db_keys{db="db7"} 998
redis_db_keys{db="db8"} 0
redis_db_keys{db="db9"} 24
# HELP redis_db_keys_expiring Total number of expiring keys by DB
# TYPE redis_db_keys_expiring gauge
redis_db_keys_expiring{db="db0"} 0
redis_db_keys_expiring{db="db1"} 0
redis_db_keys_expiring{db="db10"} 0
redis_db_keys_expiring{db="db11"} 1
redis_db_keys_expiring{db="db12"} 15
redis_db_keys_expiring{db="db13"} 2
redis_db_keys_expiring{db="db14"} 2
redis_db_keys_expiring{db="db15"} 0
redis_db_keys_expiring{db="db16"} 0
redis_db_keys_expiring{db="db17"} 0
redis_db_keys_expiring{db="db18"} 0
redis_db_keys_expiring{db="db19"} 0
redis_db_keys_expiring{db="db2"} 0
redis_db_keys_expiring{db="db3"} 0
redis_db_keys_expiring{db="db4"} 8
redis_db_keys_expiring{db="db5"} 0
redis_db_keys_expiring{db="db6"} 0
redis_db_keys_expiring{db="db7"} 960
redis_db_keys_expiring{db="db8"} 0
redis_db_keys_expiring{db="db9"} 3
# HELP redis_defrag_hits defrag_hits metric
# TYPE redis_defrag_hits gauge
redis_defrag_hits 0
# HELP redis_defrag_key_hits defrag_key_hits metric
# TYPE redis_defrag_key_hits gauge
redis_defrag_key_hits 0
# HELP redis_defrag_key_misses defrag_key_misses metric
# TYPE redis_defrag_key_misses gauge
redis_defrag_key_misses 0
# HELP redis_defrag_misses defrag_misses metric
# TYPE redis_defrag_misses gauge
redis_defrag_misses 0
# HELP redis_evicted_keys_total evicted_keys_total metric
# TYPE redis_evicted_keys_total counter
redis_evicted_keys_total 0
# HELP redis_expired_keys_total expired_keys_total metric
# TYPE redis_expired_keys_total counter
redis_expired_keys_total 42862
# HELP redis_exporter_build_info redis exporter build_info
# TYPE redis_exporter_build_info gauge
redis_exporter_build_info{build_date="2021-06-09-01:40:46",commit_sha="b95cf3b5ce7543119b303766662d1f0400caea94",golang_version="go1.16.5",version="v1.24.0"} 1
# HELP redis_exporter_last_scrape_connect_time_seconds exporter_last_scrape_connect_time_seconds metric
# TYPE redis_exporter_last_scrape_connect_time_seconds gauge
redis_exporter_last_scrape_connect_time_seconds 0.000938134
# HELP redis_exporter_last_scrape_duration_seconds exporter_last_scrape_duration_seconds metric
# TYPE redis_exporter_last_scrape_duration_seconds gauge
redis_exporter_last_scrape_duration_seconds 0.00479455
# HELP redis_exporter_last_scrape_error The last scrape error status.
# TYPE redis_exporter_last_scrape_error gauge
redis_exporter_last_scrape_error{err=""} 0
# HELP redis_exporter_scrape_duration_seconds Durations of scrapes by the exporter
# TYPE redis_exporter_scrape_duration_seconds summary
redis_exporter_scrape_duration_seconds_sum 1.1995302149999998
redis_exporter_scrape_duration_seconds_count 237
# HELP redis_exporter_scrapes_total Current total redis scrapes.
# TYPE redis_exporter_scrapes_total counter
redis_exporter_scrapes_total 237
# HELP redis_instance_info Information about the Redis instance
# TYPE redis_instance_info gauge
redis_instance_info{maxmemory_policy="noeviction",os="Linux 3.10.0-957.el7.x86_64 x86_64",process_id="5428",redis_build_id="2d12e85652dc7ce9",redis_mode="standalone",redis_version="4.0.2",role="master",run_id="3f70dd786f2534fae677062ac371f87fd78fe914",tcp_port="6380"} 1
# HELP redis_keyspace_hits_total keyspace_hits_total metric
# TYPE redis_keyspace_hits_total counter
redis_keyspace_hits_total 9.793446e+06
# HELP redis_keyspace_misses_total keyspace_misses_total metric
# TYPE redis_keyspace_misses_total counter
redis_keyspace_misses_total 9.303561e+06
# HELP redis_last_key_groups_scrape_duration_milliseconds Duration of the last key group metrics scrape in milliseconds
# TYPE redis_last_key_groups_scrape_duration_milliseconds gauge
redis_last_key_groups_scrape_duration_milliseconds 0
# HELP redis_last_slow_execution_duration_seconds The amount of time needed for last slow execution, in seconds
# TYPE redis_last_slow_execution_duration_seconds gauge
redis_last_slow_execution_duration_seconds 0.059945
# HELP redis_latest_fork_seconds latest_fork_seconds metric
# TYPE redis_latest_fork_seconds gauge
redis_latest_fork_seconds 0.006136
# HELP redis_lazyfree_pending_objects lazyfree_pending_objects metric
# TYPE redis_lazyfree_pending_objects gauge
redis_lazyfree_pending_objects 0
# HELP redis_loading_dump_file loading_dump_file metric
# TYPE redis_loading_dump_file gauge
redis_loading_dump_file 0
# HELP redis_master_repl_offset master_repl_offset metric
# TYPE redis_master_repl_offset gauge
redis_master_repl_offset 2.1943761833e+10
# HELP redis_mem_fragmentation_ratio mem_fragmentation_ratio metric
# TYPE redis_mem_fragmentation_ratio gauge
redis_mem_fragmentation_ratio 1.12
# HELP redis_memory_max_bytes memory_max_bytes metric
# TYPE redis_memory_max_bytes gauge
redis_memory_max_bytes 0
# HELP redis_memory_used_bytes memory_used_bytes metric
# TYPE redis_memory_used_bytes gauge
redis_memory_used_bytes 1.38829216e+08
# HELP redis_memory_used_dataset_bytes memory_used_dataset_bytes metric
# TYPE redis_memory_used_dataset_bytes gauge
redis_memory_used_dataset_bytes 1.35237404e+08
# HELP redis_memory_used_lua_bytes memory_used_lua_bytes metric
# TYPE redis_memory_used_lua_bytes gauge
redis_memory_used_lua_bytes 37888
# HELP redis_memory_used_overhead_bytes memory_used_overhead_bytes metric
# TYPE redis_memory_used_overhead_bytes gauge
redis_memory_used_overhead_bytes 3.591812e+06
# HELP redis_memory_used_peak_bytes memory_used_peak_bytes metric
# TYPE redis_memory_used_peak_bytes gauge
redis_memory_used_peak_bytes 1.3938588e+08
# HELP redis_memory_used_rss_bytes memory_used_rss_bytes metric
# TYPE redis_memory_used_rss_bytes gauge
redis_memory_used_rss_bytes 1.55652096e+08
# HELP redis_memory_used_startup_bytes memory_used_startup_bytes metric
# TYPE redis_memory_used_startup_bytes gauge
redis_memory_used_startup_bytes 767968
# HELP redis_migrate_cached_sockets_total migrate_cached_sockets_total metric
# TYPE redis_migrate_cached_sockets_total gauge
redis_migrate_cached_sockets_total 0
# HELP redis_net_input_bytes_total net_input_bytes_total metric
# TYPE redis_net_input_bytes_total counter
redis_net_input_bytes_total 2.9809647461e+10
# HELP redis_net_output_bytes_total net_output_bytes_total metric
# TYPE redis_net_output_bytes_total counter
redis_net_output_bytes_total 7.3329383597e+10
# HELP redis_process_id process_id metric
# TYPE redis_process_id gauge
redis_process_id 5428
# HELP redis_pubsub_channels pubsub_channels metric
# TYPE redis_pubsub_channels gauge
redis_pubsub_channels 1
# HELP redis_pubsub_patterns pubsub_patterns metric
# TYPE redis_pubsub_patterns gauge
redis_pubsub_patterns 0
# HELP redis_rdb_bgsave_in_progress rdb_bgsave_in_progress metric
# TYPE redis_rdb_bgsave_in_progress gauge
redis_rdb_bgsave_in_progress 0
# HELP redis_rdb_changes_since_last_save rdb_changes_since_last_save metric
# TYPE redis_rdb_changes_since_last_save gauge
redis_rdb_changes_since_last_save 1670
# HELP redis_rdb_current_bgsave_duration_sec rdb_current_bgsave_duration_sec metric
# TYPE redis_rdb_current_bgsave_duration_sec gauge
redis_rdb_current_bgsave_duration_sec -1
# HELP redis_rdb_last_bgsave_duration_sec rdb_last_bgsave_duration_sec metric
# TYPE redis_rdb_last_bgsave_duration_sec gauge
redis_rdb_last_bgsave_duration_sec 0
# HELP redis_rdb_last_bgsave_status rdb_last_bgsave_status metric
# TYPE redis_rdb_last_bgsave_status gauge
redis_rdb_last_bgsave_status 1
# HELP redis_rdb_last_cow_size_bytes rdb_last_cow_size_bytes metric
# TYPE redis_rdb_last_cow_size_bytes gauge
redis_rdb_last_cow_size_bytes 3.2497664e+07
# HELP redis_rdb_last_save_timestamp_seconds rdb_last_save_timestamp_seconds metric
# TYPE redis_rdb_last_save_timestamp_seconds gauge
redis_rdb_last_save_timestamp_seconds 1.627871113e+09
# HELP redis_rejected_connections_total rejected_connections_total metric
# TYPE redis_rejected_connections_total counter
redis_rejected_connections_total 0
# HELP redis_repl_backlog_first_byte_offset repl_backlog_first_byte_offset metric
# TYPE redis_repl_backlog_first_byte_offset gauge
redis_repl_backlog_first_byte_offset 2.1942713258e+10
# HELP redis_repl_backlog_history_bytes repl_backlog_history_bytes metric
# TYPE redis_repl_backlog_history_bytes gauge
redis_repl_backlog_history_bytes 1.048576e+06
# HELP redis_repl_backlog_is_active repl_backlog_is_active metric
# TYPE redis_repl_backlog_is_active gauge
redis_repl_backlog_is_active 1
# HELP redis_replica_partial_resync_accepted replica_partial_resync_accepted metric
# TYPE redis_replica_partial_resync_accepted gauge
redis_replica_partial_resync_accepted 2
# HELP redis_replica_partial_resync_denied replica_partial_resync_denied metric
# TYPE redis_replica_partial_resync_denied gauge
redis_replica_partial_resync_denied 1
# HELP redis_replica_resyncs_full replica_resyncs_full metric
# TYPE redis_replica_resyncs_full gauge
redis_replica_resyncs_full 2
# HELP redis_replication_backlog_bytes replication_backlog_bytes metric
# TYPE redis_replication_backlog_bytes gauge
redis_replication_backlog_bytes 1.048576e+06
# HELP redis_second_repl_offset second_repl_offset metric
# TYPE redis_second_repl_offset gauge
redis_second_repl_offset -1
# HELP redis_slave_expires_tracked_keys slave_expires_tracked_keys metric
# TYPE redis_slave_expires_tracked_keys gauge
redis_slave_expires_tracked_keys 0
# HELP redis_slowlog_last_id Last id of slowlog
# TYPE redis_slowlog_last_id gauge
redis_slowlog_last_id 12
# HELP redis_slowlog_length Total slowlog
# TYPE redis_slowlog_length gauge
redis_slowlog_length 13
# HELP redis_start_time_seconds Start time of the Redis instance since unix epoch in seconds.
# TYPE redis_start_time_seconds gauge
redis_start_time_seconds 1.620606909e+09
# HELP redis_target_scrape_request_errors_total Errors in requests to the exporter
# TYPE redis_target_scrape_request_errors_total counter
redis_target_scrape_request_errors_total 0
# HELP redis_up Information about the Redis instance
# TYPE redis_up gauge
redis_up 1
# HELP redis_uptime_in_seconds uptime_in_seconds metric
# TYPE redis_uptime_in_seconds gauge
redis_uptime_in_seconds 7.264465e+06

That completes the redis_exporter deployment.

Enable start on boot and start redis_exporter.
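The raw output is easier to digest once a few values are pulled out of it. As a small sketch (the function name hit_ratio is an illustrative choice, not part of redis_exporter), the keyspace hit ratio can be derived from the redis_keyspace_hits_total and redis_keyspace_misses_total counters in the /metrics output:

```shell
# Hypothetical helper: read exporter metrics on stdin and print
# hits / (hits + misses) with four decimals, or "n/a" if both are zero.
hit_ratio() {
  awk '
    $1 == "redis_keyspace_hits_total"   { hits = $2 }
    $1 == "redis_keyspace_misses_total" { misses = $2 }
    END {
      total = hits + misses
      if (total > 0) printf "%.4f\n", hits / total
      else print "n/a"
    }'
}
```

One possible use is curl -s http://192.168.1.178:9121/metrics | hit_ratio, assuming the exporter address from this post.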

cat <<EOF >/etc/systemd/system/redis_exporter.service
[Unit]
Description=Prometheus exporter for Redis metrics.

[Service]
ExecStart=/redis_exporter/redis_exporter -redis.addr 192.168.1.214:6380 -web.listen-address 192.168.1.178:9121
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Reload the configuration (remember to stop the manually started session from earlier first):

systemctl daemon-reload
systemctl enable redis_exporter.service
systemctl restart redis_exporter.service
systemctl status redis_exporter.service
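After the restart it is worth confirming that the service not only runs but can also reach Redis. The exporter exposes a redis_up gauge for this (it appears in the metrics output earlier). A rough sketch, with redis_is_up being a hypothetical helper name:

```shell
# Hypothetical helper: read exporter metrics on stdin and succeed only if
# the exporter reports a working Redis connection (redis_up equal to 1).
redis_is_up() {
  awk '$1 == "redis_up" && $2 == 1 { ok = 1 } END { exit !ok }'
}
```

For example: curl -s http://192.168.1.178:9121/metrics | redis_is_up && echo "exporter OK". If nothing is printed, the exporter is either unreachable or reports redis_up 0.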

Deploying Prometheus

Download: https://github.com/prometheus/prometheus/releases/

Downloaded file: prometheus-2.28.1.linux-amd64.tar.gz

Extracting the archive is the whole installation:

[root@node1 soft]# tar -zxvf prometheus-2.28.1.linux-amd64.tar.gz
[root@node1 soft]# mv prometheus-2.28.1.linux-amd64 /prometheus
[root@node1 soft]# cd /prometheus/

Add the configuration

[root@node1 prometheus]# vi /prometheus/prometheus.yml
Add the following under scrape_configs:
  - job_name: 'redis_exporter_targets'
    static_configs:
      - targets:
        - redis://192.168.1.214:6380
        - redis://192.168.1.214:6379
        - redis://192.168.1.214:6381
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.178:9121

  ## config for scraping the exporter itself
  - job_name: 'redis_exporter'
    static_configs:
      - targets:
        - 192.168.1.178:9121
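YAML indentation mistakes are easy to make when editing this file by hand, so it can help to validate it before starting Prometheus. The release tarball also contains promtool, which has a check config subcommand. A sketch under the assumption that Prometheus was unpacked to /prometheus as above; check_prom_config is a hypothetical wrapper name:

```shell
# Hypothetical wrapper: validate prometheus.yml with the bundled promtool.
# $1 = Prometheus install directory (defaults to /prometheus, as in this post)
check_prom_config() {
  dir=${1:-/prometheus}
  if [ ! -x "$dir/promtool" ]; then
    echo "promtool not found in $dir" >&2
    return 2
  fi
  "$dir/promtool" check config "$dir/prometheus.yml"
}
```

Running check_prom_config with no arguments checks /prometheus/prometheus.yml; a non-zero exit status means the file should be fixed before starting the server.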

Start Prometheus:

[root@node1 prometheus]# ./prometheus
level=info ts=2021-08-02T02:40:06.001Z caller=main.go:389 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-08-02T02:40:06.002Z caller=main.go:443 msg="Starting Prometheus" version="(version=2.28.1, branch=HEAD, revision=b0944590a1c9a6b35dc5a696869f75f422b107a1)"
level=info ts=2021-08-02T02:40:06.002Z caller=main.go:448 build_context="(go=go1.16.5, user=root@2915dd495090, date=20210701-15:20:10)"
level=info ts=2021-08-02T02:40:06.002Z caller=main.go:449 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 node1 (none))"
level=info ts=2021-08-02T02:40:06.002Z caller=main.go:450 fd_limits="(soft=1024, hard=4096)"
level=info ts=2021-08-02T02:40:06.003Z caller=main.go:451 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2021-08-02T02:40:06.012Z caller=web.go:541 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2021-08-02T02:40:06.013Z caller=main.go:824 msg="Starting TSDB ..."
level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627581602588 maxt=1627588800000 ulid=01FBT12MPWQ0F1HNJTMBJRKVZ4
level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627588802588 maxt=1627596000000 ulid=01FBT7YBYX56PYJTSCNPDGNF8S
level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627546183899 maxt=1627581600000 ulid=01FBT7YCAJW6HK1ZWQT3GAXSHM
level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627596000000 maxt=1627603200000 ulid=01FC279X74Q6P1KKRSK6FSE4ZE
level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627603202588 maxt=1627610400000 ulid=01FC279X9GBYG0VD4FS6MP8V0E
level=info ts=2021-08-02T02:40:06.017Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
level=info ts=2021-08-02T02:40:06.032Z caller=head.go:780 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2021-08-02T02:40:06.035Z caller=head.go:794 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=2.833274ms
level=info ts=2021-08-02T02:40:06.035Z caller=head.go:800 component=tsdb msg="Replaying WAL, this may take a while"
level=warn ts=2021-08-02T02:40:06.098Z caller=head.go:767 component=tsdb msg="Unknown series references" samples=15293 exemplars=0
level=info ts=2021-08-02T02:40:06.098Z caller=head.go:826 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2021-08-02T02:40:06.116Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=31 maxSegment=34
level=info ts=2021-08-02T02:40:06.117Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=32 maxSegment=34
level=info ts=2021-08-02T02:40:06.131Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=33 maxSegment=34
level=info ts=2021-08-02T02:40:06.131Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=34 maxSegment=34
level=info ts=2021-08-02T02:40:06.131Z caller=head.go:860 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=62.785484ms wal_replay_duration=33.30167ms total_replay_duration=98.993811ms
level=info ts=2021-08-02T02:40:06.140Z caller=main.go:851 fs_type=XFS_SUPER_MAGIC
level=info ts=2021-08-02T02:40:06.140Z caller=main.go:854 msg="TSDB started"
level=info ts=2021-08-02T02:40:06.140Z caller=main.go:981 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2021-08-02T02:40:06.150Z caller=main.go:1012 msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=9.905156ms remote_storage=12.884µs web_handler=860ns query_engine=7.018µs scrape=1.041656ms scrape_sd=149.496µs notify=76.325µs notify_sd=43.197µs rules=7.250541ms
level=info ts=2021-08-02T02:40:06.150Z caller=main.go:796 msg="Server is ready to receive web requests."
level=info ts=2021-08-02T02:40:13.942Z caller=compact.go:509 component=tsdb msg="write block resulted in empty block" mint=1627610400000 maxt=1627617600000 duration=23.036437ms
level=info ts=2021-08-02T02:40:13.946Z caller=head.go:967 component=tsdb msg="Head GC completed" duration=3.883036ms
level=info ts=2021-08-02T02:40:13.950Z caller=checkpoint.go:97 component=tsdb msg="Creating checkpoint" from_segment=31 to_segment=32 mint=1627617600000
level=info ts=2021-08-02T02:40:14.059Z caller=head.go:1064 component=tsdb msg="WAL checkpoint complete" first=31 last=32 duration=109.568447ms

  

Open 192.168.1.178:9090 in a browser to see the collected data.

Register Prometheus as a service that starts at boot

vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
 
[Service]
ExecStart=/prometheus/prometheus \
  --config.file=/prometheus/prometheus.yml \
  --web.listen-address=:9090
 
Restart=on-failure
[Install]
WantedBy=multi-user.target

  

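Stop the Prometheus process that was started in the foreground with ./prometheus earlier.

Start the service, enable it at boot, and check its status.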

systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
systemctl status prometheus
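
systemctl status only tells you the process is alive. To confirm that Prometheus has finished replaying its WAL and is actually serving requests, you can poll its /-/ready endpoint. The following is a minimal sketch, assuming curl is installed and Prometheus listens on 192.168.1.178:9090 as above; the function name wait_for_prometheus and the retry numbers are my own choices, not part of Prometheus.

```shell
# Address is an assumption taken from this setup; override via PROM_URL.
PROM_URL="${PROM_URL:-http://192.168.1.178:9090}"

# Poll <url>/-/ready until it answers with HTTP 200 or attempts run out.
# Arguments: base URL, number of attempts (default 10), delay in seconds (default 2).
wait_for_prometheus() {
  wfp_url=$1
  wfp_attempts=${2:-10}
  wfp_delay=${3:-2}
  wfp_i=1
  while [ "$wfp_i" -le "$wfp_attempts" ]; do
    # -f makes curl fail on HTTP errors, so only a 2xx response counts as ready
    if curl -fsS --max-time 3 "$wfp_url/-/ready" >/dev/null 2>&1; then
      echo "ready after $wfp_i attempt(s)"
      return 0
    fi
    sleep "$wfp_delay"
    wfp_i=$((wfp_i + 1))
  done
  echo "not ready after $wfp_attempts attempt(s)" >&2
  return 1
}
```

Call it as wait_for_prometheus "$PROM_URL" right after systemctl start prometheus; a non-zero exit status means the server never became ready within the allotted attempts.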

[root@node1 prometheus]# cat /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
 
[Service]
ExecStart=/prometheus/prometheus \
  --config.file=/prometheus/prometheus.yml \
  --web.listen-address=:9090
 
Restart=on-failure
[Install]
[root@node1 prometheus]# systemctl status prometheus
● prometheus.service - Prometheus Monitoring System
   Loaded: loaded (/etc/systemd/system/prometheus.service; static; vendor preset: disabled)
   Active: active (running) since Mon 2021-08-02 11:23:31 CST; 3min 26s ago
 Main PID: 30494 (prometheus)
   CGroup: /system.slice/prometheus.service
           └─30494 /prometheus/prometheus --config.file=/prometheus/prometheus.yml --web.listen-address=:9090

Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.664Z caller=head.go:780 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.664Z caller=head.go:794 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=18.782µs
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.664Z caller=head.go:800 component=tsdb msg="Replaying WAL, this may take a while"
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.665Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.665Z caller=head.go:860 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=67.616µs wal_replay_…ration=815.018µs
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.668Z caller=main.go:851 fs_type=XFS_SUPER_MAGIC
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.668Z caller=main.go:854 msg="TSDB started"
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.668Z caller=main.go:981 msg="Loading configuration file" filename=/prometheus/prometheus.yml
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.677Z caller=main.go:1012 msg="Completed loading of configuration file" filename=/prometheus/prometheus.yml totalDuration=8.9289…ms
Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.677Z caller=main.go:796 msg="Server is ready to receive web requests."
Hint: Some lines were ellipsized, use -l to show in full.

  

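Alerting requires deploying alertmanager alongside Prometheus.

At this point, Prometheus is deployed as well.

alertmanager deployment

Download: from the official website or from GitHub

Downloaded file: alertmanager-0.22.2.linux-amd64.tar.gz

Extract and install: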

[root@node1 soft]# tar -zxvf alertmanager-0.22.2.linux-amd64.tar.gz -C /
alertmanager-0.22.2.linux-amd64/
alertmanager-0.22.2.linux-amd64/alertmanager.yml
alertmanager-0.22.2.linux-amd64/LICENSE
alertmanager-0.22.2.linux-amd64/NOTICE
alertmanager-0.22.2.linux-amd64/alertmanager
alertmanager-0.22.2.linux-amd64/amtool
[root@node1 soft]# mv /alertmanager-0.22.2.linux-amd64/ /alertmanager
[root@node1 soft]# cd /alertmanager/
[root@node1 alertmanager]# ll
total 47788
-rwxr-xr-x 1 3434 3434 27074026 Jun  2 15:51 alertmanager
-rw-r--r-- 1 3434 3434      348 Jun  2 15:56 alertmanager.yml
-rwxr-xr-x 1 3434 3434 21839682 Jun  2 15:52 amtool
-rw-r--r-- 1 3434 3434    11357 Jun  2 15:56 LICENSE
-rw-r--r-- 1 3434 3434      457 Jun  2 15:56 NOTICE

  

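Configure the email notification settings. Other receivers such as DingTalk are also possible; email is used as the example here.

Note: the smtp_smarthost value differs from one mail provider to another.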

vi /alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.exmail.qq.com:465'
  smtp_from: 'zhaokm@xxxxxxx.xx'
  smtp_auth_username: 'zhaokm@xxxxxxx.xx'
  smtp_auth_password: 'your-mailbox-password'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: 'zhaokm@xxxxxxx.xx'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
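
Before wiring this file into a service, it is worth validating it with amtool, which ships in the same tarball. This is a small sketch, assuming the /alertmanager install directory used above; the wrapper function check_alertmanager_config is hypothetical and only adds a guard for a missing binary.

```shell
# Validate alertmanager.yml with the bundled amtool.
# Argument: the install directory (here /alertmanager is assumed).
# Returns 2 if amtool is missing, otherwise amtool's own exit status.
check_alertmanager_config() {
  cac_dir=$1
  if [ ! -x "$cac_dir/amtool" ]; then
    echo "amtool not found in $cac_dir" >&2
    return 2
  fi
  "$cac_dir/amtool" check-config "$cac_dir/alertmanager.yml"
}
```

Run check_alertmanager_config /alertmanager after every edit; a YAML indentation slip is caught here instead of in a failed service start.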

Configure start at boot

cat > /etc/systemd/system/alertmanager.service << "EOF"
[Unit]
Description=alertmanager
After=local-fs.target network-online.target network.target
Wants=local-fs.target network-online.target network.target
 
[Service]
ExecStart=/alertmanager/alertmanager --config.file=/alertmanager/alertmanager.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF

Apply the configuration

[root@node1 alertmanager]# systemctl daemon-reload
[root@node1 alertmanager]# systemctl enable alertmanager
Created symlink from /etc/systemd/system/multi-user.target.wants/alertmanager.service to /etc/systemd/system/alertmanager.service.
[root@node1 alertmanager]# systemctl start alertmanager
[root@node1 alertmanager]# systemctl status alertmanager
● alertmanager.service - alertmanager
   Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-08-02 15:14:58 CST; 3s ago
 Main PID: 9825 (alertmanager)
   CGroup: /system.slice/alertmanager.service
           └─9825 /alertmanager/alertmanager --config.file=/alertmanager/alertmanager.yml

Aug 02 15:14:58 node1 systemd[1]: Started alertmanager.
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.005Z caller=main.go:221 msg="Starting Alertmanager" version="(version=0.22.2, branch=HEAD, revision=44f8adc06af5...8273f2922051)"
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.005Z caller=main.go:222 build_context="(go=go1.16.4, user=root@b595c7f32520, date=20210602-07:50:37)"
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.006Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=192.168.1.178 port=9094
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.009Z caller=cluster.go:671 component=cluster msg="Waiting for gossip to settle..." interval=2s
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.110Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/alertmanager/alertmanager.yml
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.111Z caller=coordinator.go:126 component=configuration msg="Completed loading of configuration file" file=/alert...ertmanager.yml
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.122Z caller=main.go:514 msg=Listening address=:9093
Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.122Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
Aug 02 15:15:01 node1 alertmanager[9825]: level=info ts=2021-08-02T07:15:01.009Z caller=cluster.go:696 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000791462s
Hint: Some lines were ellipsized, use -l to show in full.

Open 192.168.1.178:9093 to see the alertmanager web UI.

Edit the Prometheus configuration so that Prometheus scrapes alertmanager.

vi /prometheus/prometheus.yml
Append at the end:
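
To check the email path without waiting for a real Redis failure, you can push a hand-made alert into alertmanager through its v2 HTTP API. The sketch below assumes curl is installed and alertmanager listens on 192.168.1.178:9093; the function names, the alert name, and the label values are made up for the test.

```shell
# Address is an assumption taken from this setup; override via AM_URL.
AM_URL="${AM_URL:-http://192.168.1.178:9093}"

# Print a one-element JSON array in the shape POST /api/v2/alerts accepts.
# Argument: the alertname label to use.
build_test_alert() {
  printf '[{"labels":{"alertname":"%s","severity":"warning","instance":"manual-test"},"annotations":{"summary":"Manual test alert"}}]\n' "$1"
}

# Send the test alert. Arguments: alertmanager base URL, alertname.
send_test_alert() {
  build_test_alert "$2" |
    curl -fsS --max-time 5 -H 'Content-Type: application/json' \
      -X POST --data-binary @- "$1/api/v2/alerts"
}
```

Running send_test_alert "$AM_URL" ManualTest should make the alert show up in the web UI, and with the route settings above (group_wait: 5s) the email should follow shortly after if the SMTP settings are right.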
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['192.168.1.178:9093']

Edit the Prometheus configuration so that Prometheus sends its alerts to alertmanager.

vi /prometheus/prometheus.yml
Change to:
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.1.178:9093

  

Enable the alert rules. This is configured on the Prometheus side.

vi /prometheus/prometheus.yml
Change to:
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "redis.yml"

  

Alert rules in redis.yml. Set the thresholds to suit your own environment:

vi /prometheus/redis.yml
groups:
- name:  Redis
  rules: 
    - alert: RedisDown
      expr: redis_up  == 0
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "Redis down (instance {{ $labels.instance }})"
        description: "Redis instance is down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: MissingBackup
      expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "Missing backup (instance {{ $labels.instance }})"
        description: "Redis has not been backed up for 24 hours\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: OutOfMemory
      expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Out of memory (instance {{ $labels.instance }})"
        description: "Redis is running out of memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: ReplicationBroken
      expr: delta(redis_connected_slaves[1m]) < 0
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "Replication broken (instance {{ $labels.instance }})"
        description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: TooManyConnections
      expr: redis_connected_clients > 10
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "Too many connections (instance {{ $labels.instance }})"
        description: "Redis instance has too many connections\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: NotEnoughConnections
      expr: redis_connected_clients < 5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Not enough connections (instance {{ $labels.instance }})"
        description: "Redis instance should have more connections (> 5)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: RejectedConnections
      expr: increase(redis_rejected_connections_total[1m]) > 0
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "Rejected connections (instance {{ $labels.instance }})"
        description: "Some connections to Redis have been rejected\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
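
Rule files are easy to break with a stray indent or an unterminated quote, so it helps to run them through promtool (bundled in the Prometheus tarball) before restarting. A minimal sketch, assuming the /prometheus install directory used throughout; the wrapper function check_prometheus_files is hypothetical.

```shell
# Validate the rule file first, then the main config that references it.
# Argument: the install directory (here /prometheus is assumed).
# Returns 2 if promtool is missing, otherwise the status of the failing check.
check_prometheus_files() {
  cpf_dir=$1
  if [ ! -x "$cpf_dir/promtool" ]; then
    echo "promtool not found in $cpf_dir" >&2
    return 2
  fi
  "$cpf_dir/promtool" check rules "$cpf_dir/redis.yml" &&
    "$cpf_dir/promtool" check config "$cpf_dir/prometheus.yml"
}
```

A typical sequence would be check_prometheus_files /prometheus && systemctl restart prometheus, so that a broken rule file never takes the running server down.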

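The resulting alerts look like this:

grafana deployment

Download: https://grafana.com/grafana/download?edition=oss

Official installation guide:
https://grafana.com/docs/grafana/latest/installation/rpm/#2-start-the-server

It ships as an RPM package, so installation is straightforward.

Install whichever dependencies turn out to be missing.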

yum install -y fontconfig
yum install -y urw-fonts
rpm -ivh grafana-8.0.6-1.x86_64.rpm 

  

Enable Grafana at boot and start it

/bin/systemctl daemon-reload
/bin/systemctl enable grafana-server.service
/bin/systemctl start grafana-server.service

[root@node1 soft]# which grafana-server
/usr/sbin/grafana-server
[root@node1 soft]# which grafana-cli
/usr/sbin/grafana-cli

  

Check the status

[root@node1 soft]# systemctl status grafana-server
● grafana-server.service - Grafana instance
   Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-07-29 15:58:38 CST; 4min 37s ago
     Docs: http://docs.grafana.org
 Main PID: 6884 (grafana-server)
   CGroup: /system.slice/grafana-server.service
           └─6884 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --packaging=rpm cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default...

Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="migrations completed" logger=migrator performed=330 skipped=0 duration=1.710091718s
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Created default admin" logger=sqlstore user=admin
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Created default organization" logger=sqlstore
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Starting plugin search" logger=plugins
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Registering plugin" logger=plugins id=grafana-plugin-admin-app
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Registering plugin" logger=plugins id=input
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="External plugins directory created" logger=plugins directory=/var/lib/grafana/plugins
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Live Push Gateway initialization" logger=live.push_http
Jul 29 15:58:38 node1 systemd[1]: Started Grafana instance.
Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=http subUrl= socket=

Open 192.168.1.178:3000 to reach the Grafana web UI.

Configure the data source.
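
I configured the data source through the web UI. As an alternative, Grafana can also load data sources from provisioning files at startup. The sketch below writes such a file; it assumes the Prometheus address from this setup, and the function name and the file name prometheus.yaml are my own choices.

```shell
# Write a Prometheus data source provisioning file.
# Arguments: target directory, Prometheus base URL.
write_prometheus_datasource() {
  wpd_dir=$1
  wpd_url=$2
  mkdir -p "$wpd_dir" || return 1
  # access: proxy means the Grafana server, not the browser, queries Prometheus
  cat > "$wpd_dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $wpd_url
    isDefault: true
EOF
}
```

With the RPM install, the directory to target is /etc/grafana/provisioning/datasources, for example write_prometheus_datasource /etc/grafana/provisioning/datasources http://192.168.1.178:9090 followed by systemctl restart grafana-server.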

  

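Download a dashboard:

https://grafana.com/grafana/dashboards/763 (the one used here)
https://grafana.com/grafana/dashboards/12980
https://grafana.com/grafana/dashboards/12776

Import the dashboard:
To import a dashboard, click the + icon in the side menu, click Import, select the data source, and confirm.

Final result: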

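Note: the Memory Usage panel always shows ∞%. This happens because redis_memory_max_bytes comes back as 0 (typically because no maxmemory limit is set in Redis), so redis_memory_used_bytes / redis_memory_max_bytes divides by zero.

Workaround: replace redis_memory_max_bytes with the server's actual memory size.

Change the panel expression as follows, where 8370298880 is the physical memory size reported by free -b: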

redis_memory_used_bytes{instance=~"$instance"}  / 8370298880
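
The denominator and the resulting percentage can be reproduced on the command line. This is a rough sketch for Linux hosts: it reads MemTotal from /proc/meminfo, which should match the total column of free -b, and the two function names are hypothetical helpers of mine.

```shell
# Total physical memory in bytes, from /proc/meminfo (reported there in kB).
total_memory_bytes() {
  tmb_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo) || return 1
  echo $((tmb_kb * 1024))
}

# Same arithmetic as the panel expression: used / total * 100, two decimals.
# Arguments: used bytes, total bytes.
memory_usage_percent() {
  awk -v used="$1" -v total="$2" 'BEGIN {
    if (total <= 0) exit 1   # a zero total is what produced the infinite value
    printf "%.2f\n", used / total * 100
  }'
}
```

For example, memory_usage_percent 4185149440 8370298880 prints 50.00. Keep in mind that the hard-coded number has to be the memory of the machine running Redis, and it must be updated if that machine changes.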

Reference links:

The right way to monitor Redis with Prometheus (Redis cluster)

Configuring alerts with Alertmanager on a Prometheus monitoring platform

YAML validation tool: http://www.bejson.com/validators/yaml_editor/

https://www.cnblogs.com/biaopei/p/12096705.html

https://www.jianshu.com/p/924cdd4e8603

Original article: https://www.cnblogs.com/PiscesCanon/p/15088904.html