Greenplum: Monitoring Whether Segments Are Healthy

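While Greenplum is running, a segment can become unavailable under heavy load, a primary/mirror pair can fail over, or the network between a primary and its mirror can break so that their data falls out of sync. By default, in Greenplum 4.x the database keeps running normally when a single segment fails; if a primary segment fails, the system switches over to its mirror. Segment health therefore has to be monitored. There are generally two approaches: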

1. Check gp_segment_configuration to see whether any segment is in the down state, or look at gp_configuration_history to see whether the database has failed over recently.

select * from gp_segment_configuration where status='d' or mode <>'s';
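The failover history mentioned above can be checked in a similar way. The following is a minimal sketch, assuming the catalog columns are named time, dbid and desc as in the Greenplum system catalog reference; the one-day window is an arbitrary choice for illustration.

```sql
-- Recent segment state changes recorded by the fault detection process.
-- "time" and "desc" are quoted because both are SQL keywords.
-- The 1-day window is an assumption; adjust it to the monitoring interval.
select "time", dbid, "desc"
from gp_configuration_history
where "time" > now() - interval '1 day'
order by "time" desc;
```

An empty result suggests no segment changed state during the window.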

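2. A segment may hang without the master detecting that it has failed. To catch this case, first monitor whether any currently running SQL statement has been executing for an unusually long time. Second, create a heartbeat table in Greenplum that holds at least one row on every segment, and keep updating all of its rows at regular intervals; if the update statement has not finished within a set time, raise an alert.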
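For the first part, long-running statements can be listed from pg_stat_activity. This is a sketch only, assuming the Greenplum 4.x column names (procpid, current_query); later releases based on newer PostgreSQL versions rename these to pid and query. The 30-minute threshold is a placeholder, not a recommendation.

```sql
-- Statements that have been running longer than the chosen threshold.
-- Column names assume Greenplum 4.x; idle sessions report '<IDLE>' there.
select procpid,
       usename,
       query_start,
       now() - query_start as running_for,
       current_query
from pg_stat_activity
where current_query <> '<IDLE>'
  and now() - query_start > interval '30 minutes'
order by query_start;
```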

The steps are as follows:

1) Create a helper table and load 10,000 rows into it

create table xdual_temp as
select generate_series(1,10000) id
distributed by (id);

2) Create the heartbeat table with two columns; the second column is a timestamp that is updated on every heartbeat check

create table xdual (id int, update_time timestamp(0))
distributed by (id);

3) Insert one row per segment into the heartbeat table

insert into xdual(id,update_time)
select id,now() from
(select id,row_number() over(partition by gp_segment_id order by id) rn
from xdual_temp) t
where rn = 1;
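Before relying on the heartbeat, it helps to confirm that every segment really holds a row. A minimal sketch: the first query counts heartbeat rows per segment, and the second counts primary segments so the two results can be compared. The filter content >= 0 is used here to exclude the master.

```sql
-- Heartbeat rows per segment; the intent of step 3 is one row on each segment.
select gp_segment_id, count(*) as row_count
from xdual
group by gp_segment_id
order by gp_segment_id;

-- Number of primary segments, for comparison with the row count above.
select count(*) as primary_segments
from gp_segment_configuration
where role = 'p' and content >= 0;
```

If the first query returns fewer rows than primary_segments, some segment has no heartbeat row and would not be exercised by the update.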

4) Heartbeat check

Run update xdual set update_time=now();

As long as this statement completes normally, every segment is working.
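To turn a hung heartbeat into an error that a scheduler can alert on, the update can be bounded with statement_timeout. This is a sketch under assumptions: the 30-second limit is arbitrary, and the job that runs the script (cron or a similar tool, not shown) is assumed to treat a non-zero exit or an error message as an alert.

```sql
-- Abort the heartbeat if it takes longer than 30 seconds (value in milliseconds).
set statement_timeout = 30000;

-- Touches the row on every segment; a hung segment makes this statement time out.
update xdual set update_time = now();

-- Restore the session default.
reset statement_timeout;
```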