Redis 2.6 High Availability Solution [3]

Sentinels and Slaves auto discovery

Sentinels need to exchange information and check each other's availability, but you do not have to configure the address of every running Sentinel instance on each of them: as long as they all monitor the same master, Sentinels discover one another via the Redis master's Pub/Sub facility.

This works by exchanging Hello Messages on a channel named __sentinel__:hello.

Similarly, you do not need to configure the list of slaves attached to a monitored master: Sentinel automatically queries the Redis master for its slave list.

Sentinel Rule #8: Every five seconds, every Sentinel publishes a message to the Pub/Sub channel __sentinel__:hello of every monitored master, announcing its ip, port, runid, and whether it is able to perform failovers (can-failover, configured in sentinel.conf).

Sentinel Rule #9: Every Sentinel subscribes to the Pub/Sub channel __sentinel__:hello of every monitored master, looking for Sentinels it does not yet know about. Newly detected Sentinels are added to the list of Sentinels attached to that master.

Sentinel Rule #10: Before adding a new Sentinel to a master's Sentinel list, a Sentinel checks whether a Sentinel with the same runid or the same address (ip and port pair) already exists. In that case all the matching old Sentinels are removed, and the new one is added.
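Rules #9 and #10 amount to simple bookkeeping over the list of known Sentinels. A minimal sketch of the deduplication step, assuming a small illustrative record type (`SentinelInfo` and `add_sentinel` are my own names, not Redis internals):

```python
from dataclasses import dataclass

@dataclass
class SentinelInfo:
    runid: str
    ip: str
    port: int

def add_sentinel(known, new):
    """Add `new` to the list `known`, first removing any Sentinel that
    shares its runid or its ip:port pair (Sentinel Rule #10)."""
    known[:] = [s for s in known
                if s.runid != new.runid and (s.ip, s.port) != (new.ip, new.port)]
    known.append(new)
    return known

# A restarted Sentinel announces itself with a new runid but the same
# address; the stale entry is dropped and the new one takes its place.
sentinels = [SentinelInfo("abc", "10.0.0.1", 26379)]
add_sentinel(sentinels, SentinelInfo("def", "10.0.0.1", 26379))
```

This is exactly the situation that produces the -dup-sentinel event described later: the matching old entries are reported as removed.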

Sentinel API

By default Sentinel listens on TCP port 26379 (note that 6379 is the normal Redis port). Sentinel speaks the Redis protocol, so you can use redis-cli or any other Redis client to talk with it.

There are two ways to talk with Sentinel: it is possible to directly query it to check what is the state of the monitored Redis instances from its point of view, to see what other Sentinels it knows, and so forth. The other way is to receive push-style notifications via Pub/Sub whenever something happens, such as a failover, or an instance entering an error condition.

Sentinel commands

The following is a list of accepted commands:

  • PING this command simply returns PONG.
  • SENTINEL masters show a list of monitored masters and their state.
  • SENTINEL slaves <master name> show a list of slaves for this master, and their state.
  • SENTINEL is-master-down-by-addr <ip> <port> return a two elements multi bulk reply where the first is 0 or 1 (0 if the master with that address is known and is in SDOWN state, 1 otherwise). The second element of the reply is the subjective leader for this master, that is, the runid of the Redis Sentinel instance that should perform the failover accordingly to the queried instance.
  • SENTINEL get-master-addr-by-name <master name> return the ip and port number of the master with that name. If a failover is in progress or terminated successfully for this master it returns the address and port of the promoted slave.
  • SENTINEL reset <pattern> this command will reset all the masters with matching name. The pattern argument is a glob-style pattern. The reset process clears any previous state in a master (including a failover in progress), and removes every slave and sentinel already discovered and associated with the master.
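Since Sentinel speaks the ordinary Redis protocol, any client able to encode a command as a RESP multi-bulk array can issue the commands above. A minimal sketch of the encoding (RESP as used by Redis 2.6; sending the bytes over a socket to port 26379 is left out):

```python
def encode_resp(args):
    """Encode a command as a RESP array of bulk strings:
    *<argc>\r\n followed by $<len>\r\n<arg>\r\n for each argument."""
    out = f"*{len(args)}\r\n"
    for a in args:
        data = a.encode()
        out = out + f"${len(data)}\r\n" + a + "\r\n"
    return out.encode()

# The request a client would send for one of the commands listed above:
req = encode_resp(["SENTINEL", "get-master-addr-by-name", "mymaster"])
```

The reply comes back in the same protocol, e.g. a two-element multi bulk holding the master's ip and port.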

Pub/Sub Messages

A client can use a Sentinel as if it were a Redis compatible Pub/Sub server (but you can't use PUBLISH) in order to SUBSCRIBE or PSUBSCRIBE to channels and get notified about specific events.

The channel name is the same as the name of the event. For instance the channel named +sdown will receive all the notifications related to instances entering an SDOWN condition.

To get all the messages simply subscribe using PSUBSCRIBE *.

The following is a list of channels and message formats you can receive using this API. The first word is the channel / event name, the rest is the format of the data.

Note: where instance details is specified it means that the following arguments are provided to identify the target instance:

<instance-type> <name> <ip> <port> @ <master-name> <master-ip> <master-port>

The part identifying the master (from the @ argument to the end) is optional and is only specified if the instance is not a master itself.
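A sketch of parsing the instance details portion of an event payload, following the format above (the function and field names are my own, for illustration only):

```python
def parse_instance_details(payload):
    """Split '<instance-type> <name> <ip> <port> [@ <master-name>
    <master-ip> <master-port>]' into a dict. The @ part is optional
    and present only when the instance is not itself a master."""
    head, _, tail = payload.partition(" @ ")
    itype, name, ip, port = head.split()
    info = {"type": itype, "name": name, "ip": ip, "port": int(port)}
    if tail:
        mname, mip, mport = tail.split()
        info["master"] = {"name": mname, "ip": mip, "port": int(mport)}
    return info

# A slave event carries the @ part pointing at its master:
slave = parse_instance_details("slave slave0 10.0.0.2 6379 @ mymaster 10.0.0.1 6379")
# A master event has no @ part:
master = parse_instance_details("master mymaster 10.0.0.1 6379")
```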

  • +reset-master <instance details> -- The master was reset.
  • +slave <instance details> -- A new slave was detected and attached.
  • +failover-state-reconf-slaves <instance details> -- Failover state changed to reconf-slaves state.
  • +failover-detected <instance details> -- A failover started by another Sentinel or any other external entity was detected (An attached slave turned into a master).
  • +slave-reconf-sent <instance details> -- The leader sentinel sent the SLAVEOF command to this instance in order to reconfigure it for the new master.
  • +slave-reconf-inprog <instance details> -- The slave being reconfigured showed to be a slave of the new master ip:port pair, but the synchronization process is not yet complete.
  • +slave-reconf-done <instance details> -- The slave is now synchronized with the new master.
  • -dup-sentinel <instance details> -- One or more sentinels for the specified master were removed as duplicated (this happens for instance when a Sentinel instance is restarted).
  • +sentinel <instance details> -- A new sentinel for this master was detected and attached.
  • +sdown <instance details> -- The specified instance is now in Subjectively Down state.
  • -sdown <instance details> -- The specified instance is no longer in Subjectively Down state.
  • +odown <instance details> -- The specified instance is now in Objectively Down state.
  • -odown <instance details> -- The specified instance is no longer in Objectively Down state.
  • +failover-takedown <instance details> -- 25% of the configured failover timeout has elapsed, but this sentinel can't see any progress, and is the new leader. It starts to act as the new leader reconfiguring the remaining slaves to replicate with the new master.
  • +failover-triggered <instance details> -- We are starting a new failover as the leader sentinel.
  • +failover-state-wait-start <instance details> -- New failover state is wait-start: we are waiting a fixed number of seconds, plus a random number of seconds before starting the failover.
  • +failover-state-select-slave <instance details> -- New failover state is select-slave: we are trying to find a suitable slave for promotion.
  • no-good-slave <instance details> -- There is no good slave to promote. Currently we'll retry after some time, but this will probably change so that the state machine aborts the failover entirely in this case.
  • selected-slave <instance details> -- We found the specified good slave to promote.
  • failover-state-send-slaveof-noone <instance details> -- We are trying to reconfigure the promoted slave as master, waiting for it to switch.
  • failover-end-for-timeout <instance details> -- The failover terminated for timeout. If we are the failover leader, we sent a best effort SLAVEOF command to all the slaves yet to reconfigure.
  • failover-end <instance details> -- The failover terminated with success. All the slaves appear to be reconfigured to replicate with the new master.
  • switch-master <master name> <oldip> <oldport> <newip> <newport> -- We are starting to monitor the new master, using the same name of the old one. The old master will be completely removed from our tables.
  • failover-abort-x-sdown <instance details> -- The failover was undone (aborted) because the promoted slave appears to be in extended SDOWN state.
  • -slave-reconf-undo <instance details> -- The failover aborted so we sent a SLAVEOF command to the specified instance to reconfigure it back to the original master instance.
  • +tilt -- Tilt mode entered.
  • -tilt -- Tilt mode exited.
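Because the channel name equals the event name, a client subscribed with PSUBSCRIBE * can dispatch on the channel it received each message from. A minimal sketch of such a dispatcher (the handler registry is illustrative; in a real client the channel/payload pairs would come from the Pub/Sub connection):

```python
handlers = {}

def on(event):
    """Register a handler for one of the event channels listed above."""
    def register(fn):
        handlers[event] = fn
        return fn
    return register

@on("+sdown")
def sdown(payload):
    return f"instance subjectively down: {payload}"

@on("switch-master")
def switch_master(payload):
    # Payload format: <master name> <oldip> <oldport> <newip> <newport>
    name, oldip, oldport, newip, newport = payload.split()
    return f"{name} moved from {oldip}:{oldport} to {newip}:{newport}"

def dispatch(channel, payload):
    """Route a Pub/Sub message to its handler; unknown events are ignored."""
    fn = handlers.get(channel)
    return fn(payload) if fn else None
```

A client that only cares about master address changes would register just the switch-master handler and ignore everything else.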

Original article: https://www.cnblogs.com/AI001/p/3996922.html