[ZooKeeper] 2 Environment Setup

The previous post introduced some basic ZooKeeper concepts; this post covers how to set up a ZooKeeper environment.
 
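ZooKeeper installation modes
  • Standalone mode: ZooKeeper runs on a single server; suitable for test environments;
  • Pseudo-cluster mode: multiple ZooKeeper instances run on one physical machine;
  • Cluster mode: ZooKeeper runs on a group of machines, called an ensemble; suitable for production environments.
ZooKeeper achieves high availability through replication: the service stays available as long as more than half of the machines in the ensemble are up. This works because ZooKeeper's replication strategy guarantees that every modification to the znode tree is replicated to more than half of the machines in the ensemble.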
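
The majority rule can be made concrete with a little arithmetic. The following is a minimal sketch in Python; the function names quorum_size and tolerated_failures are illustrative and are not part of ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that is strictly more than half of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this rule a 4-server ensemble tolerates the same single failure as a 3-server ensemble, which is one reason ensembles are usually sized with an odd number of servers.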
 

Preparation

  1. Download: http://zookeeper.apache.org/releases.html. This article uses zookeeper-3.4.11.tar.gz as the example.
  2. JDK setup: http://www.cnblogs.com/memento/p/8660021.html
 

Configuration on Windows

Standalone mode (suitable for development environments)

1. Extract the downloaded archive zookeeper-3.4.11.tar.gz to C:\solrCloud\zk_server_single (referred to below as %ZK_HOME%);
2. Save a copy of %ZK_HOME%/conf/zoo_sample.cfg as zoo.cfg and edit that configuration file:
# ----------------------------------------------------------------------
# Basic configuration (minimum configuration)
# ----------------------------------------------------------------------

# the port at which the clients will connect
# Port on which to listen for client connections
clientPort=2181

# The number of milliseconds of each tick
# Interval at which heartbeats are exchanged between servers and between clients and servers;
# the minimum session timeout is 2 x tickTime.
tickTime=2000

# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
# Where snapshots of the in-memory database are stored; unless configured otherwise, the transaction log of updates is stored here as well
dataDir=../data
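
To see what this minimal configuration implies, the sketch below parses the key=value lines and derives the default session timeout bounds from tickTime (2 ticks minimum, 20 ticks maximum). It is an illustrative Python helper, not ZooKeeper's own parser; parse_zoo_cfg and default_session_timeout_bounds_ms are assumed names.

```python
def parse_zoo_cfg(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def default_session_timeout_bounds_ms(settings: dict) -> tuple:
    """Default (min, max) session timeout: 2 and 20 ticks respectively."""
    tick_ms = int(settings["tickTime"])
    return 2 * tick_ms, 20 * tick_ms


# The three settings from the minimal standalone configuration.
SAMPLE_CFG = """\
# minimal standalone configuration
clientPort=2181
tickTime=2000
dataDir=../data
"""

if __name__ == "__main__":
    cfg = parse_zoo_cfg(SAMPLE_CFG)
    print(cfg)
    print(default_session_timeout_bounds_ms(cfg))
```

With tickTime=2000, a client's requested session timeout would by default be kept between 4000 ms and 40000 ms.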
 
3. Then run %ZK_HOME%/bin/zkServer.cmd to start the server;
 
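4. Because this is standalone mode, ZooKeeper has no other machines to replicate update transactions to, so if the ZooKeeper process fails the service goes down. This setup is therefore only suitable as a development environment.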
 
5. To connect to the ZooKeeper server, use %ZK_HOME%/bin/zkCli.cmd as the client:
bin\zkCli.cmd -server 127.0.0.1:2181
If "Welcome to ZooKeeper!" and "JLine support is enabled" appear, the connection succeeded.
You can also check that ZooKeeper started properly by using netstat to see whether port 2181 is in use, or by using jps to list the running Java processes.
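
The port check can also be scripted. The sketch below is a small Python stand-in for the netstat check: it only tests whether something accepts TCP connections on the client port, and does not prove that the listener is ZooKeeper. The helper name is_port_open is illustrative.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print("ZooKeeper client port reachable:", is_port_open("127.0.0.1", 2181))
```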
 
6. Type help to list the commands the ZooKeeper client supports:
[zk: 127.0.0.1:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
        stat path [watch]
        set path data [version]
        ls path [watch]
        delquota [-n|-b] path
        ls2 path [watch]
        setAcl path acl
        setquota -n|-b val path
        history
        redo cmdno
        printwatches on|off
        delete path [version]
        sync path
        listquota path
        rmr path
        get path [watch]
        create [-s] [-e] path data acl
        addauth scheme auth
        quit
        getAcl path
        close
        connect host:port
 
Here are a few examples of ZooKeeper commands:
[zk: 127.0.0.1:2181(CONNECTED) 1] ls /
[zookeeper]

[zk: 127.0.0.1:2181(CONNECTED) 2] create /zk_test my_data
Created /zk_test

[zk: 127.0.0.1:2181(CONNECTED) 3] ls /
[zookeeper, zk_test]

[zk: 127.0.0.1:2181(CONNECTED) 4] get /zk_test
my_data
cZxid = 0x2a
ctime = Wed Apr 11 10:49:31 CST 2018
mZxid = 0x2a
mtime = Wed Apr 11 10:49:31 CST 2018
pZxid = 0x2a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0

[zk: 127.0.0.1:2181(CONNECTED) 5] set /zk_test junk
cZxid = 0x2a
ctime = Wed Apr 11 10:49:31 CST 2018
mZxid = 0x2b
mtime = Wed Apr 11 10:50:33 CST 2018
pZxid = 0x2a
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0

[zk: 127.0.0.1:2181(CONNECTED) 6] delete /zk_test

[zk: 127.0.0.1:2181(CONNECTED) 7] ls /
[zookeeper]
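
To make it easier to see what the set command changed, the sketch below parses the two stat listings from the session above and reports the fields that differ. It is an illustrative Python helper that works on the printed text only; parse_stat and changed_fields are assumed names, not part of the ZooKeeper client.

```python
def parse_stat(text: str) -> dict:
    """Parse 'name = value' lines from zkCli output into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def changed_fields(before: dict, after: dict) -> dict:
    """Map each field whose value differs to its (before, after) pair."""
    return {k: (before[k], after[k])
            for k in before if k in after and before[k] != after[k]}


# Output of "get /zk_test" copied from the session above.
AFTER_GET = """\
my_data
cZxid = 0x2a
ctime = Wed Apr 11 10:49:31 CST 2018
mZxid = 0x2a
mtime = Wed Apr 11 10:49:31 CST 2018
pZxid = 0x2a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0
"""

# Output of "set /zk_test junk" copied from the session above.
AFTER_SET = """\
cZxid = 0x2a
ctime = Wed Apr 11 10:49:31 CST 2018
mZxid = 0x2b
mtime = Wed Apr 11 10:50:33 CST 2018
pZxid = 0x2a
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0
"""

if __name__ == "__main__":
    diff = changed_fields(parse_stat(AFTER_GET), parse_stat(AFTER_SET))
    for name, (old, new) in diff.items():
        print(f"{name}: {old} -> {new}")
```

Only the modification fields move: mZxid and mtime record the new transaction, dataVersion goes from 0 to 1, and dataLength drops from 7 (my_data) to 4 (junk), while the creation fields stay the same.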
 
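ZooKeeper also responds to a set of four-letter commands sent to the client port:
Command  Description
conf     Print details of the server configuration.
cons     List the connections or sessions of all clients connected to this server, including the number of packets sent/received, session id, operation latencies, last operation performed, and so on.
crst     Reset the statistics of all connections or sessions.
dump     List the outstanding sessions and ephemeral nodes. Only works on the leader.
envi     Print details of the server environment.
ruok     Test whether the server is running in a non-error state. It replies "imok" if so and does not respond at all otherwise. A reply of "imok" only means that the server process is alive and bound to the specified client port; it does not mean the server has joined the ensemble.
srst     Reset the server statistics.
srvr     List full details of the server.
stat     List brief details of the server and its connected clients.
wchs     List brief information about the watches on the server.
wchc     List detailed information about the watches on the server, by session. Outputs a list of sessions (connections) with their associated watches.
wchp     List detailed information about the watches on the server, by path. Outputs a list of paths (znodes) with their associated sessions.
mntr     Output a list of variables that can be used to monitor the health of the cluster.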
 
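On Windows, sending these commands requires downloading netcat for Windows and adding the directory that contains nc.exe to the PATH environment variable.
C:\solrCloud\zk_server_fake\bin>echo mntr | nc localhost 2181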
zk_version      3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received     7
zk_packets_sent 6
zk_num_alive_connections        1
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count  4
zk_watch_count  0
zk_ephemerals_count     0
zk_approximate_data_size        27
C:\solrCloud\zk_server_fake\bin>echo ruok | nc localhost 2181
imok
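
If installing netcat is inconvenient, the same exchange can be sketched with a plain TCP socket. The Python helper below (four_letter_word is an assumed name) sends the command to the client port and reads until the server closes the connection; it assumes the server accepts the command on that port.

```python
import socket


def four_letter_word(host: str, port: int, command: str, timeout: float = 5.0) -> str:
    """Send a four-letter command and return the server's full reply as text."""
    if len(command) != 4:
        raise ValueError("expected a four-letter command such as 'ruok'")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection once the reply is complete.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(four_letter_word("localhost", 2181, "ruok"))
```

Against a healthy server, the ruok call would be expected to return the same imok reply shown in the nc session.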
 

Pseudo-cluster mode

1. Save a copy of the C:\solrCloud\zk_server_single folder configured above as C:\solrCloud\zk_server_fake (referred to below as %ZK_HOME%);
 
2. In pseudo-cluster mode each configuration file simulates one server, so make three copies of %ZK_HOME%\conf\zoo.cfg named zoo1.cfg, zoo2.cfg and zoo3.cfg, configured as follows:
zoo1.cfg
# ----------------------------------------------------------------------
# Basic configuration (minimum configuration)
# ----------------------------------------------------------------------

# the port at which the clients will connect
# Port on which to listen for client connections
clientPort=2181

# The number of milliseconds of each tick
# Interval at which heartbeats are exchanged between servers and between clients and servers;
# the minimum session timeout is 2 x tickTime.
tickTime=2000

# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
# Where snapshots of the in-memory database are stored; unless configured otherwise, the transaction log of updates is stored here as well
dataDir=../data1

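# ----------------------------------------------------------------------
# Advanced configuration
# ----------------------------------------------------------------------

# Where the transaction log is stored. Keeping it apart from the default dataDir location avoids contention between logging and snapshots
dataLogDir=../log1

# Java system property: zookeeper.globalOutstandingLimit
# Clients can submit requests much faster than ZooKeeper can process them, especially when there are many clients.
# To keep ZooKeeper from running out of memory because of queued requests, it throttles clients so that there are no more than globalOutstandingLimit outstanding requests in the system.
# Default: 1000
# globalOutstandingLimit=1000

# Java system property: zookeeper.preAllocSize
# To avoid disk seeks, ZooKeeper allocates space in the transaction log file in blocks of preAllocSize kilobytes. The default block size is 64M.
# If snapshots are taken frequently, this value can be lowered to reduce the block size.
# preAllocSize

# Java system property: zookeeper.snapCount
# ZooKeeper records its transactions using snapshots and a transaction log. snapCount determines how many transactions are written to the transaction log before a snapshot is taken.
# To keep all the machines in the ensemble from taking a snapshot at the same time, each ZooKeeper server takes a snapshot only once the number of transactions in its log reaches
# a random value, generated at runtime, in the range [snapCount/2+1, snapCount]. Default: 100000
# snapCount=100000

# the maximum number of client connections.
# increase this if you need to handle more clients
# Limits the number of concurrent connections that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble.
# This helps prevent certain kinds of DoS attacks, including file descriptor exhaustion. Default: 60.
# Setting it to 0 removes the limit on concurrent connections.
# maxClientCnxns=60

# New in 3.3.0
# Minimum session timeout; defaults to 2*tickTime
# minSessionTimeout
# Maximum session timeout; defaults to 20*tickTime
# maxSessionTimeout

# New in 3.4.0
# The number of snapshots to retain in dataDir
# autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
# autopurge.purgeInterval=1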

# ----------------------------------------------------------------------
# Cluster configuration
# ----------------------------------------------------------------------

# The number of ticks that the initial 
# synchronization phase can take
# Time allowed for a follower to connect to and sync with the leader, measured in ticks; the total is initLimit*tickTime milliseconds
initLimit=10

# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
# Time allowed between a request and its acknowledgement when the leader and a follower exchange messages, measured in ticks; the total is syncLimit*tickTime milliseconds
syncLimit=5

# A: a positive integer, the id of the server
# B: the IP address (or hostname) of the server
# C: the port followers use to connect to the leader
# D: the port used for leader election
# server.A=B:C:D
server.1=localhost:2287:3387
server.2=localhost:2288:3388
server.3=localhost:2289:3389
zoo2.cfg is identical to zoo1.cfg except for the following settings:
clientPort=2182
dataDir=../data2
dataLogDir=../log2
zoo3.cfg is identical to zoo1.cfg except for the following settings:
clientPort=2183
dataDir=../data3
dataLogDir=../log3
Pay attention to the clientPort, dataDir and dataLogDir settings: each ZooKeeper server needs its own values.
At this point, manually create six folders: data1, data2 and data3, plus log1, log2 and log3.
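
Since the three files differ only in a few values, the per-instance settings can also be derived from the server id. The sketch below is a hypothetical Python generator (server_lines, instance_overrides and render_cfg are assumed names) that reproduces the ports and directories used in this article; it emits only the required settings, not the commented-out advanced ones.

```python
def server_lines(count: int, host: str = "localhost",
                 first_peer_port: int = 2287, first_election_port: int = 3387) -> list:
    """Build server.A=B:C:D lines with distinct ports per instance."""
    return [
        f"server.{i}={host}:{first_peer_port + i - 1}:{first_election_port + i - 1}"
        for i in range(1, count + 1)
    ]


def instance_overrides(server_id: int, first_client_port: int = 2181) -> dict:
    """Settings that must differ between instances on the same machine."""
    return {
        "clientPort": str(first_client_port + server_id - 1),
        "dataDir": f"../data{server_id}",
        "dataLogDir": f"../log{server_id}",
    }


def render_cfg(server_id: int, count: int = 3) -> str:
    """Render the required settings of zoo<server_id>.cfg as text."""
    shared = {"tickTime": "2000", "initLimit": "10", "syncLimit": "5"}
    settings = {**instance_overrides(server_id), **shared}
    lines = [f"{key}={value}" for key, value in settings.items()]
    return "\n".join(lines + server_lines(count)) + "\n"


if __name__ == "__main__":
    for sid in (1, 2, 3):
        print(f"--- zoo{sid}.cfg ---")
        print(render_cfg(sid))
```

The server.N lines are the same in every file; only clientPort, dataDir and dataLogDir change with the server id.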
 
3. In each data directory, create a file named myid containing 1, 2 or 3 respectively. The number matches the x in server.x and identifies that ZooKeeper server.
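
Creating the folders and myid files by hand is easy to get wrong, so here is a small Python sketch of the same steps. The function name prepare_instance_dirs is an assumption, and the demo writes into a temporary directory rather than a real %ZK_HOME%.

```python
from pathlib import Path


def prepare_instance_dirs(zk_home: Path, count: int = 3) -> None:
    """Create dataN/logN folders and write each server id into dataN/myid."""
    for server_id in range(1, count + 1):
        data_dir = zk_home / f"data{server_id}"
        log_dir = zk_home / f"log{server_id}"
        data_dir.mkdir(parents=True, exist_ok=True)
        log_dir.mkdir(parents=True, exist_ok=True)
        # myid holds only the number that matches server.N in the config.
        (data_dir / "myid").write_text(str(server_id), encoding="ascii")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        prepare_instance_dirs(Path(tmp))
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Because dataDir is given as ../dataN relative to the bin folder, the folders are created directly under the ZooKeeper home directory.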
 
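4. Then make three copies of %ZK_HOME%/bin/zkServer.cmd named zkServer1.cmd, zkServer2.cmd and zkServer3.cmd to simulate starting three ZooKeeper servers. In each file, add a setting that points to the matching configuration file: set ZOOCFG=..\conf\zooX.cfg, where X is the number of that server's configuration file from step 2.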
 
5. Finally, start the three ZooKeeper servers.
Start zkServer1.cmd first:
C:\solrCloud\zk_server_fake\bin>zkServer1.cmd

C:\solrCloud\zk_server_fake\bin>call "C:\Program Files\Java\jdk1.8.0_162"\bin\java "-Dzookeeper.log.dir=C:\solrCloud\zk_server_fake\bin\.." "-Dzookeeper.root.logger=INFO,CONSOLE" -cp "C:\solrCloud\zk_server_fake\bin\..\build\classes;C:\solrCloud\zk_server_fake\bin\..\build\lib\*;C:\solrCloud\zk_server_fake\bin\..\*;C:\solrCloud\zk_server_fake\bin\..\lib\*;C:\solrCloud\zk_server_fake\bin\..\conf" org.apache.zookeeper.server.quorum.QuorumPeerMain "..\conf\zoo1.cfg"
2018-04-11 11:46:40,470 [myid:] - INFO  [main:QuorumPeerConfig@136] - Reading configuration from: ..\conf\zoo1.cfg
2018-04-11 11:46:40,489 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: localhost to address: localhost/127.0.0.1
2018-04-11 11:46:40,489 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: localhost to address: localhost/127.0.0.1
2018-04-11 11:46:40,491 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: localhost to address: localhost/127.0.0.1
2018-04-11 11:46:40,491 [myid:] - INFO  [main:QuorumPeerConfig@398] - Defaulting to majority quorums
2018-04-11 11:46:40,503 [myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2018-04-11 11:46:40,503 [myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2018-04-11 11:46:40,503 [myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2018-04-11 11:46:40,560 [myid:1] - INFO  [main:QuorumPeerMain@130] - Starting quorum peer
2018-04-11 11:46:40,746 [myid:1] - INFO  [main:ServerCnxnFactory@117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2018-04-11 11:46:40,747 [myid:1] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2018-04-11 11:46:40,753 [myid:1] - INFO  [main:QuorumPeer@1158] - tickTime set to 2000
2018-04-11 11:46:40,753 [myid:1] - INFO  [main:QuorumPeer@1204] - initLimit set to 10
2018-04-11 11:46:40,753 [myid:1] - INFO  [main:QuorumPeer@1178] - minSessionTimeout set to -1
2018-04-11 11:46:40,753 [myid:1] - INFO  [main:QuorumPeer@1189] - maxSessionTimeout set to -1
2018-04-11 11:46:40,760 [myid:1] - INFO  [main:QuorumPeer@1467] - QuorumPeer communication is not secured!
2018-04-11 11:46:40,761 [myid:1] - INFO  [main:QuorumPeer@1496] - quorum.cnxn.threads.size set to 20
2018-04-11 11:46:40,764 [myid:1] - INFO  [main:QuorumPeer@668] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2018-04-11 11:46:40,771 [myid:1] - INFO  [main:QuorumPeer@683] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2018-04-11 11:46:40,781 [myid:1] - INFO  [ListenerThread:QuorumCnxManager$Listener@736] - My election bind port: localhost/127.0.0.1:3387
2018-04-11 11:46:40,789 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@909] - LOOKING
2018-04-11 11:46:40,790 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@820] - New election. My id =  1, proposed zxid=0x0
2018-04-11 11:46:40,792 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@602] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
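At this point, warnings like the following report that the channels to server 2 and server 3 cannot be opened, because servers 2 and 3 have not been started yet.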
2018-04-11 11:48:16,324 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@584] - Cannot open channel to 2 at election address localhost/127.0.0.1:3388
java.net.ConnectException: Connection refused: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:610)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:845)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:957)
2018-04-11 11:48:32,332 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@184] - Resolved hostname: localhost to address: localhost/127.0.0.1
2018-04-11 11:48:33,338 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@584] - Cannot open channel to 3 at election address localhost/127.0.0.1:3389
java.net.ConnectException: Connection refused: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:610)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:845)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:957)
2018-04-11 11:48:33,338 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@184] - Resolved hostname: localhost to address: localhost/127.0.0.1
2018-04-11 11:48:33,340 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@854] - Notification time out: 51200
 
Next, start zkServer2.cmd in the same way. zkServer1 still reports that it cannot open the channel to server 3, but it now reports that it has connected to server 2.
 
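zkServer2.cmd in turn reports that it cannot open the channel to server 3, and then runs a leader election with server 1, to which it is connected, resulting in one leader and one follower.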
 
Finally, start zkServer3.cmd. No more exceptions are reported, and the third server takes part in leader election as well, leaving one leader and two followers.
Note that on Windows zkServer.cmd cannot be used to check a server's status; you need to install Cygwin and check the status of the three servers from there.
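
As an alternative sketch that avoids Cygwin, each server's role can be read from the zk_server_state line of its mntr output, which can be fetched with nc as in the standalone section. The Python helpers below (parse_mntr, server_roles, has_single_leader) are assumed names, and the sample replies are shortened, hypothetical text rather than captured output.

```python
def parse_mntr(text: str) -> dict:
    """Parse mntr output: each line is a key, whitespace, then the value."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            metrics[parts[0]] = parts[1].strip()
    return metrics


def server_roles(mntr_by_port: dict) -> dict:
    """Map each client port to the zk_server_state its server reported."""
    return {
        port: parse_mntr(text).get("zk_server_state", "unknown")
        for port, text in mntr_by_port.items()
    }


def has_single_leader(roles: dict) -> bool:
    """A healthy ensemble is expected to report exactly one leader."""
    return list(roles.values()).count("leader") == 1


# Hypothetical, shortened mntr replies for the three client ports.
SAMPLE_REPLIES = {
    2181: "zk_znode_count\t4\nzk_server_state\tfollower\n",
    2182: "zk_znode_count\t4\nzk_server_state\tleader\n",
    2183: "zk_znode_count\t4\nzk_server_state\tfollower\n",
}

if __name__ == "__main__":
    roles = server_roles(SAMPLE_REPLIES)
    print(roles, has_single_leader(roles))
```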
 

Cluster mode

The configuration is the same as for the pseudo-cluster, except that each configuration file (zoo.cfg) is deployed on a separate server.
 


 
by. Memento
Original article: https://www.cnblogs.com/memento/p/8881944.html