ClickHouse High-Availability Cluster

I. Cluster Overview

ZooKeeper should be deployed on separate machines, so that a failed ClickHouse node cannot also take down ZooKeeper.
0. HA principle: ZooKeeper + ReplicatedMergeTree (replicated tables) + Distributed (distributed tables)
1. Prerequisites: on every node, disable the firewall or open the required ports; set up mutual SSH trust (passwordless login) between all nodes; make sure /etc/hosts and the hostnames are correct and consistent across the whole cluster, because ZooKeeper returns hostnames, and replicated tables will fail if they are misconfigured or missing.
ClickHouse test nodes (4): 10.0.0.236 cdhserver1 (clickhouse1), 10.0.0.237 cdhserver2 (clickhouse2), 10.0.0.238 cdhserver3 (clickhouse3), 10.0.0.239 cdhserver4 (clickhouse4)
ZooKeeper test nodes (3): 10.0.0.237 cdhserver2 (zookeeper), 10.0.0.238 cdhserver3 (zookeeper), 10.0.0.239 cdhserver4 (zookeeper)
Plan: each of the 4 nodes is configured with ClickHouse instances that back each other up.
cdhserver1: instance 1, ports: tcp 9006, http 8123, replication port 9006, role: master node
cdhserver2: instance 2, ports: tcp 9006, http 8123, replication port 9006, role: shard 1, replica 1
cdhserver3: instance 1, ports: tcp 9006, http 8123, replication port 9006, role: shard 2, replica 1
cdhserver4: instance 2, ports: tcp 9006, http 8123, replication port 9006, role: shard 3, replica 1

II. Environment Preparation
1) Host configuration (adjust to your own environment)

10.0.0.236 cdhserver1 centos 7.1 32G 200G
10.0.0.237 cdhserver2 centos 7.1 32G 200G
10.0.0.238 cdhserver3 centos 7.1 32G 200G
10.0.0.239 cdhserver4 centos 7.1 32G 200G

2) /etc/hosts and hostnames
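A minimal /etc/hosts sketch, using the addresses and hostnames from section I (it should be identical on every node):
10.0.0.236 cdhserver1
10.0.0.237 cdhserver2
10.0.0.238 cdhserver3
10.0.0.239 cdhserver4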

3) Disable the firewall on all nodes, or open the required ports
# 1. Disable the firewall
service iptables stop
chkconfig iptables off
chkconfig ip6tables off
# 2. Disable SELinux
Set SELINUX=disabled in /etc/selinux/config, then reboot
[root@cdhserver1 ~]# vim /etc/selinux/config
SELINUX=disabled
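The iptables commands above come from CentOS 6-style init scripts; on a stock CentOS 7 install the default firewall is firewalld, so the equivalent commands (assuming firewalld is in use) are:
systemctl stop firewalld
systemctl disable firewalld
# SELinux can also be turned off for the current boot without rebooting:
setenforce 0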

4) Raise the open-file limits on all nodes
Append the following to the end of /etc/security/limits.conf and /etc/security/limits.d/90-nproc.conf:
[root@cdhserver1 software]# vim /etc/security/limits.conf
Append at the end of the file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
[root@cdhserver1 software]# vim /etc/security/limits.d/90-nproc.conf
Append at the end of the file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
After rebooting, check the result with ulimit -n or ulimit -a:
[root@cdhserver1 ~]# ulimit -n
65536

5) Set up mutual SSH trust (passwordless login) between all nodes
1. On each node, generate an RSA key pair with ssh-keygen
ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
2. Collect all public keys into a single authorized_keys file. Run the following on cdhserver1, one line at a time:
ssh cdhserver1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver3 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver4 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Set the permissions on the authorized_keys file. Run the following on cdhserver1:
chmod 600 ~/.ssh/authorized_keys
4. Distribute the authorized_keys file to all servers, one line at a time:
scp ~/.ssh/authorized_keys cdhserver1:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver2:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver3:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver4:~/.ssh/
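A quick check that passwordless login works (a sketch, run from cdhserver1):
ssh cdhserver2 hostname
ssh cdhserver3 hostname
ssh cdhserver4 hostname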

III. Distributed Cluster Installation
1. Package preparation

Notes:
1) Installation packages: https://packagecloud.io/Altinity/clickhouse.
2) They can also be downloaded from CSDN: https://download.csdn.net/download/u013205003/10968848
2. Install ClickHouse on all nodes (using CentOS 7.1 as the example)
1) Perform the following installation and configuration on each server
2) Install ClickHouse
Install libicu
mkdir -p /usr/local/icu/
cd /usr/local/icu/    # place libicu-4.2.1-14.el6.x86_64.rpm in this directory first
rpm -ivh libicu-4.2.1-14.el6.x86_64.rpm
Install ClickHouse
rpm -ivh clickhouse-server-common-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-compressor-1.1.54336-3.el6.x86_64.rpm
rpm -ivh clickhouse-common-static-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-server-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-client-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-test-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-debuginfo-18.14.12-1.el6.x86_64.rpm
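A quick check that all packages installed correctly (a sketch):
rpm -qa | grep clickhouse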

3. Edit the configuration files
1) On all four nodes, edit config.xml
[root@cdhserver1 ~]# vim /etc/clickhouse-server/config.xml
<http_port>8123</http_port>
<tcp_port>9006</tcp_port>
<listen_host>::</listen_host>
<!-- <listen_host>::1</listen_host> -->
<!-- <listen_host>127.0.0.1</listen_host> -->

<!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
<!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->

2) On all four nodes, edit users.xml to add a user

[root@cdhserver1 ~]# vim /etc/clickhouse-server/users.xml

Edit /etc/clickhouse-server/users.xml
Add the following above the <!-- Example of user with readonly access. --> comment:

<ck>
    <password_sha256_hex>8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92</password_sha256_hex>
    <networks incl="networks" replace="replace">
    <ip>::/0</ip>
    </networks>
    <profile>default</profile>
    <quota>default</quota>
</ck>
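The password_sha256_hex above is the SHA-256 digest of the plain-text password 123456 that is also used in metrika.xml below. To use a different password, the digest can be generated with something like:
echo -n "123456" | sha256sum | awk '{print $1}'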

3) On all four nodes, create a metrika.xml file under /etc/clickhouse-server with the cluster sharding configuration
[root@cdhserver1 ~]# vim /etc/clickhouse-server/metrika.xml
Add the following content:

<?xml version="1.0"?>
<yandex>
<!-- ClickHouse cluster definitions -->
<clickhouse_remote_servers>
    <idc_cluster>
        <!-- shard 1 -->
        <shard>
            <weight>1</weight>
            <replica>
                <host>cdhserver2</host>
                <port>9006</port>
                <user>ck</user>
                <password>123456</password>
               <compression>true</compression>
            </replica>
        </shard>
        <!-- shard 2 -->
        <shard>
            <weight>1</weight>
            <replica>
                <host>cdhserver3</host>
                <port>9006</port>
                <user>ck</user>
                <password>123456</password>
                <compression>true</compression>
            </replica>
        </shard>
        <!-- shard 3 -->
        <shard>
            <weight>1</weight>
            <replica>
                <host>cdhserver4</host>
                <port>9006</port>
                <user>ck</user>
                <password>123456</password>
                <compression>true</compression>
            </replica>
        </shard>
    </idc_cluster>
    <idc_replicate_cluster>
        <!-- shard 1 -->
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>cdhserver2</host>
                <port>9006</port>
                <user>ck</user>
                <password>123456</password>
            </replica>
            <replica>
                <host>hadoop2</host>
                <port>9006</port>
                <user>ck</user>
                <password>123456</password>
            </replica>
        </shard>
        <!-- shard 2 -->
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>cdhserver3</host>
                <port>9006</port>
                <user>ck</user>
                <password>123456</password>
            </replica>
            <replica>
                <host>hadoop3</host>
                <port>9006</port>
                <user>ck</user>
                <password>123456</password>
            </replica>
        </shard>
        <!-- shard 3 -->
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>cdhserver4</host>
                <port>9006</port>
                <user>ck</user>
                <password>123456</password>
            </replica>
            <replica>
                <host>hadoop4</host>
                <port>9006</port>
                <user>ck</user>
                <password>123456</password>
            </replica>
        </shard>
    </idc_replicate_cluster>
</clickhouse_remote_servers>

<!-- ZooKeeper configuration -->
<zookeeper-servers>
    <node index="1">
        <host>cdhserver2</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>cdhserver3</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>cdhserver4</host>
        <port>2181</port>
    </node>
</zookeeper-servers>

<!-- shard and replica macros -->
<macros>
    <replica>cdhserver1</replica>
</macros>

<!-- allow remote access -->
<networks>
    <ip>::/0</ip>
</networks>

<!-- compression settings -->
<clickhouse_compression>
    <case>
        <min_part_size>10000000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>lz4</method> <!-- lz4 compresses faster than zstd but uses more disk -->
    </case>
</clickhouse_compression>
</yandex>
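Note: config.xml locates this substitutions file through its <include_from> setting. Older releases default to /etc/metrika.xml, so with the file placed under /etc/clickhouse-server/ as above, config.xml may need an explicit entry like the one below (a sketch; check the defaults of your version):
<include_from>/etc/clickhouse-server/metrika.xml</include_from>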

Note: the node-specific values above (in particular the <macros> section) must be adjusted on each node:

cdhserver2, shard 1, replica 1:

<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>cdhserver2</replica>
</macros>

cdhserver3, shard 2, replica 1:

<macros>
    <layer>01</layer>
    <shard>02</shard>
    <replica>cdhserver3</replica>
</macros>  

cdhserver4, shard 3, replica 1:

<macros>
    <layer>01</layer>
    <shard>03</shard>
    <replica>cdhserver4</replica>
</macros>
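Once the servers are started (next step), the macros and the cluster layout that are actually in effect can be verified from clickhouse-client, for example:
SELECT * FROM system.macros;
SELECT cluster, shard_num, replica_num, host_name, port FROM system.clusters;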

4. Startup script
Init script path: /etc/init.d/clickhouse-server

Start in the foreground:
[root@cdhserver1 software]# clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Start in the background:
[root@cdhserver1 software]# nohup clickhouse-server --config-file=/etc/clickhouse-server/config.xml >/dev/null 2>&1 &
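A quick liveness check once an instance is up (a sketch using the ports and the ck user configured above):
[root@cdhserver1 software]# curl http://127.0.0.1:8123/      # should print "Ok."
[root@cdhserver1 software]# clickhouse-client --port 9006 -u ck --password 123456 -q "SELECT version()"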

Copy and modify (for a second instance)
Go into the configuration directory, make a copy of the existing config file (here named config1.xml), and edit it:
In config1.xml, point the pid file at clickhouse-server-1.pid (a second instance on the same host also needs its own ports and data paths).

IV. Testing: create local replicated tables and a distributed table
1. After all instances are configured as above, start each of them, then run the following on every instance to create the database:
CREATE DATABASE monchickey;
2. Then create the corresponding replicated table on each instance; a simple table is created here for testing:

drop table if exists monchickey.t_co_dwfxdata_0809_30000000_new;
CREATE TABLE monchickey.t_co_dwfxdata_0809_30000000_new
(
    CRETIME Date,
    YEAR String,
    MONTH String,
    COMPNAME String,
    CBYS String,
    YWHD String,
    DYDJ String,
    ZCLX String,
    AMOUNT Float64,
    YHLB String,
    YSLX String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/t_co_dwfxdata_0809_30000000_new', '{replica}', CRETIME, (COMPNAME), 8192);

drop table if exists monchickey.t_co_dwfxdata_0809_30000000_all;
CREATE TABLE monchickey.t_co_dwfxdata_0809_30000000_all AS monchickey.t_co_dwfxdata_0809_30000000_new ENGINE = Distributed(idc_cluster, monchickey, t_co_dwfxdata_0809_30000000_new, rand());
 
select count(*) from monchickey.t_co_dwfxdata_0809_30000000_new;
select count(*) from monchickey.t_co_dwfxdata_0809_30000000_all;
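Alternatively, since the cluster is defined in metrika.xml, the DDL can be issued once from any node using ON CLUSTER (distributed DDL). A minimal sketch with a hypothetical table name, assuming distributed DDL is enabled in config.xml:
CREATE TABLE monchickey.t_demo ON CLUSTER idc_cluster
(
    CRETIME Date,
    COMPNAME String,
    AMOUNT Float64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/t_demo', '{replica}', CRETIME, (COMPNAME), 8192);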

3. Simple sample data

Insert the following rows, on each node as shown.

On cdhserver1 (clickhouse1):

insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','其他长期职工福利','检修-输电运检','500kV及以上','架空线路',9.4441398224E8,'','工');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','其他短期薪酬','检修-输电运检','500kV及以上','架空线路',2.0894114651E8,'','工');

On cdhserver2 (clickhouse2):

insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','其他长期职工福利','检修-输电运检','500kV及以上','架空线路',9.4441398224E8,'','工');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','临时用工薪酬','检修-输电运检','500kV及以上','架空线路',8.5247987777E8,'','工');

On cdhserver3 (clickhouse3):

insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','劳务派遣费','检修-输电运检','500kV及以上','架空线路',3.59378772E8,'','工');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','委托服务费','检修-输电运检','500kV及以上','架空线路',4.7638581405E8,'','费');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','财产保险费','检修-输电运检','500kV及以上','架空线路',9.527716281E8,'','费');
Query the data from any machine:

a. All 7 rows should be returned
b. The total count should be 7
select * from monchickey.t_co_dwfxdata_0809_30000000_all;        -- returns the complete set of 7 rows
select count(*) from monchickey.t_co_dwfxdata_0809_30000000_all; -- total is 7

V. Cluster Table Operations

1. Add a column to a table

ALTER TABLE idc.web_initial ON CLUSTER idc_cluster ADD COLUMN tracerouteip String AFTER jitter;

2. Change a column's type

ALTER TABLE idc.web_initial ON CLUSTER idc_cluster MODIFY COLUMN tracerouteip UInt16;

3. Drop a column

ALTER TABLE idc.web_initial ON CLUSTER idc_cluster DROP COLUMN tracerouteip;

4. Drop the same table on multiple cluster nodes

DROP TABLE <table_name> ON CLUSTER <cluster_name>;

DROP TABLE IF EXISTS idc.web_initial ON CLUSTER idc_cluster;

5. Truncate table data across the cluster

TRUNCATE TABLE lmmbase.user_label_uid ON CLUSTER crm_4shards_1replicas;

VI. Tuning

1. max_table_size_to_drop
By default, ClickHouse refuses to drop a table or partition larger than 50 GB. This can be changed permanently in the server configuration file, or overridden temporarily so a drop can be performed without restarting the service.
Permanent configuration
sudo vim /etc/clickhouse-server/config.xml
Then uncomment the following two lines:
<!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
<!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->
0 means no limit; alternatively, set it to whatever maximum size you want to allow.
Temporary override
Create a flag file (under the flags/ subdirectory of the ClickHouse data directory; the path below assumes a non-default data directory):
sudo touch '/home/username/clickhouse/flags/force_drop_table' && sudo chmod 666 '/home/username/clickhouse/flags/force_drop_table'
Once it is created, the DROP PARTITION / DROP TABLE statement above can be executed.
2. max_memory_usage
This parameter lives in /etc/clickhouse-server/users.xml and sets the maximum amount of memory a single query may use; a query that exceeds it fails. If resources allow, it is recommended to increase it.
<max_memory_usage>25000000000</max_memory_usage>
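The limit can also be raised for a single session or query without editing users.xml, for example:
SET max_memory_usage = 25000000000;
-- or per query:
SELECT count(*) FROM monchickey.t_co_dwfxdata_0809_30000000_all SETTINGS max_memory_usage = 25000000000;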

Reference: https://blog.csdn.net/u013205003/article/details/87861613

Original article: https://www.cnblogs.com/uestc2007/p/13704912.html