# Mycat Sharding

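The sharding rule configuration file rule.xml lives in the conf directory and defines the rules for every sharded table. You can use different sharding algorithms, or the same algorithm with different parameters. Because sharding is driven by configuration, operations staff and database administrators can split data across different physical databases in a few simple steps.
Reposted from: https://blog.csdn.net/u010520146/article/details/90752364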

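Mycat's sharding rules include:

Enumeration sharding
Fixed-slot hash sharding
Range-based sharding
Modulo
Sharding by date (day)
Consistent hash
Sharding by hour within a single month
Range-modulo sharding
Jump consistent hash sharding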

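I. Enumeration sharding (sharding-by-intfile)

Enumeration sharding routes each row to a database according to the value of a given column.

1. Continuing with the three nodes used in the earlier tests, the dataNode definitions in schema.xml are as follows: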

<dataNode name="dn1" dataHost="localhost1" database="mycat" />
<dataNode name="dn2" dataHost="localhost1" database="mycat2" />
<dataNode name="dn3" dataHost="localhost1" database="mycat3" />

The user table is used for testing again. Its table node is:

<table name="user" dataNode="dn1,dn2,dn3"  rule="sharding-by-intfile-user" />

2. Configure the rule in rule.xml

<tableRule name="sharding-by-intfile-user">
	<rule>
		<columns>city</columns>
		<algorithm>hash-int-user</algorithm>
	</rule>
</tableRule>

<function name="hash-int-user" class="io.mycat.route.function.PartitionByFileMap">
	<property name="mapFile">partition-hash-int-user.txt</property>
	<property name="type">1</property>
	<property name="defaultNode">0</property>
</function>

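Notes: in the function configuration, type defaults to 0. A value of 0 means the keys are Integer, and any non-zero value means String. Node indexes always start from 0, so 0 stands for the first node.

3. Add the partition-hash-int-user.txt file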

# 10000=0
10010=1
fujian=0
beijing=1
shanghai=2

Notes: a row whose city is fujian is inserted into dn1, beijing into dn2, and shanghai into dn3.

4. Test

insert into `user`(username,password,city) values('1','2','fujian1');
insert into `user`(username,password,city) values('1','2','fujian');
insert into `user`(username,password,city) values('1','2','beijing');
insert into `user`(username,password,city) values('1','2','shanghai');

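The rows land on dn1, dn1, dn2 and dn3 respectively, as expected. The value fujian1 is not listed in the map file, so it falls back to defaultNode 0, which is dn1.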
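
The lookup above can be sketched in a few lines of Python. This only illustrates the routing idea and is not Mycat's implementation: `load_map` and `route` are hypothetical names, and the sketch assumes that lines starting with `#` are comments and that unlisted values fall back to `defaultNode`.

```python
# Illustrative sketch only; load_map and route are hypothetical names,
# not part of Mycat.
MAP_FILE_TEXT = """\
# 10000=0
10010=1
fujian=0
beijing=1
shanghai=2
"""

DATA_NODES = ["dn1", "dn2", "dn3"]
DEFAULT_NODE = 0  # mirrors <property name="defaultNode">0</property>


def load_map(text):
    """Parse key=index lines, skipping blank lines and '#' comments (assumed)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, index = line.partition("=")
        mapping[key.strip()] = int(index)
    return mapping


def route(city, mapping, default_node=DEFAULT_NODE):
    """Return the data node for a city, falling back to the default node."""
    return DATA_NODES[mapping.get(city, default_node)]


mapping = load_map(MAP_FILE_TEXT)
routes = [route(c, mapping) for c in ["fujian1", "fujian", "beijing", "shanghai"]]
print(routes)  # ['dn1', 'dn1', 'dn2', 'dn3']
```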

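II. Fixed-slot hash sharding

This rule resembles decimal modulo, but it operates on bits: it takes the low 10 bits of the id, i.e. id & 1111111111 in binary (id & 1023).

Its advantage is as follows. With decimal modulo, consecutively inserted ids 1-10 are spread across up to 10 different shards, which makes transaction control for the inserts harder. With this algorithm, consecutive ids tend to land in the same shard, which makes transaction control for the inserts easier.

The rule in rule.xml is as follows: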

<tableRule name="rule1">
    <rule>
        <columns>user_id</columns>
        <algorithm>func1</algorithm>
    </rule>
</tableRule>

<!-- Partition strategy: split the data horizontally into 3 parts; the first two take 25% each and the third takes 50% -->
<function name="func1" class="io.mycat.route.function.PartitionByLong">
    <property name="partitionCount">2,1</property>
    <property name="partitionLength">256,512</property>
</function>

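Configuration notes:

columns: the table column to shard on;
algorithm: the sharding function;
partitionCount: the list of shard counts;
partitionLength: the list of shard range lengths.
Partition length: the maximum defaults to 2^n = 1024, i.e. at most 1024 partitions are supported.
Constraint: the count and length arrays must have the same length.
Constraint: 1024 = sum(count[i] * length[i]). The dot product of the count and length vectors always equals 1024.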
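
A minimal sketch of how the two lists could expand into a 1024-slot table is shown here. It is an illustration under the constraints listed above, not Mycat's code, and `build_slot_table` and `route` are hypothetical names.

```python
# Illustrative sketch only; build_slot_table and route are hypothetical names.
SLOTS = 1024  # 2^10, so the slot is the low 10 bits of the id


def build_slot_table(counts, lengths):
    """Expand partitionCount / partitionLength into a slot -> shard table."""
    if len(counts) != len(lengths):
        raise ValueError("count and length lists must have the same size")
    if sum(c * l for c, l in zip(counts, lengths)) != SLOTS:
        raise ValueError("sum(count[i] * length[i]) must equal 1024")
    table = []
    shard = 0
    for count, length in zip(counts, lengths):
        for _ in range(count):
            table.extend([shard] * length)
            shard += 1
    return table


def route(user_id, table):
    """Pick the shard from the low 10 bits of the id."""
    return table[user_id & (SLOTS - 1)]


# partitionCount=2,1 and partitionLength=256,512 give shards of 256, 256, 512 slots.
table = build_slot_table([2, 1], [256, 512])
print([route(i, table) for i in (0, 255, 256, 511, 512, 1023, 1024)])
# [0, 0, 1, 1, 2, 2, 0]
```

With this layout, ids whose low 10 bits are 0-255 go to the first shard, 256-511 to the second, and 512-1023 to the third, which matches the 25% / 25% / 50% split in the XML comment.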

III. Range-based rule (auto-sharding-long)

1. Create the table node in schema.xml

<table name="user" dataNode="dn1,dn2,dn3"  rule="auto-sharding-long" />

2. The rule definition can be viewed, and modified, in rule.xml

<tableRule name="auto-sharding-long">
	<rule>
		<columns>id</columns>
		<algorithm>rang-long</algorithm>
	</rule>
</tableRule>

Here name is the rule name, columns is the table column to shard on, and algorithm is the function to use.

The rang-long function is as follows:

<function name="rang-long"
	class="io.mycat.route.function.AutoPartitionByLong">
	<property name="mapFile">autopartition-long.txt</property>
</function>

Here name is the function name and mapFile is the configuration file to read.

3. autopartition-long.txt is as follows:

# range start-end ,data node index
# 
# K=1000,M=10000.
0-5M=0
5M-10M=1
10M-1500M=2

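This means: 0-50000 is assigned to dn1, 50000-100000 to dn2, and 100000-15000000 to dn3.
You can also modify these ranges to suit your own business needs.

4. Example:

Insert rows with different ids, as follows: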

insert into `user`(id,username,password) values(49999,'123','123445');
insert into `user`(id,username,password) values(50001,'123','123445');
insert into `user`(id,username,password) values(100001,'123','123445');

The rows are inserted on nodes dn1, dn2 and dn3 respectively, as expected.
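
The range lookup can be sketched as follows. This is an illustration only: `parse_bound`, `load_ranges` and `route` are hypothetical names, the K and M units follow the comment in the file (K=1000, M=10000), and the choice that the first matching range wins on a shared boundary is an assumption of the sketch.

```python
# Illustrative sketch only; parse_bound, load_ranges and route are hypothetical.
RANGE_FILE_TEXT = """\
# range start-end ,data node index
# K=1000,M=10000.
0-5M=0
5M-10M=1
10M-1500M=2
"""

UNITS = {"K": 1000, "M": 10000}  # as stated in the file's comment


def parse_bound(token):
    """Convert '5M', '300K' or '42' to an integer."""
    token = token.strip()
    if token[-1] in UNITS:
        return int(token[:-1]) * UNITS[token[-1]]
    return int(token)


def load_ranges(text):
    """Parse start-end=index lines, skipping blank lines and '#' comments."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        span, _, index = line.partition("=")
        start, _, end = span.partition("-")
        ranges.append((parse_bound(start), parse_bound(end), int(index)))
    return ranges


def route(row_id, ranges):
    """Return the node index of the first range containing row_id, else None."""
    for start, end, index in ranges:
        if start <= row_id <= end:
            return index
    return None


ranges = load_ranges(RANGE_FILE_TEXT)
print([route(i, ranges) for i in (49999, 50001, 100001)])  # [0, 1, 2]
```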

IV. Modulo

The rule in rule.xml is as follows:

<tableRule name="mod-long">
    <rule>
        <columns>user_id</columns>
        <algorithm>mod-long</algorithm>
    </rule>
</tableRule>
<function name="mod-long" class="io.mycat.route.function.PartitionByMod">
    <!-- how many data nodes -->
    <property name="count">3</property>
</function>

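This configuration is straightforward: it takes the decimal modulo of the sharding column (user_id here), so remainders 0, 1 and 2 map to dn1, dn2 and dn3. Compared with fixed-slot hash, a batch insert in a single transaction may touch several shards, which makes transactional consistency harder.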
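
The difference between the two rules for consecutive ids can be seen in a small sketch. It is illustrative only: `by_mod` and `by_fixed_slot` are hypothetical helpers, and the fixed-slot layout (256, 256 and 512 slots) is the one from section II.

```python
# Illustrative sketch only; by_mod and by_fixed_slot are hypothetical helpers.
def by_mod(user_id, count=3):
    """Decimal modulo: remainder 0, 1, 2 -> shard 0, 1, 2."""
    return user_id % count


def by_fixed_slot(user_id):
    """Fixed-slot hash with slots split 256 / 256 / 512 (section II layout)."""
    slot = user_id & 1023  # low 10 bits
    if slot < 256:
        return 0
    if slot < 512:
        return 1
    return 2


ids = range(1, 11)  # a batch of consecutive ids 1..10
mod_shards = sorted({by_mod(i) for i in ids})
slot_shards = sorted({by_fixed_slot(i) for i in ids})
print(mod_shards)   # [0, 1, 2]: the batch touches every shard
print(slot_shards)  # [0]: the batch stays on one shard
```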

V. Sharding by date (day)

This rule shards by date, a fixed number of days per partition. The rule is as follows:

<tableRule name="sharding-by-month">
	<rule>
		<columns>create_time</columns>
		<algorithm>partbymonth</algorithm>
	</rule>
</tableRule>

<function name="partbymonth"
	class="io.mycat.route.function.PartitionByDate">
	<property name="dateFormat">yyyy-MM-dd</property>
	<property name="sBeginDate">2015-01-01</property>
	<property name="sEndDate">2025-01-02</property>
	<property name="sPartionDay">30</property>
</function>

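Configuration notes:

columns: the table column to shard on;
algorithm: the sharding function;
dateFormat: the date format;
sBeginDate: the start date;
sEndDate: the end date;
sPartionDay: the number of days per partition. Counting from the start date, a new partition begins every sPartionDay days (30 in the configuration above).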
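
The day-based calculation can be sketched as below. This is an illustration under a simplifying assumption: it only divides the days elapsed since sBeginDate by sPartionDay and ignores whatever Mycat does once a date passes sEndDate. The name `route` is hypothetical.

```python
# Illustrative sketch only; route is a hypothetical name and sEndDate
# handling is deliberately left out.
from datetime import date

BEGIN = date(2015, 1, 1)   # sBeginDate
DAYS_PER_PARTITION = 30    # sPartionDay


def route(create_time: str) -> int:
    """Partition index = whole days since sBeginDate // sPartionDay."""
    elapsed = (date.fromisoformat(create_time) - BEGIN).days
    if elapsed < 0:
        raise ValueError("date is earlier than sBeginDate")
    return elapsed // DAYS_PER_PARTITION


for day in ("2015-01-01", "2015-01-30", "2015-01-31", "2015-03-02"):
    print(day, route(day))
# 2015-01-01 0
# 2015-01-30 0
# 2015-01-31 1
# 2015-03-02 2
```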

VI. Consistent hash

Consistent hashing effectively addresses the problem of scaling out distributed data. An example rule is as follows:

<tableRule name="sharding-by-murmur">
    <rule>
        <columns>id</columns>
        <algorithm>murmur</algorithm>
    </rule>
</tableRule>

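<function name="murmur"
	class="io.mycat.route.function.PartitionByMurmurHash">
	<property name="seed">0</property><!-- defaults to 0 -->
	<property name="count">2</property><!-- number of database nodes to shard across; required, otherwise sharding cannot work -->
	<property name="virtualBucketTimes">160</property><!-- each physical database node is mapped to this many virtual nodes; the default is 160, i.e. there are 160 times as many virtual nodes as physical nodes -->
	<!-- <property name="weightMapFile">weightMapFile</property> node weights; a node without a specified weight defaults to 1. Written in properties-file format, with the node index (an integer from 0 to count-1) as the key and the node weight as the value. All weights must be positive integers, otherwise 1 is used instead -->
	<!-- <property name="bucketMapPath">/etc/mycat/bucketMapPath</property>
		used during testing to observe how virtual nodes are distributed over physical nodes; if this property is set, the mapping from each virtual node's murmur hash value to its physical node is written to this file line by line. There is no default; if it is not set, nothing is written -->
</function>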
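
To show why virtual nodes help when scaling out, here is a minimal hash-ring sketch. It is not Mycat's implementation: `HashRing` and its method names are hypothetical, node weights are ignored, and a truncated MD5 stands in for MurmurHash so the example needs only the standard library.

```python
# Illustrative sketch only; HashRing is a hypothetical class. A truncated MD5
# is used as a stand-in hash, NOT MurmurHash, and node weights are ignored.
import bisect
import hashlib


def _hash(value: str) -> int:
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, count, virtual_bucket_times=160):
        # Place virtual_bucket_times virtual nodes on the ring per physical node.
        points = []
        for node in range(count):
            for replica in range(virtual_bucket_times):
                points.append((_hash(f"node-{node}-vn-{replica}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    def route(self, key) -> int:
        """Return the node owning the first virtual node at or after hash(key)."""
        pos = bisect.bisect_left(self._hashes, _hash(str(key)))
        return self._nodes[pos % len(self._nodes)]  # wrap around the ring


keys = range(10000)
ring2, ring3 = HashRing(2), HashRing(3)
moved = sum(1 for k in keys if ring2.route(k) != ring3.route(k))
print(f"{moved / len(keys):.0%} of keys moved when growing from 2 to 3 nodes")
```

In this sketch, every key that moves goes to the newly added node, and keys on the two existing nodes never swap between them. Plain modulo, by contrast, remaps most keys when the node count changes.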

VII. Modulo (crc32slot)

<tableRule name="crc32slot">
	<rule>
		<columns>id</columns>
		<algorithm>crc32slot</algorithm>
	</rule>
</tableRule>

<function name="crc32slot" class="io.mycat.route.function.PartitionByCRC32PreSlot">
	<property name="count">3</property><!-- number of database nodes to shard across; required, otherwise sharding cannot work -->
</function>

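count=3 specifies the number of databases to shard across.

VIII. Range-modulo sharding

This rule avoids data migration when scaling out and, to some extent, avoids the hotspot problem of range sharding. It combines the strengths of range sharding and modulo sharding: modulo inside a shard group keeps the data in that group fairly even, while range sharding between groups still supports range queries.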

<tableRule name="auto-sharding-rang-mod">
    <rule>
        <columns>id</columns>
        <algorithm>rang-mod</algorithm>
    </rule>
</tableRule>
<function name="rang-mod" class="io.mycat.route.function.PartitionByRangeMod">
    <property name="mapFile">partition-range-mod.txt</property>
    <property name="defaultNode">0</property>
</function>

partition-range-mod.txt is as follows:

# range start-end ,data node group size
# 5 means the group has 5 shard nodes
0-200M=5
200M1-400M=1
400M1-600M=4
600M1-800M=4
800M1-1000M=6
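
The two-level routing (range between groups, modulo inside a group) can be sketched as follows. Several things here are assumptions made for illustration: the ranges are written as plain integers with M = 10000 and `200M1` read as 2,000,001, the groups are assumed to occupy consecutive node indexes in file order, and `route` is a hypothetical name.

```python
# Illustrative sketch only. Assumptions: M = 10000, '200M1' is read as
# 2,000,001, and groups occupy consecutive node indexes in file order.
GROUPS = [
    # (range start, range end, number of shard nodes in the group)
    (0, 2_000_000, 5),
    (2_000_001, 4_000_000, 1),
    (4_000_001, 6_000_000, 4),
    (6_000_001, 8_000_000, 4),
    (8_000_001, 10_000_000, 6),
]
DEFAULT_NODE = 0  # mirrors <property name="defaultNode">0</property>


def route(row_id: int) -> int:
    """Find the group by range, then pick a node inside it by modulo."""
    first_node = 0  # index of the first node belonging to the current group
    for start, end, size in GROUPS:
        if start <= row_id <= end:
            return first_node + row_id % size
        first_node += size
    return DEFAULT_NODE


print(route(7))           # 2  (group 1: nodes 0-4, 7 % 5 = 2)
print(route(2_000_001))   # 5  (group 2: the single node 5)
print(route(4_000_003))   # 9  (group 3: nodes 6-9, 4_000_003 % 4 = 3)
print(route(10_000_001))  # 0  (outside every range: default node)
```

Under these assumptions, adding capacity means appending a new range with its own group, so rows in the existing ranges stay where they are.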

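IX. Jump consistent hash sharding

The idea comes from a paper published by Google. Compared with traditional consistent hashing, it uses fewer resources, runs faster, and migrates less data.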

	<tableRule name="jch">
		<rule>
			<columns>id</columns>
			<algorithm>jump-consistent-hash</algorithm>
		</rule>
	</tableRule>

	<function name="jump-consistent-hash" class="io.mycat.route.function.PartitionByJumpConsistentHash">
		<property name="totalBuckets">3</property>
	</function>
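
For reference, the jump consistent hash function from the Google paper is only a few lines. The Python sketch below follows the published algorithm for 64-bit integer keys. How Mycat turns a column value into that key is not covered here, and the parameter name `total_buckets` merely mirrors the `totalBuckets` property above.

```python
# Sketch of the published jump consistent hash algorithm for 64-bit keys.
# How Mycat derives the integer key from a column value is not shown here.
MASK_64 = 0xFFFFFFFFFFFFFFFF


def jump_consistent_hash(key: int, total_buckets: int) -> int:
    """Map an integer key to a bucket in [0, total_buckets)."""
    if total_buckets <= 0:
        raise ValueError("total_buckets must be positive")
    key &= MASK_64
    bucket, candidate = -1, 0
    while candidate < total_buckets:
        bucket = candidate
        # 64-bit linear congruential step, as in the paper
        key = (key * 2862933555777941757 + 1) & MASK_64
        candidate = int((bucket + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return bucket


keys = range(1000)
before = [jump_consistent_hash(k, 3) for k in keys]
after = [jump_consistent_hash(k, 4) for k in keys]
moved = sum(1 for b, a in zip(before, after) if b != a)
print(f"{moved} of {len(before)} keys moved when growing from 3 to 4 buckets")
```

When a bucket is added, a key either stays where it was or moves to the new bucket. No ring or virtual-node table has to be stored.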