MyBatis Batch Insert and Horizontal Table Sharding

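At work I need to read many large files (10 million rows each) and write them into MySQL tables. The main techniques involved are batched database writes (addBatch) and horizontal table sharding.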

There are two ways to write to the database: row by row, or in batches. Here the batch insert is done with the MyBatis framework.

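Horizontal sharding means that instead of writing all 10 million rows into a single table, we spread the writes across several tables, to leave room for future growth and to keep per-table performance acceptable.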

I did not use any of the existing frameworks here (Cobar Client, Shardbatis, mybatis-shards); instead I implemented the sharding with a hash.

First, MyBatis batch insert

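I will only cover the core parts here; see my GitHub for the complete project. The main method to run is in org/xiongmaotailang/mybatis/batchinsert/DbUtil.java, and the SQL script involved is sql.txt.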

An example of the table we need, with 4 columns.

CREATE TABLE `newspvuv` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `pv` bigint(11) DEFAULT NULL,
  `uv` bigint(11) DEFAULT NULL,
  `time` varchar(15) NOT NULL,
  PRIMARY KEY (`id`)
)

Next, look at the mapper.xml file for the batch operation (org/xiongmaotailang/mybatis/batchinsert/mappers/DataMapper.xml in the project), which defines the batch insert method.

<mapper namespace="org.xiongmaotailang.mybatis.batchinsert.mappers.DataMapper">    
    <insert id="insertPVUV">
        insert into  ${table}(pv,uv,time) values(#{pv},#{uv},#{time})
    </insert>
    <insert id="batchInsertPVUV" parameterType="java.util.List">  
        insert into  ${table}(pv,uv,time) values  
        <foreach collection="list" item="item" index="index"  
            separator=",">  
            (#{item.pv,jdbcType=INTEGER},#{item.uv,jdbcType=INTEGER},#{item.time,jdbcType=CHAR}  
            )  
        </foreach>  
    </insert>  
</mapper>

id="insertPVUV" is the configuration for row-by-row inserts, and id="batchInsertPVUV" is the configuration for batch inserts.

The main method that exercises these two configurations is in DbUtil.

    public static void main(String[] args) throws InterruptedException {
        testInsert();
        testBatchInsert();
    }
    private static void testInsert() {
        long start=System.currentTimeMillis();
        for (int i = 0; i < 1000; i++) {
            addPvUv(12,i,"123");
        }
        System.out.println("insert 1000 row :"+(System.currentTimeMillis()-start)+"ms");
    }
    private static void testBatchInsert() {
        long start=System.currentTimeMillis();
        List<NewsPvUv> list=new ArrayList<NewsPvUv>();
        for (int i = 0; i < 1000; i++) {
            NewsPvUv n = new NewsPvUv(12, i, "123");
            list.add(n);
        }
        addPvUv(list);
        System.out.println("batch insert 1000 row :"+(System.currentTimeMillis()-start)+"ms");
    }

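The code above compares the two ways of writing 1000 rows. In the test on my laptop (results shown in the figure below), the batch insert was clearly faster.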
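For files with 10 million rows, a single multi-row insert statement is not practical, because MySQL limits the size of one statement through max_allowed_packet. One common approach is to split the rows into fixed-size chunks and call the batch insert once per chunk. The following is a minimal sketch of the chunking step only; the class name BatchChunker, the method chunk, and the chunk size of 1000 are assumptions for illustration and are not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;

class BatchChunker {
    // Split rows into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> chunk(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, rows.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(rows.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }
        for (List<Integer> part : chunk(rows, 1000)) {
            // In the real project each part would be handed to the batch insert,
            // for example addPvUv(part); here we only report the chunk size.
            System.out.println("would insert " + part.size() + " rows");
        }
    }
}
```

A suitable chunk size depends on the row width and the server settings, so it is worth measuring a few values rather than assuming 1000 is best.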

Horizontal table sharding

See my GitHub for the complete project.

Method 1: MD5 hash

// Horizontal sharding using an MD5 hash
// (requires: import java.security.MessageDigest;)
    public static String getTable(String mark, String prefix, int num) {
        if (num == 0)
            return prefix; // no sharding: use the base table name
        String temp = md5(mark).substring(0, 2); // first two hex digits, i.e. 0..255
        int hexdec = Integer.parseInt(temp, 16); // hexadecimal to decimal
        int index = hexdec % num + 1;            // table suffix in 1..num
        return prefix + "_" + index;
    }

    // Provides the same result as PHP's md5()
    private static String md5(String txt) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // The encoding is the crux: Java strings are Unicode and are not affected by the
            // source file encoding, whereas PHP strings take on the encoding of the source file.
            md.update(txt.getBytes("GBK"));
            StringBuilder buf = new StringBuilder();
            for (byte b : md.digest()) {
                buf.append(String.format("%02x", b & 0xff));
            }
            return buf.toString();
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }
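Because only the first two hex digits of the digest are used, the hash value is always between 0 and 255. This has two consequences: with more than 256 tables, some tables never receive any rows, and when 256 is not divisible by the table count, the tables are slightly uneven. The sketch below counts how many of the 256 possible prefixes land in each table, using the same `% num + 1` arithmetic as getTable; the class name HashBucketCheck and the method bucketCounts are hypothetical helpers for illustration only.

```java
class HashBucketCheck {
    // For each of the 256 possible two-hex-digit prefixes, count which table it maps to.
    // counts[i] is the number of prefixes routed to the table with suffix i + 1.
    static int[] bucketCounts(int num) {
        if (num <= 0) {
            throw new IllegalArgumentException("num must be positive");
        }
        int[] counts = new int[num];
        for (int prefix = 0; prefix < 256; prefix++) {
            counts[prefix % num]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = bucketCounts(10);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("table suffix " + (i + 1) + ": " + counts[i] + " prefixes");
        }
    }
}
```

With 16 tables every table gets exactly 16 prefixes, so table counts that divide 256 give the most even spread under this scheme. Using more hex digits of the digest would lift the 256-table ceiling.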

Method 2: bit shifting

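// Sharding by bit shift
    /*
     * Suppose we estimate that the system will have 10 billion users and that the optimal
     * size of a single table is about 1 million rows. Shifting the UID right by 20 bits
     * (2^20 = 1,048,576) then puts roughly 1 million rows in each table, and the four-digit
     * suffix of the user table (user_xxxx) leaves room for 10,000 tables.
     * The uid is a long because 10 billion does not fit in an int.
     */
    public static String getTable1(long uid, String prefix) {
        return prefix + "_" + String.format("%04d", uid >> 20);
    }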
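To make the arithmetic concrete, here is a small sketch that applies the same 20-bit shift to a few boundary UIDs. It assumes a long uid, since 10 billion exceeds the int range; the class name ShiftShardDemo and the method tableFor are hypothetical and only mirror the routing rule described above.

```java
class ShiftShardDemo {
    // Route a UID to a table: each block of 2^20 (1,048,576) consecutive UIDs shares one table.
    static String tableFor(long uid, String prefix) {
        return prefix + "_" + String.format("%04d", uid >> 20);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1_048_575L, 1_048_576L, 10_000_000_000L};
        for (long uid : samples) {
            System.out.println(uid + " -> " + tableFor(uid, "user"));
        }
    }
}
```

UIDs 0 through 1,048,575 go to user_0000, UID 1,048,576 starts user_0001, and UID 10,000,000,000 lands in user_9536, which still fits in the four-digit suffix. Unlike the hash approach, this keeps consecutive UIDs together, so new tables fill up one after another as UIDs grow.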