Speed comparison of the hash algorithms in PHP 7.0

Overview

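Recently I needed to optimize the indexes on some very long MySQL fields. After discussion there were several candidate solutions still to be decided on; one of them was to hash the existing string and then build a composite index on the hash and the original string. That raised the question of how the hash algorithms compare in speed. This article uses PHP to hash a string 2 million times with each algorithm and prints the execution time.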
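To make the hash-plus-original idea concrete, here is a minimal sketch in PHP. The table name `t`, the column names `val_hash` and `val`, the helper name `build_lookup`, and the default algorithm `crc32b` are all assumptions for illustration; they are not something the discussion had settled on.

```php
<?php
// Hypothetical helper: prepares a lookup that filters on the short hash first
// and then on the original string, so a hash collision cannot return a wrong row.
// Table `t` and columns `val_hash`, `val` are made-up names.
function build_lookup(string $value, string $algo = 'crc32b'): array
{
    return [
        'sql'    => 'SELECT id FROM t WHERE val_hash = ? AND val = ?',
        'params' => [hash($algo, $value), $value],
    ];
}

// Usage: a 2000-character value is reduced to a short, fixed-length hash.
$lookup = build_lookup(str_repeat('x', 2000));
echo $lookup['sql'], "\n";
echo strlen($lookup['params'][0]), "\n"; // length of the hex hash string
```

The second condition on the original string is what keeps the lookup correct; the hash only narrows the candidates cheaply.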

System information

  • Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
  • Memory 12GB, Swap 12GB
  • HDD 500GB
  • Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux
  • PHP 7.0.33-0+deb9u1 (cli) (built: Dec 7 2018 11:36:49) ( NTS )

Algorithms sorted by execution time (seconds per 2 million hashes, fastest first)

Array
(
    [fnv132] => 0.26369595527649
    [fnv1a32] => 0.2675929069519
    [fnv164] => 0.27193093299866
    [adler32] => 0.27417206764221
    [fnv1a64] => 0.28172397613525
    [joaat] => 0.29366397857666
    [crc32b] => 0.34514021873474
    [crc32] => 0.37110996246338
    [md4] => 0.44389486312866
    [md5] => 0.46207499504089
    [tiger128,3] => 0.54009604454041
    [tiger160,3] => 0.55391597747803
    [tiger192,3] => 0.57025694847107
    [sha1] => 0.57897710800171
    [tiger128,4] => 0.61153793334961
    [tiger160,4] => 0.62242317199707
    [tiger192,4] => 0.6432900428772
    [ripemd128] => 0.80352711677551
    [ripemd256] => 0.84451103210449
    [ripemd160] => 1.0310969352722
    [sha224] => 1.0542829036713
    [sha256] => 1.0582711696625
    [ripemd320] => 1.0992820262909
    [sha384] => 1.3508479595184
    [sha512] => 1.396675825119
    [haval128,3] => 1.4093809127808
    [haval192,3] => 1.4192271232605
    [haval160,3] => 1.4261200428009
    [haval224,3] => 1.4328649044037
    [haval256,3] => 1.443500995636
    [haval128,4] => 1.7986199855804
    [haval160,4] => 1.8255050182343
    [haval192,4] => 1.8294408321381
    [haval256,4] => 1.8410999774933
    [haval224,4] => 1.841756105423
    [haval128,5] => 2.1614220142365
    [haval160,5] => 2.1736621856689
    [haval192,5] => 2.1849989891052
    [haval224,5] => 2.1921010017395
    [haval256,5] => 2.1987628936768
    [whirlpool] => 2.3075139522552
    [gost] => 4.3380508422852
    [gost-crypto] => 4.3576400279999
    [snefru256] => 6.5909118652344
    [snefru] => 6.6243891716003
    [md2] => 15.983593940735
)

Test code and results

<?php
// Get every hash algorithm supported by this PHP build.
// hash_algos() always returns an array, so no null check is needed.
$arr_supported_algos = hash_algos();
$arr_proc_time = array();
$time0 = microtime(true);

echo "------------------------- Supported hash algos: -------------------------\n";
print_r($arr_supported_algos);

echo "========================= Task begin: =========================\n";

foreach ($arr_supported_algos as $algo) {
    // A fresh input string is generated for each algorithm.
    // Note: uniqid(true) passes true as the prefix (coerced to "1"),
    // not as the $more_entropy flag. Kept as-is to match the published timings.
    $str_tmp = uniqid(true) . microtime();
    $time_inner = microtime(true);
    for ($i = 0; $i < 2000000; $i++) {
        hash($algo, $str_tmp);
    }
    $used_seconds = microtime(true) - $time_inner;
    echo ">--  {$algo} processed in  {$used_seconds} seconds.\n";
    $arr_proc_time[$algo] = $used_seconds;
}

echo "++++++++++++++++++++++ summary (sorted by execution time): ++++++++++++++++++++++\n";

// Sort the array by value in ascending order, keeping the keys.
// See: https://secure.php.net/manual/zh/array.sorting.php
asort($arr_proc_time);
print_r($arr_proc_time);

$time1 = microtime(true) - $time0;
echo "Finish. Total time: {$time1} seconds.\n";

Conclusion

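crc32 is noticeably faster than md5: about 0.37 seconds versus 0.46 seconds for 2 million hashes, or roughly 20% less time.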

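In another test of mine, hashing 400,000 strings gave the following duplicate counts:

  • Run 1: 13 duplicates;
  • Run 2: 53 duplicates;
  • Run 3: 4 duplicates;
  • Run 4: 12 duplicates;
  • Run 5: 15 duplicates.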

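With 40 million records...
Sorry, I did not finish that run: PHP was too slow [facepalm]. The same test in Java gave these results:

  1. 186031 duplicates in 168 seconds
  2. 185386 duplicates in 110 seconds (using a HashSet with a preset capacity of 400000000)
  3. 185514 duplicates in 110 seconds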
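These duplicate counts are in line with what a uniformly distributed 32-bit hash would be expected to produce. The sketch below assumes the tests used a 32-bit hash such as CRC32 (2^32 possible values) and that all inputs were distinct; the function name `expected_duplicates` is hypothetical.

```php
<?php
// Expected number of duplicate hash values when $n distinct inputs are mapped
// uniformly at random onto $m possible values:
//   n - m * (1 - (1 - 1/m)^n)
// i.e. the number of inputs minus the expected number of distinct values seen.
function expected_duplicates(float $n, float $m): float
{
    // (1 - 1/m)^n - 1 computed via expm1/log1p, which stays accurate
    // when 1/m is tiny.
    return $n + $m * expm1($n * log1p(-1.0 / $m));
}

$m = 4294967296.0; // 2^32, assuming a 32-bit hash
printf("%.1f\n", expected_duplicates(400000.0, $m));   // 400,000 inputs
printf("%.1f\n", expected_duplicates(40000000.0, $m)); // 40 million inputs
```

Under those assumptions the estimate is roughly 19 duplicates for 400,000 inputs and roughly 185,700 for 40 million, which is close to the three Java results. The PHP runs vary more widely around the estimate.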

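In addition, CRC32 can be output as a number. It is an unsigned 32-bit value of at most 10 digits (maximum 4294967295), so in MySQL remember to store it as INT UNSIGNED or BIGINT, because a signed INT is too small. With a duplicate rate of about 4 to 5 per thousand in the 40-million-record test, it should work well in a database. I will try it later.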
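As a small check of the numeric form, assuming a 64-bit PHP build like the one used here: `crc32()` returns the checksum as an integer, and it matches the hexadecimal string returned by `hash('crc32b', ...)`. The input `'123456789'` is the conventional CRC-32 test string, not data from the tests above.

```php
<?php
$input  = '123456789';
$as_int = crc32($input);          // integer; non-negative on 64-bit PHP
$as_hex = hash('crc32b', $input); // the same checksum as 8 hex characters

printf("%u\n", $as_int);          // %u prints the value as unsigned decimal
echo $as_hex, "\n";

// On 64-bit builds both forms describe the same number.
var_dump(hexdec($as_hex) === $as_int);

// The largest possible value, 4294967295, has 10 decimal digits.
echo strlen((string) 0xFFFFFFFF), "\n";
```

Note that `hash('crc32', ...)` in the timing table is a different CRC variant from `crc32b`; only `crc32b` matches the `crc32()` function.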

Original article: https://www.cnblogs.com/mslagee/p/10181136.html