On the limitations of GROUP BY in MySQL (to be continued)

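Here is the requirement: in the 2018 game recharge record table (main fields: user_name, money, game_id; about 60 million rows), find the users whose cumulative recharge amount within a given date range falls between 100 and 500 yuan, and return the user names of everyone who meets that condition.

Table structure: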

CREATE TABLE `pay_list_pay_2018` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `orderid` varchar(50) NOT NULL,
  `user_name` varchar(50) NOT NULL,
  `pay_way_id` tinyint(4) NOT NULL,
  `money` float NOT NULL,
  `paid_amount` float unsigned NOT NULL,
  `pay_date` date NOT NULL,
  `pay_time` int(11) NOT NULL DEFAULT '0',
  `agent_id` int(11) NOT NULL DEFAULT '0',
  `placeid` int(11) NOT NULL DEFAULT '0',
  `cplaceid` varchar(50) DEFAULT NULL,
  `adid` varchar(100) DEFAULT NULL,
  `game_id` int(11) NOT NULL DEFAULT '0',
  `server_id` int(11) NOT NULL DEFAULT '0',
  `reg_date` date NOT NULL,
  `reg_time` int(11) NOT NULL DEFAULT '0',
  `cid` tinyint(1) NOT NULL DEFAULT '0',
  `bank_type` int(11) NOT NULL DEFAULT '1',
  `plat_id` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `orderid` (`orderid`),
  KEY `agent_id` (`agent_id`),
  KEY `pay_date` (`pay_date`),
  KEY `game_id` (`game_id`),
  KEY `server_id` (`server_id`),
  KEY `user_name` (`user_name`),
  KEY `reg_date` (`reg_date`),
  KEY `placeid` (`placeid`),
  KEY `pay_way_id` (`pay_way_id`),
  KEY `plat_id` (`plat_id`),
  KEY `pay_time` (`pay_time`),
  KEY `reg_time` (`reg_time`),
  KEY `bank_type` (`bank_type`)
) ENGINE=InnoDB AUTO_INCREMENT=73555898 DEFAULT CHARSET=utf8

A quick first attempt is to filter the users with GROUP BY and the SUM function:

SELECT user_name, SUM(money) AS pay_money
FROM db_pay.`pay_list_pay_2018`
WHERE `game_id` IN (688, 919, 228, 179)
  AND `pay_time` BETWEEN 1545580800 AND 1545667200
GROUP BY user_name
HAVING pay_money BETWEEN 100 AND 500;

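However, with the SQL above, when the pay_time range is wide or the list of game_id values is long, GROUP BY has to process too many groups (more than 10 million), and the query takes a very long time.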

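Later I handled it with a PHP script instead: query the per-user recharge totals in 3-day windows, and accumulate them in a PHP array that maps user name to cumulative recharge amount: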

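    /**
     * Tally each user's cumulative recharge amount in 3-day windows,
     * then count how many users fall into each amount bucket.
     */
    public function comparePay() {
        $row_0 = $row_10 = $row_50 = $row_500 = 0;
        $user_names = [];  // user_name => cumulative recharge amount
        $db = DB::getInstance('pay');
        $date1 = strtotime("2018-01-01");
        $date2 = strtotime("2018-01-06");
        $step = 3600*24*3;  // 3-day window, in seconds
        for ($s_date = $date1; $s_date < $date2; $s_date += $step) {
            // Clamp the last window so it does not run past $date2
            $e_date = min($s_date + $step, $date2);
            // Half-open range [s_date, e_date): BETWEEN is inclusive on both
            // ends and would count rows on a window boundary twice
            $sql = "SELECT user_name,sum(money) as total  FROM db_pay.pay_list_pay_2018 where game_id in ('373','365','811','366','351','352','853','1260','988','1206','1232','883','871','872','881','963','884','1231','870','873','882','880','962','874','731','711','654','628','350','291','310','314','311')
and pay_time >= {$s_date} and pay_time < {$e_date} group by user_name";
            $list = $db->find($sql);
            foreach ($list as $item) {
                // Start from 0 the first time a user is seen (avoids an undefined-index notice)
                $user_names[$item['user_name']] = ($user_names[$item['user_name']] ?? 0) + $item['total'];
            }
        }
        foreach ($user_names as $total) {
            if($total>=0 && $total<10) {
                $row_0++;
            }else if($total>=10 && $total<50) {
                $row_10++;
            }else if($total>=50 && $total<500) {
                $row_50++;
            }else if($total>=500) {  // >= so that exactly 500 is not dropped
                $row_500++;
            }
        }
        echo "\n";
        echo count($user_names);
        echo "\n";
        echo json_encode(compact('row_0','row_10','row_50','row_500'));
    }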
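The script above only counts users per bucket. To answer the original requirement (user names with a cumulative total between 100 and 500), the accumulated array still has to be filtered. The following is a minimal sketch of the two pieces involved, under assumptions of my own: `timeWindows()` and `usersInRange()` are hypothetical helper names that do not appear in the original script, the sample totals are made up for illustration, and both ends of the 100~500 range are treated as inclusive, matching the BETWEEN in the first query.

```php
<?php
declare(strict_types=1);

/**
 * Yield half-open [start, end) windows that cover the range exactly once.
 * The last window is clamped to $end. (Hypothetical helper.)
 */
function timeWindows(int $start, int $end, int $step): Generator
{
    for ($s = $start; $s < $end; $s += $step) {
        yield [$s, min($s + $step, $end)];
    }
}

/**
 * Return the user names whose cumulative total lies in [$min, $max],
 * both ends inclusive. (Hypothetical helper.)
 */
function usersInRange(array $totals, float $min, float $max): array
{
    $names = [];
    foreach ($totals as $name => $total) {
        if ($total >= $min && $total <= $max) {
            // Numeric-looking keys are stored as ints by PHP, so cast back
            $names[] = (string) $name;
        }
    }
    return $names;
}

// A 5-day range split into 3-day windows gives one full and one clamped window
$windows = iterator_to_array(timeWindows(0, 5 * 86400, 3 * 86400), false);

// Made-up totals standing in for the accumulated $user_names array
$totals = [
    'alice' => 99.5,
    'bob'   => 100.0,
    'carol' => 320.0,
    'dave'  => 500.0,
    'erin'  => 730.0,
];
$matched = usersInRange($totals, 100, 500);
print_r($matched);
```

Each window would be queried with `pay_time >= start AND pay_time < end`, so that no row is counted in two windows.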

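In practice, the script-based approach is also quite slow: it merely moves the GROUP BY workload from MySQL onto PHP. When the date range being aggregated is long, there is still room for optimization.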

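As a next step, one could consider running the SQL queries concurrently with swoole, or processing the data some other way (to be continued).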
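Whatever concurrency mechanism ends up being used, each worker would return a partial user name => total map for its own time window, and those maps then have to be combined. Here is a rough sketch of that merge step only, in plain PHP with no swoole calls. `mergePartialTotals()` is a hypothetical name, and the partial results are made-up sample data rather than real query output.

```php
<?php
declare(strict_types=1);

/**
 * Combine per-window partial totals into one user_name => total map.
 * Addition does not depend on order, so the partial results can arrive
 * in any order. (Hypothetical helper.)
 */
function mergePartialTotals(array $partials): array
{
    $totals = [];
    foreach ($partials as $partial) {
        foreach ($partial as $name => $amount) {
            // SUM() values typically come back from the driver as strings
            $totals[$name] = ($totals[$name] ?? 0.0) + (float) $amount;
        }
    }
    return $totals;
}

// Made-up partial results, one array per time window
$partials = [
    ['alice' => 30.0, 'bob' => 120.0],
    ['alice' => 80.0],
    ['bob' => 400.0, 'carol' => 10.0],
];
$merged = mergePartialTotals($partials);
print_r($merged);
```

This only works if the windows do not overlap. Otherwise rows on a shared boundary are added twice.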

Original article: https://www.cnblogs.com/chq3272991/p/10608468.html