Big Data (Log Analysis) Project

Requirement: count the page views (PV) of a web page and its unique visitors (UV).

Environment: Hadoop (CDH, pseudo-distributed), MySQL, Hive, Sqoop, Spring Boot

Approach: upload the raw logs to HDFS, expose them in Hive through a source table, clean out the date and hour fields, load the data into a table partitioned by day and hour, compute PV/UV with aggregate queries, export the result to MySQL with Sqoop, and finally chart it with Spring Boot + ECharts.

Implementation:

1. Upload the crawled raw data to HDFS

bin/hdfs dfs -mkdir /project
bin/hdfs dfs -put /usr/local/2015082818 /project
bin/hdfs dfs -put /usr/local/2015082819 /project
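A quick listing confirms both files landed in HDFS (a routine check, not part of the original post; the two files are presumably the hour-18 and hour-19 logs for 2015-08-28):

bin/hdfs dfs -ls /project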

2. Create the source table in Hive

create table yhd_source(
id               string ,
url              string ,
referer          string ,
keyword          string ,
type             string ,
guid             string ,
pageId           string ,
moduleId         string ,
linkId           string ,
attachedInfo     string ,
sessionId        string ,
trackerU         string ,
trackerType      string ,
ip               string ,
trackerSrc       string ,
cookie           string ,
orderCode        string ,
trackTime        string ,
endUserId        string ,
firstLink        string ,
sessionViewNo    string ,
productId        string ,
curMerchantId    string ,
provinceId       string ,
cityId           string ,
fee              string ,
edmActivity      string ,
edmEmail         string ,
edmJobId         string ,
ieVersion        string ,
platform         string ,
internalKeyword  string ,
resultSum        string ,
currentPage      string ,
linkPosition     string ,
buttonPosition   string 
) row format delimited fields terminated by '\t' location '/project';
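Because the table's location points at /project, the files uploaded in step 1 are queryable immediately. A quick sanity check (a minimal sketch, not in the original post):

select count(1) from yhd_source;
select id, url, guid, trackTime from yhd_source limit 5;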

3. Create a cleansing table and extract only the needed parts of the time field

create table yhd_qingxi(
id string,
url string,
guid string,
date string,
hour string
)
row format delimited fields terminated by '\t';

4. Extract the day and hour fields

insert into table yhd_qingxi select id,url,guid,substring(trackTime,9,2) date,substring(trackTime,12,2) hour from yhd_source;
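The substring positions assume trackTime is formatted like '2015-08-28 18:52:25' (Hive's substring is 1-based), which the later where date='28' filters also suggest. A small illustration, not in the original post:

select trackTime,
       substring(trackTime,9,2)  date,   -- characters 9-10  -> day of month, e.g. '28'
       substring(trackTime,12,2) hour    -- characters 12-13 -> hour of day,  e.g. '18'
from yhd_source limit 3;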

5. Partitioning: Hive static partitions

create table yhd_part1(
id string,
url string,
guid string
)
partitioned by (date string,hour string)
row format delimited fields terminated by '\t';

6. Load data into the partitioned table from the cleansing table

insert into table yhd_part1 partition (date='20150828',hour='18') select id,url,guid from yhd_qingxi where date='28' and hour='18';
insert into table yhd_part1 partition (date='20150828',hour='19') select id,url,guid from yhd_qingxi where date='28' and hour='19';
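After the two static inserts, the partitions can be listed to confirm the load (a routine check, not shown in the original post):

show partitions yhd_part1;
-- expect: date=20150828/hour=18 and date=20150828/hour=19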

7. PV calculation:

select date,hour,count(url) PV from yhd_part1 group by date,hour;
-》grouped by day and hour
-》Result:
+-----------+-------+--------+--+
|   date    | hour  |   pv   |
+-----------+-------+--------+--+
| 20150828  | 18    | 64972  |
| 20150828  | 19    | 61162  |
+-----------+-------+--------+--+

8. UV calculation

select date,hour,count(distinct guid) UV from yhd_part1 group by date,hour; 

-》Result:
+-----------+-------+--------+--+
|   date    | hour  |   uv   |
+-----------+-------+--------+--+
| 20150828  | 18    | 23938  |
| 20150828  | 19    | 22330  |
+-----------+-------+--------+--+
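A side note, not from the original post: count(distinct guid) pushes all rows of a group through a single reducer, so on larger data sets the same UV figure is often computed with a two-stage form like this sketch:

select date, hour, count(1) UV
from (select distinct date, hour, guid from yhd_part1) t
group by date, hour;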

9. Combine PV and UV

create table if not exists result as select date,hour,count(url) PV ,count(distinct guid) UV from yhd_part1 group by date,hour; 

-》Result:
+--------------+--------------+------------+------------+--+
| result.date  | result.hour  | result.pv  | result.uv  |
+--------------+--------------+------------+------------+--+
| 20150828     | 18           | 64972      | 23938      |
| 20150828     | 19           | 61162      | 22330      |
+--------------+--------------+------------+------------+--+

10. Export the results to a MySQL table

First create a table in MySQL to hold the result set:
create table if not exists save(
date varchar(30) not null,
hour varchar(30) not null,
pv varchar(30) not null,
uv varchar(30) not null
);

Export to MySQL with Sqoop:

bin/sqoop export \
--connect jdbc:mysql://hadoop5.lhcedu.com:3306/sqoop \
--username root \
--password 1234456 \
--table save \
--export-dir /user/hive/warehouse/lhc125.db/result \
--num-mappers 1 \
--input-fields-terminated-by '\001'

The save table in MySQL now contains:

+----------+------+-------+-------+
| date     | hour | pv    | uv    |
+----------+------+-------+-------+
| 20150828 | 18   | 64972 | 23938 |
| 20150828 | 19   | 61162 | 22330 |
+----------+------+-------+-------+


Note: Hive's default field delimiter is \001 (Ctrl-A), which is why the export uses --input-fields-terminated-by '\001'.

11. Partitioning: Hive dynamic partitions

The relevant settings in hive-site.xml:

<property>
  <name>hive.exec.dynamic.partition</name>
  <value>true</value>
  <description>Whether or not to allow dynamic partitions in DML/DDL.</description>
</property>
-》The default is true, i.e. dynamic partitions are allowed.

<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>strict</value>
  <description>In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions.</description>
</property>


-》The default mode is strict; switch the session to non-strict so that all partition columns can be determined dynamically:

set hive.exec.dynamic.partition.mode=nonstrict;

Create the table:

create table yhd_part2(
id string,
url string,
guid string
)
partitioned by (date string,hour string)
row format delimited fields terminated by '\t';

Run the dynamic-partition insert (the partition columns date and hour must come last in the select list; that is already the column order of yhd_qingxi, so select * works here):
insert into table yhd_part2 partition (date,hour) select * from yhd_qingxi;
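Equivalently, spelling the columns out makes that ordering requirement explicit (a sketch, same result as the select * form above):

insert into table yhd_part2 partition (date,hour)
select id, url, guid, date, hour from yhd_qingxi;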

Spring Boot integrated with ECharts for a graphical report

ECharts official example: https://echarts.baidu.com/examples/editor.html?c=area-stack

In a conventional project setup:

1. Include the echarts.min.js library in the JSP page

<%@ page contentType="text/html;charset=UTF-8"  pageEncoding="UTF-8" %>
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta content="always" name="referrer">
<title>file</title>
<script type="text/javascript" src="${pageContext.request.contextPath}/jquery.min.js"></script>
<script src="${pageContext.request.contextPath}/echarts.min.js"></script>

</head>
<body>

<!-- Prepare a DOM element with a width and height for ECharts -->
<div id="ec" style="width:600px;height:400px;"></div>
<script type="text/javascript">
    var myChart = echarts.init(document.getElementById('ec'));

    var option = {
        title: {
            text: 'Yihaodian PV/UV chart'
        },
        tooltip : {
            trigger: 'axis',
            axisPointer: {
                type: 'cross',
                label: {
                    backgroundColor: '#6a7985'
                }
            }
        },
        legend: {
            data:['PV','UV']
        },
        toolbox: {
            feature: {
                saveAsImage: {}
            }
        },
        grid: {
            left: '3%',
            right: '4%',
            bottom: '3%',
            containLabel: true
        },
        xAxis : [
            {

            }
        ],
        yAxis : [
            {
                /*type : 'value'*/
            }
        ],
       series : [


        ]
    };
    myChart.setOption(option);
    $.ajax({
        type: "get",
        url: "${pageContext.request.contextPath}/re/query",
        dataType: "JSON",
        success: function (data) {
            myChart.setOption({
                xAxis : [
                    {
                        type : 'category',
                        boundaryGap : false,
                        data : data.h
                    }
                ],
                series: [ {
                    name:'PV',
                    type:'line',
                    stack: '总量',
                    areaStyle: {},
                    data:data.pv
                },
                    {
                        name:'UV',
                        type:'line',
                        stack: '总量',
                        areaStyle: {},
                        data:data.uv
                    }
                ]
            })
        }
    })
</script>
</body>
</html>

2. Controller that assembles the data structure ECharts expects

package com.lhc.controller;

import com.lhc.entity.Result;
import com.lhc.service.ResultService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Controller
@RequestMapping("re")
public class ResultController {
    @Autowired
    ResultService resultService;

    @RequestMapping("query")
    @ResponseBody
    public Map query(){
        List<Result> results = resultService.queryResult();
        List<String> pv = new ArrayList();
        List<String> uv = new ArrayList();
        List<String> date = new ArrayList<>();
        for (Result result : results) {
            pv.add(result.getPv());
            uv.add(result.getUv());
            date.add(result.getHour());
        }
        Map map = new HashMap();
        map.put("pv",pv);
        map.put("uv",uv);
        map.put("h",date);
        return map;
    }
}
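The Result entity and ResultService referenced above are not shown in the original post. A minimal sketch of what they might look like, assuming the service reads the MySQL save table through whatever DAO layer the project uses (field names are inferred from the getters called in the controller):

// --- com/lhc/entity/Result.java ---
package com.lhc.entity;

// Entity mirroring one row of the MySQL save table.
public class Result {
    private String date;
    private String hour;
    private String pv;
    private String uv;

    public String getDate() { return date; }
    public void setDate(String date) { this.date = date; }
    public String getHour() { return hour; }
    public void setHour(String hour) { this.hour = hour; }
    public String getPv() { return pv; }
    public void setPv(String pv) { this.pv = pv; }
    public String getUv() { return uv; }
    public void setUv(String uv) { this.uv = uv; }
}

// --- com/lhc/service/ResultService.java ---
package com.lhc.service;

import com.lhc.entity.Result;
import java.util.List;

// Service interface used by ResultController; its implementation would
// query the save table (e.g. via a MyBatis mapper or JdbcTemplate).
public interface ResultService {
    List<Result> queryResult();
}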

Result: the graphical report is complete.

Original article: https://www.cnblogs.com/lhc-hhh/p/10386100.html