Offline E-commerce Data Warehouse (66): Data Quality Monitoring (2), Griffin (3), Installation and Usage (1)

2.1 Environment Preparation Before Installation
2.1.1 Install ES 5.2
1) Upload elasticsearch-5.2.2.tar.gz to the /opt/software directory on hadoop102 and extract it to /opt/module
[atguigu@hadoop102 software]$ tar -zxvf elasticsearch-5.2.2.tar.gz -C /opt/module/
2) Edit the /opt/module/elasticsearch-5.2.2/config/elasticsearch.yml configuration file
[atguigu@hadoop102 config]$ vim elasticsearch.yml 
network.host: hadoop102
http.port: 9200
http.cors.enabled: true
http.cors.allow-origin: "*"
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
3) Modify the Linux system configuration file /etc/security/limits.conf
[atguigu@hadoop102 elasticsearch-5.2.2]$ sudo vim /etc/security/limits.conf
# Add the following content
* soft nproc 65536 
* hard nproc 65536 
* soft nofile 65536 
* hard nofile 65536
[atguigu@hadoop102 elasticsearch-5.2.2]$ sudo vim /etc/sysctl.conf

# Add
vm.max_map_count=655360
[atguigu@hadoop102 elasticsearch-5.2.2]$ sudo vim /etc/security/limits.d/90-np

# Modify this setting
* soft nproc 2048
[atguigu@hadoop102 elasticsearch-5.2.2]$ sudo sysctl -p
4) Reboot the virtual machine
[atguigu@hadoop102 elasticsearch-5.2.2]$ su root 
[root@hadoop102 elasticsearch-5.2.2]# reboot
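After logging back in, you can optionally confirm that the limits took effect; this quick check is not part of the original steps:
[atguigu@hadoop102 ~]$ ulimit -n                      # open-file limit, should now be 65536
[atguigu@hadoop102 ~]$ ulimit -u                      # max user processes, should reflect the nproc settings above
[atguigu@hadoop102 ~]$ sudo sysctl vm.max_map_count   # should print vm.max_map_count = 655360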
5) From the /opt/module/elasticsearch-5.2.2 directory, start ES
[atguigu@hadoop102 elasticsearch-5.2.2]$ nohup /opt/module/elasticsearch-5.2.2/bin/elasticsearch &
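Before creating the index, you can optionally confirm that ES is reachable (a simple sanity check, assuming the node started cleanly):
[atguigu@hadoop102 ~]$ curl http://hadoop102:9200/
# should return a JSON body with cluster_name, version, and other node information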
6) Create the griffin index in ES
[atguigu@hadoop102 ~]$ curl -XPUT http://hadoop102:9200/griffin -d '
{
  "aliases": {},
  "mappings": {
    "accuracy": {
      "properties": {
        "name": {
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          },
          "type": "text"
        },
        "tmst": {
          "type": "date"
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_replicas": "2",
      "number_of_shards": "5"
    }
  }
}'
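As a sanity check, you can query the index back to confirm the mapping and settings were applied:
[atguigu@hadoop102 ~]$ curl -XGET http://hadoop102:9200/griffin
# should return the accuracy mapping and the index settings defined above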
2.1.2 Start the HDFS & YARN Services
[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh 
[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh
2.1.3 Modify the Hive Configuration
Note: the Hive version must be at least 2.2.
1) Copy MySQL's mysql-connector-java-5.1.27-bin.jar to /opt/module/hive/lib/
[atguigu@hadoop102 module]$ cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/
2) In /opt/module/hive/conf, edit the hive-site.xml file and add the content below. (Make sure the MySQL password is correct, otherwise the metastore connection will fail.)
[atguigu@hadoop102 conf]$ vim hive-site.xml 
# Add the following content
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>
</configuration>
3) Start the services
[atguigu@hadoop102 hive]$ nohup /opt/module/hive/bin/hive --service metastore & 
[atguigu@hadoop102 hive]$ nohup /opt/module/hive/bin/hive --service hiveserver2 &
Note: Hive 2.x requires starting both the metastore and hiveserver2 services; otherwise you will get an error like:
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
4) After the services are up, start Hive
[atguigu@hadoop102 hive]$ /opt/module/hive/bin/hive
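As a quick smoke test (not part of the original steps), run a simple query in the Hive CLI to confirm the connection to the metastore works:
hive (default)> show databases;
hive (default)> show tables;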
2.1.4 Install Spark 2.4.6
Note: the Spark version must be at least 2.2.1.
1) Upload spark-2.4.6-bin-hadoop2.7.tgz to the /opt/software directory and extract it to /opt/module
[atguigu@hadoop102 software]$ tar -zxvf spark-2.4.6-bin-hadoop2.7.tgz -C /opt/module/
2) Rename /opt/module/spark-2.4.6-bin-hadoop2.7 to spark
[atguigu@hadoop102 module]$ mv spark-2.4.6-bin-hadoop2.7/ spark
3) Rename /opt/module/spark/conf/spark-defaults.conf.template to spark-defaults.conf
[atguigu@hadoop102 conf]$ mv spark-defaults.conf.template spark-defaults.conf
4) Configure the Spark event log path in the spark-defaults.conf file
[atguigu@hadoop102 conf]$ vim spark-defaults.conf 
# Add the following configuration
spark.eventLog.enabled    true
spark.eventLog.dir        hdfs://hadoop102:9000/spark_directory
5) Rename the slaves.template configuration file to slaves
[atguigu@hadoop102 conf]$ mv slaves.template slaves
6) Edit the slaves file and add the worker nodes:
[atguigu@hadoop102 conf]$ vim slaves
hadoop102 
hadoop103 
hadoop104
7) Rename /opt/module/spark/conf/spark-env.sh.template to spark-env.sh
[atguigu@hadoop102 conf]$ mv spark-env.sh.template spark-env.sh
8) In /opt/module/spark/conf/spark-env.sh, set the YARN configuration directory and the history server parameters
[atguigu@hadoop102 conf]$ vim spark-env.sh 
# Add the following parameters
YARN_CONF_DIR=/opt/module/hadoop-2.7.2/etc/hadoop
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080
-Dspark.history.retainedApplications=30
-Dspark.history.fs.logDirectory=hdfs://hadoop102:9000/spark_directory"
SPARK_MASTER_HOST=hadoop102 
SPARK_MASTER_PORT=7077
9) Create the spark_directory log path on the Hadoop cluster in advance
[atguigu@hadoop102 spark]$ hadoop fs -mkdir /spark_directory
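With the event log directory in place, you can optionally start the Spark history server and open http://hadoop102:18080 to confirm it is running; this check is a suggestion rather than part of the original walkthrough:
[atguigu@hadoop102 spark]$ sbin/start-history-server.sh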
10) Copy Hive's /opt/module/hive/lib/datanucleus-*.jar files to Spark's /opt/module/spark/jars directory
[atguigu@hadoop102 lib]$ cp /opt/module/hive/lib/datanucleus-*.jar /opt/module/spark/jars/
11) Copy Hive's /opt/module/hive/conf/hive-site.xml file to Spark's /opt/module/spark/conf directory
[atguigu@hadoop102 conf]$ cp /opt/module/hive/conf/hive-site.xml /opt/module/spark/conf/
12) Modify the Hadoop configuration file yarn-site.xml and add the following:
[atguigu@hadoop102 hadoop]$ vi yarn-site.xml
<!-- Whether to start a thread that checks the physical memory each task uses and kills the task if it exceeds its allocation; the default is true --> 
<property> 
    <name>yarn.nodemanager.pmem-check-enabled</name>     
    <value>false</value> 
</property> 
<!-- Whether to start a thread that checks the virtual memory each task uses and kills the task if it exceeds its allocation; the default is true --> 
<property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>false</value> 
</property>    
13) Distribute spark & yarn-site.xml
[atguigu@hadoop102 conf]$ xsync /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml 
[atguigu@hadoop102 conf]$ xsync /opt/module/spark
14) Test the environment
[atguigu@hadoop102 spark]$ bin/spark-shell 
scala>spark.sql("show databases").show
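To also confirm that Spark can submit jobs to YARN, a SparkPi run is a convenient extra check (the examples jar name assumes the stock spark-2.4.6-bin-hadoop2.7 distribution, which is built against Scala 2.11):
[atguigu@hadoop102 spark]$ bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  examples/jars/spark-examples_2.11-2.4.6.jar 10
# the driver output should contain a line like "Pi is roughly 3.14..."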
2.1.5 Install Livy 0.3
1) Upload livy-server-0.3.0.zip to the /opt/software directory on hadoop102 and unzip it to /opt/module
[atguigu@hadoop102 software]$ unzip livy-server-0.3.0.zip -d /opt/module/
2) Rename /opt/module/livy-server-0.3.0 to livy
[atguigu@hadoop102 module]$ mv livy-server-0.3.0/ livy
3) Edit the /opt/module/livy/conf/livy-env.sh file and add the Livy environment variables
 
export HADOOP_CONF_DIR=/opt/module/hadoop-2.7.2/etc/hadoop/ 
export SPARK_HOME=/opt/module/spark/
4) Edit the /opt/module/livy/conf/livy.conf file and configure the Livy and Spark related parameters
livy.server.host = hadoop102 
livy.spark.master =yarn 
livy.spark.deployMode = client 
livy.repl.enableHiveContext = true 
livy.server.port = 8998
5) Configure the required environment variables
[atguigu@hadoop102 conf]$ sudo vim /etc/profile 
#SPARK_HOME 
export SPARK_HOME=/opt/module/spark 
export PATH=$PATH:$SPARK_HOME/bin
[atguigu@hadoop102 conf]$ source /etc/profile
6) From the /opt/module/livy/ directory, start the Livy service
 
[atguigu@hadoop102 livy]$ bin/livy-server start
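Livy serves its REST API on port 8998 (configured above), so a quick request can confirm the service is up; this is only a sanity check:
[atguigu@hadoop102 ~]$ curl http://hadoop102:8998/sessions
# a fresh server should return something like {"from":0,"total":0,"sessions":[]}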
2.1.6 Initialize the MySQL Database
1) Upload Init_quartz_mysql_innodb.sql to the /opt/software directory on hadoop102
2) In MySQL, create the quartz database and run the script to initialize the tables
[atguigu@hadoop102 ~]$ mysql -uroot -p123456 
mysql> create database quartz; 
mysql> use quartz; 
mysql> source /opt/software/Init_quartz_mysql_innodb.sql 
mysql> show tables;
2.2 Build Griffin (optional, not the route chosen here)
2.2.1 Install Maven
1) Download Maven: https://maven.apache.org/download.cgi
2) Upload apache-maven-3.6.1-bin.tar.gz to the /opt/software directory on Linux
3) Extract apache-maven-3.6.1-bin.tar.gz to the /opt/module/ directory
[atguigu@hadoop102 software]$ tar -zxvf apache-maven-3.6.1-bin.tar.gz -C /opt/module/
4) Rename apache-maven-3.6.1 to maven
[atguigu@hadoop102 module]$ mv apache-maven-3.6.1/ maven
5) Add the environment variables to /etc/profile
[atguigu@hadoop102 module]$ sudo vim /etc/profile
#MAVEN_HOME
export MAVEN_HOME=/opt/module/maven 
export PATH=$PATH:$MAVEN_HOME/bin
6) Verify the installation
[atguigu@hadoop102 module]$ source /etc/profile 
[atguigu@hadoop102 module]$ mvn -v
7) Edit settings.xml and point the mirror at Aliyun
[atguigu@hadoop102 maven]$ cd conf
[atguigu@hadoop102 conf]$ vim settings.xml
<!-- Add the Aliyun mirror -->
<mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
<mirror>
    <id>UK</id>
    <name>UK Central</name>
    <url>http://uk.maven.org/maven2</url>
    <mirrorOf>central</mirrorOf>
</mirror>
<mirror>
    <id>repo1</id>
    <mirrorOf>central</mirrorOf>
    <name>Human Readable Name for this Mirror.</name>
    <url>http://repo1.maven.org/maven2/</url>
</mirror>
<mirror>
    <id>repo2</id>
    <mirrorOf>central</mirrorOf>
    <name>Human Readable Name for this Mirror.</name>
    <url>http://repo2.maven.org/maven2/</url>
</mirror>
8) Create a .m2 folder under the /home/atguigu directory
[atguigu@hadoop102 ~]$ mkdir .m2
2.2.2 Modify the Configuration Files
1) Upload griffin-master.zip to the /opt/software directory on hadoop102 and unzip it to /opt/module
[atguigu@hadoop102 software]$ unzip griffin-master.zip -d /opt/module/
2) Edit the /opt/module/griffin-master/ui/pom.xml file and add the node and npm download sources.
[atguigu@hadoop102 ui]$ vim pom.xml 
<!-- It will install nodejs and npm --> 
<execution>
    <id>install node and npm</id>
    <goals>
        <goal>install-node-and-npm</goal>
    </goals>
    <configuration>
        <nodeVersion>${node.version}</nodeVersion>
        <npmVersion>${npm.version}</npmVersion>
        <nodeDownloadRoot>http://nodejs.org/dist/</nodeDownloadRoot>
        <npmDownloadRoot>http://registry.npmjs.org/npm/-/</npmDownloadRoot>
    </configuration>
</execution>
3) Edit the /opt/module/griffin-master/service/pom.xml file, comment out the org.postgresql dependency, and add the mysql dependency.
[atguigu@hadoop102 service]$ vim pom.xml 
<!-- 
<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>${postgresql.version}</version>
</dependency>
--> 
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
</dependency>
Note: remove the version number.
4) Edit the /opt/module/griffin-master/service/src/main/resources/application.properties file
[atguigu@hadoop102 service]$ vim /opt/module/griffin-master/service/src/main/resources/application.properties 
# Apache Griffin application name
spring.application.name=griffin_service 
# MySQL database configuration
spring.datasource.url=jdbc:mysql://hadoop102:3306/quartz?autoReconnect=true&useSSL=false 
spring.datasource.username=root
spring.datasource.password=123456 
spring.jpa.generate-ddl=true 
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.jpa.show-sql=true 
# Hive metastore configuration
hive.metastore.uris=thrift://hadoop102:9083
hive.metastore.dbname=default
hive.hmshandler.retry.attempts=15
hive.hmshandler.retry.interval=2000ms 
# Hive cache time
cache.evict.hive.fixedRate.in.milliseconds=900000 
# Kafka schema registry (configure as needed)
kafka.schema.registry.url=http://hadoop102:8081 
# Update job instance state at regular intervals
jobInstance.fixedDelay.in.milliseconds=60000 
# Expired time of job instance which is 7 days that is 604800000 milliseconds.Time unit only supports milliseconds
jobInstance.expired.milliseconds=604800000 
# schedule predicate job every 5 minutes and repeat 12 times at most 
# interval time unit s:second m:minute h:hour d:day,only support these four units 
predicate.job.interval=5m 
predicate.job.repeat.count=12 
# external properties directory location 
external.config.location= 
# external BATCH or STREAMING env
external.env.location= 
# login strategy ("default" or "ldap")
login.strategy=default 
# ldap 
ldap.url=ldap://hostname:port 
ldap.email=@example.com
ldap.searchBase=DC=org,DC=example
ldap.searchPattern=(sAMAccountName={0}) 
# hdfs default name 
fs.defaultFS= 
# elasticsearch 
elasticsearch.host=hadoop102 
elasticsearch.port=9200 
elasticsearch.scheme=http
# elasticsearch.user = user
# elasticsearch.password = password
# livy
livy.uri=http://hadoop102:8998/batches
# yarn url
yarn.uri=http://hadoop103:8088
# griffin event listener
internal.event.listeners=GriffinJobEventHook
 
5) Edit the /opt/module/griffin-master/service/src/main/resources/sparkProperties.json file
[atguigu@hadoop102 service]$ vim /opt/module/griffin-master/service/src/main/resources/sparkProperties.json
{
  "file": "hdfs://hadoop102:9000/griffin/griffin-measure.jar",
  "className": "org.apache.griffin.measure.Application",
  "name": "griffin",
  "queue": "default",
  "numExecutors": 2,
  "executorCores": 1,
  "driverMemory": "1g",
  "executorMemory": "1g",
  "conf": {
    "spark.yarn.dist.files": "hdfs://hadoop102:9000/home/spark_conf/hive-site.xml"
  },
  "files": []
}
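Note that this file references HDFS paths that must exist before measure jobs can run. Assuming the paths are kept as above, they can be prepared roughly as follows; the griffin-measure.jar itself comes from the build in section 2.2 (or from a pre-built package), so upload it once you have it:
[atguigu@hadoop102 ~]$ hadoop fs -mkdir -p /griffin /home/spark_conf
[atguigu@hadoop102 ~]$ hadoop fs -put /opt/module/hive/conf/hive-site.xml /home/spark_conf/
[atguigu@hadoop102 ~]$ hadoop fs -put griffin-measure.jar /griffin/   # local path of the jar depends on where it was built or unpacked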
6) Edit the /opt/module/griffin-master/service/src/main/resources/env/env_batch.json file
[atguigu@hadoop102 service]$ vim /opt/module/griffin-master/service/src/main/resources/env/env_batch.json 
{
  "spark": {
    "log.level": "INFO"
  },
  "sinks": [
    {
      "type": "CONSOLE",
      "config": { "max.log.lines": 10 }
    },
    {
      "type": "HDFS",
      "config": {
        "path": "hdfs://hadoop102:9000/griffin/persist",
        "max.persist.lines": 10000,
        "max.lines.per.file": 10000
      }
    },
    {
      "type": "ELASTICSEARCH",
      "config": {
        "method": "post",
        "api": "http://hadoop102:9200/griffin/accuracy",
        "connection.timeout": "1m",
        "retry": 10
      }
    }
  ],
  "griffin.checkpoint": []
}
7) Edit the /opt/module/griffin-master/service/src/main/resources/env/env_streaming.json file
[atguigu@hadoop102 service]$ vim /opt/module/griffin-master/service/src/main/resources/env/env_streaming.json
{
  "spark": {
    "log.level": "WARN",
    "checkpoint.dir": "hdfs:///griffin/checkpoint/${JOB_NAME}",
    "init.clear": true,
    "batch.interval": "1m",
    "process.interval": "5m",
    "config": {
      "spark.default.parallelism": 4,
      "spark.task.maxFailures": 5,
      "spark.streaming.kafkaMaxRatePerPartition": 1000,
      "spark.streaming.concurrentJobs": 4,
      "spark.yarn.maxAppAttempts": 5,
      "spark.yarn.am.attemptFailuresValidityInterval": "1h",
      "spark.yarn.max.executor.failures": 120,
      "spark.yarn.executor.failuresValidityInterval": "1h",
      "spark.hadoop.fs.hdfs.impl.disable.cache": true
    }
  },
  "sinks": [
    {
      "type": "CONSOLE",
      "config": { "max.log.lines": 100 }
    },
    {
      "type": "HDFS",
      "config": {
        "path": "hdfs://hadoop102:9000/griffin/persist",
        "max.persist.lines": 10000,
        "max.lines.per.file": 10000
      }
    },
    {
      "type": "ELASTICSEARCH",
      "config": {
        "method": "post",
        "api": "http://hadoop102:9200/griffin/accuracy"
      }
    }
  ],
  "griffin.checkpoint": [
    {
      "type": "zk",
      "config": {
        "hosts": "zk:2181",
        "namespace": "griffin/infocache",
        "lock.path": "lock",
        "mode": "persist",
        "init.clear": true,
        "close.clear": false
      }
    }
  ]
}
8) Edit the /opt/module/griffin-master/service/src/main/resources/quartz.properties file
[atguigu@hadoop102 service]$ vim /opt/module/griffin-master/service/src/main/resources/quartz.properties 
org.quartz.scheduler.instanceName=spring-boot-quartz 
org.quartz.scheduler.instanceId=AUTO 
org.quartz.threadPool.threadCount=5 
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX 
# If you use postgresql as your database,set this property value to org.quartz.impl.jdbcjobstore.PostgreSQLDelegate 
# If you use mysql as your database,set this property value to org.quartz.impl.jdbcjobstore.StdJDBCDelegate 
# If you use h2 as your database, it's ok to set this property value to StdJDBCDelegate, PostgreSQLDelegate or others 
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate 
org.quartz.jobStore.useProperties=true
org.quartz.jobStore.misfireThreshold=60000
org.quartz.jobStore.tablePrefix=QRTZ_
org.quartz.jobStore.isClustered=true
org.quartz.jobStore.clusterCheckinInterval=20000
2.2.3 Run the Build
1) In the /opt/module/griffin-master directory, run the Maven command to begin compiling the Griffin source code
[atguigu@hadoop102 griffin-master]$ mvn -Dmaven.test.skip=true clean install
2) When the build completes, Maven reports BUILD SUCCESS (screenshots omitted); the build takes roughly an hour.
This post is from cnblogs (博客园), by 秋华. Please cite the original link when reposting: https://www.cnblogs.com/qiu-hua/p/13747505.html