Big Data (Sqoop)

1. What is Sqoop
   Sqoop is an open-source auxiliary tool for Hadoop from the Apache organization.
2. Purpose:
   transferring data between HDFS/Hive and relational databases (RDB), in both directions

Setting up the Sqoop development environment

CDH builds of Hadoop and Hive
1. Environment preparation
   MAC address, firewall, IP, hostname and hosts mapping, SELinux, JDK (a sketch of typical commands follows)
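   A minimal sketch of this preparation on CentOS 6 (the hostname `sqoop`, the IP, and the JDK path are assumptions; adjust to your machine):

   # turn off the firewall (CentOS 6 style; use systemctl on CentOS 7)
   service iptables stop
   chkconfig iptables off
   # disable SELinux permanently (takes effect after reboot)
   sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
   # set the hostname and map it to the static IP
   hostname sqoop
   echo '192.168.1.100 sqoop' >> /etc/hosts
   # point JAVA_HOME at the installed JDK
   echo 'export JAVA_HOME=/opt/install/jdk1.7.0_79' >> /etc/profile
   echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
   source /etc/profile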
2. Install Hadoop
   1. Decompress the tarball
   2. hadoop-env.sh (set JAVA_HOME), then edit the config files below:
      core-site.xml
       <property>
            <name>fs.default.name</name>
            <value>hdfs://sqoop:8020</value>
       </property>
       <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/install/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
       </property>

      hdfs-site.xml
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
      </property>
      
      yarn-site.xml
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      mapred-site.xml
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
       </property>
      slaves
      change its content to: sqoop
   3. Decompress cdh5.3.6-snappy-lib-natirve.tar.gz and copy the native libraries
      lib/native/*  into  /opt/install/hadoop-2.5.0-cdh5.3.6/lib/native
      command (run from the extracted lib/native directory): cp * /opt/install/hadoop-2.5.0-cdh5.3.6/lib/native/
   4. Format the NameNode   bin/hdfs namenode -format
   5. Start the daemons
      sbin/hadoop-daemon.sh start namenode
      sbin/yarn-daemon.sh start resourcemanager
      sbin/yarn-daemon.sh start nodemanager
      sbin/hadoop-daemon.sh start datanode
      If jps output looks like the following, startup succeeded:
[root@sqoop hadoop-2.5.0-cdh5.3.6]# jps
1609 ResourceManager
2021 Jps
1858 NodeManager
1982 DataNode
1485 NameNode
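
A quick sanity check that HDFS is answering (optional):
[root@sqoop hadoop-2.5.0-cdh5.3.6]# bin/hdfs dfs -ls /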


 3. Install Hive
    1. Decompress the tarball
    2. Edit the configuration files
       hive-env.sh
           # Set HADOOP_HOME to point to a specific hadoop install directory
           HADOOP_HOME=/opt/install/hadoop-2.5.0-cdh5.3.6
           # Hive Configuration Directory can be controlled by:
           export HIVE_CONF_DIR=/opt/install/hive-0.13.1-cdh5.3.6/conf
       hive-site.xml [metastore]
        <property>
        <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:mysql://hadoop5.lhcedu.com:3306/cdhmetastore?createDatabaseIfNotExist=true</value>          
        </property>

        <property>
          <name>javax.jdo.option.ConnectionDriverName</name>
          <value>com.mysql.jdbc.Driver</value>
        </property>

        <property>
          <name>javax.jdo.option.ConnectionUserName</name>
          <value>root</value>
        </property>

        <property>
          <name>javax.jdo.option.ConnectionPassword</name>
          <value>123456</value>
        </property>

        <property>
          <name>hive.cli.print.header</name>
          <value>true</value>
        </property>

        <property>
          <name>hive.cli.print.current.db</name>
          <value>true</value>
        </property>
       3. Create two directories in HDFS:
               /tmp
               /user/hive/warehouse
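          A sketch of the commands (the chmod lines follow Hive's setup recommendation; adjust if your cluster enforces stricter permissions):
          bin/hdfs dfs -mkdir -p /tmp
          bin/hdfs dfs -mkdir -p /user/hive/warehouse
          bin/hdfs dfs -chmod g+w /tmp
          bin/hdfs dfs -chmod g+w /user/hive/warehouse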
       4. Copy the MySQL JDBC driver jar into hive_home/lib
       5. Start Hive
          bin/hive
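          A quick check that the metastore connection works (with createDatabaseIfNotExist=true in the URL, the cdhmetastore database should appear in MySQL on first use):
          hive> show databases;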
#Sqoop installation
1. Decompress Sqoop
2. Edit the configuration under sqoop_home/conf
 cp sqoop-env-template.sh sqoop-env.sh

   edit conf/sqoop-env.sh
   export HADOOP_COMMON_HOME=/opt/install/hadoop-2.5.0-cdh5.3.6
   export HADOOP_MAPRED_HOME=/opt/install/hadoop-2.5.0-cdh5.3.6
   export HIVE_HOME=/opt/install/hive-0.13.1-cdh5.3.6
3. Copy the MySQL connector jar into sqoop_home/lib
4. Test that Sqoop works:
   bin/sqoop list-databases --connect jdbc:mysql://sqoop:3306 --username root --password 123456

The same command split across lines (trailing backslashes are needed for shell line continuation):

bin/sqoop list-databases \
--connect \
jdbc:mysql://sqoop:3306 \
--username root \
--password 123456

import (moving data from MySQL into HDFS)

#create a database and a table in MySQL
create database sqoop;
use sqoop;

create table mysql_user(
id int primary key,
name varchar(12)
);
insert into mysql_user values (1,'lhc1');
insert into mysql_user values (2,'lhc2');
insert into mysql_user values (3,'lhc3');
insert into mysql_user values (4,'lhc4');
insert into mysql_user values (5,'lhc5');

Basic import

bin/sqoop import \
--connect \
jdbc:mysql://sqoop:3306/sqoop \
--username root \
--password 123456 \
--table mysql_user

#default HDFS target directory: /user/root/mysql_user
#by default, 5 rows -> 5 splits -> 5 map tasks
#there is no reduce phase
#columns are separated by ',' by default, e.g. 1,lhc1
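
You can inspect the result directly (part-m-* is MapReduce's standard naming for map output files; the exact names vary per run):

bin/hdfs dfs -ls /user/root/mysql_user
bin/hdfs dfs -cat /user/root/mysql_user/part-m-*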

Specifying the import directory in HDFS

bin/sqoop import \
--connect \
jdbc:mysql://sqoop:3306/sqoop \
--username root \
--password 123456 \
--table mysql_user \
--target-dir /sqoop

Deleting a target directory that already exists in HDFS

bin/sqoop import \
--connect \
jdbc:mysql://sqoop:3306/sqoop \
--username root \
--password 123456 \
--table mysql_user \
--target-dir /sqoop \
--delete-target-dir

Changing the default number of map tasks

bin/sqoop import \
--connect \
jdbc:mysql://hadoop6:3306/sqoop \
--username root \
--password 123456 \
--table mysql_user \
--target-dir /sqoop \
--delete-target-dir \
--num-mappers 1

Changing the default column delimiter

bin/sqoop import \
--connect \
jdbc:mysql://hadoop6:3306/sqoop \
--username root \
--password 123456 \
--table mysql_user \
--target-dir /sqoop \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by '\t'

Fast import (direct mode)

bin/sqoop import \
--connect \
jdbc:mysql://hadoop6:3306/sqoop \
--username root \
--password 123456 \
--table mysql_user \
--target-dir /sqoop \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by '\t' \
--direct

#direct mode requires sqoop and mysql to be installed on the same machine
#in a cluster where sqoop and mysql are not co-located, the mysql/bin utilities must be copied to every node

Incremental import

# incremental import options
--check-column <column>       source column examined to decide which rows are new (here: id)
--incremental <import-type>   incremental mode: 'append' or 'lastmodified'
--last-value <value>          maximum value of the check column from the previous import (here: 5)

#note: --delete-target-dir cannot be combined with an incremental import
bin/sqoop import \
--connect \
jdbc:mysql://hadoop6:3306/sqoop \
--username root \
--password 123456 \
--table mysql_user \
--target-dir /sqoop \
--num-mappers 1 \
--fields-terminated-by '\t' \
--direct \
--check-column id \
--last-value 5 \
--incremental append
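
To see append mode at work, insert rows past the last value and rerun the command above (a continuation of the earlier sample data):

insert into mysql_user values (6,'lhc6');
insert into mysql_user values (7,'lhc7');
-- rerunning with --last-value 5 appends only ids 6 and 7 to /sqoop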

export (moving data from HDFS out to the database)

First make sure the target table exists in the database:
create table to_mysql(
id int primary key,
name varchar(12)
);

bin/sqoop export \
--connect \
jdbc:mysql://sqoop:3306/sqoop \
--username root \
--password 123456 \
--table to_mysql \
--export-dir /sqoop \
--num-mappers 1 \
--input-fields-terminated-by ','
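
Once the export finishes, the rows should be visible in MySQL:

select * from to_mysql;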

Hive Import

Adding the following three options to a normal import sends the data into a Hive table instead:

--hive-import
--hive-database lhc129
--hive-table t_user

bin/sqoop import \
--connect \
jdbc:mysql://hadoop6:3306/sqoop \
--username root \
--password 123456 \
--table mysql_user \
--delete-target-dir \
--hive-import \
--hive-database lhc_140 \
--hive-table t_user \
--num-mappers 1 \
--fields-terminated-by '\t'
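
A quick check from the shell (assumes the lhc_140 Hive database used above exists):

bin/hive -e "select * from lhc_140.t_user;"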

Hive Export

bin/sqoop export \
--connect \
jdbc:mysql://hadoop6:3306/sqoop \
--username root \
--password 123456 \
--table hive_mysql \
--export-dir /user/hive/warehouse/lhc_140.db/t_user \
--num-mappers 1 \
--input-fields-terminated-by '\t'
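
As with any export, the target table must already exist in MySQL; a sketch mirroring t_user's two columns (adjust to your actual schema):

create table hive_mysql(
id int primary key,
name varchar(12)
);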

Scripted Sqoop

  The whole command lives in a standalone options file.

create table filetomysql(
 id int,
 name varchar(12)
);
1. Create a Sqoop options file (an ordinary text file with one option or value per line), e.g. sqoop.file
export 
--connect
jdbc:mysql://hadoop6:3306/sqoop
--username
root
--password
123456
--table
filetomysql
--export-dir
/sqoop
--num-mappers
1
--input-fields-terminated-by
'\t'

2. Execute the file
bin/sqoop --options-file /root/sqoop.file 

Defining Sqoop jobs

1. Create a job (the bare -- separates the job tool's arguments from the stored command)
bin/sqoop job \
--create test_job1 \
-- \
export \
--connect \
jdbc:mysql://hadoop6:3306/sqoop \
--username root \
--password 123456 \
--table filetomysql \
--export-dir /sqoop \
--num-mappers 1 \
--input-fields-terminated-by '\t'

2. Run the job
bin/sqoop job --exec test_job1
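
Other subcommands of the job tool that come in handy:

bin/sqoop job --list               # list all saved jobs
bin/sqoop job --show test_job1     # show a saved job's parameters
bin/sqoop job --delete test_job1   # delete a saved job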

3. Problem: each run prompts for the password, which gets in the way of automation, so store the password in a file:

echo -n "123456" > /root/password
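
The -n flag keeps the trailing newline out of the password file (it would otherwise become part of the password). The Sqoop docs also recommend restricting the file's permissions:

chmod 400 /root/password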

bin/sqoop job \
--create test_job2 \
-- \
export \
--connect \
jdbc:mysql://hadoop6:3306/sqoop \
--username root \
--password-file file:///root/password \
--table filetomysql \
--export-dir /sqoop \
--num-mappers 1 \
--input-fields-terminated-by '\t'

bin/sqoop job --exec test_job2

Scheduled execution with cron

1. Install crontab
yum -y install vixie-cron

2. crontab -e
edit the crontab; each entry has the form:
minute   hour   day   month   week   command

*/5 * * * * command
*/1 * * * * echo 'suns' >> /root/sunshuai
*/1 * * * * /opt/install/sqoop-1.4.5-cdh5.3.6/bin/sqoop job --exec test_job2

# once a day at midnight (the hour field only accepts 0-23)
0 0 * * * /opt/install/sqoop-1.4.5-cdh5.3.6/bin/sqoop job --exec test_job2

3. Start/stop the service
/sbin/service crond start
/sbin/service crond stop

ps -ef | grep cron
Original post: https://www.cnblogs.com/lhc-hhh/p/10380760.html