Hadoop: The Definitive Guide — Study Notes 2 (Sqoop)

6. Sqoop

Apache Sqoop is an open-source tool that allows users to extract data from a structured data store into Hadoop, or back out again.

1) running Sqoop

from the installation directory:        $SQOOP_HOME/bin/sqoop

from a packaged install: sqoop (the default location is /usr/bin/sqoop)

2) common commands

    sqoop help           # list the available tools

    sqoop help import    # given a tool name, print that tool's usage

    sqoop import         # run the tool

    sqoop-toolname       # alternate way to run a tool, e.g. sqoop-import

3) Sqoop Connector

Sqoop has an extension framework that makes it possible to import data from, and export data to, any external storage system that has bulk data transfer capabilities.

A Sqoop connector is a modular component built on this framework that enables imports and exports for a particular storage system.
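As a hedged sketch of how connectors come into play: Sqoop picks a connector based on the JDBC connection URL, and for databases without a dedicated connector the generic JDBC connector can be selected with `--driver` (the PostgreSQL URL and driver class below are illustrative, not from the original notes).

```shell
# Connector selection is driven by the JDBC URL; --driver forces the
# generic JDBC connector with an explicit driver class (example values).
% sqoop import \
    --connect jdbc:postgresql://localhost/hadoopDB \
    --driver org.postgresql.Driver \
    --table myTest
```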

4) An import example:

% sqoop import --connect jdbc:mysql://localhost/hadoopDB --table myTest -m 1
 
  • The import tool runs a MapReduce job that connects to the MySQL database and reads the table.
  • By default, the job uses four map tasks; here -m 1 requests a single map task.
  • By default, the output is comma-delimited text files.
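To see what the import produced, the output can be inspected with the Hadoop filesystem shell. The paths below assume the defaults (table directory under the user's HDFS home directory, one output file because a single map task was used).

```shell
# List the imported table directory and view the comma-delimited output
# of the single map task (part-m-00000 assumes -m 1 as above).
% hadoop fs -ls myTest
% hadoop fs -cat myTest/part-m-00000
```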

5) generated code

Besides importing to HDFS, Sqoop also writes a generated Java source file for the table into the current local directory.

  • It can be used for special processing needs, such as parsing the imported records.
  • The codegen tool can generate the source code without performing an import:

% sqoop codegen --connect jdbc:mysql://localhost/hadoopDB --table myTest --class-name myNeed
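A quick sketch of what codegen leaves behind: the generated class (named by `--class-name`) encapsulates one row of the table. The filename below simply follows from the class name given above.

```shell
# After running codegen, the generated record class appears in the
# current local directory; it can be reused to parse imported data.
% ls myNeed.java
```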

6) import process

A better import process uses a splitting column to divide the table query across multiple map tasks running on multiple nodes; by default Sqoop splits on the table's primary key.
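The parallel import described above can be sketched as follows. Here `--split-by` names the splitting column and `-m` sets the number of map tasks; the column name `id` is an assumption for illustration, not a column from the original example.

```shell
# Parallel import sketch: Sqoop computes the min/max of the split column
# and gives each of the 4 map tasks one slice of that range to read.
% sqoop import \
    --connect jdbc:mysql://localhost/hadoopDB \
    --table myTest \
    --split-by id \
    -m 4
```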


Original post: https://www.cnblogs.com/skyEva/p/5471262.html