Flume practice notes and Sqoop export from Hive to Oracle

# Start the Flume agents that receive the files

flume-ng agent --conf conf --conf-file conf1.conf --name a1

flume-ng agent --conf conf --conf-file conf2.conf --name hdfs-agent

flume-ng agent --conf conf --conf-file conf3.conf --name file-agent

   

conf1.conf

a1.sources = tail

a1.channels = c1

a1.sinks = avro-forward-sink

   

a1.channels.c1.type = file

#a1.channels.c1.capacity = 1000

#a1.channels.c1.transactionCapacity = 100

   

a1.sources.tail.type = spooldir

a1.sources.tail.spoolDir = /path/to/folder/

   

a1.sinks.avro-forward-sink.type = avro

a1.sinks.avro-forward-sink.hostname = hostname/ip

a1.sinks.avro-forward-sink.port = 12345

   

# Bind the source and sink to the channel

a1.sources.tail.channels = c1

a1.sinks.avro-forward-sink.channel = c1
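The spooling directory source expects a file to be complete and unchanged once it appears in spoolDir, and it marks each ingested file with a .COMPLETED suffix. A minimal sketch of one way to drop files in safely: copy into a staging directory first, then rename into the spool directory. The function name spool_put and the staging directory are assumptions for illustration, not part of Flume; the staging directory has to sit on the same filesystem as spoolDir so that mv is a plain rename.

```shell
#!/usr/bin/env bash
# spool_put SRC STAGING_DIR SPOOL_DIR   (hypothetical helper, not a Flume tool)
# Copies SRC into STAGING_DIR, then renames it into SPOOL_DIR so the
# Flume spooldir source never sees a half-written file.
spool_put() {
  local src=$1 staging=$2 spool=$3 name
  # Timestamp plus PID keeps names from being reused in the spool directory.
  name="$(basename "$src").$(date +%s).$$"
  if [ -e "$spool/$name" ]; then
    echo "refusing to overwrite $spool/$name" >&2
    return 1
  fi
  cp -- "$src" "$staging/$name" || return 1
  mv -- "$staging/$name" "$spool/$name" || return 1
  # Print the final path so the caller can log it.
  printf '%s\n' "$spool/$name"
}

# Example use (paths are placeholders):
#   spool_put /tmp/app.log /path/to/staging /path/to/folder
```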

   

conf2.conf

hdfs-agent.sources = avro-collect

hdfs-agent.sinks = hdfs-write

hdfs-agent.channels = ch1

hdfs-agent.channels.ch1.type = file

#hdfs-agent.channels.ch1.capacity = 1000

#hdfs-agent.channels.ch1.transactionCapacity = 100

   

hdfs-agent.sources.avro-collect.type = avro

hdfs-agent.sources.avro-collect.bind = 10.59.123.69

hdfs-agent.sources.avro-collect.port = 12345

   

hdfs-agent.sinks.hdfs-write.type = hdfs

hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://namenode/user/usera/test/

hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text

   

# Bind the source and sink to the channel

hdfs-agent.sources.avro-collect.channels = ch1

hdfs-agent.sinks.hdfs-write.channel = ch1

   

Start the conf2.conf agent first, then the conf1.conf agent.

The Avro source has to be listening before the Avro sink can connect to it.
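The start order can be scripted instead of timed by hand. The sketch below polls the Avro source port until it accepts a connection. It relies on bash's /dev/tcp redirection, so it assumes bash rather than plain sh; wait_for_port is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [TRIES]   (hypothetical helper)
# Tries to open a TCP connection once per second, up to TRIES times
# (default 30). Returns 0 as soon as the port accepts, 1 otherwise.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30} i=1
  while :; do
    # The subshell opens and immediately drops the connection.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    [ "$i" -ge "$tries" ] && return 1
    i=$((i + 1))
    sleep 1
  done
}

# Example use, with the agents from this post (not run here):
#   flume-ng agent --conf conf --conf-file conf2.conf --name hdfs-agent &
#   wait_for_port 10.59.123.69 12345 &&
#     flume-ng agent --conf conf --conf-file conf1.conf --name a1
```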

# With a memory channel, the agent failed with:

org.apache.flume.ChannelException: Unable to put batch on required channel: org.apache.flume.channel.MemoryChannel{name: ch1}

# After switching to a file channel, it worked.

   

# Batch-rename the processed files: strip the .COMPLETED suffix

for f in *.COMPLETED
do
    mv -- "$f" "${f%.COMPLETED}"
done
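A slightly more defensive variant of the rename loop, as a sketch: it works on a given directory, does nothing when no file matches, and skips a file when the target name already exists instead of overwriting it. strip_completed is a hypothetical name.

```shell
#!/usr/bin/env bash
# strip_completed DIR   (hypothetical helper)
# Removes a trailing .COMPLETED from every matching file name in DIR.
strip_completed() {
  local dir=$1 f target
  for f in "$dir"/*.COMPLETED; do
    [ -e "$f" ] || continue        # the glob matched nothing
    target=${f%.COMPLETED}
    if [ -e "$target" ]; then
      echo "skip: $target already exists" >&2
      continue
    fi
    mv -- "$f" "$target"
  done
}

# Example use (path is a placeholder):
#   strip_completed /path/to/folder
```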

   

Sqoop: export data from Hive to Oracle:

sqoop export -D oraoop.disabled=true \
    --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=hostname)(port=port))(connect_data=(service_name=sname)))" \
    --username user_USER \
    --password pwd \
    --table EVAN_TEST \
    --fields-terminated-by '\001' \
    -m 1 \
    --export-dir /path/to/folder/

   

# The table name must be in upper case; otherwise Sqoop throws an exception saying it cannot find the column information.
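Oracle stores unquoted identifiers in upper case, so one way to avoid this pitfall is to normalise the table name before passing it to sqoop. A minimal sketch; upper_table is a hypothetical helper and evan_test is just the example table from above written in lower case.

```shell
#!/usr/bin/env bash
# upper_table NAME   (hypothetical helper)
# Prints NAME converted to upper case.
upper_table() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

TABLE=$(upper_table "evan_test")
echo "$TABLE"   # prints EVAN_TEST
```

The result can then be passed as --table "$TABLE" in the sqoop export command.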

Original article: https://www.cnblogs.com/huaxiaoyao/p/4550083.html