Flume

Step 1. Download the tar package from the Apache website.

Step 2. Extract the file and set up the environment.

vim /etc/profile

export FLUME_HOME=/home/hadoop/flume1.4

export PATH=$PATH:$FLUME_HOME/bin

source /etc/profile

Step 3.

Start an agent that listens on an Avro source, then push files from a directory to it with the avro-client.

flume-ng agent -n agent1 -f confs/avrotest.conf

flume-ng avro-client -H namenode -p 55555 -F /home/hadoop/data/xml/*.*

flume-ng avro-client -H namenode -p 55555 -F /home/hadoop/data/xml/*.zip
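The avrotest.conf used above isn't shown in the original notes. A minimal sketch that matches the commands (agent name agent1, Avro source on port 55555; the component names avroSrc, memCh, and logSink are illustrative, and the logger sink is just a stand-in for inspection) might look like:

```properties
# avrotest.conf - sketch, assuming agent1 receives events over Avro on port 55555
agent1.sources = avroSrc
agent1.channels = memCh
agent1.sinks = logSink

# Avro source: listens for events sent by flume-ng avro-client
agent1.sources.avroSrc.type = avro
agent1.sources.avroSrc.bind = 0.0.0.0
agent1.sources.avroSrc.port = 55555
agent1.sources.avroSrc.channels = memCh

# In-memory channel buffering events between source and sink
agent1.channels.memCh.type = memory
agent1.channels.memCh.capacity = 1000

# Logger sink: prints events to the Flume log for verification
agent1.sinks.logSink.type = logger
agent1.sinks.logSink.channel = memCh
```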

During the transfer, the data is first delivered in Avro form, and then an avro source - hdfs sink agent takes over.

Start the Flume agent with the hdfs conf first, then start the one with the avro source conf.

flume-ng agent -n agent3 -f $FLUME_HOME/confs/hdfssink.conf

flume-ng agent -n agent2 -f $FLUME_HOME/confs/avrosink.conf
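Neither avrosink.conf nor hdfssink.conf appears in the original notes. A sketch of the two-agent chain, under these assumptions: agent3 (hdfssink.conf) exposes an Avro source on port 44444 and writes to HDFS, while agent2 (avrosink.conf) receives events on port 55555 and forwards them to agent3 over Avro. The port 44444, the HDFS path, and all component names are hypothetical; agent3 must be started first so agent2's Avro sink has something to connect to.

```properties
# hdfssink.conf - sketch: agent3 = avro source -> memory channel -> hdfs sink
agent3.sources = avroIn
agent3.channels = ch
agent3.sinks = toHdfs

agent3.sources.avroIn.type = avro
agent3.sources.avroIn.bind = 0.0.0.0
agent3.sources.avroIn.port = 44444
agent3.sources.avroIn.channels = ch

agent3.channels.ch.type = memory

agent3.sinks.toHdfs.type = hdfs
agent3.sinks.toHdfs.hdfs.path = hdfs://namenode:9000/flume/events
agent3.sinks.toHdfs.hdfs.fileType = DataStream
agent3.sinks.toHdfs.channel = ch
```

```properties
# avrosink.conf - sketch: agent2 = avro source -> memory channel -> avro sink
agent2.sources = avroIn
agent2.channels = ch
agent2.sinks = toAvro

agent2.sources.avroIn.type = avro
agent2.sources.avroIn.bind = 0.0.0.0
agent2.sources.avroIn.port = 55555
agent2.sources.avroIn.channels = ch

agent2.channels.ch.type = memory

# Avro sink: forwards every event to agent3's avro source
agent2.sinks.toAvro.type = avro
agent2.sinks.toAvro.hostname = namenode
agent2.sinks.toAvro.port = 44444
agent2.sinks.toAvro.channel = ch
```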


flume-ng agent -n agent1 -f $FLUME_HOME/confs/avrotest.conf


./flume-ng agent -n agent1 -f /home/yaxiaohu/flumeconf/flume-dest.conf

./flume-ng agent -n agent -f /home/yaxiaohu/flumeconf/flume-source.conf

./flume-ng agent -n agent-1 -f /home/yaxiaohu/flumeconf/evantest.conf


Actually it might be tricky to use the directory spooling source to read a
compressed archive. It's possible, but you would definitely need to write
your own deserializer.

Flume is an event-oriented streaming system; it's not really optimized to be a plain file transfer mechanism like FTP.
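For plain-text files, by contrast, the spooling directory source works without custom code. A minimal sketch (the agent name, spool directory, and component names are hypothetical; `LINE` is the default line-per-event deserializer, and a custom one would be plugged in here via the fully qualified class name of its Builder):

```properties
# spooltest.conf - sketch: watch a directory and emit one event per line
agent1.sources = spool
agent1.channels = ch
agent1.sinks = log

agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /home/hadoop/data/spool
# LINE is the default; a custom deserializer would replace it,
# e.g. deserializer = com.example.ZipDeserializer$Builder (hypothetical)
agent1.sources.spool.deserializer = LINE
agent1.sources.spool.channels = ch

agent1.channels.ch.type = memory

agent1.sinks.log.type = logger
agent1.sinks.log.channel = ch
```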

 

Original article: https://www.cnblogs.com/huaxiaoyao/p/3536457.html