Installing Hadoop:
1. Download hadoop-0.20.203.0.tar.gz
Unpack it with tar -zxf hadoop-0.20.203.0.tar.gz
This produces the hadoop-0.20.203.0 directory.
2. Edit /etc/profile with vi (or modify conf/hadoop-env.sh)
and add the following lines to it:
#This is Hadoop
#HADOOP_HOME is an environment variable that points to the Hadoop installation directory
export HADOOP_HOME=/usr/hadoop-0.20.203.0
export PATH=$PATH:$HADOOP_HOME/bin
(JAVA_HOME must also be configured, and it must be set in conf/hadoop-env.sh; see the sketch below.)
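A minimal sketch of the JAVA_HOME line to add to conf/hadoop-env.sh; the JDK path below is only an assumption, so replace it with the actual JDK location on your machine:
#Set this to the root of your Java installation (example path)
export JAVA_HOME=/usr/java/jdk1.6.0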
3. Reboot, or run source /etc/profile, so that the changes take effect.
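As a quick sanity check (optional, not part of the original steps), confirm that the hadoop command is now on the PATH:
#hadoop version
//should print the Hadoop version, e.g. 0.20.203.0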
Configuring Hadoop:
(1) Standalone (local) mode:
Test 1:
#mkdir input
#cd input
#echo "hello world" > test1.txt
#echo "hello hadoop" > test2.txt
#cd ..
#hadoop jar /usr/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar wordcount input output
View the results:
#cat output/*
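Since wordcount is deterministic, the two files above should give output along these lines (word, then count):
hadoop  1
hello   2
world   1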
Test 2:
#mkdir input2
#cp /usr/hadoop-0.20.203.0/conf/*.xml input2
#hadoop jar /usr/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar grep input2 output2 'dfs[a-z.]+'
View the results:
#cat output2/*
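Note that Hadoop refuses to write into an output directory that already exists, so remove it before re-running a job. A sketch, reusing the directories from Test 2:
#rm -rf output2
#hadoop jar /usr/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar grep input2 output2 'dfs[a-z.]+'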
(2) Pseudo-distributed mode:
Passwordless SSH setup:
Side note: (1. the system-wide SSH configuration lives in /etc/ssh; 2. once you have run ssh localhost, i.e. connected to the local machine, ~/.ssh/known_hosts is created in your home directory.)
1. Generate a key pair:
#ssh-keygen -t rsa
//afterwards, two files, id_rsa and id_rsa.pub, appear in the ~/.ssh directory
2. In ~/.ssh, copy the public key into the authorized keys file:
#cp id_rsa.pub authorized_keys
After this, #ssh localhost connects over SSH without prompting for a password. (A condensed variant is sketched below.)
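A non-interactive equivalent of the two steps above (the empty passphrase and the chmod are assumptions of this sketch, not requirements stated earlier):
#ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
#cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#chmod 600 ~/.ssh/authorized_keys
#ssh localhost
//should log in without asking for a password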
Configuration under conf:
core-site.xml and hdfs-site.xml are the configuration files seen from the HDFS side; core-site.xml and mapred-site.xml are the configuration files seen from the MapReduce side.
Contents of core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Contents of hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Contents of mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Running Hadoop:
Format the distributed file system: hadoop namenode -format
Start the Hadoop daemons: start-all.sh
You can use jps to check whether the daemons have started; jps is the JVM Process Status tool, which lists the Java processes running for the current user.
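A sketch of what a quick health check might look like; the daemon names are what start-all.sh launches in this Hadoop release, and the web UI ports are the defaults:
#jps
//expect NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself);
//if one is missing, check the logs under $HADOOP_HOME/logs
You can also open the web interfaces in a browser:
http://localhost:50070   //NameNode (HDFS status)
http://localhost:50030   //JobTracker (job status)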
Run the wordcount example:
#hadoop jar /usr/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar wordcount input output
#cat output/*
or, since in pseudo-distributed mode the paths refer to HDFS, copy the input into HDFS first:
#hadoop fs -copyFromLocal input in
#hadoop jar /usr/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar wordcount in out
#hadoop fs -cat out/*
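To pull the job output back out of HDFS onto the local disk (an optional extra step, not part of the original notes; the local directory name is arbitrary):
#hadoop fs -copyToLocal out ./wordcount-out
#cat ./wordcount-out/part-*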
Stop the daemons: stop-all.sh