Compiling Hadoop 2.5.2 from Source and Importing into Eclipse

Preface: The official Hadoop releases are not compiled for 64-bit, so when we need 64-bit Hadoop we have to build the source ourselves on a 64-bit Linux system. Likewise, to browse and modify the Hadoop source inside Eclipse, a build is also required. Both build procedures are listed below; the preparation stage is common to both and must be completed either way.

Environment: Ubuntu 12.04 64-bit, hadoop-2.5.2-src.tar.gz, JDK 1.7 (JDK 1.8 does not work)
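The build also requires JAVA_HOME to point at the JDK 1.7 installation. A typical /etc/profile entry might look like this (the JDK path below is an assumption; adjust it to wherever your JDK actually lives):

#JDK
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin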


Preparation:

1. Install g++, CMake, and zlib1g-dev. g++ is always required, while CMake and zlib1g-dev (i.e. the zlib development package) are only needed when compiling the native library. Building the source for import into Eclipse does not involve the native library, so if you skip the native build you can also skip these two packages; if you do want to compile the native library, you must install them.

Native library: Hadoop is written in Java, but some requirements and operations are not a good fit for Java, so Hadoop ships its own native library functions. Hadoop's configuration files let you specify whether the native library is used; with it, Hadoop can carry out certain operations more efficiently.

Install command: sudo apt-get install g++ cmake zlib1g-dev
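As an aside, once a Hadoop installation is up and running you can check whether the native library is actually being loaded. Hadoop 2.x ships a checker for this; the output below is only an illustrative sketch, and the paths on your machine will differ:

hadoop checknative -a
# hadoop: true /usr/local/hadoop/lib/native/libhadoop.so.1.0.0
# zlib:   true /lib/x86_64-linux-gnu/libz.so.1
# snappy: false
# ...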

Install Apache Forrest

Apache Forrest is used when building the Hadoop documentation. Download it from a mirror:

http://forrest.apache.org/mirrors.cgi

Install it and set FORREST_HOME in your profile.
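A minimal /etc/profile entry might look like the following (the install path and Forrest version are assumptions; point it at wherever you extracted Forrest):

#Forrest
export FORREST_HOME=/usr/local/apache-forrest-0.9
export PATH=$PATH:$FORREST_HOME/bin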


       2. Install Maven. Download the binary archive from the Apache site and extract it to an appropriate directory. Then configure /etc/profile with the environment variable M2_HOME as follows (remember to run source /etc/profile afterwards):
#Maven
export M2_HOME=/usr/local/apache-maven-3.3.1
export PATH=$PATH:$M2_HOME/bin
Because the build downloads a great many artifacts, it is worth editing Maven's conf/settings.xml ($M2_HOME/conf/settings.xml) to switch the mirror to the Chinese open-source mirror, as follows:
<mirrors>
    <mirror>
        <id>nexus-osc</id>
        <mirrorOf>*</mirrorOf>
        <name>Nexus osc</name>
        <url>http://maven.oschina.net/content/groups/public/</url>
    </mirror>
</mirrors>
plus the following profile (take care not to break anything else in this file):
<profiles>
    <profile>
        <id>jdk-1.7</id>
        <activation>
            <jdk>1.7</jdk>
        </activation>
        <repositories>
            <repository>
                <id>nexus</id>
                <name>local private nexus</name>
                <url>http://maven.oschina.net/content/groups/public/</url>
                <releases>
                    <enabled>true</enabled>
                </releases>
                <snapshots>
                    <enabled>false</enabled>
                </snapshots>
            </repository>
        </repositories>
        <pluginRepositories>
            <pluginRepository>
                <id>nexus</id>
                <name>local private nexus</name>
                <url>http://maven.oschina.net/content/groups/public/</url>
                <releases>
                    <enabled>true</enabled>
                </releases>
                <snapshots>
                    <enabled>false</enabled>
                </snapshots>
            </pluginRepository>
        </pluginRepositories>
    </profile>
</profiles>
Then copy this settings.xml into the ~/.m2 folder so the configuration takes effect every time the user runs Maven (.m2 is a hidden folder).
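For example, assuming M2_HOME is set as above:

mkdir -p ~/.m2
cp $M2_HOME/conf/settings.xml ~/.m2/settings.xml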
       
       3. Install protobuf 2.5 (it must be this version: protoc has to match the protobuf JAR version Hadoop 2.5.2 uses, namely 2.5.0):
       Download protobuf 2.5.0 and extract it (the extracted folder is named protobuf-2.5.0), then run:
       cd protobuf-2.5.0
       ./configure --prefix=/usr/local    (this installs the software under /usr/local)
       make
       sudo make install
       Installed under /usr/local like this, the protoc libraries end up in /usr/local/lib and the header files in /usr/local/include/google/protobuf.

Run protoc --version to check the installation; output of libprotoc 2.5.0 means it succeeded.

But if you get: protoc: error while loading shared libraries: libprotoc.so.8: cannot open shared object file: No such file or directory, the installation failed.

       The failure happens because Ubuntu does not include /usr/local/lib in the library search path. Just edit /etc/profile and add
       export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
       then remember to run source /etc/profile.
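An alternative (a standard Linux mechanism, not part of the original write-up) is to register the directory with the dynamic linker instead of relying on LD_LIBRARY_PATH:

echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/libprotobuf.conf
sudo ldconfig
protoc --version    # should now print: libprotoc 2.5.0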
        All three steps above are mandatory; if one is skipped or a package is not installed properly, every build below will fail.
       
       4. Install FindBugs:
       FindBugs is needed when compiling the native library; ignore this step if you are not building it. Without it, the native build fails with:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (site) on project hadoop-common: An Ant BuildException has occured: stylesheet /home/qjj/hadoop/hadoop-2.5.2-src/hadoop-common-project/hadoop-common/${env.FINDBUGS_HOME}/src/xsl/default.xsl doesn't exist.

       To install: unzip the findbugs-3.0.1.zip downloaded from http://sourceforge.jp/projects/sfnet_findbugs/, put the folder in your install directory, and configure the environment variables in /etc/profile:
       #FindBugs
       export FINDBUGS_HOME=/usr/local/findbugs-3.0.1
       export PATH=$PATH:$FINDBUGS_HOME/bin

Then run source /etc/profile. Check the installation with findbugs -version; output of "3.0.1" means success.


       5. Install openssl devel: again, install it only if you are compiling the native library; otherwise skip it. Without it, the native build fails with:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1 -> [Help 1]

       To install: download the source from http://www.openssl.org/source/, extract it, cd into the root directory, and run:
       ./config --prefix=/usr/local    (configures the install directory as /usr/local)
       make
       sudo make install
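On Ubuntu it may be simpler to install the OpenSSL development package from apt instead of building from source (an alternative the steps above don't use):

sudo apt-get install libssl-dev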

       6. Configure DNS:

Edit /etc/resolv.conf (vim /etc/resolv.conf) and append the following, which can speed up DNS resolution:

       nameserver 8.8.8.8
       nameserver 8.8.4.4

       All of the software needed above is collected here: http://pan.baidu.com/s/1bn8P6wv password: p5a3
       Getting started: (1) Generating the Eclipse project files:

       1. Extract the 2.5.2 source, cd into hadoop-2.5.2-src/hadoop-maven-plugins, and run:
       mvn install    (this takes roughly 8 minutes)

       2. Once that succeeds, run from the source root: mvn eclipse:eclipse -DskipTests. This generates the Eclipse project files (roughly 32 minutes). The -DskipTests flag skips the project's test sources; the tests often contain errors, and compiling them too would very likely fail the build, so here we only compile the main sources. Occasionally Maven complains that -DskipTests is unrecognized; that is usually because the leading hyphen got pasted as an en dash, so retype the flag with a plain ASCII hyphen (my workaround at the time was to reboot and try again). The full command sequence is recapped below.
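Putting steps 1 and 2 together:

cd hadoop-2.5.2-src/hadoop-maven-plugins
mvn install                          # step 1: install the Hadoop Maven plugins
cd ..
mvn eclipse:eclipse -DskipTests      # step 2: generate Eclipse project files for every module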

       3. When the above succeeds, the project files are generated and can be imported into Eclipse: Import -> General -> Existing Projects into Workspace -> select the source root you just built -> Finish.

       4. The workspace now contains many projects, which are in fact Hadoop's individual modules, so the source has been imported. But don't celebrate too soon... once Eclipse finishes building the path, you will find it reporting a great many errors (really a lot of them).

       Don't panic, though; look closely and you will find there are really only the following three problems:
       1. Several classes under org.apache.hadoop.ipc.protobuf cannot be found. Search for them on GrepCode (grepcode.com, an excellent site), download the sources, and put them into the org.apache.hadoop.ipc.protobuf package; the project does not contain this package, so create it and copy the code in.
    The two missing classes: http://pan.baidu.com/s/1sjFiBmd password: 1jrb

       2. The AvroRecord class cannot be found. The error location shows that this class belongs in org.apache.hadoop.io.serializer.avro, so fetch its source from GrepCode and put it in that package.
       AvroRecord class: http://pan.baidu.com/s/1hqnaAhA password: 4mj7

       3. Project 'hadoop-streaming' is missing required source folder... This is a broken build-path reference inside hadoop-streaming; the fix is simply to remove the reference: right-click the failing project -> Properties -> Java Build Path (left pane) -> Source -> select the bad entry -> Remove.

        The error fixes above come from fellow netizens; credit where it is due, original link: www.bigdatas.cn/thread-62995-1-1.html

       At this point all the problems are solved and the workspace builds cleanly in Eclipse.

       The successfully built Hadoop source project, ready to import straight into Eclipse: http://pan.baidu.com/s/1qW8yY52 password: wied
       
       ----------------------------------------------------------------------------------


      (2) Compiling the Hadoop source into a distribution
       Compared with the above, these steps are much simpler. Extract the 2.5.2 source, cd into hadoop-2.5.2-src, and run:

       mvn package -Pdist -DskipTests -Dtar    (package is the goal that builds the JARs)


       Then comes the long wait until the build succeeds! The generated distribution ends up under hadoop-2.5.2-src/hadoop-dist/target/. This command does not compile the native library, so after the build the distribution contains no native lib folder.

Use the command mvn package -Pdist,native,docs -DskipTests -Dtar to compile the native library as well and generate the documentation; it just takes somewhat longer.
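To confirm that a native build really produced 64-bit libraries, you can inspect the output (the exact version-suffixed filename is an assumption; list the directory to see what is actually there):

ls hadoop-dist/target/hadoop-2.5.2/lib/native/
file hadoop-dist/target/hadoop-2.5.2/lib/native/libhadoop.so.1.0.0
# a 64-bit build reports: ELF 64-bit LSB shared object, x86-64, ...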

(A successfully compiled Hadoop build (without the native library): http://pan.baidu.com/s/1qWPuaAC password: 57fr)

       Problems: the issues above are analyzed in detail at the link already given, along with several others. Here I will mainly describe the build problem I hit myself. Not being very familiar with the build, I took some detours; the most prominent error was:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.16:test (default-test) on project hadoop-common: There are test failures.

[ERROR] Please refer to /home/qjj/hadoop/hadoop-2.5.2-src/hadoop-common-project/hadoop-common/target/surefire-reports for the individual test results.

       This is a test failure, caused by my leaving out the -DskipTests flag, which simply tells Maven to skip the tests...

       A final note: building Hadoop and importing it into Eclipse are in fact documented very clearly in the official instructions (BUILDING.txt in the source root); we just kept taking detours by searching the web, when the correct method was right in front of us all along...

The official build documentation:
Build instructions for Hadoop

----------------------------------------------------------------------------------
Requirements:

* Unix System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code)
* Zlib devel (if compiling native code)
* openssl devel ( if compiling native hadoop-pipes )

* Internet connection for first build (to fetch all Maven and Hadoop dependencies)


----------------------------------------------------------------------------------
Maven main modules:

  hadoop                            (Main Hadoop project)
         - hadoop-project           (Parent POM for all Hadoop Maven modules.)
                                    (All plugins & dependencies versions are defined here.)
         - hadoop-project-dist      (Parent POM for modules that generate distributions.)
         - hadoop-annotations       (Generates the Hadoop doclet used to generate the Javadocs)
         - hadoop-assemblies        (Maven assemblies used by the different modules)
         - hadoop-common-project    (Hadoop Common)
         - hadoop-hdfs-project      (Hadoop HDFS)
         - hadoop-mapreduce-project (Hadoop MapReduce)
         - hadoop-tools             (Hadoop tools like Streaming, Distcp, etc.)
         - hadoop-dist              (Hadoop distribution assembler)

----------------------------------------------------------------------------------
Where to run Maven from?

  It can be run from any module. The only catch is that if not run from trunk
  all modules that are not part of the build run must be installed in the local
  Maven cache or available in a Maven repository.


----------------------------------------------------------------------------------
Maven build goals:

* Clean                     : mvn clean
* Compile                   : mvn compile [-Pnative]
* Run tests                 : mvn test [-Pnative]
* Create JAR                : mvn package
* Run findbugs              : mvn compile findbugs:findbugs
* Run checkstyle            : mvn compile checkstyle:checkstyle
* Install JAR in M2 cache   : mvn install
* Deploy JAR to Maven repo  : mvn deploy
* Run clover                : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
* Run Rat                   : mvn apache-rat:check
* Build javadocs            : mvn javadoc:javadoc
* Build distribution        : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
* Change Hadoop version     : mvn versions:set -DnewVersion=NEWVERSION

Build options:

  * Use -Pnative to compile/bundle native code
  * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
  * Use -Psrc to create a project source TAR.GZ
  * Use -Dtar to create a TAR with the distribution (using -Pdist)


Snappy build options:

   Snappy is a compression library that can be utilized by the native code.
   It is currently an optional component, meaning that Hadoop can be built with
   or without this dependency.

  * Use -Drequire.snappy to fail the build if libsnappy.so is not found.
    If this option is not specified and the snappy library is missing,
    we silently build a version of libhadoop.so that cannot make use of snappy.
    This option is recommended if you plan on making use of snappy and want
    to get more repeatable builds.

  * Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy
    header files and library files. You do not need this option if you have
    installed snappy using a package manager.

  * Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library
    files. Similarly to snappy.prefix, you do not need this option if you have
    installed snappy using a package manager.

  * Use -Dbundle.snappy to copy the contents of the snappy.lib directory into
    the final tar file. This option requires that -Dsnappy.lib is also given,
    and it ignores the -Dsnappy.prefix option.


   Tests options:

  * Use -DskipTests to skip tests when running the following Maven goals:
    'package', 'install', 'deploy' or 'verify'
  * -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,....
  * -Dtest.exclude=<TESTCLASSNAME>
  * -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java

----------------------------------------------------------------------------------
Building components separately

If you are building a submodule directory, all the hadoop dependencies this
submodule has will be resolved as all other 3rd party dependencies. This is,
from the Maven cache or from a Maven repository (if not available in the cache
or the SNAPSHOT 'timed out').

An alternative is to run 'mvn install -DskipTests' from Hadoop source top
level once; and then work from the submodule. Keep in mind that SNAPSHOTs
time out after a while, using the Maven '-nsu' will stop Maven from trying
to update SNAPSHOTs from external repos.

----------------------------------------------------------------------------------
Protocol Buffer compiler

The version of Protocol Buffer compiler, protoc, must match the version of the
protobuf JAR.

If you have multiple versions of protoc in your system, you can set in your
build shell the HADOOP_PROTOC_PATH environment variable to point to the one you
want to use for the Hadoop build. If you don't define this environment variable,
protoc is looked up in the PATH.
----------------------------------------------------------------------------------
Importing projects to eclipse

When you import the project to eclipse, install hadoop-maven-plugins at first.

  $ cd hadoop-maven-plugins
  $ mvn install

Then, generate eclipse project files.

  $ mvn eclipse:eclipse -DskipTests

At last, import to eclipse by specifying the root directory of the project via
[File] > [Import] > [Existing Projects into Workspace].

----------------------------------------------------------------------------------
Building distributions:

Create binary distribution without native code and without documentation:

  $ mvn package -Pdist -DskipTests -Dtar

Create binary distribution with native code and with documentation:

  $ mvn package -Pdist,native,docs -DskipTests -Dtar

Create source distribution:

  $ mvn package -Psrc -DskipTests

Create source and binary distributions with native code and documentation:

  $ mvn package -Pdist,native,docs,src -DskipTests -Dtar

Create a local staging version of the website (in /tmp/hadoop-site)

  $ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site


----------------------------------------------------------------------------------

Handling out of memory errors in builds

----------------------------------------------------------------------------------

If the build process fails with an out of memory error, you should be able to fix
it by increasing the memory used by maven -which can be done via the environment
variable MAVEN_OPTS.

Here is an example setting to allocate between 256 and 512 MB of heap space to
Maven

export MAVEN_OPTS="-Xms256m -Xmx512m"
 
 
Original article: https://www.cnblogs.com/ilinuxer/p/5040493.html