Building Spark on a Single Machine (on CentOS 6)

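Notes: 1. Before building Spark, set up the Java and Scala environments; see http://www.cnblogs.com/kevingu/p/4418779.html

     2. Spark used to be built with sbt. Maven is now the recommended build tool; sbt is still supported, though it is expected to be phased out gradually. This article builds Spark 1.2.0 with Maven.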

I. Setting up Maven

(I) Download the Maven binary package apache-maven-3.2.5-bin.tar.gz from http://maven.apache.org/download.cgi and extract it under /usr/maven.
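The extraction step above could be scripted roughly as follows. This is only a sketch: the function name install_maven is hypothetical, and it assumes the tarball has already been downloaded to a path you pass in.

```shell
# Hypothetical helper: unpack a Maven binary tarball into a target directory.
# Usage: install_maven <tarball> <target-dir>
install_maven() {
    tarball="$1"
    target="$2"
    if [ ! -f "$tarball" ]; then
        echo "tarball not found: $tarball" >&2
        return 1
    fi
    mkdir -p "$target" || return 1
    # -x extract, -z gunzip, -f archive file, -C switch to the target directory first
    tar -xzf "$tarball" -C "$target"
}

# Example (needs root, since /usr/maven is a system directory):
# install_maven apache-maven-3.2.5-bin.tar.gz /usr/maven
```

With the paths used in this article, the result should be the directory /usr/maven/apache-maven-3.2.5.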

(II) Add the environment variables

export M2_HOME=/usr/maven/apache-maven-3.2.5
export PATH=$PATH:$M2_HOME/bin
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
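Typed into a terminal, these exports last only for the current shell session. One way to make them permanent is to write them to a profile script. The sketch below assumes /etc/profile.d/maven.sh as the location (the article does not say where the variables were added), and write_maven_env is a hypothetical helper name.

```shell
# Hypothetical helper: write the Maven environment settings to a profile script.
# Usage: write_maven_env <profile-script>
write_maven_env() {
    # The quoted heredoc delimiter keeps $PATH and $M2_HOME unexpanded in the file,
    # so they are resolved when the script is sourced, not when it is written.
    cat > "$1" <<'EOF'
export M2_HOME=/usr/maven/apache-maven-3.2.5
export PATH=$PATH:$M2_HOME/bin
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
EOF
}

# Example (assumed location; needs root):
# write_maven_env /etc/profile.d/maven.sh && . /etc/profile.d/maven.sh
```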

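(III) Edit the configuration file /usr/maven/apache-maven-3.2.5/conf/settings.xml. The changes are mainly in the <proxies>, <mirrors>, and <profiles> elements, which point Maven at the China-hosted mirror http://maven.oschina.net/. Note that the jdk-1.8 profile only activates when Maven runs on JDK 1.8; with the JDK 1.7.0_72 used in this article, downloads are redirected by the <mirrors> entries alone.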

<?xml version="1.0" encoding="UTF-8"?>

<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor 
    license agreements. See the NOTICE file distributed with this work for additional 
    information regarding copyright ownership. The ASF licenses this file to 
    you under the Apache License, Version 2.0 (the "License"); you may not use 
    this file except in compliance with the License. You may obtain a copy of 
    the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required 
    by applicable law or agreed to in writing, software distributed under the 
    License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS 
    OF ANY KIND, either express or implied. See the License for the specific 
    language governing permissions and limitations under the License. -->

<!-- | This is the configuration file for Maven. It can be specified at two 
    levels: | | 1. User Level. This settings.xml file provides configuration 
    for a single user, | and is normally provided in ${user.home}/.m2/settings.xml. 
    | | NOTE: This location can be overridden with the CLI option: | | -s /path/to/user/settings.xml 
    | | 2. Global Level. This settings.xml file provides configuration for all 
    Maven | users on a machine (assuming they're all using the same Maven | installation). 
    It's normally provided in | ${maven.home}/conf/settings.xml. | | NOTE: This 
    location can be overridden with the CLI option: | | -gs /path/to/global/settings.xml 
    | | The sections in this sample file are intended to give you a running start 
    at | getting the most out of your Maven installation. Where appropriate, 
    the default | values (values used when the setting is not specified) are 
    provided. | | -->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
    <!-- localRepository | The path to the local repository maven will use to 
        store artifacts. | | Default: ${user.home}/.m2/repository 
    -->
        <!--localRepository>F:/Maven/repo/m2/</localRepository-->

    <!-- interactiveMode | This will determine whether maven prompts you when 
        it needs input. If set to false, | maven will use a sensible default value, 
        perhaps based on some other setting, for | the parameter in question. | | 
        Default: true <interactiveMode>true</interactiveMode> -->

    <!-- offline | Determines whether maven should attempt to connect to the 
        network when executing a build. | This will have an effect on artifact downloads, 
        artifact deployment, and others. | | Default: false <offline>false</offline> -->

    <!-- pluginGroups | This is a list of additional group identifiers that 
        will be searched when resolving plugins by their prefix, i.e. | when invoking 
        a command line like "mvn prefix:goal". Maven will automatically add the group 
        identifiers | "org.apache.maven.plugins" and "org.codehaus.mojo" if these 
        are not already contained in the list. | -->
    <pluginGroups>
        <!-- pluginGroup | Specifies a further group identifier to use for plugin 
            lookup. <pluginGroup>com.your.plugins</pluginGroup> -->
    </pluginGroups>

    <!-- proxies | This is a list of proxies which can be used on this machine 
        to connect to the network. | Unless otherwise specified (by system property 
        or command-line switch), the first proxy | specification in this list marked 
        as active will be used. | -->
     <proxies>
            <!--<proxy>
            <id>optional</id>
            <active>true</active>
            <protocol>http</protocol>
            <host>10.22.98.21</host>
            <port>8080</port>
        </proxy>
        -->
    </proxies> 

    <!-- servers | This is a list of authentication profiles, keyed by the server-id 
        used within the system. | Authentication profiles can be used whenever maven 
        must make a connection to a remote server. | -->
    <servers>
        <!-- server | Specifies the authentication information to use when connecting 
            to a particular server, identified by | a unique name within the system (referred 
            to by the 'id' attribute below). | | NOTE: You should either specify username/password 
            OR privateKey/passphrase, since these pairings are | used together. | <server> 
            <id>deploymentRepo</id> <username>repouser</username> <password>repopwd</password> 
            </server> -->

        <!-- Another sample, using keys to authenticate. <server> <id>siteServer</id> 
            <privateKey>/path/to/private/key</privateKey> <passphrase>optional; leave 
            empty if not used.</passphrase> </server> -->
    </servers>

    <!-- mirrors | This is a list of mirrors to be used in downloading artifacts 
        from remote repositories. | | It works like this: a POM may declare a repository 
        to use in resolving certain artifacts. | However, this repository may have 
        problems with heavy traffic at times, so people have mirrored | it to several 
        places. | | That repository definition will have a unique id, so we can create 
        a mirror reference for that | repository, to be used as an alternate download 
        site. The mirror site will be the preferred | server for that repository. 
        | -->
    <mirrors>
        <!-- mirror | Specifies a repository mirror site to use instead of a given 
            repository. The repository that | this mirror serves has an ID that matches 
            the mirrorOf element of this mirror. IDs are used | for inheritance and direct 
            lookup purposes, and must be unique across the set of mirrors. | -->
        <mirror>
            <id>nexus-osc</id>
            <mirrorOf>central</mirrorOf>
            <name>Nexus osc</name>
            <url>http://maven.oschina.net/content/groups/public/</url>
        </mirror>
        <mirror>
            <id>nexus-osc-thirdparty</id>
            <mirrorOf>thirdparty</mirrorOf>
            <name>Nexus osc thirdparty</name>
            <url>http://maven.oschina.net/content/repositories/thirdparty/</url>
        </mirror>

    </mirrors>

    <!-- profiles | This is a list of profiles which can be activated in a variety 
        of ways, and which can modify | the build process. Profiles provided in the 
        settings.xml are intended to provide local machine- | specific paths and 
        repository locations which allow the build to work in the local environment. 
        | | For example, if you have an integration testing plugin - like cactus 
        - that needs to know where | your Tomcat instance is installed, you can provide 
        a variable here such that the variable is | dereferenced during the build 
        process to configure the cactus plugin. | | As noted above, profiles can 
        be activated in a variety of ways. One way - the activeProfiles | section 
        of this document (settings.xml) - will be discussed later. Another way essentially 
        | relies on the detection of a system property, either matching a particular 
        value for the property, | or merely testing its existence. Profiles can also 
        be activated by JDK version prefix, where a | value of '1.4' might activate 
        a profile when the build is executed on a JDK version of '1.4.2_07'. | Finally, 
        the list of active profiles can be specified directly from the command line. 
        | | NOTE: For profiles defined in the settings.xml, you are restricted to 
        specifying only artifact | repositories, plugin repositories, and free-form 
        properties to be used as configuration | variables for plugins in the POM. 
        | | -->
    <profiles>
        <!-- profile | Specifies a set of introductions to the build process, to 
            be activated using one or more of the | mechanisms described above. For inheritance 
            purposes, and to activate profiles via <activatedProfiles/> | or the command 
            line, profiles have to have an ID that is unique. | | An encouraged best 
            practice for profile identification is to use a consistent naming convention 
            | for profiles, such as 'env-dev', 'env-test', 'env-production', 'user-jdcasey', 
            'user-brett', etc. | This will make it more intuitive to understand what 
            the set of introduced profiles is attempting | to accomplish, particularly 
            when you only have a list of profile id's for debug. | | This profile example 
            uses the JDK version to trigger activation, and provides a JDK-specific repo. -->
        <profile>
            <id>jdk-1.8</id>

            <activation>
                <jdk>1.8</jdk>
            </activation>

            <repositories>
                <repository>
                    <id>nexus</id>
                    <name>local private nexus</name>
                    <url>http://maven.oschina.net/content/groups/public/</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>false</enabled>
                    </snapshots>
                </repository>
                <repository>
                    <id>osc_thirdparty</id>
                    <url>http://maven.oschina.net/content/repositories/thirdparty/</url>
                </repository>
            </repositories>
            <pluginRepositories>
                <pluginRepository>
                    <id>nexus</id>
                    <name>local private nexus</name>
                    <url>http://maven.oschina.net/content/groups/public/</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>false</enabled>
                    </snapshots>
                </pluginRepository>
            </pluginRepositories>
        </profile>


        <!-- | Here is another profile, activated by the system property 'target-env' 
            with a value of 'dev', | which provides a specific path to the Tomcat instance. 
            To use this, your plugin configuration | might hypothetically look like: 
            | | ... | <plugin> | <groupId>org.myco.myplugins</groupId> | <artifactId>myplugin</artifactId> 
            | | <configuration> | <tomcatLocation>${tomcatPath}</tomcatLocation> | </configuration> 
            | </plugin> | ... | | NOTE: If you just wanted to inject this configuration 
            whenever someone set 'target-env' to | anything, you could just leave off 
            the <value/> inside the activation-property. | <profile> <id>env-dev</id> 
            <activation> <property> <name>target-env</name> <value>dev</value> </property> 
            </activation> <properties> <tomcatPath>/path/to/tomcat/instance</tomcatPath> 
            </properties> </profile> -->
    </profiles>

    <!-- activeProfiles | List of profiles that are active for all builds. | 
        <activeProfiles> <activeProfile>alwaysActiveProfile</activeProfile> <activeProfile>anotherAlwaysActiveProfile</activeProfile> 
        </activeProfiles> -->
</settings>
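After editing a file this long, it helps to double-check which mirror URLs it actually declares. The following is a rough, line-oriented sketch rather than a real XML parser: it assumes each <url> element sits on its own line, as in the file above, and it would also pick up mirror entries that are commented out. The function name list_mirror_urls is hypothetical.

```shell
# Hypothetical helper: print the <url> of every <mirror> entry in a settings.xml.
# Usage: list_mirror_urls <settings.xml>
list_mirror_urls() {
    awk '
        /<mirror>/   { in_mirror = 1 }   # does not match <mirrors>
        /<\/mirror>/ { in_mirror = 0 }   # does not match </mirrors>
        in_mirror && /<url>/ {
            # Strip everything up to <url> and from </url> onward.
            gsub(/.*<url>|<\/url>.*/, "")
            print
        }
    ' "$1"
}

# Example:
# list_mirror_urls /usr/maven/apache-maven-3.2.5/conf/settings.xml
```

Repository URLs inside <profiles> are deliberately skipped, since only the <mirrors> section redirects requests for other repositories.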

(IV) To verify the installation, open a terminal and type

mvn -v

If the following information is displayed, Maven has been set up successfully.

Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T01:29:23+08:00)
Maven home: /usr/maven/apache-maven-3.2.5
Java version: 1.7.0_72, vendor: Oracle Corporation
Java home: /usr/java/jdk1.7.0_72/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-504.8.1.el6.x86_64", arch: "amd64", family: "unix"
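A script that wants to confirm the Maven version before starting a long build could pull it out of the first line of this output. A minimal sketch, assuming the output format shown above; maven_version is a hypothetical helper name.

```shell
# Hypothetical helper: read `mvn -v` output on stdin and print the Maven version.
maven_version() {
    # Keep only the version number from the "Apache Maven X.Y.Z (...)" line.
    sed -n 's/^Apache Maven \([0-9][0-9.]*\).*/\1/p'
}

# Example:
# mvn -v | maven_version
```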

II. Download the Spark 1.2.0 source package from http://spark.apache.org/downloads.html and extract it under /usr/spark.

III. Open a terminal, change to the /usr/spark/spark-1.2.0 directory, and type

mvn -DskipTests clean package

The following output appears and the build starts.

[INFO] Scanning for projects...
Downloading: http://maven.oschina.net/content/groups/public/org/apache/apache/14/apache-14.pom
Downloaded: http://maven.oschina.net/content/groups/public/org/apache/apache/14/apache-14.pom (15 KB at 5.6 KB/sec)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] Spark Project Parent POM
[INFO] Spark Project Networking
[INFO] Spark Project Shuffle Streaming Service
[INFO] Spark Project Core
[INFO] Spark Project Bagel
[INFO] Spark Project GraphX
[INFO] Spark Project Streaming
[INFO] Spark Project Catalyst
[INFO] Spark Project SQL
[INFO] Spark Project ML Library
[INFO] Spark Project Tools
[INFO] Spark Project Hive
[INFO] Spark Project REPL
[INFO] Spark Project Assembly
[INFO] Spark Project External Twitter
[INFO] Spark Project External Flume Sink
[INFO] Spark Project External Flume
[INFO] Spark Project External MQTT
[INFO] Spark Project External ZeroMQ
[INFO] Spark Project External Kafka
[INFO] Spark Project Examples
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------

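During the build, Maven downloads whatever packages it needs; given network conditions in China, this can take a long time. If a download fails because of a network problem, run the build command again and the build carries on. Warnings can be ignored. The build is complete when the following output finally appears.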

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [35:17 min]
[INFO] Spark Project Networking ........................... SUCCESS [16:53 min]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 26.230 s]
[INFO] Spark Project Core ................................. SUCCESS [32:59 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 25.566 s]
[INFO] Spark Project GraphX ............................... SUCCESS [01:45 min]
[INFO] Spark Project Streaming ............................ SUCCESS [01:54 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:56 min]
[INFO] Spark Project SQL .................................. SUCCESS [05:14 min]
[INFO] Spark Project ML Library ........................... SUCCESS [03:17 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 15.841 s]
[INFO] Spark Project Hive ................................. SUCCESS [11:33 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 54.570 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 46.018 s]
[INFO] Spark Project External Twitter ..................... SUCCESS [ 47.342 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [04:54 min]
[INFO] Spark Project External Flume ....................... SUCCESS [ 37.416 s]
[INFO] Spark Project External MQTT ........................ SUCCESS [ 34.923 s]
[INFO] Spark Project External ZeroMQ ...................... SUCCESS [01:05 min]
[INFO] Spark Project External Kafka ....................... SUCCESS [02:15 min]
[INFO] Spark Project Examples ............................. SUCCESS [11:07 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:15 h
[INFO] Finished at: 2015-01-02T17:21:15+08:00
[INFO] Final Memory: 69M/1122M
[INFO] ------------------------------------------------------------------------
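Since recovering from a failed download only means running the same command again, the reruns can be automated with a small wrapper. This is a sketch, not part of the original procedure; the function name retry and the attempt count of 5 are arbitrary choices.

```shell
# Hypothetical helper: run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry <max-attempts> <command> [args...]
retry() {
    max="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example, run from /usr/spark/spark-1.2.0:
# retry 5 mvn -DskipTests clean package
```

Artifacts that were already downloaded stay in the local repository (by default ${user.home}/.m2/repository, as the settings.xml comments note), so a rerun does not fetch them again.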

IV. Starting the Spark Shell

In the /usr/spark/spark-1.2.0 directory, type

./bin/spark-shell

If the following output appears, Spark has started successfully.

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/04/13 09:50:52 INFO SecurityManager: Changing view acls to: kevin
15/04/13 09:50:52 INFO SecurityManager: Changing modify acls to: kevin
15/04/13 09:50:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kevin); users with modify permissions: Set(kevin)
15/04/13 09:50:52 INFO HttpServer: Starting HTTP Server
15/04/13 09:50:52 INFO Utils: Successfully started service 'HTTP class server' on port 55842.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_72)
Type in expressions to have them evaluated.
Type :help for more information.
15/04/13 09:50:57 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.131.151 instead (on interface eth0)
15/04/13 09:50:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/04/13 09:50:57 INFO SecurityManager: Changing view acls to: kevin
15/04/13 09:50:57 INFO SecurityManager: Changing modify acls to: kevin
15/04/13 09:50:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kevin); users with modify permissions: Set(kevin)
15/04/13 09:50:58 INFO Slf4jLogger: Slf4jLogger started
15/04/13 09:50:58 INFO Remoting: Starting remoting
15/04/13 09:50:58 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.131.151:41278]
15/04/13 09:50:58 INFO Utils: Successfully started service 'sparkDriver' on port 41278.
15/04/13 09:50:58 INFO SparkEnv: Registering MapOutputTracker
15/04/13 09:50:58 INFO SparkEnv: Registering BlockManagerMaster
15/04/13 09:50:58 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150413095058-f481
15/04/13 09:50:58 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/04/13 09:50:59 INFO HttpFileServer: HTTP File server directory is /tmp/spark-15b2ae1c-3256-43a7-bc05-b79cb924911d
15/04/13 09:50:59 INFO HttpServer: Starting HTTP Server
15/04/13 09:50:59 INFO Utils: Successfully started service 'HTTP file server' on port 41609.
15/04/13 09:50:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/04/13 09:50:59 INFO SparkUI: Started SparkUI at http://192.168.131.151:4040
15/04/13 09:50:59 INFO Executor: Using REPL class URI: http://192.168.131.151:55842
15/04/13 09:50:59 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.131.151:41278/user/HeartbeatReceiver
15/04/13 09:50:59 INFO NettyBlockTransferService: Server created on 50724
15/04/13 09:50:59 INFO BlockManagerMaster: Trying to register BlockManager
15/04/13 09:50:59 INFO BlockManagerMasterActor: Registering block manager localhost:50724 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 50724)
15/04/13 09:50:59 INFO BlockManagerMaster: Registered BlockManager
15/04/13 09:50:59 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala>
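Reaching the scala> prompt shows that the shell starts; running a tiny job also confirms that the SparkContext works. The sketch below pipes a single Scala line into spark-shell from the command line. The function name spark_smoke_test is hypothetical, and feeding the shell through standard input is an assumption about how you want to run it; typing the same line at the scala> prompt works equally well.

```shell
# Hypothetical helper: pipe a one-line Scala job into spark-shell as a smoke test.
# Usage: spark_smoke_test <spark-home>
spark_smoke_test() {
    shell="$1/bin/spark-shell"
    if [ ! -x "$shell" ]; then
        echo "spark-shell not found or not executable: $shell" >&2
        return 1
    fi
    # sc is the SparkContext that spark-shell creates on startup.
    echo 'println("count = " + sc.parallelize(1 to 100).count())' | "$shell"
}

# Example:
# spark_smoke_test /usr/spark/spark-1.2.0
```

If the build is healthy, a line reading count = 100 should show up among the log messages.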

That completes the single-machine Spark build!

References: Maven: http://maven.apache.org/

        Spark: http://spark.apache.org/

Original article: https://www.cnblogs.com/kevingu/p/4421624.html