Compiling Spark

Install JDK 1.8
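
A quick way to confirm the JDK before building is to look at the version banner. This is a minimal sketch; is_jdk8 is a hypothetical helper, not part of Spark, and it assumes the banner quotes the version as "1.8.x", which is how Oracle JDK 8 and OpenJDK 8 print it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): succeeds when a `java -version`
# banner line reports a 1.8.x JDK.
is_jdk8() {
  # $1: first line of the banner, e.g. java version "1.8.0_181"
  case "$1" in
    *'"1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# `java -version` writes to stderr, so redirect it before taking the first line.
if command -v java >/dev/null 2>&1; then
  banner=$(java -version 2>&1 | head -n 1)
  if is_jdk8 "$banner"; then
    echo "JDK 1.8 found: $banner"
  else
    echo "Expected JDK 1.8, found: $banner" >&2
  fi
else
  echo "java is not on PATH" >&2
fi
```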

vi pom.xml

For a CDH build, add the following to pom.xml, above <dependencies>:

<repositories>
  <repository>
    <id>central</id>
    <!-- This should be at top, it makes maven try the central repo first and then others and hence faster dep resolution -->
    <name>Maven Repository</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
<pluginRepositories>
  <pluginRepository>
    <id>central</id>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </pluginRepository>
</pluginRepositories>
<dependencies>
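
After editing, it can help to confirm that the Cloudera repository really ended up in the POM, since the CDH artifacts (for example hadoop 2.6.0-cdh5.10.0) are resolved from it. A minimal sketch; has_cloudera_repo is a hypothetical helper and the default path pom.xml assumes the script runs from the Spark source root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given POM mentions the Cloudera
# repository URL used in the snippet above.
has_cloudera_repo() {
  grep -q 'repository.cloudera.com/artifactory/cloudera-repos' "$1"
}

# Assumption: run from the Spark source root, where pom.xml lives.
pom="${1:-pom.xml}"
if [ -f "$pom" ]; then
  if has_cloudera_repo "$pom"; then
    echo "cloudera repository present in $pom"
  else
    echo "cloudera repository missing from $pom" >&2
  fi
fi
```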

./dev/make-distribution.sh (custom versions: hard-code the values and comment out the original Maven lookups)

VERSION=2.3.1
#VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)

SCALA_VERSION=2.11
#SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | tail -n 1)

SPARK_HADOOP_VERSION=2.6.0
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | tail -n 1)

SPARK_HIVE=1
#SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | fgrep --count "<id>hive</id>";\
#    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#    # because we use "set -o pipefail"
#    echo -n)
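
Hard-coding these four values presumably serves to skip the "$MVN" help:evaluate lookups, each of which starts Maven just to read one property. To check that the edit took effect, one option is to list the assignments that no longer call a command. A minimal sketch; hardcoded_vars is a hypothetical helper, and it assumes each assignment starts at the beginning of a line as in the stock script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version assignments in the given script
# whose value is a literal rather than a $(...) command substitution.
hardcoded_vars() {
  grep -E '^(VERSION|SCALA_VERSION|SPARK_HADOOP_VERSION|SPARK_HIVE)=[^$]' "$1"
}

# Assumption: run from the Spark source root.
script="dev/make-distribution.sh"
if [ -f "$script" ]; then
  hardcoded_vars "$script" || echo "no hard-coded values found in $script" >&2
fi
```

All four variable names should appear in the output once the script has been edited as above.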

Run the build command:

./dev/make-distribution.sh --name 2.6.0-cdh5.10.0 --tgz -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.10.0 -Phive -Phive-thriftserver -DskipTests
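
With --tgz, make-distribution.sh writes an archive named spark-$VERSION-bin-$NAME.tgz into the source root, so with VERSION=2.3.1 and the --name above the result should be spark-2.3.1-bin-2.6.0-cdh5.10.0.tgz. A minimal sketch for checking it after the build; dist_tarball_name is a hypothetical helper that only reproduces that naming pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the archive name from the version and the
# value passed to --name, following spark-$VERSION-bin-$NAME.tgz.
dist_tarball_name() {
  printf 'spark-%s-bin-%s.tgz\n' "$1" "$2"
}

tarball=$(dist_tarball_name 2.3.1 2.6.0-cdh5.10.0)

# Assumption: run from the Spark source root after the build has finished.
if [ -f "$tarball" ]; then
  # Listing the archive is a cheap integrity check.
  tar -tzf "$tarball" >/dev/null && echo "archive OK: $tarball"
else
  echo "not found: $tarball" >&2
fi
```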