Installing Spark 1.5.2: Standalone and YARN

Spark Standalone

1. Download the scala-2.10.6 package, extract it to the target directory, and add the environment variables

#SCALA VARIABLES START
export SCALA_HOME=/usr/local/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin
#SCALA VARIABLES END

2. Download the Spark-1.5.2 package, extract it to the target directory, and add the environment variables

#SPARK VARIABLES START
export SPARK_HOME=/usr/local/spark-1.5.2
export PATH=$PATH:$SPARK_HOME/bin
#SPARK VARIABLES END
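
The export lines from steps 1 and 2 usually end up in a shell profile. Below is a minimal sketch of adding them without creating duplicates when the step is repeated. The helper names append_once and add_spark_env are made up for illustration, and the choice of profile file (for example ~/.bashrc) is an assumption, not something the original steps specify.

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already present.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the Scala and Spark variables from steps 1 and 2 to the given file.
# The paths are the ones used in this post; adjust them to your layout.
add_spark_env() {
  local file="$1"
  append_once 'export SCALA_HOME=/usr/local/scala-2.10.6' "$file"
  append_once 'export SPARK_HOME=/usr/local/spark-1.5.2' "$file"
  append_once 'export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin' "$file"
}
```

Typical use would be add_spark_env ~/.bashrc followed by source ~/.bashrc; running it a second time leaves the file unchanged.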

3. Edit the spark-env.sh file (under $SPARK_HOME/conf)

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_66
export SCALA_HOME=/usr/local/scala-2.10.6
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=10.9.2.100
SPARK_LOCAL_DIRS="/usr/local/spark-1.5.2/tmp"
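
A wrong path in spark-env.sh tends to show up only later as a confusing startup failure, so it can help to check the directories first. This is a small sketch; check_dirs is a hypothetical helper, not part of Spark.

```shell
#!/usr/bin/env bash
# Print every argument that is not an existing directory.
# Returns 0 when all directories exist, 1 otherwise.
check_dirs() {
  local missing=0 dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing: $dir"
      missing=1
    fi
  done
  return "$missing"
}
```

For the settings above, one possible call is check_dirs /usr/lib/jvm/jdk1.8.0_66 /usr/local/scala-2.10.6 /usr/local/hadoop-2.6.0 /usr/local/hadoop-2.6.0/etc/hadoop, run on each node.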

4. Start the cluster (node by node, for machines whose SSH port is not the default)
Start the master: sbin/start-master.sh
Start a worker: sbin/start-slave.sh spark://10.9.2.100:7077
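
When workers are started by hand on several machines, it is easy to mistype the master address or the SSH port. The sketch below only builds and prints the command rather than running it. The names master_url and remote_start_cmd, the worker host 10.9.2.101, and the SSH port 2222 are all hypothetical; the sketch also assumes Spark is installed at the same path on every node.

```shell
#!/usr/bin/env bash
# Normalize a master address to the spark://HOST:PORT form.
# If no port is given, 7077 (the standalone master's default) is assumed.
master_url() {
  local addr="${1#spark://}"      # drop the scheme if it is already there
  case "$addr" in
    *:*) ;;                        # a port was supplied
    *)   addr="$addr:7077" ;;      # fall back to the default port
  esac
  printf 'spark://%s\n' "$addr"
}

# Print (do not execute) the ssh command that would start a worker
# on a remote host reachable over a non-default SSH port.
remote_start_cmd() {
  local host="$1" ssh_port="$2" master="$3"
  printf 'ssh -p %s %s %s/sbin/start-slave.sh %s\n' \
    "$ssh_port" "$host" "${SPARK_HOME:-/usr/local/spark-1.5.2}" \
    "$(master_url "$master")"
}
```

For example, remote_start_cmd 10.9.2.101 2222 10.9.2.100 prints a command line that can be reviewed and then pasted into a terminal.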
5. Verify

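# Run in local mode with two threads (run-example takes the master from the MASTER environment variable; anything after the class name is passed to the example itself)
MASTER="local[2]" ./bin/run-example SparkPi 10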

# Run on the Spark Standalone cluster
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://10.9.2.100:7077 lib/spark-examples-1.5.2-hadoop2.6.0.jar 100

# Run in yarn-cluster mode on a YARN cluster (no need to start the Spark master or slaves; requires a working YARN environment)
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster lib/spark-examples*.jar 10

Running bin/spark-shell with no arguments starts it in local mode.
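
SparkPi reports its result on a line beginning with "Pi is roughly", buried among the log output. Here is a rough sketch for pulling that number out and sanity-checking it; extract_pi and pi_looks_sane are hypothetical helper names, and the 0.1 tolerance is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Read SparkPi output on stdin and print the first estimate found.
extract_pi() {
  sed -n 's/^Pi is roughly \([0-9.]*\).*$/\1/p' | head -n 1
}

# Succeed when the given value is non-empty and within 0.1 of 3.14159.
pi_looks_sane() {
  awk -v v="$1" 'BEGIN {
    d = v - 3.14159
    if (d < 0) d = -d
    exit !(v != "" && d < 0.1)
  }'
}
```

In local or client mode the result appears on the driver's stdout, so something like ./bin/run-example SparkPi 10 2>/dev/null | extract_pi should work; in yarn-cluster mode the line ends up in the YARN application logs instead.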
6. Troubleshooting:

15/11/30 16:20:00 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[sparkWorker-akka.actor.default-dispatcher-6,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@4a890723 rejected from java.util.concurrent.ThreadPoolExecutor@64992284[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
        at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1.apply(Worker.scala:211)
        at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1.apply(Worker.scala:210)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters(Worker.scala:210)
        at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$reregisterWithMaster$1.apply$mcV$sp(Worker.scala:288)
        at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
        at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$reregisterWithMaster(Worker.scala:234)
        at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:521)
        at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:521)

sr/local/spark-1.5.2/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark-1.5.2/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.5.2/lib/datanucleus-core-3.2.10.jar:/usr/local/hadoop-2.6.0/etc/hadoop/ -Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 10.9.2.100:7077

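Fix:

The exception is thrown while the worker retries registering with the master (see tryRegisterAllMasters and reregisterWithMaster in the trace). Setting the master address by IP instead of by hostname resolved it. In spark-env.sh, change SPARK_MASTER_IP=master to
SPARK_MASTER_IP=10.9.2.100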
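
The post does not say why the hostname form failed. One plausible cause, and it is only an assumption here, is that the name master was not mapped consistently on every node. A quick way to check is to look for the name in each node's hosts file; host_in_file below is a hypothetical helper for that.

```shell
#!/usr/bin/env bash
# Return 0 if NAME appears as a hostname or alias in a hosts-format file
# (defaults to /etc/hosts). Commented-out entries are ignored.
host_in_file() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v n="$name" '
    /^[[:space:]]*#/ { next }          # skip full-line comments
    {
      sub(/#.*/, "")                   # strip trailing comments
      # field 1 is the address; the remaining fields are names
      for (i = 2; i <= NF; i++) if ($i == n) found = 1
    }
    END { exit !found }
  ' "$file"
}
```

Running host_in_file master on the master and on each worker shows which machines lack the entry. This only inspects the hosts file, so names served by DNS will not be found.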

Spark on YARN

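Spark can be deployed on demand: it does not have to be installed on every node of the cluster, and there is no need to start Spark's master and slaves services, because once a Spark application is submitted to YARN, YARN takes care of scheduling cluster resources.
Configure as in steps 1-3 above, but drop the SPARK_MASTER_IP=10.9.2.100 setting from step 3.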
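
If a Standalone spark-env.sh already exists, the YARN variant can be derived from it rather than written from scratch. A minimal sketch, with strip_master_ip as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Copy SRC to DST, leaving out any SPARK_MASTER_IP assignment
# (with or without a leading "export").
strip_master_ip() {
  local src="$1" dst="$2"
  grep -Ev '^[[:space:]]*(export[[:space:]]+)?SPARK_MASTER_IP=' "$src" > "$dst"
  # grep exits with 1 when it prints no lines; treat that as success too.
  [ $? -le 1 ]
}
```

Writing to a separate file keeps the original intact, so the result can be diffed before it replaces conf/spark-env.sh on the nodes that submit to YARN.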

Original post: https://www.cnblogs.com/ggzone/p/10121220.html