Spark 1.3.0 Single-Node Installation

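1. Test environment:

CentOS 6.6 minimal install; hostname spark-test, IP: 10.10.10.26

OpenStack virtual cloud host.

Note: installation order: log in to Linux -> install the JDK -> install Scala -> install Spark.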


2. Install the JDK

Download the JDK:

Version jdk-6u45-linux-x64.bin, available from the Oracle website.

Create a /data directory to hold the files:

# mkdir /data

[root@spark-test data]# ls 
jdk-6u45-linux-x64.bin  scala-2.11.6.tgz  spark-1.3.0-bin-hadoop2.4.tgz

Install the JDK:

[root@spark-test data]# chmod u+x jdk-6u45-linux-x64.bin      # add execute permission
[root@spark-test data]# ./jdk-6u45-linux-x64.bin

Configure the environment variables:

[root@spark-test data]# vim /etc/profile

#JAVA VARIABLES START 
export JAVA_HOME=/data/jdk1.6.0_45 
export PATH=$PATH:$JAVA_HOME/bin 
#JAVA VARIABLES END

[root@spark-test data]# source /etc/profile 
[root@spark-test data]# java -version 
java version "1.6.0_45" 
Java(TM) SE Runtime Environment (build 1.6.0_45-b06) 
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
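
The same three-line export block is added to /etc/profile again for Scala and Spark later on. As a rough sketch, the edit could be scripted so that running it twice does not duplicate the block. The function name append_env_block is hypothetical and not part of the original post, and the demo writes to a temporary file rather than /etc/profile:

```shell
#!/bin/sh
# Hypothetical helper: append a "<NAME>_HOME" block to a profile file
# unless that variable is already exported there.
# Usage: append_env_block FILE NAME DIR
append_env_block() {
    file=$1; name=$2; dir=$3
    # Skip if the variable is already exported in the file.
    if grep -q "^export ${name}_HOME=" "$file" 2>/dev/null; then
        return 0
    fi
    {
        printf '#%s VARIABLES START\n' "$name"
        printf 'export %s_HOME=%s\n' "$name" "$dir"
        printf 'export PATH=$PATH:$%s_HOME/bin\n' "$name"
        printf '#%s VARIABLES END\n' "$name"
    } >> "$file"
}

# Demo against a temporary file; the second call is a no-op.
profile=$(mktemp)
append_env_block "$profile" JAVA /data/jdk1.6.0_45
append_env_block "$profile" JAVA /data/jdk1.6.0_45
cat "$profile"
rm -f "$profile"
```

To apply it for real, pass /etc/profile as the first argument and then run source /etc/profile as shown above.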

3. Install Scala

Download Scala, version 2.11.6, from http://www.scala-lang.org/download/2.11.6.html

Install Scala:

[root@spark-test data]# tar -zxvf  scala-2.11.6.tgz

Configure the environment variables:

[root@spark-test data]# vim /etc/profile

#SCALA VARIABLES START 
export SCALA_HOME=/data/scala-2.11.6 
export PATH=$PATH:$SCALA_HOME/bin 
#SCALA VARIABLES END

[root@spark-test data]# source /etc/profile 
[root@spark-test data]# scala -version 
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL

Scala is now configured.

4. Install Spark

Download from the official site: http://spark.apache.org/downloads.html

Download the pre-built version.

Extract it to install:

[root@spark-test data]# tar -zxvf spark-1.3.0-bin-hadoop2.4.tgz

Configure the Spark environment variables:

[root@spark-test data]# vim /etc/profile

#SPARK VARIABLES START 
export SPARK_HOME=/data/spark-1.3.0-bin-hadoop2.4 
export PATH=$PATH:$SPARK_HOME/bin 
#SPARK VARIABLES END

[root@spark-test data]# source /etc/profile

Change to the conf directory:

[root@spark-test conf]# ls 
fairscheduler.xml.template   slaves.template 
log4j.properties.template    spark-defaults.conf.template 
metrics.properties.template  spark-env.sh.template 
[root@spark-test conf]# mv spark-env.sh.template spark-env.sh

[root@spark-test conf]# vim spark-env.sh 

export SCALA_HOME=/data/scala-2.11.6 
export JAVA_HOME=/data/jdk1.6.0_45 
export SPARK_MASTER_IP=10.10.10.26 
export SPARK_WORKER_MEMORY=1024m 
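# Value from the original post; a standalone master listens on port 7077 unless SPARK_MASTER_PORT is set.
export master=spark://10.10.10.26:7070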
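
The master URL above is written by hand with port 7070, while Spark's standalone master uses 7077 when SPARK_MASTER_PORT is not set. A minimal sketch of building the URL from the host and an optional port, so the default cannot be mistyped. The function name master_url is hypothetical and not from the original post:

```shell
#!/bin/sh
# Hypothetical helper: build a standalone master URL from a host and
# an optional port. The fallback 7077 is Spark's default master port.
master_url() {
    host=$1
    port=${2:-7077}
    printf 'spark://%s:%s\n' "$host" "$port"
}

# With only the host given, the default port is used.
master_url 10.10.10.26
```

Passing a second argument (for example master_url 10.10.10.26 7070) only makes sense if the master was actually started on that port.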

[root@spark-test conf]# vim slaves 

spark-test

Start the Spark cluster:

[root@spark-test sbin]# pwd 
/data/spark-1.3.0-bin-hadoop2.4/sbin 
[root@spark-test sbin]# ./start-all.sh

Verify:

[root@spark-test sbin]# jps 
22974 Worker 
23395 Jps 
22830 Master
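
The jps check above is done by eye. As a sketch, it could be turned into a pass/fail test that reads jps output on standard input and succeeds only when both a Master and a Worker process are listed. The function name has_master_and_worker is hypothetical, and the demo feeds it the sample output shown above instead of calling jps:

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and exit 0 only if
# both a Master and a Worker process appear in the second column.
has_master_and_worker() {
    awk '$2 == "Master" { m = 1 }
         $2 == "Worker" { w = 1 }
         END { exit !(m && w) }'
}

# Demo with the sample output from this walkthrough.
printf '22974 Worker\n23395 Jps\n22830 Master\n' | has_master_and_worker \
    && echo "standalone daemons are up"
```

On the real host the input would come from the tool itself: jps | has_master_and_worker.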

Test:

Change directory:

[root@spark-test bin]# pwd 
/data/spark-1.3.0-bin-hadoop2.4/bin

Run the example:

[root@spark-test spark-1.3.0-bin-hadoop2.4]# ./bin/run-example org.apache.spark.examples.SparkPi 
Spark assembly has been built with Hive, including Datanucleus jars on classpath 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
15/04/01 11:40:48 INFO SparkContext: Running Spark version 1.3.0 
15/04/01 11:40:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
15/04/01 11:40:49 INFO SecurityManager: Changing view acls to: root 
15/04/01 11:40:49 INFO SecurityManager: Changing modify acls to: root 
15/04/01 11:40:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
15/04/01 11:40:49 INFO Slf4jLogger: Slf4jLogger started 
15/04/01 11:40:49 INFO Remoting: Starting remoting 
15/04/01 11:40:50 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@spark-test.novalocal:58680] 
15/04/01 11:40:50 INFO Utils: Successfully started service 'sparkDriver' on port 58680. 
15/04/01 11:40:50 INFO SparkEnv: Registering MapOutputTracker 
15/04/01 11:40:50 INFO SparkEnv: Registering BlockManagerMaster 
15/04/01 11:40:50 INFO DiskBlockManager: Created local directory at /tmp/spark-53cdf980-4803-480f-8936-2b3bb7e2bbfc/blockmgr-c15cfa29-3bfb-4ee8-a0d3-b9735bfe9dea 
15/04/01 11:40:50 INFO MemoryStore: MemoryStore started with capacity 265.0 MB 
15/04/01 11:40:50 INFO HttpFileServer: HTTP File server directory is /tmp/spark-22f9b0df-bfdb-435d-b504-ab1c52b73556/httpd-244e5d7f-9c1d-48d8-bd95-2ed985ecb3a0 
15/04/01 11:40:50 INFO HttpServer: Starting HTTP Server 
15/04/01 11:40:50 INFO Server: jetty-8.y.z-SNAPSHOT 
15/04/01 11:40:50 INFO AbstractConnector: Started SocketConnector@0.0.0.0:59040 
15/04/01 11:40:50 INFO Utils: Successfully started service 'HTTP file server' on port 59040. 
15/04/01 11:40:50 INFO SparkEnv: Registering OutputCommitCoordinator 
15/04/01 11:40:50 INFO Server: jetty-8.y.z-SNAPSHOT 
15/04/01 11:40:50 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 
15/04/01 11:40:50 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/04/01 11:40:50 INFO SparkUI: Started SparkUI at http://spark-test.novalocal:4040 
15/04/01 11:40:51 INFO SparkContext: Added JAR file:/data/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar at http://10.10.10.26:59040/jars/spark-examples-1.3.0-hadoop2.4.0.jar with timestamp 1427859651127 
15/04/01 11:40:51 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@10.10.10.26:7070/user/Master... 
15/04/01 11:40:51 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.10.10.26:7070: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@10.10.10.26:7070 
15/04/01 11:40:51 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@10.10.10.26:7070]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /10.10.10.26:7070 
15/04/01 11:41:11 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@10.10.10.26:7070/user/Master... 
15/04/01 11:41:11 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.10.10.26:7070: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@10.10.10.26:7070 
15/04/01 11:41:11 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@10.10.10.26:7070]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /10.10.10.26:7070 
15/04/01 11:41:31 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@10.10.10.26:7070/user/Master... 
15/04/01 11:41:31 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.10.10.26:7070: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@10.10.10.26:7070 
15/04/01 11:41:31 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@10.10.10.26:7070]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /10.10.10.26:7070 
15/04/01 11:41:51 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up. 
15/04/01 11:41:51 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up. 
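15/04/01 11:41:51 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet. 

The first attempt above failed: the driver tried to reach the master at 10.10.10.26:7070 and the connection was refused (a standalone master listens on port 7077 unless SPARK_MASTER_PORT is set). The second attempt below completed; its log shows the executor starting on localhost, that is, in local mode. The original post does not record what was changed between the two runs.

[root@spark-test spark-1.3.0-bin-hadoop2.4]# ./bin/run-example org.apache.spark.examples.SparkPi 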
Spark assembly has been built with Hive, including Datanucleus jars on classpath 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
15/04/01 11:53:22 INFO SparkContext: Running Spark version 1.3.0 
15/04/01 11:53:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
15/04/01 11:53:22 INFO SecurityManager: Changing view acls to: root 
15/04/01 11:53:22 INFO SecurityManager: Changing modify acls to: root 
15/04/01 11:53:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
15/04/01 11:53:23 INFO Slf4jLogger: Slf4jLogger started 
15/04/01 11:53:23 INFO Remoting: Starting remoting 
15/04/01 11:53:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@spark-test.novalocal:55722] 
15/04/01 11:53:23 INFO Utils: Successfully started service 'sparkDriver' on port 55722. 
15/04/01 11:53:23 INFO SparkEnv: Registering MapOutputTracker 
15/04/01 11:53:23 INFO SparkEnv: Registering BlockManagerMaster 
15/04/01 11:53:23 INFO DiskBlockManager: Created local directory at /tmp/spark-d70142c7-effd-40c0-b050-f39d727d6e33/blockmgr-6d5699cc-acf8-4ab9-8b39-dfb5385209e5 
15/04/01 11:53:23 INFO MemoryStore: MemoryStore started with capacity 265.0 MB 
15/04/01 11:53:23 INFO HttpFileServer: HTTP File server directory is /tmp/spark-5821f748-ecf7-4e24-a593-ff2c2b040b43/httpd-90a05ad6-f73b-4a52-9a61-0ff135f449a9 
15/04/01 11:53:23 INFO HttpServer: Starting HTTP Server 
15/04/01 11:53:23 INFO Server: jetty-8.y.z-SNAPSHOT 
15/04/01 11:53:23 INFO AbstractConnector: Started SocketConnector@0.0.0.0:43969 
15/04/01 11:53:23 INFO Utils: Successfully started service 'HTTP file server' on port 43969. 
15/04/01 11:53:23 INFO SparkEnv: Registering OutputCommitCoordinator 
15/04/01 11:53:23 INFO Server: jetty-8.y.z-SNAPSHOT 
15/04/01 11:53:23 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 
15/04/01 11:53:23 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/04/01 11:53:23 INFO SparkUI: Started SparkUI at http://spark-test.novalocal:4040 
15/04/01 11:53:23 INFO SparkContext: Added JAR file:/data/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar at http://10.10.10.26:43969/jars/spark-examples-1.3.0-hadoop2.4.0.jar with timestamp 1427860403997 
15/04/01 11:53:24 INFO Executor: Starting executor ID <driver> on host localhost 
15/04/01 11:53:24 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@spark-test.novalocal:55722/user/HeartbeatReceiver 
15/04/01 11:53:24 INFO NettyBlockTransferService: Server created on 39015 
15/04/01 11:53:24 INFO BlockManagerMaster: Trying to register BlockManager 
15/04/01 11:53:24 INFO BlockManagerMasterActor: Registering block manager localhost:39015 with 265.0 MB RAM, BlockManagerId(<driver>, localhost, 39015) 
15/04/01 11:53:24 INFO BlockManagerMaster: Registered BlockManager 
15/04/01 11:53:24 INFO SparkContext: Starting job: reduce at SparkPi.scala:35 
15/04/01 11:53:24 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false) 
15/04/01 11:53:24 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35) 
15/04/01 11:53:24 INFO DAGScheduler: Parents of final stage: List() 
15/04/01 11:53:24 INFO DAGScheduler: Missing parents: List() 
15/04/01 11:53:24 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31), which has no missing parents 
15/04/01 11:53:24 INFO MemoryStore: ensureFreeSpace(1848) called with curMem=0, maxMem=277842493 
15/04/01 11:53:24 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1848.0 B, free 265.0 MB) 
15/04/01 11:53:24 INFO MemoryStore: ensureFreeSpace(1296) called with curMem=1848, maxMem=277842493 
15/04/01 11:53:24 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1296.0 B, free 265.0 MB) 
15/04/01 11:53:24 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:39015 (size: 1296.0 B, free: 265.0 MB) 
15/04/01 11:53:24 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0 
15/04/01 11:53:24 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:839 
15/04/01 11:53:24 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31) 
15/04/01 11:53:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 
15/04/01 11:53:24 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1336 bytes) 
15/04/01 11:53:24 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1336 bytes) 
15/04/01 11:53:24 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 
15/04/01 11:53:24 INFO Executor: Running task 1.0 in stage 0.0 (TID 1) 
15/04/01 11:53:24 INFO Executor: Fetching http://10.10.10.26:43969/jars/spark-examples-1.3.0-hadoop2.4.0.jar with timestamp 1427860403997 
15/04/01 11:53:24 INFO Utils: Fetching http://10.10.10.26:43969/jars/spark-examples-1.3.0-hadoop2.4.0.jar to /tmp/spark-7cb47603-adb9-45ea-ad91-e5ddc3c6da41/userFiles-86c97d54-c082-4bb8-bcb3-34b97a432674/fetchFileTemp3928503400699723858.tmp 
15/04/01 11:53:25 INFO Executor: Adding file:/tmp/spark-7cb47603-adb9-45ea-ad91-e5ddc3c6da41/userFiles-86c97d54-c082-4bb8-bcb3-34b97a432674/spark-examples-1.3.0-hadoop2.4.0.jar to class loader 
15/04/01 11:53:25 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 736 bytes result sent to driver 
15/04/01 11:53:25 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 736 bytes result sent to driver 
15/04/01 11:53:25 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1068 ms on localhost (1/2) 
15/04/01 11:53:25 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 1028 ms on localhost (2/2) 
15/04/01 11:53:25 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/04/01 11:53:25 INFO DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 1.107 s 
15/04/01 11:53:25 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:35, took 1.326417 s 
Pi is roughly 3.13518 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null} 
15/04/01 11:53:25 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null} 
15/04/01 11:53:25 INFO SparkUI: Stopped Spark web UI at http://spark-test.novalocal:4040 
15/04/01 11:53:25 INFO DAGScheduler: Stopping DAGScheduler 
15/04/01 11:53:25 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped! 
15/04/01 11:53:25 INFO MemoryStore: MemoryStore cleared 
15/04/01 11:53:25 INFO BlockManager: BlockManager stopped 
15/04/01 11:53:25 INFO BlockManagerMaster: BlockManagerMaster stopped 
15/04/01 11:53:25 INFO OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped! 
15/04/01 11:53:25 INFO SparkContext: Successfully stopped SparkContext 
15/04/01 11:53:25 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
15/04/01 11:53:25 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 
15/04/01 11:53:25 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
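
The one line of real output, "Pi is roughly 3.13518", is buried in the INFO log above. A small sketch of a filter that prints only that value. The function name extract_pi is hypothetical, and the demo uses lines copied from the log rather than a live run:

```shell
#!/bin/sh
# Hypothetical helper: print only the number from the
# "Pi is roughly ..." line read on stdin.
extract_pi() {
    sed -n 's/^Pi is roughly \([0-9.]*\).*$/\1/p'
}

# Demo with lines taken from the log in this walkthrough.
printf '%s\n' \
    '15/04/01 11:53:25 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:35, took 1.326417 s' \
    'Pi is roughly 3.13518 ' \
    '15/04/01 11:53:25 INFO SparkUI: Stopped Spark web UI at http://spark-test.novalocal:4040' \
    | extract_pi
```

Against a live run it would be used as ./bin/run-example org.apache.spark.examples.SparkPi 2>&1 | extract_pi.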
Original article: https://www.cnblogs.com/icloud/p/4381470.html