Spark: Spark 0.9.0 Installation (Part 1)

Spark:

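Spark is a next-generation in-memory, MapReduce-style computing framework. Compared with MapReduce it can improve performance by an order of magnitude, and it also supports interactive queries, stream processing, graph computation, and more. It supports Java and Scala.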

1. High-performance machine learning

2. Real-time computation

This mode (Spark's standalone deploy mode) is simply a single Spark cluster, or a single Spark test or development machine.

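1. Install a prebuilt Spark release on every node of the cluster, or build and install it yourself. In conf/slaves, list the hostname of each worker you want to use, one per line; this is similar to how Hadoop's slaves file is configured.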

2. Start the Spark master:

./sbin/start-master.sh

3. Once started, the master first prints a URL of the form spark://HOST:PORT. You can also find this URL on the master's web UI at http://localhost:8080.

4. Start a worker and connect it to the master with the following command:

./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT

5. On the master, monitor the cluster at http://localhost:8080.

6. Again as with Hadoop, the Spark cluster needs password-less SSH access from the master to each worker.

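7. Manage the Spark cluster with the following scripts under SPARK_HOME/sbin: start-master.sh, start-slaves.sh, start-all.sh, stop-master.sh, stop-slaves.sh, and stop-all.sh.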

8. Copy conf/spark-env.sh.template to conf/spark-env.sh and set the configuration environment variables there. The available variables are:

Environment Variable	Meaning
`SPARK_MASTER_IP`	Bind the master to a specific IP address, for example a public one.
`SPARK_MASTER_PORT`	Start the master on a different port (default: 7077).
`SPARK_MASTER_WEBUI_PORT`	Port for the master web UI (default: 8080).
`SPARK_WORKER_PORT`	Start the Spark worker on a specific port (default: random).
`SPARK_WORKER_DIR`	Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).
`SPARK_WORKER_CORES`	Total number of cores to allow Spark applications to use on the machine (default: all available cores).
`SPARK_WORKER_MEMORY`	Total amount of memory to allow Spark applications to use on the machine, e.g. `1000m`, `2g` (default: total memory minus 1 GB); note that each application's individual memory is configured using its `spark.executor.memory` property.
`SPARK_WORKER_WEBUI_PORT`	Port for the worker web UI (default: 8081).
`SPARK_WORKER_INSTANCES`	Number of worker instances to run on each machine (default: 1). You can make this more than 1 if you have very large machines and would like multiple Spark worker processes. If you do set this, make sure to also set `SPARK_WORKER_CORES` explicitly to limit the cores per worker, or else each worker will try to use all the cores.
`SPARK_DAEMON_MEMORY`	Memory to allocate to the Spark master and worker daemons themselves (default: 512m).
`SPARK_DAEMON_JAVA_OPTS`	JVM options for the Spark master and worker daemons themselves (default: none).
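
As an illustration of step 8, a conf/spark-env.sh could look roughly like the sketch below. Only the variable names come from the table above. Every value (the IP address, the core and memory figures, the work directory) is a made-up example, not a recommendation.

```shell
# conf/spark-env.sh (sketch; every value below is a hypothetical example)
export SPARK_MASTER_IP=192.168.1.10       # assumed address of the master node
export SPARK_MASTER_PORT=7077             # the default, written out explicitly
export SPARK_MASTER_WEBUI_PORT=8080       # the default, written out explicitly
export SPARK_WORKER_INSTANCES=2           # two worker processes per machine
export SPARK_WORKER_CORES=4               # set explicitly so each worker does not try to use all cores
export SPARK_WORKER_MEMORY=2g             # memory Spark applications may use (see the table)
export SPARK_WORKER_DIR=/data/spark/work  # assumed directory for application logs and scratch space
export SPARK_DAEMON_MEMORY=512m           # the default for the master and worker daemons
```

Because `SPARK_WORKER_INSTANCES` is greater than 1 here, `SPARK_WORKER_CORES` is set as well, following the advice in the table.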

Note: the management scripts do not support the Windows operating system.
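
The numbered steps above can be tied together in a short shell sketch. The hostnames worker1 and worker2, the default SPARK_HOME of /opt/spark, the DRY_RUN switch, and the helper names are all assumptions made for illustration. With the default DRY_RUN=1 the helpers only print the commands they would run.

```shell
#!/bin/sh
# Sketch of the standalone setup steps; hostnames and paths are hypothetical.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"   # assumed install location
WORKERS="worker1 worker2"                # hypothetical worker hostnames
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print commands, 0 = execute them

# Print a command, and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Step 1: write one worker hostname per line to the given file.
write_slaves() {
    for host in $WORKERS; do
        echo "$host"
    done > "$1"
}

# Step 6: check password-less SSH to each worker.
# BatchMode makes ssh fail instead of prompting for a password.
check_ssh() {
    for host in $WORKERS; do
        run ssh -o BatchMode=yes "$host" true
    done
}

# Steps 2 and 7: start the master and the workers listed in conf/slaves.
start_cluster() {
    run "$SPARK_HOME/sbin/start-all.sh"
}
```

A typical sequence would be to call `write_slaves "$SPARK_HOME/conf/slaves"`, then `check_ssh`, then `start_cluster`, first as a dry run and then again with DRY_RUN set to 0 in the environment.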