Notes on configuring Hadoop 2.7.3 and Spark 2.4.5

core-site.xml:
  <configuration>
    <property>
      <!-- fs.defaultFS is the current key; fs.default.name is the deprecated alias -->
      <name>fs.defaultFS</name>
      <value>hdfs://node1:9000</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop-2.7.3/tmp</value>
    </property>
  </configuration>
hdfs-site.xml:
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/opt/hadoop-2.7.3/dfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/opt/hadoop-2.7.3/dfs/data</value>
    </property>
  </configuration>
mapred-site.xml:
  <configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>node1:10020</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>node1:19888</value>
    </property>
  </configuration>
yarn-site.xml:
  <configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>node1:8032</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>node1:8030</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>node1:8031</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address</name>
      <value>node1:8033</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>node1:8888</value>
    </property>
  </configuration>
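
Once the four files above are in place, copy the config directory to the other nodes and bring the cluster up. A minimal sketch, run on node1 and assuming identical install paths on every node:

scp -r /opt/hadoop-2.7.3/etc/hadoop node2:/opt/hadoop-2.7.3/etc/
scp -r /opt/hadoop-2.7.3/etc/hadoop node3:/opt/hadoop-2.7.3/etc/
hdfs namenode -format   # first start only; this wipes existing HDFS metadata
start-dfs.sh
start-yarn.sh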

node1:192.168.31.100

node2:192.168.31.101

node3:192.168.31.102
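
For those hostnames to resolve, every node needs matching entries in /etc/hosts (or DNS):

192.168.31.100 node1
192.168.31.101 node2
192.168.31.102 node3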

Disable iptables
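
On CentOS 6 that would be the following (the OS version is an assumption; CentOS 7 uses systemctl stop firewalld and systemctl disable firewalld instead):

service iptables stop
chkconfig iptables off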

Disable SELinux
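
Both immediately and across reboots, assuming a RHEL-family system:

setenforce 0                # takes effect now, lost on reboot
vim /etc/selinux/config     # set SELINUX=disabled to make it permanent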

Configure static networking on each node: IPADDR, ONBOOT=yes, GATEWAY (sketched below)
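
A sketch for node1; the interface name (eth0) and GATEWAY value are assumptions, and IPADDR changes per node:

## /etc/sysconfig/network-scripts/ifcfg-eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.31.100
NETMASK=255.255.255.0
GATEWAY=192.168.31.1   # assumed gateway for the 192.168.31.0/24 subnet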

Configure Java and Hadoop environment variables
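
For example in /etc/profile (paths taken from the Spark section below), followed by source /etc/profile:

export JAVA_HOME=/opt/jdk1.8.0_221
export HADOOP_HOME=/opt/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin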

Set up passwordless SSH between the nodes
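
On node1 (repeat from each node if they all need to reach each other):

ssh-keygen -t rsa      # accept the defaults, empty passphrase
ssh-copy-id node1
ssh-copy-id node2
ssh-copy-id node3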

In hadoop-env.sh and yarn-env.sh, uncomment the JAVA_HOME line and set it explicitly
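
The uncommented line in both files becomes:

export JAVA_HOME=/opt/jdk1.8.0_221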

Spark:

In the conf directory, copy spark-env.sh.template to spark-env.sh and add:

export JAVA_HOME=/opt/jdk1.8.0_221
export SCALA_HOME=/opt/scala-2.12.11
export HADOOP_HOME=/opt/hadoop-2.7.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark-2.4.5-bin-hadoop2.7
export SPARK_MASTER_IP=192.168.31.100

In the conf directory, copy slaves.template to slaves and list the worker nodes:

node1
node2
node3
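
With slaves filled in, the standalone cluster starts from the master; jps should then show a Master process on node1 and a Worker on every node:

$SPARK_HOME/sbin/start-all.sh
jps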

BUG:

When running on YARN:

WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

## Package the jars
jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
## Create the HDFS path
hdfs dfs -mkdir -p /spark/jar
## Upload the jars to HDFS
hdfs dfs -put spark-libs.jar /spark/jar
## Add the setting (copy spark-defaults.conf.template to spark-defaults.conf first)
vim spark-defaults.conf
spark.yarn.archive=hdfs://node1:9000/spark/jar/spark-libs.jar
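
To sanity-check the archive and the setting, a test submit with the bundled SparkPi example (the jar name matches the stock spark-2.4.5-bin-hadoop2.7 / Scala 2.11 build):

hdfs dfs -ls /spark/jar
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar 100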

The full hdfs://node1:9000 authority (highlighted in red in the original post) must be written out; if it is omitted, the first path segment is parsed as a hostname and the following error appears:

java.lang.IllegalArgumentException: java.net.UnknownHostException: spark

Spark's run modes (see the submit examples below):

    a. local mode

    b. Spark's built-in standalone cluster mode

    c. YARN cluster mode
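
The mode is chosen with --master at submit time; illustrative commands reusing the SparkPi jar from above:

## a. local
spark-submit --master local[*] --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar 10
## b. standalone (7077 is the default master port)
spark-submit --master spark://node1:7077 --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar 10
## c. yarn
spark-submit --master yarn --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar 10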

Original post (Chinese): https://www.cnblogs.com/cassielcode/p/12639150.html