Learning Spark: A First Program, WordCount

The WordCount program

Count how many times each word occurs in the file below, splitting each line on spaces.

  • input.txt
java scala python hello world
java pyfysf upuptop wintp top
sfok sf sf 
sf java android sf pyfysf upuptop 
pyfysf upuptop java android spark
hello world world hello top scala spark
spark spark sql

Create a Maven project

  • pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>SparkStudy</artifactId>
        <groupId>top.wintp.sparkstudy</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>SparkCore</artifactId>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>
    </dependencies>
    <build>
        <finalName>WordCount</finalName>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.0.0</version>
                <configuration>
                    <archive>
                        <manifest>
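                            <!-- Fully qualified main class; change this to match your own project -->
                            <mainClass>top.wintp.sparkstudy.sparkcore.WordCount</mainClass>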
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>


</project>

  • WordCount.scala
package top.wintp.sparkstudy.sparkcore

import org.apache.spark.{SparkConf, SparkContext}

/**
  * description: counts how often each space-separated word occurs in a text file
  * <p>
  * author:  upuptop
  * <p>
  * qq: 337081267
  * <p>
  * CSDN:   http://blog.csdn.net/pyfysf
  * <p>
  * cnblogs:   http://www.cnblogs.com/upuptop
  * <p>
  * blog:   http://wintp.top
  * <p>
  * email:  pyfysf@163.com
  * <p>
  * time: 2019/7/1
  * <p>
  */
object WordCount {
  def main(args: Array[String]): Unit = {
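    // Create the SparkConf.
    // setMaster: local, local[n] and local[*] all run Spark locally; a remote master URL can be set here instead.
    val conf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    // Create the SparkContext.
    val sc = new SparkContext(conf)
    // Create an RDD from an external file.
    val line = sc.textFile("E:/input/input.txt")
    // flatMap: split each line into words and flatten the result.
    val words = line.flatMap(_.split(" "))
    // map: turn each word into a (word, 1) key-value pair.
    val k2v = words.map((_, 1))
    // reduceByKey: sum the counts for each word.
    val result = k2v.reduceByKey(_ + _)
    // Save the result to a new, timestamped output directory.
    result.saveAsTextFile("E:/output/" + System.currentTimeMillis())

    // Stop the SparkContext.
    sc.stop()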
  }
}
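
The flatMap, map and reduceByKey chain can also be tried without Spark or any file access. The following is a minimal sketch, not part of the original project: it uses plain Scala collections, inlines the sample lines from input.txt, and replaces reduceByKey with a groupBy followed by a sum. The names WordCountSteps and countWords are made up for this illustration.

```scala
object WordCountSteps {
  // Sample lines copied from input.txt (trailing spaces kept as in the file).
  val input: Seq[String] = Seq(
    "java scala python hello world",
    "java pyfysf upuptop wintp top",
    "sfok sf sf ",
    "sf java android sf pyfysf upuptop ",
    "pyfysf upuptop java android spark",
    "hello world world hello top scala spark",
    "spark spark sql"
  )

  def countWords(lines: Seq[String]): Map[String, Int] = {
    // flatMap: split every line on spaces and flatten into one sequence of words.
    val words: Seq[String] = lines.flatMap(_.split(" "))
    // map: pair each word with an initial count of 1.
    val k2v: Seq[(String, Int)] = words.map((_, 1))
    // Stand-in for reduceByKey: group the pairs by word, then sum the counts in each group.
    k2v.groupBy(_._1).map { case (word, pairs) => (word, pairs.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit =
    // Print the counts in alphabetical order, one word per line.
    countWords(input).toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word $n") }
}
```

For the sample input this gives 14 distinct words; java, sf and spark each occur 4 times. The Spark job should produce the same pairs, although the order of the lines in its output files may differ.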

  • Output
    [Image: screenshot of the output]

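With the configuration above, the following problem should not occur. If you do run into it, move the scala-SDK to the end of the dependency list. The error usually means that the Scala library found first on the classpath does not match the Scala 2.11 line that spark-core_2.11 was built against; placing the SDK last lets the version pulled in by Maven take precedence.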


Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
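
A NoSuchMethodError on scala.Predef$.refArrayOps is commonly a sign that the Scala library on the classpath is not from the 2.11.x line that spark-core_2.11 expects. One way to check is to print the version of the scala-library that is actually loaded at runtime. The sketch below is only an illustration: ScalaVersionCheck, binaryVersion and expectedBinaryVersion are hypothetical names, and the expected value "2.11" is an assumption read off the artifact suffix in the pom.xml.

```scala
object ScalaVersionCheck {
  // Assumption: the Scala binary version taken from the suffix of spark-core_2.11.
  val expectedBinaryVersion = "2.11"

  // Reduce a full version such as "2.11.8" to its binary version "2.11".
  def binaryVersion(full: String): String =
    full.split("\\.").take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Version of the scala-library that was actually loaded from the classpath.
    val actual = scala.util.Properties.versionNumberString
    println(s"scala-library on the classpath: $actual")
    if (binaryVersion(actual) != expectedBinaryVersion)
      println(s"Mismatch: spark-core_2.11 expects Scala $expectedBinaryVersion.x")
  }
}
```

Running this from the same module and run configuration as WordCount shows which Scala library wins on the classpath, which is what reordering the scala-SDK changes.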


Original article: https://www.cnblogs.com/shaofeer/p/11154488.html