Spark基础知识汇总

2,wordcount:

val wordcount = sc.textFile("/user/s-44/wordcount.txt").flatMap(_.split(' ')).map((_, 1)).reduceByKey(_ + _).sortByKey().collect

val wordcount = sc.textFile("/user/s-44/wordcount.txt").flatMap(_.split(' ')).map((_, 1)).reduceByKey(_ + _).sortByKey().saveAsTextFile("/user/s-44/result.txt")

下面这个是按value排序

val wordcount = sc.textFile("/user/s-44/wordcount.txt").flatMap(_.split(' ')).map((_, 1)).reduceByKey(_ + _).map(_.swap).sortByKey().collect

val wordcount = sc.textFile("/user/s-44/wordcount.txt").flatMap(_.split(' ')).map((_, 1)).reduceByKey(_ + _).map(_.swap).sortByKey().saveAsTextFile("/user/s-44/result.txt")
View Code

1,集合变成rdd

val rdd = sc.parallelize(List(1, 2, 3, 4, 5))
View Code
原文地址:https://www.cnblogs.com/yueyebigdata/p/5604667.html