Spark-shell Experiment 1: Simple shell operations

The grades of a university's computer science department are given in the following format:
Tom,DataBase,80
Tom,Algorithm,50
Tom,DataStructure,60
Jim,DataBase,90
Jim,Algorithm,60
Jim,DataStructure,80
……
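Each record is a comma-separated triple: student name, course name, score. Below is a minimal sketch of parsing one such line in plain Scala, which can help check the splitting logic before applying it to an RDD. The names `Score`, `parseLine`, `sample`, `parsed`, and `students` are illustrative and do not appear in the original.

```scala
import scala.util.Try

// Hypothetical helper type: one row of the grade file.
case class Score(student: String, course: String, score: Int)

// Split a line such as "Tom,DataBase,80" into its three fields.
// Returns None when the line does not have exactly three fields
// or when the third field is not an integer.
def parseLine(line: String): Option[Score] =
  line.split(",") match {
    case Array(student, course, score) =>
      Try(score.trim.toInt).toOption.map(s => Score(student.trim, course.trim, s))
    case _ => None
  }

val sample = Seq("Tom,DataBase,80", "Tom,Algorithm,50", "Jim,DataBase,90")
val parsed = sample.flatMap(line => parseLine(line))

// Distinct student names in the sample, the same idea as task (1).
val students = parsed.map(_.student).distinct
```

Parsing each line once into a typed record avoids calling `split` repeatedly in every later step, at the cost of one extra definition.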
Using the given experimental data, write code in spark-shell to compute the following:
(1) How many students the department has in total;

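val lines=sc.textFile("/test/Data1.txt")//open the file
val par=lines.map(row=>row.split(",")(0))//split each line and take the first field (student name)
val distinct_par=par.distinct()//remove duplicates
distinct_par.count//print the number of distinct students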

(2) What Tom's average score across all his courses is;

val lines=sc.textFile("/test/Data1.txt")//open the file
val pare=lines.filter(row=>row.split(",")(0)=="Tom")//keep only Tom's rows
pare.foreach(println)//print the contents
/*Tom,DataBase,26
Tom,Algorithm,12
Tom,OperatingSystem,16
Tom,Python,40
Tom,Software,60*/
//build (name, (score, 1)) pairs, sum scores and counts per name, then divide; Int division truncates (154/5 gives 30)
pare.map(row=>(row.split(",")(0),row.split(",")(2).toInt)).mapValues(x=>(x,1)).reduceByKey((x,y)=>(x._1+y._1,x._2+y._2)).mapValues(x=>(x._1/x._2)).collect()
//res13: Array[(String, Int)] = Array((Tom,30))
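The result above is 30 rather than 30.8 because both the sum and the count are Int, so the division truncates. Below is a minimal sketch of the same (sum, count) reduction on plain Scala collections, with the sum converted to Double before dividing. The rows are the five Tom records from the printed output above; `rows`, `sumCount`, `intAverage`, and `average` are illustrative names, and a local `Seq` stands in for the RDD.

```scala
// The five Tom rows from the output shown above.
val rows = Seq(
  "Tom,DataBase,26",
  "Tom,Algorithm,12",
  "Tom,OperatingSystem,16",
  "Tom,Python,40",
  "Tom,Software,60"
)

// Pair every score with a count of 1, then add the pairs component-wise.
// This is what mapValues(x=>(x,1)) followed by reduceByKey does for one key.
val sumCount = rows
  .map(row => (row.split(",")(2).toInt, 1))
  .reduce((x, y) => (x._1 + y._1, x._2 + y._2))

// Int / Int truncates toward zero.
val intAverage = sumCount._1 / sumCount._2

// Converting the sum to Double keeps the fractional part.
val average = sumCount._1.toDouble / sumCount._2

println(s"sum=${sumCount._1}, count=${sumCount._2}, Int average=$intAverage, Double average=$average")
```

In the RDD version, the equivalent change would be to write `x._1.toDouble/x._2` in the last `mapValues`, which makes the result an `Array[(String, Double)]`.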

(4) How many courses each student has taken;

val lines=sc.textFile("/test/Data1.txt")
val pare=lines.map(row=>(row.split(",")(0),row.split(",")(1)))//(student, course) pairs
pare.mapValues(_=>1).reduceByKey(_+_).foreach(println)//count the courses per student


(5) How many students in the department have taken the DataBase course;

val pare=lines.filter(row=>row.split(",")(1)=="DataBase")

pare.count


(6) What the average score of each course is;

val par=lines.map(row=>(row.split(",")(1),row.split(",")(2).toInt))
par.mapValues(x=>(x,1)).reduceByKey((x,y)=>(x._1+y._1,x._2+y._2)).mapValues(x=>(x._1/x._2)).collect()

(7) Use an accumulator to count how many students have taken the DataBase course

val pare=lines.filter(row=>row.split(",")(1)=="DataBase").map(row=>(row.split(",")(1),1))
val accum=sc.longAccumulator("My Accumulator")
pare.values.foreach(x=>accum.add(x))
accum.value
Original article: https://www.cnblogs.com/837634902why/p/10520116.html