Spark study notes: converting an RDD to a DataFrame

1. The plain way: map each record to a tuple, then call toDF with the column names

For example: rdd.map(para => (para(0).trim, para(1).trim.toInt)).toDF("name", "age")

This requires importing the implicit conversions:

import spark.implicits._  // implicit conversions that provide toDF

// Split each CSV line, trim every field, and parse the sales column as Long
val df1 = data.map(x => x.split(","))
  .map(x => (x(0).trim, x(1).trim, x(2).trim, x(3).trim, x(4).trim, x(5).trim, x(6).trim, x(7).trim,
    x(8).trim.toLong, x(9).trim, x(10).trim, x(11).trim, x(12).trim, x(13).trim, x(14).trim))
  .toDF("xxid", "province_id", "xid", "test", "number", "time", "a", "b", "sales", "selection", "add", "game", "begin", "end", "draw")

// Register the DataFrame as a temporary view so it can be queried with SQL
df1.createOrReplaceTempView("bjlot")
spark.sql("select sum(sales) as a from bjlot").createOrReplaceTempView("tmp")

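The tuple approach can be tried end to end with a small sketch like the one below. It assumes a local SparkSession and two made-up "name,age" lines; the object name TupleToDF, the helper toPeopleDF, and the view name people are hypothetical and exist only for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object TupleToDF {
  // Hypothetical helper: parse "name,age" lines into a two-column DataFrame
  def toPeopleDF(spark: SparkSession, lines: RDD[String]): DataFrame = {
    import spark.implicits._ // brings toDF into scope for RDDs of tuples
    lines.map(_.split(","))
      .map(para => (para(0).trim, para(1).trim.toInt))
      .toDF("name", "age")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; the app name is arbitrary
    val spark = SparkSession.builder().appName("tuple-to-df").master("local[*]").getOrCreate()

    // Made-up sample data standing in for a text file
    val lines = spark.sparkContext.parallelize(Seq("Alice, 30", "Bob, 25"))
    val df = toPeopleDF(spark, lines)

    df.printSchema() // column types come from the tuple element types
    df.createOrReplaceTempView("people")
    spark.sql("select sum(age) as total from people").show()

    spark.stop()
  }
}
```

The column types are taken from the tuple element types, so converting a field with toInt or toLong before calling toDF is what makes aggregates such as sum behave numerically.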
2. Setting the schema through reflection, for example:

// Infer the schema through reflection on a case class
import spark.implicits._

// Define the case class before it is used (in compiled code, at top level rather than inside a method)
case class bd(shop: String, province: Int, loc: Int, no: Int, ticket_no: String, sale_time: String,
  chances: Int, multple: Int, sales: Long, selection: String, add: String, game: String,
  begn: String, end: String, draw_date: String)

// Trim numeric fields before parsing so stray whitespace does not cause a NumberFormatException
val df2 = data.map(x => x.split(","))
  .map(x => bd(x(0).trim, x(1).trim.toInt, x(2).trim.toInt, x(3).trim.toInt, x(4).trim, x(5).trim,
    x(6).trim.toInt, x(7).trim.toInt, x(8).trim.toLong, x(9), x(10), x(11), x(12), x(13), x(14).trim))
  .toDF()
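
A smaller version of the reflection approach makes it easier to see where the column names and types come from. This is only a sketch under stated assumptions: the case class Person, the object CaseClassToDF, the helper toPeopleDF, and the sample lines are all hypothetical and not part of the original notes.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type; Spark derives column names and types from its fields
case class Person(name: String, age: Int)

object CaseClassToDF {
  // Hypothetical helper: parse "name,age" lines into Person records, then into a DataFrame
  def toPeopleDF(spark: SparkSession, lines: RDD[String]): DataFrame = {
    import spark.implicits._ // provides the encoder for Person and the toDF method
    lines.map(_.split(","))
      .map(p => Person(p(0).trim, p(1).trim.toInt))
      .toDF() // no column names passed: they are read from the case class
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; the app name is arbitrary
    val spark = SparkSession.builder().appName("case-class-to-df").master("local[*]").getOrCreate()

    // Made-up sample data standing in for a text file
    val lines = spark.sparkContext.parallelize(Seq("Alice, 30", "Bob, 25"))
    val df = toPeopleDF(spark, lines)

    df.printSchema() // shows the columns derived from Person
    spark.stop()
  }
}
```

Compared with the tuple approach, the column names live in one place (the case class), so toDF() needs no arguments and a change to the record type is picked up by the schema automatically.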

  

Original post: https://www.cnblogs.com/students/p/13446988.html