Spark SQL 操作Hive 数据

Spark 2.0以前版本:
val sparkConf = new SparkConf().setAppName("soyo")
    val spark = new SparkContext(sparkConf)
Spark 2.0以后版本:(上面的写法兼容)
直接用SparkSession:
val spark = SparkSession
      .builder
      .appName("soyo")
      .getOrCreate()
    var tc = spark.sparkContext.parallelize(数据).cache()


import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.sql.{Row, SparkSession} import org.apache.spark.sql.hive._
case class Persons(name:String,age:Int) case class Record(key: Int, value: String) object rdd_to_dataframe_parquet { val warehouseLocation = "file:${system:user.dir}/spark-warehouse" val spark=SparkSession.builder().config("spark.sql.warehouse.dir",warehouseLocation).enableHiveSupport().getOrCreate() import spark.implicits._ def main(args: Array[String]): Unit = { spark.sql("CREATE TABLE IF NOT EXISTS soyo1(key INT,value STRING)") spark.sql("LOAD DATA LOCAL INPATH 'file:///home/soyo/桌面/spark编程测试数据/kv1.txt' INTO TABLE soyo1") spark.sql("select * from soyo").show() //默认只取前20行 spark.sql("select * from soyo").take(100).foreach(println) import spark.sql //导入之后不需要再加Spark sql("SELECT COUNT(*) FROM soyo").show() sql("select count(*) from soyo1").show() sql("show tables").show() sql("select * from people").show() val result2=sql("select * from people") val fin_result=result2.map { case Row(key: String, value: Int) => s"name=$key;value=$value" } fin_result.show() val recordsDF = spark.createDataFrame((1 to 100).map(i => Record(i, s"soyo_$i"))) recordsDF.createOrReplaceTempView("records") // Queries can then join DataFrame data with data stored in Hive. sql("SELECT * FROM records ").show() val res= sql("SELECT * FROM records ").map( x=>"key:"+x(0)+",value:"+x(1) ).show() spark.stop() } }

 结果:+---+-------+
|key|  value|
+---+-------+
|238|val_238|
| 86| val_86|
|311|val_311|
| 27| val_27|
|165|val_165|
|409|val_409|
|255|val_255|
|278|val_278|
| 98| val_98|
|484|val_484|
|265|val_265|
|193|val_193|
|401|val_401|
|150|val_150|
|273|val_273|
|224|val_224|
|369|val_369|
| 66| val_66|
|128|val_128|
|213|val_213|
+---+-------+
only showing top 20 rows

[238,val_238]
[86,val_86]
[311,val_311]
[27,val_27]
[165,val_165]
[409,val_409]
[255,val_255]
[278,val_278]
[98,val_98]
[484,val_484]
+--------+
|count(1)|
+--------+
|    6000|
+--------+

+--------+
|count(1)|
+--------+
|    8500|
+--------+

+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|     soyo|      false|
| default|    soyo1|      false|
|        |   people|       true|
+--------+---------+-----------+

+-----+---+
| name|age|
+-----+---+
|soyo8| 35|
|   小周| 30|
|   小华| 19|
| soyo| 88|
+-----+---+

+-------------------+
|              value|
+-------------------+
|name=soyo8;value=35|
|   name=小周;value=30|
|   name=小华;value=19|
| name=soyo;value=88|
+-------------------+

+---+-------+
|key|  value|
+---+-------+
|  1| soyo_1|
|  2| soyo_2|
|  3| soyo_3|
|  4| soyo_4|
|  5| soyo_5|
|  6| soyo_6|
|  7| soyo_7|
|  8| soyo_8|
|  9| soyo_9|
| 10|soyo_10|
| 11|soyo_11|
| 12|soyo_12|
| 13|soyo_13|
| 14|soyo_14|
| 15|soyo_15|
| 16|soyo_16|
| 17|soyo_17|
| 18|soyo_18|
| 19|soyo_19|
| 20|soyo_20|
+---+-------+
only showing top 20 rows

+--------------------+
|               value|
+--------------------+
|  key:1,value:soyo_1|
|  key:2,value:soyo_2|
|  key:3,value:soyo_3|
|  key:4,value:soyo_4|
|  key:5,value:soyo_5|
|  key:6,value:soyo_6|
|  key:7,value:soyo_7|
|  key:8,value:soyo_8|
|  key:9,value:soyo_9|
|key:10,value:soyo_10|
|key:11,value:soyo_11|
|key:12,value:soyo_12|
|key:13,value:soyo_13|
|key:14,value:soyo_14|
|key:15,value:soyo_15|
|key:16,value:soyo_16|
|key:17,value:soyo_17|
|key:18,value:soyo_18|
|key:19,value:soyo_19|
|key:20,value:soyo_20|
+--------------------+
only showing top 20 rows

原文地址:https://www.cnblogs.com/soyo/p/7656322.html