spark-sql-04: Several Ways to Connect Spark to Hive

Configure Spark

Directory: /opt/bigdata/spark-2.3.4-bin-hadoop2.6/conf
[root@ke03 conf]# vi spark-env.sh
Add: export HADOOP_CONF_DIR=/opt/bigdata/hadoop-2.6.5/etc/hadoop

Copy Hive's hive-site.xml into Spark's conf directory; Spark only needs the metastore URI from it:

cp  /opt/bigdata/hive-2.3.4/conf/hive-site.xml  ./

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://ke03:9083</value>
  </property>
</configuration>
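The hive-site.xml route serves the bundled shells below. In your own application, the same connection can be made in code. A minimal Scala sketch, assuming the same thrift://ke03:9083 metastore and the spark-hive module on the classpath; the HiveConnectDemo object and app name are made up for illustration:

import org.apache.spark.sql.SparkSession

object HiveConnectDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-connect-demo")                         // hypothetical app name
      .config("hive.metastore.uris", "thrift://ke03:9083")  // same value as hive-site.xml
      .enableHiveSupport()                                  // turn on Hive metastore access
      .getOrCreate()

    spark.sql("show tables").show()  // same sanity check as the shells below
    spark.stop()
  }
}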

Method 1: spark-shell

Directory: /opt/bigdata/spark-2.3.4-bin-hadoop2.6/bin

Start:

[root@ke03 bin]# ./spark-shell --master yarn

Watch: http://ke03:8088/cluster
The Spark shell keeps running; as long as spark-shell does not exit, its YARN application stays connected.

Log: Spark session available as 'spark'.
[screenshot: spark-shell startup log]

Test:
scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|   test01|      false|
| default|   test02|      false|
| default|   test03|      false|
+--------+---------+-----------+
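The session is not limited to SQL strings; Hive tables are also visible to the DataFrame API. A short sketch against the test01 table from the listing above (the test01_copy target name is made up for illustration):

scala> val df = spark.table("default.test01")   // read a Hive table as a DataFrame
scala> df.show()
scala> df.write.mode("overwrite").saveAsTable("default.test01_copy")   // write back through the metastore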

Method 2: the spark-sql CLI

[root@ke03 bin]# ./spark-sql --master yarn

spark-sql> show tables;
2021-02-16 04:33:02 INFO  CodeGenerator:54 - Code generated in 367.646292 ms
default    test01    false
default    test02    false
default    test03    false
Nothing shows up on the 8080 UI (no jobs for inserts, deletes, or queries).
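The CLI also accepts inline statements through the Hive-style -e flag, which is handy for one-off checks:

[root@ke03 bin]# ./spark-sql --master yarn -e "show tables"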

Method 3: using Spark like HiveServer2 (Thrift server)

Directory: /opt/bigdata/spark-2.3.4-bin-hadoop2.6/sbin

./start-thriftserver.sh --master yarn

jps: one extra process, SparkSubmit (running in the background)
The 8080 UI now shows: Thrift JDBC/ODBC Server. The program does not exit and stays connected.
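The matching stop script lives in the same sbin directory:

./stop-thriftserver.sh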

Directory: /opt/bigdata/spark-2.3.4-bin-hadoop2.6/bin

./beeline  // any machine that can reach ke03 can connect this way
!connect jdbc:hive2://ke03:10000    // a hive2 JDBC URL, because the server mimics HiveServer2

Verify: Spark and Hive data stay in sync (both go through the same metastore).

0: jdbc:hive2://ke03:10000> show tables;
+-----------+------------+--------------+--+
| database  | tableName  | isTemporary  |
+-----------+------------+--------------+--+
| default   | test01     | false        |
| default   | test02     | false        |
| default   | test03     | false        |
+-----------+------------+--------------+--+
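Because the Thrift server speaks the HiveServer2 protocol, any JDBC client can do what beeline just did. A minimal Scala sketch, assuming the org.apache.hive hive-jdbc driver is on the classpath and no authentication is configured; the user/password values are placeholders:

import java.sql.DriverManager

object ThriftJdbcDemo {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")  // register the HiveServer2 driver
    val conn = DriverManager.getConnection("jdbc:hive2://ke03:10000", "root", "")
    val stmt = conn.createStatement()
    val rs   = stmt.executeQuery("show tables")
    while (rs.next()) {
      // columns match the beeline output: database, tableName, isTemporary
      println(s"${rs.getString(1)}\t${rs.getString(2)}\t${rs.getString(3)}")
    }
    rs.close(); stmt.close(); conn.close()
  }
}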
Original post: https://www.cnblogs.com/bigdata-familyMeals/p/14521453.html