[Hive] Introduction to the Beeline CLI

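Beeline is the JDBC client for HiveServer2, built on the SQLLine command-line interface. The Beeline shell works in both embedded mode and remote mode. In embedded mode it runs an embedded Hive (similar to the Hive CLI); in remote mode it connects to a separate HiveServer2 process over Thrift. Starting with Hive 0.14, when Beeline is used with HiveServer2 it also prints HiveServer2's log messages for the queries it executes to STDERR. Remote HiveServer2 mode is recommended for production because it is more secure: users do not need to be granted direct access to HDFS or the Metastore.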
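As a rough illustration of the two modes, the sketch below builds the JDBC URL for each. The helper name hs2_url is hypothetical and not part of Hive. It assumes port 10000 and the default database when they are omitted. An empty host yields the embedded-mode URL jdbc:hive2://, which has no host or port.

```shell
# hs2_url: hypothetical helper (not part of Hive) that prints a JDBC URL.
# Usage: hs2_url [host] [port] [database]
hs2_url() {
    host="${1:-}"
    port="${2:-10000}"      # assumed default HiveServer2 Thrift port
    db="${3:-default}"      # assumed default database
    if [ -z "$host" ]; then
        # Embedded mode: no host or port in the URL
        printf '%s\n' 'jdbc:hive2://'
    else
        # Remote mode: a separate HiveServer2 reached over Thrift
        printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
    fi
}

hs2_url                         # embedded mode
hs2_url localhost 10000 hive    # remote mode
```

The printed URL can be passed to beeline -u or to the !connect command.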

I. Hive Environment

hive> select version();
OK
2.3.3 r8a511e3f79b43d4be41cd231cf5c99e43b248383
Time taken: 11.166 seconds, Fetched: 1 row(s)

II. Running Beeline

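Hive depends on Hadoop, so start Hadoop first. Beeline in turn needs a running HiveServer2 to connect to, so start HiveServer2 before Beeline. Each step is described below.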

1 Start Hadoop (HDFS and YARN)

[hadoop@strong ~]$ start-all.sh
Note: this script is deprecated; use start-dfs.sh and start-yarn.sh to start Hadoop instead.

2 Start HiveServer2

1) Method 1

[hadoop@strong ~]$ hiveserver2

2) Method 2

[hadoop@strong ~]$ hive --service hiveserver2

3 Start Beeline

1) Method 1

[hadoop@strong ~]$ beeline

2) Method 2

[hadoop@strong ~]$ hive --service beeline
beeline> !connect jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Enter username for jdbc:hive2://localhost:10000/default: hadoop
Enter password for jdbc:hive2://localhost:10000/default: ******
Connected to: Apache Hive (version 2.3.3)
Driver: Hive JDBC (version 2.3.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/default>

Note: if the connection fails with the following error, edit /usr/local/hadoop/etc/hadoop/core-site.xml:

apache.hadoop.security.authorize.AuthorizationException): User: hadoop is not allowed to impersonate root (state=08S01,code=0)

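Add the following properties inside the <configuration> element. They allow the hadoop user, which runs HiveServer2, to impersonate other users. The value * (any host, any group) is an assumption typical of test clusters; restrict it in production.

<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>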

III. Beeline Overview

1 Beeline Example

[hadoop@strong ~]$ beeline 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.6/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.3 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/hive
Connecting to jdbc:hive2://localhost:10000/hive
Enter username for jdbc:hive2://localhost:10000/hive: hadoop
Enter password for jdbc:hive2://localhost:10000/hive: ******
Connected to: Apache Hive (version 2.3.3)
Driver: Hive JDBC (version 2.3.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/hive> show tables;
+-----------+
| tab_name  |
+-----------+
| city      |
| emp       |
| t_emp     |
| test      |
+-----------+
4 rows selected (0.432 seconds)
0: jdbc:hive2://localhost:10000/hive>

Note: you can also connect directly from the command line by passing the URL, username, and password as options:

[hadoop@strong ~]$ beeline -u jdbc:hive2://localhost:10000/hive -n hadoop -p hadoop
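For scripted use, the same connection options can be combined with -e, which executes a query, and -w, which reads the password from a file; both appear in the beeline --help output in section 4. A password passed with -p is visible in the process list, so a password file may be preferable. The sketch below only prints the command line so it can be reviewed first. The helper name beeline_query_cmd and the password file path are assumptions made for illustration.

```shell
# beeline_query_cmd: hypothetical helper that only PRINTS a beeline command
# line for inspection; it does not contact HiveServer2.
# Usage: beeline_query_cmd <jdbc-url> <user> <password-file> <query>
# Quoting is deliberately simple: the query must not contain double quotes.
beeline_query_cmd() {
    url="$1"; user="$2"; pwfile="$3"; query="$4"
    # -w reads the password from a file instead of exposing it via -p
    printf 'beeline -u "%s" -n %s -w %s -e "%s"\n' \
        "$url" "$user" "$pwfile" "$query"
}

# /home/hadoop/.beeline_pw is an assumed path to a file holding the password
beeline_query_cmd jdbc:hive2://localhost:10000/hive hadoop \
    /home/hadoop/.beeline_pw 'show tables;'
```

Once the printed command looks right, it can be pasted into the shell or run from a cron job.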

2 Beeline Commands

Beeline commands are run as !<SQLLine command>. This article demonstrates a few of them.

1) Show help

0: jdbc:hive2://localhost:10000/hive> !help
!addlocaldriverjar  Add driver jar file in the beeline client side.
!addlocaldrivername Add driver name that needs to be supported in the beeline
                    client side.
!all                Execute the specified SQL against all the current connections
!autocommit         Set autocommit mode on or off
!batch              Start or execute a batch of statements
!brief              Set verbose mode off
!call               Execute a callable statement
!close              Close the current connection to the database
!closeall           Close all current open connections
!columns            List all the columns for the specified table
!commit             Commit the current transaction (if autocommit is off)
!connect            Open a new connection to the database.
!dbinfo             Give metadata information about the database
!describe           Describe a table
!dropall            Drop all tables in the current database
!exportedkeys       List all the exported keys for the specified table
!go                 Select the current connection
!help               Print a summary of command usage
!history            Display the command history
!importedkeys       List all the imported keys for the specified table
!indexes            List all the indexes for the specified table
!isolation          Set the transaction isolation for this connection
!list               List the current connections
!manual             Display the BeeLine manual
!metadata           Obtain metadata information
!nativesql          Show the native SQL for the specified statement
!nullemptystring    Set to true to get historic behavior of printing null as
                    empty string. Default is false.
!outputformat       Set the output format for displaying results
                    (table,vertical,csv2,dsv,tsv2,xmlattrs,xmlelements, and
                    deprecated formats(csv, tsv))
!primarykeys        List all the primary keys for the specified table
!procedures         List all the procedures
!properties         Connect to the database specified in the properties file(s)
!quit               Exits the program
!reconnect          Reconnect to the database
!record             Record all output to the specified file
!rehash             Fetch table and column names for command completion
!rollback           Roll back the current transaction (if autocommit is off)
!run                Run a script from the specified file
!save               Save the current variabes and aliases
!scan               Scan for installed JDBC drivers
!script             Start saving a script to a file
!set                Set a beeline variable
!sh                 Execute a shell command
!sql                Execute a SQL command
!tables             List all the tables in the database
!typeinfo           Display the type map for the current connection
!verbose            Set verbose mode on

2) List current connections

0: jdbc:hive2://localhost:10000/hive> !list
1 active connection:
 #0  open     jdbc:hive2://localhost:10000/hive

3) Clear the screen

0: jdbc:hive2://localhost:10000/hive> !sh clear

0: jdbc:hive2://localhost:10000/hive>

4) Format the output

0: jdbc:hive2://localhost:10000/hive> !outputformat vertical
0: jdbc:hive2://localhost:10000/hive> show tables;
tab_name  city

tab_name  emp

tab_name  t_emp

tab_name  test

4 rows selected (0.251 seconds)
0: jdbc:hive2://localhost:10000/hive> !outputformat table
0: jdbc:hive2://localhost:10000/hive> show tables;
+-----------+
| tab_name  |
+-----------+
| city      |
| emp       |
| t_emp     |
| test      |
+-----------+
4 rows selected (0.205 seconds)
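When table-format output has already been captured, for example in a log file, the values can be recovered with standard text tools. The sketch below assumes a single-column result like the show tables output above. The filter name table_to_values is hypothetical. For new scripts, switching to csv2 or tsv2 with !outputformat is likely simpler than parsing the table.

```shell
# table_to_values: hypothetical filter for single-column Beeline "table"
# output read from stdin. Prints the header and each value on its own line.
table_to_values() {
    # 1. drop border lines such as +-----------+
    # 2. strip the leading "| " and the trailing padding plus "|"
    grep -v '^[+]' | sed -e 's/^| *//' -e 's/ *|$//'
}

# Sample input copied from the show tables output above
table_to_values <<'EOF'
+-----------+
| tab_name  |
+-----------+
| city      |
| emp       |
+-----------+
EOF
```

Multi-column results would need an extra step to split on the inner | separators.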

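3 Beeline Hive Commands

When the Hive JDBC driver is used, Hive-specific commands (the same ones available in the Hive CLI) can be run from Beeline.

For details, see the article "Hive CLI初探" (A First Look at the Hive CLI).

4 Beeline Command Options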

[hadoop@strong ~]$ beeline --help
Usage: java org.apache.hive.cli.beeline.BeeLine 
   -u <database url>               the JDBC URL to connect to
   -r                              reconnect to last saved connect url (in conjunction with !save)
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   -w (or) --password-file <password file>  the password file to read password from
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --property-file=<property-file> the file to read connection properties (url, driver, user, password) from
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showDbInPrompt=[true/false]   display the current database name in the prompt
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                   Note that csv, and tsv are deprecated - use csv2, tsv2 instead
   --incremental=[true/false]      Defaults to false. When set to false, the entire result set
                                   is fetched and buffered before being displayed, yielding optimal
                                   display column sizing. When set to true, result rows are displayed
                                   immediately as they are fetched, yielding lower latency and
                                   memory usage at the price of extra display column padding.
                                   Setting --incremental=true is recommended if you encounter an OutOfMemory
                                   on the client side (due to the fetched result set size being large).
                                   Only applicable if --outputformat=table.
   --incrementalBufferRows=NUMROWS the number of rows to buffer when printing rows on stdout,
                                   defaults to 1000; only applicable if --incremental=true
                                   and --outputformat=table
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false]  set to true to get historic behavior of printing null as empty string
   --maxHistoryRows=MAXHISTORYROWS The maximum number of rows to store beeline history.
   --help                          display this message
 
   Example:
    1. Connect using simple authentication to HiveServer2 on localhost:10000
    $ beeline -u jdbc:hive2://localhost:10000 username password

    2. Connect using simple authentication to HiveServer2 on hs.local:10000 using -n for username and -p for password
    $ beeline -n username -p password -u jdbc:hive2://hs2.local:10012

    3. Connect using Kerberos authentication with hive/localhost@mydomain.com as HiveServer2 principal
    $ beeline -u "jdbc:hive2://hs2.local:10013/default;principal=hive/localhost@mydomain.com"

    4. Connect using SSL connection to HiveServer2 on localhost at 10000
    $ beeline "jdbc:hive2://localhost:10000/default;ssl=true;sslTrustStore=/usr/local/truststore;trustStorePassword=mytruststorepassword"

    5. Connect using LDAP authentication
    $ beeline -u jdbc:hive2://hs2.local:10013/default <ldap-username> <ldap-password>
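To show how several of these options combine, the sketch below writes a small HiveQL script that references a Hive variable and then prints a matching beeline command. The script location, the variable name tbl, and the helper name report_cmd are made up for illustration; emp is one of the tables listed earlier. The command is only printed, not executed.

```shell
# Write a one-line HiveQL script that uses the Hive variable "tbl".
# The quoted 'EOF' keeps the shell from expanding ${tbl} itself.
script="${TMPDIR:-/tmp}/count_rows.hql"   # assumed location
cat > "$script" <<'EOF'
SELECT COUNT(*) FROM ${tbl};
EOF

# report_cmd: hypothetical helper that prints (does not run) the command.
# Usage: report_cmd <jdbc-url> <table> <script-file>
report_cmd() {
    # --hivevar sets tbl, csv2 without header gives machine-readable output,
    # and -f runs the script file
    printf 'beeline -u "%s" --hivevar tbl=%s --outputformat=csv2 --showHeader=false --silent=true -f %s\n' \
        "$1" "$2" "$3"
}

report_cmd jdbc:hive2://localhost:10000/hive emp "$script"
```

Keeping the table name in a variable lets the same script file be reused for other tables by changing only the --hivevar argument.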