Hive 存储类型 StoreType

file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
TEXTFILE 文本文件
SEQUENCEFILE 序列化文件(compressed) 压缩存储可提升查询效率并节省磁盘空间。

经过Gzip 或 Bzip2压缩后的文本文件可直接以TEXTFILE的格式存储至HIVE表中,查询时会自动检测该压缩文件并在线解压缩。
CREATE TABLE raw (line STRING)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '	' LINES TERMINATED BY '
';
 
LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;

上表存储为TEXTFILE(默认),但以这种方式进行存储时,hadoop无法将文件进行分区以至于不支持mapreduce的并行计算。

推荐做法是将该表的数据导入到另一个SEQUENCEFILE的表中,其压缩后仍支持并行计算

CREATE TABLE raw (line STRING)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '	' LINES TERMINATED BY '
';
 
CREATE TABLE raw_sequence (line STRING)
   STORED AS SEQUENCEFILE;
 
LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;
 
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK; -- NONE/RECORD/BLOCK (see below)
INSERT OVERWRITE TABLE raw_sequence SELECT * FROM raw;

 io.seqfile.compression.type 定义如何压缩

 
原文地址:https://www.cnblogs.com/Dhouse/p/5892984.html