Invoice Data Analysis 2

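1. Import the data into Hive and do the processing there. The first cleaning step is to remove the parentheses from the data.

Import the tables and strip the parentheses.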
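The original does not show the statements used for this step. A minimal sketch, assuming the raw export is a comma-separated text file in which stray parentheses are the only debris, might look like the following. The staging table zzsfp_raw, the output table zzsfp_clean, and the file path are hypothetical names and are not taken from the original.

```sql
-- Hypothetical staging table: each input line lands in a single string column.
create table if not exists zzsfp_raw (line string) stored as textfile;

-- Hypothetical path; point this at the real export file.
load data local inpath '/tmp/zzsfp.csv' overwrite into table zzsfp_raw;

-- Remove every '(' and ')' from each line; inside a character class
-- the parentheses are literal, so no escaping is needed.
create table zzsfp_clean stored as textfile as
select regexp_replace(line, '[()]', '') as line
from zzsfp_raw;
```

The cleaned lines can then be exposed as a comma-delimited table with the real columns, such as xf_id and gf_id, which the later queries rely on.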

 

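2. Create test1 and test2 to find the enterprises that only buy and never sell, and those that only sell and never buy. test1 collects the taxpayers that never appear as a seller; test2 collects the taxpayers that never appear as a buyer.

Create table test1: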

create table test1(nsr_id string) row format delimited fields terminated by ',' stored as textfile;

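A taxpayer goes into test1 if it is in the taxpayer table (nsrxx) but its ID never appears among the seller IDs (xf_id) in the invoice table (zzsfp):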

insert into test1(nsr_id) select distinct nsr_id from nsrxx where nsr_id not in (select xf_id from zzsfp);

This identifies the taxpayers that never sell.
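One caveat with not in: under standard SQL semantics, if the subquery returns even a single NULL, the predicate is never true and no rows come back. The original does not say whether xf_id can be NULL, so the following is only a sketch of an equivalent anti-join that avoids the problem under that assumption. It is written as a plain select so it can be compared against the contents of test1 before anything is inserted.

```sql
-- Taxpayers with no matching seller ID: keep the left rows
-- for which the join found nothing on the right.
select distinct n.nsr_id
from nsrxx n
left join (
    -- Drop NULL seller IDs up front so they cannot affect the match.
    select distinct xf_id from zzsfp where xf_id is not null
) s on n.nsr_id = s.xf_id
where s.xf_id is null;
```

The same pattern applies to the buyer side by swapping xf_id for gf_id.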

Create table test2:

create table test2(nsr_id string) row format delimited fields terminated by ',' stored as textfile;

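A taxpayer goes into test2 if it is in the taxpayer table (nsrxx) but its ID never appears among the buyer IDs (gf_id) in the invoice table (zzsfp):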

insert into test2(nsr_id) select distinct nsr_id from nsrxx where nsr_id not in (select gf_id from zzsfp);

This identifies the taxpayers that never buy.
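A taxpayer with no invoices at all lands in both test1 and test2, which is why neither table alone answers the question. As a quick check, a sketch like the following counts that overlap; the alias inactive_cnt is a hypothetical name chosen for illustration.

```sql
-- Taxpayers that never sell AND never buy appear in both tables.
select count(*) as inactive_cnt
from test1 t1
join test2 t2 on t1.nsr_id = t2.nsr_id;
```

If this count is non-zero, those IDs have to be excluded before either table is read as buy-only or sell-only.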

Combine the two tables to identify the enterprises that only buy and those that only sell:

insert into data(nsr_id) select distinct nsr_id from yc3 where nsr_id not in (select nsr_id from yc2);

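The result is stored in data, which relates test1 and test2 to each other (the statement above refers to its source tables as yc3 and yc2).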
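To make the combination concrete, here is a sketch that labels both groups in one result set, written directly against test1 and test2. The column name kind and its two label values are hypothetical. The query only selects, so it does not touch the data table.

```sql
-- Buy-only: never a seller (in test1) but does appear as a buyer (not in test2).
select t1.nsr_id, 'buy_only' as kind
from test1 t1
left join test2 t2 on t1.nsr_id = t2.nsr_id
where t2.nsr_id is null
union all
-- Sell-only: never a buyer (in test2) but does appear as a seller (not in test1).
select t2.nsr_id, 'sell_only' as kind
from test2 t2
left join test1 t1 on t2.nsr_id = t1.nsr_id
where t1.nsr_id is null;
```

Taxpayers present in both tables match on the join and are filtered out of both branches, so only enterprises with one-sided activity remain.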

Original article: https://www.cnblogs.com/mjhjl/p/14901394.html