Public device ID set

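Background: our advertiser provided a device list of IDFAs so that its ad would be shown to a target audience. However, none of the ads were shown, so we want to know how many "public" device IDs there are, that is, IDs from the advertiser's list that also appear in our traffic requests.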

To find the public device IDs, we first need all device IDs (IDFA / Google Advertising ID) seen in one day of traffic.

Method 1: run a MapReduce job on Azkaban. However, it failed.

Method 2: use Hive tables. Insert the advertiser's device ID list into one table and join it against the device IDs from the request log.

Method 3: select all distinct device IDs from the request log and write them to a file. This gives about 0.2 billion (200 million) device IDs in a file of about 6 GB.
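A minimal single-machine sketch of the "distinct IDs to a file" step, assuming (hypothetically) that the day's request log is a tab-separated text file with the device ID in a known column. The function name `extract_ids` and the column layout are illustrative only; the original post queried the request log directly and does not show how.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): print the distinct,
# non-empty device IDs found in column COL of a tab-separated log file,
# one ID per line.
# Usage: extract_ids LOGFILE COL > device_ids.txt
extract_ids() {
  local log="$1" col="$2"
  # cut splits on tabs by default; grep -v '^$' drops empty ID fields;
  # LC_ALL=C makes sort compare raw bytes, which is fast and deterministic.
  cut -f "$col" "$log" | grep -v '^$' | LC_ALL=C sort -u
}
```

`sort -u` spills to temporary files when its input does not fit in memory, so this also works at the 6 GB scale; GNU sort's `-S` (buffer size) and `-T` (temporary directory) options can be tuned for that.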

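Then use a shell command like this:

grep -F -x -f a.txt b.txt > public_ids.txt

Here -f reads the patterns from a.txt (one ID per line), -F treats each pattern as a fixed string rather than a regex, and -x requires a whole-line match (without it, an empty line in a.txt would match every line of b.txt). The result is the lines of b.txt that also appear in a.txt.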
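When both lists are large, another option (not the one used in this post) is a sort-merge intersection: sort both files bytewise, then let `comm -12` print only the lines common to both. A minimal sketch, where the function name `common_ids` is hypothetical and each input file is assumed to hold one ID per line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs present in both files, one per line,
# using a sort-merge intersection instead of grep -F -f.
common_ids() {
  local a="$1" b="$2" tmp
  tmp=$(mktemp -d) || return 1
  # comm needs both inputs sorted under the same collation, so sort each
  # file with LC_ALL=C (deduplicating) and run comm with LC_ALL=C too.
  LC_ALL=C sort -u "$a" > "$tmp/a"
  LC_ALL=C sort -u "$b" > "$tmp/b"
  # -12 suppresses lines unique to either file, leaving the common ones.
  LC_ALL=C comm -12 "$tmp/a" "$tmp/b"
  rm -rf "$tmp"
}
```

The trade-off: `grep -f` keeps every pattern from the first file in memory, while `sort` works externally and `comm` streams both sorted files once.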

public_ids.txt then holds the public device IDs, and wc -l public_ids.txt gives how many there are.
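Since the original question was how many public IDs there are, here is a minimal sketch that runs the same kind of grep (with -x, so only whole-line matches count) and prints just the count. `count_public` is a hypothetical name, and both files are assumed to hold one ID per line with no trailing whitespace or carriage returns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines of ID_FILE that exactly match some
# line of PATTERN_FILE.
# Usage: count_public PATTERN_FILE ID_FILE
count_public() {
  local patterns="$1" ids="$2"
  # -c: print the number of matching lines (0 if none)
  # -x: whole-line match, -F: fixed strings, -f: patterns read from file
  grep -c -x -F -f "$patterns" "$ids"
}
```

With -x, an empty line in the pattern file matches only empty lines. Without -x it would match every line and inflate the count.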

Reference: http://blog.csdn.net/autofei/article/details/6579320

Original article: https://www.cnblogs.com/lavin/p/6912244.html