Pig SAMPLE usage examples

some = sample data 0.1;

 
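SAMPLE scans the entire dataset and returns approximately the specified fraction of its rows. Which rows are returned is not deterministic, and the number of rows returned is not exact either.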
 
Internally, the statement is rewritten as filter data by RANDOM() <= 0.1
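
The effect of that rewrite can be imitated outside Pig. The following is a minimal Python sketch, not Pig's actual implementation: the function name bernoulli_sample, the 10,000-row stand-in dataset, and the seed are all assumptions made for illustration. It keeps each row independently when a uniform random draw is at most the fraction, which is why two runs over the same data return different rows and slightly different counts.

```python
import random


def bernoulli_sample(rows, fraction, rng):
    """Keep each row independently when a uniform draw is <= fraction.

    Mirrors the shape of `filter data by RANDOM() <= fraction`;
    this is an illustration, not Pig's own code.
    """
    return [row for row in rows if rng.random() <= fraction]


rows = list(range(10_000))      # hypothetical stand-in for the relation `data`
rng = random.Random(42)         # seeded only so the demo is repeatable

first = bernoulli_sample(rows, 0.1, rng)
second = bernoulli_sample(rows, 0.1, rng)

# Both counts land near 1000, but they are generally not equal to 1000
# or to each other, because every row is decided by its own random draw.
print(len(first), len(second))
```

Because every row needs its own draw, the whole input is still read even when only a small fraction is kept.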
 
 
Sampling about 100 rows
data = load 'data';
grpd = group data all;
sums = foreach grpd generate COUNT(data) as c;
some = sample data 100/(double)sums.c;
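
The arithmetic behind the fraction 100/(double)sums.c can be checked with a short Python sketch. The helper names sample_fraction and expected_spread and the total of 5000 rows are hypothetical, and the clamp to 1.0 is an added safeguard rather than something the Pig script does. Under a binomial model of independent per-row draws, the expected row count equals the target, while the actual count varies around it.

```python
import math


def sample_fraction(target_rows, total_rows):
    """Fraction to sample so that the expected row count is target_rows."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    # Clamp to 1.0 (an assumption of this sketch) for datasets
    # smaller than the target.
    return min(1.0, target_rows / total_rows)


def expected_spread(total_rows, fraction):
    """Standard deviation of the sampled row count under a binomial model."""
    return math.sqrt(total_rows * fraction * (1.0 - fraction))


total = 5000                          # hypothetical value of sums.c
p = sample_fraction(100, total)       # 100 / 5000 = 0.02
print(p, total * p, expected_spread(total, p))
```

With these assumed numbers the standard deviation is roughly 10 rows, so a result of 90 or 110 rows would be unsurprising.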

  

 
Original article: https://www.cnblogs.com/lishouguang/p/4559607.html