Pig trial: GROUP and FOREACH

A = load '/user/cloudera/lab/mydata' using PigStorage() as (a,b,c);

Writing A=load (no spaces around the =) instead produces:  [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "A=load "" at line 1, column 1.

Contents of A:

(1,2,3)

(4,2,1)

(8,3,4)

(4,3,3)

(7,2,5)

(8,4,3)

B = group A by a;

(1,{(1,2,3)})

(4,{(4,3,3),(4,2,1)})

(7,{(7,2,5)})

(8,{(8,4,3),(8,3,4)})
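
The layout of B can be checked directly with describe. A minimal sketch, reusing the path from the load statement above; the :int types are an assumption added here for illustration (the original load declares no types, so the fields would default to bytearray):

```pig
-- Same input as above; the :int types are an assumption for illustration.
A = load '/user/cloudera/lab/mydata' using PigStorage() as (a:int, b:int, c:int);
B = group A by a;

describe B;  -- prints the schema of B: the key field "group" and a bag named "A"
dump B;      -- runs the job and prints the grouped tuples
```

The exact text that describe prints differs a little between Pig versions, so it is not reproduced here.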

C = foreach B { D = distinct A.b; generate flatten(group), COUNT(D); };

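Typing the full-width (Chinese) parenthesis "(" instead of the ASCII "(" fails with the error:  Unexpected character '

The first field of B has a fixed name, group, because it is produced by the GROUP operation.

In the statement above, D = distinct A.b; the A refers to the second field of B, which keeps the name of the relation that B was built from. For each group it holds the following tuples: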

(1,2,3)

(4,3,3), (4,2,1)

(7,2,5)

(8,4,3), (8,3,4)

So D, for each group in turn, is:

2

3,2

2

4,3

>> generate flatten(group), COUNT(D); then outputs the key and the number of distinct b values:

(1,1)

(4,2)

(7,1)

(8,2)
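
The same statement can be spread over several lines, with names given to the output fields. A sketch, assuming the A and B defined above; the output names a and distinct_b are made up for illustration, and group is used without flatten, which should give the same rows here because the key is a single column:

```pig
C = foreach B {
    -- A is the bag inside the current group, so A.b is a bag of its b values
    D = distinct A.b;
    -- group is a single value for a single-column key, so no flatten is needed
    generate group as a, COUNT(D) as distinct_b;
};
dump C;
```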

=========================

GROUP creates a nested set of output tuples, while JOIN creates a flat set of output tuples.

The first field is named "group" (do not confuse this with the GROUP operator) and is the same type as the group key.  

The second field takes the name of the original relation and is type bag.

# So "group" is the name of the key field, and the original alias (A in this example) is the name of the nested bag.
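
When the grouping key has more than one column, group becomes a tuple, and flatten(group) is what turns it back into top-level fields. A sketch, assuming the relations A and B from the first part of these notes; the aliases B2, C2 and E and the field name n are made up for illustration:

```pig
-- Group on two columns: the "group" field is now a tuple (a, b).
B2 = group A by (a, b);

-- flatten(group) unpacks the key tuple; COUNT(A) counts the tuples in the nested bag A.
C2 = foreach B2 generate flatten(group) as (a, b), COUNT(A) as n;

-- Fields inside the bag are reached through the original alias, e.g. A.c.
E = foreach B generate group, A.c;
```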

Original article: https://www.cnblogs.com/bob-dong/p/14248211.html