Pig notes

1. Renaming a Pig job:

Put this statement at the very beginning of the Pig script:

set job.name 'This is my job';

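2. Passing parameters to Pig:

When running Pig, pass the parameter on the command line with -p and reference it in the script as $JOBNAME: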

pig -p JOBNAME="MyJob" test.pig
************test.pig**********
set job.name '$JOBNAME';
......
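
If the script should still run when -p is omitted, Pig's parameter substitution also offers a %default declaration. The following is a minimal sketch; the fallback value 'DefaultJob' is an illustrative assumption and does not come from the original script.

```pig
-- test.pig
-- Fallback value, used only when JOBNAME is not supplied with -p on the command line.
-- 'DefaultJob' is a hypothetical name chosen for this example.
%default JOBNAME 'DefaultJob';
set job.name '$JOBNAME';
```

A value given on the command line, as in pig -p JOBNAME="MyJob" test.pig, should take precedence over the %default value.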

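3. Defining the field delimiter:

Pig's default field delimiter is the tab character (\t). To use a different one, pass it to PigStorage in the using clause, for example using PigStorage(','):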

prices = load 'NYSE_daily' using PigStorage(',') as (exchange, symbol, date, open, high, low, close, volume, adj_close);
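
PigStorage can be used the same way when writing data, so a relation can be read with one delimiter and stored with another. The sketch below assumes a hypothetical output path, 'NYSE_daily_pipe', which is not part of the original example.

```pig
-- Read comma-separated input.
prices = load 'NYSE_daily' using PigStorage(',') as (exchange, symbol, date, open, high, low, close, volume, adj_close);
-- Write the same relation back out with '|' between fields.
-- 'NYSE_daily_pipe' is a hypothetical output path.
store prices into 'NYSE_daily_pipe' using PigStorage('|');
```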

4. Setting the number of reducers:

Parallel

The parallel clause sets the number of reduce tasks for the statement it is attached to:

--parallel.pig
daily   = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close,
            volume, adj_close);
bysymbl = group daily by symbol parallel 10;

parallel applies only to the statement it is attached to. To give every statement in the script that runs a reduce phase 10 reducers, use set default_parallel 10:

--defaultparallel.pig
set default_parallel 10;
daily   = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close,
            volume, adj_close);
bysymbl = group daily by symbol;
average = foreach bysymbl generate group, AVG(daily.close) as avg;
sorted  = order average by avg desc;
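
The two settings can also be combined: a parallel clause on a single statement takes precedence over default_parallel for that statement. The sketch below reuses the script above; the file name and the value 2 are illustrative assumptions.

```pig
--overrideparallel.pig (hypothetical file name)
set default_parallel 10;
daily   = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close,
            volume, adj_close);
-- No parallel clause here, so the script-wide default of 10 reducers applies.
bysymbl = group daily by symbol;
average = foreach bysymbl generate group, AVG(daily.close) as avg;
-- The parallel clause overrides default_parallel for this statement only.
sorted  = order average by avg desc parallel 2;
```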

For more, see:

http://www.cnblogs.com/siwei1988/archive/2012/08/06/2624912.html

Original post: https://www.cnblogs.com/dorothychai/p/4606406.html