naive cube implementation in python

An implementation of the naive cube algorithm described in this paper. Written in Python, it really reads almost exactly like the pseudocode =.=

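The input looks roughly like this. Fields are whitespace-separated, and the columns are, in order (positions count from 0, so index is column 0, userid is column 1, and so on):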

index  userid  country  state  city  topic  category  product  sales

1      400141  3        78     3427  3      59        4967     4670.08
2      783984  1        34     9     1      5         982      5340.9
3      4945    1        47     1658  1      7         363      3065.37
4      468352  2        57     2410  2      37        3688     9561.13
5      553471  1        25     550   1      13        1476     3596.72
6      649149  1        9      234   1      12        1456     2126.29
...

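The output format is as follows: for every combination of values of the selected attributes (identified by column position rather than by name), output the measure computed over the corresponding group. Here the measure is the number of distinct userids in the group.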

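<attr> <attr> ...|<value> <value> ...<TAB><measure>

For example, the key "2 3|3 78" is the group country = 3, state = 78, and the key "|" (no attributes at all) is the grand total over all records.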
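As a reference for what the job should produce, here is a minimal single-process sketch of the same cube, assuming the input is small enough to fit in memory: for each region (attribute set) and each value combination it collects the set of userids and reports its size. The names naive_cube and REGIONS are hypothetical; seq follows the mapper below, wrapped in list() so it also runs on Python 3. It can be handy for cross-checking the MapReduce output on a small sample.

```python
from collections import defaultdict
from itertools import product


def seq(start, end):
    # All prefixes of the column range start..end, including the empty one:
    # seq(2, 4) -> [[], [2], [2, 3], [2, 3, 4]]
    return [list(range(start, i)) for i in range(start, end + 2)]


# Hypothetical region list: location prefixes (columns 2-4) x topic
# prefixes (columns 5-7), 16 attribute sets in total.
REGIONS = [a + b for a, b in product(seq(2, 4), seq(5, 7))]


def naive_cube(rows):
    """rows: lists of string fields; returns {key: distinct userid count}."""
    groups = defaultdict(set)
    for e in rows:
        for R in REGIONS:
            key = "%s|%s" % (" ".join(map(str, R)), " ".join(e[i] for i in R))
            groups[key].add(e[1])  # column 1 is userid
    return {key: len(uids) for key, uids in groups.items()}


sample = """\
1 400141 3 78 3427 3 59 4967 4670.08
2 783984 1 34 9 1 5 982 5340.9
3 4945 1 47 1658 1 7 363 3065.37
"""
cube = naive_cube(line.split() for line in sample.splitlines())
for key in sorted(cube):
    print("%s\t%d" % (key, cube[key]))
```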

Mapper:

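#!/usr/bin/env python3
# Streaming-style mapper: reads whitespace-separated records on stdin and
# writes one "<key>\t<userid>" line per region to stdout.
import sys
from itertools import product


def seq(start, end):
    # All prefixes of the column range start..end, including the empty one,
    # e.g. seq(2, 4) -> [[], [2], [2, 3], [2, 3, 4]] (country -> state -> city).
    return [list(range(start, i)) for i in range(start, end + 2)]


def read_input(file):
    for line in file:
        yield line.split()


def main():
    data = read_input(sys.stdin)
    # Regions = location prefixes (columns 2-4) x topic prefixes (columns 5-7),
    # 4 x 4 = 16 attribute sets in total.
    C = [a + b for a, b in product(seq(2, 4), seq(5, 7))]
    for e in data:
        if not e:  # skip blank lines
            continue
        for R in C:
            k = [e[i] for i in R]
            # key = "<attr positions>|<values>", value = userid (column 1)
            print("%s|%s\t%s" % (' '.join(map(str, R)), ' '.join(k), e[1]))


if __name__ == "__main__":
    main()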
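Since C holds 4 x 4 = 16 attribute sets, every input record lands in exactly one group per region, so the mapper emits 16 lines for it. The sketch below wraps the same key-building logic in a hypothetical map_record helper, so the lines for the first sample row can be inspected without running a job (it assumes Python 3, hence list(range(...))).

```python
from itertools import product


def seq(start, end):
    # All prefixes of start..end, including the empty one.
    return [list(range(start, i)) for i in range(start, end + 2)]


C = [a + b for a, b in product(seq(2, 4), seq(5, 7))]


def map_record(e):
    """Hypothetical helper: the key/value lines the mapper emits for one record."""
    return ["%s|%s\t%s" % (" ".join(map(str, R)), " ".join(e[i] for i in R), e[1])
            for R in C]


lines = map_record("1 400141 3 78 3427 3 59 4967 4670.08".split())
for line in lines:
    print(line)
```

product() varies its last argument fastest, so the first lines cover the empty location prefix combined with each topic prefix, and the last line uses all six attributes.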

Reducer:

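#!/usr/bin/env python3
# Streaming-style reducer: counts distinct userids per key. groupby only
# merges adjacent lines, so the input must already be sorted by key
# (the MapReduce framework sorts mapper output before the reduce phase).
from itertools import groupby
from operator import itemgetter
import sys


def read_input(file):
    for line in file:
        yield line.rstrip().split('\t')


def main():
    data = read_input(sys.stdin)
    for key, group in groupby(data, itemgetter(0)):
        ids = set(uid for _, uid in group)  # distinct userids in this group
        print("%s\t%d" % (key, len(ids)))


if __name__ == "__main__":
    main()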
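The reducer relies on its input being sorted by key, because groupby only merges adjacent lines. A small sketch with made-up mapper lines and a hypothetical reduce_lines helper (same logic as the reducer's main()) shows what goes wrong without sorting, and also that a repeated userid is counted only once:

```python
from itertools import groupby
from operator import itemgetter


def reduce_lines(lines):
    """Hypothetical helper: distinct userids per run of equal keys."""
    pairs = (line.rstrip().split("\t") for line in lines)
    return [(key, len(set(uid for _, uid in group)))
            for key, group in groupby(pairs, itemgetter(0))]


# Made-up mapper output; userid 783984 appears twice under country 1.
mapped = ["2|1\t783984", "2|3\t400141", "2|1\t4945", "2|1\t783984"]

print(reduce_lines(mapped))          # unsorted input
print(reduce_lines(sorted(mapped)))  # sorted, as after the shuffle
```

On the unsorted input the key 2|1 is reported twice with partial counts; after sorting it appears once, with 2 distinct users.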

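The nice thing about picking Python for a course project is that you get to play with all kinds of code-shortening tricks. So much fun...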