Python Notes #18: Pandas: Grouping

Introduction

By “group by” we are referring to a process involving one or more of the following steps:

Splitting the data into groups based on some criteria
Applying a function to each group independently
Combining the results into a data structure
See the Grouping section of the pandas user guide.
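
The three steps above can be spelled out by hand, which may make it clearer what a single groupby call does. This is only a minimal sketch on a small made-up frame (the data and the names pieces, sums, manual and builtin are illustrative assumptions, not from the pandas docs):

```python
import pandas as pd

# Small deterministic frame (hypothetical data, chosen so the sums are easy to check).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, 2.0, 3.0, 4.0],
})

# 1. Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs.
pieces = {key: group for key, group in df.groupby('A')}

# 2. Apply: run a function on each group independently.
sums = {key: group['C'].sum() for key, group in pieces.items()}

# 3. Combine: gather the per-group results into one structure.
manual = pd.Series(sums, name='C').sort_index()
manual.index.name = 'A'

# The usual one-liner performs all three steps at once.
builtin = df.groupby('A')['C'].sum()

print(manual)
print(builtin)
```

Both printed Series should show the same per-group totals. In practice the one-liner is the form to use; the manual version is only there to expose the steps.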

Code

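import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8), 'D': np.random.randn(8)})
print(df)
# Sum C and D for each value of A. numeric_only=True skips the string column B;
# without it, pandas 2.0 and later concatenate the strings in B instead of dropping it.
print(df.groupby('A').sum(numeric_only=True))

# Same idea with two keys: each A value maps to several B values, so the result is indexed by (A, B).
print(df.groupby(['A', 'B']).sum())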

# Sample output (values differ on every run because C and D are random):
#      A      B         C         D
# 0  foo    one  0.102071 -0.301926
# 1  bar    one  1.161158  0.847451
# 2  foo    two -0.023879  0.936338
# 3  bar  three -0.353075 -0.834349
# 4  foo    two -0.272542 -1.425635
# 5  bar    two -1.016016 -0.031614
# 6  foo    one -0.428517  0.892747
# 7  foo  three -0.843796  0.614443
# /
#             C         D
# A                      
# bar -0.207932 -0.018512
# foo -1.466663  0.715967
# /
#                   C         D
# A   B                        
# bar one    1.161158  0.847451
#     three -0.353075 -0.834349
#     two   -1.016016 -0.031614
# foo one   -0.326445  0.590821
#     three -0.843796  0.614443
#     two   -0.296421 -0.489296
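
The two-key result above is indexed by (A, B) pairs, so reading values back out works a little differently from a flat table. The following is a rough sketch of a few common ways to do that, on a small made-up frame with fixed integers (the data and variable names are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical deterministic data so the results can be checked by hand.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo'],
    'B': ['one', 'one', 'two', 'two', 'one'],
    'C': [1, 2, 3, 4, 5],
})

# Grouping by two keys gives a result whose index is a MultiIndex of (A, B).
grouped = df.groupby(['A', 'B']).sum()

# Select every row under one outer key: a DataFrame indexed by B.
foo_rows = grouped.loc['foo']

# Select a single (A, B) cell: rows 0 and 4 are ('foo', 'one'), so 1 + 5.
foo_one = grouped.loc[('foo', 'one'), 'C']

# Turn the index levels back into ordinary columns.
flat = grouped.reset_index()

# as_index=False produces the flat layout directly.
flat_direct = df.groupby(['A', 'B'], as_index=False).sum()

print(grouped)
print(foo_rows)
print(foo_one)
print(flat)
```

Whether to keep the MultiIndex or flatten it is mostly a matter of what comes next: the hierarchical form is convenient for lookups by key, while the flat form is easier to merge or export.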