3-3 groupby operations

The data used in the Pandas chapters can be downloaded from: https://files.cnblogs.com/files/AI-robort/Titanic_Data-master.zip

In [1]:
import pandas as pd
df=pd.DataFrame({'key':['A','B','C','A','B','C','A','B','C'],
                 'data':[0,5,10,5,10,15,10,15,20]})
df
Out[1]:
 
  key  data
0   A     0
1   B     5
2   C    10
3   A     5
4   B    10
5   C    15
6   A    10
7   B    15
8   C    20
In [3]:
for key in ['A','B','C']:
    print(key, df[df['key']==key].sum())  # sum the rows for each key value; note the string column 'key' gets concatenated too
 
A key     AAA
data     15
dtype: object
B key     BBB
data     30
dtype: object
C key     CCC
data     45
dtype: object
In [4]:
df.groupby('key').sum()  # same per-key sums as the manual loop above
Out[4]:
 
     data
key
A      15
B      30
C      45
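To make the equivalence between the manual loop and `groupby` concrete, here is a minimal sketch that computes the per-key sums both ways and compares them. It reuses the `df` defined in In [1]; the variable names `manual` and `grouped` are illustrative only, and the manual version is restricted to the `data` column so the string column is not concatenated.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
                   'data': [0, 5, 10, 5, 10, 15, 10, 15, 20]})

# Manual version: filter the rows for each key, then sum only the numeric column.
manual = {key: int(df.loc[df['key'] == key, 'data'].sum())
          for key in ['A', 'B', 'C']}

# groupby version: split by 'key', sum 'data' within each group, combine into one Series.
grouped_series = df.groupby('key')['data'].sum()
grouped = {key: int(value) for key, value in grouped_series.items()}

print(manual)             # {'A': 15, 'B': 30, 'C': 45}
print(grouped == manual)  # True
```

The `groupby` call does the split, the per-group sum, and the recombination in one step, so the list of keys does not have to be written out by hand.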
In [7]:
import numpy as np
df.groupby('key').aggregate(np.mean)  # aggregate applies a given operation to each group, e.g. NumPy's sum or mean
Out[7]:
 
     data
key
A       5
B      10
C      15
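`aggregate` (also available under the shorter alias `agg`) accepts function names as strings and a list of several functions at once. The sketch below is one possible way to get the sum, mean, and maximum per key in a single call; it reuses the `df` from In [1], and the variable name `stats` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
                   'data': [0, 5, 10, 5, 10, 15, 10, 15, 20]})

# A list of function names yields one output column per function.
stats = df.groupby('key')['data'].agg(['sum', 'mean', 'max'])
print(stats)
```

Each row of `stats` is one key and each column is one of the requested statistics, so the values from Out[4] and Out[7] appear side by side.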
In [8]:
df1=pd.read_csv('./Titanic_Data-master/Titanic_Data-master/train.csv')
In [13]:
df1.groupby('Sex')['Age'].mean()  # mean age for each sex
Out[13]:
Sex
female    27.915709
male      30.726645
Name: Age, dtype: float64
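One detail worth keeping in mind when reading a grouped mean: by default `mean()` skips missing values, so a group's mean is taken over its non-missing entries only. The sketch below illustrates this on a small hypothetical frame (`toy` is made up for illustration and is not the Titanic file).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing age in the 'female' group.
toy = pd.DataFrame({'Sex': ['female', 'female', 'male', 'male'],
                    'Age': [20.0, np.nan, 30.0, 40.0]})

mean_age = toy.groupby('Sex')['Age'].mean()   # NaN is skipped: female 20.0, male 35.0
n_valid = toy.groupby('Sex')['Age'].count()   # non-missing ages per group
n_rows = toy.groupby('Sex').size()            # all rows per group, missing or not

print(mean_age)
print(n_valid)
print(n_rows)
```

Comparing `count()` with `size()` is a quick way to see how many values each group's mean was actually computed from.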
In [14]:
df1.groupby('Sex')['Survived'].mean()  # mean of Survived for each sex, i.e. the survival rate
Out[14]:
Sex
female    0.742038
male      0.188908
Name: Survived, dtype: float64
Original post: https://www.cnblogs.com/AI-robort/p/11636749.html