3-8 pivot操作

 

数据透视表

In [1]:
import pandas as pd
excelample=pd.DataFrame({'Month':["January","January","January","January",
                                "February", "February","February","February",
                                 "March","March","March","March"],
                        'Category':["Transportation","Grocery","Household","Entertainment",
                                   "Transportation","Grocery","Household","Entertainment",
                                   "Transportation","Grocery","Household","Entertainment"],
                          'Amount':[74.,235.,175.,100.,115.,240.,225.,125.,390.,260.,200.,120.]})
In [2]:
excelample
Out[2]:
 
 MonthCategoryAmount
0 January Transportation 74.0
1 January Grocery 235.0
2 January Household 175.0
3 January Entertainment 100.0
4 February Transportation 115.0
5 February Grocery 240.0
6 February Household 225.0
7 February Entertainment 125.0
8 March Transportation 390.0
9 March Grocery 260.0
10 March Household 200.0
11 March Entertainment 120.0
 

1.统计指标:每个月的各个种类的花费:pivot

In [3]:
example_pivot=excelample.pivot(index='Category',columns='Month',values='Amount')
example_pivot
Out[3]:
 
MonthFebruaryJanuaryMarch
Category   
Entertainment 125.0 100.0 120.0
Grocery 240.0 235.0 260.0
Household 225.0 175.0 200.0
Transportation 115.0 74.0 390.0
In [4]:
example_pivot.sum(axis=1)#计算每个种类的总和
Out[4]:
Category
Entertainment     345.0
Grocery           735.0
Household         600.0
Transportation    579.0
dtype: float64
In [5]:
example_pivot.sum(axis=0)#每个月的总和
Out[5]:
Month
February    705.0
January     584.0
March       970.0
dtype: float64
In [6]:
df=pd.read_csv('./Titanic_Data-master/Titanic_Data-master/train.csv')
df.head()#读取前几行数据
Out[6]:
 
 PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
 

2.通过性别索引,船舱的等级分类,统计不同性别在不同船舱的费用:pivot_table(默认求平均值)

In [8]:
df.pivot_table(index='Sex',columns='Pclass',values='Fare')#默认求平均值
Out[8]:
 
Pclass123
Sex   
female 106.125798 21.970121 16.118810
male 67.226127 19.741782 12.661633
In [9]:
df.pivot_table(index='Sex',columns='Pclass',values='Fare',aggfunc='max')#求最大
Out[9]:
 
Pclass123
Sex   
female 512.3292 65.0 69.55
male 512.3292 73.5 69.55
In [12]:
df.pivot_table(index='Sex',columns='Pclass',values='Fare',aggfunc='count')#求计数
Out[12]:
 
Pclass123
Sex   
female 94 76 144
male 122 108 347
In [13]:
pd.crosstab(index=df['Sex'],columns=df['Pclass'])#pd.crosstab和df.pivot_table的count是一样的效果
Out[13]:
 
Pclass123
Sex   
female 94 76 144
male 122 108 347
 

3.求不同等级的舱位,不同性别的获救概率

In [14]:
df.pivot_table(index='Pclass',columns='Sex',values='Survived',aggfunc='mean')#求平均值的概率
Out[14]:
 
Sexfemalemale
Pclass  
1 0.968085 0.368852
2 0.921053 0.157407
3 0.500000 0.135447
 

4.新加一列,计算未成年的,不同性别的获救情况概率

In [15]:
df['Underaged']=df['Age']<=18#新加一列
df.pivot_table(index='Underaged',columns='Sex',values='Survived',aggfunc='mean')#求平均值的概率
Out[15]:
 
Sexfemalemale
Underaged  
False 0.760163 0.167984
True 0.676471 0.338028
原文地址:https://www.cnblogs.com/AI-robort/p/11636781.html