3-8 pivot操作

数据透视表¶

In [1]:

import pandas as pd
excelample=pd.DataFrame({'Month':["January","January","January","January",
                                "February", "February","February","February",
                                 "March","March","March","March"],
                        'Category':["Transportation","Grocery","Household","Entertainment",
                                   "Transportation","Grocery","Household","Entertainment",
                                   "Transportation","Grocery","Household","Entertainment"],
                          'Amount':[74.,235.,175.,100.,115.,240.,225.,125.,390.,260.,200.,120.]})

In [2]:

excelample

Out[2]:

	Month	Category	Amount
0	January	Transportation	74.0
1	January	Grocery	235.0
2	January	Household	175.0
3	January	Entertainment	100.0
4	February	Transportation	115.0
5	February	Grocery	240.0
6	February	Household	225.0
7	February	Entertainment	125.0
8	March	Transportation	390.0
9	March	Grocery	260.0
10	March	Household	200.0
11	March	Entertainment	120.0

1.统计指标:每个月的各个种类的花费：pivot

In [3]:

example_pivot=excelample.pivot(index='Category',columns='Month',values='Amount')
example_pivot

Out[3]:

Month	February	January	March
Category
Entertainment	125.0	100.0	120.0
Grocery	240.0	235.0	260.0
Household	225.0	175.0	200.0
Transportation	115.0	74.0	390.0

In [4]:

example_pivot.sum(axis=1)#计算每个种类的总和

Out[4]:

Category
Entertainment     345.0
Grocery           735.0
Household         600.0
Transportation    579.0
dtype: float64

In [5]:

example_pivot.sum(axis=0)#每个月的总和

Out[5]:

Month
February    705.0
January     584.0
March       970.0
dtype: float64

In [6]:

df=pd.read_csv('./Titanic_Data-master/Titanic_Data-master/train.csv')
df.head()#读取前几行数据

Out[6]:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S

2.通过性别索引，船舱的等级分类，统计不同性别在不同船舱的费用：pivot_table（默认求平均值）

In [8]:

df.pivot_table(index='Sex',columns='Pclass',values='Fare')#默认求平均值

Out[8]:

Pclass	1	2	3
Sex
female	106.125798	21.970121	16.118810
male	67.226127	19.741782	12.661633

In [9]:

df.pivot_table(index='Sex',columns='Pclass',values='Fare',aggfunc='max')#求最大

Out[9]:

Pclass	1	2	3
Sex
female	512.3292	65.0	69.55
male	512.3292	73.5	69.55

In [12]:

df.pivot_table(index='Sex',columns='Pclass',values='Fare',aggfunc='count')#求计数

Out[12]:

Pclass	1	2	3
Sex
female	94	76	144
male	122	108	347

In [13]:

pd.crosstab(index=df['Sex'],columns=df['Pclass'])#pd.crosstab和df.pivot_table的count是一样的效果

Out[13]:

Pclass	1	2	3
Sex
female	94	76	144
male	122	108	347

3.求不同等级的舱位，不同性别的获救概率

In [14]:

df.pivot_table(index='Pclass',columns='Sex',values='Survived',aggfunc='mean')#求平均值的概率

Out[14]:

Sex	female	male
Pclass
1	0.968085	0.368852
2	0.921053	0.157407
3	0.500000	0.135447

4.新加一列，计算未成年的，不同性别的获救情况概率

In [15]:

df['Underaged']=df['Age']<=18#新加一列
df.pivot_table(index='Underaged',columns='Sex',values='Survived',aggfunc='mean')#求平均值的概率

Out[15]:

Sex	female	male
Underaged
False	0.760163	0.167984
True	0.676471	0.338028