Pandas透视表和交叉表

透视表

参数名 说明
values 待聚合的列的名称。默认聚合所有数值列
index 用于分组的列名或其他分组键,出现在结果透视表的行
columns 用于分组的列表或其他分组键,出现在结果透视表的列
aggfunc 聚合函数或函数列表,默认为'mean',可以是任何对groupby有效的函数
fill_value 用于替换结果表中的缺失值
margins 添加行/列小计和总计,默认为False
# pivot_table默认聚合分组平均数
tips = pd.read_csv('C:/Users/1/Desktop/tips.csv')
tips.pivot_table(index=['sex','smoker'])

                    size	  tip	    total_bill
sex	    smoker			
Female	    No	2.592593	2.773519	18.105185
           Yes	2.242424	2.931515	17.977879
Male	    No	2.711340	3.113402	19.791237
           Yes	2.500000	3.051167	22.284500

# 聚合小费的比例和人数的多少,根据天数进行分组
tips['tip_pct'] = tips.tip / tips.total_bill
tips.pivot_table(['tip_pct','size'],index=['sex','day'],columns='smoker')

                size	                tip_pct
smoker	          No	   Yes	        No	      Yes
sex	    day				
Female	Fri	  2.500000	2.000000	0.165296	0.209129
        Sat	  2.307692	2.200000	0.147993	0.163817
        Sun	  3.071429	2.500000	0.165710	0.237075
        Thur  2.480000	2.428571	0.155971	0.163073
Male	Fri	  2.000000	2.125000	0.138005	0.144730
        Sat	  2.656250	2.629630	0.162132	0.139067
        Sun	  2.883721	2.600000	0.158291	0.173964
        Thur  2.500000	2.300000	0.165706	0.164417

# margins=True,会开启一列ALL的行和列,其值对应于单个等级中所有数据的分组统计。
tips.pivot_table(['tip_pct','size'],index=['sex','day'],columns='smoker',margins=True)

                size	                                  tip_pct
    smoker	  No	       Yes	      All	       No	      Yes	       All
sex	    day						
Female	Fri	2.500000	2.000000	2.111111	0.165296	0.209129	0.199388
        Sat	2.307692	2.200000	2.250000	0.147993	0.163817	0.156470
        Sun	3.071429	2.500000	2.944444	0.165710	0.237075	0.181569
        Thur2.480000	2.428571	2.468750	0.155971	0.163073	0.157525
Male	Fri	2.000000	2.125000	2.100000	0.138005	0.144730	0.143385
        Sat	2.656250	2.629630	2.644068	0.162132	0.139067	0.151577
        Sun	2.883721	2.600000	2.810345	0.158291	0.173964	0.162344
        Thur2.500000	2.300000	2.433333	0.165706	0.164417	0.165276
All		    2.668874	2.408602	2.569672	0.159328	0.163196	0.160803



# aggfunc可以直接接受聚合函数,len可以统计分组大小的交叉表
# 根据性别,是否吸烟对每天的人数进行统计
tips.pivot_table('tip_pct',index=['sex','smoker'],columns='day',aggfunc=len,margins=True)

day	        Fri	     Sat	Sun	    Thur	All
sex	  smoker					
Female	No	 2.0	13.0	14.0	25.0	54.0
        Yes	 7.0	15.0	4.0	    7.0	    33.0
Male	No	 2.0	32.0	43.0	20.0	97.0
        Yes	 8.0	27.0	15.0	10.0	60.0
All		     19.0	87.0	76.0	62.0	244.0

# 如果存在空的组合,可以设置空值fill_value
tips.pivot_table('size',index=['time','sex','smoker'],columns='day',aggfunc='sum',fill_value=0)

         day	     Fri	Sat	 Sun	Thur
time	sex	   smoker				
Dinner	Female	No	  2	     30	 43	     2
               Yes	  8	     33	 10	     0
Male	        No	  4	     85	 124	 0
               Yes	  12	 71	 39	     0
Lunch	Female	No	  3	      0	  0	    60
                Yes	  6	      0	  0	    17
        Male	No	  0	      0	  0	    50
                Yes	  5	      0	  0	    23

交叉表: crosstab

是一种用于计算分组频率的特殊透视表
data = [[1,'Female','Right-handed'],
        [2,'Male','Left-handed'],
        [3,'Female','Right-handed'],
        [4,'Male','Right-handed'],
        [5,'Male','Left-handed'],
        [6,'Male','Right-handed'],
        [7,'Female','Right-handed'],
        [8,'Female','Left-handed'],
        [9,'Male','Right-handed'],
        [10,'Female','Right-handed']]
data = pd.DataFrame(data,columns=['Sample','Gender','Handedness'])

# 前两个参数可以是数组、Series或数组列表
# 根据性别和用手习惯对这段数据进行统计汇总
pd.crosstab(data.Gender,data.Handedness,margins=True)

Handedness	Left-handed	Right-handed	All
 Gender			
 Female	          1	          4	         5
   Male	          2	          3	         5
   All	          3	          7	        10

# 这是传统的透视表,比较复杂点比交叉表
pd.pivot_table(data,index=['Gender'],columns=['Handedness'],aggfunc=len,margins=True)

             Sample
Handedness	Left-handed	Right-handed	All
Gender			
Female	         1	         4	         5
Male	         2	         3 	         5
All	             3	         7	         10

# 根据时间,星期几进行了对每天的是否吸烟人数分组统计
pd.crosstab([tips.time,tips.day],tips.smoker,margins=True)

      smoker	No	  Yes	All
time	day			
Dinner	Fri	    3	   9	12
        Sat	    45	  42	87
        Sun	    57	  19	76
        Thur	1	   0	1
Lunch	Fri	    1	   6	7
        Thur	44	  17	61
        All		151	  93	244
原文地址:https://www.cnblogs.com/lishi-jie/p/10109377.html