pandas.DataFrame.stack: notes copied from the documentation

Starting with stack.

Source: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.stack.html#pandas.DataFrame.stack

pandas.DataFrame.stack

DataFrame.stack(level=-1, dropna=True)

Stack the prescribed level(s) from columns to index.

Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:

  • if the columns have a single level, the output is a Series;

  • if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.

Parameters
level : int, str, list, default -1

Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.

dropna : bool, default True

Whether to drop rows in the resulting Frame/Series with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.

Returns
DataFrame or Series

Stacked dataframe or series.

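Put simply, stack takes a level of the column labels and turns it into a level of the row index. If the columns have a single level, the result is a Series; if they have multiple levels, the result is still a DataFrame.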
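As a quick check of the two return types, the sketch below stacks one frame with single-level columns and one with two-level columns. The variable names (flat, nested, and so on) are made up for illustration and do not come from the pandas documentation.

```python
import pandas as pd

# Single-level columns: stack() moves the only column level into the index,
# so nothing is left to form columns and the result is a Series.
flat = pd.DataFrame([[0, 1], [2, 3]],
                    index=['cat', 'dog'],
                    columns=['weight', 'height'])
flat_stacked = flat.stack()

# Two-level columns: stack() moves only the innermost level by default,
# so the outer level stays behind as columns and the result is a DataFrame.
cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'pounds')])
nested = pd.DataFrame([[1, 2], [2, 4]], index=['cat', 'dog'], columns=cols)
nested_stacked = nested.stack()

print(type(flat_stacked).__name__)    # Series
print(type(nested_stacked).__name__)  # DataFrame
```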

Notes

The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).

Examples

Single level columns

import pandas as pd

df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
                                    index=['cat', 'dog'],
                                    columns=['weight', 'height'])

  Stacking a dataframe with a single level column axis returns a Series:

In [27]: df_single_level_cols                                                                                                            
Out[27]: 
     weight  height
cat       0       1
dog       2       3

In [28]: r = df_single_level_cols.stack()                                                                                                

In [29]: r                                                                                                                               
Out[29]: 
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64

In [30]: r.index                                                                                                                         
Out[30]: 
MultiIndex([('cat', 'weight'),
            ('cat', 'height'),
            ('dog', 'weight'),
            ('dog', 'height')],
           )


  The output shows that the column labels have moved into the row index, which is now a MultiIndex.
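To see what the two-level row index means in practice, here is a small sketch that looks up one value by its (row label, former column label) pair and then moves the inner level back to the columns with unstack(). The names df, r and restored are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]],
                  index=['cat', 'dog'],
                  columns=['weight', 'height'])
r = df.stack()

# The row label and the former column label together address one value.
print(r.index.nlevels)            # 2
print(r.loc[('dog', 'height')])   # 3

# unstack() moves the innermost index level back to the columns.
restored = r.unstack()
print(restored.loc['dog', 'height'])  # 3
```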

Multi level columns: simple case

multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('weight', 'pounds')])
df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
                                    index=['cat', 'dog'],
                                    columns=multicol1)

  Output:

In [38]: df_multi_level_cols1                                                                                                            
Out[38]: 
    weight       
        kg pounds
cat      1      2
dog      2      4

In [39]: df_multi_level_cols1.stack()                                                                                                    
Out[39]: 
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4

  The output shows that stack took the innermost (bottom) level of the column index and used it as row labels.

Missing values

multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('height', 'm')])
df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                                    index=['cat', 'dog'],
                                    columns=multicol2)

  It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:

In [41]: df_multi_level_cols2                                                                                                            
Out[41]: 
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0

In [42]: df_multi_level_cols2.stack()                                                                                                    
Out[42]: 
        height  weight
cat kg     NaN     1.0
    m      2.0     NaN
dog kg     NaN     3.0
    m      4.0     NaN

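  The innermost column level is moved to the row labels to form a combined index; combinations that have no data in the original frame are filled with NaN by default.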
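The sketch below counts the cells before and after stacking to show where the NaNs come from. It rebuilds the same frame under the shorter, made-up names cols and df. Column order in the printed frame may differ between pandas versions, so the checks use labels rather than positions.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=['cat', 'dog'], columns=cols)

stacked = df.stack()

# The original frame holds 4 values, but the stacked frame has
# 4 rows x 2 columns = 8 cells, so half of them have no source value.
print(df.size)                          # 4
print(stacked.size)                     # 8
print(int(stacked.isna().sum().sum()))  # 4

# Only the (outer, inner) pairs present in the original columns hold data.
print(stacked.loc[('cat', 'kg'), 'weight'])  # 1.0
```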

Prescribing the level(s) to be stacked

The first parameter controls which level or levels are stacked:

In [48]: df_multi_level_cols2.stack(level=0)                                                                                             
Out[48]: 
             kg    m
cat height  NaN  2.0
    weight  1.0  NaN
dog height  NaN  4.0
    weight  3.0  NaN

In [49]: df_multi_level_cols2                                                                                                            
Out[49]: 
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0

In [50]: df_multi_level_cols2.stack(level=[0,1])                                                                                         
Out[50]: 
cat  height  m     2.0
     weight  kg    1.0
dog  height  m     4.0
     weight  kg    3.0
dtype: float64

In [51]: df_multi_level_cols2.stack(level=[1,0])                                                                                         
Out[51]: 
cat  kg  weight    1.0
     m   height    2.0
dog  kg  weight    3.0
     m   height    4.0
dtype: float64

  You can specify which column level(s) to stack, or stack all of the column levels at once.
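The level parameter also accepts labels, as the Parameters section says. The sketch below assumes the column levels are given the names 'measure' and 'unit'; these names are invented for this example and are not part of the original frame.

```python
import pandas as pd

# Hypothetical level names 'measure' and 'unit', added for illustration.
cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')],
                                 names=['measure', 'unit'])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=['cat', 'dog'], columns=cols)

# Stacking by level name selects the same level as stacking by position.
by_name = df.stack(level='measure')
by_pos = df.stack(level=0)
print(by_name.equals(by_pos))       # True
print(list(by_name.index.names))    # [None, 'measure']

# Stacking every column level leaves no columns, so the result is a Series
# indexed by (row label, measure, unit).
all_levels = df.stack(level=['measure', 'unit'])
print(all_levels.loc[('cat', 'weight', 'kg')])  # 1.0
```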

Dropping missing values

In [52]: df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]], 
    ...:                                     index=['cat', 'dog'], 
    ...:                                     columns=multicol2)                                                                          

In [53]: df_multi_level_cols3                                                                                                            
Out[53]: 
    weight height
        kg      m
cat    NaN    1.0
dog    2.0    3.0

  Note that rows where all values are missing are dropped by default but this behaviour can be controlled via the dropna keyword parameter:

When every value in a row is NaN, the dropna parameter controls whether that row is removed.

In [54]: df_multi_level_cols3.stack()                                                                                                    
Out[54]: 
        height  weight
cat m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN

In [55]: df_multi_level_cols3.stack(dropna=False)                                                                                        
Out[55]: 
        height  weight
cat kg     NaN     NaN
    m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN

  dropna defaults to True, which means rows whose values are all missing are dropped from the result.
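The same filtering can be reproduced after the fact with DataFrame.dropna(how='all'), which may help show what "a row with all values missing" means. This is only a sketch of the effect, not how stack implements dropna internally. It deliberately avoids passing the dropna keyword, on the assumption that newer pandas releases (2.1 onward, with the future_stack option) handle that keyword differently.

```python
import pandas as pd

nan = float('nan')
cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
df = pd.DataFrame([[nan, 1.0], [2.0, 3.0]], index=['cat', 'dog'], columns=cols)

stacked = df.stack()

# Keep only rows that hold at least one real value; the ('cat', 'kg') row,
# where every column is NaN, is removed if it is present.
kept = stacked.dropna(how='all')

print(len(kept))                     # 3
print(('cat', 'kg') in kept.index)   # False
print(('cat', 'm') in kept.index)    # True
```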

Original post: https://www.cnblogs.com/sidianok/p/14475624.html