Pitfalls of indexing operations on pandas DataFrame and pandas Series

Looking up values in a Series instance: s[key]

When a pd.Series has a numeric (e.g. integer) index, we cannot retrieve the value in its last row with s1[-1].

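The correct approaches are s1.iloc[-1] or s1.values[-1]. s1[len(s1) - 1] also works here, but only because s1 has the default 0..n-1 index, so the last label happens to equal the last position. With any other integer labels it is a label lookup and can return the wrong row or raise KeyError.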
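Below is a minimal sketch of the three workarounds, using the same s1 as in the session later in this post. It also uses a hypothetical s3 whose integer labels start at 1, to show why len(s) - 1 is only safe for the default 0..n-1 index.

```python
import pandas as pd

s1 = pd.Series([111, 222], index=range(2))

# Three ways to read the last value of s1
by_position = s1.iloc[-1]   # positional: works for any index type
by_array = s1.values[-1]    # index the underlying NumPy array directly
by_label = s1[len(s1) - 1]  # label lookup for label 1: correct only because labels are 0..n-1
print(by_position, by_array, by_label)  # 222 222 222

# s3 is a hypothetical Series whose integer labels start at 1 instead of 0
s3 = pd.Series([111, 222], index=[1, 2])
print(s3.iloc[-1])          # 222: still the last value
wrong = s3[len(s3) - 1]     # s3[1] is a LABEL lookup -> 111, silently the first value
print(wrong)                # 111
```

The s3 case is the dangerous one: no exception is raised, the result is simply the wrong row.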

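In Python, the magic method __getitem__ gives a class indexing behaviour: instance[key] returns the value associated with key. The pandas Series class pushes __getitem__ about as far as it can go, and many rules are hidden inside it.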
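The idea can be made concrete with a deliberately tiny, hypothetical class (LabeledList, not part of pandas). Its __getitem__ imitates the rule pandas applies in this case: an integer key is treated as a position only when the labels are not themselves integers. Otherwise the key is looked up as a label. This is a simplification of pandas' real logic, not a copy of it.

```python
class LabeledList:
    """Hypothetical toy container: values addressed by labels, with a pandas-like fallback."""

    def __init__(self, values, labels):
        self._values = list(values)
        self._labels = list(labels)

    def _labels_are_integers(self):
        return all(isinstance(label, int) for label in self._labels)

    def __getitem__(self, key):
        # Integer key + non-integer labels: fall back to position (like s2[-1])
        if isinstance(key, int) and not self._labels_are_integers():
            return self._values[key]
        # Otherwise: strict label lookup (like s1[-1])
        try:
            pos = self._labels.index(key)
        except ValueError as err:
            raise KeyError(key) from err
        return self._values[pos]


s_str = LabeledList([111, 222], ["a", "b"])
s_int = LabeledList([111, 222], range(2))
print(s_str[-1])   # 222: positional fallback
print(s_str["b"])  # 222: label lookup
try:
    s_int[-1]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: -1
```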
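Let's dig into its source code. What happens when s1, a Series with an integer index, is looked up with the key -1?
Let's follow the call chain step by step.
__getitem__() first checks self.index._should_fallback_to_positional(). For an integer index this returns False, so the positional branch (return self._values[key]) is skipped. Since -1 is a scalar, __getitem__() then calls self._get_value(-1), which in turn calls self.index.get_loc(-1).
This is where the problem lies: for a RangeIndex, get_loc() calls self._range.index(-1).
The key -1 is simply not among s1's labels, because s1's index is range(2), i.e. the labels 0 and 1. range.index() raises ValueError, which get_loc() re-raises as a KeyError.
That is why the program raises the exception KeyError: -1.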
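The last two links of that chain can be reproduced with public objects alone. This is a small sketch; the exact internals of RangeIndex.get_loc differ between pandas versions, but the ValueError-to-KeyError outcome is what the RangeIndex.get_loc source quoted in this post produces.

```python
import pandas as pd

idx = pd.RangeIndex(2)  # the index of s1: labels 0 and 1

# The built-in range object has no element -1, so range.index() raises ValueError
try:
    range(2).index(-1)
    range_error = None
except ValueError as err:
    range_error = err

# RangeIndex.get_loc turns that failure into KeyError, which s1[-1] surfaces
try:
    idx.get_loc(-1)
    loc_error = None
except KeyError as err:
    loc_error = err

print(type(range_error).__name__, type(loc_error).__name__)  # ValueError KeyError
print(idx.get_loc(1))  # 1: label 1 sits at position 1
```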

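When a pd.Series has a string index (like the instance s2), s2[-1] does return the value in its last row. For a non-integer index, _should_fallback_to_positional() returns True, so __getitem__ takes the branch return self._values[key], which is a plain positional lookup into the underlying array.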
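Whether s2[-1] keeps working depends on the installed pandas version. As far as I know, pandas 2.1 deprecated this positional fallback in Series.__getitem__ and emits a FutureWarning saying integer keys will always be treated as labels in a future version. The sketch below records what the installed version actually does, rather than assuming one behaviour.

```python
import warnings

import pandas as pd

s2 = pd.Series([111, 222], index=list("ab"))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record any deprecation warning instead of hiding it
    try:
        result = s2[-1]              # positional fallback on a string index
    except KeyError:
        result = None                # versions without the fallback treat -1 as a label

print(pd.__version__, result, [w.category.__name__ for w in caught])

# Version-independent: ask for a position explicitly
print(s2.iloc[-1])  # 222
```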

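Conclusion: series[key] is a powerful lookup, but you have to keep the index type in mind to avoid falling into this trap. Alternatively, use .iloc[], which makes positional access explicit.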

Signature: s1.__getitem__(key)
Source:   
    def __getitem__(self, key):
        key = com.apply_if_callable(key, self)

        if key is Ellipsis:
            return self

        key_is_scalar = is_scalar(key)
        if isinstance(key, (list, tuple)):
            key = unpack_1tuple(key)

        if is_integer(key) and self.index._should_fallback_to_positional():
            return self._values[key]

        elif key_is_scalar:
            return self._get_value(key)

        if is_hashable(key):
            # Otherwise index.get_value will raise InvalidIndexError
            try:
                # For labels that don't resolve as scalars like tuples and frozensets
                result = self._get_value(key)

                return result

            except KeyError:
                if isinstance(key, tuple) and isinstance(self.index, MultiIndex):
                    # We still have the corner case where a tuple is a key
                    # in the first level of our MultiIndex
                    return self._get_values_tuple(key)

        if is_iterator(key):
            key = list(key)

        if com.is_bool_indexer(key):
            key = check_bool_indexer(self.index, key)
            key = np.asarray(key, dtype=bool)
            return self._get_values(key)

        return self._get_with(key)
File:      d:\anaconda3\lib\site-packages\pandas\core\series.py
Type:      method
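The branches of the __getitem__ source above can be exercised one at a time. Here is a small sketch on the same s1. Each comment names the branch I expect the key to take, based on the listing above; other pandas versions may route some keys differently while returning the same values.

```python
import pandas as pd

s1 = pd.Series([111, 222], index=range(2))

# callable key: com.apply_if_callable turns it into a boolean mask first
by_callable = s1[lambda s: s > 150]

# boolean mask: handled by the com.is_bool_indexer branch
by_mask = s1[s1 > 150]

# list of labels: falls through to self._get_with(key), a label lookup here
by_list = s1[[1, 0]]

# Ellipsis: returns the whole Series
everything = s1[...]

print(by_callable.tolist(), by_mask.tolist(), by_list.tolist(), everything.tolist())
# [222] [222] [222, 111] [111, 222]
```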



Signature: s1._get_value(label, takeable:bool=False)
Source:   
    def _get_value(self, label, takeable: bool = False):
        """
        Quickly retrieve single value at passed index label.

        Parameters
        ----------
        label : object
        takeable : interpret the index as indexers, default False

        Returns
        -------
        scalar value
        """
        if takeable:
            return self._values[label]

        # Similar to Index.get_value, but we do not fall back to positional
        loc = self.index.get_loc(label)
        return self.index._get_values_for_loc(self, loc, label)
File:      d:\anaconda3\lib\site-packages\pandas\core\series.py
Type:      method
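_get_value is the label path. It asks the index for the position of the label (get_loc) and then reads the value at that position. The sketch below uses a hypothetical Series whose labels differ from their positions, so the two steps can be seen separately with public API only.

```python
import pandas as pd

s = pd.Series([111, 222, 333], index=[10, 20, 30])  # labels differ from positions

pos = s.index.get_loc(20)  # step 1: label 20 -> position 1
print(pos)                 # 1
print(s.iloc[pos])         # 222: step 2, the value at that position
print(s[20])               # 222: s[20] is a label lookup on an integer index
print(s.at[20])            # 222: the explicit scalar label accessor
```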



s1.index.get_loc??
Signature: s1.index.get_loc(key, method=None, tolerance=None)
Source:   
    @doc(Int64Index.get_loc)
    def get_loc(self, key, method=None, tolerance=None):
        if method is None and tolerance is None:
            if is_integer(key) or (is_float(key) and key.is_integer()):
                new_key = int(key)
                try:
                    return self._range.index(new_key)
                except ValueError as err:
                    raise KeyError(key) from err
            raise KeyError(key)
        return super().get_loc(key, method=method, tolerance=tolerance)
File:      d:\anaconda3\lib\site-packages\pandas\core\indexes\range.py
Type:      method
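With the pandas internals stripped away, the get_loc method above reduces to roughly the following pure-Python sketch. range_get_loc is a hypothetical name. The method/tolerance branch is left out, and NumPy integer keys, which pandas' is_integer also accepts, are not handled here.

```python
def range_get_loc(rng: range, key):
    """Return the position of label `key` in `rng`, or raise KeyError."""
    is_int = isinstance(key, int) and not isinstance(key, bool)  # is_integer rejects bools
    is_integral_float = isinstance(key, float) and key.is_integer()
    if is_int or is_integral_float:
        try:
            # range.index() searches for a VALUE; a negative key never wraps around
            return rng.index(int(key))
        except ValueError as err:
            raise KeyError(key) from err
    raise KeyError(key)


print(range_get_loc(range(2), 1))           # 1
print(range_get_loc(range(2), 1.0))         # 1
print(range_get_loc(range(10, 20, 2), 14))  # 2: label 14 is the third element
try:
    range_get_loc(range(2), -1)
except KeyError as exc:
    print("KeyError:", exc)                 # KeyError: -1
```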



import pandas as pd
s1=pd.Series([111,222], range(2))
s2=pd.Series([111,222], list('ab'))


s1
Out[266]: 
0    111
1    222
dtype: int64

s2
Out[267]: 
a    111
b    222
dtype: int64

s2[-1]
Out[268]: 222
s1[-1]

Traceback (most recent call last):

  File "<ipython-input-269-0123e3764900>", line 1, in <module>
    s1[-1]

  File "D:\Anaconda3\lib\site-packages\pandas\core\series.py", line 882, in __getitem__
    return self._get_value(key)

  File "D:\Anaconda3\lib\site-packages\pandas\core\series.py", line 989, in _get_value
    loc = self.index.get_loc(label)

  File "D:\Anaconda3\lib\site-packages\pandas\core\indexes\range.py", line 357, in get_loc
    raise KeyError(key) from err

KeyError: -1

Looking up values in a pd.DataFrame instance: df[key]

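df is a 2D data structure, so df[key] can select along either axis. A column name or a list of column names selects columns, while a sliceable object (a slice) selects rows; a boolean mask also selects rows.
Its lookup rules are even more hidden and complex. In short, df[key] provides slicing on either the row axis or the column axis.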
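Here is a minimal sketch of the main key types df[key] accepts, on a hypothetical two-column frame. As a rule of thumb, labels and lists of labels select columns, while slices and boolean masks select rows.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})  # hypothetical example frame

col = df["a"]            # single column name -> Series (column axis)
cols = df[["b", "a"]]    # list of column names -> DataFrame with those columns
rows = df[0:2]           # slice -> rows at positions 0 and 1 (row axis)
mask = df[df["b"] > 15]  # boolean Series -> rows where the condition holds

print(col.tolist())         # [1, 2, 3]
print(list(cols.columns))   # ['b', 'a']
print(rows.shape)           # (2, 2)
print(mask.index.tolist())  # [1, 2]

# A bare integer is NOT a row position here: df[0] looks for a column labelled 0
try:
    df[0]
    missing_column = False
except KeyError:
    missing_column = True
print(missing_column)       # True
```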

Original post: https://www.cnblogs.com/duan-qs/p/13906059.html