pd.Series() explained clearly


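1. Introduction to Series

pandas has two main data structures: 1. Series; 2. DataFrame.

A Series is a one-dimensional labeled array built on NumPy's ndarray. By default pandas uses 0 to n-1 as the index of a Series, but you can also supply your own index (think of the index as the keys of a dict).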
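To make the default index and the dict-like behaviour concrete, here is a minimal sketch. The variable names `default_s` and `labeled_s` are made up for illustration.

```python
import pandas as pd

# Default index: pandas labels the values 0..n-1
default_s = pd.Series([10, 20, 30])
print(list(default_s.index))   # [0, 1, 2]

# Custom index: labels behave much like dict keys
labeled_s = pd.Series([10, 20, 30], index=["x", "y", "z"])
print(labeled_s["y"])          # 20
print("y" in labeled_s)        # True: membership checks the index, not the values
```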

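2. Creating a Series

  1. pd.Series([list], index=[list])

The first argument is a list. index is optional: if omitted, the index defaults to integers starting at 0; if given, its length must equal the number of values.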

import pandas as pd

s=pd.Series([1,2,3,4,5],index=['a','b','c','f','e'])
print(s)
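The length rule above can be checked directly. The sketch below deliberately passes one label too few and captures the resulting error; the names `values`, `labels`, and `error_message` are illustrative only, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

values = [1, 2, 3]
labels = ["a", "b"]  # deliberately one label short

try:
    pd.Series(values, index=labels)
    error_message = None
except ValueError as exc:
    # pandas refuses to build a Series whose index and values differ in length
    error_message = str(exc)

print(error_message)
```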
  2. pd.Series({dict})

Takes a dict as the argument; the keys become the index and the values become the data.

import pandas as pd

s=pd.Series({'a':1,'b':2,'c':3,'f':4,'e':5})
print(s)
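A dict can also be combined with an explicit index. In that case pandas, as far as I know, selects and orders the entries by the labels you pass, and a label that is not in the dict becomes NaN. A small sketch with made-up data:

```python
import pandas as pd

data = {"a": 1, "b": 2, "c": 3}

# Only labels listed in index are kept; labels absent from the dict become NaN
s = pd.Series(data, index=["c", "a", "z"])
print(s)

print(s.isna().tolist())  # [False, False, True]
```

Because NaN is a float, the resulting Series has dtype float64 even though the dict held integers.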

3. Selecting values from a Series

s[index] or s[[list of indexes]]

Selection works much like array indexing; to pick several non-contiguous values, pass a list.

import pandas as pd
import numpy as np

v = np.random.random_sample(50)
s = pd.Series(v)
s1 = s[[3, 13, 23, 33]]
s2 = s[3:13]
s3 = s[43]
print("s1", s1)
print("s2", s2)
print("s3", s3)
s1 3     0.064095
13    0.354023
23    0.225739
33    0.959288
dtype: float64

s2 3     0.064095
4     0.405651
5     0.024181
6     0.367606
7     0.844005
8     0.405313
9     0.102824
10    0.806400
11    0.950502
12    0.735310
dtype: float64

s3 0.42803253918
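With the default integer index, plain `s[...]` can be ambiguous about whether a number means a label or a position. A hedged sketch of the explicit accessors `.iloc` (position) and `.loc` (label), using deterministic data instead of random values so the results are predictable:

```python
import pandas as pd

s = pd.Series(range(100, 150))  # 50 values with the default index 0..49

by_position = s.iloc[3:13]  # positional slice: end is excluded, 10 values
by_label = s.loc[3:13]      # label slice: end is included, 11 values

print(len(by_position), len(by_label))  # 10 11
print(s.loc[[3, 13, 23, 33]].tolist())  # [103, 113, 123, 133]
```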

4. Getting the first and last values of a Series

.head(n) and .tail(n)

Return the first n or last n rows; n is optional and defaults to 5.

import pandas as pd
import numpy as np

v = np.random.random_sample(50)
s = pd.Series(v)
print("s.head()", s.head())
print("s.head(3)", s.head(3))
print("s.tail()", s.tail())
print("s.tail(3)", s.tail(3))
s.head() 0    0.714136
1    0.333600
2    0.683784
3    0.044002
4    0.147745
dtype: float64
s.head(3) 0    0.714136
1    0.333600
2    0.683784
dtype: float64
s.tail() 45    0.779509
46    0.778341
47    0.331999
48    0.444811
49    0.028520
dtype: float64
s.tail(3) 47    0.331999
48    0.444811
49    0.028520
dtype: float64
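Since the values above are random, the defaults are easier to verify on deterministic data. A minimal sketch:

```python
import pandas as pd

s = pd.Series(range(50))

print(len(s.head()), len(s.tail()))   # 5 5
print(s.head(3).tolist())             # [0, 1, 2]
print(s.tail(3).tolist())             # [47, 48, 49]

# If n exceeds the length, the whole Series is returned
print(len(s.head(1000)))              # 50
```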

5. Common Series operations

import pandas as pd
import numpy as np

v = [10, 3, 2, 2, np.nan]
v = pd.Series(v)
print("len():", len(v))  # length of the Series, NaN included
print("shape():", np.shape(v))  # shape as a tuple, here (5,)
print("count():", v.count())  # number of values, NaN excluded
print("unique():", v.unique())  # distinct values (NaN included)
print("value_counts():\n", v.value_counts())  # how often each value occurs (NaN excluded)
len(): 5
shape(): (5,)
count(): 4
unique(): [ 10.   3.   2.  nan]
value_counts():
2.0     2
3.0     1
10.0    1
dtype: int64
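The output above shows that `count()` and `value_counts()` skip NaN while `len()` and `unique()` do not. A short sketch of a few related calls that make the missing value explicit (the name `counts_with_nan` is illustrative):

```python
import numpy as np
import pandas as pd

v = pd.Series([10, 3, 2, 2, np.nan])

print(v.isna().sum())    # 1 missing value
print(v.nunique())       # 3 distinct non-NaN values

# dropna=False adds a row for NaN, so the counts add up to len(v)
counts_with_nan = v.value_counts(dropna=False)
print(counts_with_nan)
```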

6. Adding Series

import pandas as pd
import numpy as np

v = [10, 3, 2, 2, np.nan]
v = pd.Series(v)
sum = v[1:3] + v[1:3]
sum1 = v[1:4] + v[1:4]
sum2 = v[1:3] + v[1:4]
sum3 = v[:3] + v[1:]
print("sum", sum)
print("sum1", sum1)
print("sum2", sum2)
print("sum3", sum3)
sum 1    6.0
2    4.0
dtype: float64

sum1 1    6.0
2    4.0
3    4.0
dtype: float64

sum2 1    6.0
2    4.0
3    NaN
dtype: float64

sum3 0    NaN
1    6.0
2    4.0
3    NaN
4    NaN
dtype: float64
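The NaN results above come from index alignment: `+` matches values by label, and a label present on only one side yields NaN. If that is not what you want, `Series.add` with `fill_value` is one option. A sketch under the same data, with `left`, `right`, `plain`, and `filled` as illustrative names:

```python
import numpy as np
import pandas as pd

v = pd.Series([10, 3, 2, 2, np.nan])
left, right = v.iloc[:3], v.iloc[1:]   # labels 0..2 and labels 1..4

# "+" aligns on labels; a label missing from either side gives NaN
plain = left + right

# Series.add with fill_value treats a label missing from one side as 0
filled = left.add(right, fill_value=0)

print(plain.tolist())   # [nan, 6.0, 4.0, nan, nan]
print(filled.tolist())  # [10.0, 6.0, 4.0, 2.0, nan]
```

Label 4 stays NaN even with `fill_value`, because the value is missing on both sides.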

7. Querying a Series

  1. Range lookup
import pandas as pd
import numpy as np
 
s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(s, name="age")
print(sa[sa>19])
jim    22.0
lj     24.0
ton    20.0
Name: age, dtype: float64
  2. Median
import pandas as pd
import numpy as np

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(s, name="age")
print("sa.median()", sa.median())
sa.median() 20.0
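Both lookups above quietly ignore the missing value for "car". The sketch below, reusing the same data under the illustrative name `ages`, combines two conditions in one mask and shows how `median()` treats NaN:

```python
import pandas as pd

ages = pd.Series(
    {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None},
    name="age",
)

# Combine conditions with & / | and wrap each one in parentheses
in_range = ages[(ages > 18) & (ages < 24)]
print(sorted(in_range.index))          # ['jack', 'jim', 'ton']

# median() skips NaN by default; skipna=False lets NaN propagate
print(ages.median())                   # 20.0
print(ages.median(skipna=False))       # nan
```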

8. Assigning values in a Series

import pandas as pd
import numpy as np
 
s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(s, name="age")
print(s)
print('----------------')
sa['ton'] = 99
print(sa)
{'ton': 20, 'mary': 18, 'jack': 19, 'jim': 22, 'lj': 24, 'car': None}
----------------
car      NaN
jack    19.0
jim     22.0
lj      24.0
mary    18.0
ton     99.0
Name: age, dtype: float64
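Assignment is not limited to existing labels. As a hedged sketch on a smaller Series, the following overwrites a value, appends a new label (the label "newcomer" is hypothetical), and overwrites every value matching a boolean mask:

```python
import pandas as pd

ages = pd.Series({"ton": 20.0, "mary": 18.0, "jack": 19.0}, name="age")

ages["ton"] = 99.0            # overwrite an existing label
ages["newcomer"] = 30.0       # hypothetical new label: assignment appends it
ages.loc[ages < 19] = 0.0     # boolean mask: overwrite every matching value

print(ages)
```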


References

Original article: https://www.cnblogs.com/hzcya1995/p/13302751.html