Handling time series frequencies in pandas

Generating date ranges
pd.date_range()

In [15]: rng = pd.date_range('2000-01-01', '2000-06-30', freq='BM')

In [16]: rng
Out[16]:
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
'2000-05-31', '2000-06-30'],
dtype='datetime64[ns]', freq='BM')

In [17]: pd.Series(np.random.randn(6), index=rng)
Out[17]:
2000-01-31 0.586341
2000-02-29 -0.439679
2000-03-31 0.853946
2000-04-28 -0.740858
2000-05-31 -0.114699
2000-06-30 -0.529631
Freq: BM, dtype: float64
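
Besides a start and end date, pd.date_range() also accepts a start date plus a number of periods. A minimal sketch (the variable names bdays and days are only for illustration):

```python
import pandas as pd

# A start date plus a number of periods; 'B' means business days (Mon-Fri).
bdays = pd.date_range('2000-01-01', periods=5, freq='B')

# A start and end date with no freq falls back to calendar-day frequency.
days = pd.date_range('2000-01-01', '2000-01-10')

print(bdays)
print(len(days))
```

2000-01-01 falls on a Saturday, so the business-day range starts on the following Monday.
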
Frequencies and date offsets
from pandas.tseries.offsets import Hour, Minute
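
A short sketch of how these offset objects might be used, assuming only the Hour and Minute classes imported above. Offsets can be added to a timestamp or passed as the freq of a date range; the equivalent string alias '1h30min' is shown for comparison.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

start = pd.Timestamp('2000-01-01')

# Offsets can be added to a timestamp one after another.
later = start + Hour(1) + Minute(30)

# The same spacing written as a frequency string.
rng = pd.date_range(start, periods=3, freq='1h30min')

# An offset object can also be passed directly as freq.
four_hourly = pd.date_range(start, periods=3, freq=Hour(4))

print(later)
print(rng)
print(four_hourly)
```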

Shifting data
ts.shift()
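
shift() behaves differently depending on whether freq is given. The sketch below uses a small made-up series to show both cases, plus the common percent-change idiom:

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0, 4.0],
               index=pd.date_range('2000-01-01', periods=4, freq='D'))

# Without freq: the values move one step forward, the index stays put,
# and the first position becomes NaN.
lagged = ts.shift(1)

# With freq: the index labels move instead, so no data is discarded.
moved = ts.shift(1, freq='D')

# Percent change between consecutive observations.
pct = ts / ts.shift(1) - 1

print(lagged)
print(moved)
print(pct)
```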

Periods and period arithmetic
The Period and PeriodIndex classes
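
A Period represents a whole time span (here one month), and adding or subtracting an integer moves it by that many spans of its own frequency. A minimal sketch:

```python
import pandas as pd

p = pd.Period('2000-01', freq='M')

# Integer arithmetic shifts the period by its own frequency (months here).
later = p + 5
earlier = p - 2

# A period also knows the timestamp at which it starts.
start = p.start_time

print(later, earlier, start)
```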

pd.period_range(): creates a regular range of periods.

In [20]: rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
...: rng
...:
Out[20]: PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='int64', freq='M')
Constructor:
pd.PeriodIndex()
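
One way to use the constructor is to pass a list of period strings together with a frequency. The sketch below uses made-up quarterly labels:

```python
import pandas as pd

values = ['2001Q3', '2002Q2', '2003Q1']

# Parse the strings as quarters of a fiscal year ending in December.
index = pd.PeriodIndex(values, freq='Q-DEC')

print(index)
print(index[0].year, index[0].quarter)
```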

Period frequency conversion
ts.asfreq()
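
asfreq() is available both on a single Period and on a Series indexed by periods. When going from a lower to a higher frequency, how selects the first or last sub-period. A sketch with illustrative values:

```python
import pandas as pd

q = pd.Period('2000Q1', freq='Q-DEC')

# Low to high frequency: pick the first or the last month of the quarter.
first_month = q.asfreq('M', how='start')
last_month = q.asfreq('M', how='end')

# High to low frequency: the quarter that contains the month.
parent_quarter = pd.Period('2000-02', freq='M').asfreq('Q-DEC')

# The same conversion applied to the index of a Series.
s = pd.Series([1.0, 2.0],
              index=pd.period_range('2000Q1', periods=2, freq='Q-DEC'))
monthly = s.asfreq('M', how='end')

print(first_month, last_month, parent_quarter)
print(monthly)
```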

Converting between Timestamp and Period
In [21]: rng = pd.date_range('2000-01-01', '2000-06-30', freq='M')

In [22]: rng
Out[22]:
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30',
'2000-05-31', '2000-06-30'],
dtype='datetime64[ns]', freq='M')

In [23]: rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')

In [24]: rng
Out[24]: PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='int64', freq='M')
to_period() and to_timestamp()

In [25]: rng = pd.date_range('2000-01-01', periods=3, freq='M')
...: ts = pd.Series(np.random.randn(3), index=rng)
...: ts
...:
Out[25]:
2000-01-31 0.455968
2000-02-29 1.720553
2000-03-31 1.695834
Freq: M, dtype: float64

In [26]: pts = ts.to_period()
...: pts
...:
Out[26]:
2000-01 0.455968
2000-02 1.720553
2000-03 1.695834
Freq: M, dtype: float64
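
The reverse direction uses to_timestamp(). The sketch below, on a small made-up daily series, also shows that to_period() can produce duplicate periods when several timestamps fall into the same month:

```python
import pandas as pd

idx = pd.date_range('2000-01-29', periods=4, freq='D')
ts = pd.Series(range(4), index=idx)

# Three January days and one February day collapse to two distinct months,
# so the resulting PeriodIndex contains duplicates.
pts = ts.to_period('M')

# Converting back gives the start of each period by default.
back = pts.to_timestamp()

print(pts)
print(back)
```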
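Resampling and frequency conversion
Resampling is the process of converting a time series from one frequency to another.

Aggregating higher-frequency data to a lower frequency is called downsampling, while converting lower-frequency data to a higher frequency is called upsampling (usually accompanied by interpolation).

resample(): the workhorse function for frequency conversion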

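Parameter          Description
freq               String or DateOffset giving the target resampling frequency, e.g. 'M', '5min', or Second(15)
how='mean'         Function name or array function that produces the aggregated values; defaults to 'mean'. Deprecated (FutureWarning: how in .resample() is deprecated, the new syntax is .resample(...).mean())
axis=0             Axis to resample along
fill_method=None   How to interpolate when upsampling, e.g. 'ffill' or 'bfill'; no interpolation by default
closed='right'     When downsampling, which end of each interval is closed (inclusive)
label='right'      When downsampling, how to label the aggregated result: with the left or the right bin edge
loffset=None       Time adjustment applied to the bin labels, e.g. '-1s' or Second(-1) to move the labels one second earlier
limit=None         Maximum number of periods to fill when forward- or backward-filling
kind=None          Aggregate to periods (Period) or timestamps (Timestamp); defaults to the index type of the time series
convention=None    When resampling periods, the convention ('start' or 'end') for converting low frequency to high frequency; the pandas signature defaults to 'start'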
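
As the FutureWarning above indicates, the aggregation is now chosen by calling a method on the object that resample() returns, rather than through how. A minimal sketch on made-up minute data:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(10),
               index=pd.date_range('2000-01-01', periods=10, freq='min'))

# Old style: ts.resample('5min', how='mean')
means = ts.resample('5min').mean()

# Several aggregations at once give one column per function.
summary = ts.resample('5min').agg(['min', 'max', 'sum'])

print(means)
print(summary)
```
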
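Downsampling
Using resample
As the examples below show, when downsampling with resample you need to decide two things:

Which side of each interval is closed.
How to label each aggregated bin: with the start of the interval or with the end.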
In [27]: rng = pd.date_range('2000-01-01', periods=100, freq='D')
...: ts = pd.Series(np.random.randn(len(rng)), index=rng)
...: ts
...:
Out[27]:
2000-01-01 -0.189731
...
2000-04-09 0.283110
Freq: D, dtype: float64

In [28]: ts.resample('M').mean()
Out[28]:
2000-01-31 -0.019276
2000-02-29 -0.041192
2000-03-31 -0.214551
2000-04-30 0.411190
Freq: M, dtype: float64

In [29]: ts.resample('M', kind='period').mean()
Out[29]:
2000-01 -0.019276
2000-02 -0.041192
2000-03 -0.214551
2000-04 0.411190
Freq: M, dtype: float64
In [31]: rng = pd.date_range('2000-01-01', periods=12, freq='T')
...: ts = pd.Series(np.arange(12), index=rng)
...: ts
...:
Out[31]:
2000-01-01 00:00:00 0
2000-01-01 00:01:00 1
2000-01-01 00:02:00 2
2000-01-01 00:03:00 3
2000-01-01 00:04:00 4
2000-01-01 00:05:00 5
2000-01-01 00:06:00 6
2000-01-01 00:07:00 7
2000-01-01 00:08:00 8
2000-01-01 00:09:00 9
2000-01-01 00:10:00 10
2000-01-01 00:11:00 11
Freq: T, dtype: int32

In [32]: ts.resample('5min', closed='right', label='right').sum()
Out[32]:
2000-01-01 00:00:00 0
2000-01-01 00:05:00 15
2000-01-01 00:10:00 40
2000-01-01 00:15:00 11
Freq: 5T, dtype: int32

In [33]: ts.resample('5min', closed='right',
...: label='right', loffset='-1s').sum()
Out[33]:
1999-12-31 23:59:59 0
2000-01-01 00:04:59 15
2000-01-01 00:09:59 40
2000-01-01 00:14:59 11
Freq: 5T, dtype: int32
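
To make the effect of closed and label easier to see, the sketch below runs the same twelve-minute series through both settings. It also shows one possible way to reproduce the loffset='-1s' result by shifting the index afterwards; to my knowledge, recent pandas releases no longer accept the loffset argument, so treat this as a workaround under that assumption.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(12),
               index=pd.date_range('2000-01-01', periods=12, freq='min'))

# Left-closed bins [00:00, 00:05), ... labelled by their left edge.
left = ts.resample('5min', closed='left', label='left').sum()

# Right-closed bins (23:55, 00:00], (00:00, 00:05], ... labelled by their right edge.
right = ts.resample('5min', closed='right', label='right').sum()

# Move the labels one second earlier by adjusting the index directly.
shifted = right.copy()
shifted.index = shifted.index - pd.Timedelta(seconds=1)

print(left)
print(right)
print(shifted)
```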
Downsampling with groupby
To group by month or by day of the week, pass a function that accesses the corresponding field on the time series index.

In [35]: rng = pd.date_range('2000-01-01', periods=100, freq='D')
...: ts = pd.Series(np.random.randn(len(rng)), index=rng)
...: ts

In [36]: ts.groupby(lambda x : x.month).mean()
Out[36]:
1 -0.126008
2 0.079132
3 0.026093
4 0.321457
dtype: float64

In [37]: ts.groupby(lambda x : x.dayofweek).mean()
Out[37]:
0 0.280289
1 0.174452
2 0.166102
3 -0.779489
4 -0.036195
5 0.086394
6 0.234831
dtype: float64
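
An alternative to the lambda is to group directly by an attribute of the index. The sketch below uses a series of ones so that the sums simply count the days in each group:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.ones(len(rng)), index=rng)

# Group by calendar month (1 = January).
by_month = ts.groupby(ts.index.month).sum()

# Group by day of the week (0 = Monday, 6 = Sunday).
by_weekday = ts.groupby(ts.index.dayofweek).sum()

print(by_month)
print(by_weekday)
```

Note that, unlike resample, this groups the same month or weekday together across years.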
Upsampling
In [38]: import pandas as pd
...: import numpy as np
...: frame = pd.DataFrame(np.random.randn(2, 4),
...: index=pd.date_range('1/1/2000', periods=2,
...: freq='W-WED'),
...: columns=['Colorado', 'Texas', 'New York', 'Ohio'])
...: frame
...:
Out[38]:
Colorado Texas New York Ohio
2000-01-05 -0.925525 -0.434350 1.037349 -1.532790
2000-01-12 1.075744 0.237922 -0.907699 0.592211

In [39]: df_daily = frame.resample('D').asfreq()
...: df_daily
...:
Out[39]:
Colorado Texas New York Ohio
2000-01-05 -0.925525 -0.434350 1.037349 -1.532790
2000-01-06 NaN NaN NaN NaN
2000-01-07 NaN NaN NaN NaN
2000-01-08 NaN NaN NaN NaN
2000-01-09 NaN NaN NaN NaN
2000-01-10 NaN NaN NaN NaN
2000-01-11 NaN NaN NaN NaN
2000-01-12 1.075744 0.237922 -0.907699 0.592211

In [40]: frame.resample('D').ffill()
Out[40]:
Colorado Texas New York Ohio
2000-01-05 -0.925525 -0.434350 1.037349 -1.532790
2000-01-06 -0.925525 -0.434350 1.037349 -1.532790
2000-01-07 -0.925525 -0.434350 1.037349 -1.532790
2000-01-08 -0.925525 -0.434350 1.037349 -1.532790
2000-01-09 -0.925525 -0.434350 1.037349 -1.532790
2000-01-10 -0.925525 -0.434350 1.037349 -1.532790
2000-01-11 -0.925525 -0.434350 1.037349 -1.532790
2000-01-12 1.075744 0.237922 -0.907699 0.592211

# Previously: frame.resample('D', how='mean')

In [41]: df_daily = frame.resample('D').mean()
...: df_daily
...:
Out[41]:
Colorado Texas New York Ohio
2000-01-05 -0.925525 -0.434350 1.037349 -1.532790
2000-01-06 NaN NaN NaN NaN
2000-01-07 NaN NaN NaN NaN
2000-01-08 NaN NaN NaN NaN
2000-01-09 NaN NaN NaN NaN
2000-01-10 NaN NaN NaN NaN
2000-01-11 NaN NaN NaN NaN
2000-01-12 1.075744 0.237922 -0.907699 0.592211
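
The gaps created by upsampling do not have to be filled completely. The sketch below, using a small made-up frame, limits the forward fill to two rows and shows linear interpolation as another option:

```python
import pandas as pd

frame = pd.DataFrame({'Colorado': [1.0, 2.0], 'Texas': [10.0, 20.0]},
                     index=pd.date_range('2000-01-01', periods=2, freq='W-WED'))

# Upsample to daily frequency without filling: new rows are NaN.
daily = frame.resample('D').asfreq()

# Forward-fill at most two rows after each observed value.
filled = frame.resample('D').ffill(limit=2)

# Linear interpolation between the two observed rows.
interpolated = daily.interpolate()

print(filled)
print(interpolated)
```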
Resampling with periods
In [42]: frame = pd.DataFrame(np.random.randn(24, 4),
...: index=pd.period_range('1-2000', '12-2001',
...: freq='M'),
...: columns=['Colorado', 'Texas', 'New York', 'Ohio'])
...: frame[:5]
...: annual_frame = frame.resample('A-DEC').mean()
...: annual_frame
...:
Out[42]:
Colorado Texas New York Ohio
2000 0.442672 0.104870 -0.067043 -0.128942
2001 -0.263757 -0.399865 -0.423485 0.026256

In [43]: annual_frame.resample('Q-DEC', convention='end').ffill()
Out[43]:
Colorado Texas New York Ohio
2000Q4 0.442672 0.104870 -0.067043 -0.128942
2001Q1 0.442672 0.104870 -0.067043 -0.128942
2001Q2 0.442672 0.104870 -0.067043 -0.128942
2001Q3 0.442672 0.104870 -0.067043 -0.128942
2001Q4 -0.263757 -0.399865 -0.423485 0.026256

Original post: https://www.cnblogs.com/zuichuyouren/p/11277411.html