Time Series Study Notes 3

4. Time zone handling

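Time zone handling is tedious, so time series are usually processed in UTC.
UTC (Coordinated Universal Time) is the successor to Greenwich Mean Time and is now the international standard.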

In [1]: import pytz

In [2]: import numpy as np

In [3]: import pandas as pd; from pandas import Series

In [4]: pytz.common_timezones[-5:]
Out[4]: ['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']

In [5]: tz = pytz.timezone('Asia/Shanghai')

In [6]: tz
Out[6]: <DstTzInfo 'Asia/Shanghai' LMT+8:06:00 STD>
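A minimal sketch of the "work in UTC" habit described above, assuming the raw readings were recorded as Shanghai wall clock time. The sample timestamps and the variable names are made up for illustration.

```python
import pandas as pd

# Wall clock readings recorded in Shanghai (hypothetical sample data)
local = pd.DatetimeIndex(['2017-02-19 09:30', '2017-02-20 09:30'])

# Attach the zone the readings were taken in, then normalize to UTC
aware = local.tz_localize('Asia/Shanghai')
as_utc = aware.tz_convert('UTC')

# Convert back to local time only when the values are displayed
shown = as_utc.tz_convert('Asia/Shanghai')

print(as_utc)
print(shown)
```

Storing and computing in UTC and converting only at the edges keeps every intermediate step free of offset and daylight saving questions.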

4.1 Localization and conversion

By default, pandas time series are time zone naive: the index carries no time zone.

In [11]: rng = pd.date_range('2/19/2017 9:30', periods=4, freq='D')

In [12]: ts = Series(np.random.randn(4),index=rng)

In [13]: ts.index.tz  # returns None, so nothing is printed

In [14]: ts
Out[14]:
2017-02-19 09:30:00    0.530722
2017-02-20 09:30:00    1.459262
2017-02-21 09:30:00   -0.038216
2017-02-22 09:30:00   -0.671159
Freq: D, dtype: float64


# The time zone can also be set at creation time with the tz argument
In [15]: pd.date_range('2/19/2017 9:30', periods=4, freq='D', tz='UTC')
Out[15]:
DatetimeIndex(['2017-02-19 09:30:00+00:00', '2017-02-20 09:30:00+00:00',
               '2017-02-21 09:30:00+00:00', '2017-02-22 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

# To go from naive to time zone aware, use tz_localize
In [16]: tz_utc = ts.tz_localize('UTC')

In [17]: tz_utc
Out[17]:
2017-02-19 09:30:00+00:00    0.530722
2017-02-20 09:30:00+00:00    1.459262
2017-02-21 09:30:00+00:00   -0.038216
2017-02-22 09:30:00+00:00   -0.671159
Freq: D, dtype: float64

In [18]: tz_utc.index.tz
Out[18]: <UTC>

# Use tz_convert to change to another time zone
In [20]: tz_utc.tz_convert('Asia/Shanghai')
Out[20]:
2017-02-19 17:30:00+08:00    0.530722
2017-02-20 17:30:00+08:00    1.459262
2017-02-21 17:30:00+08:00   -0.038216
2017-02-22 17:30:00+08:00   -0.671159
Freq: D, dtype: float64
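The two methods are not interchangeable: tz_localize expects a naive index and tz_convert expects an aware one. The sketch below, using made-up values, shows what happens when they are mixed up and how tz_localize(None) drops the zone again.

```python
import pandas as pd

naive = pd.Series(
    [1.0, 2.0],
    index=pd.DatetimeIndex(['2017-02-19 09:30', '2017-02-20 09:30']),
)

# tz_convert needs an aware index, so it refuses a naive one
try:
    naive.tz_convert('Asia/Shanghai')
    convert_failed = False
except TypeError:
    convert_failed = True

aware = naive.tz_localize('UTC')

# tz_localize refuses an index that already has a time zone
try:
    aware.tz_localize('Asia/Shanghai')
    localize_failed = False
except TypeError:
    localize_failed = True

# Passing None removes the zone and keeps the wall clock time
back_to_naive = aware.tz_localize(None)

print(convert_failed, localize_failed, back_to_naive.index.tz)
```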



4.2 Timestamp objects

# Create a Timestamp object
In [25]: stamp = pd.Timestamp('2017-2-19 12:10')

# Localize the naive timestamp to UTC
In [26]: stamp_utc = stamp.tz_localize('UTC')

# Convert to another time zone
In [29]: stamp_cn = stamp_utc.tz_convert('Asia/Shanghai')



# value is the number of nanoseconds since the Unix epoch (1970-01-01 00:00 UTC)
In [30]: stamp_utc.value
Out[30]: 1487506200000000000

In [31]: stamp_cn.value
Out[31]: 1487506200000000000

In [32]: stamp.value  # all three are identical; the naive stamp is treated as if it were UTC
Out[32]: 1487506200000000000
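The three values match only because the naive timestamp was localized to UTC. As a small sketch (variable names are illustrative), localizing the same wall clock time to Asia/Shanghai instead produces a different instant, while tz_convert never changes the underlying value.

```python
import pandas as pd

stamp = pd.Timestamp('2017-02-19 12:10')

as_utc = stamp.tz_localize('UTC')
as_shanghai = stamp.tz_localize('Asia/Shanghai')  # same wall clock, earlier instant

# Difference between the two instants, in whole hours
hour_ns = 3600 * 10**9
offset_hours = (as_utc.value - as_shanghai.value) // hour_ns

# Converting an aware timestamp leaves the stored value untouched
converted = as_utc.tz_convert('Asia/Shanghai')

print(offset_hours)
print(converted.value == as_utc.value)
```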



4.3 Operations between different time zones

Operations between series in different time zones produce a result in UTC, because the timestamps are stored internally as UTC.

In [33]: ts
Out[33]:
2017-02-19 09:30:00    0.530722
2017-02-20 09:30:00    1.459262
2017-02-21 09:30:00   -0.038216
2017-02-22 09:30:00   -0.671159
Freq: D, dtype: float64

In [34]: ts.index
Out[34]:
DatetimeIndex(['2017-02-19 09:30:00', '2017-02-20 09:30:00',
               '2017-02-21 09:30:00', '2017-02-22 09:30:00'],
              dtype='datetime64[ns]', freq='D')

In [35]: ts1 = ts[:2].tz_localize('Europe/London')  

In [36]: ts2 = ts1.tz_convert('Europe/Moscow')

In [37]: result = ts1 + ts2  # ts1 and ts2 are in different time zones

In [38]: result.index  # the result index has been converted to UTC
Out[38]: DatetimeIndex(['2017-02-19 09:30:00+00:00', '2017-02-20 09:30:00+00:00'], dtype='datetime64[ns, UTC]', freq='D')

In [39]: result
Out[39]:
2017-02-19 09:30:00+00:00    1.061445
2017-02-20 09:30:00+00:00    2.918524
Freq: D, dtype: float64
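The same behavior can be checked with fixed values instead of random ones. This is a sketch with made-up data; the two series hold the same instants, so every row aligns and the values are simply doubled.

```python
import pandas as pd

idx = pd.DatetimeIndex(['2017-02-19 09:30', '2017-02-20 09:30'])
london = pd.Series([1.0, 2.0], index=idx).tz_localize('Europe/London')
moscow = london.tz_convert('Europe/Moscow')

# Alignment happens on the underlying instants, not on the wall clock labels
total = london + moscow

print(total)
print(total.index.tz)
```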

5. Periods and period arithmetic

A Period represents a span of time, such as a number of days or months.

In [4]: p = pd.Period(2017)

In [5]: p
Out[5]: Period('2017', 'A-DEC')

In [6]: p + 1
Out[6]: Period('2018', 'A-DEC')

In [7]: pd.Period(2018) - p
Out[7]: 1
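To make the "span of time" idea concrete, the sketch below looks at the boundaries of a single monthly period. The chosen month is arbitrary.

```python
import pandas as pd

p = pd.Period('2017-02', freq='M')

first = p.start_time                # first instant covered by the period
last_day = p.end_time.normalize()   # midnight of the last day covered
nxt = p + 1                         # adding an integer shifts by whole periods

print(first, last_day, nxt, p.days_in_month)
```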

In [8]: rng = pd.period_range('1/1/2001','6/30/2001', freq='M')

In [9]: rng
Out[9]: PeriodIndex(['2001-01', '2001-02', '2001-03', '2001-04', '2001-05', '2001-06'], dtype='int64', freq='M')

In [10]: Series(np.random.randn(6), index=rng)
Out[10]:
2001-01    1.146489
2001-02    2.112800
2001-03    0.292746
2001-04   -0.841383
2001-05   -0.845565
2001-06    1.207504
Freq: M, dtype: float64


# A PeriodIndex can also be built from a list of strings
In [11]: values = ['2001Q3','2002Q2','2003Q1']

In [13]: index = pd.PeriodIndex(values, freq='Q-DEC') # quarters of a fiscal year that ends in December

In [14]: index
Out[14]: PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='int64', freq='Q-DEC')

In [26]: index.asfreq('Q-JUN') # re-express the same quarters for a fiscal year that ends in June
Out[26]: PeriodIndex(['2002Q1', '2002Q4', '2003Q3'], dtype='int64', freq='Q-JUN')
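The relabeling above is easier to follow on a single period. In this sketch, 2001Q3 under Q-DEC covers July to September 2001; under Q-JUN the fiscal year 2002 starts in July 2001, so the same three months become 2002Q1.

```python
import pandas as pd

p_dec = pd.Period('2001Q3', freq='Q-DEC')
p_jun = p_dec.asfreq('Q-JUN')

# Same calendar months, different fiscal label
print(p_dec, p_dec.start_time.date(), p_dec.end_time.date())
print(p_jun, p_jun.qyear, p_jun.quarter)
```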

5.1 Period frequency conversion

In [15]: p
Out[15]: Period('2017', 'A-DEC') # an annual period whose year ends on December 31

In [16]: p.asfreq('M', how='start')
Out[16]: Period('2017-01', 'M')

In [17]: p.asfreq('M', how='end')
Out[17]: Period('2017-12', 'M')

In [18]: p = pd.Period('2017',freq='A-JUN') # fiscal year 2017, with the year ending at the end of June

In [19]: p.asfreq('M',how='end')
Out[19]: Period('2017-06', 'M')

In [20]: rng = pd.period_range('2006','2009',freq='A-DEC')  # one annual period per year from 2006 through 2009

In [21]: ts = Series(np.random.randn(len(rng)), index=rng)

In [22]: ts
Out[22]:
2006   -0.627032
2007   -1.409714
2008    0.072737
2009    1.240899
Freq: A-DEC, dtype: float64

In [23]: ts.asfreq('M', how='start')  # convert to monthly, taking the first month of each year
Out[23]:
2006-01   -0.627032
2007-01   -1.409714
2008-01    0.072737
2009-01    1.240899
Freq: M, dtype: float64

In [24]: ts.asfreq('B', how='end')  # convert to business-day frequency, taking the last business day of each year
Out[24]:
2006-12-29   -0.627032
2007-12-31   -1.409714
2008-12-31    0.072737
2009-12-31    1.240899
Freq: B, dtype: float64
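The how argument only matters when moving to a finer frequency, where one period maps to many. A small sketch with a monthly period (the dates are arbitrary) shows both directions.

```python
import pandas as pd

feb = pd.Period('2017-02', freq='M')

# Coarse to fine: pick which end of the month to land on
first_day = feb.asfreq('D', how='start')
last_day = feb.asfreq('D', how='end')

# Fine to coarse: the day belongs to exactly one month, so no choice is needed
month_of = pd.Period('2017-02-15', freq='D').asfreq('M')

print(first_day, last_day, month_of)
```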

5.2 Quarterly period frequencies

In [28]: rng = pd.period_range('2011Q3','2012Q4',freq='Q-JAN')

In [29]: rs = Series(np.arange(len(rng)), index=rng)

In [30]: new_rng = (rng.asfreq('B','e') - 1).asfreq('T','s') + 16*60

In [35]: rs.index = new_rng.to_timestamp()

In [36]: rs
Out[36]:
2010-10-28 16:00:00    0
2011-01-28 16:00:00    1
2011-04-28 16:00:00    2
2011-07-28 16:00:00    3
2011-10-28 16:00:00    4
2012-01-30 16:00:00    5
dtype: int64
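The chained expression in In [30] computes 4 PM on the second-to-last business day of each quarter: take the last business day, step back one, switch to minute frequency at the start of that day, then add 16 * 60 minutes. The sketch below reaches the same timestamps by a different route, using timestamps and the BDay offset rather than period arithmetic; it is an alternative for illustration, not the method used above.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

rng = pd.period_range('2011Q3', '2012Q4', freq='Q-JAN')

# Midnight on the last calendar day of each quarter
quarter_ends = rng.asfreq('D', how='end').to_timestamp()

# Roll back to the last business day, step back one more, then add 16 hours
stamps = pd.DatetimeIndex([
    BDay().rollback(day) - BDay(1) + pd.Timedelta(hours=16)
    for day in quarter_ends
])

print(stamps)
```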

5.3 Converting between timestamps and periods

In [38]: rng = pd.date_range('1/1/2001', periods=3, freq='M')

In [40]: ts = Series(np.random.randn(3), index=rng)

In [41]: pts = ts.to_period()  # convert to periods

In [42]: ts
Out[42]:
2001-01-31    0.619856
2001-02-28   -2.117066
2001-03-31    1.152329
Freq: M, dtype: float64

In [43]: pts
Out[43]:
2001-01    0.619856
2001-02   -2.117066
2001-03    1.152329
Freq: M, dtype: float64


In [45]: pts.to_timestamp(how='end')  # convert back to timestamps
Out[45]:
2001-01-31    0.619856
2001-02-28   -2.117066
2001-03-31    1.152329
Freq: M, dtype: float64
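One consequence of to_period worth seeing: periods do not overlap, so two timestamps that fall in the same month map to the same period and the resulting index can contain duplicates. A sketch with made-up dates:

```python
import pandas as pd

idx = pd.to_datetime(['2001-01-29', '2001-01-31', '2001-02-28'])
ts = pd.Series([1, 2, 3], index=idx)

# The index has no freq, so the target frequency is given explicitly
pts = ts.to_period('M')
labels = pts.index.astype(str).tolist()
has_duplicates = pts.index.duplicated().any()

# Going back with how='start' lands on the first instant of each period
starts = pts.to_timestamp(how='start').index

print(labels, has_duplicates)
print(starts)
```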

5.4 Creating a PeriodIndex from data

In [47]: q = Series(list(range(1,5)) * 7)  # quarter numbers 1-4, repeated 7 times (28 values)

In [48]: y = Series(np.arange(1988,2016))  # years 1988 through 2015 (28 values)

In [49]: index = pd.PeriodIndex(year=y,quarter=q, freq='Q-DEC')  # combine year and quarter into a PeriodIndex

In [50]: data = Series(np.random.randn(28), index=index)

In [51]: data
Out[51]:
1988Q1   -0.127187
1989Q2   -1.757196
1990Q3    0.826757
...
2013Q2    0.540955
2014Q3    0.531101
2015Q4    0.751739
Freq: Q-DEC, dtype: float64
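An alternative sketch that builds the same index from label strings, which may be useful if the year= and quarter= keyword form is not available in the installed pandas version (that availability concern is an assumption to check against your version, not something the session above shows). The names years, quarters and labels are illustrative.

```python
import numpy as np
import pandas as pd

years = np.arange(1988, 2016)        # 28 years
quarters = list(range(1, 5)) * 7     # 1, 2, 3, 4 repeated 7 times

# Build labels such as '1988Q1' and let PeriodIndex parse them
labels = [f'{y}Q{q}' for y, q in zip(years, quarters)]
index = pd.PeriodIndex(labels, freq='Q-DEC')

print(index[:3])
print(index[-1])
```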

To be continued...

Original article: https://www.cnblogs.com/felo/p/6421795.html