pandas Time Series

I. Generating a Range of Dates

1. Syntax

pd.date_range(start=None, end=None, periods=None, freq=None)
# start + end + freq: a set of timestamps at frequency freq between start and end
# start + periods + freq: `periods` timestamps at frequency freq, beginning at start

2. Examples

import pandas as pd
# Case 1: start + end + freq
date_df = pd.date_range(start='20191110', end='20191231', freq='7D')
print(date_df)

# Case 2: start + periods + freq
date_df = pd.date_range(start='2018-10-11', periods=10, freq='D')
print(date_df)
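To see exactly what the two cases above generate, a short check (the names weekly and daily are only illustrative). Note that end is an upper bound: 2019-12-31 is not a multiple of 7 days after the start, so the last value is 2019-12-29.

```python
import pandas as pd

# Case 1: every 7 days from 2019-11-10, not going past 2019-12-31
weekly = pd.date_range(start='20191110', end='20191231', freq='7D')
print(len(weekly), weekly[0].date(), weekly[-1].date())  # 8 2019-11-10 2019-12-29

# Case 2: 10 consecutive days starting at 2018-10-11
daily = pd.date_range(start='2018-10-11', periods=10, freq='D')
print(len(daily), daily[0].date(), daily[-1].date())  # 10 2018-10-11 2018-10-20
```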

3. Frequency aliases (freq)

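Alias       Offset type          Description
D           Day                  Every calendar day
B           BusinessDay          Every business day
H           Hour                 Every hour
T or min    Minute               Every minute
S           Second               Every second
L or ms     Milli                Every millisecond (1/1,000 of a second)
U           Micro                Every microsecond (1/1,000,000 of a second)
M           MonthEnd             Last calendar day of each month
BM          BusinessMonthEnd     Last business day of each month
MS          MonthBegin           First calendar day of each month
BMS         BusinessMonthBegin   First business day of each month

Note: pandas 2.2 deprecated H, T, S, L, U, M and BM in favour of h, min, s, ms, us, ME and BME.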
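A few of the aliases from the table, side by side. The start dates are arbitrary; '15min' shows that an alias can take a multiplier, the same way '7D' does earlier. When start is not itself on the offset (2019-06-01 is a Saturday), pandas rolls it forward to the first valid date.

```python
import pandas as pd

# 'B' skips weekends: 2019-11-02 and 2019-11-03 are Saturday and Sunday
bdays = pd.date_range(start='2019-11-01', periods=5, freq='B')

# 'MS' = first calendar day of each month
month_starts = pd.date_range(start='2019-01-01', periods=3, freq='MS')

# 'BMS' = first business day of the month; Saturday 2019-06-01 rolls forward to Monday 2019-06-03
first_bday = pd.date_range(start='2019-06-01', periods=1, freq='BMS')

# '15min' = every 15 minutes
quarter_hours = pd.date_range(start='2019-11-10 09:00', periods=4, freq='15min')

print(bdays)
print(month_starts)
print(first_bday)
print(quarter_hours)
```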

II. Using Time Series

1. A time series as the row index

Example

import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.random.randint(low=10, high=30, size=(10, )), 
                  index=pd.date_range(start='2018/11/10', periods=10))
print(df)
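One practical benefit of a DatetimeIndex is that rows can be selected with date strings. A minimal sketch, using np.arange instead of random numbers so the selected values are predictable:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.arange(10), index=pd.date_range(start='2018/11/10', periods=10))

# Slicing with date strings; with a DatetimeIndex both endpoints are included
subset = df.loc['2018-11-12':'2018-11-14']
print(subset)

# A partial date string such as '2018-11' selects every row in that month
november = df.loc['2018-11']
print(len(november))  # 10
```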

2. Time strings -> datetime values

Syntax

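pd.to_datetime(time_strings, format=None)
# format: the strptime-style format of the strings, e.g. '%Y-%m-%d %H:%M:%S';
# it can usually be omitted, because pandas infers common formats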
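When format does matter: a string like '10/11/2019' is ambiguous (10 November or 11 October?). A small sketch in which an explicit format states that the day comes first; the sample strings are made up for illustration.

```python
import pandas as pd

# Explicit format: day/month/year hour:minute
ts = pd.to_datetime('10/11/2019 11:00', format='%d/%m/%Y %H:%M')
print(ts)  # 2019-11-10 11:00:00

# An ISO-style string parses without a format and gives the same moment
auto = pd.to_datetime('2019-11-10 11:00:00')
print(auto == ts)  # True
```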

Example

import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.arange(30).reshape(5, 6), index=list('abcde'), columns=list('ZXCVBN'))
# print(df)
df['V'] = ['2019-11-10 11:00:00', '2019-11-10 12:00:00', '2019-11-10 13:00:00',
           '2019-11-10 14:00:00', '2019-11-10 15:00:00']
# print(df)
print(df.dtypes)
df['V'] = pd.to_datetime(df['V'])
print(df)
print(df.dtypes)
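Once a column has been converted, it supports datetime operations that plain strings do not. A short sketch using the .dt accessor and row differences; the three sample times mirror the example above.

```python
import pandas as pd

times = pd.Series(['2019-11-10 11:00:00', '2019-11-10 12:00:00', '2019-11-10 13:00:00'])
converted = pd.to_datetime(times)
print(converted.dtype)  # a datetime64 dtype

# The .dt accessor exposes components such as the hour
print(converted.dt.hour.tolist())  # [11, 12, 13]

# Differences between consecutive rows are Timedelta values
print(converted.diff().iloc[1])  # 0 days 01:00:00
```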

III. Resampling in pandas

1. What resampling is

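Resampling is the process of converting a time series from one frequency to another.
Converting high-frequency data to a lower frequency (e.g. daily to monthly) is called downsampling; converting low-frequency data to a higher frequency is called upsampling.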
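Both directions in one small sketch with made-up values. Downsampling merges several rows into one bin, so an aggregation is needed; upsampling creates new timestamps with no data, so a fill rule is needed (asfreq() leaves NaN, ffill() copies the previous value).

```python
import pandas as pd

# Downsampling: daily -> 3-day bins, summing the values in each bin
daily = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('2019-11-01', periods=6, freq='D'))
down = daily.resample('3D').sum()
print(down)  # 2019-11-01 -> 6, 2019-11-04 -> 15

# Upsampling: every 2 days -> every day
every_2d = pd.Series([10, 20, 30], index=pd.date_range('2019-11-01', periods=3, freq='2D'))
up_nan = every_2d.resample('D').asfreq()   # new days 11-02 and 11-04 are NaN
up_ffill = every_2d.resample('D').ffill()  # new days take the previous value
print(up_ffill)
```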

2. Syntax

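df.resample(rule).mean()   # rule: a frequency alias such as '10D'; .mean() stands for any aggregation (.sum(), .count(), .max(), ...)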

3. Example

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(low=10, high=200, size=(100, 1)),
                  index=pd.date_range(start='20181110 11:10:10', periods=100))
print(df)
print(df.resample('10D').count())
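The example above counts rows per bin; any other aggregation plugs into the same place. A sketch with deterministic data (np.arange) so the results can be checked by hand:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10), index=pd.date_range('2019-11-01', periods=10, freq='D'))
# resample() returns a Resampler; the aggregation runs when a method is called on it
r = s.resample('5D')

print(r.mean())               # bins 2019-11-01 and 2019-11-06: 2.0 and 7.0
print(r.max())                # 4 and 9
print(r.agg(['min', 'max']))  # several aggregations at once, one column each
```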

4. Workflow

Time strings -> datetime values -> datetime row index -> resampling
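The four steps, end to end, in one minimal sketch. The column names time and value and the sample readings are hypothetical; '60min' is used for hourly bins.

```python
import pandas as pd

# Step 1: raw data with time strings in an ordinary column
raw = pd.DataFrame({
    'time': ['2019-11-10 11:00:00', '2019-11-10 11:20:00', '2019-11-10 11:40:00',
             '2019-11-10 12:10:00', '2019-11-10 12:50:00'],
    'value': [3, 5, 7, 2, 8],
})

# Step 2: time strings -> datetime values
raw['time'] = pd.to_datetime(raw['time'])

# Step 3: use the datetime column as the row index
indexed = raw.set_index('time')

# Step 4: resample into hourly bins and aggregate
hourly = indexed.resample('60min').sum()
print(hourly)  # 11:00 -> 15, 12:00 -> 10
```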

IV. PeriodIndex

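The DatetimeIndex used so far can be thought of as a sequence of timestamps (points in time);
a PeriodIndex can be thought of as a sequence of time spans (periods).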

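1. Purpose: assemble separate time-field columns (year, month, day, hour, ...) into a single time index
2. Note: the time-field columns must hold integers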
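To make the timestamp/period distinction concrete: a Period covers a whole span and knows its own start and end, and to_timestamp() turns periods back into timestamps (the start of each span by default). The dates are arbitrary.

```python
import pandas as pd

# A Period is a span: here the whole of November 2019
p = pd.Period('2019-11', freq='M')
print(p.start_time)  # 2019-11-01 00:00:00
print(p.end_time)    # the last instant of 2019-11-30

# Convert monthly periods to timestamps marking the start of each month
periods = pd.period_range('2019-01', periods=3, freq='M')
print(periods.to_timestamp())
```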

3. Example

import numpy as np
import pandas as pd

data = pd.DataFrame(data=np.zeros((5, 6)), columns=['year', 'month', 'day', 'hour', 'a', 'b'])
# print(data)
data['year'] = [2018, 2018, 2018, 2019, 2019]
data['month'] = [1, 4, 7, 10, 11]
data['day'] = [10, 21, 14, 26, 22]
data['hour'] = [10, 13, 8, 12, 14]
data['a'] = [23, 43, 12, 53, 64]
data['b'] = [54, 27, 19, 23, 54]
print(data)
"""
   year  month  day  hour   a   b
0  2018      1   10    10  23  54
1  2018      4   21    13  43  27
2  2018      7   14     8  12  19
3  2019     10   26    12  53  23
4  2019     11   22    14  64  54
"""
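# Combine the integer columns into hourly periods.
# pandas 2.2 deprecated these year=/month=/... keywords in favour of pd.PeriodIndex.from_fields(...), and 'H' in favour of 'h'
period = pd.PeriodIndex(year=data['year'], month=data['month'], day=data['day'], hour=data['hour'], freq='H')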
print(period)

"""
PeriodIndex(['2018-01-10 10:00', '2018-04-21 13:00', '2018-07-14 08:00',
             '2019-10-26 12:00', '2019-11-22 14:00'],
            dtype='period[H]', freq='H')
"""

4. Resampling with a PeriodIndex

# Use the periods as the row index
data.set_index(period, inplace=True)
# Downsample to yearly frequency and count the rows in each year
data = data.resample('Y').count()['a']
print(data)
"""
2018    3
2019    2
"""