zipline-benchmarks.py文件改写

改写原因:在这个模块中的 get_benchmark_returns() 方法回去谷歌财经下载对应SPY(类似于上证指数)的数据,但是Google上下载的数据在最后写入Io操作的时候会报一个恶心的编码的错误,很烦人,时好时坏的那种,就是图下这种报错。

                     

改写方式:

  1.首先去雅虎财经下载SPY.csv文件,然后把这个文件放到你对应的目录下

  2.具体代码如下

#
# Copyright 2013 Quantopian, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import pandas as pd
import pytz
from datetime import datetime

import pandas_datareader.data as pd_reader


def get_benchmark_returns(symbol, first_date, last_date):
    """
    Get a Series of benchmark returns from Google associated with `symbol`.
    Default is `SPY`.

    Parameters
    ----------
    symbol : str
        Benchmark symbol for which we're getting the returns.
    first_date : pd.Timestamp
        First date for which we want to get data.
    last_date : pd.Timestamp
        Last date for which we want to get data.

    The furthest date that Google goes back to is 1993-02-01. It has missing
    data for 2008-12-15, 2009-08-11, and 2012-02-02, so we add data for the
    dates for which Google is missing data.

    We're also limited to 4000 days worth of data per request. If we make a
    request for data that extends past 4000 trading days, we'll still only
    receive 4000 days of data.

    first_date is **not** included because we need the close from day N - 1 to
    compute the returns for day N.
    """
    # 源码
    # data = pd_reader.DataReader(
    #     symbol,
    #     'google',
    #     first_date,
    #     last_date
    # )
    #
    # data = data['Close']
    #
    # data[pd.Timestamp('2008-12-15')] = np.nan
    # data[pd.Timestamp('2009-08-11')] = np.nan
    # data[pd.Timestamp('2012-02-02')] = np.nan
    #
    # data = data.fillna(method='ffill')
    # return data.sort_index().tz_localize('UTC').pct_change(1).iloc[1:]

    # 自己写的代码
    # parse = lambda x: pytz.utc.localize(datetime.strptime(x, '%Y-%m-%d'))
    # data = pd.read_csv("SPY.csv", parse_dates=['Date'], index_col=0, date_parser=parse)
    # data = data['Close']
    # data = data.fillna(method='ffill')
    # return data.sort_index().pct_change(1).iloc[0:]

总结:

  1.这次报错后,我习惯性的找到最底层也就是最后两行错误,但是只能知道是编码的错误,但是解决不了。所以以后碰到类似的第三方包的错误,不要急着从最底层开始改,应该适当的想想,我在最开始出错的地方可不可以成功的避免掉,这也是一种思路。

  2.Google财经的SPY数据不全,所以他在这个方法中定义了三列是因为这三天的数据他没有。

原文地址:https://www.cnblogs.com/wuyongqiang/p/7943499.html