Saving and Loading Preprocessed Data

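In machine learning, data usually has to be preprocessed before training. Because a model typically goes through many rounds of hyperparameter tuning, the preprocessed data may be needed many times. Repeating the preprocessing on every run is wasted work, so we can save the result to disk once and reload it later.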

# Use the pickle module to store the processed data in pickle format so it can be reloaded later, i.e. create a checkpoint
import os
import pickle

pickle_file = 'notMNIST.pickle'
if not os.path.isfile(pickle_file):    # only save if the file does not already exist
    print('Saving data to pickle file...')
    try:
        with open(pickle_file, 'wb') as pfile:  # write to the same path that was checked above
            pickle.dump(
                {
                    'X_train': X_train,
                    'X_test': X_test,
                    'y_train': y_train,
                    'y_test': y_test,
                },
                pfile, pickle.HIGHEST_PROTOCOL)
    except Exception as e:
        print('Unable to save data to', pickle_file, ':', e)
        raise
print('Data cached in pickle file.')

# Read the data back from the pickle file
pickle_file = 'notMNIST.pickle'  # must match the path used when saving
with open(pickle_file, 'rb') as f:
    pickle_data = pickle.load(f)       # deserialize; the inverse of pickle.dump
    X_train = pickle_data['X_train']
    X_test = pickle_data['X_test']
    y_train = pickle_data['y_train']
    y_test = pickle_data['y_test']
    del pickle_data  # drop the extra reference to the loaded dict
print('Data and modules loaded.')
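
The save and load steps above can also be folded into one helper that loads the cache when it exists and otherwise builds and saves it. The following is only a minimal sketch: `load_or_build`, `fake_preprocess`, and the `demo_cache.pickle` path are hypothetical names introduced for illustration, and the tiny dictionary stands in for real preprocessed arrays.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn):
    """Return the cached object if cache_path exists; otherwise build, save, and return it."""
    if os.path.isfile(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    data = build_fn()
    with open(cache_path, 'wb') as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    return data


def fake_preprocess():
    # Hypothetical stand-in for the real (expensive) preprocessing step.
    return {'X_train': [[0.0, 1.0]], 'y_train': [1]}


# Use a temporary directory so the demo does not touch real files.
cache_path = os.path.join(tempfile.mkdtemp(), 'demo_cache.pickle')
first = load_or_build(cache_path, fake_preprocess)   # builds and saves
second = load_or_build(cache_path, fake_preprocess)  # loads from the cache
```

With this shape, the tuning script calls `load_or_build` on every run and only pays the preprocessing cost the first time. If the preprocessing logic changes, the cache file has to be deleted by hand, since the helper does not detect stale data.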