Finally making sense of attention: using attention to predict gold prices

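When you get right down to it, it is just two words: weighted sum!

Yes, after reading close to a hundred articles, I finally understand a little of it.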
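To make "weighted sum" concrete, here is a minimal pure-Python sketch. The scores and values are made-up numbers for illustration, not anything taken from the gold-price data: softmax turns the scores into weights that add up to 1, and the attention output is the sum of the values scaled by those weights.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(values, weights):
    """Attention output: each value scaled by its weight, then summed."""
    return sum(v * w for v, w in zip(values, weights))

scores = [2.0, 1.0, 0.1]     # hypothetical relevance scores
values = [10.0, 20.0, 30.0]  # hypothetical values to combine
weights = softmax(scores)
context = weighted_sum(values, weights)
print('weights:', weights)
print('context:', context)
```

The value with the highest score gets the largest weight, so the result is pulled towards it; with equal scores the result is just the plain average.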

# Basic idea
from keras.layers import Input, Dense, Multiply
from keras.models import Model

inputs = Input(shape=(input_dim,))

#  STARTS HERE

attention_probs = Dense(input_dim, activation='softmax', name='attention_vec')(inputs)
attention_mul = Multiply()([inputs, attention_probs])

# FINISHES HERE

attention_mul = Dense(64)(attention_mul)
output = Dense(1, activation='sigmoid')(attention_mul)
model = Model(inputs=[inputs], outputs=output)
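As a rough sketch of what the two attention lines compute, here is the same arithmetic in plain Python. The weight matrix `W` and bias `b` are hypothetical stand-ins for what the Dense layer would learn (an identity matrix and zeros, chosen only so the numbers are easy to follow): the Dense layer produces one softmax weight per input feature, and Multiply scales each feature by its weight.

```python
import math

def softmax(z):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def attention_gate(x, W, b):
    """Mimic Dense(len(x), activation='softmax') followed by Multiply()."""
    n = len(x)
    # Dense layer: z_j = sum_i x_i * W[i][j] + b_j
    z = [sum(x[i] * W[i][j] for i in range(n)) + b[j] for j in range(n)]
    probs = softmax(z)                             # one weight per feature
    gated = [xi * pi for xi, pi in zip(x, probs)]  # element-wise product
    return probs, gated

x = [1.0, 2.0, 3.0]                                        # hypothetical input features
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]    # hypothetical kernel (identity)
b = [0.0, 0.0, 0.0]                                        # hypothetical bias
probs, gated = attention_gate(x, W, b)
print('weights:', probs)
print('gated:', gated)
```

Note that the result of Multiply is still a vector of the same length as the input; the "sum" part only happens in the Dense layer that follows, which adds up its weighted inputs.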

https://www.cnblogs.com/LittleHann/p/9722779.html#_label3_1_1_1

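The blog post above turned out to be the key to finally getting the code to run. I have to complain that most posts these days just copy one another: many of them change the code without annotating the changes, or leave the original comments sitting above the modified code, which is really exasperating.

Here is how to finally get this code running, drawing on this post as well: https://www.joinquant.com/view/community/detail/301c73f3088d6d768a499d3f519f00e8?type=1&page=1

The data preparation is the same as that author's; what follows is mainly the model-building part.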
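The data preparation itself is not reproduced here, so the following is only a sketch of one common way to cut a series into windows for a model like this, not the linked author's code. The function name `make_windows` and the sample prices are hypothetical; each window of `time_steps` past values is paired with the value that comes right after it.

```python
def make_windows(series, time_steps):
    """Split a 1-D series into (window, next value) pairs."""
    X, y = [], []
    for start in range(len(series) - time_steps):
        X.append(series[start:start + time_steps])  # time_steps past values
        y.append(series[start + time_steps])        # the value to predict
    return X, y

prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical price series
X, y = make_windows(prices, time_steps=3)
print(X)  # [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(y)  # [4.0, 5.0, 6.0]
```

For the model below, each window would additionally need a feature axis so that the input has shape (samples, TIME_STEPS, INPUT_DIM).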

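from keras.layers import Input, Dense, Conv1D, MaxPooling1D, Dropout, Bidirectional, LSTM, Multiply
from keras.models import Model

# Convolutional layer
inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
x = Conv1D(filters = 64, kernel_size = 1, activation = 'relu')(inputs)
x = MaxPooling1D(pool_size = 5)(x)
x = Dropout(0.2)(x)

# Recurrent layers
lstm_out = Bidirectional(LSTM(lstm_units,return_sequences=True,activation='relu'), name='bilstm')(x)
print('bidirectional',lstm_out.shape)
# Note: this LSTM also takes x as its input, so it replaces the bidirectional
# output above rather than stacking on it; pass lstm_out instead of x to stack them.
lstm_out = LSTM(lstm_units,activation='relu')(x)
print('unidirectional',lstm_out.shape)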
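To check the shapes that the print statements report, the arithmetic can be traced by hand. This sketch assumes the Keras defaults (padding='valid', pooling stride equal to pool_size, and merge_mode='concat' for Bidirectional); the values of TIME_STEPS, INPUT_DIM and lstm_units are hypothetical, since the real ones come from the data preparation.

```python
def conv1d_steps(steps, kernel_size, strides=1):
    """Output length of Conv1D with padding='valid'."""
    return (steps - kernel_size) // strides + 1

def maxpool1d_steps(steps, pool_size, strides=None):
    """Output length of MaxPooling1D with padding='valid'."""
    strides = pool_size if strides is None else strides  # stride defaults to pool_size
    return (steps - pool_size) // strides + 1

TIME_STEPS, INPUT_DIM, lstm_units = 20, 10, 64  # hypothetical values
filters = 64

steps = conv1d_steps(TIME_STEPS, kernel_size=1)  # kernel_size=1 keeps the length
steps = maxpool1d_steps(steps, pool_size=5)      # pooling divides the length by 5
print('after conv + pool:', (steps, filters))
print('bidirectional, return_sequences=True:', (steps, 2 * lstm_units))
print('unidirectional, last state only:', (lstm_units,))
```

With these numbers the 20 time steps shrink to 4 after pooling, so TIME_STEPS has to be at least 5 for pool_size=5 to leave anything for the LSTM.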


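# This is where attention comes in. Again, it is just a weighted sum; here I attend to the output of the last, unidirectional LSTM layer.
# The 64 must equal the dimension of the layer being attended to, i.e. the last dimension of the
# lstm_out passed in the final parentheses (so lstm_units has to be 64 here).
attention_probs = Dense(64, activation='softmax', name='attention_vec')(lstm_out)
# The line above computes the weights
print(lstm_out.shape)
print('weights',attention_probs.shape)
# Next, weight the LSTM output by multiplying it element-wise with those weights
attention_mul = Multiply()([lstm_out, attention_probs])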


# Finally, feed the weighted output into a fully connected layer
attention_mul = Dense(64)(attention_mul)
output = Dense(1, activation='sigmoid')(attention_mul)

# Assemble the model from the input and output layers
model = Model(inputs=inputs, outputs=output)
# print(model.summary())
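The model above weights the features of a single LSTM output vector. For comparison, here is a sketch of the other common flavour, where the weights run over time steps and the hidden states are actually summed. This is not what the code above does; `score_vector` is a hypothetical stand-in for a learned parameter and the hidden states are made-up numbers.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(hidden_states, score_vector):
    """Weight each time step, then sum the hidden states over time.

    hidden_states: list of T vectors of length H (one per time step)
    score_vector:  hypothetical learned vector of length H
    """
    # One scalar score per time step: dot product with the score vector
    scores = [sum(h_k * v_k for h_k, v_k in zip(h, score_vector))
              for h in hidden_states]
    weights = softmax(scores)  # one weight per time step
    size = len(hidden_states[0])
    # Weighted sum over time, computed separately for each hidden unit
    context = [sum(w * h[k] for w, h in zip(weights, hidden_states))
               for k in range(size)]
    return weights, context

hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical, T=3, H=2
score_vector = [1.0, 1.0]                             # hypothetical
weights, context = temporal_attention(hidden_states, score_vector)
print('weights over time:', weights)
print('context vector:', context)
```

In Keras this variant would need an LSTM with return_sequences=True so that every time step's hidden state is available.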

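# Compile, train, and evaluate the model
import matplotlib.pyplot as plt

# X_train, y_train, X_test, y_test and batch_size come from the data preparation step
epochs = 10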

model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, shuffle=False)
y_pred = model.predict(X_test)
print('MSE Train loss:', model.evaluate(X_train, y_train, batch_size=batch_size))
print('MSE Test loss:', model.evaluate(X_test, y_test, batch_size=batch_size))
plt.plot(y_test, label='test')
plt.plot(y_pred, label='pred')
plt.legend()
plt.show()

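As for the GRU part and the red annotations in that blog post, they do not seem to need running. That part defines an additional, custom attention mechanism, roughly like inserting a GRU layer between two LSTM cells.

That said, the blog post itself may still have some problems, for example in how it handles normalization.

If you still have questions about the code or the theory, feel free to leave a comment.

One last note: if the task is not sequence to sequence, you can also try applying the computation directly to the time series itself. See the article below.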
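I won't guess at exactly what goes wrong there, but since the model above ends in a sigmoid, the targets do have to lie in [0, 1]. Here is a minimal min-max scaling sketch under the assumption that the scaler is fitted on the training split only; the function names and prices are hypothetical and this is not the linked post's code.

```python
def fit_min_max(train_values):
    """Learn the scaling range from the training split only."""
    lo, hi = min(train_values), max(train_values)
    if hi == lo:
        raise ValueError('cannot scale a constant series')
    return lo, hi

def scale(values, lo, hi):
    """Map values so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]

def unscale(values, lo, hi):
    """Map scaled values back to the original units."""
    return [v * (hi - lo) + lo for v in values]

train = [1200.0, 1250.0, 1300.0]  # hypothetical training prices
test = [1280.0, 1320.0]           # hypothetical test prices
lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)
print('train scaled:', train_scaled)
print('test scaled:', test_scaled)
print('restored:', unscale(test_scaled, lo, hi))
```

The second test price scales to a value above 1, which a sigmoid output can never reach, so a rising price series is worth checking for this before trusting the test loss.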

https://zhuanlan.zhihu.com/p/46148045