Usage of RNN, RNNCell, LSTM, LSTMCell, GRU, and GRUCell in PyTorch

First of all, the official documentation covers every one of these:

RNN: https://pytorch.org/docs/stable/generated/torch.nn.RNN.html

RNNCell: https://pytorch.org/docs/stable/generated/torch.nn.RNNCell.html

LSTM: https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html

LSTMCell: https://pytorch.org/docs/stable/generated/torch.nn.LSTMCell.html

GRU: https://pytorch.org/docs/stable/generated/torch.nn.GRU.html

GRUCell: https://pytorch.org/docs/stable/generated/torch.nn.GRUCell.html

These are just my own notes.

I use LSTM and LSTMCell as the examples.

Structure of the LSTM

LSTM: definition and dimensions of the input, output, and weights

LSTM parameters:

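input_size: number of features in the input x
hidden_size: number of features in the hidden state h
num_layers: number of stacked layers, default 1
batch_first: if True, the input and output are (batch, seq, feature); otherwise (seq, batch, feature). Default: False
bidirectional: default False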
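
To make the effect of batch_first and bidirectional concrete, here is a small sketch that only checks shapes. The sizes (10, 20, 2 layers, batch 3, length 5) are arbitrary values chosen for illustration. Note that batch_first changes the layout of the input and output only; h_n keeps the (D*num_layers, N, H_out) layout.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Default batch_first=False: input is (seq, batch, feature)
lstm_seq_first = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
x_seq_first = torch.randn(5, 3, 10)            # (L=5, N=3, H_in=10)
out_seq_first, _ = lstm_seq_first(x_seq_first)

# batch_first=True and bidirectional=True: input is (batch, seq, feature)
lstm_batch_first = nn.LSTM(input_size=10, hidden_size=20, num_layers=2,
                           batch_first=True, bidirectional=True)
x_batch_first = torch.randn(3, 5, 10)          # (N=3, L=5, H_in=10)
out_batch_first, (h_n, c_n) = lstm_batch_first(x_batch_first)

print(out_seq_first.shape)     # torch.Size([5, 3, 20])
print(out_batch_first.shape)   # torch.Size([3, 5, 40]), last dim is D*H_out
print(h_n.shape)               # torch.Size([4, 3, 20]), first dim is D*num_layers
```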

input:

input: tensor of shape (L, N, H_in) when batch_first=False, otherwise (N, L, H_in)
h_0: tensor of shape (D*num_layers, N, H_out); defaults to zeros if (h_0, c_0) is not provided
c_0: tensor of shape (D*num_layers, N, H_cell); defaults to zeros if (h_0, c_0) is not provided
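
A quick way to convince yourself of the zero default is to run the same input twice, once without initial states and once with explicit zero tensors. This is a minimal sketch with arbitrary sizes, not an excerpt from the docs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(5, 3, 10)                      # (L, N, H_in)

# Explicit zero initial states: (D*num_layers, N, hidden_size)
h0 = torch.zeros(2, 3, 20)
c0 = torch.zeros(2, 3, 20)

out_default, (hn_default, cn_default) = lstm(x)          # no (h_0, c_0) given
out_zeros, (hn_zeros, cn_zeros) = lstm(x, (h0, c0))      # zeros passed explicitly

same_output = torch.allclose(out_default, out_zeros)
print(same_output)
```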

where:

N = batch size

L = sequence length

D = 2 if bidirectional=True otherwise 1

H_in = input_size

H_cell = hidden_size

H_out = proj_size if proj_size>0 otherwise hidden_size (in practice this is usually just hidden_size)
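
The only case where H_out and H_cell differ is when proj_size is set. The sketch below assumes a PyTorch release recent enough to accept the proj_size argument; the value 8 is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# hidden_size=20 is H_cell; proj_size=8 becomes H_out
lstm_proj = nn.LSTM(input_size=10, hidden_size=20, proj_size=8)
x = torch.randn(5, 3, 10)                      # (L, N, H_in)
output, (h_n, c_n) = lstm_proj(x)

print(output.shape)   # torch.Size([5, 3, 8]),  uses H_out
print(h_n.shape)      # torch.Size([1, 3, 8]),  uses H_out
print(c_n.shape)      # torch.Size([1, 3, 20]), uses H_cell
```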

Output:

output: (L, N, D*H_out) when batch_first=False; a sequence of length L, [h_1[-1], h_2[-1], ..., h_L[-1]], i.e. the hidden states of the last layer at every time step
h_n: tensor of shape (D*num_layers, N, H_out)
c_n: tensor of shape (D*num_layers, N, H_cell)
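
For the bidirectional case it helps to check how output lines up with h_n. In this sketch (sizes are arbitrary) the forward half of output at the last time step should match the forward final state of the top layer, and the backward half at the first time step should match the backward final state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H = 20
lstm = nn.LSTM(input_size=10, hidden_size=H, num_layers=2, bidirectional=True)
x = torch.randn(5, 3, 10)                      # (L, N, H_in)
output, (h_n, c_n) = lstm(x)                   # output: (L, N, 2*H)

# h_n is ordered layer by layer, forward then backward within each layer,
# so h_n[-2] is the top layer's forward state and h_n[-1] its backward state.
fwd_match = torch.allclose(output[-1, :, :H], h_n[-2])   # forward ends at t = L-1
bwd_match = torch.allclose(output[0, :, H:], h_n[-1])    # backward ends at t = 0

print(fwd_match, bwd_match)
```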

Variables:

This part seems to have changed in newer versions of the documentation.

all_weights
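
Rather than relying on the docs for the attribute names, one can simply list the parameters of a module. A small sketch, assuming a 2-layer unidirectional LSTM with the same sizes used elsewhere in these notes:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# Collect parameter names and shapes, e.g. weight_ih_l0, weight_hh_l0, ...
shapes = {name: tuple(p.shape) for name, p in lstm.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```

The first layer's weight_ih_l0 has input_size columns, while weight_ih_l1 has hidden_size columns, because the second layer takes the first layer's hidden state as its input.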

Examples:

>>> import torch
>>> import torch.nn as nn
>>> rnn = nn.LSTM(10, 20, 2)   # (input_size, hidden_size, num_layers)
>>> input = torch.randn(5, 3, 10) # (time_steps, batch, input_size)
>>> h0 = torch.randn(2, 3, 20)  # (num_layers, batch_size, hidden_size)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))  # output: (time_steps, batch, hidden_size)
>>> # output[-1] equals hn[-1]: the last time step of the top layer

LSTM Cell

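An LSTMCell is a single unit of an LSTM: it computes one time step, and an LSTM layer is that cell applied repeatedly along the sequence.

Structure

Compared with LSTM, there is no time index t, since the cell only handles one step.

Parameters:

Only input_size and hidden_size; there is no num_layers.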

Inputs:

input: (batch, input_size)
h_0: (batch, hidden_size)
c_0: (batch, hidden_size)

Outputs:

h_1: (batch, hidden_size)
c_1: (batch, hidden_size)

Variables:

weight_ih: input-hidden weights, of shape (4*hidden_size, input_size); the weights left-multiply the input (W*input), and the four gate matrices are stacked together, hence 4*hidden_size
weight_hh: hidden-hidden weights, of shape (4*hidden_size, hidden_size)
bias_ih: input-hidden bias, of shape (4*hidden_size)
bias_hh: hidden-hidden bias, of shape (4*hidden_size)
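
To see why everything is 4*hidden_size, the cell's step can be reproduced by hand from these four tensors. This is a sketch of the standard LSTM equations, assuming the stacked rows are ordered input, forget, cell (g), output, which is the layout PyTorch documents; the variable names are my own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(10, 20)                     # (input_size, hidden_size)
x = torch.randn(3, 10)                         # (batch, input_size)
h0 = torch.randn(3, 20)                        # (batch, hidden_size)
c0 = torch.randn(3, 20)

# One matmul per weight gives all four gates at once: (batch, 4*hidden_size)
gates = x @ cell.weight_ih.T + cell.bias_ih + h0 @ cell.weight_hh.T + cell.bias_hh
i, f, g, o = gates.chunk(4, dim=1)             # each chunk is (batch, hidden_size)
i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
g = torch.tanh(g)

c1_manual = f * c0 + i * g                     # new cell state
h1_manual = o * torch.tanh(c1_manual)          # new hidden state

h1, c1 = cell(x, (h0, c0))
h_match = torch.allclose(h1, h1_manual, atol=1e-5)
c_match = torch.allclose(c1, c1_manual, atol=1e-5)
print(h_match, c_match)
```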

Example:

>>> import torch
>>> import torch.nn as nn
>>> rnn = nn.LSTMCell(10, 20) # (input_size, hidden_size)
>>> input = torch.randn(2, 3, 10) # (time_steps, batch, input_size)
>>> hx = torch.randn(3, 20) # (batch, hidden_size)
>>> cx = torch.randn(3, 20)
>>> output = []
>>> for i in range(input.size(0)):  # one cell call per time step
...     hx, cx = rnn(input[i], (hx, cx))
...     output.append(hx)
>>> output = torch.stack(output, dim=0)  # (time_steps, batch, hidden_size)
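
The loop above is essentially what a single-layer, unidirectional nn.LSTM does internally. As a sanity check, the sketch below copies the layer-0 weights of an LSTM into an LSTMCell and compares the two outputs; the sizes and variable names are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(10, 20, num_layers=1)
cell = nn.LSTMCell(10, 20)

# Give the cell the same weights as layer 0 of the LSTM
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(5, 3, 10)                      # (time_steps, batch, input_size)
h0 = torch.randn(1, 3, 20)                     # (num_layers, batch, hidden_size)
c0 = torch.randn(1, 3, 20)

out_lstm, (hn, cn) = lstm(x, (h0, c0))

# Step the cell through the sequence manually
hx, cx = h0[0], c0[0]
steps = []
for t in range(x.size(0)):
    hx, cx = cell(x[t], (hx, cx))
    steps.append(hx)
out_cell = torch.stack(steps, dim=0)           # (time_steps, batch, hidden_size)

outputs_match = torch.allclose(out_lstm, out_cell, atol=1e-5)
print(outputs_match)
```

With more than one layer, the same idea needs one cell per layer, feeding each layer's hidden state to the next as its input.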
