【python】The different dropout variants

Neural networks can handle nonlinear problems thanks to the nonlinear expressive power of activation functions; mathematically, gradient-based training relies on the network being differentiable almost everywhere.
Dropout itself is not an activation function but a regularization operation; Python deep-learning frameworks provide several dropout functions, and they differ in their details.
Dropout is used to prevent or mitigate overfitting, and it is typically applied to fully connected layers. Research has also shown that it can be used in convolutional layers (though it is not well suited to small convolution kernels).

Dropout in PyTorch: the probability parameter p is the probability of zeroing an element.
Dropout in TensorFlow: the probability parameter keep_prob is the probability of keeping an element.
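
A one-line reminder of how the two conventions relate (a minimal sketch; the variable names are illustrative, not from either library):

p = 0.3                  # PyTorch convention: probability of dropping an element
keep_prob = 1.0 - p      # TensorFlow 1.x convention: probability of keeping an element
# Both frameworks scale the surviving elements by 1/keep_prob = 1/(1-p),
# so the expected value of each activation stays the same.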

torch.nn.Dropout

Reference source: torch.nn.Dropout

class torch.nn.Dropout(p: float = 0.5, inplace: bool = False) 
# Input: (*). Input can be of any shape
# Output: (*). Output is of the same shape as input
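
A minimal usage sketch (assuming a recent PyTorch; the printed values are random), showing that the module drops elements in training mode and is the identity in evaluation mode:

import torch
import torch.nn as nn

m = nn.Dropout(p=0.5)
x = torch.ones(4)

m.train()                 # training mode: elements zeroed with probability p
print(m(x))               # e.g. tensor([2., 0., 2., 0.]) -- survivors scaled by 1/(1-p)

m.eval()                  # evaluation mode: identity function
print(m(x))               # tensor([1., 1., 1., 1.])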

torch.nn.functional.dropout

Reference source: torch.nn.functional.dropout

torch.nn.functional.dropout(input, p=0.5, training=True, inplace=False)
# During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.
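
Unlike the module version, the raw functional call does not track model.train()/model.eval(); the caller passes training explicitly. A minimal sketch, assuming a recent PyTorch:

import torch
import torch.nn.functional as F

x = torch.ones(4)
print(F.dropout(x, p=0.5, training=True))   # randomly zeroed, survivors scaled by 2
print(F.dropout(x, p=0.5, training=False))  # identity: tensor([1., 1., 1., 1.])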

torch.nn.modules.dropout

The parameter p is the probability that an element is set to 0.
Reference: source code for torch.nn.modules.dropout.

class Dropout(_DropoutNd):
    r"""During training, randomly zeroes some of the elements of the input tensor with probability :attr:`p` using samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call.

    This has proven to be an effective technique for regularization and preventing the co-adaptation of neurons as described in the paper `Improving neural networks by preventing co-adaptation of feature detectors`_ .

    Furthermore, the outputs are scaled by a factor of :math:`\frac{1}{1-p}` during training. This means that during evaluation the module simply computes an identity function.

    Args:
        p: probability of an element to be zeroed. Default: 0.5
        inplace: If set to ``True``, will do this operation in-place. Default: ``False``

    Shape:
        - Input: :math:`(*)`. Input can be of any shape
        - Output: :math:`(*)`. Output is of the same shape as input

    Examples::

        >>> m = nn.Dropout(p=0.2)
        >>> input = torch.randn(20, 16)
        >>> output = m(input)

    .. _Improving neural networks by preventing co-adaptation of feature detectors: https://arxiv.org/abs/1207.0580
    """

    def forward(self, input: Tensor) -> Tensor:
        return F.dropout(input, self.p, self.training, self.inplace)
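
The :math:`\frac{1}{1-p}` scaling in the docstring keeps the expected activation unchanged. A quick empirical check (a sketch, with an arbitrary seed for reproducibility):

import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.Dropout(p=0.2)
x = torch.ones(1000000)
y = m(x)
print(y.mean())           # ~1.0: about 80% of entries become 1/0.8 = 1.25, the rest are 0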

tf.nn.dropout

The dropout function uses the probability keep_prob to decide whether each neuron is suppressed. A suppressed neuron outputs 0; a kept neuron outputs 1/keep_prob times its input, so the expected activation is unchanged.
See also: how to use dropout in TensorFlow.

def dropout(x, keep_prob, noise_shape=None, seed=None, name=None)

Here keep_prob is the keep probability, i.e. the fraction of elements to retain. In TensorFlow 1.x it is typically defined as a placeholder and fed in at run time; when keep_prob=1, 100% of elements are kept and dropout has no effect. (In TensorFlow 2.x, tf.nn.dropout instead takes rate = 1 - keep_prob.)

In the signature above, x is a floating-point tensor, and keep_prob is a floating-point scalar in (0, 1] giving the probability that each element of x is kept. noise_shape is a 1-D int32 tensor representing the shape for randomly generated keep/drop flags, and the shape it specifies must be broadcastable to the shape of x. If x has shape [k, l, m, n] and noise_shape is also [k, l, m, n], then every element of x is kept or dropped independently. If instead x has shape [k, l, m, n] and noise_shape is [k, 1, 1, n], elements are kept or dropped independently along dimensions 0 and 3, while along dimensions 1 and 2 an entire slice is either kept together or dropped together.
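
A minimal sketch of noise_shape broadcasting, assuming TensorFlow 2.x (where tf.nn.dropout takes rate = 1 - keep_prob rather than keep_prob):

import tensorflow as tf

x = tf.ones([2, 3, 3, 4])                   # shape [k, l, m, n]

# noise_shape [2, 1, 1, 4]: one keep/drop flag per (dim-0, dim-3) pair,
# broadcast over dims 1 and 2, so each 3x3 slice is kept or dropped whole.
y = tf.nn.dropout(x, rate=0.5, noise_shape=[2, 1, 1, 4])

print(y[0, :, :, 0])      # every value in this slice is either 0.0 or 2.0
                          # (kept values are scaled by 1/(1-rate) = 2)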

Original article: https://www.cnblogs.com/ytxwzqin/p/13667211.html