Normalization in Deep Learning

Swin Transformer


Author: elfin




1. Batch Normalization

When using BN, we only need to pass the number of channels to torch.nn.BatchNorm2d(). It computes the mean and variance separately on each channel and then normalizes.
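For reference, the transform BatchNorm2d applies on each channel is y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta, where gamma and beta are learnable parameters (initialized to 1 and 0) and eps defaults to 1e-5. Below is a minimal sketch of the per-channel computation at initialization, ignoring the running statistics used at eval time (batch_norm_2d_sketch is our own illustrative helper, not a PyTorch API):

import torch

def batch_norm_2d_sketch(x, eps=1e-5):
    # Per-channel statistics over the batch and spatial dims (N, H, W)
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # biased variance, as BN uses
    return (x - mean) / torch.sqrt(var + eps)  # gamma=1, beta=0 at init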

1.1 Preparing the data

import torch
BatchNorm2d = torch.nn.BatchNorm2d
test = torch.rand((1,3,2,2))

1.2 Inspecting the data

test[0,:,:,:]
tensor([[[6.7027e-01, 5.3149e-01],
         [4.6797e-01, 3.1028e-02]],

        [[4.1371e-01, 1.2022e-04],
         [2.3150e-01, 2.5120e-01]],

        [[5.2258e-01, 9.6350e-02],
         [4.6467e-01, 3.6091e-01]]])

1.3 Applying BN

BatchNorm2d(3)(test)
Out:
    tensor([[[[ 1.0252e+00,  4.4465e-01],
              [ 1.7894e-01, -1.6488e+00]],

             [[ 1.2858e+00, -1.5194e+00],
              [ 4.9993e-02,  1.8359e-01]],

             [[ 9.8744e-01, -1.6194e+00],
              [ 6.3328e-01, -1.3302e-03]]]], grad_fn=<NativeBatchNormBackward>)
(test[0,0,:,:] - test[0,0,:,:].numpy().mean()) / test[0,0,:,:].numpy().std()
Out:
    tensor([[ 1.0253,  0.4447],
            [ 0.1790, -1.6489]])
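
The last digits differ slightly because BatchNorm2d adds an eps term (default 1e-5) to the variance before taking the square root, which the manual calculation omits. A sketch of the exact computation:

import numpy as np
x = test[0,0,:,:].numpy()
(x - x.mean()) / np.sqrt(x.var() + 1e-5)  # should match the BN output to display precision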

Here we can clearly see that the two computations agree! Next, we test the case where batch_size is not 1:

test = torch.rand((10,3,2,2))
BatchNorm2d(3)(test)[0,0,:,:]
Out:
    tensor([[ 1.6257,  0.5479],
            [-1.3761,  0.8000]], grad_fn=<SliceBackward>)
res = (test[:,0,:,:] - test[:,0,:,:].numpy().mean()) / test[:,0,:,:].numpy().std()
res[0,:,:]
Out:
    tensor([[ 1.6258,  0.5479],
            [-1.3762,  0.8000]])
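
The outputs match again. The same check can be vectorized over all three channels at once, which makes the reduction dims explicit; a sketch (per-channel statistics are taken over dims (0, 2, 3), i.e. batch and spatial positions):

mean = test.mean(dim=(0, 2, 3), keepdim=True)
std = test.std(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (test - mean) / std  # same shape and, up to eps, same values as BatchNorm2d(3)(test)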


2. Layer Normalization

LN (Layer Normalization) also standardizes the data, but not across samples: the statistics are collected only within a single sample.

There are several ways to configure the parameters of torch.nn.LayerNorm():

>>> input = torch.randn(20, 5, 10, 10)
>>> # With Learnable Parameters
>>> m = nn.LayerNorm(input.size()[1:])
>>> # Without Learnable Parameters
>>> m = nn.LayerNorm(input.size()[1:], elementwise_affine=False)
>>> # Normalize over last two dimensions
>>> m = nn.LayerNorm([10, 10])
>>> # Normalize over last dimension of size 10
>>> m = nn.LayerNorm(10)
>>> # Activating the module
>>> output = m(input)
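
In every configuration, normalized_shape must match the trailing dimension(s) of the input. A minimal sketch of what LayerNorm computes with the affine weights at their defaults (gamma=1, beta=0, eps=1e-5); layer_norm_sketch is our own illustrative helper:

import torch

def layer_norm_sketch(x, n_dims, eps=1e-5):
    # Statistics over the last n_dims dimensions, computed separately per sample
    dims = tuple(range(x.dim() - n_dims, x.dim()))
    mu = x.mean(dim=dims, keepdim=True)
    var = x.var(dim=dims, unbiased=False, keepdim=True)
    return (x - mu) / torch.sqrt(var + eps)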

2.1 Inspecting the data

squence = torch.rand((2,3,10))
squence
Out:
    tensor([[[0.1151, 0.9571, 0.5986, 0.4692, 0.7029, 0.5159, 0.4494, 0.9428,
              0.9714, 0.9938],
             [0.6456, 0.5997, 0.7542, 0.7266, 0.7021, 0.2900, 0.7044, 0.1627,
              0.3725, 0.9454],
             [0.9398, 0.3861, 0.5276, 0.8783, 0.8319, 0.1181, 0.6185, 0.9689,
              0.6393, 0.7770]],

            [[0.2786, 0.8901, 0.7228, 0.3740, 0.4186, 0.6857, 0.8438, 0.4762,
              0.4106, 0.4823],
             [0.5199, 0.7644, 0.2987, 0.3745, 0.6000, 0.7266, 0.0854, 0.1954,
              0.5413, 0.1656],
             [0.5487, 0.2655, 0.9256, 0.7352, 0.4081, 0.8017, 0.7130, 0.5364,
              0.5441, 0.8483]]])

2.2 Specifying one dimension

LN = torch.nn.LayerNorm
LN(10)(squence)
Out:
    tensor([[[-1.9932,  1.0227, -0.2616, -0.7251,  0.1120, -0.5578, -0.7961,
              0.9712,  1.0739,  1.1540],
             [ 0.2423,  0.0411,  0.7180,  0.5971,  0.4899, -1.3160,  0.5000,
              -1.8739, -0.9546,  1.5561],
             [ 1.0619, -1.1060, -0.5519,  0.8214,  0.6396, -2.1551, -0.1960,
              1.1760, -0.1146,  0.4246]],

            [[-1.3968,  1.6568,  0.8218, -0.9200, -0.6974,  0.6363,  1.4258,
              -0.4100, -0.7372, -0.3793],
             [ 0.4093,  1.4885, -0.5673, -0.2324,  0.7629,  1.3218, -1.5084,
              -1.0233,  0.5037, -1.1548],
             [-0.4265, -1.8654,  1.4884,  0.5210, -1.1411,  0.8590,  0.4084,
              -0.4893, -0.4500,  1.0955]]], grad_fn=<NativeLayerNormBackward>)
(squence[0,0,:] - squence[0,0,:].numpy().mean()) / squence[0,0,:].numpy().std()
Out:
    tensor([-1.9934,  1.0227, -0.2617, -0.7252,  0.1120, -0.5578, -0.7961,
            0.9713,  1.0740,  1.1540])

Comparing the two results, we see that the operation works only over the last dimension!
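
We can confirm this for every position at once with a broadcast check (atol is loose because the manual version omits eps):

mu = squence.mean(dim=-1, keepdim=True)
std = squence.std(dim=-1, unbiased=False, keepdim=True)
torch.allclose((squence - mu) / std, LN(10)(squence), atol=1e-3)  # expected: True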

2.3 Specifying two dimensions

squence2 = torch.rand((2,2,7))
LN([2,7])(squence2)
Out:
    tensor([[[-0.1525, -0.3791,  1.9005,  0.9187, -1.2562, -0.9069,  0.4788],
             [-0.9507, -0.5147, -1.1867,  1.9212,  0.4739, -0.4837,  0.1374]],

            [[-0.4490, -1.2532,  1.2571, -0.7904, -0.7550, -1.0003,  0.2586],
             [ 1.2673, -0.8106, -0.2374,  1.4318,  0.0237,  1.8428, -0.7854]]],
       grad_fn=<NativeLayerNormBackward>)
(squence2[0,:,:] - squence2[0,:,:].numpy().mean()) / squence2[0,:,:].numpy().std()
Out:
    tensor([[-0.1525, -0.3791,  1.9006,  0.9188, -1.2563, -0.9070,  0.4788],
            [-0.9508, -0.5148, -1.1867,  1.9214,  0.4739, -0.4838,  0.1374]])

The two results are consistent here as well, showing that when two dimensions are specified, normalization is performed over the last two dimensions!
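
The same broadcast check works here, with the reduction taken over the last two dims:

mu = squence2.mean(dim=(-2, -1), keepdim=True)
std = squence2.std(dim=(-2, -1), unbiased=False, keepdim=True)
torch.allclose((squence2 - mu) / std, LN([2,7])(squence2), atol=1e-3)  # expected: True

In Transformer-style models, including Swin Transformer, nn.LayerNorm is typically applied over just the last (embedding) dimension, i.e. the nn.LayerNorm(10)-style usage from section 2.2.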



To be continued!

Pure love, only for China
Original post: https://www.cnblogs.com/dan-baishucaizi/p/14718865.html