Video notes: "Self-Attention Mechanisms and Low-Rank Reconstruction in Semantic Segmentation"

Semantic Segmentation: output a label for every pixel of the image at once.

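Fully convolutional networks: in theory the receptive field grows with depth, but in practice the effective receptive field is small.
Nonlocal Network (corresponds to the self-attention mechanism)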
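The theoretical receptive field can be computed layer by layer with the standard recurrence r ← r + (k_eff − 1)·j and j ← j·s, where j is the spacing (in input pixels) between adjacent units. This is a minimal sketch; the layer stack is hypothetical, and the effective receptive field (the part of this region that actually influences the output) is typically much smaller and cannot be read off this formula.

```python
def theoretical_receptive_field(layers):
    """Theoretical receptive field (in input pixels) after a stack of
    conv/pool layers, each given as (kernel_size, stride, dilation)."""
    rf = 1    # a unit of the input "layer" sees exactly one pixel
    jump = 1  # distance in input pixels between adjacent units of the current layer
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical backbone fragment: 3x3 convs, two with stride 2, the last one dilated.
stack = [(3, 1, 1), (3, 2, 1), (3, 1, 1), (3, 2, 1), (3, 1, 2)]
print(theoretical_receptive_field(stack))  # → 29
```

Strides and dilations make the theoretical region grow quickly, which is exactly why the gap to the much smaller effective receptive field motivates explicit long-range modeling.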

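Non-local neural networks: to infer what object is at a given position, the network relates that position to every position in the image. The response at position i is computed as:
y_i = (1 / C(x)) * Σ_j f(x_i, x_j) * g(x_j)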

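Here f(·) models the relationship between x_i and x_j, C(x) normalizes the output of f(·), and g(x_j) is a transformation of the reference pixel x_j. Other choices for the similarity f: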
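For reference, the Non-local Neural Networks paper discusses four choices for f: Gaussian, embedded Gaussian, dot product, and concatenation, normalizing by C(x) = Σ_j f(x_i, x_j) for the exponential forms and by C(x) = N for the other two. The plain-Python sketch below evaluates the formula position by position so the roles of f, C(x) and g stay explicit. In a real network θ, φ, g and the weight vector w are learned 1×1 convolutions; here they are passed in as plain functions, and the identity stand-in is an assumption for illustration only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gaussian(xi, xj):
    # f(x_i, x_j) = exp(x_i^T x_j)
    return math.exp(dot(xi, xj))


def embedded_gaussian(theta, phi):
    # f(x_i, x_j) = exp(theta(x_i)^T phi(x_j))
    return lambda xi, xj: math.exp(dot(theta(xi), phi(xj)))


def dot_product(theta, phi):
    # f(x_i, x_j) = theta(x_i)^T phi(x_j)
    return lambda xi, xj: dot(theta(xi), phi(xj))


def concatenation(theta, phi, w):
    # f(x_i, x_j) = ReLU(w^T [theta(x_i), phi(x_j)])
    return lambda xi, xj: max(0.0, dot(w, list(theta(xi)) + list(phi(xj))))


def non_local(xs, f, g, normalize_by_count=False):
    """y_i = 1/C(x) * sum_j f(x_i, x_j) * g(x_j), computed for every position i."""
    g_vals = [g(xj) for xj in xs]
    ys = []
    for xi in xs:
        weights = [f(xi, xj) for xj in xs]
        # C(x) = sum_j f(x_i, x_j) for the exponential forms, C(x) = N otherwise.
        c = len(xs) if normalize_by_count else sum(weights)
        ys.append([sum(w * gv[d] for w, gv in zip(weights, g_vals)) / c
                   for d in range(len(g_vals[0]))])
    return ys


def identity(v):
    # Hypothetical stand-in for the learned embeddings theta, phi and g.
    return v


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy "pixels" with C = 2
y_gauss = non_local(xs, gaussian, identity)
y_emb = non_local(xs, embedded_gaussian(identity, identity), identity)
y_dot = non_local(xs, dot_product(identity, identity), identity, normalize_by_count=True)
print(y_dot[0])  # approximately [2/3, 1/3]
```

With identity embeddings the embedded Gaussian reduces to the plain Gaussian, which makes it easy to see that the embeddings are the only difference between the two.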

Implementation:

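The complexity is N*N*C, where N = H*W is the number of positions and C is the number of channels, because every position is compared with every other position.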
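A minimal plain-Python sketch of the embedded-Gaussian (softmax) variant in matrix form, assuming the feature map has already been flattened to an N×C matrix and modeling the 1×1 convolutions θ, φ, g and the output projection W_z as plain matrices with random placeholder weights. The N×N attention matrix is where the N*N*C cost comes from: building it and applying it each take on the order of N²·C' multiply-adds.

```python
import math
import random


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m); matrices are lists of rows
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    out = []
    for row in a:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def nonlocal_block(x, w_theta, w_phi, w_g, w_z):
    """x: N x C. w_theta, w_phi, w_g: C x C'. w_z: C' x C.
    Returns (x + W_z y, attention matrix)."""
    theta = matmul(x, w_theta)                           # N x C'
    phi = matmul(x, w_phi)                               # N x C'
    g = matmul(x, w_g)                                   # N x C'
    attn = softmax_rows(matmul(theta, transpose(phi)))   # N x N, ~N*N*C' ops
    y = matmul(attn, g)                                  # N x C', ~N*N*C' ops
    z = matmul(y, w_z)                                   # N x C
    out = [[xv + zv for xv, zv in zip(xr, zr)] for xr, zr in zip(x, z)]  # residual
    return out, attn


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
H, W, C, C_inner = 2, 3, 4, 2      # tiny illustrative sizes
N = H * W
x = rand_matrix(N, C, rng)         # flattened H x W x C feature map
out, attn = nonlocal_block(x, rand_matrix(C, C_inner, rng), rand_matrix(C, C_inner, rng),
                           rand_matrix(C, C_inner, rng), rand_matrix(C_inner, C, rng))
print(len(attn), len(attn[0]))     # → 6 6
```

Doubling H and W quadruples N and therefore multiplies the size of `attn` by 16, which is why this block is expensive on high-resolution segmentation feature maps.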
A^2-Nets

A^2-Nets: Double Attention Networks

Comparison with the Nonlocal network (in the figure, the right side is the Nonlocal network and the left side is A^2-Nets):

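The computational complexity drops to N*C*C, which is lower than the Nonlocal network's N*N*C whenever C < N (typically the case, since N = H*W).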
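A sketch of the two steps of A^2-Nets (feature gathering, then feature distribution) in plain Python, under the same flattened N×C assumption. The numbers m and n of feature maps and attention maps, the random weights, and the omission of the final 1×1 convolution back to C channels are illustrative simplifications. Neither step forms an N×N matrix: each costs about N·m·n multiply-adds, which is N*C*C when m and n are on the order of C.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(vals):
    m = max(vals)
    exps = [math.exp(v - m) for v in vals]
    s = sum(exps)
    return [e / s for e in exps]


def double_attention(x, w_a, w_b, w_v):
    """x: N x C. w_a: C x m; w_b, w_v: C x n (stand-ins for 1x1 convs)."""
    a = matmul(x, w_a)  # N x m feature maps
    # Gathering attention: each of the n columns is a softmax over the N positions.
    b = transpose([softmax(col) for col in transpose(matmul(x, w_b))])  # N x n
    # Distribution attention: each row is a softmax over the n global descriptors.
    v = [softmax(row) for row in matmul(x, w_v)]  # N x n
    g = matmul(transpose(a), b)   # step 1, gather: m x n descriptors, ~N*m*n ops
    z = matmul(v, transpose(g))   # step 2, distribute: N x m output, ~N*n*m ops
    return z


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(1)
N, C, m, n = 6, 4, 3, 2  # tiny illustrative sizes
x = rand_matrix(N, C, rng)
w_a, w_b, w_v = rand_matrix(C, m, rng), rand_matrix(C, n, rng), rand_matrix(C, n, rng)
z = double_attention(x, w_a, w_b, w_v)
print(len(z), len(z[0]))  # → 6 3
```

Because both attention steps are convex combinations, every output value stays within the range of the corresponding gathered feature map, a handy sanity check when debugging such a block.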
EM Attention Networks

Expectation Maximization Attention Networks for Semantic Segmentation

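Here N is the number of pixels (N = H*W), C is the number of channels of the input feature map, and Z is the N×K mapping matrix that assigns each pixel to the K bases. The structure is implemented as: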
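A minimal sketch of the EM iteration inside the EM attention unit, assuming the flattened N×C layout and K bases μ (K×C). The random initialization, K, the number of iterations, and the temperature λ are illustrative choices, and the paper's further details on how bases are initialized and maintained across batches are not reproduced. The E-step computes Z with a softmax over the K bases, the M-step recomputes each basis as the Z-weighted average of all pixels, and the output Zμ is a rank-at-most-K reconstruction of X. One iteration costs about N·K·C, with K much smaller than N.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(vals):
    m = max(vals)
    exps = [math.exp(v - m) for v in vals]
    s = sum(exps)
    return [e / s for e in exps]


def em_attention(x, mu, iterations=3, lam=1.0):
    """x: N x C pixel features, mu: K x C initial bases.
    Returns (Z @ mu reconstruction N x C, Z N x K, final bases K x C)."""
    if iterations < 1:
        raise ValueError("need at least one EM iteration")
    for _ in range(iterations):
        # E-step: responsibilities, a softmax over the K bases for each pixel.
        z = [softmax([lam * s for s in row]) for row in matmul(x, transpose(mu))]
        # M-step: each basis becomes the Z-weighted average of the pixel features.
        zt = transpose(z)              # K x N
        weighted = matmul(zt, x)       # K x C
        mu = [[v / sum(zk) for v in row] for row, zk in zip(weighted, zt)]
    x_rec = matmul(z, mu)              # N x C, rank at most K
    return x_rec, z, mu


rng = random.Random(2)
N, C, K = 8, 4, 2  # tiny illustrative sizes
x = [[rng.uniform(-1.0, 1.0) for _ in range(C)] for _ in range(N)]
mu0 = [[rng.uniform(-1.0, 1.0) for _ in range(C)] for _ in range(K)]
x_rec, z, mu = em_attention(x, mu0)
print(len(x_rec), len(x_rec[0]), len(z[0]))  # → 8 4 2
```

Each output row is a mixture of only K basis vectors, which is the low-rank reconstruction that replaces the N×N attention of the Nonlocal network.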
Tricks for semantic segmentation

Tricks that must work:
- Do not use PyTorch's official ResNet.
- Avoid weight decay on BN parameters and conv biases.
- Use OHEM (online hard example mining) during training.
- Interpolate with align_corners=True.
- Set the crop size to 8x+1.
- Use sliding-window inference on Cityscapes.
- Run inference on the whole image on PASCAL VOC.
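Two of these tricks are easy to check numerically. The plain-Python sketch below assumes an output-stride-8 backbone whose three downsampling layers are 3×3, stride 2, padding 1 (a hypothetical but common configuration). It shows why a crop of 8x+1 pairs well with align_corners=True, and how window offsets for sliding-window inference could be laid out. The crop of 769 and stride of 512 are illustrative values, not settings from the source.

```python
def conv_out_size(n, kernel=3, stride=2, padding=1):
    # Standard output-size formula for a strided convolution or pooling layer.
    return (n + 2 * padding - kernel) // stride + 1


def feature_size(crop, num_stride2_layers=3):
    # Output stride 8 = three hypothetical stride-2 layers.
    for _ in range(num_stride2_layers):
        crop = conv_out_size(crop)
    return crop


def align_corners_source(dst, in_size, out_size):
    # Source coordinate that bilinear upsampling with align_corners=True reads
    # for output index dst: dst * (in_size - 1) / (out_size - 1).
    return dst * (in_size - 1) / (out_size - 1)


def sliding_window_starts(length, crop, stride):
    """Start offsets of crop-sized windows covering [0, length); the last window
    is shifted back to end at the border. Overlapping logits would be averaged."""
    if length <= crop:
        return [0]
    starts = list(range(0, length - crop, stride))
    starts.append(length - crop)
    return starts


for crop in (512, 513):
    feat = feature_size(crop)
    # For 513 = 8*64+1 the feature map has 65 cells and output pixel 8 lands exactly
    # on feature cell 1; for 512 the sampling grid is off by a fractional amount.
    print(crop, feat, align_corners_source(8, feat, crop))

# Cityscapes images are 2048 x 1024; 769 = 8*96+1 keeps the 8x+1 rule per window.
print(sliding_window_starts(2048, 769, 512), sliding_window_starts(1024, 769, 512))
```

With a crop of 8x+1, every eighth output pixel coincides with a feature-map cell, so upsampling introduces no systematic half-pixel shift.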
Tricks that may work:
- Use a 10× larger learning rate for the segmentation head.
- Train with a warmup strategy.
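A minimal sketch combining both tricks in one learning-rate function. The linear warmup and the "poly" decay with power 0.9 after it are assumed choices for illustration, since the source does not specify the schedule. The head simply gets 10 times the backbone's rate at every step.

```python
def learning_rate(step, base_lr, total_steps, warmup_steps, power=0.9, head=False):
    """Linear warmup, then poly decay (assumed schedule); the segmentation
    head uses a 10x larger learning rate than the backbone."""
    if step < warmup_steps:
        lr = base_lr * (step + 1) / warmup_steps
    else:
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        lr = base_lr * (1 - progress) ** power
    return lr * 10 if head else lr


for step in (0, 50, 99, 100, 550, 1000):
    print(step, learning_rate(step, 0.01, 1000, 100),
          learning_rate(step, 0.01, 1000, 100, head=True))
```

In a framework optimizer this would typically map to two parameter groups (backbone and head) whose rates are updated from the same step counter.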