Scale-Invariant Error

Eigen D., Puhrsch C. and Fergus R. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. NIPS 2014.

Overview

I read this paper purely to take a look at its scale-invariant error.

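Main content

We often measure the difference between two images with the squared error, but this loss depends heavily on scale.
For example, given two images \(\bm{x}, \bm{x}'\), their error is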

\[\|\bm{x} - \bm{x}'\|_2^2 = \sum_{i=1}^n (x_i - x_i')^2, \]

If every element of \(\bm{x}\) is now increased by \(c\), the error becomes

\[\|\bm{x} + c\bm{1} - \bm{x}'\|_2^2, \]
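
To make this dependence concrete, here is a minimal sketch in plain Python. The vectors and the offset `c` are made-up illustration values, and `squared_error` is a hypothetical helper name, not something from the paper.

```python
def squared_error(x, y):
    """Sum of squared element-wise differences between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

x = [1.0, 2.0, 3.0]        # illustrative "prediction" (assumed values)
x_prime = [1.5, 2.5, 3.5]  # illustrative "target" (assumed values)
c = 2.0                    # constant added to every element of x

base = squared_error(x, x_prime)
shifted = squared_error([xi + c for xi in x], x_prime)

# The structure of x is unchanged, yet the squared error differs.
print(base, shifted)
```

The only change between the two calls is the constant offset, so any difference in the printed values comes from that offset alone.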

which is not very friendly: ideally the loss should be scale-invariant. So we add a scalar offset \(\alpha\) inside the loss:

\[\| \bm{x} - \bm{x}' + \alpha\bm{1} \|_2^2, \]

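Note that \(\bm{x}\) here can be read as \(\bm{x} + c\bm{1}\). Which \(\alpha\) minimizes the error above (for a given pair \(\bm{x}, \bm{x}'\))? Setting the derivative with respect to \(\alpha\) to zero gives

\[2(\bm{x} - \bm{x}' + \alpha\bm{1})^T \bm{1} = 0 \Rightarrow \alpha = \frac{1}{n} (\bm{x}'- \bm{x})^T \bm{1} = \frac{1}{n}\sum_{i=1}^n (x_i' - x_i). \]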
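
As a sanity check on this closed form, the following sketch (plain Python, with made-up vectors and hypothetical helper names `shifted_error` and `optimal_alpha`) compares the error at the mean residual against the error at slightly perturbed offsets.

```python
def shifted_error(x, y, alpha):
    """Squared error after adding the same scalar alpha to every residual x_i - y_i."""
    return sum((a - b + alpha) ** 2 for a, b in zip(x, y))

def optimal_alpha(x, y):
    """Closed-form minimizer: the mean of y_i - x_i."""
    return sum(b - a for a, b in zip(x, y)) / len(x)

x = [1.0, 2.0, 4.0]        # assumed illustration values
x_prime = [2.0, 2.5, 6.0]  # assumed illustration values

alpha = optimal_alpha(x, x_prime)
best = shifted_error(x, x_prime, alpha)

# Moving alpha in either direction should only increase the error.
worse = [shifted_error(x, x_prime, alpha + delta) for delta in (-0.1, 0.1)]
print(alpha, best, worse)
```

Because the objective is a convex quadratic in \(\alpha\), checking both sides of the stationary point is enough to see that it is a minimum.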

Hence, the final loss is

\[\Big\| \bm{x} - \bm{x}' + \frac{1}{n}\big((\bm{x}' - \bm{x})^T \bm{1}\big)\bm{1}\Big\|_2^2 = \|\bm{x} - \bm{x}'\|_2^2 - \frac{1}{n} ((\bm{x} - \bm{x}')^T \bm{1})^2. \]
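
The right-hand side follows from expanding the square. Writing \(\bm{d} = \bm{x} - \bm{x}'\) (a shorthand introduced only for this derivation) and substituting the optimal offset \(\alpha = -\frac{1}{n}\bm{d}^T\bm{1}\):

\[
\begin{aligned}
\|\bm{d} + \alpha\bm{1}\|_2^2 &= \|\bm{d}\|_2^2 + 2\alpha\, \bm{d}^T\bm{1} + n\alpha^2 \\
&= \|\bm{d}\|_2^2 - \frac{2}{n}(\bm{d}^T\bm{1})^2 + \frac{1}{n}(\bm{d}^T\bm{1})^2 \\
&= \|\bm{d}\|_2^2 - \frac{1}{n}(\bm{d}^T\bm{1})^2.
\end{aligned}
\]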

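Note: if we put the pixels in log space, i.e. consider \(\log \bm{x}\), then the above actually accounts for a multiplicative scale \(c \cdot \bm{x}\), since \(\log (c \cdot \bm{x}) = \log \bm{x} + \log c\).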
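
The log-space remark can be illustrated with a small sketch in plain Python. The depth values, the factor `c`, and the helper name `scale_invariant_error` are assumptions made for illustration; the helper simply evaluates the closed form \(\|\bm{d}\|_2^2 - \frac{1}{n}(\bm{d}^T\bm{1})^2\) on log values.

```python
import math

def scale_invariant_error(x, y):
    """Evaluate ||d||^2 - (1/n) * (sum of d)^2 with d = x - y."""
    d = [a - b for a, b in zip(x, y)]
    return sum(v * v for v in d) - sum(d) ** 2 / len(d)

depth_pred = [1.0, 2.0, 4.0]  # assumed illustration values (must be positive)
depth_true = [1.5, 2.0, 3.0]  # assumed illustration values (must be positive)
c = 10.0                      # multiplicative scale applied to the prediction

log_pred = [math.log(v) for v in depth_pred]
log_true = [math.log(v) for v in depth_true]
log_scaled = [math.log(c * v) for v in depth_pred]

err = scale_invariant_error(log_pred, log_true)
err_scaled = scale_invariant_error(log_scaled, log_true)

# Scaling by c only adds log(c) to every log value, so both errors agree
# up to floating-point rounding.
print(err, err_scaled)
```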

Code

import torch
import torch.nn as nn
import torch.nn.functional as F

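def scale_invariant_loss(outs: torch.Tensor, targets: torch.Tensor, reduction="mean"):
    """
    outs: N ( x C) x H x W
    targets: N ( x C) x H x W
    reduction: passed through to F.mse_loss
    """
    # Flatten every sample to a vector so the offset is computed per sample.
    outs = outs.flatten(start_dim=1)
    targets = targets.flatten(start_dim=1)
    # Optimal offset: the mean of (targets - outs) over each sample's elements.
    alpha = (targets - outs).mean(dim=1, keepdim=True)
    # Squared error after shifting the outputs by the optimal offset.
    return F.mse_loss(outs + alpha, targets, reduction=reduction)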
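
A possible usage sketch, assuming PyTorch is installed: the function is repeated so that the snippet runs on its own, and the tensor shapes, seed, and per-sample offsets in `shift` are arbitrary illustration values.

```python
import torch
import torch.nn.functional as F

def scale_invariant_loss(outs, targets, reduction="mean"):
    # Same logic as the function above, repeated so the snippet is self-contained.
    outs = outs.flatten(start_dim=1)
    targets = targets.flatten(start_dim=1)
    alpha = (targets - outs).mean(dim=1, keepdim=True)
    return F.mse_loss(outs + alpha, targets, reduction=reduction)

torch.manual_seed(0)
outs = torch.rand(4, 1, 8, 8)     # assumed batch of 4 single-channel 8x8 maps
targets = torch.rand(4, 1, 8, 8)
# One arbitrary constant offset per sample, broadcast over C, H, W.
shift = torch.tensor([0.5, -1.0, 2.0, 3.0]).view(4, 1, 1, 1)

loss = scale_invariant_loss(outs, targets)
loss_shifted = scale_invariant_loss(outs + shift, targets)
plain_mse_shifted = F.mse_loss(outs + shift, targets)

# The first two values should agree up to float32 rounding; plain MSE should not.
print(loss.item(), loss_shifted.item(), plain_mse_shifted.item())
```

To obtain invariance to a multiplicative scale instead, one would pass log-transformed (strictly positive) tensors, in line with the log-space note above.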