mmdetection: Anchor generation schemes and their label assigners (1)

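anchor_generator.py bundles many anchor generation schemes. While reading the mmdetection source code, I am taking the chance to summarize them.

(This part covers Faster R-CNN, YOLOv3 and SSD first; more will be added later.)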

I. Anchor generation

The overall idea is to generate the base anchors first, then tile them over a grid (meshgrid) to produce all the other anchors.

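1. Faster R-CNN

Faster R-CNN's anchor generation is the classic scheme; the other schemes differ from it only in details. The code in anchor_generator shows how the anchors are produced without a single for loop.

First come the base anchors. They are centered at (center_offset * base_size, center_offset * base_size), which is (0, 0) when center_offset is 0, and start from a basic (w, h) of (stride, stride). Combining that basic size with every scale and ratio yields several anchors. For example, with scales = [8, 16, 32] (multipliers on the basic size) and ratios = [0.5, 1.0, 2.0] (h / w), 3 x 3 = 9 anchors are generated.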

def gen_single_level_base_anchors(self,
                                  base_size,
                                  scales,
                                  ratios,
                                  center=None):
    """Generate base anchors of a single level.

    Args:
        base_size (int | float): Basic size of an anchor.
        scales (torch.Tensor): Scales of the anchor.
        ratios (torch.Tensor): The ratio between the height
            and width of anchors in a single level.
        center (tuple[float], optional): The center of the base anchor
            related to a single feature grid. Defaults to None.

    Returns:
        torch.Tensor: Anchors in a single-level feature maps.
    """
    w = base_size
    h = base_size
    if center is None:
        x_center = self.center_offset * w
        y_center = self.center_offset * h
    else:
        x_center, y_center = center

    # ratio = h / w, so h grows with sqrt(ratio) and w shrinks by the
    # same factor, which keeps the anchor area unchanged
    h_ratios = torch.sqrt(ratios)
    w_ratios = 1 / h_ratios
    if self.scale_major:
        ws = (w * w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h * h_ratios[:, None] * scales[None, :]).view(-1)
    else:
        ws = (w * scales[:, None] * w_ratios[None, :]).view(-1)
        hs = (h * scales[:, None] * h_ratios[None, :]).view(-1)

    # use float anchor and the anchor's center is aligned with the
    # pixel center
    base_anchors = [
        x_center - 0.5 * ws, y_center - 0.5 * hs, x_center + 0.5 * ws,
        y_center + 0.5 * hs
    ]
    base_anchors = torch.stack(base_anchors, dim=-1)

    return base_anchors
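
To make the arithmetic concrete, here is a minimal plain-Python sketch of the same computation for the scales and ratios quoted above. It is not the library code: the function name `base_anchors_plain` is made up for this example, explicit loops replace the torch broadcasting for readability, and `center_offset=0.0` with the `scale_major=True` ordering (ratios in the outer loop) is an assumption.

```python
import math


def base_anchors_plain(base_size, scales, ratios, center_offset=0.0):
    """Hypothetical helper: base anchors as (x1, y1, x2, y2) tuples."""
    x_center = center_offset * base_size
    y_center = center_offset * base_size
    anchors = []
    for ratio in ratios:                 # ratio = h / w
        h_ratio = math.sqrt(ratio)
        w_ratio = 1.0 / h_ratio
        for scale in scales:             # scale multiplies the base size
            w = base_size * w_ratio * scale
            h = base_size * h_ratio * scale
            anchors.append((x_center - 0.5 * w, y_center - 0.5 * h,
                            x_center + 0.5 * w, y_center + 0.5 * h))
    return anchors


anchors = base_anchors_plain(base_size=16,
                             scales=[8, 16, 32],
                             ratios=[0.5, 1.0, 2.0])
for box in anchors:
    print([round(v, 1) for v in box])
```

Each ratio changes the shape but not the area: all three anchors with scale 8 cover 128 x 128 pixels' worth of area, whatever their aspect ratio.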

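With the base anchors in hand, all that remains is to shift them to every other location. So meshgrid first generates the positions, and the base anchors are then added to each of them.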

def _meshgrid(self, x, y, row_major=True):
    """Generate mesh grid of x and y.

    Args:
        x (torch.Tensor): Grids of x dimension.
        y (torch.Tensor): Grids of y dimension.
        row_major (bool, optional): Whether to return y grids first.
            Defaults to True.

    Returns:
        tuple[torch.Tensor]: The mesh grids of x and y.
    """
    xx = x.repeat(len(y))
    yy = y.view(-1, 1).repeat(1, len(x)).view(-1)
    if row_major:
        return xx, yy
    else:
        return yy, xx

def single_level_grid_anchors(self,
                              base_anchors,
                              featmap_size,
                              stride=(16, 16),
                              device='cuda'):
    """Generate grid anchors of a single level.

    Note:
        This function is usually called by method ``self.grid_anchors``.

    Args:
        base_anchors (torch.Tensor): The base anchors of a feature grid.
        featmap_size (tuple[int]): Size of the feature maps.
        stride (tuple[int], optional): Stride of the feature map in order
            (w, h). Defaults to (16, 16).
        device (str, optional): Device the tensor will be put on.
            Defaults to 'cuda'.

    Returns:
        torch.Tensor: Anchors in the overall feature maps.
    """
    feat_h, feat_w = featmap_size
    # convert Tensor to int, so that we can convert to ONNX correctly
    feat_h = int(feat_h)
    feat_w = int(feat_w)
    shift_x = torch.arange(0, feat_w, device=device) * stride[0]
    shift_y = torch.arange(0, feat_h, device=device) * stride[1]

    shift_xx, shift_yy = self._meshgrid(shift_x, shift_y)
    shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)
    shifts = shifts.type_as(base_anchors)
    # first feat_w elements correspond to the first row of shifts
    # add A anchors (1, A, 4) to K shifts (K, 1, 4) to get
    # shifted anchors (K, A, 4), reshape to (K*A, 4)

    all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
    all_anchors = all_anchors.view(-1, 4)
    # first A rows correspond to A anchors of (0, 0) in feature map,
    # then (0, 1), (0, 2), ...
    return all_anchors
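
The ordering of the tiled anchors is easy to lose in the broadcasting, so here is a small plain-Python sketch that spells it out with explicit loops. The function name `grid_anchors_plain` and the toy base anchors are made up for this example; the loop order is chosen to reproduce the (K, A, 4) layout described in the comments above (cells row by row, A anchors per cell).

```python
def grid_anchors_plain(base_anchors, featmap_size, stride):
    """Hypothetical helper: shift every base anchor to every grid cell."""
    feat_h, feat_w = featmap_size
    stride_w, stride_h = stride
    all_anchors = []
    for row in range(feat_h):            # y changes slowest
        for col in range(feat_w):        # x changes fastest
            shift_x = col * stride_w
            shift_y = row * stride_h
            for x1, y1, x2, y2 in base_anchors:
                all_anchors.append((x1 + shift_x, y1 + shift_y,
                                    x2 + shift_x, y2 + shift_y))
    return all_anchors


# Two toy base anchors on a 2 x 3 feature map with stride 16.
base = [(-8.0, -8.0, 8.0, 8.0), (-16.0, -8.0, 16.0, 8.0)]
tiled = grid_anchors_plain(base, featmap_size=(2, 3), stride=(16, 16))
print(len(tiled))   # feat_h * feat_w * A anchors in total
print(tiled[:4])    # cell (0, 0) first, then cell (0, 1)
```

The anchors of cell (row, col) therefore start at index (row * feat_w + col) * A, which is the fact the assigners rely on later.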

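2. YOLOv2 & YOLOv3

The difference from Faster R-CNN lies in the base anchors. YOLO's base anchors are obtained by clustering the boxes of the dataset. As the code below shows, there is no need to compute scales and ratios here; what remains is the same grid tiling to produce the other anchors.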

def gen_single_level_base_anchors(self, base_sizes_per_level, center=None):
    """Generate base anchors of a single level.

    Args:
        base_sizes_per_level (list[tuple[int, int]]): Basic sizes of
            anchors.
        center (tuple[float], optional): The center of the base anchor
            related to a single feature grid. Defaults to None.

    Returns:
        torch.Tensor: Anchors in a single-level feature maps.
    """
    x_center, y_center = center
    base_anchors = []
    for base_size in base_sizes_per_level:
        w, h = base_size

        # use float anchor and the anchor's center is aligned with the
        # pixel center
        base_anchor = torch.Tensor([
            x_center - 0.5 * w, y_center - 0.5 * h, x_center + 0.5 * w,
            y_center + 0.5 * h
        ])
        base_anchors.append(base_anchor)
    base_anchors = torch.stack(base_anchors, dim=0)

    return base_anchors
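
As a worked example, the sketch below plugs in the three anchor sizes commonly used for the stride-32 level of YOLOv3 on COCO. The function name `yolo_base_anchors_plain` is made up, and placing the center at half the stride is an assumption of this sketch rather than something shown in the excerpt above.

```python
def yolo_base_anchors_plain(base_sizes_per_level, center):
    """Hypothetical helper: one (x1, y1, x2, y2) box per clustered (w, h)."""
    x_center, y_center = center
    return [(x_center - 0.5 * w, y_center - 0.5 * h,
             x_center + 0.5 * w, y_center + 0.5 * h)
            for w, h in base_sizes_per_level]


stride = 32
sizes = [(116, 90), (156, 198), (373, 326)]   # clustered (w, h) pairs
yolo_anchors = yolo_base_anchors_plain(sizes,
                                       center=(stride / 2, stride / 2))
for box in yolo_anchors:
    print(box)
```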

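3. SSD

SSD is similar. The difference is that the anchor scale is no longer fixed but varies with the level (see the scale formula in the paper): as the feature map gets smaller, the scale gets larger (a larger receptive field calls for larger anchors).

The rest is the same as Faster R-CNN. See https://zhuanlan.zhihu.com/p/33544892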

# Excerpt: min_ratio, max_ratio, step, ratios and basesize_ratio_range
# are defined earlier in the source.
# Anchor sizes on the input image, e.g. side lengths 60, 111, 162, 213,
# 264 for input_size = 300 with ratios 20, 37, 54, 71, 88 (percent)
min_sizes = []
max_sizes = []
for ratio in range(int(min_ratio), int(max_ratio) + 1, step):
    min_sizes.append(int(self.input_size * ratio / 100))
    max_sizes.append(int(self.input_size * (ratio + step) / 100))

# Prepend one more, smaller size for the first level:
# int(300 * 7 / 100) = 21 in the SSD300 COCO branch shown here
# (a 10% entry would give 30)
if self.input_size == 300:
    if basesize_ratio_range[0] == 0.15:  # SSD300 COCO
        min_sizes.insert(0, int(self.input_size * 7 / 100))
        max_sizes.insert(0, int(self.input_size * 15 / 100))

# Compute the scales and ratios of each level
anchor_ratios = []
anchor_scales = []
for k in range(len(self.strides)):
    scales = [1., np.sqrt(max_sizes[k] / min_sizes[k])]
    anchor_ratio = [1.]
    for r in ratios[k]:
        anchor_ratio += [1 / r, r]  # 4 or 6 ratio
    anchor_ratios.append(torch.Tensor(anchor_ratio))
    anchor_scales.append(torch.Tensor(scales))
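
The size schedule can be checked by hand with a short plain-Python sketch. The function names are made up for this example, the ratio range is passed as integer percentages (20 and 90) to avoid float rounding, and the step formula (range divided by num_levels - 2) is an assumption taken from the usual SSD300 setup with 6 feature levels; it is not visible in the excerpt above.

```python
import math


def ssd_anchor_sizes(input_size, min_ratio, max_ratio, num_levels):
    """Hypothetical helper: per-level min/max anchor sizes in pixels."""
    step = (max_ratio - min_ratio) // (num_levels - 2)   # assumed formula
    min_sizes, max_sizes = [], []
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(input_size * ratio // 100)
        max_sizes.append(input_size * (ratio + step) // 100)
    return min_sizes, max_sizes


def level_scales_and_ratios(min_size, max_size, level_ratios):
    """Scales and aspect ratios of one level, as in the loop above."""
    scales = [1.0, math.sqrt(max_size / min_size)]
    anchor_ratios = [1.0]
    for r in level_ratios:
        anchor_ratios += [1 / r, r]
    return scales, anchor_ratios


min_sizes, max_sizes = ssd_anchor_sizes(300, 20, 90, num_levels=6)
scales, anchor_ratios = level_scales_and_ratios(min_sizes[0],
                                                max_sizes[0], [2])
print(min_sizes, max_sizes)
print(scales, anchor_ratios)
```

The second scale, sqrt(max / min), makes the extra square anchor's side equal to sqrt(min_size * max_size), the geometric mean of the sizes of two neighbouring levels.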

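II. Anchor assigner

Once the anchors are generated they must be labeled, to decide which are positive samples and which are negative samples.

1. MaxIoUAssigner

This is the scheme used by SSD and Faster R-CNN. It computes the IoU between every anchor and every GT box. For each anchor, if its maximum IoU is at least pos_iou_thr it is a positive sample, and if it is below neg_iou_thr it is background. Anchors that fall between the two thresholds are left unassigned (ignored).

The code contains one more detail. Under the rule above, some GT boxes may not match any anchor, so a remedy is added to enlarge the positive set: for each GT, look at the anchor with the highest IoU, and if that IoU is at least min_pos_iou, mark the anchor as positive. Note that this does not guarantee that every GT ends up with an anchor (a later GT can overwrite an earlier assignment, so the result depends on the order in which GTs are visited), and it can introduce low-quality positives, so it does not necessarily help. This post gives a good example:

Object Detection (MMdetection): Retina (Anchor, Focal Loss), Zhihu (zhihu.com)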

if self.match_low_quality:
    # Low-quality matching will overwrite the assigned_gt_inds assigned
    # in Step 3. Thus, the assigned gt might not be the best one for
    # prediction.
    # For example, if bbox A has 0.9 and 0.8 iou with GT bbox 1 & 2,
    # bbox 1 will be assigned as the best target for bbox A in step 3.
    # However, if GT bbox 2's gt_argmax_overlaps = A, bbox A's
    # assigned_gt_inds will be overwritten to be bbox 2.
    # This might be the reason that it is not used in ROI Heads.
    for i in range(num_gts):
        if gt_max_overlaps[i] >= self.min_pos_iou:
            if self.gt_max_assign_all:
                max_iou_inds = overlaps[i, :] == gt_max_overlaps[i]
                assigned_gt_inds[max_iou_inds] = i + 1
            else:
                assigned_gt_inds[gt_argmax_overlaps[i]] = i + 1
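
The whole procedure, threshold rule plus low-quality matching, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library implementation: the function names are made up, the thresholds (0.7 / 0.3 / 0.3) are example values, and the sketch always assigns all anchors tied for a GT's best IoU (the gt_max_assign_all branch above).

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def max_iou_assign(anchors, gts, pos_iou_thr=0.7, neg_iou_thr=0.3,
                   min_pos_iou=0.3, match_low_quality=True):
    """Per-anchor label: -1 ignored, 0 background, i + 1 matched to gts[i]."""
    if not gts:
        return [0] * len(anchors)
    overlaps = [[iou(gt, anchor) for anchor in anchors] for gt in gts]
    assigned = [-1] * len(anchors)
    for j in range(len(anchors)):
        column = [overlaps[i][j] for i in range(len(gts))]
        best = max(column)
        if best < neg_iou_thr:
            assigned[j] = 0
        elif best >= pos_iou_thr:
            assigned[j] = column.index(best) + 1
    if match_low_quality and anchors:
        for i in range(len(gts)):        # later GTs may overwrite earlier ones
            gt_max = max(overlaps[i])
            if gt_max >= min_pos_iou:
                for j in range(len(anchors)):
                    if overlaps[i][j] == gt_max:
                        assigned[j] = i + 1
    return assigned


anchors = [(0, 0, 10, 10), (0, 0, 10, 6), (20, 20, 30, 30),
           (100, 100, 110, 110)]
gts = [(0, 0, 10, 10), (20, 20, 30, 26)]
print(max_iou_assign(anchors, gts))
print(max_iou_assign(anchors, gts, match_low_quality=False))
```

In this toy setup the second GT overlaps its best anchor with IoU 0.6, below pos_iou_thr, so it only gets a positive sample when low-quality matching is enabled.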

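2. GridAssigner

This is the scheme used by YOLO. It also starts by computing the IoU between anchors and GT boxes. Negative samples are labeled the same way; the difference is in the positive samples. For each anchor in a grid cell, if the IoU with its nearest GT exceeds pos_iou_thr and the center of that GT falls inside the cell, the anchor is a positive sample. For each GT, its nearest anchor within the cell that the GT falls in is assigned to that GT. In effect, this means each GT has only one positive sample.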
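
The "cell that the GT falls in" can be made concrete with a small plain-Python sketch. The function name `responsible_flags_plain` is made up for this example, and the flat index assumes the anchor ordering produced by single_level_grid_anchors (cells row by row, A anchors per cell) and a GT center that lies inside the image.

```python
def responsible_flags_plain(gt_box, stride, featmap_size, num_base_anchors):
    """Hypothetical helper: flag the anchors of the cell holding the GT center."""
    feat_h, feat_w = featmap_size
    cx = (gt_box[0] + gt_box[2]) * 0.5
    cy = (gt_box[1] + gt_box[3]) * 0.5
    col = int(cx // stride)              # grid column of the GT center
    row = int(cy // stride)              # grid row of the GT center
    flags = [False] * (feat_h * feat_w * num_base_anchors)
    start = (row * feat_w + col) * num_base_anchors
    for k in range(num_base_anchors):
        flags[start + k] = True
    return (row, col), flags


cell, flags = responsible_flags_plain(gt_box=(40, 70, 80, 110), stride=32,
                                      featmap_size=(4, 4),
                                      num_base_anchors=3)
print(cell, [i for i, f in enumerate(flags) if f])
```

Only the flagged anchors are candidates for a positive label for this GT; every other anchor is either background or ignored.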

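III. Box encoding

During training, the raw coordinates of the GT boxes and anchors are not used directly. To speed up convergence they are encoded first, and the encoding differs slightly between detectors.

For this part, see the post "The Most Detailed Analysis of YOLOv3 Bounding Box Prediction" (Xiaoyao Wang's blog, CSDN).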
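
As one concrete instance, here is a minimal sketch of the center-offset and log-size encoding used by Faster R-CNN style detectors. The function names are made up for this example, and details that real coders add on top (for instance normalizing the targets by means and standard deviations) are left out.

```python
import math


def to_cxcywh(box):
    """(x1, y1, x2, y2) -> (center x, center y, width, height)."""
    return ((box[0] + box[2]) * 0.5, (box[1] + box[3]) * 0.5,
            box[2] - box[0], box[3] - box[1])


def encode(anchor, gt):
    """Regression targets of a GT box relative to its anchor."""
    ax, ay, aw, ah = to_cxcywh(anchor)
    gx, gy, gw, gh = to_cxcywh(gt)
    return ((gx - ax) / aw, (gy - ay) / ah,
            math.log(gw / aw), math.log(gh / ah))


def decode(anchor, deltas):
    """Inverse of encode: apply predicted deltas to the anchor."""
    ax, ay, aw, ah = to_cxcywh(anchor)
    dx, dy, dw, dh = deltas
    cx, cy = ax + dx * aw, ay + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


anchor = (0.0, 0.0, 10.0, 10.0)
gt = (2.0, 2.0, 12.0, 22.0)
deltas = encode(anchor, gt)
print(deltas)
print(decode(anchor, deltas))
```

Dividing the offsets by the anchor size and taking the log of the size ratio keeps the targets in a small, scale-independent range, which is what helps convergence.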

             

                                       

Original post: https://www.cnblogs.com/573177885qq/p/14362449.html