Paper notes: Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships


2018-09-07 20:38:10

pdf: http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Structure_Inference_Net_CVPR_2018_paper.pdf

Introduction:

This paper tries to improve object detection by combining scene-level context with the relationships between objects.

The pipeline of the method can be roughly summarized as follows:

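1. First, an RPN extracts the proposals;

2. The feature of the whole image is passed through an fc layer to obtain the scene feature;

3. RoI pooling gives each proposal's feature map, which is passed through fc layers to obtain a vectorized feature;

4. The spatial relationships between proposals are used to learn the edge information;

5. The above information is fed into the scene GRU and the edge GRU respectively to obtain enhanced features, which are then used for bounding-box classification and regression.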
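Steps 2 and 3 can be sketched in NumPy as a toy illustration. Everything here is hypothetical: the `roi_max_pool` and `fc` helpers, the 2×2 pooling grid, the feature sizes, and the hand-written boxes that stand in for RPN proposals. Pooling the whole map to the same grid for the scene branch is also an assumption of this sketch, not necessarily what the paper does.

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=2):
    """Max-pool box = (x1, y1, x2, y2), given in feature-map cells, of a (C, H, W)
    map into a fixed (C, out_size, out_size) grid. Assumes 0 <= x1 < x2, 0 <= y1 < y2."""
    c, h, w = feature_map.shape
    x1, y1, x2, y2 = box
    ys = np.linspace(y1, y2, out_size + 1)   # bin edges along the height
    xs = np.linspace(x1, x2, out_size + 1)   # bin edges along the width
    out = np.empty((c, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        # Clamp each bin to the map and make it at least one cell tall/wide.
        r0 = min(int(np.floor(ys[i])), h - 1)
        r1 = min(max(int(np.ceil(ys[i + 1])), r0 + 1), h)
        for j in range(out_size):
            c0 = min(int(np.floor(xs[j])), w - 1)
            c1 = min(max(int(np.ceil(xs[j + 1])), c0 + 1), w)
            out[:, i, j] = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
    return out

def fc(x, weight, bias):
    """Fully connected layer followed by ReLU, applied to the flattened input."""
    return np.maximum(weight @ x.ravel() + bias, 0.0)

rng = np.random.default_rng(0)
C, H, W, D, S = 8, 16, 16, 32, 2
feature_map = rng.standard_normal((C, H, W))                 # stand-in backbone output
proposals = [(0, 0, 8, 8), (4, 4, 12, 16), (10, 2, 16, 9)]   # stand-in RPN boxes
W_node, b_node = rng.standard_normal((D, C * S * S)) * 0.1, np.zeros(D)
W_scene, b_scene = rng.standard_normal((D, C * S * S)) * 0.1, np.zeros(D)

# Step 3: one D-dim feature per proposal (the graph nodes).
node_feats = np.stack([fc(roi_max_pool(feature_map, b, S), W_node, b_node) for b in proposals])
# Step 2: the whole map treated as a single region gives the scene feature.
scene_feat = fc(roi_max_pool(feature_map, (0, 0, W, H), S), W_scene, b_scene)
```

In a real detector the RoI pooling would come from a library such as torchvision rather than a Python loop; the loop only makes the binning explicit.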

The recurrent unit used in both branches of the model is the GRU (gated recurrent unit).
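A minimal GRU cell, as a sketch under stated assumptions: it uses the standard convention in which the update gate z weights the previous state (as in the original GRU and in PyTorch's `torch.nn.GRUCell`), omits biases for brevity, and uses a hypothetical `GRUCell` class with small random weights. The paper's exact parameterization may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Plain GRU cell without biases: x is the input, h the previous hidden state."""

    def __init__(self, in_dim, hid_dim, rng):
        def w(rows, cols):
            return rng.standard_normal((rows, cols)) * 0.1
        self.W_r, self.U_r = w(hid_dim, in_dim), w(hid_dim, hid_dim)
        self.W_z, self.U_z = w(hid_dim, in_dim), w(hid_dim, hid_dim)
        self.W_h, self.U_h = w(hid_dim, in_dim), w(hid_dim, hid_dim)

    def __call__(self, x, h):
        r = sigmoid(self.W_r @ x + self.U_r @ h)               # reset gate
        z = sigmoid(self.W_z @ x + self.U_z @ h)               # update gate
        h_tilde = np.tanh(self.W_h @ x + self.U_h @ (r * h))   # candidate state
        return z * h + (1.0 - z) * h_tilde                     # gated new state
```

With all weights at zero both gates equal 0.5 and the candidate is 0, so the cell simply halves the incoming state; this makes a convenient sanity check.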

The structure inference part of the network works as follows.

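Each proposal corresponds to a node v_i in the structure inference figure, with feature f_i^v. Given the scene feature f^s, these two pieces of information are fed into the scene GRU, which outputs a scene-aware feature for the proposal;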
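Written as an equation, assuming (as I read the paper) that the proposal feature serves as the GRU's initial hidden state and the scene feature as its input. Here the first argument of GRU_s is the input and the second the hidden state; the name h^s_i is introduced only for illustration.

```latex
\mathbf{h}^{s}_{i} = \mathrm{GRU}_{s}\left(\mathbf{f}^{s},\ \mathbf{f}^{v}_{i}\right)
```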

Next, the relationships between different proposals are built into the model:

The spatial relationship R between each pair of proposals is computed from their positions;
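One possible spatial descriptor, sketched under an explicit assumption: the 12 terms below (widths, heights and areas of both boxes, center offsets normalized by the sender box, their squares, and log size ratios) follow my recollection of the paper's R_{j→i} and should be checked against the original. The helpers `spatial_relation` and `relation_tensor` are hypothetical names.

```python
import numpy as np

def spatial_relation(box_i, box_j):
    """Hypothetical 12-d descriptor R_{j->i} for boxes given as (x1, y1, x2, y2)."""
    def geom(b):
        x1, y1, x2, y2 = b
        w, h = x2 - x1, y2 - y1
        return (x1 + x2) / 2.0, (y1 + y2) / 2.0, w, h, w * h   # center, size, area
    xi, yi, wi, hi, si = geom(box_i)
    xj, yj, wj, hj, sj = geom(box_j)
    dx, dy = (xi - xj) / wj, (yi - yj) / hj   # center offset normalized by box j
    return np.array([wi, hi, si, wj, hj, sj,
                     dx, dy, dx ** 2, dy ** 2,
                     np.log(wi / wj), np.log(hi / hj)])

def relation_tensor(boxes):
    """Stack all ordered pairs: result[i, j] holds R_{j->i}, shape (N, N, 12)."""
    n = len(boxes)
    return np.stack([[spatial_relation(boxes[i], boxes[j]) for j in range(n)]
                     for i in range(n)])
```

For two equal 2×2 boxes side by side, only the horizontal offset terms are non-zero: dx is -1 and its square is 1.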

From R, the edge weights e are computed; max-pooling over the weighted features of the other proposals then gives the message m;
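A sketch of the edge weights and the max-pooled message. The gating form is an assumption based on my reading of the paper: e_{j→i} = relu(W_p · R_{j→i}) · tanh(W_v · [f_i, f_j]), and m_i is the element-wise maximum over j of e_{j→i} f_j. `W_p`, `W_v`, and `edge_messages` are hypothetical names, and including the self-pair j = i is a choice of this sketch.

```python
import numpy as np

def edge_messages(node_feats, R, W_p, W_v):
    """node_feats: (N, D); R: (N, N, K) with R[i, j] = R_{j->i};
    W_p: (K,); W_v: (2D,). Returns e: (N, N) and m: (N, D)."""
    n, _ = node_feats.shape
    # Spatial term: a non-negative scalar per ordered pair.
    spatial = np.maximum(R @ W_p, 0.0)                                   # (N, N)
    # Visual term: tanh of a linear map of the concatenated pair [f_i, f_j].
    pair = np.concatenate([np.repeat(node_feats[:, None, :], n, axis=1),
                           np.repeat(node_feats[None, :, :], n, axis=0)], axis=-1)
    visual = np.tanh(pair @ W_v)                                          # (N, N)
    e = spatial * visual                                                  # e[i, j] = e_{j->i}
    # Message to node i: element-wise max over senders j of e_{j->i} * f_j.
    m = (e[:, :, None] * node_feats[None, :, :]).max(axis=1)             # (N, D)
    return e, m
```

Max-pooling keeps, per feature dimension, the strongest weighted signal among all senders, so the message size does not depend on the number of proposals.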

This message is fed into the edge GRU to obtain its hidden state;
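In the same notation as the scene GRU, and again assuming the proposal feature is the initial hidden state (first argument: input, second: hidden state; h^e_i is an illustrative name):

```latex
\mathbf{h}^{e}_{i} = \mathrm{GRU}_{e}\left(\mathbf{m}_{i},\ \mathbf{f}^{v}_{i}\right)
```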

Finally, the states produced by the scene GRU and the edge GRU are combined:
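A sketch of the combination step, assuming the two outputs are merged by mean pooling, h_i = (h^s_i + h^e_i) / 2, and that the whole update is repeated for a few steps with messages recomputed from the current states. Both points reflect my reading of the paper and are assumptions here. The GRUs and the message function are passed in as callables so the loop stays independent of how they are implemented; `structure_inference` and `steps` are hypothetical names.

```python
import numpy as np

def structure_inference(node_feats, scene_feat, scene_gru, edge_gru, message_fn, steps=2):
    """Refine node states. scene_gru / edge_gru: callables (x, h) -> new h;
    message_fn: maps current (N, D) states to (N, D) edge messages."""
    h = node_feats.copy()                                  # start from proposal features
    for _ in range(steps):
        m = message_fn(h)                                  # messages from other proposals
        h_scene = np.stack([scene_gru(scene_feat, h_i) for h_i in h])
        h_edge = np.stack([edge_gru(m_i, h_i) for m_i, h_i in zip(m, h)])
        h = (h_scene + h_edge) / 2.0                       # mean-pool the two GRU outputs
    return h
```

With stand-in GRUs that return the input (scene) and the hidden state (edge), each step moves every node halfway toward the scene feature, which makes the averaging easy to follow; the refined states would then feed the box classifier and regressor.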