Faster R-CNN Paper Reading Notes

　　Paper: https://arxiv.org/pdf/1506.01497.pdf

　　Code: https://github.com/ShaoqingRen/faster_rcnn (MATLAB)
　　      https://github.com/rbgirshick/py-faster-rcnn (Python)

Abstract

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck.
#Detection networks depend on region proposals; once SPPnet [1] and Fast R-CNN [2] made detection itself fast, proposal computation became the bottleneck.

In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position.
#The Region Proposal Network (RPN) shares full-image convolutional features with the detection network, so proposals are nearly cost-free. An RPN is a fully convolutional network (not a fully connected one) that predicts object bounds and objectness scores at each position.

The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features—using the recently popular terminology of neural networks with “attention” mechanisms, the RPN component tells the unified network where to look. 
#The RPN is trained end-to-end to produce high-quality proposals for Fast R-CNN. Sharing convolutional features merges the two into a single network; in "attention" terms, the RPN tells the unified network where to look.

For the very deep VGG-16 model [3], our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
#With VGG-16 the whole system runs at 5 fps on a GPU using only 300 proposals per image, with state-of-the-art accuracy on PASCAL VOC 2007, 2012, and MS COCO. Faster R-CNN and RPN underpin the first-place entries in several tracks of the ILSVRC and COCO 2015 competitions, and the code is public.

Introduction

Recent advances in object detection are driven by the success of region proposal methods (e.g., [4]) and region-based convolutional neural networks (R-CNNs) [5]. Now, proposals are the test-time computational bottleneck in state-of-the-art detection systems.
#Region proposal methods and region-based CNNs drove recent progress in object detection, but proposals are now the test-time bottleneck.

Region proposal methods typically rely on inexpensive features and economical inference schemes. Selective Search [4], one of the most popular methods, greedily merges superpixels based on engineered low-level features. Yet when compared to efficient detection networks [2], Selective Search is an order of magnitude slower, at 2 seconds per image in a CPU implementation. EdgeBoxes [6] currently provides the best tradeoff between proposal quality and speed, at 0.2 seconds per image. Nevertheless, the region proposal step still consumes as much running time as the detection network.
#Proposal methods rely on inexpensive features and economical inference. Selective Search [4] greedily merges superpixels using hand-engineered low-level features and takes 2 seconds per image on a CPU, an order of magnitude slower than efficient detection networks [2]. EdgeBoxes [6] offers the best quality/speed tradeoff at 0.2 seconds per image, yet the proposal step still costs as much time as the detection network itself.

One may note that fast region-based CNNs take advantage of GPUs, while the region proposal methods used in research are implemented on the CPU, making such runtime comparisons inequitable. An obvious way to accelerate proposal computation is to reimplement it for the GPU.
#The runtime comparison is not entirely fair, since region-based CNNs run on the GPU while proposal methods run on the CPU. The obvious engineering fix is to reimplement proposal computation on the GPU.

In this paper, we show that an algorithmic change, computing proposals with a deep convolutional neural network, leads to an elegant and effective solution where proposal computation is nearly cost-free given the detection network’s computation.
#Instead, the paper changes the algorithm: proposals are computed by a deep convolutional network, which makes them nearly free given the detection network's computation.

Our observation is that the convolutional feature maps used by region-based detectors, like Fast R-CNN, can also be used for generating region proposals. On top of these convolutional features, we construct an RPN by adding a few additional convolutional layers that simultaneously regress region bounds and objectness scores at each location on a regular grid.
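#The convolutional feature maps used by region-based detectors such as Fast R-CNN can also generate proposals. The RPN adds a few convolutional layers on top of these features that simultaneously regress region bounds and objectness scores at each location on a regular grid.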

RPNs are designed to efficiently predict region proposals with a wide range of scales and aspect ratios. In contrast to prevalent methods [8], [9], [1], [2] that use pyramids of images (Figure 1, a) or pyramids of filters (Figure 1, b), we introduce novel “anchor” boxes that serve as references at multiple scales and aspect ratios. Our scheme can be thought of as a pyramid of regression references (Figure 1, c), which avoids enumerating images or filters of multiple scales or aspect ratios.
#RPNs predict proposals over a wide range of scales and aspect ratios. Instead of pyramids of images (Figure 1, a) or pyramids of filters (Figure 1, b), the paper introduces "anchor" boxes as references at multiple scales and aspect ratios. This amounts to a pyramid of regression references (Figure 1, c) and avoids enumerating images or filters at multiple scales or aspect ratios.

To unify RPNs with Fast R-CNN [2] object detection networks, we propose a training scheme that alternates between fine-tuning for the region proposal task and then fine-tuning for object detection, while keeping the proposals fixed.
#To unify RPNs with Fast R-CNN, training alternates between fine-tuning for the region proposal task and fine-tuning for object detection with the proposals held fixed.

We comprehensively evaluate our method on the PASCAL VOC detection benchmarks [11] where RPNs with Fast R-CNNs produce detection accuracy better than the strong baseline of Selective Search with Fast R-CNNs.
#On the PASCAL VOC benchmarks, RPN with Fast R-CNN is more accurate than the strong baseline of Selective Search with Fast R-CNN.

A preliminary version of this manuscript was published previously [10].
#A preliminary version of this manuscript appeared in [10].

Related Work

　　　a) Object Proposals

There is a large literature on object proposal methods. Comprehensive surveys and comparisons of object proposal methods can be found in [19], [20], [21].
#The literature on object proposal methods is large; [19], [20], [21] provide comprehensive surveys and comparisons.

　　　b) Deep Networks for Object Detection

The R-CNN method [5] trains CNNs end-to-end to classify the proposal regions into object categories or background. R-CNN mainly plays as a classifier, and it does not predict object bounds (except for refining by bounding box regression). Its accuracy depends on the performance of the region proposal module (see comparisons in [20]). Several papers have proposed ways of using deep networks for predicting object bounding boxes [25], [9], [26], [27].
#R-CNN [5] trains CNNs end-to-end to classify proposal regions into object categories or background. It acts mainly as a classifier and does not predict object bounds, apart from refining them by bounding box regression, so its accuracy depends on the region proposal module (see the comparisons in [20]). Several papers use deep networks to predict object bounding boxes [25], [9], [26], [27].

Shared computation of convolutions [9], [1], [29], [7], [2] has been attracting increasing attention for efficient, yet accurate, visual recognition.
#Sharing the computation of convolutions [9], [1], [29], [7], [2] is attracting growing attention as a route to efficient yet accurate visual recognition.

Faster R-CNN

Our object detection system, called Faster R-CNN, is composed of two modules. The first module is a deep fully convolutional network that proposes regions, and the second module is the Fast R-CNN detector [2] that uses the proposed regions. In Section 3.1 we introduce the designs and properties of the network for region proposal. In Section 3.2 we develop algorithms for training both modules with features shared.
#Faster R-CNN has two modules: a deep fully convolutional network that proposes regions, and the Fast R-CNN detector [2] that uses them. Section 3.1 covers the design and properties of the proposal network; Section 3.2 covers algorithms for training both modules with shared features.

　　　3.1 Region Proposal Networks

A Region Proposal Network (RPN) takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score. We model this process with a fully convolutional network [7], which we describe in this section. Because our ultimate goal is to share computation with a Fast R-CNN object detection network [2], we assume that both nets share a common set of convolutional layers. In our experiments, we investigate the Zeiler and Fergus model [32] (ZF), which has 5 shareable convolutional layers and the Simonyan and Zisserman model [3] (VGG-16), which has 13 shareable convolutional layers.
#An RPN takes an image of any size as input and outputs a set of rectangular object proposals, each with an objectness score. It is modeled as a fully convolutional network [7]. Because the goal is to share computation with Fast R-CNN [2], both networks are assumed to share a common set of convolutional layers: 5 shareable layers in the ZF model [32] and 13 in VGG-16 [3].

To generate region proposals, we slide a small network over the convolutional feature map output by the last shared convolutional layer.
#To generate proposals, a small network slides over the convolutional feature map output by the last shared convolutional layer.

　　　a). Anchors

At each sliding-window location, we simultaneously predict multiple region proposals, where the number of maximum possible proposals for each location is denoted as k. So the reg layer has 4k outputs encoding the coordinates of k boxes, and the cls layer outputs 2k scores that estimate probability of object or not object for each proposal. The k proposals are parameterized relative to k reference boxes, which we call anchors. An anchor is centered at the sliding window in question, and is associated with a scale and aspect ratio (Figure 3, left).
#Each sliding-window location predicts up to k proposals at once, so the reg layer has 4k outputs (box coordinates) and the cls layer has 2k outputs (object / not-object probabilities). The k proposals are parameterized relative to k reference boxes called anchors. Each anchor is centered at the sliding window and is associated with a scale and an aspect ratio (Figure 3, left).

　　　　Translation-Invariant Anchors

An important property of our approach is that it is translation invariant, both in terms of the anchors and the functions that compute proposals relative to the anchors. If one translates an object in an image, the proposal should translate and the same function should be able to predict the proposal in either location. This translation-invariant property is guaranteed by our method.
#The approach is translation invariant, both for the anchors and for the functions that compute proposals relative to them. If an object is translated in the image, its proposal translates with it, and the same function predicts the proposal at either location.

The translation-invariant property also reduces the model size.
#Translation invariance also reduces the model size.

　　　　Multi-Scale Anchors as Regression References

Our design of anchors presents a novel scheme for addressing multiple scales (and aspect ratios). As shown in Figure 1, there have been two popular ways for multi-scale predictions. The first way is based on image/feature pyramids, e.g., in DPM [8] and CNN-based methods [9], [1], [2]. The second way is to use sliding windows of multiple scales (and/or aspect ratios) on the feature maps.
#Anchors offer a new way to handle multiple scales and aspect ratios. As Figure 1 shows, there are two popular alternatives: image/feature pyramids, as in DPM [8] and CNN-based methods [9], [1], [2], and sliding windows of multiple scales and/or aspect ratios on the feature maps.

As a comparison, our anchor-based method is built on a pyramid of anchors, which is more cost-efficient.
#By comparison, the anchor-based method builds on a pyramid of anchors, which is more cost-efficient.

The design of multi-scale anchors is a key component for sharing features without extra cost for addressing scales.
#Multi-scale anchors are the key to sharing features without paying extra to handle scale.

　　　b). Loss Function

For training RPNs, we assign a binary class label (of being an object or not) to each anchor. We assign a positive label to two kinds of anchors: (i) the anchor/anchors with the highest Intersection-over-Union (IoU) overlap with a ground-truth box, or (ii) an anchor that has an IoU overlap higher than 0.7 with any ground-truth box. Note that a single ground-truth box may assign positive labels to multiple anchors. Usually the second condition is sufficient to determine the positive samples; but we still adopt the first condition for the reason that in some rare cases the second condition may find no positive sample. We assign a negative label to a non-positive anchor if its IoU ratio is lower than 0.3 for all ground-truth boxes. Anchors that are neither positive nor negative do not contribute to the training objective.
#Each anchor gets a binary label (object or not) for training. An anchor is positive if (i) it has the highest IoU overlap with some ground-truth box, or (ii) its IoU with any ground-truth box exceeds 0.7, so a single ground-truth box may give positive labels to several anchors. Rule (ii) usually suffices; rule (i) covers the rare cases where rule (ii) finds no positive sample. A non-positive anchor is negative if its IoU is below 0.3 for all ground-truth boxes. Anchors that are neither positive nor negative do not contribute to the training objective.
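
The labeling rule can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names `iou` and `label_anchors`, the (x1, y1, x2, y2) box format, and the use of -1 for ignored anchors are all assumptions made here, and at least one ground-truth box is assumed to be present.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, hi=0.7, lo=0.3):
    """Return one label per anchor: 1 positive, 0 negative, -1 ignored.

    Assumes gt_boxes is non-empty (hypothetical helper, not from the paper's code).
    """
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = []
    for row in overlaps:
        best = max(row)
        if best > hi:        # rule (ii): IoU above 0.7 with any ground-truth box
            labels.append(1)
        elif best < lo:      # negative: IoU below 0.3 for all ground-truth boxes
            labels.append(0)
        else:                # neither: does not contribute to the objective
            labels.append(-1)
    # rule (i): the anchor(s) with the highest IoU for each ground-truth box
    for j in range(len(gt_boxes)):
        col_best = max(row[j] for row in overlaps)
        if col_best > 0:
            for i, row in enumerate(overlaps):
                if row[j] == col_best:
                    labels[i] = 1
    return labels


gt = [(0, 0, 100, 100)]
print(label_anchors([(0, 0, 100, 100), (200, 200, 300, 300), (0, 0, 100, 50)], gt))
print(label_anchors([(0, 0, 100, 60), (300, 300, 400, 400)], gt))
```

In the second call no anchor exceeds an IoU of 0.7, so rule (i) is what supplies the positive sample.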

Our loss function for an image is defined as:
#The loss function for an image is defined as follows:
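
For reference, Eq. (1) of the paper written in LaTeX. Here i indexes the anchors in a mini-batch, p_i is the predicted probability that anchor i is an object, p_i^* is its ground-truth label (1 for a positive anchor, 0 for a negative one), and t_i and t_i^* are the predicted and ground-truth parameterized box coordinates. In the paper L_cls is a log loss over the two classes and L_reg is the smooth L1 loss; the factor p_i^* switches the regression term on for positive anchors only.

```latex
L(\{p_i\}, \{t_i\}) =
  \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```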

The two terms are normalized by N_cls and N_reg and weighted by a balancing parameter λ.
#The two terms (classification and regression) are normalized by N_cls and N_reg and balanced by the weight λ.

For bounding box regression, we adopt the parameterizations of the 4 coordinates following [5]:
#For bounding box regression, the four coordinates are parameterized as in [5]:
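
For reference, the parameterization from the paper (Eq. 2) in LaTeX. Variables x, x_a, and x^* refer to the predicted box, the anchor box, and the ground-truth box respectively, and likewise for y, w, and h.

```latex
\begin{aligned}
t_x   &= (x - x_a)/w_a,   & t_y   &= (y - y_a)/h_a, \\
t_w   &= \log(w/w_a),     & t_h   &= \log(h/h_a), \\
t_x^* &= (x^* - x_a)/w_a, & t_y^* &= (y^* - y_a)/h_a, \\
t_w^* &= \log(w^*/w_a),   & t_h^* &= \log(h^*/h_a)
\end{aligned}
```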

where x, y, w, and h denote the box’s center coordinates and its width and height.
#Here x, y, w, and h are the box's center coordinates, width, and height.
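
A small sketch of this parameterization in Python may help: `encode` maps a box to regression targets relative to an anchor, and `decode` inverts it. The function names and the (x, y, w, h) tuple layout are assumptions for illustration; only the center/width/height convention comes from the text.

```python
import math


def encode(box, anchor):
    """Regression targets (tx, ty, tw, th) of `box` relative to `anchor`.

    Both arguments are (x, y, w, h) with (x, y) the box center.
    """
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(t, anchor):
    """Invert `encode`: recover (x, y, w, h) from targets and the anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya, wa * math.exp(tw), ha * math.exp(th))


anchor = (100.0, 100.0, 128.0, 128.0)
box = (110.0, 90.0, 160.0, 96.0)
targets = encode(box, anchor)
print(targets)
print(decode(targets, anchor))  # recovers `box` up to floating-point error
```

Because offsets are divided by the anchor size and sizes are compared in log space, the targets stay in a similar numeric range for small and large anchors alike.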

　　　c). Training RPNs

The RPN can be trained end-to-end by backpropagation and stochastic gradient descent (SGD) [35]. We follow the “image-centric” sampling strategy from [2] to train this network. Each mini-batch arises from a single image that contains many positive and negative example anchors.
#The RPN is trained end-to-end by backpropagation and SGD [35], using the "image-centric" sampling strategy of [2]. Each mini-batch comes from a single image that contains many positive and negative example anchors.

We randomly initialize all new layers by drawing weights from a zero-mean Gaussian distribution with standard deviation 0.01. All other layers (i.e., the shared convolutional layers) are initialized by pre-training a model for ImageNet classification [36], as is standard practice [5]. We tune all layers of the ZF net, and conv3_1 and up for the VGG net to conserve memory [2]. We use a learning rate of 0.001 for 60k mini-batches, and 0.0001 for the next 20k mini-batches on the PASCAL VOC dataset. We use a momentum of 0.9 and a weight decay of 0.0005 [37]. Our implementation uses Caffe [38].
#New layers are initialized from a zero-mean Gaussian with standard deviation 0.01; the shared convolutional layers are initialized from an ImageNet-pretrained classification model [36], as is standard practice [5]. All layers of the ZF net are tuned, and conv3_1 and up for the VGG net to conserve memory [2]. On PASCAL VOC the learning rate is 0.001 for the first 60k mini-batches and 0.0001 for the next 20k, with momentum 0.9 and weight decay 0.0005 [37]. The implementation uses Caffe [38].
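
The hyper-parameters above can be collected into a short Python sketch. This is not the Caffe solver itself: the helper names are hypothetical, the update is written for a single scalar weight, and treating weight decay as an L2 term added to the gradient is an assumption about how the solver applies it.

```python
import random


def init_weights(n, std=0.01, seed=0):
    """Draw n weights from a zero-mean Gaussian with standard deviation `std`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n)]


def learning_rate(iteration):
    """0.001 for the first 60k mini-batches, 0.0001 afterwards (PASCAL VOC)."""
    return 0.001 if iteration < 60_000 else 0.0001


def sgd_step(w, v, grad, iteration, momentum=0.9, weight_decay=0.0005):
    """One momentum-SGD update of a scalar weight `w` with velocity `v`.

    Assumption: weight decay enters as an L2 term added to the gradient.
    """
    lr = learning_rate(iteration)
    v = momentum * v - lr * (grad + weight_decay * w)
    return w + v, v


weights = init_weights(4)
w, v = sgd_step(weights[0], 0.0, grad=0.5, iteration=0)
print(w, v)
```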

　　　3.2 Sharing Features for RPN and Fast R-CNN

Thus far we have described how to train a network for region proposal generation, without considering the region-based object detection CNN that will utilize these proposals. For the detection network, we adopt Fast R-CNN [2]. Next we describe algorithms that learn a unified network composed of RPN and Fast R-CNN with shared convolutional layers (Figure 2).
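#So far the RPN has been trained without regard to the region-based detection CNN that consumes its proposals. The detection network is Fast R-CNN [2]; what follows describes algorithms that learn a unified network in which RPN and Fast R-CNN share convolutional layers (Figure 2).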

Both RPN and Fast R-CNN, trained independently, will modify their convolutional layers in different ways. We therefore need to develop a technique that allows for sharing convolutional layers between the two networks, rather than learning two separate networks. We discuss three ways for training networks with features shared:
#If RPN and Fast R-CNN were trained independently, each would modify its convolutional layers differently. A technique is therefore needed that lets the two networks share convolutional layers rather than learning two separate networks. Three ways of training with shared features are discussed:

(i) Alternating training. In this solution, we first train RPN, and use the proposals to train Fast R-CNN. The network tuned by Fast R-CNN is then used to initialize RPN, and this process is iterated. This is the solution that is used in all experiments in this paper.
#(i) Alternating training: train the RPN first and use its proposals to train Fast R-CNN; the network tuned by Fast R-CNN then initializes the RPN, and the process repeats. All experiments in the paper use this scheme.

(ii) Approximate joint training. In this solution, the RPN and Fast R-CNN networks are merged into one network during training as in Figure 2.
#(ii) Approximate joint training: the RPN and Fast R-CNN networks are merged into one network during training, as in Figure 2.

(iii) Non-approximate joint training. As discussed above, the bounding boxes predicted by RPN are also functions of the input.
#(iii) Non-approximate joint training: the bounding boxes predicted by the RPN are themselves functions of the input.

　　　a). 4-Step Alternating Training

In this paper, we adopt a pragmatic 4-step training algorithm to learn shared features via alternating optimization. In the first step, we train the RPN as described in Section 3.1.3. This network is initialized with an ImageNet-pre-trained model and fine-tuned end-to-end for the region proposal task.
#The paper adopts a pragmatic 4-step algorithm that learns shared features by alternating optimization. Step 1 trains the RPN as described in Section 3.1.3, initialized from an ImageNet-pretrained model and fine-tuned end-to-end for the region proposal task.

In the second step, we train a separate detection network by Fast R-CNN using the proposals generated by the step-1 RPN. This detection network is also initialized by the ImageNet-pre-trained model. At this point the two networks do not share convolutional layers.
#Step 2 trains a separate Fast R-CNN detection network on the proposals generated by the step-1 RPN, also initialized from the ImageNet-pretrained model. At this point the two networks do not yet share convolutional layers.

In the third step, we use the detector network to initialize RPN training, but we fix the shared convolutional layers and only fine-tune the layers unique to RPN. Now the two networks share convolutional layers.
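#Step 3 initializes RPN training from the detector network, keeps the shared convolutional layers fixed, and fine-tunes only the layers unique to the RPN. The two networks now share convolutional layers.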

Finally, keeping the shared convolutional layers fixed, we fine-tune the unique layers of Fast R-CNN. As such, both networks share the same convolutional layers and form a unified network.
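#Step 4 keeps the shared convolutional layers fixed and fine-tunes the layers unique to Fast R-CNN. Both networks now share the same convolutional layers and form a unified network.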
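
The four steps can be written down as a small schedule table in Python, which makes it easy to see when the shared layers become frozen. The `Step` record and its field names are hypothetical; the table only restates what each step trains, where its convolutional layers are initialized from, and whether the shared layers are fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    trains: str          # network whose layers are optimized in this step
    init_from: str       # source of the convolutional layers' initial weights
    shared_frozen: bool  # True if the shared conv layers are kept fixed


SCHEDULE = [
    Step(trains="RPN", init_from="ImageNet model", shared_frozen=False),
    Step(trains="Fast R-CNN", init_from="ImageNet model", shared_frozen=False),
    Step(trains="RPN", init_from="step-2 detector", shared_frozen=True),
    Step(trains="Fast R-CNN", init_from="step-2 detector", shared_frozen=True),
]

for number, step in enumerate(SCHEDULE, start=1):
    mode = "fixed" if step.shared_frozen else "fine-tuned"
    print(f"step {number}: train {step.trains}, init from {step.init_from}, "
          f"shared conv layers {mode}")
```

Steps 1 and 2 each fine-tune their own copy of the convolutional layers; sharing only begins in step 3, when the detector's layers are reused and held fixed.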

　　　3.3 Implementation Details

We train and test both region proposal and object detection networks on images of a single scale [1], [2]. We re-scale the images such that their shorter side is s = 600 pixels [2]. Multi-scale feature extraction (using an image pyramid) may improve accuracy but does not exhibit a good speed-accuracy trade-off [2].
#Both networks are trained and tested on single-scale images [1], [2], rescaled so that the shorter side is s = 600 pixels [2]. Multi-scale feature extraction with an image pyramid may improve accuracy but does not offer a good speed-accuracy trade-off [2].

On the re-scaled images, the total stride for both ZF and VGG nets on the last convolutional layer is 16 pixels, and thus is ∼10 pixels on a typical PASCAL image before resizing (∼500×375). Even such a large stride provides good results, though accuracy may be further improved with a smaller stride.
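#On the rescaled images the total stride of both ZF and VGG at the last convolutional layer is 16 pixels, or about 10 pixels on a typical PASCAL image before resizing (about 500×375). Even this large stride gives good results, though a smaller stride might improve accuracy further.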
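
The "about 10 pixels" figure follows from simple arithmetic, sketched here in Python. The function name is made up for illustration; the numbers (stride 16, shorter side rescaled to 600, original shorter side 375) are the ones quoted above.

```python
def effective_stride(orig_short_side, target_short_side=600, net_stride=16):
    """Stride measured in pixels of the original (un-resized) image."""
    # Resizing multiplies lengths by target/orig, so one resized pixel
    # corresponds to orig/target original pixels.
    return net_stride * orig_short_side / target_short_side


print(effective_stride(375))  # 10.0
```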

For anchors, we use 3 scales with box areas of 128², 256², and 512² pixels, and 3 aspect ratios of 1:1, 1:2, and 2:1. These hyper-parameters are not carefully chosen for a particular dataset, and we provide ablation experiments on their effects in the next section.
#Anchors use 3 scales, with box areas of 128², 256², and 512² pixels, and 3 aspect ratios, 1:1, 1:2, and 2:1. These hyper-parameters were not carefully tuned for a particular dataset; ablation experiments on their effect appear in the next section.
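
As a rough illustration, the k = 9 anchors at one sliding-window position can be generated as follows. This is a sketch rather than the reference implementation: `make_anchors` is a hypothetical helper, and reading each ratio as height/width is an assumption (the text does not say which side comes first).

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchors as (x1, y1, x2, y2).

    Every anchor is centered at (cx, cy), has area scale**2, and has
    height/width equal to ratio (assumed convention).
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(0.0, 0.0)
k = len(anchors)
print(k, 4 * k, 2 * k)  # 9 36 18: anchors, reg outputs, cls outputs per position
```

Keeping the area fixed while varying the ratio is what lets the three shapes at each scale cover a comparable amount of the image.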

As discussed, our solution does not need an image pyramid or filter pyramid to predict regions of multiple scales, saving considerable running time. Figure 3 (right) shows the capability of our method for a wide range of scales and aspect ratios. Table 1 shows the learned average proposal size for each anchor using the ZF net.
#No image pyramid or filter pyramid is needed to predict regions at multiple scales, which saves considerable running time. Figure 3 (right) shows the method handling a wide range of scales and aspect ratios, and Table 1 lists the learned average proposal size for each anchor with the ZF net.

We note that our algorithm allows predictions that are larger than the underlying receptive field. Such predictions are not impossible: one may still roughly infer the extent of an object if only the middle of the object is visible.
#Predictions may be larger than the underlying receptive field. This is not unreasonable: the extent of an object can still be roughly inferred when only its middle is visible.

The anchor boxes that cross image boundaries need to be handled with care. During training, we ignore all cross-boundary anchors so they do not contribute to the loss.
#Anchor boxes that cross the image boundary need care. During training all cross-boundary anchors are ignored, so they do not contribute to the loss.

For a typical 1000 × 600 image, there will be roughly 20000 (≈ 60 × 40 × 9) anchors in total. With the cross-boundary anchors ignored, there are about 6000 anchors per image for training.
#A typical 1000 × 600 image yields roughly 20000 (≈ 60 × 40 × 9) anchors in total; with cross-boundary anchors ignored, about 6000 anchors per image remain for training.
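
The counting can be reproduced approximately in Python. Several choices below are assumptions, not taken from the paper: the feature map size is taken as the floor of the image size divided by the stride, anchor centers sit at the middle of each stride cell, and ratios are read as height/width. The exact number of inside anchors depends on such conventions, so it need not match the paper's figure of about 6000.

```python
import math


def anchor_sizes(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """(width, height) of the 9 anchor shapes; ratio is height/width (assumed)."""
    return [(s / math.sqrt(r), s * math.sqrt(r)) for s in scales for r in ratios]


def count_anchors(img_w=1000, img_h=600, stride=16):
    """Return (total anchors, anchors lying fully inside the image)."""
    feat_w, feat_h = img_w // stride, img_h // stride  # 62 x 37 for 1000 x 600
    total = inside = 0
    for iy in range(feat_h):
        for ix in range(feat_w):
            cx = ix * stride + stride / 2
            cy = iy * stride + stride / 2
            for w, h in anchor_sizes():
                total += 1
                if (cx - w / 2 >= 0 and cy - h / 2 >= 0
                        and cx + w / 2 <= img_w and cy + h / 2 <= img_h):
                    inside += 1
    return total, inside


total, inside = count_anchors()
print(total, inside)  # total is 62 * 37 * 9 = 20646, i.e. roughly 20000
```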

If the boundary-crossing outliers are not ignored in training, they introduce large, difficult-to-correct error terms in the objective, and training does not converge. During testing, however, we still apply the fully convolutional RPN to the entire image. This may generate cross-boundary proposal boxes, which we clip to the image boundary.
#If cross-boundary anchors are kept during training, they introduce large error terms that are difficult to correct, and training does not converge. At test time the fully convolutional RPN is still applied to the entire image; this may produce cross-boundary proposal boxes, which are clipped to the image boundary.
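
Clipping a proposal to the image boundary is a one-liner per coordinate. The sketch below assumes boxes are (x1, y1, x2, y2) and that the valid range is [0, img_w] by [0, img_h]; `clip_box` is a name chosen here for illustration.

```python
def clip_box(box, img_w, img_h):
    """Clamp each corner of an (x1, y1, x2, y2) box into the image rectangle."""
    x1, y1, x2, y2 = box
    return (
        min(max(x1, 0), img_w),
        min(max(y1, 0), img_h),
        min(max(x2, 0), img_w),
        min(max(y2, 0), img_h),
    )


print(clip_box((-20, 50, 1100, 700), 1000, 600))  # (0, 50, 1000, 600)
```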

Some RPN proposals highly overlap with each other. To reduce redundancy, we adopt non-maximum suppression (NMS) on the proposal regions based on their cls scores. We fix the IoU threshold for NMS at 0.7, which leaves us about 2000 proposal regions per image.
#Some RPN proposals overlap heavily. To reduce redundancy, non-maximum suppression (NMS) is applied to the proposals based on their cls scores, with the IoU threshold fixed at 0.7, which leaves about 2000 proposals per image.

As we will show, NMS does not harm the ultimate detection accuracy, but substantially reduces the number of proposals. After NMS, we use the top-N ranked proposal regions for detection. In the following, we train Fast R-CNN using 2000 RPN proposals, but evaluate different numbers of proposals at test-time.
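#NMS does not harm the final detection accuracy but substantially reduces the number of proposals. After NMS the top-N ranked proposals are used for detection: Fast R-CNN is trained with 2000 RPN proposals, while different numbers of proposals are evaluated at test time.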
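
A greedy NMS followed by a top-N cut can be sketched in a few lines of Python. This is a plain O(n²) illustration, not the GPU implementation used in the released code; the function names, the (x1, y1, x2, y2) box format, and the `top_n` parameter are assumptions.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.7, top_n=None):
    """Return indices of kept boxes, highest score first.

    A box is dropped if its IoU with an already kept box exceeds iou_thresh.
    If top_n is given, stop once that many boxes have been kept.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
            if top_n is not None and len(keep) == top_n:
                break
    return keep


boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))           # [0, 2]: box 1 overlaps box 0 with IoU ≈ 0.82
print(nms(boxes, scores, top_n=1))  # [0]
```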

Experiments

　　　4.1 Experiments on PASCAL VOC

　　　4.2 Experiments on MS COCO

　　　4.3 From MS COCO to PASCAL VOC

Conclusion

We have presented RPNs for efficient and accurate region proposal generation. By sharing convolutional features with the down-stream detection network, the region proposal step is nearly cost-free.
#RPNs generate region proposals efficiently and accurately. Because convolutional features are shared with the downstream detection network, the proposal step is nearly cost-free.

Our method enables a unified, deep-learning-based object detection system to run at near real-time frame rates. The learned RPN also improves region proposal quality and thus the overall object detection accuracy.
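#The method lets a unified, deep-learning-based detection system run at near real-time frame rates, and the learned RPN improves proposal quality and therefore overall detection accuracy.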