The Object Detection Dataset

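There are no small datasets in the object detection field comparable to MNIST or Fashion-MNIST. To test models quickly, we will assemble a small dataset. First, we use an open-source 3D Pikachu model to generate 1000 Pikachu images of different angles and sizes. Then, we collect a series of background images and place a Pikachu image at a random position on each of them. We use the im2rec tool provided by MXNet to convert the images to the binary RecordIO format [1]. This format reduces the storage overhead of the dataset on disk and improves reading efficiency. For more information about how to read images, refer to the documentation of the GluonCV toolkit.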
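To make the synthesis step concrete, here is a minimal sketch of how the label for one synthetic image could be produced: pick a random position at which the object fits on the background, then record its corners in normalized coordinates. The function name place_object and the label layout [class, xmin, ymin, xmax, ymax] (chosen to match the label format described later in this section) are illustrative assumptions, not the script used to build the actual Pikachu dataset. Pasting the pixel data itself is omitted.

```python
import random

def place_object(bg_w, bg_h, obj_w, obj_h, class_id=0, rng=random):
    """Pick a random top-left corner so the object fits entirely inside
    the background, and return a label [class, xmin, ymin, xmax, ymax]
    with coordinates normalized to the range [0, 1]."""
    if obj_w > bg_w or obj_h > bg_h:
        raise ValueError("object must fit inside the background")
    left = rng.randint(0, bg_w - obj_w)  # randint is inclusive on both ends
    top = rng.randint(0, bg_h - obj_h)
    return [class_id,
            left / bg_w, top / bg_h,
            (left + obj_w) / bg_w, (top + obj_h) / bg_h]

random.seed(0)  # hypothetical seed, only to make the sketch repeatable
label = place_object(bg_w=512, bg_h=512, obj_w=64, obj_h=96)
print(label)
```

Dividing by the background size keeps the label valid if the image is later resized, which is why normalized coordinates are convenient here.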

1. Downloading the Dataset

The Pikachu dataset in RecordIO format can be downloaded directly from the Internet.

%matplotlib inline
from d2l import mxnet as d2l
from mxnet import gluon, image, np, npx
import os

npx.set_np()

#@save
d2l.DATA_HUB['pikachu'] = (d2l.DATA_URL + 'pikachu.zip',
                           '68ab1bd42143c5966785eb0d7b2839df8d570190')

2. Reading the Dataset

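We read the object detection dataset by creating an instance of ImageDetIter. The "Det" in the name refers to detection. We read the training dataset in random order. Because the dataset is stored in RecordIO format, we need the image index file 'train.idx' to read random minibatches. In addition, for each image of the training set, we apply random cropping and require the cropped image to cover at least 95% of each object. Since the cropping is random, this requirement is not always satisfied. We set the maximum number of random cropping attempts to 200. If none of them meets the requirement, the image is not cropped. To keep the output deterministic, we do not randomly crop the images in the test dataset, nor do we need to read the test dataset in random order.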

#@save
def load_data_pikachu(batch_size, edge_size=256):
    """Load the pikachu dataset."""
    data_dir = d2l.download_extract('pikachu')
    train_iter = image.ImageDetIter(
        path_imgrec=os.path.join(data_dir, 'train.rec'),
        path_imgidx=os.path.join(data_dir, 'train.idx'),
        batch_size=batch_size,
        data_shape=(3, edge_size, edge_size),  # The shape of the output image
        shuffle=True,  # Read the dataset in random order
        rand_crop=1,  # The probability of random cropping is 1
        min_object_covered=0.95, max_attempts=200)
    val_iter = image.ImageDetIter(
        path_imgrec=os.path.join(data_dir, 'val.rec'), batch_size=batch_size,
        data_shape=(3, edge_size, edge_size), shuffle=False)
    return train_iter, val_iter
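
The cropping rule described above (retry up to max_attempts times, accept a crop only if it covers at least min_object_covered of every object, otherwise leave the image uncropped) can be sketched in plain Python as follows. This is only an illustration of the retry logic under simplified assumptions: the helper names covered_fraction and sample_crop are hypothetical, crops are drawn uniformly at random, and any further constraints that MXNet's own augmenter applies are not modeled.

```python
import random

def covered_fraction(box, crop):
    """Fraction of the area of `box` that lies inside `crop`.
    Both are (xmin, ymin, xmax, ymax) in normalized coordinates."""
    ix = max(0.0, min(box[2], crop[2]) - max(box[0], crop[0]))
    iy = max(0.0, min(box[3], crop[3]) - max(box[1], crop[1]))
    area = (box[2] - box[0]) * (box[3] - box[1])
    return ix * iy / area if area > 0 else 0.0

def sample_crop(boxes, min_object_covered=0.95, max_attempts=200, rng=random):
    """Try up to `max_attempts` random crops and return the first one that
    covers at least `min_object_covered` of every box. Return None if no
    attempt qualifies, meaning the image should be left uncropped."""
    for _ in range(max_attempts):
        x0, x1 = sorted((rng.random(), rng.random()))
        y0, y1 = sorted((rng.random(), rng.random()))
        crop = (x0, y0, x1, y1)
        if all(covered_fraction(b, crop) >= min_object_covered for b in boxes):
            return crop
    return None

random.seed(0)  # hypothetical seed, only to make the sketch repeatable
box = (0.4, 0.4, 0.6, 0.6)
crop = sample_crop([box])
print(crop)
```

Returning None rather than raising an error mirrors the fallback stated in the text: when every attempt fails, the original image is used as is.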

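Below, we read a minibatch and print the shapes of the image and label. The shape of the image is the same as in the previous experiment: (batch size, number of channels, height, width). The shape of the label is (batch size, $m$, 5), where $m$ is the maximum number of bounding boxes contained in a single image in the dataset. Although minibatch computation is very efficient, it requires every image to contain the same number of bounding boxes so that they can be placed in the same batch. Since each image may have a different number of bounding boxes, images with fewer than $m$ bounding boxes are padded with illegal bounding boxes until each image contains $m$ of them. Thus, we can read a minibatch of images each time. The label of each bounding box in the image is represented by an array of length 5. The first element in the array is the category of the object contained in the bounding box. When the value is -1, the bounding box is an illegal bounding box used for padding. The remaining four elements of the array are the $x, y$ axis coordinates of the upper-left corner and the lower-right corner of the bounding box (values range between 0 and 1). The Pikachu dataset here has only one bounding box per image, so $m = 1$.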

batch_size, edge_size = 32, 256
train_iter, _ = load_data_pikachu(batch_size, edge_size)
batch = train_iter.next()
batch.data[0].shape, batch.label[0].shape

Downloading ../data/pikachu.zip from http://d2l-data.s3-accelerate.amazonaws.com/pikachu.zip...

((32, 3, 256, 256), (32, 1, 5))
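
Because the Pikachu dataset has $m = 1$, the padding described above never shows up in its labels. The following sketch illustrates the idea on made-up labels with differing box counts. The function pad_labels and the sample data are hypothetical, and filling all five entries of a padding row with -1 is an assumption (the text only specifies that the class entry is -1).

```python
def pad_labels(labels_per_image, pad_value=-1.0):
    """Pad each image's list of [class, xmin, ymin, xmax, ymax] rows with
    filler rows so that every image has m rows, where m is the largest
    number of boxes found in any image."""
    m = max((len(rows) for rows in labels_per_image), default=0)
    filler = [pad_value] * 5
    return [list(rows) + [list(filler) for _ in range(m - len(rows))]
            for rows in labels_per_image]

raw_labels = [
    [[0, 0.1, 0.1, 0.4, 0.5], [1, 0.5, 0.5, 0.9, 0.9]],  # image with two boxes
    [[0, 0.2, 0.3, 0.6, 0.7]],                           # image with one box
]
padded = pad_labels(raw_labels)

# Every image now has m = 2 rows, so the labels form a (2, 2, 5) array.
# Padding rows are recognized by class -1 and can be filtered out again.
valid = [[row for row in rows if row[0] != -1] for rows in padded]
print(padded[1])
```

A model's loss computation would typically skip rows whose class is -1 so that padding does not affect training.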

3. Demonstration

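Here we show ten images with their bounding boxes. We can see that the angle, size, and position of Pikachu differ in each image. Of course, this is a simple artificial dataset. In practice, the data is usually much more complicated.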

imgs = (batch.data[0][0:10].transpose(0, 2, 3, 1)) / 255
axes = d2l.show_images(imgs, 2, 5, scale=2)
for ax, label in zip(axes, batch.label[0][0:10]):
    d2l.show_bboxes(ax, [label[0][1:5] * edge_size], colors=['w'])
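
The plotting code multiplies the normalized coordinates by edge_size, which works because the images are square. As a small sketch of the general case, the hypothetical helper to_pixel_box below scales the x coordinates by the width and the y coordinates by the height.

```python
def to_pixel_box(label, width, height):
    """Convert a label [class, xmin, ymin, xmax, ymax] with normalized
    coordinates into (xmin, ymin, xmax, ymax) measured in pixels."""
    _, xmin, ymin, xmax, ymax = label
    return (xmin * width, ymin * height, xmax * width, ymax * height)

pixel_box = to_pixel_box([0, 0.25, 0.5, 0.75, 1.0], width=256, height=256)
print(pixel_box)  # (64.0, 128.0, 192.0, 256.0)
```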

4. Summary

The Pikachu dataset we synthesized can be used to test object detection models.
The data reading for object detection is similar to that for image classification. However, after we introduce bounding boxes, the label shape and image augmentation (e.g., random cropping) are changed.