数据集标签制作

学习目标检测已经有段时间了，以前都是拿着别人写好的相关代码（api）来用，没有自己好好总结琢磨，想到自己以后工作后估计还是要去用到，这不，再次从最基本的数据和标签准备环节进行。

目标检测领域基本数据类型用的多无非就是VOC、COCO两种，下面就记录一下这两种数据类型的获取。

1. VOC 数据格式

VOC 数据（VOC2007）的目录

├── Annotations 进行 detection 任务时的标签文件，xml 形式，文件名与图片名一一对应
├── ImageSets 包含三个子文件夹 Layout、Main、Segmentation，其中 Main 存放的是分类和检测的数据集分割文件
├── JPEGImages 存放 .jpg 格式的图片文件
├── SegmentationClass 存放按照 class 分割的图片
└── SegmentationObject 存放按照 object 分割的图片

├── Main
│   ├── train.txt 写着用于训练的图片名称， 共 2501 个
│   ├── val.txt 写着用于验证的图片名称，共 2510 个
│   ├── trainval.txt train与val的合集。共 5011 个
│   ├── test.txt 写着用于测试的图片名称，共 4952 个

Annotations 下的XML

<annotation>  
    <folder>VOC2012</folder>                             
    <filename>2007_000392.jpg</filename>                             //文件名  
    <source>                                                         //图像来源（不重要）  
        <database>The VOC2007 Database</database>  
        <annotation>PASCAL VOC2007</annotation>  
        <image>flickr</image>  
    </source>  
    <size>                                            //图像尺寸（长宽以及通道数）                        
        <width>500</width>  
        <height>332</height>  
        <depth>3</depth>  
    </size>  
    <segmented>1</segmented>            //是否用于分割（在图像物体识别中01无所谓）  
    <object>                              //检测到的物体  
        <name>horse</name>                                         //物体类别  
        <pose>Right</pose>                                         //拍摄角度  
        <truncated>0</truncated>                                   //是否被截断（0表示完整）  
        <difficult>0</difficult>                                   //目标是否难以识别（0表示容易识别）  
        <bndbox>                                                   //bounding-box（包含左下角和右上角xy坐标）  
            <xmin>100</xmin>  
            <ymin>96</ymin>  
            <xmax>355</xmax>  
            <ymax>324</ymax>  
        </bndbox>  
    </object>  
    <object>              //检测到多个物体  
        <name>person</name>  
        <pose>Unspecified</pose>  
        <truncated>0</truncated>  
        <difficult>0</difficult>  
        <bndbox>  
            <xmin>198</xmin>  
            <ymin>58</ymin>  
            <xmax>286</xmax>  
            <ymax>197</ymax>  
        </bndbox>  
    </object>  
</annotation>

2. COCO数据类型格式

Object Instance 类型的标注格式

1，整体JSON文件格式

比如instances_train2017.json、instances_val2017.json这两个文件就是这种格式。

Object Instance这种格式的文件从头至尾按照顺序分为这几个模块

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}

下面介绍一下这个字典所包含的信息

images数组元素的数量等同于划入训练集（或者测试集）的图片的数量；

annotations数组元素的数量等同于训练集（或者测试集）中bounding box的数量；

categories数组元素的数量为80（2017年）；

>>> ann_train_file='annotations/instances_train2017.json'
>>> coco_train = COCO(ann_train_file)
loading annotations into memory...
Done (t=19.30s)
creating index...
index created!
 
>>> len(coco_train.dataset['categories'])
80
>>> len(coco_train.dataset['images'])
118287
>>> len(coco_train.dataset['annotations'])
860001

info 字段：数据集介绍说明

info: {
    "year": int,
    "version": str,
    "description": str,
    "contributor": str,
    "url": str,
    "date_created": datetime,
}

images：图片的信息，包括图像名称、图像大小，来源等。

{
	"license":3,
	"file_name":"COCO_val2014_000000391895.jpg",
	"coco_url":"http://mscoco.org/images/391895",
	"height":360,"width":640,"date_captured":"2013-11-14 11:18:45",
	"flickr_url":"http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg",
	"id":391895
},

annotations：目标的一些标签信息。

annotation{
    "id": int,    
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}


categories：一个包含多个category实例的数组，而category结构体描述如下：

{
    "id": int,
    "name": str,
    "supercategory": str,
}

3. LabelImg convert to COCO

import os
import json
import numpy as np
import glob
import shutil
from sklearn.model_selection import train_test_split
np.random.seed(41)

#0为背景
classname_to_id = {"person": 1}

class Lableme2CoCo:

    def __init__(self):
        self.images = []
        self.annotations = []
        self.categories = []
        self.img_id = 0
        self.ann_id = 0

    def save_coco_json(self, instance, save_path):
        json.dump(instance, open(save_path, 'w', encoding='utf-8'), ensure_ascii=False, indent=1)  # indent=2 更加美观显示

    # 由json文件构建COCO
    def to_coco(self, json_path_list):
        self._init_categories()
        for json_path in json_path_list:
            obj = self.read_jsonfile(json_path)
            self.images.append(self._image(obj, json_path))
            shapes = obj['shapes']
            for shape in shapes:
                annotation = self._annotation(shape)
                self.annotations.append(annotation)
                self.ann_id += 1
            self.img_id += 1
        instance = {}
        instance['info'] = 'spytensor created'
        instance['license'] = ['license']
        instance['images'] = self.images
        instance['annotations'] = self.annotations
        instance['categories'] = self.categories
        return instance

    # 构建类别
    def _init_categories(self):
        for k, v in classname_to_id.items():
            category = {}
            category['id'] = v
            category['name'] = k
            self.categories.append(category)

    # 构建COCO的image字段
    def _image(self, obj, path):
        image = {}
        from labelme import utils
        img_x = utils.img_b64_to_arr(obj['imageData'])
        h, w = img_x.shape[:-1]
        image['height'] = h
        image['width'] = w
        image['id'] = self.img_id
        image['file_name'] = os.path.basename(path).replace(".json", ".jpg")
        return image

    # 构建COCO的annotation字段
    def _annotation(self, shape):
        label = shape['label']
        points = shape['points']
        annotation = {}
        annotation['id'] = self.ann_id
        annotation['image_id'] = self.img_id
        annotation['category_id'] = int(classname_to_id[label])
        annotation['segmentation'] = [np.asarray(points).flatten().tolist()]
        annotation['bbox'] = self._get_box(points)
        annotation['iscrowd'] = 0
        annotation['area'] = 1.0
        return annotation

    # 读取json文件，返回一个json对象
    def read_jsonfile(self, path):
        with open(path, "r", encoding='utf-8') as f:
            return json.load(f)

    # COCO的格式： [x1,y1,w,h] 对应COCO的bbox格式
    def _get_box(self, points):
        min_x = min_y = np.inf
        max_x = max_y = 0
        for x, y in points:
            min_x = min(min_x, x)
            min_y = min(min_y, y)
            max_x = max(max_x, x)
            max_y = max(max_y, y)
        return [min_x, min_y, max_x - min_x, max_y - min_y]


if __name__ == '__main__':
    labelme_path = "labelme/"
    saved_coco_path = "./"
    # 创建文件
    if not os.path.exists("%scoco/annotations/"%saved_coco_path):
        os.makedirs("%scoco/annotations/"%saved_coco_path)
    if not os.path.exists("%scoco/images/train2017/"%saved_coco_path):
        os.makedirs("%scoco/images/train2017"%saved_coco_path)
    if not os.path.exists("%scoco/images/val2017/"%saved_coco_path):
        os.makedirs("%scoco/images/val2017"%saved_coco_path)
    # 获取images目录下所有的joson文件列表
    json_list_path = glob.glob(labelme_path + "/*.json")
    # 数据划分,这里没有区分val2017和tran2017目录，所有图片都放在images目录下
    train_path, val_path = train_test_split(json_list_path, test_size=0.12)
    print("train_n:", len(train_path), 'val_n:', len(val_path))

    # 把训练集转化为COCO的json格式
    l2c_train = Lableme2CoCo()
    train_instance = l2c_train.to_coco(train_path)
    l2c_train.save_coco_json(train_instance, '%scoco/annotations/instances_train2017.json'%saved_coco_path)
    for file in train_path:
        shutil.copy(file.replace("json","jpg"),"%scoco/images/train2017/"%saved_coco_path)
    for file in val_path:
        shutil.copy(file.replace("json","jpg"),"%scoco/images/val2017/"%saved_coco_path)

    # 把验证集转化为COCO的json格式
    l2c_val = Lableme2CoCo()
    val_instance = l2c_val.to_coco(val_path)
    l2c_val.save_coco_json(val_instance, '%scoco/annotations/instances_val2017.json'%saved_coco_path)

4. LabelImg convert to VOC

import os
import numpy as np
import codecs
import json
from glob import glob
import cv2
import shutil
from sklearn.model_selection import train_test_split
#1.标签路径
labelme_path = "./labelme/"              #原始labelme标注数据路径
saved_path = "./VOCdevkit/VOC2007/"                #保存路径

#2.创建要求文件夹
if not os.path.exists(saved_path + "Annotations"):
    os.makedirs(saved_path + "Annotations")
if not os.path.exists(saved_path + "JPEGImages/"):
    os.makedirs(saved_path + "JPEGImages/")
if not os.path.exists(saved_path + "ImageSets/Main/"):
    os.makedirs(saved_path + "ImageSets/Main/")
    
#3.获取待处理文件
files = glob(labelme_path + "*.json")
files = [i.split("/")[-1].split(".json")[0] for i in files]

#4.读取标注信息并写入 xml
for json_file_ in files:
    json_filename = labelme_path + json_file_ + ".json"
    json_file = json.load(open(json_filename,"r",encoding="utf-8"))
    height, width, channels = cv2.imread(labelme_path + json_file_ +".jpg").shape
    with codecs.open(saved_path + "Annotations/"+json_file_ + ".xml","w","utf-8") as xml:
        xml.write('<annotation>
')
        xml.write('	<folder>' + 'UAV_data' + '</folder>
')
        xml.write('	<filename>' + json_file_ + ".jpg" + '</filename>
')
        xml.write('	<source>
')
        xml.write('		<database>The UAV autolanding</database>
')
        xml.write('		<annotation>UAV AutoLanding</annotation>
')
        xml.write('		<image>flickr</image>
')
        xml.write('		<flickrid>NULL</flickrid>
')
        xml.write('	</source>
')
        xml.write('	<owner>
')
        xml.write('		<flickrid>NULL</flickrid>
')
        xml.write('		<name>ChaojieZhu</name>
')
        xml.write('	</owner>
')
        xml.write('	<size>
')
        xml.write('		<width>'+ str(width) + '</width>
')
        xml.write('		<height>'+ str(height) + '</height>
')
        xml.write('		<depth>' + str(channels) + '</depth>
')
        xml.write('	</size>
')
        xml.write('		<segmented>0</segmented>
')
        for multi in json_file["shapes"]:
            points = np.array(multi["points"])
            xmin = min(points[:,0])
            xmax = max(points[:,0])
            ymin = min(points[:,1])
            ymax = max(points[:,1])
            label = multi["label"]
            if xmax <= xmin:
                pass
            elif ymax <= ymin:
                pass
            else:
                xml.write('	<object>
')
                xml.write('		<name>'+label+'</name>
')
                xml.write('		<pose>Unspecified</pose>
')
                xml.write('		<truncated>1</truncated>
')
                xml.write('		<difficult>0</difficult>
')
                xml.write('		<bndbox>
')
                xml.write('			<xmin>' + str(xmin) + '</xmin>
')
                xml.write('			<ymin>' + str(ymin) + '</ymin>
')
                xml.write('			<xmax>' + str(xmax) + '</xmax>
')
                xml.write('			<ymax>' + str(ymax) + '</ymax>
')
                xml.write('		</bndbox>
')
                xml.write('	</object>
')
                print(json_filename,xmin,ymin,xmax,ymax,label)
        xml.write('</annotation>')
        
#5.复制图片到 VOC2007/JPEGImages/下
image_files = glob(labelme_path + "*.jpg")
print("copy image files to VOC007/JPEGImages/")
for image in image_files:
    shutil.copy(image,saved_path +"JPEGImages/")
    
#6.split files for txt
txtsavepath = saved_path + "ImageSets/Main/"
ftrainval = open(txtsavepath+'/trainval.txt', 'w')
ftest = open(txtsavepath+'/test.txt', 'w')
ftrain = open(txtsavepath+'/train.txt', 'w')
fval = open(txtsavepath+'/val.txt', 'w')
total_files = glob("./VOC2007/Annotations/*.xml")
total_files = [i.split("/")[-1].split(".xml")[0] for i in total_files]
#test_filepath = ""
for file in total_files:
    ftrainval.write(file + "
")
#test
#for file in os.listdir(test_filepath):
#    ftest.write(file.split(".jpg")[0] + "
")
#split
train_files,val_files = train_test_split(total_files,test_size=0.15,random_state=42)
#train
for file in train_files:
    ftrain.write(file + "
")
#val
for file in val_files:
    fval.write(file + "
")

ftrainval.close()
ftrain.close()
fval.close()
#ftest.close()