Software Engineering Slacking-Off Daily: Downloading and Processing the AI Challenger Dataset 4/16

The dataset used here is the AI Challenger 2018 plant disease dataset:

https://challenger.ai/dataset/pdd2018

After unzipping the archive you get a set of image folders and JSON annotation files.
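Before writing any conversion code it helps to look inside an annotation file. The sketch below assumes, as the preprocessing script further down does, that each JSON file holds a list of records with image_id and disease_class keys. inspect_annotations is a hypothetical helper name, not part of the dataset release.

```python
import json
from collections import Counter


def inspect_annotations(json_path):
    """Summarise an annotation file, assumed to be a JSON list of
    {"image_id": ..., "disease_class": ...} records."""
    with open(json_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    # Count how many images fall into each disease class
    per_class = Counter(r["disease_class"] for r in records)
    # Total entries, per-class counts, and one sample record to eyeball the keys
    return len(records), per_class, records[0]
```

Calling it on AgriculturalDisease_train_annotations.json and on the validation file shows how many entries each one really has. Looping over that count (or over the list itself) rather than a hard-coded number keeps the conversion from running past the end of the smaller validation list.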


Since the PaddlePaddle CNN reads its input from a txt file of (path, label) pairs, one pair per line, the dataset is preprocessed as follows:

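import json


# Read the local dataset. The 2018 release ships JSON annotations that pair each
# image file name with its class; change data_root_path to your own location.
def create_data_list(data_root_path):
    # Training set: load the annotation list once, then write "path<TAB>label" lines
    with open(data_root_path + '_trainingset/AgriculturalDisease_train_annotations.json', 'r', encoding='utf-8') as f:
        a = json.load(f)
    print(len(a), a[-1])  # sanity check: number of entries and the last one
    # 'w' instead of 'a' so that re-running the script does not append duplicate lines
    with open("C:/Users/14997/Desktop/database/train.list", 'w') as out:
        for item in a:  # every entry, instead of a hard-coded count that skipped the last one
            out.write(data_root_path + "_trainingset/images/" + item["image_id"] + "\t%d\n" % item["disease_class"])

    # Validation set, written out as the test list
    with open(data_root_path + '_validationset/AgriculturalDisease_validation_annotations.json', 'r', encoding='utf-8') as h:
        b = json.load(h)
    print(len(b), b[-1])
    with open("C:/Users/14997/Desktop/database/test.list", 'w') as out:
        for item in b:  # loop over the entries the file actually has; it is far smaller than the training set
            out.write(data_root_path + "_validationset/images/" + item["image_id"] + "\t%d\n" % item["disease_class"])


create_data_list('C:/Users/14997/Desktop/database/AgriculturalDisease')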
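Each line the script writes is just an image path, a tab, and an integer label. A minimal sketch that isolates that format into a writer and a parser, assuming tab-separated fields and integer class ids (format_line and parse_line are hypothetical names):

```python
def format_line(image_path, label):
    # One sample per line: "<path>\t<label>\n"
    return "%s\t%d\n" % (image_path, label)


def parse_line(line):
    # Drop the line ending (including a Windows "\r"), then split on the LAST tab
    path, label = line.rstrip("\r\n").rsplit("\t", 1)
    return path, int(label)


line = format_line("images/0001.jpg", 12)
print(repr(line))        # 'images/0001.jpg\t12\n'
print(parse_line(line))  # ('images/0001.jpg', 12)
```

Splitting on the last tab means only the label has to be tab-free, so a path with spaces still round-trips cleanly.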

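Running the script writes train.list and test.list, where every line is an image path and its label separated by a tab. That gives us the path-label pairs the network needs.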
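To feed these lists into training they have to be read back. The sketch below follows the reader-creator pattern used in PaddlePaddle 1.x (a function that returns a zero-argument generator function) but calls no Paddle API; list_reader and its shuffle and seed parameters are hypothetical, and image decoding and resizing are left out.

```python
import random


def list_reader(list_path, shuffle=True, seed=None):
    """Return a generator function that yields (image_path, label) pairs
    from a tab-separated path/label list file."""
    samples = []
    with open(list_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            path, label = line.rsplit("\t", 1)
            samples.append((path, int(label)))

    rng = random.Random(seed)  # one RNG, so each epoch gets a new but reproducible order

    def reader():
        order = list(samples)
        if shuffle:
            rng.shuffle(order)
        for sample in order:
            yield sample

    return reader
```

Usage would look like `for path, label in list_reader("C:/Users/14997/Desktop/database/train.list")(): ...`. The file is parsed once when list_reader is called, and every call to the returned reader starts a fresh pass over the samples.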

OK, that's all for today.

Original post: https://www.cnblogs.com/Sakuraba/p/14909810.html