TensorFlow: A Summary of TFRecord (how to create, use, test, and display TFRecords)

Sometimes we need to generate a TFRecord file because of its advantages: it takes less disk space and offers higher reading speed. But how on earth can we make a TFRecord?

To make a TFRecord file, follow these instructions:

import os
import linecache

import tensorflow as tf
from PIL import Image


def make_tfrecord(dest_path, image_folder, label_csv):

    print("There are {} images in the folder".format(len(os.listdir(image_folder))))
    l = 2  # labels start on line 2 of the csv (line 1 presumably holds a header)
    writer = tf.python_io.TFRecordWriter(dest_path)

    # note: os.listdir returns files in arbitrary order, so make sure that
    # order matches the row order of the label csv
    for img_name in os.listdir(image_folder):
        img_path = os.path.join(image_folder, img_name)
        # print (img_path)
        img = Image.open(img_path)
        # convert the image to raw bytes
        img_binary = img.tobytes()

        # the label comes from the l-th line of the csv; it is written as int64
        # so that it matches the tf.int64 feature used when parsing below
        label = int(float(linecache.getline(label_csv, l).split(',')[1]))

        data = tf.train.Example(features=tf.train.Features(feature={
            'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_binary])),
            'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label]))
        }))

        writer.write(data.SerializeToString())
        l += 1

        # report progress every 10 images (l - 2 images have been processed so far)
        if (l - 2) % 10 == 0:
            if (l - 2) == len(os.listdir(image_folder)):
                print("dataset generation finished !")
            else:
                print("{:.1f} percent finished ...".format(100.0 * (l - 2) / len(os.listdir(image_folder))))

    writer.close()
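
For example, a call like the following would write the file (the paths here are placeholders for illustration; swap in your own image folder and label csv):

# hypothetical paths for illustration only
make_tfrecord(dest_path='/home/data/train.tfrecords',
              image_folder='/home/data/images/',
              label_csv='/home/data/labels.csv')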

After generating a TFRecord, you will need batches of examples to feed into your network.

First, for simplicity, let's define a parser function that reads and decodes one example from a TFRecord file.

def read_and_decode(file_name):
    # file_name can be a list of names in the format [file1, file2],
    # as sometimes you have multiple tfrecord files
    filename_queue = tf.train.string_input_producer(file_name)
    reader = tf.TFRecordReader()

    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'label': tf.FixedLenFeature([], tf.int64),
                                           'image': tf.FixedLenFeature([], tf.string),
                                       })
    image = tf.decode_raw(features['image'], tf.uint8)

    # because it is a gray-scale image; if the image were in RGB format
    # we should use [480, 752, 3] instead
    image = tf.reshape(image, [480, 752])
    image = tf.cast(image, tf.float32)
    label = tf.cast(features['label'], tf.int32)

    print(image)
    print(label)
    # print(np.shape(image))
    return image, label
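
Calling the function builds the image and label tensors (again, the tfrecord path below is only a placeholder):

# hypothetical path; a list is passed because several tfrecord files are allowed
image, label = read_and_decode(['/home/data/train.tfrecords'])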

The output of the two print statements looks like this (the shapes printed here come from a run with RGB 224x224 images, so they differ from the grayscale [480, 752] reshape above):

Tensor("Cast:0", shape=(224, 224, 3), dtype=float32)
Tensor("Cast_1:0", shape=(), dtype=int32)

But what if we want to actually see the image? We can test whether our TFRecord file was generated correctly by checking that the decoded image matches the original.

In the last step we got a parsed image and label. To visualize the image, we can use the following method:

import numpy as np

with tf.Session() as sess:
    # the input pipeline above is queue-based, so start its queue runners first
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    img, lbl = sess.run([image, label])
    # the image tensor is float32 here; convert back to uint8 so PIL can save it
    img = Image.fromarray(np.asarray(img, dtype=np.uint8))
    # if the image is in RGB format:
    # img = Image.fromarray(np.asarray(img, dtype=np.uint8), mode='RGB')
    # save the image to wherever you want
    img.save('/home/' + 'Label_' + str(lbl) + '.jpg')

    coord.request_stop()
    coord.join(threads)

If you follow these steps, the decoded images will be saved to disk and you can compare them with the originals.

To generate a batch of examples, we use tf.train.shuffle_batch:

example_batch, label_batch = tf.train.shuffle_batch(
    [image, label], batch_size=32, capacity=1000+64,
    min_after_dequeue=1000)

print (example_batch)
print (label_batch)

The output of these two print statements is:

Tensor("shuffle_batch:0", shape=(32, 224, 224, 3), dtype=float32)
Tensor("shuffle_batch:1", shape=(32,), dtype=int32)

Sometimes we don't want to shuffle the examples; in that case we can use tf.train.batch instead:

example_batch, label_batch = tf.train.batch(
    [image, label], batch_size=32, capacity=1000+64)

Now we can feed these batches into a predefined model.
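
As a minimal sketch of that last step (build_model, the loss, and the optimizer settings below are hypothetical placeholders, not part of the original post), a training loop that consumes the batches could look like this; note that the queue runners must be started just as in the visualization step:

# minimal sketch; build_model is a hypothetical placeholder for your network
logits = build_model(example_batch)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label_batch, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    for step in range(1000):
        _, loss_val = sess.run([train_op, loss])
        if step % 100 == 0:
            print("step {}: loss = {}".format(step, loss_val))

    coord.request_stop()
    coord.join(threads)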

Original post: https://www.cnblogs.com/SongHaoran/p/7771372.html