Since my deep learning environment is installed on Windows, everything below was implemented on Windows. These are notes for my own study record.
To train a model with Caffe, the first step is preparing the data.
Positive samples: for a face detection project, the positive samples are face images. To produce them, crop the faces out of the source images (the data source already annotates the face coordinates in each image). After cropping, check that the samples were produced correctly.
Negative samples: crop randomly, and use IoU against the annotated face box to decide whether a crop is positive or negative. For example, treat IoU < 0.3 as negative; ideally, take crops from images that contain no faces at all.
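A minimal IoU sketch of the kind used to label the random crops (the (x1, y1, x2, y2) box format and the example values are my own assumptions, not from this project's tooling):

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

face_box = (120, 80, 260, 240)    # annotated face (hypothetical values)
crop_box = (300, 10, 527, 237)    # a random 227x227 crop
if iou(crop_box, face_box) < 0.3:
    print("keep as negative sample")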
1. Preparing the Caffe data source:
Caffe supports LMDB data; before training, the training and validation sets must first be converted to LMDB.
First prepare two txt files, train.txt and test.txt, with the following format:
/path/to/folder/image_x.jpg 0 (the path of the image sample followed by its label; for binary classification the label is 0 or 1, and in this example 0 means face and 1 means non-face)
These txt files can be generated with a script; a simple one that writes train.txt and val.txt follows.
(Updated 2019-07-28: the txt file should contain only relative paths, i.e. lines of the form xxxx.jpg label.)
import os

full_train_path = r"C:\Users\Administrator\Desktop\FaceDetection\train.txt"
full_val_path = r"C:\Users\Administrator\Desktop\FaceDetection\val.txt"
train_dir = r"C:\Users\Administrator\Desktop\FaceDetection\train\train"
val_dir = r"C:\Users\Administrator\Desktop\FaceDetection\train\val"

train_txt = open(full_train_path, 'w')
val_txt = open(full_val_path, 'w')

# get train.txt: each class sits in a subfolder named after its label (0 or 1)
for label_dir in os.listdir(train_dir):
    for figure in os.listdir(os.path.join(train_dir, label_dir)):
        # relative path followed by the label, e.g. "0/xxxx.jpg 0"
        train_txt.writelines(label_dir + "/" + figure + " " + label_dir + "\n")
train_txt.close()

# get val.txt: label by file name, 0 for face images, 1 otherwise
for val_file in os.listdir(val_dir):
    if val_file.find("faceimage") != -1:
        val_txt.writelines(val_file + " " + "0" + "\n")
    else:
        val_txt.writelines(val_file + " " + "1" + "\n")
val_txt.close()
2. Creating the LMDB data source:
Classification tasks use LMDB data; regression tasks, whose labels are not single integers, use HDF5 data.
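This project only uses LMDB; as an aside, a minimal sketch of how an HDF5 source could be written with h5py for a regression task (the shapes are made-up assumptions; the dataset names must match the top blobs of Caffe's HDF5Data layer):

import h5py
import numpy as np

X = np.random.rand(100, 3, 227, 227).astype(np.float32)  # 100 fake images (N, C, H, W)
y = np.random.rand(100, 4).astype(np.float32)            # 4 regression targets each

with h5py.File("train.h5", "w") as f:
    f.create_dataset("data", data=X)    # names must match the layer's top blobs
    f.create_dataset("label", data=y)

# The HDF5Data layer's `source` then points to a txt file listing one .h5 path per line.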
Use Caffe's bundled convert_imageset tool to create the LMDB data source.
convert_imageset usage:
convert_imageset --options (e.g. --resize_height/--resize_width, --shuffle) <image root directory> <image list txt> <output lmdb path>
cd C:\Program Files\caffe-windows\scripts\build\tools\Release

convert_imageset.exe --resize_height=227 --resize_width=227 --shuffle C:\Users\Administrator\Desktop\FaceDetection\train\train C:\Users\Administrator\Desktop\FaceDetection\train\train\train.txt C:\Users\Administrator\Desktop\FaceDetection\train_lmdb

convert_imageset.exe --resize_height=227 --resize_width=227 --shuffle C:\Users\Administrator\Desktop\FaceDetection\train\val C:\Users\Administrator\Desktop\FaceDetection\train\val\val.txt C:\Users\Administrator\Desktop\FaceDetection\val_lmdb
3. Training the AlexNet network:
3.1 Configuring the Caffe files:
1. train.prototxt
Defines the AlexNet network structure in Caffe's prototxt format.
2. solver.prototxt
① net: the path to the network definition file.
② test_iter: the number of batches used for one test pass. Ideally test_iter × batch_size equals the total number of validation samples (for example, 2000 validation images with batch_size 50 gives test_iter = 40).
③ base_lr: the base learning rate. The effective learning rate of a layer is base_lr × lr_mult (lr_mult is set per layer in train.prototxt). The learning rate must not be set too large, or the loss will oscillate or even diverge instead of converging.
Note: in the Windows build, use "/" as the path separator in configuration files, e.g. source: "C:/Users/Administrator/Desktop/FaceDetection/train_lmdb"
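For reference, a minimal solver.prototxt sketch (the values are illustrative assumptions; only max_iter is chosen to match the _iter_36000.caffemodel snapshot used below):

net: "C:/Users/Administrator/Desktop/FaceDetection/train.prototxt"
test_iter: 40                # 40 batches of 50 images = 2000 validation samples
test_interval: 500
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 10000
momentum: 0.9
weight_decay: 0.0005
display: 100
max_iter: 36000
snapshot: 4000
snapshot_prefix: "C:/Users/Administrator/Desktop/FaceDetection/model2/"
solver_mode: GPU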
3.2 Write a training script, run it, and obtain the model weights (_iter_36000.caffemodel):
cd C:\Program Files\caffe-windows\scripts\build\tools\Release
caffe.exe train --solver=C:\Users\Administrator\Desktop\FaceDetection\solver.prototxt
The training process looks like this:
4. Face detection algorithm framework:
4.1 Sliding window:
Slide 227×227 windows over the input image (up to this point, only fixed-size inputs are supported: in a convolutional network ending in fully connected layers, the parameter count is fixed; the fully convolutional network described below accepts images of any size).
To detect faces of different sizes in an image, apply a multi-scale (image pyramid) transformation. The code below uses the scale factor 0.793700526, which is 2^(-1/3), so the image halves in linear size every three scales.
FCN (fully convolutional network): a forward pass (forward_all()) over the whole image produces a heatmap; each heatmap point corresponds to a region of the original image, and its value is the probability that the region contains a face.
Set a threshold on this probability, for example 0.9, and keep each window whose heatmap value exceeds it. This usually yields several overlapping boxes; NMS (non-maximum suppression) then reduces them to a single final box.
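To make the window mapping concrete: AlexNet's overall stride is 32, so a heatmap cell (x, y) computed on an image scaled by `scale` maps back to a 227×227 training-size window in the original image. A minimal sketch (it mirrors the generateBoundingBox function in the detection code below):

stride, cell = 32, 227  # network stride and training window size

def cell_to_box(x, y, scale):
    """Heatmap cell (row x, col y) -> (x1, y1, x2, y2) in the original image."""
    return (stride * y / scale, stride * x / scale,
            (stride * y + cell - 1) / scale, (stride * x + cell - 1) / scale)

print(cell_to_box(0, 0, 1.0))  # (0.0, 0.0, 226.0, 226.0): the top-left window
print(cell_to_box(2, 3, 0.5))  # at scale 0.5 each window covers about 454 px per side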
4.2 Converting the fully connected AlexNet used for training into a fully convolutional network (FCN):
You can follow the Caffe net_surgery example: https://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb
First, in the original fully connected network's deploy.prototxt, replace each fully connected layer (InnerProduct) with a convolution layer (Convolution), computing and setting the kernel sizes.
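For a 227×227 input, AlexNet's pool5 output is 6×6×256, so fc6 maps to a 6×6 convolution while fc7 and fc8 map to 1×1 convolutions. A sketch of the replaced layers in deploy_full_conv.prototxt (layer names follow the conversion code below; the ReLU and dropout layers in between are omitted):

layer {
  name: "fc6-conv"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6-conv"
  convolution_param { num_output: 4096 kernel_size: 6 }
}
layer {
  name: "fc7-conv"
  type: "Convolution"
  bottom: "fc6-conv"
  top: "fc7-conv"
  convolution_param { num_output: 4096 kernel_size: 1 }
}
layer {
  name: "fc8-conv"
  type: "Convolution"
  bottom: "fc7-conv"
  top: "fc8-conv"
  convolution_param { num_output: 2 kernel_size: 1 }  # 2 classes: face / non-face
}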
Then use the following code to convert the weights into the fully convolutional model (full_conv.caffemodel):
import caffe

net = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy.prototxt",
                r"C:\Users\Administrator\Desktop\FaceDetection\model2\_iter_36000.caffemodel",
                caffe.TEST)
params = ['fc6', 'fc7', 'fc8_flickr']
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}
for fc in params:
    print("{} weights are {} dimensional and biases are {} dimensional".format(fc, fc_params[fc][0].shape, fc_params[fc][1].shape))

net_fully_conv = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy_full_conv.prototxt",
                           r"C:\Users\Administrator\Desktop\FaceDetection\model2\_iter_36000.caffemodel",
                           caffe.TEST)
params_fully_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']
conv_params = {pr: (net_fully_conv.params[pr][0].data, net_fully_conv.params[pr][1].data) for pr in params_fully_conv}
for conv in params_fully_conv:
    print("{} weights are {} dimensional and biases are {} dimensional".format(conv, conv_params[conv][0].shape, conv_params[conv][1].shape))

# Copy the fully connected weights into the convolution kernels in place.
for pr, pr_conv in zip(params, params_fully_conv):
    conv_params[pr_conv][0].flat = fc_params[pr][0].flat
    conv_params[pr_conv][1][...] = fc_params[pr][1]

net_fully_conv.save(r"C:\Users\Administrator\Desktop\FaceDetection\full_conv.caffemodel")
4.3 Using the trained model to implement face detection:
import os
import sys
import math
import numpy as np
import cv2

caffe_root = r"C:\Program Files\caffe-windows"
sys.path.insert(0, os.path.join(caffe_root, 'python'))
os.environ['GLOG_minloglevel'] = '2'  # suppress caffe logging; must be set before importing caffe
import caffe


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Rect(object):
    def __init__(self, p1, p2):
        """Store the top, bottom, left, right values;
        p1, p2 are the left-top and right-bottom points of the rectangle."""
        self.left = min(p1.x, p2.x)
        self.right = max(p1.x, p2.x)
        self.bottom = min(p1.y, p2.y)
        self.top = max(p1.y, p2.y)

    def __str__(self):
        return "Rect[%d, %d, %d, %d]" % (self.left, self.top, self.right, self.bottom)


def calcDistance(x1, y1, x2, y2):
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


def range_overlap(a_min, a_max, b_min, b_max):
    """Judge whether there is intersection on one dimension."""
    return (a_min <= b_max) and (a_max >= b_min)


def rect_overlaps(r1, r2):
    """Judge whether the two rectangles intersect."""
    return range_overlap(r1.left, r1.right, r2.left, r2.right) and range_overlap(r1.bottom, r1.top, r2.bottom, r2.top)


def rect_merge(r1, r2, mergeThresh):
    """Decide whether to merge two rectangles: overlap ratio against mergeThresh."""
    if rect_overlaps(r1, r2):
        SI = abs(min(r1.right, r2.right) - max(r1.left, r2.left)) * abs(min(r1.top, r2.top) - max(r1.bottom, r2.bottom))
        SA = abs(r1.right - r1.left) * abs(r1.top - r1.bottom)
        SB = abs(r2.right - r2.left) * abs(r2.top - r2.bottom)
        S = SA + SB - SI
        ratio = float(SI) / float(S)
        if ratio > mergeThresh:
            return 1
    return 0


def generateBoundingBox(featureMap, scale):
    boundingBox = []
    # The overall stride can be calculated from the AlexNet architecture.
    stride = 32
    # Each heatmap cell corresponds to a 227x227 window, the training input size.
    cellSize = 227
    for (x, y), prob in np.ndenumerate(featureMap):
        if prob >= 0.50:
            # Map the heatmap cell back to a box in the original image.
            boundingBox.append([float(stride * y) / scale,
                                float(stride * x) / scale,
                                float(stride * y + cellSize - 1) / scale,
                                float(stride * x + cellSize - 1) / scale,
                                prob])
    return boundingBox


def nms_average(boxes, groupThresh=2, overlapThresh=0.2):
    rects = []
    for i in range(len(boxes)):
        if boxes[i][4] > 0.2:
            # Convert to (x, y, w, h) for cv2.groupRectangles.
            rects.append([boxes[i, 0], boxes[i, 1], boxes[i, 2] - boxes[i, 0], boxes[i, 3] - boxes[i, 1]])
    rects, weights = cv2.groupRectangles(rects, groupThresh, overlapThresh)
    rectangles = []
    for i in range(len(rects)):
        testRect = Rect(Point(rects[i, 0], rects[i, 1]),
                        Point(rects[i, 0] + rects[i, 2], rects[i, 1] + rects[i, 3]))
        rectangles.append(testRect)
    clusters = []
    for rect in rectangles:
        matched = 0
        for cluster in clusters:
            if rect_merge(rect, cluster, 0.2):
                matched = 1
                cluster.left = (cluster.left + rect.left) / 2
                cluster.right = (cluster.right + rect.right) / 2
                cluster.bottom = (cluster.bottom + rect.bottom) / 2
                cluster.top = (cluster.top + rect.top) / 2
        if not matched:
            clusters.append(rect)
    result_boxes = []
    for i in range(len(clusters)):
        result_boxes.append([clusters[i].left, clusters[i].bottom, clusters[i].right, clusters[i].top, 1])
    return result_boxes


def face_detection(imgFile):
    net_fully_conv = caffe.Net(r"C:\Users\Administrator\Desktop\FaceDetection\deploy_full_conv.prototxt",
                               r"C:\Users\Administrator\Desktop\FaceDetection\full_conv.caffemodel",
                               caffe.TEST)
    scales = []
    factor = 0.793700526  # 2^(-1/3): linear size halves every three scales
    img = cv2.imread(imgFile)
    print(img.shape)
    largest = min(2, 4000 / max(img.shape[0:2]))
    scale = largest
    minD = largest * min(img.shape[0:2])
    while minD >= 227:
        scales.append(scale)
        scale *= factor
        minD *= factor

    total_boxes = []
    for scale in scales:
        # cv2.resize expects (width, height).
        scale_img = cv2.resize(img, (int(img.shape[1] * scale), int(img.shape[0] * scale)))
        cv2.imwrite(r"C:\Users\Administrator\Desktop\FaceDetection\scale_img.jpg", scale_img)
        im = caffe.io.load_image(r"C:\Users\Administrator\Desktop\FaceDetection\scale_img.jpg")
        # Reshape the input blob to the scaled image size (N, C, H, W).
        net_fully_conv.blobs['data'].reshape(1, 3, scale_img.shape[0], scale_img.shape[1])
        transformer = caffe.io.Transformer({'data': net_fully_conv.blobs['data'].data.shape})
        transformer.set_transpose('data', (2, 0, 1))
        transformer.set_channel_swap('data', (2, 1, 0))
        transformer.set_raw_scale('data', 255.0)
        out = net_fully_conv.forward_all(data=np.asarray([transformer.preprocess('data', im)]))
        print(out['prob'][0, 1].shape)
        boxes = generateBoundingBox(out['prob'][0, 1], scale)
        if boxes:
            total_boxes.extend(boxes)

    print(total_boxes)
    boxes_nms = np.array(total_boxes)
    true_boxes = nms_average(boxes_nms, 1, 0.2)
    if true_boxes:
        (x1, y1, x2, y2) = true_boxes[0][:-1]
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0))
        cv2.namedWindow('face detection', flags=0)
        cv2.imshow('face detection', img)
        cv2.waitKey(0)


if __name__ == "__main__":
    img = r"C:\Users\Administrator\Desktop\FaceDetection\tmp9055.jpg"
    face_detection(img)
Because my computer is very low-end, training took a long time: the machine ran for several days without completing many iterations, so the model is not trained very well. In this example, after tuning, generating bounding boxes with prob greater than or equal to 0.5 gave the best results. The results are as follows:
I also wrote training code with TensorFlow, but because the machine is so weak, training was too slow and the accuracy too poor. I will keep studying and improving it.
2. TensorFlow implementation
2.1 Convert the data into a TFRecord file for later training; the code is as follows:
import os
import tensorflow as tf
from PIL import Image

class0_path = "/home/sxj/DL/face_detect/train/train/0/"
class1_path = "/home/sxj/DL/face_detect/train/train/1/"
tf_output_dir = "/home/sxj/DL/face_detect/data/"
# The file name must match the path the reader below opens.
tf_filename = "/home/sxj/DL/face_detect/data/train.tfrecords"
SAMPLES_PER_FILE = 5000

# Convert the dataset to TFRecord format.
def int64_feature(value):
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def bytes_feature(value):
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))

def get_output_filename(output_dir, dataset_name, idx):
    return "%s/%s_%03d.tfrecord" % (output_dir, dataset_name, idx)

# Convert the images with label 0 (faces).
i = 0
total_files = len(os.listdir(class0_path))
train_writer = tf.python_io.TFRecordWriter(tf_filename)
for img in os.listdir(class0_path):
    print("converting image %d/%d" % (i + 1, total_files))
    # Load the image, resize it to 227x227 and serialize the raw pixel bytes.
    file_name = class0_path + img
    img_raw = Image.open(file_name)
    img_raw = img_raw.resize((227, 227))
    img_data = img_raw.tobytes()
    # img_data = tf.gfile.FastGFile(file_name, 'rb').read()
    # Wrap the image and label in an Example; the key 'image' must match the parser below.
    example = tf.train.Example(features=tf.train.Features(feature={
        'label': int64_feature(value=0),
        'image': bytes_feature(value=img_data)
    }))
    train_writer.write(example.SerializeToString())
    i += 1
print("label-0 images converted")

# Convert the images with label 1 (non-faces).
i = 0
total_files = len(os.listdir(class1_path))
for img in os.listdir(class1_path):
    print("converting image %d/%d" % (i + 1, total_files))
    file_name = class1_path + img
    img_raw = Image.open(file_name)
    img_raw = img_raw.resize((227, 227))
    img_data = img_raw.tobytes()
    example = tf.train.Example(features=tf.train.Features(feature={
        'label': int64_feature(value=1),
        'image': bytes_feature(value=img_data)
    }))
    train_writer.write(example.SerializeToString())
    i += 1
train_writer.close()
2.2 Reading the TFRecord file
Under TF 1.x the file is read back with a queue-based input pipeline. The reading function below is the same one used by the full training and testing code in the next listing:

import tensorflow as tf

def read_single_tfrecord(addr, _batch_size, shape):
    # Read and decode serialized examples from the TFRecord file.
    filename_queue = tf.train.string_input_producer([addr], shuffle=True)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'image': tf.FixedLenFeature([], tf.string),
                                           'label': tf.FixedLenFeature([], tf.int64)})
    img = tf.decode_raw(features['image'], tf.uint8)
    label = tf.cast(features['label'], tf.int32)
    img = tf.reshape(img, [shape, shape, 3])
    # img = augmentation(img)
    img = (tf.cast(img, tf.float32) - 127.5) / 128  # normalize to roughly [-1, 1]
    min_after_dequeue = 10000
    batch_size = _batch_size
    capacity = min_after_dequeue + 10 * batch_size
    image_batch, label_batch = tf.train.shuffle_batch([img, label],
                                                      batch_size=batch_size,
                                                      capacity=capacity,
                                                      min_after_dequeue=min_after_dequeue,
                                                      num_threads=4)
    label_batch = tf.reshape(label_batch, [batch_size])
    return image_batch, label_batch
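To sanity-check the pipeline, pull one small batch through the queue runners (a sketch; it assumes the TFRecord written above exists at that path):

images, labels = read_single_tfrecord('/home/sxj/DL/face_detect/data/train.tfrecords', 4, 227)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    img_batch, label_batch = sess.run([images, labels])
    print(img_batch.shape, label_batch)  # (4, 227, 227, 3) and 4 labels
    coord.request_stop()
    coord.join(threads)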
Training and testing code:
import os
import math
import numpy as np
import cv2
import tensorflow as tf
import matplotlib.pyplot as plt  # cv2's window display is broken on my machine, so matplotlib shows the images

slim = tf.contrib.slim

batch_size = 32
img_size = 227
num_batches = 100
train_step = 1000001
model_path = "/home/sxj/DL/insightface/alex_model"
model_name = 'Alex'
addr = '/home/sxj/DL/face_detect/data/train.tfrecords'
lr_steps = [40000, 60000, 80000]
lr_values = [0.004, 0.002, 0.0012, 0.0004]


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Rect(object):
    def __init__(self, p1, p2):
        """Store the top, bottom, left, right values;
        p1, p2 are the left-top and right-bottom points of the rectangle."""
        self.left = min(p1.x, p2.x)
        self.right = max(p1.x, p2.x)
        self.bottom = min(p1.y, p2.y)
        self.top = max(p1.y, p2.y)

    def __str__(self):
        return "Rect[%d, %d, %d, %d]" % (self.left, self.top, self.right, self.bottom)


def calcDistance(x1, y1, x2, y2):
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


def range_overlap(a_min, a_max, b_min, b_max):
    """Judge whether there is intersection on one dimension."""
    return (a_min <= b_max) and (a_max >= b_min)


def rect_overlaps(r1, r2):
    """Judge whether the two rectangles intersect."""
    return range_overlap(r1.left, r1.right, r2.left, r2.right) and range_overlap(r1.bottom, r1.top, r2.bottom, r2.top)


def rect_merge(r1, r2, mergeThresh):
    """Decide whether to merge two rectangles: overlap ratio against mergeThresh."""
    if rect_overlaps(r1, r2):
        SI = abs(min(r1.right, r2.right) - max(r1.left, r2.left)) * abs(min(r1.top, r2.top) - max(r1.bottom, r2.bottom))
        SA = abs(r1.right - r1.left) * abs(r1.top - r1.bottom)
        SB = abs(r2.right - r2.left) * abs(r2.top - r2.bottom)
        S = SA + SB - SI
        ratio = float(SI) / float(S)
        if ratio > mergeThresh:
            return 1
    return 0


def softmax(a, b):
    a0 = math.exp(a)
    a1 = math.exp(b)
    return a0 / (a0 + a1)


def generateBoundingBox(featureMap, scale):
    boundingBox = []
    stride = 32     # overall stride of the network
    cellSize = 227  # training window size
    for x in range(featureMap.shape[0]):
        for y in range(featureMap.shape[1]):
            if featureMap[x][y] > 0.8:
                boundingBox.append([float(stride * y) / scale,
                                    float(stride * x) / scale,
                                    float(stride * y + cellSize - 1) / scale,
                                    float(stride * x + cellSize - 1) / scale,
                                    featureMap[x][y]])
    return boundingBox


def nms_average(boxes, groupThresh=2, overlapThresh=0.2):
    rects = []
    for i in range(len(boxes)):
        if boxes[i][4] > 0.2:
            # Convert to (x, y, w, h) for cv2.groupRectangles.
            rects.append([boxes[i, 0], boxes[i, 1], boxes[i, 2] - boxes[i, 0], boxes[i, 3] - boxes[i, 1]])
    rects, weights = cv2.groupRectangles(rects, groupThresh, overlapThresh)
    rectangles = []
    for i in range(len(rects)):
        testRect = Rect(Point(rects[i, 0], rects[i, 1]),
                        Point(rects[i, 0] + rects[i, 2], rects[i, 1] + rects[i, 3]))
        rectangles.append(testRect)
    clusters = []
    for rect in rectangles:
        matched = 0
        for cluster in clusters:
            if rect_merge(rect, cluster, 0.2):
                matched = 1
                cluster.left = (cluster.left + rect.left) / 2
                cluster.right = (cluster.right + rect.right) / 2
                cluster.bottom = (cluster.bottom + rect.bottom) / 2
                cluster.top = (cluster.top + rect.top) / 2
        if not matched:
            clusters.append(rect)
    result_boxes = []
    for i in range(len(clusters)):
        result_boxes.append([clusters[i].left, clusters[i].bottom, clusters[i].right, clusters[i].top, 1])
    return result_boxes


def print_tensor_info(tensor):
    print("tensor name:", tensor.op.name, "-tensor shape:", tensor.get_shape().as_list())


def read_single_tfrecord(addr, _batch_size, shape):
    filename_queue = tf.train.string_input_producer([addr], shuffle=True)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'image': tf.FixedLenFeature([], tf.string),
                                           'label': tf.FixedLenFeature([], tf.int64)})
    img = tf.decode_raw(features['image'], tf.uint8)
    label = tf.cast(features['label'], tf.int32)
    img = tf.reshape(img, [shape, shape, 3])
    # img = augmentation(img)
    img = (tf.cast(img, tf.float32) - 127.5) / 128
    min_after_dequeue = 10000
    batch_size = _batch_size
    capacity = min_after_dequeue + 10 * batch_size
    image_batch, label_batch = tf.train.shuffle_batch([img, label],
                                                      batch_size=batch_size,
                                                      capacity=capacity,
                                                      min_after_dequeue=min_after_dequeue,
                                                      num_threads=4)
    label_batch = tf.reshape(label_batch, [batch_size])
    return image_batch, label_batch


def Network(images, is_train):
    # input 227*227
    with tf.variable_scope('vgg_16'):
        with slim.arg_scope([slim.conv2d, slim.max_pool2d], padding='VALID'):
            conv1 = slim.conv2d(images, 96, [11, 11], stride=[4, 4], scope='conv_1')  # 55*55*96
            print_tensor_info(conv1)
            pool1 = slim.max_pool2d(conv1, [3, 3], stride=[2, 2], scope='pool_1')  # 27*27*96
            print_tensor_info(pool1)
            conv2 = slim.conv2d(pool1, 256, [5, 5], stride=[1, 1], scope='conv_2')
            print_tensor_info(conv2)
            pool2 = slim.max_pool2d(conv2, [3, 3], stride=[2, 2], scope='pool_2')
            print_tensor_info(pool2)
            conv3 = slim.conv2d(pool2, 384, [3, 3], stride=[1, 1], scope='conv_3')
            print_tensor_info(conv3)
            conv4 = slim.conv2d(conv3, 384, [3, 3], stride=[1, 1], scope='conv_4')
            print_tensor_info(conv4)
            conv5 = slim.conv2d(conv4, 256, [3, 3], stride=[1, 1], scope='conv_5')
            pool5 = slim.max_pool2d(conv5, [3, 3], stride=[2, 2], scope='pool_5')
            print_tensor_info(pool5)
            conv6 = slim.conv2d(pool5, 256, [2, 2], stride=[1, 1], scope='conv_6')
            print_tensor_info(conv6)
            output = slim.conv2d(conv6, 2, [1, 1], stride=[1, 1], scope='output')
            print_tensor_info(output)
    if is_train:
        # Training: flatten to [batch, 2] logits for the softmax loss.
        output = tf.reshape(output, [-1, 2])
        print_tensor_info(output)
    else:
        # Inference: keep the spatial heatmap and turn the logits into probabilities.
        output = tf.squeeze(output, axis=0)
        output = tf.nn.softmax(output)
        print_tensor_info(output)
    return output


def train():
    image = tf.placeholder(tf.float32, [batch_size, img_size, img_size, 3], name='image')
    label = tf.placeholder(tf.int32, [batch_size], name='label')
    logit = Network(image, True)  # [batch, 2]
    train_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logit, labels=label))
    train_images, train_labels = read_single_tfrecord(addr, batch_size, img_size)
    with tf.name_scope('loss'):
        tf.summary.scalar('train_loss', train_loss)
    global_step = tf.Variable(name='global_step', initial_value=0, trainable=False)
    inc_op = tf.assign_add(global_step, 1, name='increment_global_step')
    # Scale the learning-rate schedule relative to a reference batch size of 128.
    scale = int(128.0 / batch_size)
    _lr_steps = [scale * s for s in lr_steps]
    _lr_values = [v / scale for v in lr_values]
    lr = tf.train.piecewise_constant(global_step, boundaries=_lr_steps, values=_lr_values, name='lr_schedule')
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = tf.train.MomentumOptimizer(learning_rate=lr, momentum=0.9).minimize(train_loss)
    with tf.name_scope('accuracy'):
        # label = tf.one_hot(label, 2)
        train_accuracy = tf.reduce_mean(
            tf.cast(tf.equal(tf.to_int32(tf.argmax(tf.nn.softmax(logit), axis=1)), label), tf.float32))
        tf.summary.scalar('train_accuracy', train_accuracy)
    saver = tf.train.Saver(max_to_keep=5)
    merged = tf.summary.merge_all()
    with tf.Session() as sess:
        sess.run((tf.global_variables_initializer(),
                  tf.local_variables_initializer()))
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        # Resume from an earlier checkpoint; comment this out for a fresh run.
        saver.restore(sess, '/home/sxj/DL/insightface/alex_model/Alex-40000')
        writer_train = tf.summary.FileWriter("/home/sxj/DL/insightface/alex_model/%s" % (model_name), sess.graph)
        try:
            for i in range(1, train_step):
                image_batch, label_batch = sess.run([train_images, train_labels])
                sess.run([train_op, inc_op], feed_dict={image: image_batch, label: label_batch})
                if i % 100 == 0:
                    summary = sess.run(merged, feed_dict={image: image_batch, label: label_batch})
                    writer_train.add_summary(summary, i)
                if i % 1000 == 0:
                    print('step', i)
                    print('train_accuracy',
                          sess.run(train_accuracy, feed_dict={image: image_batch, label: label_batch}))
                    print('train_loss', sess.run(train_loss, {image: image_batch, label: label_batch}))
                if i % 10000 == 0:
                    saver.save(sess, os.path.join(model_path, model_name), global_step=i)
        except tf.errors.OutOfRangeError:
            print("finished")
        finally:
            coord.request_stop()
            writer_train.close()


def face_detection(imgFile):
    scales = []
    factor = 0.793700526  # 2^(-1/3)
    img = cv2.imread(imgFile)
    largest = min(2, 4000 / max(img.shape[0:2]))
    scale = largest
    minD = largest * min(img.shape[0:2])
    while minD >= 227:
        scales.append(scale)
        scale *= factor
        minD *= factor
    total_boxes = []
    for scale in scales:
        print("scale", scale)
        image = tf.placeholder(tf.float32, name='image')
        # cv2.resize expects (width, height).
        scale_img = cv2.resize(img, (int(img.shape[1] * scale), int(img.shape[0] * scale)))
        # Apply the same normalization as in training.
        norm_img = (scale_img.astype(np.float32) - 127.5) / 128
        image_reshape = tf.reshape(image, [1, scale_img.shape[0], scale_img.shape[1], 3])
        logit = Network(image_reshape, False)
        with tf.Session() as sess:
            sess.run((tf.global_variables_initializer(),
                      tf.local_variables_initializer()))
            saver = tf.train.Saver()
            saver.restore(sess, "/home/sxj/DL/insightface/alex_model/Alex-50000")
            prob_map = sess.run(logit, feed_dict={image: norm_img})
        # Channel 0 is the face probability (label 0 means face).
        boxes = generateBoundingBox(prob_map[:, :, 0], scale)
        if boxes:
            total_boxes.extend(boxes)
        # The graph grows with each scale, so rebuild it from scratch.
        tf.reset_default_graph()
    print(total_boxes)
    boxes_nms = np.array(total_boxes)
    true_boxes = nms_average(boxes_nms, 1, 0.2)
    if true_boxes:
        (x1, y1, x2, y2) = true_boxes[0][:-1]
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0))
        plt.imshow(img)
        plt.show()


if __name__ == "__main__":
    train()
    # img = "/home/sxj/DL/face_detect/tmp9055.jpg"
    # face_detection(img)
The training process looks like this:
Test results:
The TensorFlow implementation may still have flaws; the actual results are not very good, and I am still optimizing it. Suggestions from experienced readers are welcome.
Note: I am currently learning AI. This example implements face detection by following video tutorials and practicing hands-on, and serves only as a personal study record.