Image Recognition

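    Our brains make vision seem easy. We can effortlessly tell a lion from a leopard, read a sign, or recognize a face. For a computer, however, these are hard problems; they only seem easy because our brains are remarkably good at understanding images.
    In the past few years, the field of machine learning has made tremendous progress on these problems. In particular, a kind of model called a deep convolutional neural network has been found to achieve reasonable performance on hard visual recognition tasks, matching or exceeding human performance in some domains.
    Researchers have demonstrated steady progress in computer vision by validating their work against ImageNet, an academic benchmark for computer vision. Successive models continue to show improvements, each time achieving a new state-of-the-art result: QuocNet, AlexNet, Inception (GoogLeNet), and BN-Inception-v2. Researchers both inside and outside Google have published papers describing all of these models, but the results are still hard to reproduce. We are now taking the next step by releasing code for running image recognition on our latest model, Inception-v3.
    Inception-v3 is trained for the ImageNet Large Visual Recognition Challenge using the data from 2012. This is a standard task in computer vision, in which models try to classify entire images into 1000 classes. For example, here are AlexNet's classification results on some images: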

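    To compare models, we examine how often a model fails to include the correct answer among its top 5 guesses, which is called the "top-5 error rate". AlexNet achieved a top-5 error rate of 15.3% on the 2012 validation data set; BN-Inception-v2 achieved 6.66%; Inception-v3 reaches 3.46%.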
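The top-5 error rate described above can be sketched in a few lines of plain Python. This is only an illustration of the metric: the function name `top_k_error_rate` and the toy scores are assumptions made for this example, not part of the ImageNet tooling or of TensorFlow.

```python
def top_k_error_rate(score_rows, true_labels, k=5):
    """Fraction of examples whose true label is NOT among the k best guesses."""
    misses = 0
    for scores, label in zip(score_rows, true_labels):
        # Indices of the k highest scores, best first.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Hypothetical scores for 2 images over 6 classes (illustrative values only).
scores = [
    [0.05, 0.40, 0.10, 0.20, 0.15, 0.10],  # true class 3 is in the top 5
    [0.50, 0.10, 0.10, 0.10, 0.15, 0.05],  # true class 5 is ranked last
]
labels = [3, 5]
print(top_k_error_rate(scores, labels))  # 0.5: one of the two images is missed
```

With 1000 classes the same computation applies unchanged; only the length of each score row grows.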
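    How well do humans do on the ImageNet Challenge? There is a blog post by Andrej Karpathy, who attempted to measure his own performance. He reached a top-5 error rate of 5.1%.
    This tutorial will teach you how to use Inception-v3. You will learn how to classify images into 1000 classes in Python or C++. We will also discuss how to extract higher-level features from this model, which can be reused for other vision tasks.
    We are excited to see what the community will do with this model.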

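Using the Python API:
    The first time you run classify_image.py, it downloads the trained model from tensorflow.org. You will need about 200M of free disk space.
    The following commands assume that you installed TensorFlow from a pip package and that your terminal is in the TensorFlow root directory.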

cd tensorflow/models/image/imagenet
python classify_image.py

The command above classifies a supplied image of a panda bear.

If the model runs correctly, the script produces the following output:

giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
custard apple (score = 0.00149)
earthstar (score = 0.00127)

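    If you want to classify other JPEG images, you can do so by editing the --image_file argument.
    If you downloaded the model data to a different directory, you will need to point the --model_dir argument to that directory.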

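Using the C++ API:
    You can run the same Inception-v3 model in C++ for use in production environments. You can download the archive containing the GraphDef that defines the model like this (run from the root directory of the TensorFlow repository):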

wget https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip -O tensorflow/examples/label_image/data/inception_dec_2015.zip
unzip tensorflow/examples/label_image/data/inception_dec_2015.zip -d tensorflow/examples/label_image/data/

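Next, we need to compile the C++ binary that includes the code to load and run the graph. If you have followed the instructions to download the source installation of TensorFlow for your platform, you should be able to build the example by running this command from your shell terminal: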

bazel build tensorflow/examples/label_image/...

That should create a binary executable that you can then run like this:

bazel-bin/tensorflow/examples/label_image/label_image

This uses the default example image that ships with the framework, and should output something similar to this:

I tensorflow/examples/label_image/main.cc:200] military uniform (866): 0.647296
I tensorflow/examples/label_image/main.cc:200] suit (794): 0.0477196
I tensorflow/examples/label_image/main.cc:200] academic gown (896): 0.0232411
I tensorflow/examples/label_image/main.cc:200] bow tie (817): 0.0157356
I tensorflow/examples/label_image/main.cc:200] bolo tie (940): 0.0145024

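    In this case, we are using the default image of Admiral Grace Hopper, and you can see that the network correctly identifies that she is wearing a military uniform, with a high score of about 0.65.

    Next, try it on your own images by supplying the --image argument, for example: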
bazel-bin/tensorflow/examples/label_image/label_image --image=my_image.png
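    If you look inside the tensorflow/examples/label_image/main.cc file, you can find out how it works. We hope this code will help you integrate TensorFlow into your own applications, so we will walk step by step through the main functions.
    The command-line flags control where the files are loaded from and the properties of the input images. The model expects 299x299 RGB images, which is what the input_width and input_height flags specify. We also need to scale the pixel values from integers between 0 and 255 to the floating point values that the graph operates on. We control the scaling with the input_mean and input_std flags: we first subtract input_mean from each pixel value, then divide the result by input_std.
    These values may look somewhat magical, but they are simply defined by the original model author based on what he or she wanted to use as input images for training. If you have a graph that you trained yourself, you just need to adjust the values to match whatever you used during your training process.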
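As a rough illustration of the scaling step, the sketch below applies (value - input_mean) / input_std to a few pixel values in plain Python. The helper name `normalize_pixels` and the choice of 128 for both the mean and the standard deviation are assumptions made for this example; use whatever values match your own model's training setup.

```python
def normalize_pixels(pixels, input_mean, input_std):
    """Map integer pixel values to floats: (value - input_mean) / input_std."""
    return [(float(p) - input_mean) / input_std for p in pixels]


# Assumed values for illustration: with mean = std = 128, the integer range
# [0, 255] is mapped to roughly [-1, 1).
row = [0, 64, 128, 255]
print(normalize_pixels(row, input_mean=128.0, input_std=128.0))
# [-1.0, -0.5, 0.0, 0.9921875]
```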
    You can see how they are applied to an image in the ReadTensorFromImageFile() function.

// Given an image file name, read in the data, try to decode it as an image,
// resize it to the requested size, and then scale the values as desired.
Status ReadTensorFromImageFile(string file_name, const int input_height,
                               const int input_width, const float input_mean,
                               const float input_std,
                               std::vector<Tensor>* out_tensors) {
  tensorflow::GraphDefBuilder b;

We start by creating a GraphDefBuilder, which is an object we can use to specify a model to run or load.

  string input_name = "file_reader";
  string output_name = "normalized";
  tensorflow::Node* file_reader =
      tensorflow::ops::ReadFile(tensorflow::ops::Const(file_name, b.opts()),
                                b.opts().WithName(input_name));

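We then start creating nodes for the small model we want to run, which loads the image, resizes it, and scales the pixel values into the form the main model expects as its input. The first node we create is just a Const op that holds a tensor with the file name of the image we want to load. That node is then passed as the first input to the ReadFile op. You may notice that we pass b.opts() as the last argument to all the op creation functions. This argument ensures that the node is added to the model definition held in the GraphDefBuilder. We also name the ReadFile operator by making the WithName() call on b.opts(). Naming the node is not strictly necessary, since a name is assigned automatically if you do not provide one, but it does make debugging easier.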

  // Now try to figure out what kind of file it is and decode it.
  const int wanted_channels = 3;
  tensorflow::Node* image_reader;
  if (tensorflow::StringPiece(file_name).ends_with(".png")) {
    image_reader = tensorflow::ops::DecodePng(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("png_reader"));
  } else {
    // Assume if it's not a PNG then it must be a JPEG.
    image_reader = tensorflow::ops::DecodeJpeg(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("jpeg_reader"));
  }
  // Now cast the image data to float so we can do normal math on it.
  tensorflow::Node* float_caster = tensorflow::ops::Cast(
      image_reader, tensorflow::DT_FLOAT, b.opts().WithName("float_caster"));
  // The convention for image ops in TensorFlow is that all images are expected
  // to be in batches, so that they're four-dimensional arrays with indices of
  // [batch, height, width, channel]. Because we only have a single image, we
  // have to add a batch dimension of 1 to the start with ExpandDims().
  tensorflow::Node* dims_expander = tensorflow::ops::ExpandDims(
      float_caster, tensorflow::ops::Const(0, b.opts()), b.opts());
  // Bilinearly resize the image to fit the required dimensions.
  tensorflow::Node* resized = tensorflow::ops::ResizeBilinear(
      dims_expander, tensorflow::ops::Const({input_height, input_width},
                                            b.opts().WithName("size")),
      b.opts());
  // Subtract the mean and divide by the scale.
  tensorflow::ops::Div(
      tensorflow::ops::Sub(
          resized, tensorflow::ops::Const({input_mean}, b.opts()), b.opts()),
      tensorflow::ops::Const({input_std}, b.opts()),
      b.opts().WithName(output_name));

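We then keep adding more nodes: to decode the file data as an image, to cast the integers into floating point values, to resize the image, and finally to run the subtraction and division operations on the pixel values.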
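To give a feel for what the bilinear resize step computes, here is a simplified sketch in plain Python that operates on a single-channel image stored as a list of rows. It is an assumption-laden illustration, not TensorFlow's implementation: the function name `resize_bilinear` is hypothetical, the sampling convention (source coordinate = output coordinate times the input/output size ratio) is just one common choice, and batching and colour channels are left out.

```python
def resize_bilinear(image, out_height, out_width):
    """Resize a 2-D grid of floats using bilinear interpolation (simplified)."""
    in_height, in_width = len(image), len(image[0])
    y_scale = in_height / out_height
    x_scale = in_width / out_width
    result = []
    for out_y in range(out_height):
        src_y = out_y * y_scale
        y0 = int(src_y)                  # row above the sample point
        y1 = min(y0 + 1, in_height - 1)  # row below, clamped to the edge
        wy = src_y - y0                  # vertical blend weight
        row = []
        for out_x in range(out_width):
            src_x = out_x * x_scale
            x0 = int(src_x)
            x1 = min(x0 + 1, in_width - 1)
            wx = src_x - x0
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        result.append(row)
    return result


# Cast integer pixels to float first, mirroring the cast step in the graph.
pixels = [[0, 10]]
floats = [[float(v) for v in row] for row in pixels]
print(resize_bilinear(floats, 1, 4))  # [[0.0, 5.0, 10.0, 10.0]]
```

The value 5.0 is the midpoint blended from its two neighbours, while the last column repeats the edge pixel because there is nothing further to interpolate towards.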

  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensor.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));

At the end of this we have a model definition stored in the b variable, which we turn into a full graph definition with the ToGraphDef() function.

  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  TF_RETURN_IF_ERROR(session->Run({}, {output_name}, {}, out_tensors));
  return Status::OK();

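    Then we create a Session object, which is the interface for actually running the graph, and run it, specifying which node we want to get the output from and where to put the output data.
    This gives us a vector of Tensor objects, which in this case we know will contain only a single object. In this context you can think of a Tensor as a multi-dimensional array; here it holds a 299 pixel high, 299 pixel wide, 3 channel image as float values. If you already have your own image-processing framework in your product, you should be able to use that instead, as long as you apply the same transformations before you feed images into the main graph.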
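The build-then-run pattern used here (describe named nodes first, then ask for the output of one node) can be mimicked with a toy graph in plain Python. Everything in this sketch is hypothetical and exists only to illustrate the idea: the `Graph` class, its `add` and `run` methods, and the node names are not TensorFlow APIs.

```python
class Graph:
    """Toy deferred-execution graph: nodes are only evaluated when run."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, fn, *input_names):
        # Record how to compute the node; nothing is executed yet.
        self.nodes[name] = (fn, input_names)
        return name

    def run(self, output_name):
        # Recursively evaluate the inputs, then the requested node.
        fn, input_names = self.nodes[output_name]
        return fn(*(self.run(name) for name in input_names))


g = Graph()
g.add("pixels", lambda: [0, 128, 255])
g.add("float_caster", lambda xs: [float(x) for x in xs], "pixels")
g.add("normalized", lambda xs: [(x - 128.0) / 128.0 for x in xs], "float_caster")

# Only now is any work done, and only for the nodes "normalized" depends on.
print(g.run("normalized"))  # [-1.0, 0.0, 0.9921875]
```

Naming the nodes is what makes it possible to request any intermediate result, which is also why named nodes help with debugging.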
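    This is a simple example of creating a small TensorFlow graph dynamically in C++, but for the pre-trained Inception model we want to load a much larger definition from a file. You can see how we do that in the LoadGraph() function.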

// Reads a model graph definition from disk, and creates a session object you
// can use to run it.
Status LoadGraph(string graph_file_name,
                 std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef graph_def;
  Status load_graph_status =
      ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def);
  if (!load_graph_status.ok()) {
    return tensorflow::errors::NotFound("Failed to load compute graph at '",
                                        graph_file_name, "'");
  }

If you have looked through the image loading code, a lot of this should seem familiar. Rather than using a GraphDefBuilder to produce a GraphDef object, we load a protobuf file that directly contains the GraphDef.

  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  Status session_create_status = (*session)->Create(graph_def);
  if (!session_create_status.ok()) {
    return session_create_status;
  }
  return Status::OK();
}

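Then we create a Session object from that GraphDef and pass it back to the caller so that it can be run later.
The GetTopLabels() function is a lot like the image loading, except that in this case we want to take the results of running the main graph and turn them into a sorted list of the highest-scoring labels. Just like the image loader, it creates a GraphDefBuilder, adds a couple of nodes to it, and then runs the short graph to get a pair of output tensors. In this case they represent the sorted scores and the index positions of the highest results.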

// Analyzes the output of the Inception graph to retrieve the highest scores and
// their positions in the tensor, which correspond to categories.
Status GetTopLabels(const std::vector<Tensor>& outputs, int how_many_labels,
                    Tensor* indices, Tensor* scores) {
  tensorflow::GraphDefBuilder b;
  string output_name = "top_k";
  tensorflow::ops::TopK(tensorflow::ops::Const(outputs[0], b.opts()),
                        how_many_labels, b.opts().WithName(output_name));
  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensors.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  // The TopK node returns two outputs, the scores and their original indices,
  // so we have to append :0 and :1 to specify them both.
  std::vector<Tensor> out_tensors;
  TF_RETURN_IF_ERROR(session->Run({}, {output_name + ":0", output_name + ":1"},
                                  {}, &out_tensors));
  *scores = out_tensors[0];
  *indices = out_tensors[1];
  return Status::OK();

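The PrintTopLabels() function takes those sorted results and prints them out in a friendly way. The CheckTopLabel() function is very similar, but it only makes sure that the top label is the one we expect, for debugging purposes.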
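The two outputs of the top-k step (sorted scores and their original index positions) and the label lookup done when printing can be sketched in plain Python as follows. The helper name `top_k`, the four-entry label list, and the scores are all made up for illustration; the real model has 1000 classes and its own label file.

```python
import heapq


def top_k(scores, k):
    """Return (top_scores, top_indices), highest score first."""
    indices = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return [scores[i] for i in indices], indices


# Hypothetical labels and scores, indexed by class position.
labels = ["suit", "bow tie", "military uniform", "academic gown"]
scores = [0.05, 0.02, 0.65, 0.03]

top_scores, top_indices = top_k(scores, 2)
for score, index in zip(top_scores, top_indices):
    # The index is what links a score back to its human-readable label.
    print(f"{labels[index]} ({index}): {score}")
```

Keeping the indices alongside the scores is the important part: once the scores are sorted, the index is the only way to recover which category each score belongs to.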

Finally, main() ties all of these calls together:

int main(int argc, char* argv[]) {
  // We need to call this to set up global state for TensorFlow.
  tensorflow::port::InitMain(argv[0], &argc, &argv);
  Status s = tensorflow::ParseCommandLineFlags(&argc, argv);
  if (!s.ok()) {
    LOG(ERROR) << "Error parsing command line flags: " << s.ToString();
    return -1;
  }

  // First we load and initialize the model.
  std::unique_ptr<tensorflow::Session> session;
  string graph_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_graph);
  Status load_graph_status = LoadGraph(graph_path, &session);
  if (!load_graph_status.ok()) {
    LOG(ERROR) << load_graph_status;
    return -1;
  }

The code above loads the main graph.

  // Get the image from disk as a float array of numbers, resized and normalized
  // to the specifications the main graph expects.
  std::vector<Tensor> resized_tensors;
  string image_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_image);
  Status read_tensor_status = ReadTensorFromImageFile(
      image_path, FLAGS_input_height, FLAGS_input_width, FLAGS_input_mean,
      FLAGS_input_std, &resized_tensors);
  if (!read_tensor_status.ok()) {
    LOG(ERROR) << read_tensor_status;
    return -1;
  }
  const Tensor& resized_tensor = resized_tensors[0];

This loads, resizes, and processes the input image.

  // Actually run the image through the model.
  std::vector<Tensor> outputs;
  Status run_status = session->Run({{FLAGS_input_layer, resized_tensor}},
                                   {FLAGS_output_layer}, {}, &outputs);
  if (!run_status.ok()) {
    LOG(ERROR) << "Running model failed: " << run_status;
    return -1;
  }

Here we run the loaded graph with the image as its input.

  // This is for automated testing to make sure we get the expected result with
  // the default settings. We know that label 866 (military uniform) should be
  // the top label for the Admiral Hopper image.
  if (FLAGS_self_test) {
    bool expected_matches;
    Status check_status = CheckTopLabel(outputs, 866, &expected_matches);
    if (!check_status.ok()) {
      LOG(ERROR) << "Running check failed: " << check_status;
      return -1;
    }
    if (!expected_matches) {
      LOG(ERROR) << "Self-test failed!";
      return -1;
    }
  }

For testing purposes, we can check here that we get the output we expect.

  // Do something interesting with the results we've generated.
  Status print_status = PrintTopLabels(outputs, FLAGS_labels);
  if (!print_status.ok()) {
    LOG(ERROR) << "Running print failed: " << print_status;
    return -1;
  }

Finally, we print the labels we found.

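    The error handling here uses TensorFlow's Status object, which is very convenient because it lets you check whether any error has occurred with the ok() checker, and it can then be printed out as a readable error message.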
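The same check-then-report pattern can be mimicked in Python to see why it reads cleanly at the call site. The `Status` class below is a hypothetical stand-in written for this example (an empty message means success); it is not TensorFlow's class, and `load_labels` is an invented helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    """Hypothetical status value: an empty message means success."""
    message: str = ""

    def ok(self):
        return self.message == ""


def load_labels(lines):
    """Return (status, labels); fail with a readable message on empty input."""
    if not lines:
        return Status("label list is empty"), []
    return Status(), [line.strip() for line in lines]


status, labels = load_labels([])
if not status.ok():
    # The caller decides how to react; here we just report the message.
    print("Loading failed:", status.message)
```

Each step returns a status, and the caller stops at the first one that is not ok, which is the shape of every call in main() above.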
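    In this case we are demonstrating object recognition, but you should be able to use very similar code with other models that you have found or trained yourself, across all sorts of domains. We hope this small example gives you some ideas on how to use TensorFlow within your own products.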

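    EXERCISE: Transfer learning is the idea that, if you know how to solve one task well, you should be able to transfer some of that understanding to solving related problems. One way to perform transfer learning is to remove the final classification layer of the network and extract the next-to-last layer of the CNN (http://arxiv.org/abs/1310.1531), which in this case is a 2048-dimensional vector. There is a guide to doing this in the how-to section.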
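As one possible way to approach the exercise, the sketch below trains a nearest-centroid classifier on top of feature vectors. This technique is a choice made for this example, not something the tutorial prescribes, and the 4-dimensional vectors and class names are made-up stand-ins for the 2048-dimensional features you would extract from the network.

```python
def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(features_by_class):
    """Compute one mean feature vector per class."""
    return {name: centroid(vs) for name, vs in features_by_class.items()}


def predict(centroids, feature):
    """Pick the class whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda name: squared_distance(centroids[name], feature))


# Hypothetical features; real ones would come from the next-to-last layer.
train = {
    "cat": [[0.9, 0.1, 0.0, 0.2], [0.8, 0.2, 0.1, 0.1]],
    "dog": [[0.1, 0.9, 0.3, 0.0], [0.2, 0.8, 0.2, 0.1]],
}
centroids = fit_centroids(train)
print(predict(centroids, [0.85, 0.15, 0.05, 0.1]))  # cat
```

The pre-trained network does the hard work of turning pixels into useful features, so even a very simple classifier on top can be a reasonable starting point for a new, related task.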

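Resources for learning more:
    To learn about neural networks in general, Michael Nielsen's free online book is an excellent resource. For convolutional neural networks in particular, Chris Olah has some nice blog posts, and Michael Nielsen's book has a great chapter covering them.

    To find out more about implementing convolutional neural networks, you can jump to the TensorFlow deep convolutional networks tutorial, or start more gently with our ML beginner or ML expert MNIST starter tutorials. Finally, if you want to get up to speed on research in this area, you can read the recent work of all the papers referenced in this tutorial.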