Google Protocol Buffer 的使用和原理

摘自：https://blog.csdn.net/kinzxv/article/details/82699090

简介

Google Protocol Buffer( 简称 Protobuf) 是 Google 公司内部的混合语言数据标准。Protobuf是一种轻便高效的结构化数据存储格式，可以用于结构化数据串行化，或者说序列化。它很适合做数据存储或 RPC 数据交换格式。可用于通讯协议、数据存储等领域的语言无关、平台无关、可扩展的序列化结构数据格式。目前提供了 C++、Java、Python、JS、Ruby等多种语言的 API。

安装

安装步骤如下所示:

tar -xzf protobuf-2.1.0.tar.gz
cd protobuf-2.1.0
./configure --prefix=$INSTALL_DIR
make
make check
make install

清单1.Proto文件

package lm;
message helloworld
{
required int32 id = 1; // ID
required string str = 2; // str
optional int32 opt = 3; //optional field
}

编译proto文件

protoc -I=./ --cpp_out=$./ lm.hello.proto
(用法：protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/*.proto)

命令将生成两个文件：

lm.helloworld.pb.h ，定义了 C++ 类的头文件

lm.helloworld.pb.cc ， C++ 类的实现文件

在生成的头文件中，定义了一个 C++ 类 helloworld，后面的 Writer 和 Reader 将使用这个类来对消息进行操作。诸如对消息的成员进行赋值，将消息序列化等等都有相应的方法。

清单 2. Writer 的主要代码

#include "lm.helloworld.pb.h"
int main(void)
{
lm::helloworld msg1;
msg1.set_id(10080);
msg1.set_str(“hellow”);
// Write the new address book back to disk.
fstream output("./log", ios::out | ios::trunc | ios::binary);
//附加写二进制文件，存在先删除
if (!msg1.SerializeToOstream(&output)) {
cerr << "Failed to write msg." << endl;
return -1;
}
return 0;
}

清单 3. Reader

#include "lm.helloworld.pb.h"
void ListMsg(const lm::helloworld & msg) {
cout << msg.id() << endl;
cout << msg.str() << endl;
}
int main(int argc, char* argv[]) {
lm::helloworld msg1;
fstream input("./log", ios::in | ios::binary);
if (!msg1.ParseFromIstream(&input)) {
cerr << "Failed to parse address book." << endl;
return -1;
}
ListMsg(msg1);
return 0;
}

编译：

g++ test.cc lm.helloworld.pb.cc -I./ -L/usr/local/lib -o test.out –lprotobuf

运行：

0000000: e008 124e 6806 6c65 6f6c 0077 / /hexdump 注意大小端

0000000: 08e0 4e12 0668 656c 6c6f 77 ..N..hellow //Vim打开已经自动调整大小端了

我们生成如下的一个消息 Test1:

Test1.id = 10086；

Test1.str = “hellow”；

Key 的定义如下：

field_number << 3 | wire_type

08	(1 << 3) \| 0 (id字段，字段号为1，类型varint)
e0 4e	(原始数据：11100000 01001110 去掉标志位并按照小端序交换： 1001110 1100000(二进制) = 10080(十进制) )
12	(2 << 3) \| 2 (str字段，字段号为2，类型str)
06	字符串长度为6个字节
68 65 6c 6c 6f 77	(“hellow”字符串ASCII编码)

FAQ：

protobuf与json，xml比优点在哪里？

二进制消息，性能好/效率高（空间和时间效率都很不错，占用空间json 1/10，xml 1/20）
proto文件生成目标代码，简单易用
序列化反序列化直接对应程序中的数据类，不需要解析后在进行映射(XML,JSON都是这种方式)
支持向前兼容（新加字段采用默认值）和向后兼容（忽略新加字段），简化升级

使用protobuf出错：protoc: error while loading shared libraries: libprotoc.so.9: cannot open shared object file:No such...

解决方法：linux 敲击命令：export LD_LIBRARY_PATH=/usr/local/lib

附录：

Wire Type 可能的类型如下表所示：

Type	Meaning	Used For
0	Varint	int32, int64, uint32, uint64, sint32, sint64, bool, enum
1	64-bit	fixed64, sfixed64, double
2	Length-delimi	string, bytes, embedded messages, packed repeated fields
3	Start group	Groups (deprecated)
4	End group	Groups (deprecated)
5	32-bit	fixed32, sfixed32, float

Varint编码

Varint 是一种紧凑的表示数字的方法。它用一个或多个字节来表示一个数字，值越小的数字使用越少的字节数。这能减少用来表示数字的字节数。

比如对于 int32 类型的数字，一般需要 4 个 byte 来表示。但是采用 Varint，对于很小的 int32 类型的数字，则可以用 1 个 byte 来表示。当然凡事都有好的也有不好的一面，采用 Varint 表示法，大的数字则需要 5 个 byte 来表示。从统计的角度来说，一般不会所有的消息中的数字都是大数，因此大多数情况下，采用 Varint 后，可以用更少的字节数来表示数字信息。下面就详细介绍一下 Varint。

Varint 中的每个 byte 的最高位 bit 有特殊的含义，如果该位为 1，表示后续的 byte 也是该数字的一部分，如果该位为 0，则结束。其他的 7 个 bit 都用来表示数字。因此小于 128 的数字都可以用一个 byte 表示。大于 128 的数字，比如 300，会用两个字节来表示：1010 1100 0000 0010

下图演示了 Google Protocol Buffer 如何解析两个 bytes。注意到最终计算前将两个 byte 的位置相互交换过一次，这是因为 Google Protocol Buffer 字节序采用 little-endian 的方式。

图 6. Varint 编码