A look at parts of protobuf's implementation through reflection

1. What a message's metadata contains

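Reflection, in its most intuitive sense, means obtaining a memory address at run time from a string name. In protobuf this is done through the Reflection object. Its accessors, such as

    virtual int32 GetInt32(const Message& message,
                           const FieldDescriptor* field) const = 0;

take a FieldDescriptor rather than a string name, but because a field's descriptor can be looked up by the field's string name, it is still fair to say that a string name leads to a memory address.
For the messages we use, this interface is implemented in protobuf-master/src/google/protobuf/generated_message_reflection.cc. Judging from that implementation, it works through an offset table declared in protobuf-master/src/google/protobuf/generated_message_reflection.h: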
    // Offset of any field.
    uint32 GetFieldOffset(const FieldDescriptor* field) const {
      if (field->containing_oneof()) {
        size_t offset =
            static_cast<size_t>(field->containing_type()->field_count() +
                                field->containing_oneof()->index());
        return OffsetValue(offsets_[offset], field->type());
      } else {
        return GetFieldOffsetNonOneof(field);
      }
    }
This offset table comes from the offsets array contained in the pb file that protobuf generates:

    const ::PROTOBUF_NAMESPACE_ID::uint32 TableStruct_demo_2eproto::offsets[] PROTOBUF_SECTION_VARIABLE(protodesc_cold) = {
      PROTOBUF_FIELD_OFFSET(::tutorial::Person_PhoneNumber, _has_bits_),
      PROTOBUF_FIELD_OFFSET(::tutorial::Person_PhoneNumber, _internal_metadata_),
      ……

The macro used there is defined in protobuf-master/src/google/protobuf/port_def.inc:

    // Note that we calculate relative to the pointer value 16 here since if we
    // just use zero, GCC complains about dereferencing a NULL pointer. We
    // choose 16 rather than some other number just in case the compiler would
    // be confused by an unaligned pointer.
    #define PROTOBUF_FIELD_OFFSET(TYPE, FIELD)                                 \
      static_cast< ::google::protobuf::uint32>(reinterpret_cast<const char*>(  \
                                                   &reinterpret_cast<const TYPE*>(16)->FIELD) - \
                                               reinterpret_cast<const char*>(16))
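
The offset-table idea can be illustrated without protobuf at all. The following is only a minimal sketch, not protobuf's implementation: MiniPerson, FieldInfo, kOffsets, FindField, ReadInt32 and WriteInt32 are hypothetical names invented for this example, and the standard offsetof macro stands in for PROTOBUF_FIELD_OFFSET. It shows the chain "string name, then field descriptor, then offset, then memory address".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for a generated message class.
struct MiniPerson {
  std::int32_t id = 0;
  std::int32_t age = 0;
};

// Hypothetical stand-in for a field descriptor: a name plus a slot in the
// offset table.
struct FieldInfo {
  const char* name;
  int index;
};

// One byte offset per field, playing the role of the generated offsets[] array.
static const std::uint32_t kOffsets[] = {
    static_cast<std::uint32_t>(offsetof(MiniPerson, id)),
    static_cast<std::uint32_t>(offsetof(MiniPerson, age)),
};

static const FieldInfo kFields[] = {{"id", 0}, {"age", 1}};

// String name -> descriptor (a linear scan keeps the sketch short).
const FieldInfo* FindField(const std::string& name) {
  for (const FieldInfo& f : kFields) {
    if (name == f.name) return &f;
  }
  return nullptr;
}

// Descriptor -> address: message base pointer plus the offset from the table.
std::int32_t ReadInt32(const MiniPerson& msg, const FieldInfo* field) {
  assert(field != nullptr);
  const char* base = reinterpret_cast<const char*>(&msg);
  std::int32_t value;
  std::memcpy(&value, base + kOffsets[field->index], sizeof(value));
  return value;
}

void WriteInt32(MiniPerson* msg, const FieldInfo* field, std::int32_t value) {
  assert(msg != nullptr && field != nullptr);
  char* base = reinterpret_cast<char*>(msg);
  std::memcpy(base + kOffsets[field->index], &value, sizeof(value));
}
```

In this sketch, calling WriteInt32(&p, FindField("id"), 2222) on a MiniPerson p changes p.id, even though the caller never names the member directly.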

2. From the descriptor string to the Proto structure

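The proto definition below describes the information recorded for each field of a proto file. The string-form descriptor that protoc generates is expressed in terms of this kind of description file.
protobuf-master/src/google/protobuf/descriptor.proto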
    // Describes a field within a message.
    message FieldDescriptorProto {
      enum Type {
        // 0 is reserved for errors.
        // Order is weird for historical reasons.
        TYPE_DOUBLE = 1;
        TYPE_FLOAT = 2;
        // Not ZigZag encoded. Negative numbers take 10 bytes. Use TYPE_SINT64 if
        // negative values are likely.
        TYPE_INT64 = 3;
        TYPE_UINT64 = 4;
        // Not ZigZag encoded. Negative numbers take 10 bytes. Use TYPE_SINT32 if
        // negative values are likely.
        TYPE_INT32 = 5;
        TYPE_FIXED64 = 6;
        TYPE_FIXED32 = 7;
        TYPE_BOOL = 8;
        TYPE_STRING = 9;
        // Tag-delimited aggregate.
        // Group type is deprecated and not supported in proto3. However, Proto3
        // implementations should still be able to parse the group wire format and
        // treat group fields as unknown fields.
        TYPE_GROUP = 10;
        TYPE_MESSAGE = 11;  // Length-delimited aggregate.

        // New in version 2.
        TYPE_BYTES = 12;
        TYPE_UINT32 = 13;
        TYPE_ENUM = 14;
        TYPE_SFIXED32 = 15;
        TYPE_SFIXED64 = 16;
        TYPE_SINT32 = 17;  // Uses ZigZag encoding.
        TYPE_SINT64 = 18;  // Uses ZigZag encoding.
      };

      enum Label {
        // 0 is reserved for errors
        LABEL_OPTIONAL = 1;
        LABEL_REQUIRED = 2;
        LABEL_REPEATED = 3;
      };

      optional string name = 1;
      optional int32 number = 3;
      optional Label label = 4;

      // If type_name is set, this need not be set. If both this and type_name
      // are set, this must be one of TYPE_ENUM, TYPE_MESSAGE or TYPE_GROUP.
      optional Type type = 5;

      // For message and enum types, this is the name of the type. If the name
      // starts with a '.', it is fully-qualified. Otherwise, C++-like scoping
      // rules are used to find the type (i.e. first the nested types within this
      // message are searched, then within the parent, on up to the root
      // namespace).
      optional string type_name = 6;

      // For extensions, this is the name of the type being extended. It is
      // resolved in the same manner as type_name.
      optional string extendee = 2;

      // For numeric types, contains the original text representation of the value.
      // For booleans, "true" or "false".
      // For strings, contains the default text contents (not escaped in any way).
      // For bytes, contains the C escaped value. All bytes >= 128 are escaped.
      // TODO(kenton): Base-64 encode?
      optional string default_value = 7;

      // If set, gives the index of a oneof in the containing type's oneof_decl
      // list. This field is a member of that oneof.
      optional int32 oneof_index = 9;

      // JSON name of this field. The value is set by protocol compiler. If the
      // user has set a "json_name" option on this field, that option's value
      // will be used. Otherwise, it's deduced from the field's name by converting
      // it to camelCase.
      optional string json_name = 10;

      optional FieldOptions options = 8;
    }

3. From the proto file to in-memory data structures

protobuf-master/src/google/protobuf/descriptor.cc

    void BuildField(const FieldDescriptorProto& proto,
                    const Descriptor* parent,
                    FieldDescriptor* result) {
      BuildFieldOrExtension(proto, parent, result, false);
    }

4. What we are actually using when we call index()

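It is simply the position of a descriptor structure relative to the base address of the array that holds it, in other words an array index.
protobuf-master/src/google/protobuf/descriptor.h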
    // To save space, index() is computed by looking at the descriptor's position
    // in the parent's array of children.
    inline int FieldDescriptor::index() const {
      if (!is_extension_) {
        return static_cast<int>(this - containing_type()->fields_);
      } else if (extension_scope_ != NULL) {
        return static_cast<int>(this - extension_scope_->extensions_);
      } else {
        return static_cast<int>(this - file_->extensions_);
      }
    }
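
The same trick can be reproduced in a few lines of standalone C++. This is a sketch under assumptions, not protobuf code: Parent and Child are hypothetical types, and the only point is that a child stored in its parent's contiguous array can recover its own index by pointer subtraction instead of storing it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Parent;

// Hypothetical child descriptor that stores no index of its own.
struct Child {
  const Parent* parent = nullptr;
  int index() const;
};

// Hypothetical parent that owns all of its children in one contiguous array.
struct Parent {
  std::vector<Child> children;

  explicit Parent(std::size_t n) : children(n) {
    for (Child& c : children) c.parent = this;
  }
  // Copying would leave the children pointing at the old parent.
  Parent(const Parent&) = delete;
  Parent& operator=(const Parent&) = delete;
};

// The index is the distance between this element and the start of the array.
inline int Child::index() const {
  assert(parent != nullptr);
  return static_cast<int>(this - parent->children.data());
}
```

The subtraction is only meaningful while the child really lives inside parent->children, which is why the sketch forbids copying Parent.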

5. When a message's metadata is initialized

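Because GetDescriptor and GetReflection both start by calling GetMetadata, it is usually enough for the derived class's GetMetadata implementation to register the metadata on demand: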
    // Get a non-owning pointer to a Descriptor for this message's type. This
    // describes what fields the message contains, the types of those fields, etc.
    // This object remains property of the Message.
    const Descriptor* GetDescriptor() const { return GetMetadata().descriptor; }

    // Get a non-owning pointer to the Reflection interface for this Message,
    // which can be used to read and modify the fields of the Message dynamically
    // (in other words, without knowing the message type at compile time). This
    // object remains property of the Message.
    //
    // This method remains virtual in case a subclass does not implement
    // reflection and wants to override the default behavior.
    virtual const Reflection* GetReflection() const final {
      return GetMetadata().reflection;
    }

For example, here is the interface as implemented by the generated Person class from the address book example that ships with the Google project:

    ::PROTOBUF_NAMESPACE_ID::Metadata Person::GetMetadata() const {
      ::PROTOBUF_NAMESPACE_ID::internal::AssignDescriptors(&::assign_descriptors_table_demo_2eproto);
      return ::file_level_metadata_demo_2eproto[kIndexInFileMessages];
    }
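
The "build on first use" pattern behind this can be sketched in plain C++. The body of AssignDescriptors is not shown in this article, so the sketch below makes no claim about how it works internally. It swaps in a function-local static, which C++11 initializes exactly once, and every name in it (MiniMetadata, BuildMetadata, GetMiniMetadata, g_build_count) is hypothetical.

```cpp
#include <cassert>

// Hypothetical metadata record; the real Metadata carries the descriptor and
// reflection pointers returned by GetDescriptor() and GetReflection().
struct MiniMetadata {
  int descriptor_id;
};

// Counts how many times the expensive build step actually runs.
static int g_build_count = 0;

// Stand-in for the one-time work of building descriptors.
MiniMetadata BuildMetadata() {
  assert(g_build_count == 0);  // must never run twice
  ++g_build_count;
  return MiniMetadata{42};
}

// Every accessor funnels through this function, so the metadata is built the
// first time anyone asks for it and reused afterwards.
const MiniMetadata& GetMiniMetadata() {
  static const MiniMetadata metadata = BuildMetadata();
  return metadata;
}
```

Nothing is built until GetMiniMetadata() is first called, and repeated calls return a reference to the same object.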

6. A simple demo

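Judging from this example, this kind of reflection is of limited practical use on its own, because the value type still has to be fixed when the code is written: each data type is handled through a different accessor (SetInt32, SetString and so on), so generic code has to choose the accessor itself, for example by checking the field's cpp_type().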
    tsecer@protobuf: cat addressbook.proto
    syntax = "proto2";

    package tutorial;

    message Person {
      required string name = 1;
      required int32 id = 2;
      optional string email = 3;

      enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
      }

      message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
      }

      repeated PhoneNumber phones = 4;
    }

    message AddressBook {
      repeated Person people = 1;
    }

    tsecer@protobuf: cat relection.cc
    #include <cstdio>
    #include "addressbook.pb.h"

    int main(int argc, char* argv[]) {
      tutorial::Person person;
      person.set_id(1111);
      // Obtain the reflection interface and the descriptor of the message.
      const google::protobuf::Reflection* pstRefl = person.GetReflection();
      const google::protobuf::Descriptor* pstDesc = person.GetDescriptor();
      // Look up the field descriptor by the field's string name.
      const google::protobuf::FieldDescriptor* pstField = pstDesc->FindFieldByName("id");
      if (pstField == nullptr) {
        return 1;  // no field with that name
      }
      printf("id before is %d\n", person.id());
      // Write the field through reflection with the type-specific setter.
      pstRefl->SetInt32(&person, pstField, 2222);
      printf("id after is %d\n", person.id());
      return 0;
    }
    tsecer@protobuf: protoc --cpp_out=. addressbook.proto
    tsecer@protobuf: g++ relection.cc addressbook.pb.cc -lprotobuf --std=c++11
    tsecer@protobuf: ./a.out
    id before is 1111
    id after is 2222
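
To make the point about type-specific interfaces concrete, here is a small standalone sketch (not protobuf code; MiniType, MiniField, MiniRecord, kMiniFields and FieldToString are hypothetical names). Generic code can still handle "any field" at run time, but only by switching on a type tag stored in the field description and calling the matching typed access path for each case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical run-time type tag kept alongside each field description.
enum class MiniType { kInt32, kDouble };

struct MiniField {
  const char* name;
  MiniType type;
  std::size_t offset;
};

struct MiniRecord {
  std::int32_t id = 0;
  double score = 0.0;
};

static const MiniField kMiniFields[] = {
    {"id", MiniType::kInt32, offsetof(MiniRecord, id)},
    {"score", MiniType::kDouble, offsetof(MiniRecord, score)},
};

// Render any field as text. The switch is where the run-time type tag is
// mapped onto a compile-time type, one case per supported type.
std::string FieldToString(const MiniRecord& rec, const MiniField& field) {
  const char* addr = reinterpret_cast<const char*>(&rec) + field.offset;
  switch (field.type) {
    case MiniType::kInt32: {
      std::int32_t v;
      std::memcpy(&v, addr, sizeof(v));
      return std::to_string(v);
    }
    case MiniType::kDouble: {
      double v;
      std::memcpy(&v, addr, sizeof(v));
      return std::to_string(v);
    }
  }
  assert(false && "unknown field type");
  return std::string();
}
```

Looping over kMiniFields and calling FieldToString on each entry prints a whole record without the caller naming any member, at the cost of one switch case per supported type.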

Original article: https://www.cnblogs.com/tsecer/p/10690972.html