Field Parsing (1)

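In ClassFileParser::parseClassFile(), once the constant pool, the superclass and the interfaces have been parsed, parse_fields() is called to parse the field information. The call is as follows: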

u2 java_fields_count = 0;
// Fields (offsets are filled in later)
FieldAllocationCount fac;
Array<u2>* fields = parse_fields(class_name,
                                     access_flags.is_interface(),
                                     &fac, &java_fields_count,
                                     CHECK_(nullHandle));

Just before parse_fields() is called, a variable fac of type FieldAllocationCount is defined. Its definition is:

Source: classFileParser.cpp

class FieldAllocationCount: public ResourceObj {
 public:
  u2 count[MAX_FIELD_ALLOCATION_TYPE];
 
  FieldAllocationCount() {
    for (int i = 0; i < MAX_FIELD_ALLOCATION_TYPE; i++) { // MAX_FIELD_ALLOCATION_TYPE is 10
      count[i] = 0;
    }
  }
 
  FieldAllocationType update(bool is_static, BasicType type) {
    FieldAllocationType atype = basic_type_to_atype(is_static, type);
    // Make sure there is no overflow with injected fields.
    assert(count[atype] < 0xFFFF, "More than 65535 fields");
    count[atype]++;
    return atype;
  }
};

The count array records the number of fields of each allocation type. These types are the values of the FieldAllocationType enum, defined as follows:

enum FieldAllocationType {
  STATIC_OOP,                // 0 Oops
  STATIC_BYTE,               // 1 Boolean, Byte, char
  STATIC_SHORT,              // 2 shorts
  STATIC_WORD,               // 3 ints
  STATIC_DOUBLE,             // 4 aligned long or double

  NONSTATIC_OOP,             // 5
  NONSTATIC_BYTE,            // 6
  NONSTATIC_SHORT,           // 7
  NONSTATIC_WORD,            // 8
  NONSTATIC_DOUBLE,          // 9

  MAX_FIELD_ALLOCATION_TYPE, // 10
  BAD_ALLOCATION_TYPE = -1
};

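Fields fall into 5 categories, and the counts are kept separately for static and non-static fields. This lets the memory needed for the fields be computed later from the number of fields in each category. The categories are:

  • Oop: reference types (objects and arrays)
  • Byte: 1-byte types (boolean, byte)
  • Short: 2-byte types (char, short)
  • Word: 4-byte types (int, float)
  • Double: 8-byte types (long, double)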
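The sketch below shows roughly why per-category counts are enough to size the field area: it multiplies each count by the size of its category. This is a simplification under stated assumptions and not HotSpot's layout code. The struct and function names are hypothetical. The reference size is passed in as a parameter (typically 4 bytes with compressed oops and 8 without). Alignment gaps and the object header are ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-category counts for one side (static or non-static).
struct CategoryCounts {
  std::size_t oops;     // references: objects and arrays
  std::size_t bytes;    // boolean, byte
  std::size_t shorts;   // char, short
  std::size_t words;    // int, float
  std::size_t doubles;  // long, double
};

// Sum of the raw field sizes. Ignores alignment gaps, field ordering and
// the object header. heap_oop_size is an assumption: 4 with compressed
// oops, 8 without.
std::size_t raw_field_bytes(const CategoryCounts& c, std::size_t heap_oop_size) {
  assert(heap_oop_size == 4 || heap_oop_size == 8);
  return c.oops * heap_oop_size + c.bytes * 1 + c.shorts * 2 +
         c.words * 4 + c.doubles * 8;
}
```

Take a hypothetical class with one field of each category (say Object, boolean, char, int, long). With 4-byte references this gives 4 + 1 + 2 + 4 + 8 = 19 bytes before any padding.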

update() increments the count for the field's allocation type and returns that type. The BasicType enum it uses is defined as follows:

Source: utilities/globalDefinitions.hpp
enum BasicType {
  T_BOOLEAN     =  4,
  T_CHAR        =  5,
  T_FLOAT       =  6,
  T_DOUBLE      =  7,
  T_BYTE        =  8,
  T_SHORT       =  9,
  T_INT         = 10,
  T_LONG        = 11,
  T_OBJECT      = 12,
  T_ARRAY       = 13,
  T_VOID        = 14,
  T_ADDRESS     = 15, // the returnAddress type, i.e. the return address used by the ret instruction
  T_NARROWOOP   = 16,
  T_METADATA    = 17,
  T_NARROWKLASS = 18,
  T_CONFLICT    = 19, // for stack value type with conflicting contents
  T_ILLEGAL     = 99
};

basic_type_to_atype() converts a BasicType value into the corresponding FieldAllocationType value:

static FieldAllocationType _basic_type_to_atype[2 * (T_CONFLICT + 1)] = {
  BAD_ALLOCATION_TYPE, //                  0
  BAD_ALLOCATION_TYPE, //                  1
  BAD_ALLOCATION_TYPE, //                  2
  BAD_ALLOCATION_TYPE, //                  3
  ///////////////////////////////////////////////////////////
  NONSTATIC_BYTE ,     // T_BOOLEAN     =  4,
  NONSTATIC_SHORT,     // T_CHAR        =  5,
  NONSTATIC_WORD,      // T_FLOAT       =  6,
  NONSTATIC_DOUBLE,    // T_DOUBLE      =  7,
  NONSTATIC_BYTE,      // T_BYTE        =  8,
  NONSTATIC_SHORT,     // T_SHORT       =  9,
  NONSTATIC_WORD,      // T_INT         = 10,
  NONSTATIC_DOUBLE,    // T_LONG        = 11,
  NONSTATIC_OOP,       // T_OBJECT      = 12,
  NONSTATIC_OOP,       // T_ARRAY       = 13,
  ///////////////////////////////////////////////////////////
  BAD_ALLOCATION_TYPE, // T_VOID        = 14,
  BAD_ALLOCATION_TYPE, // T_ADDRESS     = 15,
  BAD_ALLOCATION_TYPE, // T_NARROWOOP   = 16,
  BAD_ALLOCATION_TYPE, // T_METADATA    = 17,
  BAD_ALLOCATION_TYPE, // T_NARROWKLASS = 18,
  BAD_ALLOCATION_TYPE, // T_CONFLICT    = 19,

  BAD_ALLOCATION_TYPE, //                  0
  BAD_ALLOCATION_TYPE, //                  1
  BAD_ALLOCATION_TYPE, //                  2
  BAD_ALLOCATION_TYPE, //                  3
  ///////////////////////////////////////////////////////////
  STATIC_BYTE ,        // T_BOOLEAN     =  4,
  STATIC_SHORT,        // T_CHAR        =  5,
  STATIC_WORD,         // T_FLOAT       =  6,
  STATIC_DOUBLE,       // T_DOUBLE      =  7,
  STATIC_BYTE,         // T_BYTE        =  8,
  STATIC_SHORT,        // T_SHORT       =  9,
  STATIC_WORD,         // T_INT         = 10,
  STATIC_DOUBLE,       // T_LONG        = 11,
  STATIC_OOP,          // T_OBJECT      = 12,
  STATIC_OOP,          // T_ARRAY       = 13,
  ///////////////////////////////////////////////////////////
  BAD_ALLOCATION_TYPE, // T_VOID        = 14,
  BAD_ALLOCATION_TYPE, // T_ADDRESS     = 15,
  BAD_ALLOCATION_TYPE, // T_NARROWOOP   = 16,
  BAD_ALLOCATION_TYPE, // T_METADATA    = 17,
  BAD_ALLOCATION_TYPE, // T_NARROWKLASS = 18,
  BAD_ALLOCATION_TYPE, // T_CONFLICT    = 19,
};

static FieldAllocationType basic_type_to_atype(bool is_static, BasicType type) {
  assert(type >= T_BOOLEAN && type < T_VOID, "only allowable values");
  FieldAllocationType result = _basic_type_to_atype[  type + (is_static ? (T_CONFLICT + 1) : 0)  ];
  assert(result != BAD_ALLOCATION_TYPE, "bad type");
  return result;
}

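basic_type_to_atype() is straightforward. The table has 2 * (T_CONFLICT + 1) = 40 entries: the first 20 are for non-static fields and the last 20 for static fields. For a static field, the BasicType value is therefore offset by T_CONFLICT + 1 = 20 before the lookup. The asserts only allow types from T_BOOLEAN to T_ARRAY.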
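The index arithmetic can be tried in isolation. The sketch below copies the two enums (trimmed) and the 40-entry table, and adds a small counter modelled on FieldAllocationCount::update(). The names kBasicTypeToAtype, atype_index and AllocationCounter are hypothetical. For a static int field, the index is T_INT + (T_CONFLICT + 1) = 10 + 20 = 30, which falls in the static half of the table at STATIC_WORD.

```cpp
#include <cassert>

// Trimmed copies of the enums shown above.
enum BasicType {
  T_BOOLEAN = 4, T_CHAR = 5, T_FLOAT = 6, T_DOUBLE = 7, T_BYTE = 8,
  T_SHORT = 9, T_INT = 10, T_LONG = 11, T_OBJECT = 12, T_ARRAY = 13,
  T_VOID = 14, T_CONFLICT = 19
};

enum FieldAllocationType {
  STATIC_OOP, STATIC_BYTE, STATIC_SHORT, STATIC_WORD, STATIC_DOUBLE,
  NONSTATIC_OOP, NONSTATIC_BYTE, NONSTATIC_SHORT, NONSTATIC_WORD, NONSTATIC_DOUBLE,
  MAX_FIELD_ALLOCATION_TYPE, BAD_ALLOCATION_TYPE = -1
};

const FieldAllocationType BAD = BAD_ALLOCATION_TYPE;

// First 20 entries: non-static fields; last 20 entries: static fields.
const FieldAllocationType kBasicTypeToAtype[2 * (T_CONFLICT + 1)] = {
  BAD, BAD, BAD, BAD,                                                 // 0..3
  NONSTATIC_BYTE, NONSTATIC_SHORT, NONSTATIC_WORD, NONSTATIC_DOUBLE,  // 4..7
  NONSTATIC_BYTE, NONSTATIC_SHORT, NONSTATIC_WORD, NONSTATIC_DOUBLE,  // 8..11
  NONSTATIC_OOP, NONSTATIC_OOP,                                       // 12..13
  BAD, BAD, BAD, BAD, BAD, BAD,                                       // 14..19
  BAD, BAD, BAD, BAD,                                                 // 20..23
  STATIC_BYTE, STATIC_SHORT, STATIC_WORD, STATIC_DOUBLE,              // 24..27
  STATIC_BYTE, STATIC_SHORT, STATIC_WORD, STATIC_DOUBLE,              // 28..31
  STATIC_OOP, STATIC_OOP,                                             // 32..33
  BAD, BAD, BAD, BAD, BAD, BAD,                                       // 34..39
};

// Static fields are shifted into the second half of the table.
int atype_index(bool is_static, BasicType type) {
  assert(type >= T_BOOLEAN && type < T_VOID);
  return type + (is_static ? (T_CONFLICT + 1) : 0);
}

// Hypothetical counterpart of FieldAllocationCount.
struct AllocationCounter {
  unsigned short count[MAX_FIELD_ALLOCATION_TYPE] = {};

  FieldAllocationType update(bool is_static, BasicType type) {
    FieldAllocationType atype = kBasicTypeToAtype[atype_index(is_static, type)];
    assert(atype != BAD_ALLOCATION_TYPE);
    count[atype]++;
    return atype;
  }
};
```

The table keeps the lookup branch-free. The cost is 20 unused BAD entries per half, which is negligible.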

1. Allocating Memory for the Fields

ClassFileParser::parse_fields() allocates memory for the fields with the following call:

 u2* fa = NEW_RESOURCE_ARRAY_IN_THREAD(
             THREAD, u2, total_fields * (FieldInfo::field_slots + 1));

The NEW_RESOURCE_ARRAY_IN_THREAD macro is defined as follows:

#define NEW_RESOURCE_ARRAY_IN_THREAD(thread, type, size)\
    (type*) resource_allocate_bytes(thread, (size) * sizeof(type))

After macro expansion, the call is equivalent to:

u2* fa = (u2*) resource_allocate_bytes(THREAD, (total_fields * (FieldInfo::field_slots + 1)) * sizeof(u2));

FieldInfo is a class, and field_slots is an enum constant defined inside it with the value 6. The call therefore reserves total_fields * (FieldInfo::field_slots + 1) elements of sizeof(u2) bytes each, because the data is stored in the following layout:

f1: [access, name index, sig index, initial value index, low_offset, high_offset]
f2: [access, name index, sig index, initial value index, low_offset, high_offset]
       ...
fn: [access, name index, sig index, initial value index, low_offset, high_offset]
     [generic signature index]
     [generic signature index]
     ...

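That is, with n fields, each field takes 6 u2 slots. A field may also have a generic signature index, but how many fields have one is not known until their attributes are read. The code therefore first allocates a temporary buffer large enough for the worst case (one extra slot per field). Later it allocates an array of exactly the required size and copies the data over, so no space is wasted on fields that have no generic signature index.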
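The helper functions below make the sizing concrete. Their names are hypothetical, and field_slots is fixed at 6 as in FieldInfo. For 3 fields, the temporary array holds 3 * (6 + 1) = 21 u2 elements, which is 42 bytes. If only one of those fields turns out to have a generic signature, the final array needs just 3 * 6 + 1 = 19 elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint16_t u2;
const int field_slots = 6;  // mirrors FieldInfo::field_slots

// Worst case: 6 slots per field plus one generic signature slot per field.
int temp_array_length(int total_fields) {
  assert(total_fields >= 0);
  return total_fields * (field_slots + 1);
}

// Byte size that the expanded macro passes to resource_allocate_bytes().
std::size_t temp_array_bytes(int total_fields) {
  return static_cast<std::size_t>(temp_array_length(total_fields)) * sizeof(u2);
}

// Exact size once the number of generic signatures is known.
int final_array_length(int field_count, int num_generic_signature) {
  assert(num_generic_signature >= 0 && num_generic_signature <= field_count);
  return field_count * field_slots + num_generic_signature;
}
```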

In the class file, each field is stored in the following format:

field_info {
    u2             access_flags;
    u2             name_index;
    u2             descriptor_index;
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

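access_flags, name_index and descriptor_index correspond to access, name index and sig index in each fn entry above. initial value index holds the constant pool index of the field's constant value, taken from its ConstantValue attribute, or 0 if there is none. low_offset and high_offset are covered in detail later.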
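The fixed part of field_info is four big-endian u2 values, which is why the parsing loop first checks that 8 bytes are available. The sketch below decodes that prefix from a raw byte buffer and does not parse attributes. read_u2, FieldInfoHeader and read_field_header are hypothetical helpers; HotSpot reads through ClassFileStream instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint16_t u2;

// Class files are big-endian: the high byte comes first.
u2 read_u2(const uint8_t* p) {
  return static_cast<u2>((p[0] << 8) | p[1]);
}

// Fixed 8-byte prefix of a field_info entry.
struct FieldInfoHeader {
  u2 access_flags;
  u2 name_index;
  u2 descriptor_index;
  u2 attributes_count;
};

FieldInfoHeader read_field_header(const uint8_t* buf, std::size_t len) {
  assert(len >= 8);  // the same 8 bytes that guarantee_more(8, ...) checks for
  FieldInfoHeader h;
  h.access_flags     = read_u2(buf);
  h.name_index       = read_u2(buf + 2);
  h.descriptor_index = read_u2(buf + 4);
  h.attributes_count = read_u2(buf + 6);
  return h;
}
```

For example, the bytes 00 1A 00 05 00 06 00 01 decode as follows:

  • access_flags: 0x001A (ACC_PRIVATE | ACC_STATIC | ACC_FINAL)
  • name_index: 5
  • descriptor_index: 6
  • attributes_count: 1 (for such a constant, typically a ConstantValue attribute)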

resource_allocate_bytes() and the functions it calls are shown below:

extern char* resource_allocate_bytes(Thread* thread, size_t size, AllocFailType alloc_failmode) {
  return thread->resource_area()->allocate_bytes(size, alloc_failmode);
}
char* allocate_bytes(size_t size, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {
   return (char*)Amalloc(size, alloc_failmode);
}
void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {
    // check that ARENA_AMALLOC_ALIGNMENT is a power of 2
    assert(is_power_of_2(ARENA_AMALLOC_ALIGNMENT) , "should be a power of 2");
    // after macro expansion, this is:
    // ((((size_t)(x)) + (((size_t)((2*BytesPerWord))) - 1)) & (~((size_t)(((size_t)((2*BytesPerWord))) - 1))))
    x = ARENA_ALIGN(x);

    if (!check_for_overflow(x, "Arena::Amalloc", alloc_failmode))
      return NULL;

    if (_hwm + x > _max) {
      return grow(x, alloc_failmode);
    } else {
      char *old = _hwm;
      _hwm += x;
      return old;
    }
}

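The space is ultimately allocated from a ResourceArea; every thread has a _resource_area field. Amalloc() first rounds x up with ARENA_ALIGN. If the current chunk still has room, it returns the old _hwm and advances _hwm by x; otherwise grow() allocates a new chunk. The function is very similar to Amalloc_4(), covered earlier in the article on releasing Handles, so it is not discussed further here.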
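ARENA_ALIGN rounds a request up to a multiple of 2 * BytesPerWord, and Amalloc() then bumps _hwm. The sketch below reproduces both steps, assuming a 64-bit VM (BytesPerWord = 8, so 16-byte alignment). BumpArena is a hypothetical single-chunk arena that returns nullptr where HotSpot would call grow().

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 64-bit VM, so BytesPerWord == 8 and the alignment is 16.
const std::size_t BytesPerWord = 8;
const std::size_t kArenaAlignment = 2 * BytesPerWord;

// Same rounding as the expanded ARENA_ALIGN macro: add (align - 1), then
// clear the low bits.
std::size_t arena_align(std::size_t x) {
  assert((kArenaAlignment & (kArenaAlignment - 1)) == 0);  // power of 2
  return (x + (kArenaAlignment - 1)) & ~(kArenaAlignment - 1);
}

// Hypothetical single-chunk arena: no grow(), returns nullptr when full.
class BumpArena {
  char* _hwm;  // high-water mark: next free byte
  char* _max;  // end of the chunk
 public:
  BumpArena(char* chunk, std::size_t size) : _hwm(chunk), _max(chunk + size) {}

  void* amalloc(std::size_t x) {
    x = arena_align(x);
    if (x > static_cast<std::size_t>(_max - _hwm)) {
      return nullptr;  // HotSpot would call grow() here
    }
    char* old = _hwm;
    _hwm += x;
    return old;
  }
};
```

Under this assumption, a 42-byte request is rounded up to 48 bytes, and even a 1-byte request consumes 16. The capacity check is written as x > _max - _hwm rather than _hwm + x > _max. This avoids forming a pointer past the end of the chunk.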

The _resource_area field is declared as follows:

// Thread local resource area for temporary allocation within the VM
ResourceArea* _resource_area;

This field is initialized when a Thread object is created. The Thread constructor contains the following call:

set_resource_area(new (mtThread)ResourceArea()); // initialize the _resource_area field

ResourceArea extends Arena. Memory allocated from a ResourceArea can be released in bulk through a ResourceMark, just as HandleArea works with HandleMark.
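A tiny arena is enough to show the mark/release idea. A mark records the high-water mark when it is created and resets it when it goes out of scope, discarding everything allocated in between in one step. TinyArena and ScopedMark are hypothetical single-chunk simplifications in the spirit of ResourceMark. The real ResourceMark also saves the current chunk and frees any chunks added after the mark.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical arena with just enough state for mark/release.
class TinyArena {
  char* _base;
  char* _hwm;
  char* _max;
 public:
  TinyArena(char* chunk, std::size_t size)
      : _base(chunk), _hwm(chunk), _max(chunk + size) {}

  void* alloc(std::size_t n) {
    if (n > static_cast<std::size_t>(_max - _hwm)) return nullptr;
    char* old = _hwm;
    _hwm += n;
    return old;
  }
  std::size_t used() const { return static_cast<std::size_t>(_hwm - _base); }
  char* hwm() const { return _hwm; }
  void reset_to(char* mark) {
    assert(mark >= _base && mark <= _hwm);
    _hwm = mark;
  }
};

// RAII mark: remember the high-water mark on construction and roll back
// to it on destruction, freeing everything allocated in between at once.
class ScopedMark {
  TinyArena& _arena;
  char* _saved;
 public:
  explicit ScopedMark(TinyArena& a) : _arena(a), _saved(a.hwm()) {}
  ~ScopedMark() { _arena.reset_to(_saved); }
  ScopedMark(const ScopedMark&) = delete;
  ScopedMark& operator=(const ScopedMark&) = delete;
};
```

Nothing is freed individually: release is a single pointer reset, which suits the many short-lived temporary buffers used during class parsing.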

2. Reading the Fields

The fields are read in ClassFileParser::parse_fields() as follows:

  // The generic signature slots start after all other fields' data.
  int generic_signature_slot = total_fields * FieldInfo::field_slots;
  int num_generic_signature = 0;
  for (int n = 0; n < length; n++) {
    cfs->guarantee_more(8, CHECK_NULL);  // access_flags, name_index, descriptor_index, attributes_count
    // read the field's access flags
    AccessFlags access_flags;
    jint flags = cfs->get_u2_fast() & JVM_RECOGNIZED_FIELD_MODIFIERS;
    access_flags.set_flags(flags);
    // read the name index
    u2 name_index = cfs->get_u2_fast();
    int cp_size = _cp->length(); // number of entries in the constant pool

    Symbol*  name = _cp->symbol_at(name_index);
    // read the descriptor index
    u2 signature_index = cfs->get_u2_fast();
    Symbol*  sig = _cp->symbol_at(signature_index);

    u2     constantvalue_index = 0;
    bool   is_synthetic = false;
    u2     generic_signature_index = 0;
    bool   is_static = access_flags.is_static();
    FieldAnnotationCollector parsed_annotations(_loader_data);
    // read the field attributes
    u2 attributes_count = cfs->get_u2_fast();
    if (attributes_count > 0) {
      parse_field_attributes(attributes_count, is_static, signature_index,
                             &constantvalue_index, &is_synthetic,
                             &generic_signature_index, &parsed_annotations,
                             CHECK_NULL);
      if (parsed_annotations.field_annotations() != NULL) {
        if (_fields_annotations == NULL) {
          _fields_annotations = MetadataFactory::new_array<AnnotationArray*>(
                                             _loader_data, length, NULL,
                                             CHECK_NULL);
        }
        _fields_annotations->at_put(n, parsed_annotations.field_annotations());
        parsed_annotations.set_field_annotations(NULL);
      }
      if (parsed_annotations.field_type_annotations() != NULL) {
        if (_fields_type_annotations == NULL) {
          _fields_type_annotations = MetadataFactory::new_array<AnnotationArray*>(
                                                  _loader_data, length, NULL,
                                                  CHECK_NULL);
        }
        _fields_type_annotations->at_put(n, parsed_annotations.field_type_annotations());
        parsed_annotations.set_field_type_annotations(NULL);
      }

      if (is_synthetic) {
        access_flags.set_is_synthetic();
      }
      if (generic_signature_index != 0) {
        access_flags.set_field_has_generic_signature();
        fa[generic_signature_slot] = generic_signature_index;
        generic_signature_slot ++;
        num_generic_signature ++;
      }
    } // done reading the field attributes

    FieldInfo* field = FieldInfo::from_field_array(fa, n);
    field->initialize(access_flags.as_short(),
                      name_index,
                      signature_index,
                      constantvalue_index);
    BasicType type = _cp->basic_type_for_signature_at(signature_index);

    // Remember how many oops we encountered and compute allocation type
    FieldAllocationType atype = fac->update(is_static, type);
    field->set_allocation_type(atype);

    // After field is initialized with type, we can augment it with aux info
    if (parsed_annotations.has_any_annotations())
       parsed_annotations.apply_to(field);
  } // end of the for loop

Each field's values are read according to this format and stored into fa. FieldInfo::from_field_array() is implemented as follows:

static FieldInfo* from_field_array(u2* fields, int index) {
    return ((FieldInfo*)(fields + index * field_slots));
}

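It computes the address of the 6 u2 slots belonging to the index-th field and casts it to FieldInfo*. The 6 attributes can then be accessed conveniently through the FieldInfo class, which is defined as follows: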

// This class represents the field information contained in the fields
// array of an InstanceKlass.  Currently it's laid on top an array of
// Java shorts but in the future it could simply be used as a real
// array type.  FieldInfo generally shouldn't be used directly.
// Fields should be queried either through InstanceKlass or through
// the various FieldStreams.
class FieldInfo VALUE_OBJ_CLASS_SPEC {
  u2 _shorts[field_slots];
  ...
};

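The class has no virtual functions and therefore no vtable pointer. Its only data member is the _shorts array of u2 elements, each 16 bits wide. Its memory layout is therefore exactly the per-field layout described earlier, and the class's methods can work directly on the _shorts array.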
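A FieldInfo placed over fa occupies exactly 6 u2 slots because the class has no virtual functions and only a u2 array member. The sketch below checks that property with static_assert on a look-alike struct, then mimics from_field_array() and initialize(). The names FieldInfoLike and FieldView are hypothetical, and the offset enum simply follows the slot order listed earlier. HotSpot casts u2* to a class pointer; this sketch instead wraps the pointer in a small view, so the example stays within standard C++ aliasing rules.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

typedef uint16_t u2;

// Slot order from the layout above:
// [access, name index, sig index, initial value index, low_offset, high_offset]
enum FieldOffset {
  access_flags_offset = 0, name_index_offset = 1, signature_index_offset = 2,
  initval_index_offset = 3, low_packed_offset = 4, high_packed_offset = 5,
  field_slots = 6
};

// No virtual functions and a single u2 array member: no hidden vtable pointer.
struct FieldInfoLike { u2 _shorts[field_slots]; };
static_assert(std::is_standard_layout<FieldInfoLike>::value, "no vptr");
static_assert(sizeof(FieldInfoLike) == field_slots * sizeof(u2),
              "same size as 6 u2 slots");

// View over the 6 slots of one field inside the flat u2 array.
class FieldView {
  u2* _shorts;
 public:
  explicit FieldView(u2* p) : _shorts(p) {}

  static FieldView from_field_array(u2* fields, int index) {
    assert(index >= 0);
    return FieldView(fields + index * field_slots);
  }

  void initialize(u2 access, u2 name, u2 sig, u2 initval) {
    _shorts[access_flags_offset] = access;
    _shorts[name_index_offset] = name;
    _shorts[signature_index_offset] = sig;
    _shorts[initval_index_offset] = initval;
    _shorts[low_packed_offset] = 0;   // offset is filled in later
    _shorts[high_packed_offset] = 0;
  }

  u2 name_index() const { return _shorts[name_index_offset]; }
  u2 signature_index() const { return _shorts[signature_index_offset]; }
};
```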

field->initialize() stores the attribute values that were just read:

void initialize(u2 access_flags,
                  u2 name_index,
                  u2 signature_index,
                  u2 initval_index  ){
    _shorts[access_flags_offset] = access_flags;
    _shorts[name_index_offset] = name_index;
    _shorts[signature_index_offset] = signature_index;
    _shorts[initval_index_offset] = initval_index;

    _shorts[low_packed_offset] = 0;
    _shorts[high_packed_offset] = 0;
}

_cp->basic_type_for_signature_at() derives the field's type from its signature. The relevant functions are:

BasicType ConstantPool::basic_type_for_signature_at(int which) {
  return FieldType::basic_type(symbol_at(which));
}

Symbol* symbol_at(int which) {
    assert(tag_at(which).is_utf8(), "Corrupted constant pool");
    return *symbol_at_addr(which);
}

BasicType FieldType::basic_type(Symbol* signature) {
  return char2type(signature->byte_at(0));
}

// Convert a char from a classfile signature to a BasicType
inline BasicType char2type(char c) {
  switch( c ) {
  case 'B': return T_BYTE;
  case 'C': return T_CHAR;
  case 'D': return T_DOUBLE;
  case 'F': return T_FLOAT;
  case 'I': return T_INT;
  case 'J': return T_LONG;
  case 'S': return T_SHORT;
  case 'Z': return T_BOOLEAN;
  case 'V': return T_VOID;
  case 'L': return T_OBJECT;
  case '[': return T_ARRAY;
  }
  return T_ILLEGAL;
}

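ConstantPool::symbol_at() fetches the Symbol holding the signature string at constant pool index which. The first character of the signature is enough to determine the field's type. With the type known, fac->update() increments the count for the corresponding category, as described earlier in this article.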
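Only the first character of a field descriptor is inspected. As a result, array descriptors such as "[J" map to T_ARRAY rather than T_LONG, and every class type maps to T_OBJECT. The sketch below repeats char2type() with a trimmed BasicType enum. It adds a hypothetical wrapper, basic_type_for_descriptor(), that takes a C string instead of a Symbol*.

```cpp
#include <cassert>

// Trimmed copy of BasicType (values as listed above).
enum BasicType {
  T_BOOLEAN = 4, T_CHAR = 5, T_FLOAT = 6, T_DOUBLE = 7, T_BYTE = 8,
  T_SHORT = 9, T_INT = 10, T_LONG = 11, T_OBJECT = 12, T_ARRAY = 13,
  T_VOID = 14, T_ILLEGAL = 99
};

// Same mapping as char2type() above.
BasicType char2type(char c) {
  switch (c) {
    case 'B': return T_BYTE;
    case 'C': return T_CHAR;
    case 'D': return T_DOUBLE;
    case 'F': return T_FLOAT;
    case 'I': return T_INT;
    case 'J': return T_LONG;
    case 'S': return T_SHORT;
    case 'Z': return T_BOOLEAN;
    case 'V': return T_VOID;
    case 'L': return T_OBJECT;
    case '[': return T_ARRAY;
  }
  return T_ILLEGAL;
}

// Hypothetical wrapper: only the first character of the descriptor matters.
BasicType basic_type_for_descriptor(const char* descriptor) {
  assert(descriptor != nullptr && descriptor[0] != '\0');
  return char2type(descriptor[0]);
}
```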

Finally, the field information temporarily stored in fa is copied into a new array:

// Now copy the fields' data from the temporary resource array.
  // Sometimes injected fields already exist in the Java source so
  // the fields array could be too long.  In that case the
  // fields array is trimed. Also unused slots that were reserved
  // for generic signature indexes are discarded.
  Array<u2>* fields = MetadataFactory::new_array<u2>(
          _loader_data, index * FieldInfo::field_slots + num_generic_signature,
          CHECK_NULL);
  _fields = fields; // save in case of error
  {
    int i = 0;
    for (; i < index * FieldInfo::field_slots; i++) {
      fields->at_put(i, fa[i]);
    }
    for (int j = total_fields * FieldInfo::field_slots;j < generic_signature_slot; j++) {
      fields->at_put(i++, fa[j]);
    }
    assert(i == fields->length(), "");
  }

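When the fields array is created, its length in u2 elements becomes index * FieldInfo::field_slots + num_generic_signature. Here index is the number of fields actually stored, including injected fields. It can be smaller than total_fields because an injected field may already exist in the Java source. Exactly num_generic_signature slots are reserved for generic signature indices. The copy itself is simple: first the index * field_slots field slots are copied from fa, then the generic signature indices that start at offset total_fields * field_slots in fa are appended.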
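Both phases can be simulated with std::vector: filling a worst-case temporary buffer as the parsing loop does, then copying only the used slots into an exact-size array. In this sketch, ParsedField and pack_fields() are hypothetical. The per-field slot contents are supplied directly instead of being parsed. total_fields may exceed the number of fields actually stored, which exercises the trimming.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef uint16_t u2;
const int field_slots = 6;  // mirrors FieldInfo::field_slots

// Hypothetical input: the 6 slots of one field plus its generic signature
// index (0 means the field has no Signature attribute).
struct ParsedField {
  u2 slots[field_slots];
  u2 generic_signature_index;
};

std::vector<u2> pack_fields(const std::vector<ParsedField>& parsed, int total_fields) {
  int index = static_cast<int>(parsed.size());
  assert(index <= total_fields);

  // Phase 1: worst-case temporary buffer, zero-initialized.
  std::vector<u2> fa(static_cast<std::size_t>(total_fields) * (field_slots + 1));
  int generic_signature_slot = total_fields * field_slots;
  int num_generic_signature = 0;
  for (int n = 0; n < index; n++) {
    for (int k = 0; k < field_slots; k++) {
      fa[n * field_slots + k] = parsed[n].slots[k];
    }
    if (parsed[n].generic_signature_index != 0) {
      fa[generic_signature_slot++] = parsed[n].generic_signature_index;
      num_generic_signature++;
    }
  }

  // Phase 2: exact-size array, field slots first, then generic signatures.
  std::vector<u2> fields(static_cast<std::size_t>(index * field_slots + num_generic_signature));
  int i = 0;
  for (; i < index * field_slots; i++) {
    fields[i] = fa[i];
  }
  for (int j = total_fields * field_slots; j < generic_signature_slot; j++) {
    fields[i++] = fa[j];
  }
  assert(i == static_cast<int>(fields.size()));
  return fields;
}
```

Suppose total_fields = 3 but only two fields are stored, and one of them has generic signature index 14. The result then has 2 * 6 + 1 = 13 elements, with the signature index at position 12.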

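Related articles:

1. Building the OpenJDK 8 source code on Ubuntu 16.04

2. Debugging the HotSpot source code

3. The HotSpot project structure

4. The HotSpot startup process

5. The HotSpot dichotomy model (1)

6. HotSpot's class model (2)

7. HotSpot's class model (3)

8. HotSpot's class model (4)

9. HotSpot's object model (5)

10. HotSpot's object model (6)

11. Operating on Handles (7)

12. Releasing Handles (8)

13. Class loaders

14. The parent delegation model for classes

15. Preloading core classes

16. Loading the Java main class

17. Triggering class loading

18. An introduction to class files

19. File streams

20. Parsing the class file

21. Constant pool parsing (1)

22. Constant pool parsing (2)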

The author maintains a personal blog at classloading.com.


Original article: https://www.cnblogs.com/mazhimazhi/p/13409707.html