How the Linker Handles Library Files

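I. Linker inputs
The linker really has only two kinds of input: object files and library files (static libraries, usually ending in .a, and shared libraries, ending in .so). There are also linker scripts, response files and the like, but we'll leave those out here. In fact a static library is itself just a set of object files packed together in some form. It's like a WinRAR archive on Windows or a tar file on Linux: the point is to pack a directory tree, which is an OS-specific structure, into a single file that is easy to transfer, since many tools, such as FTP and most IM software, can't transfer directories. So the linker performs the reverse operation: it turns the library back into a collection of object files.
To let the linker find out quickly what a library contains, the library uses its own format to organize symbols, so the linker can decide as fast as possible whether any member of the library is needed. The library therefore starts with a list of all the symbols it contains (more precisely, the defined symbols). When the linker reads a library, it reads this list first and decides whether any of those symbols are needed. If a currently undefined symbol is defined in the library, the whole object file containing that symbol becomes a linker input. This is similar to how a CPU cache works: when a single byte is needed and is not yet cached, the whole cache line containing it is read from memory into the CPU cache.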
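To make the idea of the symbol list at the start of a library concrete, here is a minimal sketch in C. The names toy_armap_entry, toy_armap_lookup and toy_libfoo_map are made up for illustration and are not BFD APIs; a real archive map stores a file offset into the archive rather than a member index, and the real lookup is not necessarily a linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified archive symbol index entry: one defined
   symbol name and the index of the member (object file) defining it.  */
struct toy_armap_entry
{
  const char *name;
  int member;
};

/* Return the member that defines NAME, or -1 if the archive does not
   define it.  Note that the answer is a whole member: needing one
   symbol pulls in every symbol of that object file.  */
int
toy_armap_lookup (const struct toy_armap_entry *map, size_t count,
                  const char *name)
{
  size_t i;

  assert (map != NULL || count == 0);
  for (i = 0; i < count; i++)
    if (strcmp (map[i].name, name) == 0)
      return map[i].member;
  return -1;
}

/* Example index for an imaginary libfoo.a with two members:
   member 0 defines foo_init and foo_run, member 1 defines bar_helper.  */
const struct toy_armap_entry toy_libfoo_map[] = {
  { "foo_init", 0 },
  { "foo_run", 0 },
  { "bar_helper", 1 },
};
```

In this toy model, an undefined reference to foo_run selects member 0, which also brings foo_init along, which is the "whole cache line" effect described above.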
II. What happens when the linker reaches a library file
1. The call chain
(gdb) bt
#0  add_archive_element (info=0x8144d40, abfd=0x81b0098, 
    name=0x81af118 "__libc_csu_fini", subsbfd=0xbfffee6c)
    at ../../binutils-2.21.1/ld/ldmain.c:797
#1  0x080b4f57 in elf_link_add_archive_symbols (abfd=0x818ec10, info=0x8144d40)
    at ../../binutils-2.21.1/bfd/elflink.c:5094
#2  0x080b50b2 in bfd_elf_link_add_symbols (abfd=0x818ec10, info=0x8144d40)
    at ../../binutils-2.21.1/bfd/elflink.c:5153
#3  0x08055660 in load_symbols (entry=0x818d1e8, place=0xbfffef40)
    at ../../binutils-2.21.1/ld/ldlang.c:2754
#4  0x08056080 in open_input_bfds (s=0x818d1e8, mode=OPEN_BFD_FORCE)
    at ../../binutils-2.21.1/ld/ldlang.c:3201
#5  0x08055fbf in open_input_bfds (s=0x814f4c8, mode=OPEN_BFD_NORMAL)
    at ../../binutils-2.21.1/ld/ldlang.c:3166
#6  0x0805af79 in lang_process () at ../../binutils-2.21.1/ld/ldlang.c:6475
#7  0x0805eadd in main (argc=27, argv=0xbffff104)
    at ../../binutils-2.21.1/ld/ldmain.c:462
2. A rough walk through the function
The function elf_link_add_archive_symbols (bfd *abfd, struct bfd_link_info *info) is where the library's symbols are actually processed:
c = bfd_ardata (abfd)->symdef_count;
  if (c == 0)
    return TRUE;
  amt = c;
  amt *= sizeof (bfd_boolean);
  defined = (bfd_boolean *) bfd_zmalloc (amt);
  included = (bfd_boolean *) bfd_zmalloc (amt);
  if (defined == NULL || included == NULL)
    goto error_return;

  symdefs = bfd_ardata (abfd)->symdefs;
  bed = get_elf_backend_data (abfd);
  archive_symbol_lookup = bed->elf_backend_archive_symbol_lookup;

  do
    {
      file_ptr last;
      symindex i;
      carsym *symdef;
      carsym *symdefend;

      loop = FALSE;
      last = -1;

      symdef = symdefs;
      symdefend = symdef + c;
      for (i = 0; symdef < symdefend; symdef++, i++)
    {
      struct elf_link_hash_entry *h;
      bfd *element;
      struct bfd_link_hash_entry *undefs_tail;
      symindex mark;

      if (defined[i] || included[i])
        continue;
      if (symdef->file_offset == last)
        {
          included[i] = TRUE;
          continue;
        }

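      /* archive_symbol_lookup is the backend hook
         elf_backend_archive_symbol_lookup; unless a target overrides it,
         that is _bfd_elf_archive_symbol_lookup.  The function is fairly
         simple: it looks up the archive symbol currently being visited
         among the symbols of the inputs processed so far and returns the
         result.  The only extra work is checking the symbol's version
         information.  */
      h = archive_symbol_lookup (abfd, info, symdef->name);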
      if (h == (struct elf_link_hash_entry *) 0 - 1)
        goto error_return;

      if (h == NULL)
        continue;

      if (h->root.type == bfd_link_hash_common)
        {
          /* We currently have a common symbol.  The archive map contains
         a reference to this symbol, so we may want to include it.  We
         only want to include it however, if this archive element
         contains a definition of the symbol, not just another common
         declaration of it.

         Unfortunately some archivers (including GNU ar) will put
         declarations of common symbols into their archive maps, as
         well as real definitions, so we cannot just go by the archive
         map alone.  Instead we must read in the element's symbol
         table and check that to see what kind of symbol definition
         this is.  */
          if (! elf_link_is_defined_archive_symbol (abfd, symdef))
        continue;
        }
      else if (h->root.type != bfd_link_hash_undefined)
        {
          if (h->root.type != bfd_link_hash_undefweak)
        defined[i] = TRUE;
          continue;
        }
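      /* Getting here means the branch above
         (h->root.type != bfd_link_hash_undefined) was not taken: the
         archive symbol is currently an undefined symbol in the link, or
         a common symbol that this member really defines.  */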
      /* We need to include this archive member.  */
      element = _bfd_get_elt_at_filepos (abfd, symdef->file_offset);
      if (element == NULL)
        goto error_return;

      if (! bfd_check_format (element, bfd_object))
        goto error_return;

      /* Doublecheck that we have not included this object
         already--it should be impossible, but there may be
         something wrong with the archive.  */
      if (element->archive_pass != 0)
        {
          bfd_set_error (bfd_error_bad_value);
          goto error_return;
        }
      element->archive_pass = 1;

      undefs_tail = info->hash->undefs_tail;

      if (!(*info->callbacks
        ->add_archive_element) (info, element, symdef->name, &element))
        /* This callback is add_archive_element in ld/ldmain.c (frame #0 in
           the backtrace above); it prints the map file information.  */
        goto error_return;
      /* An archive member was hit: add the whole object file that
         contains the symbol to the link inputs.  */
      if (!bfd_link_add_symbols (element, info))
        goto error_return;

      /* If there are any new undefined symbols, we need to make
         another pass through the archive in order to see whether
         they can be defined.  FIXME: This isn't perfect, because
         common symbols wind up on undefs_tail and because an
         undefined symbol which is defined later on in this pass
         does not require another pass.  This isn't a bug, but it
         does make the code less efficient than it could be.  */
      if (undefs_tail != info->hash->undefs_tail)
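        loop = TRUE;
      /* Pulling in a new object file added undefined symbols, so the
         archive is scanned again.  In other words, undefined symbols
         coming from a library's own members are searched for iteratively
         within that same library until the set of undefined symbols is
         stable.  */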

      /* Look backward to mark all symbols from this object file
         which we have already seen in this pass.  */
      mark = i;
      do
        {
          included[mark] = TRUE;
          if (mark == 0)
        break;
          --mark;
        }
      while (symdefs[mark].file_offset == symdef->file_offset);

      /* We mark subsequent symbols from this object file as we go
         on through the loop.  */
      last = symdef->file_offset;
    }
    }
  while (loop);
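The rescan logic is easier to see with the BFD details stripped away. Below is a minimal sketch under simplifying assumptions: every member defines exactly one symbol and needs at most one other, and the undefined-symbol list is a small fixed array. The names toy_member, toy_undefs and toy_scan_archive are hypothetical and only mirror the shape of the do/while (loop) structure above, not the real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SYMS 16

/* Hypothetical archive member: defines one symbol and references at
   most one other symbol (NULL if it references nothing).  */
struct toy_member
{
  const char *defines;
  const char *needs;
  bool included;
};

/* Hypothetical list of symbols the link currently wants.  */
struct toy_undefs
{
  const char *names[TOY_MAX_SYMS];
  size_t count;
};

static bool
toy_is_undefined (const struct toy_undefs *u, const char *name)
{
  size_t i;

  for (i = 0; i < u->count; i++)
    if (strcmp (u->names[i], name) == 0)
      return true;
  return false;
}

/* Scan the archive repeatedly until a pass adds no new undefined
   symbol.  Returns the number of members pulled into the link.  */
size_t
toy_scan_archive (struct toy_member *members, size_t n_members,
                  struct toy_undefs *undefs)
{
  size_t pulled = 0;
  bool loop;

  do
    {
      size_t i;

      loop = false;
      for (i = 0; i < n_members; i++)
        {
          struct toy_member *m = &members[i];

          /* Skip members already taken or not wanted by anyone.  */
          if (m->included || !toy_is_undefined (undefs, m->defines))
            continue;

          m->included = true;
          pulled++;

          /* A new undefined symbol forces another pass.  */
          if (m->needs != NULL && !toy_is_undefined (undefs, m->needs))
            {
              assert (undefs->count < TOY_MAX_SYMS);
              undefs->names[undefs->count++] = m->needs;
              loop = true;
            }
        }
    }
  while (loop);

  return pulled;
}
```

Like the real code, this sketch may run one more pass than strictly necessary when the newly needed symbol is defined by a member later in the same pass; that is the inefficiency the FIXME comment describes.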
III. Handling link groups
The relevant case in open_input_bfds:
    case lang_group_statement_enum:
      {
        struct bfd_link_hash_entry *undefs;

        /* We must continually search the entries in the group
           until no new symbols are added to the list of undefined
           symbols.  */

        do
          {
        undefs = link_info.hash->undefs_tail;
        open_input_bfds (s->group_statement.children.head,
                 mode | OPEN_BFD_FORCE);
          }
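        while (undefs != link_info.hash->undefs_tail);
        /* This is also a loop: it keeps traversing the files inside the
           group until the undefined symbols stop growing.  When files are
           linked as a group, the search covers every file in the group and
           can iterate several times, but it still only helps undefined
           symbols that have already appeared by that position on the
           command line.  If the group comes first on the command line,
           before anything has referenced a symbol, it is just as useless.  */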
      }
      break;
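The difference between scanning libraries once in order and scanning them as a group shows up with a circular dependency. The sketch below is a toy model, not ld's implementation: toy_obj, toy_lib, toy_link and toy_scan_libs are hypothetical names, each object defines one symbol and needs at most one, and the per-library scan is a single pass (the rescan inside one library from the previous section is left out to keep it short).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_UNDEFS 16

struct toy_obj { const char *defines; const char *needs; bool included; };
struct toy_lib { struct toy_obj *objs; size_t count; };

/* Hypothetical link state: wanted symbols and number of objects pulled.  */
struct toy_link
{
  const char *undefs[TOY_MAX_UNDEFS];
  size_t n_undefs;
  size_t n_pulled;
};

static bool
toy_wanted (const struct toy_link *link, const char *name)
{
  size_t i;

  for (i = 0; i < link->n_undefs; i++)
    if (strcmp (link->undefs[i], name) == 0)
      return true;
  return false;
}

/* One pass over one library: pull every object that defines a wanted
   symbol and record what that object needs in turn.  */
static void
toy_scan_lib (struct toy_link *link, struct toy_lib *lib)
{
  size_t i;

  for (i = 0; i < lib->count; i++)
    {
      struct toy_obj *o = &lib->objs[i];

      if (o->included || !toy_wanted (link, o->defines))
        continue;
      o->included = true;
      link->n_pulled++;
      if (o->needs != NULL && !toy_wanted (link, o->needs))
        {
          assert (link->n_undefs < TOY_MAX_UNDEFS);
          link->undefs[link->n_undefs++] = o->needs;
        }
    }
}

/* Scan LIBS in command-line order.  With AS_GROUP set, repeat the whole
   sequence until a round adds no new undefined symbol.  */
size_t
toy_scan_libs (struct toy_link *link, struct toy_lib *libs, size_t n_libs,
               bool as_group)
{
  size_t before;

  do
    {
      size_t i;

      before = link->n_undefs;
      for (i = 0; i < n_libs; i++)
        toy_scan_lib (link, &libs[i]);
    }
  while (as_group && before != link->n_undefs);

  return link->n_pulled;
}
```

With a first library holding objects that define a2 and a (where a needs b), and a second library whose only object defines b and needs a2, a plain in-order scan never goes back for a2, while the group scan picks it up in the second round. With an empty undefined list, nothing is pulled either way, which matches the remark about a group placed at the very start of the command line. On the ld command line, a group is written with --start-group and --end-group.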

Original article: https://www.cnblogs.com/tsecer/p/10487370.html