[Repost] The STL Allocator

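The STL allocator works silently inside every container, allocating and releasing memory. The STL standard defines a standard interface for allocators (see 《STL源码剖析》, p. 43), while the implementation details vary from one compiler's library to another. Below we look at the allocator in SGI STL (actually called alloc). SGI STL source download: http://download.csdn.net/detail/wudaijun/6404589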

A simple allocator

First, let's look at a standard-conforming allocator in SGI STL named allocator:

// Located in cygwin-b20\include\g++\defalloc.h
#ifndef DEFALLOC_H
#define DEFALLOC_H

#include <new.h>
#include <stddef.h>
#include <stdlib.h>
#include <limits.h>
#include <iostream.h>
#include <algobase.h>


template <class T>
inline T* allocate(ptrdiff_t size, T*) {
    set_new_handler(0);
    T* tmp = (T*)(::operator new((size_t)(size * sizeof(T))));
    if (tmp == 0) {
        cerr << "out of memory" << endl;
        exit(1);
    }
    return tmp;
}


template <class T>
inline void deallocate(T* buffer) {
    ::operator delete(buffer);
}

template <class T>
class allocator {
public:
    typedef T value_type;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef T& reference;
    typedef const T& const_reference;
    typedef size_t size_type;
    typedef ptrdiff_t difference_type;
    pointer allocate(size_type n) {
        return ::allocate((difference_type)n, (pointer)0);
    }
    void deallocate(pointer p) { ::deallocate(p); }
    pointer address(reference x) { return (pointer)&x; }
    const_pointer const_address(const_reference x) {
        return (const_pointer)&x;
    }
    size_type init_page_size() {
        return max(size_type(1), size_type(4096/sizeof(T)));
    }
    size_type max_size() const {
        return max(size_type(1), size_type(UINT_MAX/sizeof(T)));
    }
};

class allocator<void> {
public:
    typedef void* pointer;
};

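This allocator is very simple: it is only a thin wrapper around ::operator new and ::operator delete. It allocates and frees memory but does not construct objects; construction is handled by the functions in stl_construct.h, in the same directory as defalloc.h. Allocation and construction (and likewise destruction and deallocation) are kept separate for efficiency: this avoids unnecessary constructor calls and leaves room to optimize construction and destruction. When the allocator above is used, allocator::allocate() and allocator::deallocate() allocate and free memory, while ::construct() and ::destroy() (in stl_construct.h) construct and destroy objects. This division of labor makes the whole memory-management scheme more flexible and efficient.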
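The split can be illustrated with a minimal sketch in modern C++. The helper names raw_allocate, construct_in_place, destroy_in_place, raw_deallocate and total_length are hypothetical, not SGI STL functions. The point is only that ::operator new returns raw storage without running any constructor, placement new runs a constructor in storage that already exists, and an explicit destructor call runs the destructor without freeing anything.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Step 1: obtain raw storage for n objects; no constructor runs.
template <class T>
T* raw_allocate(std::size_t n) {
    return static_cast<T*>(::operator new(n * sizeof(T)));
}

// Step 4: release raw storage; no destructor runs.
template <class T>
void raw_deallocate(T* p) {
    ::operator delete(p);
}

// Step 2: build an object in existing storage (placement new).
template <class T, class V>
void construct_in_place(T* p, const V& value) {
    ::new (static_cast<void*>(p)) T(value);
}

// Step 3: run the destructor without freeing the storage.
template <class T>
void destroy_in_place(T* p) {
    assert(p != nullptr);
    p->~T();
}

// Usage: build n copies of value, sum their lengths, then tear down.
inline std::size_t total_length(std::size_t n, const std::string& value) {
    std::string* buf = raw_allocate<std::string>(n);
    for (std::size_t i = 0; i < n; ++i) construct_in_place(buf + i, value);
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += buf[i].size();
    for (std::size_t i = 0; i < n; ++i) destroy_in_place(buf + i);
    raw_deallocate(buf);
    return total;
}
```

Because the four steps are independent, a container such as vector can reserve capacity up front and construct elements only as they are inserted.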

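However, SGI STL does not use the allocator above, because it is inefficient; it is kept only for compatibility with the HP STL style. What SGI STL actually uses is an allocator called alloc, which departs from the STL specification in many respects but performs very well. Combining alloc with construct() and destroy() from stl_construct.h gives SGI STL its distinctive edge.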

Basic construction and destruction tools: construct() and destroy()

Let's start with the relatively simple tools for constructing and destroying objects: construct() and destroy()

// cygwin-b20\include\g++\stl_construct.h
#ifndef __SGI_STL_INTERNAL_CONSTRUCT_H
#define __SGI_STL_INTERNAL_CONSTRUCT_H

#include <new.h>

__STL_BEGIN_NAMESPACE

template <class T> // destroy a single element
inline void destroy(T* pointer) {
    pointer->~T();
}

template <class T1, class T2>
inline void construct(T1* p, const T2& value) {
  new (p) T1(value);    // placement new: construct a T1 object at address p
}

template <class ForwardIterator>
inline void
__destroy_aux(ForwardIterator first, ForwardIterator last, __false_type) {
  for ( ; first != last; ++first) // non-trivial destructor: call it on each element in turn
    destroy(&*first);
}

template <class ForwardIterator> // trivial destructor: do nothing
inline void __destroy_aux(ForwardIterator, ForwardIterator, __true_type) {}

template <class ForwardIterator, class T>
inline void __destroy(ForwardIterator first, ForwardIterator last, T*) {
  // use the element type to decide whether the destructor is trivial, then dispatch to the matching helper
  typedef typename __type_traits<T>::has_trivial_destructor trivial_destructor;
  __destroy_aux(first, last, trivial_destructor());
}

template <class ForwardIterator>
inline void destroy(ForwardIterator first, ForwardIterator last) {
  __destroy(first, last, value_type(first)); // value_type() recovers the element type from the iterator
}

inline void destroy(char*, char*) {}
inline void destroy(wchar_t*, wchar_t*) {}

__STL_END_NAMESPACE

#endif /* __SGI_STL_INTERNAL_CONSTRUCT_H */

construct() simply invokes placement new. I covered placement new, ::operator new, and the new operator, and how they relate and differ, in an earlier blog post.

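destroy() is more ingenious. It has two overloads: one destroys a single element, which only requires calling its destructor; the other takes two iterators, first and last, and destroys every element in the range. In the range case, destroy() does not blindly call the destructor on each element. For efficiency, it first recovers the element type through iterator type deduction (passing value_type(first) to __destroy()), then checks __type_traits<T>::has_trivial_destructor to determine whether the element type's destructor is trivial. If it is, trivial_destructor() is a __true_type object; otherwise it is a __false_type object. Note that __true_type and __false_type are types, not values, so an overload of __destroy_aux() can handle each case separately. For __false_type (the destructor does real work), it dutifully calls each element's destructor in turn; otherwise it does nothing. When a large range is destroyed and the destructor is trivial, this saves a great deal of time.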
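The same dispatch can be sketched in standard C++, with std::is_trivially_destructible standing in for SGI's __type_traits<T>::has_trivial_destructor. Like __true_type and __false_type, its result is a type, so the choice between the two overloads is made at compile time. destroy_range, destroy_range_aux, Counted and destroy_three_counted are illustrative names, and the sketch assumes C++17 (for the inline static member).

```cpp
#include <cassert>
#include <iterator>
#include <memory>
#include <new>
#include <type_traits>

// Non-trivial destructor: run it on every element in [first, last).
template <class ForwardIt>
void destroy_range_aux(ForwardIt first, ForwardIt last, std::false_type) {
    using T = typename std::iterator_traits<ForwardIt>::value_type;
    for (; first != last; ++first) std::addressof(*first)->~T();
}

// Trivial destructor: nothing to do, so no loop is generated at all.
template <class ForwardIt>
void destroy_range_aux(ForwardIt, ForwardIt, std::true_type) {}

template <class ForwardIt>
void destroy_range(ForwardIt first, ForwardIt last) {
    using T = typename std::iterator_traits<ForwardIt>::value_type;
    // is_trivially_destructible<T> derives from either true_type or
    // false_type, so overload resolution picks the helper at compile time.
    destroy_range_aux(first, last, std::is_trivially_destructible<T>{});
}

// Test type whose destructor is non-trivial and counts its calls.
struct Counted {
    inline static int destroyed = 0;
    ~Counted() { ++destroyed; }
};

// Usage: build three Counted objects in raw storage, destroy them as a
// range, and report how many destructors actually ran.
inline int destroy_three_counted() {
    Counted::destroyed = 0;
    alignas(Counted) unsigned char storage[3 * sizeof(Counted)];
    Counted* first = reinterpret_cast<Counted*>(storage);
    for (int i = 0; i < 3; ++i) ::new (static_cast<void*>(first + i)) Counted();
    destroy_range(first, first + 3);
    return Counted::destroyed;
}
```

Since C++17 the standard library offers std::destroy(first, last) in <memory> for this job.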

Allocating and freeing memory: std::alloc

Next, let's see how std::alloc outperforms the allocator above and makes it obsolete.

In short, alloc improves on allocator in the following ways:

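1. It speeds up allocation with a memory pool.

2. It deals with the memory fragmentation that frequent small allocations can cause.

3. It handles out-of-memory situations.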

Two-level allocator:

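alloc uses a two-level design. The first-level allocator obtains and releases memory directly from the heap (via malloc), with efficiency comparable to the allocator above. The second-level allocator uses a memory pool to optimize small-block requests: large requests are handed to the first-level allocator, while small requests are served from the memory pool. The pool manages small blocks through free lists, and when it runs low it requests a large chunk from the heap in one go. A macro controls which level is used (the second-level allocator is the default).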

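The allocator above allocates and frees memory with ::operator new() and ::operator delete(), which are themselves typically implemented on top of C's malloc and free; alloc calls malloc() and free() directly. Requests larger than 128 bytes are served by malloc and free (first-level allocator); requests of 128 bytes or less are managed by the memory pool (second-level allocator). Defining the __USE_MALLOC macro disables the second-level allocator, so that every request, whatever its size, is allocated from the heap.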

Two figures from 《STL源码剖析》 (The Annotated STL Sources) give an overview of the two levels:

First-level allocator:

// cygwin-b20\include\g++\stl_alloc.h

template <int inst>
class __malloc_alloc_template {

private:

static void *oom_malloc(size_t); // out-of-memory routine for malloc

static void *oom_realloc(void *, size_t); // out-of-memory routine for realloc

#ifndef __STL_STATIC_TEMPLATE_MEMBER_BUG
    static void (* __malloc_alloc_oom_handler)(); // function pointer holding the user-defined out-of-memory handler
#endif

public:

static void * allocate(size_t n)
{
    void *result = malloc(n);
    if (0 == result) result = oom_malloc(n); // out of memory: call the handling routine
    return result;
}

static void deallocate(void *p, size_t /* n */)
{
    free(p);
}

static void * reallocate(void *p, size_t /* old_sz */, size_t new_sz)
{
    void * result = realloc(p, new_sz);
    if (0 == result) result = oom_realloc(p, new_sz);
    return result;
}

// install the out-of-memory handler for malloc; returns the previous handler
static void (* set_malloc_handler(void (*f)()))()
{
    void (* old)() = __malloc_alloc_oom_handler;
    __malloc_alloc_oom_handler = f;
    return(old);
}

};

// malloc_alloc out-of-memory handling

#ifndef __STL_STATIC_TEMPLATE_MEMBER_BUG
template <int inst>
void (* __malloc_alloc_template<inst>::__malloc_alloc_oom_handler)() = 0;
#endif

template <int inst>
void * __malloc_alloc_template<inst>::oom_malloc(size_t n)
{
    void (* my_malloc_handler)();
    void *result;

    for (;;) { // repeatedly call the user-defined out-of-memory handler (installed via set_malloc_handler)
        my_malloc_handler = __malloc_alloc_oom_handler;
        if (0 == my_malloc_handler) { __THROW_BAD_ALLOC; }
        (*my_malloc_handler)();
        result = malloc(n); // try to allocate again
        if (result) return(result);
    }
}

template <int inst> // same idea as oom_malloc
void * __malloc_alloc_template<inst>::oom_realloc(void *p, size_t n)
{
    void (* my_malloc_handler)();
    void *result;

    for (;;) {
        my_malloc_handler = __malloc_alloc_oom_handler;
        if (0 == my_malloc_handler) { __THROW_BAD_ALLOC; }
        (*my_malloc_handler)();
        result = realloc(p, n);
        if (result) return(result);
    }
}

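The first-level allocator allocates memory directly with malloc and adds an out-of-memory handling routine on top. When malloc fails inside allocate(), it calls oom_malloc(), which repeatedly calls the user-defined handler (stored in __malloc_alloc_oom_handler and installed with set_malloc_handler()) and retries the allocation. If no handler has been installed, it throws an exception (via the __THROW_BAD_ALLOC macro). Designing and installing the out-of-memory handler is entirely the user's responsibility.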
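The retry loop in oom_malloc() can be sketched as follows. To make it testable, the allocation primitive and the handler are passed in as function pointers; alloc_with_retry, AllocFn and HandlerFn are hypothetical names. Unlike SGI, which re-reads __malloc_alloc_oom_handler on every iteration, this sketch fixes the handler for the duration of the call, and it reports a missing handler with std::bad_alloc.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

using AllocFn = void* (*)(std::size_t);   // e.g. a thin wrapper around malloc
using HandlerFn = void (*)();             // user-supplied out-of-memory handler

// Try to allocate n bytes; on failure, call the handler and retry until
// an allocation succeeds or there is no handler left to call.
inline void* alloc_with_retry(std::size_t n, AllocFn try_alloc, HandlerFn oom_handler) {
    assert(try_alloc != nullptr);
    void* result = try_alloc(n);
    while (result == nullptr) {
        if (oom_handler == nullptr) throw std::bad_alloc();  // nobody can help
        oom_handler();              // expected to free some memory
        result = try_alloc(n);      // then try again
    }
    return result;
}
```

The loop only makes progress if the handler actually frees memory (or throws); a handler that does nothing would spin forever, which is why designing it is the user's job.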

Second-level allocator:

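The second-level allocator uses a memory pool and free lists (free_list) to make requests of 128 bytes or less faster and less prone to fragmentation. It obtains a large chunk of memory at a time and hands it to the free lists to maintain; each user request is taken from a list and returned to the list when freed. SGI STL rounds every request of up to 128 bytes up to a multiple of 8 and maintains 16 free lists, responsible for blocks of 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, and 128 bytes respectively. For example, a 10-byte request is rounded up to 16 bytes and served by taking one node (that is, one block) from the free list for 16-byte blocks; if that list is currently empty, enough memory is taken from the memory pool to refill it.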
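The rounding and the list lookup reduce to two one-line functions, mirroring ROUND_UP and FREELIST_INDEX in the source below; the names round_up and freelist_index and the k-prefixed constants belong to this sketch only. The bit trick in round_up works only because 8 is a power of two.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kAlign = 8;                          // __ALIGN
constexpr std::size_t kMaxBytes = 128;                     // __MAX_BYTES
constexpr std::size_t kNumFreeLists = kMaxBytes / kAlign;  // __NFREELISTS == 16

// Round bytes up to a multiple of 8: add 7, then clear the low three bits.
constexpr std::size_t round_up(std::size_t bytes) {
    return (bytes + kAlign - 1) & ~(kAlign - 1);
}

// Free list serving a request of 1..128 bytes:
// 1..8 -> 0, 9..16 -> 1, ..., 121..128 -> 15.
constexpr std::size_t freelist_index(std::size_t bytes) {
    return (bytes + kAlign - 1) / kAlign - 1;
}

static_assert(round_up(10) == 16, "a 10-byte request uses a 16-byte block");
static_assert(freelist_index(10) == 1, "16-byte blocks live in list 1");
static_assert(freelist_index(128) == kNumFreeLists - 1, "128 bytes maps to the last list");
```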

The free-list node is defined as follows:

union obj
{
    union obj* free_list_link; // maintains free memory: points to the next free node
    char client_data[1];       // used by the client
};

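Note that the node is a union. While a node is free (not yet allocated), its first field points to the next free node; once it is allocated, the client uses the memory directly through the second field. As a result, the free list wastes no memory on the free_list_link pointer (after a node is handed out, the pointer no longer has any meaning). See the figure below.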
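A free list of this kind can be sketched as follows, assuming a single thread and blocks that are at least sizeof(Obj) bytes and suitably aligned; Obj, FreeList and carve are illustrative names. Pushing and popping only rewrite the first pointer-sized bytes of a free block, so the list costs no memory beyond the blocks themselves.

```cpp
#include <cassert>
#include <cstddef>

// Same idea as SGI's union obj: link while free, payload while in use.
union Obj {
    Obj* free_list_link;   // meaningful only while the block is on the list
    char client_data[1];   // what the caller sees after allocation
};

struct FreeList {
    Obj* head = nullptr;

    // Return a block to the list: O(1) push at the front.
    void push(void* p) {
        Obj* q = static_cast<Obj*>(p);
        q->free_list_link = head;
        head = q;
    }

    // Take the front block, or nullptr if the list is empty.
    void* pop() {
        Obj* result = head;
        if (result != nullptr) head = result->free_list_link;
        return result;
    }
};

// Cut `count` blocks of `block_size` bytes out of `chunk` and link them in
// address order, the way refill() builds a list from a fresh chunk.
inline void carve(FreeList& list, char* chunk, std::size_t block_size, int count) {
    assert(block_size >= sizeof(Obj));
    for (int i = count - 1; i >= 0; --i) list.push(chunk + i * block_size);
}
```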

The main code is shown below:

template <bool threads, int inst>
class __default_alloc_template {

private:
  // Really we should use static const int x = N
  // instead of enum { x = N }, but few compilers accept the former.

    enum {__ALIGN = 8};
    enum {__MAX_BYTES = 128};
    enum {__NFREELISTS = __MAX_BYTES/__ALIGN};//16

  // round up to a multiple of 8
  static size_t ROUND_UP(size_t bytes) {
        return (((bytes) + __ALIGN-1) & ~(__ALIGN - 1));
  }
__PRIVATE:
  union obj {
        union obj * free_list_link;
        char client_data[1];    /* The client sees this.        */
  };
private:

  // array of free lists, each managing nodes of one size
  static obj * __VOLATILE free_list[__NFREELISTS]; 

  // find the free_list that serves a request of this many bytes
  static  size_t FREELIST_INDEX(size_t bytes) {
        return (((bytes) + __ALIGN-1)/__ALIGN - 1);
  }

  // Returns an object of size n, and optionally adds to size n free list.
  static void *refill(size_t n);
  // Allocates a chunk for nobjs of size "size".  nobjs may be reduced
  // if it is inconvenient to allocate the requested number.
  static char *chunk_alloc(size_t size, int &nobjs);

  // Chunk allocation state.
  static char *start_free; // start of the memory pool; changes only in chunk_alloc()
  static char *end_free;   // end of the memory pool; changes only in chunk_alloc()
  static size_t heap_size;


public:

  /* n must be > 0      */
  static void * allocate(size_t n)
  {
    obj * __VOLATILE * my_free_list;
    obj * __RESTRICT result;

    // more than 128 bytes: use the first-level allocator
    if (n > (size_t) __MAX_BYTES) {
        return(malloc_alloc::allocate(n));
    }
    my_free_list = free_list + FREELIST_INDEX(n); // find the matching free_list
   
    result = *my_free_list; // take the node at the front
    if (result == 0) { // no nodes left
        void *r = refill(ROUND_UP(n)); // refill the free_list
        return r;
    }
    // unlink the node from the free_list
    *my_free_list = result -> free_list_link;
    return (result);
  };

  /* p may not be 0 */
  static void deallocate(void *p, size_t n)
  {
    obj *q = (obj *)p;
    obj * __VOLATILE * my_free_list;

    // more than 128 bytes: release through the first-level allocator
    if (n > (size_t) __MAX_BYTES) {
        malloc_alloc::deallocate(p, n);
        return;
    }
    // find the free_list this node belongs to
    my_free_list = free_list + FREELIST_INDEX(n);

    // push the node back onto the front of the free_list
    q -> free_list_link = *my_free_list;
    *my_free_list = q;
   
  }
  static void * reallocate(void *p, size_t old_sz, size_t new_sz);

} ;

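The above is part of the source, with the multithreading code removed. The main functions are allocate() and deallocate() (the STL standard requires an allocator to provide allocate() and deallocate()). allocate() first checks the requested size: if it exceeds 128 bytes, the request goes to the first-level allocator. Otherwise it finds the free_list for that size, takes the first node, and returns it. If that free_list is empty, refill() allocates enough nodes from the memory pool, fills the free_list, and returns one to the client. The code for refill() is as follows: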

template <bool threads, int inst>
void* __default_alloc_template<threads, inst>::refill(size_t n)
{
    int nobjs = 20;
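    // try to get nobjs blocks of n bytes from the memory pool; if the pool runs short,
    // the number actually obtained is reported back in nobjs (note: nobjs is passed by reference)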
    char * chunk = chunk_alloc(n, nobjs);
    obj * __VOLATILE * my_free_list;
    obj * result;
    obj * current_obj, * next_obj;
    int i;

    // if the pool returned only one block, return it directly; the free_list needs no update
    if (1 == nobjs) return(chunk);
    my_free_list = free_list + FREELIST_INDEX(n);

    // return the first block to the client and link the rest into the free_list
    /* Build free list in chunk */
      result = (obj *)chunk;
      *my_free_list = next_obj = (obj *)(chunk + n);
      for (i = 1; ; i++) {
        current_obj = next_obj;
        next_obj = (obj *)((char *)next_obj + n);
        if (nobjs - 1 == i) {
            current_obj -> free_list_link = 0;
            break;
        } else {
            current_obj -> free_list_link = next_obj;
        }
      }
    return(result);
}

As you can see, the actual work of dealing with the memory pool is done by chunk_alloc():

template <bool threads, int inst>
char*
__default_alloc_template<threads, inst>::chunk_alloc(size_t size, int& nobjs)
{
    char * result;
    size_t total_bytes = size * nobjs;
    size_t bytes_left = end_free - start_free; // space remaining in the memory pool

    // enough space left: return the start address and advance start_free
    if (bytes_left >= total_bytes) {
        result = start_free;
        start_free += total_bytes;
        return(result);
    } // not enough for all nobjs blocks but enough for at least one: return as many as fit
    else if (bytes_left >= size) {
        nobjs = bytes_left/size;
        total_bytes = size * nobjs;
        result = start_free;
        start_free += total_bytes;
        return(result);
    } else { // not enough for even one block
        size_t bytes_to_get = 2 * total_bytes + ROUND_UP(heap_size >> 4);
        // Try to make use of the left-over piece.
        if (bytes_left > 0) { // first put the small leftover into the matching free_list
            obj * __VOLATILE * my_free_list =
                        free_list + FREELIST_INDEX(bytes_left);

            ((obj *)start_free) -> free_list_link = *my_free_list;
            *my_free_list = (obj *)start_free;
        }
        
        // replenish the memory pool from the heap
        start_free = (char *)malloc(bytes_to_get);
        if (0 == start_free) { // the heap is out of memory too
            int i;
            obj * __VOLATILE * my_free_list, *p;
            // Try to make do with what we have.  That can't
            // hurt.  We do not try smaller requests, since that tends
            // to result in disaster on multi-process machines.
            // look for a free node in a free_list for blocks larger than requested,
            // move that node into the memory pool, and call ourselves recursively
            for (i = size; i <= __MAX_BYTES; i += __ALIGN) {
                my_free_list = free_list + FREELIST_INDEX(i);
                p = *my_free_list;
                if (0 != p) {
                    *my_free_list = p -> free_list_link;
                    start_free = (char *)p;
                    end_free = start_free + i;
                    return(chunk_alloc(size, nobjs));
                    // Any leftover piece will eventually make it to the
                    // right free list.
                }
            }
            end_free = 0;    // In case of exception.
            // try the first-level allocator (which can invoke the out-of-memory handler)
            start_free = (char *)malloc_alloc::allocate(bytes_to_get);
            // This should either throw an
            // exception or remedy the situation.  Thus we assume it
            // succeeded.
        }
        // update the pool's size bookkeeping
        heap_size += bytes_to_get;
        end_free = start_free + bytes_to_get;
        // call ourselves recursively, which also fixes up nobjs
        return(chunk_alloc(size, nobjs));
    }
}
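Putting allocate(), deallocate(), refill() and chunk_alloc() together, the second-level allocator can be condensed into one small class. This is a single-threaded sketch with simplifications, not the SGI code: MiniPool is a hypothetical name, pool memory is never returned to the system, and when malloc() fails it throws std::bad_alloc instead of searching larger free lists and falling back to the first-level allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

class MiniPool {
public:
    // n must be > 0. Requests over 128 bytes go straight to malloc (first level).
    void* allocate(std::size_t n) {
        assert(n > 0);
        if (n > kMaxBytes) {
            void* p = std::malloc(n);
            if (p == nullptr) throw std::bad_alloc();
            return p;
        }
        Obj*& head = free_list_[index(n)];
        if (head == nullptr) return refill(round_up(n));   // list empty: refill it
        Obj* result = head;                                 // pop the front node
        head = result->next;
        return result;
    }

    // n must be the size that was passed to allocate().
    void deallocate(void* p, std::size_t n) {
        if (n > kMaxBytes) { std::free(p); return; }
        Obj* q = static_cast<Obj*>(p);                      // push onto its list
        Obj*& head = free_list_[index(n)];
        q->next = head;
        head = q;
    }

private:
    union Obj { Obj* next; char client_data[1]; };
    static constexpr std::size_t kAlign = 8;
    static constexpr std::size_t kMaxBytes = 128;

    static std::size_t round_up(std::size_t b) { return (b + kAlign - 1) & ~(kAlign - 1); }
    static std::size_t index(std::size_t b) { return (b + kAlign - 1) / kAlign - 1; }

    // Get up to 20 blocks of n bytes; return the first, link the rest.
    void* refill(std::size_t n) {
        int nobjs = 20;
        char* chunk = chunk_alloc(n, nobjs);
        Obj*& head = free_list_[index(n)];
        for (int i = nobjs - 1; i >= 1; --i) {              // push back to front so the
            Obj* node = reinterpret_cast<Obj*>(chunk + i * n);  // list is in address order
            node->next = head;
            head = node;
        }
        return chunk;
    }

    // Carve size * nobjs bytes from the pool, shrinking nobjs if needed.
    char* chunk_alloc(std::size_t size, int& nobjs) {
        std::size_t total = size * nobjs;
        std::size_t left = static_cast<std::size_t>(end_free_ - start_free_);
        if (left >= size) {                                 // room for at least one block
            if (left < total) nobjs = static_cast<int>(left / size);
            char* result = start_free_;
            start_free_ += size * nobjs;
            return result;
        }
        if (left > 0) {                                     // park the leftover piece
            Obj* q = reinterpret_cast<Obj*>(start_free_);
            Obj*& head = free_list_[index(left)];
            q->next = head;
            head = q;
        }
        std::size_t bytes_to_get = 2 * total + round_up(heap_size_ >> 4);
        start_free_ = static_cast<char*>(std::malloc(bytes_to_get));
        if (start_free_ == nullptr) { end_free_ = nullptr; throw std::bad_alloc(); }
        heap_size_ += bytes_to_get;
        end_free_ = start_free_ + bytes_to_get;
        return chunk_alloc(size, nobjs);                    // now the first branch succeeds
    }

    Obj* free_list_[kMaxBytes / kAlign] = {};               // 16 lists: 8, 16, ..., 128 bytes
    char* start_free_ = nullptr;
    char* end_free_ = nullptr;
    std::size_t heap_size_ = 0;
};
```

As in SGI STL, deallocate() must be given the same size that was passed to allocate(), because the size is the only way to find the right free list.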

Finally, a quick look at the code for reallocate():

template <bool threads, int inst>
void*
__default_alloc_template<threads, inst>::reallocate(void *p,
                                                    size_t old_sz,
                                                    size_t new_sz)
{
    void * result;
    size_t copy_sz;
    
    // if both the old and new sizes exceed 128 bytes, hand off to realloc() (first-level path)
    if (old_sz > (size_t) __MAX_BYTES && new_sz > (size_t) __MAX_BYTES) {
        return(realloc(p, new_sz));
    }
    // otherwise stay in the second level: keep p if the rounded size is unchanged, else allocate, copy, and free
    if (ROUND_UP(old_sz) == ROUND_UP(new_sz)) return(p);
    result = allocate(new_sz);
    copy_sz = new_sz > old_sz? old_sz : new_sz;
    memcpy(result, p, copy_sz);
    deallocate(p, old_sz);
    return(result);
}

That completes the second-level allocator. Now let's trace the flow, starting from a client request:

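When the client first requests 30 bytes, the size is rounded up to 32 bytes. allocate() finds free_list[3], which manages 32-byte blocks, sees that it is empty, and so refill() calls chunk_alloc(32, 20) to take memory from the pool. The pool is empty too, so chunk_alloc() calls malloc() for 2 * 20 * 32 = 1280 bytes, i.e. 40 blocks of 32 bytes: 1 goes to the client, 19 are put into free_list[3], and the remaining 20 * 32 = 640 bytes stay in the pool. When the user then requests 60 bytes, the size is rounded up to 64 bytes; free_list[7] is also empty, so chunk_alloc(64, 20) is called. The pool now holds only 640 bytes, enough for 10 blocks of 64 bytes, so it returns those 10 blocks; in refill(), 1 goes to the client and the other 9 go to free_list[7]. If the user then requests 90 bytes, chunk_alloc(96, 20) is called, and this time the pool is empty. chunk_alloc() calls malloc() for 2 * 20 * 96 + ROUND_UP(heap_size >> 4) bytes; the extra term grows with heap_size, the total already obtained from the heap, so the more times the pool has called malloc(), the larger it gets (here heap_size is 1280, so the extra is 80 bytes and the request is 3920 bytes). 1 block goes to the client, 19 go to free_list[11], and the remaining 20 * 96 + 80 = 2000 bytes stay in the pool, and so on.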
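The arithmetic of this walkthrough can be checked with a bookkeeping-only model of chunk_alloc() that tracks bytes_left (end_free - start_free) and heap_size but allocates nothing. PoolModel and Grant are hypothetical names, the sizes passed in are assumed to be already rounded up, and the malloc-failure path is left out. Replaying the three requests gives 20 blocks after a 1280-byte malloc (640 bytes left), then 10 blocks with no malloc (pool empty), then 20 blocks after a 3920-byte malloc (2000 bytes left).

```cpp
#include <cassert>
#include <cstddef>

// Result of one simulated chunk_alloc() call.
struct Grant {
    int nobjs;              // blocks handed back to refill()
    std::size_t malloced;   // bytes requested from malloc() (0 if none)
};

// Bookkeeping-only model of the memory pool: no real memory is touched.
struct PoolModel {
    std::size_t bytes_left = 0;   // end_free - start_free
    std::size_t heap_size = 0;    // total bytes ever obtained from malloc()

    static std::size_t round_up(std::size_t b) { return (b + 7) & ~std::size_t(7); }

    // size must be a positive multiple of 8 (already rounded up).
    Grant chunk_alloc(std::size_t size, int nobjs) {
        assert(size > 0 && size % 8 == 0);
        std::size_t total = size * nobjs;
        if (bytes_left >= total) {               // enough for all nobjs blocks
            bytes_left -= total;
            return {nobjs, 0};
        }
        if (bytes_left >= size) {                // enough for some: hand out what fits
            int n = static_cast<int>(bytes_left / size);
            bytes_left -= size * n;
            return {n, 0};
        }
        // Not even one block: any leftover would go to a smaller free list,
        // then the pool is refilled from the heap.
        std::size_t bytes_to_get = 2 * total + round_up(heap_size >> 4);
        heap_size += bytes_to_get;
        bytes_left = bytes_to_get;
        Grant g = chunk_alloc(size, nobjs);      // retry: now the first branch applies
        g.malloced = bytes_to_get;
        return g;
    }
};
```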

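If the malloc() in chunk_alloc() fails, meaning the system heap is exhausted as well, the allocator first looks in the free lists for a free block larger than the requested size. If it finds one, it moves that block into the memory pool and calls itself recursively. If none is found, it finally tries the first-level allocator, in the hope that the out-of-memory handler can do something; failing that, a bad_alloc exception is thrown.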

Reposted from: http://blog.csdn.net/wudaijun/article/details/12748471