Golang 中如何优雅的使用map？

Golang中，通过哈希查找实现hash，通过链表解决hash冲突

map的内存模型

type hmap struct {
    count     int // map 中的元素个数，必须放在 struct 的第一个位置，因为 内置的 len 函数会从这里读取
    flags     uint8
    B         uint8  // log_2 of # of buckets (最多可以放 loadFactor * 2^B 个元素，再多就要 hashGrow 了)
    noverflow uint16 // overflow 的 bucket 的近似数
    hash0     uint32 // hash 种子，为hash函数结果引入随机性

    buckets    unsafe.Pointer // 2^B 大小的数组，如果 count == 0 的话，可能是 nil
    oldbuckets unsafe.Pointer // 一半大小的之前的 bucket 数组，只有在扩容过程中是非 nil
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra           // 当 key 和 value 都可以 inline 的时候，就会用这个字段
}

map中更小的单元桶，每一个桶会装8个key，通过hash结果的高8位决定在桶里具体的位置，由hash结果的低B位决定落在哪个桶

bmap内存结构

// bucket 本体
type bmap struct {
    // tophash 是 hash 值的高 8 位
    tophash [bucketCnt]uint8
    // keys
    // values
    // overflow pointer
}

bmap是存具体key-value的地方，进一步观察bmap底层，将key，value分开存储可以避免在key、value不是同一种结构时出现的内存碎片，比如map[int64]int8,在每一个key-value后面都要padding7个字节，分开存储就只需要在最后加padding

当一个bmap满了以后，会通过overflow连接一个溢出桶，实际会将每一个bmap中的overflow统一迁移到hmap的extra中，主要是为了避免gc时扫描整个hmap，所以不在单独的桶里设置指针，而是直接让其指向hmap.extra.overflow数组

mapextra底层结构

type mapextra struct {
    // 如果 key 和 value 都不包含指针，并且可以被 inline(<=128 字节)
    // 使用 extra 来存储 overflow bucket，这样可以避免 GC 扫描整个 map
    // 然而 bmap.overflow 也是个指针。这时候我们只能把这些 overflow 的指针
    // 都放在 hmap.extra.overflow 和 hmap.extra.oldoverflow 中了
    // overflow 包含的是 hmap.buckets 的 overflow 的 bucket
    // oldoverflow 包含扩容时的 hmap.oldbuckets 的 overflow 的 bucket
    overflow    *[]*bmap
    oldoverflow *[]*bmap
    // 指向空闲的 overflow bucket 的指针
    nextOverflow *bmap
}

map的初始化

hashTable := make(map[k]v, hint),当hint >= 4时后面会追加2^(hint-4)个桶，之后再内存页帧对齐又追加了若干个桶

// make(map[k]v, hint), hint即预分配大小
// 不传hint时，如用new创建个预设容量为0的map时，makemap只初始化hmap结构，不分配hash数组
func makemap(t *maptype, hint int, h *hmap) *hmap {
    // 省略部分代码
    // 随机hash种子
    h.hash0 = fastrand()

    // 2^h.B 为大于hint*6.5(扩容因子)的最小的2的幂
    B := uint8(0)
    // overLoadFactor(hint, B)只有一行代码：return hint > bucketCnt && uintptr(hint) > loadFactorNum*(bucketShift(B)/loadFactorDen)
    // 即B的大小应满足 hint <= (2^B) * 6.5
    // 一个桶能存8对key-value，所以这就表示B的初始值是保证这个map不需要扩容即可存下hint个元素对的最小的B值
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B

    // 这里分配hash数组
    if h.B != 0 {
        var nextOverflow *bmap
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        // makeBucketArray()会在hash数组后面预分配一些溢出桶，
        // h.extra.nextOverflow用来保存上述溢出桶的首地址
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}

分配hash数组

// 分配hash数组
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
    base := bucketShift(b) // base代表用户预期的桶的数量，即hash数组的真实大小
    nbuckets := base // nbuckets表示实际分配的桶的数量，>= base，这就可能会追加一些溢出桶作为溢出的预留
    if b >= 4 {
        // 这里追加一定数量的桶，并做内存对齐
        nbuckets += bucketShift(b - 4)
        sz := t.bucket.size * nbuckets
        up := roundupsize(sz)
        if up != sz {
            nbuckets = up / t.bucket.size
        }
    }

    // 后面的代码就是申请内存空间了，此处省略
    // 这里大家可以思考下这个数组空间要怎么分配，其实就是n*sizeof(桶)，所以：
        // 每个桶前面是8字节的tophash数组，然后是8个key，再是8个value，最后放一个溢出指针
        // sizeof(桶) = 8 + 8*sizeof(key) + 8*sizeof(value) + 8
    
    return buckets, nextOverflow
}

map的读写

value, ok := map[key] #可以通过ok判断map中是否存在这个key

valuePtr := map[key] #返回value对应类型的空值，对于key和value在map中存储的是否为指针，需要根据key，value的大小来判断，当大于128字节时，key和value存储的都为指针

将map作为函数参数时，会传递一个指针副本，对该副本操作，也同样会影响原始map对象中的值

对于访问key，通过key低B位确定哪个桶，通过桶高8位去与桶里每个tophash比较，只查找时，比较未找到就继续去overflow溢出桶里找，如果是插入没有找到，就插入第一个hashtop为empty的位置

map的key可以是除了slice，map，func 的任意类型，value可以是任意类型。

map并不是一个线程安全的数据结构，在边遍历边删除是一个同时读写的行为，被检测到会直接报panic

map的删除

通过delete(map, key)函数删除key，相应的hashtop被置为empty，并没有立即释放内存，在指针没有引用时会被系统gc

map扩容，扩容的两个临界点

1.由装载因子确定，默认6.5 、即元素个数 > = 桶个数 * 6.5，表示大部分桶已满（B+1，将hmap的bucket数组扩容一倍）

2.由overflow的桶个数决定，当overflow溢出桶太多，代表可能是一边插入一边删除，导致大量桶出现空洞，此时值存储的非常稀疏，当bucket总数<2^15 时，overflow的bucket总数>=bucket总数;

bucket总数>=2^15,overflow的bucket >= 2^15，即认为溢出桶太多（移动bucket内容，使其更加紧密进而提高bucket利用率）缩容

当map中拥有大量hash冲突，也可能会导致overflow溢出桶数量过多，但这只是一种理论可能，golang中map中的hash随机种子几乎可以避免这种情况。

扩容过程：

由hmap内存结构知道了，buckets指向新的内存地址，oldbuckets仍然指向老的内存地址，golang中map的扩容是一种渐进式扩容，原有key/value不会一次性全部迁移完毕，每次只会迁移2个buckets，每一次搬迁以buckets作为单元，包括hash桶和这个桶的溢出链表。

对于2的缩容并不是真正的缩容，hash的容量没有发生变化，只是让数据更加紧凑，如果要做到真正的缩容就需要重新创建一个map，再复制。

参考：

1.https://draveness.me/golang/docs/part2-foundation/ch03-datastructure/golang-hashmap/

2.https://segmentfault.com/a/1190000023879178

3.https://www.cnblogs.com/JoZSM/p/11784037.html

4.https://github.com/cch123/golang-notes/blob/master/map.md