Implementing a fast priority queue with a d-ary heap (C#)

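Introduction to the d-ary heap:

A d-ary heap is a generalization of the binary heap (d = 2): every non-leaf node in a d-ary heap has at most d children.

A d-ary heap has the following properties:

Like a complete binary tree, every level except the last is completely filled, and nodes are added from left to right.
Like a binary heap, it comes in two kinds: max heap and min heap.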

Here is an example of a 3-ary heap:

3-ary max heap - root node is maximum of all nodes
              10
         /    |    \
        7     9     8
      / | \  /
     4  6  5 7


3-ary min heap - root node is minimum of all nodes
              10
         /    |    \
       12     11    13
      / | \
    14  15  18

The height of a complete d-ary tree with n nodes is approximately log_dn.

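Applications of the d-ary heap:

d-ary heaps are often used to build priority queues. A priority queue built on a d-ary heap adds new elements more efficiently than one built on a binary heap: O(log₂n) for the binary heap versus O(log_dn) for the d-ary heap, and log_dn < log₂n when d > 2. The drawback is that extracting the first element costs more than with a binary heap: O(log₂n) for the binary heap versus O((d-1)log_dn) for the d-ary heap, and (d-1)log_dn > log₂n when d > 2, which can be shown with the change-of-base formula.

The result looks like a mixed bag, so when is a d-ary heap a particularly good fit? One answer is the pathfinding algorithms common in games, for example A* and Dijkstra's algorithm. Both usually need a priority queue (some A* variants, such as iterative deepening A*, do not use one), and each time they take the first element off the queue they tend to add several neighboring nodes to it. In other words, insertions far outnumber extractions, which plays to the strength of the d-ary heap. In addition, a d-ary heap is more cache-friendly than a binary heap because more sibling nodes sit next to each other in memory, so it often runs faster in practice.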
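The two inequalities above follow from the change-of-base formula. The short derivation below spells this out, counting d - 1 comparisons per level for extraction as the text does; the numeric example (d = 4, n = 2^20) is only an illustration.

```latex
\begin{align*}
\log_d n &= \frac{\log_2 n}{\log_2 d} < \log_2 n
  && \text{since } \log_2 d > 1 \text{ for } d > 2, \\
(d-1)\log_d n &= \frac{d-1}{\log_2 d}\,\log_2 n > \log_2 n
  && \text{since } d - 1 > \log_2 d \text{ for } d > 2. \\
\intertext{Example with $d = 4$ and $n = 2^{20}$:}
\log_2 n &= 20, \qquad \log_4 n = 10, \qquad (4-1)\log_4 n = 30.
\end{align*}
```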

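Implementing the d-ary heap and the priority queue:

We implement the d-ary heap with an array. With the array starting at index 0, the following rules hold:

If a node is not the root, the index of its parent can be computed from its own index i: the parent is at (i-1)/d (integer division);
If a node has index i, its children are at (d*i)+1, (d*i)+2, ..., (d*i)+d;
If the heap has n nodes, the last non-leaf node is the parent of the last node, at index (n-2)/d. (Note: the implementation in this article does not use this rule.)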
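The index rules above can be sketched as a few one-line helpers. This is a minimal illustration, not part of the article's queue: the class name DaryIndex and its method names are hypothetical, and integer division is assumed throughout.

```csharp
// Hypothetical helper class: index arithmetic for a d-ary heap
// stored in a 0-based array.
public static class DaryIndex
{
    // Parent of node i; only meaningful for i > 0.
    public static int Parent(int i, int d) => (i - 1) / d;

    // k-th child of node i, with k in 1..d.
    public static int Child(int i, int k, int d) => d * i + k;

    // Last non-leaf node of a heap with n >= 2 nodes:
    // the parent of the last node, which has index n - 1.
    public static int LastNonLeaf(int n, int d) => (n - 2) / d;
}
```

For the 3-ary max heap drawn earlier (8 nodes), DaryIndex.Parent(7, 3) is 2, so the last node 7 hangs under node 9 at index 2, and DaryIndex.LastNonLeaf(8, 3) is also 2.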

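Building the d-ary heap: the implementation in this article is aimed at building a priority queue and uses a min heap (convenient for pathfinding algorithms). Heapifying an input array is therefore not a core operation. To keep the code easy to write and read, the build algorithm works top-down from the root, rather than bottom-up from the last non-leaf node. The advantage is clear code: no recursion and no long if/else chains to find the smallest child; whenever a node is smaller than its parent, the two are swapped. The drawback is that more swaps are performed, which lowers efficiency.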

public void BuildHeap()
{
    // Sift each element up in turn, starting from index 1
    for (int i = 1; i < numberOfItems; i++)
    {
        int bubbleIndex = i;
        var node = heap[i];

        while (bubbleIndex != 0)
        {
            int parentIndex = (bubbleIndex - 1) / D;

            if (node.CompareTo(heap[parentIndex]) < 0)
            {
                // Swap the node with its parent and continue from the parent
                heap[bubbleIndex] = heap[parentIndex];
                heap[parentIndex] = node;
                bubbleIndex = parentIndex;
            }
            else
            {
                break;
            }
        }
    }
}
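For comparison, the bottom-up alternative mentioned above (starting from the last non-leaf node and sinking each node) might look like the sketch below. It is not the article's code: it works on a plain int array, and the names BottomUpHeapify, SiftDown, Build and IsMinHeap are hypothetical.

```csharp
using System;

// Hypothetical sketch: bottom-up construction of a d-ary min heap over int[].
public static class BottomUpHeapify
{
    // Sinks the item at index i until none of its children is smaller.
    static void SiftDown(int[] a, int n, int i, int d)
    {
        int item = a[i];
        while (true)
        {
            int first = d * i + 1;              // index of the first child
            if (first >= n) break;              // i is a leaf
            int end = Math.Min(first + d, n);   // one past the last child
            int min = first;
            for (int c = first + 1; c < end; c++)
                if (a[c] < a[min]) min = c;
            if (a[min] >= item) break;          // heap property already holds
            a[i] = a[min];                      // move the child up, keep sinking
            i = min;
        }
        a[i] = item;
    }

    public static void Build(int[] a, int d)
    {
        if (a.Length < 2) return;               // nothing to do
        // Start at the last non-leaf node, (n - 2) / d, and walk back to the root.
        for (int i = (a.Length - 2) / d; i >= 0; i--)
            SiftDown(a, a.Length, i, d);
    }

    // Checks that every node is >= its parent.
    public static bool IsMinHeap(int[] a, int d)
    {
        for (int i = 1; i < a.Length; i++)
            if (a[i] < a[(i - 1) / d]) return false;
        return true;
    }
}
```

This version needs the inner loop over children that the top-down build avoids, which is the readability trade-off described above.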

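Push: adds a new element to the priority queue. If the node is null, an exception is thrown; if there is not enough room, the storage is expanded. Finally, the internal function DecreaseKey is called to place the new node in the d-ary heap.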

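public void Push(T node)
{
    if (node == null) throw new System.ArgumentNullException("node");

    // Grow the backing array when it is full
    if (numberOfItems == heap.Length)
    {
        Expand();
    }

    // The first free slot is at index numberOfItems.
    // Note: DecreaseKey takes a ushort index, so an index beyond 65,535
    // would be silently truncated by this cast.
    DecreaseKey(node, (ushort)numberOfItems);
    numberOfItems++;
}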
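The article does not show Expand. A minimal sketch, assuming the backing array simply doubles in size, could look like this; the class name GrowableHeapStorage, the indexer and the initial capacity of 4 are all hypothetical and only serve to make the example self-contained.

```csharp
using System;

// Hypothetical stand-in for the queue's backing storage.
public class GrowableHeapStorage<T>
{
    private T[] heap = new T[4];   // assumed initial capacity

    public int Capacity => heap.Length;

    public T this[int i]
    {
        get => heap[i];
        set => heap[i] = value;
    }

    // Doubles the array; Array.Resize copies the existing elements over.
    public void Expand()
    {
        Array.Resize(ref heap, heap.Length * 2);
    }
}
```

Doubling keeps the amortized cost of Push constant apart from the sift-up itself; other growth policies would work just as well.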

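DecreaseKey: when called from Push, the index passed in is the current number of elements in the queue, that is, the first free slot. The function is private because a priority queue does not need to expose this interface. It uses one optimization: the node being added is not stored in the array until its final position is known, which saves unnecessary swaps.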

private void DecreaseKey(T node, ushort index)
{
    // When an existing slot is updated, the new key must not be greater than the old one
    if (index < numberOfItems)
    {
        if (node.CompareTo(heap[index]) > 0)
        {
            throw new System.Exception("New node key greater than original key");
        }
    }

    int bubbleIndex = index;

    while (bubbleIndex != 0)
    {
        // Parent node of the bubble node
        int parentIndex = (bubbleIndex - 1) / D;

        if (node.CompareTo(heap[parentIndex]) < 0)
        {
            // Swap the bubble node and parent node
            // (we don't really need to store the bubble node until we know the final index though
            // so we do that after the loop instead)
            heap[bubbleIndex] = heap[parentIndex];
            bubbleIndex = parentIndex;
        }
        else
        {
            break;
        }
    }

    heap[bubbleIndex] = node;
}
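To see the deferred store in action, here is a stand-alone sketch of the same sift-up on a plain int array. The names SiftUpDemo and Insert are hypothetical, and the caller is assumed to have left a free slot at index count.

```csharp
// Hypothetical sketch: sift-up with a moving "hole" on an int min heap.
public static class SiftUpDemo
{
    // Inserts value into the heap stored in heap[0..count-1].
    // Assumes heap.Length > count.
    public static void Insert(int[] heap, int count, int value, int d)
    {
        int hole = count;                    // the new element starts in the first free slot
        while (hole != 0)
        {
            int parent = (hole - 1) / d;
            if (value >= heap[parent]) break;
            heap[hole] = heap[parent];       // shift the parent down into the hole
            hole = parent;
        }
        heap[hole] = value;                  // single store at the final position
    }
}
```

Starting from the 3-ary min heap shown earlier, stored as { 10, 12, 11, 13, 14, 15, 18 } with one spare slot, SiftUpDemo.Insert(heap, 7, 9, 3) shifts 11 and then 10 down and stores 9 at the root, giving { 9, 12, 10, 13, 14, 15, 18, 11 }. That is two shifts and one store instead of two full swaps.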

Pop: removes and returns the top element of the priority queue by calling the internal function ExtractMin.

public T Pop () 
{
     return ExtractMin();
}

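ExtractMin: returns the current root node, updates numberOfItems, and restores the heap property. Conceptually, the last leaf is moved to the root and then sinks down according to the heap rule. The same optimization is used here: the last leaf does not need to be stored at index 0 of the array; it is written only once its final position is known, which saves swaps.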

private T ExtractMin()
{
    if (numberOfItems == 0)
    {
        throw new System.InvalidOperationException("The priority queue is empty");
    }

    T returnItem = heap[0];

    numberOfItems--;
    if (numberOfItems == 0) return returnItem;

    // Last item in the heap array
    var swapItem = heap[numberOfItems];

    int swapIndex = 0, parent;

    while (true)
    {
        parent = swapIndex;
        var curSwapItem = swapItem;
        int pd = parent * D + 1; // Index of the first child

        // Find the smallest child that is also smaller than swapItem.
        // The bound check keeps every index inside the live part of the heap.
        for (int i = 0; i < D; i++)
        {
            if (pd + i < numberOfItems && heap[pd + i].CompareTo(curSwapItem) < 0)
            {
                curSwapItem = heap[pd + i];
                swapIndex = pd + i;
            }
        }

        // If one of the parent's children is smaller, swap them
        // (actually we are just pretending we swapped them; we hold swapItem
        // in a local variable and only assign it once we know the final index)
        if (parent != swapIndex)
        {
            heap[parent] = heap[swapIndex];
        }
        else
        {
            break;
        }
    }

    // Assign element to the final position
    heap[swapIndex] = swapItem;

    // For debugging
    Validate();

    return returnItem;
}
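The sink step can be tried in isolation with the stand-alone sketch below. It mirrors the idea of ExtractMin on a plain int array; SiftDownDemo and its signature are hypothetical, and the caller is assumed to track the element count itself.

```csharp
using System;

// Hypothetical sketch: extract-min with a moving "hole" on an int min heap.
public static class SiftDownDemo
{
    // Removes and returns the minimum of heap[0..count-1].
    // Afterwards the heap occupies heap[0..count-2].
    public static int ExtractMin(int[] heap, int count, int d)
    {
        if (count <= 0) throw new ArgumentException("heap is empty", nameof(count));

        int min = heap[0];
        int n = count - 1;                   // size after removal
        if (n == 0) return min;

        int item = heap[n];                  // last leaf, to be re-inserted from the root
        int hole = 0;
        while (true)
        {
            int first = d * hole + 1;        // index of the first child
            int best = -1;
            int bestVal = item;
            // Pick the smallest child that is also smaller than item.
            for (int c = first; c < first + d && c < n; c++)
            {
                if (heap[c] < bestVal) { bestVal = heap[c]; best = c; }
            }
            if (best < 0) break;             // no smaller child: final position found
            heap[hole] = heap[best];         // move the child up into the hole
            hole = best;
        }
        heap[hole] = item;
        return min;
    }
}
```

On the 3-ary min heap { 10, 12, 11, 13, 14, 15, 18 }, the first call returns 10 and leaves { 11, 12, 18, 13, 14, 15 } in the first six slots: 11 moves up to the root and the former last leaf 18 lands in its place.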

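Time complexity analysis:

For a priority queue built on a d-ary heap with n elements, the height of the heap is at most log_dn, so adding a new element takes O(log_dn) time.
For a priority queue built on a d-ary heap with n elements, the height of the heap is at most log_dn, and at each level the smallest (or largest) of d children has to be selected as the node sinks level by level. Removing the first element therefore takes O((d-1)log_dn) time.
Converting an array into a d-ary heap bottom-up from the last non-leaf node takes O(n) time; the analysis is the same as for a binary heap. For example, in a 4-ary heap with n nodes there are about (3/4)n subtrees of height 1, (3/16)n subtrees of height 2, and so on. Processing a subtree of height 1 takes O(1), a subtree of height 2 takes O(2), and so on. The total is $\sum_{k=1}^{\log_4 n} \frac{3}{4^k} n k$. By the ratio test the infinite series $\sum_{k \ge 1} k/4^k$ converges (its value is 4/9), so the total is at most $\frac{4}{3}n$ and the complexity is still O(n). What about the top-down approach used in this article? The answer is $O(n \log_d n)$ (the calculation is in the next item), which in theory is higher than the bottom-up approach. How much the two differ in practice has to be measured.
In the BuildHeap algorithm of this article, a node at level i needs at most i comparisons and swaps, and level i contains $d^i$ nodes (the root is level 0), so the cost is bounded by $\sum_{i=1}^{\log_d n} i\,d^i$; for d = 4 this is $\sum_{i=1}^{\log_4 n} i\,4^i$. Let $S_m$ be the sum of the first m terms: $S_m = 1\cdot 4 + 2\cdot 4^2 + 3\cdot 4^3 + \dots + m\cdot 4^m$. Then $4S_m = 1\cdot 4^2 + 2\cdot 4^3 + \dots + m\cdot 4^{m+1}$, and subtracting the two gives $3S_m = m\cdot 4^{m+1} - (4 + 4^2 + \dots + 4^m)$. Conveniently, the term in parentheses is a geometric series equal to $\frac{4^{m+1}-4}{3}$, so $S_m = \frac{4}{9} + \frac{1}{3}\left(m - \frac{1}{3}\right)4^{m+1}$. Substituting $m = \log_4 n$ gives $4^{m+1} = 4n$ and $S_m = \frac{4}{9} + \frac{4}{3}n\left(\log_4 n - \frac{1}{3}\right)$, which is $O(n \log_4 n)$; for general d the same argument gives $O(n \log_d n)$.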
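The closed form for d = 4 is easy to sanity-check numerically. In the sketch below S_m denotes the sum 1·4 + 2·4² + ... + m·4^m, and the closed form 4/9 + (1/3)(m − 1/3)·4^(m+1) is rewritten over a common denominator so that it can be evaluated in integers. The class and method names are hypothetical, and m is assumed to stay small enough (say m ≤ 20) for long arithmetic.

```csharp
// Hypothetical check of the closed form for S_m = sum_{i=1..m} i * 4^i.
public static class BuildCostCheck
{
    // Direct evaluation of 1*4 + 2*4^2 + ... + m*4^m.
    public static long DirectSum(int m)
    {
        long sum = 0, power = 1;
        for (int i = 1; i <= m; i++)
        {
            power *= 4;          // power == 4^i
            sum += i * power;
        }
        return sum;
    }

    // Closed form: 4/9 + (1/3)(m - 1/3) * 4^(m+1) == (4 + (3m - 1) * 4^(m+1)) / 9.
    public static long ClosedForm(int m)
    {
        long power = 1;
        for (int i = 0; i <= m; i++) power *= 4;   // power == 4^(m+1)
        return (4 + (3L * m - 1) * power) / 9;
    }
}
```

For m = 1, 2, 3 both methods give 4, 36 and 228.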

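Summary:

Repeated tests with System.Diagnostics.Stopwatch show that with d = 4 both Push and Pop perform well, and in many cases Push is somewhat faster with d = 4 than with d = 2. The improvement for Push is clear. For Pop it is not clear whether performance got better or worse: the two settings traded wins across runs. In the end, timing with System.Diagnostics.Stopwatch is not a precise measurement, and the results include noise from the .NET runtime.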
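A timing run of this kind might be set up as sketched below. This is not the author's test harness: IntDaryHeap is a deliberately small int-only heap written for the sketch, and HeapBenchmark.Run, its parameters and the use of a seeded Random are assumptions.

```csharp
using System;
using System.Diagnostics;

// Hypothetical minimal d-ary min heap of ints, written only for this timing sketch.
public class IntDaryHeap
{
    private int[] heap = new int[16];
    private readonly int d;
    public int Count { get; private set; }

    public IntDaryHeap(int d) { this.d = d; }

    public void Push(int value)
    {
        if (Count == heap.Length) Array.Resize(ref heap, heap.Length * 2);
        int hole = Count++;
        while (hole != 0)
        {
            int parent = (hole - 1) / d;
            if (value >= heap[parent]) break;
            heap[hole] = heap[parent];
            hole = parent;
        }
        heap[hole] = value;
    }

    public int Pop()
    {
        if (Count == 0) throw new InvalidOperationException("heap is empty");
        int min = heap[0];
        int item = heap[--Count];
        int hole = 0;
        while (true)
        {
            int first = d * hole + 1, best = -1, bestVal = item;
            for (int c = first; c < first + d && c < Count; c++)
                if (heap[c] < bestVal) { bestVal = heap[c]; best = c; }
            if (best < 0) break;
            heap[hole] = heap[best];
            hole = best;
        }
        heap[hole] = item;
        return min;
    }
}

public static class HeapBenchmark
{
    // Times pushing `count` pseudo-random keys and then popping them all.
    public static (double pushMs, double popMs) Run(int d, int count, int seed)
    {
        var rng = new Random(seed);
        var keys = new int[count];
        for (int i = 0; i < count; i++) keys[i] = rng.Next();

        var heap = new IntDaryHeap(d);
        var sw = Stopwatch.StartNew();
        foreach (int k in keys) heap.Push(k);
        sw.Stop();
        double pushMs = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        while (heap.Count > 0) heap.Pop();
        sw.Stop();
        return (pushMs, sw.Elapsed.TotalMilliseconds);
    }
}
```

Calling HeapBenchmark.Run(2, 1000000, 1) and HeapBenchmark.Run(4, 1000000, 1) several times, and discarding the first run of each, gives a rough comparison; the numbers will vary from machine to machine and run to run.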

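Appendix:

Complete priority queue program

Q&A:

My pathfinding algorithm uses the PriorityQueue that ships with the C++ or Java standard library. Neither provides a DecreaseKey function, so I cannot update the key of an element already in the queue and cannot perform edge relaxation. What should I do?

DecreaseKey is private in this article too and is not offered to users of the PriorityQueue. Why not? Because even if it were, how would the pathfinding algorithm supply the index that DecreaseKey needs? We know the element to update is in the priority queue, but we do not know its index. Getting the index requires either a search or an auxiliary data structure. An auxiliary data structure costs extra memory, and a search costs extra time, especially when the queue holds many elements. The trick is not to change the element at all. Instead, add a "new copy" of the node (with the new, better cost) to the priority queue. Because its cost is lower, the new copy is extracted before the original copy and is therefore processed first. Duplicate nodes encountered later are simply ignored, and in many cases the path has already been found before any duplicate is reached. The only extra burden is a few redundant objects in the priority queue. That burden is very small, and the approach is easy to implement.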
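The duplicate-entry trick can be sketched with Dijkstra's algorithm. The example below uses the built-in System.Collections.Generic.PriorityQueue<TElement, TPriority> of .NET 6 and later, which likewise has no decrease-key operation, rather than the queue from this article. The class name LazyDijkstra and the adjacency-list representation are assumptions, and edge weights are assumed to be non-negative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: Dijkstra with duplicate entries instead of DecreaseKey.
public static class LazyDijkstra
{
    // graph[u] lists (neighbor, weight) pairs. Unreachable nodes keep long.MaxValue.
    public static long[] ShortestPaths(List<(int to, long w)>[] graph, int source)
    {
        var dist = new long[graph.Length];
        Array.Fill(dist, long.MaxValue);
        dist[source] = 0;

        var queue = new PriorityQueue<int, long>();
        queue.Enqueue(source, 0);

        while (queue.TryDequeue(out int u, out long d))
        {
            // Stale duplicate: a cheaper copy of u was already processed.
            if (d > dist[u]) continue;

            foreach (var (v, w) in graph[u])
            {
                long candidate = d + w;
                if (candidate < dist[v])
                {
                    dist[v] = candidate;
                    // Relaxation: push a new copy instead of updating the old entry.
                    queue.Enqueue(v, candidate);
                }
            }
        }
        return dist;
    }
}
```

The only bookkeeping is the `d > dist[u]` test that discards outdated copies when they surface; no index lookup into the heap is ever needed.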

References:

https://www.geeksforgeeks.org/k-ary-heap/

http://en.wikipedia.org/wiki/Binary_heap

https://en.wikipedia.org/wiki/D-ary_heap

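Comments, criticism, and corrections are welcome in the comment section.

This is an original article; please credit the source when reposting. Thanks!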