Notes on Programming Pearls: Rotating an Array Left

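Problem: rotate a one-dimensional array arr[num_elem] of num_elem elements left by rot_dist positions. Can the rotation be done in time proportional to num_elem while using only a few dozen bytes of extra storage?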

1: Bentley's Juggling Algorithm

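Move arr[0] into a temporary variable tmp, then move arr[rot_dist] to arr[0], arr[2*rot_dist] to arr[rot_dist], and so on, taking every index modulo num_elem. When the next element to fetch would be arr[0], take the value from tmp instead and stop.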

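For this to work, two things must hold: 1. every array element is visited; 2. the original arr[0] (the value held in tmp) is assigned to the right array element in the last step.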

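When num_elem and rot_dist are coprime, a single pass satisfies both conditions; otherwise a single pass does not reach every element. In algebraic terms, when num_elem and rot_dist are coprime the traversal rule is one cycle over the indices 0, ..., num_elem-1. When they are not coprime (write common_divisor for their greatest common divisor), the rule splits into common_divisor disjoint cycles, each permuting num_elem/common_divisor elements, so the pass has to be repeated once from each starting index 0, ..., common_divisor-1.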
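The cycle structure can be checked directly. The following is only a sketch: juggling_cycles is a hypothetical helper (not from the book) that lists the cycles of the index map k -> (k + rot_dist) % num_elem, assuming 0 < rot_dist < num_elem.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: cycles of the index map k -> (k + rot_dist) % num_elem.
// Assumes 0 < rot_dist < num_elem.
std::vector<std::vector<int>> juggling_cycles(int num_elem, int rot_dist)
{
    assert(rot_dist > 0 && rot_dist < num_elem);
    std::vector<std::vector<int>> cycles;
    std::vector<bool> seen(num_elem, false);
    for (int start = 0; start < num_elem; ++start) {
        if (seen[start])
            continue;               // index already belongs to an earlier cycle
        std::vector<int> cycle;
        int k = start;
        do {
            seen[k] = true;
            cycle.push_back(k);
            k = (k + rot_dist) % num_elem;
        } while (k != start);
        cycles.push_back(cycle);
    }
    return cycles;
}
```

For num_elem = 6 and rot_dist = 2 this gives the two cycles (0 2 4) and (1 3 5); for num_elem = 7 and rot_dist = 2 it gives a single cycle through all seven indices.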

// greatest common divisor by Euclid's algorithm
unsigned gcd(unsigned m, unsigned n)
{
    unsigned remainder;
    while(n > 0) {
        remainder = m % n;
        m = n;
        n = remainder;
    }

    return m;
}

// left rotate @arr containing @num_elem elements by @rot_dist positions
// Bentley's Juggling Algorithm from Programming Pearls
// precondition: 0 <= rot_dist <= num_elem
template<typename T>
void array_left_rotation_juggling(T *arr, int num_elem, int rot_dist)
{
    if(rot_dist == 0 || rot_dist == num_elem)
        return;

    T tmp;
    int i, j, k;

    // one pass per cycle; the cycles start at indices 0 .. common_divisor-1
    int common_divisor = gcd(num_elem, rot_dist);
    for(i = 0; i < common_divisor; ++i) {
        tmp = arr[i];
        j = i, k = (j + rot_dist) % num_elem;
        while(k != i) {
            arr[j] = arr[k];
            j = k;
            k = (k + rot_dist) % num_elem;
        }
        arr[j] = tmp;
    }
}

2: Gries and Mills Block Swapping

Rotating an array of num_elem elements left by rot_dist positions is equivalent to exchanging the two blocks arr[0 : rot_dist-1] and arr[rot_dist : num_elem-1] (the block-swapping problem). Call these two blocks X and Y.

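1. When X and Y have the same length, simply swap the two blocks: XY -> YX.
2. When X has more elements, split X into two parts X1 and X2, where X1 is as long as Y, and swap X1 with Y: X1X2Y -> YX2X1. Y is now where it belongs after the rotation (after the block swap).
3. When Y has more elements, split Y into two parts Y1 and Y2, where Y2 is as long as X, and swap X with Y2: XY1Y2 -> Y2Y1X. X is now where it belongs after the rotation (called its final position from here on).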

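After the step in case 2 (or case 3), the problem has been reduced: going on to exchange the remaining pair X2X1 (Y2Y1 in case 3) finishes the job, which makes this a naturally recursive approach. That is all the description Programming Pearls gives. It left me confused at first; Section 18.1 of The Science of Programming, together with an online write-up, finally made it clear, and these notes organize what I learned.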
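The three cases translate almost literally into a recursive routine. This is a sketch under stated assumptions rather than the book's code: swap_blocks and block_swap_recursive are hypothetical names, and the blocks X and Y are assumed to be adjacent, with X starting at index begin.

```cpp
#include <cassert>
#include <utility>   // std::swap

// Hypothetical helper: swap arr[a : a+len-1] with arr[b : b+len-1] (no overlap).
template<typename T>
void swap_blocks(T *arr, int a, int b, int len)
{
    for (int k = 0; k < len; ++k)
        std::swap(arr[a + k], arr[b + k]);
}

// Exchange the adjacent blocks X = arr[begin : begin+len_x-1] and
// Y = arr[begin+len_x : begin+len_x+len_y-1], turning XY into YX.
template<typename T>
void block_swap_recursive(T *arr, int begin, int len_x, int len_y)
{
    assert(len_x >= 0 && len_y >= 0);
    if (len_x == 0 || len_y == 0)
        return;                                   // nothing to exchange
    if (len_x == len_y) {
        swap_blocks(arr, begin, begin + len_x, len_x);            // case 1
    } else if (len_x > len_y) {
        // case 2: X1 X2 Y -> Y X2 X1; Y is final, X2 X1 still has to become X1 X2
        swap_blocks(arr, begin, begin + len_x, len_y);
        block_swap_recursive(arr, begin + len_y, len_x - len_y, len_y);
    } else {
        // case 3: X Y1 Y2 -> Y2 Y1 X; X is final, Y2 Y1 still has to become Y1 Y2
        swap_blocks(arr, begin, begin + len_y, len_x);
        block_swap_recursive(arr, begin, len_x, len_y - len_x);
    }
}
```

Rotating an n-element array left by d positions then corresponds to block_swap_recursive(arr, 0, d, n - d). The recursion is a tail call in both branches, so it can be turned into a loop.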

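Start with a simple example: use the approach above to rotate a 7-element array left by 2 positions, where the elements are 0, 1, 2, 3, 4, 5, 6. The figure comes from an external source.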

In the figure, red marks the shorter block. Watching the execution reveals a few points:

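Once the array is split into a left and a right block and the shorter block is swapped with a sub-block of the longer one, the elements of the shorter block are in their final positions and are never touched again.
The sub-block swapped with the shorter block lies at the end of the longer block that is farthest from the shorter block (after each swap, the longer block shrinks accordingly).
When the two blocks have the same length, swapping them completes the algorithm.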

First we need a routine that swaps two array blocks of equal length, implemented as follows.

#include <utility> // std::swap

// swap two blocks of equal length: arr[beg1 : beg1+num-1] and arr[beg2 : beg2+num-1]
// there must be no overlap between two blocks
template<typename T>
void swap_equal_blocks(T *arr, int beg1, int beg2, int num)
{
    while(num-- > 0)
        std::swap(arr[beg1++], arr[beg2++]);
}

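To keep track of the block lengths, let i and j be the lengths of the left and right blocks: i elements on the left and j elements on the right are still to be processed, so i + j is the number of elements not yet in their final positions. The figure below shows the initial state and the state after one swap in each of the cases i > j and i < j. In the figure, r stands for rot_dist and n for num_elem; gray marks the parts already in their final positions.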

The figure yields the following relations:

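When i > j, the two parts swapped are the j elements starting at index r-i and the j elements starting at index r.
When i < j, the two parts swapped are the i elements starting at index r-i and the i elements starting at index r+j-i.
arr[0 : r-i-1] and arr[r+j : n-1] are already in their final positions.
The i elements still to be processed on the left are always arr[r-i : r-1]; the j elements still to be processed on the right are always arr[r : r+j-1].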
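These relations can be traced on the 7-element example. The sketch below is an illustration only: block_swap_trace is a hypothetical function that applies the relations to a std::vector<int> and records the array after every swap, assuming 0 < r < n.

```cpp
#include <algorithm>  // std::swap_ranges
#include <cassert>
#include <vector>

// Hypothetical tracing variant: returns the array contents after every swap.
// Assumes 0 < r < v.size().
std::vector<std::vector<int>> block_swap_trace(std::vector<int> v, int r)
{
    const int n = static_cast<int>(v.size());
    assert(r > 0 && r < n);
    std::vector<std::vector<int>> states;
    int i = r, j = n - r;
    while (i != j) {
        if (i > j) {
            // swap the j elements starting at r-i with the j elements starting at r
            std::swap_ranges(v.begin() + (r - i), v.begin() + (r - i + j), v.begin() + r);
            i -= j;
        } else {
            // swap the i elements starting at r-i with the i elements starting at r+j-i
            std::swap_ranges(v.begin() + (r - i), v.begin() + r, v.begin() + (r + j - i));
            j -= i;
        }
        states.push_back(v);
    }
    // i == j: one last swap of two equal blocks
    std::swap_ranges(v.begin() + (r - i), v.begin() + r, v.begin() + r);
    states.push_back(v);
    return states;
}
```

For the array 0, 1, 2, 3, 4, 5, 6 and r = 2, the recorded states are 5 6 2 3 4 0 1 (then i = 2, j = 3), 3 4 2 5 6 0 1 (i = 2, j = 1), 2 4 3 5 6 0 1 (i = 1, j = 1), and finally 2 3 4 5 6 0 1.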

This leads to the following implementation.

template<typename T>
void array_left_rotation_blockswapping(T *arr, int num_elem, int rot_dist)
{
    if(rot_dist == 0 || rot_dist == num_elem)
        return;

    int i = rot_dist, j = num_elem - rot_dist;
    while(i != j) { // would never terminate for rot_dist == 0 or num_elem, hence the guard above
        // Invariant:
        // arr[0 : rot_dist-i-1] is in final position
        // arr[rot_dist-i : rot_dist-1] is the left part, length i
        // arr[rot_dist : rot_dist+j-1] is the right part, length j
        // arr[rot_dist+j : num_elem-1] is in final position
        if(i > j) {
            swap_equal_blocks(arr, rot_dist-i, rot_dist, j);
            i -= j;
        }
        else {
            swap_equal_blocks(arr, rot_dist-i, rot_dist+j-i, i);
            j -= i;
        }
    }
    // i == j: swap the two remaining blocks of equal length
    swap_equal_blocks(arr, rot_dist-i, rot_dist, i);
}

3: Reversal Algorithm

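The best illustration of this method is the example Doug McIlroy gave with his two hands, shown in the figure below. In the figure the array is rotated left by 5 positions.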

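In the notation used here, the left rotation is carried out with three reversals: reverse X, reverse Y, then reverse the whole array, since (X^R Y^R)^R = YX, where ^R denotes reversal. The implementation is as follows: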

#include <utility> // std::swap

// reverse the elements @arr[@low : @high] (both bounds inclusive)
template<typename T>
void reverse(T *arr, int low, int high)
{
    while(low < high)
        std::swap(arr[low++], arr[high--]);
}

template<typename T>
void array_left_rotation_reversal(T *arr, int num_elem, int rot_dist)
{
    if(rot_dist == 0 || rot_dist == num_elem)
        return;

    reverse(arr, 0, rot_dist-1);        // reverse X
    reverse(arr, rot_dist, num_elem-1); // reverse Y
    reverse(arr, 0, num_elem-1);        // reverse the whole array
}
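For cross-checking any of these routines, the C++ standard library already provides std::rotate, whose middle iterator becomes the new first element. The sketch below shows it next to the same three reversals written with std::reverse on half-open iterator ranges; rotate_left_std and rotate_left_by_reversal are hypothetical wrapper names, and both assume 0 <= rot_dist <= v.size().

```cpp
#include <algorithm>  // std::rotate, std::reverse
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: left-rotate v by rot_dist using std::rotate.
template<typename T>
void rotate_left_std(std::vector<T> &v, int rot_dist)
{
    assert(rot_dist >= 0 && static_cast<std::size_t>(rot_dist) <= v.size());
    std::rotate(v.begin(), v.begin() + rot_dist, v.end());
}

// Hypothetical wrapper: the same three reversals, written with std::reverse.
template<typename T>
void rotate_left_by_reversal(std::vector<T> &v, int rot_dist)
{
    assert(rot_dist >= 0 && static_cast<std::size_t>(rot_dist) <= v.size());
    std::reverse(v.begin(), v.begin() + rot_dist);   // reverse X
    std::reverse(v.begin() + rot_dist, v.end());     // reverse Y
    std::reverse(v.begin(), v.end());                // reverse the whole array
}
```

Both wrappers should leave a vector in the same state, which makes std::rotate a convenient reference when testing a hand-written rotation.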

On performance: all three methods run in O(n) time. Victor J. Duvanenko benchmarked the three algorithms using Intel C++ Composer XE 2011 on an Intel i7 860 (2.8GHz, with TurboBoost up to 3.46GHz). In his results the Gries-Mills algorithm had the shortest running time and the Reversal algorithm came second, only slightly behind. The Reversal algorithm does have one advantage: across repeated runs its running time was very stable (the standard deviation of the measured times was small). The Juggling algorithm did worse on both running time and stability. Duvanenko attributes this to memory access patterns: "The Gries-Mills and Reversal algorithms performed well due to their cache-friendly memory access patterns. The Juggling algorithm performed the fewest memory accesses, but came in 5x slower due to its cache-unfriendly memory access pattern." In short, the Juggling algorithm is far less efficient than the other two. Compared with Gries-Mills, the Reversal algorithm has essentially the same running time, but its performance is more consistent, its principle is easier to understand, and its code is simpler.
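To repeat this kind of comparison on your own machine, a small timing harness is enough to get started. The following is only a sketch and not Duvanenko's benchmark: time_rotation_ms and rotate_with_std are hypothetical names, the rotation is assumed to have the (pointer, length, distance) signature used in these notes, and the numbers will depend on the compiler, optimization flags, array size, and hardware.

```cpp
#include <algorithm>  // std::rotate
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical harness: average wall-clock milliseconds per call of
// rotate(data.data(), num_elem, rot_dist) over `repeats` calls.
template<typename Rotate>
double time_rotation_ms(Rotate rotate, std::vector<int> &data, int rot_dist, int repeats)
{
    assert(repeats > 0);
    const int num_elem = static_cast<int>(data.size());
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        rotate(data.data(), num_elem, rot_dist);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}

// Baseline with the same signature, built on std::rotate.
inline void rotate_with_std(int *arr, int num_elem, int rot_dist)
{
    std::rotate(arr, arr + rot_dist, arr + num_elem);
}
```

Any rotation with that signature can be passed in the same way, for example array_left_rotation_juggling<int>. To see cache effects the array should be much larger than the last-level cache, and each measurement should be repeated several times so that the spread can be compared as well as the mean.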