Introduction to Algorithms (CLRS), Exercise 9.3-8

Problem Description:

Let X[1..n] and Y[1..n] be two arrays, each containing n numbers already in sorted order.

Give an O(lg n)-time algorithm to find the median of all 2n elements in arrays X and Y.


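(Note: the median here is defined as the middle element after sorting when the number of elements is odd, or the lower-indexed of the two middle elements when it is even.)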

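Generalizing the problem:

The exercise assumes that the two arrays have equal length and asks only for the median.

Here we modify it: the two arrays may have different lengths m and n, and we want the i-th smallest of all m + n numbers in O(lg m + lg n) time.

We solve the general case first and then treat the original exercise as a special case.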
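To pin down what "the i-th number" means before deriving the logarithmic algorithm, here is a minimal linear-time reference in C++ that walks both arrays the way a merge would. The name `ithSmallestByMerge` and the use of `std::vector` are assumptions made for illustration; this is only a baseline for checking results, not the O(lg m + lg n) method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical baseline: returns the i-th smallest (1-based) element of the
// union of two non-decreasing arrays by advancing through them like a merge.
// Runs in O(i) time, so it only serves as a reference for faster versions.
int ithSmallestByMerge(const std::vector<int>& a, const std::vector<int>& b, std::size_t i)
{
    assert(i >= 1 && i <= a.size() + b.size());
    std::size_t ia = 0, ib = 0;
    int current = 0;
    for (std::size_t taken = 0; taken < i; ++taken)
    {
        // Take from a when b is exhausted or a's head is not larger.
        if (ib == b.size() || (ia < a.size() && a[ia] <= b[ib]))
            current = a[ia++];
        else
            current = b[ib++];
    }
    return current;
}
```

Any faster implementation should agree with this function for every valid i.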

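Solution:

Consider the two sorted arrays shown in the figure above (assumed to be monotonically non-decreasing), where aMid = (aBeg + aEnd) / 2 is the index of the middle element of a, and bMid = (bBeg + bEnd) / 2 is the same for b.

{a1} is the set of elements a[aBeg] through a[aMid], and {a2} is the set of elements a[aMid + 1] through a[aEnd]. {b1} and {b2} are defined similarly, as shown in the figure.

Assume a[aMid] <= b[bMid] (the case a[aMid] > b[bMid] is similar). Then: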

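{a1} <= {a2} and {a1} <= {b2}

That is, for any element of {a1}, at least (aEnd - aMid) + (bEnd - bMid) + 1 numbers in a and b are not less than it (the elements of {a2}, plus b[bMid] and everything after it in b).

So for any element of {a1}, the number of elements in a and b that are less than it is at most

[(aEnd - aBeg + 1) + (bEnd - bBeg + 1)] - [(aEnd - aMid) + (bEnd - bMid) + 1]

= (aMid - aBeg) + (bMid - bBeg) + 1 ......①

(This count still includes the element itself, so its position in sorted order is at most this value.)

Similarly, {b2} >= {b1} and {b2} >= {a1}

That is, for any element of {b2}, the number of elements in a and b that are not greater than it is at least (bMid - bBeg) + (aMid - aBeg) + 1

= (aMid - aBeg) + (bMid - bBeg) + 1 ......②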

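From ① and ②, writing k = (aMid - aBeg) + (bMid - bBeg) + 1:

If i <= k, the i-th number is certainly not in {b2}.

In that case we only need to keep looking for the i-th number in {a1}, {a2}, and {b1}.

(When i == k, the i-th number can equal b[bMid] only if a[aMid] == b[bMid]. Although b[bMid] is counted in {b2} here, a[aMid] is not, so the claim "the i-th number is certainly not in {b2}" still stands.)

If i > k, the i-th number is certainly not in {a1}.

In that case we only need to keep looking for the (i - (aMid - aBeg + 1))-th number in {a2}, {b1}, and {b2}.

When a[aMid] > b[bMid], the same reasoning gives the analogous conclusions with the roles of a and b swapped.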
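Bounds ① and ②, on which both cases rest, can be sanity-checked by brute force on small inputs. The sketch below is only an illustration: the names `countLess`, `countNotGreater`, and `rankBoundsHold` are hypothetical, the whole of each array is used as the current range, and the threshold is taken to be k = (aMid - aBeg) + (bMid - bBeg) + 1, as in ① and ②.

```cpp
#include <cassert>
#include <vector>

// Number of elements of v strictly less than x.
int countLess(const std::vector<int>& v, int x)
{
    int count = 0;
    for (int e : v) if (e < x) ++count;
    return count;
}

// Number of elements of v less than or equal to x.
int countNotGreater(const std::vector<int>& v, int x)
{
    int count = 0;
    for (int e : v) if (e <= x) ++count;
    return count;
}

// Hypothetical checker: for non-empty sorted arrays a and b with
// a[aMid] <= b[bMid], verifies both rank bounds by brute force.
bool rankBoundsHold(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(!a.empty() && !b.empty());
    const int aBeg = 0, aEnd = static_cast<int>(a.size()) - 1;
    const int bBeg = 0, bEnd = static_cast<int>(b.size()) - 1;
    const int aMid = (aBeg + aEnd) / 2;
    const int bMid = (bBeg + bEnd) / 2;
    assert(a[aMid] <= b[bMid]);
    const int k = (aMid - aBeg) + (bMid - bBeg) + 1;

    // Bound (1): each element of a[aBeg..aMid] has fewer than k strictly
    // smaller elements, so it can sit at position k or earlier.
    for (int p = aBeg; p <= aMid; ++p)
        if (countLess(a, a[p]) + countLess(b, a[p]) >= k) return false;

    // Bound (2): each element of b[bMid..bEnd] has at least k other elements
    // that are not greater, so it can sit at position k + 1 or later.
    for (int q = bMid; q <= bEnd; ++q)
        if (countNotGreater(a, b[q]) + countNotGreater(b, b[q]) - 1 < k) return false;

    return true;
}
```

For example, with a = {1, 3, 5, 7, 9} and b = {2, 4, 6, 8, 10, 12} the middle elements are 5 and 6, k is 5, and both bounds hold.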

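By the analysis above, each recursive call discards about half of the elements of one of the two arrays, until the recursion ends.

The recursion ends when one of the arrays becomes empty; the i-th number can then be read directly from the other array.

The running time of the algorithm is therefore O(lg m + lg n).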
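To make the step count concrete, the same halving rule can be written as a loop that also reports how many times it ran. This is a sketch under stated assumptions rather than the post's own code: `ithSmallestIterative` and its optional `steps` out-parameter are hypothetical names, and the inputs are `std::vector`s covering the whole arrays.

```cpp
#include <cassert>
#include <vector>

// Hypothetical iterative variant of the halving rule described above.
// i is 1-based. If steps is non-null, it receives the number of halving steps.
int ithSmallestIterative(const std::vector<int>& a, const std::vector<int>& b,
                         int i, int* steps = nullptr)
{
    int aBeg = 0, aEnd = static_cast<int>(a.size()) - 1;
    int bBeg = 0, bEnd = static_cast<int>(b.size()) - 1;
    assert(i >= 1 && i <= static_cast<int>(a.size() + b.size()));

    int count = 0;
    while (aBeg <= aEnd && bBeg <= bEnd)
    {
        const int aMid = aBeg + (aEnd - aBeg) / 2;
        const int bMid = bBeg + (bEnd - bBeg) / 2;
        const int k = (aMid - aBeg) + (bMid - bBeg) + 1;
        if (a[aMid] <= b[bMid])
        {
            if (i <= k) { bEnd = bMid - 1; }                     // drop b[bMid..bEnd]
            else { i -= aMid - aBeg + 1; aBeg = aMid + 1; }      // drop a[aBeg..aMid]
        }
        else
        {
            if (i <= k) { aEnd = aMid - 1; }                     // drop a[aMid..aEnd]
            else { i -= bMid - bBeg + 1; bBeg = bMid + 1; }      // drop b[bBeg..bMid]
        }
        ++count;
    }
    if (steps) *steps = count;

    // One range is empty: the answer is the i-th element of the other one.
    return (aBeg > aEnd) ? b[bBeg + i - 1] : a[aBeg + i - 1];
}
```

Every pass removes at least half (rounded up) of one range, so a range of length m can be cut at most floor(lg m) + 1 times before it is empty, which is where the O(lg m + lg n) bound comes from.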

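Implementation:

(The code includes one small optimization: the i-th number of a and b combined must lie within the first i elements of each array.)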


int ithSmallestNumberLog(int a[], int aBeg, int aEnd, int b[], int bBeg, int bEnd, int i)
{
    // The i-th smallest number of all elements must be
    // among the first i elements of either array.
    aEnd = aBeg + i - 1 < aEnd ? aBeg + i - 1 : aEnd;
    bEnd = bBeg + i - 1 < bEnd ? bBeg + i - 1 : bEnd;

    // Base cases: one array is empty, so read the answer from the other.
    if (aBeg > aEnd)
    {
        return b[bBeg + i - 1];
    }
    if (bBeg > bEnd)
    {
        return a[aBeg + i - 1];
    }

    // Indices of the middle elements of a and b.
    int aMid = (aBeg + aEnd) / 2;
    int bMid = (bBeg + bEnd) / 2;

    if (a[aMid] <= b[bMid])
    {
        if (i <= (aMid - aBeg) + (bMid - bBeg) + 1)
        {
            // Discard b[bMid..bEnd]; i is unchanged.
            return ithSmallestNumberLog(a, aBeg, aEnd, b, bBeg, bMid - 1, i);
        }
        else
        {
            // Discard a[aBeg..aMid] and shift i by the number of dropped elements.
            return ithSmallestNumberLog(a, aMid + 1, aEnd, b, bBeg, bEnd, i - (aMid - aBeg + 1));
        }
    }
    else
    {
        if (i <= (aMid - aBeg) + (bMid - bBeg) + 1)
        {
            // Discard a[aMid..aEnd]; i is unchanged.
            return ithSmallestNumberLog(a, aBeg, aMid - 1, b, bBeg, bEnd, i);
        }
        else
        {
            // Discard b[bBeg..bMid] and shift i by the number of dropped elements.
            return ithSmallestNumberLog(a, aBeg, aEnd, b, bMid + 1, bEnd, i - (bMid - bBeg + 1));
        }
    }
}
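The original exercise is then the special case m = n with i = n: of the 2n elements, the two middle ones are the n-th and the (n+1)-th, and the definition of the median used here picks the n-th. A minimal sketch of that reduction follows; `SelectFn` and `lowerMedianOfTwoSorted` are hypothetical names, and the selection routine is passed in as a function pointer so the snippet stands on its own.

```cpp
#include <cassert>

// Signature shared with ithSmallestNumberLog: inclusive index ranges, 1-based i.
using SelectFn = int (*)(int a[], int aBeg, int aEnd,
                         int b[], int bBeg, int bEnd, int i);

// Hypothetical wrapper: lower median of two sorted arrays x[0..n-1] and
// y[0..n-1], obtained by asking for the n-th smallest of the 2n elements.
int lowerMedianOfTwoSorted(int x[], int y[], int n, SelectFn select)
{
    assert(n >= 1);
    return select(x, 0, n - 1, y, 0, n - 1, n);
}
```

A call such as `lowerMedianOfTwoSorted(x, y, n, ithSmallestNumberLog)` would then run in O(lg n + lg n) = O(lg n) time, as the exercise requires.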

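Testing:

Array a has length 500, array b has length 1000, and array c holds all the elements of a and b together. The elements of a and b are random numbers between 1 and 9999.

For every i from 1 to 1500, find the i-th number among all the elements of a and b and compare it with c[i - 1] after sorting c.

The test code is as follows: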


#include <iostream>

// randarray, copyarray, and quick_sort are the custom helpers from the
// "#include" article; ithSmallestNumberLog is the function defined above.

#define ARRAY_SIZE 500
#define COUNT 1000

int a[ARRAY_SIZE];
int b[ARRAY_SIZE * 2];
int c[ARRAY_SIZE * 3];

int main(void)
{
    for (int z = 0; z != COUNT; ++z)
    {
        // Fill a and b with random values, copy both into c, then sort all three.
        randarray(a, ARRAY_SIZE, 1, 9999);
        randarray(b, ARRAY_SIZE * 2, 1, 9999);
        copyarray(a, 0, c, 0, ARRAY_SIZE);
        copyarray(b, 0, c, ARRAY_SIZE, ARRAY_SIZE * 2);
        quick_sort(a, 0, ARRAY_SIZE - 1);
        quick_sort(b, 0, ARRAY_SIZE * 2 - 1);
        quick_sort(c, 0, ARRAY_SIZE * 3 - 1);

        // The i-th smallest of a and b must equal c[i - 1] in the sorted union.
        for (int i = 1; i <= ARRAY_SIZE * 3; ++i)
        {
            int resultTest = ithSmallestNumberLog(a, 0, ARRAY_SIZE - 1,
                                                  b, 0, ARRAY_SIZE * 2 - 1, i);
            int resultStd = c[i - 1];
//             std::cout << "i = " << i << " resultTest = " << resultTest
//                       << " resultStd = " << resultStd << std::endl;
            if (resultTest != resultStd)
            {
                std::cout << "Error" << std::endl;
                return -1;
            }
        }
        std::cout << "test " << z << " done." << std::endl;
    }

    return 0;
}

The implementations of the custom helper functions used in this post are in the article "#include".
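For readers without that article at hand, the three helpers could be approximated with the standard library. The versions below are stand-ins, not the author's implementations: their signatures are inferred from the calls in the test code, and `quick_sort` here simply delegates to `std::sort` instead of implementing quicksort.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Hypothetical stand-in: fills arr[0..n-1] with random integers in [lo, hi].
void randarray(int arr[], int n, int lo, int hi)
{
    assert(lo <= hi);
    static std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> dist(lo, hi);
    for (int k = 0; k < n; ++k)
    {
        arr[k] = dist(gen);
    }
}

// Hypothetical stand-in: copies count elements starting at src[srcBeg]
// into dst starting at dst[dstBeg].
void copyarray(const int src[], int srcBeg, int dst[], int dstBeg, int count)
{
    std::copy(src + srcBeg, src + srcBeg + count, dst + dstBeg);
}

// Hypothetical stand-in: sorts arr[beg..end] (inclusive) in non-decreasing
// order. Uses std::sort, not a hand-written quicksort.
void quick_sort(int arr[], int beg, int end)
{
    if (beg < end)
    {
        std::sort(arr + beg, arr + end + 1);
    }
}
```

Placed before main together with ithSmallestNumberLog, these should be enough to build the test program as a single C++ file.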