001--二分法(Binary search)

一、什么是二分法？

Python 的列表（list）内部实现是一个数组，也就是一个线性表。在列表中查找元素可以使用 list.index() 方法，其时间复杂度为O(n)。对于大数据量，则可以用二分查找进行优化。二分查找要求对象必须有序，其基本原理如下：

1.从数组的中间元素开始，如果中间元素正好是要查找的元素，则搜素过程结束；
2.如果某一特定元素大于或者小于中间元素，则在数组大于或小于中间元素的那一半中查找，而且跟开始一样从中间元素开始比较。
3.如果在某一步骤数组为空，则代表找不到。

二分查找也称为折半查找，算法每一次比较都使搜索范围缩小一半，其时间复杂度为 O(logn)。

二、二分法的实现方法

我们分别用递归和循环来实现二分查找：

1、递归的方法实现二分法

def binary_search_recursion(lst, value, low, high):  
    if high < low:  
        return None
    mid = (low + high) // 2  
    if lst[mid] > value:  
        return binary_search_recursion(lst, value, low, mid-1)  
    elif lst[mid] < value:  
        return binary_search_recursion(lst, value, mid+1, high)  
    else:  
        return mid

2、循环的方法实现二分法

def binary_search_loop(lst,value):  
    low, high = 0, len(lst)-1  
    while low <= high:  
        mid = (low + high) // 2  
        if lst[mid] < value:  
            low = mid + 1  
        elif lst[mid] > value:  
            high = mid - 1
        else:
            return mid  
    return None

3、性能测试：递归 V.S 循环

if __name__ == "__main__":

    lst = [i for i in range(0, 10000)]
    lst.sort()

    def test_recursion():
        binary_search_recursion(lst,999,0,len(lst) -1)

    def test_loop():
        binary_search_loop(lst,999)

    import timeit
    t1 = timeit.Timer("test_recursion()",setup="from __main__ import test_recursion")
    t2 = timeit.Timer("test_loop()",setup="from __main__ import test_loop")

    print("Recursion:",t1.timeit())
    print("Loop:",t2.timeit())

#输出的结果：
Recursion: 4.85854022501735
Loop: 3.470960856997408

结论：循环效率 > 递归效率

三、Bisect模块：

Python 有一个 bisect 模块，用于维护有序列表。bisect 模块实现了一个算法用于插入元素到有序列表。在一些情况下，这比反复排序列表或构造一个大的列表再排序的效率更高。

Bisect 是二分法的意思，这里使用二分法来排序，它会将一个元素插入到一个有序列表的合适位置，这使得不需要每次调用 sort 的方式维护有序列表。

下面是一个简单的使用示例：

import bisect
import random

random.seed(1)#使得随机数据可预测，即只要seed的值一样，后续生成的随机数都一样。

print('New  Pos Contents')
print('---  --- --------')

l = []
for i in range(1, 15):
    r = random.randint(1, 100)
    position = bisect.bisect(l, r)
    bisect.insort(l, r)
    print('%3d  %3d' % (r, position), l)

输出结果如下：

New  Pos Contents
---  --- --------
 18    0 [18]
 73    1 [18, 73]
 98    2 [18, 73, 98]
  9    0 [9, 18, 73, 98]
 33    2 [9, 18, 33, 73, 98]
 16    1 [9, 16, 18, 33, 73, 98]
 64    4 [9, 16, 18, 33, 64, 73, 98]
 98    7 [9, 16, 18, 33, 64, 73, 98, 98]
 58    4 [9, 16, 18, 33, 58, 64, 73, 98, 98]
 61    5 [9, 16, 18, 33, 58, 61, 64, 73, 98, 98]
 84    8 [9, 16, 18, 33, 58, 61, 64, 73, 84, 98, 98]
 49    4 [9, 16, 18, 33, 49, 58, 61, 64, 73, 84, 98, 98]
 27    3 [9, 16, 18, 27, 33, 49, 58, 61, 64, 73, 84, 98, 98]
 13    1 [9, 13, 16, 18, 27, 33, 49, 58, 61, 64, 73, 84, 98, 98]

Bisect模块提供的函数有：

3.1、bisect.bisect_left(a,x, lo=0, hi=len(a)) :

查找在有序列表 a 中插入 x 的index；如果a中已经有x,则返回在列表中已有的x的索引值
lo 和 hi 用于指定列表的区间，默认是使用整个列表。如果 x 已经存在，在其左边插入。返回值为准备插入的 index。

import bisect
data=[1,2,4,5,6]
print(bisect.bisect_left(data,4))

#输出结果
2

3.2、bisect.bisect_right(a,x, lo=0, hi=len(a))

查找在有序列表 a 中插入 x 的index；如果a中已经有x,则返回在列表中已有的x的右边的索引值
lo 和 hi 用于指定列表的区间，默认是使用整个列表。

import bisect
data=[1,2,4,5,6]
print(bisect.bisect_right(data,4))

#输出结果
3

3.3、bisect.bisect(a, x,lo=0, hi=len(a)) ：

这个函数和 bisect_left 类似，但如果 x 已经存在，返回x右边的索引值。

import bisect
data=[1,2,4,5,6]
print(bisect.bisect(data,4))

#输出结果
3

3.4、bisect.insort_left(a,x, lo=0, hi=len(a))

在有序列表 a 中插入 x，如果x存在则在原有x的左边插入。自动会排序插入，不会改变原有值得顺序

import bisect
data=[1,2,4,5,6]
bisect.insort_left(data,3)

print(data)
#输出结果
[1, 2, 3, 4, 5, 6]

3.5、bisect.insort_right(a,x, lo=0, hi=len(a))

在有序列表 a 中插入 x，如果x存在则在原有x的右边插入。自动会排序插入，不会改变原有值得顺序

import bisect
data=[1,2,4,5,6]
bisect.insort_right(data,3)

print(data)
#输出结果
[1, 2, 3, 4, 5, 6]

3.6、bisect.insort(a, x,lo=0, hi=len(a))

a中插入x,自动会排序插入，如果x存在，则会在x的右边插入。自动会排序插入，不会改变原有值得顺序

import bisect
data=[1,2,4,5,6]
bisect.insort(data,3)

print(data)
#输出结果
[1, 2, 3, 4, 5, 6]

Bisect 模块提供的函数可以分两类：

bisect* 只用于查找 index，不进行实际的插入
insort* 则用于实际插入。该模块比较典型的应用是计算分数等级：

Bisect计算分数等级

def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
    i = bisect.bisect(breakpoints, score)
    return grades[i]

print([grade(score)）
for score in [33, 99, 77, 70, 89, 90, 100]])

#输出结果
['F', 'A', 'C', 'C', 'B', 'A', 'A']

Bisect 实现二分查找

def binary_search_bisect(lst, x):
    from bisect import bisect_left
    i = bisect_left(lst, x)
    if i != len(lst) and lst[i] == x:
#解释条件：
#1. i != len(lst)：说明i在lst值的大小范围内
#2. lst[i]==x：说明x在lst中
        return i
    return None

四、性能测试

我们再来测试一下它与递归和循环实现的二分查找的性能：

if __name__ == "__main__":

    lst = [i for i in range(0, 10000)]
    lst.sort()

    def test_recursion():
        binary_search_recursion(lst,999,0,len(lst) -1)

    def test_loop():
        binary_search_loop(lst,999)

    def bisect_loop():
        binary_search_bisect(lst,99)

    import timeit
    t1 = timeit.Timer("test_recursion()",setup="from __main__ import test_recursion")
    t2 = timeit.Timer("test_loop()",setup="from __main__ import test_loop")
    t3 = timeit.Timer("bisect_loop()",setup="from __main__ import bisect_loop")

执行结果为：

Recursion: 4.687202031025663
Loop: 3.2781201470061205
Bisect: 2.4849945579771884

可以看到其比循环实现略快，比递归实现差不多要快一半。

结论：Bisect > 循环 > 递归

结论分析：

1、今天掌握了二分法循环、递归实现，但是bisect还是不太理解，不可否认的是bisect比之前两个方法效率更高。

2、bisect模块的函数用法于2018.04.13学会。

新学知识点

1、random.seed()

import random
random.seed ( [x] )

注意：seed(()是不能直接访问的，需要导入 random 模块，然后通过 random 静态对象调用该方法。

random.seed(n)的用法

import random

random.seed( 10 )
print "Random number with seed 10 : ", random.random()

# 生成同一个随机数
random.seed( 10 )
print "Random number with seed 10 : ", random.random()

# 生成同一个随机数
random.seed( 10 )
print "Random number with seed 10 : ", random.random()

#输出结果
Random number with seed 10 :  0.57140259469
Random number with seed 10 :  0.57140259469
Random number with seed 10 :  0.57140259469

Mark on 2018.04.10

Get Bisect on 2018.04.13