Asynchronous Programming Revisited: Multithreading

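Multithreading in Python uses the threading module from the standard library.

Threads were already covered in the operating systems material, so I won't go over them again. Let's go straight to code.

How do we create a thread?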

import threading
def fun():
    pass
t1 = threading.Thread(target=fun)
print(t1)

Output:

<Thread(Thread-1, initial)>

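As you can see, creating a thread does not require a special function the way a coroutine requires a coroutine function. fun is just an ordinary function.

The t1 that gets printed is a Thread object. Thread-1 is the thread's name, and initial is its state: the thread has not been started yet, so it is still in its initial state.

The thread is created by calling the threading.Thread class. Its constructor takes quite a few parameters, so here is a brief overview.

class Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

group is rarely used and should be left as None, so I'll skip it. target is the function the thread will run; pass the function object itself (its name without parentheses). name gives the thread a name, normally a string. args holds the positional arguments for target and is normally given as a tuple (a single argument needs a trailing comma, as in (18,)). kwargs holds the keyword arguments for target, as a dict. daemon is a boolean, True or False, that controls whether the thread is a daemon thread; we'll cover it in detail later.

Now let's flesh out the code above.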

import threading
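def fun(age,**kwargs):
    print('My name is ',kwargs['name'],', and I am ',age,' years old',sep='')
t1 = threading.Thread(target=fun,args=(18,),name = 't1',kwargs={'name':'Zhang San'})
t1.start()
print(t1)

Output:

My name is Zhang San, and I am 18 years old
<Thread(t1, stopped 14028)>

As you can see, the thread's name has been changed to t1, and its state is already stopped: fun had finished by the time print(t1) ran. The number after stopped is the thread's identifier and will differ from run to run.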
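The states shown in the printed Thread objects can also be checked in code. The following is a minimal sketch, not taken from the original post: the names work, demo, and states are made up for illustration, and the 0.2 second sleep is only there to keep the thread alive long enough to observe it.

```python
import threading
import time

def work():
    # Simulate a short task so the thread stays alive for a moment.
    time.sleep(0.2)

t = threading.Thread(target=work, name='demo')

states = []
states.append(t.is_alive())   # created but not started yet
t.start()
states.append(t.is_alive())   # started, work() is still sleeping
t.join()
states.append(t.is_alive())   # work() has returned

print(states)
print(t.name)
```

is_alive() is False before start(), True while the target function is running, and False again once it has returned, which matches the initial, started, and stopped labels in the printed objects.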

So what is the point of learning threads? To run tasks concurrently.

Example:

import threading
import time
def fun():
    print('hello')
    time.sleep(2)
    print('world')
def fun1():
    print('how are you?')
    time.sleep(2)
    print('fine')
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()

Output:

hello
how are you?
world
fine

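Good, the result looks like what we wanted: the two functions interleave. Watching the output, the whole program appears to take about 2 seconds.

Let's have the program measure the time:

Example: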

import threading
import time
def fun():
    print('hello')
    time.sleep(2)
    print('world')
def fun1():
    print('how are you?')
    time.sleep(2)
    print('fine')
start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()
end_time = time.time()
print(end_time-start_time)

Output:

hello
how are you?
0.001026153564453125
world
fine

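Oops, this is not the output we wanted. The elapsed time is printed in the wrong place, and its value is wrong too. Clearly, the line that prints the time ran right after the threads were started, without waiting for them. Why?

Let's add one line that prints threading.enumerate(). This function returns a list of all Thread objects that are currently alive, so we can see how many threads there are, along with their names and states.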

import threading
import time
def fun():
    print('hello')
    time.sleep(2)
    print('world')
def fun1():
    print('how are you?')
    time.sleep(2)
    print('fine')
#start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()
# end_time = time.time()
# print(end_time-start_time)
print(threading.enumerate())

Output:

hello
how are you?
[<_MainThread(MainThread, started 2404)>, <Thread(t1, started 4660)>, <Thread(t2, started 2952)>]
fine
world

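It turns out there are three threads, not two: besides t1 and t2 there is also the main thread. In the code above, the main thread, t1, and t2 all run at the same time. So it is not the case, as we assumed, that t1 and t2 finish first and the timing code runs afterwards. How do we make the main thread wait until t1 and t2 are done? Two more lines are enough: call join() on each thread.

Example: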

import threading
import time
def fun():
    print('hello')
    time.sleep(2)
    print('world')
def fun1():
    print('how are you?')
    time.sleep(2)
    print('fine')
start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()
t1.join()
t2.join()
end_time = time.time()
print(end_time-start_time)

Output:

hello
how are you?
fine
world
2.0018341541290283

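Now it works. So what does Thread.join() do? Put simply, it makes the main thread wait for the child thread to finish before going on. More precisely, the thread that calls join() blocks until the joined thread terminates, and then resumes. join() also takes a timeout parameter, in seconds: the caller blocks for at most that long, and if the child thread has still not finished, it stops waiting and carries on. join() always returns None, so call is_alive() afterwards to find out whether the thread actually finished.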

Output when a timeout is passed to each join() call (here the two waits total about 0.7 seconds):

hello
how are you?
0.716256856918335
worldfine
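The code that produced the output above is not shown in the post. The sketch below illustrates the same timeout behaviour under my own assumptions: a single thread, a 1 second task, and a 0.2 second timeout, all chosen only to keep the example short. The names slow, result, waited, and still_running are hypothetical.

```python
import threading
import time

def slow():
    time.sleep(1)  # stands in for a task that outlasts the timeout

t = threading.Thread(target=slow)
start = time.time()
t.start()

result = t.join(timeout=0.2)   # give up waiting after about 0.2 s
waited = time.time() - start
still_running = t.is_alive()   # join() returns None either way, so check explicitly

print(result, still_running, round(waited, 1))

t.join()                       # now wait until the thread has really finished
finished = not t.is_alive()
```

Because join() gives no indication of whether it timed out, the is_alive() check is the only way to tell a finished thread from one that is still running.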

====================================================================================================================================================================================================

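Next topic: does running threads concurrently really save time?

Let's look at two examples: one with two CPU-bound threads, and one with two mostly I/O-bound threads.

CPU-bound: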

import threading
import time
def fun():
    for i in range(10000000):
        sum = i*i*i*i
def fun1():
    for i in range(10000000):
        sum = i*i*i
start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()
t1.join()
t2.join()
end_time = time.time()
print('time',end_time-start_time)

Output:

time 2.053481340408325

For comparison, the same work run sequentially, without threads:

import time
def fun():
    for i in range(10000000):
        sum = i*i*i*i
def fun1():
    for i in range(10000000):
        sum = i*i*i
start_time = time.time()
fun()
fun1()
end_time = time.time()
print('time',end_time-start_time)

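Output:

time 2.016674518585205

As we can see, the run time is about the same with and without threads, and the threaded version is even slightly slower, because switching between threads has its own overhead.

I/O-bound: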

import threading
import time
def fun():
    time.sleep(1)
def fun1():
    time.sleep(1)
start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()
t1.join()
t2.join()
end_time = time.time()
print('time',end_time-start_time)

Output:

time 1.0026021003723145

For comparison, the same work run sequentially, without threads:

import time
def fun():
    time.sleep(1)
def fun1():
    time.sleep(1)
start_time = time.time()
fun()
fun1()
end_time = time.time()
print('time',end_time-start_time)

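Output:

time 2.0009641647338867

As we can see, for I/O-bound tasks the threaded version really is about twice as fast. In this example time.sleep() stands in for I/O. During real I/O the CPU does not have to do the work itself, and during time.sleep() the CPU is idle as well, so the two behave the same way.

So here is my conclusion: if the threads are mainly CPU-bound, the run time does not shrink and may even grow slightly; if they are mainly I/O-bound, the run time drops a lot, in this case by half. Why? Because Python (specifically CPython) has the GIL, the global interpreter lock. Put simply, even on a machine with multiple CPUs, only one thread can execute Python code at any given moment. A thread that is blocked on I/O or in time.sleep() releases the GIL, which lets the other threads run. That is also why multithreading is used in web crawlers and does improve their throughput to some extent: crawling is essentially downloading data, which is I/O work.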
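To make the crawler case concrete, here is a rough sketch of several simulated downloads running in threads. It uses concurrent.futures.ThreadPoolExecutor from the standard library as a convenience wrapper around threads, which is my substitution and not something the post itself uses. fake_download and the page names are hypothetical placeholders, and time.sleep() again stands in for the network wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    # Hypothetical stand-in for a network request: sleeping releases the GIL
    # just as waiting on a socket would.
    time.sleep(0.3)
    return f'{url}: done'

urls = [f'page-{n}' for n in range(4)]   # placeholder names, not real URLs

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_download, urls))   # results keep input order
elapsed = time.time() - start

print(results)
print('time', elapsed)
```

Four sequential calls would take about 1.2 seconds. With four worker threads the waits overlap, so the total should be close to the time of a single call.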

====================================================================================================================================================================================================

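On to the next topic. When the Thread class was first introduced above, daemon threads were mentioned. So what is a daemon thread? Let's modify the earlier example again.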

import threading
import time
def fun():
    print('hello')
    time.sleep(3)
    print('world')
def fun1():
    print('how are you?')
    time.sleep(3)
    print('fine')
start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1',daemon=True)
t2 = threading.Thread(target=fun1,name = 't2',daemon=True)
t1.start()
t2.start()
t1.join(timeout=0.3)
t2.join(timeout=0.4)
end_time = time.time()
print(end_time-start_time)

Output:

hello
how are you?
0.7178466320037842

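As you can see, the program ended before the final print in t1 and t2 could run. The only change in how t1 and t2 are created is the extra argument daemon=True. That is what makes them daemon threads.

To sum up: with daemon=False (the default for threads created from the main thread) the thread is a non-daemon thread, and with daemon=True it is a daemon thread. As the name suggests, a daemon thread lives and dies with the main program: the interpreter waits for non-daemon threads to finish before exiting, but when the main thread ends and only daemon threads remain, they are stopped along with it.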
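The daemon flag can be inspected through the thread's daemon attribute. The sketch below is an illustration only, with hypothetical names (noop, parent, inherited); it shows the explicit setting and also that a thread created without the argument inherits the flag from the thread that creates it.

```python
import threading

def noop():
    pass

inherited = []

def parent():
    # No daemon argument here: the new thread inherits the flag
    # from the thread that creates it.
    child = threading.Thread(target=noop)
    inherited.append(child.daemon)

plain = threading.Thread(target=noop)                  # inherits from the current thread
explicit = threading.Thread(target=noop, daemon=True)  # explicitly a daemon thread
outer = threading.Thread(target=parent, daemon=True)
outer.start()
outer.join()

print(plain.daemon, explicit.daemon, inherited[0])
```

When this runs from the main thread, which is not a daemon, plain.daemon is False, while the child created inside the daemon thread outer picks up True.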

====================================================================================================================================================================================================

Next topic: how to make two child threads take turns.

Example:

import threading
import time
# Define the functions that the threads will run
def ou():
    for i in range(0,10,2):
        print(i)
        time.sleep(0.5)
def ji():
    for i in range(1,10,2):
        print(i)
        time.sleep(0.5)
if __name__ == '__main__':
    th = threading.Thread(target=ji)
    th2 = threading.Thread(target=ou)
    th.start()
    th2.start()

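Output:

The output follows no fixed pattern. But suppose I want the two threads to strictly alternate, so that the odd and even numbers interleave. What then?

You can refer to my earlier examples.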
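The earlier examples the author refers to are not part of this post. One possible approach, which may differ from the author's, is to pass a turn back and forth with two threading.Semaphore objects. In this sketch the names even_turn, odd_turn, and output are my own, the sleeps are dropped, and the numbers are collected in a list instead of printed so the order is easy to check.

```python
import threading

output = []                          # collected here so the order is easy to inspect
even_turn = threading.Semaphore(1)   # the even thread is allowed to go first
odd_turn = threading.Semaphore(0)    # the odd thread waits until signalled

def ou():
    for i in range(0, 10, 2):
        even_turn.acquire()          # wait for my turn
        output.append(i)
        odd_turn.release()           # hand over to the odd thread

def ji():
    for i in range(1, 10, 2):
        odd_turn.acquire()           # wait for my turn
        output.append(i)
        even_turn.release()          # hand back to the even thread

th = threading.Thread(target=ji)
th2 = threading.Thread(target=ou)
th.start()
th2.start()
th.join()
th2.join()
print(output)
```

Each thread blocks until the other releases its semaphore, so the order no longer depends on the scheduler or on which thread was started first.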