Threads - threading

In Python 3, multithreaded programming is done with the threading module.

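threading.current_thread()   # return the Thread object for the current thread
threading.enumerate()        # return a list of all threads that are alive, i.e. started but not yet finished; threads that have not started or have already terminated are excluded
threading.active_count()     # return the number of threads that are alive; same result as len(threading.enumerate())
threadingObj.run()           # the method that holds the thread's work; calling it directly just runs it in the current thread and does not start a new thread
threadingObj.start()         # start the thread's activity; a new thread is created and run() is called in it
threadingObj.join([timeout]) # wait until the thread terminates (optional timeout in seconds)
threadingObj.is_alive()      # return whether the thread is alive
threadingObj.name            # the thread's name; read the attribute to get it, assign to the attribute to change it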
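
To make the difference between run() and start() concrete, the small sketch below calls each of them once and records which thread actually executed the function. The names report and seen are illustrative only and are not part of the threading module.

```python
import threading

seen = []  # names of the threads that executed report()

def report():
    # Record the name of whichever thread is running this function.
    seen.append(threading.current_thread().name)

# run() executes the target in the calling thread; no new thread is created.
direct = threading.Thread(target=report, name='worker-direct')
direct.run()

# start() creates a new thread, which then calls run().
worker = threading.Thread(target=report, name='worker-started')
worker.start()
worker.join()  # wait until the new thread has finished

print(seen)
print(worker.is_alive())         # False: the thread has already finished
print(threading.active_count())  # counts only threads that are still alive
```

The first recorded name is that of the calling thread, not 'worker-direct', because run() never left the calling thread.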

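We create Thread objects and then start them. Each Thread object represents one thread, and each thread can work on a different task.

1. There are two ways to create a Thread object:

1) Create a Thread directly and pass a callable to its constructor. This callable is the function the thread runs to do its work.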

class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

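The most important constructor parameter is target: the callable assigned to it is what the thread executes. If target is left as None and run() is not overridden, the thread starts and exits without doing anything.

In the following example, the main thread and a child thread each print five times: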

import threading
import time

def test():
    for i in range(5):
        print('%s %d' % (threading.current_thread().name, i))
        time.sleep(1)

thread = threading.Thread(target=test, name='testThread')
thread.start()

for i in range(5):
    print('mainThread ', i)
    time.sleep(1)
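
The other constructor parameters can be sketched in the same way. The example below passes positional and keyword arguments to the target through args and kwargs and marks the thread as a daemon. The function greet and the list results are hypothetical names chosen for illustration.

```python
import threading

results = []

def greet(greeting, name, punctuation='.'):
    # Runs in the worker thread with the arguments given to Thread().
    results.append(greeting + ', ' + name + punctuation)

t = threading.Thread(
    target=greet,
    args=('Hello', 'world'),        # positional arguments for greet()
    kwargs={'punctuation': '!'},    # keyword arguments for greet()
    daemon=True,                    # a daemon thread does not keep the process alive at exit
)
t.start()
t.join()
print(results)  # ['Hello, world!']
```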

2) Write a custom class that inherits from Thread and override its run() method with the task code, then create an instance of that subclass.

import threading
import time

class TestThread(threading.Thread):
    def __init__(self,name=None):
        threading.Thread.__init__(self,name=name)

    def run(self):
        for i in range(5):
            print('%s %d' % (threading.current_thread().name, i))
            time.sleep(1)

thread = TestThread(name='TestThread')
thread.start()

for i in range(5):
    print('mainThread ', i)
    time.sleep(1)
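
One practical reason to subclass is that run() cannot return a value to the caller, so the result has to be stored somewhere. A minimal sketch, assuming a hypothetical SumThread class that keeps its result on the instance:

```python
import threading

class SumThread(threading.Thread):
    """Hypothetical Thread subclass that sums a list of numbers."""

    def __init__(self, numbers, name=None):
        super().__init__(name=name)
        self.numbers = numbers
        self.result = None  # filled in by run()

    def run(self):
        # run() cannot return a value to the caller, so store it on self.
        self.result = sum(self.numbers)

t = SumThread([1, 2, 3, 4], name='SumThread')
t.start()
t.join()          # after join() returns, run() has finished
print(t.result)   # 10
```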

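2. Mutex locks

If several threads modify the same data at the same time, the result can be unpredictable. To keep the data correct, the threads have to be synchronized.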

import threading

money = 10000
lock = threading.Lock()

def change_money(n):
    global money
    for i in range(50000):
        lock.acquire()
        try:
            money = money + n
            money = money - n
        finally:
            lock.release()

def test_lock():
    t1 = threading.Thread(target=change_money, args=(5,))
    t2 = threading.Thread(target=change_money, args=(8,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    # Without the lock, the interleaved reads and writes can leave money at a value other than 10000
    print(money)

test_lock()

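Using a Lock with a with statement guarantees mutual exclusion: only one thread at a time can execute the block inside the with statement. The with statement acquires the lock before the block runs and releases it when the block exits, even if the block raises an exception.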

import threading

class SharedCounter:
    def __init__(self, initial_value = 0):
        self._value = initial_value
        self._value_lock = threading.Lock()

    def incr(self,delta=1):
        with self._value_lock:
            self._value += delta

    def decr(self,delta=1):
        with self._value_lock:
            self._value -= delta
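
As a rough usage sketch, the counter can be shared by several threads and checked once they have all finished. The class is repeated in shortened form so that the block runs on its own; the value property and the worker function are additions for illustration and are not part of the class above.

```python
import threading

class SharedCounter:
    # Same idea as the class above; the value property is an addition for reading the result.
    def __init__(self, initial_value=0):
        self._value = initial_value
        self._value_lock = threading.Lock()

    def incr(self, delta=1):
        with self._value_lock:
            self._value += delta

    @property
    def value(self):
        with self._value_lock:
            return self._value

def worker(counter, times):
    for _ in range(times):
        counter.incr()

counter = SharedCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000
```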

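3. Communication between threads

When a program has several threads, they need a safe way to exchange information or data.

Probably the safest way to send data from one thread to another is a Queue from the queue library. Create a Queue object shared by the threads, which then use put() and get() to add items to the queue or remove them from it. For example: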

from queue import Queue
from threading import Thread

# A thread that produces data
def producer(out_q):
    while True:
        data = [1,2,3]
        out_q.put(data)

# A thread that consumes data
def consumer(in_q):
    while True:
        data = in_q.get()
        print(data)
        ...

# Create the shared queue and launch both threads
q = Queue()
t1 = Thread(target=consumer, args=(q,))
t2 = Thread(target=producer, args=(q,))
t1.start()
t2.start()

Queue objects already contain the necessary locking, so they can be used to share data safely between any number of threads.
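
The producer and consumer above loop forever. One common way to shut them down, sketched here under the assumption that a special sentinel object is acceptable as an end-of-data marker, is for the producer to put the sentinel on the queue when it is done. The names _SENTINEL, count and received are illustrative.

```python
from queue import Queue
from threading import Thread

_SENTINEL = object()  # hypothetical marker meaning "no more data"

def producer(out_q, count):
    for i in range(count):
        out_q.put(i)
    out_q.put(_SENTINEL)  # tell the consumer to stop

def consumer(in_q, received):
    while True:
        data = in_q.get()
        if data is _SENTINEL:
            break
        received.append(data)

q = Queue()
received = []
t1 = Thread(target=consumer, args=(q, received))
t2 = Thread(target=producer, args=(q, 5))
t1.start()
t2.start()
t2.join()
t1.join()
print(received)  # [0, 1, 2, 3, 4]
```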

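One thing to keep in mind with thread queues is that putting an item on a queue does not copy it. Communication between threads actually passes object references from one thread to another.

If you are concerned about shared state, it is best to pass only immutable data (such as integers, strings, or tuples) or a deep copy of the object. For example: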

out_q.put(copy.deepcopy(data))
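
The effect of passing a reference versus a deep copy can be seen without any threads at all. This is a small single-threaded sketch; the variable names shared and copied are illustrative.

```python
import copy
from queue import Queue

q = Queue()
data = [1, 2, 3]

q.put(data)                 # the queue stores a reference to the same list
q.put(copy.deepcopy(data))  # the queue stores an independent copy

data.append(4)              # the sender mutates the list after sending it

shared = q.get()
copied = q.get()
print(shared)  # [1, 2, 3, 4]
print(copied)  # [1, 2, 3]
```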

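4. Condition variables

Python provides the threading.Condition object for condition variables. Condition implements the __enter__ and __exit__ protocol, so it can be used as a context manager in a with statement.

The commonly used methods are: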

"""
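Block the calling thread until it is woken by a notify or until the timeout expires
(the timeout is optional, a float in seconds). wait() may only be called while the lock
is held; otherwise a RuntimeError is raised. wait() releases the lock while blocking and
re-acquires it before returning, whether it was woken by notify(), notify_all(), or a timeout.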
"""
wait([timeout])

"""
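Wake up threads that are waiting on this condition: one by default, and at most n.
notify() may only be called while the lock is held; otherwise a RuntimeError is raised.
notify() does not release the lock, so a woken thread continues only after the
notifying thread releases it.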
"""
notify(n=1)

"""
Wake up all threads waiting on this condition. Useful when many threads are in wait(),
although it is needed less often.
"""
notify_all()

Here is an example:

import threading,time
from random import randint

class Producer(threading.Thread):
    def run(self):
        global L
        while True:
            val = randint(0,100)
            # Hold the condition's lock while reading or changing the shared list
            with lock_con:
                print('producer produce ' + str(val), L)
                L.append(val)
                lock_con.notify()
            time.sleep(3)

class Consumer(threading.Thread):
    def run(self):
        global L
        while True:
            with lock_con:
                # Re-check in a loop: the list may still be empty after a wakeup
                while len(L)==0:
                    lock_con.wait()
                print('consumer consume ' + str(L[0]), L)
                del L[0]
            time.sleep(0.5)

if __name__ == '__main__':
    L=[]
    lock_con = threading.Condition()
    threads = []
    for i in range(2):
        threads.append(Producer())
    c = Consumer()
    for t in threads:
        t.start()
    c.start()

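5. The global interpreter lock (GIL)

Although Python fully supports multithreaded programming, parts of the interpreter's C implementation are not thread-safe when executed fully in parallel. The interpreter is therefore protected by a global interpreter lock (GIL), which ensures that only one Python thread executes at any given time. The biggest consequence of the GIL is that multithreaded Python programs cannot take advantage of multiple CPU cores (for example, a compute-intensive program that uses several threads still runs on only a single CPU).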

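It is worth stressing that the GIL only affects programs that are heavily CPU-bound (computation, for example). If your program mostly does I/O, such as network communication, threads are a good fit, because they spend most of their time waiting.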
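
The point about I/O-bound work can be illustrated with a rough timing sketch. Here time.sleep() stands in for a blocking network or disk call, and fake_io is a hypothetical helper; the measured time will vary from machine to machine.

```python
import threading
import time

def fake_io(delay):
    # time.sleep() stands in for a blocking network or disk call;
    # like real blocking I/O, it releases the GIL while waiting.
    time.sleep(delay)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Run one after another, the five waits would take about 1 second in total.
print('elapsed: %.2f s' % elapsed)
```

Because the five waits overlap, the total should be much closer to a single wait than to the sum of all five.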

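For CPU-bound programs, you need to understand the nature of the computation being done. For example, optimizing the underlying algorithm often gives a far larger speedup than adding threads. Similarly, because Python is interpreted, moving performance-critical code into a C extension module can speed it up considerably. If you work with arrays, an extension such as NumPy is very efficient.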

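There are two strategies for working around the limitations of the GIL:

1) Use the multiprocessing module to create processes. This works because each Python process has its own interpreter with its own independent GIL.

2) Use C extension programming. The main idea is to move the compute-intensive work into C, independent of Python, and to release the GIL in the C code while it is working.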
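
The first strategy might look roughly like the sketch below, which spreads a deliberately naive CPU-bound function over a pool of worker processes. The function count_primes and the chosen limits are illustrative assumptions. The __main__ guard matters because some platforms start worker processes by importing the main module again.

```python
import multiprocessing

def count_primes(limit):
    # Deliberately naive CPU-bound work: count the primes below limit.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == '__main__':
    limits = [20000, 20000, 20000, 20000]
    # Each worker process has its own interpreter and its own GIL.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)
    print(results)
```

Arguments and results are pickled and sent between processes, so this approach tends to pay off only when each task does enough computation to outweigh that overhead.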