Python Threads, Processes, and Coroutines

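What is a thread?

When a .py file runs, its threading model depends on how the program is written. If it creates no child threads, the whole program runs in a single main thread.

If the program has a main thread and also child threads, it is multithreaded.

Multithreading can improve the throughput of I/O-bound work.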

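What is a process?

A running .py file is a process, in the same way that QQ, 360, or a browser each run as a process.

Running multiple processes consumes considerably more resources than running multiple threads.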

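The GIL

GIL stands for Global Interpreter Lock.

What the GIL does: at any given moment, only one thread can execute Python bytecode in the interpreter.

For the people who develop the Python interpreter the GIL is a great convenience, but for the people who use the language it can feel like a bug. It is an implementation detail of CPython, not of the Python language itself.

Why call it a bug? Because with the GIL, threads doing CPU-bound work can only use one core at a time, no matter how many cores the machine has.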

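I/O-bound and CPU-bound work

What is I/O-bound?

Put simply, the work spends most of its time blocked. Suppose we create two threads, A and B. Thread A runs until it blocks; the CPU would otherwise sit idle, so execution switches to thread B.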

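import threading
import time

start = time.time()

def music():
    print('I am listening to music')
    time.sleep(2)  # simulates blocking I/O

def movie():
    print('I am watching TV')
    time.sleep(3)  # simulates blocking I/O

t1 = threading.Thread(target=music)
t2 = threading.Thread(target=movie)
t1.start()
t2.start()
# Wait for both threads; without join() the elapsed time would be close to 0.
t1.join()
t2.join()
end = time.time()
result = end - start
print(result)  # about 3 seconds rather than 5, because the sleeps overlap
print('main thread finished')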

What is CPU-bound?

The thread never blocks during its work; it just keeps computing.

def add():
    num = 0
    for i in range(1000000):
        num += i
    print(num)
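To see what CPU-bound means in practice, here is a rough sketch that runs the same counting loop twice in the main thread and then once in each of two threads. The names count_up and N are made up for this illustration. On a standard CPython build with the GIL, the threaded run is usually no faster than the sequential one; exact timings depend on the machine and interpreter build, so none are shown.

```python
import threading
import time

def count_up(n, out, index):
    """Pure computation: never blocks, so the GIL is the bottleneck."""
    total = 0
    for i in range(n):
        total += i
    out[index] = total

N = 200_000  # hypothetical workload size for this demo

# Sequential: run the function twice in the main thread.
seq = [None, None]
t0 = time.perf_counter()
count_up(N, seq, 0)
count_up(N, seq, 1)
sequential_time = time.perf_counter() - t0

# Threaded: run the same two calls in two threads.
par = [None, None]
threads = [threading.Thread(target=count_up, args=(N, par, i)) for i in range(2)]
t0 = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_time = time.perf_counter() - t0

print('sequential: %.3fs, threaded: %.3fs' % (sequential_time, threaded_time))
```

Both runs compute the same totals; only the elapsed time differs.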

How do you create multiple threads?

The threading module

Creating threads from functions

import threading
import time

start  = time.time()
def music():
    for i in range(3):
        print('I listening Music')
        time.sleep(1)

def movie():
    for i in range(2):
        print('I watching TV')
        time.sleep(5)

t1 = threading.Thread(target=music)  # create a child thread
t2 = threading.Thread(target=movie)  # create a child thread
threads = [t1, t2]

for t in threads:
    t.start()  # start the child thread

Creating threads from a class

import threading

class MyThread(threading.Thread):  # inherit from the Thread class
    def __init__(self, count):
        super().__init__()
        self.count = count

    def current_thread_count(self):
        print(self.count)

    def run(self):  # defines what each thread runs
        self.current_thread_count()

t1 = MyThread(threading.active_count())
t2 = MyThread(threading.active_count())
t1.start()  # start the thread
t2.start()

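The join() method

Suppose main thread A creates child thread B and then calls B.join(). Thread A waits at that call until thread B has finished, and only then continues. To wait for a thread, call the join method of that thread.

The timeout argument of join([timeout]) is optional and gives the maximum time, in seconds, to wait. If the thread is still running when the timeout expires, join() simply returns and the caller continues; the thread itself is not stopped and keeps running. If the thread finishes before the timeout, join() returns right away instead of waiting out the full timeout.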

import threading
import time

start  = time.time()
def music():
    for i in range(3):
        print('I listening Music')
        time.sleep(1)

def movie():
    for i in range(2):
        print('I watching TV')
        time.sleep(5)

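t1 = threading.Thread(target=music)  # create a child thread
t2 = threading.Thread(target=movie)  # create a child thread
threads = [t1, t2]

for t in threads:
    t.start()  # start the child thread
t.join()  # t still refers to the last thread assigned in the loop, which is t2
print('main thread finished')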
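The timeout behaviour of join() is easy to check with a short sketch. The function slow_task and the durations are invented for this illustration; the point is only that a timed-out join() returns while the thread keeps running.

```python
import threading
import time

def slow_task():
    time.sleep(0.5)  # stands in for a long-running job

worker = threading.Thread(target=slow_task)
worker.start()

worker.join(timeout=0.1)            # give up waiting after 0.1 s
still_running = worker.is_alive()   # the timeout did not stop the thread
print('alive after timed-out join:', still_running)

worker.join()                       # now wait without a limit
finished = not worker.is_alive()
print('finished after full join:', finished)
```

Because join() returns None either way, is_alive() is the way to tell whether the wait timed out.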

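The setDaemon() method

Suppose main thread A creates child thread B and calls B.setDaemon(True) in thread A. This marks child thread B as a daemon thread: when main thread A finishes, the program exits without waiting for B, whether or not B has completed. In that sense it is roughly the opposite of join(). Since Python 3.10, setDaemon() is deprecated in favor of setting the daemon attribute directly (B.daemon = True).

One thing needs special attention: the flag must be set before start() is called. A thread that is not a daemon keeps the program alive, and the program exits only after all non-daemon threads have finished.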

import threading
import time

start  = time.time()
def music():
    for i in range(3):
        print('I listening Music')
        time.sleep(1)

def movie():
    for i in range(2):
        print('I watching TV')
        time.sleep(5)

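t1 = threading.Thread(target=music)  # create a child thread
t2 = threading.Thread(target=movie)  # create a child thread
threads = [t1, t2]
t2.daemon = True  # mark t2 as a daemon thread (replaces the deprecated setDaemon)

for t in threads:
    t.start()  # start the child thread
print('main thread finished')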
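The rule that the daemon flag must be set before start() can be seen in a small sketch. The function background is a placeholder for this illustration; setting the flag on a thread that is already running raises RuntimeError.

```python
import threading
import time

def background():
    time.sleep(0.2)

t = threading.Thread(target=background)
print(t.daemon)      # threads created from the main thread are non-daemon by default

t.start()
try:
    t.daemon = True  # too late: the thread is already running
    error = None
except RuntimeError as exc:
    error = exc
print('error:', error)
t.join()
```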

Synchronization locks

Why do we need a lock?

Consider a program with several threads and one global variable that all of those threads modify.

import threading
import time

def sub():
    global number
    num = number
    time.sleep(0.1)
    number = num - 1

number = 10
threads = []

for i in range(10):
    t = threading.Thread(target=sub)
    t.start()
    threads.append(t)
for i in threads:
    i.join()
print(number) # 9

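The result is not what we wanted.

Why does this happen?

Each thread copies number into its local num and then sleeps. While it sleeps, execution switches to the other threads, so all ten threads read the value 10 before any of them writes. Each one then writes 10 - 1, and the final value is 9.

How do we fix it?

By adding a lock.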

import threading
import time

def sub():
    global number

    r.acquire()  # acquire the lock
    num = number
    time.sleep(0.1)
    number = num - 1
    r.release()  # release the lock

number = 10
threads = []
r = threading.Lock()

for i in range(10):
    t = threading.Thread(target=sub)
    t.start()
    threads.append(t)
for i in threads:
    i.join()
print(number) # 0
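The acquire()/release() pair can also be written with a with statement, which releases the lock even if the protected code raises. The following is a minimal sketch; counter, lock, and add_many are names chosen only for this example.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        with lock:          # acquire() on entry, release() on exit
            counter += 1    # the read-modify-write is now protected

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```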

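Taking a lock does not stop the CPU from switching threads. It only makes every other thread that tries to acquire the same lock wait until the lock is released.

While the lock is held, no other thread can enter the locked section, but threads that never request the lock keep running unaffected.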

import threading
import time

number = 10
threads = []
r = threading.Lock()

def sub():
    global number
    r.acquire()
    num = number
    time.sleep(0.1)
    number = num - 1
    r.release()

def music():
    time.sleep(0.5)
    print('Music')

t = threading.Thread(target=music)
t.start()
for i in range(10):
    t = threading.Thread(target=sub)
    t.start()
    threads.append(t)
for i in threads:
    i.join()
print(number)

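Recursive lock (RLock)

import threading

r = threading.Lock()

class MyThread(threading.Thread):
    def Thread_1(self):
        r.acquire()
        print('level 1', self.name)
        r.acquire()  # blocks forever: a plain Lock cannot be acquired twice
        print('level 2', self.name)
        r.release()
        r.release()

    def run(self):
        self.Thread_1()

for i in range(5):
    t = MyThread()
    t.start()

Deadlock: the code above hangs, because the thread that already holds the plain Lock blocks on its second acquire().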

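A recursive lock (RLock) is much like a Lock, but it keeps an acquisition count and remembers which thread owns it, so the owning thread can acquire it again. This resolves the deadlock.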

import threading

r = threading.RLock()

class MyThread(threading.Thread):
    def Thread_1(self):
        r.acquire()
        print('level 1', self.name)
        r.acquire()  # same thread, so the count goes to 2 instead of blocking
        print('level 2', self.name)
        r.release()
        r.release()

    def Thread_2(self):
        r.acquire()
        print('level 1', self.name)
        r.acquire()
        print('level 2', self.name)
        r.release()
        r.release()

    def run(self):
        self.Thread_1()
        self.Thread_2()

for i in range(5):
    t = MyThread()
    t.start()
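The difference between Lock and RLock can be checked without hanging the program by using a non-blocking acquire. This is a small sketch with made-up variable names, not part of the original examples.

```python
import threading

plain = threading.Lock()
plain.acquire()
second_plain = plain.acquire(blocking=False)  # False: already held, even by this thread
plain.release()

reentrant = threading.RLock()
reentrant.acquire()
second_rlock = reentrant.acquire(blocking=False)  # True: the owner may re-acquire
reentrant.release()  # one release() per successful acquire()
reentrant.release()

print(second_plain, second_rlock)  # False True
```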

Semaphore

A semaphore limits how many threads can be inside a section of code at the same time.

import threading
import time
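r = threading.Semaphore(2)  # create a semaphore; at most 2 threads may hold it at once

class MyThread(threading.Thread):
    def Thread_1(self):
        r.acquire()  # takes one slot (counter - 1); blocks while no slot is free
        print(self.name)
        time.sleep(2)
        r.release()  # gives the slot back (counter + 1)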
    def run(self):
        self.Thread_1()

for i in range(5):
    t = MyThread()
    t.start()
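To confirm that the semaphore really caps concurrency, the sketch below records the highest number of workers that were inside the protected section at once. The names slots, active, and max_active are assumptions made for this illustration.

```python
import threading
import time

slots = threading.Semaphore(2)
state_lock = threading.Lock()
active = 0
max_active = 0

def worker():
    global active, max_active
    with slots:                      # at most 2 workers get past this line
        with state_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.05)             # pretend to work
        with state_lock:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('peak concurrency:', max_active)
```

The peak never exceeds the value passed to Semaphore(), however many threads are started.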

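Condition variable (Condition)

wait(): called when the condition is not yet met; the thread releases the lock and blocks until it is notified.

notify(): called once the condition has been created; wakes up one thread from the waiting pool.

notify_all() (formerly notifyAll()): called once the condition has been created; wakes up every thread in the waiting pool.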

import threading
import time
import random

def producer():
    time.sleep(0.2)
    while True:
        with con_Lock:  # acquire on entry, release on exit
            r = random.randrange(0, 100)
            F.append(r)
            print(str(threading.current_thread()) + '--->' + str(r))
            con_Lock.notify()  # wake one waiting consumer
        time.sleep(3)

def consumer():
    while True:
        with con_Lock:
            while not F:  # re-check the condition after every wakeup
                print("Boss, hurry up, the buns are sold out")
                con_Lock.wait()  # releases the lock while waiting
            a = F.pop()
            print('Bun %s has been eaten' % a)
        time.sleep(0.5)  # sleep outside the lock so producers are not blocked

con_Lock = threading.Condition()
threads = []
F = []

for i in range(5):
    threads.append(producer)
threads.append(consumer)
for i in threads:
    t = threading.Thread(target=i)
    t.start()

import threading
import time

con = threading.Condition()

num = 0

# Producer
class Producer(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        global num
        # Acquire the underlying lock
        con.acquire()
        while True:
            print("Adding fish balls!")
            num += 1
            print("Fish balls in the hot pot: %s" % num)
            time.sleep(1)
            if num >= 5:
                print("The hot pot already holds 5 fish balls; no room for more!")
                # Wake up the waiting thread
                con.notify()  # tell the consumer to start eating
                # Wait to be notified
                con.wait()
        # Release the lock (never reached, because the loop runs forever)
        con.release()

# Consumer
class Consumers(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        global num
        con.acquire()
        while True:
            print("Start eating!")
            num -= 1
            print("Fish balls left in the hot pot: %s" % num)
            time.sleep(2)
            if num <= 0:
                print("The pot is empty, add more fish balls!")
                con.notify()  # wake up the other thread
                # Wait to be notified
                con.wait()
        con.release()

p = Producer()
c = Consumers()
p.start()
c.start()
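Both examples above loop forever. For experimenting, a bounded variant that terminates can be handy. The sketch below is one possible shape, with hypothetical names (items, consumed, TOTAL); it uses the with statement and re-checks the condition in a while loop before consuming.

```python
import threading

items = []                 # shared buffer
cond = threading.Condition()
consumed = []
TOTAL = 5                  # hypothetical number of items for this demo

def producer():
    for i in range(TOTAL):
        with cond:
            items.append(i)
            cond.notify()              # wake one waiting consumer

def consumer():
    for _ in range(TOTAL):
        with cond:
            while not items:           # guard against early or spurious wakeups
                cond.wait()
            consumed.append(items.pop(0))

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
p.join()
c.join()
print(consumed)  # [0, 1, 2, 3, 4]
```

Items are appended at the back and popped from the front, so the consumer sees them in production order regardless of how the two threads interleave.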

Event

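from threading import Event

An Event object e = Event() provides four methods:
- e.is_set() (older alias: isSet()): returns the state of the event flag
- e.wait(): blocks the calling thread while the flag is False
- e.set(): sets the flag to True; every thread blocked in wait() becomes ready and waits for the operating system to schedule it
- e.clear(): resets the flag to False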
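These four methods can be exercised in a few lines. The sketch below uses invented names (ready, opener) and short timeouts so that it finishes quickly; wait() returns True if the flag was set and False if the timeout expired first.

```python
import threading

ready = threading.Event()
print(ready.is_set())                 # False: a new Event starts cleared

timed_out = ready.wait(timeout=0.05)  # returns False because nobody called set()

def opener():
    ready.set()                       # flip the flag and release all waiters

threading.Thread(target=opener).start()
signalled = ready.wait(timeout=2)     # returns True once set() has run

ready.clear()                         # back to False; later wait() calls block again
print(timed_out, signalled, ready.is_set())  # False True False
```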

Examples

# Define two functions: one connects to the database,
# the other checks whether the database is available.
from threading import Thread, Event, current_thread
import time

e = Event()

def conn_mysql():
    '''Connect to the database.'''
    count = 1
    while not e.is_set():  # the check has not succeeded yet
        if count > 3:  # give up after more than 3 attempts
            raise ConnectionError('too many connection attempts')
        print('\033[45m%s attempt %s\033[0m' % (current_thread().name, count))
        e.wait(timeout=1)  # wait for the check, for at most 1 second
        count += 1
    print('\033[44m%s connecting...\033[0m' % current_thread().name)

def check_mysql():
    '''Check the database.'''
    print('\033[42m%s checking mysql...\033[0m' % current_thread().name)
    time.sleep(5)
    e.set()

if __name__ == '__main__':
    for i in range(3):  # three threads try to connect
        t = Thread(target=conn_mysql)
        t.start()
    t = Thread(target=check_mysql)
    t.start()

# Traffic light example
from threading import Thread, Event, current_thread
import time

e = Event()

def traffic_lights():
    '''Traffic light.'''
    time.sleep(5)
    e.set()  # the light turns green

def car():
    '''Car.'''
    print('\033[42m %s waiting for the green light\033[0m' % current_thread().name)
    e.wait()
    print('\033[44m %s driving through\033[0m' % current_thread().name)

if __name__ == '__main__':
    for i in range(10):
        t = Thread(target=car)  # 10 cars
        t.start()
    traffic_thread = Thread(target=traffic_lights)  # one traffic light
    traffic_thread.start()

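Queue

The Python queue module provides three kinds of queue, each with its own constructor:
1. A FIFO queue, first in first out: class queue.Queue(maxsize)
2. A LIFO queue, which behaves like a stack, last in first out: class queue.LifoQueue(maxsize)
3. A priority queue, where the entry with the lowest priority value comes out first: class queue.PriorityQueue(maxsize)

Commonly used methods (with q = queue.Queue()):
q.qsize() returns the approximate size of the queue
q.empty() returns True if the queue is empty, otherwise False
q.full() returns True if the queue is full, otherwise False; "full" is measured against maxsize
q.get([block[, timeout]]) removes and returns an item; timeout is how long to wait
q.get_nowait() is equivalent to q.get(False), a non-blocking get
q.put(item[, block[, timeout]]) puts an item into the queue; timeout is how long to wait
q.put_nowait(item) is equivalent to q.put(item, False)
q.task_done() signals, after a piece of work is finished, that a previously fetched item has been fully processed
q.join() blocks until every item that was put into the queue has been fetched and marked with task_done()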
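The ordering of the three queue types is easiest to see side by side. In this sketch the helper drain is a made-up convenience function; the same three values go into each queue and come out in a different order.

```python
import queue

fifo = queue.Queue()
lifo = queue.LifoQueue()
prio = queue.PriorityQueue()

for item in (3, 1, 2):
    fifo.put(item)
    lifo.put(item)
    prio.put(item)

def drain(q):
    """Remove every item without blocking and return them in order."""
    out = []
    while not q.empty():
        out.append(q.get_nowait())
    return out

fifo_order = drain(fifo)   # [3, 1, 2]: insertion order
lifo_order = drain(lifo)   # [2, 1, 3]: reverse insertion order
prio_order = drain(prio)   # [1, 2, 3]: smallest value first
print(fifo_order, lifo_order, prio_order)
```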

import threading
import queue
import random
q = queue.Queue()

def func(*args):
    q.put(args)
    print("--->%s"%threading.current_thread())

threads = []
for i in range(10):
    t = threading.Thread(target=func,args=(random.choice([1,453,65]),))
    threads.append(t)
    t.start()

for i in threads:
    i.join()

for i in range(q.qsize()):
    print(q.get())

Using q.join() and q.task_done()

import queue
import threading
import time
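q = queue.Queue(5)  # the number limits the queue length
def put():
    for i in range(100):
        q.put(i)
    # Blocks this thread until every item has been marked done. task_done()
    # must be called once per item fetched, otherwise 'ok' is never printed.
    q.join()
    print('ok')

def get():
    for i in range(100):
        print(q.get())
        q.task_done()  # must signal join() once for every item taken
    # q.task_done()  # calling it only once here would not work: join() relies on a counter
                     # that goes up by 1 for each put() and down by 1 for each task_done();
                     # join() returns only when the counter reaches 0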

t1=threading.Thread(target=put,args=())
t1.start()
t2=threading.Thread(target=get,args=())
t2.start()
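The same counter logic extends to several consumers. The following sketch is one common arrangement, with assumed names (tasks, results, worker) and a None sentinel chosen for this example to tell each worker to stop; every get(), including the one that receives the sentinel, is paired with a task_done().

```python
import queue
import threading

tasks = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        n = tasks.get()
        if n is None:           # sentinel: no more work for this worker
            tasks.task_done()
            break
        with results_lock:
            results.append(n * n)
        tasks.task_done()       # exactly one task_done() per get()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for n in range(10):
    tasks.put(n)
for _ in workers:
    tasks.put(None)             # one sentinel per worker

tasks.join()                    # returns once every put() item was marked done
for w in workers:
    w.join()
print(sorted(results))
```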

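How do you create multiple processes?

The multiprocessing module

Process

Constructor:

Process([group [, target [, name [, args [, kwargs]]]]])

  group: reserved for a thread-group-like feature that is not implemented; the library reference says it must be None;
  target: the callable to run;
  name: the process name;
  args/kwargs: the arguments to pass to the callable.

Instance methods:

  is_alive(): returns whether the process is running.

  join([timeout]): blocks the calling process until the process whose join() was called terminates, or until the optional timeout expires.

  start(): marks the process as ready; it then waits to be scheduled on a CPU.

  run(): called by start(). If no target was passed when the process was created, start() runs the default run() method.

  terminate(): stops the worker process immediately, whether or not its task has finished.

Attributes:

  authkey

  daemon: works like setDaemon() for threads.

  exitcode: None while the process is running; a value of -N means it was terminated by signal N.

  name: the process name.

  pid: the process ID.

Processes are created in the same two ways as threads.

Creating processes from functions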

from multiprocessing import Process
import os

def My_process():
    print(os.getpid())  # print this process's PID

process_num = []
if __name__ == "__main__":
    for i in range(3):
        p = Process(target=My_process, args=())
        p.start()
        process_num.append(p)
    for p in process_num:
        p.join()
    print('main process finished')

Creating processes from a class

from multiprocessing import Process
import time

class My_process(Process):  # inherit from Process
    def __init__(self, name):
        super().__init__()  # call the parent class's __init__ method
        self.name = name

    def run(self):
        print(self.name)

if __name__ == '__main__':
    for i in range(3):
        p = My_process(str(i))
        p.daemon = True  # mark as a daemon process
        p.start()
    time.sleep(0.5)
    print('main process finished')

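Note on starting processes:

On Windows, processes must be started under if __name__ == "__main__":, because each child process re-imports the main module.

On Linux with the fork start method, the guard is not strictly required, although keeping it makes the code portable.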

Communication between processes

Queue

multiprocessing.Queue is used in much the same way as the queue used with threading:

from multiprocessing import Process, Queue

q = Queue()  # create the queue

def multi_process(i,q):
    q.put(i)

process_obj = []

def run():
    for i in range(3):
        p = Process(target=multi_process,args=(i,q))
        p.start()
        process_obj.append(p)
    for p in process_obj:
        p.join()
if __name__ == "__main__":
    run()
    for i in range(3):
        print(q.get())

Pipe

The Pipe() function returns a pair of connection objects connected by a pipe which by default is duplex (two-way). For example:

import os
from multiprocessing import Process, Pipe

def multi_process(port):
    pid = os.getpid()
    print(port.recv())
    port.send('My PID is: %s' % pid)
    port.close()

process_obj = []

def run():
    for i in range(3):
        # One pipe per child: sharing a single end between several
        # processes risks corrupted data and a join() that never returns.
        parent_port, child_port = Pipe()
        p = Process(target=multi_process, args=(child_port,))
        p.start()
        process_obj.append((p, parent_port))
    for p, parent_port in process_obj:
        parent_port.send('What is your PID?')
        print(parent_port.recv())
        p.join()

if __name__ == "__main__":
    run()

The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.

Manager

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array. For example,

import os
from multiprocessing import Process,Manager

def multi_process(shared_list, shared_dict):
    pid = os.getpid()
    shared_list.append(pid)  # changes go through the proxy to the manager process
    shared_dict[pid] = pid

def start(*args):
    processes = []
    for i in range(3):
        p = Process(target=multi_process, args=args)
        p.start()
        processes.append(p)
    return processes

if __name__ == "__main__":
    with Manager() as manager:
        l = manager.list()
        d = manager.dict()
        for p in start(l, d):
            p.join()  # wait for every child, not only the last one
        print(l)
        print(d)

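Process pools

Use Pool from multiprocessing to create a process pool.
The number of worker processes in the pool defaults to the number of CPUs.
Submitting tasks (a multiprocessing.Queue cannot be passed to a pool task as an argument; pass a pipe connection instead):
- apply: submits synchronously, blocks, and returns the result directly
- apply_async: submits asynchronously and returns an object; the return value is obtained through that object by calling its get() method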

from multiprocessing import Pool
import time

def Foo(i):
    time.sleep(2)
    return i + 100

def Back(arg):
    print('--->:', arg)

if __name__ == '__main__':
    pool = Pool(5)

    for i in range(10):
        pool.apply_async(func=Foo, args=(i,), callback=Back)
        # pool.apply(func=Foo, args=(i,))

    print('end')
    pool.close()
    pool.join()

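Coroutines

A coroutine is also known as a micro-thread or fiber. In one sentence: a coroutine is a lightweight thread that lives in user space.

A coroutine has its own register context and stack. When the scheduler switches away from a coroutine, its register context and stack are saved elsewhere; when it switches back, the saved register context and stack are restored. Therefore:

A coroutine keeps the state of its previous invocation (that is, a particular combination of all its local state). Each time it is re-entered, it resumes in the state of that previous invocation; in other words, it resumes at the point in the logical flow where it last left off.

Advantages of coroutines:

- No thread context-switching overhead.
- No overhead for locking and synchronizing atomic operations. An atomic operation needs no synchronization: it is an operation that the thread scheduler cannot interrupt. Once it starts, it runs to completion with no context switch (switch to another thread) in between. An atomic operation may be a single step or several steps, but the order of the steps cannot be rearranged and it cannot be cut short so that only part of it runs. Being treated as one indivisible whole is the core of atomicity.
- Switching control flow is easy, which simplifies the programming model.
- High concurrency, high scalability, and low cost: one CPU can comfortably support tens of thousands of coroutines, so they are well suited to highly concurrent workloads.

Disadvantages:

- No use of multiple cores: a coroutine is essentially single-threaded, so it cannot use several cores of a CPU at once. Coroutines must be combined with processes to run on multiple CPUs. Most everyday applications do not need this unless they are CPU-bound.
- A blocking operation (such as I/O) blocks the whole program.

The four elements of coroutines:

- Concurrency must be achieved within a single thread.
- Modifying shared data requires no locks.
- The user program itself keeps the context stacks of its multiple control flows.
- When a coroutine hits an I/O operation, it automatically switches to another coroutine.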

yield

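import time
def func1():
    for i in range(11):
        yield
        print('func1: print number %s' % i)
        time.sleep(1)

def func2():
    g = func1()
    next(g)
    for k in range(10):
        print('func2: print number %s' % k)
        time.sleep(1)
        next(g)

# Without yield, everything in func1 would have to finish before func2 could run.
# With yield, the two tasks switch back and forth and each keeps its state.
func1()  # only creates a generator object; nothing runs until next() is called on it
func2()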

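However, this is not yet a real coroutine: every switch is written by hand, and time.sleep() still blocks both tasks.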
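Generators can also receive values through send(), which is what makes yield usable as a two-way switch point. The sketch below is a small illustration with a made-up averager function: each send() resumes the generator exactly where it paused, with its local state intact.

```python
def averager():
    """Generator-based coroutine: receives numbers, yields the running average."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # pause here; resume when the caller sends a value
        total += value
        count += 1
        average = total / count

avg = averager()
next(avg)                 # prime the generator: run it up to the first yield
first = avg.send(10)      # 10.0
second = avg.send(20)     # 15.0
third = avg.send(30)      # 20.0
avg.close()               # stop the generator
print(first, second, third)
```

The switching is still manual here, which is the gap that greenlet and gevent address.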

greenlet

import time
from greenlet import greenlet

def func1():
    print("func1 ing")
    gr2.switch()
    time.sleep(1)
    print("func1 ok")

def func2():
    print("func2 ing")
    gr1.switch()
    print("func2 ok")

gr1 = greenlet(func1)
gr2 = greenlet(func2)

gr1.switch()

print("end")

It is used much like yield but is simpler. However, greenlet still does not switch automatically; you have to call switch() by hand.

gevent

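gevent is a third-party library that makes concurrent synchronous or asynchronous programming easy. The main pattern used in gevent is the Greenlet, a lightweight coroutine provided to Python as a C extension module. All Greenlets run inside the operating-system process of the main program, but they are scheduled cooperatively.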

import gevent

def func1():
    print("func1 ing")
    gevent.sleep(1)
    print("func1 ok")

def func2():
    print("func2 ing")
    print("func2 ok")

g1 = gevent.spawn(func1)
g2 = gevent.spawn(func2)
g1.join()
g2.join()

print("end")

Performance difference between synchronous and asynchronous execution

import time

def func1():
    print("func1 ing")
    time.sleep(1)
    print("func1 ok")

def func2():
    print("func2 ing")
    print("func2 ok")

# Synchronous version: func2 cannot start until func1's sleep has finished.
func1()
func2()
print("end")

Tasks switch automatically when they block on I/O

from gevent import monkey
monkey.patch_all()

import gevent
from urllib.request import urlopen


def f(url):
    print('GET: %s' % url)
    resp = urlopen(url)
    data = resp.read()
    print('%d bytes received from %s.' % (len(data), url))


if __name__ == '__main__':
    gevent.joinall([
        gevent.spawn(f, 'https://www.python.org/'),
        gevent.spawn(f, 'https://www.baidu.com/'),
        gevent.spawn(f, 'https://github.com/'),
    ])

Getting a function's return value with get()

from gevent import monkey
monkey.patch_all()

import requests
import gevent

response = lambda url: "from url:%s code:%s" % (url, requests.get(url).status_code)

req_all = gevent.joinall([gevent.spawn(response,"https://www.baidu.com"),
                gevent.spawn(response,"https://www.github.com"),
                gevent.spawn(response,"https://www.python.org")
                ])

for i in req_all:
    print(i.get())
