Python Day09

一、进程与线程

1.什么是进程（process）？

程序并不能单独运行，只有将程序装载到内存中，系统为它分配资源才能运行，而这种执行的程序就称之为进程。程序和进程的区别就在于：程序是指令的集合，它是进程运行的静态描述文本；进程是程序的一次执行活动，属于动态概念。

在多道编程中，我们允许多个程序同时加载到内存中，在操作系统的调度下，可以实现并发地执行。这是这样的设计，大大提高了CPU的利用率。进程的出现让每个用户感觉到自己独享CPU，因此，进程就是为了在CPU上实现多道编程而提出的。

注：进程就是一堆资源的集合

2.什么是线程（thread）？

线程是操作系统能够进行运算调度的最小单位。它被包含在进程之中，是进程中的实际运作单位。一条线程指的是进程中一个单一顺序的控制流，一个进程中可以并发多个线程，每条线程并行执行不同的任务

注：线程是操作系统最小的调度单位，是一串指令的集合

TIp1：

　　进程要操作cpu , 必须要先创建一个线程，

　　all the threads in a process have the same view of the memory

　　所有在同一个进程里的线程是共享同一块内存空间的

TIp2：

　　1.进程和线程没有可比性，进程是一堆资源的集合，而线程是一堆指令的集合

　　2.启动一个线程比启动一个进程快

3.进程与线程的区别？

Threads share the address space of the process that created it; processes have their own address space.

线程共享内存空间，进程的内存是独立的

Threads have direct access to the data segment of its process; processes have their own copy of the data segment of the parent process.

线程可以直接访问进程的数据段；进程有自己的父进程的数据段的副本。

Threads can directly communicate with other threads of its process; processes must use interprocess communication to communicate with sibling processes.

同一个进程的线程之间可以直接交流，两个进程想通信，必须通过一个中间代理来实现

New threads are easily created; new processes require duplication of the parent process.

创建新线程很简单，创建新进程需要对其父进程进行一次克隆

Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.

一个线程可以控制和操作同一进程里的其他线程，但是进程只能操作子进程

Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process does not affect child processes.

主线程（取消、优先级更改等）的更改可能会影响进程的其他线程的行为；父进程的更改不会影响子进程。

二、线程

1.直接调用

 1 import threading
 2 import time
 3 
 4 def run(x):
 5     print("task:", x)
 6     time.sleep(2)
 7 
 8 t1 = threading.Thread(target=run, args=("t1",))
 9 t2 = threading.Thread(target=run, args=("t2",))
10 t1.start()
11 t2.start()

2.继承式调用

 1 import threading
 2 import time
 3 
 4 class MyThread(threading.Thread):
 5 
 6     def __init__(self, n):
 7         super(MyThread, self).__init__()
 8         self.n = n
 9 
10     def run(self):
11         print("running task:", self.n)
12         time.sleep(2)
13 
14 t1 = MyThread("t1")
15 t2 = MyThread("t2")
16 t1.start()
17 t2.start()

2.1join等待线程执行完毕

 1 import threading
 2 import time
 3 
 4 def run(x):
 5     print("running task:", x, threading.active_count())
 6     time.sleep(2)
 7     print("task%s done" % x, threading.current_thread())
 8 
 9 start_time = time.time()
10 t_objs = []
11 for i in range(50):
12     t = threading.Thread(target=run, args=("t%s" % i,))
13     t.start()
14     t_objs.append(t)
15 
16 for t in t_objs:
17     t.join()
18 
19 print("all threads done", threading.current_thread(), threading.active_count())
20 print("cost:", time.time() - start_time)

3.守护线程

 1 import threading
 2 import time
 3 
 4 def run(x):
 5     print("running task:", x, threading.active_count())
 6     time.sleep(2)
 7     print("task%s done" % x, threading.current_thread())
 8 
 9 start_time = time.time()
10 t_objs = []
11 for i in range(50):
12     t = threading.Thread(target=run, args=("t%s" % i,))
13     t.setDaemon(True)  # 把当前线程设置为守护线程
14     t.start()
15     t_objs.append(t)
16 
17 print("all threads done", threading.current_thread(), threading.active_count())
18 print("cost:", time.time() - start_time)

4.为什么有GIL锁

GIL：无论你启多少个线程，你有多少个cpu, Python在执行的时候会淡定的在同一时刻只允许一个线程运行

首先需要明确的一点是GIL并不是Python的特性，它是在实现Python解析器(CPython)时所引入的一个概念。就好比C++是一套语言（语法）标准，但是可以用不同的编译器来编译成可执行代码。有名的编译器例如GCC，INTEL C++，Visual C++等。Python也一样，同样一段代码可以通过CPython，PyPy，Psyco等不同的Python执行环境来执行。像其中的JPython就没有GIL。然而因为CPython是大部分环境下默认的Python执行环境。所以在很多人的概念里CPython就是Python，也就想当然的把GIL归结为Python语言的缺陷。所以这里要先明确一点：GIL并不是Python的特性，Python完全可以不依赖于GIL。

总结：

Python GIL其实是功能和性能之间权衡后的产物，它尤其存在的合理性，也有较难改变的客观因素。

– 因为GIL的存在，只有IO Bound场景下得多线程会得到较好的性能

– 如果对并行计算性能较高的程序可以考虑把核心部分也成C模块，或者索性用其他语言实现

– GIL在较长一段时间内将会继续存在，但是会不断对其进行改进

5.线程锁（互斥锁）

 1 import threading
 2 import time
 3 
 4 def change():
 5     lock.acquire()  # 加锁
 6     global num
 7     num += 1        # 加锁和释放锁之间要保证快速计算，避免占着茅坑不拉屎现象
 8     lock.release()  # 释放锁
 9     time.sleep(1)
10 
11 num = 0
12 lock = threading.Lock()  # 获取一把锁
13 
14 for i in range(50):
15     t = threading.Thread(target=change)
16     t.start()
17 
18 print(num)

为什么Python有GIL了，还需要互斥锁呢？这两个是不一样的，下面的图可以解释：

即：修改数据的代码执行到中途由于执行时间到了，被强制要求释放GIL，这时就会造成修改数据乱套。

既然用户程序已经自己有锁了，那为什么C python还需要GIL呢？加入GIL主要的原因是为了降低程序的开发的复杂度，比如现在的你写python不需要关心内存回收的问题，因为Python解释器帮你自动定期进行内存回收，你可以理解为python解释器里有一个独立的线程，每过一段时间它起wake up做一次全局轮询看看哪些内存数据是可以被清空的，此时你自己的程序里的线程和 py解释器自己的线程是并发运行的，假设你的线程删除了一个变量，py解释器的垃圾回收线程在清空这个变量的过程中的clearing时刻，可能一个其它线程正好又重新给这个还没来及得清空的内存空间赋值了，结果就有可能新赋值的数据被删除了，为了解决类似的问题，python解释器简单粗暴的加了锁，即当一个线程运行时，其它人都不能动，这样就解决了上述的问题，这可以说是Python早期版本的遗留问题。

6.递归锁

 1 import threading
 2 
 3 def run1():
 4     print("grab the first part data")
 5     lock.acquire()
 6     global num
 7     num += 1
 8     lock.release()
 9     return num
10 
11 def run2():
12     print("grab the second part data")
13     lock.acquire()
14     global num2
15     num2 += 1
16     lock.release()
17     return num2
18 
19 def run3():
20     lock.acquire()
21     res = run1()
22     print('--------between run1 and run2-----')
23     res2 = run2()
24     lock.release()
25     print(res, res2)
26 
27 if __name__ == '__main__':
28 
29     num, num2 = 0, 0
30     lock = threading.RLock()
31     for i in range(10):
32         t = threading.Thread(target=run3)
33         t.start()
34 
35 while threading.active_count() != 1:
36     print(threading.active_count())
37 else:
38     print('----all threads done---')
39     print(num, num2)

View Code

7.信号量（semaphore）

互斥锁同时只允许一个线程更改数据，而Semaphore是同时允许一定数量的线程更改数据，比如厕所有3个坑，那最多只允许3个人上厕所，后面的人只能等里面有人出来了才能再进去。

 1 import threading
 2 import time
 3 
 4 def run(n):
 5     semaphore.acquire()
 6     time.sleep(1)
 7     print("run the thread: %s
" % n)
 8     semaphore.release()
 9 
10 if __name__ == '__main__':
11 
12     semaphore = threading.BoundedSemaphore(5)  # 最多允许5个线程同时运行
13     for i in range(20):
14         t = threading.Thread(target=run, args=(i,))
15         t.start()
16 
17 while threading.active_count() != 1:
18     pass  # print threading.active_count()
19 else:
20     print('----all threads done---')

8.Timer

This class represents an action that should be run only after a certain amount of time has passed

这个类代表一个动作，只有在一定数量的时间通过后才能运行

Timers are started, as with threads, by calling their start() method. The timer can be stopped (before its action has begun) by calling thecancel() method. The interval the timer will wait before executing its action may not be exactly the same as the interval specified by the user.

定时器开始计时，如线程，调用start()方法。定时器可以停止（之前的行动已经开始，通过调用thecancel()方法）。计时器在执行其动作之前会等待的间隔可能与用户指定的时间间隔不完全相同。

1 def hello():
2     print("hello, world")
3 
4 t = Timer(30.0, hello)
5 t.start()  # after 30 seconds, "hello, world" will be printed

9.事件(Events)

An event is a simple synchronization object;

the event represents an internal flag, and threads

can wait for the flag to be set, or set or clear the flag themselves.

event = threading.Event()

# a client thread can wait for the flag to be set

event.wait()

# a server thread can set or reset it

event.set()

event.clear()

If the flag is set, the wait method doesn’t do anything.

If the flag is cleared, wait will block until it becomes set again.

Any number of threads may wait for the same event.

 1 import time
 2 import threading
 3 
 4 event = threading.Event()
 5 
 6 def lighter():
 7     count = 0
 8     event.set()
 9     while True:
10         if count <= 4:
11             print("33[42;1m绿灯33[0m")
12         elif 4 < count <= 9:
13             event.clear()
14             print("33[41;1m红灯33[0m")
15         elif count > 9:
16             event.set()
17             print("33[42;1m绿灯33[0m")
18             count = 0
19         time.sleep(1)
20         count += 1
21 
22 def car(name):
23     while True:
24         if event.is_set():
25             print("[%s] is running" % name)
26             time.sleep(1)
27         else:
28             print("33[31;1m红灯亮了33[0m")
29             print("[%s] is waiting" % name)
30             event.wait()
31 
32 l = threading.Thread(target=lighter)
33 l.start()
34 car1 = threading.Thread(target=car, args=("Tesla",))
35 car1.start()

10.队列(queue)

queue is especially useful in threaded programming when information must be exchanged safely between multiple threads.

class queue.Queue(maxsize=0) #先入先出

class queue.LifoQueue(maxsize=0) #last in fisrt out

class queue.PriorityQueue(maxsize=0) #存储数据时可设置优先级的队列

Constructor for a priority queue. maxsize is an integer that sets the upperbound limit on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If maxsize is less than or equal to zero, the queue size is infinite.

The lowest valued entries are retrieved first (the lowest valued entry is the one returned by sorted(list(entries))[0]). A typical pattern for entries is a tuple in the form: (priority_number, data).

exception queue.Empty

Exception raised when non-blocking get() (or get_nowait()) is called on a Queue object which is empty.

exception queue.Full

Exception raised when non-blocking put() (or put_nowait()) is called on a Queue object which is full.

Queue.qsize()

Queue.empty() #return True if empty

Queue.full() # return True if full

Queue.put(item, block=True, timeout=None)

Put item into the queue. If optional args block is true and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Full exception if no free slot was available within that time. Otherwise (block is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (timeout is ignored in that case).

Queue.put_nowait(item)

Equivalent to put(item, False).

Queue.get(block=True, timeout=None)

Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).

Queue.get_nowait()

Equivalent to get(False).

Two methods are offered to support tracking whether enqueued tasks have been fully processed by daemon consumer threads.

Queue.task_done()

Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.

If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).

Raises a ValueError if called more times than there were items placed in the queue.

Queue.join() block直到queue被消费完毕

 1 import queue
 2 
 3 q = queue.Queue()
 4 q.put(1)
 5 q.put(2)
 6 q.put(3)
 7 
 8 while True:
 9     if q.qsize() == 0:
10         break
11     print(q.get())
12 
13 print("".center(50, "="))
14 
15 q = queue.LifoQueue()
16 q.put(1)
17 q.put(2)
18 q.put(3)
19 
20 while True:
21     if q.qsize() == 0:
22         break
23     print(q.get())
24 
25 print("".center(50, "="))
26 
27 q = queue.PriorityQueue()
28 q.put((2, "Profhua"))
29 q.put((1, "Breakering"))
30 q.put((3, "Wolf"))
31 
32 while True:
33     if q.qsize() == 0:
34         break
35     print(q.get())

View Code

11.生产者消费者模型

在并发编程中使用生产者和消费者模式能够解决绝大多数并发问题。该模式通过平衡生产线程和消费线程的工作能力来提高程序的整体处理数据的速度。

为什么要使用生产者和消费者模式

在线程世界里，生产者就是生产数据的线程，消费者就是消费数据的线程。在多线程开发当中，如果生产者处理速度很快，而消费者处理速度很慢，那么生产者就必须等待消费者处理完，才能继续生产数据。同样的道理，如果消费者的处理能力大于生产者，那么消费者就必须等待生产者。为了解决这个问题于是引入了生产者和消费者模式。

什么是生产者消费者模式

生产者消费者模式是通过一个容器来解决生产者和消费者的强耦合问题。生产者和消费者彼此之间不直接通讯，而通过阻塞队列来进行通讯，所以生产者生产完数据之后不用等待消费者处理，直接扔给阻塞队列，消费者不找生产者要数据，而是直接从阻塞队列里取，阻塞队列就相当于一个缓冲区，平衡了生产者和消费者的处理能力。

 1 import time
 2 import queue
 3 import threading
 4 q = queue.Queue(maxsize=10)
 5 
 6 def producer(name):
 7     count = 1
 8     while True:
 9         q.put("骨头%s" % count)
10         print("%s生产了骨头" % name, count)
11         count += 1
12         time.sleep(0.1)
13 
14 def consumer(name):
15     while True:
16         if q.qsize() > 0:
17             print("%s 正在吃 %s" % (name, q.get()))
18             time.sleep(1)
19 
20 p = threading.Thread(target=producer, args=("Breakering",))
21 c = threading.Thread(target=consumer, args=("Dog",))
22 c1 = threading.Thread(target=consumer, args=("Dog1",))
23 
24 p.start()
25 c.start()
26 c1.start()

三、进程

1.介绍

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote(远程的) concurrency(并发), effectively(有效) side-stepping(绕过，PS：猜测) the Global Interpreter Lock (全局解释器锁) by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage(利用) multiple processors on a given machine. It runs on both Unix and Windows.

usage:

 1 import multiprocessing
 2 import time
 3 
 4 def run(x):
 5     print("task:", x)
 6     time.sleep(2)
 7 
 8 if __name__ == '__main__':
 9     for i in range(10):  # 启动了10个进程
10         t = multiprocessing.Process(target=run, args=(i,))
11         t.start()

To show the individual process IDs involved, here is an expanded example:

 1 from multiprocessing import Process
 2 import os
 3 
 4 def info(title):
 5     print(title)
 6     print('module name:', __name__)
 7     print('parent process:', os.getppid())
 8     print('process id:', os.getpid())
 9     print("

")
10 
11 def f(name):
12     info('33[31;1mfunction f33[0m')
13     print('hello', name)
14 
15 if __name__ == '__main__':
16     info('33[32;1mmain process line33[0m')
17     p = Process(target=f, args=('bob',))
18     p.start()
19     p.join()

注：子进程都是由父进程创建的

2.进程间通讯

不同进程间内存是不共享的，要想实现两个进程间的数据交换，可以用以下方法：

Queues

 1 from multiprocessing import Process, Queue
 2 
 3 def f(q):
 4     q.put([1, "a", "A"])
 5 
 6 if __name__ == '__main__':
 7     q = Queue()
 8     p = Process(target=f, args=(q,))
 9     p.start()
10     p.join()
11     print(q.get())

Pipes

The Pipe() function returns a pair of(一对) connection objects connected by a pipe which by default is duplex (two-way). For example:

 1 from multiprocessing import Process, Pipe
 2 
 3 def f(conn):
 4     conn.send([42, None, 'hello from child'])
 5     conn.send([42, None, 'hello from child2'])
 6     print("from parent:", conn.recv())
 7     conn.close()
 8 
 9 if __name__ == '__main__':
10     parent_conn, child_conn = Pipe()
11     p = Process(target=f, args=(child_conn,))
12     p.start()
13     print(parent_conn.recv())  # prints "[42, None, 'hello']"
14     print(parent_conn.recv())  # prints "[42, None, 'hello']"
15     parent_conn.send("张洋可好")  # prints "[42, None, 'hello']"
16     p.join()

The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.

注：这句话就是管道对象返回两个端口，如果同时读取和写入管道的同一端，则管道的数据可能会损坏。当然，同时使用管道的不同端口是不会有任何风险的。

Managers

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate(操纵) them using proxies(代理).

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array. For example:

 1 from multiprocessing import Process, Manager
 2 import os
 3 
 4 def f(d, l):
 5     d[os.getpid()] =os.getpid()
 6     l.append(os.getpid())
 7     print(l)
 8 
 9 if __name__ == '__main__':
10     with Manager() as manager:
11         d = manager.dict()  # {} # 生成一个字典，可在多个进程间共享和传递
12 
13         l = manager.list(range(5))  # 生成一个列表，可在多个进程间共享和传递
14         p_list = []
15         for i in range(10):
16             p = Process(target=f, args=(d, l))
17             p.start()
18             p_list.append(p)
19         for res in p_list:  # 等待结果
20             res.join()
21 
22         print(d)
23         print(l)

进程同步

Without using the lock output from the different processes is liable(有倾向的) to get all mixed up.

 1 from multiprocessing import Process, Lock
 2 
 3 def f(l, i):
 4     l.acquire()  # 锁上
 5     print('hello world', i)
 6     l.release()  # 释放锁
 7 
 8 if __name__ == '__main__':
 9     lock = Lock()  # 生成一个锁
10 
11     for num in range(100):
12         Process(target=f, args=(lock, num)).start()

3.进程池

进程池内部维护一个进程序列，当使用时，则去进程池中获取一个进程，如果进程池序列中没有可供使用的进进程，那么程序就会等待，直到进程池中有可用进程为止。

进程池中有两个方法：

apply
apply_async

 1 from multiprocessing import Process, Pool
 2 import time
 3 import os
 4 
 5 def Foo(i):
 6     time.sleep(2)
 7     print("in process", os.getpid())
 8     return i + 100
 9 
10 def Bar(arg):
11     print('-->exec done:', arg, os.getpid())
12 
13 if __name__ == '__main__':
14     # freeze_support()
15     pool = Pool(processes=3)  # 允许进程池同时放入5个进程
16     print("主进程", os.getpid())
17     for i in range(10):
18         pool.apply_async(func=Foo, args=(i,), callback=Bar)  # callback=回调
19         # pool.apply(func=Foo, args=(i,)) #串行
20         # pool.apply_async(func=Foo, args=(i,)) #串行
21     print('end')
22     pool.close()
23     pool.join()  # 进程池中进程执行完毕后再关闭，如果注释，那么程序直接关闭。.join()

四、总结

线程是执行的指令集，进程是资源的集合；
线程的启动速度要比进程的启动速度要快；
两个线程的执行速度是一样的；
进程与线程的运行速度是没有可比性的；
线程共享创建它的进程的内存空间，进程的内存是独立的；
两个线程共享的数据都是同一份数据，两个子进程的数据不是共享的，而且数据是独立的;
同一个进程的线程之间可以直接交流，同一个主进程的多个子进程之间是不可以进行交流，如果两个进程之间需要通信，就必须要通过一个中间代理来实现;
一个新的线程很容易被创建，一个新的进程创建需要对父进程进行一次克隆；
一个线程可以控制和操作同一个进程里的其他线程，线程与线程之间没有隶属关系，但是进程只能操作子进程；
改变主线程，有可能会影响到其他线程的行为，但是对于父进程的修改是不会影响子进程。