Concurrent Programming: Multithreading

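1. What Is a Thread

In a traditional operating system, every process has its own address space and, by default, one thread of control.

A thread, as the name suggests, is like an assembly line at work (the line needs power, and the CPU plays the role of the power supply). An assembly line always belongs to a workshop. The work of a workshop is a process: the workshop brings resources together and is a unit of resources, and every workshop has at least one assembly line.

So a process merely gathers resources together (a process is just a resource unit, or a collection of resources), while a thread is the unit that actually executes on the CPU.

Multithreading (that is, multiple threads of control) means that one process contains several threads that all share the process's address space, just as one workshop can have several assembly lines that all use the workshop's resources. For example, the Beijing subway and the Shanghai subway are different processes, while Line 13 of the Beijing subway is a thread: all lines of the Beijing subway share all of its resources, for instance every passenger can ride any of its lines.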

2. Differences Between Threads and Processes

    1. Threads share the address space of the process that created them; processes have their own address space.

    2. Threads have direct access to the data segment of their process; processes have their own copy of the data segment of the parent process.

    3. Threads can communicate directly with other threads of their process; processes must use interprocess communication to communicate with sibling processes.

    4. New threads are easily created; new processes require duplication of the parent process.

    5. Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.

    6. Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.

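These differences come down to two key points, which are also the reasons we choose multithreading in certain scenarios:

Multiple threads in the same process share that process's address space and resources.
Creating a thread costs far less than creating a process. Creating a process is like building a new workshop: space has to be requested, and at least one assembly line has to be built inside it. Creating a thread only adds one more assembly line to an existing workshop, with no new space to request, so the overhead is small.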

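3. An Example of a Multithreaded Application

Suppose we open a word processor. That process certainly has to do more than one thing: listen for keyboard input, process the text, and periodically save the text to disk. All three tasks operate on the same piece of data, so they cannot be split across multiple processes; instead, the process runs three threads concurrently. With a single thread, the program could not process or auto-save the text while handling keyboard input, and could not accept input or process text while saving.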

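4. Introduction to the threading Module

The multiprocessing module was modeled on the interface of the threading module, so the two are used in very similar ways; we therefore won't go over the interface in detail again.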

5. Two Ways to Start a Thread

Method 1:

import time
import random
from threading import Thread

def study(name):
    print("%s is learning"%name)
    time.sleep(random.randint(1,3))
    print("%s is playing" % name)

if __name__ == '__main__':
    t = Thread(target=study,args=('james',))
    t.start()
    print("Main thread running....")

Output:

james is learning
Main thread running....
james is playing

Method 2:

from threading import Thread
import time

class MyThread(Thread):
    def __init__(self,name):
        super().__init__()
        self.name = name

    def run(self):
        print('%s is learning' % self.name)
        time.sleep(2)
        print('%s is playing'%self.name)

if __name__ == '__main__':
    t1 = MyThread('james')
    t1.start()
    print("Main thread running....")
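With method 2 it is easy to call run() directly by mistake. The sketch below reuses the MyThread idea; the ran_in attribute is a hypothetical helper added only for illustration. It shows that start() executes run() in a newly created thread, while calling run() yourself is just an ordinary method call in the current thread.

```python
import threading

class MyThread(threading.Thread):
    def __init__(self, name):
        super().__init__()
        self.name = name      # Thread.name is a writable attribute
        self.ran_in = None    # hypothetical: records which thread executed run()

    def run(self):
        # current_thread() reports the thread this code is really running in
        self.ran_in = threading.current_thread().name

# start(): a new thread is created, and that thread calls run()
t1 = MyThread('james')
t1.start()
t1.join()
print(t1.ran_in)   # james

# run(): a plain method call in the calling thread, no new thread at all
t2 = MyThread('durant')
t2.run()
print(t2.ran_in)   # MainThread when executed as a script
```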

Exercises

1. Concurrent socket communication using multiple threads

Client

# _*_ coding: utf-8 _*_ 
from socket import *

ip_port = ('127.0.0.1',9999)
client = socket(AF_INET,SOCK_STREAM)
client.connect(ip_port)

while True:
    cmd = input(">>>").strip()
    if not cmd:
        continue
    client.send(cmd.encode('utf-8'))
    data = client.recv(1024)
    print(data.decode('utf-8'))
client.close()

Server

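import threading
import socket

ip_port = ('127.0.0.1', 9999)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # allow a quick restart on the same port
s.bind(ip_port)
s.listen(5)

def action(conn):
    # Serve one client: echo each message back in upper case
    while True:
        try:
            data = conn.recv(1024)
        except ConnectionResetError:  # the client vanished (common on Windows)
            break
        if not data:  # b'' means the client closed the connection
            break
        print(data)
        conn.send(data.upper())
    conn.close()

if __name__ == '__main__':
    while True:
        conn, addr = s.accept()
        # one thread per connected client
        t = threading.Thread(target=action, args=(conn,))
        t.start()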
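To try the thread-per-connection design without typing into two terminals, here is a self-contained sketch that runs server and clients in one script. Its assumptions are our own: port 0 (the OS picks a free port) replaces the fixed 9999, the hypothetical accept_loop and the handler threads are daemons so the script can exit, and the two clients are scripted instead of reading input().

```python
import socket
import threading

def action(conn):
    # Serve one client: echo each message back in upper case until it disconnects
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:                 # b'' means the client closed its side
                break
            conn.sendall(data.upper())

def accept_loop(server):
    # Hypothetical helper: one new daemon thread per accepted connection
    while True:
        conn, _addr = server.accept()
        threading.Thread(target=action, args=(conn,), daemon=True).start()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))            # port 0: the OS picks a free port
server.listen(5)
port = server.getsockname()[1]
threading.Thread(target=accept_loop, args=(server,), daemon=True).start()

# Two clients connected at the same time; each is served by its own thread
c1 = socket.create_connection(('127.0.0.1', port), timeout=5)
c2 = socket.create_connection(('127.0.0.1', port), timeout=5)
c2.sendall(b'world')                     # the second client is answered first
r2 = c2.recv(1024)
c1.sendall(b'hello')
r1 = c1.recv(1024)
print(r1, r2)                            # b'HELLO' b'WORLD'
c1.close()
c2.close()
```

With a single-threaded server, the second client would never be accepted while the first connection is open, so c2.recv() would time out; the thread per connection is what makes both clients responsive.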

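2. Write a simple text-processing tool with three tasks: one receives user input, one converts the input to upper case, and one saves the formatted result to a file.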

# _*_ coding: utf-8 _*_ 
# Exercise 2: three tasks; one receives user input, one converts the input
# to upper case, and one saves the formatted result to a file

from threading import Thread

msg_l = []
format_l = []
def talk():
    while True:
        msg = input(">>>").strip()
        if not msg:
            break
        msg_l.append(msg)

def format_msg():
    while True:
        if msg_l:
            res = msg_l.pop(0)  # pop(0) keeps the original input order
            format_l.append(res.upper())

def save():
    while True:
        if format_l:
            with open('db.txt', 'a', encoding='utf-8') as f:
                res = format_l.pop(0)
                f.write('%s\n' % res)

if __name__ == '__main__':
    t1 = Thread(target=talk)
    t2 = Thread(target=format_msg)
    t3 = Thread(target=save)

    t1.start()
    t2.start()
    t3.start()
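The version above polls the shared lists in busy loops (burning CPU) and its second and third threads never exit. A variation, sketched here under our own assumptions, hands data between the three threads with queue.Queue (see the thread queue section) and uses a None sentinel to shut the pipeline down. talk() reads from a fixed list instead of input(), and db.txt is written to a temporary directory so the sketch runs unattended.

```python
import os
import queue
import tempfile
import threading

STOP = None  # sentinel: "no more input"

def talk(lines, msg_q):
    # Stand-in for the input() loop: feed fixed lines, then signal the end
    for msg in lines:
        msg_q.put(msg)
    msg_q.put(STOP)

def format_msg(msg_q, format_q):
    while True:
        msg = msg_q.get()            # blocks instead of spinning on an empty list
        if msg is STOP:
            format_q.put(STOP)       # pass the end signal downstream
            break
        format_q.put(msg.upper())

def save(format_q, path):
    with open(path, 'a', encoding='utf-8') as f:
        while True:
            res = format_q.get()
            if res is STOP:
                break
            f.write('%s\n' % res)

msg_q, format_q = queue.Queue(), queue.Queue()
path = os.path.join(tempfile.mkdtemp(), 'db.txt')   # temporary file instead of ./db.txt
threads = [
    threading.Thread(target=talk, args=(['hello', 'thread'], msg_q)),
    threading.Thread(target=format_msg, args=(msg_q, format_q)),
    threading.Thread(target=save, args=(format_q, path)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()                         # all three threads exit on their own

with open(path, encoding='utf-8') as f:
    saved = f.read()
print(saved, end='')                 # HELLO and THREAD, one per line
```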

6. Multithreading vs. Multiprocessing

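6.1 Which starts faster? (Starting a process costs far more than starting a thread, because a process has to be given its own memory space.)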

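Starting a thread from the main process

import time
import random
from threading import Thread

def study(name):
    print("%s is learning" % name)
    time.sleep(random.randint(1, 3))
    print("%s is playing" % name)

if __name__ == '__main__':
    t = Thread(target=study, args=('james',))
    t.start()
    print("Main thread running....")

Typical output is shown below. The thread is running almost as soon as t.start() is called, so "james is learning" usually appears before the main thread's message, which shows how little it costs to create a thread.

james is learning
Main thread running....
james is playing

Starting a child process from the main process

import time
import random
from multiprocessing import Process

def study(name):
    print("%s is learning" % name)
    time.sleep(random.randint(1, 3))
    print("%s is playing" % name)

if __name__ == '__main__':
    p = Process(target=study, args=('james',))
    p.start()
    print("Main process running....")

Typical output is shown below. p.start() only sends the request to start a process to the operating system; the OS then has to allocate memory and copy the parent's address space into the child, which costs far more than starting a thread, so the main process gets to print first.

Main process running....
james is learning
james is playing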

6.2 Look at the PIDs

Start several threads in the main process: every thread reports the same PID as the main process (threads share the main process's PID).

from threading import Thread
import os

def work():
    print('hello',os.getpid())

if __name__ == '__main__':
    t1=Thread(target=work)
    t2=Thread(target=work)
    t1.start()
    t2.start()
    print('main thread/main process pid', os.getpid())

Output:

hello 7939
hello 7939
main thread/main process pid 7939
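If all threads report the same PID, how can they be told apart? A small sketch: threading.get_ident() returns a per-thread identifier, so two threads that are alive at the same time report the same os.getpid() but different identifiers. The Barrier is our own addition, there only to guarantee that both threads overlap, because an identifier may be reused once a thread has exited.

```python
import os
import threading

barrier = threading.Barrier(2)   # keeps both threads alive at the same moment
results = []                     # (pid, thread identifier) pairs

def work():
    barrier.wait()               # neither thread can finish before the other has started
    results.append((os.getpid(), threading.get_ident()))

threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for pid, _ in results}
idents = {ident for _, ident in results}
print(pids == {os.getpid()})     # True: one process, one PID
print(len(idents))               # 2: each thread has its own identifier
```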

Start several processes: each process has a different PID

from multiprocessing import Process
import os

def work():
    print('hello',os.getpid())

if __name__ == '__main__':
    p1=Process(target=work)
    p2=Process(target=work)
    p1.start()
    p2.start()
    print('main thread/main process', os.getpid())

Output:

main thread/main process 7951
hello 7952
hello 7953

6.3 Threads in the same process share that process's data

Address spaces are isolated between processes

from multiprocessing import Process

def work():
    global n
    n = 0

if __name__ == '__main__':
    n = 100
    p = Process(target=work)
    p.start()
    p.join()
    print('main', n)

As expected, the child process p did set its global n to 0, but it only changed its own copy; the n in the parent process is still 100:

main 100

Multiple threads started in the same process share that process's address space

from threading import Thread

def work():
    global n
    n = 0

if __name__ == '__main__':
    n = 100
    t = Thread(target=work)
    t.start()
    t.join()
    print('main', n)

The result is 0, because threads in the same process share that process's data:

main 0

7. Other Attributes and Methods of Thread Objects

Overview

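Methods of a Thread instance:
  # is_alive(): returns whether the thread is alive (the old isAlive() alias was removed in Python 3.9).
  # getName(): returns the thread's name (deprecated since Python 3.10; read the name attribute instead).
  # setName(): sets the thread's name (deprecated since Python 3.10; assign to the name attribute instead).

Functions provided by the threading module:
  # threading.current_thread(): returns the Thread object of the current thread (currentThread() is a deprecated alias).
  # threading.enumerate(): returns a list of all threads currently alive, i.e. started but not yet finished; threads that have not started yet or have already terminated are excluded.
  # threading.active_count(): returns the number of threads currently alive, the same as len(threading.enumerate()) (activeCount() is a deprecated alias).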

Demonstration

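from threading import Thread
from threading import current_thread
import time

def task():
    print("%s is running" % current_thread().name)
    time.sleep(1)
    print("%s is done" % current_thread().name)

if __name__ == '__main__':
    # Python has no real "child thread" concept; the name is only for readability
    t = Thread(target=task, name='child-thread-1')
    t.start()
    t.name = 'renamed-thread-1'  # rename the thread while it is running
    print("main thread %s" % current_thread().name)

Typical output (the scheduling of the first two lines can vary):

child-thread-1 is running
main thread MainThread
renamed-thread-1 is done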

import threading
import time
from threading import Thread

def work():
    time.sleep(3)
    print(threading.current_thread().name)


if __name__ == '__main__':
    # start a thread from the main process
    t = Thread(target=work)
    t.start()

    print(threading.current_thread().name)
    print(threading.current_thread())  # the main thread
    print(threading.enumerate())  # two live threads, counting the main thread
    print(threading.active_count())
    print('main thread/main process')

    '''
    Output (thread ids vary; on Python 3.10+ the default name is shown as "Thread-1 (work)"):
    MainThread
    <_MainThread(MainThread, started 140735268892672)>
    [<_MainThread(MainThread, started 140735268892672)>, <Thread(Thread-1, started 123145307557888)>]
    2
    main thread/main process
    Thread-1
    '''

The main thread waits for the child thread to finish

from threading import Thread
import time
def sayhi(name):
    time.sleep(2)
    print('%s say hello' %name)

if __name__ == '__main__':
    t=Thread(target=sayhi,args=('james',))
    t.start()
    t.join()
    print('主线程')
    print(t.is_alive())
    '''
    james say hello
    主线程
    False
    '''
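join() also accepts an optional timeout in seconds. Because join() returns None either way, the usual pattern, sketched below with the same sayhi task, is to check is_alive() afterwards to find out whether the thread really finished or the wait simply timed out.

```python
import threading
import time

def sayhi(name):
    time.sleep(2)
    print('%s say hello' % name)

t = threading.Thread(target=sayhi, args=('james',))
t.start()

t.join(timeout=0.5)          # wait at most 0.5 s; join() returns None in both cases
still_running = t.is_alive()
print(still_running)         # True: the 2 s sleep is not over yet

t.join()                     # no timeout: block until the thread really ends
finished = not t.is_alive()
print(finished)              # True
```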

8. Daemon Threads

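Processes and threads both follow the same rule: a daemon XXX is destroyed once the main XXX has finished running.

It must be stressed that "finished running" is not the same as "terminated":

    For the main thread, "finished running" means that all non-daemon threads in its process have finished; only then does the main thread count as finished.

    For the main process, "finished running" means that the main process's own code has finished executing.

In detail:

    1. The main process counts as finished as soon as its code ends (daemon processes are reclaimed at this point). It then keeps waiting until all non-daemon child processes have finished, reclaims their resources (otherwise they would become zombie processes), and only then exits.

    2. The main thread counts as finished only after all other non-daemon threads have finished (daemon threads are reclaimed at this point). The end of the main thread means the end of the process, at which point all of the process's resources are reclaimed, so the process must make sure every non-daemon thread has finished before it can end.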

from threading import Thread
import time

def task(name):
    time.sleep(1)
    print("%s is working"%name)

if __name__ == '__main__':
    t = Thread(target=task,args=('james',))
    # t.setDaemon(True)  # must be called before t.start(); same effect as t.daemon (setDaemon is deprecated since Python 3.10)
    t.daemon = True
    t.start()
    print("Main thread")
    print(t.is_alive())
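The comment above says the daemon flag must be set before start(). A short sketch confirming it: assigning daemon on a thread that has already been started raises RuntimeError, and a thread created inside a daemon thread inherits daemon=True unless told otherwise. The spawn_child helper is hypothetical, written only for this check.

```python
import threading
import time

def task(name):
    time.sleep(0.1)

t = threading.Thread(target=task, args=('james',))
t.daemon = True                   # allowed: the thread has not been started yet
t.start()

error = None
try:
    t.daemon = False              # too late: the thread is already started
except RuntimeError as exc:
    error = exc
print(type(error).__name__)       # RuntimeError
t.join()

child_flags = []

def spawn_child():
    # A Thread created inside a daemon thread defaults to daemon=True
    child = threading.Thread(target=lambda: None)
    child_flags.append(child.daemon)

parent = threading.Thread(target=spawn_child, daemon=True)
parent.start()
parent.join()
print(child_flags)                # [True]
```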

Exercise: what outputs could the following code produce, and why?

from threading import Thread
import time

def task(name):
    print('task is running')
    time.sleep(1)
    print("%s is working"%name)

def play(name):
    print('play is running')
    time.sleep(1)
    print("%s is playing"%name)

if __name__ == '__main__':
    t1 = Thread(target=task,args=('james',))
    t2 = Thread(target=play, args=('durant',))
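    # t1.setDaemon(True)  # must be called before t1.start(); same effect as t1.daemon
    t1.daemon = True
    t1.start()
    t2.start()
    print("Main thread")

Output from one run. Both threads sleep for the same 1 s, so their last lines can appear in either order; and because the daemon t1 is killed as soon as the main thread and the non-daemon t2 are done, "james is working" may not appear at all if t1 has not printed by then:

task is running
play is running
Main thread
durant is playing
james is working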
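To check this reasoning without depending on timing luck, the sketch below (our own illustration, not part of the exercise) runs a small program in a separate interpreter via subprocess: a daemon thread that sleeps 5 s next to a non-daemon thread that sleeps 0.2 s. The process exits as soon as the main thread and the non-daemon thread are done, so the daemon's message never appears.

```python
import subprocess
import sys
import textwrap

# A tiny program run in a fresh interpreter so we can see what it prints before exiting
script = textwrap.dedent('''
    import threading, time

    def slow():
        time.sleep(5)
        print("daemon finished")      # never reached: the daemon is killed at exit

    def fast():
        time.sleep(0.2)
        print("non-daemon finished")  # the process waits for this thread

    threading.Thread(target=slow, daemon=True).start()
    threading.Thread(target=fast).start()
    print("main thread")
''')

out = subprocess.run([sys.executable, '-c', script],
                     capture_output=True, text=True, timeout=30).stdout
print(out, end='')   # main thread / non-daemon finished
```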

9. The GIL (Global Interpreter Lock)

For details, see: https://www.cnblogs.com/wj-1314/p/9056555.html

10. Thread Queues

　　queue is especially useful in threaded programming when information must be exchanged safely between multiple threads.

The queue module offers three kinds of queues for use between threads:

1. class queue.Queue(maxsize=0) # queue: first in, first out

import queue

q=queue.Queue()
q.put('first')
q.put('second')
q.put('third')

print(q.get())
print(q.get())
print(q.get())



'''
Output (first in, first out):
first
second
third
'''

2. class queue.LifoQueue(maxsize=0) # stack: last in, first out

import queue

q=queue.LifoQueue()
q.put('first')
q.put('second')
q.put('third')

print(q.get())
print(q.get())
print(q.get())



'''
Output (last in, first out):
third
second
first
'''

3. class queue.PriorityQueue(maxsize=0) # priority queue: a priority is attached to each item when it is stored

import queue

q=queue.PriorityQueue()
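# put() a tuple whose first element is the priority (usually a number, but any mutually comparable values work); the smaller the number, the higher the priority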
q.put((20,'a'))
q.put((10,'b'))
q.put((30,'c'))

print(q.get())
print(q.get())
print(q.get())



'''
Output (the smaller the number, the higher the priority; higher-priority items are dequeued first):
(10, 'b')
(20, 'a')
(30, 'c')
'''
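One caveat when two items share the same priority: the tuples are compared element by element, so the data items themselves get compared, which raises TypeError for types such as dict. A common workaround, also described in the heapq documentation, is to insert a running counter as a tie-breaker, as in this sketch; it also makes equal-priority items come out in insertion order.

```python
import itertools
import queue

q = queue.PriorityQueue()
counter = itertools.count()      # monotonically increasing tie-breaker

# Without the counter, (10, {...}) vs (10, {...}) would compare two dicts
# and raise TypeError; the counter decides ties before the dicts are reached.
for priority, job in [(20, {'job': 'a'}), (10, {'job': 'b'}), (10, {'job': 'c'})]:
    q.put((priority, next(counter), job))

order = []
while not q.empty():
    priority, _, job = q.get()
    order.append((priority, job['job']))
print(order)   # [(10, 'b'), (10, 'c'), (20, 'a')]
```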

Other details:

Constructor for a priority queue. maxsize is an integer that sets the upperbound limit on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If maxsize is less than or equal to zero, the queue size is infinite.

The lowest valued entries are retrieved first (the lowest valued entry is the one returned by sorted(list(entries))[0]). A typical pattern for entries is a tuple in the form: (priority_number, data).

exception queue.Empty
Exception raised when non-blocking get() (or get_nowait()) is called on a Queue object which is empty.

exception queue.Full
Exception raised when non-blocking put() (or put_nowait()) is called on a Queue object which is full.

Queue.qsize()
Queue.empty() #return True if empty  
Queue.full() # return True if full 
Queue.put(item, block=True, timeout=None)
Put item into the queue. If optional args block is true and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Full exception if no free slot was available within that time. Otherwise (block is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (timeout is ignored in that case).

Queue.put_nowait(item)
Equivalent to put(item, False).

Queue.get(block=True, timeout=None)
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).

Queue.get_nowait()
Equivalent to get(False).

Two methods are offered to support tracking whether enqueued tasks have been fully processed by daemon consumer threads.

Queue.task_done()
Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.

If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).

Raises a ValueError if called more times than there were items placed in the queue.

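Queue.join(): blocks until every item put into the queue has been retrieved and processed (that is, task_done() has been called for each one).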
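Put together, task_done() and join() give the usual producer/consumer pattern: the producer put()s items, a daemon consumer thread get()s each one and calls task_done() once it has handled it, and the producer's q.join() returns only after every item has been processed. A minimal sketch, where multiplying by 10 is just a placeholder for real work:

```python
import queue
import threading

q = queue.Queue(maxsize=3)       # put() blocks while 3 items are waiting
processed = []

def consumer():
    while True:
        item = q.get()           # blocks until an item is available
        processed.append(item * 10)
        q.task_done()            # exactly one task_done() per get()

# daemon: the consumer loops forever, so let it end together with the program
threading.Thread(target=consumer, daemon=True).start()

for i in range(5):               # the main thread acts as the producer
    q.put(i)

q.join()                         # returns once task_done() was called for all 5 items
print(processed)                 # [0, 10, 20, 30, 40]
```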

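11. Process Pools and Thread Pools

When we first learned multiprocessing and multithreading, we eagerly used them to build concurrent socket servers. The fatal flaw of that approach is that the number of processes or threads the server starts keeps growing with the number of concurrent clients, which puts enormous pressure on the server host and can even bring it down. We therefore need to cap the number of processes or threads the server starts so that the machine runs within a load it can bear. That is exactly what process pools and thread pools are for. A process pool, for example, is a pool that holds processes: underneath it is still multiprocessing, just with a limit on how many processes are started.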

1. The Python standard module concurrent.futures: https://docs.python.org/dev/library/concurrent.futures.html

1-1 Introduction

The concurrent.futures module provides a high-level interface for asynchronous calls

ThreadPoolExecutor: a thread pool that provides asynchronous calls

ProcessPoolExecutor: a process pool that provides asynchronous calls

Both implement the same interface, which is defined by the abstract Executor class.

1-2 Basic methods

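submit(fn, *args, **kwargs)
Submits a task asynchronously and returns a Future

map(func, *iterables, timeout=None, chunksize=1)
Replaces a for loop of submit() calls

shutdown(wait=True)
Equivalent to pool.close() + pool.join() on a multiprocessing process pool

wait=True: wait until all tasks in the pool have finished and their resources are released before continuing

wait=False: return immediately without waiting for the tasks in the pool to finish
Whatever the value of wait, the program as a whole still waits for all tasks to finish before exiting
submit() and map() must be called before shutdown()

result(timeout=None)
Gets the result (a method of the Future returned by submit())

add_done_callback(fn)
Registers a callback function (also a method of the Future)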
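A compact sketch of these methods, using ThreadPoolExecutor as a context manager (a choice made for this example; leaving the with-block calls shutdown(wait=True)). Note that result() and add_done_callback() are called on the Future objects returned by submit(), not on the executor itself.

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    return n ** 2

done_values = []                 # filled in by the callback

def on_done(future):
    # The callback receives the finished Future, not the return value itself
    done_values.append(future.result())

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(task, i) for i in range(5)]   # submit: async, returns a Future
    for f in futures:
        f.add_done_callback(on_done)
    mapped = list(executor.map(task, range(5)))              # map: like submit in a loop

results = [f.result() for f in futures]   # result(): blocks until the value is ready
print(results)               # [0, 1, 4, 9, 16]
print(mapped)                # [0, 1, 4, 9, 16]
print(sorted(done_values))   # [0, 1, 4, 9, 16]
```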

2. Process pool

Introduction:

The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.

class concurrent.futures.ProcessPoolExecutor(max_workers=None, mp_context=None)
An Executor subclass that executes calls asynchronously using a pool of at most max_workers processes. If max_workers is None or not given, it will default to the number of processors on the machine. If max_workers is lower or equal to 0, then a ValueError will be raised.

Usage:

from concurrent.futures import ThreadPoolExecutor,ProcessPoolExecutor

import os,time,random
def task(n):
    print('%s is running' % os.getpid())
    time.sleep(random.randint(1,3))
    return n**2

if __name__ == '__main__':

    executor=ProcessPoolExecutor(max_workers=3)

    futures=[]
    for i in range(11):
        future=executor.submit(task,i)
        futures.append(future)
    executor.shutdown(True)
    print('+++>')
    for future in futures:
        print(future.result())

3. Thread pool

Introduction:

ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously.

class concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='')
An Executor subclass that uses a pool of at most max_workers threads to execute calls asynchronously.

Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.

New in version 3.6: The thread_name_prefix argument was added to allow users to control the threading.Thread names for worker threads created by the pool for easier debugging.

Changed in version 3.8: The default value of max_workers is changed to min(32, os.cpu_count() + 4).

Usage:

Replace ProcessPoolExecutor with ThreadPoolExecutor; everything else is used in exactly the same way.

4. The map method

from concurrent.futures import ThreadPoolExecutor,ProcessPoolExecutor

import os,time,random
def task(n):
    print('%s is running' % os.getpid())
    time.sleep(random.randint(1,3))
    return n**2

if __name__ == '__main__':

    executor=ThreadPoolExecutor(max_workers=3)

    # for i in range(11):
    #     future=executor.submit(task,i)

    executor.map(task, range(1, 12))  # map replaces the for + submit loop above
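executor.map() returns an iterator that yields results in the order of the inputs, regardless of which task finishes first; an exception raised inside a task is re-raised when its result is reached while iterating. A sketch with artificial sleeps (chosen by us so that later inputs finish sooner):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # Larger n sleeps less, so later inputs finish first
    time.sleep(0.05 * (5 - n))
    return n ** 2

with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(task, range(1, 5)))

print(results)   # [1, 4, 9, 16]: input order, not completion order
```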

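5. Callback functions

You can attach a function to each task submitted to a process pool or thread pool. It is triggered automatically when that task finishes and receives the task's Future object as its argument (call .result() on it to get the return value). Such a function is called a callback function.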

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import os

import requests  # third-party package: pip install requests

def get_page(url):
    print('<process %s> get %s' % (os.getpid(), url))
    response = requests.get(url)
    if response.status_code == 200:
        return {'url': url, 'text': response.text}
    return None  # non-200 responses are skipped by the callback

def parse_page(future):
    # parse_page receives a Future object; result() gives get_page's return value
    res = future.result()
    if res is None:
        return
    print('<process %s> parse %s' % (os.getpid(), res['url']))
    parse_res = 'url:<%s> size:[%s]\n' % (res['url'], len(res['text']))
    with open('db.txt', 'a', encoding='utf-8') as f:
        f.write(parse_res)


if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]

    p = ProcessPoolExecutor(3)
    for url in urls:
        # the callback runs back in the main process once each download finishes
        p.submit(get_page, url).add_done_callback(parse_page)
    p.shutdown(wait=True)
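To try the callback mechanism without network access, here is a sketch under our own assumptions: get_page is replaced by a hypothetical stand-in that builds fake page text from the URL, and a ThreadPoolExecutor is used. The callback still receives a Future and must call result() on it. With a thread pool the callback usually runs in the worker thread that finished the task (or immediately in the calling thread if the Future is already done when add_done_callback() is called); with a process pool it runs in the submitting process, which is why the process-pool version prints the main process's PID when parsing.

```python
from concurrent.futures import ThreadPoolExecutor

def get_page(url):
    # Hypothetical stand-in for the requests-based get_page: no network access
    return {'url': url, 'text': url * 3}

parsed = []

def parse_page(future):
    res = future.result()                    # the callback gets a Future, not the dict
    parsed.append((res['url'], len(res['text'])))

urls = ['https://www.python.org', 'http://www.sina.com.cn/']
with ThreadPoolExecutor(3) as pool:          # leaving the block waits for all tasks
    for url in urls:
        pool.submit(get_page, url).add_done_callback(parse_page)

for url, size in sorted(parsed):
    print('url:<%s> size:[%s]' % (url, size))
```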