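Python Network Programming: Threads and Processes

  1. Threads:

      Basic usage

      Thread locks

      Thread pools

      Queues (producer-consumer model)

  2. Processes:

      Basic usage

      Process locks

      Process pools

      Sharing data between processes

  3. Coroutines:

      gevent

      greenlet

  4. Caching:

      memcache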

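  (1) Threads:

      Every thread runs inside a process, and one process can run many threads. The threads of a process share its resources, so a thread can be viewed as a lightweight process that shares the same virtual memory and other attributes with its siblings.

      The threading module provides thread-related operations. A thread is the smallest unit of work that gets scheduled within an application.

      Thread(target=None, name=None, args=(), kwargs={}): creates a new thread instance

      target: a callable object that run() invokes when the thread starts; defaults to None

      name: the thread name

      args: a tuple of positional arguments passed to target

      kwargs: a dictionary of keyword arguments passed to target

      Methods and attributes of a Thread instance t:

      t.start(): # marks the thread as ready; it then waits to be scheduled on the CPU
      t.run(): # called automatically in the new thread once it is scheduled; invokes target
      t.join([timeout]): # waits until the thread terminates or the timeout expires
      t.is_alive(): # returns True if the thread is still running, otherwise False
      t.name: # the thread name
      t.daemon: # boolean daemon (background) flag; must be set before start() is called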
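
To make the attribute list above concrete, here is a minimal sketch that exercises start(), join() with a timeout, is_alive(), name and daemon on one thread. The worker function and the thread name "demo-worker" are made up for illustration.

```python
import threading
import time

def worker(delay):
    # Simulate a short piece of work.
    time.sleep(delay)

t = threading.Thread(target=worker, args=(0.2,), name="demo-worker")
t.daemon = True            # must be set before start()
before = t.is_alive()      # False: the thread has not been started yet
t.start()
during = t.is_alive()      # True: the worker is still sleeping
t.join(timeout=0.01)       # gives up after 0.01 s; the thread keeps running
timed_out = t.is_alive()   # True: join() returned because of the timeout
t.join()                   # now wait until the thread really finishes
after = t.is_alive()       # False
print(t.name, before, during, timed_out, after)
```

Note that join(timeout) returns nothing useful by itself; checking is_alive() afterwards is how you tell a timeout from a normal finish.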

### Creating and starting a function as a thread:

import threading
import time

def clock(interval):
    # Print the current time every `interval` seconds, forever.
    while True:
        print("The time is %s" % time.ctime())
        time.sleep(interval)

t = threading.Thread(target=clock, args=(5,))
# t.daemon = True  # uncomment to let the program exit without waiting for this thread
t.start()

Output:

The time is Sat Jul 23 02:08:58 2016
The time is Sat Jul 23 02:09:03 2016
The time is Sat Jul 23 02:09:08 2016

### Defining the same thread as a class:

import threading
import time

class ClockThread(threading.Thread):
    def __init__(self, interval):
        super().__init__()
        self.interval = interval

    def run(self):
        # run() is what the new thread executes after start() is called.
        while True:
            print("The time is %s" % time.ctime())
            time.sleep(self.interval)

t = ClockThread(5)
t.start()

Output:

The time is Sat Jul 23 02:15:48 2016
The time is Sat Jul 23 02:15:53 2016
The time is Sat Jul 23 02:15:58 2016

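  Timer objects:

      A timer runs a function once after a given delay.

    Signature:

      Timer(interval, func [, args [, kwargs]])

    Methods of a Timer instance p:

      p.start(): starts the timer

      p.cancel(): cancels the timer if the function has not run yet

from threading import Timer

def hello():
    print("hello, world")

t = Timer(3, hello)
t.start()  # after 3 seconds the function runs and prints "hello, world"

Output:

hello, world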
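
The cancel() method is easy to overlook, so here is a small sketch of it, using two timers and a hypothetical remember() helper that records which timers actually fired. A Timer is itself a Thread, which is why join() works on it.

```python
from threading import Timer

calls = []

def remember(tag):
    # Record that the timer carrying this tag fired.
    calls.append(tag)

kept = Timer(0.1, remember, args=("kept",))
dropped = Timer(0.1, remember, args=("dropped",))
kept.start()
dropped.start()
dropped.cancel()   # cancelled before its interval elapsed, so it never fires
kept.join()        # wait for both timer threads to finish
dropped.join()
print(calls)       # ['kept']
```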

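  Semaphores and bounded semaphores:

      A mutex lets only one thread modify the data at a time, whereas a Semaphore lets a fixed number of threads do so. It keeps an internal counter: every acquire() call decrements it and every release() call increments it. If the counter is 0, acquire() blocks until another thread calls release(). For example, a restroom with three stalls admits at most three people at a time; everyone else has to wait until someone comes out.

  Semaphore([value]): creates a semaphore; value is the initial counter and defaults to 1 when omitted

    p.acquire([blocking]): acquires the semaphore

    p.release(): releases the semaphore by incrementing the internal counter by 1

  BoundedSemaphore([value]): creates a new semaphore that works exactly like Semaphore, except that the number of release() calls may not exceed the number of acquire() calls; an extra release() raises ValueError

Note: the difference between a semaphore and a mutex:

      A semaphore can be used for signalling, because acquire() and release() may be called from different threads.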

import threading
import time

def run(n):
    # The with block acquires the semaphore and releases it on exit.
    with semaphore:
        time.sleep(1)
        print("run the thread: %s" % n)

if __name__ == '__main__':
    semaphore = threading.BoundedSemaphore(5)  # at most 5 threads run at the same time
    for i in range(10):
        t = threading.Thread(target=run, args=(i,))
        t.start()

Output:

run the thread: 0
run the thread: 4
run the thread: 3
run the thread: 2
run the thread: 1
run the thread: 5
run the thread: 9
run the thread: 8
run the thread: 7
run the thread: 6
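
The two claims made about semaphores (the counter caps concurrency, and a BoundedSemaphore rejects a surplus release()) can be checked with a sketch like the one below. The names LIMIT, use_slot, active and peak are invented for this illustration; the sleep is only there to make the threads overlap.

```python
import threading
import time

LIMIT = 3
slots = threading.BoundedSemaphore(LIMIT)
state_lock = threading.Lock()   # protects the two counters below
active = 0                      # threads currently inside the guarded section
peak = 0                        # highest value `active` ever reached

def use_slot():
    global active, peak
    with slots:                 # blocks while LIMIT threads are already inside
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)
        with state_lock:
            active -= 1

threads = [threading.Thread(target=use_slot) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrency:", peak)   # never more than LIMIT

try:
    slots.release()             # one release() more than acquire()
    over_release_rejected = False
except ValueError:
    over_release_rejected = True
print("extra release rejected:", over_release_rejected)
```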

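  Events (Event):

    Events are used for communication between threads. One thread signals an "event" while one or more other threads wait for it. An Event instance manages an internal flag: set() makes it True, clear() makes it False, and wait() blocks until the flag is True.

    Event()

    e.is_set(): returns True only when the internal flag is True

    e.set(): sets the internal flag to True; every thread waiting for it is woken up

    e.clear(): resets the internal flag to False

    e.wait([timeout]): blocks until the internal flag is True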

import threading

def do(event):
    print('start')
    event.wait()      # block until the main thread calls set()
    print('execute')

event_obj = threading.Event()
for i in range(5):
    t = threading.Thread(target=do, args=(event_obj,))
    t.start()

event_obj.clear()
inp = input('input:')
if inp == 'true':
    event_obj.set()   # wake up all five waiting threads

Output:

start
start
start
start
start
input:true
execute
execute
execute
execute
execute
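
One detail the example does not show is the return value of wait(): it returns the flag, so a timed-out wait gives False. The sketch below illustrates that; the second event, first_wait_done, is an assumption added only so the main thread knows when the first wait has timed out.

```python
import threading

ready = threading.Event()
first_wait_done = threading.Event()   # helper event, only for sequencing this demo
results = []

def waiter():
    first = ready.wait(timeout=0.05)  # flag still False, so this times out
    first_wait_done.set()
    second = ready.wait()             # blocks until the main thread calls set()
    results.append((first, second))

t = threading.Thread(target=waiter)
t.start()
first_wait_done.wait()   # make sure the timed wait is over before setting the flag
ready.set()
t.join()
print(results)           # [(False, True)]

ready.clear()
print(ready.is_set())    # False
```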

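  Conditions (Condition):

    A condition makes threads wait and releases n of them only when some condition is met.

    Condition([lock]): creates a condition variable. lock is an optional Lock or RLock instance; if it is not given, a new RLock is created for the condition variable to use.

  c.acquire(*args): acquires the underlying lock

  c.release(): releases the underlying lock

  c.wait([timeout]): waits until notified or until the timeout expires

  c.notify([n]): wakes up one or more threads waiting on this condition variable

  c.notify_all(): wakes up all threads waiting on this condition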

import threading

def condition_func():
    # Predicate for wait_for(): True once the user types 1.
    ret = False
    inp = input('>>>')
    if inp == '1':
        ret = True
    return ret

def run(n):
    # The with block acquires the underlying lock and releases it on exit.
    with con:
        con.wait_for(condition_func)
        print("run the thread: %s" % n)

if __name__ == '__main__':
    con = threading.Condition()
    for i in range(10):
        t = threading.Thread(target=run, args=(i,))
        t.start()

Output:

>>>1
run the thread: 0
>>>1
run the thread: 1
>>>1
run the thread: 2
>>>1
run the thread: 3
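
The example above drives the predicate from keyboard input. A more typical use pairs wait_for() with notify(), as in this sketch where consumers wait until a shared list has something in it. The names items, received, consumer and producer are illustrative and not part of the original example.

```python
import threading

items = []                      # shared state guarded by the condition's lock
received = []
cond = threading.Condition()

def consumer():
    with cond:
        # wait_for() releases the lock while waiting and re-checks the
        # predicate every time the thread is woken up.
        cond.wait_for(lambda: len(items) > 0)
        received.append(items.pop(0))

def producer(value):
    with cond:
        items.append(value)
        cond.notify()           # wake up one waiting consumer, if any

consumers = [threading.Thread(target=consumer) for _ in range(3)]
for c in consumers:
    c.start()
for value in (10, 20, 30):
    producer(value)
for c in consumers:
    c.join()
print(sorted(received))         # [10, 20, 30]
```

Because the predicate is checked before waiting, a consumer that arrives after the producer has already appended an item simply takes it without blocking.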

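  Thread pools:

    A thread pool holds a fixed set of threads together with a task queue. Running the pool simply means using that limited number of threads to work through every task in the queue.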

import queue
import threading
import time

class ThreadPool:
    def __init__(self, maxsize=5):
        self.maxsize = maxsize
        # The queue holds "slots": each entry is the Thread class itself,
        # so at most maxsize threads can be checked out at any moment.
        self._q = queue.Queue(maxsize)
        for i in range(maxsize):
            self._q.put(threading.Thread)

    def get_thread(self):
        # Blocks while every slot is in use.
        return self._q.get()

    def add_thread(self):
        # Return a slot to the pool.
        self._q.put(threading.Thread)

pool = ThreadPool(5)

def task(arg, p):
    try:
        print(arg)
        time.sleep(1)
    finally:
        p.add_thread()  # free the slot even if the task fails

for i in range(10):
    # t is the threading.Thread class
    t = pool.get_thread()
    obj = t(target=task, args=(i, pool))
    obj.start()

Output:

0
1
2
3
4
5
6
7
8
9
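
The hand-rolled pool above is useful for seeing how the slots work. As an alternative (not what the original code uses), the standard library ships a ready-made pool in concurrent.futures. A minimal sketch, with a made-up task function that doubles its argument:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(arg):
    # Stand-in for real work; returns a value so results can be collected.
    time.sleep(0.1)
    return arg * 2

# At most 5 worker threads process the 10 submitted tasks.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(task, range(10)))

print(results)  # results come back in submission order
```

Leaving the with block waits for all submitted tasks to finish, so no manual bookkeeping of slots is needed.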

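  Queues:

    A queue is the most common way to exchange data between threads. The queue module provides queue operations and implements several multi-producer, multi-consumer queues that let multiple threads exchange information safely.

  The queue module defines three queue classes:

    1. Queue([maxsize]): a FIFO (first in, first out) queue. maxsize is the maximum number of items it can hold; if it is omitted or 0, the queue is unbounded.

    2. LifoQueue([maxsize]): a LIFO (last in, first out) queue, also known as a stack.

    3. PriorityQueue([maxsize]): a priority queue. Items are retrieved in order of priority from the lowest number to the highest, and are stored as (priority, data) tuples where priority is a number.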

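  Methods of a queue instance q:

q.qsize(): # returns the approximate size of the queue
q.empty(): # returns True if the queue is empty
q.full(): # returns True if the queue is full
q.put(item [, block [, timeout]]): # puts item into the queue; when block is true (the default), the caller blocks until a free slot is available
q.put_nowait(item): # equivalent to q.put(item, False); raises queue.Full instead of blocking
q.get([block [, timeout]]): # removes an item from the queue and returns it
q.get_nowait(): # equivalent to q.get(False); raises queue.Empty instead of blocking
q.task_done(): # called by a consumer to indicate that processing of an item has finished
q.join(): # blocks until every item in the queue has been retrieved and processed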
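
The non-blocking variants in the list above signal failure with exceptions rather than return values. A short sketch of that behaviour on a queue with a single slot:

```python
import queue

q = queue.Queue(maxsize=1)
q.put_nowait("a")               # fills the only slot

try:
    q.put_nowait("b")           # no free slot: raises instead of blocking
    put_rejected = False
except queue.Full:
    put_rejected = True

first = q.get_nowait()          # "a"

try:
    q.get_nowait()              # queue is empty: raises instead of blocking
    get_rejected = False
except queue.Empty:
    get_rejected = True

try:
    q.get(timeout=0.05)         # a blocking get gives up after the timeout
    timed_out = False
except queue.Empty:
    timed_out = True

print(put_rejected, first, get_rejected, timed_out)  # True a True True
```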

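Examples:

(first in, first out)

import queue

q = queue.Queue(2)
print(q.empty())   # True: nothing has been put yet
q.put(11)
q.put(22)
print(q.empty())   # False
print(q.qsize())   # 2
print(q.get())     # 11
print(q.get())     # 22
q.put(33, block=False)             # does not block; raises queue.Full if there is no free slot
q.put(33, block=False, timeout=2)  # timeout is ignored when block is False
print(q.get(timeout=2))            # 33; waits at most 2 seconds for an item

q = queue.Queue(5)

q.put(123)
q.put(456)
print(q.get())
q.task_done()      # mark the item just retrieved as processed
print(q.get())
q.task_done()
q.join()           # returns immediately: every item has been marked done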

# queue.LifoQueue: last in, first out queue
import queue

q = queue.LifoQueue()
q.put(123)
q.put(456)
print(q.get())   # 456

# queue.PriorityQueue: priority queue
import queue

q = queue.PriorityQueue()
q.put((8, 'hong'))
q.put((2, 345))
q.put((3, 678))
print(q.get())   # (2, 345): the lowest priority number comes out first

# collections.deque: double-ended queue
from collections import deque

q = deque()
q.append(123)
q.append(333)
q.appendleft(555)

print(q.pop())       # 333, taken from the right end
print(q.popleft())   # 555, taken from the left end

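  The producer-consumer model:

     The producer's job is to produce a chunk of data and put it into a buffer, over and over. Meanwhile the consumer uses up that data (for example by removing it from the buffer), one chunk at a time. The key word is "meanwhile": producers and consumers run concurrently, so they need to live in separate threads.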

import queue
import threading
import time

q = queue.Queue()

def productor(arg):
    """
    Buy a ticket.
    :param arg: sequence number of the buyer
    :return: None
    """
    q.put(str(arg) + ' - buy ticket')

# Start 20 producer threads; each one puts a single item on the queue.
for i in range(20):
    t = threading.Thread(target=productor, args=(i,))
    t.start()
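
The listing above only contains the producer side. A possible way to complete the picture is sketched below: two consumer threads drain the queue, and a sentinel value tells them when to stop. The sentinel approach, the bounded queue size and the names consumer, consumed and SENTINEL are assumptions of this sketch, not part of the original code.

```python
import queue
import threading

q = queue.Queue(maxsize=5)       # bounded buffer: producers block when it is full
SENTINEL = None                  # special item that tells a consumer to stop
consumed = []
consumed_lock = threading.Lock()

def producer(count):
    for i in range(count):
        q.put("%d - buy ticket" % i)

def consumer():
    while True:
        item = q.get()
        try:
            if item is SENTINEL:
                return
            with consumed_lock:
                consumed.append(item)
        finally:
            q.task_done()        # runs for every item, including the sentinel

consumers = [threading.Thread(target=consumer) for _ in range(2)]
for c in consumers:
    c.start()

p = threading.Thread(target=producer, args=(20,))
p.start()
p.join()

for _ in consumers:              # one sentinel per consumer
    q.put(SENTINEL)
q.join()                         # every item has been marked done
for c in consumers:
    c.join()

print(len(consumed))             # 20
```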

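  (2) Processes:

      A process is one execution of a program. Each process has its own address space, memory, and data stack. When a process is created the kernel allocates resources to it and keeps adjusting them while the process is alive; memory is one example, since a new process takes up some memory as soon as it is created. When the process ends its resources are released so that other processes can use them. A process can be thought of as a container that holds more or fewer resources. Processes do not share their data directly; they have to exchange it through inter-process communication.

  Working with processes means using the multiprocessing module, and almost everything in that module is about processes.

      The class that runs a process:

      Process([target [, name [, args [, kwargs]]]])

      target: the callable object to execute when the process starts

      name: a descriptive name for the process, as a string

      args: positional arguments, as a tuple

      kwargs: keyword arguments, as a dictionary

      This constructor only builds the Process object; the process itself starts when start() is called.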

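  Methods of a Process instance p:

p.is_alive() # returns True if p is still running
p.join([timeout]) # waits for process p to terminate; timeout is an optional time limit. A process can be joined any number of times, but joining itself raises an error
p.run() # the method that runs when the process starts; by default it calls target
p.start() # starts the process: launches the child process and calls p.run() in it
p.terminate() # forcibly terminates the process. p is stopped at once without any cleanup, so use it with care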

Single-process example:

import multiprocessing
import time

def clock(interval):
    # Print the current time every `interval` seconds, forever.
    while True:
        print("The time is %s" % time.ctime())
        time.sleep(interval)

if __name__ == '__main__':
    p = multiprocessing.Process(target=clock, args=(5,))
    p.start()

Output:

The time is Fri Jul 22 17:15:45 2016
The time is Fri Jul 22 17:15:50 2016
The time is Fri Jul 22 17:15:55 2016

The same process can be defined as a class that inherits from Process. For cross-platform portability, processes must be created by the main program, which is why the code that starts them sits under the if __name__ == '__main__': guard.

import multiprocessing
import time

class ClockProcess(multiprocessing.Process):
    def __init__(self, interval):
        super().__init__()
        self.interval = interval

    def run(self):
        # run() is what the child process executes after start() is called.
        while True:
            print("The time is %s" % time.ctime())
            time.sleep(self.interval)

if __name__ == '__main__':
    p = ClockProcess(5)
    p.start()

Output:

The time is Fri Jul 22 17:25:08 2016
The time is Fri Jul 22 17:25:13 2016
The time is Fri Jul 22 17:25:18 2016

Process locks:

     When several processes need to access a shared resource, a Lock can be used to prevent conflicting access.

import multiprocessing

def worker_with(lock, f):
    # The with block acquires the lock and releases it on exit.
    with lock:
        fs = open(f, "a+")
        fs.write('Lock acquired via \n')
        fs.close()

def worker_no_with(lock, f):
    # Same thing with explicit acquire()/release().
    lock.acquire()
    try:
        fs = open(f, "a+")
        fs.write('Lock acquired directly\n')
        fs.close()
    finally:
        lock.release()

if __name__ == "__main__":

    f = "file.txt"

    lock = multiprocessing.Lock()
    w = multiprocessing.Process(target=worker_with, args=(lock, f))
    nw = multiprocessing.Process(target=worker_no_with, args=(lock, f))

    w.start()
    nw.start()

    w.join()
    nw.join()

# cat file.txt

Lock acquired directly
Lock acquired via
Note: if the two processes did not use the lock to synchronize, their writes to the same file could end up interleaved.
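
The same idea applies to shared memory as well as files. The sketch below guards a shared counter with a multiprocessing.Lock; the counter is a multiprocessing.Value, which is an assumption of this example (the original only locks file writes), and add_many and main are hypothetical names.

```python
import multiprocessing

def add_many(counter, lock, times):
    # counter.value += 1 is a read-modify-write, so it needs the lock.
    for _ in range(times):
        with lock:
            counter.value += 1

def main():
    lock = multiprocessing.Lock()
    # lock=False gives a raw shared integer; we do the locking ourselves.
    counter = multiprocessing.Value("i", 0, lock=False)
    procs = [
        multiprocessing.Process(target=add_many, args=(counter, lock, 1000))
        for _ in range(4)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    print(main())   # 4 processes x 1000 increments each
```

Without the with lock: line, increments from different processes could overwrite each other and the final total could come out lower than expected.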

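  Process pools:

     A process pool maintains a set of worker processes. To run something, you take a process from the pool; if no process in the pool is free, the program waits until one becomes available.

    Creating a process pool:

        Pool([processes [, initializer [, initargs]]])

        processes: the number of worker processes to create

        initializer: a callable that every worker process runs when it starts; defaults to None

        initargs: a tuple of arguments passed to initializer

    Methods of a Pool instance p:

p.apply(func [, args [, kwargs]]) # runs func(*args, **kwargs) in one pool worker and returns the result; the call blocks, so tasks submitted this way do not run in parallel (use apply_async for that)
p.apply_async(func [, args [, kwargs [, callback]]]) # runs func(*args, **kwargs) asynchronously in a pool worker and returns an AsyncResult; when the result is ready it is passed to callback
p.terminate() # stops the worker processes immediately
p.close() # closes the pool: no more tasks can be submitted
p.join() # waits for all worker processes to exit; close() or terminate() must be called first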

Example:

from multiprocessing import Pool
import time

def f1(arg):
    time.sleep(3)
    print(arg)

if __name__ == '__main__':
    pool = Pool(5)  # run up to 5 calls concurrently

    for i in range(15):
        # pool.apply(func=f1, args=(i,))      # blocks, so the calls run one at a time
        pool.apply_async(func=f1, args=(i,))  # calls run concurrently

    pool.close()  # no more tasks will be submitted; queued tasks still run
    time.sleep(3)  # not required: join() below already waits for the workers
    # pool.terminate()  # would stop the workers immediately
    pool.join()
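
The example above only prints from inside the workers. To get return values back, apply_async() hands out AsyncResult objects whose get() method blocks until the value is ready; map() does the same for a whole iterable. A minimal sketch, assuming a made-up square function:

```python
from multiprocessing import Pool

def square(x):
    return x * x

def main():
    with Pool(processes=3) as pool:
        # apply_async() returns immediately with an AsyncResult handle.
        pending = [pool.apply_async(square, (i,)) for i in range(5)]
        values = [r.get(timeout=10) for r in pending]
        # map() splits the iterable across the workers and keeps the order.
        mapped = pool.map(square, range(5))
    return values, mapped

if __name__ == "__main__":
    print(main())
```

All results are fetched inside the with block because leaving it terminates the pool.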

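  Sharing data between processes:

      Processes are normally completely isolated from one another. Shared data objects make the same data accessible from several processes.

   There are two ways to share data between processes: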

# Method 1: Array

from multiprocessing import Process
from multiprocessing import Array

def foo(i, arg):
    # Each process writes one slot of the shared array, then prints all of it.
    arg[i] = i + 100
    for item in arg:
        print(item)
    print('============')

if __name__ == '__main__':
    li = Array('i', 10)  # shared array of 10 C ints, initialized to 0
    for i in range(10):
        p = Process(target=foo, args=(i, li,))
        p.start()

# Method 2: sharing data with Manager().dict()

from multiprocessing import Process
from multiprocessing import Manager

def foo(i, arg):
    arg[i] = i + 100
    print(arg.values())

if __name__ == '__main__':
    obj = Manager()
    li = obj.dict()  # dictionary proxy shared through the manager process
    procs = []
    for i in range(10):
        p = Process(target=foo, args=(i, li,))
        p.start()
        procs.append(p)
    # Wait for the children; the manager must stay alive while they use the dict.
    for p in procs:
        p.join()

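  Thread locks (Lock, RLock):

    Threads are scheduled unpredictably, and a thread may be switched out after executing only a few instructions. When several threads modify the same data at the same time, the data can end up corrupted. Thread locks solve this: only one thread at a time may execute the protected operation.

    Lock(): creates a new Lock object, initially unlocked

    RLock(): creates a reentrant lock, which the thread that holds it may acquire again

    lock.acquire([blocking]): acquires the lock

    lock.release(): releases the lock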

The example below limits concurrency with a BoundedSemaphore: at most two threads run the protected section at once.

import threading
import time

def run(n):
    # The with block acquires the semaphore and releases it on exit.
    with semaphore:
        time.sleep(1)
        print("run the thread: %s" % n)

if __name__ == '__main__':
    semaphore = threading.BoundedSemaphore(2)  # at most 2 threads run at the same time
    for i in range(5):
        t = threading.Thread(target=run, args=(i,))
        t.start()

Output:

run the thread: 1
run the thread: 0
run the thread: 3
run the thread: 2
run the thread: 4
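
For the plain Lock described in this section, a sketch along the following lines may help: four threads increment a shared counter, and the lock makes each increment atomic. The last few lines show what sets RLock apart. The names counter, add_many and nested_ok are chosen for this illustration.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        with lock:           # only one thread at a time runs the increment
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)               # 40000

# An RLock may be acquired again by the thread that already holds it;
# doing the same with a plain Lock would block forever.
rlock = threading.RLock()
with rlock:
    with rlock:
        nested_ok = True
print(nested_ok)             # True
```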

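  (3) Coroutines:

      A coroutine can be seen as a user-space thread: a single thread is split into several "micro-threads".

      Python offers basic coroutine support through yield, but it is incomplete. The third-party gevent library provides fairly complete coroutine support for Python.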
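
To give a feel for what yield-based coroutines look like, here is a minimal sketch using only the standard library: each generator yields to hand control back, and a tiny round-robin loop resumes them in turn. The scheduler run_round_robin and the worker generator are hypothetical helpers written for this illustration, not a standard API.

```python
from collections import deque

def worker(name, n):
    for i in range(n):
        # yield pauses this "micro-thread" and hands control to the scheduler.
        yield (name, i)

def run_round_robin(tasks):
    # Resume each generator in turn until all of them are exhausted.
    order = []
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            order.append(next(task))
        except StopIteration:
            continue             # this task is finished; drop it
        ready.append(task)       # otherwise put it at the back of the line
    return order

order = run_round_robin([worker("a", 2), worker("b", 2)])
print(order)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Everything runs in one thread; the switching happens only at the yield points, which is the part gevent automates around I/O.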

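      gevent is a third-party library that implements coroutines on top of greenlet. The basic idea is:

          When a greenlet hits an I/O operation, such as a network request, gevent automatically switches to another greenlet, and switches back at a suitable moment once the I/O has finished. I/O is slow and often leaves a program waiting, so with gevent switching coroutines for us there is always a greenlet running instead of waiting for I/O.

Because the switch happens automatically during I/O, gevent has to modify some of Python's standard library. This is done at startup through a monkey patch: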

from gevent import monkey; monkey.patch_socket()
import gevent

def f(n):
    for i in range(n):
        print(gevent.getcurrent(), i)

g1 = gevent.spawn(f, 5)
g2 = gevent.spawn(f, 5)
g3 = gevent.spawn(f, 5)
g1.join()
g2.join()
g3.join()

Output:

<Greenlet at 0x10e49f550: f(5)> 0
<Greenlet at 0x10e49f550: f(5)> 1
<Greenlet at 0x10e49f550: f(5)> 2
<Greenlet at 0x10e49f550: f(5)> 3
<Greenlet at 0x10e49f550: f(5)> 4
<Greenlet at 0x10e49f910: f(5)> 0
<Greenlet at 0x10e49f910: f(5)> 1
<Greenlet at 0x10e49f910: f(5)> 2
<Greenlet at 0x10e49f910: f(5)> 3
<Greenlet at 0x10e49f910: f(5)> 4
<Greenlet at 0x10e49f4b0: f(5)> 0
<Greenlet at 0x10e49f4b0: f(5)> 1
<Greenlet at 0x10e49f4b0: f(5)> 2
<Greenlet at 0x10e49f4b0: f(5)> 3
<Greenlet at 0x10e49f4b0: f(5)> 4

As the output shows, the three greenlets run one after another rather than taking turns.

To make the greenlets alternate, hand over control with gevent.sleep():

def f(n):
    for i in range(n):
        print(gevent.getcurrent(), i)
        gevent.sleep(0)  # yield to the other greenlets

Output:

<Greenlet at 0x10cd58550: f(5)> 0
<Greenlet at 0x10cd58910: f(5)> 0
<Greenlet at 0x10cd584b0: f(5)> 0
<Greenlet at 0x10cd58550: f(5)> 1
<Greenlet at 0x10cd584b0: f(5)> 1
<Greenlet at 0x10cd58910: f(5)> 1
<Greenlet at 0x10cd58550: f(5)> 2
<Greenlet at 0x10cd58910: f(5)> 2
<Greenlet at 0x10cd584b0: f(5)> 2
<Greenlet at 0x10cd58550: f(5)> 3
<Greenlet at 0x10cd584b0: f(5)> 3
<Greenlet at 0x10cd58910: f(5)> 3
<Greenlet at 0x10cd58550: f(5)> 4
<Greenlet at 0x10cd58910: f(5)> 4
<Greenlet at 0x10cd584b0: f(5)> 4

Now the three greenlets take turns.

If you raise the loop count to 500000 so that they run for longer and then look at the program in the operating system's process manager, you will see that it has only one thread.

In real code, of course, we do not switch coroutines with gevent.sleep(); gevent switches automatically whenever an I/O operation is reached, as in the following code:

from gevent import monkey; monkey.patch_all()
import gevent
from urllib.request import urlopen

def f(url):
    print('GET: %s' % url)
    resp = urlopen(url)   # gevent switches to another greenlet while this waits
    data = resp.read()
    print('%d bytes received from %s.' % (len(data), url))

gevent.joinall([
        gevent.spawn(f, 'https://www.python.org/'),
        gevent.spawn(f, 'https://www.yahoo.com/'),
        gevent.spawn(f, 'https://github.com/'),
])

Output:

GET: https://www.python.org/
GET: https://www.yahoo.com/
GET: https://github.com/
45661 bytes received from https://www.python.org/.
14823 bytes received from https://github.com/.
304034 bytes received from https://www.yahoo.com/.

The output shows that the three network operations ran concurrently and finished in a different order from the one they started in, yet there is only one thread.

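  (4) Caching

      memcache:

        Download the Python client: wget http://ftp.tummy.com/pub/python-memcached/old-releases/python-memcached-1.54.tar.gz (substitute the latest version yourself)

        Unpack: tar -zxvf python-memcached-1.54.tar.gz

        Install: python setup.py install

        Start the memcached server (installed separately): memcached -d -m 10 -u root -l 127.0.0.1 -p 11511 -c 256 -P /tmp/memcached.pid

Options:

-d runs memcached as a daemon

-m is the amount of memory given to memcached, in MB

-u is the user that memcached runs as

-l is the IP address the server listens on

-p is the port memcached listens on, preferably above 1024

-c is the maximum number of concurrent connections; the default is 1024, so set it according to the load on your server

-P is the file in which memcached saves its pid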

Code:

import memcache

class MemcachedClient():
    ''' Example of a python-memcached client wrapper '''

    def __init__(self, hostList):
        self.__mc = memcache.Client(hostList)

    def set(self, key, value):
        # Store value under key; returns a true value on success.
        result = self.__mc.set(key, value)
        return result

    def get(self, key):
        # Return the cached value, or None if the key is missing.
        name = self.__mc.get(key)
        return name

    def delete(self, key):
        result = self.__mc.delete(key)
        return result

if __name__ == '__main__':
    mc = MemcachedClient(["127.0.0.1:11511", "127.0.0.1:11512"])
    key = "name"
    result = mc.set(key, "hongfei")
    print("set result:", result)
    name = mc.get(key)
    print("get result:", name)
    result = mc.delete(key)
    print("delete result:", result)

Output:

set result: True
get result: hongfei
delete result: 1

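Apologies: this was written in a bit of a hurry, so it is not very detailed and somewhat disorganized. I will fill in the gaps and tidy it up over time. Thanks for reading.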