Processes and Threads

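Processes and threads are basic operating-system concepts. A computer consists of hardware and software. The CPU is the core of the hardware and carries out all of the computer's computation.
The operating system is software that runs on the hardware and manages the computer: it is responsible for managing and allocating resources and for scheduling tasks.
A program is software that runs on the system and provides some function, such as a browser or a music player.
Each time a program is executed it does some work (a browser, for example, opens web pages for us). To keep each execution independent, the system needs a dedicated data structure that manages and controls the running program: the process control block (PCB).
A process is one dynamic execution of a program over a data set. It generally consists of three parts: the program, the data set, and the process control block.
The program we write describes what the process should do and how. The data set is the resources the program uses while it runs. The process control block records the external characteristics of the process and describes how its execution changes over time; the system uses it to control and manage the process, and it is the only marker by which the system knows the process exists.
A thread is the smallest unit that the operating system can schedule. It is contained in a process and is the unit that actually does the work inside the process.
A thread is a single sequential flow of control within a process.
A process can run several threads concurrently, each carrying out a different task.
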
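1. A process is inert: it never executes anything itself and is only a container for threads. For a process to do anything, it must have a thread running in its context; that thread executes the code contained in the process's address space.
2. When a process is created, the operating system automatically creates its first thread, called the main thread. That thread can then create other threads.
3. Relationship between threads and processes: a thread belongs to a process and runs in the process's address space. Threads created by the same process share the same memory, and when the process exits, all of its threads are forcibly cleaned up and terminated. A thread shares all the resources owned by its process with the other threads of that process, but owns almost no system resources itself, only the few things it needs in order to run (such as a program counter, a set of registers, and a stack).
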
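The points above can be checked with a small sketch. It assumes nothing beyond the standard library; the names shared and worker are made up for illustration. Every thread appends to the same list and reports the same process ID, which is what "threads live inside one process and share its memory" means in practice.

```python
import os
import threading

shared = []  # lives in the process's memory, visible to every thread

def worker(label):
    # Each thread records its label together with the PID it sees.
    shared.append((label, os.getpid()))

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

labels = sorted(label for label, _ in shared)
pids = {pid for _, pid in shared}
print(labels)                   # ['t0', 't1', 't2']
print(pids == {os.getpid()})    # True: all threads belong to this process
```
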
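Differences between threads and processes:
Process: a collection of resources managed as a unit.
Thread: the smallest unit the operating system schedules; a sequence of instructions.

The first thread in a process is the main thread. The main thread can create other threads, those threads can create threads too, and all threads are peers.
Processes have parent and child processes, separate memory spaces, and a unique process identifier (PID).

Starting a thread is faster than starting a process. Once running, a process and a thread execute at the same speed, so there is nothing to compare there.
Threads share memory; each process has its own memory.

When a parent process creates a child (for example with fork on Unix-like systems), the child gets what amounts to a clone of the parent's memory space. Processes cannot access each other's memory directly.
Creating a new thread is simple; creating a new process requires cloning its parent.
A thread can control and operate on the other threads of the same process, but a process can only operate on its child processes.

Threads of the same process can communicate with each other directly.
Two processes that want to communicate must go through an intermediary, that is, an inter-process communication mechanism such as a pipe or a queue.
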
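A rough way to see the memory difference is to let a thread and a child process each increment the same global. This is only a sketch: counter, increment, run_in_thread and run_in_process are illustrative names, not anything from the notes above.

```python
import multiprocessing
import threading

counter = 0

def increment():
    global counter
    counter += 1

def run_in_thread():
    # The thread shares this process's memory, so the change is visible here.
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    return counter

def run_in_process():
    # The child increments its own copy; this process's counter is untouched.
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    return counter

if __name__ == '__main__':
    print('after thread: ', run_in_thread())    # 1
    print('after process:', run_in_process())   # still 1
```

The child process does run increment(), but on its own copy of counter, so the parent never sees the change unless the value is sent back through some IPC channel.
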
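Properties of a process:
Dynamic: a process is in essence one execution of a program; processes are created and destroyed dynamically.
Concurrent: any process can execute concurrently with other processes.
Independent: a process is a basic unit that can run on its own, and it is also the independent unit to which the system allocates resources and which it schedules.
Asynchronous: each process advances at its own independent, unpredictable speed.

A process consists of three parts: the program, the data, and the process control block.

There are three ways to implement multitasking:
Multiple processes
Multiple threads
Multiple processes combined with multiple threads
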
# Calling Thread directly with a target function
import threading
import time

def run(n):
    # time.sleep(2)
    print('task', n)

t1 = threading.Thread(target=run, args=('t1',))
t2 = threading.Thread(target=run, args=('t2',))
t1.start()
t2.start()

# task t1
# task t2

Calling by subclassing Thread
import threading

class MyThread(threading.Thread):
    def __init__(self, n):
        super().__init__()
        self.n = n

    def run(self):
        # start() invokes run(), so the overridden method must have exactly this name
        print('In this style the method must be named run:', self.n)

t1 = MyThread('t1')
t2 = MyThread('t2')
t1.start()
t2.start()

# In this style the method must be named run: t1
# In this style the method must be named run: t2

Timing the tasks with conventional sequential code
import time

def task1():
    time.sleep(5)
    print('Task 1 done', time.ctime())

def task2():
    time.sleep(5)
    print('Task 2 done', time.ctime())

print('Time before the tasks start', time.ctime())
task1()
task2()
print('Finished, end time', time.ctime())

# Time before the tasks start Sun Mar 10 08:21:49 2019
# Task 1 done Sun Mar 10 08:21:54 2019
# Task 2 done Sun Mar 10 08:21:59 2019
# Finished, end time Sun Mar 10 08:21:59 2019

Running the tasks concurrently in threads and checking how long the same kind of work takes
import time
import threading

def task1():
    time.sleep(3)
    print('Task 1 done', time.ctime())

def task2():
    time.sleep(3)
    print('Task 2 done', time.ctime())

print('Time before the tasks start', time.ctime())
t1 = threading.Thread(target=task1)
t2 = threading.Thread(target=task2)
t1.start()
t2.start()
t1.join()  # wait for both threads before continuing
t2.join()
print('All tasks finished')

# Both threads wake at the same moment, so their output can interleave:
# Time before the tasks start Sun Mar 10 08:28:44 2019
# Task 2 done Task 1 done Sun Mar 10 08:28:47 2019
# Sun Mar 10 08:28:47 2019
# All tasks finished

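To compare the two approaches on identical work, the elapsed time can be measured directly instead of read off timestamps. The sketch below is an illustration only: task, run_sequential and run_threaded are made-up names, and the 0.2-second sleep stands in for any I/O-bound job.

```python
import threading
import time

def task(seconds):
    # Stand-in for an I/O-bound job; sleeping lets other threads run.
    time.sleep(seconds)

def run_sequential(n, seconds):
    start = time.perf_counter()
    for _ in range(n):
        task(seconds)
    return time.perf_counter() - start

def run_threaded(n, seconds):
    start = time.perf_counter()
    threads = [threading.Thread(target=task, args=(seconds,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

sequential = run_sequential(3, 0.2)
threaded = run_threaded(3, 0.2)
print(f'sequential: {sequential:.2f}s, threaded: {threaded:.2f}s')
```

The sequential run takes roughly the sum of the sleeps, while the threaded run takes roughly the longest single sleep, because the waits overlap.
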
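Every process starts one thread by default, which we call the main thread, and the main thread can start new threads. Python's threading module has a current_thread() function that always returns the instance of the current thread.
The main thread's instance is named MainThread. A child thread's name is given when the thread instance is created; here the child thread is named hello.
The name is only used for display when printing and has no other meaning. If no name is given, Python names threads Thread-1, Thread-2, ... automatically (since Python 3.10 the target function's name is appended, for example Thread-1 (loop)).
import time
import threading

def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n += 1
        print('thread %s>>>%s' % (threading.current_thread().name, n))
        time.sleep(2)
    print('thread %s ended' % threading.current_thread().name)

print('main thread %s is running...' % threading.current_thread().name)
# name the thread hello
t = threading.Thread(target=loop, name='hello')
t.start()
t.join()
print('main thread %s ended' % threading.current_thread().name)

# main thread MainThread is running...
# thread hello is running...
# thread hello>>>1
# thread hello>>>2
# thread hello>>>3
# thread hello>>>4
# thread hello>>>5
# thread hello ended
# main thread MainThread ended
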
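As a small complement, threading.main_thread() can be used to tell whether a piece of code is running in the main thread. In this sketch the names seen, record and worker-1 are invented for the example.

```python
import threading

seen = {}

def record():
    # Store the current thread's name and whether it is the main thread.
    current = threading.current_thread()
    seen[current.name] = current is threading.main_thread()

named = threading.Thread(target=record, name='worker-1')
named.start()
named.join()
record()  # called directly, so it runs in whichever thread executes the script

print(seen)
```

When run as an ordinary script, the second entry is recorded under the name MainThread with the value True, while worker-1 maps to False.
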
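When a child thread is marked as a daemon thread (daemon = True, or the older setDaemon(True) call), the program exits as soon as the main thread and all other non-daemon threads have finished, and any daemon threads still running are stopped at that point.
A daemon thread may therefore be cut off before its task is complete. By default a thread is not a daemon (it inherits the setting from the thread that creates it, and the main thread is not a daemon).
Marking a thread as a daemon means the thread is not important: when the process exits, it does not wait for that thread to finish.
import time
import threading

def task():
    print('start fun', time.ctime())
    time.sleep(2)
    print('end fun', time.ctime())

t1 = threading.Thread(target=task)
print('thread name', t1.name, time.ctime())  # show the thread instance's name
t1.daemon = True  # mark t1 as a daemon thread; must be set before start()
t1.start()
time.sleep(1)
print(threading.current_thread().name, time.ctime())  # the main thread ends here, so the daemon thread is stopped too

# thread name Thread-1 Sun Mar 10 16:58:14 2019
# start fun Sun Mar 10 16:58:14 2019
# MainThread Sun Mar 10 16:58:15 2019
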
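The daemon flag can also be passed to the Thread constructor, and join() accepts a timeout, which makes the "do not wait for it" behaviour easy to observe. A minimal sketch, with background as a made-up function name:

```python
import threading
import time

def background():
    # Runs far longer than the main thread is willing to wait.
    time.sleep(60)

t = threading.Thread(target=background, daemon=True)
t.start()

t.join(timeout=0.1)  # wait briefly, then give up
still_running = t.is_alive()
print('daemon:', t.daemon, 'still running:', still_running)
# The script can exit here: a daemon thread does not keep the process alive.
```

With daemon=False the same script would only exit after the full 60 seconds, because the interpreter waits for non-daemon threads.
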
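The biggest difference between multiple processes and multiple threads is this: with multiple processes, each process has its own copy of every variable and the copies do not affect one another; with multiple threads, all global variables are shared by all threads.
Any shared variable can therefore be modified by any thread, so the greatest danger in sharing data between threads is that several threads modify a variable at the same time and corrupt its contents.
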
# Without a lock
import threading

balance = 0

def task(n):
    global balance
    # Neither statement is atomic: another thread can run between the read and the write.
    balance += n
    balance -= n

def task2(arg, n):
    while arg > 0:
        task(n)
        arg -= 1

t1 = threading.Thread(target=task2, args=(880000, 5))
t2 = threading.Thread(target=task2, args=(970000, 6))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)  # should be 0, but a race between the threads can leave another value

# With a lock
import threading

balance = 0
lock = threading.RLock()

def task(n):
    global balance
    balance += n
    balance -= n

def task2(arg, n):
    while arg > 0:
        lock.acquire()  # acquire the lock
        try:
            task(n)
        finally:
            lock.release()  # always release, even if task() raises
        arg -= 1

t1 = threading.Thread(target=task2, args=(1880000, 5))
t2 = threading.Thread(target=task2, args=(1770000, 6))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)  # 0

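When several threads call lock.acquire() at the same time, only one of them gets the lock and goes on to execute the code; the others wait until they can acquire it.
The thread holding the lock must release it when it is done; otherwise the waiting threads wait forever and become dead threads.
The benefit of a lock is that it guarantees a piece of code is executed from start to finish by only one thread at a time. There are drawbacks too. First, a lock prevents concurrent execution: code guarded by a lock effectively runs in single-threaded mode, which greatly reduces efficiency.
Second, because there can be several locks, threads that each hold one lock and try to acquire a lock held by another can deadlock: all of the threads involved hang, can neither run nor finish, and can only be ended by having the operating system kill the process.
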
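Two common habits reduce these risks: using the lock as a context manager so that it is always released, and always acquiring multiple locks in the same order. The sketch below illustrates both under assumed names (Account, transfer, many); it is one possible way to avoid the deadlock described above, not the only one.

```python
import threading

class Account:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always lock the two accounts in the same (name) order, whichever
    # direction the money flows, so two transfers can never each hold
    # one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda acc: acc.name)
    with first.lock:  # released automatically, even on error
        with second.lock:
            src.balance -= amount
            dst.balance += amount

def many(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)

a = Account('a', 1000)
b = Account('b', 1000)
t1 = threading.Thread(target=many, args=(a, b, 500))
t2 = threading.Thread(target=many, args=(b, a, 300))
t1.start()
t2.start()
t1.join()
t2.join()
print(a.balance, b.balance)  # 800 1200
```

If transfer() instead locked src first and dst second, the two threads could each grab one lock and wait forever for the other.
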
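In standard CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so multithreading cannot spread CPU-bound work across several cores; multiple processes can.
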
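A hedged sketch of the multi-process route for CPU-bound work, using concurrent.futures.ProcessPoolExecutor from the standard library. The function count_primes and the chosen limits are made up for the example; the actual speed-up depends on the machine.

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    # Deliberately CPU-bound: trial division, no I/O.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == '__main__':
    limits = [20000, 20000, 20000, 20000]
    # Each call runs in a separate worker process, so the calls can
    # execute on different CPU cores at the same time.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(count_primes, limits))
    print(results)
```

Running the same four calls in four threads would give the same results but, under the GIL, would not be expected to finish much faster than running them one after another.
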
# Without a lock
import time
import threading

globals_num = 0

def Func():
    global globals_num
    globals_num += 1
    time.sleep(1)
    print(globals_num)

for i in range(10):
    t = threading.Thread(target=Func)
    t.start()

# All ten threads increment before any of them prints, and the unsynchronised prints run together:
# 1010
#
# 10
# 101010
#
# 10
#
# 101010

# With a lock
import time
import threading

globals_num = 0
lock = threading.RLock()

def Func():
    global globals_num
    with lock:  # hold the lock for the increment, the sleep and the print
        globals_num += 1
        time.sleep(1)
        print(globals_num)

for i in range(10):
    t = threading.Thread(target=Func)
    t.start()

# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
# 10

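In a multithreaded program, each thread has its own data. It is better for a thread to use its own local variables than global variables, because a local variable is visible only to the thread itself and does not affect other threads, whereas modifications to global variables must be protected by a lock.

ThreadLocal is called a thread-local variable in many places and thread-local storage in others; the two mean much the same. It creates a separate copy of a variable for each thread, so each thread accesses its own copy. In Python it is provided by threading.local().
A typical use of ThreadLocal is to bind a database connection, an HTTP request, or user identity information to each thread, so that every handler function the thread calls can reach these resources easily.

In the example that follows, local_data has no x attribute in the new thread, and the assignment made in the new thread does not affect other threads.
If you uncomment local_data=Widgt(), local_data becomes an ordinary object shared between the threads.
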
import threading

class Widgt(object):
    pass

def test():
    local_data = threading.local()
    # local_data = Widgt()
    local_data.x = 1

    def thread_func():
        # hasattr(obj, name) reports whether the object has that attribute
        print('Has x in new thread:%s' % hasattr(local_data, 'x'))
        local_data.x = 2
        print('Has x in new thread:%s' % hasattr(local_data, 'x'))
        print('x in new thread is %s' % local_data.x)

    t = threading.Thread(target=thread_func)
    t.start()
    t.join()
    print('x in pre thread is %s' % local_data.x)

if __name__ == '__main__':
    test()

# Has x in new thread:False
# Has x in new thread:True
# x in new thread is 2
# x in pre thread is 1

Creating a global ThreadLocal object
import threading

local_school = threading.local()

def process_student():
    # get the student bound to the current thread
    std = local_school.student
    print('Hello,%s(in %s)' % (std, threading.current_thread().name))

def process_thread(name):
    # bind the student to this thread's ThreadLocal
    local_school.student = name
    process_student()

t1 = threading.Thread(target=process_thread, args=('Alice',), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Bob',), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()

# Hello,Alice(in Thread-A)
# Hello,Bob(in Thread-B)

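The "one database connection per thread" use mentioned earlier can be sketched as follows. FakeConnection is a hypothetical stand-in for a real connection object, and get_connection and handler are illustrative names; the only real API used is threading.local().

```python
import threading

_local = threading.local()

class FakeConnection:
    """Hypothetical stand-in for a real database connection."""
    def __init__(self, owner):
        self.owner = owner

def get_connection():
    # Create the connection on first use in this thread, then reuse it.
    conn = getattr(_local, 'conn', None)
    if conn is None:
        conn = FakeConnection(threading.current_thread().name)
        _local.conn = conn
    return conn

results = {}

def handler():
    first = get_connection()
    second = get_connection()
    # Record who owns the connection and whether it was reused.
    results[threading.current_thread().name] = (first.owner, first is second)

threads = [threading.Thread(target=handler, name=f'Thread-{c}') for c in 'AB']
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results.items()))
```

Each thread gets its own connection, and repeated calls inside the same thread return the same object, without passing the connection through every function call.
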
The multiprocessing module provides a Process class that represents a process object; multiprocessing is the cross-platform multi-process module.
import time
import multiprocessing

def add(number, value, lock):
    # number is passed by value: each child process works on its own copy
    lock.acquire()
    try:
        print('init add{0} number={1}'.format(value, number))
        for i in range(1, 6):
            number += value
            time.sleep(1)
            print('add{0} number={1}'.format(value, number))
    finally:
        lock.release()

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    number = 0
    p1 = multiprocessing.Process(target=add, args=(number, 1, lock))
    p2 = multiprocessing.Process(target=add, args=(number, 3, lock))
    p1.start()
    p2.start()
    print('main end')  # printed first: the parent does not wait for the children

# main end
# init add1 number=0
# add1 number=1
# add1 number=2
# add1 number=3
# add1 number=4
# add1 number=5
# init add3 number=0
# add3 number=3
# add3 number=6
# add3 number=9
# add3 number=12
# add3 number=15

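In the example above each child starts from its own number=0, because a plain integer is copied into each process. If the processes are meant to update one shared number, multiprocessing.Value is one option. A minimal sketch, assuming a shared C int and an illustrative add(shared, value, steps) signature:

```python
import multiprocessing

def add(shared, value, steps):
    for _ in range(steps):
        # get_lock() returns the lock that guards this Value.
        with shared.get_lock():
            shared.value += value

if __name__ == '__main__':
    number = multiprocessing.Value('i', 0)  # 'i' = C int, initial value 0
    p1 = multiprocessing.Process(target=add, args=(number, 1, 5))
    p2 = multiprocessing.Process(target=add, args=(number, 3, 5))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(number.value)  # 5*1 + 5*3 = 20
```

Here both children update the same memory, so the parent sees the combined total after joining them.
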
The following example starts a child process and waits for it to finish

from multiprocessing import Process
import os

# code executed by the child process
def run_proc(name):
    print('Run child process %s(%s)...' % (name, os.getpid()))

if __name__ == '__main__':
    print('Parent process %s.' % (os.getpid()))
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()  # start the process
    p.join()   # wait for the child to finish before continuing; commonly used to synchronise processes
    print('Child process end.')

# Parent process 6012.
# Child process will start.
# Run child process test(8724)...
# Child process end.

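The Pool class. When Python is used for system administration, especially when operating on many files and directories at once or controlling many remote hosts, running operations in parallel can save a lot of time.
If the number of targets is small, you can create the processes directly with the Process class. A dozen or so is manageable, but with hundreds or more, limiting the number of processes by hand becomes very tedious. This is where a process pool comes in.
Pool provides a specified number of worker processes for the caller to use. When a new task is submitted to the pool and a worker is free, that worker runs it. If all workers are busy, the task waits until one of them becomes free.
The main methods of the Pool class in the multiprocessing module are:
apply()
apply(func[,args=()[,kwds={}]]) calls func with the given arguments and blocks the calling process until the function returns. Because it blocks, apply_async is usually preferred. (Pool.apply still exists in Python 3; it is the built-in apply() function that was removed.)
apply_async()
apply_async(func[,args=()[,kwds={}[,callback=None]]]) is used like apply, but it does not block: it returns a result object and supports a callback that receives the result.
map()
map(func,iterable[,chunksize=None]) works much like the built-in map function and blocks until the results are ready. Note: although the second argument is an iterable, in practice the whole input must be available before the work is handed to the worker processes.
close() closes the pool so that it accepts no new tasks.
terminate() stops the worker processes immediately without finishing outstanding tasks.
join() blocks the calling process until the worker processes exit; it must be called after close() or terminate().
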
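A compact sketch of map and apply_async side by side. The function square and the list collected are made up for the example; Pool, map, apply_async, its callback argument and get(timeout=...) are the standard multiprocessing API.

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # map blocks until every result is ready and keeps the input order.
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]

        # apply_async returns at once; get() blocks for the result.
        pending = pool.apply_async(square, args=(7,))
        print(pending.get(timeout=10))  # 49

        # The callback runs in the parent process and receives the result.
        collected = []
        result = pool.apply_async(square, args=(8,), callback=collected.append)
        result.wait(timeout=10)
        print(collected)
```

Leaving the with block terminates the pool, which is fine here because every result has already been retrieved.
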
from multiprocessing import Pool
import os
import time
import random

def long_time_task(name):
    print('Run task %s(%s)...' % (name, os.getpid()), time.ctime())  # os.getpid() returns the current process's PID
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s ran for %0.2f seconds' % (name, (end - start)), time.ctime())

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid(), time.ctime())
    p = Pool(4)  # create a pool with 4 worker processes
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...', time.ctime())
    p.close()
    p.join()
    print('All subprocesses done.', time.ctime())

# Parent process 9704. Mon Mar 11 16:14:31 2019
# Waiting for all subprocesses done... Mon Mar 11 16:14:31 2019
# Run task 0(5832)... Mon Mar 11 16:14:31 2019
# Run task 1(7004)... Mon Mar 11 16:14:31 2019
# Run task 2(6956)... Mon Mar 11 16:14:31 2019
# Run task 3(1632)... Mon Mar 11 16:14:31 2019
# Task 3 ran for 1.00 seconds Mon Mar 11 16:14:32 2019
# Run task 4(1632)... Mon Mar 11 16:14:32 2019
# Task 1 ran for 1.98 seconds Mon Mar 11 16:14:33 2019
# Task 2 ran for 2.36 seconds Mon Mar 11 16:14:34 2019
# Task 0 ran for 2.98 seconds Mon Mar 11 16:14:34 2019
# Task 4 ran for 2.01 seconds Mon Mar 11 16:14:34 2019
# All subprocesses done. Mon Mar 11 16:14:35 2019

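Processes often need to communicate, and operating systems provide many mechanisms for inter-process communication. Python's multiprocessing module wraps the underlying mechanisms and provides Queue, Pipe and other ways to exchange data.
Taking Queue as the example, the parent process creates two child processes: one writes data to the Queue and the other reads data from it.
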
from multiprocessing import Process, Queue
import os
import time
import random

# code run by the writer process
def write(q):
    print('Process to write:%s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# code run by the reader process
def read(q):
    print('Process to read:%s' % os.getpid())
    while True:
        value = q.get(True)  # block until an item is available
        print('Get %s from queue.' % value)

if __name__ == '__main__':
    q = Queue()  # the parent creates the Queue and passes it to the children
    w = Process(target=write, args=(q,))
    r = Process(target=read, args=(q,))
    w.start()  # start the writer child process
    r.start()  # start the reader child process
    w.join()   # wait for the writer to finish
    r.terminate()  # the reader loops forever, so it has to be terminated

# Process to write:9048
# Put A to queue...
# Process to read:1580
# Get A from queue.
# Put B to queue...
# Get B from queue.
# Put C to queue...
# Get C from queue.

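Terminating the reader works, but it can cut the reader off in the middle of an item. A common alternative is to send a sentinel value that tells the reader to stop. This is a sketch of that idea; STOP is an assumed marker chosen for the example (None here), and write and read take slightly different arguments from the version above.

```python
from multiprocessing import Process, Queue

STOP = None  # sentinel chosen for this sketch; any agreed-upon marker works

def write(q, values):
    for value in values:
        q.put(value)
    q.put(STOP)  # tell the reader there is nothing more to come

def read(q):
    received = []
    while True:
        value = q.get()  # blocks until an item arrives
        if value is STOP:
            break
        received.append(value)
    print('Reader got', received)
    return received

if __name__ == '__main__':
    q = Queue()
    w = Process(target=write, args=(q, ['A', 'B', 'C']))
    r = Process(target=read, args=(q,))
    w.start()
    r.start()
    w.join()
    r.join()  # returns once the reader has seen the sentinel
```

With this pattern both processes exit on their own, so the parent can simply join them.
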
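queue.Queue in the Python standard library is a thread-safe queue (FIFO) implementation. It provides a first-in, first-out data structure suited to multithreaded programming and is used to pass information between producer and consumer threads.
Basic FIFO queue: class queue.Queue(maxsize=0)
FIFO means first in, first out. Queue provides a basic FIFO container and is simple to use. maxsize is an integer that sets the upper limit on the number of items the queue can hold.
Once the limit is reached, insertion blocks until items are consumed from the queue. If maxsize is less than or equal to 0, the queue size is unlimited.
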
import queue

q = queue.Queue()
for i in range(5):
    q.put(i)
while not q.empty():
    print(q.get())

# 0
# 1
# 2
# 3
# 4

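The blocking behaviour of maxsize shows up once a producer and a consumer run in separate threads. The sketch below uses a bounded queue of size 2 and a None sentinel; the names producer, consumer, consumed and full are made up for the example.

```python
import queue
import threading

q = queue.Queue(maxsize=2)  # at most two items waiting at a time
consumed = []

def producer():
    for i in range(5):
        q.put(i)   # blocks while the queue is full
    q.put(None)    # sentinel (an assumption of this sketch): no more items

def consumer():
    while True:
        item = q.get()  # blocks while the queue is empty
        if item is None:
            break
        consumed.append(item)

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start()
c.start()
p.join()
c.join()
print(consumed)  # [0, 1, 2, 3, 4]

# A full queue can also be probed without blocking:
full = queue.Queue(maxsize=1)
full.put('x')
try:
    full.put_nowait('y')
except queue.Full:
    print('queue is full')
```

The producer never gets more than two items ahead of the consumer, yet the items still arrive in FIFO order.
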
LIFO queue: last in, first out
class queue.LifoQueue(maxsize=0)

import queue

q = queue.LifoQueue(maxsize=0)
for i in range(5):
    q.put(i)
while not q.empty():
    print(q.get())

# 4
# 3
# 2
# 1
# 0