Python Parallelism

Purpose

A summary of Python's approaches to parallelism.

Categories:

Multithreading

Thread pool

Multiprocessing

Process pool

Coroutines

threading

https://docs.python.org/3/library/threading.html#module-threading

https://github.com/jackfrued/Python-100-Days/blob/master/Day01-15/13.%E8%BF%9B%E7%A8%8B%E5%92%8C%E7%BA%BF%E7%A8%8B.md

from random import randint
from threading import Thread
from time import time, sleep


def download(filename):
    print('Start downloading %s...' % filename)
    time_to_download = randint(5, 10)
    sleep(time_to_download)
    print('%s downloaded! Took %d seconds' % (filename, time_to_download))


def main():
    start = time()
    t1 = Thread(target=download, args=('Python从入门到住院.pdf',))
    t1.start()
    t2 = Thread(target=download, args=('Peking Hot.avi',))
    t2.start()
    t1.join()
    t2.join()
    end = time()
    print('Total time: %.3f seconds' % (end - start))


if __name__ == '__main__':
    main()

multiprocessing

https://docs.python.org/3/library/multiprocessing.html

https://github.com/jackfrued/Python-100-Days/blob/master/Day01-15/13.%E8%BF%9B%E7%A8%8B%E5%92%8C%E7%BA%BF%E7%A8%8B.md

from multiprocessing import Process
from os import getpid
from random import randint
from time import time, sleep


def download_task(filename):
    print('Starting download process, pid [%d].' % getpid())
    print('Start downloading %s...' % filename)
    time_to_download = randint(5, 10)
    sleep(time_to_download)
    print('%s downloaded! Took %d seconds' % (filename, time_to_download))


def main():
    start = time()
    p1 = Process(target=download_task, args=('Python从入门到住院.pdf', ))
    p1.start()
    p2 = Process(target=download_task, args=('Peking Hot.avi', ))
    p2.start()
    p1.join()
    p2.join()
    end = time()
    print('Total time: %.2f seconds.' % (end - start))


if __name__ == '__main__':
    main()

Thread pool

https://docs.python.org/3.4/library/concurrent.futures.html#concurrent.futures.Executor

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

Process pool

import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()

https://github.com/jackfrued/Python-100-Days/blob/master/Day01-15/13.%E8%BF%9B%E7%A8%8B%E5%92%8C%E7%BA%BF%E7%A8%8B.md

from multiprocessing import Process, Queue
from random import randint
from time import time


def task_handler(curr_list, result_queue):
    total = 0
    for number in curr_list:
        total += number
    result_queue.put(total)


def main():
    processes = []
    number_list = [x for x in range(1, 100000001)]
    result_queue = Queue()
    index = 0
    # Start 8 processes, each working on its own slice of the data
    for _ in range(8):
        p = Process(target=task_handler,
                    args=(number_list[index:index + 12500000], result_queue))
        index += 12500000
        processes.append(p)
        p.start()
    # Start timing how long it takes for all processes to finish
    start = time()
    for p in processes:
        p.join()
    # Merge the partial results
    total = 0
    while not result_queue.empty():
        total += result_queue.get()
    print(total)
    end = time()
    print('Execution time: ', (end - start), 's', sep='')


if __name__ == '__main__':
    main()

asyncio

https://docs.python.org/3/library/asyncio.html

import asyncio

async def main():
    print('Hello ...')
    await asyncio.sleep(1)
    print('... World!')

# Python 3.7+
asyncio.run(main())

asyncio is a library to write concurrent code using the async/await syntax.

asyncio is used as a foundation for multiple Python asynchronous frameworks that provide high-performance network and web-servers, database connection libraries, distributed task queues, etc.

asyncio is often a perfect fit for IO-bound and high-level structured network code.

asyncio provides a set of high-level APIs to run Python coroutines concurrently and have full control over their execution; perform network IO and IPC; control subprocesses; distribute tasks via queues; and synchronize concurrent code.

Additionally, there are low-level APIs for library and framework developers to create and manage event loops, implement efficient protocols using transports, and bridge callback-based libraries and code with async/await syntax.
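
As a minimal sketch of the high-level API, the snippet below runs several coroutines concurrently with asyncio.gather; the task names and delays are made up for illustration:

import asyncio
import time


async def fetch(name, delay):
    # Simulate an IO-bound operation (e.g. a network request) with sleep.
    await asyncio.sleep(delay)
    return '%s done after %ds' % (name, delay)


async def main():
    start = time.monotonic()
    # gather() schedules the coroutines concurrently and collects their results in order.
    results = await asyncio.gather(
        fetch('task-1', 1),
        fetch('task-2', 2),
        fetch('task-3', 3),
    )
    print(results)
    # Total time is close to the longest single delay (~3s), not the sum (6s).
    print('elapsed: %.2fs' % (time.monotonic() - start))


asyncio.run(main())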

Thread or process mode

https://docs.python.org/3/library/asyncio-eventloop.html

import asyncio
import concurrent.futures

def blocking_io():
    # File operations (such as logging) can block the
    # event loop: run them in a thread pool.
    with open('/dev/urandom', 'rb') as f:
        return f.read(100)

def cpu_bound():
    # CPU-bound operations will block the event loop:
    # in general it is preferable to run them in a
    # process pool.
    return sum(i * i for i in range(10 ** 7))

async def main():
    loop = asyncio.get_running_loop()

    ## Options:

    # 1. Run in the default loop's executor:
    result = await loop.run_in_executor(
        None, blocking_io)
    print('default thread pool', result)

    # 2. Run in a custom thread pool:
    with concurrent.futures.ThreadPoolExecutor() as pool:
        result = await loop.run_in_executor(
            pool, blocking_io)
        print('custom thread pool', result)

    # 3. Run in a custom process pool:
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(
            pool, cpu_bound)
        print('custom process pool', result)

asyncio.run(main())

Coroutine mode

https://www.cnblogs.com/callyblog/p/11220594.html

import asyncio

def callback(loop, i):
    print("success time {} {}".format(i, loop.time()))

async def get_html(url):
    print("start get url")
    await asyncio.sleep(1)
    print("end get url")


# Two ways to create the task
if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    # get_future = asyncio.ensure_future(get_html("http://www.imooc.com"))
    task = loop.create_task(get_html("http://www.imooc.com"))
    loop.run_until_complete(task)  # accepts a future-like object

High-level API (coroutines)

https://docs.python.org/3/library/asyncio-task.html#coroutine

import asyncio
import random
import time


async def worker(name, queue):
    while True:
        # Get a "work item" out of the queue.
        sleep_for = await queue.get()

        # Sleep for the "sleep_for" seconds.
        await asyncio.sleep(sleep_for)

        # Notify the queue that the "work item" has been processed.
        queue.task_done()

        print(f'{name} has slept for {sleep_for:.2f} seconds')


async def main():
    # Create a queue that we will use to store our "workload".
    queue = asyncio.Queue()

    # Generate random timings and put them into the queue.
    total_sleep_time = 0
    for _ in range(20):
        sleep_for = random.uniform(0.05, 1.0)
        total_sleep_time += sleep_for
        queue.put_nowait(sleep_for)

    # Create three worker tasks to process the queue concurrently.
    tasks = []
    for i in range(3):
        task = asyncio.create_task(worker(f'worker-{i}', queue))
        tasks.append(task)

    # Wait until the queue is fully processed.
    started_at = time.monotonic()
    await queue.join()
    total_slept_for = time.monotonic() - started_at

    # Cancel our worker tasks.
    for task in tasks:
        task.cancel()
    # Wait until all worker tasks are cancelled.
    await asyncio.gather(*tasks, return_exceptions=True)

    print('====')
    print(f'3 workers slept in parallel for {total_slept_for:.2f} seconds')
    print(f'total expected sleep time: {total_sleep_time:.2f} seconds')


asyncio.run(main())

Commentary

https://zhuanlan.zhihu.com/p/82123111

I have been doing some parallel programming recently, and the concepts of multithreading, multiprocessing, and multi-core can be confusing, so here is a summary:

  1. The number of physical CPU cores is the number of threads that can truly run in parallel (the CPU only sees threads; the thread is the smallest unit of CPU scheduling). With hyper-threading, the number of threads that can actually run in parallel is usually twice the number of physical cores, and this is also the core count the operating system reports. Since we only care about how many threads can run in parallel, "core count" below means the count seen by the OS, i.e. the logical (hyper-threaded) cores rather than the physical ones.
  2. A process is the smallest unit of OS resource allocation (memory, GPU, disk); a thread is the smallest unit of execution scheduling (the CPU schedules threads, not processes). A process can have one or more threads, and threads share the process's resources. This pattern reduces the cost of creating and destroying processes: keep a relatively small, stable set of processes and keep scheduling threads within them. If the machine has multiple cores and the total number of threads is no larger than the core count, threads can run in parallel on different cores. With a single core and multiple threads, the threads are not parallel but concurrent: to balance the load, the scheduler keeps switching threads on that core. Since one core runs only one thread at a time, concurrency makes tasks look parallel while actually adding thread-switching overhead. With multiple cores and more threads than cores, some threads still have to be switched in and out; the maximum parallelism is still the number of cores, so blindly adding threads does not make a program faster and only adds overhead.
  3. Tasks are either CPU-bound or IO-bound. Suppose a single process handles the task: for a CPU-bound task, using as many threads as there are cores is enough to saturate the CPU, and any more only adds overhead. For an IO-bound task (anything involving network or disk IO), threads spend much of their time blocked on IO, so that many threads cannot keep the CPU busy; using more threads raises CPU utilization.
  4. Parallel computation can be implemented in three ways: multithreading, multiprocessing, or multiprocessing combined with multithreading. With multiprocessing, each process has independent resources (its own address space and data), so communication must go through the OS, e.g. pipes, queues, or signals. With multithreading, threads share the process's address space and data, so one thread's data is directly available to the others; that convenience can also corrupt shared variables, so locks are needed to coordinate thread execution (see the first sketch after this list).
  5. In other languages, a multi-core CPU supports multiple threads executing at the same time. In Python (CPython), however, whether single-core or multi-core, only one thread in a process can execute at a time. The root cause is the GIL (Global Interpreter Lock), a decision made early in Python's design for the sake of data safety. A thread must acquire the GIL before it can run; think of the GIL as a "pass", and each Python process has exactly one. A thread without the pass is not allowed onto the CPU. This is why multithreading in Python is of limited use for CPU-bound work (see the second sketch after this list).
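
For point 4, here is a minimal sketch of protecting shared state with threading.Lock; the counter, iteration count, and thread count are arbitrary illustrative values:

from threading import Lock, Thread

counter = 0
lock = Lock()


def increment(times):
    global counter
    for _ in range(times):
        # Without the lock, the read-modify-write of `counter`
        # can interleave across threads and lose updates.
        with lock:
            counter += 1


threads = [Thread(target=increment, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; can be less without it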
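
For points 3 and 5, a rough sketch comparing threads and processes on the same CPU-bound function: because of the GIL, the threaded version gains little over serial execution, while the process pool can use multiple cores. The helper names and workload sizes are made up, and the timings depend on the machine:

import concurrent.futures
import time


def cpu_bound(n):
    # Pure-Python CPU-bound work: the GIL allows only one thread
    # per process to execute this at a time.
    return sum(i * i for i in range(n))


def timed(executor_cls, label):
    start = time.time()
    with executor_cls(max_workers=4) as executor:
        list(executor.map(cpu_bound, [2_000_000] * 4))
    print('%s: %.2fs' % (label, time.time() - start))


if __name__ == '__main__':
    timed(concurrent.futures.ThreadPoolExecutor, 'threads')     # close to serial speed
    timed(concurrent.futures.ProcessPoolExecutor, 'processes')  # uses multiple cores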

Python Parallel Programming

https://python-parallel-programmning-cookbook.readthedocs.io/zh_CN/latest/index.html

Original post: https://www.cnblogs.com/lightsong/p/13409929.html