Python 3: Stopping a Child Thread

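Recently our company's internal network has been unreliable and extremely slow, which often caused our face-detection program to hang while downloading images. So that data verification is not held up, I decided to abandon a download once it times out on a poor network and carry on with the rest of the program.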

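The approach I settled on is as follows:

  1. Run the image download in a thread.

  2. Set a timeout for the download.

  3. When the download times out, stop the download thread and continue with the following tasks.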
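The three steps above can be sketched without any network access. This is only an illustration: `slow_task` and `run_with_timeout` are hypothetical names, and `time.sleep` stands in for the real download. Note that `join(timeout)` only stops waiting; it does not stop the worker, which is the problem the rest of this post deals with.

```python
import threading
import time


def slow_task(seconds):
    """Hypothetical stand-in for the image download: it just sleeps."""
    time.sleep(seconds)


def run_with_timeout(seconds_needed, timeout):
    """Run slow_task in a thread and wait at most `timeout` seconds for it.

    Returns True if the task finished in time, False if we gave up waiting.
    """
    worker = threading.Thread(target=slow_task, args=(seconds_needed,))
    worker.daemon = True   # an abandoned worker must not block interpreter exit
    worker.start()
    worker.join(timeout)   # returns after `timeout` seconds at the latest
    return not worker.is_alive()


if __name__ == "__main__":
    print(run_with_timeout(0.1, 1.0))  # the task is fast enough
    print(run_with_timeout(1.0, 0.1))  # the task is too slow, so we move on
```

`join()` always returns `None`, so `is_alive()` is the way to tell whether the timeout was hit.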

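To make the download progress easy to observe, the examples below download with urlretrieve, whose callback can print the progress, rather than with requests (a requests version is given in the appendix).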

I. First, how a single download thread behaves

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# __author__:kzg

import threading
import time
from urllib.request import urlretrieve

def callbackinfo(down, block, size):
    '''
    Callback (reporthook):
    down: number of blocks transferred so far
    block: size of one block in bytes
    size: total size of the remote file
    '''
    per = 100.0 * (down * block) / size
    if per > 100:
        per = 100
    time.sleep(1)  # sleep 1 second
    print('%.2f%%' % per)

# Image download function
def downpic(url):
    urlretrieve(url, 'test.jpg', callbackinfo)

url = 'https://s1.tuchong.com/content-image/201909/98cac03c4a131754ce46d51faf597230.jpg'
# Run the download in a thread
t = threading.Thread(target=downpic, args=(url,))
t.start()
t.join(3)
print("down OK")


Output:
0.00%
1.51%
down OK
3.02%
4.52%
6.03%
……

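As the output shows, execution proceeds as follows:

  1. The image download function is handed to a thread.

  2. The thread is started.

  3. After three seconds the thread has still not finished, so join() stops blocking.

  4. print runs.

  5. The thread keeps running until the download completes.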

II. Daemon threads (daemon)

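A daemon thread does not keep the program alive: once every non-daemon thread (including the main thread) has finished, the interpreter exits and any daemon threads still running are terminated with it. A thread inherits its daemon flag from the thread that creates it, so a child thread started from a daemon thread is a daemon too.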

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# __author__:kzg

import threading
import time
from urllib.request import urlretrieve

def callbackinfo(down, block, size):
    '''
    Callback (reporthook):
    down: number of blocks transferred so far
    block: size of one block in bytes
    size: total size of the remote file
    '''
    per = 100.0 * (down * block) / size
    if per > 100:
        per = 100
    time.sleep(1)
    print('%.2f%%' % per)

def downpic(url):
    urlretrieve(url, 'test.jpg', callbackinfo)

def mainFunc(funcname, args):
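    '''
    :param funcname: function to run (the image download function)
    :param args: its argument (the URL)
    :return:
    '''
    t = threading.Thread(target=funcname, args=(args,))
    t.start()  # start the thread
    t.join(timeout=5)  # if the thread has not finished after 5 seconds, stop blocking and carry on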


url = 'https://s1.tuchong.com/content-image/201909/98cac03c4a131754ce46d51faf597230.jpg'

m = threading.Thread(target=mainFunc, args=(downpic, url))
m.daemon = True  # setDaemon() is deprecated since Python 3.10
m.start()
m.join()


Output:
0.00%
1.51%
3.02%
4.52%

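As the output shows:

  1. mainFunc is run in thread m.

  2. m is marked as a daemon thread.

  3. The daemon thread is started.

  4. After 5 seconds the child thread t inside mainFunc has still not finished, so:

     t.join() stops blocking and mainFunc returns, which ends m.

     m.join() returns and the main thread reaches the end of the script. Because t was created from a daemon thread it is itself a daemon, so it is terminated when the interpreter exits (the output shows only about 4 seconds of download).

     The image download is aborted.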
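The daemon flag of the inner thread can be checked directly. The sketch below uses hypothetical names (`inner`, `outer`, `inspect_daemon_inheritance`) and a short sleep in place of the download. It relies on the documented behaviour that a thread's initial `daemon` value is inherited from the thread that creates it.

```python
import threading
import time


def inner():
    """Hypothetical stand-in for the download function."""
    time.sleep(0.2)


def outer(result):
    """Plays the role of mainFunc: starts a child without setting daemon."""
    child = threading.Thread(target=inner)   # no daemon argument given
    result["child_daemon"] = child.daemon    # inherited from the creating thread
    child.start()
    child.join(timeout=0.05)                 # give up waiting early


def inspect_daemon_inheritance():
    """Return the daemon flag of a thread created inside a daemon thread."""
    result = {}
    m = threading.Thread(target=outer, args=(result,))
    m.daemon = True
    m.start()
    m.join()
    return result["child_daemon"]


if __name__ == "__main__":
    print(inspect_daemon_inheritance())
```

So in the script above it is the end of the main thread, not the end of m as such, that cuts the download short.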

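That should have been the end of it. In the real program, however, I found that after the child thread timed out the following code did start running, but the child thread had not exited and was still downloading. After a lot of digging, the problem turned out to lie with the for loop: daemon threads are only terminated when the whole process exits, so as long as the for loop keeps the main thread running, the child threads started inside it keep running too.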

III. The problem: the child thread is not stopped

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# __author__:kzg

import threading
import time
from urllib.request import urlretrieve

def callbackinfo(down, block, size):
    '''
    Callback (reporthook):
    down: number of blocks transferred so far
    block: size of one block in bytes
    size: total size of the remote file
    '''
    per = 100.0 * (down * block) / size
    if per > 100:
        per = 100
    time.sleep(1)
    print('%.2f%%' % per)

# Image download function
def downpic(url):
    urlretrieve(url, 'test.jpg', callbackinfo)

def mainFunc(funcname, args):
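    '''
    :param funcname: function to run (the image download function)
    :param args: its argument (the URL)
    :return:
    '''
    t = threading.Thread(target=funcname, args=(args,))
    t.start()  # start the thread
    t.join(timeout=5)  # if the thread has not finished after 5 seconds, stop blocking and carry on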

for i in range(2):
    if i == 0:
        url = 'https://s1.tuchong.com/content-image/201909/98cac03c4a131754ce46d51faf597230.jpg'
    else:
        break
    # Daemon thread
    m = threading.Thread(target=mainFunc, args=(downpic, url))
    m.daemon = True  # setDaemon() is deprecated since Python 3.10
    m.start()
    m.join()
    print(m.is_alive())
    time.sleep(100)  # sleep 100 seconds to simulate a for loop that does not finish

Output:
0.00%
1.51%
3.02%
4.52%
False
6.03%
7.54%
9.05%
10.55%

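The output shows that the daemon thread m ends after 5 seconds (is_alive() prints False), which I expected to shut down thread t as well. Yet the child thread t carries on downloading, because the main thread is still alive.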
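The same behaviour can be reproduced without a network. In this sketch (hypothetical names, short sleeps instead of a download) the supervisor thread gives up early and ends, but the worker keeps going for as long as the main thread stays alive.

```python
import threading
import time


def demo():
    """Return how many ticks the worker produced at timeout and at the end."""
    ticks = []

    def worker():
        # Stand-in for the download: five steps of 0.05 seconds each.
        for i in range(5):
            time.sleep(0.05)
            ticks.append(i)

    def supervisor():
        t = threading.Thread(target=worker)  # inherits daemon=True
        t.start()
        t.join(timeout=0.08)                 # give up waiting early

    m = threading.Thread(target=supervisor, daemon=True)
    m.start()
    m.join()                                 # the supervisor has ended here
    seen_at_timeout = len(ticks)
    time.sleep(1.0)                          # main thread stays alive, like sleep(100)
    return seen_at_timeout, len(ticks)


if __name__ == "__main__":
    print(demo())
```

The second number reaches 5 even though the supervisor ended long before, which matches the output above.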

So what can be done?

IV. The fix: forcibly stopping the child thread

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# __author__:kzg

import threading
import time
import inspect
import ctypes
from urllib.request import urlretrieve

def callbackinfo(down, block, size):
    '''
    Callback (reporthook):
    down: number of blocks transferred so far
    block: size of one block in bytes
    size: total size of the remote file
    '''
    per = 100.0 * (down * block) / size
    if per > 100:
        per = 100
    time.sleep(1)
    print('%.2f%%' % per)

# Image download function
def downpic(url):
    urlretrieve(url, 'test.jpg', callbackinfo)

def _async_raise(tid, exctype):
    """raises the exception, performs cleanup if needed"""
    tid = ctypes.c_long(tid)
    if not inspect.isclass(exctype):
        exctype = type(exctype)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res != 1:
        # """if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

def stop_thread(thread):
    _async_raise(thread.ident, SystemExit)

for i in range(2):
    if i == 0:
        url = 'https://s1.tuchong.com/content-image/201909/98cac03c4a131754ce46d51faf597230.jpg'
    else:
        break
    t = threading.Thread(target=downpic, args=(url,))
    t.start()
    t.join(5)
    print(t.is_alive())
    if t.is_alive():
        stop_thread(t)
    print("t is kill")
    time.sleep(100)

Output:
0.00%
1.51%
3.02%
4.52%
True
t is kill

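As the code shows:

  1. The wrapper function mainFunc has been removed.

  2. The child thread is created directly in the for loop.

  3. If the thread is still alive after the timeout, it is forcibly stopped.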
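One caveat with the forced approach: as far as I know, the exception set by PyThreadState_SetAsyncExc is only delivered while the target thread is executing Python bytecode, so a thread blocked inside a C call (a socket read, for example) may not stop straight away. Where the worker's loop is under your control, a cooperative alternative is to have it check a `threading.Event` between chunks. The sketch below is not the method used above; `chunked_worker` and `run_and_cancel` are hypothetical names, and a sleep stands in for fetching one chunk.

```python
import threading
import time


def chunked_worker(stop_event, processed):
    """Hypothetical stand-in for a chunked download loop."""
    for chunk_no in range(100):
        if stop_event.is_set():      # checked between chunks
            return
        time.sleep(0.02)             # pretend to fetch one chunk
        processed.append(chunk_no)


def run_and_cancel(timeout):
    """Start the worker, wait `timeout` seconds, then ask it to stop."""
    stop_event = threading.Event()
    processed = []
    t = threading.Thread(target=chunked_worker, args=(stop_event, processed))
    t.start()
    t.join(timeout)
    if t.is_alive():
        stop_event.set()             # request a stop
        t.join()                     # returns once the current chunk is done
    return t.is_alive(), len(processed)


if __name__ == "__main__":
    print(run_and_cancel(0.1))
```

The trade-off is that the worker only stops at the next check, so a single stalled chunk still has to be bounded by a socket-level timeout.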

Appendix: another way to test image downloading

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import requests
import time

def downpic(url):
    '''
    Download an image from a URL
    :param url: URL of the image
    :return: None (the image is saved as test.jpg)
    '''
    try:
        print("Start Down %s" % url)
        # stream=True defers the body download so iter_content reads it chunk by chunk
        ret = requests.get(url, stream=True, timeout=3)  # connect/read timeout
        if ret.status_code == 200:
            with open("test.jpg", 'wb') as fp:
                for d in ret.iter_content(chunk_size=10240):
                    time.sleep(1)  # sleep 1 second per 10 KB chunk
                    fp.write(d)
            print("downLoad ok %s" % url)
    except Exception as ex:
        print("downLoad pic fail %s: %s" % (url, ex))
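Note that the `timeout` passed to `requests.get` bounds the connection attempt and each individual read, not the download as a whole, so a slow but steady stream can still run for a long time. One way to cap the total time is to check a deadline between chunks. The following is a minimal sketch under that assumption; `save_with_deadline` and `slow_chunks` are hypothetical helpers, and the generator only imitates a slow stream so the example runs without a network.

```python
import io
import time


def save_with_deadline(chunks, fp, max_seconds):
    """Write chunks to fp, giving up once max_seconds have elapsed overall.

    Returns True if every chunk was written, False if the deadline was hit.
    """
    deadline = time.monotonic() + max_seconds
    for chunk in chunks:
        if time.monotonic() > deadline:
            return False
        fp.write(chunk)
    return True


def slow_chunks(count, delay):
    """Hypothetical generator imitating a slow network stream."""
    for _ in range(count):
        time.sleep(delay)
        yield b"x" * 10


if __name__ == "__main__":
    buf = io.BytesIO()
    print(save_with_deadline(slow_chunks(50, 0.05), buf, 0.2))
```

With requests, the `chunks` argument would be something like `ret.iter_content(chunk_size=10240)` on a response opened with `stream=True`. The deadline is only checked between chunks, so the per-read timeout is still needed.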

Other notes:

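The third parameter of urlretrieve is reporthook:
    A callback that is invoked once when the connection to the server is established and again after each data block has been transferred, so it can be used to display the current download progress.
    It reports the download status and receives three arguments:
    1) argument 1: the number of blocks transferred so far
    2) argument 2: the size of one block in bytes
    3) argument 3: the total size of the file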
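The three arguments can be turned into a percentage as follows. This is a sketch rather than the callback used in the scripts above: `percent_done` and `report` are hypothetical names, and the guard for an unknown size reflects the documented case where the total size is reported as -1.

```python
def percent_done(block_num, block_size, total_size):
    """Return download progress as 0.0 to 100.0, or None if the size is unknown."""
    if total_size <= 0:
        return None
    # The last block is usually only partly filled, so cap the value at 100.
    return min(100.0, 100.0 * block_num * block_size / total_size)


def report(block_num, block_size, total_size):
    """A reporthook-compatible callback that prints the progress."""
    pct = percent_done(block_num, block_size, total_size)
    print("size unknown" if pct is None else "%.2f%%" % pct)


if __name__ == "__main__":
    report(5, 8192, 81920)
```

It would be passed as `urlretrieve(url, 'test.jpg', report)`.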

def urlretrieve(url, filename=None, reporthook=None, data=None):
    """
    Retrieve a URL into a temporary location on disk.

    Requires a URL argument. If a filename is passed, it is used as
    the temporary file location. The reporthook argument should be
    a callable that accepts a block number, a read size, and the
    total file size of the URL target. The data argument should be
    valid URL encoded data.

    If a filename is passed and the URL points to a local resource,
    the result is a copy from local file to new file.

    Returns a tuple containing the path to the newly created
    data file as well as the resulting HTTPMessage object.
    """