Multithreading and multiprocessing

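For vectorized numeric work, a numpy array generally runs much faster than a list; looping over it element by element in Python is the main exception.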
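A minimal sketch of how this could be measured, assuming a sum-of-squares workload and an array size of 100,000 (both chosen only for illustration). It times a plain list, a vectorized array call, and a Python-level loop over the array.

```python
import timeit

import numpy as np

n = 100_000  # illustrative size, not from the original notes
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)


def square_sum_list():
    # Pure Python: one multiplication per element in the interpreter
    return sum(v * v for v in values_list)


def square_sum_array():
    # Vectorized: the whole computation runs inside NumPy
    return int(np.dot(values_array, values_array))


def square_sum_array_loop():
    # Python-level loop over the array: pays interpreter cost per element
    return int(sum(v * v for v in values_array))


for fn in (square_sum_list, square_sum_array, square_sum_array_loop):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__:22s} {seconds:.4f}s")
```

All three functions return the same value, so only the timings differ.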

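From what I have read: if a multithreaded program is CPU-bound, threads bring little speedup (in CPython the GIL lets only one thread execute Python bytecode at a time), and frequent thread switching can even make it slower, so multiprocessing is recommended. If the program is IO-bound, threads can use the idle time spent blocked on IO to run other threads, which improves efficiency. So we compare the efficiency of the different scenarios experimentally.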
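One way such an experiment could be set up, as a sketch: it uses the standard-library `concurrent.futures` executors, and the helper names `cpu_task`, `io_task` and `timed_map`, along with the workload sizes, are assumptions made for illustration. `time.sleep` stands in for a blocking IO call.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n):
    # CPU-bound: pure Python arithmetic, holds the GIL while running
    return sum(i * i for i in range(n))


def io_task(seconds):
    # IO-bound stand-in: sleeping releases the GIL like a blocking read would
    time.sleep(seconds)
    return seconds


def timed_map(executor_cls, fn, args, workers=4):
    # Run fn over args in the given executor and return (results, elapsed seconds)
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(fn, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    scenarios = [
        ("CPU-bound", cpu_task, [2_000_000] * 4),
        ("IO-bound", io_task, [0.5] * 4),
    ]
    for label, fn, args in scenarios:
        for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
            _, seconds = timed_map(executor_cls, fn, args)
            print(f"{label:9s} {executor_cls.__name__:19s} {seconds:.3f}s")
```

The `if __name__ == "__main__":` guard matters for the process pool on platforms that start workers by spawning a fresh interpreter.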

format

https://www.cnblogs.com/chunlaipiupiupiu/p/7978669.html

all the input array dimensions except for the concatenation axis must match exactly

result = np.concatenate((result, np.array((question, q2, q2_id, sim, tuple(q2_cut)),dtype=object)),axis=0)
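An empty result cannot be concatenated with a non-empty array unless its shape matches on every axis other than the concatenation axis.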
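A small sketch of the shape rule behind that error. It uses numeric rows as a stand-in for the object rows above; the 3-column layout is an assumption for illustration.

```python
import numpy as np

# Illustrative rows; the real data above holds five object fields per row
rows = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]

# An empty array with zero rows but the right column count concatenates cleanly
result = np.empty((0, 3))
for row in rows:
    result = np.concatenate((result, row.reshape(1, 3)), axis=0)
print(result.shape)

# An empty array with the wrong column count raises ValueError
message = None
try:
    np.concatenate((np.empty((0, 0)), rows[0].reshape(1, 3)), axis=0)
except ValueError as exc:
    message = str(exc)
print(message)

# Often simpler: collect the rows in a list and stack them once at the end
stacked = np.vstack(rows)
```

Stacking once at the end also avoids copying the growing array on every iteration.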

import numpy as np

vec1 = np.array([[1, 2],
                 [1, 1]])
vec2 = np.array([[10, 20],
                 [1, 1]])

# Sum along each row; prints [3 2]
print(np.sum(vec1, axis=1))

vec1 = np.array([1, 2])
# Repeat the 1-D vector as 5 identical rows
vec1 = np.tile(vec1, (5, 1))
print(vec1)

[[1 2]
 [1 2]
 [1 2]
 [1 2]
 [1 2]]

# def cal():
#
#     dist = np.linalg.norm(vec1-vec2,axis=1)
#     sim = (1.0 / (1.0 + dist))
#     return sim
#
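numpy's np.linalg.norm works the same way under the hood as the scipy equivalent, but with axis=1 it can be applied to the difference of two 2-D arrays, i.e. it computes the Euclidean distance for every row of an n-row matrix at once.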
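A runnable sketch of the row-wise case, reusing the two 2x2 arrays defined earlier in these notes. The function name `row_similarity` is made up for illustration.

```python
import numpy as np

vec1 = np.array([[1, 2],
                 [1, 1]])
vec2 = np.array([[10, 20],
                 [1, 1]])


def row_similarity(a, b):
    # Euclidean distance between matching rows, one value per row
    dist = np.linalg.norm(a - b, axis=1)
    # Map distance to a similarity in (0, 1]; identical rows give 1.0
    return 1.0 / (1.0 + dist)


sim = row_similarity(vec1, vec2)
print(sim)
```

The second rows are identical, so their similarity is exactly 1.0; the first rows are far apart, so their similarity is close to 0.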

vec1 = np.array([[1, 2],
                 [2, 3]])

def cal():
    # Norm of each row, mapped to a similarity in (0, 1]
    dist = np.linalg.norm(vec1, axis=1)
    sim = 1.0 / (1.0 + dist)
    return sim

print(cal())

[0.30901699 0.21712927]



=====================================
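How to save time when computing Euclidean distances at large scale
Compare the speed of a Python for loop against a single vectorized numpy call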

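import time

import numpy as np

def cal(vec):
    # Similarity from the norm of a single 1-D vector
    dist = np.linalg.norm(vec)
    sim = 1.0 / (1.0 + dist)
    return sim

def cal1(vec):
    # axis=1 gives one norm per row, matching the loop below;
    # without it, norm() returns a single Frobenius norm for the whole matrix
    dist = np.linalg.norm(vec, axis=1)
    sim = 1.0 / (1.0 + dist)
    return sim

# 10000 identical rows of length 50000 (about 4 GB as float64)
vec1 = np.random.randn(50000)
vec1 = np.tile(vec1, (10000, 1))

# Python loop: one call per row
time1 = time.time()
for i in vec1:
    cal(i)
time2 = time.time()
print(time2 - time1)

# One vectorized call over all rows
time1 = time.time()
cal1(vec1)
time2 = time.time()
print(time2 - time1)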

0.39404821395874023
0.12035775184631348
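A smaller-scale sketch that checks the loop and the vectorized version return the same values before comparing their speed. It also uses actual broadcasting, subtracting one query vector from every row. The names `sim_loop` and `sim_broadcast`, the 2000 x 64 size, and the random data are assumptions for illustration.

```python
import time

import numpy as np


def sim_loop(matrix, query):
    # One norm() call per row, driven by a Python loop
    out = []
    for row in matrix:
        dist = np.linalg.norm(row - query)
        out.append(1.0 / (1.0 + dist))
    return np.array(out)


def sim_broadcast(matrix, query):
    # (n, d) - (d,) broadcasts query across all n rows
    dist = np.linalg.norm(matrix - query, axis=1)
    return 1.0 / (1.0 + dist)


rng = np.random.default_rng(0)
matrix = rng.standard_normal((2000, 64))
query = rng.standard_normal(64)

start = time.perf_counter()
loop_result = sim_loop(matrix, query)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast_result = sim_broadcast(matrix, query)
fast_seconds = time.perf_counter() - start

print("same values:", np.allclose(loop_result, fast_result))
print(f"loop {loop_seconds:.4f}s, broadcast {fast_seconds:.4f}s")
```

Checking `np.allclose` first guards against comparing the timing of two computations that do not produce the same result.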