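[Notes] Timing NumPy's file-saving functions

A quick measurement of how long NumPy's different ways of saving data to a file take.

Two datasets were used: a large 10000x10000 matrix and a small 640x480 matrix, to see how saving and loading behave on large and on small data.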

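Three ways of saving were compared:

  • np.save(): dumps a single array straight to a binary .npy file; no compression, large file
  • np.savez(): saves several arrays into one .npz file, which is read back like a dictionary when loaded; no compression, large file
  • np.savez_compressed(): the same as np.savez(), but with the contents compressed; small file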
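
As a quick illustration of the three calls listed above and of how each file is read back, here is a minimal sketch. The array `arr`, the keyword name `extra`, and the file names are placeholders chosen for this example, not part of the timing test.

```python
import os
import tempfile

import numpy as np

# Small illustrative array; `arr` and the file names are placeholders.
arr = np.arange(12, dtype=np.int32).reshape(3, 4)

tmp_dir = tempfile.mkdtemp()
npy_path = os.path.join(tmp_dir, "demo.npy")
npz_path = os.path.join(tmp_dir, "demo.npz")
npz_compressed_path = os.path.join(tmp_dir, "demo_compressed.npz")

np.save(npy_path, arr)                                         # one array, raw .npy
np.savez(npz_path, arr, extra=arr * 2)                         # several arrays, uncompressed
np.savez_compressed(npz_compressed_path, arr, extra=arr * 2)   # several arrays, compressed

# A .npy file loads directly as an ndarray.
loaded_npy = np.load(npy_path)

# A .npz file loads as a dict-like object; arrays passed positionally
# are stored under the names arr_0, arr_1, ...
with np.load(npz_path) as data:
    names = sorted(data.files)
    loaded_first = data["arr_0"]
    loaded_extra = data["extra"]

with np.load(npz_compressed_path) as data:
    loaded_compressed = data["arr_0"]

print(names)
```

Arrays passed by keyword keep their keyword as the key, which is easier to read than arr_0 when several arrays share one file.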

Environment: Win10 64bit, Python 3.7

The measurements:

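  • Saving the large matrix:
    np.save(): 550.91 ms, file size 390625.12 KB
    np.savez(): 970.71 ms, file size 390625.24 KB
    np.savez_compressed(): 36123.80 ms, 65.6 times as long as np.save(); file size 63423.66 KB, compression ratio 6.16

  • Loading the large matrix (np.load() in all three cases): 488.87 ms, 1162.00 ms and 2158.33 ms respectively. Loading the compressed file takes 4.4 times as long as loading the uncompressed .npy file.

  • Saving the small matrix:
    np.save(): 1.70 ms, file size 1200.12 KB
    np.savez(): 7.98 ms, file size 1200.24 KB
    np.savez_compressed(): 146.19 ms, 86 times as long as np.save(); file size 195.20 KB, compression ratio 6.15

  • Loading the small matrix (np.load() in all three cases): 1.80 ms, 6.28 ms and 12.97 ms respectively. Loading the compressed file takes 7.2 times as long as loading the uncompressed .npy file.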

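From this we can see:

  • np.savez() takes longer than np.save(), presumably because of its extra dictionary-style handling of the arrays (1.8 times as long for the large matrix, 4.7 times for the small one)
  • np.savez_compressed() has to compress the data, so saving takes 60-90 times as long as np.save(). For this random data the compression ratio is about 6; for a sparse matrix both the compression time and the ratio would certainly differ
  • Loading compressed data takes 4-8 times as long as loading the raw .npy data; for a sparse matrix the decompression time would likewise differ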
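
The multiples and compression ratios quoted above can be re-derived from the raw measurements. A small sketch, with the numbers copied from the results in this post; the dictionary layout and the helper name `summarize` are only illustrative choices.

```python
# Measurements copied from the results reported in this post.
# Times are in ms, sizes in KB; the dict layout is an illustrative choice.
results = {
    "big": {
        "save_npy_ms": 550.91, "save_compressed_ms": 36123.80,
        "load_npy_ms": 488.87, "load_compressed_ms": 2158.33,
        "npy_kb": 390625.12, "compressed_kb": 63423.66,
    },
    "small": {
        "save_npy_ms": 1.70, "save_compressed_ms": 146.19,
        "load_npy_ms": 1.80, "load_compressed_ms": 12.97,
        "npy_kb": 1200.12, "compressed_kb": 195.20,
    },
}


def summarize(m):
    """Return (save slowdown, load slowdown, compression ratio) for one matrix."""
    return (
        m["save_compressed_ms"] / m["save_npy_ms"],
        m["load_compressed_ms"] / m["load_npy_ms"],
        m["npy_kb"] / m["compressed_kb"],
    )


summary = {name: summarize(m) for name, m in results.items()}
for name, (save_x, load_x, ratio) in summary.items():
    print("%s: save %.1fx, load %.1fx, compression ratio %.2f"
          % (name, save_x, load_x, ratio))
# big: save 65.6x, load 4.4x, compression ratio 6.16
# small: save 86.0x, load 7.2x, compression ratio 6.15
```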

Conclusion:

  • For a fairly large sparse matrix, when storage space matters, compressed storage is worth a try
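
To check whether compression pays off for a particular sparse matrix, the compressed file sizes can be compared directly. A minimal sketch, assuming a matrix with about 1% non-zero entries as a stand-in for "sparse", and a much smaller shape than in the tests above so that it runs quickly. The names `dense`, `sparse_like` and `compressed_size_kb` are illustrative, and no specific ratio is claimed here.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
shape = (500, 500)  # far smaller than the matrices timed in this post

# Random values 0-9, similar in spirit to the test data.
dense = rng.integers(0, 10, size=shape, dtype=np.int32)

# Assumption: "sparse" here means roughly 1% of the entries are kept, the rest are 0.
mask = rng.random(shape) < 0.01
sparse_like = np.where(mask, dense, 0).astype(np.int32)


def compressed_size_kb(arr, path):
    """Save arr with np.savez_compressed and return the file size in KB."""
    np.savez_compressed(path, arr)
    return os.path.getsize(path) / 1024


tmp_dir = tempfile.mkdtemp()
raw_kb = dense.nbytes / 1024
dense_kb = compressed_size_kb(dense, os.path.join(tmp_dir, "dense.npz"))
sparse_kb = compressed_size_kb(sparse_like, os.path.join(tmp_dir, "sparse.npz"))

print("raw size             = %.2f KB" % raw_kb)
print("dense, compressed    = %.2f KB" % dense_kb)
print("sparse, compressed   = %.2f KB" % sparse_kb)
```

Wrapping the two np.savez_compressed() calls in a timing helper would show the time side of the trade-off as well.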

Test code and output:

import os.path as osp
import numpy as np
import time
# - check cost time for func()

def check_time(desc, func, run_times=10):
    # run func() run_times times and report the average time per call in ms
    t = time.perf_counter()  # monotonic, high-resolution timer
    for _ in range(run_times):
        func()
    t = (time.perf_counter()-t)*1000/run_times
    print('%s cost avg time = %.2f ms' % (desc, t))
    return t
# - big and small ndarray
# (values 0-9; the file sizes reported below work out to 4 bytes per element)
big = np.random.randint(0, 10, size=(10000, 10000))
small = np.random.randint(0, 10, size=(640, 480))

print('big =', big)
print('small =', small)
big = [[0 3 9 ... 7 3 2]
 [9 5 9 ... 5 8 7]
 [2 5 6 ... 3 6 9]
 ...
 [3 6 0 ... 6 0 1]
 [8 0 6 ... 5 1 1]
 [7 0 1 ... 7 7 7]]
small = [[3 7 8 ... 1 4 1]
 [6 1 0 ... 2 1 1]
 [0 7 5 ... 4 3 9]
 ...
 [3 5 4 ... 7 2 2]
 [6 3 1 ... 4 5 9]
 [3 1 9 ... 5 2 5]]
# - npy and npz filename
big_npy_filename = 'big_npy.npy'
big_npz_filename = 'big_npz.npz'
big_compressed_npz_filename = 'big_compressed.npz'

small_npy_filename = 'small_npy.npy'
small_npz_filename = 'small_npz.npz'
small_compressed_npz_filename = 'small_compressed.npz'
# - save functions

def test_save_big_npy():
    np.save(big_npy_filename, big)

def test_save_big_npz():
    np.savez(big_npz_filename, big)

def test_save_big_compressed_npz():
    np.savez_compressed(big_compressed_npz_filename, big)

def test_save_small_npy():
    np.save(small_npy_filename, small)

def test_save_small_npz():
    np.savez(small_npz_filename, small)
    
def test_save_small_compressed_npz():
    np.savez_compressed(small_compressed_npz_filename, small)
# - load functions

def test_load_big_npy():
    return np.load(big_npy_filename)

def test_load_big_npz():
    return np.load(big_npz_filename)['arr_0']

def test_load_big_compressed_npz():
    return np.load(big_compressed_npz_filename)['arr_0']

def test_load_small_npy():
    return np.load(small_npy_filename)

def test_load_small_npz():
    return np.load(small_npz_filename)['arr_0']
    
def test_load_small_compressed_npz():
    return np.load(small_compressed_npz_filename)['arr_0']
# - check save time for big

check_time('save big npy', test_save_big_npy)
check_time('save big npz', test_save_big_npz)
check_time('save big compressed npz', test_save_big_compressed_npz)

for f in [
    big_npy_filename, 
    big_npz_filename, 
    big_compressed_npz_filename
]:
    print('file %s size = %.2f KB' % (f, osp.getsize(f)/1024))
save big npy cost avg time = 550.91 ms
save big npz cost avg time = 970.71 ms
save big compressed npz cost avg time = 36123.80 ms
file big_npy.npy size = 390625.12 KB
file big_npz.npz size = 390625.24 KB
file big_compressed.npz size = 63423.66 KB
# - check load time for big

check_time('load big npy', test_load_big_npy)
check_time('load big npz', test_load_big_npz)
check_time('load big compressed npz', test_load_big_compressed_npz)
load big npy cost avg time = 488.87 ms
load big npz cost avg time = 1162.00 ms
load big compressed npz cost avg time = 2158.33 ms
# - check save time for small

check_time('save small npy', test_save_small_npy)
check_time('save small npz', test_save_small_npz)
check_time('save small compressed npz', test_save_small_compressed_npz)

for f in [
    small_npy_filename, 
    small_npz_filename, 
    small_compressed_npz_filename
]:
    print('file %s size = %.2f KB' % (f, osp.getsize(f)/1024))
save small npy cost avg time = 1.70 ms
save small npz cost avg time = 7.98 ms
save small compressed npz cost avg time = 146.19 ms
file small_npy.npy size = 1200.12 KB
file small_npz.npz size = 1200.24 KB
file small_compressed.npz size = 195.20 KB
# - check load time for small

check_time('load small npy', test_load_small_npy)
check_time('load small npz', test_load_small_npz)
check_time('load small compressed npz', test_load_small_compressed_npz)
load small npy cost avg time = 1.80 ms
load small npz cost avg time = 6.28 ms
load small compressed npz cost avg time = 12.97 ms
Original post: https://www.cnblogs.com/journeyonmyway/p/12524425.html