Timing measurements for some OpenCV functions

Release mode (times in ms)

--------------------------------------------------
smooth gaussian : 2
cvtColor CV_BGR2Lab : 3
get_psnr : 16
convertTo CV_8U: 6
absdiff: 2
threshold: 0
dilate 20: 3
fill_hole : 3
bitwise_xor : 0
shape : 0
copyTo 3 channels : 0
select_color : 14
smooth median : 1
gen_bgra : 3
cal_color : 4
pic_mix : 22
110ms
--------------------------------------------------

smooth gaussian : 5

smooth median : 5

cvtColor CV_BGR2Lab : 3
accumulateWeighted : 11
convertTo CV_8U: 7
absdiff: 2
threshold: 0
dilate 20: 8
fill_hole : 1
bitwise_xor : 0

shape : 0

copyTo 3 channels : 0
select_color : 129
smooth median : 5
gen_bgra : 0
cal_color : 4
pic_mix : 23

gen_bgra : 3

all : 317ms
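Per-step timings like the ones listed above could be collected with a small helper along these lines. This is only a sketch: `time_ms` and `dummy_step` are hypothetical names that do not appear in the original code, and the helper uses `std::chrono` from the standard library rather than any OpenCV timer.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: runs `fn` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Stand-in workload (an assumption). In the real pipeline this would be
// the step being measured, e.g. a blur or a color conversion.
inline long long dummy_step(const std::vector<int>& v) {
    long long sum = 0;
    for (int x : v) sum += x;
    return sum;
}
```

A call such as `double ms = time_ms([&] { dummy_step(v); });` would time one step. Averaging several runs usually gives steadier numbers than a single call.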

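Timing summary for add, multiply, and shift operations on float, int, and char; the loop runs 1000*1000*100 iterations.

Debug mode:

int add: 194
int add twice: 391
float add: 1237
float multiply: 551

Release mode:

int add once: 35
int add twice: 37 (in one loop)

int add twice: 67 (in two loops)
float add: 292
float multiply: 367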
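A loop benchmark of this kind might look like the sketch below. The function name `timed_add_loop` and its parameters are hypothetical. The accumulator is declared `volatile` so that a Release build cannot collapse the loop into one multiplication or drop it entirely. That also forces a load and store per iteration, so the absolute numbers would not match the ones above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical benchmark: adds `step` to an accumulator `iterations` times,
// writes the elapsed milliseconds to `elapsed_ms`, and returns the final sum.
template <typename T>
T timed_add_loop(std::int64_t iterations, T step, double& elapsed_ms) {
    volatile T acc = 0;  // volatile keeps the optimizer from removing the loop
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        acc = acc + step;
    }
    const auto stop = std::chrono::steady_clock::now();
    elapsed_ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return acc;
}
```

Calling `timed_add_loop<int>(n, 1, ms)` and `timed_add_loop<float>(n, 1.0f, ms)` with the same `n` would give an int versus float comparison under identical loop overhead.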

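Add, subtract, multiply, divide, and shift on int

Right shift by 8: 63ms ----- most efficient

Divide by 256: 97ms

Divide by 256.0: 368ms

Divide by 255: 144ms

Divide by 255.0: 1165ms

32-bit and 8-bit integer operations take roughly the same time --------- so processing image data as 32-bit values is faster.

i*7 and (i<<2) + (i<<1) + i take roughly the same time, so there is no need to rewrite integer multiplication as shifts, but integer division should definitely be rewritten as shifts.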
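A plain shift only replaces division by a power of two, and only matches it exactly for non-negative values (for a negative int, `x / 256` truncates toward zero while `x >> 8` rounds down). Division by 255 needs a slightly longer expression. The sketch below uses hypothetical helper names `div256` and `div255`. The 255 variant is a well-known add-and-shift identity, not something taken from the original code, and it is used here only for inputs up to 255*255.

```cpp
#include <cassert>
#include <cstdint>

// x / 256 for unsigned x: identical to a right shift by 8.
inline std::uint32_t div256(std::uint32_t x) {
    return x >> 8;
}

// floor(x / 255) using only adds and shifts.
// Intended range: 0 <= x <= 255 * 255 (a product of two 8-bit values).
inline std::uint32_t div255(std::uint32_t x) {
    return (x + 1 + (x >> 8)) >> 8;
}
```

Both helpers can be checked exhaustively against `/` over the intended input range before they replace a division in a hot loop.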

Allocating unsigned char [1920*1080] 100,000 times: 313ms total

Allocating int [1920*1080] 100,000 times: 413ms total

2. Traversing short data in an OpenCV Mat

for for { short * param = (short*)(res_map.data + sizeof(short)*of3); } takes 4ms (Debug mode)

short * param = &((short*)res_map.data)[of3]; takes 4ms (Debug mode), same as above

Vec3s& param = res_map.at<Vec3s>(__j,__i); takes 75 ms (Debug mode) ------ never access the image this way inside a large loop

short * param = &((short*)res_map.data)[of3];
short _index = param[0];
short _i = param[1];
short _j = param[2];

---- takes 7ms

short * param = &((short*)res_map.data)[of3];
short& _index = param[0];
short& _i = param[1];
short& _j = param[2];

---- using references for the shorts also takes 7ms
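The raw-pointer traversal measured above can be sketched without OpenCV as follows. Here `data` stands in for `(short*)res_map.data`, and the function name `sum_channels` is hypothetical. The sketch assumes the rows are stored back to back with no padding, so one running offset is valid. With a real `cv::Mat` that would need to be checked first, for example with `isContinuous()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums all three channels of every pixel in an interleaved 3-channel
// short image, walking the buffer with a running offset like `of3`.
inline long long sum_channels(const short* data, int rows, int cols) {
    long long total = 0;
    std::size_t of3 = 0;  // offset of the current pixel, in shorts
    for (int j = 0; j < rows; ++j) {
        for (int i = 0; i < cols; ++i) {
            const short* param = data + of3;  // pointer to the current pixel
            total += param[0] + param[1] + param[2];
            of3 += 3;
        }
    }
    return total;
}
```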

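Adding the statement int _of3 = (_j*wh+i)*3; takes 8.4ms

Adding the statement uchar* pixel = &mats[_index]->data[_of3]; ------ 31ms. Damn, never do random access on a vector inside a large loop!!!

Getting pixel this way instead: uchar * pixel = &mats[_index].data[of3]; ---- 10ms

Adding the statements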

res.data[of3] = pixel[0];
res.data[of3 + 1] = pixel[1];
res.data[of3 + 2] = pixel[2];
of3 += 3;

-------- takes 20ms (Debug)

Changing the assignments above to

uchar * data = &res.data[of3];
data[0] = pixel[0];
data[1] = pixel[1];
data[2] = pixel[2];

-------- takes 20ms (Debug), no change in time

Changing to

uchar * data = &res.data[of3];

memcpy( data,pixel,3 ); 18.1ms

Changing to

uchar * data = &res.data[of3];

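memcpy( data,pixel,4 ); ------18.2ms. Still lower than the 20ms of byte-by-byte assignment, possibly because 32 bits are copied at once?

Using a single loop for (int j = 0; j != size; ++j) ----18.5ms, same as above; it does not reduce the time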
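One caveat with the 4-byte memcpy on 3-byte pixels: each call writes one byte past the current pixel. That is harmless while the next pixel overwrites it, but on the last pixel it would write past the end of the buffer. The sketch below, with the hypothetical name `copy_pixels_wide`, keeps the wide copy for all but the last pixel. It assumes both buffers hold `count` interleaved 3-byte pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies `count` 3-byte pixels from src to dst. All but the last pixel
// are copied as 4 bytes; the extra byte is overwritten by the next copy.
// The last pixel is copied as exactly 3 bytes to stay inside the buffers.
inline void copy_pixels_wide(unsigned char* dst, const unsigned char* src,
                             std::size_t count) {
    if (count == 0) return;
    for (std::size_t p = 0; p + 1 < count; ++p) {
        std::memcpy(dst + 3 * p, src + 3 * p, 4);
    }
    std::memcpy(dst + 3 * (count - 1), src + 3 * (count - 1), 3);
}
```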

3. mix_pix timing:

3.1 takes 36 ms

int _res_y = (fg_y*res_a + bg_y*_255_a)>>8;
int _res_cb = (fg_cb*res_a + bg_cb*_255_a) >> 8;
int _res_cr = (fg_cr*res_a + bg_cr*_255_a) >> 8;

3.2 takes 36 ms

int _res_y = (res_a*(fg_y - bg_y) >> 8) + bg_y;
int _res_cb = (res_a*(fg_cb - bg_cb) >> 8) + bg_cb;
int _res_cr = (res_a*(fg_cr - bg_cr) >> 8) + bg_cr;

3.3 With the mix_pix call commented out: 28ms

3.4 With all code inside the mix_pix function commented out: 29 ms

3.5 With int _res_y = (fg_y*res_a + bg_y*_255_a)>>8; commented out: 33ms

3.6 With CLAMP it is 3-4 ms slower than without CLAMP
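The two blend formulas from 3.1 and 3.2 can be compared side by side in a standalone sketch. The function names are hypothetical, `_255_a` is assumed to mean `255 - res_a` (the notes do not define it), and `clamp_u8` is only a stand-in for the CLAMP macro. Under that assumption the two variants are close but not bit-identical: with `res_a == 0`, the first returns `(bg * 255) >> 8` while the second returns `bg` unchanged.

```cpp
#include <algorithm>
#include <cassert>

// Variant 3.1: two multiplies per channel.
// Assumption: _255_a == 255 - res_a.
inline int mix_two_mul(int fg, int bg, int res_a) {
    const int _255_a = 255 - res_a;
    return (fg * res_a + bg * _255_a) >> 8;
}

// Variant 3.2: one multiply per channel.
// Note: when fg < bg the shifted value is negative; right-shifting a
// negative int is an arithmetic shift on mainstream compilers and is
// guaranteed to be one from C++20 on.
inline int mix_one_mul(int fg, int bg, int res_a) {
    return ((res_a * (fg - bg)) >> 8) + bg;
}

// Hypothetical stand-in for CLAMP: limits a value to the 8-bit range.
inline int clamp_u8(int v) {
    return std::min(255, std::max(0, v));
}
```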

4. convertTo

src.convertTo(src_16s, CV_16SC3); 1080p frame: 3 ms

src.convertTo(src_16s, CV_32FC3); 6ms

src.convertTo(src_16s, CV_32SC3); 6ms