[Problem note] cv::cuda::dft() is much slower than cv::dft()

The profiler call graph is as follows:


It shows that compute itself is fast, but the constructor is slow.

I found a few similar posts on the NVIDIA forums, but none of them explains clearly how to solve the problem:

https://devtalk.nvidia.com/default/topic/1014986/gpu-accelerated-libraries/opencv-dft-vs-gpu-dft-performance-/
OpenCV dft vs. gpu::dft Performance

https://devtalk.nvidia.com/default/topic/1020341/transfer-data-cpu-gpu-is-an-issue-/
Transfer data CPU/GPU is an issue..

OpenCV reference documentation:
https://docs.opencv.org/3.4/d9/d88/group__cudaarithm__arithm.html#gadea99cb15a715c983bcc2870d65a2e78

========================================================
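Update: switching to the class-based interface avoids repeated initialization (I have not yet verified that the output data is correct). Performance improves somewhat, but it is still slower than the CPU version.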

cv::Ptr<cv::cuda::DFT> dft_handle = cv::cuda::createDFT(d_mul.size(), 0);
dft_handle->compute(d_mul, d_complex_result, stream);
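
The two lines above only pay off if the handle is created once and then reused for every input of the same size. The sketch below illustrates that pattern in plain standard C++, without OpenCV or CUDA. DftPlan and transform_all are hypothetical names made up for this illustration; DftPlan is a stand-in for the object returned by cv::cuda::createDFT(), on the assumption that its constructor is where the expensive setup happens (which is what the profile suggests). The transform itself is a naive O(n^2) DFT, chosen only to keep the example short.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Hypothetical stand-in for a DFT handle: all setup work (here, the twiddle
// table) is done once in the constructor and reused by every compute() call.
class DftPlan {
public:
    explicit DftPlan(std::size_t n) : n_(n), twiddle_(n) {
        const double pi = std::acos(-1.0);
        for (std::size_t k = 0; k < n; ++k) {
            const double angle =
                -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
            twiddle_[k] = std::polar(1.0, angle);
        }
        ++constructions;  // counts how often the setup cost was paid
    }

    // Naive O(n^2) forward DFT using the cached twiddle factors.
    std::vector<cd> compute(const std::vector<cd>& in) const {
        assert(in.size() == n_);
        std::vector<cd> out(n_);
        for (std::size_t k = 0; k < n_; ++k) {
            cd sum(0.0, 0.0);
            for (std::size_t t = 0; t < n_; ++t) {
                sum += in[t] * twiddle_[(k * t) % n_];
            }
            out[k] = sum;
        }
        return out;
    }

    static int constructions;

private:
    std::size_t n_;
    std::vector<cd> twiddle_;
};

int DftPlan::constructions = 0;

// Transforms every frame with ONE shared plan instead of one plan per frame.
// Assumes all frames have the same length as the first one.
std::vector<std::vector<cd>> transform_all(
    const std::vector<std::vector<cd>>& frames) {
    std::vector<std::vector<cd>> results;
    if (frames.empty()) {
        return results;
    }
    const DftPlan plan(frames.front().size());  // setup cost paid once
    results.reserve(frames.size());
    for (const auto& frame : frames) {
        results.push_back(plan.compute(frame));
    }
    return results;
}
```

In the OpenCV version, the equivalent would be to keep dft_handle as a long-lived member or static variable and call only compute() inside the loop. Even then, as the note above says, the GPU path may still lose to the CPU for a single transform.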

Original post: https://www.cnblogs.com/ahfuzhang/p/10999730.html