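CUDA Programming Study Notes

These notes mainly cover programming against OpenCV's CUDA implementation. Descriptions and call signatures of commonly used functions will be added over time.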

1. cv::cuda::GpuMat member functions

1.1 The upload function

First overload

void cv::cuda::GpuMat::upload ( InputArray arr );

Uploads data to the GpuMat (blocking call).
This function copies data from host memory to device memory.
Because it is a blocking call, the copy operation is guaranteed
to have finished when the function returns.

Second overload

void cv::cuda::GpuMat::upload ( InputArray arr, Stream &stream );

Uploads data to the GpuMat (non-blocking call).
This function copies data from host memory to device memory.
Because it is a non-blocking call, the function may return before the
copy operation has finished.
The copy may overlap with operations in other non-default streams if
stream is not the default stream and the host-side source arr is a HostMem
allocated with the HostMem::PAGE_LOCKED option.

2. cv::cuda::Stream member functions

#include <opencv2/core/cuda.hpp>
typedef void(* StreamCallback) (int status, void *userData);

void cv::cuda::Stream::waitForCompletion ();

Blocks the current CPU thread until all operations in the stream are complete.

3. pthread thread functions

3.1 pthread_cond_signal and pthread_cond_broadcast

#include <pthread.h>

int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);

These two functions unblock threads that are blocked on a condition variable.
The pthread_cond_signal() call unblocks at least one of the threads blocked on the specified condition variable cond (if any threads are blocked on cond).

The pthread_cond_broadcast() call unblocks all threads currently blocked on the specified condition variable cond.

In other words, pthread_cond_signal(&cond) wakes at least one of the threads currently waiting in pthread_cond_wait(&cond, &mutex).

pthread_cond_broadcast(&cond) wakes all of the threads currently waiting in pthread_cond_wait(&cond, &mutex).
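
As a rough illustration of the broadcast case, the sketch below starts several threads that wait on one condition variable and releases them all with a single pthread_cond_broadcast. The names wakeAllWaiters, waiter, g_ready and g_woken are hypothetical, and error checking is left out to keep it short.

```cpp
#include <pthread.h>
#include <vector>

namespace {
pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  g_cond  = PTHREAD_COND_INITIALIZER;
bool g_ready = false;  // predicate protected by g_mutex
int  g_woken = 0;      // number of waiters that got past the wait

void* waiter(void*)
{
    pthread_mutex_lock(&g_mutex);
    while (!g_ready) {                         // re-check: wakeups can be spurious
        pthread_cond_wait(&g_cond, &g_mutex);  // releases g_mutex while blocked
    }
    ++g_woken;
    pthread_mutex_unlock(&g_mutex);
    return nullptr;
}
}  // namespace

// Hypothetical helper: starts `count` waiters, releases them with one
// broadcast, and returns how many of them woke up.
int wakeAllWaiters(int count)
{
    g_ready = false;
    g_woken = 0;

    std::vector<pthread_t> threads(count);
    for (pthread_t& t : threads) {
        pthread_create(&t, nullptr, waiter, nullptr);
    }

    pthread_mutex_lock(&g_mutex);
    g_ready = true;                   // change the predicate under the mutex
    pthread_cond_broadcast(&g_cond);  // wake every thread blocked on g_cond
    pthread_mutex_unlock(&g_mutex);

    for (pthread_t& t : threads) {
        pthread_join(t, nullptr);
    }
    return g_woken;
}
```

Swapping the broadcast for a single pthread_cond_signal could leave some waiters blocked, because only one of the threads already waiting is guaranteed to be woken.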

3.2 pthread_exit

#include <pthread.h>

void pthread_exit(void *retval);

The pthread_exit() function terminates the calling thread and returns
a value via retval that (if the thread is joinable) is available to another
thread in the same process that calls pthread_join().

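Calling pthread_exit is how a thread terminates itself voluntarily.

Because the threads of a process share its data segment, the resources held by a joinable thread are generally not released just because the thread has terminated. Another thread can call pthread_join() to synchronize with the terminated thread and release its resources.

retval is the return value of the thread that calls pthread_exit(); another thread can retrieve it with a function such as pthread_join().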
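
The hand-off of retval from pthread_exit to pthread_join can be sketched as follows. The names squareWorker and squareInThread are hypothetical, and passing the result through heap memory is just one possible choice; the point is that the value must not live on the exiting thread's stack.

```cpp
#include <pthread.h>

namespace {
// Thread body: squares its input and hands the result back via pthread_exit.
void* squareWorker(void* arg)
{
    int input = *static_cast<int*>(arg);
    int* result = new int(input * input);  // heap storage outlives this thread
    pthread_exit(result);                  // terminates the calling thread here
}
}  // namespace

// Hypothetical helper: runs squareWorker in a new thread and returns the
// value it passed to pthread_exit, or -1 if the thread could not be created.
int squareInThread(int value)
{
    pthread_t thread;
    if (pthread_create(&thread, nullptr, squareWorker, &value) != 0) {
        return -1;
    }

    void* retval = nullptr;
    pthread_join(thread, &retval);  // waits for the thread and collects retval

    int* result = static_cast<int*>(retval);
    int answer = *result;
    delete result;                  // the joining thread now owns the result
    return answer;
}
```

For example, squareInThread(7) evaluates to 49 once the worker thread has been joined.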

References

[1] CUDA Pro Tip: nvprof is Your Handy Universal GPU Profiler https://developer.nvidia.com/blog/cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler/

[2] How to Implement Performance Metrics in CUDA C/C++ https://developer.nvidia.com/blog/how-implement-performance-metrics-cuda-cc/

[3] CUDA peer-to-peer memory copy between multiple GPUs https://blog.csdn.net/weixin_42730667/article/details/106481624

[4] [Repost] Is it true that the NVIDIA RTX 2080 Ti does not support P2P access? https://www.cnblogs.com/devilmaycry812839668/p/12370685.html