I am running main.cpp on the host and foo.cu on the device.
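In main.cpp, declare foo with C linkage and call it. main.cpp only needs the declaration of foo; the kernel test_print stays private to foo.cu.

extern "C" void foo(... parameters ...);

foo(... parameters ...);

In foo.cu, include cuPrintf and define the kernel and the wrapper:

#include <cstdio>
#include "cuPrintf.cu"  // from the CUDA SDK cuPrintf sample; adjust the path to your copy

// Kernel: every thread prints its global thread index.
__global__ void test_print()
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    cuPrintf("%d\n", tid);
}

// Host wrapper with C linkage so that main.cpp can call it.
extern "C" void foo(... parameters ...)
{
    cudaPrintfInit();                 // set up the device-side print buffer
    test_print<<<32, 8>>>();          // 32 blocks of 8 threads
    cudaPrintfDisplay(stdout, true);  // copy the buffer to the host and print it; true adds block/thread IDs
    cudaPrintfEnd();                  // release the print buffer
}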
Note: cuPrintf affects performance. In my application, the execution time rose from about 500 ms to about 1000 ms.
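A slowdown like this can be checked by timing the call from the host, once with the cuPrintf calls in place and once without. The following is only a minimal sketch of one way to do that with std::chrono; it is not how the numbers above were measured. The names time_ms, placeholder_work and report_time are hypothetical, and placeholder_work is a CPU stand-in for the real call to foo.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in milliseconds.
template <typename Work>
double time_ms(Work work)
{
    auto start = std::chrono::steady_clock::now();
    work();  // for GPU work, this must not return before the device has finished
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Stand-in for the real call (assumption); in main.cpp this would be
// foo(... parameters ...).
void placeholder_work()
{
    volatile long sum = 0;
    for (long i = 0; i < 1000000; ++i) {
        sum = sum + i;
    }
}

// Hypothetical helper: time the stand-in work, print the result, and return it.
double report_time(const char *label)
{
    double ms = time_ms(placeholder_work);
    std::printf("%s: %.3f ms\n", label, ms);
    return ms;
}
```

To compare, call report_time once for each build (with and without cuPrintf) after replacing placeholder_work with the real call. Kernel launches are asynchronous, so the timed function has to wait for the device before it returns, otherwise only the launch overhead is measured.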