Overview
This page summarizes how to debug JCuda applications and the CUDA kernels that they use.
Introduction
Debugging GPU-based applications is hard. Debugging GPU-based applications that
cross the language boundaries between Java and C is even harder. The lack of
integrated tool support for debugging CUDA via Java limits the options for
analyzing in detail what is happening in an application. The few available
approaches for debugging JCuda applications are described here.
Detecting errors at API level
The simplest case is when errors can be detected at the API level. That is,
when errors are not caused by bugs in CUDA kernels, but by errors in the
API calls. Nearly all CUDA functions return an error code when something goes
wrong. In JCuda, these errors can be detected quite easily: All libraries
allow setting a flag that causes a CudaException to be thrown
when the return value from any function indicates an error.
So when there is a problem with an application, the first debugging
step should be to call
JCuda.setExceptionsEnabled(true);
as the first line of the main method. This is done
in all the
samples, so that
the tedious checks of all function return values can be omitted:
If any function returns an error code, an exception will be thrown
that also contains a human-readable representation of the actual
error code.
Note: This flag has to be set for every library individually.
So, for example, in order to detect API level errors in an application
that uses JCublas, one additionally has to call JCublas.setExceptionsEnabled(true).
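For example, a minimal program that enables exceptions and then deliberately
provokes an API-level error might look as follows (the class name and the
invalid allocation size are only chosen for illustration):

import jcuda.CudaException;
import jcuda.Pointer;
import jcuda.runtime.JCuda;

public class JCudaExceptionsExample
{
    public static void main(String args[])
    {
        // Let erroneous return codes be reported as exceptions
        JCuda.setExceptionsEnabled(true);
        try
        {
            // Deliberately provoke an error with an absurd allocation size
            Pointer pointer = new Pointer();
            JCuda.cudaMalloc(pointer, Long.MAX_VALUE);
        }
        catch (CudaException e)
        {
            // The message contains a human-readable form of the error code
            System.out.println("API error detected: " + e.getMessage());
        }
    }
}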
Detecting errors in kernels
Detecting errors in kernels is a bit more difficult. For native CUDA applications,
there are sophisticated tools like
NVIDIA Nsight for Visual Studio
or
NVIDIA Nsight for Eclipse.
But for Java development, no such tools exist. So the only option for detecting errors
in kernels is to use
cuda-memcheck.
For details about how to use cuda-memcheck, refer to the
cuda-memcheck user guide.
This section only explains the basic workflow and prerequisites for debugging a CUDA
kernel in JCuda with cuda-memcheck.
The following description refers to the
JCudaVectorAdd example.
This is the same sample that is also used in the
Tutorial,
which shows in detail how to create, compile, load and execute CUDA kernels.
For illustration, the
vector addition kernel
that is part of this sample will be modified here to cause invalid memory accesses:
extern "C"
__global__ void add(int n, float *a, float *b, float *sum)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i<n)
{
// Cause invalid memory accesses:
// Reading from an invalid location
// and writing to an invalid location
float ai = a[i-1];
float bi = b[i];
sum[i-1] = ai + bi;
}
}
When this kernel is compiled, loaded and executed as described in the
Tutorial,
it will (likely) run normally, and only report that the results of
the computation are wrong.
Again: Despite the invalid memory accesses, the program will not necessarily
crash. There is nothing like an
ArrayIndexOutOfBoundsException
in CUDA.
It may just silently compute the wrong result.
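As a sketch of how such wrong results typically show up: the sample compares
the results computed on the GPU with a reference computed on the host, roughly
like the following (the array names and the tolerance are only placeholders):

// Compare the results from the GPU with a reference computed on the host.
// hostA and hostB are assumed to be the input arrays, and hostSum the
// result array that was copied back from the device.
boolean passed = true;
for (int i = 0; i < n; i++)
{
    float expected = hostA[i] + hostB[i];
    if (Math.abs(hostSum[i] - expected) > 1e-5f)
    {
        System.out.println(
            "At index " + i + ": found " + hostSum[i] +
            " but expected " + expected);
        passed = false;
    }
}
System.out.println(passed ? "Test PASSED" : "Test FAILED");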
In more complex scenarios, however, the program may crash. Due to
the asynchronous nature of CUDA, and due to the unpredictable consequences
of overwriting arbitrary memory locations, the crash may happen much later,
in a seemingly unrelated function, possibly reporting only an unspecific
error code like CUDA_ERROR_LAUNCH_FAILED.
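One simple way to narrow down where such an asynchronous error originates
(without a dedicated debugger) is to synchronize directly after the kernel
launch, so that a pending launch failure is reported at a known place. A
minimal sketch, assuming that exceptions are enabled and that the driver API
is used as in the sample:

import static jcuda.driver.JCudaDriver.*;

// ... set up the function and the kernel parameters as in the Tutorial ...
cuLaunchKernel(function,
    gridSizeX, 1, 1,        // Grid dimension
    blockSizeX, 1, 1,       // Block dimension
    0, null,                // Shared memory size and stream
    kernelParameters, null  // Kernel- and extra parameters
);
// Force pending asynchronous errors to surface here: with exceptions
// enabled, a launch failure will be thrown at this line instead of
// in a later, seemingly unrelated call
cuCtxSynchronize();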
In order to detect or rule out such errors, cuda-memcheck may
be used. It is installed together with the CUDA Toolkit,
and should be available at the command prompt after the installation.
In the simplest case, it expects an executable as the only command line
parameter. So it is possible to execute a JCuda program in
cuda-memcheck
with the following command line:
cuda-memcheck java -cp .;jcuda.jar JCudaVectorAdd
If the classpath of the application is more complex, or if additional
parameters should be passed to the application or to cuda-memcheck,
it is convenient to wrap the start of the JCuda application in
a small shell script or batch file. (Note that on Linux, the classpath
separator is ':' instead of ';'.) For example, creating a file like
JCudaVectorAddStart.bat
that only contains the line
java -cp .;jcuda.jar JCudaVectorAdd
allows running the JCuda program in cuda-memcheck by just calling
cuda-memcheck JCudaVectorAddStart.bat
The output of cuda-memcheck may look as follows:
Note: If cuda-memcheck prints an error message like
Internal Memcheck Error: Memcheck failed initialization
as some other tools is currently attached. Please make
sure that nvprof and Nsight Visual Studio Edition are
not being run simultaneously
then refer to the
Troubleshooting section below!
As shown in the screenshot above, cuda-memcheck will report that there is an error:
Invalid __global__ read of size 4
This corresponds to the first erroneous line in the kernel, where the input
is read from an invalid memory location. It will also say which thread caused
this invalid access, and in which block this thread was located.
After this line has been fixed, re-compiling and running the program will report another error:
Invalid __global__ write of size 4
This corresponds to the line where the result should be written to an invalid
memory location. Fixing this as well will cause cuda-memcheck to report
ERROR SUMMARY: 0 errors
in the next call.
Preserving line numbers
Guessing the line that corresponds to an invalid read/write operation is
only possible for trivial kernels. It is possible to preserve line number
information in PTX and CUBIN files, which can later be reported by
cuda-memcheck. To do this, the -lineinfo parameter has to be passed to
the NVCC when the PTX or CUBIN file is compiled:
nvcc -lineinfo -ptx JCudaVectorAddKernel.cu
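The JCudaVectorAdd sample compiles its kernel at runtime by invoking the NVCC.
If this approach is used, the flag has to be added to that invocation. A
minimal sketch of such a runtime compilation with -lineinfo, assuming that
nvcc is available on the PATH (the method name is only chosen for illustration):

import java.io.IOException;

// Compile a .cu file into a PTX file with line number information
static void compilePtxWithLineInfo(String cuFileName, String ptxFileName)
    throws IOException, InterruptedException
{
    ProcessBuilder pb = new ProcessBuilder(
        "nvcc", "-lineinfo", "-ptx", cuFileName, "-o", ptxFileName);
    pb.inheritIO(); // Forward the NVCC output to the console
    int exitCode = pb.start().waitFor();
    if (exitCode != 0)
    {
        throw new IOException("NVCC failed with exit code " + exitCode);
    }
}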
After compiling the erroneous kernel with the -lineinfo flag and starting
the program in cuda-memcheck, the output may look as follows:
Now, cuda-memcheck will say that the error was in the
JCudaVectorAddKernel.cu file, at line 10, in the add kernel.
Troubleshooting
If cuda-memcheck prints an error message like
Internal Memcheck Error: Memcheck failed initialization
as some other tools is currently attached. Please make
sure that nvprof and Nsight Visual Studio Edition are
not being run simultaneously
then you have probably installed some of the NVIDIA Nsight profiling tools. These
will set the environment variables CUDA_INJECTION32_PATH and CUDA_INJECTION64_PATH,
pointing to some files in the NVIDIA directories. In order to use cuda-memcheck,
you may have to reset these variables. This can simply be done in the console where
you want to execute cuda-memcheck, by setting them to be empty.
For Windows:
set CUDA_INJECTION32_PATH=
and
set CUDA_INJECTION64_PATH=
For Linux:
unset CUDA_INJECTION32_PATH
and
unset CUDA_INJECTION64_PATH