5. Limitations
The following are known issues with the current release.
- On Windows, CUPTI samples and other applications that use the CUPTI APIs fail with the error "cupti.dll was not found". This is caused by a mismatch in the CUPTI dynamic library name referenced in the import library "cupti.lib". To work around this issue, rename the CUPTI dynamic library under the CUDA Toolkit directory (default location: "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\extras\CUPTI\lib64") from "cupti64_101.dll" to "cupti.dll".
- Profiling is not supported for devices with compute capability 7.5 and higher on IBM POWER, Mac and Tegra platforms. This includes the event and metric APIs from the headers cupti_events.h and cupti_metrics.h respectively, PC sampling, SASS source-level analysis and NVLink throughput metrics.
- Profiling results might be inconsistent when auto boost is enabled. By default, the profiler tries to disable auto boost, but it might fail to do so under some conditions; in that case profiling continues and the results can be inconsistent. The API cuptiGetAutoBoostState() can be used to query the auto boost state of the device. This API returns the error CUPTI_ERROR_NOT_SUPPORTED on devices that don't support auto boost. Note that auto boost is supported only on certain Tesla devices with compute capability 3.0 and higher.
- CUPTI doesn't populate deprecated activity structures; instead, it fills in the newer version of the activity structure.
- While collecting events in continuous mode, event reporting may be delayed; that is, event values may be returned by a later call to the readEvent(s) API, and the event values for the last readEvent(s) call may be lost.
- When profiling events, the profiled domain instance may report an event value of 0 if no workload ran on it, because by default CUPTI profiles only one instance of the domain. To profile all instances of the domain, set the event group attribute CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES through the API cuptiEventGroupSetAttribute().
- Starting with CUDA Toolkit 9.0, CUPTI doesn't support CUDA Dynamic Parallelism (CDP) kernel launch tracing or source-level metrics for devices with compute capability 7.0 and higher.
- CUPTI doesn't support tracing and profiling on virtualized GPUs.
- Profiling results might be incorrect for CUDA applications compiled with nvcc versions older than 9.0 for devices with compute capability 6.0 and 6.1. The profiling session continues, and CUPTI reports the problem with the error code CUPTI_ERROR_CUDA_COMPILER_NOT_COMPATIBLE. Recompiling the application code with nvcc 9.0 or later is advised. Ignore this warning if the code is already compiled with the recommended nvcc version.
- Because of the low timer resolution on Windows, activities with a short execution duration can have identical start and end timestamps.
- Profiling (event and metric collection) is not supported for multidevice cooperative kernels, that is, kernels launched by using the API functions cudaLaunchCooperativeKernelMultiDevice or cuLaunchCooperativeKernelMultiDevice.
- An application that calls CUPTI APIs cannot be used with NVIDIA tools such as nvprof, NVIDIA Visual Profiler, Nsight Compute, Nsight Systems, NVIDIA Nsight Visual Studio Edition, cuda-gdb and cuda-memcheck.
- Profiling is not supported for CUDA kernel nodes launched by a CUDA Graph.
- CUDA runtime and driver API callbacks for kernel launches are not issued when the stream is in capture mode.
- Tracing of a CUDA Graph may change its performance characteristics.