CUPTI Activity API


Data Structures

struct  CUpti_Activity
 The base activity record. More...
struct  CUpti_ActivityAPI
 The activity record for a driver or runtime API invocation. More...
struct  CUpti_ActivityAutoBoostState
 Device auto boost state structure. More...
struct  CUpti_ActivityBranch
 The activity record for source level result branch. (deprecated). More...
struct  CUpti_ActivityBranch2
 The activity record for source level result branch. More...
struct  CUpti_ActivityCdpKernel
 The activity record for CDP (CUDA Dynamic Parallelism) kernel. More...
struct  CUpti_ActivityContext
 The activity record for a context. More...
struct  CUpti_ActivityCudaEvent
 The activity record for CUDA event. More...
struct  CUpti_ActivityDevice
 The activity record for a device. (deprecated). More...
struct  CUpti_ActivityDevice2
 The activity record for a device. (deprecated). More...
struct  CUpti_ActivityDevice3
 The activity record for a device. (CUDA 7.0 onwards). More...
struct  CUpti_ActivityDevice4
 The activity record for a device. (CUDA 11.6 onwards). More...
struct  CUpti_ActivityDeviceAttribute
 The activity record for a device attribute. More...
struct  CUpti_ActivityEnvironment
 The activity record for CUPTI environmental data. More...
struct  CUpti_ActivityEvent
 The activity record for a CUPTI event. More...
struct  CUpti_ActivityEventInstance
 The activity record for a CUPTI event with instance information. More...
struct  CUpti_ActivityExternalCorrelation
 The activity record for correlation with external records. More...
struct  CUpti_ActivityFunction
 The activity record for global/device functions. More...
struct  CUpti_ActivityGlobalAccess
 The activity record for source-level global access. (deprecated). More...
struct  CUpti_ActivityGlobalAccess2
 The activity record for source-level global access. (deprecated in CUDA 9.0). More...
struct  CUpti_ActivityGlobalAccess3
 The activity record for source-level global access. More...
struct  CUpti_ActivityGraphTrace
 The activity record for trace of graph execution. More...
struct  CUpti_ActivityInstantaneousEvent
 The activity record for an instantaneous CUPTI event. More...
struct  CUpti_ActivityInstantaneousEventInstance
 The activity record for an instantaneous CUPTI event with event domain instance information. More...
struct  CUpti_ActivityInstantaneousMetric
 The activity record for an instantaneous CUPTI metric. More...
struct  CUpti_ActivityInstantaneousMetricInstance
 The instantaneous activity record for a CUPTI metric with instance information. More...
struct  CUpti_ActivityInstructionCorrelation
 The activity record for source-level SASS/source line-by-line correlation. More...
struct  CUpti_ActivityInstructionExecution
 The activity record for source-level instruction execution. More...
struct  CUpti_ActivityKernel
 The activity record for kernel. (deprecated). More...
struct  CUpti_ActivityKernel2
 The activity record for kernel. (deprecated). More...
struct  CUpti_ActivityKernel3
 The activity record for a kernel (CUDA 6.5 (with sm_52 support) onwards). (deprecated in CUDA 9.0). More...
struct  CUpti_ActivityKernel4
 The activity record for a kernel (CUDA 9.0 (with sm_70 support) onwards). (deprecated in CUDA 11.0). More...
struct  CUpti_ActivityKernel5
 The activity record for a kernel (CUDA 11.0 (with sm_80 support) onwards). (deprecated in CUDA 11.2). This activity record represents a kernel execution (CUPTI_ACTIVITY_KIND_KERNEL and CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL) but is no longer generated by CUPTI. Kernel activities are now reported using the CUpti_ActivityKernel7 activity record. More...
struct  CUpti_ActivityKernel6
 The activity record for kernel. (deprecated in CUDA 11.6). More...
struct  CUpti_ActivityKernel7
 The activity record for kernel. More...
struct  CUpti_ActivityMarker
 The activity record providing a marker which is an instantaneous point in time. (deprecated in CUDA 8.0). More...
struct  CUpti_ActivityMarker2
 The activity record providing a marker which is an instantaneous point in time. More...
struct  CUpti_ActivityMarkerData
 The activity record providing detailed information for a marker. More...
struct  CUpti_ActivityMemcpy
 The activity record for memory copies. (deprecated). More...
struct  CUpti_ActivityMemcpy3
 The activity record for memory copies. (deprecated in CUDA 11.1). More...
struct  CUpti_ActivityMemcpy4
 The activity record for memory copies. (deprecated in CUDA 11.6). More...
struct  CUpti_ActivityMemcpy5
 The activity record for memory copies. More...
struct  CUpti_ActivityMemcpyPtoP
 The activity record for peer-to-peer memory copies. More...
struct  CUpti_ActivityMemcpyPtoP2
 The activity record for peer-to-peer memory copies. (deprecated in CUDA 11.1). More...
struct  CUpti_ActivityMemcpyPtoP3
 The activity record for peer-to-peer memory copies. (deprecated in CUDA 11.6). More...
struct  CUpti_ActivityMemcpyPtoP4
 The activity record for peer-to-peer memory copies. More...
struct  CUpti_ActivityMemory
 The activity record for memory. More...
struct  CUpti_ActivityMemory2
 The activity record for memory. More...
struct  CUpti_ActivityMemory3
 The activity record for memory. More...
struct  CUpti_ActivityMemoryPool
 The activity record for memory pool. More...
struct  CUpti_ActivityMemoryPool2
 The activity record for memory pool. More...
struct  CUpti_ActivityMemset
 The activity record for memset. (deprecated). More...
struct  CUpti_ActivityMemset2
 The activity record for memset. (deprecated in CUDA 11.1). More...
struct  CUpti_ActivityMemset3
 The activity record for memset. (deprecated in CUDA 11.6). More...
struct  CUpti_ActivityMemset4
 The activity record for memset. More...
struct  CUpti_ActivityMetric
 The activity record for a CUPTI metric. More...
struct  CUpti_ActivityMetricInstance
 The activity record for a CUPTI metric with instance information. More...
struct  CUpti_ActivityModule
 The activity record for a CUDA module. More...
struct  CUpti_ActivityName
 The activity record providing a name. More...
struct  CUpti_ActivityNvLink
 NVLink information. (deprecated in CUDA 9.0). More...
struct  CUpti_ActivityNvLink2
 NVLink information. (deprecated in CUDA 10.0). More...
struct  CUpti_ActivityNvLink3
 NVLink information. More...
struct  CUpti_ActivityNvLink4
 NVLink information. More...
union  CUpti_ActivityObjectKindId
 Identifiers for object kinds as specified by CUpti_ActivityObjectKind. More...
struct  CUpti_ActivityOpenAcc
 The base activity record for OpenACC records. More...
struct  CUpti_ActivityOpenAccData
 The activity record for OpenACC data. More...
struct  CUpti_ActivityOpenAccLaunch
 The activity record for OpenACC launch. More...
struct  CUpti_ActivityOpenAccOther
 The activity record for OpenACC other. More...
struct  CUpti_ActivityOpenMp
 The base activity record for OpenMP records. More...
struct  CUpti_ActivityOverhead
 The activity record for CUPTI and driver overheads. More...
struct  CUpti_ActivityPcie
 PCI devices information required to construct topology. More...
struct  CUpti_ActivityPCSampling
 The activity record for PC sampling. (deprecated in CUDA 8.0). More...
struct  CUpti_ActivityPCSampling2
 The activity record for PC sampling. (deprecated in CUDA 9.0). More...
struct  CUpti_ActivityPCSampling3
 The activity record for PC sampling. More...
struct  CUpti_ActivityPCSamplingConfig
 PC sampling configuration structure. More...
struct  CUpti_ActivityPCSamplingRecordInfo
 The activity record for record status for PC sampling. More...
struct  CUpti_ActivityPreemption
 The activity record for a preemption of a CDP kernel. More...
struct  CUpti_ActivitySharedAccess
 The activity record for source-level shared access. More...
struct  CUpti_ActivitySourceLocator
 The activity record for source locator. More...
struct  CUpti_ActivityStream
 The activity record for CUDA stream. More...
struct  CUpti_ActivitySynchronization
 The activity record for synchronization management. More...
struct  CUpti_ActivityUnifiedMemoryCounter
 The activity record for Unified Memory counters (deprecated in CUDA 7.0). More...
struct  CUpti_ActivityUnifiedMemoryCounter2
 The activity record for Unified Memory counters (CUDA 7.0 and beyond). More...
struct  CUpti_ActivityUnifiedMemoryCounterConfig
 Unified Memory counters configuration structure. More...

Defines

#define CUPTI_AUTO_BOOST_INVALID_CLIENT_PID   0
#define CUPTI_CORRELATION_ID_UNKNOWN   0
#define CUPTI_FUNCTION_INDEX_ID_INVALID   0
#define CUPTI_GRID_ID_UNKNOWN   0LL
#define CUPTI_MAX_NVLINK_PORTS   32
#define CUPTI_NVLINK_INVALID_PORT   -1
#define CUPTI_SOURCE_LOCATOR_ID_UNKNOWN   0
#define CUPTI_SYNCHRONIZATION_INVALID_VALUE   -1
#define CUPTI_TIMESTAMP_UNKNOWN   0LL

Typedefs

typedef void(* CUpti_BuffersCallbackCompleteFunc )(CUcontext context, uint32_t streamId, uint8_t *buffer, size_t size, size_t validSize)
 Function type for callback used by CUPTI to return a buffer of activity records.
typedef void(* CUpti_BuffersCallbackRequestFunc )(uint8_t **buffer, size_t *size, size_t *maxNumRecords)
 Function type for callback used by CUPTI to request an empty buffer for storing activity records.
typedef uint64_t(* CUpti_TimestampCallbackFunc )(void)
 Function type for callback used by CUPTI to request a timestamp to be used in activity records.

Enumerations

enum  CUpti_ActivityAttribute {
  CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE = 0,
  CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP = 1,
  CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT = 2,
  CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE = 3,
  CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT = 4,
  CUPTI_ACTIVITY_ATTR_ZEROED_OUT_ACTIVITY_BUFFER = 5,
  CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE = 6,
  CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE = 7,
  CUPTI_ACTIVITY_ATTR_MEM_ALLOCATION_TYPE_HOST_PINNED = 8
}
 Activity attributes. More...
enum  CUpti_ActivityComputeApiKind {
  CUPTI_ACTIVITY_COMPUTE_API_UNKNOWN = 0,
  CUPTI_ACTIVITY_COMPUTE_API_CUDA = 1,
  CUPTI_ACTIVITY_COMPUTE_API_CUDA_MPS = 2
}
 The kind of a compute API. More...
enum  CUpti_ActivityEnvironmentKind {
  CUPTI_ACTIVITY_ENVIRONMENT_UNKNOWN = 0,
  CUPTI_ACTIVITY_ENVIRONMENT_SPEED = 1,
  CUPTI_ACTIVITY_ENVIRONMENT_TEMPERATURE = 2,
  CUPTI_ACTIVITY_ENVIRONMENT_POWER = 3,
  CUPTI_ACTIVITY_ENVIRONMENT_COOLING = 4
}
 The kind of environment data. Used to indicate what type of data is being reported by an environment activity record. More...
enum  CUpti_ActivityFlag {
  CUPTI_ACTIVITY_FLAG_NONE = 0,
  CUPTI_ACTIVITY_FLAG_DEVICE_CONCURRENT_KERNELS = 1 << 0,
  CUPTI_ACTIVITY_FLAG_DEVICE_ATTRIBUTE_CUDEVICE = 1 << 0,
  CUPTI_ACTIVITY_FLAG_MEMCPY_ASYNC = 1 << 0,
  CUPTI_ACTIVITY_FLAG_MARKER_INSTANTANEOUS = 1 << 0,
  CUPTI_ACTIVITY_FLAG_MARKER_START = 1 << 1,
  CUPTI_ACTIVITY_FLAG_MARKER_END = 1 << 2,
  CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE = 1 << 3,
  CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE_SUCCESS = 1 << 4,
  CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE_FAILED = 1 << 5,
  CUPTI_ACTIVITY_FLAG_MARKER_SYNC_RELEASE = 1 << 6,
  CUPTI_ACTIVITY_FLAG_MARKER_COLOR_NONE = 1 << 0,
  CUPTI_ACTIVITY_FLAG_MARKER_COLOR_ARGB = 1 << 1,
  CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK = 0xFF << 0,
  CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_LOAD = 1 << 8,
  CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_CACHED = 1 << 9,
  CUPTI_ACTIVITY_FLAG_METRIC_OVERFLOWED = 1 << 0,
  CUPTI_ACTIVITY_FLAG_METRIC_VALUE_INVALID = 1 << 1,
  CUPTI_ACTIVITY_FLAG_INSTRUCTION_VALUE_INVALID = 1 << 0,
  CUPTI_ACTIVITY_FLAG_INSTRUCTION_CLASS_MASK = 0xFF << 1,
  CUPTI_ACTIVITY_FLAG_FLUSH_FORCED = 1 << 0,
  CUPTI_ACTIVITY_FLAG_SHARED_ACCESS_KIND_SIZE_MASK = 0xFF << 0,
  CUPTI_ACTIVITY_FLAG_SHARED_ACCESS_KIND_LOAD = 1 << 8,
  CUPTI_ACTIVITY_FLAG_MEMSET_ASYNC = 1 << 0,
  CUPTI_ACTIVITY_FLAG_THRASHING_IN_CPU = 1 << 0,
  CUPTI_ACTIVITY_FLAG_THROTTLING_IN_CPU = 1 << 0
}
 Flags associated with activity records. More...
enum  CUpti_ActivityInstructionClass {
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_UNKNOWN = 0,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_32 = 1,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_64 = 2,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_INTEGER = 3,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_BIT_CONVERSION = 4,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_CONTROL_FLOW = 5,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_GLOBAL = 6,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_SHARED = 7,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_LOCAL = 8,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_GENERIC = 9,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_SURFACE = 10,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_CONSTANT = 11,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_TEXTURE = 12,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_GLOBAL_ATOMIC = 13,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_SHARED_ATOMIC = 14,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_SURFACE_ATOMIC = 15,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_INTER_THREAD_COMMUNICATION = 16,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_BARRIER = 17,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_MISCELLANEOUS = 18,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_16 = 19,
  CUPTI_ACTIVITY_INSTRUCTION_CLASS_UNIFORM = 20
}
 SASS instruction classification. More...
enum  CUpti_ActivityKind {
  CUPTI_ACTIVITY_KIND_INVALID = 0,
  CUPTI_ACTIVITY_KIND_MEMCPY = 1,
  CUPTI_ACTIVITY_KIND_MEMSET = 2,
  CUPTI_ACTIVITY_KIND_KERNEL = 3,
  CUPTI_ACTIVITY_KIND_DRIVER = 4,
  CUPTI_ACTIVITY_KIND_RUNTIME = 5,
  CUPTI_ACTIVITY_KIND_EVENT = 6,
  CUPTI_ACTIVITY_KIND_METRIC = 7,
  CUPTI_ACTIVITY_KIND_DEVICE = 8,
  CUPTI_ACTIVITY_KIND_CONTEXT = 9,
  CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL = 10,
  CUPTI_ACTIVITY_KIND_NAME = 11,
  CUPTI_ACTIVITY_KIND_MARKER = 12,
  CUPTI_ACTIVITY_KIND_MARKER_DATA = 13,
  CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR = 14,
  CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS = 15,
  CUPTI_ACTIVITY_KIND_BRANCH = 16,
  CUPTI_ACTIVITY_KIND_OVERHEAD = 17,
  CUPTI_ACTIVITY_KIND_CDP_KERNEL = 18,
  CUPTI_ACTIVITY_KIND_PREEMPTION = 19,
  CUPTI_ACTIVITY_KIND_ENVIRONMENT = 20,
  CUPTI_ACTIVITY_KIND_EVENT_INSTANCE = 21,
  CUPTI_ACTIVITY_KIND_MEMCPY2 = 22,
  CUPTI_ACTIVITY_KIND_METRIC_INSTANCE = 23,
  CUPTI_ACTIVITY_KIND_INSTRUCTION_EXECUTION = 24,
  CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER = 25,
  CUPTI_ACTIVITY_KIND_FUNCTION = 26,
  CUPTI_ACTIVITY_KIND_MODULE = 27,
  CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE = 28,
  CUPTI_ACTIVITY_KIND_SHARED_ACCESS = 29,
  CUPTI_ACTIVITY_KIND_PC_SAMPLING = 30,
  CUPTI_ACTIVITY_KIND_PC_SAMPLING_RECORD_INFO = 31,
  CUPTI_ACTIVITY_KIND_INSTRUCTION_CORRELATION = 32,
  CUPTI_ACTIVITY_KIND_OPENACC_DATA = 33,
  CUPTI_ACTIVITY_KIND_OPENACC_LAUNCH = 34,
  CUPTI_ACTIVITY_KIND_OPENACC_OTHER = 35,
  CUPTI_ACTIVITY_KIND_CUDA_EVENT = 36,
  CUPTI_ACTIVITY_KIND_STREAM = 37,
  CUPTI_ACTIVITY_KIND_SYNCHRONIZATION = 38,
  CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION = 39,
  CUPTI_ACTIVITY_KIND_NVLINK = 40,
  CUPTI_ACTIVITY_KIND_INSTANTANEOUS_EVENT = 41,
  CUPTI_ACTIVITY_KIND_INSTANTANEOUS_EVENT_INSTANCE = 42,
  CUPTI_ACTIVITY_KIND_INSTANTANEOUS_METRIC = 43,
  CUPTI_ACTIVITY_KIND_INSTANTANEOUS_METRIC_INSTANCE = 44,
  CUPTI_ACTIVITY_KIND_MEMORY = 45,
  CUPTI_ACTIVITY_KIND_PCIE = 46,
  CUPTI_ACTIVITY_KIND_OPENMP = 47,
  CUPTI_ACTIVITY_KIND_INTERNAL_LAUNCH_API = 48,
  CUPTI_ACTIVITY_KIND_MEMORY2 = 49,
  CUPTI_ACTIVITY_KIND_MEMORY_POOL = 50,
  CUPTI_ACTIVITY_KIND_GRAPH_TRACE = 51
}
 The kinds of activity records. More...
enum  CUpti_ActivityLaunchType {
  CUPTI_ACTIVITY_LAUNCH_TYPE_REGULAR = 0,
  CUPTI_ACTIVITY_LAUNCH_TYPE_COOPERATIVE_SINGLE_DEVICE = 1,
  CUPTI_ACTIVITY_LAUNCH_TYPE_COOPERATIVE_MULTI_DEVICE = 2
}
 The type of the CUDA kernel launch. More...
enum  CUpti_ActivityMemcpyKind {
  CUPTI_ACTIVITY_MEMCPY_KIND_UNKNOWN = 0,
  CUPTI_ACTIVITY_MEMCPY_KIND_HTOD = 1,
  CUPTI_ACTIVITY_MEMCPY_KIND_DTOH = 2,
  CUPTI_ACTIVITY_MEMCPY_KIND_HTOA = 3,
  CUPTI_ACTIVITY_MEMCPY_KIND_ATOH = 4,
  CUPTI_ACTIVITY_MEMCPY_KIND_ATOA = 5,
  CUPTI_ACTIVITY_MEMCPY_KIND_ATOD = 6,
  CUPTI_ACTIVITY_MEMCPY_KIND_DTOA = 7,
  CUPTI_ACTIVITY_MEMCPY_KIND_DTOD = 8,
  CUPTI_ACTIVITY_MEMCPY_KIND_HTOH = 9,
  CUPTI_ACTIVITY_MEMCPY_KIND_PTOP = 10
}
 The kind of a memory copy, indicating the source and destination targets of the copy. More...
enum  CUpti_ActivityMemoryKind {
  CUPTI_ACTIVITY_MEMORY_KIND_UNKNOWN = 0,
  CUPTI_ACTIVITY_MEMORY_KIND_PAGEABLE = 1,
  CUPTI_ACTIVITY_MEMORY_KIND_PINNED = 2,
  CUPTI_ACTIVITY_MEMORY_KIND_DEVICE = 3,
  CUPTI_ACTIVITY_MEMORY_KIND_ARRAY = 4,
  CUPTI_ACTIVITY_MEMORY_KIND_MANAGED = 5,
  CUPTI_ACTIVITY_MEMORY_KIND_DEVICE_STATIC = 6,
  CUPTI_ACTIVITY_MEMORY_KIND_MANAGED_STATIC = 7
}
 The kinds of memory accessed by a memory operation/copy. More...
enum  CUpti_ActivityMemoryOperationType {
  CUPTI_ACTIVITY_MEMORY_OPERATION_TYPE_ALLOCATION = 1,
  CUPTI_ACTIVITY_MEMORY_OPERATION_TYPE_RELEASE = 2
}
 Memory operation types. More...
enum  CUpti_ActivityMemoryPoolOperationType {
  CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_CREATED = 1,
  CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_DESTROYED = 2,
  CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_TRIMMED = 3
}
 Memory pool operation types. More...
enum  CUpti_ActivityMemoryPoolType {
  CUPTI_ACTIVITY_MEMORY_POOL_TYPE_LOCAL = 1,
  CUPTI_ACTIVITY_MEMORY_POOL_TYPE_IMPORTED = 2
}
 Memory pool types. More...
enum  CUpti_ActivityObjectKind {
  CUPTI_ACTIVITY_OBJECT_UNKNOWN = 0,
  CUPTI_ACTIVITY_OBJECT_PROCESS = 1,
  CUPTI_ACTIVITY_OBJECT_THREAD = 2,
  CUPTI_ACTIVITY_OBJECT_DEVICE = 3,
  CUPTI_ACTIVITY_OBJECT_CONTEXT = 4,
  CUPTI_ACTIVITY_OBJECT_STREAM = 5
}
 The kinds of activity objects. More...
enum  CUpti_ActivityOverheadKind {
  CUPTI_ACTIVITY_OVERHEAD_UNKNOWN = 0,
  CUPTI_ACTIVITY_OVERHEAD_DRIVER_COMPILER = 1,
  CUPTI_ACTIVITY_OVERHEAD_CUPTI_BUFFER_FLUSH = 1<<16,
  CUPTI_ACTIVITY_OVERHEAD_CUPTI_INSTRUMENTATION = 2<<16,
  CUPTI_ACTIVITY_OVERHEAD_CUPTI_RESOURCE = 3<<16
}
 The kinds of activity overhead. More...
enum  CUpti_ActivityPartitionedGlobalCacheConfig {
  CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_UNKNOWN = 0,
  CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_NOT_SUPPORTED = 1,
  CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_OFF = 2,
  CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_ON = 3
}
 Partitioned global caching option. More...
enum  CUpti_ActivityPCSamplingPeriod {
  CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_INVALID = 0,
  CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MIN = 1,
  CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_LOW = 2,
  CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MID = 3,
  CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_HIGH = 4,
  CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MAX = 5
}
 Sampling period for PC sampling method. More...
enum  CUpti_ActivityPCSamplingStallReason {
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_INVALID = 0,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_NONE = 1,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_INST_FETCH = 2,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_EXEC_DEPENDENCY = 3,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_MEMORY_DEPENDENCY = 4,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_TEXTURE = 5,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_SYNC = 6,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_CONSTANT_MEMORY_DEPENDENCY = 7,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_PIPE_BUSY = 8,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_MEMORY_THROTTLE = 9,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_NOT_SELECTED = 10,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_OTHER = 11,
  CUPTI_ACTIVITY_PC_SAMPLING_STALL_SLEEPING = 12
}
 The stall reason for PC sampling activity. More...
enum  CUpti_ActivityPreemptionKind {
  CUPTI_ACTIVITY_PREEMPTION_KIND_UNKNOWN = 0,
  CUPTI_ACTIVITY_PREEMPTION_KIND_SAVE = 1,
  CUPTI_ACTIVITY_PREEMPTION_KIND_RESTORE = 2
}
 The kind of a preemption activity. More...
enum  CUpti_ActivityStreamFlag {
  CUPTI_ACTIVITY_STREAM_CREATE_FLAG_UNKNOWN = 0,
  CUPTI_ACTIVITY_STREAM_CREATE_FLAG_DEFAULT = 1,
  CUPTI_ACTIVITY_STREAM_CREATE_FLAG_NON_BLOCKING = 2,
  CUPTI_ACTIVITY_STREAM_CREATE_FLAG_NULL = 3,
  CUPTI_ACTIVITY_STREAM_CREATE_MASK = 0xFFFF
}
 Stream type. More...
enum  CUpti_ActivitySynchronizationType {
  CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_UNKNOWN = 0,
  CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_EVENT_SYNCHRONIZE = 1,
  CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_STREAM_WAIT_EVENT = 2,
  CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_STREAM_SYNCHRONIZE = 3,
  CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_CONTEXT_SYNCHRONIZE = 4
}
 Synchronization type. More...
enum  CUpti_ActivityThreadIdType {
  CUPTI_ACTIVITY_THREAD_ID_TYPE_DEFAULT = 0,
  CUPTI_ACTIVITY_THREAD_ID_TYPE_SYSTEM = 1
}
 Thread-Id types. More...
enum  CUpti_ActivityUnifiedMemoryAccessType {
  CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_UNKNOWN = 0,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_READ = 1,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_WRITE = 2,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_ATOMIC = 3,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_PREFETCH = 4
}
 Memory access type for unified memory page faults. More...
enum  CUpti_ActivityUnifiedMemoryCounterKind {
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_UNKNOWN = 0,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD = 1,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH = 2,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT = 3,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT = 4,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING = 5,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING = 6,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP = 7,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOD = 8
}
 Kind of the Unified Memory counter. More...
enum  CUpti_ActivityUnifiedMemoryCounterScope {
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_UNKNOWN = 0,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_PROCESS_SINGLE_DEVICE = 1,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_PROCESS_ALL_DEVICES = 2
}
 Scope of the unified memory counter (deprecated in CUDA 7.0). More...
enum  CUpti_ActivityUnifiedMemoryMigrationCause {
  CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_UNKNOWN = 0,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_USER = 1,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_COHERENCE = 2,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_PREFETCH = 3,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_EVICTION = 4,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_ACCESS_COUNTERS = 5
}
 Migration cause of the Unified Memory counter. More...
enum  CUpti_ActivityUnifiedMemoryRemoteMapCause {
  CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_UNKNOWN = 0,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_COHERENCE = 1,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_THRASHING = 2,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_POLICY = 3,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_OUT_OF_MEMORY = 4,
  CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_EVICTION = 5
}
 Remote memory map cause of the Unified Memory counter. More...
enum  CUpti_DeviceVirtualizationMode {
  CUPTI_DEVICE_VIRTUALIZATION_MODE_NONE = 0,
  CUPTI_DEVICE_VIRTUALIZATION_MODE_PASS_THROUGH = 1,
  CUPTI_DEVICE_VIRTUALIZATION_MODE_VIRTUAL_GPU = 2
}
enum  CUpti_DevType {
  CUPTI_DEV_TYPE_GPU = 1,
  CUPTI_DEV_TYPE_NPU = 2
}
 The device type for device connected to NVLink. More...
enum  CUpti_EnvironmentClocksThrottleReason {
  CUPTI_CLOCKS_THROTTLE_REASON_GPU_IDLE = 0x00000001,
  CUPTI_CLOCKS_THROTTLE_REASON_USER_DEFINED_CLOCKS = 0x00000002,
  CUPTI_CLOCKS_THROTTLE_REASON_SW_POWER_CAP = 0x00000004,
  CUPTI_CLOCKS_THROTTLE_REASON_HW_SLOWDOWN = 0x00000008,
  CUPTI_CLOCKS_THROTTLE_REASON_UNKNOWN = 0x80000000,
  CUPTI_CLOCKS_THROTTLE_REASON_UNSUPPORTED = 0x40000000,
  CUPTI_CLOCKS_THROTTLE_REASON_NONE = 0x00000000
}
 Reasons for clock throttling. More...
enum  CUpti_ExternalCorrelationKind {
  CUPTI_EXTERNAL_CORRELATION_KIND_UNKNOWN = 1,
  CUPTI_EXTERNAL_CORRELATION_KIND_OPENACC = 2,
  CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM0 = 3,
  CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM1 = 4,
  CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM2 = 5,
  CUPTI_EXTERNAL_CORRELATION_KIND_SIZE
}
 The kind of external APIs supported for correlation. More...
enum  CUpti_FuncShmemLimitConfig
 The shared memory limit per block config for a kernel. This should be used to set the 'cudaOccFuncShmemConfig' field in the occupancy calculator API.
enum  CUpti_LinkFlag {
  CUPTI_LINK_FLAG_PEER_ACCESS = (1 << 1),
  CUPTI_LINK_FLAG_SYSMEM_ACCESS = (1 << 2),
  CUPTI_LINK_FLAG_PEER_ATOMICS = (1 << 3),
  CUPTI_LINK_FLAG_SYSMEM_ATOMICS = (1 << 4)
}
 Link flags. More...
enum  CUpti_OpenAccConstructKind
 The OpenACC parent construct kind for OpenACC activity records.
enum  CUpti_OpenAccEventKind
 The OpenACC event kind for OpenACC activity records. More...
enum  CUpti_PcieDeviceType {
  CUPTI_PCIE_DEVICE_TYPE_GPU = 0,
  CUPTI_PCIE_DEVICE_TYPE_BRIDGE = 1
}
enum  CUpti_PcieGen {
  CUPTI_PCIE_GEN_GEN1 = 1,
  CUPTI_PCIE_GEN_GEN2 = 2,
  CUPTI_PCIE_GEN_GEN3 = 3,
  CUPTI_PCIE_GEN_GEN4 = 4,
  CUPTI_PCIE_GEN_GEN5 = 5
}
 PCIE Generation. More...
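
Several CUpti_ActivityFlag enumerators listed above share the same bit value (for example, many of them are 1 << 0), so a flag word can only be interpreted once the kind of the record it belongs to is known. The following is a minimal sketch of decoding the flags of a global-access record. It assumes the three GLOBAL_ACCESS values shown in the enumeration; the constant names, the GlobalAccessInfo struct, and decode_global_access_flags are hypothetical stand-ins so that the sketch compiles without the CUPTI headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_*
 * values listed above; real code would use the enumerators from the CUPTI
 * header instead of redefining them. */
enum {
    GLOBAL_ACCESS_SIZE_MASK = 0xFF << 0,
    GLOBAL_ACCESS_LOAD      = 1 << 8,
    GLOBAL_ACCESS_CACHED    = 1 << 9
};

/* Hypothetical container for the decoded fields. */
typedef struct {
    uint32_t size;      /* value held in the low 8 bits of the flag word */
    bool     is_load;   /* LOAD bit set */
    bool     is_cached; /* CACHED bit set */
} GlobalAccessInfo;

static GlobalAccessInfo decode_global_access_flags(uint32_t flags)
{
    GlobalAccessInfo info;

    /* The size field and the two single-bit flags occupy disjoint bits. */
    assert((GLOBAL_ACCESS_SIZE_MASK & (GLOBAL_ACCESS_LOAD | GLOBAL_ACCESS_CACHED)) == 0);

    info.size      = flags & GLOBAL_ACCESS_SIZE_MASK;
    info.is_load   = (flags & GLOBAL_ACCESS_LOAD) != 0;
    info.is_cached = (flags & GLOBAL_ACCESS_CACHED) != 0;
    return info;
}
```

The same pattern applies to the shared-access flags, which use the identical size mask and load bit but have no cached bit.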

Functions

CUptiResult cuptiActivityConfigurePCSampling (CUcontext ctx, CUpti_ActivityPCSamplingConfig *config)
 Set PC sampling configuration.
CUptiResult cuptiActivityConfigureUnifiedMemoryCounter (CUpti_ActivityUnifiedMemoryCounterConfig *config, uint32_t count)
 Set Unified Memory Counter configuration.
CUptiResult cuptiActivityDisable (CUpti_ActivityKind kind)
 Disable collection of a specific kind of activity record.
CUptiResult cuptiActivityDisableContext (CUcontext context, CUpti_ActivityKind kind)
 Disable collection of a specific kind of activity record for a context.
CUptiResult cuptiActivityEnable (CUpti_ActivityKind kind)
 Enable collection of a specific kind of activity record.
CUptiResult cuptiActivityEnableAndDump (CUpti_ActivityKind kind)
 Enable collection of a specific kind of activity record. For certain activity kinds it dumps existing records.
CUptiResult cuptiActivityEnableContext (CUcontext context, CUpti_ActivityKind kind)
 Enable collection of a specific kind of activity record for a context.
CUptiResult cuptiActivityEnableLatencyTimestamps (uint8_t enable)
 Controls the collection of queued and submitted timestamps for kernels.
CUptiResult cuptiActivityEnableLaunchAttributes (uint8_t enable)
 Controls the collection of launch attributes for kernels.
CUptiResult cuptiActivityFlush (CUcontext context, uint32_t streamId, uint32_t flag)
 Wait for all activity records to be delivered via the completion callback.
CUptiResult cuptiActivityFlushAll (uint32_t flag)
 Request to deliver activity records via the buffer completion callback.
CUptiResult cuptiActivityFlushPeriod (uint32_t time)
 Sets the flush period for the worker thread.
CUptiResult cuptiActivityGetAttribute (CUpti_ActivityAttribute attr, size_t *valueSize, void *value)
 Read an activity API attribute.
CUptiResult cuptiActivityGetNextRecord (uint8_t *buffer, size_t validBufferSizeBytes, CUpti_Activity **record)
 Iterate over the activity records in a buffer.
CUptiResult cuptiActivityGetNumDroppedRecords (CUcontext context, uint32_t streamId, size_t *dropped)
 Get the number of activity records that were dropped because of insufficient buffer space.
CUptiResult cuptiActivityPopExternalCorrelationId (CUpti_ExternalCorrelationKind kind, uint64_t *lastId)
 Pop an external correlation id for the calling thread.
CUptiResult cuptiActivityPushExternalCorrelationId (CUpti_ExternalCorrelationKind kind, uint64_t id)
 Push an external correlation id for the calling thread.
CUptiResult cuptiActivityRegisterCallbacks (CUpti_BuffersCallbackRequestFunc funcBufferRequested, CUpti_BuffersCallbackCompleteFunc funcBufferCompleted)
 Registers callback functions with CUPTI for activity buffer handling.
CUptiResult cuptiActivityRegisterTimestampCallback (CUpti_TimestampCallbackFunc funcTimestamp)
 Registers callback function with CUPTI for providing timestamp.
CUptiResult cuptiActivitySetAttribute (CUpti_ActivityAttribute attr, size_t *valueSize, void *value)
 Write an activity API attribute.
CUptiResult cuptiComputeCapabilitySupported (int major, int minor, int *support)
 Check support for a compute capability.
CUptiResult cuptiDeviceSupported (CUdevice dev, int *support)
 Check support for a compute device.
CUptiResult cuptiDeviceVirtualizationMode (CUdevice dev, CUpti_DeviceVirtualizationMode *mode)
 Query the virtualization mode of the device.
CUptiResult cuptiFinalize (void)
 Detach CUPTI from the running process.
CUptiResult cuptiGetAutoBoostState (CUcontext context, CUpti_ActivityAutoBoostState *state)
 Get auto boost state.
CUptiResult cuptiGetContextId (CUcontext context, uint32_t *contextId)
 Get the ID of a context.
CUptiResult cuptiGetDeviceId (CUcontext context, uint32_t *deviceId)
 Get the ID of a device.
CUptiResult cuptiGetGraphId (CUgraph graph, uint32_t *pId)
 Get the unique ID of graph.
CUptiResult cuptiGetGraphNodeId (CUgraphNode node, uint64_t *nodeId)
 Get the unique ID of a graph node.
CUptiResult cuptiGetLastError (void)
 Returns the last error from a CUPTI call or callback.
CUptiResult cuptiGetStreamId (CUcontext context, CUstream stream, uint32_t *streamId)
 Get the ID of a stream.
CUptiResult cuptiGetStreamIdEx (CUcontext context, CUstream stream, uint8_t perThreadStream, uint32_t *streamId)
 Get the ID of a stream.
CUptiResult cuptiGetThreadIdType (CUpti_ActivityThreadIdType *type)
 Get the thread-id type.
CUptiResult cuptiGetTimestamp (uint64_t *timestamp)
 Get the CUPTI timestamp.
CUptiResult cuptiSetThreadIdType (CUpti_ActivityThreadIdType type)
 Set the thread-id type.

Detailed Description

Functions, types, and enums that implement the CUPTI Activity API.

Define Documentation

#define CUPTI_AUTO_BOOST_INVALID_CLIENT_PID   0

An invalid/unknown process id.

#define CUPTI_CORRELATION_ID_UNKNOWN   0

An invalid/unknown correlation ID. A correlation ID of this value indicates that there is no correlation for the activity record.

#define CUPTI_FUNCTION_INDEX_ID_INVALID   0

An invalid function index ID.

#define CUPTI_GRID_ID_UNKNOWN   0LL

An invalid/unknown grid ID.

#define CUPTI_MAX_NVLINK_PORTS   32

Maximum number of NVLink ports.

#define CUPTI_NVLINK_INVALID_PORT   -1

Invalid/unknown NVLink port number.

#define CUPTI_SOURCE_LOCATOR_ID_UNKNOWN   0

The source-locator ID that indicates an unknown source location. There is not an actual CUpti_ActivitySourceLocator object corresponding to this value.

#define CUPTI_SYNCHRONIZATION_INVALID_VALUE   -1

An invalid/unknown value.

#define CUPTI_TIMESTAMP_UNKNOWN   0LL

An invalid/unknown timestamp for a start, end, queued, submitted, or completed time.
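
Because CUPTI_TIMESTAMP_UNKNOWN is 0, a duration computed from an unknown start or end time is meaningless. The sketch below shows one way a client might guard against that. TIMESTAMP_UNKNOWN and record_duration are hypothetical names introduced for illustration, with the constant standing in for the define above so the example compiles without the CUPTI headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CUPTI_TIMESTAMP_UNKNOWN (defined above as 0LL). */
#define TIMESTAMP_UNKNOWN 0ULL

/* Stores end - start in *duration and returns true only when both timestamps
 * are known and correctly ordered. On failure *duration is left untouched.
 * The result is in the same unit as the input timestamps. */
static bool record_duration(uint64_t start, uint64_t end, uint64_t *duration)
{
    assert(duration != NULL);

    if (start == TIMESTAMP_UNKNOWN || end == TIMESTAMP_UNKNOWN || end < start) {
        return false;
    }
    *duration = end - start;
    return true;
}
```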


Typedef Documentation

typedef void( * CUpti_BuffersCallbackCompleteFunc)(CUcontext context, uint32_t streamId, uint8_t *buffer, size_t size, size_t validSize)

This callback function returns to the CUPTI client a buffer containing activity records. The buffer contains validSize bytes of activity records, which should be read using cuptiActivityGetNextRecord. The number of dropped records can be read using cuptiActivityGetNumDroppedRecords. After this call CUPTI relinquishes ownership of the buffer and will not use it anymore. The client may return the buffer to CUPTI using the CUpti_BuffersCallbackRequestFunc callback. Note: from CUDA 6.0 onwards, all buffers returned by this callback are global buffers, i.e. there is no context- or stream-specific buffer. The user needs to parse the global buffer to extract the context- or stream-specific activity records.

Parameters:
context The context this buffer is associated with. If NULL, the buffer is associated with the global activities. This field is deprecated as of CUDA 6.0 and will always be NULL.
streamId The stream id this buffer is associated with. This field is deprecated as of CUDA 6.0 and will always be 0.
buffer The activity record buffer.
size The total size of the buffer in bytes as set in CUpti_BuffersCallbackRequestFunc.
validSize The number of valid bytes in the buffer.
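
A minimal sketch of a completion callback with this signature is shown below. It is written so that it compiles without the CUDA and CUPTI headers: the CUcontext typedef is a local stand-in, and on_buffer_completed and g_bytes_seen are hypothetical names. The sketch also assumes the buffer was allocated with malloc() by the client's request callback.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the driver API's opaque context handle so the sketch compiles
 * on its own; real code would include the CUDA and CUPTI headers instead. */
typedef struct CUctx_st *CUcontext;

/* Hypothetical bookkeeping: total number of valid bytes handed back so far. */
static size_t g_bytes_seen = 0;

static void on_buffer_completed(CUcontext context, uint32_t streamId,
                                uint8_t *buffer, size_t size, size_t validSize)
{
    (void)context;   /* deprecated as of CUDA 6.0, always NULL */
    (void)streamId;  /* deprecated as of CUDA 6.0 */
    assert(validSize <= size);

    if (buffer != NULL && validSize > 0) {
        /* A real client would walk the records here by calling
         * cuptiActivityGetNextRecord(buffer, validSize, &record) in a loop. */
        g_bytes_seen += validSize;
    }

    /* CUPTI has relinquished ownership, so the client may free or recycle the
     * buffer. This sketch assumes it came from malloc(). */
    free(buffer);
}
```

Freeing the buffer here is only one option; a client that wants to avoid repeated allocation could instead keep it in its own pool and hand it back from the request callback.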

typedef void( * CUpti_BuffersCallbackRequestFunc)(uint8_t **buffer, size_t *size, size_t *maxNumRecords)

This callback function signals the CUPTI client that an activity buffer is needed by CUPTI. The activity buffer is used by CUPTI to store activity records. The callback function can decline the request by setting *buffer to NULL. In this case CUPTI may drop activity records.

Parameters:
buffer Returns the new buffer. If set to NULL then no buffer is returned.
size Returns the size of the returned buffer.
maxNumRecords Returns the maximum number of records that should be placed in the buffer. If 0 then the buffer is filled with as many records as possible. If > 0 the buffer is filled with at most that many records before it is returned.
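
As an illustration of the contract described above, a client-side request callback might look like the sketch below. The function name and the 8 MB buffer size are assumptions made for this example; only the signature is taken from the typedef.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed buffer size for this sketch; the API does not prescribe one. */
#define EX_ACTIVITY_BUFFER_SIZE ((size_t)8 * 1024 * 1024)

/* Has the same signature as CUpti_BuffersCallbackRequestFunc. */
static void ex_buffer_requested(uint8_t **buffer, size_t *size,
                                size_t *maxNumRecords)
{
    assert(buffer != NULL && size != NULL && maxNumRecords != NULL);

    *buffer = (uint8_t *)malloc(EX_ACTIVITY_BUFFER_SIZE);
    /* A NULL buffer declines the request; CUPTI may then drop records. */
    *size = (*buffer != NULL) ? EX_ACTIVITY_BUFFER_SIZE : 0;
    /* 0 asks CUPTI to place as many records as fit in the buffer. */
    *maxNumRecords = 0;
}
```

The matching CUpti_BuffersCallbackCompleteFunc would be the natural place to release the buffer with `free` once its records have been read, since CUPTI no longer owns it at that point.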

typedef uint64_t( * CUpti_TimestampCallbackFunc)(void)

This callback function signals the CUPTI client that a timestamp needs to be returned. CUPTI treats this timestamp as a normalized timestamp and uses it for various purposes, for example to store the start and end timestamps reported in the CUPTI activity records. The returned timestamp must be in nanoseconds.

See also:
cuptiActivityRegisterTimestampCallback
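
A function with this signature could be written as in the sketch below, which assumes a POSIX system and reads the monotonic clock. The name `ex_timestamp_ns` and the choice of CLOCK_MONOTONIC are assumptions made for this example, not something CUPTI requires.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Has the same signature as CUpti_TimestampCallbackFunc.
 * Returns the POSIX monotonic clock reading in nanoseconds. */
static uint64_t ex_timestamp_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* silence unused warning when NDEBUG is defined */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

A monotonic source is used here so that start and end timestamps cannot go backwards if the wall clock is adjusted during a tracing run.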


Enumeration Type Documentation

These attributes are used to control the behavior of the activity API.

Enumerator:
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE  The device memory size (in bytes) reserved for storing profiling data for concurrent kernels (activity kind CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL), memcopies and memsets for each buffer on a context. The value is a size_t.

There is a limit on how many device buffers can be allocated per context. The user can query and set this limit using the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT. CUPTI doesn't pre-allocate all the buffers; it pre-allocates only as many buffers as set by the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE. When all of the data in a buffer is consumed, the buffer is added to the reuse pool, and CUPTI picks a buffer from this pool when a new buffer is needed. Thus the memory footprint does not scale with the kernel count. Applications with a high density of kernels, memcopies and memsets might cause CUPTI to allocate more device buffers. CUPTI allocates another buffer only when it runs out of buffers in the reuse pool.

Since buffer allocation happens in the main application thread, this might result in stalls in the critical path. CUPTI pre-allocates 3 buffers of the same size to mitigate this issue. The user can query and set the pre-allocation limit using the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE.

A larger buffer size leaves less device memory for the application. A smaller buffer size increases the risk of dropping timestamps for records if too many kernels, memcopies or memsets are launched at one time.

This value only applies to new buffer allocations. Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations.

The default value is 3200000 (~3MB), which can accommodate profiling data for up to 100,000 kernels, memcopies and memsets combined.

Note: Starting with the CUDA 11.2 release, CUPTI allocates the profiling buffer in pinned host memory by default, as this might help in improving the performance of the tracing run. Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_MEM_ALLOCATION_TYPE_HOST_PINNED for more details. The size of the memory and the maximum number of pools are still controlled by the attributes CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE and CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT.

Note: The actual amount of device memory per buffer reserved by CUPTI might be larger.
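
The defaults quoted above can be combined into a rough footprint estimate. The sketch below is only arithmetic on those documented defaults. The per-record figure is an inference from dividing the default size by the stated record capacity, not a documented constant, and all `EX_`/`ex_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Defaults quoted in the attribute descriptions. */
#define EX_DEFAULT_BUFFER_SIZE    3200000ull /* bytes per device buffer */
#define EX_DEFAULT_BUFFER_RECORDS 100000ull  /* records per default buffer */
#define EX_DEFAULT_POOL_LIMIT     250ull     /* max device buffers per context */
#define EX_DEFAULT_PRE_ALLOCATE   3ull       /* buffers allocated up front */

/* Bytes of profiling data per record implied by the defaults (an inference). */
static uint64_t ex_bytes_per_record(void)
{
    return EX_DEFAULT_BUFFER_SIZE / EX_DEFAULT_BUFFER_RECORDS;
}

/* Lower bound on memory reserved for `buffers` buffers of `buffer_size` bytes. */
static uint64_t ex_footprint_bytes(uint64_t buffer_size, uint64_t buffers)
{
    assert(buffers == 0 || buffer_size <= UINT64_MAX / buffers);
    return buffer_size * buffers;
}
```

With the defaults this gives 32 bytes per record, 9,600,000 bytes for the 3 pre-allocated buffers, and 800,000,000 bytes per context if the pool limit of 250 buffers is reached. These are lower bounds, since the amount CUPTI actually reserves per buffer might be larger.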

CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP  The device memory size (in bytes) reserved for storing profiling data for CDP operations for each buffer on a context. The value is a size_t.

A larger buffer size means fewer flush operations but consumes more device memory. This value only applies to new allocations.

Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations.

The default value is 8388608 (8MB).

Note: The actual amount of device memory per context reserved by CUPTI might be larger.

CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT  The maximum number of device memory buffers per context. The value is a size_t.

For an application with a high rate of kernel launches, memcopies and memsets, a bigger pool limit helps in timestamp collection for all these activities, at the expense of a larger memory footprint. Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE for more details.

Setting this value will not modify the number of memory buffers currently stored.

Set this value before initializing CUDA to ensure the limit is not exceeded.

The default value is 250.

CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE  The profiling semaphore pool size reserved for storing profiling data for serialized kernels tracing (activity kind CUPTI_ACTIVITY_KIND_KERNEL) for each context. The value is a size_t.

There is a limit on how many semaphore pools can be allocated per context. The user can query and set this limit using the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT. CUPTI doesn't pre-allocate all the semaphore pools; it pre-allocates only as many semaphore pools as set by the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE. When all of the data in a semaphore pool is consumed, the pool is added to the reuse pool, and CUPTI picks a semaphore pool from the reuse pool when a new semaphore pool is needed. Thus the memory footprint does not scale with the kernel count. Applications with a high density of kernels might cause CUPTI to allocate more semaphore pools. CUPTI allocates another semaphore pool only when it runs out of semaphore pools in the reuse pool.

Since semaphore pool allocation happens in the main application thread, this might result in stalls in the critical path. CUPTI pre-allocates 3 semaphore pools of the same size to mitigate this issue. The user can query and set the pre-allocation limit using the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE.

A larger semaphore pool size leaves less device memory for the application. A smaller semaphore pool size increases the risk of dropping timestamps for kernel records if too many kernels are issued/launched at one time.

This value only applies to new semaphore pool allocations. Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations.

The default value is 25000, which can accommodate profiling data for up to 25,000 kernels.

CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT  The maximum number of profiling semaphore pools per context. The value is a size_t.

For an application with a high rate of kernel launches, a bigger pool limit helps in timestamp collection for all the kernels, at the expense of a larger device memory footprint. Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE for more details.

Set this value before initializing CUDA to ensure the limit is not exceeded.

The default value is 250.
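
To see how the semaphore pool size and pool limit interact, the sketch below estimates how many pools a given number of outstanding kernels would need. It assumes one pool entry per kernel, as the stated default capacity suggests. The helper names are hypothetical, and this is not a model of CUPTI's internal allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Defaults quoted in the attribute descriptions. */
#define EX_DEFAULT_SEMAPHORE_POOL_SIZE  25000ull
#define EX_DEFAULT_SEMAPHORE_POOL_LIMIT 250ull

/* Number of pools needed to hold `kernels` outstanding kernels (rounded up). */
static uint64_t ex_pools_needed(uint64_t kernels, uint64_t pool_size)
{
    assert(pool_size > 0);
    return kernels / pool_size + (kernels % pool_size != 0 ? 1 : 0);
}

/* True if `kernels` outstanding kernels fit within `pool_limit` pools. */
static bool ex_fits_pool_limit(uint64_t kernels, uint64_t pool_size,
                               uint64_t pool_limit)
{
    return ex_pools_needed(kernels, pool_size) <= pool_limit;
}
```

Under these assumptions the defaults cover up to 25,000 x 250 = 6,250,000 kernels whose data has not yet been consumed. Beyond that point, timestamps for kernel records are at risk of being dropped.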

CUPTI_ACTIVITY_ATTR_ZEROED_OUT_ACTIVITY_BUFFER  Flag indicating whether the user should provide a zeroed-out activity buffer. The value is a uint8_t.

If the value of this attribute is non-zero, the user should provide a zeroed-out buffer in the CUpti_BuffersCallbackRequestFunc. If the user does not provide a zeroed-out buffer after setting this to non-zero, the activity buffer may contain some uninitialized values when CUPTI returns it in CUpti_BuffersCallbackCompleteFunc.

If the value of this attribute is zero, CUPTI will initialize the user buffer received in the CUpti_BuffersCallbackRequestFunc to zero before filling it. If the user sets this to zero, a few stalls may appear in the critical path because CUPTI will zero out the buffer in the main thread. Set this value before returning from CUpti_BuffersCallbackRequestFunc to ensure it is considered for all the subsequent user buffers.

The default value is 0.

CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE  Number of device buffers to pre-allocate for a context during the initialization phase. The value is a size_t.

Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE for details.

This value must be less than the maximum number of device buffers set using the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT.

Set this value before initializing CUDA or before creating a context to ensure it is considered by CUPTI.

The default value is set to 3 to ping-pong between these buffers (if possible).

CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE  Number of profiling semaphore pools to pre-allocate for a context during the initialization phase. The value is a size_t.

Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE for details.

This value must be less than the maximum number of profiling semaphore pools set using the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT.

Set this value before initializing CUDA or before creating a context to ensure it is considered by CUPTI.

The default value is set to 3 to ping-pong between these pools (if possible).

CUPTI_ACTIVITY_ATTR_MEM_ALLOCATION_TYPE_HOST_PINNED  Allocate page-locked (pinned) host memory for storing profiling data for concurrent kernels, memcopies and memsets for each buffer on a context. The value is a uint8_t.

Starting with the CUDA 11.2 release, CUPTI allocates the profiling buffer in pinned host memory by default, as this might help in improving the performance of the tracing run. Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. For this reason, the user might want to change the location from pinned host memory to device memory by setting the value of this attribute to 0.

The default value is 1.

Enumerator:
CUPTI_ACTIVITY_COMPUTE_API_UNKNOWN  The compute API is not known.
CUPTI_ACTIVITY_COMPUTE_API_CUDA  The compute APIs are for CUDA.
CUPTI_ACTIVITY_COMPUTE_API_CUDA_MPS  The compute APIs are for CUDA running in MPS (Multi-Process Service) environment.

Enumerator:
CUPTI_ACTIVITY_ENVIRONMENT_UNKNOWN  Unknown data.
CUPTI_ACTIVITY_ENVIRONMENT_SPEED  The environment data is related to speed.
CUPTI_ACTIVITY_ENVIRONMENT_TEMPERATURE  The environment data is related to temperature.
CUPTI_ACTIVITY_ENVIRONMENT_POWER  The environment data is related to power.
CUPTI_ACTIVITY_ENVIRONMENT_COOLING  The environment data is related to cooling.

Activity record flags. Flags can be combined by bitwise OR to associate multiple flags with an activity record. Each flag is specific to a certain activity kind, as noted below.

Enumerator:
CUPTI_ACTIVITY_FLAG_NONE  Indicates the activity record has no flags.
CUPTI_ACTIVITY_FLAG_DEVICE_CONCURRENT_KERNELS  Indicates the activity represents a device that supports concurrent kernel execution. Valid for CUPTI_ACTIVITY_KIND_DEVICE.
CUPTI_ACTIVITY_FLAG_DEVICE_ATTRIBUTE_CUDEVICE  Indicates if the activity represents a CUdevice_attribute value or a CUpti_DeviceAttribute value. Valid for CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE.
CUPTI_ACTIVITY_FLAG_MEMCPY_ASYNC  Indicates the activity represents an asynchronous memcpy operation. Valid for CUPTI_ACTIVITY_KIND_MEMCPY.
CUPTI_ACTIVITY_FLAG_MARKER_INSTANTANEOUS  Indicates the activity represents an instantaneous marker. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_START  Indicates the activity represents a region start marker. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_END  Indicates the activity represents a region end marker. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE  Indicates the activity represents an attempt to acquire a user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE_SUCCESS  Indicates the activity represents success in acquiring the user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE_FAILED  Indicates the activity represents failure in acquiring the user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_RELEASE  Indicates the activity represents releasing a reservation on user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_COLOR_NONE  Indicates the activity represents a marker that does not specify a color. Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA.
CUPTI_ACTIVITY_FLAG_MARKER_COLOR_ARGB  Indicates the activity represents a marker that specifies a color in alpha-red-green-blue format. Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA.
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK  The number of bytes requested by each thread. Valid for CUpti_ActivityGlobalAccess3.
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_LOAD  If this bit in the flag is set, the access was a load; otherwise it was a store. Valid for CUpti_ActivityGlobalAccess3.
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_CACHED  If this bit in the flag is set, the load access was cached; otherwise it was uncached. Valid for CUpti_ActivityGlobalAccess3.
CUPTI_ACTIVITY_FLAG_METRIC_OVERFLOWED  If this bit in flag is set, the metric value overflowed. Valid for CUpti_ActivityMetric and CUpti_ActivityMetricInstance.
CUPTI_ACTIVITY_FLAG_METRIC_VALUE_INVALID  If this bit in flag is set, the metric value couldn't be calculated. This occurs when a value(s) required to calculate the metric is missing. Valid for CUpti_ActivityMetric and CUpti_ActivityMetricInstance.
CUPTI_ACTIVITY_FLAG_INSTRUCTION_VALUE_INVALID  If this bit in flag is set, the source level metric value couldn't be calculated. This occurs when a value(s) required to calculate the source level metric cannot be evaluated. Valid for CUpti_ActivityInstructionExecution.
CUPTI_ACTIVITY_FLAG_INSTRUCTION_CLASS_MASK  The mask for the instruction class, CUpti_ActivityInstructionClass. Valid for CUpti_ActivityInstructionExecution and CUpti_ActivityInstructionCorrelation.
CUPTI_ACTIVITY_FLAG_FLUSH_FORCED  When calling cuptiActivityFlushAll, this flag can be set to force CUPTI to flush all records in the buffer, whether finished or not.
CUPTI_ACTIVITY_FLAG_SHARED_ACCESS_KIND_SIZE_MASK  The number of bytes requested by each thread. Valid for CUpti_ActivitySharedAccess.
CUPTI_ACTIVITY_FLAG_SHARED_ACCESS_KIND_LOAD  If this bit in the flag is set, the access was a load; otherwise it was a store. Valid for CUpti_ActivitySharedAccess.
CUPTI_ACTIVITY_FLAG_MEMSET_ASYNC  Indicates the activity represents an asynchronous memset operation. Valid for CUPTI_ACTIVITY_KIND_MEMSET.
CUPTI_ACTIVITY_FLAG_THRASHING_IN_CPU  Indicates the activity represents thrashing in CPU. Valid for counter of kind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING in CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER.
CUPTI_ACTIVITY_FLAG_THROTTLING_IN_CPU  Indicates the activity represents page throttling in CPU. Valid for counter of kind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING in CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER.
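
Because a flags word mixes a multi-bit field (a size mask) with single-bit flags, decoding usually combines a mask with bit tests. The sketch below shows that pattern for the global access flags. The bit positions are illustrative stand-ins chosen for this example, and real code should use the CUPTI_ACTIVITY_FLAG_* constants from the CUPTI header. All `EX_`/`Ex` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout only; not taken from the CUPTI header. */
enum {
    EX_FLAG_ACCESS_SIZE_MASK = 0xFF,   /* low byte: bytes requested per thread */
    EX_FLAG_ACCESS_LOAD      = 1 << 8, /* set: load, clear: store */
    EX_FLAG_ACCESS_CACHED    = 1 << 9  /* set: cached load */
};

typedef struct {
    uint32_t bytes_per_thread;
    bool is_load;
    bool is_cached;
} ExGlobalAccessInfo;

/* Splits a flags word into its size field and its two single-bit flags. */
static ExGlobalAccessInfo ex_decode_global_access(uint32_t flags)
{
    ExGlobalAccessInfo info;
    /* Sanity check: the size field must not overlap the single-bit flags. */
    assert((EX_FLAG_ACCESS_SIZE_MASK &
            (EX_FLAG_ACCESS_LOAD | EX_FLAG_ACCESS_CACHED)) == 0);
    info.bytes_per_thread = flags & EX_FLAG_ACCESS_SIZE_MASK;
    info.is_load = (flags & EX_FLAG_ACCESS_LOAD) != 0;
    /* The cached bit describes load accesses, so ignore it for stores. */
    info.is_cached = info.is_load && (flags & EX_FLAG_ACCESS_CACHED) != 0;
    return info;
}
```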

SASS instructions are broadly divided into different classes. Each enum value represents one classification.

Enumerator:
CUPTI_ACTIVITY_INSTRUCTION_CLASS_UNKNOWN  The instruction class is not known.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_32  Represents a 32 bit floating point operation.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_64  Represents a 64 bit floating point operation.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_INTEGER  Represents an integer operation.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_BIT_CONVERSION  Represents a bit conversion operation.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_CONTROL_FLOW  Represents a control flow instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_GLOBAL  Represents a global load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_SHARED  Represents a shared load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_LOCAL  Represents a local load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_GENERIC  Represents a generic load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_SURFACE  Represents a surface load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_CONSTANT  Represents a constant load instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_TEXTURE  Represents a texture load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_GLOBAL_ATOMIC  Represents a global atomic instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_SHARED_ATOMIC  Represents a shared atomic instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_SURFACE_ATOMIC  Represents a surface atomic instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_INTER_THREAD_COMMUNICATION  Represents an inter-thread communication instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_BARRIER  Represents a barrier instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_MISCELLANEOUS  Represents some miscellaneous instructions which do not fit in the above classification.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_16  Represents a 16 bit floating point operation.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_UNIFORM  Represents uniform instruction.

Each activity record kind represents information about a GPU or an activity occurring on a CPU or GPU. Each kind is associated with an activity record structure that holds the information associated with the kind.

See also:
CUpti_Activity
CUpti_ActivityAPI
CUpti_ActivityContext
CUpti_ActivityDevice
CUpti_ActivityDevice2
CUpti_ActivityDevice3
CUpti_ActivityDevice4
CUpti_ActivityDeviceAttribute
CUpti_ActivityEvent
CUpti_ActivityEventInstance
CUpti_ActivityKernel
CUpti_ActivityKernel2
CUpti_ActivityKernel3
CUpti_ActivityKernel4
CUpti_ActivityKernel5
CUpti_ActivityKernel6
CUpti_ActivityKernel7
CUpti_ActivityCdpKernel
CUpti_ActivityPreemption
CUpti_ActivityMemcpy
CUpti_ActivityMemcpy3
CUpti_ActivityMemcpy4
CUpti_ActivityMemcpy5
CUpti_ActivityMemcpyPtoP
CUpti_ActivityMemcpyPtoP2
CUpti_ActivityMemcpyPtoP3
CUpti_ActivityMemcpyPtoP4
CUpti_ActivityMemset
CUpti_ActivityMemset2
CUpti_ActivityMemset3
CUpti_ActivityMemset4
CUpti_ActivityMetric
CUpti_ActivityMetricInstance
CUpti_ActivityName
CUpti_ActivityMarker
CUpti_ActivityMarker2
CUpti_ActivityMarkerData
CUpti_ActivitySourceLocator
CUpti_ActivityGlobalAccess
CUpti_ActivityGlobalAccess2
CUpti_ActivityGlobalAccess3
CUpti_ActivityBranch
CUpti_ActivityBranch2
CUpti_ActivityOverhead
CUpti_ActivityEnvironment
CUpti_ActivityInstructionExecution
CUpti_ActivityUnifiedMemoryCounter
CUpti_ActivityFunction
CUpti_ActivityModule
CUpti_ActivitySharedAccess
CUpti_ActivityPCSampling
CUpti_ActivityPCSampling2
CUpti_ActivityPCSampling3
CUpti_ActivityPCSamplingRecordInfo
CUpti_ActivityCudaEvent
CUpti_ActivityStream
CUpti_ActivitySynchronization
CUpti_ActivityInstructionCorrelation
CUpti_ActivityExternalCorrelation
CUpti_ActivityUnifiedMemoryCounter2
CUpti_ActivityOpenAccData
CUpti_ActivityOpenAccLaunch
CUpti_ActivityOpenAccOther
CUpti_ActivityOpenMp
CUpti_ActivityNvLink
CUpti_ActivityNvLink2
CUpti_ActivityNvLink3
CUpti_ActivityNvLink4
CUpti_ActivityMemory
CUpti_ActivityPcie

Enumerator:
CUPTI_ACTIVITY_KIND_INVALID  The activity record is invalid.
CUPTI_ACTIVITY_KIND_MEMCPY  A host<->host, host<->device, or device<->device memory copy. The corresponding activity record structure is CUpti_ActivityMemcpy5.
CUPTI_ACTIVITY_KIND_MEMSET  A memory set executing on the GPU. The corresponding activity record structure is CUpti_ActivityMemset4.
CUPTI_ACTIVITY_KIND_KERNEL  A kernel executing on the GPU. This activity kind may significantly change the overall performance characteristics of the application because all kernel executions are serialized on the GPU. The alternative kernel activity kind, CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL, doesn't break kernel concurrency. The corresponding activity record structure is CUpti_ActivityKernel7.
CUPTI_ACTIVITY_KIND_DRIVER  A CUDA driver API function execution. The corresponding activity record structure is CUpti_ActivityAPI.
CUPTI_ACTIVITY_KIND_RUNTIME  A CUDA runtime API function execution. The corresponding activity record structure is CUpti_ActivityAPI.
CUPTI_ACTIVITY_KIND_EVENT  An event value. The corresponding activity record structure is CUpti_ActivityEvent.
CUPTI_ACTIVITY_KIND_METRIC  A metric value. The corresponding activity record structure is CUpti_ActivityMetric.
CUPTI_ACTIVITY_KIND_DEVICE  Information about a device. The corresponding activity record structure is CUpti_ActivityDevice4.
CUPTI_ACTIVITY_KIND_CONTEXT  Information about a context. The corresponding activity record structure is CUpti_ActivityContext.
CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL  A kernel executing on the GPU. This activity kind doesn't break kernel concurrency. The corresponding activity record structure is CUpti_ActivityKernel7.
CUPTI_ACTIVITY_KIND_NAME  Resource naming done via NVTX APIs for thread, device, context, etc. The corresponding activity record structure is CUpti_ActivityName.
CUPTI_ACTIVITY_KIND_MARKER  Instantaneous, start, or end NVTX marker. The corresponding activity record structure is CUpti_ActivityMarker2.
CUPTI_ACTIVITY_KIND_MARKER_DATA  Extended, optional, data about a marker. The corresponding activity record structure is CUpti_ActivityMarkerData.
CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR  Source information about source level result. The corresponding activity record structure is CUpti_ActivitySourceLocator.
CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS  Results for source-level global access. The corresponding activity record structure is CUpti_ActivityGlobalAccess3.
CUPTI_ACTIVITY_KIND_BRANCH  Results for source-level branch. The corresponding activity record structure is CUpti_ActivityBranch2.
CUPTI_ACTIVITY_KIND_OVERHEAD  Overhead activity records. The corresponding activity record structure is CUpti_ActivityOverhead.
CUPTI_ACTIVITY_KIND_CDP_KERNEL  A CDP (CUDA Dynamic Parallelism) kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityCdpKernel. This activity cannot be directly enabled or disabled. It is enabled and disabled through the concurrent kernel activity, i.e. CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL.
CUPTI_ACTIVITY_KIND_PREEMPTION  Preemption activity record indicating a preemption of a CDP (CUDA Dynamic Parallelism) kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityPreemption.
CUPTI_ACTIVITY_KIND_ENVIRONMENT  Environment activity records indicating power, clock, thermal, etc. levels of the GPU. The corresponding activity record structure is CUpti_ActivityEnvironment.
CUPTI_ACTIVITY_KIND_EVENT_INSTANCE  An event value associated with a specific event domain instance. The corresponding activity record structure is CUpti_ActivityEventInstance.
CUPTI_ACTIVITY_KIND_MEMCPY2  A peer to peer memory copy. The corresponding activity record structure is CUpti_ActivityMemcpyPtoP4.
CUPTI_ACTIVITY_KIND_METRIC_INSTANCE  A metric value associated with a specific metric domain instance. The corresponding activity record structure is CUpti_ActivityMetricInstance.
CUPTI_ACTIVITY_KIND_INSTRUCTION_EXECUTION  Results for source-level instruction execution. The corresponding activity record structure is CUpti_ActivityInstructionExecution.
CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER  Unified Memory counter record. The corresponding activity record structure is CUpti_ActivityUnifiedMemoryCounter2.
CUPTI_ACTIVITY_KIND_FUNCTION  Device global/function record. The corresponding activity record structure is CUpti_ActivityFunction.
CUPTI_ACTIVITY_KIND_MODULE  CUDA Module record. The corresponding activity record structure is CUpti_ActivityModule.
CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE  A device attribute value. The corresponding activity record structure is CUpti_ActivityDeviceAttribute.
CUPTI_ACTIVITY_KIND_SHARED_ACCESS  Results for source-level shared access. The corresponding activity record structure is CUpti_ActivitySharedAccess.
CUPTI_ACTIVITY_KIND_PC_SAMPLING  Enable PC sampling for kernels. This will serialize kernels. The corresponding activity record structure is CUpti_ActivityPCSampling3.
CUPTI_ACTIVITY_KIND_PC_SAMPLING_RECORD_INFO  Summary information about PC sampling records. The corresponding activity record structure is CUpti_ActivityPCSamplingRecordInfo.
CUPTI_ACTIVITY_KIND_INSTRUCTION_CORRELATION  SASS/Source line-by-line correlation record. This will generate sass/source correlation for functions that have source level analysis or pc sampling results. The records will be generated only when either of source level analysis or pc sampling activity is enabled. The corresponding activity record structure is CUpti_ActivityInstructionCorrelation.
CUPTI_ACTIVITY_KIND_OPENACC_DATA  OpenACC data events. The corresponding activity record structure is CUpti_ActivityOpenAccData.
CUPTI_ACTIVITY_KIND_OPENACC_LAUNCH  OpenACC launch events. The corresponding activity record structure is CUpti_ActivityOpenAccLaunch.
CUPTI_ACTIVITY_KIND_OPENACC_OTHER  OpenACC other events. The corresponding activity record structure is CUpti_ActivityOpenAccOther.
CUPTI_ACTIVITY_KIND_CUDA_EVENT  Information about a CUDA event. The corresponding activity record structure is CUpti_ActivityCudaEvent.
CUPTI_ACTIVITY_KIND_STREAM  Information about a CUDA stream. The corresponding activity record structure is CUpti_ActivityStream.
CUPTI_ACTIVITY_KIND_SYNCHRONIZATION  Records for synchronization management. The corresponding activity record structure is CUpti_ActivitySynchronization.
CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION  Records for correlation of different programming APIs. The corresponding activity record structure is CUpti_ActivityExternalCorrelation.
CUPTI_ACTIVITY_KIND_NVLINK  NVLink information. The corresponding activity record structure is CUpti_ActivityNvLink4.
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_EVENT  Instantaneous Event information. The corresponding activity record structure is CUpti_ActivityInstantaneousEvent.
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_EVENT_INSTANCE  Instantaneous Event information for a specific event domain instance. The corresponding activity record structure is CUpti_ActivityInstantaneousEventInstance.
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_METRIC  Instantaneous Metric information. The corresponding activity record structure is CUpti_ActivityInstantaneousMetric.
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_METRIC_INSTANCE  Instantaneous Metric information for a specific metric domain instance. The corresponding activity record structure is CUpti_ActivityInstantaneousMetricInstance.
CUPTI_ACTIVITY_KIND_MEMORY  Memory activity tracking allocation and freeing of the memory. The corresponding activity record structure is CUpti_ActivityMemory.
CUPTI_ACTIVITY_KIND_PCIE  PCI devices information used for PCI topology. The corresponding activity record structure is CUpti_ActivityPcie.
CUPTI_ACTIVITY_KIND_OPENMP  OpenMP parallel events. The corresponding activity record structure is CUpti_ActivityOpenMp.
CUPTI_ACTIVITY_KIND_INTERNAL_LAUNCH_API  A CUDA driver kernel launch occurring outside of any public API function execution. Tools can handle these like records for driver API launch functions, although the cbid field is not used here. The corresponding activity record structure is CUpti_ActivityAPI.
CUPTI_ACTIVITY_KIND_MEMORY2  Memory activity tracking allocation and freeing of the memory. The corresponding activity record structure is CUpti_ActivityMemory3.
CUPTI_ACTIVITY_KIND_MEMORY_POOL  Memory pool activity tracking creation, destruction and trimming of the memory pool. The corresponding activity record structure is CUpti_ActivityMemoryPool2.
CUPTI_ACTIVITY_KIND_GRAPH_TRACE  The corresponding activity record structure is CUpti_ActivityGraphTrace.
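
Since each kind maps to one record structure, client code typically inspects the kind of a base record and then casts to the matching structure. The sketch below illustrates that dispatch pattern with simplified stand-in types. `ExKind`, `ExActivity`, `ExMemcpy` and `ExKernel` are hypothetical and carry far fewer fields than the real CUPTI records, which should be used in actual tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for a few activity kinds; numeric values are arbitrary here. */
typedef enum { EX_KIND_INVALID = 0, EX_KIND_MEMCPY, EX_KIND_KERNEL } ExKind;

/* Stand-in for the base record: every record starts with its kind. */
typedef struct { ExKind kind; } ExActivity;

/* Simplified kind-specific records sharing the same first field. */
typedef struct { ExKind kind; uint64_t bytes; uint64_t start; uint64_t end; } ExMemcpy;
typedef struct { ExKind kind; uint64_t start; uint64_t end; } ExKernel;

/* Returns the record's duration in ns, or 0 for kinds this sketch ignores. */
static uint64_t ex_record_duration(const ExActivity *record)
{
    assert(record != NULL);
    switch (record->kind) {
    case EX_KIND_MEMCPY: {
        const ExMemcpy *m = (const ExMemcpy *)record;
        return m->end - m->start;
    }
    case EX_KIND_KERNEL: {
        const ExKernel *k = (const ExKernel *)record;
        return k->end - k->start;
    }
    default:
        return 0; /* unknown or unhandled kind */
    }
}
```

A default branch matters in practice because newer library versions can report kinds that an older tool does not know about.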

Enumerator:
CUPTI_ACTIVITY_LAUNCH_TYPE_REGULAR  The kernel was launched via a regular kernel call.
CUPTI_ACTIVITY_LAUNCH_TYPE_COOPERATIVE_SINGLE_DEVICE  The kernel was launched via API cudaLaunchCooperativeKernel() or cuLaunchCooperativeKernel().
CUPTI_ACTIVITY_LAUNCH_TYPE_COOPERATIVE_MULTI_DEVICE  The kernel was launched via API cudaLaunchCooperativeKernelMultiDevice() or cuLaunchCooperativeKernelMultiDevice().

Each kind represents the source and destination targets of a memory copy. Targets are host, device, and array.

Enumerator:
CUPTI_ACTIVITY_MEMCPY_KIND_UNKNOWN  The memory copy kind is not known.
CUPTI_ACTIVITY_MEMCPY_KIND_HTOD  A host to device memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_DTOH  A device to host memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_HTOA  A host to device array memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_ATOH  A device array to host memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_ATOA  A device array to device array memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_ATOD  A device array to device memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_DTOA  A device to device array memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_DTOD  A device to device memory copy on the same device.
CUPTI_ACTIVITY_MEMCPY_KIND_HTOH  A host to host memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_PTOP  A peer to peer memory copy across different devices.
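
The source and destination encoded by each kind can be made explicit with a small lookup table. The sketch below uses stand-in enums that mirror the enumerators listed above in order. The `Ex`/`EX_` names are hypothetical, and the numeric values are not claimed to match the CUPTI header.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    EX_TARGET_UNKNOWN, EX_TARGET_HOST, EX_TARGET_DEVICE, EX_TARGET_ARRAY
} ExTarget;

/* Stand-ins listed in the same order as the enumerators above. */
typedef enum {
    EX_MEMCPY_UNKNOWN, EX_MEMCPY_HTOD, EX_MEMCPY_DTOH, EX_MEMCPY_HTOA,
    EX_MEMCPY_ATOH, EX_MEMCPY_ATOA, EX_MEMCPY_ATOD, EX_MEMCPY_DTOA,
    EX_MEMCPY_DTOD, EX_MEMCPY_HTOH, EX_MEMCPY_PTOP, EX_MEMCPY_COUNT
} ExMemcpyKind;

typedef struct { ExTarget src; ExTarget dst; } ExEndpoints;

/* Maps a memcpy kind to its (source, destination) targets. */
static ExEndpoints ex_memcpy_endpoints(ExMemcpyKind kind)
{
    static const ExEndpoints table[EX_MEMCPY_COUNT] = {
        [EX_MEMCPY_UNKNOWN] = { EX_TARGET_UNKNOWN, EX_TARGET_UNKNOWN },
        [EX_MEMCPY_HTOD]    = { EX_TARGET_HOST,    EX_TARGET_DEVICE },
        [EX_MEMCPY_DTOH]    = { EX_TARGET_DEVICE,  EX_TARGET_HOST },
        [EX_MEMCPY_HTOA]    = { EX_TARGET_HOST,    EX_TARGET_ARRAY },
        [EX_MEMCPY_ATOH]    = { EX_TARGET_ARRAY,   EX_TARGET_HOST },
        [EX_MEMCPY_ATOA]    = { EX_TARGET_ARRAY,   EX_TARGET_ARRAY },
        [EX_MEMCPY_ATOD]    = { EX_TARGET_ARRAY,   EX_TARGET_DEVICE },
        [EX_MEMCPY_DTOA]    = { EX_TARGET_DEVICE,  EX_TARGET_ARRAY },
        [EX_MEMCPY_DTOD]    = { EX_TARGET_DEVICE,  EX_TARGET_DEVICE },
        [EX_MEMCPY_HTOH]    = { EX_TARGET_HOST,    EX_TARGET_HOST },
        /* Peer to peer: both ends are device memory, on different devices. */
        [EX_MEMCPY_PTOP]    = { EX_TARGET_DEVICE,  EX_TARGET_DEVICE }
    };
    assert((int)kind >= 0 && kind < EX_MEMCPY_COUNT);
    return table[kind];
}
```

Note that DTOD and PTOP map to the same pair of targets. The kind itself is what distinguishes a copy within one device from a copy across devices.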

Each kind represents the type of the memory accessed by a memory operation/copy.

Enumerator:
CUPTI_ACTIVITY_MEMORY_KIND_UNKNOWN  The memory kind is unknown.
CUPTI_ACTIVITY_MEMORY_KIND_PAGEABLE  The memory is pageable.
CUPTI_ACTIVITY_MEMORY_KIND_PINNED  The memory is pinned.
CUPTI_ACTIVITY_MEMORY_KIND_DEVICE  The memory is on the device.
CUPTI_ACTIVITY_MEMORY_KIND_ARRAY  The memory is an array.
CUPTI_ACTIVITY_MEMORY_KIND_MANAGED  The memory is managed.
CUPTI_ACTIVITY_MEMORY_KIND_DEVICE_STATIC  The memory is device static.
CUPTI_ACTIVITY_MEMORY_KIND_MANAGED_STATIC  The memory is managed static.

Describes the type of memory operation, to be used with CUpti_ActivityMemory3.

Enumerator:
CUPTI_ACTIVITY_MEMORY_OPERATION_TYPE_ALLOCATION  Memory is allocated.
CUPTI_ACTIVITY_MEMORY_OPERATION_TYPE_RELEASE  Memory is released.

Describes the type of memory pool operation, to be used with CUpti_ActivityMemoryPool2.

Enumerator:
CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_CREATED  Memory pool is created.
CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_DESTROYED  Memory pool is destroyed.
CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_TRIMMED  Memory pool is trimmed.

Describes the type of memory pool, to be used with CUpti_ActivityMemory3.

Enumerator:
CUPTI_ACTIVITY_MEMORY_POOL_TYPE_LOCAL  Memory pool is local to the process.
CUPTI_ACTIVITY_MEMORY_POOL_TYPE_IMPORTED  Memory pool is imported by the process.

See also:
CUpti_ActivityObjectKindId
Enumerator:
CUPTI_ACTIVITY_OBJECT_UNKNOWN  The object kind is not known.
CUPTI_ACTIVITY_OBJECT_PROCESS  A process.
CUPTI_ACTIVITY_OBJECT_THREAD  A thread.
CUPTI_ACTIVITY_OBJECT_DEVICE  A device.
CUPTI_ACTIVITY_OBJECT_CONTEXT  A context.
CUPTI_ACTIVITY_OBJECT_STREAM  A stream.

Enumerator:
CUPTI_ACTIVITY_OVERHEAD_UNKNOWN  The overhead kind is not known.
CUPTI_ACTIVITY_OVERHEAD_DRIVER_COMPILER  Compiler (JIT) overhead.
CUPTI_ACTIVITY_OVERHEAD_CUPTI_BUFFER_FLUSH  Activity buffer flush overhead.
CUPTI_ACTIVITY_OVERHEAD_CUPTI_INSTRUMENTATION  CUPTI instrumentation overhead.
CUPTI_ACTIVITY_OVERHEAD_CUPTI_RESOURCE  CUPTI resource creation and destruction overhead.

Enumerator:
CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_UNKNOWN  Partitioned global cache config unknown.
CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_NOT_SUPPORTED  Partitioned global cache not supported.
CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_OFF  Partitioned global cache config off.
CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_ON  Partitioned global cache config on.

The sampling period can be set using cuptiActivityConfigurePCSampling.

Enumerator:
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_INVALID  The PC sampling period is not set.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MIN  Minimum sampling period available on the device.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_LOW  Sampling period in lower range.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MID  Medium sampling period.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_HIGH  Sampling period in higher range.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MAX  Maximum sampling period available on the device.

Enumerator:
CUPTI_ACTIVITY_PC_SAMPLING_STALL_INVALID  Invalid reason
CUPTI_ACTIVITY_PC_SAMPLING_STALL_NONE  No stall, instruction is selected for issue
CUPTI_ACTIVITY_PC_SAMPLING_STALL_INST_FETCH  Warp is blocked because next instruction is not yet available, because of instruction cache miss, or because of branching effects
CUPTI_ACTIVITY_PC_SAMPLING_STALL_EXEC_DEPENDENCY  Instruction is waiting on an arithmetic dependency
CUPTI_ACTIVITY_PC_SAMPLING_STALL_MEMORY_DEPENDENCY  Warp is blocked because it is waiting for a memory access to complete.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_TEXTURE  Texture sub-system is fully utilized or has too many outstanding requests.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_SYNC  Warp is blocked as it is waiting at __syncthreads() or at memory barrier.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_CONSTANT_MEMORY_DEPENDENCY  Warp is blocked waiting for __constant__ memory and immediate memory access to complete.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_PIPE_BUSY  Compute operation cannot be performed due to the required resources not being available.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_MEMORY_THROTTLE  Warp is blocked because there are too many pending memory operations. In Kepler architecture it often indicates high number of memory replays.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_NOT_SELECTED  Warp was ready to issue, but some other warp issued instead.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_OTHER  Miscellaneous reasons
CUPTI_ACTIVITY_PC_SAMPLING_STALL_SLEEPING  Sleeping.

Enumerator:
CUPTI_ACTIVITY_PREEMPTION_KIND_UNKNOWN  The preemption kind is not known.
CUPTI_ACTIVITY_PREEMPTION_KIND_SAVE  Preemption to save CDP block.
CUPTI_ACTIVITY_PREEMPTION_KIND_RESTORE  Preemption to restore CDP block.

The types of stream to be used with CUpti_ActivityStream.

Enumerator:
CUPTI_ACTIVITY_STREAM_CREATE_FLAG_UNKNOWN  Unknown data.
CUPTI_ACTIVITY_STREAM_CREATE_FLAG_DEFAULT  Default stream.
CUPTI_ACTIVITY_STREAM_CREATE_FLAG_NON_BLOCKING  Non-blocking stream.
CUPTI_ACTIVITY_STREAM_CREATE_FLAG_NULL  Null stream.
CUPTI_ACTIVITY_STREAM_CREATE_MASK  Stream create Mask

The types of synchronization to be used with CUpti_ActivitySynchronization.

Enumerator:
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_UNKNOWN  Unknown data.
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_EVENT_SYNCHRONIZE  Event synchronize API.
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_STREAM_WAIT_EVENT  Stream wait event API.
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_STREAM_SYNCHRONIZE  Stream synchronize API.
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_CONTEXT_SYNCHRONIZE  Context synchronize API.

CUPTI uses different methods to obtain the thread-id depending on the support and the underlying platform. This enum documents these methods for each type. APIs cuptiSetThreadIdType and cuptiGetThreadIdType can be used to set and get the thread-id type.

Enumerator:
CUPTI_ACTIVITY_THREAD_ID_TYPE_DEFAULT  Default type. Windows uses API GetCurrentThreadId(); Linux/Mac/Android/QNX use POSIX pthread API pthread_self().
CUPTI_ACTIVITY_THREAD_ID_TYPE_SYSTEM  This type is based on the system API available on the underlying platform, and the thread-id obtained is supposed to be unique for the process lifetime. Windows uses API GetCurrentThreadId(); Linux uses syscall SYS_gettid; Mac uses syscall SYS_thread_selfid; Android/QNX use gettid().

This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT

Enumerator:
CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_UNKNOWN  The unified memory access type is not known
CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_READ  The page fault was triggered by read memory instruction
CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_WRITE  The page fault was triggered by write memory instruction
CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_ATOMIC  The page fault was triggered by atomic memory instruction
CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_PREFETCH  The page fault was triggered by memory prefetch operation

Many activities are associated with the Unified Memory mechanism; among them are transfers from host to device, transfers from device to host, and page faults on the host side.

Enumerator:
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_UNKNOWN  The unified memory counter kind is not known.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD  Number of bytes transferred from host to device
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH  Number of bytes transferred from device to host
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT  Number of CPU page faults, this is only supported on 64 bit Linux and Mac platforms
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT  Number of GPU page faults, this is only supported on devices with compute capability 6.0 and higher and 64 bit Linux platforms
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING  Thrashing occurs when data is frequently accessed by multiple processors and has to be constantly migrated around to achieve data locality. In this case the overhead of migration may exceed the benefits of locality. This is only supported on 64 bit Linux platforms.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING  Throttling is a prevention technique used by the driver to avoid further thrashing. Here, the driver doesn't service the fault for one of the contending processors for a specific period of time, so that the other processor can run at full-speed. This is only supported on 64 bit Linux platforms.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP  In case throttling does not help, the driver tries to pin the memory to a processor for a specific period of time. One of the contending processors will have slow access to the memory, while the other will have fast access. This is only supported on 64 bit Linux platforms.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOD  Number of bytes transferred from one device to another device. This is only supported on 64 bit Linux platforms.

Enumerator:
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_UNKNOWN  The unified memory counter scope is not known.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_PROCESS_SINGLE_DEVICE  Collect unified memory counter for single process on one device
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_PROCESS_ALL_DEVICES  Collect unified memory counter for single process across all devices

This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH

Enumerator:
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_UNKNOWN  The unified memory migration cause is not known
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_USER  The unified memory migrated due to an explicit call from the user e.g. cudaMemPrefetchAsync
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_COHERENCE  The unified memory migrated to guarantee data coherence e.g. CPU/GPU faults on Pascal+ and kernel launch on pre-Pascal GPUs
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_PREFETCH  The unified memory was speculatively migrated by the UVM driver before being accessed by the destination processor to improve performance
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_EVICTION  The unified memory migrated to the CPU because it was evicted to make room for another block of memory on the GPU
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_ACCESS_COUNTERS  The unified memory migrated to another processor because of access counter notifications. Only frequently accessed pages are migrated between CPU and GPU, or between peer GPUs.

This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP

Enumerator:
CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_UNKNOWN  The cause of mapping to remote memory was unknown
CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_COHERENCE  Mapping to remote memory was added to maintain data coherence.
CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_THRASHING  Mapping to remote memory was added to prevent further thrashing
CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_POLICY  Mapping to remote memory was added to enforce the hints specified by the programmer or by performance heuristics of the UVM driver
CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_OUT_OF_MEMORY  Mapping to remote memory was added because there is no more memory available on the processor and eviction was not possible
CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_EVICTION  Mapping to remote memory was added after the memory was evicted to make room for another block of memory on the GPU

This indicates the virtualization mode in which CUDA device is running

Enumerator:
CUPTI_DEVICE_VIRTUALIZATION_MODE_NONE  No virtualization mode is associated with the device, i.e. it's a bare-metal GPU
CUPTI_DEVICE_VIRTUALIZATION_MODE_PASS_THROUGH  The device is associated with the pass-through GPU. In this mode, an entire physical GPU is directly assigned to one virtual machine (VM).
CUPTI_DEVICE_VIRTUALIZATION_MODE_VIRTUAL_GPU  The device is associated with the virtual GPU (vGPU). In this mode multiple virtual machines (VMs) have simultaneous, direct access to a single physical GPU.

Enumerator:
CUPTI_DEV_TYPE_GPU  The device type is GPU.
CUPTI_DEV_TYPE_NPU  The device type is NVLink processing unit in CPU.

The possible reasons that a clock can be throttled. There can be more than one reason that a clock is being throttled so these types can be combined by bitwise OR. These are used in the clocksThrottleReason field in the Environment Activity Record.

Enumerator:
CUPTI_CLOCKS_THROTTLE_REASON_GPU_IDLE  Nothing is running on the GPU and the clocks are dropping to idle state.
CUPTI_CLOCKS_THROTTLE_REASON_USER_DEFINED_CLOCKS  The GPU clocks are limited by a user specified limit.
CUPTI_CLOCKS_THROTTLE_REASON_SW_POWER_CAP  A software power scaling algorithm is reducing the clocks below requested clocks.
CUPTI_CLOCKS_THROTTLE_REASON_HW_SLOWDOWN  Hardware slowdown to reduce the clock by a factor of two or more is engaged. This is an indicator of one of the following: 1) Temperature is too high, 2) External power brake assertion is being triggered (e.g. by the system power supply), 3) Change in power state.
CUPTI_CLOCKS_THROTTLE_REASON_UNKNOWN  Some unspecified factor is reducing the clocks.
CUPTI_CLOCKS_THROTTLE_REASON_UNSUPPORTED  Throttle reason is not supported for this GPU.
CUPTI_CLOCKS_THROTTLE_REASON_NONE  No clock throttling.
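
Because the throttle reasons are bit flags that can be combined by bitwise OR, a consumer of the clocksThrottleReason field typically tests each bit separately. The following is a minimal sketch of that decoding. The DEMO_* constants and describe_throttle_reasons are hypothetical stand-ins so that the snippet compiles without the CUPTI headers; their numeric values are assumptions, and real code should use the CUPTI_CLOCKS_THROTTLE_REASON_* enumerators from the CUPTI headers instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in bit values for illustration only (assumption). Real code should
 * use the CUPTI_CLOCKS_THROTTLE_REASON_* enumerators from the CUPTI headers. */
enum {
    DEMO_THROTTLE_GPU_IDLE            = 0x1,
    DEMO_THROTTLE_USER_DEFINED_CLOCKS = 0x2,
    DEMO_THROTTLE_SW_POWER_CAP        = 0x4,
    DEMO_THROTTLE_HW_SLOWDOWN         = 0x8
};

static const struct {
    uint32_t    bit;
    const char *name;
} kDemoReasons[] = {
    { DEMO_THROTTLE_GPU_IDLE,            "GPU_IDLE" },
    { DEMO_THROTTLE_USER_DEFINED_CLOCKS, "USER_DEFINED_CLOCKS" },
    { DEMO_THROTTLE_SW_POWER_CAP,        "SW_POWER_CAP" },
    { DEMO_THROTTLE_HW_SLOWDOWN,         "HW_SLOWDOWN" }
};

/* Writes a '|'-separated list of the reasons set in `reasons` into `out`
 * and returns how many known reasons were written. Stops early if `out`
 * is too small to hold the next name. */
int describe_throttle_reasons(uint32_t reasons, char *out, size_t outSize)
{
    int count = 0;
    size_t used = 0;
    size_t i;

    if (out == NULL || outSize == 0) {
        return 0;
    }
    out[0] = '\0';

    for (i = 0; i < sizeof(kDemoReasons) / sizeof(kDemoReasons[0]); ++i) {
        int n;
        if ((reasons & kDemoReasons[i].bit) == 0) {
            continue; /* this reason is not set */
        }
        n = snprintf(out + used, outSize - used, "%s%s",
                     count ? "|" : "", kDemoReasons[i].name);
        if (n < 0 || (size_t)n >= outSize - used) {
            break; /* output was truncated */
        }
        used += (size_t)n;
        ++count;
    }
    return count;
}
```

A value of zero has no bits set, so the function returns 0 and leaves an empty string, which matches the "no clock throttling" case.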

Custom correlation kinds are reserved for usage in external tools.

See also:
CUpti_ActivityExternalCorrelation
Enumerator:
CUPTI_EXTERNAL_CORRELATION_KIND_UNKNOWN  The external API is unknown to CUPTI
CUPTI_EXTERNAL_CORRELATION_KIND_OPENACC  The external API is OpenACC
CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM0  The external API is custom0
CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM1  The external API is custom1
CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM2  The external API is custom2
CUPTI_EXTERNAL_CORRELATION_KIND_SIZE  Add new kinds before this line

Describes link properties, to be used with CUpti_ActivityNvLink.

Enumerator:
CUPTI_LINK_FLAG_PEER_ACCESS  Is peer to peer access supported by this link.
CUPTI_LINK_FLAG_SYSMEM_ACCESS  Is system memory access supported by this link.
CUPTI_LINK_FLAG_PEER_ATOMICS  Is peer atomic access supported by this link.
CUPTI_LINK_FLAG_SYSMEM_ATOMICS  Is system memory atomic access supported by this link.

See also:
CUpti_ActivityKindOpenAcc

Field to differentiate whether PCIE Activity record is of a GPU or a PCI Bridge

Enumerator:
CUPTI_PCIE_DEVICE_TYPE_GPU  PCIE GPU record
CUPTI_PCIE_DEVICE_TYPE_BRIDGE  PCIE Bridge record

Enumeration of PCIE Generation for pcie activity attribute pcieGeneration

Enumerator:
CUPTI_PCIE_GEN_GEN1  PCIE Generation 1
CUPTI_PCIE_GEN_GEN2  PCIE Generation 2
CUPTI_PCIE_GEN_GEN3  PCIE Generation 3
CUPTI_PCIE_GEN_GEN4  PCIE Generation 4
CUPTI_PCIE_GEN_GEN5  PCIE Generation 5


Function Documentation

CUptiResult cuptiActivityConfigurePCSampling ( CUcontext  ctx,
CUpti_ActivityPCSamplingConfig *  config
)

For Pascal and older GPU architectures this API must be called before enabling activity kind CUPTI_ACTIVITY_KIND_PC_SAMPLING. There is no such requirement for Volta and newer GPU architectures.

For Volta and newer GPU architectures if this API is called in the middle of execution, PC sampling configuration will be updated for subsequent kernel launches.

Parameters:
ctx The context
config A pointer to CUpti_ActivityPCSamplingConfig structure containing PC sampling configuration.
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_OPERATION if this api is called while some valid event collection method is set.
CUPTI_ERROR_INVALID_PARAMETER if config is NULL or any parameter in the config structures is not a valid value
CUPTI_ERROR_NOT_SUPPORTED Indicates that the system/device does not support the unified memory counters

CUptiResult cuptiActivityConfigureUnifiedMemoryCounter ( CUpti_ActivityUnifiedMemoryCounterConfig *  config,
uint32_t  count 
)

Parameters:
config A pointer to CUpti_ActivityUnifiedMemoryCounterConfig structures containing Unified Memory counter configuration.
count Number of Unified Memory counter configuration structures
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_PARAMETER if config is NULL or any parameter in the config structures is not a valid value
CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED One potential reason is that platform (OS/arch) does not support the unified memory counters
CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_ON_DEVICE Indicates that the device does not support the unified memory counters
CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_ON_NON_P2P_DEVICES Indicates that multi-GPU configuration without P2P support between any pair of devices does not support the unified memory counters

CUptiResult cuptiActivityDisable ( CUpti_ActivityKind  kind  ) 

Disable collection of a specific kind of activity record. Multiple kinds can be disabled by calling this function multiple times. By default all activity kinds are disabled for collection.

Parameters:
kind The kind of activity record to stop collecting
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_KIND if the activity kind is not supported

CUptiResult cuptiActivityDisableContext ( CUcontext  context,
CUpti_ActivityKind  kind 
)

Disable collection of a specific kind of activity record for a context. The setting made by this API supersedes the global settings for activity records. Multiple kinds can be disabled by calling this function multiple times.

Parameters:
context The context for which activity is to be disabled
kind The kind of activity record to stop collecting
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_KIND if the activity kind is not supported

CUptiResult cuptiActivityEnable ( CUpti_ActivityKind  kind  ) 

Enable collection of a specific kind of activity record. Multiple kinds can be enabled by calling this function multiple times. By default all activity kinds are disabled for collection.

Parameters:
kind The kind of activity record to collect
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_NOT_COMPATIBLE if the activity kind cannot be enabled
CUPTI_ERROR_INVALID_KIND if the activity kind is not supported

CUptiResult cuptiActivityEnableAndDump ( CUpti_ActivityKind  kind  ) 

In general, the behavior of this API is similar to the API cuptiActivityEnable, i.e. it enables the collection of a specific kind of activity record. Additionally, this API can help in dumping the records for activities which happened in the past, before the corresponding activity kind was enabled. The API makes it possible to get records for the current resource allocations done in CUDA:

  • For CUPTI_ACTIVITY_KIND_DEVICE, existing device records are dumped.
  • For CUPTI_ACTIVITY_KIND_CONTEXT, existing context records are dumped.
  • For CUPTI_ACTIVITY_KIND_STREAM, existing stream records are dumped.
  • For CUPTI_ACTIVITY_KIND_NVLINK, existing NVLINK records are dumped.
  • For CUPTI_ACTIVITY_KIND_PCIE, existing PCIE records are dumped.
  • For other activities, the behavior is similar to the API cuptiActivityEnable.

Device records are emitted in CUPTI on CUDA driver initialization. Those records can only be retrieved by the user if CUPTI is attached before CUDA initialization. Context and stream records are emitted on context and stream creation. The use case of the API is to provide the records for CUDA resources (contexts/streams/devices) that are currently active if the user attaches CUPTI late.

Before calling this function, the user must register buffer callbacks to get the activity records by calling cuptiActivityRegisterCallbacks. If the user does not register the buffers and calls API cuptiActivityEnableAndDump, then CUPTI will enable the activity kind but not provide any records for that activity kind.

Parameters:
kind The kind of activity record to collect
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_UNKNOWN if buffer is not initialized.
CUPTI_ERROR_NOT_COMPATIBLE if the activity kind cannot be enabled
CUPTI_ERROR_INVALID_KIND if the activity kind is not supported

CUptiResult cuptiActivityEnableContext ( CUcontext  context,
CUpti_ActivityKind  kind 
)

Enable collection of a specific kind of activity record for a context. The setting made by this API supersedes the global settings for activity records enabled by cuptiActivityEnable. Multiple kinds can be enabled by calling this function multiple times.

Parameters:
context The context for which activity is to be enabled
kind The kind of activity record to collect
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_NOT_COMPATIBLE if the activity kind cannot be enabled
CUPTI_ERROR_INVALID_KIND if the activity kind is not supported

CUptiResult cuptiActivityEnableLatencyTimestamps ( uint8_t  enable  ) 

This API is used to control the collection of queued and submitted timestamps for kernels whose records are provided through the struct CUpti_ActivityKernel7. Default value is 0, i.e. these timestamps are not collected. This API needs to be called before initialization of CUDA and this setting should not be changed during the profiling session.

Parameters:
enable is a boolean, denoting whether these timestamps should be collected
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 

CUptiResult cuptiActivityEnableLaunchAttributes ( uint8_t  enable  ) 

This API is used to control the collection of launch attributes for kernels whose records are provided through the struct CUpti_ActivityKernel7. Default value is 0, i.e. these attributes are not collected.

Parameters:
enable is a boolean denoting whether these launch attributes should be collected

CUptiResult cuptiActivityFlush ( CUcontext  context,
uint32_t  streamId,
uint32_t  flag 
)

This function does not return until all activity records associated with the specified context/stream are returned to the CUPTI client using the callback registered in cuptiActivityRegisterCallbacks. To ensure that all activity records are complete, the requested stream(s), if any, are synchronized.

If context is NULL, the global activity records (i.e. those not associated with a particular stream) are flushed (in this case no streams are synchronized). If context is a valid CUcontext and streamId is 0, the buffers of all streams of this context are flushed. Otherwise, the buffers of the specified stream in this context are flushed.

Before calling this function, the buffer handling callback api must be activated by calling cuptiActivityRegisterCallbacks.

Parameters:
context A valid CUcontext or NULL.
streamId The stream ID.
flag The flag can be set to indicate a forced flush. See CUpti_ActivityFlag
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_OPERATION if not preceded by a successful call to cuptiActivityRegisterCallbacks
CUPTI_ERROR_UNKNOWN an internal error occurred
**DEPRECATED** This method is deprecated. The context and streamId parameters will be ignored. Use cuptiActivityFlushAll to flush all data.

CUptiResult cuptiActivityFlushAll ( uint32_t  flag  ) 

This function returns the activity records associated with all contexts/streams (and the global buffers not associated with any stream) to the CUPTI client using the callback registered in cuptiActivityRegisterCallbacks.

This is a blocking call, but it doesn't issue any CUDA synchronization calls implicitly, so it's not guaranteed that all activities are completed on the underlying devices. An activity record is considered completed if it has all its information filled in, including the timestamps, if any. It is the client's responsibility to issue the necessary CUDA synchronization calls before calling this function if all activity records are expected to be delivered with complete information.

Behavior of the function based on the input flag:

  • For default flush, i.e. when flag is set to 0, it returns all the activity buffers in which every activity record is completed; the buffers need not be full. It doesn't return buffers which have one or more incomplete records. Default flush can be done at a regular interval in a separate thread.
  • For forced flush i.e. when flag CUPTI_ACTIVITY_FLAG_FLUSH_FORCED is passed to the function, it returns all the activity buffers including the ones which have one or more incomplete activity records. It's suggested for clients to do the force flush before the termination of the profiling session to allow remaining buffers to be delivered. In general, it can be done in the at-exit handler.

Before calling this function, the buffer handling callback api must be activated by calling cuptiActivityRegisterCallbacks.

Parameters:
flag The flag can be set to indicate a forced flush. See CUpti_ActivityFlag
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_OPERATION if not preceded by a successful call to cuptiActivityRegisterCallbacks
CUPTI_ERROR_UNKNOWN an internal error occurred
See also:
cuptiActivityFlushPeriod

CUptiResult cuptiActivityFlushPeriod ( uint32_t  time  ) 

CUPTI creates a worker thread to minimize the perturbation of the threads created by the application. CUPTI offloads certain operations from the application threads to the worker thread. These include synchronization of profiling resources between host and device, and delivery of the activity buffers to the client using the callback registered in cuptiActivityRegisterCallbacks. For performance reasons, CUPTI wakes up the worker thread based on certain heuristics.

This API is used to control the flush period of the worker thread. This setting will override the CUPTI heuristics. Setting time to zero disables the periodic flush and restores the default behavior.

Periodic flush can return only those activity buffers which are full and have all the activity records completed.

It's allowed to use the API cuptiActivityFlushAll to flush the data on-demand, even when client sets the periodic flush.

Parameters:
time flush period in msec
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
See also:
cuptiActivityFlushAll
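
The delivery rules described for the periodic flush and for the two modes of cuptiActivityFlushAll can be summarized as a small decision function. The sketch below is only a model of the documented behavior, not CUPTI code: DemoFlushMode, DemoBufferState and demo_buffer_is_delivered are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Hypothetical names used only in this sketch; they are not CUPTI types. */
typedef enum {
    DEMO_FLUSH_PERIODIC, /* worker-thread flush set up by cuptiActivityFlushPeriod */
    DEMO_FLUSH_DEFAULT,  /* cuptiActivityFlushAll(0) */
    DEMO_FLUSH_FORCED    /* cuptiActivityFlushAll(CUPTI_ACTIVITY_FLAG_FLUSH_FORCED) */
} DemoFlushMode;

typedef struct {
    bool full;        /* the buffer has no room for further records */
    bool allComplete; /* every record has all its fields, including timestamps */
} DemoBufferState;

/* Returns true if a buffer in state `b` would be handed back to the client
 * by a flush of kind `mode`, following the rules stated in the text. */
bool demo_buffer_is_delivered(DemoFlushMode mode, DemoBufferState b)
{
    switch (mode) {
    case DEMO_FLUSH_PERIODIC:
        return b.full && b.allComplete; /* only full, completed buffers */
    case DEMO_FLUSH_DEFAULT:
        return b.allComplete;           /* completed, full or not */
    case DEMO_FLUSH_FORCED:
        return true;                    /* everything, even incomplete records */
    }
    return false;
}
```

Under this model, a partially filled buffer whose records are all complete is skipped by the periodic flush but delivered by a default cuptiActivityFlushAll, which is why the two mechanisms are useful together.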

CUptiResult cuptiActivityGetAttribute ( CUpti_ActivityAttribute  attr,
size_t *  valueSize,
void *  value 
)

Read an activity API attribute and return it in *value.

Parameters:
attr The attribute to read
valueSize Size of the buffer pointed to by value; on return, the number of bytes written to value
value Returns the value of the attribute
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attr is not an activity attribute
CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT Indicates that the value buffer is too small to hold the attribute value.
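
The valueSize parameter is both an input (capacity of the caller's buffer) and an output (bytes written). The sketch below illustrates that calling convention with a hypothetical stand-in, demo_get_attribute, which mimics only the documented contract and error conditions. It is not the CUPTI implementation; the DemoResult codes and the stored value are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in that mimics only the documented contract of
 * cuptiActivityGetAttribute; it is not the CUPTI implementation. */
typedef enum {
    DEMO_SUCCESS = 0,
    DEMO_ERROR_INVALID_PARAMETER,
    DEMO_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT
} DemoResult;

/* Made-up attribute value, used only so there is something to read back. */
static const size_t kDemoStoredValue = 8u * 1024u * 1024u;

DemoResult demo_get_attribute(size_t *valueSize, void *value)
{
    if (valueSize == NULL || value == NULL) {
        return DEMO_ERROR_INVALID_PARAMETER;
    }
    if (*valueSize < sizeof kDemoStoredValue) {
        /* The caller's buffer cannot hold the attribute value. */
        return DEMO_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT;
    }
    memcpy(value, &kDemoStoredValue, sizeof kDemoStoredValue);
    *valueSize = sizeof kDemoStoredValue; /* report the bytes written */
    return DEMO_SUCCESS;
}
```

A caller would initialize the size variable to the size of its buffer before the call, for example `size_t v = 0, n = sizeof v;`, and read `n` afterwards to learn how many bytes were filled in.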

CUptiResult cuptiActivityGetNextRecord ( uint8_t *  buffer,
size_t  validBufferSizeBytes,
CUpti_Activity **  record 
)

This is a helper function to iterate over the activity records in a buffer. A buffer of activity records is typically obtained by receiving a CUpti_BuffersCallbackCompleteFunc callback.

An example of typical usage:

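 // buffer and validSize are the values received in the
 // CUpti_BuffersCallbackCompleteFunc callback.
 CUpti_Activity *record = NULL;
 CUptiResult status = CUPTI_SUCCESS;
 do {
     status = cuptiActivityGetNextRecord(buffer, validSize, &record);
     if (status == CUPTI_SUCCESS) {
         // Use record here...
     }
     else if (status == CUPTI_ERROR_MAX_LIMIT_REACHED) {
         // No more records in this buffer.
         break;
     }
     else {
         // Any other status is an error; Error is a label
         // provided by the surrounding client code.
         goto Error;
     }
 } while (1);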

Parameters:
buffer The buffer containing activity records
record Inputs the previous record returned by cuptiActivityGetNextRecord and returns the next activity record from the buffer. If input value is NULL, returns the first activity record in the buffer. Records of kind CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL may contain invalid (0) timestamps, indicating that no timing information could be collected for lack of device memory.
validBufferSizeBytes The number of valid bytes in the buffer.
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_MAX_LIMIT_REACHED if no more records in the buffer
CUPTI_ERROR_INVALID_PARAMETER if buffer is NULL.

CUptiResult cuptiActivityGetNumDroppedRecords ( CUcontext  context,
uint32_t  streamId,
size_t *  dropped 
)

Get the number of records that were dropped because of insufficient buffer space. The dropped count includes records that could not be recorded because CUPTI did not have activity buffer space available for the record (because the CUpti_BuffersCallbackRequestFunc callback did not return an empty buffer of sufficient size) and also CDP records that could not be recorded because the device-side buffer was full (size is controlled by the CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP attribute). The dropped count maintained for the queue is reset to zero when this function is called.

Parameters:
context The context, or NULL to get dropped count from global queue
streamId The stream ID
dropped The number of records that were dropped since the last call to this function.
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_PARAMETER if dropped is NULL

CUptiResult cuptiActivityPopExternalCorrelationId ( CUpti_ExternalCorrelationKind  kind,
uint64_t *  lastId 
)

This function notifies CUPTI that the calling thread is leaving an external API region.

Parameters:
kind The kind of external API that activities should be correlated with.
lastId If the function returns successfully, contains the last external correlation id for this kind. Can be NULL.
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_PARAMETER The external API kind is invalid.
CUPTI_ERROR_QUEUE_EMPTY No external id is currently associated with kind.

CUptiResult cuptiActivityPushExternalCorrelationId ( CUpti_ExternalCorrelationKind  kind,
uint64_t  id 
)

This function notifies CUPTI that the calling thread is entering an external API region. When a CUPTI activity API record is created while within an external API region and CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION is enabled, the activity API record will be preceded by a CUpti_ActivityExternalCorrelation record for each CUpti_ExternalCorrelationKind.

Parameters:
kind The kind of external API that activities should be correlated with.
id External correlation id.
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_PARAMETER The external API kind is invalid
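
The push and pop calls bracket an external API region on the calling thread. The toy model below shows the bookkeeping this implies for a single correlation kind on a single thread, assuming regions nest in last-in, first-out order (which the push/pop naming suggests but the text does not spell out). All names (DemoCorrelationStack, demo_push, demo_pop, demo_current) are hypothetical, and this is not how CUPTI is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model, NOT the CUPTI implementation: one LIFO stack of ids for a
 * single correlation kind on a single thread. All names are hypothetical. */
#define DEMO_MAX_DEPTH 16

typedef struct {
    uint64_t ids[DEMO_MAX_DEPTH];
    size_t   depth;
} DemoCorrelationStack;

/* Enter an external API region and remember its id. Returns 0 on success,
 * -1 if the model's fixed-size stack is full. */
int demo_push(DemoCorrelationStack *s, uint64_t id)
{
    if (s->depth == DEMO_MAX_DEPTH) {
        return -1;
    }
    s->ids[s->depth++] = id;
    return 0;
}

/* Leave the innermost region. Returns -1 when no id is associated, which is
 * the situation the API reports as CUPTI_ERROR_QUEUE_EMPTY. lastId may be
 * NULL when the caller does not need the popped id. */
int demo_pop(DemoCorrelationStack *s, uint64_t *lastId)
{
    if (s->depth == 0) {
        return -1;
    }
    --s->depth;
    if (lastId != NULL) {
        *lastId = s->ids[s->depth];
    }
    return 0;
}

/* Id that a record created right now would be correlated with,
 * or 0 if the thread is not inside any region. */
uint64_t demo_current(const DemoCorrelationStack *s)
{
    return s->depth ? s->ids[s->depth - 1] : 0;
}
```

In a real tool, the external runtime would call cuptiActivityPushExternalCorrelationId on entry to its own API function and cuptiActivityPopExternalCorrelationId on exit, so that CUDA work launched in between can be attributed to that external call.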

CUptiResult cuptiActivityRegisterCallbacks ( CUpti_BuffersCallbackRequestFunc  funcBufferRequested,
CUpti_BuffersCallbackCompleteFunc  funcBufferCompleted 
)

This function registers two callback functions to be used in asynchronous buffer handling. If registered, activity record buffers are handled using asynchronous requested/completed callbacks from CUPTI.

Registering these callbacks prevents the client from using CUPTI's blocking enqueue/dequeue functions.

Parameters:
funcBufferRequested callback which is invoked when an empty buffer is requested by CUPTI
funcBufferCompleted callback which is invoked when a buffer containing activity records is available from CUPTI
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_PARAMETER if either funcBufferRequested or funcBufferCompleted is NULL

CUptiResult cuptiActivityRegisterTimestampCallback ( CUpti_TimestampCallbackFunc  funcTimestamp  ) 

This function registers a callback function to obtain a timestamp of the user's choice instead of using the CUPTI-provided timestamp. By default CUPTI uses different methods, based on the underlying platform, to retrieve the timestamp:

  • Linux and Android use clock_gettime(CLOCK_REALTIME, ..)
  • Windows uses QueryPerformanceCounter()
  • Mac uses mach_absolute_time()
  • QNX uses ClockCycles()

Timestamps retrieved using these methods are converted to nanoseconds if needed before usage.

The registration of timestamp callback should be done before any of the CUPTI activity kinds are enabled to make sure that all the records report the timestamp using the callback function registered through cuptiActivityRegisterTimestampCallback API.

Changing the timestamp callback function in CUPTI through cuptiActivityRegisterTimestampCallback API in the middle of the profiling session can cause records generated prior to the change to report timestamps through previous timestamp method.

Parameters:
funcTimestamp callback which is invoked when a timestamp is needed by CUPTI
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_PARAMETER if funcTimestamp is NULL
CUPTI_ERROR_NOT_INITIALIZED 
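
As an illustration of a user-chosen timestamp source, the sketch below returns nanoseconds from the POSIX monotonic clock. It assumes that the callback type takes no arguments and returns a uint64_t nanosecond value; demo_timestamp_ns is a hypothetical name, and the choice of CLOCK_MONOTONIC is an assumption made for this example rather than a recommendation from the documentation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime under strict C modes */
#endif
#include <stdint.h>
#include <time.h>

/* Candidate timestamp source: nanoseconds from the POSIX monotonic clock.
 * Choosing CLOCK_MONOTONIC is an assumption made for this sketch. */
uint64_t demo_timestamp_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return 0; /* clock unavailable */
    }
    /* Convert seconds + nanoseconds into a single nanosecond count. */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* With the CUPTI headers included, registration would look like:
 *     cuptiActivityRegisterTimestampCallback(demo_timestamp_ns);
 * and should happen before any activity kind is enabled. */
```

Whatever clock is chosen, all other timestamps that the tool correlates with CUPTI records should come from the same source, since records report whatever the registered callback returns.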

CUptiResult cuptiActivitySetAttribute ( CUpti_ActivityAttribute  attr,
size_t *  valueSize,
void *  value 
)

Write an activity API attribute.

Parameters:
attr The attribute to write
valueSize The size, in bytes, of the value
value The attribute value to write
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attr is not an activity attribute
CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT Indicates that the value buffer is too small to hold the attribute value.

CUptiResult cuptiComputeCapabilitySupported ( int  major,
int  minor,
int *  support 
)

This function is used to check the support for a device based on its compute capability. It sets the support when the compute capability is supported by the current version of CUPTI, and clears it otherwise. This version of CUPTI might not support all GPUs sharing the same compute capability. It is suggested to use API cuptiDeviceSupported which provides correct information.

Parameters:
major The major revision number of the compute capability
minor The minor revision number of the compute capability
support Pointer to an integer to return the support status
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_PARAMETER if support is NULL
See also:
cuptiDeviceSupported

CUptiResult cuptiDeviceSupported ( CUdevice  dev,
int *  support 
)

This function is used to check the support for a compute device. It sets the support when the device is supported by the current version of CUPTI, and clears it otherwise.

Parameters:
dev The device handle returned by CUDA Driver API cuDeviceGet
support Pointer to an integer to return the support status
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_PARAMETER if support is NULL
CUPTI_ERROR_INVALID_DEVICE if dev is not a valid device
See also:
cuptiComputeCapabilitySupported

CUptiResult cuptiDeviceVirtualizationMode ( CUdevice  dev,
CUpti_DeviceVirtualizationMode *  mode
)

This function is used to query the virtualization mode of the CUDA device.

Parameters:
dev The device handle returned by CUDA Driver API cuDeviceGet
mode Pointer to a CUpti_DeviceVirtualizationMode to return the virtualization mode
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_DEVICE if dev is not a valid device
CUPTI_ERROR_INVALID_PARAMETER if mode is NULL

CUptiResult cuptiFinalize ( void   ) 

This API detaches CUPTI from the running process. It destroys and cleans up all the resources associated with CUPTI in the current process. After CUPTI detaches, the process keeps running with no CUPTI attached to it. For safe operation, it is recommended that this API be invoked from the exit callsite of any CUDA Driver or Runtime API. Otherwise, the CUPTI client needs to make sure that the required CUDA synchronization and CUPTI activity buffer flush are done before calling the API. Sample code showing the usage of the API in the CUPTI callback handler code:

    void CUPTIAPI
    cuptiCallbackHandler(void *userdata, CUpti_CallbackDomain domain,
        CUpti_CallbackId cbid, void *cbdata)
    {
        const CUpti_CallbackData *cbInfo = (const CUpti_CallbackData *)cbdata;

        // detachCupti is a client-defined flag, declared elsewhere.
        // Take this code path when CUPTI detach is requested
        if (detachCupti) {
            switch(domain)
            {
            case CUPTI_CB_DOMAIN_RUNTIME_API:
            case CUPTI_CB_DOMAIN_DRIVER_API:
                if (cbInfo->callbackSite == CUPTI_API_EXIT) {
                    // call the CUPTI detach API
                    cuptiFinalize();
                }
                break;
            default:
                break;
            }
        }
    }

CUptiResult cuptiGetAutoBoostState ( CUcontext  context,
CUpti_ActivityAutoBoostState *  state 
)

Profiling results can be inconsistent when auto boost is enabled. CUPTI tries to disable auto boost while profiling, but this can fail if the user does not have the required permissions or if the CUDA_AUTO_BOOST environment variable is set. This function can be used to query whether auto boost is enabled.

Parameters:
context A valid CUcontext.
state A pointer to a CUpti_ActivityAutoBoostState structure which contains the current state and the ID of the process that requested the current state
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_PARAMETER if context or state is NULL
CUPTI_ERROR_NOT_SUPPORTED Indicates that the device does not support auto boost
CUPTI_ERROR_UNKNOWN an internal error occurred

CUptiResult cuptiGetContextId ( CUcontext  context,
uint32_t *  contextId 
)

Get the ID of a context.

Parameters:
context The context
contextId Returns a process-unique ID for the context
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_CONTEXT The context is NULL or not valid.
CUPTI_ERROR_INVALID_PARAMETER if contextId is NULL

CUptiResult cuptiGetDeviceId ( CUcontext  context,
uint32_t *  deviceId 
)

If context is NULL, returns the ID of the device that contains the currently active context. If context is non-NULL, returns the ID of the device which contains that context. Operates in a similar manner to cudaGetDevice() or cuCtxGetDevice() but may be called from within callback functions.

Parameters:
context The context, or NULL to indicate the current context.
deviceId Returns the ID of the device that is current for the calling thread.
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_DEVICE if unable to get device ID
CUPTI_ERROR_INVALID_PARAMETER if deviceId is NULL

CUptiResult cuptiGetGraphId ( CUgraph  graph,
uint32_t *  pId 
)

Returns the unique ID of the CUDA graph.

Parameters:
graph The graph.
pId Returns the unique ID of the graph
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_PARAMETER if graph is NULL

CUptiResult cuptiGetGraphNodeId ( CUgraphNode  node,
uint64_t *  nodeId 
)

Returns the unique ID of the CUDA graph node.

Parameters:
node The graph node.
nodeId Returns the unique ID of the node
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_PARAMETER if node is NULL

CUptiResult cuptiGetLastError ( void   ) 

Returns the last error produced by any CUPTI API call or callback in the same host thread, and resets it to CUPTI_SUCCESS.
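
Because the stored error is reset when it is read, a second call with no failure in between reports success. The toy model below illustrates only that read-and-reset behaviour in plain C. It is not CUPTI's implementation: model_result, model_record_error and model_get_last_error are hypothetical stand-ins, and C11 _Thread_local is assumed here as the per-host-thread storage.

```c
/* Hypothetical stand-in for CUptiResult; the values are illustrative only. */
typedef enum {
    MODEL_SUCCESS = 0,
    MODEL_ERROR_INVALID_PARAMETER = 1
} model_result;

/* One error slot per host thread (C11 _Thread_local). */
static _Thread_local model_result last_error = MODEL_SUCCESS;

/* Record a failure for the calling thread; success does not overwrite it. */
void model_record_error(model_result err)
{
    if (err != MODEL_SUCCESS)
        last_error = err;
}

/* Return the stored error and reset the slot to success. */
model_result model_get_last_error(void)
{
    model_result err = last_error;
    last_error = MODEL_SUCCESS;
    return err;
}
```

In this model, an error recorded on one thread is not visible to a query made from another thread, which mirrors the "same host thread" wording above.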

CUptiResult cuptiGetStreamId ( CUcontext  context,
CUstream  stream,
uint32_t *  streamId 
)

Get the ID of a stream. The stream ID is unique within a context (i.e. all streams within a context will have unique stream IDs).

Parameters:
context If non-NULL, the stream is checked to ensure that it belongs to this context. Typically this parameter should be NULL.
stream The stream
streamId Returns a context-unique ID for the stream
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_STREAM if unable to get stream ID, or if context is non-NULL and stream does not belong to the context
CUPTI_ERROR_INVALID_PARAMETER if streamId is NULL
**DEPRECATED** This method is deprecated as of CUDA 8.0. Use method cuptiGetStreamIdEx instead.

CUptiResult cuptiGetStreamIdEx ( CUcontext  context,
CUstream  stream,
uint8_t  perThreadStream,
uint32_t *  streamId 
)

Get the ID of a stream. The stream ID is unique within a context (i.e. all streams within a context will have unique stream IDs).

Parameters:
context If non-NULL, the stream is checked to ensure that it belongs to this context. Typically this parameter should be NULL.
stream The stream
perThreadStream Flag to indicate if program is compiled for per-thread streams
streamId Returns a context-unique ID for the stream
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_INITIALIZED 
CUPTI_ERROR_INVALID_STREAM if unable to get stream ID, or if context is non-NULL and stream does not belong to the context
CUPTI_ERROR_INVALID_PARAMETER if streamId is NULL
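
Since a stream ID is unique only within its context, a tool that tracks streams across several contexts may want to pair it with the context ID. The following is a minimal sketch, assuming both values were already obtained from cuptiGetContextId and cuptiGetStreamIdEx. The helper make_stream_key and its packing layout are illustrative choices, not part of CUPTI.

```c
#include <stdint.h>

/* Hypothetical helper: pack a process-unique context ID (high 32 bits)
 * and a context-unique stream ID (low 32 bits) into one 64-bit key. */
uint64_t make_stream_key(uint32_t context_id, uint32_t stream_id)
{
    return ((uint64_t)context_id << 32) | (uint64_t)stream_id;
}
```

Two streams that report the same stream ID in different contexts then map to different keys.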

CUptiResult cuptiGetThreadIdType ( CUpti_ActivityThreadIdType *  type  ) 

Returns the thread-id type used in CUPTI.

Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_PARAMETER if type is NULL

CUptiResult cuptiGetTimestamp ( uint64_t *  timestamp  ) 

Returns a timestamp normalized to correspond with the start and end timestamps reported in the CUPTI activity records. The timestamp is reported in nanoseconds.

Parameters:
timestamp Returns the CUPTI timestamp
Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_INVALID_PARAMETER if timestamp is NULL
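
Because the timestamp is in nanoseconds, an elapsed interval is the difference of two readings scaled to the unit of interest. The sketch below shows that arithmetic only. elapsed_ms is a hypothetical helper, and its arguments are assumed to be values returned earlier by cuptiGetTimestamp; no CUPTI call is made here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert two CUPTI timestamps (nanoseconds) into
 * an elapsed time in milliseconds. */
double elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    /* The end reading is expected not to precede the start reading. */
    assert(end_ns >= start_ns);
    return (double)(end_ns - start_ns) / 1.0e6;
}
```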

CUptiResult cuptiSetThreadIdType ( CUpti_ActivityThreadIdType  type  ) 

CUPTI uses the method corresponding to the type that has been set to generate the thread ID. See enum CUpti_ActivityThreadIdType for the list of methods. All activity records that have a thread-id field contain a value generated by the same method. The thread-id type must not be changed during a profiling session, to avoid thread-id mismatches across activity records.

Return values:
CUPTI_SUCCESS 
CUPTI_ERROR_NOT_SUPPORTED if type is not supported on the platform


Generated on Tue Jul 12 11:16:29 2022 for Cupti by  doxygen 1.5.8