Updates in 2025.4

General

  • Added support for profiling CUDA tile workloads.

  • Introduced a new Tile section that summarizes tile dimensions and pipeline utilization. The section is displayed when it is enabled and a tile workload is profiled.

  • Source page supports correlation between SASS and high-level Tile code (limited to cuTile Python code).

  • Added a new ncu-repz file format for zstd compressed report files.

  • Added support for locking GPUs to the boost clock instead of the base clock on Ampere and newer GPUs. Use the boost and force-boost options on supported drivers.

  • By default, Warp sampling now focuses on the Not Issued (_not_issued) variants of the metrics. This avoids pointing to source locations where warp stalls are already hidden because enough other warps are available during an issue cycle to cover the latency.

  • Added support for node-level profiling of CUDA conditional graphs, including device-updatable nodes and nodes that can set conditional graph handles.

  • Added support for node-level profiling of CUDA graphs launched from the device (DGL), including host graph nodes that can launch DGL.

  • Source page now displays symbol labels: A new column for symbol labels has been added, and symbol labels are shown alongside addresses in SASS instruction disassembly. This change aligns the output with that of the nvdisasm tool.

  • Added support for collecting Warp sampling metrics with PM sampling, allowing users to see function-level warp stalls for the selected time range in the timeline. See the Function Stats tool window for details.
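
The reasoning behind the Not Issued default for Warp sampling noted above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how the tool computes its metrics: the Sample record, its fields, the line numbers, and the stall reason strings are all hypothetical. It assumes that a Not Issued variant counts a stalled-warp sample only when the scheduler issued no instruction in that cycle.

```python
from collections import Counter
from typing import NamedTuple


class Sample(NamedTuple):
    """Hypothetical record for one warp sample (not a tool data structure)."""
    source_line: int        # illustrative source line of the sampled warp
    stall_reason: str       # illustrative stall reason label
    scheduler_issued: bool  # True if the scheduler issued from any warp this cycle


def stalls_all(samples):
    """Count every stalled-warp sample per source line."""
    return Counter(s.source_line for s in samples)


def stalls_not_issued(samples):
    """Count only samples taken in cycles where no instruction was issued."""
    return Counter(s.source_line for s in samples if not s.scheduler_issued)


samples = [
    # Line 10: warps stall often, but another warp was usually ready to issue,
    # so most of this latency was hidden.
    Sample(10, "long_scoreboard", True),
    Sample(10, "long_scoreboard", True),
    Sample(10, "long_scoreboard", True),
    Sample(10, "long_scoreboard", False),
    # Line 20: fewer samples, but the scheduler issued nothing each time.
    Sample(20, "barrier", False),
    Sample(20, "barrier", False),
    Sample(20, "barrier", False),
]

all_counts = stalls_all(samples)
not_issued_counts = stalls_not_issued(samples)

print("hottest line, all samples:", all_counts.most_common(1))
print("hottest line, not issued: ", not_issued_counts.most_common(1))
```

In this toy data, counting all samples ranks line 10 highest, while counting only the not-issued samples ranks line 20 highest, which is the location where stalls actually left the scheduler without work.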

NVIDIA Nsight Compute

NVIDIA Nsight Compute CLI

Resolved Issues