CUDA Sample Compatibility Status (v1.5)
Data is based on the CUDA samples tag v12.6.
CUDA Sample | Status | Comments |
simpleVoteIntrinsics | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
vectorAdd_nvrtc | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
deviceQuery | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
reduction | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
tf32TensorCoreGemm | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
shfl_scan | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
warpAggregatedAtomicsCG | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
concurrentKernels | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
bf16TensorCoreGemm | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
bandwidthTest | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
UnifiedMemoryPerf | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
binaryPartitionCG | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
conjugateGradientMultiBlockCG | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
cudaCompressibleMemory | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
cudaTensorCoreGemm | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
globalToShmemAsyncCopy | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
matrixMul | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
matrixMulDrv | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
nvJPEG | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
nvJPEG_encoder | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
p2pBandwidthLatencyTest | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleAWBarrier | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleCudaGraphs | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleZeroCopy | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleDrvRuntime | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
vectorAddMMAP | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
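simpleIPC | ❌ | Passes on a single card. Because PPU's direct multi-card interconnect differs from the NVSwitch architecture, 8-card and 16-card configurations have a known deadlock issue in this case. |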
streamOrderedAllocation | ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
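streamOrderedAllocationIPC | ❌ | Passes on a single card. Because PPU's direct multi-card interconnect differs from the NVSwitch architecture, 8-card and 16-card configurations have a known deadlock issue in this case. |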
simplePrintf | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleTemplates | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleOccupancy | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
topologyQuery | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
clock | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
cppIntegration | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
dwtHaar1D | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
vectorAdd | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
vectorAddDrv | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
scalarProd | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleVoteIntrinsics_nvrtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
SobolQRNG | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleCooperativeGroups | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleAtomicIntrinsics | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
cudaOpenMP | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
fp16ScalarProduct | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
inlinePTX | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleMPI | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
template | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleHyperQ | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
reductionMultiBlockCG | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
threadFenceReduction | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
mergeSort | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
convolutionSeparable | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
FDTD3d | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
matrixMulCUBLAS | ❌ 11.5 ❌ 11.6 ❌ 11.7 ❌ 11.8 ❌ 12.0 ❌ 12.1 ❌ 12.2 ❌ 12.3 ❌ 12.4 ❌ 12.5 ❌ 12.6 | Results differ in precision from NVIDIA's. The cause is a different matrix multiplication method: PPU chose a higher-performance implementation. |
sortingNetworks | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
fastWalshTransform | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
alignedTypes | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
deviceQueryDrv | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
scan | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
BlackScholes | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
transpose | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
histogram | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
MC_SingleAsianOptionP | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
MC_EstimatePiInlineP | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
quasirandomGenerator | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
binomialOptions | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
MonteCarloMultiGPU | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
UnifiedMemoryStreams | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
asyncAPI | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
c++11_cuda | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
cppOverload | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
cuHook | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
eigenvalues | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
interval | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
newdelete | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
radixSortThrust | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
segmentationTreeThrust | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleAssert | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleAttributes | ❌ 11.5 ❌ 11.6 ❌ 11.7 ❌ 11.8 ❌ 12.0 ❌ 12.1 ❌ 12.2 ❌ 12.3 ❌ 12.4 ❌ 12.5 ❌ 12.6 | Passes, but takes an extremely long time to run. "Maximum y- or z-dimension of a grid of thread blocks" is 65535 on NVIDIA but 2^31-1 on PPU, and the sample code depends on this value. |
simpleMultiCopy | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleMultiGPU | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleP2P | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleSeparateCompilation | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleStreams | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
threadMigration | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
binomialOptions_nvrtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
clock_nvrtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
inlinePTX_nvrtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
matrixMul_nvrtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
quasirandomGenerator_nvrtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleAssert_nvrtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleAtomicIntrinsics_nvrtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleTemplates_nvrtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
BlackScholes_nvrtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
libNVVM | ❌ 12.3 ❌ 12.4 ❌ 12.5 ❌ 12.6 | PPU is currently compatible with the LLVM IR definition for NVGPU, but not fully compatible with NVIDIA's official NVVM IR definition. |
StreamPriorities | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
MC_EstimatePiInlineQ | ❌ | cuRAND APIs are not yet fully supported. |
MC_EstimatePiP | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
MC_EstimatePiQ | ❌ | cuRAND APIs (curandCreateGenerator) are not yet fully supported. |
MersenneTwisterGP11213 | ❌ | cuRAND APIs are not yet fully supported. |
batchCUBLAS | ❌ | cuBLAS APIs (cublasSetMatrix) are not yet fully supported. |
batchedLabelMarkersAndLabelCompressionNPP | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
boxFilterNPP | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
conjugateGradientCudaGraphs | ❌ | The cuSPARSE library is not yet fully supported. |
conjugateGradient | ❌ | |
conjugateGradientMultiDeviceCG | ❌ | |
conjugateGradientPrecond | ❌ | |
conjugateGradientUM | ❌ | |
cuSolverDn_LinearSolver | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
cuSolverRf | ❌ | The cuSPARSE library (cusolverSpCreate) is not yet fully supported. |
cuSolverSp_LinearSolver | ❌ | |
cuSolverSp_LowlevelCholesky | ❌ | |
cuSolverSp_LowlevelQR | ❌ | |
graphMemoryFootprint | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
graphMemoryNodes | ❌ | The memory allocate and memory free APIs are not yet defined, because the header files have not been updated to that version. |
immaTensorCoreGemm | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
jacobiCudaGraphs | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
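matrixMulDynlinkJIT | ❌ | DynlinkJIT is not currently supported. |
memMapIPCDrv | ❌ | cuMemGetAllocationGranularity reports a 2 MB granularity, whereas the PPU design uses 8 MB; the checks in the sample code need to be modified. |
nbody | ❌ | Graphics and OpenGL APIs are not supported. |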
Mandelbrot | ❌ | |
particles | ❌ | |
oceanFFT | ❌ | |
simpleCUDA2GL | ❌ | |
simpleGL | ❌ | |
recursiveGaussian | ❌ | |
ptxjit | ❌ | PPU does not support PTX. |
randomFog | ❌ | Lacks graphical display capability. |
simpleCUBLAS | ❌ | The cuBLAS API implementation is incomplete. |
simpleCUBLASXT | ❌ | |
simpleCUBLAS_LU | ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleCUFFT | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
simpleCUFFT_2d_MGPU | ❌ | cuFFT is only partially supported. |
simpleCUFFT_MGPU | ❌ | |
simpleCUFFT_callback | ❌ | |
systemWideAtomics | ❌ | |
FilterBorderControlNPP | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
watershedSegmentationNPP | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
streamOrderedAllocationP2P | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
EGLStream_CUDA_CrossGPU | ❌ | EGL APIs are not supported. |
EGLStream_CUDA_Interop | ❌ | |
EGLStreams_CUDA_Interop | ❌ | |
EGLSync_CUDAEvent_Interop | ❌ | GLES is not supported. |
cuDLALayerwiseStatsStandalone | ❌ | |
cuDLALayerwiseStatsHybrid | ❌ | |
simpleGLES_EGLOutput | ❌ | |
fluidsGLES | ❌ | |
nbody_opengles | ❌ | |
simpleGLES | ❌ | |
simpleGLES_screen | ❌ | |
nbody_screen | ❌ | |
cuDLAHybridMode | ❌ | |
cuDLAStandaloneMode | ❌ | |
cuDLAErrorReporting | ❌ | |
cudaNvSciNvMedia | ❌ | |
cdpAdvancedQuicksort | ❌ | The CUDA Dynamic Parallelism (CDP) feature is not supported. |
cdpBezierTessellation | ❌ | |
cdpQuadtree | ❌ | |
cdpSimplePrint | ❌ | |
cdpSimpleQuicksort | ❌ | |
cudaNvSci | ❌ | libnvscibuf.so not found; NvSci is not yet supported. |
dmmaTensorCoreGemm | ❌ | PPU tensor cores do not support double-precision MMA instructions. |
dxtc | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
freeImageInteropNPP | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
histEqualizationNPP | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
cannyEdgeDetectorNPP | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
convolutionTexture | ❌ | PPU does not support texture or GL functionality. |
bindlessTexture | ❌ | |
bicubicTexture | ❌ | |
HSOpticalFlow | ❌ | |
simpleLayeredTexture | ❌ | |
simplePitchLinearTexture | ❌ | |
simpleSurfaceWrite | ❌ | |
simpleTexture | ❌ | |
simpleTexture3D | ❌ | |
simpleTextureDrv | ❌ | |
simpleCubemapTexture | ❌ | |
volumeFiltering | ❌ | |
volumeRender | ❌ | |
vulkanImageCUDA | ❌ | |
stereoDisparity | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
boxFilter | ❌ | |
bilateralFilter | ❌ | |
postProcessGL | ❌ | |
imageDenoising | ❌ | |
fluidsGL | ❌ | |
smokeParticles | ❌ | |
lineOfSight | ❌ | |
marchingCubes | ❌ | |
convolutionFFT2D | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
dct8x8 | ❌ | |
NV12toBGRandResize | ✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
SobelFilter | ❌ | |
FunctionPointers | ❌ | |
simpleVulkan | ❌ | Vulkan-CUDA features are not yet supported. |
simpleVulkanMMAP | ❌ | |
simpleD3D10 | ❌ | PPU does not support D3D graphics APIs. |
fluidsD3D9 | ❌ | |
simpleD3D11 | ❌ | |
simpleD3D11Texture | ❌ | |
simpleD3D10RenderTarget | ❌ | |
simpleD3D10Texture | ❌ | |
SLID3D10Texture | ❌ | |
VFlockingD3D10 | ❌ | |
simpleD3D12 | ❌ | |
simpleD3D9Texture | ❌ | |
simpleD3D9 | ❌ | |
cudaGraphsPerfScaling | ❌ 12.5 ❌ 12.6 | The cudaGraphUpload API is not yet supported. |
simpleCallback | ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
LargeKernelParameter | ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6 | |
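
Several of the comments above (matrixMulCUBLAS, simpleAttributes, memMapIPCDrv) describe samples that depend on NVIDIA-specific values: an exact numeric result, a 65535 grid-dimension limit, or a 2 MB allocation granularity. The following is a minimal host-only sketch of how such assumptions might be relaxed when porting code. The helper names `round_up_to_granularity`, `capped_grid_dim`, and `nearly_equal` are hypothetical and do not come from the CUDA samples or from any PPU library. The granularity and device limit are plain parameters standing in for values that would be queried at run time.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round `bytes` up to the next multiple of `granularity`.
// `granularity` stands in for a value queried at run time (for example via
// cuMemGetAllocationGranularity) instead of a hard-coded 2 MB or 8 MB.
std::size_t round_up_to_granularity(std::size_t bytes, std::size_t granularity) {
    assert(granularity > 0);
    return ((bytes + granularity - 1) / granularity) * granularity;
}

// Hypothetical helper: cap a grid dimension derived from the device limit so
// that a very large limit (such as 2^31-1) does not turn into a very long run.
// The default cap of 65535 is an assumption taken from the table comment.
std::uint32_t capped_grid_dim(std::uint64_t device_max, std::uint32_t cap = 65535) {
    assert(cap > 0);
    return static_cast<std::uint32_t>(std::min<std::uint64_t>(device_max, cap));
}

// Hypothetical helper: compare two results with a relative tolerance rather
// than bit-for-bit, since different GEMM implementations may round differently.
bool nearly_equal(double a, double b, double rel_tol) {
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}
```

A caller would pass the queried granularity and device limit into these helpers. The tolerance passed to `nearly_equal` is workload-dependent and has to be chosen by the user.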