Open-Source Sample Set Compatibility Status (v1.5) - 真武 PPU Cloud Service (ppu)

Note

The data is based on the CUDA Samples tag v12.6.

CUDA Sample

Status

Comments

simpleVoteIntrinsics

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

vectorAdd_nvrtc

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

deviceQuery

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

reduction

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

tf32TensorCoreGemm

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

shfl_scan

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

warpAggregatedAtomicsCG

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

concurrentKernels

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

bf16TensorCoreGemm

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

bandwidthTest

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

UnifiedMemoryPerf

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

binaryPartitionCG

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

conjugateGradientMultiBlockCG

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

cudaCompressibleMemory

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

cudaTensorCoreGemm

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

globalToShmemAsyncCopy

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

matrixMul

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

matrixMulDrv

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

nvJPEG

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

nvJPEG_encoder

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

p2pBandwidthLatencyTest

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleAWBarrier

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleCudaGraphs

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleZeroCopy

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleDrvRuntime

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

vectorAddMMAP

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleIPC

❌

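Passes on a single card. Because PPU direct multi-card interconnect differs from the NVSwitch architecture, 8-card and 16-card configurations have a known deadlock issue in this case.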

streamOrderedAllocation

✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

streamOrderedAllocationIPC

❌

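Passes on a single card. Because PPU direct multi-card interconnect differs from the NVSwitch architecture, 8-card and 16-card configurations have a known deadlock issue in this case.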

simplePrintf

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleTemplates

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleOccupancy

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

topologyQuery

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

clock

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

cppIntegration

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

dwtHaar1D

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

vectorAdd

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

vectorAddDrv

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

scalarProd

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleVoteIntrinsics_nvrtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

SobolQRNG

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleCooperativeGroups

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleAtomicIntrinsics

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

cudaOpenMP

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

fp16ScalarProduct

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

inlinePTX

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleMPI

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

template

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleHyperQ

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

reductionMultiBlockCG

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

threadFenceReduction

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

mergeSort

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

convolutionSeparable

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

FDTD3d

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

matrixMulCUBLAS

❌ 11.5 ❌ 11.6 ❌ 11.7 ❌ 11.8 ❌ 12.0 ❌ 12.1 ❌ 12.2 ❌ 12.3 ❌ 12.4 ❌ 12.5 ❌ 12.6

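The computed results differ in precision from those on NVIDIA hardware. The difference comes from a different matrixMul computation method: PPU chose the implementation with better performance.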

sortingNetworks

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

fastWalshTransform

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

alignedTypes

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

deviceQueryDrv

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

scan

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

BlackScholes

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

transpose

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

histogram

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

MC_SingleAsianOptionP

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

MC_EstimatePiInlineP

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

quasirandomGenerator

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

binomialOptions

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

MonteCarloMultiGPU

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

UnifiedMemoryStreams

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

asyncAPI

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

c++11_cuda

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

cppOverload

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

cuHook

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

eigenvalues

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

interval

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

newdelete

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

radixSortThrust

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

segmentationTreeThrust

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleAssert

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleAttributes

❌ 11.5 ❌ 11.6 ❌ 11.7 ❌ 11.8 ❌ 12.0 ❌ 12.1 ❌ 12.2 ❌ 12.3 ❌ 12.4 ❌ 12.5 ❌ 12.6

The sample can pass, but it runs for an extremely long time. The "Maximum y- or z-dimension of a grid of thread blocks" is 65535 on NVIDIA and 2^31-1 on PPU, and the sample code depends on this value.

simpleMultiCopy

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleMultiGPU

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleP2P

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleSeparateCompilation

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleStreams

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

threadMigration

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

binomialOptions_nvrtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

clock_nvrtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

inlinePTX_nvrtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

matrixMul_nvrtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

quasirandomGenerator_nvrtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleAssert_nvrtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleAtomicIntrinsics_nvrtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleTemplates_nvrtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

BlackScholes_nvrtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

libNVVM

❌ 12.3 ❌ 12.4 ❌ 12.5 ❌ 12.6

PPU currently conforms to the LLVM IR definition for NVIDIA GPUs, but is not fully compatible with NVIDIA's official NVVM IR definition.

StreamPriorities

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

MC_EstimatePiInlineQ

❌

cuRAND APIs are not yet fully supported.

MC_EstimatePiP

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

MC_EstimatePiQ

❌

cuRAND APIs (curandCreateGenerator) are not yet fully supported.

MersenneTwisterGP11213

❌

cuRAND APIs are not yet fully supported.

batchCUBLAS

❌

cuBLAS APIs (cublasSetMatrix) are not yet fully supported.

batchedLabelMarkersAndLabelCompressionNPP

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

boxFilterNPP

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

conjugateGradientCudaGraphs

❌

The cuSPARSE library is not yet fully supported.

conjugateGradient

❌

conjugateGradientMultiDeviceCG

❌

conjugateGradientPrecond

❌

conjugateGradientUM

❌

cuSolverDn_LinearSolver

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

cuSolverRf

❌

The cuSPARSE library (cusolverSpCreate) is not yet fully supported.

cuSolverSp_LinearSolver

❌

cuSolverSp_LowlevelCholesky

❌

cuSolverSp_LowlevelQR

❌

graphMemoryFootprint

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

graphMemoryNodes

❌

The memory-allocate and memory-free APIs are not defined yet, because the header files have not reached that version.

immaTensorCoreGemm

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

jacobiCudaGraphs

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

matrixMulDynlinkJIT

❌

DynlinkJIT is not supported for now.

memMapIPCDrv

❌

The granularity obtained from cuMemGetAllocationGranularity is 2 MB, whereas PPU is designed for 8 MB; the check in the sample code needs to be modified.

nbody

❌

Graphics and OpenGL APIs are not supported.

Mandelbrot

❌

particles

❌

oceanFFT

❌

simpleCUDA2GL

❌

simpleGL

❌

recursiveGaussian

❌

ptxjit

❌

PPU does not support PTX.

randomFog

❌

Graphical display capability is missing.

simpleCUBLAS

❌

The cuBLAS API implementation is incomplete.

simpleCUBLASXT

❌

simpleCUBLAS_LU

✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleCUFFT

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

simpleCUFFT_2d_MGPU

❌

cuFFT is only partially supported.

simpleCUFFT_MGPU

❌

simpleCUFFT_callback

❌

systemWideAtomics

❌

FilterBorderControlNPP

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

watershedSegmentationNPP

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

streamOrderedAllocationP2P

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

EGLStream_CUDA_CrossGPU

❌

EGL APIs are not supported.

EGLStream_CUDA_Interop

❌

EGLStreams_CUDA_Interop

❌

EGLSync_CUDAEvent_Interop

❌

GLES is not supported.

cuDLALayerwiseStatsStandalone

❌

cuDLALayerwiseStatsHybrid

❌

simpleGLES_EGLOutput

❌

fluidsGLES

❌

nbody_opengles

❌

simpleGLES

❌

simpleGLES_screen

❌

nbody_screen

❌

cuDLAHybridMode

❌

cuDLAStandaloneMode

❌

cuDLAErrorReporting

❌

cudaNvSciNvMedia

❌

cdpAdvancedQuicksort

❌

The CUDA Dynamic Parallelism (CDP) feature is not supported.

cdpBezierTessellation

❌

cdpQuadtree

❌

cdpSimplePrint

❌

cdpSimpleQuicksort

❌

cudaNvSci

❌

libnvscibuf.so not found

NvSci is not supported for now.

dmmaTensorCoreGemm

❌

PPU tensor cores do not support Double MMA instructions.

dxtc

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

freeImageInteropNPP

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

histEqualizationNPP

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

cannyEdgeDetectorNPP

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

convolutionTexture

❌

PPU does not support texture or GL functionality.

bindlessTexture

❌

bicubicTexture

❌

HSOpticalFlow

❌

simpleLayeredTexture

❌

simplePitchLinearTexture

❌

simpleSurfaceWrite

❌

simpleTexture

❌

simpleTexture3D

❌

simpleTextureDrv

❌

simpleCubemapTexture

❌

volumeFiltering

❌

volumeRender

❌

vulkanImageCUDA

❌

stereoDisparity

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

boxFilter

❌

bilateralFilter

❌

postProcessGL

❌

imageDenoising

❌

fluidsGL

❌

smokeParticles

❌

lineOfSight

❌

marchingCubes

❌

convolutionFFT2D

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

dct8x8

❌

NV12toBGRandResize

✅ 11.1 ✅ 11.2 ✅ 11.3 ✅ 11.4 ✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

SobelFilter

❌

FunctionPointers

❌

simpleVulkan

❌

Vulkan-CUDA related features are not supported for now.

simpleVulkanMMAP

❌

simpleD3D10

❌

PPU does not support D3D graphics APIs.

fluidsD3D9

❌

simpleD3D11

❌

simpleD3D11Texture

❌

simpleD3D10RenderTarget

❌

simpleD3D10Texture

❌

SLID3D10Texture

❌

VFlockingD3D10

❌

simpleD3D12

❌

simpleD3D9Texture

❌

simpleD3D9

❌

cudaGraphsPerfScaling

❌ 12.5 ❌ 12.6

The cudaGraphUpload API is not yet supported.

simpleCallback

✅ 11.5 ✅ 11.6 ✅ 11.7 ✅ 11.8 ✅ 12.0 ✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6

LargeKernelParameter

✅ 12.1 ✅ 12.2 ✅ 12.3 ✅ 12.4 ✅ 12.5 ✅ 12.6
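
Three comments in the table describe numeric differences rather than missing features: the precision difference in matrixMulCUBLAS, the larger maximum grid y/z dimension behind the long simpleAttributes run time, and the 2 MB versus 8 MB allocation granularity in memMapIPCDrv. The Python sketch below only illustrates the arithmetic behind those three notes. It is not code from the samples or from the PPU SDK: the helper names `results_match`, `clamp_grid_dim`, and `round_up`, the tolerance value, and the reading of MB as 1024 * 1024 bytes are all assumptions made for illustration.

```python
import math

MB = 1024 * 1024  # assumption: "MB" in the table means 1024 * 1024 bytes

# Values quoted in the table comments.
NV_MAX_GRID_DIM_YZ = 65535        # simpleAttributes note, NVIDIA value
PPU_MAX_GRID_DIM_YZ = 2**31 - 1   # simpleAttributes note, PPU value
GRANULARITY_2MB = 2 * MB          # memMapIPCDrv note
GRANULARITY_8MB = 8 * MB          # memMapIPCDrv note, PPU design value


def results_match(reference, candidate, rel_tol=1e-5):
    """Hypothetical check: compare two result vectors within a tolerance.

    Useful when two implementations are both valid but not bit-identical.
    The tolerance is an arbitrary illustrative choice.
    """
    return len(reference) == len(candidate) and all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=rel_tol)
        for r, c in zip(reference, candidate)
    )


def clamp_grid_dim(reported, cap=NV_MAX_GRID_DIM_YZ):
    """Hypothetical guard: bound a loop driven by the reported grid limit."""
    return min(reported, cap)


def round_up(size, granularity):
    """Round size up to the next multiple of granularity."""
    return ((size + granularity - 1) // granularity) * granularity


if __name__ == "__main__":
    # How much larger the PPU limit is than the NVIDIA limit.
    print(PPU_MAX_GRID_DIM_YZ // NV_MAX_GRID_DIM_YZ)
    # The same 3 MB request rounds differently under each granularity.
    print(round_up(3 * MB, GRANULARITY_2MB) // MB)
    print(round_up(3 * MB, GRANULARITY_8MB) // MB)
```

Code that derives sizes from a queried granularity, instead of hard-coding 2 MB, would not need a separate check for the 8 MB case. Whether the samples themselves can be adapted this way is not stated in the table.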