GDB使用指南(v2.1)
1. 概述
1.1 介绍
PPU-GDB是运行在Linux操作系统上的调试器。PPU-GDB是GNU GDB的扩展,该工具为开发人员提供了一种用于调试运行在实际硬件上的PPU应用程序的方法。
1.2 特点
PPU-GDB为用户提供一个调试工具,允许在同一个应用程序中同时调试GPU和CPU代码,使用PPU-GDB进行调试是对使用GNU GDB进行调试的扩展。在提供GNU GDB用来调试host端代码的同时,还提供额外的扩展功能来支持调试PPU device端代码。
PPU-GDB支持调试C/C++ CUDA应用程序和C/C++ HGGC应用程序。
PPU-GDB允许用户为单步CUDA/HGGC应用程序设置断点,还允许用户检查和修改硬件上运行的任何给定线程的内存和变量。
1.3 关于文档
本文档是PPU-GDB的主要介绍文档。本文档的其余部分将描述如何安装和使用PPU-GDB来调试CUDA/HGGC应用程序,以及如何使用添加到GNU GDB中的新的PPU-GDB命令,还提供了一些应用示例。用于调试host端应用程序的基本GDB命令,请参考GNU GDB。
2. 发布说明
2.1 不支持功能
不支持__managed__变量打印。
不支持MPS和MIG多实例GPU共享调试。
不支持多机多卡调试。
不支持ppu-gdb server远程debugging。
不支持watchpoint断点。
说明:单机多卡需要指定DEVICE进行调试,见3.3.1。
2.2 新增特性
查询PPUFocus的info命令与切换PPUFocus的switch命令前缀从
alippu更新为ppu。某些新机器移除info/switch ce/cu/pu/wid操作PPUFoucs命令。
2.3 修复已知问题
更新coredump文件OSABI Version,解析不兼容旧的SDK生成的coredump文件。
3. 入门指南
3.1 安装使用
PPU-GDB已集成在PPU_SDK软件包中,完成PPU_SDK安装后,可执行ppu-gdb检查版本信息,检查PPU-GDB是否可用:
ppu-gdb -vGNU gdb (GDB) 12.0.50.20211207-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
3.2 编译程序
3.2.1 Debug编译
clang是PPU llvm编译器的前端驱动程序,它提供了一种生成PPU-GDB正常工作所必需的调试信息的方法。在编译应用程序时-g选项必须传递给clang,以便使用PPU-GDB进行调试;例如:
clang -x hggc foo.cu -g -o foo
clang加入
-g选项会编译host和device双端的调试信息,且device端代码为O0优化等级,host端优化等级取决于host编译器默认优化等级。clang 加入
-g -O1/O2/O3显示优化等级,device端调试信息只携带行号信息,host端携带完整调试信息。只编译host端调试信息,
-Xarch_host -g。只编译device端调试信息,
-Xarch_device -g。Debug with lineinfo编译:clang加入
-gline-tables-only,可以通过-Xarch_host -gline-tables-only选择只编译携带host端行号信息,device端使用-Xarch_device。nvcc编译:host添加
-g,device添加-G,device只显示行号添加-lineinfo。
3.3 使用PPU-GDB
3.3.1 指定GPU Debuging
通过使用CUDA_VISIBLE_DEVICES环境变量指定哪个GPU对应用程序可见,可以实现GPU的独占使用。
CUDA_VISIBLE_DEVICES=1 ppu-gdb my_app
3.3.2 Attaching/Detaching
应用程序运行之后,查找当前程序进程ID进行Attachppu-gdb -p ID。
指定device id进行Attach,device id请使用ppu-smi获取ppu-gdb -p ID -deviceid=id。
4. 程序执行
程序的执行使用run命令,在PPU-GDB中启动应用程序的方式与在GNU GDB中启动应用程序的方式相同。本章介绍如何中断和单步CUDA/HGGC应用程序
4.1 中断程序
如果CUDA/HGGC应用程序出现挂起或陷入无限循环状态,可以通过按CTRL+C手动中断应用程序,此特性仅限于在调试器中运行的应用程序,无法进入和调试已在调试器外部启动的应用程序。
4.2 单步执行
PPU-GDB支持单步device端代码。但是与host端代码单步不同,device端代码在warp层面上进行单步执行,这表示单步执行device端kernel时,单步执行的是warp内的所有active线程,处于inactive中的线程不进行单步执行。
单步执行命令和GNU GDB功能相同:next(n)/nexti(ni)/step(s)/stepi(si)
next:单步到程序源代码的下一行,不进入被调函数。

nexti:单步到下一条ISA,不进入被调函数。

step: 单步到源程序代码的下一行,会可以进入被调函数,会跳过被调函数的prologue,停在被调函数源码的第一行。

stepi: 单步到下一条ISA,会可以进入被调函数,不会跳过被调函数的prologue,停在被调函数ISA的第一行。

5. 断点
5.1 根据符号设置断点
要在函数的入口处设置断点,使用break命令在函数或方法的名称前加上break命令:
普通函数
(ppu-gdb) break my_function
(ppu-gdb) break my_class::my_method
模版函数
(ppu-gdb) addWithCuda5.2 根据行号设置断点
要在特定的行号上设置断点,请使用以下语法:
(ppu-gdb) break my_file.cu:185如果指定的行对应于模板函数中的一条指令,则将创建多个断点,每个断点对应模板函数中的一个实例。
5.3 根据地址设置断点
要在指定地址上设置断点,使用break命令,以地址为参数:
(ppu-gdb) break *0x1afe34d0该地址可以是可执行程序中host或device上的任何地址。
5.4 根据kernel Entry设置断点
要在每个kernel的第一条指令上设置断点,设置break_on_launch选项为application:
(ppu-gdb) set ppu break_on_launch application5.5 条件断点
条件断点使用可选的if关键字或cond命令。
(ppu-gdb) break foo.cu:23 if threadIdx.x == 1 && i < 5
(ppu-gdb) cond 3 threadIdx.x == 1 && i < 5
// 寄存器断点,s0表示sreg0,v0表示vreg0
(ppu-gdb) cond 3 if $s0 == 0条件表达式可以引用任何变量,包括internal变量,如threadIdx和blockIdx,不允许在条件表达式中调用函数。
6. 查看程序状态
6.1 内存和变量
PPU-GDB print命令可以解析以下内存上的变量。
host端hggcMalloc出来的变量。
驻留在各种GPU内存区域中的数据,如shared memory、local memory和global memory。
运行时变量,例如threadIdx,blockIdx,warpSize等。
6.2 内存和可见性
根据变量类型和使用情况,变量可以存储在寄存器中,也可以存储在local、TSM、global中。可以打印任何变量的地址找出它的存储位置,并直接访问相关的内存。
打印地址语法:print &变量名,示例:
(ppu-gdb) p &item
$3 = (int *) private#0x24
打印值语法: _(type _)address_space#0x24,address_space可以是generic,tsm,private,默认gloabl不显示,示例:
(ppu-gdb) p *(int *) private#0x24
$3 = 0
(ppu-gdb) p *(int *) generic#0x24
$3 = 0
(ppu-gdb) p *(int *) tsm#0x0
$3 = 0
打印local数组示例: int local_memory[3]
(ppu-gdb) p local_memory
$7 = {10, 5, 15}
(ppu-gdb) p local_memory[0]
$8 = 10
(ppu-gdb) p local_memory[1]
$9 = 5
(ppu-gdb) p local_memory[2]
$10 = 156.3 查看当前状态
这些命令用于显示应用程序CUDA/HGGC状态的信息。info focus命令包括:
(ppu-gdb) help info ppu
Print informations about the current PPU activities. Available options:
devices : information about all the device
cus : information about all the cus in the current dispatch
warps : information about all the warps in the current dispatch
lanes : information about all the lanes in the current warp
kernels : information about all the active kernels
blocks : information about all the active blocks in the current kernel
threads : information about all the active threads in the current kernel
device-mem : information about tsm in the current thread block, e.g. info ppu device-mem --read [addr], info ppu device-mem --write [addr] val
dump : execute dump operation
restore : execute restore operationinfo ppu devices,查看所有的device信息,默认输出所有device,当set CUDA_VISIBLE_DEVICES时输出指定device。
(ppu-gdb) info ppu devices
Device_id Name Lanes/Warp Max_Regs/Lane Active_CEs
● 0 PPU#0 32 256 0x2
1 PPU#1 32 256 0x0info ppu kernels,查看正在运行的kernels。
(ppu-gdb) info ppu kernels
KernelIdx DeviceIdx Status GridDim BlockDim Invocation
● 0 0 active (1,1,1) (64,1,1) vectorAdd(float const, float const, float*, int)
1 0 active (1,1,1) (64,1,1) vectorMul(float const, float const, float*, int)info ppu cus,查看当前cu信息。
info ppu cus
CE CU Active Warps Mask
● 3 0 0x0000000000000001info ppu blocks,查看blocks,包含block的范围、数量与状态。
(ppu-gdb) info ppu blocks
Kernel BlockIdx To BlockIdx Count State
● 0 (0,0,0) to (0,0,0) 1 runninginfo ppu warps查看warps,包含所有的warp和每个warp的状态。
(ppu-gdb) info ppu warps
WarpId ActiveLaneMsk DivergentLaneMsk ActivePC Kernel ce cu BlockIdx FirstActiveThreadIdx
● 0 0x1 0x0 0xac800018 1 10 0 (0,0,0) (0,0,0)info ppu threads,查看threads,包含所有的threads。
(ppu-gdb) info ppu threads
Kernel BlockIdx ThreadIdx To BlockIdx ThreadIdx Count PC Filename Line
● 1 (0,0,0) (0,0,0) to (0,0,0) (0,0,0) 1 0xac800018 ../kernels/info/add.cu 7 info ppu lanes,查看lanes,包含warp内所有lane的状态。
(ppu-gdb) info ppu lanes
LaneId State EMSK PMSK YMSK WMSK QMSK PC ThreadIdx Exception
● 0 active 0x1 0x0 0x0 0x0 0x0 0xac800018 (0,0,0) None
1 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
2 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
3 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
4 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
5 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
6 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
7 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
8 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
9 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
10 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
11 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
12 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
13 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
14 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
15 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
16 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
17 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
18 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
19 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
20 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
21 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
22 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
23 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
24 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
25 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
26 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
27 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
28 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
29 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
30 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None
31 inactive 0x0 0x0 0x0 0x0 0x0 n/a n/a None6.4 切换状态
切换状态可以通过两种途径实现,第一种是通过原生gdb thread命令实现,另一种通过PPU扩展的switch focus命令实现。
6.4.1 扩展命令切换
switch focus包含以下命令:
(ppu-gdb) help ppu
Print or select the PPU focus.
List of ppu subcommands:
ppu all -- Print the current PPU all msg .
ppu block -- Print or select the current PPU block.
ppu device -- Print or select the current PPU device.
ppu grid -- Print or select the current PPU grid.
ppu kernel -- Print or select the current PPU kernel.
ppu lane -- Print or select the current PPU lane.
ppu thread -- Print or select the current PPU thread.
ppu warp -- Print or select the current PPU warp.
Type "help ppu" followed by ppu subcommand name for full documentation.
Type "apropos word" to search for commands related to "word".
Type "apropos -v word" for full documentation of commands related to "word".
Command name abbreviations are allowed if unambiguous.switch focus一系列命令有两种功能,一种不带参数时退化为查看当前状态命令,带参数时进行切换状态。
不带参数示例:
(ppu-gdb) ppu all
kernel 0, grid 0, block (0,0,0), thread (32,0,0), device 0, warp 1, lane 0
(ppu-gdb) ppu block
block (0,0,0)
(ppu-gdb) ppu device
device 0
(ppu-gdb) ppu grid
grid 0
(ppu-gdb) ppu kernel
kernel 0
(ppu-gdb) ppu lane
lane 0
(ppu-gdb) ppu thread
thread (32,0,0)
(ppu-gdb) ppu warp
warp 1带参数示例:
切换kernel:
(ppu-gdb) info ppu kernels
KernelIdx DeviceIdx Status GridDim BlockDim Invocation
● 0 0 active (1,1,1) (64,1,1) vectorAdd(float const, float const, float*, int)
1 0 active (1,1,1) (64,1,1) vectorMul(float const, float const, float*, int)
(ppu-gdb) ppu kernel 1
[Switching to thread 6, lane 0 (PPU focus kernel 1, grid 1, block(0,0,0), thread(0,0,0), device 0,warp 0, lane 0)]
#0 vectorMul (dA=0x72fc0000, dB=0x72fc1000, dC=0x72fc3000, numElements=64)
at ../kernels/info/multi_warps_and_multi_kernels.cu:52
52 int i = blockDim.x * blockIdx.x + threadIdx.x;
(ppu-gdb) info ppu kernels
KernelIdx DeviceIdx Status GridDim BlockDim Invocation
0 0 active (1,1,1) (64,1,1) vectorAdd(float const, float const, float*, int)
● 1 0 active (1,1,1) (64,1,1) vectorMul(float const, float const, float*, int)
(ppu-gdb) ppu kernel
kernel 1切换block,block支持单参数block x,以及三元参数block(x,y,z)。
(ppu-gdb) info ppu blocks
Kernel BlockIdx To BlockIdx Count State
● 0 (0,0,0) to (1,0,0) 2 running
(ppu-gdb) ppu block
block (1,0,0)
(ppu-gdb) ppu block 0
[Switching to thread 5, lane 0 (PPU focus kernel 0, grid 0, block(0,0,0), thread(0,0,0), device 0, warp 0, lane 0)]
#0 matMultiply (a=0x72fc0000, b=0x72fc1000, c=0x72fc2000, width=4, height=4)
at /mnt/ssd/huangzhufeng.hzf/0309/GdbTest/kernels/cuda/matmul.cu:6
6 int tx = threadIdx.x + blockIdx.x * blockDim.x;
(ppu-gdb) ppu block (1,0,0)
[Switching to thread 4, lane 0 (PPU focus kernel 0, grid 0, block(1,0,0), thread(0,0,0), device 0, warp 0, lane 0)]
#0 matMultiply (a=0x72fc0000, b=0x72fc1000, c=0x72fc2000, width=4, height=4)
at /mnt/ssd/huangzhufeng.hzf/0309/GdbTest/kernels/cuda/matmul.cu:6
6 int tx = threadIdx.x + blockIdx.x * blockDim.x;
(ppu-gdb) ppu block (0,0,0)
[Switching to thread 5, lane 0 (PPU focus kernel 0, grid 0, block(0,0,0), thread(0,0,0), device 0, warp 0, lane 0)]
#0 matMultiply (a=0x72fc0000, b=0x72fc1000, c=0x72fc2000, width=4, height=4)
at /mnt/ssd/huangzhufeng.hzf/0309/GdbTest/kernels/cuda/matmul.cu:6
6 int tx = threadIdx.x + blockIdx.x * blockDim.x;
(ppu-gdb) ppu block
block (0,0,0)切换thread,thread支持单参数thread x,以及三元参数thread(x,y,z)。
(ppu-gdb) info ppu threads
Kernel BlockIdx ThreadIdx To BlockIdx ThreadIdx Count PC Filename Line
● 0 (0,0,0) (0,0,0) to (1,0,0) (1,3,0) 16 0x77000160
/mnt/ssd/huangzhufeng.hzf/0309/GdbTest/kernels/cuda/matmul.cu 6
(ppu-gdb) ppu thread
thread (0,0,0)
(ppu-gdb) ppu thread (1,0,0)
[Switching to thread 5, lane 1 (PPU focus kernel 0, grid 0, block(0,0,0), thread(1,0,0), device 0, warp 0, lane 1)]
#0 matMultiply (a=0x72fc0000, b=0x72fc1000, c=0x72fc2000, width=4, height=4)
at /mnt/ssd/huangzhufeng.hzf/0309/GdbTest/kernels/cuda/matmul.cu:6
6 int tx = threadIdx.x + blockIdx.x * blockDim.x;
(ppu-gdb) ppu thread 0
[Switching to thread 5, lane 0 (PPU focus kernel 0, grid 0, block(0,0,0), thread(0,0,0), device 0, warp 0, lane 0)]
#0 matMultiply (a=0x72fc0000, b=0x72fc1000, c=0x72fc2000, width=4, height=4)
at /mnt/ssd/huangzhufeng.hzf/0309/GdbTest/kernels/cuda/matmul.cu:6
6 int tx = threadIdx.x + blockIdx.x * blockDim.x;切换warp。
(ppu-gdb) info ppu warps
WarpId ActiveLaneMsk DivergentLaneMsk ActivePC Kernel ce cu BlockIdx FirstActiveThreadIdx
● 0 0xffffffff 0x0 0x770003b8 0 10 0 (0,0,0) (0,0,0)
1 0xffffffff 0x0 0x770003b8 0 10 0 (0,0,0) (32,0,0)
(ppu-gdb) ppu warp 1
[Switching to thread 5, lane 0 (PPU focus kernel 0, grid 0, block(0,0,0), thread(32,0,0), device 0, warp 1, lane 0)]
#0 vectorAdd (dA=0x72fc0000, dB=0x72fc1000, dC=0x72fc2000, numElements=64)
at ../kernels/info/multi_warps_and_multi_kernels.cu:40
40 int i = blockDim.x * blockIdx.x + threadIdx.x;
(ppu-gdb) info ppu warps
WarpId ActiveLaneMsk DivergentLaneMsk ActivePC Kernel ce cu BlockIdx FirstActiveThreadIdx
0 0xffffffff 0x0 0x770003b8 0 10 0 (0,0,0) (0,0,0)
● 1 0xffffffff 0x0 0x770003b8 0 10 0 (0,0,0) (32,0,0)切换lane。
(ppu-gdb) info ppu lanes
LaneId State PC ThreadIdx Exception
● 0 active 0x77000160 (0,0,0) None
1 active 0x77000160 (1,0,0) None
2 active 0x77000160 (0,1,0) None
3 active 0x77000160 (1,1,0) None
4 active 0x77000160 (0,2,0) None
5 active 0x77000160 (1,2,0) None
6 active 0x77000160 (0,3,0) None
7 active 0x77000160 (1,3,0) None
8 inactive n/a n/a None
9 inactive n/a n/a None
10 inactive n/a n/a None
11 inactive n/a n/a None
12 inactive n/a n/a None
13 inactive n/a n/a None
14 inactive n/a n/a None
15 inactive n/a n/a None
16 inactive n/a n/a None
17 inactive n/a n/a None
18 inactive n/a n/a None
19 inactive n/a n/a None
20 inactive n/a n/a None
21 inactive n/a n/a None
22 inactive n/a n/a None
23 inactive n/a n/a None
24 inactive n/a n/a None
25 inactive n/a n/a None
26 inactive n/a n/a None
27 inactive n/a n/a None
28 inactive n/a n/a None
29 inactive n/a n/a None
30 inactive n/a n/a None
31 inactive n/a n/a None
(ppu-gdb) info lane 7
Id State Target Id Frame
7 A PPU focus kernel 0, grid 0, block(0,0,0), thread(1,3,0), device 0, warp 0, lane 7 matMultiply (a=0x72fc0000, b=0x72fc1000, c=0x72fc2000, width=4, height=4)
at /mnt/ssd/huangzhufeng.hzf/0309/GdbTest/kernels/cuda/matmul.cu:6
(ppu-gdb) info lane 8
Id State Target Id Frame
8 I PPU focus kernel 0, grid 0, block(0,0,0), thread(0,0,0), device 0, warp 0, lane 8 (inactive)
(ppu-gdb) info lane 0
Id State Target Id Frame
● 0 A PPU focus kernel 0, grid 0, block(0,0,0), thread(0,0,0), device 0, warp 0, lane 0 matMultiply (a=0x72fc0000, b=0x72fc1000, c=0x72fc2000, width=4, height=4)
at /mnt/ssd/huangzhufeng.hzf/0309/GdbTest/kernels/cuda/matmul.cu:6
(ppu-gdb) info ppu lanes
LaneId State PC ThreadIdx Exception
○ 0 active 0x77000160 (0,0,0) None
1 active 0x77000160 (1,0,0) None
2 active 0x77000160 (0,1,0) None
3 active 0x77000160 (1,1,0) None
4 active 0x77000160 (0,2,0) None
5 active 0x77000160 (1,2,0) None
6 active 0x77000160 (0,3,0) None
7 active 0x77000160 (1,3,0) None
8 inactive n/a n/a None
9 inactive n/a n/a None
10 inactive n/a n/a None
11 inactive n/a n/a None
12 inactive n/a n/a None
13 inactive n/a n/a None
14 inactive n/a n/a None
15 inactive n/a n/a None
16 inactive n/a n/a None
17 inactive n/a n/a None
18 inactive n/a n/a None
19 inactive n/a n/a None
20 inactive n/a n/a None
21 inactive n/a n/a None
22 inactive n/a n/a None
23 inactive n/a n/a None
24 inactive n/a n/a None
25 inactive n/a n/a None
26 inactive n/a n/a None
27 inactive n/a n/a None
28 inactive n/a n/a None
29 inactive n/a n/a None
30 inactive n/a n/a None
31 inactive n/a n/a None原生GDB切换。
切换warp。
首先info threads查找相应的warp,以PPU Thread开头的为GPU进程,PPU Thread 0.1,表示grid为0,1表示LWP 95585中thread id为1,是gdb内部ptid的编号。(0,0,0)/1表示当前block坐标,1表示warp 1。因此切换warp选择相应的block内的warp进行切换即可,例如需要切换到(0,0,0)/1的warp,执行命令thread 5 即可。
ppu-gdb) info threads
Id Target Id Frame
1 Thread 0x7ffff7fdbb80 (LWP 95444) "mwmk_g" hggc::_HGdevice::waitForFence (this=,
pFence=, flags=2, timeout=0) at ../driver/src/core/hggc_device.cpp:274
2 Thread 0x7ffff4d79700 (LWP 95584) "mwmk_g" 0x00007ffff73f2ad3 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/x86_64-linux-gnu/libpthread.so.0
3 Thread 0x7ffff4578700 (LWP 95585) "mwmk_g" 0x00007ffff73f2ad3 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/x86_64-linux-gnu/libpthread.so.0
● 4 PPU Thread 0.1 (0,0,0)/0 "mwmk_g" vectorAdd (dA=0x72fc0000, dB=0x72fc1000, dC=0x72fc2000, numElements=64)
at ../kernels/info/multi_warps_and_multi_kernels.cu:40
5 PPU Thread 0.2 (0,0,0)/1 "mwmk_g" vectorAdd (dA=0x72fc0000, dB=0x72fc1000, dC=0x72fc2000, numElements=64)
at ../kernels/info/multi_warps_and_multi_kernels.cu:40
(ppu-gdb) thread 5
[Switching to thread 5, lane 0 (PPU focus kernel 0, grid 0, block(0,0,0), thread(32,0,0), device 0, warp 1, lane 0)]
#0 vectorAdd (dA=0x72fc0000, dB=0x72fc1000, dC=0x72fc2000, numElements=64)
at ../kernels/info/multi_warps_and_multi_kernels.cu:40
40 int i = blockDim.x * blockIdx.x + threadIdx.x;切换lane,见6.4.1 swtich lane。
6.5 反汇编
设备ISA代码可以使用标准的GDB命令(例如x/i和display/i)进行显示。
(ppu-gdb) display /i $pc
1: x/i $pc
=> 0x6a000350 <_Z10vector_addPKiS0_Pii+768>: v.shrl.b32 vreg4, vreg33, 0x3
(ppu-gdb) x/4i $pc-32
0x6a000330 <_Z10vector_addPKiS0_Pii+736>: v.mov.b32 vreg7, 0x8
0x6a000338 <_Z10vector_addPKiS0_Pii+744>: v.madw.i64.i32 vreg[4:5], c0x1, vreg7, vreg[4:5]
0x6a000340 <_Z10vector_addPKiS0_Pii+752>: vmem.st.b32.ga.kp3 vreg6, [vreg5, vreg4]
@sreg[44:45]
0x6a000348 <_Z10vector_addPKiS0_Pii+760>: vmem.ld.b32.ga.kp3 vreg10, [vreg5, vreg4]
@sreg[44:45]
(ppu-gdb) disas $pc,+16
Dump of assembler code from 0x6a000350 to 0x6a000360:
=> 0x000000006a000350 <_Z10vector_addPKiS0_Pii+768>: v.shrl.b32 vreg4, vreg33, 0x30x
000000006a000358 <_Z10vector_addPKiS0_Pii+776>: v.add.i32 vreg4, vreg4, 0x186.6 寄存器
可以使用标准的GDB命令(如信息寄存器)检查/修改设备寄存器。sreg使用前缀s,vreg使用前缀v,特殊寄存器使用info all-registers可以看到所有寄存器内容,包含特殊寄存器名称。
(ppu-gdb) info registers pc s0 s1 v1
pc 0x6a000350 0x6a000350 <vector_add(int const, int const, int*, int)+768>
s0 0x60000 393216
s1 0x0 0
v1 0x0 0ppu-gdb原生读取vreg的命令结果显示是vreg在当前lane的值,如果想要查看整个vreg的值。
(ppu-gdb) ppu register v0
register v0 addr : 0x49560578
(ppu-gdb) x/32x 0x49560578
0x49560578: 0x8d1971a0 0xfe85f1a0 0x73f291a0 0x73fc29a0
0x49560588: 0x67a2e9a0 0x086839a0 0x08617da0 0xa1336da0
0x49560598: 0xc94e5da0 0xf6b01da0 0xb3f68da0 0xd74ac5a0
0x495605a8: 0x8ec295a0 0xd6d3f5a0 0x4afcd1a0 0xda5245a0
0x495605b8: 0xc174dda0 0x81b9a5a0 0xa6b8a5a0 0x71ed35a0
0x495605c8: 0x82c439a0 0x135611a0 0x0bf297a0 0x569f0aa0
0x495605d8: 0x77be7aa0 0xb5aff6a0 0x879c36a0 0x32caaea0
0x495605e8: 0x63671920 0x68510920 0xcf94a920 0x944c0120
(ppu-gdb):::
ppu-gdb查看shared memory base以及使用的shared memory大小,单位为byte。
(ppu-gdb) info registers TSM_BASE TSM_SIZE
TSM_BASE 0x3c280 246400
TSM_SIZE 0x0 07. 错误检查(exception)
ppu-gdb遵循gnu-gdb对signal处理的原则,正常执行发生exception时,ppu-gdb会停下来并支持查看程序出错现场,之后程序不能正常继续执行。exception出现的精确性描述:最快上报中断的warp被ppu-gdb捕获并报告给用户,出现exception的位置要小于等于停下来的位置PC。
exception分类 |
PPU_EXCEPTION_0: "Invalid instruction" |
PPU_EXCEPTION_1: "Invalid SIMT candicate mask" |
PPU_EXCEPTION_2: "Invalid barrier parameter" |
PPU_EXCEPTION_3: "Invalid warp sync" |
PPU_EXCEPTION_4: "Hardware hang" |
PPU_EXCEPTION_5: "Invalid vreg" |
PPU_EXCEPTION_6: "Invalid sreg" |
PPU_EXCEPTION_7: "Invalid vreg alignment" |
PPU_EXCEPTION_8: "Invalid sreg alignment" |
PPU_EXCEPTION_9: "TSM out of range" |
PPU_EXCEPTION_10: "Invalid TSM access alignment" |
PPU_EXCEPTION_11: "Global mem out of range" |
PPU_EXCEPTION_12: "Invalid global mem access alignment" |
PPU_EXCEPTION_13: "Invalid atomic op on system mem" |
PPU_EXCEPTION_14: "AIU_ld TSM size out of range" |
PPU_EXCEPTION_15: "AIU_ld cube out of range" |
PPU_EXCEPTION_16: "KI out of range" |
PPU_EXCEPTION_17: "Invalid PA (including c2c request)" |
PPU_EXCEPTION_18: "Invalid PA on page-walk" |
PPU_EXCEPTION_19: "Invalid page" |
PPU_EXCEPTION_20: "Read permission violation" |
PPU_EXCEPTION_21: "Write permission violaton" |
PPU_EXCEPTION_22: "Exec permission violation" |
PPU_EXCEPTION_23: "Invalid VA" |
PPU_EXCEPTION_24: "KI invalid PA" |
PPU_EXCEPTION_25: "KI invalid PA on page-walk" |
PPU_EXCEPTION_26: "KI invalid page" |
PPU_EXCEPTION_27: "KI read permission violation" |
PPU_EXCEPTION_28: "KI exec permission violation" |
PPU_EXCEPTION_29: "KI invalid VA" |
PPU_EXCEPTION_30: "Hbm ECC error on page-walk" |
PPU_EXCEPTION_31: "KI hbm ECC error on page-walk" |
PPU_EXCEPTION_32: "Hbm ECC error on data" |
PPU_EXCEPTION_33: "KI hbm ECC error on data" |
PPU_EXCEPTION_34: "Prefetch_error" |
PPU_EXCEPTION_35: "Invalid RLSU ld/st address" |
示例:
bash$ ppu-gdb exp_g -q
Reading symbols from exp_g...
(ppu-gdb) r
Starting program: /mnt/ssd/huangzhufeng.hzf/0309/GdbTest/build/exp_g
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff4d79700 (LWP 101571)]
[New Thread 0x7ffff4578700 (LWP 101572)]
[New PPU Thread 0.1 (0,0,0)/0]
Thread 4 "exp_g" received signal PPU_EXCEPTION_12, Invalid global mem access alignment.
[Switching to thread 4, lane 0 (PPU focus kernel 0, grid 0, block(0,0,0), thread(0,0,0), device 0, warp 0, lane 0)]
0x0000000077000188 in Decrease (a=0x72fc1000, b=0x72fc2000, c=0x72fc0000) at ../kernels/error_checking/Invalid_global_mem_access_alignment.cu:11
11 c = a - b+global_memory;8. Deivce Coredump
只支持device端coredump生成,有host端coredump需求的请参考标准host coredump生成方式。
coredump提供的环境变量有:
export ALIPPU_ENABLE_COREDUMP_ON_EXCEPTION=1,dump发生exception时的所有资源使用情况,包括各个memory使用,register使用,以及当前kernel对应的ELF文件。
export ALIPPU_ENABLE_COREDUMP_ON_EXCEPTION=1,export ALIPPU_ENABLE_LIGHTWEIGHT_COREDUMP=1,dump发生exception时的部分资源使用情况,包括shared memory使用,register使用,以及当前kernel对应的ELF文件。
export ALIPPU_COREDUMP_FILE=path/%p.%h.%t,指定发生exception的路径,默认当前路径下生成%p.%h.%t文件。
%p: 程序对应的进程PID
%h: 程序对应的名称
%t: 时间戳
示例:
bash$ export ALIPPU_ENABLE_COREDUMP_ON_EXCEPTION=1
// run application
bash$ ./my_application_core
[umd error]: coredump succeeded, file was written to hggc.core.17414.In_g.1700805817! (./core/coredumpelf.cpp:570:CreateCoreFile)
// 默认当前目录生成core文件,hggc.core.17414.In_g.1700805817
// 读取core文件
bash$ ppu-gdb
PTG (R) PPU Debugger
release unknown, build version: 12.0.50-20231122--git
Portions Copyright (C) 2023-2023 PTG Corporation
GNU gdb (GDB) 12.0.50.20211207-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(ppu-gdb) target ppucore hggc.core.17414.In_g.1700805817
Opening GPU coredump: hggc.core.17414.In_g.1700805817
linux_driver_t::linux_driver_t: m_os_pid == 0
xargs: Cannot open input file ‘/proc/0/environ’: No such file or directory
warning: PPU-GDB requires target-async, GPU debugging is disabled
PPU_EXCEPTION_12, Invalid global mem access alignment
#0 0x000000005a0000b8 in Decrease (a=0x59d41000, b=0x59d42000, c=0x59d40000)
at ../kernels/error_checking/Invalid_global_mem_access_alignment.cu:14
14 *c = *a - *b+*global_memory;
(ppu-gdb) 9. Deivce Autostep
ppu-gdb 语法
autostep [LOCATION]
autostep [LOCATION] for LENGTH [lines|instructions]LOCATION可以是用来指定断点位置,例如行号、函数名或前面带星号的指令地址。如果没有指定LOCATION,则使用当前指令地址。
LENGTH指定自动步进区间的行数或指令数(行和指令可以缩写,例如,l或i)。如果未指定长度类型,则行为默认值。如果省略for子句,则默认为1行。
astep可以用作autostep命令的别名。
在自动步进过程中对函数的调用将被跳过。
在出现分支的情况下,自动步进区间的长度由每个warp中第一个active lane的行数或指令数决定。
如果在自动步进区间内出现断点,则在程序resume时,命中断点的warp将不会继续自动步进。
不支持重叠的autostep断点。
ppu-gdb命令
info autosteps列出所有的autostep断点和普通断点。类似info breakpoints.
disable autosteps, 等价disable breakpoints n.
delete autosteps n,等价delete breakpoints n
