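hgobjdump User Guide (v2.1)
1. Overview
A binary built for PPU (an executable or a shared library) contains both Host code that runs on the CPU and Device code that runs on the PPU. Accordingly, the compiled binary has two layers, and a PPU Binary is the combination of the two. A PPU Binary uses the standard ELF format and is format-compatible with 64-bit x86_64 ELF; the Host Binary gains one additional section, .hggc_fatbin, which stores the Device Binary. A standalone Device Binary also uses the ELF format. Most of its sections are used as in standard ELF, and it additionally carries hggc_info sections that hold information needed at run time: global information is stored in .hg_info, and the information for each kernel is stored in its own .hg_info._name_ section, where the _name_ suffix is the kernel's mangled name.
hgobjdump extracts information about Device code from a PPU Binary (either a standalone Device Binary or a complete PPU Binary) and presents it in readable form. Its output includes the assembly code of each function, the symbol table, relocations, and the hggc-specific sections.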
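As a rough illustration of the two-layer layout described above, the following Python sketch walks the section header table of an ELF64 image and checks whether a .hggc_fatbin section is present. It relies only on the standard ELF64 header layout. The function names elf64_section_names and has_device_fatbin are hypothetical helpers, not part of the PPU SDK, and the sketch assumes a little-endian file with an ordinary (non-extended) section count.

```python
import struct


def elf64_section_names(data: bytes) -> list[str]:
    """Return the section names of a little-endian ELF64 image (sketch)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[4] is EI_CLASS (2 = 64-bit), e_ident[5] is EI_DATA (1 = little-endian).
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")

    # ELF64 header: e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A..0x3F.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    if e_shnum == 0:
        return []

    def section_header(index: int) -> tuple[int, int, int]:
        # Elf64_Shdr: sh_name at 0x00, sh_offset at 0x18, sh_size at 0x20.
        base = e_shoff + index * e_shentsize
        (sh_name,) = struct.unpack_from("<I", data, base)
        sh_offset, sh_size = struct.unpack_from("<QQ", data, base + 0x18)
        return sh_name, sh_offset, sh_size

    # The section-name string table is itself a section, indexed by e_shstrndx.
    _, str_offset, str_size = section_header(e_shstrndx)
    strtab = data[str_offset:str_offset + str_size]

    names = []
    for index in range(e_shnum):
        name_offset, _, _ = section_header(index)
        end = strtab.index(b"\x00", name_offset)
        names.append(strtab[name_offset:end].decode("ascii", errors="replace"))
    return names


def has_device_fatbin(data: bytes) -> bool:
    """True if the image has the .hggc_fatbin section named in this guide."""
    return ".hggc_fatbin" in elf64_section_names(data)
```

Given the layout described above, calling has_device_fatbin on the bytes of a complete PPU Binary would be expected to return True, while a standalone Device Binary would instead show the .hg_info sections in the list returned by elf64_section_names.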
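2. Usage
2.1 Input File Types
An executable or shared library built with the PPU SDK.
A standalone Device Binary.
An hggc fatbin file (typically containing one Device Binary and the corresponding IR file).
2.2 Command-Line Options
The following table lists the command-line options that hgobjdump supports, with a description of what each one does. Every option has a long name and at least one short name, and the two forms can be used interchangeably.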
Option (long) | Option (short) | Description |
--list-elf | -lelf, -l | List all the ELF files and kernel functions available in the fatbin. |
--list-all | -lall | List all device functions available in the fatbin. Implies list elf. |
--list-bc | -lbc, -b | List all llvm-bc files available in the fatbin. |
--extract-elf= | -xelf, -x | Extract ELF file(s) using file idx and save as file(s). Use '0' to extract all files. To get the list of ELF files use -lelf option. |
--extract-bc= | -xbc | Extract llvm-bc file(s) using file idx and save as file(s). Use '0' to extract all bc files. To get the list of bc files use -lbc option. |
--dump-elf | -elf, -e | Dump ELF Object function sections. |
--dump-isa | -isa, -a | Dump assembly for a single hgbin file or all hgbin files embedded in the binary. |
--dump-function= | -func, -f | Dump a specific function (the name must be given in its mangled form). |
--line-numbers | -line, -n | Display source line numbers with disassembly. Implies disassemble isa. |
--dump-resource-usage= | -res-usage, -i | Dump resource usage for a specific function (the name must be given in its mangled form). Use 'all' to dump resource usage for all functions. |
--dump-elf-symbols= | -symbols | Dump ELF file symbol names using ELF file idx. Use '0' to dump all files. To get the list of ELF files use -lelf option. |
--demangle | -d | Demangle function names for --list-elf. |
--help | -h | Print this help information on this tool. |
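The options in the table can also be driven from a script. The following Python sketch only assembles argument lists from option spellings that appear in this guide and picks the "Func N: name" lines out of a listing; that line format is taken from the example output in section 2.3. The helper names build_command, parse_function_list and list_kernels are hypothetical, and the sketch assumes that hgobjdump is available on PATH.

```python
import re
import subprocess


def build_command(binary: str, *options: str, tool: str = "hgobjdump") -> list[str]:
    """Assemble an argument list such as ['hgobjdump', '-lelf', 'hggc.out']."""
    for option in options:
        if not option.startswith("-"):
            raise ValueError(f"option must start with '-': {option!r}")
    return [tool, *options, binary]


def parse_function_list(output: str) -> list[str]:
    """Collect the names from 'Func N: name' lines of a -lelf style listing."""
    pattern = re.compile(r"^\s*Func\s+\d+:\s*(.+?)\s*$")
    names = []
    for line in output.splitlines():
        match = pattern.match(line)
        if match:
            names.append(match.group(1))
    return names


def list_kernels(binary: str) -> list[str]:
    """Run 'hgobjdump -lelf <binary>' and return the listed function names."""
    completed = subprocess.run(
        build_command(binary, "-lelf"),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool reports failure
    )
    return parse_function_list(completed.stdout)
```

Keeping command construction and output parsing separate from the subprocess call means both can be checked without the tool installed; only list_kernels actually invokes hgobjdump.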
2.3 Usage Examples
The following examples show some of the options in use:
$ hgobjdump -lelf hggc.out
hggc.out: file format ELF64-x86-64
ELF FILE 1:
Func 1: _Z4MathPfS_

$ hgobjdump -lelf -demangle hggc.out
hggc.out: file format ELF64-x86-64
ELF FILE 1:
Func 1: Math(float*, float*)

$ hgobjdump -isa hggc.out
hggc.out: file format ELF64-x86-64
ELF FILE 1:
Disassembly of section .text:
0000000000000110 _Z4MathPfS_:
110: 00 00 00 00 00 00 f1 08 s.wait pipe_flush
118: 00 00 00 00 20 08 88 56 v.mov.alllane.b32 vreg32, 0x0
120: 00 00 00 00 60 48 08 50 v.mov.b32 vreg33, 0x0
128: ff 0f 00 00 20 09 00 26 s.and.b32 sreg0, sreg36, 0xfff
130: 00 00 00 00 00 09 c0 57 v.tid.init vreg0, sreg[36:37]
138: 00 00 00 13 00 00 40 2a s.mull.i32 sreg0, sreg0, sreg38
140: ff 0f 00 00 20 00 00 9a v.and.b32 vreg0, vreg0, 0xfff
148: 9d 98 bb 3b a0 80 00 50 v.mov.b32 vreg2, 0x3bbb989d
150: 00 00 00 00 00 00 40 82 v.add.i32 vreg0, sreg0, vreg0
158: 00 00 7c 43 e0 c0 00 50 v.mov.b32 vreg3, 0x437c0000
160: 3b aa b8 3f 20 01 01 50 v.mov.b32 vreg4, 0x3fb8aa3b
168: 60 70 a5 32 60 41 01 50 v.mov.b32 vreg5, 0x32a57060
170: 48 01 01 00 24 40 80 c8 vmem.ld.b32.sign vreg1, [0x0 + vreg0 * 0x4] @sreg[10:11]
178: 00 00 00 00 00 00 c4 08 s.wait vldcnt(0)
180: 00 80 3e 81 40 80 80 62 v.fma.f32.rtte vreg2, vreg1, vreg2, c0x3f000000
188: 00 00 00 00 a0 80 80 92 v.max.f32 vreg2, vreg2, 0x0
190: 00 00 80 3f a0 80 80 8a v.min.f32 vreg2, vreg2, 0x3f800000
198: 00 40 bf 81 88 80 80 62 v.fma.f32.rtn vreg2, vreg2, vreg3, c0x4b400001
1a0: 00 30 00 81 81 c0 80 8a v.min.f32 vreg3, !vreg2.reuse, !vreg2
1a8: 17 00 00 00 a0 80 80 a2 v.shll.b32 vreg2, vreg2, 0x17
1b0: 7f 00 40 4b e0 c0 80 82 v.add.f32 vreg3, vreg3, 0x4b40007f
1b8: 00 c0 40 82 41 c0 80 62 v.fma.f32.rtte vreg3, vreg1.reuse, vreg4, vreg3
1c0: 00 c0 c0 82 40 40 80 62 v.fma.f32.rtte vreg1, vreg1, vreg5, vreg3
1c8: 00 00 00 80 40 40 c0 58 v.exp2.f32 vreg1, vreg1
1d0: 00 00 00 81 40 40 80 aa v.mul.f32 vreg1, vreg1, vreg2
1d8: 08 01 01 00 24 40 80 cc vmem.st.b32.sign vreg1, [0x0 + vreg0 * 0x4] @sreg[8:9]
1e0: 00 00 00 00 00 00 40 08 s.exit
1e8: 00 00 00 00 00 00 00 08 s.nop

$ hgobjdump -xelf=1 libacompute.so
libacompute.so: file format ELF64-x86-64
ELF FILE 1:
Extract File: libacompute.so_ELF_File_1

$ hgobjdump -lelf libacompute.so_ELF_File_1
libacompute.so_ELF_File_1: file format ELF64-alippu
Func 1: _Z11mm_kernelTTIdddEvPKPT0_PKPKT_S8_iiiff
Func 2: _Z11mm_kernelNTIdddEvPKPT0_PKPKT_S8_iiiff
Func 3: _Z11mm_kernelNNIfffEvPKPT0_PKPKT_S8_iiiff
Func 4: _Z11mm_kernelTNIfffEvPKPT0_PKPKT_S8_iiiT1_S9_
Func 5: _Z11mm_kernelNTIfffEvPKPT0_PKPKT_S8_iiiff
Func 6: _Z11mm_kernelTNIdddEvPKPT0_PKPKT_S8_iiiT1_S9_
Func 7: _Z11mm_kernelNNIdddEvPKPT0_PKPKT_S8_iiiff
Func 8: _Z11mm_kernelTTIfffEvPKPT0_PKPKT_S8_iiiff

$ hgobjdump -func=_Z11mm_kernelTTIdddEvPKPT0_PKPKT_S8_iiiff libacompute.so_ELF_File_1
libacompute.so_ELF_File_1: file format ELF64-alippu
Disassembly of section .text:
00000000000036c8 _Z11mm_kernelTTIdddEvPKPT0_PKPKT_S8_iiiff:
36c8: 00 00 00 00 20 00 04 50 v.mov.b32 vreg16, 0x0
36d0: 00 00 00 00 20 80 04 50 v.mov.b32 vreg18, 0x0
36d8: 00 00 00 00 20 40 04 50 v.mov.b32 vreg17, 0x0
36e0: 00 00 00 00 20 40 01 50 v.mov.b32 vreg5, 0x0
36e8: 00 00 00 00 00 00 f1 08 s.wait pipe_flush
36f0: 00 00 00 00 20 00 88 56 v.mov.alllane.b32 vreg32, 0x0
36f8: 00 00 00 00 20 40 08 50 v.mov.b32 vreg33, 0x0
3700: 00 00 00 00 00 0a 01 50 v.mov.b32 vreg4, sreg40
3708: 00 00 00 00 20 00 01 1a s.mov.b64 sreg[4:5], 0x0
3710: 00 00 00 00 00 c9 c0 57 v.tid.init vreg3, sreg[36:37]
3718: 01 00 00 00 20 44 c8 2f s.cmp.lt.i32 sreg0, sreg16, 0x1
3720: ff 0f 00 00 e1 80 00 9a v.and.b32 vreg2, vreg3.reuse, 0xfff
3728: 08 01 02 00 20 01 c0 c8 vmem.ld.b32x2 vreg[0:1], [0x0 + vreg4 * 0x8] @sreg[8:9]
3730: 40 06 00 06 e0 c0 80 67 v.bfe.b32 vreg3, vreg3, 0xc, 0xc
3738: 05 00 00 00 a0 89 80 28 s.shll.b32 sreg2, sreg38, 0x5
3740: 05 00 00 00 e0 49 80 28 s.shll.b32 sreg1, sreg39, 0x5
3748: 00 00 00 00 00 01 03 50 v.mov.b32 vreg12, sreg4
......

$ hgobjdump -res-usage=_Z11mm_kernelTTIdddEvPKPT0_PKPKT_S8_iiiff libacompute.so_ELF_File_1
libacompute.so_ELF_File_1: file format ELF64-alippu
RESOURCE INFO:
SHADER API KERNEL CONTROL:
grid_dim_x_en:1
grid_dim_y_en:1
grid_dim_z_en:1
block_dim_en:1
block_idx_x_en:1
block_idx_y_en:1
block_idx_z_en:1
start_thread_idx_en:1
user_sreg_num:32
pri:0
fwd_progress:0
private_en:1
cu_disp_en:0
block_age_en:0
SHADER API KERNEL MODE:
fp_rndmode:0
i_rndmode:0
fp_denorm_flush:0
saturation:0
exception_en:0
relu:0
nan:0
vmem_ooo:0
saturation_fp64:0
trap_exception:0
debug_en:0
trap_en:0
perf_cnt_en:0
kp_modify_en:0
sw_defined_mode:0
SHADER API KERNEL RESOURCE:
vreg_number:34
sreg_number:48
shared_memory_size:130
treg_en:0
STACK SIZE:0
ARGUMENT:
ARG0 INDEX:hidden TYPE:uint64 KIND:hidden.gm.base sreg[0:1]
ARG1 INDEX:hidden TYPE:uint64 KIND:hidden.env.base sreg[2:3]
ARG2 INDEX:hidden TYPE:uint64 KIND:hidden.km.base sreg[4:5]
ARG3 INDEX:hidden TYPE:uint32 KIND:hidden.pm.size sreg6
ARG4 INDEX:hidden TYPE:uint32 KIND:hidden.tsm.size sreg7
ARG5 INDEX:0x0 TYPE:uint64 KIND:sreg.ptr sreg[8:9] ATTR:readonly
ARG6 INDEX:0x1 TYPE:uint64 KIND:sreg.ptr sreg[10:11] ATTR:readonly
ARG7 INDEX:0x2 TYPE:uint64 KIND:sreg.ptr sreg[12:13] ATTR:readonly
ARG8 INDEX:0x3 TYPE:uint32 KIND:sreg.value sreg14
ARG9 INDEX:0x4 TYPE:uint32 KIND:sreg.value sreg15
ARG10 INDEX:0x5 TYPE:uint32 KIND:sreg.value sreg16
ARG11 INDEX:0x6 TYPE:fp32 KIND:sreg.value sreg17
ARG12 INDEX:0x7 TYPE:fp32 KIND:sreg.value sreg18

$ hgobjdump -symbols=0 test_cuda-math.math.float_math_op1_expf
test_cuda-math.math.float_math_op1_expf: file format ELF64-x86-64
ELF FILE 1:
SYMBOL TABLE:
00000000000001c0 l O .data 00000030 _ZN13heapallocatorL9heapAllocE.13
00000000000001a8 l O .data 00000018 _ZN13heapallocatorL8heapPropE.12
0000000000001d38 g F .text 00000020 __ppumath_fma_rtp_f32
00000000000000a0 l O .data 00000010 _ZN13heapallocatorL8hashPropE
0000000000001860 l F .text 00000248 __ppumathpriv_rcp_default_f32.5
0000000000001cf8 g F .text 00000020 __builtin_alippu_rcpf
00000000000010f0 l F .text 00000770 __ppumathpriv_drcp_rte_f64.4
0000000000001aa8 g F .text 00000250 __ppumath_rcp_default_ftz_f32
0000000000000ff8 l F .text 000000f8 __ppumathpriv_div_default_f32.3
0000000000001d58 g F .text 00000020 __ppumath_fma_rte_f32
00000000000002c8 l F .text 000007c8 __ppumathpriv_ddiv_rte_f64.2
0000000000000a90 g F .text 00000568 __ppumathpriv_div_rte_slow_path_f32
00000000000000fa g O .data 0000009e PIBITS_TBL
00000000000024a0 g F .text 00000020 __ppumath_fma_rtz_f32
0000000000001d18 g F .text 00000020 __ppumath_fma_rtn_f32
00000000000024c0 g F .text 000001f0 __ppumath_drcp_f64
00000000000000f8 gw O .data 00000001 _ZNSt17integral_constantIbLb0EE5valueE
0000000000001d78 g F .text 00000728 __ppumath_div_default_ftz_f32
0000000000000000 l O .data 0000009e PIBITS_TBL.10
00000000000000b0 l O .data 00000018 _ZN13heapallocatorL8heapPropE
00000000000000f9 gw O .data 00000001 _ZNSt17integral_constantIbLb1EE5valueE
00000000000000c8 l O .data 00000030 _ZN13heapallocatorL9heapAllocE
0000000000000198 l O .data 00000010 _ZN13heapallocatorL8hashPropE.11
00000000000001f0 g F .text 000000d8 _ZN12_GLOBAL__N_14MathEPfS0_
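
A symbol listing like the one above can be post-processed with a few lines of script. The following Python sketch assumes the six whitespace-separated columns seen in this example (address, flags, type, section, size, name, with address and size in hexadecimal) and skips any line that does not fit that shape. The names Symbol and parse_symbol_table are hypothetical, and reading the flags column as objdump-style local/global/weak markers (l, g, gw) is an assumption based on the listing, not something this guide states.

```python
from typing import NamedTuple


class Symbol(NamedTuple):
    address: int   # first column, parsed as hexadecimal
    flags: str     # second column as printed, e.g. 'l', 'g', 'gw'
    kind: str      # third column as printed, e.g. 'F' or 'O'
    section: str   # e.g. '.text' or '.data'
    size: int      # fifth column, parsed as hexadecimal
    name: str      # symbol name as printed


def parse_symbol_table(text: str) -> list[Symbol]:
    """Parse the six-column symbol lines; ignore headers and other lines."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # e.g. 'SYMBOL TABLE:' or the 'file format' banner
        address, flags, kind, section, size, name = fields
        try:
            symbols.append(
                Symbol(int(address, 16), flags, kind, section, int(size, 16), name)
            )
        except ValueError:
            continue  # address or size was not hexadecimal
    return symbols
```

For instance, [s.name for s in parse_symbol_table(output) if s.kind == "F" and s.section == ".text"] would pick out the entries marked F in the .text section of the listing.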