hgobjdump User Guide (v2.1)

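1. Overview

A Binary compiled for PPU (an executable or a dynamic link library) contains both Host code, which runs on the CPU, and Device code, which runs on the PPU. Accordingly, the compiled Binary has two layers, and a PPU Binary is the combination of both. A PPU Binary uses the generic ELF format and is compatible with 64-bit x86_64 ELF; the Host Binary gains one additional section, .hggc_fatbin, which stores the Device Binary. A standalone Device Binary also uses the ELF format. Most of its sections are used the same way as in standard ELF, and it additionally carries several hggc_info sections that hold information needed at run time: global information is stored in .hg_info, and the information for each kernel is stored in that kernel's own .hg_info._name_ section, where the _name_ suffix is the kernel's mangled name.

hgobjdump extracts information about Device code from a PPU Binary (either a standalone Device Binary or a complete PPU Binary) and presents it in a readable form. Its output includes the assembly code of each function, the symbol table, relocations, and a number of hggc-specific sections.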
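
As a rough illustration of the layout described above, the following Python sketch reads the section header table of a little-endian ELF64 file and reports whether a .hggc_fatbin section is present and which .hg_info._name_ sections exist. It relies only on the standard ELF64 layout. The helper names (section_names, has_fatbin, kernel_info_names) are invented for this example and are not part of the PPU SDK, and extended section numbering is not handled.

```python
import struct

# Standard ELF64 little-endian layouts (file header and section header).
ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")


def section_names(data: bytes) -> list:
    """Return the names of all sections in a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    header = ELF_HEADER.unpack_from(data, 0)
    shoff, shentsize, shnum, shstrndx = header[6], header[11], header[12], header[13]
    sections = [SECTION_HEADER.unpack_from(data, shoff + i * shentsize)
                for i in range(shnum)]
    # The section-name string table is itself a section (index e_shstrndx).
    str_off, str_size = sections[shstrndx][4], sections[shstrndx][5]
    strtab = data[str_off:str_off + str_size]
    names = []
    for sec in sections:
        start = sec[0]                      # sh_name: offset into the string table
        end = strtab.index(b"\0", start)    # names are NUL-terminated
        names.append(strtab[start:end].decode())
    return names


def has_fatbin(names) -> bool:
    """True if the Host Binary section that stores the Device Binary is present."""
    return ".hggc_fatbin" in names


def kernel_info_names(names) -> list:
    """Mangled kernel names taken from the .hg_info._name_ section names."""
    prefix = ".hg_info."
    return [n[len(prefix):] for n in names if n.startswith(prefix)]
```

Reading a file with open(path, "rb").read() and passing the bytes to section_names gives the list that the other two helpers consume.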

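2. Usage

2.1 Input file types

  • An executable or dynamic link library built with the PPU SDK.

  • A standalone Device Binary.

  • An hggc fatbin file (which normally contains one Device Binary and the corresponding IR file).

2.2 Command-line options

The table below lists the command-line options that hgobjdump supports, with a description of what each one does. Every option has a long name and at least one short name, and the two forms are interchangeable.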

Option (long)             Option (short)    Description
--list-elf                -lelf, -l         List all the ELF files and kernel functions available in the fatbin.
--list-all                -lall             List all device functions available in the fatbin. Implies --list-elf.
--list-bc                 -lbc, -b          List all device functions available in the fatbin.
--extract-elf=            -xelf, -x         Extract ELF file(s) by file index and save them as file(s). Use '0' to extract all files. Use -lelf to get the list of ELF files.
--extract-bc=             -xbc              Extract llvm-bc file(s) by file index and save them as file(s). Use '0' to extract all bc files. Use -lbc to get the list of bc files.
--dump-elf                -elf, -e          Dump ELF object function sections.
--dump-isa                -isa, -a          Dump assembly for a single hgbin file or for all hgbin files embedded in the binary.
--dump-function=          -func, -f         Dump a specific function (use the mangled, i.e. non-demangled, name, as in the examples in 2.3).
--line-numbers            -line, -n         Display source line numbers with the disassembly. Implies --dump-isa.
--dump-resource-usage=    -res-usage, -i    Dump the resource usage of a specific function (use the mangled, i.e. non-demangled, name). Use 'all' to dump the resource usage of all functions.
--dump-elf-symbols=       -symbols, -s      Dump the symbol names of ELF files by ELF file index. Use '0' to dump all files. Use -lelf to get the list of ELF files.
--demangle                -d                Demangle function names for --list-elf.
--help                    -h                Print help information for this tool.
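
When hgobjdump is driven from a script, the options above can be assembled programmatically. The Python sketch below is one possible way to do that, under stated assumptions: the long option names are copied from the table, value options are passed as --name=value (as the trailing "=" in the table suggests), and hgobjdump is on PATH. The helper names build_argv and run_hgobjdump are invented for this example.

```python
import subprocess

# Long option names taken from the table above.
FLAG_OPTIONS = {"--list-elf", "--list-all", "--list-bc", "--dump-elf",
                "--dump-isa", "--line-numbers", "--demangle", "--help"}
VALUE_OPTIONS = {"--extract-elf", "--extract-bc", "--dump-function",
                 "--dump-resource-usage", "--dump-elf-symbols"}


def build_argv(binary, *flags, **values):
    """Build an hgobjdump command line from documented long options."""
    argv = ["hgobjdump"]
    for flag in flags:
        if flag not in FLAG_OPTIONS:
            raise ValueError(f"unknown flag option: {flag}")
        argv.append(flag)
    for key, value in values.items():
        option = "--" + key.replace("_", "-")   # extract_elf -> --extract-elf
        if option not in VALUE_OPTIONS:
            raise ValueError(f"unknown value option: {option}")
        argv.append(f"{option}={value}")
    argv.append(binary)
    return argv


def run_hgobjdump(binary, *flags, **values):
    """Run hgobjdump (assumed to be on PATH) and return its standard output."""
    result = subprocess.run(build_argv(binary, *flags, **values),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, build_argv("hggc.out", "--list-elf", "--demangle") yields the argument list for a demangled ELF listing, and build_argv("libacompute.so", extract_elf=1) the one for extracting the first ELF file.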

2.3 Usage examples

The following are some examples of using the options:

$ hgobjdump -lelf hggc.out
hggc.out:	file format ELF64-x86-64

ELF FILE 1:
Func 1: _Z4MathPfS_

$ hgobjdump -lelf -demangle hggc.out
hggc.out:	file format ELF64-x86-64

ELF FILE 1:
Func 1: Math(float*, float*)

$ hgobjdump -isa hggc.out
hggc.out:	file format ELF64-x86-64

ELF FILE 1:

Disassembly of section .text:

0000000000000110 _Z4MathPfS_:
     110: 00 00 00 00 00 00 f1 08      	s.wait	pipe_flush
     118: 00 00 00 00 20 08 88 56      	v.mov.alllane.b32	vreg32, 0x0
     120: 00 00 00 00 60 48 08 50      	v.mov.b32	vreg33, 0x0
     128: ff 0f 00 00 20 09 00 26      	s.and.b32	sreg0, sreg36, 0xfff
     130: 00 00 00 00 00 09 c0 57      	v.tid.init	vreg0, sreg[36:37]
     138: 00 00 00 13 00 00 40 2a      	s.mull.i32	sreg0, sreg0, sreg38
     140: ff 0f 00 00 20 00 00 9a      	v.and.b32	vreg0, vreg0, 0xfff
     148: 9d 98 bb 3b a0 80 00 50      	v.mov.b32	vreg2, 0x3bbb989d
     150: 00 00 00 00 00 00 40 82      	v.add.i32	vreg0, sreg0, vreg0
     158: 00 00 7c 43 e0 c0 00 50      	v.mov.b32	vreg3, 0x437c0000
     160: 3b aa b8 3f 20 01 01 50      	v.mov.b32	vreg4, 0x3fb8aa3b
     168: 60 70 a5 32 60 41 01 50      	v.mov.b32	vreg5, 0x32a57060
     170: 48 01 01 00 24 40 80 c8      	vmem.ld.b32.sign	vreg1, [0x0 + vreg0 * 0x4] @sreg[10:11]
     178: 00 00 00 00 00 00 c4 08      	s.wait	vldcnt(0)
     180: 00 80 3e 81 40 80 80 62      	v.fma.f32.rtte	vreg2, vreg1, vreg2, c0x3f000000
     188: 00 00 00 00 a0 80 80 92      	v.max.f32	vreg2, vreg2, 0x0
     190: 00 00 80 3f a0 80 80 8a      	v.min.f32	vreg2, vreg2, 0x3f800000
     198: 00 40 bf 81 88 80 80 62      	v.fma.f32.rtn	vreg2, vreg2, vreg3, c0x4b400001
     1a0: 00 30 00 81 81 c0 80 8a      	v.min.f32	vreg3, !vreg2.reuse, !vreg2
     1a8: 17 00 00 00 a0 80 80 a2      	v.shll.b32	vreg2, vreg2, 0x17
     1b0: 7f 00 40 4b e0 c0 80 82      	v.add.f32	vreg3, vreg3, 0x4b40007f
     1b8: 00 c0 40 82 41 c0 80 62      	v.fma.f32.rtte	vreg3, vreg1.reuse, vreg4, vreg3
     1c0: 00 c0 c0 82 40 40 80 62      	v.fma.f32.rtte	vreg1, vreg1, vreg5, vreg3
     1c8: 00 00 00 80 40 40 c0 58      	v.exp2.f32	vreg1, vreg1
     1d0: 00 00 00 81 40 40 80 aa      	v.mul.f32	vreg1, vreg1, vreg2
     1d8: 08 01 01 00 24 40 80 cc      	vmem.st.b32.sign	vreg1, [0x0 + vreg0 * 0x4] @sreg[8:9]
     1e0: 00 00 00 00 00 00 40 08      	s.exit
     1e8: 00 00 00 00 00 00 00 08      	s.nop
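
The disassembly listing above has a regular line shape (address, eight encoding bytes, mnemonic, operands), so it can be post-processed with a small script. The Python sketch below assumes exactly that shape, as seen in this example output. It is not a description of a guaranteed output format, and the names parse_isa and mnemonic_histogram are invented here.

```python
import re
from collections import Counter

# address ':' eight hex bytes, then the mnemonic and optional operands
INSN_RE = re.compile(
    r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2}\s){8})\s*(\S+)\s*(.*)$")


def parse_isa(text):
    """Return (address, encoding_bytes, mnemonic, operands) per instruction line."""
    insns = []
    for line in text.splitlines():
        m = INSN_RE.match(line)
        if not m:
            continue  # banner, section header, or function label line
        address = int(m.group(1), 16)
        encoding = bytes.fromhex("".join(m.group(2).split()))
        insns.append((address, encoding, m.group(3), m.group(4).strip()))
    return insns


def mnemonic_histogram(insns):
    """Count how often each mnemonic occurs."""
    return Counter(mnemonic for _, _, mnemonic, _ in insns)
```

Feeding the captured output of hgobjdump -isa to parse_isa and then to mnemonic_histogram gives a quick overview of which instructions a kernel uses.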

$ hgobjdump -xelf=1 libacompute.so
libacompute.so:	file format ELF64-x86-64

ELF FILE 1:
Extract File: libacompute.so_ELF_File_1

$ hgobjdump -lelf libacompute.so_ELF_File_1
libacompute.so_ELF_File_1:	file format ELF64-alippu
Func 1: _Z11mm_kernelTTIdddEvPKPT0_PKPKT_S8_iiiff
Func 2: _Z11mm_kernelNTIdddEvPKPT0_PKPKT_S8_iiiff
Func 3: _Z11mm_kernelNNIfffEvPKPT0_PKPKT_S8_iiiff
Func 4: _Z11mm_kernelTNIfffEvPKPT0_PKPKT_S8_iiiT1_S9_
Func 5: _Z11mm_kernelNTIfffEvPKPT0_PKPKT_S8_iiiff
Func 6: _Z11mm_kernelTNIdddEvPKPT0_PKPKT_S8_iiiT1_S9_
Func 7: _Z11mm_kernelNNIdddEvPKPT0_PKPKT_S8_iiiff
Func 8: _Z11mm_kernelTTIfffEvPKPT0_PKPKT_S8_iiiff
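
The -lelf listings shown so far come in two shapes: for a complete PPU Binary the functions are grouped under "ELF FILE n:" headers, while for an extracted Device Binary they are listed directly. A minimal Python sketch that handles both shapes, assuming the line formats seen in these examples (the name parse_listing is invented here):

```python
import re

ELF_FILE_RE = re.compile(r"^ELF FILE (\d+):$")
FUNC_RE = re.compile(r"^Func (\d+): (.+)$")


def parse_listing(text):
    """Return (elf_file_index, func_index, name) tuples from -lelf output.

    elf_file_index is None when no "ELF FILE n:" header precedes the entry,
    as in the listing of a standalone Device Binary.
    """
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = ELF_FILE_RE.match(line)
        if m:
            current = int(m.group(1))
            continue
        m = FUNC_RE.match(line)
        if m:
            entries.append((current, int(m.group(1)), m.group(2)))
    return entries
```

The function names returned for a non-demangled listing can be passed as they are to -func= or -res-usage=, as the following examples do.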

$ hgobjdump -func=_Z11mm_kernelTTIdddEvPKPT0_PKPKT_S8_iiiff libacompute.so_ELF_File_1
libacompute.so_ELF_File_1:	file format ELF64-alippu

Disassembly of section .text:

00000000000036c8 _Z11mm_kernelTTIdddEvPKPT0_PKPKT_S8_iiiff:
    36c8: 00 00 00 00 20 00 04 50      	v.mov.b32	vreg16, 0x0
    36d0: 00 00 00 00 20 80 04 50      	v.mov.b32	vreg18, 0x0
    36d8: 00 00 00 00 20 40 04 50      	v.mov.b32	vreg17, 0x0
    36e0: 00 00 00 00 20 40 01 50      	v.mov.b32	vreg5, 0x0
    36e8: 00 00 00 00 00 00 f1 08      	s.wait	pipe_flush
    36f0: 00 00 00 00 20 00 88 56      	v.mov.alllane.b32	vreg32, 0x0
    36f8: 00 00 00 00 20 40 08 50      	v.mov.b32	vreg33, 0x0
    3700: 00 00 00 00 00 0a 01 50      	v.mov.b32	vreg4, sreg40
    3708: 00 00 00 00 20 00 01 1a      	s.mov.b64	sreg[4:5], 0x0
    3710: 00 00 00 00 00 c9 c0 57      	v.tid.init	vreg3, sreg[36:37]
    3718: 01 00 00 00 20 44 c8 2f      	s.cmp.lt.i32	sreg0, sreg16, 0x1
    3720: ff 0f 00 00 e1 80 00 9a      	v.and.b32	vreg2, vreg3.reuse, 0xfff
    3728: 08 01 02 00 20 01 c0 c8      	vmem.ld.b32x2	vreg[0:1], [0x0 + vreg4 * 0x8] @sreg[8:9]
    3730: 40 06 00 06 e0 c0 80 67      	v.bfe.b32	vreg3, vreg3, 0xc, 0xc
    3738: 05 00 00 00 a0 89 80 28      	s.shll.b32	sreg2, sreg38, 0x5
    3740: 05 00 00 00 e0 49 80 28      	s.shll.b32	sreg1, sreg39, 0x5
    3748: 00 00 00 00 00 01 03 50      	v.mov.b32	vreg12, sreg4
    ......

$ hgobjdump -res-usage=_Z11mm_kernelTTIdddEvPKPT0_PKPKT_S8_iiiff libacompute.so_ELF_File_1
libacompute.so_ELF_File_1:	file format ELF64-alippu

RESOURCE INFO:
SHADER API KERNEL CONTROL:
grid_dim_x_en:1
grid_dim_y_en:1
grid_dim_z_en:1
block_dim_en:1
block_idx_x_en:1
block_idx_y_en:1
block_idx_z_en:1
start_thread_idx_en:1
user_sreg_num:32
pri:0
fwd_progress:0
private_en:1
cu_disp_en:0
block_age_en:0

SHADER API KERNEL MODE:
fp_rndmode:0
i_rndmode:0
fp_denorm_flush:0
saturation:0
exception_en:0
relu:0
nan:0
vmem_ooo:0
saturation_fp64:0
trap_exception:0
debug_en:0
trap_en:0
perf_cnt_en:0
kp_modify_en:0
sw_defined_mode:0

SHADER API KERNEL RESOURCE:
vreg_number:34
sreg_number:48
shared_memory_size:130
treg_en:0

STACK SIZE:0

ARGUMENT:
ARG0 INDEX:hidden		TYPE:uint64		KIND:hidden.gm.base 	sreg[0:1]
ARG1 INDEX:hidden		TYPE:uint64		KIND:hidden.env.base	sreg[2:3]
ARG2 INDEX:hidden		TYPE:uint64		KIND:hidden.km.base 	sreg[4:5]
ARG3 INDEX:hidden		TYPE:uint32		KIND:hidden.pm.size 	sreg6
ARG4 INDEX:hidden		TYPE:uint32		KIND:hidden.tsm.size	sreg7
ARG5 INDEX:0x0   		TYPE:uint64		KIND:sreg.ptr       	sreg[8:9]		ATTR:readonly
ARG6 INDEX:0x1   		TYPE:uint64		KIND:sreg.ptr       	sreg[10:11]	ATTR:readonly
ARG7 INDEX:0x2   		TYPE:uint64		KIND:sreg.ptr       	sreg[12:13]	ATTR:readonly
ARG8 INDEX:0x3   		TYPE:uint32		KIND:sreg.value     	sreg14
ARG9 INDEX:0x4   	  TYPE:uint32		KIND:sreg.value     	sreg15
ARG10 INDEX:0x5   	TYPE:uint32		KIND:sreg.value     	sreg16
ARG11 INDEX:0x6   	TYPE:fp32			KIND:sreg.value     	sreg17
ARG12 INDEX:0x7   	TYPE:fp32			KIND:sreg.value     	sreg18
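
The resource usage report above consists of section headers, key:value lines, and one line per kernel argument. The Python sketch below turns such a report into dictionaries, for example to compare vreg_number or shared_memory_size across kernels. It assumes the line shapes seen in this example output only, and the name parse_resource_usage is invented here.

```python
import re

ARG_RE = re.compile(
    r"^ARG(\d+)\s+INDEX:(\S+)\s+TYPE:(\S+)\s+KIND:(\S+)\s+(\S+)(?:\s+ATTR:(\S+))?$")
KEY_VALUE_RE = re.compile(r"^([A-Za-z0-9_ ]+):(\S*)$")


def parse_resource_usage(text):
    """Split -res-usage output into {section: {key: value}} and a list of args.

    Keys that appear outside any section (e.g. "STACK SIZE") go under "".
    """
    info, args, section = {}, [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            section = ""          # a blank line ends the current section
            continue
        m = ARG_RE.match(line)
        if m:
            args.append({"arg": int(m.group(1)), "index": m.group(2),
                         "type": m.group(3), "kind": m.group(4),
                         "reg": m.group(5), "attr": m.group(6)})
            continue
        m = KEY_VALUE_RE.match(line)
        if not m:
            continue              # e.g. the "file format" banner line
        key, value = m.group(1), m.group(2)
        if value == "":
            section = key         # header such as "SHADER API KERNEL MODE:"
            info.setdefault(section, {})
        else:
            parsed = int(value) if value.isdigit() else value
            info.setdefault(section, {})[key] = parsed
    return info, args
```

With the report above, info["SHADER API KERNEL RESOURCE"]["vreg_number"] would hold the vector register count, and args would hold one dictionary per ARG line with its register assignment.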

$ hgobjdump -symbols=0 test_cuda-math.math.float_math_op1_expf
test_cuda-math.math.float_math_op1_expf:        file format ELF64-x86-64
ELF FILE 1:
SYMBOL TABLE:
00000000000001c0 l     O .data  00000030 _ZN13heapallocatorL9heapAllocE.13
00000000000001a8 l     O .data  00000018 _ZN13heapallocatorL8heapPropE.12
0000000000001d38 g     F .text  00000020 __ppumath_fma_rtp_f32
00000000000000a0 l     O .data  00000010 _ZN13heapallocatorL8hashPropE
0000000000001860 l     F .text  00000248 __ppumathpriv_rcp_default_f32.5
0000000000001cf8 g     F .text  00000020 __builtin_alippu_rcpf
00000000000010f0 l     F .text  00000770 __ppumathpriv_drcp_rte_f64.4
0000000000001aa8 g     F .text  00000250 __ppumath_rcp_default_ftz_f32
0000000000000ff8 l     F .text  000000f8 __ppumathpriv_div_default_f32.3
0000000000001d58 g     F .text  00000020 __ppumath_fma_rte_f32
00000000000002c8 l     F .text  000007c8 __ppumathpriv_ddiv_rte_f64.2
0000000000000a90 g     F .text  00000568 __ppumathpriv_div_rte_slow_path_f32
00000000000000fa g     O .data  0000009e PIBITS_TBL
00000000000024a0 g     F .text  00000020 __ppumath_fma_rtz_f32
0000000000001d18 g     F .text  00000020 __ppumath_fma_rtn_f32
00000000000024c0 g     F .text  000001f0 __ppumath_drcp_f64
00000000000000f8 gw    O .data  00000001 _ZNSt17integral_constantIbLb0EE5valueE
0000000000001d78 g     F .text  00000728 __ppumath_div_default_ftz_f32
0000000000000000 l     O .data  0000009e PIBITS_TBL.10
00000000000000b0 l     O .data  00000018 _ZN13heapallocatorL8heapPropE
00000000000000f9 gw    O .data  00000001 _ZNSt17integral_constantIbLb1EE5valueE
00000000000000c8 l     O .data  00000030 _ZN13heapallocatorL9heapAllocE
0000000000000198 l     O .data  00000010 _ZN13heapallocatorL8hashPropE.11
00000000000001f0 g     F .text  000000d8 _ZN12_GLOBAL__N_14MathEPfS0_
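
The symbol table above can be filtered with a short script, for example to list the global functions of a Device Binary by address. The Python sketch below assumes that each line follows the layout seen in this example (16-digit address, a 7-character flag field, section, size, name) and that the flag letters follow the usual objdump convention (l/g for local/global, F for function, O for object). Both points are assumptions drawn from this output, and the names parse_symbols and global_functions are invented here.

```python
import re

# address, 7-character flag field, section, size, name
SYMBOL_RE = re.compile(
    r"^([0-9a-fA-F]{16})\s(.{7})\s(\S+)\s+([0-9a-fA-F]+)\s+(\S.*)$")


def parse_symbols(text):
    """Return one dict per symbol line; other lines are ignored."""
    symbols = []
    for line in text.splitlines():
        m = SYMBOL_RE.match(line.rstrip())
        if m:
            symbols.append({
                "address": int(m.group(1), 16),
                "flags": m.group(2),
                "section": m.group(3),
                "size": int(m.group(4), 16),
                "name": m.group(5),
            })
    return symbols


def global_functions(symbols):
    """Symbols whose flag field starts with 'g' and ends with 'F', by address."""
    funcs = [s for s in symbols
             if s["flags"][0] == "g" and s["flags"][-1] == "F"]
    return sorted(funcs, key=lambda s: s["address"])
```

Applied to the table above, global_functions(parse_symbols(output)) would keep entries such as __ppumath_fma_rtp_f32 and drop the local .data objects.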