This topic describes the node-level Prometheus metrics that ACS clusters provide for GPU-HPN nodes.
Metrics
Metric | Description | Labels | Example |
node_cpu_seconds_total | Total CPU time used on the node, in seconds. | NodeName, instance, mode | node_cpu_seconds_total{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx",mode="user"} 135268.20999999988 |
node_boot_time_seconds | The time at which the GPU-HPN node reservation was purchased, as a Unix timestamp in seconds. When the node self-heals after a failure, this metric is updated to the time at which the most recent self-healing completed. | None | node_boot_time_seconds 1.735635132e+09 |
node_memory_MemAvailable_bytes | Available memory on the node, in bytes. | NodeName, instance | node_memory_MemAvailable_bytes{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 1.070595100672e+12 |
node_memory_MemFree_bytes | Free memory on the node, in bytes. | NodeName, instance | node_memory_MemFree_bytes{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 1.069967446016e+12 |
node_memory_MemTotal_bytes | Total memory on the node, in bytes. | NodeName, instance | node_memory_MemTotal_bytes{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 1.9327352832e+12 |
node_disk_read_bytes_total | Total number of bytes read from the node's disks. | NodeName, instance | node_disk_read_bytes_total{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 1.36580096e+08 |
node_disk_reads_completed_total | Total number of disk reads completed on the node. | NodeName, instance | node_disk_reads_completed_total{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 2530 |
node_disk_writes_completed_total | Total number of disk writes completed on the node. | NodeName, instance | node_disk_writes_completed_total{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 85965 |
node_disk_written_bytes_total | Total number of bytes written to the node's disks. | NodeName, instance | node_disk_written_bytes_total{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 7.331622912e+09 |
node_network_receive_bytes_total | Cumulative number of bytes received by the node. | NodeName, instance | node_network_receive_bytes_total{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 4.5447566e+07 |
node_network_transmit_bytes_total | Cumulative number of bytes sent by the node. | NodeName, instance | node_network_transmit_bytes_total{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 8.6421368e+07 |
DCGM_FI_DEV_COUNT | Number of devices. | NodeName, instance | DCGM_FI_DEV_COUNT{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 8 |
DCGM_FI_DEV_FB_TOTAL | Total frame buffer size, in MB. | NodeName, instance | DCGM_FI_DEV_FB_TOTAL{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 1.56672e+06 |
DCGM_FI_DEV_FB_USED | Used frame buffer size, in MB. | NodeName, UUID, instance, modelName | DCGM_FI_DEV_FB_USED{NodeName="cn-wulanchabu-c.cr-xxx",UUID="GPU-hashID",instance="cn-wulanchabu-c.cr-xx",modelName="mode-name-demo"} 9672 |
DCGM_FI_DEV_GPU_UTIL | GPU utilization, as a percentage. | NodeName, UUID, instance, modelName | DCGM_FI_DEV_GPU_UTIL{NodeName="cn-wulanchabu-c.cr-xxx",UUID="GPU-hashID",instance="cn-wulanchabu-c.cr-xx",modelName="mode-name-demo"} 56 |
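
The example values in the table are in Prometheus text exposition format, so they can be read with a few lines of code. The following Python sketch is illustrative only: it parses three of the sample lines from the table, derives the fraction of node memory in use from node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes, and converts node_boot_time_seconds into a UTC timestamp. The helper names (parse_sample, parse_exposition, value_of, memory_used_ratio, last_boot_time) are hypothetical and are not part of ACS or Prometheus. The parser handles only the simple line shapes shown in the table, not the full exposition format (for example, escaped quotes in label values or trailing timestamps).

```python
import re
from datetime import datetime, timezone

# Simplified patterns: enough for the sample lines in the table above,
# not a complete Prometheus text-format parser (assumption).
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Sample lines copied from the table; the node name is the placeholder used there.
EXPOSITION = """\
node_boot_time_seconds 1.735635132e+09
node_memory_MemAvailable_bytes{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 1.070595100672e+12
node_memory_MemTotal_bytes{NodeName="cn-wulanchabu-c.cr-xxx",instance="cn-wulanchabu-c.cr-xxx"} 1.9327352832e+12
"""


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognized sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def parse_exposition(text):
    """Parse every non-empty, non-comment line of an exposition snippet."""
    return [
        parse_sample(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


def value_of(samples, name):
    """Return the value of a metric that appears exactly once in samples."""
    values = [value for metric, _, value in samples if metric == name]
    if len(values) != 1:
        raise LookupError(f"expected exactly one sample for {name}, got {len(values)}")
    return values[0]


def memory_used_ratio(samples):
    """Fraction of node memory in use: 1 - MemAvailable / MemTotal."""
    available = value_of(samples, "node_memory_MemAvailable_bytes")
    total = value_of(samples, "node_memory_MemTotal_bytes")
    return 1.0 - available / total


def last_boot_time(samples):
    """node_boot_time_seconds interpreted as a Unix timestamp in UTC."""
    seconds = value_of(samples, "node_boot_time_seconds")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


samples = parse_exposition(EXPOSITION)

if __name__ == "__main__":
    print(f"memory used: {memory_used_ratio(samples):.1%}")
    print(f"reserved or last self-healed at: {last_boot_time(samples).isoformat()}")
```

Using MemAvailable rather than MemFree for the ratio is a design choice in this sketch: MemAvailable also counts memory that the kernel can reclaim, so it usually gives a more realistic picture of memory pressure.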