E-MapReduce events

This topic describes the types of events that E-MapReduce publishes to EventBridge as an event source connected through ActionTrail.

Event types

E-MapReduce can publish the following event types to EventBridge.

Event type

Value of the type parameter

Operations performed on resources by the Alibaba Cloud platform

emr:ActionTrail:AliyunServiceEvent

API calls

emr:ActionTrail:ApiCall

Operations performed in the console

emr:ActionTrail:ConsoleOperation

EcmAgent heartbeat message expired

emr:CloudMonitor:Agent[EcmAgentHeartbeatExpired]

EcmAgent disconnected for a long time

emr:CloudMonitor:Agent[Maintenance.EcmAgentTimeout]

Workflow succeeded

emr:CloudMonitor:EMR-110401002

Workflow submitted

emr:CloudMonitor:EMR-110401003

Job submitted

emr:CloudMonitor:EMR-110401004

Workflow node started

emr:CloudMonitor:EMR-110401005

Workflow node status checked

emr:CloudMonitor:EMR-110401006

Workflow node completed

emr:CloudMonitor:EMR-110401007

Workflow node ended

emr:CloudMonitor:EMR-110401008

Workflow node canceled

emr:CloudMonitor:EMR-110401009

Workflow canceled

emr:CloudMonitor:EMR-110401010

Workflow rerun

emr:CloudMonitor:EMR-110401011

Workflow resumed

emr:CloudMonitor:EMR-110401012

Workflow paused

emr:CloudMonitor:EMR-110401013

Workflow ended

emr:CloudMonitor:EMR-110401014

Workflow node failed

emr:CloudMonitor:EMR-110401015

Job failed

emr:CloudMonitor:EMR-110401016

Workflow failed

emr:CloudMonitor:EMR-210401001

Workflow node start timed out

emr:CloudMonitor:EMR-210401003

Job start timed out

emr:CloudMonitor:EMR-210401004

AIRFLOW Scheduler component status check failed

emr:CloudMonitor:Maintenance[AIRFLOW.Scheduler.StatusCheck.Fail]

AIRFLOW Web Server component status check failed

emr:CloudMonitor:Maintenance[AIRFLOW.WebServer.Check.Fail]

AIRFLOW Web Server component service status check failed

emr:CloudMonitor:Maintenance[AIRFLOW.WebServer.StatusCheck.Fail]

APACHEDS status check failed

emr:CloudMonitor:Maintenance[APACHEDS.StatusCheck.Fail]

ClickHouse service status check failed

emr:CloudMonitor:Maintenance[CLICKHOUSE.ServerStatusCheck.Fail]

DRUID Broker component GC check failed

emr:CloudMonitor:Maintenance[DRUID.Broker.GcCheck.Fail]

DRUID Broker component status check failed

emr:CloudMonitor:Maintenance[DRUID.Broker.StatusCheck.Fail]

DRUID Coordinator component GC check failed

emr:CloudMonitor:Maintenance[DRUID.Coordinator.GcCheck.Fail]

DRUID Coordinator component status check failed

emr:CloudMonitor:Maintenance[DRUID.Coordinator.StatusCheck.Fail]

DRUID Historical component GC check failed

emr:CloudMonitor:Maintenance[DRUID.Historical.GcCheck.Fail]

DRUID Historical component status check failed

emr:CloudMonitor:Maintenance[DRUID.Historical.StatusCheck.Fail]

DRUID MiddleManager component GC check failed

emr:CloudMonitor:Maintenance[DRUID.MiddleManager.GcCheck.Fail]

DRUID MiddleManager component status check failed

emr:CloudMonitor:Maintenance[DRUID.MiddleManager.StatusCheck.Fail]

DRUID Overlord component GC check failed

emr:CloudMonitor:Maintenance[DRUID.Overlord.GcCheck.Fail]

DRUID Overlord component status check failed

emr:CloudMonitor:Maintenance[DRUID.Overlord.StatusCheck.Fail]

DRUID Router component GC check failed

emr:CloudMonitor:Maintenance[DRUID.Router.GcCheck.Fail]

DRUID Router component status check failed

emr:CloudMonitor:Maintenance[DRUID.Router.StatusCheck.Fail]

Flink History Server component GC check failed

emr:CloudMonitor:Maintenance[FLINK.HistoryServer.GcCheckP0.Fail]

Flink History Server component status check failed

emr:CloudMonitor:Maintenance[FLINK.HistoryServer.StatusCheck.Fail]

Flink VVP Server component status check failed

emr:CloudMonitor:Maintenance[FLINK.VVP.StatusCheck.Fail]

HAS Admin component status check failed

emr:CloudMonitor:Maintenance[HAS.Admin.StatusCheck.Fail]

HAS service status check failed

emr:CloudMonitor:Maintenance[HAS.Server.StatusCheck.Fail]

HBASE cluster availability check failed

emr:CloudMonitor:Maintenance[HBASE.AvailabilityStatusCheck.Fail]

HBASE.HMaster IPC port unavailable

emr:CloudMonitor:Maintenance[HBASE.HMaster.IpcPortUnAvailable]

HBase HMaster component status check failed

emr:CloudMonitor:Maintenance[HBASE.HMaster.StatusCheck.Fail]

HBASE.HRegionServer IPC port unavailable

emr:CloudMonitor:Maintenance[HBASE.HRegionServer.IpcPortUnAvailable]

HBASE RegionServer component GC check failed

emr:CloudMonitor:Maintenance[HBASE.RegionServer.GcCheckP0.Fail]

HBase RegionServer component status check failed

emr:CloudMonitor:Maintenance[HBASE.RegionServer.StatusCheck.Fail]

HBASE ThriftServer component GC check failed

emr:CloudMonitor:Maintenance[HBASE.ThriftServer.GcCheckP0.Fail]

HBASE.ThriftServer service port unavailable

emr:CloudMonitor:Maintenance[HBASE.ThriftServer.ServicePortUnAvailable]

HBASE ThriftServer component status check failed

emr:CloudMonitor:Maintenance[HBASE.ThriftServer.StatusCheck.Fail]

HDFS availability check failed

emr:CloudMonitor:Maintenance[HDFS.AvailabilityStatusCheck.Fail]

DataNode data transfer port unavailable

emr:CloudMonitor:Maintenance[HDFS.DataNode.DataTransferPortUnAvailable]

Dead DataNodes exist in HDFS

emr:CloudMonitor:Maintenance[HDFS.DataNode.DeadDataNodesExist]

Exception occurred in DataNode SecureMain

emr:CloudMonitor:Maintenance[HDFS.DataNode.ExceptionInSecureMain]

DataNode process exited unexpectedly

emr:CloudMonitor:Maintenance[HDFS.DataNode.ExitUnexpected]

DataNode has failed disks

emr:CloudMonitor:Maintenance[HDFS.DataNode.FailueVolumes]

DataNode GC check failed (P0)

emr:CloudMonitor:Maintenance[HDFS.DataNode.GcCheckP0.Fail]

DataNode IPC port unavailable

emr:CloudMonitor:Maintenance[HDFS.DataNode.IpcPortUnAvailable]

DataNode OOM: unable to create new native thread

emr:CloudMonitor:Maintenance[HDFS.DataNode.OOM.UnableToCreateNewNativeThread]

OOM error caused by Java heap space

emr:CloudMonitor:Maintenance[HDFS.DataNode.OomForJavaHeapSpace]

DataNode status check failed

emr:CloudMonitor:Maintenance[HDFS.DataNode.StatusCheck.Fail]

Too many dead DataNodes in HDFS

emr:CloudMonitor:Maintenance[HDFS.DataNode.TooManyDataNodeDead]

Failed disks exist in HDFS

emr:CloudMonitor:Maintenance[HDFS.DataNode.VolumeFailuresExist]

HDFS HA status check failed

emr:CloudMonitor:Maintenance[HDFS.HaStateCheck.Fail]

JournalNode GC check failed (P0)

emr:CloudMonitor:Maintenance[HDFS.JournalNode.GcCheckP0.Fail]

JournalNode RPC port unavailable

emr:CloudMonitor:Maintenance[HDFS.JournalNode.RpcPortUnAvailable]

JournalNode status check failed

emr:CloudMonitor:Maintenance[HDFS.JournalNode.StatusCheck.Fail]

NameNode active/standby switchover occurred

emr:CloudMonitor:Maintenance[HDFS.NameNode.ActiveStandbySwitch]

NameNode block capacity nearly used up

emr:CloudMonitor:Maintenance[HDFS.NameNode.BlockCapacityNearUsedUp]

Both NameNodes are in the Active state

emr:CloudMonitor:Maintenance[HDFS.NameNode.BothActive]

Both NameNodes are in the Standby state

emr:CloudMonitor:Maintenance[HDFS.NameNode.BothStandy]

Corrupt blocks exist in HDFS

emr:CloudMonitor:Maintenance[HDFS.NameNode.CorruptBlocksOccured]

HDFS directory formatted

emr:CloudMonitor:Maintenance[HDFS.NameNode.DirectoryFormatted]

NameNode exited unexpectedly

emr:CloudMonitor:Maintenance[HDFS.NameNode.ExitUnexpectely]

NameNode GC check failed (P0)

emr:CloudMonitor:Maintenance[HDFS.NameNode.GcCheckP0.Fail]

NameNode GC check failed (P1)

emr:CloudMonitor:Maintenance[HDFS.NameNode.GcCheckP1.Fail]

NameNode in safe mode for too long

emr:CloudMonitor:Maintenance[HDFS.NameNode.InSafeMode]

NameNode IPC port unavailable

emr:CloudMonitor:Maintenance[HDFS.NameNode.IpcPortUnAvailable]

Exception occurred while NameNode loaded FsImage

emr:CloudMonitor:Maintenance[HDFS.NameNode.LoadFsImageException]

NameNode in safe mode due to insufficient disk space

emr:CloudMonitor:Maintenance[HDFS.NameNode.LowAvailableDiskSpaceAndInSafeMode]

Data blocks missing in HDFS

emr:CloudMonitor:Maintenance[HDFS.NameNode.MissingBlock]

NameNode OOM occurred

emr:CloudMonitor:Maintenance[HDFS.NameNode.OOM]

NameNode resources insufficient

emr:CloudMonitor:Maintenance[HDFS.NameNode.ResourceLow]

NameNode RPC call queue too long

emr:CloudMonitor:Maintenance[HDFS.NameNode.RpcPortCallQueueLengthTooLong]

NameNode status check failed

emr:CloudMonitor:Maintenance[HDFS.NameNode.StatusCheck.Fail]

NameNode failed to sync journal

emr:CloudMonitor:Maintenance[HDFS.NameNode.SyncJournalFailed]

Too much HDFS block capacity used

emr:CloudMonitor:Maintenance[HDFS.NameNode.TooMuchBlockCapacityUsed]

Too much DataNode capacity used

emr:CloudMonitor:Maintenance[HDFS.NameNode.TooMuchDataNodeCapacityUsed]

Too much HDFS storage capacity used

emr:CloudMonitor:Maintenance[HDFS.NameNode.TooMuchDfsCapacityUsed]

Excessive heap memory consumed because of too many files and blocks

emr:CloudMonitor:Maintenance[HDFS.NameNode.TooMuchHeapUsedByTooManyFilesAndBlocks]

HDFS write to JournalNode timed out

emr:CloudMonitor:Maintenance[HDFS.NameNode.WriteToJournalNodeTimeout]

ZKFC triggered a NameNode active/standby switchover

emr:CloudMonitor:Maintenance[HDFS.ZKFC.ActiveStandbySwitchOccured]

HDFS.ZKFC port unavailable

emr:CloudMonitor:Maintenance[HDFS.ZKFC.PortUnAvailable]

ZKFC status check failed

emr:CloudMonitor:Maintenance[HDFS.ZKFC.StatusCheck.Fail]

Transport-level exception occurred while ZKFC monitored NameNode health

emr:CloudMonitor:Maintenance[HDFS.ZKFC.TransportLevelExceptionInMonitorHealth]

ZKFC unable to connect to the ZooKeeper quorum

emr:CloudMonitor:Maintenance[HDFS.ZKFC.UnableToConnectToQuorum]

ZKFC unable to start

emr:CloudMonitor:Maintenance[HDFS.ZKFC.UnableToStartZKFC]

HIVE availability status check failed

emr:CloudMonitor:Maintenance[HIVE.AvailabilityStatusCheck.Fail]

HiveMetaStore database communication link failure

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.DataBaseCommunicationLinkFailure]

HiveMetaStore database connection failed

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.DataBaseConnectionFailed]

HiveMetaStore database disk space used up

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.DataBaseDiskQuotaUsedup]

HIVE.HiveMetaStore.hiveServer2Port unavailable

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.hiveServer2PortUnAvailable]

HiveMetaStore JDBC communication exception occurred

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.JdbcCommunicationException]

HiveMetaStore exceeded the maximum number of queries

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.MaxQuestionsExceeded]

HiveMetaStore exceeded the maximum number of updates

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.MaxUpdatesExceeded]

HiveMetaStore exceeded the maximum number of user connections

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.MaxUserConnectionExceeded]

HiveMetaStore OOM occurred

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.OomOccured]

HiveMetaStore configuration file parsing error

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.ParseConfError]

HIVE.HiveMetaStore port unavailable

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.PortUnAvailable]

Table required by HiveMetaStore is missing

emr:CloudMonitor:Maintenance[HIVE.HiveMetaStore.RequiredTableMissing]

HiveServer GC check failed (P0)

emr:CloudMonitor:Maintenance[HIVE.HiveServer.GcCheckP0.Fail]

HiveServer GC check failed (P1)

emr:CloudMonitor:Maintenance[HIVE.HiveServer.GcCheckP1.Fail]

HiveServer status check failed

emr:CloudMonitor:Maintenance[HIVE.HiveServer.StatusCheck.Fail]

Unable to connect to HiveServer2 through any of the provided URIs

emr:CloudMonitor:Maintenance[HIVE.HiveServer2.CannotConnectByAnyURIsProvided]

HiveServer2 connection to ZooKeeper timed out

emr:CloudMonitor:Maintenance[HIVE.HiveServer2.ConnectToZkTimeout]

HiveServer2 configuration parsing error

emr:CloudMonitor:Maintenance[HIVE.HiveServer2.ErrorParseConf]

HiveServer2 startup error

emr:CloudMonitor:Maintenance[HIVE.HiveServer2.ErrorStartingHiveServer]

HiveServer2 failed to initialize the MetaStore client

emr:CloudMonitor:Maintenance[HIVE.HiveServer2.FailedInitMetaStoreClient]

HiveServer2 failed to connect to the MetaStore server

emr:CloudMonitor:Maintenance[HIVE.HiveServer2.FailedToConnectToMetaStoreServer]

HiveServer2 OOM occurred

emr:CloudMonitor:Maintenance[HIVE.HiveServer2.HiveServer2OOM]

MetaStore latency check failed (P0)

emr:CloudMonitor:Maintenance[HIVE.MetaStore.DelayCheckP0.Fail]

MetaStore latency check failed (P1)

emr:CloudMonitor:Maintenance[HIVE.MetaStore.DelayCheckP1.Fail]

MetaStore GC check failed (P0)

emr:CloudMonitor:Maintenance[HIVE.MetaStore.GcCheckP0.Fail]

MetaStore GC check failed (P1)

emr:CloudMonitor:Maintenance[HIVE.MetaStore.GcCheckP1.Fail]

MetaStore status check failed

emr:CloudMonitor:Maintenance[HIVE.MetaStore.StatusCheck.Fail]

Host CPU stuck

emr:CloudMonitor:Maintenance[HOST.CpuStuck]

Memory usage too high

emr:CloudMonitor:Maintenance[HOST.HighMemoryUsage]

Absolute free memory on the host too low

emr:CloudMonitor:Maintenance[HOST.LowAbsoluteFreeMemory]

Available space on /mnt/disk1 too low

emr:CloudMonitor:Maintenance[HOST.LowDiskForMntDisk1]

Available space on the root file system disk too low

emr:CloudMonitor:Maintenance[HOST.LowRootfsDisk]

OOM exception found in /var/log/message on the host

emr:CloudMonitor:Maintenance[HOST.OomFoundInVarLogMessage]

Too many processes on the master node

emr:CloudMonitor:Maintenance[HOST.TooManyProcessesOnMasterHost]

Host shut down

emr:CloudMonitor:Maintenance[HOST.VmHostShutDown]

Host started

emr:CloudMonitor:Maintenance[HOST.VmHostStartUp]

Oozie admin port unavailable

emr:CloudMonitor:Maintenance[HUE.OozieAdminPortUnAvailable]

HUE service port unavailable

emr:CloudMonitor:Maintenance[HUE.PortUnAvailable]

HUE runcherrypyserver component status check failed

emr:CloudMonitor:Maintenance[HUE.RunCherryPyServer.StatusCheck.Fail]

HUE service status check failed

emr:CloudMonitor:Maintenance[HUE.StatusCheck.Fail]

IMPALA availability check failed

emr:CloudMonitor:Maintenance[IMPALA.AvailableCheck.Fail]

IMPALA Catalogd component availability check failed

emr:CloudMonitor:Maintenance[IMPALA.Catalogd.AvailableCheck.Fail]

IMPALA Impalad component availability check failed

emr:CloudMonitor:Maintenance[IMPALA.Impalad.AvailableCheck.Fail]

IMPALA StateStored component availability check failed

emr:CloudMonitor:Maintenance[IMPALA.StateStored.AvailableCheck.Fail]

JINDOFS ManagerService component status check failed

emr:CloudMonitor:Maintenance[JINDOFS.JindoFsManagerService.StatusCheck.Fail]

JINDOFS NamespaceService component check failed

emr:CloudMonitor:Maintenance[JINDOFS.JindoFsNamespaceStatusCheck.Fail]

JINDOFS StorageService component check failed

emr:CloudMonitor:Maintenance[JINDOFS.JindoFsStorageServiceStatusCheck.Fail]

JINDOFS service check failed

emr:CloudMonitor:Maintenance[JINDOFS.StatusCheck.Fail]

KafkaBroker availability check failed

emr:CloudMonitor:Maintenance[KAFKA.Broker.AvailableCheck.Fail]

KafkaBroker GC check failed (P0)

emr:CloudMonitor:Maintenance[KAFKA.Broker.GcCheckP0.Fail]

KafkaBroker GC check failed (P1)

emr:CloudMonitor:Maintenance[KAFKA.Broker.GcCheckP1.Fail]

KafkaBroker status check failed

emr:CloudMonitor:Maintenance[KAFKA.Broker.StateCheck.Fail]

KafkaManager check failed

emr:CloudMonitor:Maintenance[KAFKA.KafkaManager.Check.Fail]

KafkaMetaDataMonitor check failed

emr:CloudMonitor:Maintenance[KAFKA.KafkaMetadataMonitor.Check.Fail]

KafkaRestProxy check failed

emr:CloudMonitor:Maintenance[KAFKA.RestProxy.Check.Fail]

KafkaSchemaRegistry check failed

emr:CloudMonitor:Maintenance[KAFKA.SchemaRegistry.Check.Fail]

KNOX GC check failed

emr:CloudMonitor:Maintenance[KNOX.GcCheckP0.Fail]

KNOX status check failed

emr:CloudMonitor:Maintenance[KNOX.StatusCheck.Fail]

KUDU health check failed

emr:CloudMonitor:Maintenance[KUDU.HealthyCheck.Fail]

KUDU master component status check failed

emr:CloudMonitor:Maintenance[KUDU.MasterStatusCheck.Fail]

KUDU tserver component status check failed

emr:CloudMonitor:Maintenance[KUDU.TServerStatusCheck.Fail]

LIVY GC check failed

emr:CloudMonitor:Maintenance[LIVY.GcCheckP0.Fail]

LIVY status check failed

emr:CloudMonitor:Maintenance[LIVY.StatusCheck.Fail]

Oozie GC check failed

emr:CloudMonitor:Maintenance[OOZIE.GcCheckP0.Fail]

Oozie status check failed

emr:CloudMonitor:Maintenance[OOZIE.StatusCheck.Fail]

OPENLDAP status check failed

emr:CloudMonitor:Maintenance[OPENLDAP.StatusCheck.Fail]

PRESTO service availability check failed

emr:CloudMonitor:Maintenance[PRESTO.AvailabilityStatusCheck.Fail]

PRESTO Coordinator component GC check failed

emr:CloudMonitor:Maintenance[PRESTO.Coordinator.GcCheckP0.Fail]

PRESTO Coordinator component status check failed

emr:CloudMonitor:Maintenance[PRESTO.Coordinator.StatusCheck.Fail]

PRESTO Worker component GC check failed

emr:CloudMonitor:Maintenance[PRESTO.Worker.GcCheckP0.Fail]

PRESTO Worker component status check failed

emr:CloudMonitor:Maintenance[PRESTO.Worker.StatusCheck.Fail]

RANGER Admin component GC check failed

emr:CloudMonitor:Maintenance[RANGER.ADMIN.GcCheck.Fail]

RANGER Admin component status check failed

emr:CloudMonitor:Maintenance[RANGER.ADMIN.StatusCheck.Fail]

RANGER Solr component status check failed

emr:CloudMonitor:Maintenance[RANGER.Solr.StatusCheck.Fail]

RANGER UserSync component status check failed

emr:CloudMonitor:Maintenance[RANGER.UserSync.StatusCheck.Fail]

Spark History component GC check failed

emr:CloudMonitor:Maintenance[SPARK.HistoryServer.GcCheckP0.Fail]

Spark History component status check failed

emr:CloudMonitor:Maintenance[SPARK.HistoryServer.StatusCheck.Fail]

SparkHistory OOM occurred

emr:CloudMonitor:Maintenance[SPARK.SparkHistory.OomOccured]

Spark Thrift Server component status check failed

emr:CloudMonitor:Maintenance[SPARK.ThriftServer.StatusCheck.Fail]

STORM.Nimbus.ThriftPort unavailable

emr:CloudMonitor:Maintenance[STORM.Nimbus.ThriftPortUnAvailable]

SUPERSET status check failed

emr:CloudMonitor:Maintenance[SUPERSET.StatusCheck.Fail]

TEZ Tomcat component GC check failed

emr:CloudMonitor:Maintenance[TEZ.Tomcat.GcCheckP0.Fail]

TEZ Tomcat component status check failed

emr:CloudMonitor:Maintenance[TEZ.Tomcat.StatusCheck.Fail]

AppTimeLine GC check failed (P0)

emr:CloudMonitor:Maintenance[YARN.AppTimeLine.GcCheckP0.Fail]

AppTimeLine status check failed

emr:CloudMonitor:Maintenance[YARN.AppTimeLine.StatusCheck.Fail]

YARN HA status check failed

emr:CloudMonitor:Maintenance[YARN.HaStateCheck.Fail]

JobHistory service exited unexpectedly

emr:CloudMonitor:Maintenance[YARN.JobHistory.ExitUnExpectedly]

JobHistory GC check failed (P0)

emr:CloudMonitor:Maintenance[YARN.JobHistory.GcCheckP0.Fail]

JobHistory service port unavailable

emr:CloudMonitor:Maintenance[YARN.JobHistory.PortUnAvailable]

JobHistory service startup error

emr:CloudMonitor:Maintenance[YARN.JobHistory.StartingError]

JobHistory status check failed

emr:CloudMonitor:Maintenance[YARN.JobHistory.StatusCheck.Fail]

Dead NodeManager node detected

emr:CloudMonitor:Maintenance[YARN.NodeManager.DeadNodeDetected]

NodeManager failed to start RebootingNodeStatusUpdater

emr:CloudMonitor:Maintenance[YARN.NodeManager.ErrorRebootingNodeStatusUpdater]

NodeManager GC check failed (P0)

emr:CloudMonitor:Maintenance[YARN.NodeManager.GcCheckP0.Fail]

Lost NodeManager nodes exist

emr:CloudMonitor:Maintenance[YARN.NodeManager.LostNodesExist]

NodeManager OOM occurred

emr:CloudMonitor:Maintenance[YARN.NodeManager.OOM]

NodeManager startup error

emr:CloudMonitor:Maintenance[YARN.NodeManager.StartingError]

NodeManager status check failed

emr:CloudMonitor:Maintenance[YARN.NodeManager.StatusCheck.Fail]

NodeManager unhealthy because of disk failure

emr:CloudMonitor:Maintenance[YARN.NodeManager.UnHealthyForDiskFailed]

Unhealthy nodes exist in YARN

emr:CloudMonitor:Maintenance[YARN.NodeManager.UnHealthyNodesExist]

ResourceManager active/standby switchover occurred

emr:CloudMonitor:Maintenance[YARN.ResourceManager.ActiveStandbySwitch]

Both ResourceManager nodes are in the Active state

emr:CloudMonitor:Maintenance[YARN.ResourceManager.BothInActive]

Both ResourceManager nodes are in the Standby state

emr:CloudMonitor:Maintenance[YARN.ResourceManager.BothInStandby]

ResourceManager unable to transition to the Active state

emr:CloudMonitor:Maintenance[YARN.ResourceManager.CouldNotTransitionToActive]

ResourceManager startup error

emr:CloudMonitor:Maintenance[YARN.ResourceManager.ErrorInStarting]

Error occurred while ResourceManager transitioned to Active mode

emr:CloudMonitor:Maintenance[YARN.ResourceManager.ErrorInTransitionToActiveMode]

ResourceManager exited unexpectedly

emr:CloudMonitor:Maintenance[YARN.ResourceManager.ExitUnexpected]

ResourceManager GC check failed (P0)

emr:CloudMonitor:Maintenance[YARN.ResourceManager.GcCheckP0.Fail]

ResourceManager GC check failed (P1)

emr:CloudMonitor:Maintenance[YARN.ResourceManager.GcCheckP1.Fail]

ResourceManager invalid configuration: RM_HA_ID cannot be found

emr:CloudMonitor:Maintenance[YARN.ResourceManager.InvalidConf.CannotFoundRMHAID]

ResourceManager OOM occurred

emr:CloudMonitor:Maintenance[YARN.ResourceManager.OOM]

YARN.ResourceManager service port unavailable

emr:CloudMonitor:Maintenance[YARN.ResourceManager.PortUnAvailable]

ResourceManager restart status check failed

emr:CloudMonitor:Maintenance[YARN.ResourceManager.RestartCheck.Fail]

ResourceManager status check failed

emr:CloudMonitor:Maintenance[YARN.ResourceManager.StatusCheck.Fail]

ResourceManager UnknownHostException occurred

emr:CloudMonitor:Maintenance[YARN.ResourceManager.UnkownHostException]

ZKRMStateStore in the YARN service unable to connect to ZooKeeper

emr:CloudMonitor:Maintenance[YARN.ResourceManager.ZKRMStateStoreCannotConnectZK]

YARN service status check failed

emr:CloudMonitor:Maintenance[YARN.StatusCheck.Fail]

TimelineServer startup error

emr:CloudMonitor:Maintenance[YARN.TimelineServer.ErrorInStarting]

TimelineServer exited unexpectedly

emr:CloudMonitor:Maintenance[YARN.TimelineServer.ExistUnexpectedly]

YARN.TimelineServer port unavailable

emr:CloudMonitor:Maintenance[YARN.TimelineServer.PortUnAvailable]

WebAppProxy status check failed

emr:CloudMonitor:Maintenance[YARN.WebAppProxy.StatusCheck.Fail]

YARN.WebAppProxyServer service port unavailable

emr:CloudMonitor:Maintenance[YARN.WebAppProxyServer.PortUnAvailable]

ZEPPELIN service status check failed

emr:CloudMonitor:Maintenance[ZEPPELIN.Server.StatusCheck.Fail]

ZEPPELIN component status check failed

emr:CloudMonitor:Maintenance[ZEPPELIN.ServerCheck.Fail]

ZooKeeper ClientPort unavailable

emr:CloudMonitor:Maintenance[ZOOKEEPER.ClientPortUnAvailable]

ZOOKEEPER cluster status check failed

emr:CloudMonitor:Maintenance[ZOOKEEPER.ClusterStatusCheck.Fail]

ZOOKEEPER GC check failed

emr:CloudMonitor:Maintenance[ZOOKEEPER.GcCheckP0.Fail]

ZooKeeper leader/follower switchover occurred

emr:CloudMonitor:Maintenance[ZOOKEEPER.LeaderFollowerSwitch]

ZooKeeper LeaderPort unavailable

emr:CloudMonitor:Maintenance[ZOOKEEPER.LeaderPortUnAvailable]

ZOOKEEPER peer port unavailable

emr:CloudMonitor:Maintenance[ZOOKEEPER.PeerPortUnAvailable]

ZOOKEEPER process status check failed

emr:CloudMonitor:Maintenance[ZOOKEEPER.StatusCheck.Fail]

ZOOKEEPER unable to run QuorumServer

emr:CloudMonitor:Maintenance[ZOOKEEPER.UnableToRunQuorumServer]

Scaling activity failed

emr:CloudMonitor:Scaling[ScalingActivity:Failed]

Auto scaling rejected

emr:CloudMonitor:Scaling[ScalingActivity:Rejected]

Scaling activity timed out

emr:CloudMonitor:Scaling[ScalingActivity:Timeout]

Service component status

emr:CloudMonitor:StatusCheck

For an explanation of the parameters defined in the CloudEvents specification, see Event overview.
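
The type values listed in this topic appear to share one shape: the prefix emr, a channel such as ActionTrail or CloudMonitor, a category, and an optional detail in square brackets. The following Python sketch splits a type value along those lines so that a consumer can route events by category. The pattern is inferred from the listed values and is not a documented grammar. The names parse_emr_type, EmrEventType, and is_check_failure are hypothetical helpers introduced here for illustration, and treating a detail that ends in ".Fail" as a failed check is likewise an assumption based on the naming in the list.

```python
import re
from typing import NamedTuple, Optional

# Shape inferred from the type values in this topic (an assumption, not a spec):
#   emr:<channel>:<category>             e.g. emr:ActionTrail:ApiCall
#   emr:<channel>:<category>[<detail>]   e.g. emr:CloudMonitor:Maintenance[HDFS.NameNode.OOM]
_TYPE_RE = re.compile(
    r"^emr:(?P<channel>[^:\[\]]+):(?P<category>[^\[\]]+?)(?:\[(?P<detail>.+)\])?$"
)


class EmrEventType(NamedTuple):
    """Hypothetical container for the parts of a type value."""
    channel: str
    category: str
    detail: Optional[str]


def parse_emr_type(value: str) -> Optional[EmrEventType]:
    """Split a type value into its parts; return None if it does not fit the shape."""
    match = _TYPE_RE.match(value)
    if match is None:
        return None
    return EmrEventType(match["channel"], match["category"], match["detail"])


def is_check_failure(value: str) -> bool:
    """True for Maintenance events whose detail ends in '.Fail' (assumed convention)."""
    parsed = parse_emr_type(value)
    return (
        parsed is not None
        and parsed.category == "Maintenance"
        and parsed.detail is not None
        and parsed.detail.endswith(".Fail")
    )
```

For example, parse_emr_type("emr:CloudMonitor:Maintenance[HDFS.NameNode.OOM]") yields the channel CloudMonitor, the category Maintenance, and the detail HDFS.NameNode.OOM, while a value without brackets such as emr:ActionTrail:ApiCall yields a detail of None.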