Event_code Service_name Component_name Level Description
EMR-330200052 HDFS DataNode CRITICAL datanode volume failure >=5
EMR-330300042 YARN NodeManager CRITICAL unhealthy nodemanager num > 0
EMR-330200051 HDFS NameNode CRITICAL corrupt blocks occurred.
EMR-330200050 HDFS NameNode CRITICAL missing blocks occurred.
EMR-330100027 HOST HOST CRITICAL low absolute free memory(<200M for 30 minutes).
EMR-230100026 HOST HOST WARN low absolute free memory(<500M).
EMR-230300038 YARN ResourceManager WARN resourcemanager port(8032) unavailable(can not access in 5 seconds).
EMR-230300040 YARN NodeManager WARN nodemanager http port(8042) unavailable(can not access in 5 seconds).
EMR-330500059 ZOOKEEPER ZOOKEEPER CRITICAL ZK_SERVER_STATE changed(leader/follower switch).
EMR-230500058 ZOOKEEPER ZOOKEEPER WARN high ZK_OPEN_FILE_DESCRIPTOR_COUNT(ZK_OPEN_FILE_DESCRIPTOR_COUNT/ZK_MAX_FILE_DESCRIPTOR_COUNT>80%).
EMR-230500057 ZOOKEEPER ZOOKEEPER WARN high ZK_OUTSTANDING_REQUESTS(>5).
EMR-230500056 ZOOKEEPER ZOOKEEPER WARN high ZK_MAX_LATENCY value(>10000ms).
EMR-230500055 ZOOKEEPER ZOOKEEPER WARN high ZK_AVG_LATENCY value(>20ms).
EMR-230500054 ZOOKEEPER ZOOKEEPER WARN too many alive zk connections(>1000).
EMR-330300053 YARN ResourceManager CRITICAL resourcemanager ACTIVE/STANDBY switch occurred.
EMR-330300052 YARN ResourceManager CRITICAL two resourcemanagers are both in STANDBY state.
EMR-330300051 YARN ResourceManager CRITICAL two resourcemanagers are both in ACTIVE state.
EMR-230300050 YARN NodeManager WARN there are lost nodemanagers.
EMR-330200049 HDFS NameNode CRITICAL namenode ACTIVE/STANDBY switch occurred.
EMR-230300036 YARN ResourceManager WARN resourcemanager webapp port(8088) unavailable(can not access in 5 seconds).
EMR-330200048 HDFS NameNode CRITICAL namenode in SAFEMODE for more than 30 minutes.
EMR-230200047 HDFS NameNode WARN namenode in SAFEMODE for more than 10 minutes.
EMR-330200046 HDFS NameNode CRITICAL two namenodes are both in STANDBY state.
EMR-330200045 HDFS NameNode CRITICAL two namenodes are both in ACTIVE state.
EMR-230200044 HDFS DateNode WARN there are volume failures on datanode.
EMR-330200043 HDFS DataNode CRITICAL more than 1/3 datanodes are dead.
EMR-230200042 HDFS DataNode WARN there are some dead datanodes.
EMR-230200041 HDFS NameNode WARN datanode dfs capacity almost used up.(>95%)
EMR-330200040 HDFS NameNode CRITICAL cluster dfs capacity almost used up.(>95%)
EMR-230200040 HDFS NameNode WARN cluster dfs capacity almost used up.(>90%)
EMR-230200026 HDFS NameNode WARN namenode http port(50070) unavailable(can not access in 5 seconds).
EMR-230100025 HOST HOST WARN too many processes(>30000) on worker host.
EMR-230100017 HOST HOST WARN worker host high cup usage(>90%).
EMR-330100015 HOST HOST CRITICAL worker host consecutively high cup usage(>95% for 12h).
EMR-230100003 HOST HOST WARN low disk space for /mnt/disk1(<10%).
EMR-330100002 HOST HOST CRITICAL low disk space for /mnt/disk1(<10%).
EMR-230100002 HOST HOST WARN low disk space for /mnt/disk1(<20%).
EMR-220200007 HOST HOST CRITICAL component down is reported by emr agent more than once in recent 15 minutes.
EMR-220200006 HOST HOST WARN component down is reported by emr agent.
EMR-331200089 PRESTO PrestoWorker CRITICAL presto worker http port(9090) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-331200087 PRESTO PrestoMaster CRITICAL presto master http port(9090) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-331100085 OOZIE OOZIE CRITICAL oozie admin port(11001) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-331100083 OOZIE OOZIE CRITICAL oozie http port(11000) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-331000081 ZEPPELIN Zeppelin CRITICAL zeppelin port(8080) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330900079 SPARK SparkHistory CRITICAL spark history server ui port(18080) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330800077 HUE Hue CRITICAL hue port(8888) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330700075 STORM UI CRITICAL nimbus ui port(9999) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330700073 STORM Nimbus CRITICAL nimbus thrift port(6627) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330600071 HBASE ThriftServer CRITICAL thriftserver jmx port(10103) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330600069 HBASE ThriftServer CRITICAL thriftserver info port(9095) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330600067 HBASE ThriftServer CRITICAL thriftserver port(9099) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330600065 HBASE HRegionServer CRITICAL hregionserver jmx port(10102) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330600063 HBASE HRegionServer CRITICAL hregionserver http port(16030) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330600061 HBASE HRegionServer CRITICAL hregionserver ipc port(16020) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330600059 HBASE HMaster CRITICAL hmaster jmx port(10101) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330600057 HBASE HMaster CRITICAL hmaster http port(16010) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330600055 HBASE HMaster CRITICAL hmaster ipc port(16000) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330500053 ZOOKEEPER ZOOKEEPER CRITICAL zk peer port(2888) unavailable(can not access in 5 seconds) on leader host last for 5 minutes.
EMR-330500051 ZOOKEEPER ZOOKEEPER CRITICAL zk leader port(3888) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330500049 ZOOKEEPER ZOOKEEPER CRITICAL zk client port(2181) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330400047 HIVE HiveServer CRITICAL hveiveserver2 webui port(10002) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330400045 HIVE HiveServer CRITICAL hveiveserver2 port(10000) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330400043 HIVE HiveMetaStore CRITICAL hive metastore port(9083) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330300049 YARN WebAppProxyServer CRITICAL webappproxy server port(20888) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330300047 YARN TimeLineServer CRITICAL timeline server webapp port(8188) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330300045 YARN TimeLineServer CRITICAL timeline server port(10200) unavailable(can not access in 5 seconds) last for 5 minutes.
EMR-330300043 YARN JobHistory CRITICAL jobhistory server port(10020) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330300041 YARN NodeManager CRITICAL nodemanager http port(8042) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330300039 YARN ResourceManager CRITICAL resourcemanager port(8032) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330300037 YARN ResourceManager CRITICAL resourcemanager webapp port(8088) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330200039 HDFS ZKFC CRITICAL zkfc port(8019) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330200038 HDFS JournalNode CRITICAL journalnode http port(8480) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330200037 HDFS JournalNode CRITICAL journalnode rpc port(8485) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330200035 HDFS DataNode CRITICAL datanode http port(50075) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330200033 HDFS DataNode CRITICAL datanode ipc port(50020) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330200031 HDFS DataNode CRITICAL datanode port(50010) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330200029 HDFS NameNode CRITICAL namenode ipc port(9000/8020) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330200027 HDFS NameNode CRITICAL namenode http port(50070) unavailable(can not access in 5 seconds) for 5 minutes.
EMR-330100024 HOST HOST CRITICAL too many processes(>30000) on master host.
EMR-230100024 HOST HOST WARN too many processes(>20000) on master host.
EMR-230100021 HOST HOST WARN high load((load_one+load_five+load_fifteen)/(3*cpu_num)>2.0).
EMR-330100018 HOST HOST CRITICAL high memory usage(>95%).
EMR-230100018 HOST HOST WARN high memory usage(>85%).
EMR-230100016 HOST HOST WARN master host high cup usage(>90%).
EMR-330100014 HOST HOST CRITICAL master host consecutively high cup usage(>90% for 12h).
EMR-330100011 HOST HOST CRITICAL low disk device space(<5%).
EMR-230100011 HOST HOST WARN low disk device space(<15%).
EMR-330100008 HOST HOST CRITICAL high %util reported by iostat(>95%).
EMR-230100008 HOST HOST WARN high %util reported by iostat(>90%).
EMR-330100004 HOST HOST CRITICAL high await reported by iostat(>200ms).
EMR-230100004 HOST HOST WARN high await reported by iostat(>100ms).
EMR-330100001 HOST HOST CRITICAL low rootfs disk space(<10%).
EMR-230100001 HOST HOST WARN low rootfs disk space(<20%).

日志异常检测,异常告警事件描述说明:

Event_code Service_name Component_name Level Description
EMR-350100001 HOST HOST CRITICAL OOM occurred in /var/log/message.
EMR-350100002 HOST HOST CRITICAL vm host boot up.
EMR-350100003 HOST HOST CRITICAL vm host shut down.
EMR-150200001 HDFS NameNode INFO starting namenode.
EMR-250200002 HDFS NameNode WARN shutting down namenode.
EMR-250200003 HDFS NameNode WARN state chage: safe mode is on.
EMR-150200004 HDFS NameNode INFO state chage: safe mode is off.
EMR-350200005 HDFS NameNode CRITICAL directory formatted.
EMR-350200006 HDFS NameNode CRITICAL load fsimage execption.
EMR-350200007 HDFS NameNode CRITICAL sync journal failed
EMR-350200008 HDFS NameNode CRITICAL namenode exit unexpectedly(exit code is not 0).
EMR-250200009 HDFS NameNode WARN connection refused exception occurred.
EMR-250200010 HDFS NameNode WARN unknown host exception occurred.
EMR-250200011 HDFS NameNode WARN connection reset by peer.
EMR-350200012 HDFS NameNode CRITICAL OOM occurred in namenode.
EMR-350200013 HDFS NameNode CRITICAL low available disk space on namenode
EMR-350200014 HDFS NameNode CRITICAL resources are low on NN.
EMR-350200015 HDFS NameNode CRITICAL namenode write to journalnode timeout.
EMR-150201001 HDFS DataNode INFO starting datanode.
EMR-250201002 HDFS DataNode WARN shutting down datanode.
EMR-350201003 HDFS DataNode CRITICAL exception occurred in secureMain.
EMR-350201004 HDFS DataNode CRITICAL datanode exit unexpectedly(exit code is not 0).
EMR-250201005 HDFS DataNode WARN unknown host exception occurred.
EMR-250201006 HDFS DataNode WARN connection reset by peer.
EMR-250201007 HDFS DataNode WARN connection refused exception occurred.
EMR-350202001 HDFS ZKFC CRITICAL unable to start zkfc.
EMR-350202002 HDFS ZKFC CRITICAL unable to connect to ZK quorum.
EMR-250202003 HDFS ZKFC WARN namenode entered state: SERVICE_NOT_RESPONDING.
EMR-150202004 HDFS ZKFC WARN namenode entered state: SERVICE_HEALTHY.
EMR-350202005 HDFS ZKFC CRITICAL namenode active/standby state switch occurred.
EMR-350202006 HDFS ZKFC CRITICAL transport-level exception trying to monitor health of namenode.
EMR-150203001 HDFS JournalNode INFO starting journalnode.
EMR-250203002 HDFS JournalNode WARN shutting down journalnode.
EMR-350300001 YARN ResourceManager CRITICAL unknown host exception occurred.
EMR-150300002 YARN ResourceManager INFO starting resourcemanager.
EMR-250300003 YARN ResourceManager WARN shutting down resourcemanager.
EMR-350300004 YARN ResourceManager CRITICAL invalid configuration
EMR-250300005 YARN ResourceManager WARN transitioning to standby state.
EMR-250300006 YARN ResourceManager WARN transitioned to standby state.
EMR-250300007 YARN ResourceManager WARN transitioning to active state.
EMR-250300008 YARN ResourceManager WARN transitioned to active state.
EMR-350300009 YARN ResourceManager CRITICAL resourcemanager could not transition to active.
EMR-350300010 YARN ResourceManager CRITICAL error when transitioning to active mode.
EMR-350300011 YARN ResourceManager CRITICAL error when starting resourcemanager.
EMR-350300012 YARN ResourceManager CRITICAL maximum-am-resource-percent is insufficient.
EMR-350300013 YARN ResourceManager CRITICAL resourcemanager exit unexpected(exit code is not 0).
EMR-350300014 YARN ResourceManager CRITICAL active-standby elector failed to connect ZK.
EMR-350300015 YARN ResourceManager CRITICAL ZKRMStateStore can not connect ZK.
EMR-350300016 YARN ResourceManager CRITICAL OOM occurred in resourcemanager.
EMR-150301001 YARN NodeManager INFO starting nodemanager.
EMR-250301002 YARN NodeManager WARN shutting down nodemanager.
EMR-250301003 YARN NodeManager WARN connection reset by peer.
EMR-250301004 YARN NodeManager WARN unknown host exception occurred.
EMR-250301005 YARN NodeManager WARN connection refused exception occurred.
EMR-350301006 YARN NodeManager CRITICAL error while rebooting node-status-updater.
EMR-350301007 YARN NodeManager CRITICAL dead datanode detected.
EMR-350301008 YARN NodeManager CRITICAL error when starting nodemanager.
EMR-250301009 YARN NodeManager CRITICAL running beyond physical memory limits.
EMR-150302001 YARN JobHistory INFO starting JobHistoryServer.
EMR-250302002 YARN JobHistory WARN shutting down jobhistoryserver.
EMR-350302003 YARN JobHistory CRITICAL jobhistoryserver exit unexpectedly(exit code is not 0).
EMR-350302004 YARN JobHistory CRITICAL error when starting jobhistoryserver.
EMR-250302005 YARN JobHistory WARN network is unreachable.
EMR-250302006 YARN JobHistory WARN connection refused exception occurred.
EMR-250302007 YARN JobHistory WARN connection reset by peer.
EMR-150303001 YARN TimeLineServer INFO starting ApplicationistoryServer.
EMR-150303002 YARN TimeLineServer WARN shutting down ApplicationHistoryServer.
EMR-350303003 YARN TimeLineServer CRITICAL timelineserver(ApplicationHistoryServer) exit unexpectedly(exit code is not 0).
EMR-350303004 YARN TimeLineServer CRITICAL error when starting ApplicationHistoryServer.
EMR-150400000 HIVE HiveMetaStore INFO starting hive metastore.
EMR-250400001 HIVE HiveMetaStore WARN shutting down hive metastore.
EMR-350400002 HIVE HiveMetaStore CRITICAL parse conf error.
EMR-250400003 HIVE HiveMetaStore WARN partition column may contains non-ascii char.
EMR-250400004 HIVE HiveMetaStore WARN metastore schema dismatch.
EMR-350400005 HIVE HiveMetaStore CRITICAL metastore database max user connection exceeded.
EMR-350400006 HIVE HiveMetaStore CRITICAL metastore database max questions exceeded.
EMR-350400007 HIVE HiveMetaStore CRITICAL metastore database max updates exceeded.
EMR-350400008 HIVE HiveMetaStore CRITICAL required table missing in metastore database.
EMR-350400009 HIVE HiveMetaStore CRITICAL metastore database connection failed.
EMR-350400010 HIVE HiveMetaStore CRITICAL metastore database communications link failure.
EMR-250400011 HIVE HiveMetaStore WARN unknown host exception occurred.
EMR-250400012 HIVE HiveMetaStore WARN connection refused exception occurred.
EMR-250400013 HIVE HiveMetaStore WARN connection reset by peer.
EMR-350400014 HIVE HiveMetaStore CRITICAL OOM occurred in hive metastore.
EMR-350400015 HIVE HiveMetaStore CRITICAL metastore database disk quota used up.
EMR-150401001 HIVE HiveServer2 INFO starting HiveServer2.
EMR-250401002 HIVE HiveServer2 WARN shutting down HiveServer2.
EMR-350401003 HIVE HiveServer2 CRITICAL too many dynamic partitions.
EMR-350401004 HIVE HiveServer2 CRITICAL connect to zk timeout.
EMR-350401005 HIVE HiveServer2 CRITICAL hiveserver2 failed to connect to the metastore server.
EMR-350401006 HIVE HiveServer2 CRITICAL failed init metastore client.
EMR-350401007 HIVE HiveServer2 CRITICAL can not connecto to metastore using any of the URIs provided.
EMR-350401008 HIVE HiveServer2 CRITICAL error starting hiveserver.
EMR-250401009 HIVE HiveServer2 WARN connection refused exception occurred.
EMR-250401010 HIVE HiveServer2 WARN unknown host exception occurred.
EMR-250401011 HIVE HiveServer2 WARN connection reset by peer.
EMR-350401012 HIVE HiveServer2 CRITICAL error parsing conf.
EMR-350401013 HIVE HiveServer2 CRITICAL OOM occurred in hiveserver2.
EMR-350900001 SPARK SparkHistory CRITICAL OOM occurred in sparkhistory.
EMR-350500001 ZOOKEEPER ZOOKEEPER CRITICAL unable to run quorum server.
EMR-250500002 ZOOKEEPER ZOOKEEPER WARN connection refused exception occurred.
EMR-350500003 ZOOKEEPER ZOOKEEPER CRITICAL too many connection from zk client.
EMR-250500004 ZOOKEEPER ZOOKEEPER WARN zk node shut down.
EMR-250500005 ZOOKEEPER ZOOKEEPER WARN zk fsync latency.
EMR-351000001 ZEPPELIN ZEPPELIN CRITICAL insufficient momery