Problem management

Last updated: 2026-06-03 19:06:59

A problem aggregates related alert events into a single unit based on grouping rules, enabling centralized lifecycle management and notifications.

Overview

Problem management provides:

  • Alert event aggregation: Groups related events into problems based on grouping rules.

  • Lifecycle management: Tracks problems from creation to resolution.

  • Escalation notifications: Supports multi-stage escalation policies and repeated notifications.

  • Collaborative handling: Supports claiming and transferring problems, and adding related personnel.
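The aggregation idea above can be sketched in a few lines of Python. This is only an illustrative model, not the Cloud Monitor 2.0 API: the `AlertEvent` fields, the `Problem` class, and the `group_by` parameter are hypothetical names chosen for the example, and the grouping rule is assumed to be "events with identical values for the chosen fields belong to the same problem".

```python
from dataclasses import dataclass, field


@dataclass
class AlertEvent:
    # Hypothetical event shape; the real event schema is not described here.
    rule_name: str
    resource: str
    severity: str


@dataclass
class Problem:
    # The tuple of field values that all events in this problem share.
    group_key: tuple
    events: list = field(default_factory=list)


def group_events(events, group_by=("rule_name", "resource")):
    """Aggregate events that share the same values for the group_by fields."""
    problems = {}
    for event in events:
        key = tuple(getattr(event, name) for name in group_by)
        if key not in problems:
            problems[key] = Problem(group_key=key)
        problems[key].events.append(event)
    # Dicts preserve insertion order, so problems come back in first-seen order.
    return list(problems.values())


events = [
    AlertEvent("cpu_high", "host-a", "Critical"),
    AlertEvent("cpu_high", "host-a", "Warning"),
    AlertEvent("cpu_high", "host-b", "Critical"),
]
problems = group_events(events)
for problem in problems:
    print(problem.group_key, len(problem.events))
```

With these sample events, the two `host-a` events fall into one problem and the `host-b` event into another, so notifications and handling can target two problems instead of three separate events.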

View problems

  1. Log on to the Cloud Monitor 2.0 console, select the target workspace, and in the left-side navigation pane, choose Alert Center > Notification Management > Problem Management.

  2. The page lists all problems and their statuses. Filter by:

    • Severity: Critical, Error, Warning, or Normal.

    • Problem title / ID: Title is determined by the alert rule or event subscription name. ID is auto-generated and unique.

    • Notification policy: Notification policy that triggered the alert.

    • Assignee: Person assigned to resolve the problem.

    • Creation time: Time when the problem was created.

    • Resolution status: Current status. Possible values:

      • open: Problem is being processed and continues to receive alert events.

      • resolved: Problem was manually resolved.

      • recovered: Problem was auto-resolved after no new events were received within a specified period.

    • Actions: Available actions, such as claim and resolve.

  3. The Problem Details page shows basic information, affected objects, entity topology, problem content, root cause analysis, events, and the activity log. Available actions:

    • For an unresolved problem, claim or resolve it, assign it, or change its severity.

    • View the root cause identified by the Cloud Monitor 2.0 StarOps agent.

    • Review the Events and Activities tabs:

      • Events: Lists alert events with their creation times and statuses. Click an event name to view details.

      • Activities: Shows the problem activity log.
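The three resolution statuses described above (open, resolved, recovered) can be read as a small state model. The sketch below is an assumption-laden illustration, not the service's implementation: `ProblemState`, `next_status`, and the 30-minute `RECOVERY_WINDOW` are hypothetical, since the source only says that recovery happens after "a specified period" without new events. It also assumes that resolved and recovered are final states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ProblemState:
    status: str  # "open", "resolved", or "recovered"
    last_event_at: datetime  # time the most recent alert event was received


# Hypothetical quiet period; the real value is configured in the service.
RECOVERY_WINDOW = timedelta(minutes=30)


def next_status(problem, now, manually_resolved=False, window=RECOVERY_WINDOW):
    """Return the status the problem would have at time `now`."""
    if problem.status != "open":
        # Assumption: resolved and recovered problems do not change again.
        return problem.status
    if manually_resolved:
        return "resolved"
    if now - problem.last_event_at >= window:
        # No new events arrived within the window, so the problem auto-recovers.
        return "recovered"
    return "open"


last_event = datetime(2026, 1, 1, 12, 0)
problem = ProblemState(status="open", last_event_at=last_event)
print(next_status(problem, last_event + timedelta(minutes=5)))
print(next_status(problem, last_event + timedelta(minutes=45)))
```

The point of the model is the distinction between the two closed states: resolved comes from a person acting on the problem, while recovered comes from the absence of new events.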

Handle problems

Claim or resolve unresolved problems, assign them, or change their severity.

  1. On the Problems page, click Associate Operator in the upper-right corner.

    • The operator name is the DingTalk nickname. Because multiple users may share one Alibaba Cloud account, you must associate an operator to identify who handles each alert.

  2. Scan the QR code with DingTalk and bind your mobile number.

  3. In the row of the target problem, or on the Problem Details page that opens when you click the problem, do one of the following:

    • Click Claim to assign the problem to yourself.

    • Click Resolve to close the problem.
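The effect of the Claim and Resolve actions above can be modeled as follows. This is a minimal sketch under stated assumptions, not the console's actual logic: `ProblemRecord`, `claim`, and `resolve` are hypothetical names, and the rules that an operator must be associated first and that only open problems can be acted on are inferred from the steps in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemRecord:
    problem_id: str
    status: str = "open"
    assignee: Optional[str] = None


def _check(problem, operator):
    # Assumption: an associated operator is required to attribute the action.
    if not operator:
        raise ValueError("associate an operator before handling problems")
    # Assumption: only unresolved (open) problems can be claimed or resolved.
    if problem.status != "open":
        raise ValueError(f"problem {problem.problem_id} is already {problem.status}")


def claim(problem, operator):
    """Assign the problem to the operator who claims it."""
    _check(problem, operator)
    problem.assignee = operator


def resolve(problem, operator):
    """Close the problem manually."""
    _check(problem, operator)
    problem.status = "resolved"


record = ProblemRecord(problem_id="p-1")
claim(record, "alice")    # "alice" stands in for a DingTalk nickname
resolve(record, "alice")
print(record)
```

Keeping the operator as an explicit argument mirrors the reason given for associating an operator: several people may share one Alibaba Cloud account, so the account alone does not identify who handled the problem.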
