Monitoring and analysis


Continuous visibility into your cloud environment is the foundation of workload security. This topic describes how to build a layered monitoring posture that covers networks, identities, configurations, and operations, and how to collect, store, and act on logs to detect threats and meet compliance requirements.

Without this visibility, threats go undetected, incidents are harder to investigate, and compliance is difficult to demonstrate. Monitoring must span every layer (network traffic, identity and permissions, resource configurations, and operational activity), because a gap at any single layer can let a threat go unnoticed.

Monitoring and control

Apply security monitoring at every layer of your environment. The following practices cover each layer and help you build a complete monitoring posture.

  • Network management: Build isolated, hierarchical networks so that similar resources are grouped and the blast radius of any unauthorized access is contained. For a virtual private cloud (VPC), use Elastic Compute Service (ECS) security groups, network ACLs, and flow logs to control and observe traffic. Apply Resource Access Management (RAM) access control policies to restrict who can interact with VPC resources.

  • Permission management: Assign the minimum permissions each user needs. Over-permissioned accounts are among the most common sources of accidental damage and deliberate attacks. Review role assignments regularly to ensure permissions stay appropriate as responsibilities change.

  • Configuration audit: At scale, resource configurations drift. Use Cloud Config to track configuration changes across your Alibaba Cloud account, maintain a configuration history, and perform real-time compliance auditing so deviations are caught before they become vulnerabilities.

  • Operation audit: Record user logons and resource access events across your account. Export these behavioral events to Simple Log Service (SLS) or Object Storage Service (OSS) for long-term retention, then use them for behavior analysis, security analysis, intrusion detection, resource change tracking, and compliance auditing.

  • DDoS protection: A distributed denial-of-service (DDoS) attack exhausts a target server's computing resources or network bandwidth, making services unavailable to legitimate users. Reduce your exposure with the following measures:

    • Reduce the attack surface by isolating resources and unrelated services.

    • Optimize your business architecture so that it can absorb sudden traffic surges.

    • Design for elastic scaling and disaster recovery failover using public cloud capabilities.

    • Establish business monitoring and emergency response plans.

    • Use Anti-DDoS products.

  • Intrusion detection: Early detection limits the damage from data breaches and system compromise. Configure detection and alerting mechanisms that identify threats to your ECS instances and other Alibaba Cloud resources, such as traffic from malicious IP addresses and signs that an asset has been compromised.
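The least-privilege review described under Permission management can be partly automated. The following Python sketch scans a policy document for Allow statements that grant wildcard actions on all resources. It assumes the JSON layout used by RAM policies (`Statement`, `Effect`, `Action`, `Resource`); the function name `find_overly_broad_statements`, the flagging rule, and the bucket name in the example are illustrative assumptions, not an Alibaba Cloud API or an official check.

```python
import json
from typing import Any


def _as_list(value: Any) -> list:
    """Policy fields may hold a single string or a list of strings."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_overly_broad_statements(policy_json: str) -> list[dict]:
    """Return Allow statements that grant wildcard actions on all resources.

    Hypothetical helper: field names follow the RAM policy layout, but the
    rule itself is only an illustrative least-privilege heuristic.
    """
    policy = json.loads(policy_json)
    flagged = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements never widen permissions
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action and "*" in resources:
            flagged.append(stmt)
    return flagged


if __name__ == "__main__":
    sample = json.dumps({
        "Version": "1",
        "Statement": [
            {"Effect": "Allow", "Action": "ecs:*", "Resource": "*"},
            # Hypothetical bucket name, scoped to read-only object access.
            {"Effect": "Allow", "Action": ["oss:GetObject"],
             "Resource": ["acs:oss:*:*:example-bucket/*"]},
        ],
    })
    for stmt in find_overly_broad_statements(sample):
        print("Overly broad statement:", stmt["Action"])
```

A heuristic like this only surfaces candidates for review; whether a broad grant is justified still depends on the user's actual responsibilities.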
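Cloud Config performs configuration tracking and compliance evaluation as a managed service. To illustrate the underlying idea only, the following Python sketch compares a recorded baseline with a resource's current configuration and checks it against simple expected-value rules. The helper names (`diff_configuration`, `evaluate_rules`) and configuration keys such as `PublicIp` are hypothetical and do not reflect the Cloud Config API or its rule format.

```python
from typing import Any


def diff_configuration(baseline: dict[str, Any], current: dict[str, Any]) -> dict[str, list[str]]:
    """Report keys that were added, removed, or changed since the baseline."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(
            k for k in baseline.keys() & current.keys() if baseline[k] != current[k]
        ),
    }


def evaluate_rules(config: dict[str, Any], rules: dict[str, Any]) -> list[str]:
    """Return the keys whose current value differs from the value a rule expects."""
    return sorted(key for key, expected in rules.items() if config.get(key) != expected)


if __name__ == "__main__":
    # Hypothetical configuration snapshots for a single resource.
    baseline = {"SecurityGroupIngress": "10.0.0.0/8", "PublicIp": False}
    current = {"SecurityGroupIngress": "0.0.0.0/0", "PublicIp": True, "Tags": "temp"}
    print("Drift:", diff_configuration(baseline, current))
    print("Rule violations:", evaluate_rules(current, {"PublicIp": False}))
```

Keeping the diff separate from the rule check mirrors the distinction in the text: configuration history tells you what changed, while compliance rules tell you whether the change matters.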
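One basic intrusion-detection signal mentioned above is traffic from known malicious IP addresses. The following Python sketch matches the source addresses of flow-log records against a blocklist of addresses and CIDR ranges using the standard `ipaddress` module. The `FlowRecord` structure is a simplified, hypothetical stand-in for real VPC flow-log fields, and the blocklist entries are documentation-reserved example ranges rather than real threat intelligence.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical, simplified flow-log entry (real flow logs carry more fields)."""
    src_ip: str
    dst_ip: str
    dst_port: int


def load_blocklist(entries: Iterable[str]) -> list[Network]:
    """Parse single addresses and CIDR ranges; a bare address becomes a /32 or /128."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def find_malicious_flows(records: Iterable[FlowRecord], blocklist: list[Network]) -> list[FlowRecord]:
    """Return records whose source address falls inside any blocklisted network."""
    hits = []
    for record in records:
        src = ipaddress.ip_address(record.src_ip)
        # Membership tests between IPv4 and IPv6 objects simply return False.
        if any(src in network for network in blocklist):
            hits.append(record)
    return hits


if __name__ == "__main__":
    blocklist = load_blocklist(["203.0.113.0/24", "198.51.100.7"])
    flows = [
        FlowRecord("203.0.113.45", "10.0.0.5", 22),
        FlowRecord("192.0.2.10", "10.0.0.5", 443),
    ]
    for hit in find_malicious_flows(flows, blocklist):
        print(f"Suspicious: {hit.src_ip} -> {hit.dst_ip}:{hit.dst_port}")
```

A linear scan is enough to show the matching logic; a production detector handling large blocklists would typically use a prefix-tree or a managed threat-detection service instead.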

Logging and alerting

Logs are your primary evidence source for security investigations, audits, and compliance verification. Without reliable logs, you cannot establish what happened, when it happened, or who was responsible.

A common failure mode is collecting logs without managing them well: logs stored without access controls, retained for too long or deleted too soon, or spread across sources with no common format for analysis. Follow a structured log management approach that covers collection, storage, query, analysis, and alerting.

  • Log collection: Collect logs from all cloud resources, services, and applications. Keep the collection process non-intrusive to minimize performance impact on production systems.

  • Secure storage: Set a retention period based on your security and compliance requirements, factoring in each cloud product's log characteristics. Storage must be tamper-proof, with write and delete permissions strictly controlled.

  • Log query: Choose a query mechanism that meets your operational, business, and security requirements, including the ability to retrieve relevant logs quickly during an incident investigation.

  • Log analysis: Normalize logs from different sources into a common format before analysis. Without normalization, correlating events across services is error-prone and slow. Once normalized, cross-source analysis becomes practical and reliable.

  • Alert generation: Alerts must be real-time, accurate, and reliably delivered. Use multiple notification methods so that an alert still reaches responders if one channel fails, and prepare a workable emergency response plan for each type of alert.
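Access control is the primary defense for log storage, but tamper evidence helps you detect changes that slip past it. The following Python sketch shows one such technique, a SHA-256 hash chain in which each entry's hash covers the previous entry's hash, so altering any stored record invalidates the chain from that point on. This is an illustrative assumption, not a feature of SLS or OSS; the names `append_entry` and `first_tampered_index` are hypothetical.

```python
import hashlib
import json
from typing import Optional

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so a record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], record: dict) -> None:
    """Append a record, linking its hash to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({"record": record, "hash": _entry_hash(prev_hash, record)})


def first_tampered_index(chain: list[dict]) -> Optional[int]:
    """Recompute every hash; return the index of the first mismatch, or None if intact."""
    prev_hash = GENESIS_HASH
    for index, entry in enumerate(chain):
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return index
        prev_hash = entry["hash"]
    return None


if __name__ == "__main__":
    chain: list[dict] = []
    for seq, action in enumerate(["login", "read", "logout"]):
        append_entry(chain, {"seq": seq, "action": action})
    print("Intact chain, first bad index:", first_tampered_index(chain))
    chain[1]["record"]["action"] = "delete"  # simulate tampering with a stored entry
    print("After tampering, first bad index:", first_tampered_index(chain))
```

Because anyone with write access can rebuild the entire chain, store the latest hash separately, under different permissions, so that a rebuilt chain can still be detected.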
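The following Python sketch illustrates the normalization step described under Log analysis: two hypothetical log sources that use different field names and timestamp formats are mapped onto one `NormalizedEvent` schema with timezone-aware UTC timestamps, after which events from both sources can be grouped by client IP address. The source layouts, field names, and function names are assumptions chosen for illustration, not the schema of any Alibaba Cloud log.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NormalizedEvent:
    timestamp: datetime  # timezone-aware, always UTC
    source: str          # which log stream the event came from
    actor: str
    action: str
    client_ip: str


def from_audit_log(raw: dict) -> NormalizedEvent:
    # Hypothetical audit-log layout: ISO 8601 UTC time string ending in "Z".
    ts = datetime.strptime(raw["eventTime"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return NormalizedEvent(ts, "audit", raw["userName"], raw["eventName"], raw["sourceIp"])


def from_access_log(raw: dict) -> NormalizedEvent:
    # Hypothetical web access-log layout: Unix epoch seconds.
    ts = datetime.fromtimestamp(raw["time"], tz=timezone.utc)
    return NormalizedEvent(ts, "access", raw["remote_user"], raw["request"], raw["remote_addr"])


PARSERS = {"audit": from_audit_log, "access": from_access_log}


def normalize(source: str, raw: dict) -> NormalizedEvent:
    """Dispatch a raw record to the parser registered for its source."""
    return PARSERS[source](raw)


def correlate_by_ip(events: list[NormalizedEvent]) -> dict[str, list[NormalizedEvent]]:
    """Group events from all sources by client IP, each group in time order."""
    grouped: dict[str, list[NormalizedEvent]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        grouped[event.client_ip].append(event)
    return dict(grouped)


if __name__ == "__main__":
    events = [
        normalize("audit", {"eventTime": "2024-05-01T08:00:00Z", "userName": "alice",
                            "eventName": "DeleteBucket", "sourceIp": "192.0.2.10"}),
        normalize("access", {"time": 1714550400, "remote_user": "alice",
                             "request": "GET /admin", "remote_addr": "192.0.2.10"}),
    ]
    for ip, group in correlate_by_ip(events).items():
        print(ip, [(e.source, e.action, e.timestamp.isoformat()) for e in group])
```

Adding a new log source then means registering one more parser in `PARSERS`; the correlation logic does not change.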
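The following Python sketch shows one way to combine reliable delivery with per-alert response plans: each alert is tried on a list of notification channels in order until one succeeds, and the message includes the response plan for that alert type. The channels are plain callables standing in for real SMS, email, or webhook integrations, and the `RESPONSE_PLANS` mapping, the plan texts, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Alert:
    alert_type: str
    severity: str
    message: str


# Hypothetical mapping from alert type to the response plan responders should follow.
RESPONSE_PLANS = {
    "malicious_ip": "Block the source IP and review affected instances.",
    "config_drift": "Compare against the baseline and roll back unapproved changes.",
}

# A channel sends one message and raises an exception if delivery fails.
Channel = Callable[[str], None]


def dispatch(alert: Alert, channels: list[tuple[str, Channel]]) -> Optional[str]:
    """Try each channel in order; return the name of the first that succeeds, else None."""
    plan = RESPONSE_PLANS.get(alert.alert_type, "No plan defined: escalate to the on-call lead.")
    text = f"[{alert.severity}] {alert.message} | Plan: {plan}"
    for name, send in channels:
        try:
            send(text)
            return name
        except Exception:  # any delivery failure falls through to the next channel
            continue
    return None


if __name__ == "__main__":
    def sms_gateway_down(text: str) -> None:
        raise ConnectionError("SMS gateway unreachable")

    alert = Alert("malicious_ip", "HIGH", "Traffic from blocklisted IP 203.0.113.45")
    delivered_via = dispatch(alert, [("sms", sms_gateway_down), ("email", print)])
    print("Delivered via:", delivered_via)
```

Returning `None` when every channel fails gives the caller a clear signal to escalate, which is exactly the case an emergency response plan should cover.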