Automate resource remediation with MNS notifications


When Cloud Config detects a non-compliant resource configuration, it can deliver a non-compliance event to a specified Simple Message Queue (MNS) topic. You can use Function Compute to subscribe to the topic and automatically remediate non-compliant resources based on these events.

Prerequisites

Use case

Securing all Object Storage Service (OSS) buckets in your Alibaba Cloud environment is critical. The test-oss-bucket-public-read-prohibited rule, which is created from the oss-bucket-public-read-prohibited managed rule in Cloud Config, checks whether the ACL of each OSS bucket allows public read access. The rule scans all OSS buckets in your account and evaluates buckets that allow public read access as Non-compliant. The following figure shows a resource that is evaluated as Non-compliant.

[Figure: An OSS bucket evaluated as Non-compliant]
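The evaluation that the rule performs can be pictured as a simple classification of the bucket ACL value. The following Python sketch is illustrative only and is not the implementation of the managed rule. It assumes the ACL strings used by OSS (`private`, `public-read`, `public-read-write`) and assumes that `public-read-write` also counts as public read access. The bucket `app-logs` is hypothetical.

```python
# Illustrative sketch only: not the implementation of the
# oss-bucket-public-read-prohibited managed rule.
# Assumption: an ACL that grants public read access ("public-read" or
# "public-read-write") makes the bucket non-compliant.
PUBLIC_READ_ACLS = {"public-read", "public-read-write"}


def evaluate_bucket_acl(acl: str) -> str:
    """Return 'COMPLIANT' or 'NON_COMPLIANT' for an OSS bucket ACL string."""
    return "NON_COMPLIANT" if acl in PUBLIC_READ_ACLS else "COMPLIANT"


if __name__ == "__main__":
    # config-snapshot is the example bucket in this topic; app-logs is hypothetical.
    buckets = {"config-snapshot": "public-read", "app-logs": "private"}
    for name, acl in buckets.items():
        print(name, acl, evaluate_bucket_acl(acl))
```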

Configuration planning

This topic uses the remediation of public read permissions on an OSS bucket as an example. The following table lists the required configurations.

| Cloud service | Parameter | Example |
| --- | --- | --- |
| Cloud Config | Managed rule | oss-bucket-public-read-prohibited |
| | Rule name | test-oss-bucket-public-read-prohibited |
| Simple Message Queue (MNS) | Topic name | MNSTestConfig |
| | Topic region | China (Shanghai) |
| Object Storage Service (OSS) | OSS bucket | config-snapshot |
| | Bucket ACL | Public read |
| Function Compute | Service | resource_repair |
| | Service role policy | AliyunOSSFullAccess |
| | Function | oss_repair_acl_trigger |
| | Trigger | ConfigRuleNonComplianceMNSTrigger |

Note

Cloud Config is deployed in the China (Shanghai) region. To reduce network latency, we recommend that you also select China (Shanghai) as the region for the Simple Message Queue (MNS) topic.

How it works

The following figure shows the remediation workflow.

[Figure: Remediation workflow]
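The workflow can be sketched as a small local simulation. This is an illustration under stated assumptions, not how the services work internally: a Python dict stands in for OSS, a `queue.Queue` stands in for the MNSTestConfig topic, and each event carries only the fields that the remediation function in this topic reads (configRuleId, resourceId, regionId). The names `fake_oss`, `topic`, `cloud_config_evaluate`, and `function_compute_handler` are hypothetical.

```python
import json
from queue import Queue

RULE_ID = "cr-a6129bc09da7009675a0"  # example rule ID used in this topic

# Hypothetical stand-in for OSS: bucket name -> ACL.
fake_oss = {"config-snapshot": "public-read"}

# Hypothetical stand-in for the MNS topic MNSTestConfig.
topic = Queue()


def cloud_config_evaluate(rule_id, region):
    """Cloud Config side: publish one event per bucket that allows public read."""
    events = [
        {"configRuleId": rule_id, "resourceId": name, "regionId": region}
        for name, acl in fake_oss.items()
        if acl in ("public-read", "public-read-write")
    ]
    if events:
        # The message body is sent as UTF-8 bytes, similar to a STREAM-format event.
        topic.put(json.dumps(events).encode("utf-8"))


def function_compute_handler(event, expected_rule_id):
    """Function Compute side: set matching buckets to private."""
    for item in json.loads(event):
        if item.get("configRuleId") == expected_rule_id:
            fake_oss[item["resourceId"]] = "private"
    return "success"


cloud_config_evaluate(RULE_ID, "cn-shanghai")
while not topic.empty():
    print(function_compute_handler(topic.get(), RULE_ID))
print(fake_oss)
```

Running the simulation a second time publishes nothing, because the bucket is already private. This mirrors why a remediated resource stops generating non-compliance events.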

Procedure

  1. Log on to the Cloud Config console and configure Cloud Config to deliver resource non-compliance events to a specified Simple Message Queue (MNS) topic, such as MNSTestConfig.

    For more information, see Deliver data to Simple Message Queue (MNS).

  2. Create a service.

    1. Log on to the Function Compute console.

    2. In the left-side navigation pane, click Services & Functions.

    3. In the top navigation bar, select a region, for example, China (Shanghai).

    4. On the Services page, click Create Service.

    5. In the Create Service panel, enter resource_repair for the Name.

    6. Click OK.

  3. Grant the service permissions to modify OSS bucket ACLs.

    1. In the left-side navigation pane of the resource_repair service, click Service Details.

    2. In the Role section, click Edit.

    3. Select a service role that has the AliyunOSSFullAccess policy attached.

      If you do not have a suitable service role, click Create Role. In the Resource Access Management (RAM) console, create a role. You must set the trusted entity type to Alibaba Cloud Service and the trusted service to Function Compute. Attach the AliyunOSSFullAccess policy to the role. For more information, see Create a RAM role for a trusted Alibaba Cloud service.

    4. Click Save.
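If you create the role yourself, its trust policy must allow Function Compute to assume the role. The following sketch builds such a trust policy document. It is an assumption based on the common RAM trust policy format (service principal `fc.aliyuncs.com`, action `sts:AssumeRole`); verify it against the RAM documentation before use. The trust policy only controls who can assume the role; the AliyunOSSFullAccess permission policy is attached separately. The function name `fc_trust_policy` is hypothetical.

```python
import json


def fc_trust_policy() -> dict:
    """Build a RAM trust policy that lets Function Compute assume the role.

    Assumption: the principal for Function Compute is "fc.aliyuncs.com".
    """
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"Service": ["fc.aliyuncs.com"]},
            }
        ],
    }


# Print the JSON document that you would paste into the trust policy editor.
print(json.dumps(fc_trust_policy(), indent=2))
```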

  4. Create a function.

    1. In the left-side navigation pane of the resource_repair service, click Functions.

    2. Click Create Function.

    3. On the Create Function page, enter oss_repair_acl_trigger for the Function Name and select Python 3.6 for the Runtime. Keep the default values for other parameters.

    4. Click Create.

  5. Configure the environment variable for the function.

    1. On the details page of the oss_repair_acl_trigger function, click the Configuration tab.

    2. In the Environment Variables section, click Edit.

    3. Click Add Variable, and enter the variable name and value.

      • Set the variable name to prepareRuleId.

      • Set Value to the ID of the rule created in Cloud Config, for example, cr-a6129bc09da7009675a0.

    4. Click OK.
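A common mistake is to enter the rule name instead of the rule ID. The following sketch shows one way a function could validate the prepareRuleId value before using it. It assumes, based on the example value cr-a6129bc09da7009675a0, that rule IDs start with `cr-`; the helper `load_expected_rule_id` is hypothetical and is not part of the remediation code in this topic.

```python
import logging
import os

logger = logging.getLogger()

ENV_RULE_ID = "prepareRuleId"


def load_expected_rule_id(environ=os.environ):
    """Return the rule ID stored in prepareRuleId, or None if it is unusable.

    Assumption: Cloud Config rule IDs start with "cr-". A value without
    that prefix is most likely the rule name, which never matches the
    configRuleId field of an event.
    """
    value = (environ.get(ENV_RULE_ID) or "").strip()
    if not value:
        logger.warning("Environment variable '%s' is not set.", ENV_RULE_ID)
        return None
    if not value.startswith("cr-"):
        logger.warning("'%s' does not look like a rule ID. Did you enter the rule name?", value)
        return None
    return value


print(load_expected_rule_id({"prepareRuleId": "cr-a6129bc09da7009675a0"}))
print(load_expected_rule_id({"prepareRuleId": "test-oss-bucket-public-read-prohibited"}))
```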

  6. Create a trigger.

    1. On the details page of the oss_repair_acl_trigger function, click the Triggers tab.

    2. Click Create Trigger.

    3. Set Trigger Type to Simple Message Queue.

    4. Configure the parameters for the Simple Message Queue (MNS) trigger as follows:

      • Set Name to ConfigRuleNonComplianceMNSTrigger.

      • Set MNS Region to China (Shanghai).

      • Set Topic to MNSTestConfig.

      • Set Event Format to STREAM.

      • Set Role Name to AliyunMNSNotificationRole.

    5. Click OK.

      After the trigger is created, the function is invoked each time Cloud Config delivers a non-compliance event for the target resources to the MNSTestConfig topic.
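With Event Format set to STREAM, the function receives the MNS message body as raw bytes rather than a JSON envelope with message attributes, as we understand the trigger's behavior. The following sketch shows one defensive way to decode such a payload into a list of event objects. The sample message is hypothetical and contains only the three fields that the remediation function reads (configRuleId, resourceId, regionId); real events contain more fields. The helper `decode_stream_event` is hypothetical.

```python
import json


def decode_stream_event(event):
    """Decode a STREAM-format event (bytes or str) into a list of dicts.

    Returns an empty list if the payload is not a JSON list of objects.
    """
    if isinstance(event, (bytes, bytearray)):
        try:
            event = event.decode("utf-8")
        except UnicodeDecodeError:
            return []
    try:
        payload = json.loads(event)
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(payload, list):
        return []
    # Keep only JSON objects; ignore any other list entries.
    return [item for item in payload if isinstance(item, dict)]


# Hypothetical minimal message body with only the fields the function reads.
sample = (
    b'[{"configRuleId": "cr-a6129bc09da7009675a0", '
    b'"resourceId": "config-snapshot", "regionId": "cn-shanghai"}]'
)
print(decode_stream_event(sample))
```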

  7. Configure the remediation code.

    1. On the Triggers tab of the oss_repair_acl_trigger function, click the Function Code tab.

    2. Click the index.py file.

    3. Copy the following code into the index.py file.

      # -*- coding: utf-8 -*-
      import json
      import logging
      import os

      import oss2

      logger = logging.getLogger()

      # Name of the environment variable that holds the rule ID (for example, cr-a6129bc09da7009675a0).
      # Note: The value must be the configRuleId of the Cloud Config rule, not the rule name.
      # The function processes only events generated by this rule.
      ENV_RULE_ID = 'prepareRuleId'

      # ACL value used to make an OSS bucket private.
      BUCKET_ACL_PRIVATE = oss2.BUCKET_ACL_PRIVATE


      def handler(event, context):
          """
          Entry point. Receives events from the MNS trigger (or direct invocations)
          and sets the ACL of each matching OSS bucket to private.

          :param event: The event payload (a JSON string or UTF-8 bytes containing a list of events).
          :param context: The context object, which includes temporary credentials.
          :return: 'success' if all matching buckets were remediated, otherwise 'fail'.
          """
          logger.info("Received event: %s", event)

          # Retrieve the rule ID from the environment variable.
          expected_rule_id = os.environ.get(ENV_RULE_ID)
          if not expected_rule_id:
              logger.warning("Environment variable '%s' is not set.", ENV_RULE_ID)
              return 'fail'

          try:
              # json.loads accepts both str and UTF-8 bytes (STREAM event format).
              notify_json = json.loads(event)
          except (ValueError, TypeError) as e:
              # ValueError covers both JSONDecodeError and UnicodeDecodeError.
              logger.error("Failed to parse event as JSON: %s", e)
              return 'fail'

          # The notification is expected to be a non-empty list of events.
          if not isinstance(notify_json, list) or not notify_json:
              logger.error("Invalid event format: expected a non-empty list.")
              return 'fail'

          all_succeeded = True
          for item in notify_json:
              if not isinstance(item, dict):
                  logger.warning("Skipping non-object event entry: %s", item)
                  continue

              config_rule_id = item.get('configRuleId')
              bucket_name = item.get('resourceId')
              region = item.get('regionId')
              logger.info("Processing rule ID: %s, bucket: %s, region: %s",
                          config_rule_id, bucket_name, region)

              # Skip events generated by other rules.
              if config_rule_id != expected_rule_id:
                  logger.info("Rule ID '%s' does not match expected '%s'. Skipping.",
                              config_rule_id, expected_rule_id)
                  continue

              # Skip events that lack the fields needed for remediation.
              if not bucket_name or not region:
                  logger.warning("Missing resourceId or regionId. Skipping.")
                  continue

              # Remediate this bucket. On failure, log the error and continue with the remaining buckets.
              try:
                  remedy_by_fc_assume(context, region, bucket_name)
              except Exception:
                  logger.exception("Failed to remediate bucket %s in %s.", bucket_name, region)
                  all_succeeded = False

          return 'success' if all_succeeded else 'fail'


      def remedy_by_fc_assume(context, region, bucket_name):
          """
          Uses the temporary credentials of the service role, provided by Function Compute,
          to access OSS and set the bucket ACL to private.

          :param context: The Function Compute context object.
          :param region: The region ID of the OSS bucket, for example, cn-shanghai.
          :param bucket_name: The name of the OSS bucket.
          """
          creds = context.credentials
          auth = oss2.StsAuth(
              creds.access_key_id,
              creds.access_key_secret,
              creds.security_token
          )
          # Use HTTPS so that requests, including the STS security token, are encrypted in transit.
          endpoint = f'https://oss-{region}.aliyuncs.com'
          bucket = oss2.Bucket(auth, endpoint, bucket_name)

          # Set the bucket ACL to private.
          bucket.put_bucket_acl(BUCKET_ACL_PRIVATE)
          logger.info("ACL of bucket %s in %s set to private.", bucket_name, region)

      Note

      This code uses the prepareRuleId environment variable to process only the events generated by the specified rule. To filter or remediate based on other event fields, see Examples of resource non-compliance events.

    4. In the upper-left corner of the code editor, click Deploy Code.
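Before you deploy, you can check the event-filtering logic locally without calling OSS. The following sketch mirrors the checks that the handler performs in a hypothetical helper, `select_targets`, and builds the OSS host name in the `oss-{region}.aliyuncs.com` format that `remedy_by_fc_assume` uses. It does not import oss2. The events, including the rule ID `cr-other` and the bucket `other-bucket`, are hypothetical test data.

```python
def select_targets(events, expected_rule_id):
    """Return (region, bucket) pairs that the handler would remediate.

    Mirrors the handler: skip non-object entries, events from other
    rules, and events without resourceId or regionId.
    """
    targets = []
    for item in events:
        if not isinstance(item, dict):
            continue
        if item.get("configRuleId") != expected_rule_id:
            continue
        bucket, region = item.get("resourceId"), item.get("regionId")
        if bucket and region:
            targets.append((region, bucket))
    return targets


def oss_host(region):
    """OSS public endpoint host for a region ID such as cn-shanghai."""
    return f"oss-{region}.aliyuncs.com"


events = [
    {"configRuleId": "cr-a6129bc09da7009675a0", "resourceId": "config-snapshot", "regionId": "cn-shanghai"},
    {"configRuleId": "cr-other", "resourceId": "other-bucket", "regionId": "cn-hangzhou"},
    {"configRuleId": "cr-a6129bc09da7009675a0", "resourceId": None, "regionId": "cn-shanghai"},
]
for region, bucket in select_targets(events, "cr-a6129bc09da7009675a0"):
    print(bucket, oss_host(region))
```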

  8. Wait for about 10 minutes and then view the remediation result.

    Note

    If the resource configuration has not changed and the resource has already been evaluated as non-compliant, run an on-demand evaluation before you perform this step so that the resource is evaluated again and a non-compliance event is delivered. For more information, see Run an on-demand evaluation.

    • View the result in the Cloud Config console.

    • View the result in the OSS console.
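You can also check the result from code. The following sketch assumes an object that behaves like `oss2.Bucket`, whose `get_bucket_acl()` returns a result with an `acl` attribute, and compares that value with `private` (the value of `oss2.BUCKET_ACL_PRIVATE`). To keep the example self-contained, it uses a hypothetical `StubBucket` class instead of a real bucket; `is_remediated` is also a hypothetical helper.

```python
from types import SimpleNamespace

PRIVATE_ACL = "private"  # value of oss2.BUCKET_ACL_PRIVATE


def is_remediated(bucket):
    """Return True if the bucket ACL is private.

    `bucket` is expected to behave like oss2.Bucket: get_bucket_acl()
    returns an object with an `acl` attribute.
    """
    return bucket.get_bucket_acl().acl == PRIVATE_ACL


class StubBucket:
    """Hypothetical stand-in for oss2.Bucket, used only in this example."""

    def __init__(self, acl):
        self._acl = acl

    def get_bucket_acl(self):
        return SimpleNamespace(acl=self._acl)


print(is_remediated(StubBucket("private")))
print(is_remediated(StubBucket("public-read")))
```

With the oss2 SDK installed, you would pass an `oss2.Bucket` created for the config-snapshot bucket instead of the stub.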