This topic describes how to use the cross-account management capability of Resource Directory and the account group management feature of Cloud Config to implement custom remediation for non-compliant resources across multiple accounts.
Prerequisites
A resource directory is enabled. For more information, see Enable a resource directory.
Function Compute is activated. For more information, see Step 1: Activate Function Compute.
Important: When you use Function Compute functions, you are charged for the number of function calls, resource usage, and outbound Internet traffic. For more information, see Billing overview.
Background information
Cloud Config can detect non-compliant resources by running rules and apply custom remediation to fix them. To remediate non-compliant resources across multiple accounts, use the multi-level account and resource management capabilities of Resource Directory. This topic uses the example of checking whether an ECS instance has the CloudMonitor agent installed to demonstrate how to detect and automatically remediate non-compliant resources across accounts. Assume Account A (ID: 100931896542****) has administrator permissions in Resource Directory (or is a delegated administrator account for Cloud Config), and Account B (ID: 178366182654****) is a member account in the same Resource Directory that contains the non-compliant resources to be remediated. The following steps describe how to use Account A to detect and remediate non-compliant resources in Account B.
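The cross-account flow depends on the function in Account A assuming a role of the same name in each member account. The following is a minimal sketch of how the target role ARN could be derived from a member account ID. It assumes the example role name used later in this topic; `role_arn_for` is a hypothetical helper, not part of any SDK.

```python
# Hypothetical helper: build the ARN of the remediation role in a given account.
# The role name is the example name from this topic, written in lowercase
# as it appears in the sample ARNs.
ROLE_NAME = "configcustomremediationrole"


def role_arn_for(account_id: str, role_name: str = ROLE_NAME) -> str:
    """Return the RAM role ARN for the given Alibaba Cloud account ID."""
    if not account_id:
        raise ValueError("account_id must not be empty")
    return f"acs:ram::{account_id}:role/{role_name}"


# Account B is the member account that owns the non-compliant resources.
print(role_arn_for("178366182654****"))
```

Keeping the role name identical in every member account lets a single function serve the whole account group, because only the account ID changes per resource.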
Step 1: Create a role for the management account and attach policies to the role
Log on to the Resource Access Management (RAM) console.
Create a RAM role.
In the navigation pane on the left, choose Identities > Roles.
Click Create Role. In the Create Role panel, configure the role parameters.
For Select type of trusted entity, select Alibaba Cloud Account, and then click Next.
Enter a Role Name, for example, ConfigCustomRemediationRole. For Select Trusted Alibaba Cloud Account, select Current Alibaba Cloud Account.
Click OK.
Click Close.
Create a permissions policy.
In the navigation pane on the left, choose Permissions > Policies.
Click Create Policy to go to the Create Policy page.
On the JSON tab, enter the following policy document.
This policy allows the entity that assumes the role to install the CloudMonitor agent and to call sts:AssumeRole.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "cms:InstallMonitoringAgent",
      "Resource": "*"
    },
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
Click OK, and then enter a policy name, for example, ConfigCustomRemediationPolicy.
Click Save.
Grant permissions to the role.
In the navigation pane on the left, choose Permissions > Grants.
Click Grant Permission. In the panel that appears, grant new permissions to the role.
For Grant Permission On, select Account.
In the Principal field, enter ConfigCustomRemediationRole and select the role from the results.
In the Select Policy section, click the Custom policy tab. Enter ConfigCustomRemediationPolicy in the search box and select the policy from the results.
Click OK.
Attach a trust policy to the role.
In the navigation pane on the left, choose Identities > Roles.
On the Roles page, search for ConfigCustomRemediationRole and click the role name to go to the role details page.
On the Trust Policy Management tab, click Edit Trust Policy and replace the existing policy with the following content.
This trust policy allows Account A (ID: 100931896542****) and the Function Compute service to assume the role.
{
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "RAM": [
          "acs:ram::100931896542****:root"
        ]
      }
    },
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "fc.aliyuncs.com"
        ]
      }
    }
  ],
  "Version": "1"
}
Click OK.
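JSON has no comment syntax, so the policy documents entered in the console should contain only the JSON object. One way to avoid hand-editing mistakes is to generate both documents from code. The following sketch does that with the standard library only; the helper names `build_permissions_policy` and `build_trust_policy` are hypothetical, and the statements mirror the two policies in Step 1.

```python
import json


def build_permissions_policy() -> dict:
    """Permissions policy from Step 1: install the agent and call AssumeRole."""
    return {
        "Version": "1",
        "Statement": [
            {"Effect": "Allow", "Action": "cms:InstallMonitoringAgent", "Resource": "*"},
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": "*"},
        ],
    }


def build_trust_policy(account_id: str) -> dict:
    """Trust policy from Step 1: trust the given account and Function Compute."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"RAM": [f"acs:ram::{account_id}:root"]},
            },
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"Service": ["fc.aliyuncs.com"]},
            },
        ],
    }


# Print comment-free JSON that can be pasted into the policy editor.
print(json.dumps(build_permissions_policy(), indent=2))
print(json.dumps(build_trust_policy("100931896542****"), indent=2))
```

The account ID passed to `build_trust_policy` is the masked example ID of Account A; replace it with the real ID of your management account.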
Step 2: Create a role for the member account and attach policies to the role
Create a RAM role and grant permissions to the role.
For more information, see Substeps 1 to 4 in Step 1.
Attach a trust policy to the role.
For more information, see Substep 5 in Step 1. Replace the policy script with the following sample script.
This trust policy allows identities in Account B (ID: 178366182654****) and the ConfigCustomRemediationRole role of Account A (ID: 100931896542****) to assume the role.
{
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "RAM": [
          "acs:ram::178366182654****:root"
        ]
      }
    },
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "RAM": [
          "acs:ram::100931896542****:role/configcustomremediationrole"
        ]
      }
    }
  ],
  "Version": "1"
}
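Before saving a trust policy, it can help to list exactly which principals it lets assume the role. The following sketch does this with the standard library. `trusted_principals` is a hypothetical helper, and the embedded document is the member-account trust policy from this step with its masked example IDs.

```python
import json

# Member-account trust policy from Step 2 (masked example account IDs).
MEMBER_TRUST_POLICY = """
{
  "Statement": [
    {"Action": "sts:AssumeRole", "Effect": "Allow",
     "Principal": {"RAM": ["acs:ram::178366182654****:root"]}},
    {"Action": "sts:AssumeRole", "Effect": "Allow",
     "Principal": {"RAM": ["acs:ram::100931896542****:role/configcustomremediationrole"]}}
  ],
  "Version": "1"
}
"""


def trusted_principals(policy: dict) -> list:
    """Return 'Kind:value' strings for every principal allowed to call sts:AssumeRole."""
    principals = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        for kind, values in statement.get("Principal", {}).items():
            if isinstance(values, str):
                values = [values]
            principals.extend(f"{kind}:{value}" for value in values)
    return principals


for principal in trusted_principals(json.loads(MEMBER_TRUST_POLICY)):
    print(principal)
```

If the role of Account A is missing from the output, the remediation function in Account A cannot assume the role in the member account.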
Step 3: Create a custom remediation function
Log on to the Function Compute console.
In the navigation pane on the left, click Function.
Click Create Function. The Create Function panel appears.
Select Event Function.
In the Basic Configurations section, enter a Function Name, for example, ConfigRemediationService.
Keep the default values for Specifications and Instance Concurrency.
In the elasticity settings, set Elastic Mode to Default mode: On-demand + Cold start.
In the Function Code section, select Python 3.9 for Runtime and Use Sample Code for Code Upload Method.
Click Create to go to the function details page.
On the Function Details tab, enter the following code for the resource remediation function.
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
import json
import logging

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.acs_exception.exceptions import ClientException
from aliyunsdkcore.acs_exception.exceptions import ServerException
from aliyunsdkcore.request import CommonRequest
from aliyunsdkcore.auth.credentials import StsTokenCredential
from aliyunsdksts.request.v20150401.AssumeRoleRequest import AssumeRoleRequest

logger = logging.getLogger()


# The sample code is used to remediate non-compliant resources based on the
# ecs-instance-monitor-enabled rule. You can modify the remediation logic
# based on your business requirements.
def handler(event, context):
    get_resources_non_compliant(event, context)


def get_resources_non_compliant(event, context):
    resources = parse_json(event)
    # parse_json returns None if the event cannot be parsed.
    if not resources:
        logger.error('No resources to remediate.')
        return
    for resource in resources:
        remediation(resource, context)


def parse_json(content):
    """
    Parse string to json object
    :param content: json string content
    :return: Json object
    """
    try:
        return json.loads(content)
    except Exception as e:
        logger.error('Parse content:{} to json error:{}.'.format(content, e))
        return None


def remediation(resource, context):
    logger.info(resource)
    region_id = resource['regionId']
    account_id = resource['accountId']
    resource_id = resource['resourceId']
    resource_type = resource['resourceType']
    config_rule_id = resource['configRuleId']
    # Only ECS instances are remediated. Other resource types are skipped.
    if resource_type == 'ACS::ECS::Instance':
        logger.info("process account_id: {}, resource_id: {}, config_rule_id: {}".format(
            account_id, resource_id, config_rule_id))
        install_monitoring_agent(context, account_id, region_id, resource_id)


def install_monitoring_agent(context, account_id, resource_region_id, resource_id):
    logger.info("start install agent {}: {}".format(resource_region_id, resource_id))
    token = assume_role_and_get_token(context, account_id, resource_region_id)
    client = AcsClient(token['Credentials']['AccessKeyId'],
                       token['Credentials']['AccessKeySecret'],
                       region_id=resource_region_id)
    request = CommonRequest()
    request.set_accept_format('json')
    request.set_domain(f'metrics.{resource_region_id}.aliyuncs.com')
    request.set_method('POST')
    request.set_protocol_type('https')  # https | http
    request.set_version('2019-01-01')
    request.set_action_name('InstallMonitoringAgent')
    request.add_query_param('InstanceIds.1', resource_id)
    request.add_query_param('Force', "true")
    request.add_query_param('SecurityToken', token['Credentials']['SecurityToken'])
    response = client.do_action_with_exception(request)
    logger.info(response)


# Assume the role to obtain a temporary Security Token Service (STS) token.
# Replace the role name in the sample code with the actual role that you use.
def assume_role_and_get_token(context, account_id, region_id):
    creds = context.credentials
    logger.info('assume_role_and_get_token begin.')
    credentials = StsTokenCredential(creds.access_key_id, creds.access_key_secret, creds.security_token)
    client = AcsClient(credential=credentials)
    request = AssumeRoleRequest()
    request.set_domain(f'sts-vpc.{region_id}.aliyuncs.com')
    request.set_accept_format('json')
    request.set_RoleArn(f'acs:ram::{account_id}:role/configcustomremediationrole')
    request.set_RoleSessionName("ConfigCustomRemediationRole")
    response = client.do_action_with_exception(request)
    token = json.loads(response)
    # Do not log the response or context.credentials. Both contain secrets.
    logger.info('assume_role_and_get_token succeeded for account: {}.'.format(account_id))
    return token
Click Deploy Code.
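The parsing and filtering logic of the function can be checked locally before deployment, without calling any cloud API. The sketch below assumes the event is a JSON array whose items carry the fields that the sample function reads (regionId, accountId, resourceId, resourceType, configRuleId). The actual payload sent by Cloud Config may contain more fields, and all values shown here, as well as the helper `select_ecs_instances`, are hypothetical.

```python
import json

# Hypothetical event: a JSON array with the fields that the sample function reads.
SAMPLE_EVENT = json.dumps([
    {
        "regionId": "cn-hangzhou",
        "accountId": "178366182654****",
        "resourceId": "i-example0001",
        "resourceType": "ACS::ECS::Instance",
        "configRuleId": "cr-example",
    },
    {
        "regionId": "cn-hangzhou",
        "accountId": "178366182654****",
        "resourceId": "example-bucket",
        "resourceType": "ACS::OSS::Bucket",
        "configRuleId": "cr-example",
    },
]).encode("utf-8")


def select_ecs_instances(event) -> list:
    """Return only the ECS instance entries from the event, or [] if it is invalid."""
    try:
        resources = json.loads(event)
    except (TypeError, ValueError):
        return []
    if not isinstance(resources, list):
        return []
    return [
        item for item in resources
        if isinstance(item, dict) and item.get("resourceType") == "ACS::ECS::Instance"
    ]


for item in select_ecs_instances(SAMPLE_EVENT):
    print(item["accountId"], item["regionId"], item["resourceId"])
```

Running a check like this first separates payload problems from permission problems, which only show up once the function calls AssumeRole and InstallMonitoringAgent.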
Step 4: Create a rule and configure custom remediation
Log on to the Cloud Config console.
Create an account group and add Account A and Account B to the account group.
For more information, see Create an account group.
In the upper-left corner of the Cloud Config console, switch to the account group that you created in the previous step.
Create a rule. For more information, see Create a rule based on a managed rule.
On the Select Create Method page, select Based on managed rule. Search for and select the rule that checks whether the CloudMonitor agent is installed on running ECS instances, and then click Next.
On the Set Basic Properties page, specify the Rule Name, Risk Level, Trigger, and Description. Then, click Next.
On the Set Effective Scope page, keep the default settings and click Next.
On the Set Correction page, turn on the Set Remediation switch. Select Function Compute. Set the Invoke Type to Automatic Remediation, select the Function ARN of the function that you created in Step 3, and then click Submit.
Note: If your custom remediation function is still in testing, you can set the Invoke Type to Manual Remediation for debugging. After testing is complete, you can switch the Invoke Type to Automatic Remediation.
Step 5: Implement automatic remediation and verify the remediation result
On the Rules page, find the target rule and click Remediation Detail in the Remediation Template column.
On the Remediation Detail tab, click Perform Manual Correction next to Remediation Detail.
In the Execution Result List section, you can view the remediation results. For resources that were not successfully remediated, you can also view the failure reason.
Note: On the Remediation Detail tab, you can click the Function ARN next to Remediation Template to open the Function Code tab for the function in the Function Compute console.