Auto Scaling can work together with Resource Orchestration Service (ROS) to create a TiDB cluster and then manage the cluster through the scale-out and scale-in capabilities of scaling groups. This topic describes how to create and manage a TiDB cluster with Auto Scaling so that the ECS instances in the cluster are automatically scaled in or out based on the real-time status of the cluster.
Prerequisites
You have permissions to use Resource Orchestration Service (ROS).
You have permissions to use Auto Scaling.
You have permissions to use Elastic Compute Service (ECS).
Background information
Scaling groups cannot be directly associated with a TiDB cluster. Instead, you can first use ROS to create the scaling groups and the TiDB cluster, and then use the scale-out and scale-in features of the scaling groups to manage the cluster. This topic applies to the following scenarios:
Your business data is stored in a TiDB cluster.
You want to manage the TiDB cluster by using elastic scaling.
You want to perform visual analysis and diagnostics on the TiDB cluster in TiDB Dashboard.
Procedure
This procedure uses ROS to create a TiDB cluster with the minimal deployment topology that is associated with scaling groups. The cluster contains three components (roles): TiDB Server, PD Cluster, and TiKV Cluster. A sketch of the TiUP topology file that the stack assembles for these roles is shown below, followed by the detailed steps.
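For reference, the following is a minimal sketch of the topology.yaml file that the ControlServer command in the template below assembles before it runs tiup cluster deploy. The 10.0.1.x host addresses are placeholders; the stack fills in the private IP addresses of the ECS instances it creates (three hosts each for the PD, TiDB, and TiKV roles by default, and the single ControlServer for monitoring, Grafana, and Alertmanager).
global:
  user: "root"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/data1"
server_configs:
  pd:
    replication.enable-placement-rules: true
pd_servers:
  - host: 10.0.1.11     # placeholder; one entry per TiPDServer instance
tidb_servers:
  - host: 10.0.1.21     # placeholder; one entry per TiDBServer instance
tikv_servers:
  - host: 10.0.1.31     # placeholder; one entry per TiKVServer instance
monitoring_servers:
  - host: 10.0.1.41     # placeholder; the ControlServer instance
grafana_servers:
  - host: 10.0.1.41
alertmanager_servers:
  - host: 10.0.1.41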
Step 1: Create a resource stack
In this step, the China (Hohhot) region is used as an example.
Log on to the ROS console.
In the left-side navigation pane, click Stacks.
In the top navigation bar, select China (Hohhot) as the region of the stack.
On the Stacks page, click Create Stack, and then select Use ROS from the drop-down list.
On the Create Stack page, in the Select Template step, specify a template as needed, and then click Next.
The template-related configuration items are as follows:
Specify Template: A template is a JSON or YAML file that describes the resources of a stack and their properties. Keep the default option, which is to select an existing template. A minimal structural sketch of such a template follows this list.
Template Import Method: Four methods are available: URL, Enter Template Content, My Templates, and Shared Templates. Keep the default option, Enter Template Content.
Template Content: You can enter either an ROS template or a Terraform template in this area. If you choose an ROS template, you can enter the template content in JSON or YAML format on the ROS tab.
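Before reading the full template, the following minimal sketch (in YAML; the parameter name VpcCidrBlock is illustrative) shows the same top-level sections the full template uses: a format version, Parameters, Resources, and Outputs. It creates only the ALIYUN::ECS::VPC resource type that also appears in the full template.
ROSTemplateFormatVersion: '2015-09-01'
Description: Minimal example of the template structure used in this topic.
Parameters:
  VpcCidrBlock:              # illustrative parameter name
    Type: String
    Default: 10.0.0.0/16
Resources:
  VPC:
    Type: ALIYUN::ECS::VPC   # same resource type as in the full template
    Properties:
      CidrBlock:
        Ref: VpcCidrBlock
Outputs:
  VpcId:
    Value:
      Ref: VPC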
This step uses an ROS template that implements basic functionality as an example. Create the JSON-formatted ROS template content in the editing area as needed.
JSON
{ "ROSTemplateFormatVersion": "2015-09-01", "Description": { "zh-cn": "TiDB 集群最小部署的拓扑架构。", "en": "The minimal deployment topology of TiDB cluster." }, "Mappings": { "InstanceProfile": { "ControlServer": { "Description": "Instance for monitoring, grafana, alertmanager.", "InstanceType": "ecs.c6e.large" }, "TiDBServer": { "Description": "Better to be compute optimized.", "InstanceType": "ecs.c6e.large" }, "TiPDServer": { "Description": "TiDB Placement Driver.", "InstanceType": "ecs.c6e.large" }, "TiKVServer": { "Description": "Better to be storage optimized.", "InstanceType": "ecs.c6e.large" } } }, "Parameters": { "InstancePassword": { "NoEcho": true, "Type": "String", "Description": { "en": "Server login password, Length 8-30, must contain three(Capital letters, lowercase letters, numbers, ()`~!@#$%^&*_-+=|{}[]:;'<>,.?/ Special symbol in).", "zh-cn": "服务器登录密码,长度8-30,必须包含三项(大写字母、小写字母、数字、 ()`~!@#$%^&*_-+=|{}[]:;'<>,.?/ 中的特殊符号)。" }, "AllowedPattern": "[0-9A-Za-z\\_\\-\\&:;'<>,=%`~!@#\\(\\)\\$\\^\\*\\+\\|\\{\\}\\[\\]\\.\\?\\/]+$", "Label": { "en": "Instance Password", "zh-cn": "实例密码" }, "ConstraintDescription": { "en": "Length 8-30, must contain three(Capital letters, lowercase letters, numbers, ()`~!@#$%^&*_-+=|{}[]:;'<>,.?/ Special symbol in).", "zh-cn": "长度8-30,必须包含三项(大写字母、小写字母、数字、 ()`~!@#$%^&*_-+=|{}[]:;'<>,.?/ 中的特殊符号)。" }, "MinLength": 8, "MaxLength": 30 }, "TiDBServerCount": { "Type": "Number", "Description": { "en": "The number of TiDBServer. <br>Client connections can be evenly distributed among multiple TiDB instances to achieve load balancing. TiDB Server itself does not store data, but parses SQL, and forwards the actual data read request to the underlying storage node TiKV (or TiFlash).", "zh-cn": "TiDBServer 数量。<br>客户端的连接可以均匀地分摊在多个 TiDB 实例上以达到负载均衡的效果。TiDB Server 本身并不存储数据,只是解析 SQL,将实际的数据读取请求转发给底层的存储节点 TiKV(或 TiFlash)。" }, "MinValue": 3, "Label": { "en": "TiDB Server Count", "zh-cn": "TiDB Server 数量" }, "Default": 3 }, "TiPDServerCount": { "Type": "Number", "Description": { "en": "The number of TiPDServer. <br>The meta-information management module of the entire TiDB cluster is responsible for storing the real-time data distribution of each TiKV node and the overall topology of the cluster, providing the TiDB Dashboard control interface, and assigning transaction IDs for distributed transactions. PD not only stores meta-information, but also issues data scheduling commands to specific TiKV nodes based on the data distribution status reported by TiKV nodes in real time. In addition, the PD itself is also composed of at least 3 nodes and has high-availability capabilities. It is recommended to deploy an odd number of PD nodes. <br>This test turns on 1 node.", "zh-cn": "TiPDServer 数量。 <br>整个 TiDB 集群的元信息管理模块,负责存储每个 TiKV 节点实时的数据分布情况和集群的整体拓扑结构,提供 TiDB Dashboard 管控界面,并为分布式事务分配事务 ID。PD 不仅存储元信息,同时还会根据 TiKV 节点实时上报的数据分布状态,下发数据调度命令给具体的 TiKV 节点。此外,PD 本身也是由至少 3 个节点构成,拥有高可用的能力。建议部署奇数个 PD 节点。<br>本测试开启 1 个节点。" }, "MinValue": 3, "Label": { "en": "TiPD Server Count", "zh-cn": "TiPD Server 数量" }, "Default": 3 }, "TiKVServerCount": { "Type": "Number", "Description": { "en": "The number of TiKV Servers. <br>Storage node: Responsible for storing data. From the outside, TiKV is a distributed Key-Value storage engine that provides transactions. The basic unit of data storage is Region. Each Region is responsible for storing data in a Key Range (the left-closed and right-open interval from StartKey to EndKey). Each TiKV node is responsible for multiple Regions. 
TiKV's API provides native support for distributed transactions at the KV key-value pair level, and provides the SI (Snapshot Isolation) isolation level by default, which is also the core of TiDB's support for distributed transactions at the SQL level. After the SQL layer of TiDB finishes the SQL analysis, it will convert the SQL execution plan into the actual call to the TiKV API. Therefore, the data is stored in TiKV. In addition, the data in TiKV will automatically maintain multiple copies (three copies by default), which naturally supports high availability and automatic failover.", "zh-cn": "TiKV Server 数量。<br>存储节点: 负责存储数据,从外部看 TiKV 是一个分布式的提供事务的 Key-Value 存储引擎。存储数据的基本单位是 Region,每个 Region 负责存储一个 Key Range(从 StartKey 到 EndKey 的左闭右开区间)的数据,每个 TiKV 节点会负责多个 Region。TiKV 的 API 在 KV 键值对层面提供对分布式事务的原生支持,默认提供了 SI (Snapshot Isolation) 的隔离级别,这也是 TiDB 在 SQL 层面支持分布式事务的核心。TiDB 的 SQL 层做完 SQL 解析后,会将 SQL 的执行计划转换为对 TiKV API 的实际调用。所以,数据都存储在 TiKV 中。另外,TiKV 中的数据都会自动维护多副本(默认为三副本),天然支持高可用和自动故障转移。" }, "MinValue": 3, "Label": { "en": "TiKV Server Count", "zh-cn": "TiKV Server 数量" }, "Default": 3 }, "DateDiskSize": { "Default": 1000, "Type": "Number", "Description": { "zh-cn": "TiKV 集群的数据容量,TiKV 硬盘大小配置建议 PCI-E SSD 不超过 2 TB,普通 SSD 不超过 1.5 TB。 单位:GB。", "en": "The data capacity of TiKV cluster, TiKV hard disk size configuration recommended PCI-E SSD not exceed 2 TB, ordinary SSD not exceed 1.5 TB. Unit: GB." }, "Label": { "zh-cn": "TiKV 数据盘空间", "en": "TiKV Date Disk Space" } }, "SystemDiskSize": { "Default": 40, "Type": "Number", "Description": { "zh-cn": "各个节点系统盘大小, 取值范围:[40, 500], 单位:GB。", "en": "System disk size of each node, range of values: 40-500, units: GB." }, "Label": { "zh-cn": "系统盘空间", "en": "System Disk Space" } }, "Category": { "Type": "String", "Description": { "en": "<font color='blue'><b>Optional values:</b></font><br>[cloud_efficiency: <font color='green'>Efficient Cloud Disk</font>]<br>[cloud_ssd: <font color='green'>SSD Cloud Disk</font>]<br>[cloud_essd: <font color='green'>ESSD Cloud Disk</font>]<br>[cloud: <font color='green'>Cloud Disk</font>]<br>[ephemeral_ssd: <font color='green'>Local SSD Cloud Disk</font>]", "zh-cn": "<font color='blue'><b>可选值:</b></font><br>[cloud_efficiency: <font color='green'>高效云盘</font>]<br>[cloud_ssd: <font color='green'>SSD云盘</font>]<br>[cloud_essd: <font color='green'>ESSD云盘</font>]<br>[cloud: <font color='green'>普通云盘</font>]<br>[ephemeral_ssd: <font color='green'>本地SSD盘</font>]" }, "AllowedValues": [ "cloud_efficiency", "cloud_ssd", "cloud", "cloud_essd", "ephemeral_ssd" ], "Label": { "en": "System Disk Category", "zh-cn": "系统盘类型" }, "Default": "cloud_essd" } }, "Metadata": { "ALIYUN::ROS::Interface": { "ParameterGroups": [ { "Parameters": [ "TiDBServerCount", "TiPDServerCount", "TiKVServerCount" ], "Label": { "default": { "en": "Topological information", "zh-cn": "拓扑信息" } } }, { "Parameters": [ "DateDiskSize", "SystemDiskSize", "Category" ], "Label": { "default": { "en": "Disk configuration", "zh-cn": "磁盘配置" } } }, { "Parameters": [ "InstancePassword" ], "Label": { "default": { "en": "ECS configuration", "zh-cn": "ECS配置" } } } ], "TemplateTags": [ "The minimal deployment topology of TiDB cluster" ] } }, "Resources": { "VPC": { "Type": "ALIYUN::ECS::VPC", "Properties": { "CidrBlock": "10.0.0.0/16", "Tags": [ { "Key": "Application", "Value": { "Ref": "ALIYUN::StackId" } } ] } }, "VSwitch": { "Type": "ALIYUN::ECS::VSwitch", "Properties": { "VpcId": { "Ref": "VPC" }, "ZoneId": { "Fn::Select": [ "1", { "Fn::GetAZs": { "Ref": "ALIYUN::Region" } } ] }, "CidrBlock": 
"10.0.1.0/24", "Tags": [ { "Key": "Application", "Value": { "Ref": "ALIYUN::StackId" } } ] } }, "InstanceSecurityGroup": { "Type": "ALIYUN::ECS::SecurityGroup", "Properties": { "VpcId": { "Ref": "VPC" }, "SecurityGroupIngress": [ { "IpProtocol": "tcp", "PortRange": "1/65535", "SourceCidrIp": "0.0.0.0/0" } ] } }, "TiPDServerEip": { "Type": "ALIYUN::VPC::EIP", "Properties": { "InternetChargeType": "PayByTraffic", "Bandwidth": 5 } }, "EipAssociationTiPDServer": { "Type": "ALIYUN::VPC::EIPAssociation", "Properties": { "InstanceId": { "Fn::Jq": [ "First", ".DASHBOARD", { "Fn::GetAtt": [ "InvocationWaitCondition", "Data" ] } ] }, "AllocationId": { "Fn::GetAtt": [ "TiPDServerEip", "AllocationId" ] } } }, "ControlServerEip": { "Type": "ALIYUN::VPC::EIP", "Properties": { "InternetChargeType": "PayByTraffic", "Bandwidth": 5 } }, "EipAssociationControlServer": { "Type": "ALIYUN::VPC::EIPAssociation", "Properties": { "InstanceId": { "Fn::Select": [ "0", { "Fn::GetAtt": [ "ControlServer", "InstanceIds" ] } ] }, "AllocationId": { "Fn::GetAtt": [ "ControlServerEip", "AllocationId" ] } } }, "WaitCondition": { "Type": "ALIYUN::ROS::WaitCondition", "Properties": { "Count": 1, "Handle": { "Ref": "WaitConditionHandle" }, "Timeout": 1800 } }, "WaitConditionHandle": { "Type": "ALIYUN::ROS::WaitConditionHandle" }, "InvocationWaitCondition": { "Type": "ALIYUN::ROS::WaitCondition", "DependsOn": "Command", "Properties": { "Count": 3, "Handle": { "Ref": "InvocationWaitConditionHandle" }, "Timeout": 1800 } }, "InvocationWaitConditionHandle": { "Type": "ALIYUN::ROS::WaitConditionHandle" }, "ControlServer": { "Type": "ALIYUN::ECS::InstanceGroup", "Properties": { "InstanceName": "ControlServer", "ImageId": "centos_7.9", "VpcId": { "Ref": "VPC" }, "SecurityGroupId": { "Ref": "InstanceSecurityGroup" }, "VSwitchId": { "Ref": "VSwitch" }, "InstanceType": { "Fn::FindInMap": [ "InstanceProfile", "ControlServer", "InstanceType" ] }, "Password": { "Ref": "InstancePassword" }, "MaxAmount": 1, "AllocatePublicIP": false, "SystemDiskSize": { "Ref": "SystemDiskSize" }, "InstanceChargeType": "PostPaid", "SystemDiskCategory": { "Ref": "Category" }, "UserData": { "Fn::Replace": [ { "ros-notify": { "Fn::GetAtt": [ "WaitConditionHandle", "CurlCli" ] } }, { "Fn::Join": [ "", [ "#!/bin/sh\n", "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa\n", "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys\n", "chmod 0600 ~/.ssh/authorized_keys\n", "pub_key=`cat /root/.ssh/id_rsa.pub`\n", "echo '# -*- coding: utf-8 -*-' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'import fcntl' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'import os' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'scale_out_path = r\"/tmp/compute-nest-templates-scale-out.json\"' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'class Lock:' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' def __init__(self, filename):' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' self.filename = filename' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' self.handle = open(filename, \"w\")' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' def acquire(self):' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' fcntl.flock(self.handle, fcntl.LOCK_EX)' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' def release(self):' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' fcntl.flock(self.handle, fcntl.LOCK_UN)' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' def __del__(self):' >> 
'/tmp/compute-nest-templates-scale-out.py'\n", "echo ' self.handle.close()' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'with open(\"/tmp/params.txt\", \"r\") as f:' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' content = f.read()' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'server_type, host = content.split(\",\")' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'host = host.strip()' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'scale_out_dict = {}' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'if server_type == \"tidb_servers\":' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' scale_out_dict = {server_type: [{\"port\": 4000,\"ssh_port\": 22,\"status_port\": 10080,\"host\": host,\"deploy_dir\": \"/data/deploy/install/deploy/tidb-4000\",\"log_dir\": \"/data/deploy/install/log/tidb-4000\",}]}' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'elif server_type == \"tikv_servers\":' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' scale_out_dict = {server_type: [{\"port\": 20160,\"ssh_port\": 22,\"host\": host,\"status_port\": 20180,\"deploy_dir\": \"/data/deploy/install/deploy/tikv-20160\",\"data_dir\": \"/data/deploy/install/data/tikv-20160\",\"log_dir\": \"/data/deploy/install/log/tikv-20160\",}]}' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'elif server_type == \"pd_servers\":' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' scale_out_dict = {server_type: [{\"client_port\": 2379,\"ssh_port\": 22,\"host\": host,\"peer_port\": 2380,\"deploy_dir\": \"/data/deploy/install/deploy/pd-2379\",\"data_dir\": \"/data/deploy/install/data/pd-2379\",\"log_dir\": \"/data/deploy/install/log/pd-2379\",}]}' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'lock = Lock(scale_out_path)' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'try:' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' lock.acquire()' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' with open(scale_out_path, \"w\") as f:' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' import json' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' json.dump(scale_out_dict, f)' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' os.system(' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' \"echo yes | /root/.tiup/bin/tiup cluster scale-out tidb-test %s --user \"' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' \"root -i /root/.ssh/id_rsa\" % scale_out_path)' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo 'finally:' >> '/tmp/compute-nest-templates-scale-out.py'\n", "echo ' lock.release()' >> '/tmp/compute-nest-templates-scale-out.py'\n", "ros-notify -d \"{\\\"status\\\" : \\\"SUCCESS\\\",\\\"id\\\" : \\\"ssh_pub_key\\\", \\\"data\\\" : \\\"$pub_key\\\"}\" \n" ] ] } ] }, "Tags": [ { "Key": "Application", "Value": { "Ref": "ALIYUN::StackId" } }, { "Key": "Name", "Value": "ControlServer" } ] } }, "TiDBServerScalingGroup": { "Type": "ALIYUN::ESS::ScalingGroup", "Properties": { "MinSize": 3, "DefaultCooldown": 300, "VSwitchId": { "Ref": "VSwitch" }, "RemovalPolicys": [ "NewestInstance" ], "MaxSize": 4, "ScalingGroupName": { "Fn::Join": [ "-", [ "TiDBServer", { "Ref": "ALIYUN::StackId" } ] ] } } }, "TiDBServerScalingConfiguration": { "Type": "ALIYUN::ESS::ScalingConfiguration", "Properties": { "ScalingConfigurationName": { "Fn::Join": [ "-", [ "TiDBServe", { "Ref": "ALIYUN::StackId" } ] ] }, "InstanceType": { "Fn::FindInMap": [ "InstanceProfile", "TiDBServer", 
"InstanceType" ] }, "SystemDiskCategory": { "Ref": "Category" }, "SystemDiskSize": { "Ref": "SystemDiskSize" }, "ScalingGroupId": { "Ref": "TiDBServerScalingGroup" }, "SecurityGroupId": { "Ref": "InstanceSecurityGroup" }, "ImageId": "centos_7_9_x64_20G_alibase_20211227.vhd", "UserData": { "Fn::Join": [ "", [ "#!/bin/sh\n", "ssh_pub_key='", { "Fn::GetAtt": [ "WaitCondition", "Data" ] }, "'\n", "yum install -y jq\n", "pub_key=`echo \"$ssh_pub_key\" | jq '.ssh_pub_key' | xargs echo `\n", "echo \"$pub_key\" > /root/.ssh/authorized_keys\n", "chmod 600 /root/.ssh/authorized_keys\n", "yum install -y numactl\n", "mount -t ext4 -o remount,nodelalloc,noatime / \n", "echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login\n", "echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf\n", "echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf\n", "sysctl -p\n", "echo '* soft stack 10240' >> /etc/security/limits.conf\n", "echo '* hard nofile 1000000' >> /etc/security/limits.conf\n", "echo '* soft nofile 1000000' >> /etc/security/limits.conf\n", "ulimit -SHn 1000000\n", "echo 1200000 > /proc/sys/fs/file-max\n", "echo never > /sys/kernel/mm/transparent_hugepage/enabled\n", "echo never > /sys/kernel/mm/transparent_hugepage/defrag\n", "service irqbalance start\n" ] ] }, "TagList": [ { "Key": "Application", "Value": { "Ref": "ALIYUN::StackId" } }, { "Key": "Name", "Value": "TiDBServer" } ] } }, "TiDBServerScalingRule": { "Type": "ALIYUN::ESS::ScalingRule", "Properties": { "ScalingRuleName": { "Fn::Join": [ "-", [ "TiDBServer", { "Ref": "ALIYUN::StackId" } ] ] }, "AdjustmentValue": 4, "ScalingGroupId": { "Ref": "TiDBServerScalingGroup" }, "AdjustmentType": "TotalCapacity" } }, "TiDBServerScalingGroupEnable": { "DependsOn": [ "TiDBServerScalingConfiguration", "TiDBServerScalingRule", "TiDBServerScalingGroup" ], "Type": "ALIYUN::ESS::ScalingGroupEnable", "Properties": { "ScalingRuleArisExecuteVersion": "1", "ScalingGroupId": { "Ref": "TiDBServerScalingGroup" }, "ScalingConfigurationId": { "Ref": "TiDBServerScalingConfiguration" }, "InstanceIds": { "Fn::GetAtt": [ "TiDBServer", "InstanceIds" ] } } }, "RamRole": { "Type": "ALIYUN::RAM::Role", "Properties": { "RoleName": { "Fn::Join": [ "", [ "StackId-", { "Ref": "ALIYUN::StackId" } ] ] }, "AssumeRolePolicyDocument": { "Version": "1", "Statement": [ { "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "Service": [ "oos.aliyuncs.com" ] } } ] }, "Policies": [ { "PolicyName": { "Fn::Join": [ "", [ "StackId-", { "Ref": "ALIYUN::StackId" } ] ] }, "PolicyDocument": { "Version": "1", "Statement": [ { "Action": [ "ecs:*" ], "Resource": [ "*" ], "Effect": "Allow" }, { "Action": [ "vpc:DescribeVpcs", "vpc:DescribeVSwitches" ], "Resource": [ "*" ], "Effect": "Allow" }, { "Action": [ "ess:CompleteLifecycleAction" ], "Resource": [ "*" ], "Effect": "Allow" } ] } } ] } }, "ScaleOutTemplate": { "Type": "ALIYUN::OOS::Template", "DependsOn": [ "TiDBServerScalingGroupEnable" ], "Properties": { "Content": { "Fn::Join": [ "", [ "{\"FormatVersion\": \"OOS-2019-06-01\", \"Parameters\": { \"commandInstanceId\": { \"Type\": \"String\", \"Default\": \"", { "Fn::Select": [ "0", { "Fn::GetAtt": [ "ControlServer", "InstanceIds" ] } ] }, "\" }, \"regionId\": { \"Type\": \"String\", \"Default\": \"", { "Ref": "ALIYUN::Region" }, "\" }, \"instanceIds\": { \"Type\": \"List\", \"Default\": [ \"${instanceIds}\" ] }, \"lifecycleHookId\": { \"Type\": \"String\", \"Default\": \"${lifecycleHookId}\" }, \"ServerType\": { \"Type\": \"String\" }, \"lifecycleActionToken\": { 
\"Type\": \"String\", \"Default\": \"${lifecycleActionToken}\" }, \"rateControl\": { \"Type\": \"Json\", \"AssociationProperty\": \"RateControl\", \"Default\": { \"Mode\": \"Concurrency\", \"MaxErrors\": 0, \"Concurrency\": 1 } }, \"OOSAssumeRole\": { \"Type\": \"String\", \"Default\": \"OOSServiceRole\" } }, \"RamRole\": \"{{ OOSAssumeRole }}\", \"Tasks\": [ { \"Name\": \"runCommandOnSpecifiedInstance\", \"Action\": \"ACS::ESS::RunCommandOnSpecifiedInstance\", \"OnError\": \"CompleteLifecycleActionForAbandon\", \"OnSuccess\": \"CompleteLifecycleActionForContinue\", \"Properties\": { \"regionId\": \"{{ regionId }}\", \"commandContent\": \"", "#!/bin/bash\\nsleep 60\\necho \\\"{{ ServerType }},$COMMAND_IP\\\">/tmp/params.txt\\nsudo python /tmp/compute-nest-templates-scale-out.py", "\", \"commandInstanceId\": \"{{ commandInstanceId }}\", \"instanceId\": \"{{ ACS::TaskLoopItem }}\" }, \"Loop\": { \"RateControl\": \"{{ rateControl }}\", \"Items\": \"{{ instanceIds }}\", \"Outputs\": { \"commandOutputs\": { \"AggregateType\": \"Fn::ListJoin\", \"AggregateField\": \"commandOutput\" } } }, \"Outputs\": { \"commandOutput\": { \"Type\": \"String\", \"ValueSelector\": \"invocationOutput\" } } }, { \"Name\": \"CompleteLifecycleActionForContinue\", \"Action\": \"ACS::ExecuteAPI\", \"Description\": { \"en\": \"Modify lifecycle action for continue\", \"zh-cn\": \"修改伸缩活动的等待状态为继续完成\" }, \"OnSuccess\": \"ACS::END\", \"Properties\": { \"Service\": \"ESS\", \"API\": \"CompleteLifecycleAction\", \"Parameters\": { \"RegionId\": \"{{ regionId }}\", \"LifecycleHookId\": \"{{ lifecycleHookId }}\", \"LifecycleActionToken\": \"{{ lifecycleActionToken }}\", \"LifecycleActionResult\": \"CONTINUE\" } } }, { \"Name\": \"CompleteLifecycleActionForAbandon\", \"Action\": \"ACS::ExecuteAPI\", \"Description\": { \"en\": \"Complete lifecycle action for Abandon\", \"zh-cn\": \"修改伸缩活动的等待状态为弃用\" }, \"Properties\": { \"Service\": \"ESS\", \"API\": \"CompleteLifecycleAction\", \"Parameters\": { \"RegionId\": \"{{ regionId }}\", \"LifecycleHookId\": \"{{ lifecycleHookId }}\", \"LifecycleActionToken\": \"{{ lifecycleActionToken }}\", \"LifecycleActionResult\": \"ABANDON\" } } } ]}" ] ] }, "TemplateName": { "Fn::Join": [ "", [ "ScaleOut-", { "Ref": "ALIYUN::StackId" } ] ] } } }, "ScaleInTemplate": { "Type": "ALIYUN::OOS::Template", "DependsOn": [ "TiDBServerScalingGroupEnable" ], "Properties": { "Content": { "Fn::Join": [ "", [ "{\"FormatVersion\": \"OOS-2019-06-01\", \"Parameters\": { \"commandInstanceId\": { \"Type\": \"String\", \"Default\": \"", { "Fn::Select": [ "0", { "Fn::GetAtt": [ "ControlServer", "InstanceIds" ] } ] }, "\" }, \"regionId\": { \"Type\": \"String\", \"Default\": \"", { "Ref": "ALIYUN::Region" }, "\" }, \"instanceIds\": { \"Type\": \"List\", \"Default\": [ \"${instanceIds}\" ] }, \"lifecycleHookId\": { \"Type\": \"String\", \"Default\": \"${lifecycleHookId}\" }, \"ServerType\": { \"Type\": \"String\" }, \"lifecycleActionToken\": { \"Type\": \"String\", \"Default\": \"${lifecycleActionToken}\" }, \"rateControl\": { \"Type\": \"Json\", \"AssociationProperty\": \"RateControl\", \"Default\": { \"Mode\": \"Concurrency\", \"MaxErrors\": 0, \"Concurrency\": 1 } }, \"OOSAssumeRole\": { \"Type\": \"String\", \"Default\": \"OOSServiceRole\" } }, \"RamRole\": \"{{ OOSAssumeRole }}\", \"Tasks\": [ { \"Name\": \"runCommandOnSpecifiedInstance\", \"Action\": \"ACS::ESS::RunCommandOnSpecifiedInstance\", \"OnError\": \"CompleteLifecycleActionForAbandon\", \"OnSuccess\": \"CompleteLifecycleActionForContinue\", \"Properties\": { 
\"regionId\": \"{{ regionId }}\", \"commandContent\": \"", "#!/bin/bash\\nsleep 60\\nip_address=`/root/.tiup/bin/tiup cluster display tidb-test | grep $COMMAND_IP | awk -F ' ' '{print $1}'`\\n/root/.tiup/bin/tiup cluster scale-in tidb-test -y --node $ip_address --force", "\", \"commandInstanceId\": \"{{ commandInstanceId }}\", \"instanceId\": \"{{ ACS::TaskLoopItem }}\" }, \"Loop\": { \"RateControl\": \"{{ rateControl }}\", \"Items\": \"{{ instanceIds }}\", \"Outputs\": { \"commandOutputs\": { \"AggregateType\": \"Fn::ListJoin\", \"AggregateField\": \"commandOutput\" } } }, \"Outputs\": { \"commandOutput\": { \"Type\": \"String\", \"ValueSelector\": \"invocationOutput\" } } }, { \"Name\": \"CompleteLifecycleActionForContinue\", \"Action\": \"ACS::ExecuteAPI\", \"Description\": { \"en\": \"Modify lifecycle action for continue\", \"zh-cn\": \"修改伸缩活动的等待状态为继续完成\" }, \"OnSuccess\": \"ACS::END\", \"Properties\": { \"Service\": \"ESS\", \"API\": \"CompleteLifecycleAction\", \"Parameters\": { \"RegionId\": \"{{ regionId }}\", \"LifecycleHookId\": \"{{ lifecycleHookId }}\", \"LifecycleActionToken\": \"{{ lifecycleActionToken }}\", \"LifecycleActionResult\": \"CONTINUE\" } } }, { \"Name\": \"CompleteLifecycleActionForAbandon\", \"Action\": \"ACS::ExecuteAPI\", \"Description\": { \"en\": \"Complete lifecycle action for Abandon\", \"zh-cn\": \"修改伸缩活动的等待状态为弃用\" }, \"Properties\": { \"Service\": \"ESS\", \"API\": \"CompleteLifecycleAction\", \"Parameters\": { \"RegionId\": \"{{ regionId }}\", \"LifecycleHookId\": \"{{ lifecycleHookId }}\", \"LifecycleActionToken\": \"{{ lifecycleActionToken }}\", \"LifecycleActionResult\": \"ABANDON\" } } } ]}" ] ] }, "TemplateName": { "Fn::Join": [ "", [ "ScaleIn-", { "Ref": "ALIYUN::StackId" } ] ] } } }, "TiDBServerScaleOutLifecycleHook": { "Type": "ALIYUN::ESS::LifecycleHook", "DependsOn": [ "TiDBServerScalingGroupEnable", "ScaleOutTemplate" ], "Properties": { "ScalingGroupId": { "Ref": "TiDBServerScalingGroup" }, "LifecycleTransition": "SCALE_OUT", "DefaultResult": "CONTINUE", "HeartbeatTimeout": 600, "NotificationArn": { "Fn::Join": [ "", [ "acs:ess:", { "Ref": "ALIYUN::Region" }, ":", { "Ref": "ALIYUN::TenantId" }, ":oos/", { "Fn::Join": [ "", [ "ScaleOut-", { "Ref": "ALIYUN::StackId" } ] ] } ] ] }, "NotificationMetadata": { "Fn::Join": [ "", [ "{\"regionId\": \"${regionId}\",\"instanceIds\": \"${instanceIds}\",\"lifecycleHookId\": \"${lifecycleHookId}\",\"lifecycleActionToken\": \"${lifecycleActionToken}\",\"ServerType\": \"tidb_servers\"}" ] ] } } }, "TiDBServerScaleInLifecycleHook": { "Type": "ALIYUN::ESS::LifecycleHook", "DependsOn": [ "TiDBServerScalingGroupEnable", "ScaleInTemplate" ], "Properties": { "ScalingGroupId": { "Ref": "TiDBServerScalingGroup" }, "LifecycleTransition": "SCALE_IN", "DefaultResult": "CONTINUE", "HeartbeatTimeout": 600, "NotificationArn": { "Fn::Join": [ "", [ "acs:ess:", { "Ref": "ALIYUN::Region" }, ":", { "Ref": "ALIYUN::TenantId" }, ":oos/", { "Fn::Join": [ "", [ "ScaleIn-", { "Ref": "ALIYUN::StackId" } ] ] } ] ] }, "NotificationMetadata": { "Fn::Join": [ "", [ "{\"regionId\": \"${regionId}\",\"instanceIds\": \"${instanceIds}\",\"lifecycleHookId\": \"${lifecycleHookId}\",\"lifecycleActionToken\": \"${lifecycleActionToken}\",\"ServerType\": \"tidb_servers\"}" ] ] } } }, "TiDBServer": { "Type": "ALIYUN::ECS::InstanceGroup", "Properties": { "InstanceName": { "Fn::Sub": [ "TiDBServer-[1,${Count}]", { "Count": { "Ref": "TiDBServerCount" } } ] }, "ImageId": "centos_7.9", "InstanceType": { "Fn::FindInMap": [ "InstanceProfile", 
"TiDBServer", "InstanceType" ] }, "VpcId": { "Ref": "VPC" }, "VSwitchId": { "Ref": "VSwitch" }, "SecurityGroupId": { "Ref": "InstanceSecurityGroup" }, "AllocatePublicIP": false, "Password": { "Ref": "InstancePassword" }, "MaxAmount": { "Ref": "TiDBServerCount" }, "SystemDiskSize": { "Ref": "SystemDiskSize" }, "InstanceChargeType": "PostPaid", "SystemDiskCategory": { "Ref": "Category" }, "UserData": { "Fn::Join": [ "", [ "#!/bin/sh\n", "ssh_pub_key='", { "Fn::GetAtt": [ "WaitCondition", "Data" ] }, "'\n", "yum install -y jq\n", "pub_key=`echo \"$ssh_pub_key\" | jq '.ssh_pub_key' | xargs echo `\n", "echo \"$pub_key\" > /root/.ssh/authorized_keys\n", "chmod 600 /root/.ssh/authorized_keys\n", "yum install -y numactl\n", "mount -t ext4 -o remount,nodelalloc,noatime / \n", "echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login\n", "echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf\n", "echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf\n", "sysctl -p\n", "echo '* soft stack 10240' >> /etc/security/limits.conf\n", "echo '* hard nofile 1000000' >> /etc/security/limits.conf\n", "echo '* soft nofile 1000000' >> /etc/security/limits.conf\n", "ulimit -SHn 1000000\n", "echo 1200000 > /proc/sys/fs/file-max\n", "echo never > /sys/kernel/mm/transparent_hugepage/enabled\n", "echo never > /sys/kernel/mm/transparent_hugepage/defrag\n", "service irqbalance start\n" ] ] }, "Tags": [ { "Key": "Application", "Value": { "Ref": "ALIYUN::StackId" } }, { "Key": "Name", "Value": "TiDBServer" } ] } }, "TiPDServerScalingGroup": { "Type": "ALIYUN::ESS::ScalingGroup", "Properties": { "MinSize": 3, "DefaultCooldown": 300, "VSwitchId": { "Ref": "VSwitch" }, "RemovalPolicys": [ "NewestInstance" ], "MaxSize": 4, "ScalingGroupName": { "Fn::Join": [ "-", [ "TiPDServer", { "Ref": "ALIYUN::StackId" } ] ] } } }, "TiPDServerScalingConfiguration": { "Type": "ALIYUN::ESS::ScalingConfiguration", "Properties": { "ScalingConfigurationName": { "Fn::Join": [ "-", [ "TiPDServer", { "Ref": "ALIYUN::StackId" } ] ] }, "InstanceType": { "Fn::FindInMap": [ "InstanceProfile", "TiPDServer", "InstanceType" ] }, "SystemDiskCategory": { "Ref": "Category" }, "SystemDiskSize": { "Ref": "SystemDiskSize" }, "ScalingGroupId": { "Ref": "TiPDServerScalingGroup" }, "SecurityGroupId": { "Ref": "InstanceSecurityGroup" }, "ImageId": "centos_7_9_x64_20G_alibase_20211227.vhd", "UserData": { "Fn::Join": [ "", [ "#!/bin/sh\n", "ssh_pub_key='", { "Fn::GetAtt": [ "WaitCondition", "Data" ] }, "'\n", "yum install -y jq\n", "pub_key=`echo \"$ssh_pub_key\" | jq '.ssh_pub_key' | xargs echo `\n", "echo \"$pub_key\" > /root/.ssh/authorized_keys\n", "chmod 600 /root/.ssh/authorized_keys\n", "yum install -y numactl\n", "mount -t ext4 -o remount,nodelalloc,noatime / \n", "echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login\n", "echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf\n", "echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf\n", "sysctl -p\n", "echo '* soft stack 10240' >> /etc/security/limits.conf\n", "echo '* hard nofile 1000000' >> /etc/security/limits.conf\n", "echo '* soft nofile 1000000' >> /etc/security/limits.conf\n", "ulimit -SHn 1000000\n", "echo 1200000 > /proc/sys/fs/file-max\n", "echo never > /sys/kernel/mm/transparent_hugepage/enabled\n", "echo never > /sys/kernel/mm/transparent_hugepage/defrag\n", "service irqbalance start\n" ] ] }, "TagList": [ { "Key": "Application", "Value": { "Ref": "ALIYUN::StackId" } }, { "Key": "Name", "Value": "TiDBServer" } ] } }, "TiPDServerScalingRule": { "Type": 
"ALIYUN::ESS::ScalingRule", "Properties": { "ScalingRuleName": { "Fn::Join": [ "-", [ "TiPDServer", { "Ref": "ALIYUN::StackId" } ] ] }, "AdjustmentValue": 4, "ScalingGroupId": { "Ref": "TiPDServerScalingGroup" }, "AdjustmentType": "TotalCapacity" } }, "TiPDServerScalingGroupEnable": { "DependsOn": [ "TiPDServerScalingConfiguration", "TiPDServerScalingRule", "TiPDServerScalingGroup" ], "Type": "ALIYUN::ESS::ScalingGroupEnable", "Properties": { "ScalingRuleArisExecuteVersion": "1", "ScalingGroupId": { "Ref": "TiPDServerScalingGroup" }, "ScalingConfigurationId": { "Ref": "TiPDServerScalingConfiguration" }, "InstanceIds": { "Fn::GetAtt": [ "TiPDServer", "InstanceIds" ] } } }, "TiPDServerScaleInLifecycleHook": { "Type": "ALIYUN::ESS::LifecycleHook", "DependsOn": [ "TiPDServerScalingGroupEnable", "ScaleInTemplate" ], "Properties": { "ScalingGroupId": { "Ref": "TiPDServerScalingGroup" }, "LifecycleTransition": "SCALE_IN", "DefaultResult": "CONTINUE", "HeartbeatTimeout": 600, "NotificationArn": { "Fn::Join": [ "", [ "acs:ess:", { "Ref": "ALIYUN::Region" }, ":", { "Ref": "ALIYUN::TenantId" }, ":oos/", { "Fn::Join": [ "", [ "ScaleIn-", { "Ref": "ALIYUN::StackId" } ] ] } ] ] }, "NotificationMetadata": { "Fn::Join": [ "", [ "{\"regionId\": \"${regionId}\",\"instanceIds\": \"${instanceIds}\",\"lifecycleHookId\": \"${lifecycleHookId}\",\"lifecycleActionToken\": \"${lifecycleActionToken}\",\"ServerType\": \"tidb_servers\"}" ] ] } } }, "TiPDServerOutLifecycleHook": { "Type": "ALIYUN::ESS::LifecycleHook", "DependsOn": [ "TiPDServerScalingGroupEnable", "ScaleOutTemplate" ], "Properties": { "ScalingGroupId": { "Ref": "TiPDServerScalingGroup" }, "LifecycleTransition": "SCALE_OUT", "DefaultResult": "CONTINUE", "HeartbeatTimeout": 600, "NotificationArn": { "Fn::Join": [ "", [ "acs:ess:", { "Ref": "ALIYUN::Region" }, ":", { "Ref": "ALIYUN::TenantId" }, ":oos/", { "Fn::Join": [ "", [ "ScaleOut-", { "Ref": "ALIYUN::StackId" } ] ] } ] ] }, "NotificationMetadata": { "Fn::Join": [ "", [ "{\"regionId\": \"${regionId}\",\"instanceIds\": \"${instanceIds}\",\"lifecycleHookId\": \"${lifecycleHookId}\",\"lifecycleActionToken\": \"${lifecycleActionToken}\",\"ServerType\": \"pd_servers\"}" ] ] } } }, "TiPDServer": { "Type": "ALIYUN::ECS::InstanceGroup", "Properties": { "InstanceName": { "Fn::Sub": [ "TiPDServer-[1,${Count}]", { "Count": { "Ref": "TiPDServerCount" } } ] }, "ImageId": "centos_7.9", "InstanceType": { "Fn::FindInMap": [ "InstanceProfile", "TiPDServer", "InstanceType" ] }, "VpcId": { "Ref": "VPC" }, "VSwitchId": { "Ref": "VSwitch" }, "SecurityGroupId": { "Ref": "InstanceSecurityGroup" }, "AllocatePublicIP": false, "MaxAmount": { "Ref": "TiPDServerCount" }, "Password": { "Ref": "InstancePassword" }, "SystemDiskSize": { "Ref": "SystemDiskSize" }, "InstanceChargeType": "PostPaid", "SystemDiskCategory": { "Ref": "Category" }, "UserData": { "Fn::Join": [ "", [ "#!/bin/sh\n", "ssh_pub_key='", { "Fn::GetAtt": [ "WaitCondition", "Data" ] }, "'\n", "yum install -y jq\n", "pub_key=`echo \"$ssh_pub_key\" | jq '.ssh_pub_key' | xargs echo `\n", "echo \"$pub_key\" > /root/.ssh/authorized_keys\n", "chmod 600 /root/.ssh/authorized_keys\n", "yum install -y numactl\n", "mount -t ext4 -o remount,nodelalloc,noatime / \n", "echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login\n", "echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf\n", "echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf\n", "sysctl -p\n", "echo '* soft stack 10240' >> /etc/security/limits.conf\n", "echo '* hard nofile 1000000' >> 
/etc/security/limits.conf\n", "echo '* soft nofile 1000000' >> /etc/security/limits.conf\n", "ulimit -SHn 1000000\n", "echo 1200000 > /proc/sys/fs/file-max\n", "echo never > /sys/kernel/mm/transparent_hugepage/enabled\n", "echo never > /sys/kernel/mm/transparent_hugepage/defrag\n", "service irqbalance start\n" ] ] }, "Tags": [ { "Key": "Application", "Value": { "Ref": "ALIYUN::StackId" } }, { "Key": "Name", "Value": "TiPDServer" } ] }, "DependsOn": "WaitCondition" }, "Command": { "Type": "ALIYUN::ECS::Command", "DependsOn": [ "InvocationWaitConditionHandle" ], "Properties": { "Timeout": "1200", "CommandContent": { "Fn::Base64Encode": { "Fn::Replace": [ { "ros-notify": { "Fn::GetAtt": [ "InvocationWaitConditionHandle", "CurlCli" ] } }, { "Fn::Join": [ "", [ "#!/bin/sh \n", "sudo curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sudo sh - \n", "source /root/.bash_profile\n", "which tiup\n", "tiup cluster\n", "tiup update --self && tiup update cluster\n", "tiup --binary cluster\n", "yum install -y expect\n", "yum install -y numactl\n", "yum install -y mysql\n", "mount -t ext4 -o remount,nodelalloc,noatime / \n", "echo '* soft stack 10240' >> /etc/security/limits.conf\n", "echo '* hard nofile 1000000' >> /etc/security/limits.conf\n", "echo '* soft nofile 1000000' >> /etc/security/limits.conf\n", "ulimit -SHn 1000000\n", "echo 'global:' >> ./topology.yaml\n", "echo ' user: \"root\"' >> ./topology.yaml\n", "echo ' ssh_port: 22' >> ./topology.yaml\n", "echo ' deploy_dir: \"/tidb-deploy\"' >> ./topology.yaml\n", "echo ' data_dir: \"/data1\"' >> ./topology.yaml\n", "echo 'server_configs:", "\n pd:\n replication.enable-placement-rules: true", "' >> ./topology.yaml\n", "echo 'pd_servers:' >> ./topology.yaml\n", "echo ' - host: ", { "Fn::Join": [ "\n - host: ", { "Fn::GetAtt": [ "TiPDServer", "PrivateIps" ] } ] }, "' >> ./topology.yaml\n", "echo tidb_servers: >> ./topology.yaml\n", "echo ' - host: ", { "Fn::Join": [ "\n - host: ", { "Fn::GetAtt": [ "TiDBServer", "PrivateIps" ] } ] }, "' >> ./topology.yaml\n", "echo 'tikv_servers:' >> ./topology.yaml\n", "echo ' - host: ", { "Fn::Join": [ "\n - host: ", { "Fn::GetAtt": [ "TiKVServer", "PrivateIps" ] } ] }, "' >> ./topology.yaml\n", "echo 'monitoring_servers:' >> ./topology.yaml\n", "echo ' - host: ", { "Fn::Join": [ "\n - host: ", { "Fn::GetAtt": [ "ControlServer", "PrivateIps" ] } ] }, "' >> ./topology.yaml\n", "echo 'grafana_servers:' >> ./topology.yaml\n", "echo ' - host: ", { "Fn::Join": [ "\n - host: ", { "Fn::GetAtt": [ "ControlServer", "PrivateIps" ] } ] }, "' >> ./topology.yaml\n", "echo 'alertmanager_servers:' >> ./topology.yaml\n", "echo ' - host: ", { "Fn::Join": [ "\n - host: ", { "Fn::GetAtt": [ "ControlServer", "PrivateIps" ] } ] }, "' >> ./topology.yaml\n", "retry_time=5\n", "until ((retry_time > 60))\n", "do\n", " let retry_time+=5 \n", " sleep 5 \n", " /root/.tiup/bin/tiup cluster check ./topology.yaml --apply --user root -i /root/.ssh/id_rsa --ssh-timeout 120 >> ./topology_check.log\n", " FAIL_MESSAGE=`cat ./topology_check.log | grep 'Timeout'`\n", " if [[ $? 
-eq 1 ]]; \n", " then \n", " ros-notify -d \"{\\\"id\\\" : \\\"27c7347b-352a-4377-1\\\", \\\"Data\\\" : \\\"check Success\\\", \\\"status\\\" : \\\"SUCCESS\\\"}\" \n", " break \n", " else \n", " ros-notify -d \"{\\\"id\\\" : \\\"27c7347b-352a-4377-1\\\", \\\"Data\\\" : \\\"check Timeout: $FAIL_MESSAGE\\\", \\\"status\\\" : \\\"WARNING\\\"}\" \n", " fi \n", "done \n", "/root/.tiup/bin/tiup cluster check ./topology.yaml --apply --user root -i /root/.ssh/id_rsa --ssh-timeout 120 >> ./topology_retry_check.log\n", "FAIL_MESSAGE=`cat ./topology_retry_check.log | grep 'Timeout'`\n", "if [[ $? -eq 1 ]]; \n", "then \n", "ros-notify -d \"{\\\"id\\\" : \\\"27c7347b-352a-4377-2\\\", \\\"Data\\\" : \\\"check check Success\\\", \\\"status\\\" : \\\"SUCCESS\\\"}\" \n", "else \n", "ros-notify -d \"{\\\"id\\\" : \\\"27c7347b-352a-4377-2\\\", \\\"Data\\\" : \\\"check check failed: $FAIL_MESSAGE\\\", \\\"status\\\" : \\\"FAILURE\\\"}\" \n", "fi \n", "echo yes | /root/.tiup/bin/tiup cluster deploy tidb-test v5.2.0 ./topology.yaml --user root -i /root/.ssh/id_rsa\n", "/root/.tiup/bin/tiup cluster start tidb-test >> ./topology_start.log\n", "FAIL_MESSAGE=`cat ./topology_start.log | grep 'Fail'`\n", "if [[ $? -eq 1 ]]; \n", "then \n", "ros-notify -d \"{\\\"id\\\" : \\\"27c7347b-352a-4377-3\\\", \\\"Data\\\" : \\\"deploy Success\\\", \\\"status\\\" : \\\"SUCCESS\\\"}\" \n", "else \n", "ros-notify -d \"{\\\"Data\\\" : \\\"deploy failed\\\", \\\"status\\\" : \\\"FAILURE\\\"}\" \n", "fi \n", "DASHBOARD_IP=`tiup cluster display tidb-test --dashboard | awk -F '/' '{print $3}' | awk -F ':' '{print $1}'`\n", "DASHBOARD_ID=`echo yes | ssh -o StrictHostKeyChecking=no root@$DASHBOARD_IP \"curl 100.100.100.200/latest/meta-data/instance-id\" | awk '{print $1}'`\n", "ros-notify -d \"{\\\"status\\\" : \\\"SUCCESS\\\",\\\"id\\\" : \\\"DASHBOARD\\\", \\\"data\\\" : \\\"$DASHBOARD_ID\\\"}\" \n" ] ] } ] } }, "Type": "RunShellScript", "Name": { "Fn::Join": [ "-", [ "ControlServer", { "Ref": "ALIYUN::StackId" } ] ] } } }, "Invocation": { "Type": "ALIYUN::ECS::Invocation", "DependsOn": [ "TiKVServer", "TiPDServer", "TiDBServer", "TiKVServer" ], "Properties": { "CommandId": { "Ref": "Command" }, "InstanceIds": { "Fn::GetAtt": [ "ControlServer", "InstanceIds" ] } } }, "TiKVServerScalingGroup": { "Type": "ALIYUN::ESS::ScalingGroup", "Properties": { "MinSize": 3, "DefaultCooldown": 300, "VSwitchId": { "Ref": "VSwitch" }, "RemovalPolicys": [ "NewestInstance" ], "MaxSize": 4, "ScalingGroupName": { "Fn::Join": [ "-", [ "TiKVServer", { "Ref": "ALIYUN::StackId" } ] ] } } }, "TiKVServerScalingConfiguration": { "Type": "ALIYUN::ESS::ScalingConfiguration", "Properties": { "ScalingConfigurationName": { "Fn::Join": [ "-", [ "TiKVServer", { "Ref": "ALIYUN::StackId" } ] ] }, "InstanceType": { "Fn::FindInMap": [ "InstanceProfile", "TiKVServer", "InstanceType" ] }, "SystemDiskCategory": { "Ref": "Category" }, "SystemDiskSize": { "Ref": "SystemDiskSize" }, "ScalingGroupId": { "Ref": "TiKVServerScalingGroup" }, "SecurityGroupId": { "Ref": "InstanceSecurityGroup" }, "ImageId": "centos_7_9_x64_20G_alibase_20211227.vhd", "UserData": { "Fn::Join": [ "", [ "#!/bin/sh\n", "ssh_pub_key='", { "Fn::GetAtt": [ "WaitCondition", "Data" ] }, "'\n", "yum install -y jq\n", "pub_key=`echo \"$ssh_pub_key\" | jq '.ssh_pub_key' | xargs echo `\n", "echo \"$pub_key\" > /root/.ssh/authorized_keys\n", "chmod 600 /root/.ssh/authorized_keys\n", "yum install -y numactl\n", "mount -t ext4 -o remount,nodelalloc,noatime / \n", "echo 'session required /lib/security/pam_limits.so' >> 
/etc/pam.d/login\n", "echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf\n", "echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf\n", "sysctl -p\n", "echo '* soft stack 10240' >> /etc/security/limits.conf\n", "echo '* hard nofile 1000000' >> /etc/security/limits.conf\n", "echo '* soft nofile 1000000' >> /etc/security/limits.conf\n", "ulimit -SHn 1000000\n", "echo 1200000 > /proc/sys/fs/file-max\n", "echo never > /sys/kernel/mm/transparent_hugepage/enabled\n", "echo never > /sys/kernel/mm/transparent_hugepage/defrag\n", "service irqbalance start\n" ] ] }, "TagList": [ { "Key": "Application", "Value": { "Ref": "ALIYUN::StackId" } }, { "Key": "Name", "Value": "TiDBServer" } ] } }, "TiKVServerScalingRule": { "Type": "ALIYUN::ESS::ScalingRule", "Properties": { "ScalingRuleName": { "Fn::Join": [ "-", [ "TiKVServer", { "Ref": "ALIYUN::StackId" } ] ] }, "AdjustmentValue": 4, "ScalingGroupId": { "Ref": "TiKVServerScalingGroup" }, "AdjustmentType": "TotalCapacity" } }, "TiKVServerScalingGroupEnable": { "DependsOn": [ "TiKVServerScalingConfiguration", "TiKVServerScalingRule", "TiKVServerScalingGroup" ], "Type": "ALIYUN::ESS::ScalingGroupEnable", "Properties": { "ScalingRuleArisExecuteVersion": "1", "ScalingGroupId": { "Ref": "TiKVServerScalingGroup" }, "ScalingConfigurationId": { "Ref": "TiKVServerScalingConfiguration" }, "InstanceIds": { "Fn::GetAtt": [ "TiKVServer", "InstanceIds" ] } } }, "TiKVServerOutLifecycleHook": { "Type": "ALIYUN::ESS::LifecycleHook", "DependsOn": [ "TiKVServerScalingGroupEnable", "ScaleOutTemplate" ], "Properties": { "ScalingGroupId": { "Ref": "TiKVServerScalingGroup" }, "LifecycleTransition": "SCALE_OUT", "DefaultResult": "CONTINUE", "HeartbeatTimeout": 600, "NotificationArn": { "Fn::Join": [ "", [ "acs:ess:", { "Ref": "ALIYUN::Region" }, ":", { "Ref": "ALIYUN::TenantId" }, ":oos/", { "Fn::Join": [ "", [ "ScaleOut-", { "Ref": "ALIYUN::StackId" } ] ] } ] ] }, "NotificationMetadata": { "Fn::Join": [ "", [ "{\"regionId\": \"${regionId}\",\"instanceIds\": \"${instanceIds}\",\"lifecycleHookId\": \"${lifecycleHookId}\",\"lifecycleActionToken\": \"${lifecycleActionToken}\",\"ServerType\": \"tikv_servers\"}" ] ] } } }, "TiKVServerScaleInLifecycleHook": { "Type": "ALIYUN::ESS::LifecycleHook", "DependsOn": [ "TiKVServerScalingGroupEnable", "ScaleInTemplate" ], "Properties": { "ScalingGroupId": { "Ref": "TiKVServerScalingGroup" }, "LifecycleTransition": "SCALE_IN", "DefaultResult": "CONTINUE", "HeartbeatTimeout": 600, "NotificationArn": { "Fn::Join": [ "", [ "acs:ess:", { "Ref": "ALIYUN::Region" }, ":", { "Ref": "ALIYUN::TenantId" }, ":oos/", { "Fn::Join": [ "", [ "ScaleIn-", { "Ref": "ALIYUN::StackId" } ] ] } ] ] }, "NotificationMetadata": { "Fn::Join": [ "", [ "{\"regionId\": \"${regionId}\",\"instanceIds\": \"${instanceIds}\",\"lifecycleHookId\": \"${lifecycleHookId}\",\"lifecycleActionToken\": \"${lifecycleActionToken}\",\"ServerType\": \"tikv_servers\"}" ] ] } } }, "TiKVServer": { "Type": "ALIYUN::ECS::InstanceGroup", "Properties": { "InstanceName": { "Fn::Sub": [ "TiKVServer-[1,${Count}]", { "Count": { "Ref": "TiKVServerCount" } } ] }, "ImageId": "centos_7.9", "InstanceType": { "Fn::FindInMap": [ "InstanceProfile", "TiKVServer", "InstanceType" ] }, "Password": { "Ref": "InstancePassword" }, "DiskMappings": [ { "Category": "cloud_essd", "Device": "/dev/xvdb", "Size": { "Ref": "DateDiskSize" } } ], "VpcId": { "Ref": "VPC" }, "VSwitchId": { "Ref": "VSwitch" }, "SecurityGroupId": { "Ref": "InstanceSecurityGroup" }, "AllocatePublicIP": false, "MaxAmount": { "Ref": 
"TiKVServerCount" }, "SystemDiskSize": { "Ref": "SystemDiskSize" }, "InstanceChargeType": "PostPaid", "SystemDiskCategory": { "Ref": "Category" }, "UserData": { "Fn::Join": [ "", [ "#!/bin/sh\n", "ssh_pub_key='", { "Fn::GetAtt": [ "WaitCondition", "Data" ] }, "'\n", "yum install -y jq\n", "pub_key=`echo \"$ssh_pub_key\" | jq '.ssh_pub_key' | xargs echo `\n", "echo \"$pub_key\" > /root/.ssh/authorized_keys\n", "chmod 600 /root/.ssh/authorized_keys\n", "yum install -y numactl\n", "echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login\n", "echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf\n", "echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf\n", "sysctl -p\n", "echo '* soft stack 10240' >> /etc/security/limits.conf\n", "echo '* hard nofile 1000000' >> /etc/security/limits.conf\n", "echo '* soft nofile 1000000' >> /etc/security/limits.conf\n", "ulimit -SHn 1000000\n", "echo 1200000 > /proc/sys/fs/file-max\n", "echo never > /sys/kernel/mm/transparent_hugepage/enabled\n", "echo never > /sys/kernel/mm/transparent_hugepage/defrag\n", "service irqbalance start\n", "mount -t ext4 -o remount,nodelalloc,noatime / \n", "parted -s -a optimal /dev/vdb mklabel gpt -- mkpart primary ext4 1 -1\n", "mkfs.ext4 /dev/vdb\n", "mkdir /data1 && mount /dev/vdb /data1\n", "echo /dev/vdb /data1 ext4 defaults,nodelalloc,noatime 0 2 >> /etc/fstab\n", "mount -t ext4 -o remount,nodelalloc,noatime /data1 \n", "mount -t ext4\n", "echo 'vm.swappiness = 0' >> /etc/sysctl.conf\n", "swapoff -a && swapon -a\n", "sysctl -p\n", "sudo systemctl stop firewalld.service\n", "sudo systemctl disable firewalld.service\n", "ID_SERIAL=`udevadm info --name=/dev/vdb | grep ID_SERIAL | awk -F '=' '{print $2}'`\n", "M_ACTIVE=`grubby --default-kernel`\n" ] ] }, "Tags": [ { "Key": "Application", "Value": { "Ref": "ALIYUN::StackId" } }, { "Key": "Name", "Value": "TiKVServer" } ] }, "DependsOn": [ "WaitCondition" ] } }, "Outputs": { "TiDB Dashboard:": { "Value": { "Fn::Sub": [ "http://${ServerAddress}:2379/dashboard", { "ServerAddress": { "Fn::GetAtt": [ "TiPDServerEip", "EipAddress" ] } } ] } }, "Grafana:": { "Value": { "Fn::Sub": [ "http://${ServerAddress}:3000", { "ServerAddress": { "Fn::GetAtt": [ "ControlServerEip", "EipAddress" ] } } ] } }, "ErrorData": { "Description": "JSON serialized dict containing data associated with wait condition error signals sent to the handle.", "Value": { "Fn::GetAtt": [ "InvocationWaitCondition", "ErrorData" ] } } } }
YAML
ROSTemplateFormatVersion: '2015-09-01' Description: zh-cn: TiDB 集群最小部署的拓扑架构。 en: The minimal deployment topology of TiDB cluster. Mappings: InstanceProfile: ControlServer: Description: Instance for monitoring, grafana, alertmanager. InstanceType: ecs.c6e.large TiDBServer: Description: Better to be compute optimized. InstanceType: ecs.c6e.large TiPDServer: Description: TiDB Placement Driver. InstanceType: ecs.c6e.large TiKVServer: Description: Better to be storage optimized. InstanceType: ecs.c6e.large Parameters: InstancePassword: NoEcho: true Type: String Description: en: Server login password, Length 8-30, must contain three(Capital letters, lowercase letters, numbers, ()`~!@#$%^&*_-+=|{}[]:;'<>,.?/ Special symbol in). zh-cn: 服务器登录密码,长度8-30,必须包含三项(大写字母、小写字母、数字、 ()`~!@#$%^&*_-+=|{}[]:;'<>,.?/ 中的特殊符号)。 AllowedPattern: '[0-9A-Za-z\_\-\&:;''<>,=%`~!@#\(\)\$\^\*\+\|\{\}\[\]\.\?\/]+$' Label: en: Instance Password zh-cn: 实例密码 ConstraintDescription: en: Length 8-30, must contain three(Capital letters, lowercase letters, numbers, ()`~!@#$%^&*_-+=|{}[]:;'<>,.?/ Special symbol in). zh-cn: 长度8-30,必须包含三项(大写字母、小写字母、数字、 ()`~!@#$%^&*_-+=|{}[]:;'<>,.?/ 中的特殊符号)。 MinLength: 8 MaxLength: 30 TiDBServerCount: Type: Number Description: en: The number of TiDBServer. <br>Client connections can be evenly distributed among multiple TiDB instances to achieve load balancing. TiDB Server itself does not store data, but parses SQL, and forwards the actual data read request to the underlying storage node TiKV (or TiFlash). zh-cn: TiDBServer 数量。<br>客户端的连接可以均匀地分摊在多个 TiDB 实例上以达到负载均衡的效果。TiDB Server 本身并不存储数据,只是解析 SQL,将实际的数据读取请求转发给底层的存储节点 TiKV(或 TiFlash)。 MinValue: 3 Label: en: TiDB Server Count zh-cn: TiDB Server 数量 Default: 3 TiPDServerCount: Type: Number Description: en: The number of TiPDServer. <br>The meta-information management module of the entire TiDB cluster is responsible for storing the real-time data distribution of each TiKV node and the overall topology of the cluster, providing the TiDB Dashboard control interface, and assigning transaction IDs for distributed transactions. PD not only stores meta-information, but also issues data scheduling commands to specific TiKV nodes based on the data distribution status reported by TiKV nodes in real time. In addition, the PD itself is also composed of at least 3 nodes and has high-availability capabilities. It is recommended to deploy an odd number of PD nodes. <br>This test turns on 1 node. zh-cn: TiPDServer 数量。 <br>整个 TiDB 集群的元信息管理模块,负责存储每个 TiKV 节点实时的数据分布情况和集群的整体拓扑结构,提供 TiDB Dashboard 管控界面,并为分布式事务分配事务 ID。PD 不仅存储元信息,同时还会根据 TiKV 节点实时上报的数据分布状态,下发数据调度命令给具体的 TiKV 节点。此外,PD 本身也是由至少 3 个节点构成,拥有高可用的能力。建议部署奇数个 PD 节点。<br>本测试开启 1 个节点。 MinValue: 3 Label: en: TiPD Server Count zh-cn: TiPD Server 数量 Default: 3 TiKVServerCount: Type: Number Description: en: 'The number of TiKV Servers. <br>Storage node: Responsible for storing data. From the outside, TiKV is a distributed Key-Value storage engine that provides transactions. The basic unit of data storage is Region. Each Region is responsible for storing data in a Key Range (the left-closed and right-open interval from StartKey to EndKey). Each TiKV node is responsible for multiple Regions. TiKV''s API provides native support for distributed transactions at the KV key-value pair level, and provides the SI (Snapshot Isolation) isolation level by default, which is also the core of TiDB''s support for distributed transactions at the SQL level. 
After the SQL layer of TiDB finishes the SQL analysis, it will convert the SQL execution plan into the actual call to the TiKV API. Therefore, the data is stored in TiKV. In addition, the data in TiKV will automatically maintain multiple copies (three copies by default), which naturally supports high availability and automatic failover.' zh-cn: 'TiKV Server 数量。<br>存储节点: 负责存储数据,从外部看 TiKV 是一个分布式的提供事务的 Key-Value 存储引擎。存储数据的基本单位是 Region,每个 Region 负责存储一个 Key Range(从 StartKey 到 EndKey 的左闭右开区间)的数据,每个 TiKV 节点会负责多个 Region。TiKV 的 API 在 KV 键值对层面提供对分布式事务的原生支持,默认提供了 SI (Snapshot Isolation) 的隔离级别,这也是 TiDB 在 SQL 层面支持分布式事务的核心。TiDB 的 SQL 层做完 SQL 解析后,会将 SQL 的执行计划转换为对 TiKV API 的实际调用。所以,数据都存储在 TiKV 中。另外,TiKV 中的数据都会自动维护多副本(默认为三副本),天然支持高可用和自动故障转移。' MinValue: 3 Label: en: TiKV Server Count zh-cn: TiKV Server 数量 Default: 3 DateDiskSize: Default: 1000 Type: Number Description: zh-cn: TiKV 集群的数据容量,TiKV 硬盘大小配置建议 PCI-E SSD 不超过 2 TB,普通 SSD 不超过 1.5 TB。 单位:GB。 en: 'The data capacity of TiKV cluster, TiKV hard disk size configuration recommended PCI-E SSD not exceed 2 TB, ordinary SSD not exceed 1.5 TB. Unit: GB.' Label: zh-cn: TiKV 数据盘空间 en: TiKV Date Disk Space SystemDiskSize: Default: 40 Type: Number Description: zh-cn: 各个节点系统盘大小, 取值范围:[40, 500], 单位:GB。 en: 'System disk size of each node, range of values: 40-500, units: GB.' Label: zh-cn: 系统盘空间 en: System Disk Space Category: Type: String Description: en: '<font color=''blue''><b>Optional values:</b></font><br>[cloud_efficiency: <font color=''green''>Efficient Cloud Disk</font>]<br>[cloud_ssd: <font color=''green''>SSD Cloud Disk</font>]<br>[cloud_essd: <font color=''green''>ESSD Cloud Disk</font>]<br>[cloud: <font color=''green''>Cloud Disk</font>]<br>[ephemeral_ssd: <font color=''green''>Local SSD Cloud Disk</font>]' zh-cn: '<font color=''blue''><b>可选值:</b></font><br>[cloud_efficiency: <font color=''green''>高效云盘</font>]<br>[cloud_ssd: <font color=''green''>SSD云盘</font>]<br>[cloud_essd: <font color=''green''>ESSD云盘</font>]<br>[cloud: <font color=''green''>普通云盘</font>]<br>[ephemeral_ssd: <font color=''green''>本地SSD盘</font>]' AllowedValues: - cloud_efficiency - cloud_ssd - cloud - cloud_essd - ephemeral_ssd Label: en: System Disk Category zh-cn: 系统盘类型 Default: cloud_essd Metadata: ALIYUN::ROS::Interface: ParameterGroups: - Parameters: - TiDBServerCount - TiPDServerCount - TiKVServerCount Label: default: en: Topological information zh-cn: 拓扑信息 - Parameters: - DateDiskSize - SystemDiskSize - Category Label: default: en: Disk configuration zh-cn: 磁盘配置 - Parameters: - InstancePassword Label: default: en: ECS configuration zh-cn: ECS配置 TemplateTags: - The minimal deployment topology of TiDB cluster Resources: VPC: Type: ALIYUN::ECS::VPC Properties: CidrBlock: 10.0.0.0/16 Tags: - Key: Application Value: Ref: ALIYUN::StackId VSwitch: Type: ALIYUN::ECS::VSwitch Properties: VpcId: Ref: VPC ZoneId: Fn::Select: - '1' - Fn::GetAZs: Ref: ALIYUN::Region CidrBlock: 10.0.1.0/24 Tags: - Key: Application Value: Ref: ALIYUN::StackId InstanceSecurityGroup: Type: ALIYUN::ECS::SecurityGroup Properties: VpcId: Ref: VPC SecurityGroupIngress: - IpProtocol: tcp PortRange: 1/65535 SourceCidrIp: 0.0.0.0/0 TiPDServerEip: Type: ALIYUN::VPC::EIP Properties: InternetChargeType: PayByTraffic Bandwidth: 5 EipAssociationTiPDServer: Type: ALIYUN::VPC::EIPAssociation Properties: InstanceId: Fn::Jq: - First - .DASHBOARD - Fn::GetAtt: - InvocationWaitCondition - Data AllocationId: Fn::GetAtt: - TiPDServerEip - AllocationId ControlServerEip: Type: ALIYUN::VPC::EIP Properties: InternetChargeType: PayByTraffic 
Bandwidth: 5 EipAssociationControlServer: Type: ALIYUN::VPC::EIPAssociation Properties: InstanceId: Fn::Select: - '0' - Fn::GetAtt: - ControlServer - InstanceIds AllocationId: Fn::GetAtt: - ControlServerEip - AllocationId WaitCondition: Type: ALIYUN::ROS::WaitCondition Properties: Count: 1 Handle: Ref: WaitConditionHandle Timeout: 1800 WaitConditionHandle: Type: ALIYUN::ROS::WaitConditionHandle InvocationWaitCondition: Type: ALIYUN::ROS::WaitCondition DependsOn: Command Properties: Count: 3 Handle: Ref: InvocationWaitConditionHandle Timeout: 1800 InvocationWaitConditionHandle: Type: ALIYUN::ROS::WaitConditionHandle ControlServer: Type: ALIYUN::ECS::InstanceGroup Properties: InstanceName: ControlServer ImageId: centos_7.9 VpcId: Ref: VPC SecurityGroupId: Ref: InstanceSecurityGroup VSwitchId: Ref: VSwitch InstanceType: Fn::FindInMap: - InstanceProfile - ControlServer - InstanceType Password: Ref: InstancePassword MaxAmount: 1 AllocatePublicIP: false SystemDiskSize: Ref: SystemDiskSize InstanceChargeType: PostPaid SystemDiskCategory: Ref: Category UserData: Fn::Replace: - ros-notify: Fn::GetAtt: - WaitConditionHandle - CurlCli - Fn::Join: - '' - - | #!/bin/sh - | ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa - | cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys - | chmod 0600 ~/.ssh/authorized_keys - | pub_key=`cat /root/.ssh/id_rsa.pub` - | echo '# -*- coding: utf-8 -*-' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'import fcntl' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'import os' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'scale_out_path = r"/tmp/compute-nest-templates-scale-out.json"' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'class Lock:' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' def __init__(self, filename):' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' self.filename = filename' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' self.handle = open(filename, "w")' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' def acquire(self):' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' fcntl.flock(self.handle, fcntl.LOCK_EX)' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' def release(self):' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' fcntl.flock(self.handle, fcntl.LOCK_UN)' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' def __del__(self):' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' self.handle.close()' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'with open("/tmp/params.txt", "r") as f:' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' content = f.read()' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'server_type, host = content.split(",")' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'host = host.strip()' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'scale_out_dict = {}' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'if server_type == "tidb_servers":' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' scale_out_dict = {server_type: [{"port": 4000,"ssh_port": 22,"status_port": 10080,"host": host,"deploy_dir": "/data/deploy/install/deploy/tidb-4000","log_dir": "/data/deploy/install/log/tidb-4000",}]}' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'elif server_type == "tikv_servers":' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' scale_out_dict = {server_type: [{"port": 20160,"ssh_port": 22,"host": host,"status_port": 20180,"deploy_dir": 
"/data/deploy/install/deploy/tikv-20160","data_dir": "/data/deploy/install/data/tikv-20160","log_dir": "/data/deploy/install/log/tikv-20160",}]}' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'elif server_type == "pd_servers":' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' scale_out_dict = {server_type: [{"client_port": 2379,"ssh_port": 22,"host": host,"peer_port": 2380,"deploy_dir": "/data/deploy/install/deploy/pd-2379","data_dir": "/data/deploy/install/data/pd-2379","log_dir": "/data/deploy/install/log/pd-2379",}]}' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'lock = Lock(scale_out_path)' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'try:' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' lock.acquire()' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' with open(scale_out_path, "w") as f:' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' import json' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' json.dump(scale_out_dict, f)' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' os.system(' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' "echo yes | /root/.tiup/bin/tiup cluster scale-out tidb-test %s --user "' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' "root -i /root/.ssh/id_rsa" % scale_out_path)' >> '/tmp/compute-nest-templates-scale-out.py' - | echo 'finally:' >> '/tmp/compute-nest-templates-scale-out.py' - | echo ' lock.release()' >> '/tmp/compute-nest-templates-scale-out.py' - | ros-notify -d "{\"status\" : \"SUCCESS\",\"id\" : \"ssh_pub_key\", \"data\" : \"$pub_key\"}" Tags: - Key: Application Value: Ref: ALIYUN::StackId - Key: Name Value: ControlServer TiDBServerScalingGroup: Type: ALIYUN::ESS::ScalingGroup Properties: MinSize: 3 DefaultCooldown: 300 VSwitchId: Ref: VSwitch RemovalPolicys: - NewestInstance MaxSize: 4 ScalingGroupName: Fn::Join: - '-' - - TiDBServer - Ref: ALIYUN::StackId TiDBServerScalingConfiguration: Type: ALIYUN::ESS::ScalingConfiguration Properties: ScalingConfigurationName: Fn::Join: - '-' - - TiDBServe - Ref: ALIYUN::StackId InstanceType: Fn::FindInMap: - InstanceProfile - TiDBServer - InstanceType SystemDiskCategory: Ref: Category SystemDiskSize: Ref: SystemDiskSize ScalingGroupId: Ref: TiDBServerScalingGroup SecurityGroupId: Ref: InstanceSecurityGroup ImageId: centos_7_9_x64_20G_alibase_20211227.vhd UserData: Fn::Join: - '' - - | #!/bin/sh - ssh_pub_key=' - Fn::GetAtt: - WaitCondition - Data - | ' - | yum install -y jq - | pub_key=`echo "$ssh_pub_key" | jq '.ssh_pub_key' | xargs echo ` - | echo "$pub_key" > /root/.ssh/authorized_keys - | chmod 600 /root/.ssh/authorized_keys - | yum install -y numactl - | mount -t ext4 -o remount,nodelalloc,noatime / - | echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login - | echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf - | echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf - | sysctl -p - | echo '* soft stack 10240' >> /etc/security/limits.conf - | echo '* hard nofile 1000000' >> /etc/security/limits.conf - | echo '* soft nofile 1000000' >> /etc/security/limits.conf - | ulimit -SHn 1000000 - | echo 1200000 > /proc/sys/fs/file-max - | echo never > /sys/kernel/mm/transparent_hugepage/enabled - | echo never > /sys/kernel/mm/transparent_hugepage/defrag - | service irqbalance start TagList: - Key: Application Value: Ref: ALIYUN::StackId - Key: Name Value: TiDBServer TiDBServerScalingRule: Type: ALIYUN::ESS::ScalingRule Properties: ScalingRuleName: Fn::Join: - '-' - - TiDBServer - Ref: 
ALIYUN::StackId AdjustmentValue: 4 ScalingGroupId: Ref: TiDBServerScalingGroup AdjustmentType: TotalCapacity TiDBServerScalingGroupEnable: DependsOn: - TiDBServerScalingConfiguration - TiDBServerScalingRule - TiDBServerScalingGroup Type: ALIYUN::ESS::ScalingGroupEnable Properties: ScalingRuleArisExecuteVersion: '1' ScalingGroupId: Ref: TiDBServerScalingGroup ScalingConfigurationId: Ref: TiDBServerScalingConfiguration InstanceIds: Fn::GetAtt: - TiDBServer - InstanceIds RamRole: Type: ALIYUN::RAM::Role Properties: RoleName: Fn::Join: - '' - - StackId- - Ref: ALIYUN::StackId AssumeRolePolicyDocument: Version: '1' Statement: - Action: sts:AssumeRole Effect: Allow Principal: Service: - oos.aliyuncs.com Policies: - PolicyName: Fn::Join: - '' - - StackId- - Ref: ALIYUN::StackId PolicyDocument: Version: '1' Statement: - Action: - ecs:* Resource: - '*' Effect: Allow - Action: - vpc:DescribeVpcs - vpc:DescribeVSwitches Resource: - '*' Effect: Allow - Action: - ess:CompleteLifecycleAction Resource: - '*' Effect: Allow ScaleOutTemplate: Type: ALIYUN::OOS::Template DependsOn: - TiDBServerScalingGroupEnable Properties: Content: Fn::Join: - '' - - '{"FormatVersion": "OOS-2019-06-01", "Parameters": { "commandInstanceId": { "Type": "String", "Default": "' - Fn::Select: - '0' - Fn::GetAtt: - ControlServer - InstanceIds - '" }, "regionId": { "Type": "String", "Default": "' - Ref: ALIYUN::Region - '" }, "instanceIds": { "Type": "List", "Default": [ "${instanceIds}" ] }, "lifecycleHookId": { "Type": "String", "Default": "${lifecycleHookId}" }, "ServerType": { "Type": "String" }, "lifecycleActionToken": { "Type": "String", "Default": "${lifecycleActionToken}" }, "rateControl": { "Type": "Json", "AssociationProperty": "RateControl", "Default": { "Mode": "Concurrency", "MaxErrors": 0, "Concurrency": 1 } }, "OOSAssumeRole": { "Type": "String", "Default": "OOSServiceRole" } }, "RamRole": "{{ OOSAssumeRole }}", "Tasks": [ { "Name": "runCommandOnSpecifiedInstance", "Action": "ACS::ESS::RunCommandOnSpecifiedInstance", "OnError": "CompleteLifecycleActionForAbandon", "OnSuccess": "CompleteLifecycleActionForContinue", "Properties": { "regionId": "{{ regionId }}", "commandContent": "' - '#!/bin/bash\nsleep 60\necho \"{{ ServerType }},$COMMAND_IP\">/tmp/params.txt\nsudo python /tmp/compute-nest-templates-scale-out.py' - '", "commandInstanceId": "{{ commandInstanceId }}", "instanceId": "{{ ACS::TaskLoopItem }}" }, "Loop": { "RateControl": "{{ rateControl }}", "Items": "{{ instanceIds }}", "Outputs": { "commandOutputs": { "AggregateType": "Fn::ListJoin", "AggregateField": "commandOutput" } } }, "Outputs": { "commandOutput": { "Type": "String", "ValueSelector": "invocationOutput" } } }, { "Name": "CompleteLifecycleActionForContinue", "Action": "ACS::ExecuteAPI", "Description": { "en": "Modify lifecycle action for continue", "zh-cn": "修改伸缩活动的等待状态为继续完成" }, "OnSuccess": "ACS::END", "Properties": { "Service": "ESS", "API": "CompleteLifecycleAction", "Parameters": { "RegionId": "{{ regionId }}", "LifecycleHookId": "{{ lifecycleHookId }}", "LifecycleActionToken": "{{ lifecycleActionToken }}", "LifecycleActionResult": "CONTINUE" } } }, { "Name": "CompleteLifecycleActionForAbandon", "Action": "ACS::ExecuteAPI", "Description": { "en": "Complete lifecycle action for Abandon", "zh-cn": "修改伸缩活动的等待状态为弃用" }, "Properties": { "Service": "ESS", "API": "CompleteLifecycleAction", "Parameters": { "RegionId": "{{ regionId }}", "LifecycleHookId": "{{ lifecycleHookId }}", "LifecycleActionToken": "{{ lifecycleActionToken }}", 
"LifecycleActionResult": "ABANDON" } } } ]}' TemplateName: Fn::Join: - '' - - ScaleOut- - Ref: ALIYUN::StackId ScaleInTemplate: Type: ALIYUN::OOS::Template DependsOn: - TiDBServerScalingGroupEnable Properties: Content: Fn::Join: - '' - - '{"FormatVersion": "OOS-2019-06-01", "Parameters": { "commandInstanceId": { "Type": "String", "Default": "' - Fn::Select: - '0' - Fn::GetAtt: - ControlServer - InstanceIds - '" }, "regionId": { "Type": "String", "Default": "' - Ref: ALIYUN::Region - '" }, "instanceIds": { "Type": "List", "Default": [ "${instanceIds}" ] }, "lifecycleHookId": { "Type": "String", "Default": "${lifecycleHookId}" }, "ServerType": { "Type": "String" }, "lifecycleActionToken": { "Type": "String", "Default": "${lifecycleActionToken}" }, "rateControl": { "Type": "Json", "AssociationProperty": "RateControl", "Default": { "Mode": "Concurrency", "MaxErrors": 0, "Concurrency": 1 } }, "OOSAssumeRole": { "Type": "String", "Default": "OOSServiceRole" } }, "RamRole": "{{ OOSAssumeRole }}", "Tasks": [ { "Name": "runCommandOnSpecifiedInstance", "Action": "ACS::ESS::RunCommandOnSpecifiedInstance", "OnError": "CompleteLifecycleActionForAbandon", "OnSuccess": "CompleteLifecycleActionForContinue", "Properties": { "regionId": "{{ regionId }}", "commandContent": "' - '#!/bin/bash\nsleep 60\nip_address=`/root/.tiup/bin/tiup cluster display tidb-test | grep $COMMAND_IP | awk -F '' '' ''{print $1}''`\n/root/.tiup/bin/tiup cluster scale-in tidb-test -y --node $ip_address --force' - '", "commandInstanceId": "{{ commandInstanceId }}", "instanceId": "{{ ACS::TaskLoopItem }}" }, "Loop": { "RateControl": "{{ rateControl }}", "Items": "{{ instanceIds }}", "Outputs": { "commandOutputs": { "AggregateType": "Fn::ListJoin", "AggregateField": "commandOutput" } } }, "Outputs": { "commandOutput": { "Type": "String", "ValueSelector": "invocationOutput" } } }, { "Name": "CompleteLifecycleActionForContinue", "Action": "ACS::ExecuteAPI", "Description": { "en": "Modify lifecycle action for continue", "zh-cn": "修改伸缩活动的等待状态为继续完成" }, "OnSuccess": "ACS::END", "Properties": { "Service": "ESS", "API": "CompleteLifecycleAction", "Parameters": { "RegionId": "{{ regionId }}", "LifecycleHookId": "{{ lifecycleHookId }}", "LifecycleActionToken": "{{ lifecycleActionToken }}", "LifecycleActionResult": "CONTINUE" } } }, { "Name": "CompleteLifecycleActionForAbandon", "Action": "ACS::ExecuteAPI", "Description": { "en": "Complete lifecycle action for Abandon", "zh-cn": "修改伸缩活动的等待状态为弃用" }, "Properties": { "Service": "ESS", "API": "CompleteLifecycleAction", "Parameters": { "RegionId": "{{ regionId }}", "LifecycleHookId": "{{ lifecycleHookId }}", "LifecycleActionToken": "{{ lifecycleActionToken }}", "LifecycleActionResult": "ABANDON" } } } ]}' TemplateName: Fn::Join: - '' - - ScaleIn- - Ref: ALIYUN::StackId TiDBServerScaleOutLifecycleHook: Type: ALIYUN::ESS::LifecycleHook DependsOn: - TiDBServerScalingGroupEnable - ScaleOutTemplate Properties: ScalingGroupId: Ref: TiDBServerScalingGroup LifecycleTransition: SCALE_OUT DefaultResult: CONTINUE HeartbeatTimeout: 600 NotificationArn: Fn::Join: - '' - - 'acs:ess:' - Ref: ALIYUN::Region - ':' - Ref: ALIYUN::TenantId - ':oos/' - Fn::Join: - '' - - ScaleOut- - Ref: ALIYUN::StackId NotificationMetadata: Fn::Join: - '' - - '{"regionId": "${regionId}","instanceIds": "${instanceIds}","lifecycleHookId": "${lifecycleHookId}","lifecycleActionToken": "${lifecycleActionToken}","ServerType": "tidb_servers"}' TiDBServerScaleInLifecycleHook: Type: ALIYUN::ESS::LifecycleHook DependsOn: - 
TiDBServerScalingGroupEnable - ScaleInTemplate Properties: ScalingGroupId: Ref: TiDBServerScalingGroup LifecycleTransition: SCALE_IN DefaultResult: CONTINUE HeartbeatTimeout: 600 NotificationArn: Fn::Join: - '' - - 'acs:ess:' - Ref: ALIYUN::Region - ':' - Ref: ALIYUN::TenantId - ':oos/' - Fn::Join: - '' - - ScaleIn- - Ref: ALIYUN::StackId NotificationMetadata: Fn::Join: - '' - - '{"regionId": "${regionId}","instanceIds": "${instanceIds}","lifecycleHookId": "${lifecycleHookId}","lifecycleActionToken": "${lifecycleActionToken}","ServerType": "tidb_servers"}' TiDBServer: Type: ALIYUN::ECS::InstanceGroup Properties: InstanceName: Fn::Sub: - TiDBServer-[1,${Count}] - Count: Ref: TiDBServerCount ImageId: centos_7.9 InstanceType: Fn::FindInMap: - InstanceProfile - TiDBServer - InstanceType VpcId: Ref: VPC VSwitchId: Ref: VSwitch SecurityGroupId: Ref: InstanceSecurityGroup AllocatePublicIP: false Password: Ref: InstancePassword MaxAmount: Ref: TiDBServerCount SystemDiskSize: Ref: SystemDiskSize InstanceChargeType: PostPaid SystemDiskCategory: Ref: Category UserData: Fn::Join: - '' - - | #!/bin/sh - ssh_pub_key=' - Fn::GetAtt: - WaitCondition - Data - | ' - | yum install -y jq - | pub_key=`echo "$ssh_pub_key" | jq '.ssh_pub_key' | xargs echo ` - | echo "$pub_key" > /root/.ssh/authorized_keys - | chmod 600 /root/.ssh/authorized_keys - | yum install -y numactl - | mount -t ext4 -o remount,nodelalloc,noatime / - | echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login - | echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf - | echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf - | sysctl -p - | echo '* soft stack 10240' >> /etc/security/limits.conf - | echo '* hard nofile 1000000' >> /etc/security/limits.conf - | echo '* soft nofile 1000000' >> /etc/security/limits.conf - | ulimit -SHn 1000000 - | echo 1200000 > /proc/sys/fs/file-max - | echo never > /sys/kernel/mm/transparent_hugepage/enabled - | echo never > /sys/kernel/mm/transparent_hugepage/defrag - | service irqbalance start Tags: - Key: Application Value: Ref: ALIYUN::StackId - Key: Name Value: TiDBServer TiPDServerScalingGroup: Type: ALIYUN::ESS::ScalingGroup Properties: MinSize: 3 DefaultCooldown: 300 VSwitchId: Ref: VSwitch RemovalPolicys: - NewestInstance MaxSize: 4 ScalingGroupName: Fn::Join: - '-' - - TiPDServer - Ref: ALIYUN::StackId TiPDServerScalingConfiguration: Type: ALIYUN::ESS::ScalingConfiguration Properties: ScalingConfigurationName: Fn::Join: - '-' - - TiPDServer - Ref: ALIYUN::StackId InstanceType: Fn::FindInMap: - InstanceProfile - TiPDServer - InstanceType SystemDiskCategory: Ref: Category SystemDiskSize: Ref: SystemDiskSize ScalingGroupId: Ref: TiPDServerScalingGroup SecurityGroupId: Ref: InstanceSecurityGroup ImageId: centos_7_9_x64_20G_alibase_20211227.vhd UserData: Fn::Join: - '' - - | #!/bin/sh - ssh_pub_key=' - Fn::GetAtt: - WaitCondition - Data - | ' - | yum install -y jq - | pub_key=`echo "$ssh_pub_key" | jq '.ssh_pub_key' | xargs echo ` - | echo "$pub_key" > /root/.ssh/authorized_keys - | chmod 600 /root/.ssh/authorized_keys - | yum install -y numactl - | mount -t ext4 -o remount,nodelalloc,noatime / - | echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login - | echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf - | echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf - | sysctl -p - | echo '* soft stack 10240' >> /etc/security/limits.conf - | echo '* hard nofile 1000000' >> /etc/security/limits.conf - | echo '* soft nofile 1000000' >> /etc/security/limits.conf - | ulimit 
-SHn 1000000 - | echo 1200000 > /proc/sys/fs/file-max - | echo never > /sys/kernel/mm/transparent_hugepage/enabled - | echo never > /sys/kernel/mm/transparent_hugepage/defrag - | service irqbalance start TagList: - Key: Application Value: Ref: ALIYUN::StackId - Key: Name Value: TiDBServer TiPDServerScalingRule: Type: ALIYUN::ESS::ScalingRule Properties: ScalingRuleName: Fn::Join: - '-' - - TiPDServer - Ref: ALIYUN::StackId AdjustmentValue: 4 ScalingGroupId: Ref: TiPDServerScalingGroup AdjustmentType: TotalCapacity TiPDServerScalingGroupEnable: DependsOn: - TiPDServerScalingConfiguration - TiPDServerScalingRule - TiPDServerScalingGroup Type: ALIYUN::ESS::ScalingGroupEnable Properties: ScalingRuleArisExecuteVersion: '1' ScalingGroupId: Ref: TiPDServerScalingGroup ScalingConfigurationId: Ref: TiPDServerScalingConfiguration InstanceIds: Fn::GetAtt: - TiPDServer - InstanceIds TiPDServerScaleInLifecycleHook: Type: ALIYUN::ESS::LifecycleHook DependsOn: - TiPDServerScalingGroupEnable - ScaleInTemplate Properties: ScalingGroupId: Ref: TiPDServerScalingGroup LifecycleTransition: SCALE_IN DefaultResult: CONTINUE HeartbeatTimeout: 600 NotificationArn: Fn::Join: - '' - - 'acs:ess:' - Ref: ALIYUN::Region - ':' - Ref: ALIYUN::TenantId - ':oos/' - Fn::Join: - '' - - ScaleIn- - Ref: ALIYUN::StackId NotificationMetadata: Fn::Join: - '' - - '{"regionId": "${regionId}","instanceIds": "${instanceIds}","lifecycleHookId": "${lifecycleHookId}","lifecycleActionToken": "${lifecycleActionToken}","ServerType": "tidb_servers"}' TiPDServerOutLifecycleHook: Type: ALIYUN::ESS::LifecycleHook DependsOn: - TiPDServerScalingGroupEnable - ScaleOutTemplate Properties: ScalingGroupId: Ref: TiPDServerScalingGroup LifecycleTransition: SCALE_OUT DefaultResult: CONTINUE HeartbeatTimeout: 600 NotificationArn: Fn::Join: - '' - - 'acs:ess:' - Ref: ALIYUN::Region - ':' - Ref: ALIYUN::TenantId - ':oos/' - Fn::Join: - '' - - ScaleOut- - Ref: ALIYUN::StackId NotificationMetadata: Fn::Join: - '' - - '{"regionId": "${regionId}","instanceIds": "${instanceIds}","lifecycleHookId": "${lifecycleHookId}","lifecycleActionToken": "${lifecycleActionToken}","ServerType": "pd_servers"}' TiPDServer: Type: ALIYUN::ECS::InstanceGroup Properties: InstanceName: Fn::Sub: - TiPDServer-[1,${Count}] - Count: Ref: TiPDServerCount ImageId: centos_7.9 InstanceType: Fn::FindInMap: - InstanceProfile - TiPDServer - InstanceType VpcId: Ref: VPC VSwitchId: Ref: VSwitch SecurityGroupId: Ref: InstanceSecurityGroup AllocatePublicIP: false MaxAmount: Ref: TiPDServerCount Password: Ref: InstancePassword SystemDiskSize: Ref: SystemDiskSize InstanceChargeType: PostPaid SystemDiskCategory: Ref: Category UserData: Fn::Join: - '' - - | #!/bin/sh - ssh_pub_key=' - Fn::GetAtt: - WaitCondition - Data - | ' - | yum install -y jq - | pub_key=`echo "$ssh_pub_key" | jq '.ssh_pub_key' | xargs echo ` - | echo "$pub_key" > /root/.ssh/authorized_keys - | chmod 600 /root/.ssh/authorized_keys - | yum install -y numactl - | mount -t ext4 -o remount,nodelalloc,noatime / - | echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login - | echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf - | echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf - | sysctl -p - | echo '* soft stack 10240' >> /etc/security/limits.conf - | echo '* hard nofile 1000000' >> /etc/security/limits.conf - | echo '* soft nofile 1000000' >> /etc/security/limits.conf - | ulimit -SHn 1000000 - | echo 1200000 > /proc/sys/fs/file-max - | echo never > /sys/kernel/mm/transparent_hugepage/enabled - | echo never > 
/sys/kernel/mm/transparent_hugepage/defrag - | service irqbalance start Tags: - Key: Application Value: Ref: ALIYUN::StackId - Key: Name Value: TiPDServer DependsOn: WaitCondition Command: Type: ALIYUN::ECS::Command DependsOn: - InvocationWaitConditionHandle Properties: Timeout: '1200' CommandContent: Fn::Base64Encode: Fn::Replace: - ros-notify: Fn::GetAtt: - InvocationWaitConditionHandle - CurlCli - Fn::Join: - '' - - | #!/bin/sh - | sudo curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sudo sh - - | source /root/.bash_profile - | which tiup - | tiup cluster - | tiup update --self && tiup update cluster - | tiup --binary cluster - | yum install -y expect - | yum install -y numactl - | yum install -y mysql - | mount -t ext4 -o remount,nodelalloc,noatime / - | echo '* soft stack 10240' >> /etc/security/limits.conf - | echo '* hard nofile 1000000' >> /etc/security/limits.conf - | echo '* soft nofile 1000000' >> /etc/security/limits.conf - | ulimit -SHn 1000000 - | echo 'global:' >> ./topology.yaml - | echo ' user: "root"' >> ./topology.yaml - | echo ' ssh_port: 22' >> ./topology.yaml - | echo ' deploy_dir: "/tidb-deploy"' >> ./topology.yaml - | echo ' data_dir: "/data1"' >> ./topology.yaml - 'echo ''server_configs:' - |2- pd: replication.enable-placement-rules: true - | ' >> ./topology.yaml - | echo 'pd_servers:' >> ./topology.yaml - 'echo '' - host: ' - Fn::Join: - |2- - host: - Fn::GetAtt: - TiPDServer - PrivateIps - | ' >> ./topology.yaml - | echo tidb_servers: >> ./topology.yaml - 'echo '' - host: ' - Fn::Join: - |2- - host: - Fn::GetAtt: - TiDBServer - PrivateIps - | ' >> ./topology.yaml - | echo 'tikv_servers:' >> ./topology.yaml - 'echo '' - host: ' - Fn::Join: - |2- - host: - Fn::GetAtt: - TiKVServer - PrivateIps - | ' >> ./topology.yaml - | echo 'monitoring_servers:' >> ./topology.yaml - 'echo '' - host: ' - Fn::Join: - |2- - host: - Fn::GetAtt: - ControlServer - PrivateIps - | ' >> ./topology.yaml - | echo 'grafana_servers:' >> ./topology.yaml - 'echo '' - host: ' - Fn::Join: - |2- - host: - Fn::GetAtt: - ControlServer - PrivateIps - | ' >> ./topology.yaml - | echo 'alertmanager_servers:' >> ./topology.yaml - 'echo '' - host: ' - Fn::Join: - |2- - host: - Fn::GetAtt: - ControlServer - PrivateIps - | ' >> ./topology.yaml - | retry_time=5 - | until ((retry_time > 60)) - | do - |2 let retry_time+=5 - |2 sleep 5 - |2 /root/.tiup/bin/tiup cluster check ./topology.yaml --apply --user root -i /root/.ssh/id_rsa --ssh-timeout 120 >> ./topology_check.log - |2 FAIL_MESSAGE=`cat ./topology_check.log | grep 'Timeout'` - |2 if [[ $? -eq 1 ]]; - |2 then - |2 ros-notify -d "{\"id\" : \"27c7347b-352a-4377-1\", \"Data\" : \"check Success\", \"status\" : \"SUCCESS\"}" - |2 break - |2 else - |2 ros-notify -d "{\"id\" : \"27c7347b-352a-4377-1\", \"Data\" : \"check Timeout: $FAIL_MESSAGE\", \"status\" : \"WARNING\"}" - |2 fi - | done - | /root/.tiup/bin/tiup cluster check ./topology.yaml --apply --user root -i /root/.ssh/id_rsa --ssh-timeout 120 >> ./topology_retry_check.log - | FAIL_MESSAGE=`cat ./topology_retry_check.log | grep 'Timeout'` - | if [[ $? 
-eq 1 ]]; - | then - | ros-notify -d "{\"id\" : \"27c7347b-352a-4377-2\", \"Data\" : \"check check Success\", \"status\" : \"SUCCESS\"}" - | else - | ros-notify -d "{\"id\" : \"27c7347b-352a-4377-2\", \"Data\" : \"check check failed: $FAIL_MESSAGE\", \"status\" : \"FAILURE\"}" - | fi - | echo yes | /root/.tiup/bin/tiup cluster deploy tidb-test v5.2.0 ./topology.yaml --user root -i /root/.ssh/id_rsa - | /root/.tiup/bin/tiup cluster start tidb-test >> ./topology_start.log - | FAIL_MESSAGE=`cat ./topology_start.log | grep 'Fail'` - | if [[ $? -eq 1 ]]; - | then - | ros-notify -d "{\"id\" : \"27c7347b-352a-4377-3\", \"Data\" : \"deploy Success\", \"status\" : \"SUCCESS\"}" - | else - | ros-notify -d "{\"Data\" : \"deploy failed\", \"status\" : \"FAILURE\"}" - | fi - | DASHBOARD_IP=`tiup cluster display tidb-test --dashboard | awk -F '/' '{print $3}' | awk -F ':' '{print $1}'` - | DASHBOARD_ID=`echo yes | ssh -o StrictHostKeyChecking=no root@$DASHBOARD_IP "curl 100.100.100.200/latest/meta-data/instance-id" | awk '{print $1}'` - | ros-notify -d "{\"status\" : \"SUCCESS\",\"id\" : \"DASHBOARD\", \"data\" : \"$DASHBOARD_ID\"}" Type: RunShellScript Name: Fn::Join: - '-' - - ControlServer - Ref: ALIYUN::StackId Invocation: Type: ALIYUN::ECS::Invocation DependsOn: - TiKVServer - TiPDServer - TiDBServer - TiKVServer Properties: CommandId: Ref: Command InstanceIds: Fn::GetAtt: - ControlServer - InstanceIds TiKVServerScalingGroup: Type: ALIYUN::ESS::ScalingGroup Properties: MinSize: 3 DefaultCooldown: 300 VSwitchId: Ref: VSwitch RemovalPolicys: - NewestInstance MaxSize: 4 ScalingGroupName: Fn::Join: - '-' - - TiKVServer - Ref: ALIYUN::StackId TiKVServerScalingConfiguration: Type: ALIYUN::ESS::ScalingConfiguration Properties: ScalingConfigurationName: Fn::Join: - '-' - - TiKVServer - Ref: ALIYUN::StackId InstanceType: Fn::FindInMap: - InstanceProfile - TiKVServer - InstanceType SystemDiskCategory: Ref: Category SystemDiskSize: Ref: SystemDiskSize ScalingGroupId: Ref: TiKVServerScalingGroup SecurityGroupId: Ref: InstanceSecurityGroup ImageId: centos_7_9_x64_20G_alibase_20211227.vhd UserData: Fn::Join: - '' - - | #!/bin/sh - ssh_pub_key=' - Fn::GetAtt: - WaitCondition - Data - | ' - | yum install -y jq - | pub_key=`echo "$ssh_pub_key" | jq '.ssh_pub_key' | xargs echo ` - | echo "$pub_key" > /root/.ssh/authorized_keys - | chmod 600 /root/.ssh/authorized_keys - | yum install -y numactl - | mount -t ext4 -o remount,nodelalloc,noatime / - | echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login - | echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf - | echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf - | sysctl -p - | echo '* soft stack 10240' >> /etc/security/limits.conf - | echo '* hard nofile 1000000' >> /etc/security/limits.conf - | echo '* soft nofile 1000000' >> /etc/security/limits.conf - | ulimit -SHn 1000000 - | echo 1200000 > /proc/sys/fs/file-max - | echo never > /sys/kernel/mm/transparent_hugepage/enabled - | echo never > /sys/kernel/mm/transparent_hugepage/defrag - | service irqbalance start TagList: - Key: Application Value: Ref: ALIYUN::StackId - Key: Name Value: TiDBServer TiKVServerScalingRule: Type: ALIYUN::ESS::ScalingRule Properties: ScalingRuleName: Fn::Join: - '-' - - TiKVServer - Ref: ALIYUN::StackId AdjustmentValue: 4 ScalingGroupId: Ref: TiKVServerScalingGroup AdjustmentType: TotalCapacity TiKVServerScalingGroupEnable: DependsOn: - TiKVServerScalingConfiguration - TiKVServerScalingRule - TiKVServerScalingGroup Type: ALIYUN::ESS::ScalingGroupEnable Properties: 
ScalingRuleArisExecuteVersion: '1' ScalingGroupId: Ref: TiKVServerScalingGroup ScalingConfigurationId: Ref: TiKVServerScalingConfiguration InstanceIds: Fn::GetAtt: - TiKVServer - InstanceIds TiKVServerOutLifecycleHook: Type: ALIYUN::ESS::LifecycleHook DependsOn: - TiKVServerScalingGroupEnable - ScaleOutTemplate Properties: ScalingGroupId: Ref: TiKVServerScalingGroup LifecycleTransition: SCALE_OUT DefaultResult: CONTINUE HeartbeatTimeout: 600 NotificationArn: Fn::Join: - '' - - 'acs:ess:' - Ref: ALIYUN::Region - ':' - Ref: ALIYUN::TenantId - ':oos/' - Fn::Join: - '' - - ScaleOut- - Ref: ALIYUN::StackId NotificationMetadata: Fn::Join: - '' - - '{"regionId": "${regionId}","instanceIds": "${instanceIds}","lifecycleHookId": "${lifecycleHookId}","lifecycleActionToken": "${lifecycleActionToken}","ServerType": "tikv_servers"}' TiKVServerScaleInLifecycleHook: Type: ALIYUN::ESS::LifecycleHook DependsOn: - TiKVServerScalingGroupEnable - ScaleInTemplate Properties: ScalingGroupId: Ref: TiKVServerScalingGroup LifecycleTransition: SCALE_IN DefaultResult: CONTINUE HeartbeatTimeout: 600 NotificationArn: Fn::Join: - '' - - 'acs:ess:' - Ref: ALIYUN::Region - ':' - Ref: ALIYUN::TenantId - ':oos/' - Fn::Join: - '' - - ScaleIn- - Ref: ALIYUN::StackId NotificationMetadata: Fn::Join: - '' - - '{"regionId": "${regionId}","instanceIds": "${instanceIds}","lifecycleHookId": "${lifecycleHookId}","lifecycleActionToken": "${lifecycleActionToken}","ServerType": "tikv_servers"}' TiKVServer: Type: ALIYUN::ECS::InstanceGroup Properties: InstanceName: Fn::Sub: - TiKVServer-[1,${Count}] - Count: Ref: TiKVServerCount ImageId: centos_7.9 InstanceType: Fn::FindInMap: - InstanceProfile - TiKVServer - InstanceType Password: Ref: InstancePassword DiskMappings: - Category: cloud_essd Device: /dev/xvdb Size: Ref: DateDiskSize VpcId: Ref: VPC VSwitchId: Ref: VSwitch SecurityGroupId: Ref: InstanceSecurityGroup AllocatePublicIP: false MaxAmount: Ref: TiKVServerCount SystemDiskSize: Ref: SystemDiskSize InstanceChargeType: PostPaid SystemDiskCategory: Ref: Category UserData: Fn::Join: - '' - - | #!/bin/sh - ssh_pub_key=' - Fn::GetAtt: - WaitCondition - Data - | ' - | yum install -y jq - | pub_key=`echo "$ssh_pub_key" | jq '.ssh_pub_key' | xargs echo ` - | echo "$pub_key" > /root/.ssh/authorized_keys - | chmod 600 /root/.ssh/authorized_keys - | yum install -y numactl - | echo 'session required /lib/security/pam_limits.so' >> /etc/pam.d/login - | echo 'net.core.somaxconn= 40000' >> /etc/sysctl.conf - | echo 'net.ipv4.tcp_syncookies = 0' >> /etc/sysctl.conf - | sysctl -p - | echo '* soft stack 10240' >> /etc/security/limits.conf - | echo '* hard nofile 1000000' >> /etc/security/limits.conf - | echo '* soft nofile 1000000' >> /etc/security/limits.conf - | ulimit -SHn 1000000 - | echo 1200000 > /proc/sys/fs/file-max - | echo never > /sys/kernel/mm/transparent_hugepage/enabled - | echo never > /sys/kernel/mm/transparent_hugepage/defrag - | service irqbalance start - | mount -t ext4 -o remount,nodelalloc,noatime / - | parted -s -a optimal /dev/vdb mklabel gpt -- mkpart primary ext4 1 -1 - | mkfs.ext4 /dev/vdb - | mkdir /data1 && mount /dev/vdb /data1 - | echo /dev/vdb /data1 ext4 defaults,nodelalloc,noatime 0 2 >> /etc/fstab - | mount -t ext4 -o remount,nodelalloc,noatime /data1 - | mount -t ext4 - | echo 'vm.swappiness = 0' >> /etc/sysctl.conf - | swapoff -a && swapon -a - | sysctl -p - | sudo systemctl stop firewalld.service - | sudo systemctl disable firewalld.service - | ID_SERIAL=`udevadm info --name=/dev/vdb | grep ID_SERIAL | awk -F '=' 
'{print $2}'` - | M_ACTIVE=`grubby --default-kernel` Tags: - Key: Application Value: Ref: ALIYUN::StackId - Key: Name Value: TiKVServer DependsOn: - WaitCondition Outputs: 'TiDB Dashboard:': Value: Fn::Sub: - http://${ServerAddress}:2379/dashboard - ServerAddress: Fn::GetAtt: - TiPDServerEip - EipAddress 'Grafana:': Value: Fn::Sub: - http://${ServerAddress}:3000 - ServerAddress: Fn::GetAtt: - ControlServerEip - EipAddress ErrorData: Description: JSON serialized dict containing data associated with wait condition error signals sent to the handle. Value: Fn::GetAtt: - InvocationWaitCondition - ErrorData
After the preceding ROS template is run, it provides the following:
Note: The ROS template in this example is for demonstration only. Enter the template content based on your actual requirements.
Three scaling groups are created, each containing three ECS instances (each group is declared as in the reformatted excerpt after this list).
One jump server (a single ECS instance) is created for monitoring and O&M of the TiDB cluster.
One scaling rule is created for each scaling group.
One scale-in lifecycle hook is created for each scaling group, which removes ECS instances from the TiDB cluster when the group is scaled in.
One scale-out lifecycle hook is created for each scaling group, which adds ECS instances to the TiDB cluster when the group is scaled out.
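For readability, the following is a reformatted excerpt of the TiDBServer scaling group from the template above. The TiPDServer and TiKVServer groups use the same settings and differ only in name. Each group starts with the three pre-created ECS instances and can grow to at most four:
YAML
TiDBServerScalingGroup:
  Type: ALIYUN::ESS::ScalingGroup
  Properties:
    ScalingGroupName:
      Fn::Join:
        - '-'
        - - TiDBServer
          - Ref: ALIYUN::StackId
    MinSize: 3
    MaxSize: 4
    DefaultCooldown: 300
    RemovalPolicys:
      - NewestInstance
    VSwitchId:
      Ref: VSwitch
Because RemovalPolicys is set to NewestInstance, a scale-in removes the most recently created instance first.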
In the Configure Parameters step, set the stack name and the instance password in the ECS configuration section, and then click Create.
Note: Enter the template parameters as prompted in the console. The instance password is the logon password of the ECS instances that Auto Scaling creates during scale-out.
After the stack is created, you can view basic information such as the stack name, stack ID, and status on the Stack Information tab.
Step 2: Log on to the TiDB Dashboard and view the TiDB cluster
In the stack list, find the stack you created and click its stack ID.
On the stack details page, click the Outputs tab.
Find the TiDB Dashboard: output and click the URL in the Value column (the Outputs section that generates this URL is shown at the end of this step).
If this is not your first visit, the URL opens the TiDB Dashboard directly.
If this is your first visit, the URL first opens the SQL user logon page. You do not need to enter a password; click Sign In to go to the TiDB Dashboard.
In the left-side navigation pane of the TiDB Dashboard, click Cluster Info.
On the Instances tab, you can view the initial information about the scaling group instances that make up the three components (roles) of the cluster.
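The URL comes from the Outputs section of the template, which builds the Dashboard address from the elastic IP address bound to a PD node (reformatted excerpt from the template above):
YAML
Outputs:
  'TiDB Dashboard:':
    Value:
      Fn::Sub:
        - http://${ServerAddress}:2379/dashboard
        - ServerAddress:
            Fn::GetAtt:
              - TiPDServerEip
              - EipAddress
The template exposes Grafana in the same way, on port 3000 of the control server's EIP.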
Step 3: View the scaling groups to which the TiDB cluster belongs
Log on to the Auto Scaling console.
In the left-side navigation pane, click Scaling Groups.
In the top navigation bar, select the China (Hohhot) region.
In the scaling group list on the Scaling Groups page, view the scaling groups that were created for the TiDB cluster.
The three scaling groups of the TiDB cluster are named with the prefixes TiKVServer, TiPDServer, and TiDBServer, each followed by the stack ID.
Find the scaling group of any role in the TiDB cluster and click its ID in the Scaling Group Name/ID column.
On the Instances tab, click Manually Added.
In the list of manually added instances, you can view the ECS instances that were created in the scaling group.
Note: You can also log on to the ECS console, go to the instance list in the left-side navigation pane, select the China (Hohhot) region, and view the ECS instances created in the scaling groups.
Click the Scaling Rules and Event-triggered Tasks tab, and view the created scaling rule on the Scaling Rules tab.
Click the Lifecycle Hooks tab to view the created lifecycle hooks.
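Each scaling group has one scale-out hook and one scale-in hook. A hook pauses the scaling activity and notifies an OOS template, which runs TiUP commands on the jump server to actually add the node to, or remove it from, the TiDB cluster. The scale-out hook of the TiDBServer group, reformatted from the template above (the NotificationMetadata value is condensed to a single string for readability):
YAML
TiDBServerScaleOutLifecycleHook:
  Type: ALIYUN::ESS::LifecycleHook
  DependsOn:
    - TiDBServerScalingGroupEnable
    - ScaleOutTemplate
  Properties:
    ScalingGroupId:
      Ref: TiDBServerScalingGroup
    LifecycleTransition: SCALE_OUT
    DefaultResult: CONTINUE
    HeartbeatTimeout: 600
    NotificationArn:          # points at the ScaleOut-<StackId> OOS template
      Fn::Join:
        - ''
        - - 'acs:ess:'
          - Ref: ALIYUN::Region
          - ':'
          - Ref: ALIYUN::TenantId
          - ':oos/'
          - Fn::Join:
              - ''
              - - ScaleOut-
                - Ref: ALIYUN::StackId
    NotificationMetadata: '{"regionId": "${regionId}","instanceIds": "${instanceIds}","lifecycleHookId": "${lifecycleHookId}","lifecycleActionToken": "${lifecycleActionToken}","ServerType": "tidb_servers"}'
With DefaultResult: CONTINUE and HeartbeatTimeout: 600, the scaling activity proceeds even if the OOS execution does not complete the lifecycle action within 10 minutes.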
Step 4: Scale roles in the TiDB cluster out or in
This step uses the TiDB Server role as an example. The other roles in the TiDB cluster are scaled in the same way.
In the left-side navigation pane, click Scaling Groups.
In the scaling group list, find the scaling group of the TiDB Server role and click its ID in the Scaling Group Name/ID column.
Click the Scaling Rules and Event-triggered Tasks tab.
On the Scaling Rules tab, find the created scaling rule and click Modify in the Actions column.
For more information about how to modify a scaling rule, see Modify a scaling rule. For example (the rule as declared in the template is shown after these examples):
Scale-out: to increase the number of instances in the scaling group from 3 to 4, set the operation on the Modify Scaling Rule page to add 1 instance.
Scale-in: to decrease the number of instances in the scaling group from 4 to 3, set the operation on the Modify Scaling Rule page to remove 1 instance.
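The rule created by the template is defined with AdjustmentType: TotalCapacity and AdjustmentValue: 4, that is, "set the group to 4 instances in total"; when you modify the rule in the console, you can equivalently choose to add or remove a fixed number of instances, as in the examples above. Reformatted excerpt from the template:
YAML
TiDBServerScalingRule:
  Type: ALIYUN::ESS::ScalingRule
  Properties:
    ScalingRuleName:
      Fn::Join:
        - '-'
        - - TiDBServer
          - Ref: ALIYUN::StackId
    ScalingGroupId:
      Ref: TiDBServerScalingGroup
    AdjustmentType: TotalCapacity
    AdjustmentValue: 4    # executing this rule sets the group to 4 instances in total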
After you modify the scaling rule, click OK.
Click Execute in the Actions column of the scaling rule.
Click the Scaling Activities tab to view the details of the scaling activity.
The results of the scaling activity are as follows:
If the scale-out succeeds, the scaling activity list shows that the total number of instances has changed to 4.
If the scale-in succeeds, the scaling activity list shows that the total number of instances has changed to 3.
Note: If the scaling activity fails, view the failure cause on the scaling activity details page, modify the relevant configuration, and then execute the scaling rule again.
Log on to the TiDB Dashboard again and refresh the page to check whether an ECS instance has been added to or removed from the TiDB Server role of the TiDB cluster (the mechanism behind this is sketched at the end of this step).
For more information, see Step 2: Log on to the TiDB Dashboard and view the TiDB cluster.
If the scale-out succeeded, the TiDB Server role of the TiDB cluster shows one additional instance, displayed as online.
If the scale-in succeeded, the TiDB Server role of the TiDB cluster shows one fewer instance; the removed instance is displayed as inaccessible.
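For reference, the cluster change itself is carried out by the lifecycle hooks and OOS templates defined in the template above. On scale-out, OOS runs a command on the jump server that records the new node's role and IP address in /tmp/params.txt and calls the helper script /tmp/compute-nest-templates-scale-out.py, which writes a scale-out file and runs tiup cluster scale-out tidb-test <file> --user root -i /root/.ssh/id_rsa. On scale-in, the jump server resolves the node from tiup cluster display tidb-test and runs tiup cluster scale-in tidb-test -y --node <node> --force. The scale-out file generated for a new TiDB Server node contains an entry like the following (shown here in YAML form; the script writes the same keys as JSON, and the host value is filled in at runtime):
YAML
tidb_servers:
  - host: <new-instance-ip>    # private IP of the ECS instance added by the scaling activity
    port: 4000
    ssh_port: 22
    status_port: 10080
    deploy_dir: /data/deploy/install/deploy/tidb-4000
    log_dir: /data/deploy/install/log/tidb-4000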