FAQ about adding existing ECS instances to a Kubernetes cluster - Alibaba Cloud Help Center

Overview

This article describes common issues that occur when you add existing ECS instances to a Kubernetes cluster.

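Details

Alibaba Cloud reminds you:

Before you perform risky operations such as modifying or changing instances or data, check the disaster recovery and fault tolerance capabilities of the instances to ensure data security.

Before you modify the configuration or data of instances (including but not limited to ECS and RDS), we recommend that you create snapshots or enable features such as RDS log backup.

If you have granted permissions on, or submitted, security information such as logon accounts and passwords on the Alibaba Cloud platform, we recommend that you change that information promptly.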

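Prerequisites

If a node fails to be added, confirm that the following conditions are met:

The Kubernetes cluster is in the Running state.
The ECS instance to add is in the same region and the same VPC as the Kubernetes cluster.
The ECS instance to add has not been added to another Kubernetes cluster.
The ECS instance to add is in the Running state.
The ECS instance to add has an EIP, or the corresponding vSwitch has SNAT rules configured for outbound traffic. In short, the node must be able to access the public network.
Only nodes that run CentOS can be added.
At most 100 nodes can be added at a time.
The Kubernetes cluster has sufficient node quota.
If the network plug-in is Flannel, the VPC has sufficient route entry quota.
If a RAM user performs the operation, the RAM user has permissions on Kubernetes nodes.
We recommend the default ACK system image. To use a custom image, see "Create an ACK cluster by using a custom image".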
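
Two of the conditions above (the CentOS requirement and public network access) can be checked from the ECS instance itself. The following is a minimal sketch, not ACK tooling: the function names `is_centos` and `has_public_access` are hypothetical, and the URL to probe is left to you because the source does not name an endpoint.

```shell
# Sketch: local checks for two of the prerequisites listed above.
# Function names are illustrative only.

# is_centos: return 0 if the given os-release file identifies CentOS.
# $1: path to an os-release file (defaults to /etc/os-release).
is_centos() {
    os_release_file="${1:-/etc/os-release}"
    [ -r "$os_release_file" ] || return 1
    # The ID field may be quoted (ID="centos") or bare (ID=centos).
    grep -Eq '^ID="?centos"?$' "$os_release_file"
}

# has_public_access: return 0 if an HTTP(S) request to the given URL
# succeeds (connection within 5 seconds, whole request within 10).
# $1: URL to probe; the choice of endpoint is an assumption.
has_public_access() {
    curl -sS -o /dev/null --connect-timeout 5 --max-time 10 "$1"
}

# Example usage (the URL is a placeholder):
#   is_centos && echo "OS check passed" || echo "OS is not CentOS"
#   has_public_access "https://example.com" && echo "public network reachable"
```

The remaining conditions (cluster state, VPC, quotas, RAM permissions) are properties of the account and cluster, so they need to be verified in the console rather than on the node.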

Process description

The following figure shows the process of adding cluster nodes through the API.

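Troubleshooting steps

Log on to the Kubernetes console and click View Logs to the right of the target cluster. Use the cluster logs to locate the error. If the message "Node XXX joined cluster successfully" appears, the node was added successfully.
If the message "Wait k8s node XXX join cluster timeout" appears, the deployment script failed on the node. Log on to that ECS node and use the deployment log to locate and identify the problem. The following command retrieves the deployment log.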
```
grep cloud-init /var/log/messages
```
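Use intelligent O&M to check the cluster status and system configuration. For details, see "Locate cluster problems by using cluster checks".
You can also collect diagnostic logs to troubleshoot errors. For details, see "How to collect diagnostic information of a Kubernetes cluster".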
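
The log checks in the steps above can be sketched as two small shell helpers. This is an illustration under assumptions: `classify_join_log` and `cloud_init_errors` are hypothetical names, the cluster log is assumed to be available as plain text, and the failure keywords (`error`, `fail`, `timed out`) are a guess at what is worth surfacing rather than a list from the source.

```shell
# classify_join_log: read cluster log text on stdin and print one line per
# node event, keyed on the two messages quoted in the steps above.
classify_join_log() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *"joined cluster successfully"*)
                echo "OK: $line" ;;
            *"join cluster timeout"*)
                echo "TIMEOUT: $line" ;;
        esac
    done
}

# cloud_init_errors: print cloud-init lines from a syslog-style file that
# look like failures. The keyword list is an assumption.
# $1: log file path (defaults to /var/log/messages).
cloud_init_errors() {
    grep 'cloud-init' "${1:-/var/log/messages}" | grep -iE 'error|fail|timed out'
}

# Example usage (cluster.log is a placeholder file name):
#   classify_join_log < cluster.log
#   cloud_init_errors | tail -n 50
```

Nodes reported as `TIMEOUT` are the ones whose deployment log is worth reading on the ECS instance itself.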

Common issues when adding nodes

Error message	Solution
Code: ForbiddenAttachInstance, Message: Forbidden attach instance	The RAM user (sub-account) does not have O&M permissions on the Kubernetes cluster. To grant permissions to a RAM user, see "Overview of access control authorization for Kubernetes clusters".
Code: ErrorNoAttachEcsInstance, Message: ecs instances invalid	No ECS instance meets the requirements. Adjust the node configuration according to the prerequisites.
Throttling Message: Request was denied due to request throttling.	The underlying API is being throttled. Try again later.
Code: 404 Code: InvalidImageId.NotFound Message: The specified ImageId does not exist	The custom image ID does not exist. Confirm that the custom image ID is correct.
Code: IncorrectInstanceStatus Message: The specified instance is in an incorrect status for the requested action	The ECS instance failed the status check. The ECS instance must be in the Running state.
Code: OperationDenied.UnpaidOrder Message: The specified instance has unpaid order.	An unpaid bill exists. Settle the unpaid bill first.
error on the server ("Get https://XXXX:XX/api/v1/namespaces/kube-system/services/kube-dns: net/http: request canceled while waiting for connection	Make sure that the kube-dns service is available, and run `kubectl -n kube-system get svc` to confirm that the Service is normal.
OperationDenied Message: The specified image contains the snapshot of the data disk,does not support this operation.	The custom image contains a data disk. Detach the data disk from the ECS instance, and then create the custom image again.
Failed to config security group: wait for ecs instance join to security group i-xx running timeout	The ECS instance failed to join the default security group of the cluster. Manually add the ECS instance to the security group.
Failed to start instance i-xx: Aliyun API Error: RequestId: 909DA063-0BAE-4C40-844C-01FDAA502F80 Status Code: 403 Code: IncorrectInstanceStatus Message: The specified instance is in an incorrect status for the requested action; Status of the specified instance is Running but the expected status is in (Stopped).	While node i-xx was being added, the ECS instance was not in the expected state. This is usually caused by manual intervention. Add the node again and avoid operating on the instance during the process.
Failed to attach node i-xxxx, err Aliyun API Error: RequestId: 7CE63A45-7932-493D-AE54-D1F199FD1EC7 Status Code: 403 Code: OperationDenied.UnpaidOrder Message: The specified instance has unpaid order.	An unpaid order exists. Pay the order to resolve the issue.
mount: unknown filesystem type 'swap'	The disk was formatted and contains a partition in swap format. Reformat the swap partition as ext4, or delete all partitions.
error ipv4 ip_forward not set to 1	We recommend that you set ip_forward to 1 on every node. Use the following command: `echo 1 > /proc/sys/net/ipv4/ip_forward`
May 27 17:11:32 iZuXXXz2lZ cloud-init: [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused. May 27 17:11:32 iZuXXXz2lZ cloud-init: Unfortunately, an error has occurred: May 27 17:11:32 iZuXXXz2lZ cloud-init: timed out waiting for the condition May 27 17:11:32 iZuXXXz2lZ cloud-init: This error is likely caused by: May 27 17:11:32 iZuXXXz2lZ cloud-init: - The kubelet is not running May 27 17:11:32 iZuXXXz2lZ cloud-init: - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)	kubelet failed to start. Run the following command to locate the problem: `journalctl -u kubelet`
curl -k --connect-timeout 4 https://172.XXX.XXX.184:XXXX/version curl: (28) Connection timed out after 4001 milliseconds	The API server is unreachable. Check whether the port health check of the internal SLB instance is normal, and whether the access control rules of the internal SLB instance are correct. You can also use intelligent O&M to locate the problem.
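
Some of the node-side issues in the table can be checked before you retry. The sketch below covers the ip_forward and swap rows. It rests on assumptions: the helper names `ip_forward_enabled` and `swap_devices` are hypothetical, and using `lsblk` to list filesystem types is one possible way to spot a swap partition, not the method the source prescribes.

```shell
# ip_forward_enabled: return 0 if the given file contains exactly "1".
# $1: path to read (defaults to /proc/sys/net/ipv4/ip_forward).
ip_forward_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}" 2>/dev/null)" = "1" ]
}

# swap_devices: read "NAME FSTYPE" pairs on stdin and print the names
# whose filesystem type is swap.
swap_devices() {
    awk '$2 == "swap" { print $1 }'
}

# Example usage on a node:
#   ip_forward_enabled || echo "ip_forward is not set to 1"
#   lsblk -rno NAME,FSTYPE | swap_devices
#   systemctl is-active kubelet || journalctl -u kubelet --no-pager -n 50
```

Writing to `/proc/sys/net/ipv4/ip_forward` changes the value only until the next reboot. If the setting should survive a reboot, it is usually also recorded as `net.ipv4.ip_forward = 1` in `/etc/sysctl.conf`, though whether that is appropriate depends on how the node image is managed.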

Applies to

Container Service for Kubernetes (ACK)
