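Rotate the etcd certificates on the master nodes of your ACK dedicated cluster promptly when the system reminds you. This keeps the service available and secure, and it avoids the security risks of leaked certificates or cracked keys. This topic describes how to rotate the etcd certificates of master nodes in an ACK dedicated cluster.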
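Background information
ACK dedicated clusters can be migrated to ACK Pro clusters. In an ACK Pro cluster, Alibaba Cloud manages the etcd and Kubernetes control plane certificates. After you migrate an ACK dedicated cluster, you no longer need to perform the rotation described below. For more information, see Hot migration from ACK dedicated clusters to ACK Pro clusters.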
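Usage notes
Two months before the etcd certificates expire, Container Service for Kubernetes (ACK) sends an internal message and a text message. It also displays Update ETCD Certificate next to the cluster on the Clusters page.
During rotation, the system restarts the control plane components on the master nodes one node at a time. These include kube-apiserver, etcd, kube-controller-manager (kcm), and kubelet. Long-lived connections to the API server are interrupted during this process, so perform the rotation during off-peak hours. The rotation is expected to finish within 30 minutes.
If you changed the default configuration file directories of etcd or Kubernetes in your dedicated cluster, create symbolic links to the original directories before rotation. Otherwise, the rotation fails.
If you rotate the certificates manually and the console still shows the Update ETCD Certificate expiration reminder afterward, submit a ticket to have the reminder removed in the backend.
If the rotation fails for any reason, submit a ticket.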
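The following is a minimal sketch of the symbolic-link step. It assumes you moved a directory to a custom location, such as the hypothetical /data/etcd. The rotation scripts read and write /var/lib/etcd/cert and /etc/kubernetes/pki, so the default path must resolve to the custom one. The helper name link_default_dir and the example paths are illustrative only and are not part of ACK.

```shell
#!/bin/bash
# Sketch: make a default directory path point at a relocated directory.
# link_default_dir is a hypothetical helper, not part of the ACK scripts.
# $1: custom directory actually in use; $2: default path expected by the scripts.
link_default_dir() {
  local custom="$1" default="$2"
  if [ ! -d "$custom" ]; then
    echo "custom directory $custom does not exist" >&2
    return 1
  fi
  if [ -L "$default" ]; then
    # Already a symlink: accept it only if it resolves to the custom directory.
    [ "$(readlink -f "$default")" = "$(readlink -f "$custom")" ]
    return
  fi
  if [ -e "$default" ]; then
    echo "$default exists and is not a symlink; move it aside first" >&2
    return 1
  fi
  mkdir -p "$(dirname "$default")"
  ln -s "$custom" "$default"
}

# Example (hypothetical paths):
# link_default_dir /data/etcd /var/lib/etcd
```

The helper refuses to replace a real directory. This avoids hiding existing data behind a new link.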
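Scenario 1: Rotate etcd certificates before they expire
When the etcd certificates are about to expire and you are prompted to update them, you can rotate them in either of the following ways.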
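You can check how close a certificate is to the two-month reminder window yourself. Read its notAfter field with openssl x509 -noout -enddate and compute the days remaining. This is a minimal sketch that assumes GNU date. The function names days_until_expiry and needs_rotation are hypothetical, and the 60-day threshold only approximates "two months".

```shell
#!/bin/bash
# Sketch: estimate how many days remain before a certificate expires.
# Assumes GNU date (for `date -d`). Function names are hypothetical.

# $1: the "notAfter=..." line printed by `openssl x509 -noout -enddate`,
#     or just the date part, e.g. "Jan  1 00:00:00 2030 GMT".
days_until_expiry() {
  local end_date="${1#notAfter=}"
  local end_ts now_ts
  end_ts=$(date -d "$end_date" +%s) || return 1
  now_ts=$(date +%s)
  echo $(( (end_ts - now_ts) / 86400 ))
}

# Returns 0 (true) if fewer than $2 days (default 60, about two months) remain.
needs_rotation() {
  local days
  days=$(days_until_expiry "$1") || return 2
  [ "$days" -lt "${2:-60}" ]
}

# Example on a master node (path taken from the rotation scripts):
# needs_rotation "$(openssl x509 -noout -enddate -in /var/lib/etcd/cert/etcd-server.pem)" \
#   && echo "etcd server certificate expires within 60 days"
```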
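Rotate etcd certificates automatically in the console
Log on to the ACK console. In the left-side navigation pane, click Clusters.
Find the cluster whose etcd certificates are about to expire and click Update ETCD Certificate on the right to open the Update Certificate page. Then, click Update Certificate.
Note: Update ETCD Certificate appears next to a cluster only when its certificates will expire within two months.
In the message that appears, click OK.
After the certificates are updated, you can see the following:
The Update Certificate page shows that the update succeeded.
On the Clusters page, Update ETCD Certificate no longer appears next to the cluster.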
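Rotate etcd certificates manually
Scenarios
The etcd certificates of the dedicated cluster are about to expire.
The etcd certificates cannot be rotated automatically by deploying a template.
The etcd certificates cannot be updated in the console.
In these scenarios, a cluster administrator can log on to any master node and run the following scripts to rotate the etcd certificates manually.
The following scripts must be run as the root user.
Make sure that passwordless SSH login is configured for the root user between master nodes.
From a master node, log on to any other master node over SSH. If you are prompted for a password, configure passwordless login between master nodes as follows.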
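# 1. Generate a key pair. Skip this step if a login key already exists on the node.
ssh-keygen -t rsa
# 2. Copy the public key to every other master node with ssh-copy-id.
#    Replace <internal-ip> with the internal IP address of each of the other master nodes.
ssh-copy-id -i ~/.ssh/id_rsa.pub root@<internal-ip>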
Note: If you do not configure passwordless login, you must enter the root password when you run the scripts.
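Before you run the scripts, you can confirm that every master node accepts key-based root login without a password prompt. This sketch uses the OpenSSH option BatchMode=yes, which makes ssh fail instead of asking for a password. The function name is hypothetical, as are the example IPs. The SSH_BIN override is also an assumption, added only so the function can be tested without real hosts.

```shell
#!/bin/bash
# Sketch: verify passwordless root SSH to each master node.
# SSH_BIN is a hypothetical override used for testing; it defaults to ssh.
check_passwordless_ssh() {
  local ssh_bin="${SSH_BIN:-ssh}" failed=0 host
  for host in "$@"; do
    # BatchMode=yes disables password prompts, so a missing key makes ssh fail.
    if "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 \
        -o StrictHostKeyChecking=no "root@${host}" true; then
      echo "ok: ${host}"
    else
      echo "passwordless login failed: ${host}" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example (hypothetical internal IPs):
# check_passwordless_ssh 192.168.0.1 192.168.0.2 192.168.0.3
```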
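Copy the following scripts, save them as restart-apiserver.sh and rotate-etcd.sh, and place both files in the same directory.
Note: rotate-etcd.sh tries to get the region from the node's metadata service and pulls the rotation image from that region. You can also specify the region with the --region parameter, for example bash rotate-etcd.sh --region cn-hangzhou. The script uses this parameter only if the metadata service returns no region. Otherwise, it uses the region returned by the metadata service.
restart-apiserver.sh: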
#!/bin/bash
declare -x cmd

# Wait up to 600 seconds for kube-apiserver to respond.
k8s::wait_apiserver_ready() {
  set -e
  for i in $(seq 600); do
    if kubectl cluster-info &>/dev/null; then
      return 0
    else
      echo "wait apiserver to be ready, retry ${i}th after 1s"
      sleep 1
    fi
  done
  echo "failed to wait apiserver to be ready"
  return 1
}

# Detect the container runtime (Docker or containerd).
function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

function restart_apiserver() {
  # Restart kube-apiserver with the detected container runtime.
  if [[ $cmd == "docker" ]]; then
    # Docker: restart the kube-apiserver container (skip its k8s_POD_ pause container).
    container_id=$(docker ps | grep kube-apiserver | grep -v "k8s_POD_" | awk '{print $1}' | head -n 1)
    if [[ -n $container_id ]]; then
      echo "Restarting kube-apiserver pod using Docker: $container_id"
      docker restart "${container_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  elif [[ $cmd == "crictl" ]]; then
    # containerd: stop the kube-apiserver Pod sandbox; kubelet then recreates the Pod.
    pod_id=$(crictl pods --label component=kube-apiserver --latest --state=ready | grep -v "POD ID" | head -n 1 | awk '{print $1}')
    if [[ -n $pod_id ]]; then
      echo "Restarting kube-apiserver pod using crictl: $pod_id"
      crictl stopp "${pod_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  else
    echo "Unsupported container runtime: $cmd"
  fi
  k8s::wait_apiserver_ready
}

check_container_runtime
restart_apiserver
echo "API Server restarted"
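restart-apiserver.sh restarts kube-apiserver and then polls kubectl cluster-info once per second, up to 600 times, until the API server answers. The same wait pattern can be factored into a generic helper, shown here as a sketch. retry_until is a hypothetical name and is not part of the ACK scripts.

```shell
#!/bin/bash
# Sketch: run a command until it succeeds, at most $1 times,
# sleeping $2 seconds between attempts.
retry_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@" &>/dev/null; then
      return 0
    fi
    echo "attempt ${i}/${attempts} failed, retrying in ${delay}s" >&2
    sleep "$delay"
  done
  return 1
}

# Equivalent to the wait loop in restart-apiserver.sh:
# retry_until 600 1 kubectl cluster-info
```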
rotate-etcd.sh:
#!/bin/bash
set -eo pipefail
declare -x cmd
dir=/tmp/etcdcert
KUBE_CERT_PATH=/etc/kubernetes/pki
ETCD_CERT_DIR=/var/lib/etcd/cert
ETCD_HOSTS=""
currentDir="$PWD"
# The image that renews the K8s certificates is pulled from the cluster's region, which is
# read from the ECS metadata service or passed with --region (see renew_k8s_certs).

# Derive the etcd member IPs from the peer certificate names <ip>-name-<n>.pem.
function get_etcdhosts() {
  name1=$(find "$ETCD_CERT_DIR" -name '*-name-1.pem' -exec basename {} \; | sed 's/-name-1.pem//g')
  name2=$(find "$ETCD_CERT_DIR" -name '*-name-2.pem' -exec basename {} \; | sed 's/-name-2.pem//g')
  name3=$(find "$ETCD_CERT_DIR" -name '*-name-3.pem' -exec basename {} \; | sed 's/-name-3.pem//g')
  echo "hosts: $name1 $name2 $name3"
  ETCD_HOSTS="$name1 $name2 $name3"
}

# Generate a new CA plus server, client, and peer certificates (438000h, about 50 years).
function gencerts() {
  echo "generate ssl cert ..."
  rm -rf $dir
  mkdir -p "$dir"
  local hosts
  hosts=$(echo $ETCD_HOSTS | tr -s " " ",")
  echo "-----generate ca"
  echo '{"CN":"CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/ca -
  echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/ca-config.json
  echo "-----generate etcdserver"
  export ADDRESS=$hosts,ext1.example.com,coreos1.local,coreos1,127.0.0.1
  export NAME=etcd-server
  echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME
  export ADDRESS=
  export NAME=etcd-client
  echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME
  # Generate the peer CA and one peer certificate per etcd member.
  echo "-----generate peer certificates"
  echo '{"CN":"Peer-CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/peer-ca -
  echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/peer-ca-config.json
  i=0
  for host in $ETCD_HOSTS; do
    ((i = i + 1))
    export MEMBER=${host}-name-$i
    echo '{"CN":"'${MEMBER}'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -ca=$dir/peer-ca.pem -ca-key=$dir/peer-ca-key.pem -config=$dir/peer-ca-config.json -profile=peer \
      -hostname="$hosts,${MEMBER}.local,${MEMBER}" - | cfssljson -bare $dir/${MEMBER}
  done
  # Build the CA bundle (old CAs + new CA) so that old and new certificates are both trusted.
  cat $KUBE_CERT_PATH/etcd/ca.pem >>$dir/bundle_ca.pem
  cat $ETCD_CERT_DIR/ca.pem >>$dir/bundle_ca.pem
  cat $dir/ca.pem >>$dir/bundle_ca.pem
  # Build the peer CA bundle (old peer CA + new peer CA).
  cat $ETCD_CERT_DIR/peer-ca.pem >$dir/bundle_peer-ca.pem
  cat $dir/peer-ca.pem >>$dir/bundle_peer-ca.pem
  # Set ownership and permissions.
  chown -R etcd:etcd $dir
  chmod 0644 $dir/*
}

# Print the etcd client URLs as a comma-separated list.
function etcd_client_urls() {
  local etcd_hosts=()
  for ip in $ETCD_HOSTS; do
    etcd_hosts+=("https://$ip:2379")
  done
  local IFS=','
  echo "${etcd_hosts[*]}"
}

function check_cert_files_exist() {
  REQUIRED_CERTS=("ca.pem" "etcd-server-key.pem" "etcd-server.pem" "peer-ca-key.pem" "peer-ca.pem")
  if [ ! -d "$ETCD_CERT_DIR" ]; then
    echo "Error: Directory $ETCD_CERT_DIR does not exist"
    exit 1
  fi
  for cert_file in "${REQUIRED_CERTS[@]}"; do
    if [ ! -f "$ETCD_CERT_DIR/$cert_file" ]; then
      echo "Error: File $ETCD_CERT_DIR/$cert_file does not exist"
      exit 1
    fi
  done
  echo "All required certificate files exist"
}

# Poll every etcd endpoint once per second, for up to 300 seconds, until all are healthy.
function check_etcd_cluster_ready() {
  local etcd_endpoints=()
  for ip in $ETCD_HOSTS; do
    etcd_endpoints+=("https://$ip:2379")
  done
  for i in $(seq 300); do
    for idx in "${!etcd_endpoints[@]}"; do
      endpoint="${etcd_endpoints[$idx]}"
      local health_output=$(ETCDCTL_API=3 etcdctl --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-server.pem --key=/var/lib/etcd/cert/etcd-server-key.pem --endpoints "$endpoint" endpoint health --command-timeout=1s 2>&1)
      if echo "$health_output" | grep -q "successfully committed proposal"; then
        unset 'etcd_endpoints[$idx]'
      else
        echo "etcdctl result: ${health_output}"
        echo "$endpoint is not ready"
      fi
    done
    # shellcheck disable=SC2199
    if [[ -z "${etcd_endpoints[@]}" ]]; then
      echo "ETCD cluster is ready"
      break
    fi
    sleep 1
    printf "wait etcd cluster to be ready, retry %d after 1s,total 300s \n" "$i"
  done
}

function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

# Distribute the CA bundles and new client certificates, then restart etcd and kube-apiserver node by node.
function rotate_etcd_ca() {
  for ADDR in $ETCD_HOSTS; do
    echo "update etcd CA on node $ADDR"
    scp -o StrictHostKeyChecking=no $dir/bundle_ca.pem root@$ADDR:$ETCD_CERT_DIR/ca.pem
    scp -o StrictHostKeyChecking=no $dir/bundle_ca.pem root@$ADDR:$KUBE_CERT_PATH/etcd/ca.pem
    scp -o StrictHostKeyChecking=no $dir/etcd-client.pem root@$ADDR:$KUBE_CERT_PATH/etcd/etcd-client.pem
    scp -o StrictHostKeyChecking=no $dir/etcd-client-key.pem root@$ADDR:$KUBE_CERT_PATH/etcd/etcd-client-key.pem
    scp -o StrictHostKeyChecking=no $dir/bundle_peer-ca.pem root@$ADDR:$ETCD_CERT_DIR/peer-ca.pem
    ssh -o StrictHostKeyChecking=no root@$ADDR chown -R etcd:etcd $ETCD_CERT_DIR
    ssh -o StrictHostKeyChecking=no root@$ADDR chmod 0644 $ETCD_CERT_DIR/*
    echo "restart etcd on node $ADDR"
    ssh -o StrictHostKeyChecking=no root@$ADDR systemctl restart etcd
    echo "etcd on node $ADDR restarted"
    # Verify that etcd restarted and the cluster is healthy.
    echo "check connectivity for etcd nodes"
    check_etcd_cluster_ready
    echo "end to check connectivity for etcd nodes"
    restart_one_apiserver $ADDR
    echo "apiserver on node $ADDR restarted"
  done
}

# Distribute the new server, client, and peer certificates, then restart etcd node by node.
function rotate_etcd_certs() {
  for ADDR in $ETCD_HOSTS; do
    echo "update etcd peer certs on node $ADDR"
    scp -o StrictHostKeyChecking=no \
      $dir/{peer-ca-key.pem,etcd-server.pem,etcd-server-key.pem,etcd-client.pem,etcd-client-key.pem,ca-key.pem,*-name*.pem} root@$ADDR:$ETCD_CERT_DIR/
    ssh -o StrictHostKeyChecking=no root@$ADDR chown -R etcd:etcd $ETCD_CERT_DIR
    ssh -o StrictHostKeyChecking=no root@$ADDR \
      chmod 0400 $ETCD_CERT_DIR/{peer-ca-key.pem,etcd-server.pem,etcd-server-key.pem,etcd-client.pem,etcd-client-key.pem,ca-key.pem,*-name*.pem}
    echo "restart etcd on node $ADDR"
    ssh -o StrictHostKeyChecking=no root@$ADDR systemctl restart etcd
    echo "etcd on node $ADDR restarted"
    echo "check connectivity for etcd nodes"
    check_etcd_cluster_ready
    echo "end to check connectivity for etcd nodes"
  done
}

# Replace the CA bundles with the new CA only, then restart kube-apiserver and etcd node by node.
function recover_etcd_ca() {
  for ADDR in $ETCD_HOSTS; do
    echo "replace etcd CA on node $ADDR"
    scp -o StrictHostKeyChecking=no $dir/ca.pem root@$ADDR:$ETCD_CERT_DIR/ca.pem
    scp -o StrictHostKeyChecking=no $dir/ca.pem root@$ADDR:$KUBE_CERT_PATH/etcd/ca.pem
    scp -o StrictHostKeyChecking=no $dir/peer-ca.pem root@$ADDR:$ETCD_CERT_DIR/peer-ca.pem
    ssh -o StrictHostKeyChecking=no root@$ADDR chown -R etcd:etcd $ETCD_CERT_DIR
    echo "restart apiserver on node $ADDR"
    restart_one_apiserver $ADDR
    echo "apiserver on node $ADDR restarted"
    echo "restart etcd on node $ADDR"
    ssh -o StrictHostKeyChecking=no root@$ADDR systemctl restart etcd
    echo "etcd on node $ADDR restarted"
    echo "check connectivity for etcd nodes"
    check_etcd_cluster_ready
    echo "end to check connectivity for etcd nodes"
    sleep 5
  done
}

# Replace only the etcd client CA used by kube-apiserver.
function recover_etcd_client_ca() {
  for ADDR in $ETCD_HOSTS; do
    echo "replace etcd CA on node $ADDR"
    scp -o StrictHostKeyChecking=no $dir/ca.pem root@$ADDR:$KUBE_CERT_PATH/etcd/ca.pem
  done
}

function renew_k8s_certs() {
  # Use the region from the metadata service; fall back to --region if it is unavailable.
  META_REGION=$(get_region_id)
  if [[ -z "$REGION" ]]; then
    if [[ -z "$META_REGION" ]]; then
      echo "failed to get region id from ECS meta-server, please enter the region parameter."
      return 1
    fi
    REGION=$META_REGION
  elif [[ -n "${META_REGION}" && "$REGION" != "$META_REGION" ]]; then
    echo "switch to use local region id $META_REGION"
    REGION=$META_REGION
  fi
  # Renew the certificates of K8s components and kubeconfig files on each master node.
  for ADDR in $ETCD_HOSTS; do
    echo "renew k8s components cert on node $ADDR"
    # Run the renewal with Docker and with containerd (ctr) for compatibility; errors are ignored.
    set +e
    IMAGE="registry.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
    if is_vpc; then
      IMAGE="registry-vpc.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
    fi
    echo "will pull rotate image $IMAGE"
    ssh -o StrictHostKeyChecking=no root@$ADDR docker run --privileged=true -v /:/alicoud-k8s-host --pid host --net host \
      $IMAGE /renew/upgrade-k8s.sh --role master
    ssh -o StrictHostKeyChecking=no root@$ADDR ctr image pull $IMAGE
    ssh -o StrictHostKeyChecking=no root@$ADDR ctr run --privileged=true --mount type=bind,src=/,dst=/alicoud-k8s-host,options=rbind:rw \
      --net-host $IMAGE cert-rotate /renew/upgrade-k8s.sh --role master
    set -e
    echo "finished renew k8s components cert on $ADDR"
  done
}

# Read the region ID from the ECS metadata service; prints an empty string on failure.
function get_region_id() {
  set +e # disable exit-on-error
  local path=100.100.100.200/latest/meta-data/region-id
  for ((i = 0; i < 3; i++)); do
    response=$(curl --retry 1 --retry-delay 5 -sSL $path)
    if [[ $? -gt 0 || "x$response" == "x" ]]; then
      sleep 2
      continue
    fi
    if echo "$response" | grep -E "<title>.*</title>" >/dev/null; then
      sleep 3
      continue
    fi
    echo "$response" # region ID obtained from the metadata service
    set -e
    return
  done
  set -e # re-enable exit-on-error
}

function is_vpc() {
  # Read the network type from the ECS metadata service.
  response=$(curl -s http://100.100.100.200/latest/meta-data/network-type)
  if [ "$response" = "vpc" ]; then
    return 0
  else
    return 1
  fi
}

# Record the rotation result in a ConfigMap.
function generate_cm() {
  echo "generate status configmap"
  cat <<-"EOF" >/tmp/ack-rotate-etcd-ca-cm.yaml.tpl
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-rotate-etcd-status
  namespace: kube-system
data:
  status: "success"
  hosts: "$hosts"
EOF
  sed -e "s#\$hosts#$ETCD_HOSTS#" /tmp/ack-rotate-etcd-ca-cm.yaml.tpl | kubectl apply -f -
}

# Copy restart-apiserver.sh from the current directory to a node and run it there.
function restart_one_apiserver() {
  ADDR=$1
  if [[ -z "${ADDR}" ]]; then
    printf "ADDR is empty,exit."
    exit 1
  fi
  printf "restart apiserver on node %s\n" "${ADDR}"
  scp -o StrictHostKeyChecking=no "${currentDir}"/restart-apiserver.sh root@"${ADDR}":/tmp/restart-apiserver.sh
  ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" chmod +x /tmp/restart-apiserver.sh
  ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" bash /tmp/restart-apiserver.sh
}

while [[ $# -gt 0 ]]; do
  key="$1"
  case $key in
  --region)
    export REGION=$2
    shift
    ;;
  *)
    echo "unknown option [$key]"
    exit 1
    ;;
  esac
  shift
done

get_etcdhosts
echo "${ETCD_HOSTS[@]}"
check_container_runtime
# Restart the container runtime (Docker only) and kubelet on all master nodes.
echo "---restart runtime and kubelet on master nodes---"
for ADDR in $ETCD_HOSTS; do
  if [ "$cmd" == "docker" ]; then
    echo "restart docker on node $ADDR"
    ssh -o StrictHostKeyChecking=no root@$ADDR systemctl restart docker
  fi
  ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" systemctl restart kubelet
done
sleep 5
echo "---end to restart runtime and kubelet on master nodes---"
echo "---renew k8s components certs---"
renew_k8s_certs
echo "---end to renew k8s components certs---"
echo "---check cert files exist---"
check_cert_files_exist
echo "---end to check cert files exist---"
echo "---check connectivity for etcd nodes---"
check_etcd_cluster_ready
echo "---end to check connectivity for etcd nodes---"
# Copy restart-apiserver.sh to every master node.
for ADDR in $ETCD_HOSTS; do
  scp -o StrictHostKeyChecking=no restart-apiserver.sh root@$ADDR:/tmp/restart-apiserver.sh
  ssh -o StrictHostKeyChecking=no root@$ADDR chmod +x /tmp/restart-apiserver.sh
done
gencerts
echo "---rotate etcd ca and etcd client ca---"
rotate_etcd_ca
echo "---end to rotate etcd ca and etcd client ca---"
echo "---rotate etcd peer and certs---"
rotate_etcd_certs
echo "---end to rotate etcd peer and certs---"
echo "check etcd cluster ready"
check_etcd_cluster_ready
echo "---replace etcd ca---"
recover_etcd_ca
echo "---end to replace etcd ca---"
generate_cm
echo "etcd CA and certs have successfully rotated!"
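The renew_k8s_certs function in rotate-etcd.sh selects the image that renews the Kubernetes component certificates. It works as follows:

* It uses the region returned by the ECS metadata service (http://100.100.100.200/latest/meta-data/region-id) whenever that region is available.
* It falls back to --region only when the metadata service returns no region.
* It switches to the registry-vpc endpoint when the node's network-type is vpc.

The following pure-bash sketch restates that decision without network calls, so you can predict which image will be pulled. The function names pick_region and rotate_image are hypothetical.

```shell
#!/bin/bash
# Sketch of the region and image selection in renew_k8s_certs.
# pick_region and rotate_image are hypothetical names.

# $1: value passed with --region (may be empty);
# $2: region returned by the metadata service (may be empty).
pick_region() {
  local given="$1" meta="$2"
  if [[ -n "$meta" ]]; then
    # The script switches to the metadata region whenever it is available.
    echo "$meta"
  elif [[ -n "$given" ]]; then
    echo "$given"
  else
    echo "failed to get region id, please pass --region" >&2
    return 1
  fi
}

# $1: region; $2: network type reported by the metadata service ("vpc" or other).
rotate_image() {
  local region="$1" network_type="$2" registry="registry"
  if [[ "$network_type" == "vpc" ]]; then
    registry="registry-vpc"
  fi
  echo "${registry}.${region}.aliyuncs.com/acs/etcd-rotate:v2.0.0"
}
```

For example, on a VPC node in cn-hangzhou, rotate_image cn-hangzhou vpc prints registry-vpc.cn-hangzhou.aliyuncs.com/acs/etcd-rotate:v2.0.0.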
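On any master node, go to the directory that contains both scripts and run the following command. Run it from that directory, because rotate-etcd.sh copies restart-apiserver.sh from the current directory.
bash rotate-etcd.sh
When the output shows the following line, the etcd certificates and the Kubernetes certificates on all master nodes have been rotated:
etcd CA and certs have successfully rotated!
Verify that the certificates have been updated.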
cd /var/lib/etcd/cert
for i in `ls | grep pem| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
cd /etc/kubernetes/pki/etcd
for i in `ls | grep pem| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
cd /etc/kubernetes/pki/
for i in `ls | grep crt| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
Note: If the expiration dates (Not After) in the output are about 50 years in the future, the rotation is complete.
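The 50-year figure comes from the "expiry": "438000h" setting in the cfssl CA and signing configurations used by the scripts: 438000 / 24 / 365 = 50 (counting 365-day years). The sketch below automates the check. It assumes GNU date, and check_dir and years_remaining are hypothetical names. It flags certificates with fewer than 49 whole years left, a threshold chosen as an assumption so that a freshly issued 50-year certificate still passes.

```shell
#!/bin/bash
# Sketch: confirm that certificates were reissued with the ~50-year lifetime.
# cfssl "expiry": "438000h" -> 438000 / 24 / 365 = 50 years (365-day years).
EXPECTED_YEARS=$((438000 / 24 / 365))

# $1: "notAfter=..." line from `openssl x509 -noout -enddate`.
# Prints the whole number of 365-day years remaining. Assumes GNU date.
years_remaining() {
  local end_ts
  end_ts=$(date -d "${1#notAfter=}" +%s) || return 1
  echo $(( (end_ts - $(date +%s)) / (365 * 86400) ))
}

# Report every *.pem/*.crt certificate (key files skipped, as in the commands above)
# in directory $1 that has fewer than EXPECTED_YEARS - 1 years left.
check_dir() {
  local f years rc=0
  for f in "$1"/*.pem "$1"/*.crt; do
    [ -e "$f" ] || continue
    case "$f" in *key*) continue ;; esac
    years=$(years_remaining "$(openssl x509 -noout -enddate -in "$f")") || { rc=1; continue; }
    if [ "$years" -lt $((EXPECTED_YEARS - 1)) ]; then
      echo "not rotated: $f (${years} years left)"
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_dir /var/lib/etcd/cert; check_dir /etc/kubernetes/pki/etcd; check_dir /etc/kubernetes/pki
```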
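After a manual rotation succeeds, the ACK control plane cannot obtain the rotation result. The Clusters page in the console therefore still shows the update button for the cluster. Submit a ticket to remove the button.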
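Scenario 2: Rotate etcd certificates after they have expired
Scenarios
The etcd certificates have expired.
The API server is inaccessible and the etcd certificates need to be rotated.
The etcd certificates cannot be rotated automatically by deploying a template.
The etcd certificates cannot be updated in the console.
In these scenarios, a cluster administrator can log on to any master node and run the following scripts to rotate the etcd certificates manually.
The following scripts must be run as the root user.
Make sure that passwordless SSH login is configured for the root user between master nodes.
From a master node, log on to any other master node over SSH. If you are prompted for a password, configure passwordless login between master nodes as follows.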
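# 1. Generate a key pair. Skip this step if a login key already exists on the node.
ssh-keygen -t rsa
# 2. Copy the public key to every other master node with ssh-copy-id.
#    Replace <internal-ip> with the internal IP address of each of the other master nodes.
ssh-copy-id -i ~/.ssh/id_rsa.pub root@<internal-ip>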
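Note: If you do not configure passwordless login, you must enter the root password when you run the scripts.
Copy the following scripts, save them as restart-apiserver.sh and rotate-etcd.sh, and place both files in the same directory.
Note: rotate-etcd.sh tries to get the region from the node's metadata service and pulls the rotation image from that region. You can also specify the region with the --region parameter, for example bash rotate-etcd.sh --region cn-hangzhou. The script uses this parameter only if the metadata service returns no region. Otherwise, it uses the region returned by the metadata service.
restart-apiserver.sh: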
#!/bin/bash
declare -x cmd

# Wait up to 600 seconds for kube-apiserver to respond.
k8s::wait_apiserver_ready() {
  set -e
  for i in $(seq 600); do
    if kubectl cluster-info &>/dev/null; then
      return 0
    else
      echo "wait apiserver to be ready, retry ${i}th after 1s"
      sleep 1
    fi
  done
  echo "failed to wait apiserver to be ready"
  return 1
}

# Detect the container runtime (Docker or containerd).
function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

function restart_apiserver() {
  # Restart kube-apiserver with the detected container runtime.
  if [[ $cmd == "docker" ]]; then
    # Docker: restart the kube-apiserver container (skip its k8s_POD_ pause container).
    container_id=$(docker ps | grep kube-apiserver | grep -v "k8s_POD_" | awk '{print $1}' | head -n 1)
    if [[ -n $container_id ]]; then
      echo "Restarting kube-apiserver pod using Docker: $container_id"
      docker restart "${container_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  elif [[ $cmd == "crictl" ]]; then
    # containerd: stop the kube-apiserver Pod sandbox; kubelet then recreates the Pod.
    pod_id=$(crictl pods --label component=kube-apiserver --latest --state=ready | grep -v "POD ID" | head -n 1 | awk '{print $1}')
    if [[ -n $pod_id ]]; then
      echo "Restarting kube-apiserver pod using crictl: $pod_id"
      crictl stopp "${pod_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  else
    echo "Unsupported container runtime: $cmd"
  fi
  k8s::wait_apiserver_ready
}

check_container_runtime
restart_apiserver
echo "API Server restarted"
rotate-etcd.sh:
#!/bin/bash
set -eo pipefail
declare -x cmd
dir=/tmp/rollback/etcdcert
KUBE_CERT_PATH=/etc/kubernetes/pki
ETCD_CERT_DIR=/var/lib/etcd/cert
ETCD_HOSTS=""
currentDir="$PWD"
# The image that renews the K8s certificates is pulled from the cluster's region, which is
# read from the ECS metadata service or passed with --region (see renew_k8s_certs).

# Derive the etcd member IPs from the peer certificate names <ip>-name-<n>.pem.
function get_etcdhosts() {
  name1=$(find "$ETCD_CERT_DIR" -name '*-name-1.pem' -exec basename {} \; | sed 's/-name-1.pem//g')
  name2=$(find "$ETCD_CERT_DIR" -name '*-name-2.pem' -exec basename {} \; | sed 's/-name-2.pem//g')
  name3=$(find "$ETCD_CERT_DIR" -name '*-name-3.pem' -exec basename {} \; | sed 's/-name-3.pem//g')
  echo "hosts: $name1 $name2 $name3"
  ETCD_HOSTS="$name1 $name2 $name3"
}

# Generate a new CA plus server, client, and peer certificates (438000h, about 50 years)
# and copy them to every master node, replacing the expired ones.
function gencerts() {
  echo "generate ssl cert ..."
  rm -rf $dir
  mkdir -p "$dir"
  cd $dir
  local hosts
  hosts=$(echo $ETCD_HOSTS | tr -s " " ",")
  echo "generate ca"
  echo '{"CN":"CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/ca -
  echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/ca-config.json
  echo "generate etcd server certificates"
  export ADDRESS=$hosts,ext1.example.com,coreos1.local,coreos1,127.0.0.1
  export NAME=etcd-server
  echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME
  export ADDRESS=
  export NAME=etcd-client
  echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME
  # Generate the peer CA and one peer certificate per etcd member.
  echo "generate peer certificates"
  echo '{"CN":"Peer-CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/peer-ca -
  echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/peer-ca-config.json
  i=0
  for host in $ETCD_HOSTS; do
    ((i = i + 1))
    export MEMBER=${host}-name-$i
    echo '{"CN":"'${MEMBER}'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -ca=$dir/peer-ca.pem -ca-key=$dir/peer-ca-key.pem -config=$dir/peer-ca-config.json -profile=peer \
      -hostname="$hosts,${MEMBER}.local,${MEMBER}" - | cfssljson -bare $dir/${MEMBER}
  done
  # Set ownership and permissions.
  chown -R etcd:etcd $dir
  chmod 0644 $dir/*
  for ADDR in $ETCD_HOSTS; do
    printf "sync the certificates of node %s\n" "${ADDR}"
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" mkdir -p "${dir}"
    scp -o StrictHostKeyChecking=no "${dir}"/* root@"${ADDR}":/var/lib/etcd/cert/
    scp -o StrictHostKeyChecking=no "${dir}"/ca.pem "${dir}"/etcd-client.pem "${dir}"/etcd-client-key.pem root@"${ADDR}":/etc/kubernetes/pki/etcd/
  done
}

# Record the rotation result in a ConfigMap.
function generate_cm() {
  echo "generate status configmap"
  cat <<-"EOF" >/tmp/ack-rotate-etcd-ca-cm.yaml.tpl
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-rotate-etcd-status
  namespace: kube-system
data:
  status: "success"
  hosts: "$hosts"
EOF
  sed -e "s#\$hosts#$ETCD_HOSTS#" /tmp/ack-rotate-etcd-ca-cm.yaml.tpl | kubectl apply -f -
}

# Restart the container runtime (Docker only) and etcd on every master node.
function rotate_etcd() {
  for ADDR in $ETCD_HOSTS; do
    printf "rotate etcd's certificates on node %s\n" "${ADDR}"
    if [ "$cmd" == "docker" ]; then
      echo "restart docker on node $ADDR"
      ssh -e none -o StrictHostKeyChecking=no root@$ADDR systemctl restart docker
    fi
    ssh -e none -o StrictHostKeyChecking=no root@$ADDR systemctl restart etcd
  done
}

# Restart kubelet and kube-apiserver on every master node.
function rotate_apiserver() {
  echo "current dir: $currentDir"
  for ADDR in $ETCD_HOSTS; do
    printf "restart apiserver on node %s\n" "${ADDR}"
    scp -o StrictHostKeyChecking=no "${currentDir}"/restart-apiserver.sh root@"${ADDR}":/tmp/restart-apiserver.sh
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" systemctl restart kubelet
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" chmod +x /tmp/restart-apiserver.sh
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" bash /tmp/restart-apiserver.sh
  done
}

# Poll every etcd endpoint once per second, for up to 300 seconds, until all are healthy.
function check_etcd_cluster_ready() {
  local etcd_endpoints=()
  for ip in $ETCD_HOSTS; do
    etcd_endpoints+=("https://$ip:2379")
  done
  for i in $(seq 300); do
    for idx in "${!etcd_endpoints[@]}"; do
      endpoint="${etcd_endpoints[$idx]}"
      local health_output=$(ETCDCTL_API=3 etcdctl --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-server.pem --key=/var/lib/etcd/cert/etcd-server-key.pem --endpoints "$endpoint" endpoint health --command-timeout=1s 2>&1)
      if echo "$health_output" | grep -q "successfully committed proposal"; then
        unset 'etcd_endpoints[$idx]'
      else
        echo "etcdctl result: ${health_output}"
        echo "$endpoint is not ready"
      fi
    done
    # shellcheck disable=SC2199
    if [[ -z "${etcd_endpoints[@]}" ]]; then
      echo "ETCD cluster is ready"
      break
    fi
    sleep 1
    printf "wait etcd cluster to be ready, retry %d after 1s,total 300s \n" "$i"
  done
}

# Read the region ID from the ECS metadata service; prints an empty string on failure.
function get_region_id() {
  set +e # disable exit-on-error
  local path=100.100.100.200/latest/meta-data/region-id
  for ((i = 0; i < 3; i++)); do
    response=$(curl --retry 1 --retry-delay 5 -sSL $path)
    if [[ $? -gt 0 || "x$response" == "x" ]]; then
      sleep 2
      continue
    fi
    if echo "$response" | grep -E "<title>.*</title>" >/dev/null; then
      sleep 3
      continue
    fi
    echo "$response" # region ID obtained from the metadata service
    set -e
    return
  done
  set -e # re-enable exit-on-error
}

function is_vpc() {
  # Read the network type from the ECS metadata service.
  response=$(curl -s http://100.100.100.200/latest/meta-data/network-type)
  if [ "$response" = "vpc" ]; then
    return 0
  else
    return 1
  fi
}

function renew_k8s_certs() {
  # Use the region from the metadata service; fall back to --region if it is unavailable.
  META_REGION=$(get_region_id)
  if [[ -z "$REGION" ]]; then
    if [[ -z "$META_REGION" ]]; then
      echo "failed to get region id from ECS meta-server, please enter the region parameter."
      return 1
    fi
    REGION=$META_REGION
  elif [[ -n "${META_REGION}" && "$REGION" != "$META_REGION" ]]; then
    echo "switch to use local region id $META_REGION"
    REGION=$META_REGION
  fi
  # Renew the certificates of K8s components and kubeconfig files on each master node.
  for ADDR in $ETCD_HOSTS; do
    echo "renew k8s components cert on node $ADDR"
    # Run the renewal with Docker and with containerd (ctr) for compatibility; errors are ignored.
    set +e
    IMAGE="registry.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
    if is_vpc; then
      IMAGE="registry-vpc.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
    fi
    echo "will pull rotate image $IMAGE"
    ssh -o StrictHostKeyChecking=no root@$ADDR docker run --privileged=true -v /:/alicoud-k8s-host --pid host --net host \
      $IMAGE /renew/upgrade-k8s.sh --role master
    ssh -o StrictHostKeyChecking=no root@$ADDR ctr image pull $IMAGE
    ssh -o StrictHostKeyChecking=no root@$ADDR ctr run --privileged=true --mount type=bind,src=/,dst=/alicoud-k8s-host,options=rbind:rw \
      --net-host $IMAGE cert-rotate /renew/upgrade-k8s.sh --role master
    set -e
    echo "finished renew k8s components cert on $ADDR"
  done
}

function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

while [[ $# -gt 0 ]]; do
  key="$1"
  case $key in
  --region)
    export REGION=$2
    shift
    ;;
  *)
    echo "unknown option [$key]"
    exit 1
    ;;
  esac
  shift
done

get_etcdhosts
printf "ETCD_HOSTS: %s\n" "$ETCD_HOSTS"
# Detect the runtime so that rotate_etcd also restarts Docker on Docker-based nodes.
check_container_runtime
gencerts
echo "---generate certificates successfully---"
rotate_etcd
echo "---rotate etcd successfully---"
echo "---check etcd cluster ready---"
check_etcd_cluster_ready
rotate_apiserver
echo "---restart apiserver successfully---"
echo "---renew k8s components certs---"
renew_k8s_certs
echo "---end to renew k8s components certs---"
generate_cm
echo "etcd CA and certs have successfully rotated!"
rm -rf $dir
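The rotate-etcd.sh script takes no host list as input. Instead, get_etcdhosts derives the etcd member IPs from peer certificate file names of the form <ip>-name-<n>.pem in /var/lib/etcd/cert, with n = 1, 2, 3. The sketch below applies the same naming convention to any directory, so you can check which hosts the script will operate on before you run it. The function name list_etcd_hosts is hypothetical. Unlike get_etcdhosts, it stops at the first missing index instead of assuming exactly three members.

```shell
#!/bin/bash
# Sketch: list etcd member addresses encoded in peer certificate file names,
# for example 192.168.0.1-name-1.pem. list_etcd_hosts is a hypothetical helper.
list_etcd_hosts() {
  local cert_dir="${1:-/var/lib/etcd/cert}" n f hosts=()
  for ((n = 1; ; n++)); do
    # Same pattern as get_etcdhosts; *-name-<n>-key.pem does not match it.
    f=$(find "$cert_dir" -maxdepth 1 -name "*-name-${n}.pem" -exec basename {} \; | head -n 1)
    [ -n "$f" ] || break
    hosts+=("${f%-name-${n}.pem}")
  done
  echo "${hosts[*]}"
}

# Example: list_etcd_hosts /var/lib/etcd/cert
```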
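On any master node, go to the directory that contains both scripts and run bash rotate-etcd.sh. After the output shows etcd CA and certs have successfully rotated!, verify that the certificates have been updated.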
cd /var/lib/etcd/cert
for i in `ls | grep pem| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
cd /etc/kubernetes/pki/etcd
for i in `ls | grep pem| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
cd /etc/kubernetes/pki/
for i in `ls | grep crt| grep -v key`;do openssl x509 -noout -text -in $i | grep -i after && echo "$i" ;done
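If the expiration dates (Not After) in the output are about 50 years in the future, the rotation is complete.
After a manual rotation succeeds, the ACK control plane cannot obtain the rotation result. The Clusters page in the console therefore still shows the cluster as expired. Submit a ticket to clear the expired status.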
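Roll back after a failed certificate rotation
Scenarios
Certificate rotation in the console failed and you need to recover the Kubernetes cluster.
Manual certificate rotation from the command line failed and you need to recover the Kubernetes cluster.
In these scenarios, a cluster administrator can log on to any master node and run the following scripts to update the etcd certificates manually. Because the old certificates are about to expire, this operation generates a new set of etcd certificates. It then updates the etcd server certificates and the kube-apiserver client certificates.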
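The recovery script copies the same newly generated ca.pem to two locations: /var/lib/etcd/cert, used by etcd, and /etc/kubernetes/pki/etcd, used by kube-apiserver as its etcd client CA. If kube-apiserver still cannot reach etcd afterward, one quick check is whether the two files on a node are identical. A minimal sketch follows. ca_in_sync is a hypothetical helper, and its default paths are the ones used by the scripts in this topic.

```shell
#!/bin/bash
# Sketch: check that etcd and kube-apiserver use the same etcd CA file on this node.
ca_in_sync() {
  local etcd_ca="${1:-/var/lib/etcd/cert/ca.pem}"
  local apiserver_ca="${2:-/etc/kubernetes/pki/etcd/ca.pem}"
  # cmp -s is silent; it returns 0 only when both files exist and are byte-identical.
  if cmp -s "$etcd_ca" "$apiserver_ca"; then
    echo "etcd CA is in sync"
  else
    echo "etcd CA differs: $etcd_ca vs $apiserver_ca" >&2
    return 1
  fi
}

# Example: ca_in_sync   (run on each master node)
```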
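The following scripts must be run as the root user.
Make sure that passwordless SSH login is configured for the root user between master nodes.
From a master node, log on to any other master node over SSH. If you are prompted for a password, configure passwordless login between master nodes as follows.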
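# 1. Generate a key pair. Skip this step if a login key already exists on the node.
ssh-keygen -t rsa
# 2. Copy the public key to every other master node with ssh-copy-id.
#    Replace <internal-ip> with the internal IP address of each of the other master nodes.
ssh-copy-id -i ~/.ssh/id_rsa.pub root@<internal-ip>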
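Note: If you do not configure passwordless login, you must enter the root password when you run the scripts.
Copy the following scripts, save them as restart-apiserver.sh and rollback-etcd.sh, and place both files in the same directory. Then, on any master node, run bash rollback-etcd.sh from that directory.
Note: rollback-etcd.sh tries to get the region from the node's metadata service and pulls the rotation image from that region. You can also specify the region with the --region parameter, for example bash rollback-etcd.sh --region cn-hangzhou. The script uses this parameter only if the metadata service returns no region. Otherwise, it uses the region returned by the metadata service.
restart-apiserver.sh: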
#!/bin/bash
declare -x cmd

# Wait up to 600 seconds for kube-apiserver to respond.
k8s::wait_apiserver_ready() {
  set -e
  for i in $(seq 600); do
    if kubectl cluster-info &>/dev/null; then
      return 0
    else
      echo "wait apiserver to be ready, retry ${i}th after 1s"
      sleep 1
    fi
  done
  echo "failed to wait apiserver to be ready"
  return 1
}

# Detect the container runtime (Docker or containerd).
function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

function restart_apiserver() {
  # Restart kube-apiserver with the detected container runtime.
  if [[ $cmd == "docker" ]]; then
    # Docker: restart the kube-apiserver container (skip its k8s_POD_ pause container).
    container_id=$(docker ps | grep kube-apiserver | grep -v "k8s_POD_" | awk '{print $1}' | head -n 1)
    if [[ -n $container_id ]]; then
      echo "Restarting kube-apiserver pod using Docker: $container_id"
      docker restart "${container_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  elif [[ $cmd == "crictl" ]]; then
    # containerd: stop the kube-apiserver Pod sandbox; kubelet then recreates the Pod.
    pod_id=$(crictl pods --label component=kube-apiserver --latest --state=ready | grep -v "POD ID" | head -n 1 | awk '{print $1}')
    if [[ -n $pod_id ]]; then
      echo "Restarting kube-apiserver pod using crictl: $pod_id"
      crictl stopp "${pod_id}"
    else
      echo "kube-apiserver pod not found."
    fi
  else
    echo "Unsupported container runtime: $cmd"
  fi
  k8s::wait_apiserver_ready
}

check_container_runtime
restart_apiserver
echo "API Server restarted"
rollback-etcd.sh:
#!/bin/bash
# rotate-etcd.sh: regenerates the etcd CA and certificates, then restarts etcd, kubelet and kube-apiserver on every Master node one by one.
set -eo pipefail
declare -x TARGET_TEAR
declare -x cmd
dir=/tmp/rollback/etcdcert
KUBE_CERT_PATH=/etc/kubernetes/pki
ETCD_CERT_DIR=/var/lib/etcd/cert
ETCD_HOSTS=""
currentDir="$PWD"

# The region of the rotation image (etcd-rotate) is detected from the ECS
# metadata service in renew_k8s_certs(); --region is used only when the
# metadata service cannot be reached.

# Derive the Master host names from the peer certificate file names <host>-name-<N>.pem.
function get_etcdhosts() {
  name1=$(find "$ETCD_CERT_DIR" -name '*-name-1.pem' -exec basename {} \; | sed 's/-name-1.pem//g')
  name2=$(find "$ETCD_CERT_DIR" -name '*-name-2.pem' -exec basename {} \; | sed 's/-name-2.pem//g')
  name3=$(find "$ETCD_CERT_DIR" -name '*-name-3.pem' -exec basename {} \; | sed 's/-name-3.pem//g')
  echo "hosts: $name1 $name2 $name3"
  ETCD_HOSTS="$name1 $name2 $name3"
}

function gencerts() {
  echo "generate ssl cert ..."
  rm -rf $dir
  mkdir -p "$dir"
  cd $dir
  local hosts
  hosts=$(echo $ETCD_HOSTS | tr -s " " ",")

  echo "generate ca"
  echo '{"CN":"CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/ca -
  echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/ca-config.json

  echo "generate etcd server certificates"
  export ADDRESS=$hosts,ext1.example.com,coreos1.local,coreos1,127.0.0.1
  export NAME=etcd-server
  echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME
  export ADDRESS=
  export NAME=etcd-client
  echo '{"CN":"'$NAME'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -config=$dir/ca-config.json -ca=$dir/ca.pem -ca-key=$dir/ca-key.pem -hostname="$ADDRESS" - | cfssljson -bare $dir/$NAME

  # Generate the peer CA and one peer certificate per Master
  echo "generate peer certificates"
  echo '{"CN":"Peer-CA","key":{"algo":"rsa","size":2048}, "ca": {"expiry": "438000h"}}' | cfssl gencert -initca - | cfssljson -bare $dir/peer-ca -
  echo '{"signing":{"default":{"expiry":"438000h","usages":["signing","key encipherment","server auth","client auth"]}}}' >$dir/peer-ca-config.json
  i=0
  for host in $ETCD_HOSTS; do
    ((i = i + 1))
    export MEMBER=${host}-name-$i
    echo '{"CN":"'${MEMBER}'","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -ca=$dir/peer-ca.pem -ca-key=$dir/peer-ca-key.pem -config=$dir/peer-ca-config.json -profile=peer \
      -hostname="$hosts,${MEMBER}.local,${MEMBER}" - | cfssljson -bare $dir/${MEMBER}
  done

  # Set ownership and permissions, then copy the certificates to every Master
  chown -R etcd:etcd $dir
  chmod 0644 $dir/*
  for ADDR in $ETCD_HOSTS; do
    printf "sync the certificates of node %s\n" "${ADDR}"
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" mkdir -p "${dir}"
    scp -o StrictHostKeyChecking=no "${dir}"/* root@"${ADDR}":/var/lib/etcd/cert/
    scp -o StrictHostKeyChecking=no "${dir}"/ca.pem "${dir}"/etcd-client.pem "${dir}"/etcd-client-key.pem root@"${ADDR}":/etc/kubernetes/pki/etcd/
  done
}

function generate_cm() {
  echo "generate status configmap"
  cat <<-"EOF" >/tmp/ack-rotate-etcd-ca-cm.yaml.tpl
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-rotate-etcd-status
  namespace: kube-system
data:
  status: "success"
  hosts: "$hosts"
EOF
  sed -e "s#\$hosts#$ETCD_HOSTS#" /tmp/ack-rotate-etcd-ca-cm.yaml.tpl | kubectl apply -f -
}

function rotate_etcd() {
  for ADDR in $ETCD_HOSTS; do
    printf "rotate etcd's certificates on node %s\n" "${ADDR}"
    if [ "$cmd" == "docker" ]; then
      echo "restart docker on node $ADDR"
      ssh -e none -o StrictHostKeyChecking=no root@$ADDR systemctl restart docker
    fi
    ssh -e none -o StrictHostKeyChecking=no root@$ADDR systemctl restart etcd
  done
}

function rotate_apiserver() {
  echo "current dir: $currentDir"
  for ADDR in $ETCD_HOSTS; do
    printf "restart apiserver on node %s\n" "${ADDR}"
    scp -o StrictHostKeyChecking=no "${currentDir}"/restart-apiserver.sh root@"${ADDR}":/tmp/restart-apiserver.sh
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" systemctl restart kubelet
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" chmod +x /tmp/restart-apiserver.sh
    ssh -e none -o StrictHostKeyChecking=no root@"${ADDR}" bash /tmp/restart-apiserver.sh
  done
}

function check_etcd_cluster_ready() {
  local etcd_endpoints=()
  for ip in $ETCD_HOSTS; do
    etcd_endpoints+=("https://$ip:2379")
  done
  for i in $(seq 300); do
    for idx in "${!etcd_endpoints[@]}"; do
      endpoint="${etcd_endpoints[$idx]}"
      local health_output=$(ETCDCTL_API=3 etcdctl --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-server.pem --key=/var/lib/etcd/cert/etcd-server-key.pem --endpoints "$endpoint" endpoint health --command-timeout=1s 2>&1)
      if echo "$health_output" | grep -q "successfully committed proposal"; then
        unset 'etcd_endpoints[$idx]'
      else
        echo "etcdctl result: ${health_output}"
        echo "$endpoint is not ready"
      fi
    done
    # shellcheck disable=SC2199
    if [[ -z "${etcd_endpoints[@]}" ]]; then
      echo "ETCD cluster is ready"
      break
    fi
    sleep 1
    printf "wait etcd cluster to be ready, retry %d after 1s,total 300s \n" "$i"
  done
}

function get_region_id() {
  set +e # disable exit-on-error
  local path=100.100.100.200/latest/meta-data/region-id
  for (( i=0; i<3; i++)); do
    response=$(curl --retry 1 --retry-delay 5 -sSL $path)
    if [[ $? -gt 0 || "x$response" == "x" ]]; then
      sleep 2; continue
    fi
    if echo "$response"|grep -E "<title>.*</title>" >/dev/null; then
      sleep 3; continue
    fi
    echo "$response" # the metadata service returned the region successfully
    set -e; return
  done
  set -e # re-enable exit-on-error
  # the function returns an empty string on failure
}

function is_vpc() {
  # Execute the curl command and capture the network-type from ECS meta-server
  response=$(curl -s http://100.100.100.200/latest/meta-data/network-type)
  if [ "$response" = "vpc" ]; then
    return 0
  else
    return 1
  fi
}

function renew_k8s_certs() {
  # try to get region id from meta-server if not given in parameter
  META_REGION=$(get_region_id)
  if [[ -z "$REGION" ]]; then
    if [[ -z "$META_REGION" ]]; then
      echo "failed to get region id from ECS meta-server, please enter the region parameter."
      return 1
    fi
    REGION=$META_REGION
  elif [[ -n "${META_REGION}" && "$REGION" != "$META_REGION" ]] ; then
    echo "switch to use local region id $META_REGION"
    REGION=$META_REGION
  fi

  # Update certs for k8s components and kubeconfig
  for ADDR in $ETCD_HOSTS; do
    echo "renew k8s components cert on node $ADDR"
    # Tolerate failures: both the docker and the ctr (containerd) commands are
    # attempted; only the one matching the node's container runtime succeeds.
    set +e
    IMAGE="registry.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
    if is_vpc; then
      IMAGE="registry-vpc.$REGION.aliyuncs.com/acs/etcd-rotate:v2.0.0"
    fi
    echo "will pull rotate image $IMAGE"
    ssh -o StrictHostKeyChecking=no root@$ADDR docker run --privileged=true -v /:/alicoud-k8s-host --pid host --net host \
      $IMAGE /renew/upgrade-k8s.sh --role master
    ssh -o StrictHostKeyChecking=no root@$ADDR ctr image pull $IMAGE
    ssh -o StrictHostKeyChecking=no root@$ADDR ctr run --privileged=true --mount type=bind,src=/,dst=/alicoud-k8s-host,options=rbind:rw \
      --net-host $IMAGE cert-rotate /renew/upgrade-k8s.sh --role master
    set -e
    echo "finished renew k8s components cert on $ADDR"
  done
}

function check_container_runtime() {
  if command -v dockerd &>/dev/null && ps aux | grep -q "[d]ockerd"; then
    cmd=docker
  elif command -v containerd &>/dev/null && ps aux | grep -q "[c]ontainerd"; then
    cmd=crictl
  else
    echo "Neither Dockerd nor Containerd is installed or running."
    exit 1
  fi
}

while [[ $# -gt 0 ]]
do
  key="$1"
  case $key in
    --region)
      export REGION=$2
      shift
      ;;
    *)
      echo "unknown option [$key]"
      exit 1
      ;;
  esac
  shift
done

get_etcdhosts
printf "ETCD_HOSTS: %s\n" "$ETCD_HOSTS"
gencerts
echo "---generate certificates successfully---"
rotate_etcd
echo "---rotate etcd successfully---"
echo "---check etcd cluster ready---"
check_etcd_cluster_ready
rotate_apiserver
echo "---restart apiserver successfully---"
echo "---renew k8s components certs---"
renew_k8s_certs
echo "---end to renew k8s components certs---"
generate_cm
echo "etcd CA and certs have successfully rotated!"
rm -rf $dir
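On any Master node, change to the directory that contains both rotate-etcd.sh and restart-apiserver.sh, then run the following command. The script copies restart-apiserver.sh from the current working directory to each Master.
bash rotate-etcd.sh
When the command prints
etcd CA and certs have successfully rotated!
the etcd certificates and K8s certificates on all Master nodes have been rotated. Then verify that the certificates have been updated.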
cd /var/lib/etcd/cert
for i in $(ls | grep pem | grep -v key); do openssl x509 -noout -enddate -in "$i" && echo "$i"; done
cd /etc/kubernetes/pki/etcd
for i in $(ls | grep pem | grep -v key); do openssl x509 -noout -enddate -in "$i" && echo "$i"; done
cd /etc/kubernetes/pki/
for i in $(ls | grep crt | grep -v key); do openssl x509 -noout -enddate -in "$i" && echo "$i"; done
If the expiry dates printed by the commands above are about 50 years in the future, the rotation is complete.
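The loops above print each certificate's expiry date for visual inspection. As an alternative, the sketch below turns the check into a pass/fail result using openssl x509 -checkend, which exits 0 only if the certificate will not expire within the given number of seconds. The function name check_cert_lifetime and the 40-year threshold are assumptions for illustration. The threshold is chosen because rotate-etcd.sh issues certificates with an expiry of 438000h (about 50 years).

```shell
#!/bin/bash
# Hypothetical helper: check that each certificate remains valid for at
# least MIN_YEARS more years. Private-key files (*-key.pem) and missing
# paths are skipped. Returns non-zero if any certificate expires sooner.
check_cert_lifetime() {
  local min_years="$1"
  shift
  # Approximate a year as 365 days; precision is not important here.
  local secs=$((min_years * 365 * 24 * 3600))
  local cert rc=0
  for cert in "$@"; do
    case "$cert" in *-key.pem) continue ;; esac
    [ -f "$cert" ] || continue
    # -checkend N exits 0 if the certificate will NOT expire within N seconds.
    if openssl x509 -noout -checkend "$secs" -in "$cert" >/dev/null; then
      echo "ok     $cert $(openssl x509 -noout -enddate -in "$cert")"
    else
      echo "SHORT  $cert $(openssl x509 -noout -enddate -in "$cert")"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: check_cert_lifetime 40 /var/lib/etcd/cert/*.pem /etc/kubernetes/pki/etcd/*.pem /etc/kubernetes/pki/*.crt. Any line marked SHORT points to a certificate that was not renewed.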