How do I migrate from the SBD fence solution to the fence_aliyun solution?

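Problem description

You want to migrate an SAP high-availability (HA) environment deployed on Alibaba Cloud from the SBD fence solution to the fence_aliyun solution.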

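Applies to

  • SAP HA environments deployed on Alibaba Cloud ECS instances (SAP HANA, SAP ASCS/SCS)
  • SAP ASCS/SCS HA environments in which the ERS instance is installed on the local nodes and the service addresses are managed by the High-Availability Virtual IP (HaVip) product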

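Limits and notes

  • Before you use this migration procedure, make sure that your SAP HA environment (SAP HANA, SAP ASCS/SCS) is running normally.
  • This procedure does not apply to SAP ASCS/SCS HA environments that have no ERS instance installed.
  • The operating system must be SLES for SAP 12 SP4 or later.
  • This migration requires business downtime. Plan a maintenance window in advance.
  • We strongly recommend that you create snapshots of the system disks and data disks of the ECS instances before you make any changes. For details, see Create a snapshot for a single disk or Create snapshots for multiple disks.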
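
The operating system requirement above can be checked on each node before the change. The following is a minimal sketch, not an official tool: the function name check_sles_for_sap is hypothetical, and it assumes that SLES for SAP reports ID="sles_sap" and a VERSION_ID of the form <major>.<service pack> (for example 12.4) in /etc/os-release.

```shell
#!/bin/sh
# Hypothetical pre-check: is this node running SLES for SAP 12 SP4 or later?
# Assumption: /etc/os-release contains ID="sles_sap" and VERSION_ID="12.4"-style values.
check_sles_for_sap() {
    os_release="${1:-/etc/os-release}"
    # Read the values without sourcing the file; strip the surrounding quotes.
    id=$(sed -n 's/^ID=//p' "$os_release" | tr -d '"')
    version=$(sed -n 's/^VERSION_ID=//p' "$os_release" | tr -d '"')

    if [ "$id" != "sles_sap" ]; then
        echo "FAIL: not SLES for SAP (ID=$id)"
        return 1
    fi

    major=${version%%.*}
    sp=${version#*.}
    # A VERSION_ID without a dot (for example "15") means no service pack.
    if [ "$sp" = "$version" ]; then
        sp=0
    fi

    # Non-numeric or missing values make the tests fail, which is reported as FAIL.
    if [ "$major" -gt 12 ] 2>/dev/null ||
       { [ "$major" -eq 12 ] && [ "$sp" -ge 4 ]; } 2>/dev/null; then
        echo "OK: SLES for SAP $version"
        return 0
    fi
    echo "FAIL: SLES for SAP $version is older than 12 SP4"
    return 1
}
```

Source the function on each cluster node and call check_sles_for_sap without arguments; a non-zero return status means this migration procedure does not apply to that node.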

Solution

Scenario 1: SAP HANA HA environment

The following procedure applies to an SAP HANA HA environment:

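  1. Log on to the primary node of the cluster and run the following command to view the status of all resources.
    Note: Unless otherwise stated, each step needs to be performed on only one node of the cluster.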
    crm_mon -r
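    The output is similar to the following. In this example, there are two ECS instances, hana001 and hana002, and the cluster and all managed resources are in a normal state.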
    Stack: corosync

    2 nodes configured
    6 resources configured

    Online: [ hana001 hana002 ]

    Full list of resources:
    rsc_sbd (stonith:external/sbd): Started hana001
    rsc_vip (ocf::heartbeat:IPaddr2):       Started hana001
     Clone Set: msl_SAPHana_HDB [rsc_SAPHana_HDB] (promotable)
         Masters: [ hana001 ]
         Slaves: [ hana002 ]
     Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
         Started: [ hana001 hana002 ]
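  2. Run the following command to find the current SBD block device name.
    grep "^SBD_DEVICE=" /etc/sysconfig/sbd
    The output is similar to the following. In this example, the SBD block device is /dev/vdf.
    SBD_DEVICE="/dev/vdf"
    Log on to the Alibaba Cloud console, choose Storage & Snapshots > Shared Block Storage, and click the ID of the shared block storage instance to view its details. Confirm that the device name under which it is attached to the ECS instances matches the device name found above.

    The device name shown in the console contains an extra x character (for example, /dev/xvdf). After you remove the x, it must match the result above.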
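  3. Run the following command to query the configuration of the SAP HANA HA virtual IP. The resource name in the command is the one used in this example.
    crm configure show | grep -E "primitive rsc_vip|params ip"
    The output is similar to the following: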
    primitive rsc_vip IPaddr2 \
            params ip=192.168.10.101
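    Replace the resource name in the command with the one used in your environment.
  4. Complete all the configuration described in section 5.3.2 Option 2: fence_aliyun of SAP HANA High Availability Deployment in the Same Zone.
  5. Run the following command to put the cluster into maintenance mode.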
    crm configure property maintenance-mode=true

    If a maintenance attribute is already set on a resource in the cluster, a prompt similar to the following appears. Enter y to remove it.

    'maintenance' attribute already exists in rsc_sbd. Remove it (y/n)? y
    'is-managed' conflicts with 'maintenance' in cln_SAPHanaTopology_HDB. Remove it (y/n)? y
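  6. After the setting takes effect, run the following command to confirm that all resources are in the unmanaged state.
    crm_mon -r
    The output is similar to the following: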
    2 nodes configured
    6 resources configured

                  *** Resource management is DISABLED ***
      The cluster will not attempt to start, stop or recover services

    Online: [ hana001 hana002 ]

    Full list of resources:

    rsc_sbd (stonith:external/sbd): Started hana001 (unmanaged)
    rsc_vip (ocf::heartbeat:IPaddr2):       Started hana001 (unmanaged)
     Clone Set: msl_SAPHana_HDB [rsc_SAPHana_HDB] (promotable) (unmanaged)
         rsc_SAPHana_HDB    (ocf::suse:SAPHana):    Slave hana002 (unmanaged)
         rsc_SAPHana_HDB    (ocf::suse:SAPHana):    Master hana001 (unmanaged)
     Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB] (unmanaged)
         rsc_SAPHanaTopology_HDB    (ocf::suse:SAPHanaTopology):    Started hana002 (unmanaged)
         rsc_SAPHanaTopology_HDB    (ocf::suse:SAPHanaTopology):    Started hana001 (unmanaged)
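    Note: If any resource is still not unmanaged, set it to unmanaged manually with the following command.
    Syntax:
    crm resource maintenance [resource name] true
    The following example shows output in which the SAP HANA resource was not set to unmanaged correctly: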
    2 nodes configured
    6 resources configured

                  *** Resource management is DISABLED ***
      The cluster will not attempt to start, stop or recover services

    Online: [ hana001 hana002 ]

    Full list of resources:

    rsc_sbd (stonith:external/sbd): Started hana001 (unmanaged)
    rsc_vip (ocf::heartbeat:IPaddr2):       Started hana001 (unmanaged)
     Clone Set: msl_SAPHana_HDB [rsc_SAPHana_HDB] (promotable) (unmanaged)
         rsc_SAPHana_HDB    (ocf::suse:SAPHana):    Slave hana002
         rsc_SAPHana_HDB    (ocf::suse:SAPHana):    Master hana001
     Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB] (unmanaged)
         rsc_SAPHanaTopology_HDB    (ocf::suse:SAPHanaTopology):    Started hana002 (unmanaged)
         rsc_SAPHanaTopology_HDB    (ocf::suse:SAPHanaTopology):    Started hana001 (unmanaged)
    Run the following command to set it to unmanaged:
    crm resource maintenance rsc_SAPHana_HDB true
    Confirm again that all resources are in the unmanaged state.
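  7. Stop all resources.
    Syntax:
    crm resource stop ID1 ID2 ...
    Command used in this example:
    crm resource stop rsc_sbd rsc_vip rsc_SAPHana_HDB rsc_SAPHanaTopology_HDB
    Replace the IDs with the resource IDs in your environment.
  8. Delete all resources.
    Syntax:
    crm configure delete ID1 ID2 ...
    Command used in this example:
    crm configure delete rsc_sbd rsc_vip rsc_SAPHana_HDB rsc_SAPHanaTopology_HDB
  9. Restart the pacemaker service on both nodes.
    systemctl restart pacemaker
  10. Exit cluster maintenance mode.
    crm configure property maintenance-mode=false
  11. After the resources are deleted, confirm that the cluster has only the two nodes and 0 resources.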
    crm_mon -r

    Stack: corosync
    Current DC: hana001 (version 2.0.1+20190417.13d370ca9-3.24.1-2.0.1+20190417.13d370ca9) - partition with quorum
    Last updated: Thu Feb 24 11:57:13 2022
    Last change: Thu Feb 24 11:57:09 2022 by root via cibadmin on hana001

    2 nodes configured
    0 resources configured

    Online: [ hana001 hana002 ]

    No resources
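  12. Complete the fence agent script configuration as described in section 11.2 of SAP HANA High Availability Deployment in the Same Zone.
  13. Run the following command to verify the cluster configuration.
    crm_mon -r
    The output is similar to the following: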
    Stack: corosync
    Current DC: hana001 (version 2.0.1+20190417.13d370ca9-3.24.1-2.0.1+20190417.13d370ca9) - partition with quorum
    Last updated: Thu Feb 24 17:51:44 2022
    Last change: Thu Feb 24 17:51:41 2022 by root via crm_attribute on hana001

    2 nodes configured
    7 resources configured

    Online: [ hana001 hana002 ]

    Full list of resources:

    res_ALIYUN_STONITH_1    (stonith:fence_aliyun): Started hana002
    res_ALIYUN_STONITH_2    (stonith:fence_aliyun): Started hana001
    rsc_vip (ocf::heartbeat:IPaddr2):       Started hana001
     Clone Set: msl_SAPHana_HDB [rsc_SAPHana_HDB] (promotable)
         Masters: [ hana001 ]
         Slaves: [ hana002 ]
     Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
         Started: [ hana001 hana002 ]
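    Note: Confirm that the primary and secondary roles of the cluster nodes are as expected.
  14. Perform HA switchover tests to validate the environment. For details, see the official SUSE documentation or the SAP System High Availability Environment Maintenance Guide.
  15. Run the following command on both nodes to disable the SBD service.
    systemctl disable sbd
  16. Release the shared block storage.
    Log on to the Alibaba Cloud console, choose Storage & Snapshots > Shared Block Storage, find the shared block storage instance used in this procedure, and detach it from the ECS instances.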
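
Steps 2 and 6 above rely on reading values by eye. The following shell sketch shows one way to script those checks; the function names are hypothetical. It assumes that /etc/sysconfig/sbd has one uncommented SBD_DEVICE= line (multiple devices separated by semicolons), that the console name is the OS name with an x inserted after /dev/ as described in step 2, and that crm_mon output in maintenance mode lists each primitive on its own line with its agent in parentheses, as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2 and 6 of the migration.

# Print each device configured in SBD_DEVICE, one per line.
# Comment lines such as "# SBD_DEVICE specifies ..." are ignored.
sbd_devices() {
    sed -n 's/^SBD_DEVICE=//p' "${1:-/etc/sysconfig/sbd}" | tr -d '"' | tr ';' '\n'
}

# Map an OS device name (/dev/vdf) to the name shown in the console (/dev/xvdf).
console_device_name() {
    printf '%s\n' "$1" | sed 's#^/dev/vd#/dev/xvd#'
}

# Read crm_mon output on stdin and print the ID of every resource line
# that is not yet marked "(unmanaged)". Clone and group header lines are skipped.
managed_resources() {
    awk '/\((stonith|ocf::)/ && !/\(unmanaged\)/ { print $1 }' | sort -u
}

# Set every resource that is still managed to maintenance,
# using the crm resource maintenance syntax from step 6.
unmanage_remaining() {
    for rsc in $(crm_mon -1 -r | managed_resources); do
        crm resource maintenance "$rsc" true
    done
}
```

crm_mon -1 runs in one-shot mode so that its output can be piped. For the unmanaged example in step 6, managed_resources prints rsc_SAPHana_HDB, the same ID passed to crm resource maintenance there. Check the crm_mon output again after running unmanage_remaining.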

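Scenario 2: SAP ASCS/SCS HA environment

The following procedure applies to an SAP S/4HANA ASCS HA environment:

  1. Log on to the primary node of the cluster and run the following command to view the status of all resources.
    Note: Unless otherwise stated, each step needs to be performed on only one node of the cluster.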
    crm_mon -r
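    The output is similar to the following. In this example, the ASCS HA environment is installed on two ECS instances, SAPAPP01 and SAPAPP02, and the cluster and all managed resources are in a normal state.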
    Stack: corosync

    2 nodes configured
    5 resource instances configured

    Online: [ SAPAPP01 SAPAPP02 ]

    Full list of resources:

    stonith-sbd     (stonith:external/sbd): Started SAPAPP01
     Resource Group: grp_S4A_ASCS00
         rsc_ip_S4A_ASCS00  (ocf::heartbeat:IPaddr2):       Started SAPAPP01
         rsc_sap_S4A_ASCS00 (ocf::heartbeat:SAPInstance):   Started SAPAPP01
     Resource Group: grp_S4A_ERS10
         rsc_ip_S4A_ERS10   (ocf::heartbeat:IPaddr2):       Started SAPAPP02
         rsc_sap_S4A_ERS10  (ocf::heartbeat:SAPInstance):   Started SAPAPP02
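  2. Run the following command to find the current SBD block device name.
    grep "^SBD_DEVICE=" /etc/sysconfig/sbd
    The output is similar to the following. In this example, the SBD block device is /dev/vdc.
    SBD_DEVICE="/dev/vdc"
    Log on to the Alibaba Cloud console, choose Storage & Snapshots > Shared Block Storage, and click the ID of the shared block storage instance to view its details. Confirm that the device name under which it is attached to the ECS instances matches the device name found above.

    The device name shown in the console contains an extra x character (for example, /dev/xvdc). After you remove the x, it must match the result above.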
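  3. Complete all the configuration described in section 4.4 Option 2: Use fence_aliyun to implement fencing of SAP S/4HANA High Availability Deployment in the Same Zone.
  4. Run the following command to put the cluster into maintenance mode.
    crm configure property maintenance-mode=true
    If a maintenance attribute is already set on a resource in the cluster, a prompt similar to the following appears. Enter y to remove it.
    'maintenance' attribute already exists in rsc_sap_S4A_ERS10. Remove it (y/n)? y
  5. After the setting takes effect, check whether all resources have switched to the unmanaged state.
  6. Run the following command and confirm that every resource is marked (unmanaged).
    crm_mon -r
    The output is similar to the following: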
    2 nodes configured
    5 resource instances configured

                  *** Resource management is DISABLED ***
      The cluster will not attempt to start, stop or recover services

    Online: [ SAPAPP01 SAPAPP02 ]

    Full list of resources:

    stonith-sbd     (stonith:external/sbd): Started SAPAPP01 (unmanaged)
     Resource Group: grp_S4A_ASCS00
         rsc_ip_S4A_ASCS00  (ocf::heartbeat:IPaddr2):       Started SAPAPP01 (unmanaged)
         rsc_sap_S4A_ASCS00 (ocf::heartbeat:SAPInstance):   Started SAPAPP01 (unmanaged)
     Resource Group: grp_S4A_ERS10
         rsc_ip_S4A_ERS10   (ocf::heartbeat:IPaddr2):       Started SAPAPP02 (unmanaged)
         rsc_sap_S4A_ERS10  (ocf::heartbeat:SAPInstance):   Started SAPAPP02 (unmanaged)
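    Note: If any resource is still not unmanaged, set it to unmanaged manually with the following command.
    Syntax:
    crm resource maintenance [resource name] true
    For example, to set the rsc_ip_S4A_ASCS00 resource to unmanaged, run the following command:
    crm resource maintenance rsc_ip_S4A_ASCS00 true
    Confirm again that all resources are in the unmanaged state.
  7. Stop all resources.
    Syntax:
    crm resource stop ID1 ID2 ...
    Command used in this example:
    crm resource stop stonith-sbd rsc_ip_S4A_ERS10 rsc_sap_S4A_ERS10 rsc_ip_S4A_ASCS00 rsc_sap_S4A_ASCS00
    Replace the IDs with the resource IDs in your environment.
  8. Delete all resources.
    Syntax:
    crm configure delete ID1 ID2 ...
    Command used in this example:
    crm configure delete stonith-sbd rsc_ip_S4A_ERS10 rsc_sap_S4A_ERS10 rsc_ip_S4A_ASCS00 rsc_sap_S4A_ASCS00
  9. Restart the pacemaker service on both nodes.
    systemctl restart pacemaker
  10. Exit cluster maintenance mode.
    crm configure property maintenance-mode=false
  11. After the resources are deleted, confirm that the cluster has only the two nodes and 0 resources.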
    crm_mon -r

    2 nodes configured
    0 resource instances configured

    Online: [ SAPAPP01 SAPAPP02 ]

    No resources
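  12. Complete the fence agent script configuration as described in section 7.5.4 Option 2: Use fence_aliyun to implement fencing of SAP S/4HANA High Availability Deployment in the Same Zone.
  13. Run the following command to verify the cluster configuration.
    crm_mon -r
    The output is similar to the following: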
    Stack: corosync

    2 nodes configured
    6 resources configured

    Online: [ SAPAPP01 SAPAPP02 ]

    Full list of resources:

    res_ALIYUN_STONITH_1    (stonith:fence_aliyun): Started SAPAPP02
    res_ALIYUN_STONITH_2    (stonith:fence_aliyun): Started SAPAPP01
     Resource Group: grp_S4A_ASCS00
         rsc_ip_S4A_ASCS00  (ocf::heartbeat:IPaddr2):       Started SAPAPP01
         rsc_sap_S4A_ASCS00 (ocf::heartbeat:SAPInstance):   Started SAPAPP01
     Resource Group: grp_S4A_ERS10
         rsc_ip_S4A_ERS10   (ocf::heartbeat:IPaddr2):       Started SAPAPP02
         rsc_sap_S4A_ERS10  (ocf::heartbeat:SAPInstance):   Started SAPAPP02
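    Note: Confirm that the primary and secondary roles of the cluster nodes are as expected.
  14. Perform HA switchover tests to validate the environment. For details, see the official SUSE documentation or the SAP System High Availability Environment Maintenance Guide.
  15. Run the following command on both nodes to disable the SBD service.
    systemctl disable sbd
  16. Release the shared block storage.
    Log on to the Alibaba Cloud console, choose Storage & Snapshots > Shared Block Storage, find the shared block storage instance used in this procedure, and detach it from the ECS instances.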
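
In both scenarios, steps 11 and 13 compare crm_mon output by eye. The sketch below scripts those two comparisons on captured output; the function names are hypothetical. It assumes the output format shown in the samples above, and the default expectation of two started fence_aliyun stonith resources (one per node, as in the examples) is an assumption you may need to adjust.

```shell
#!/bin/sh
# Hypothetical checks on crm_mon output read from stdin.

# Step 11: after all resources are deleted, the cluster should report "No resources".
assert_no_resources() {
    if grep -q '^[[:space:]]*No resources'; then
        echo "OK: no resources configured"
        return 0
    fi
    echo "FAIL: resources are still configured"
    return 1
}

# Step 13: no SBD stonith resource may remain, maintenance mode must be off,
# and at least the expected number of fence_aliyun resources must be started.
verify_fence_aliyun() {
    expected="${1:-2}"
    output=$(cat)
    if printf '%s\n' "$output" | grep -q 'stonith:external/sbd'; then
        echo "FAIL: an SBD stonith resource is still configured"
        return 1
    fi
    if printf '%s\n' "$output" | grep -q 'Resource management is DISABLED'; then
        echo "FAIL: the cluster is still in maintenance mode"
        return 1
    fi
    started=$(printf '%s\n' "$output" |
        grep -c '(stonith:fence_aliyun):[[:space:]]*Started')
    if [ "$started" -lt "$expected" ]; then
        echo "FAIL: $started of $expected fence_aliyun resources started"
        return 1
    fi
    echo "OK: $started fence_aliyun resources started"
}
```

For example, crm_mon -1 -r | verify_fence_aliyun checks the live cluster after step 13. These checks do not replace the switchover tests in step 14.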

Related documents