SAP System High Availability Environment Maintenance Guide

Last updated: 2019-07-31 10:01:25


Version History

Version | Revision Date | Changes | Effective Date
1.0 | 2019/4/15 | |
1.1 | 2019/7/30 | 1. Updated the failure count description. 2. Updated the description of the start/stop order. | 2019/7/30

Overview of SAP High Availability Environment Maintenance

This document applies to scenarios where operations and maintenance tasks must be performed on SAP application or HANA ECS instances deployed in a SUSE HAE 12 cluster. It describes the preparation and post-processing steps for scenarios such as upgrading or downgrading ECS instance types, upgrading the SAP application or database, routine maintenance of the primary/secondary nodes, and unexpected node failover.

For an SAP system managed by SUSE HAE, performing maintenance tasks on a cluster node may require stopping the resources running on that node, moving them, or shutting down or restarting the node. It may also be necessary to temporarily take over control of the resources in the cluster.

The scenarios below use SAP HANA HA as an example; maintenance operations for SAP ASCS HA, SAP database HA, and similar setups are analogous.

For the SUSE HAE operations manual, see:

For the HANA HSR configuration manual, see:

Common HANA HA Maintenance Scenarios

The architecture of SUSE HAE is shown in the figure below: (figure: SUSE HAE architecture)

SUSE Pacemaker provides several options for different maintenance needs (see the command sketch after this list):

Putting the cluster into maintenance mode
With the global cluster property maintenance-mode, all resources can be put into maintenance state at once. The cluster stops monitoring them.

Putting a node into maintenance mode
Puts all resources running on the specified node into maintenance state at once. The cluster stops monitoring them.

Putting a node into standby mode
A node in standby mode can no longer run resources. All resources running on that node are moved away or stopped (if no other node is eligible to run them). In addition, all monitoring operations on that node are stopped (except for operations with role="Stopped").
Use this option if you need to stop a node in the cluster while the services it runs continue to be provided on another node.

Putting a resource into maintenance mode
When a resource is in this mode, no monitoring operations are triggered for it. Use this option if you need to manually adjust the service managed by the resource and do not want the cluster to run any monitoring operations on it in the meantime.

Putting a resource into unmanaged mode
With the is-managed attribute you can temporarily "release" a resource so that it is no longer managed by the cluster stack. This means you can manually adjust the service managed by the resource; however, the cluster continues to monitor it and to report any failures. If you also want the cluster to stop monitoring the resource, use per-resource maintenance mode instead.
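
The corresponding crm shell commands are summarized below as a minimal sketch; NODE and RESOURCE are placeholders, and most of these commands also appear in the scenarios that follow. The crm resource unmanage / manage pair is not used elsewhere in this document, so verify it against your crmsh version.

  Put the whole cluster into or out of maintenance mode:
  # crm configure property maintenance-mode=true
  # crm configure property maintenance-mode=false

  Put a node into maintenance mode and back to normal:
  # crm node maintenance NODE
  # crm node ready NODE

  Put a node into or out of standby mode:
  # crm node standby NODE
  # crm node online NODE

  Put a resource into or out of maintenance mode:
  # crm resource maintenance RESOURCE true
  # crm resource maintenance RESOURCE false

  Release a resource from cluster management (is-managed) and take it back:
  # crm resource unmanage RESOURCE
  # crm resource manage RESOURCE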

1. Handling a Primary Node Failure

When the primary node fails, HAE triggers a failover: the original secondary node (Node B) is promoted to primary, but the original primary node (Node A) still holds the primary role. Therefore, after Node A is repaired and before starting the Pacemaker service on it, you must manually reconfigure HANA HSR and register Node A as the secondary.

In this example, the primary node is saphana-01 and the secondary node is saphana-02.

1.1 Normal status of SUSE HAE

Log on to either node and run the crm status command to check the normal status of HAE.

  # crm status
  Stack: corosync
  Current DC: saphana-01 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 14:33:22 2019
  Last change: Mon Apr 15 14:33:19 2019 by root via crm_attribute on saphana-01
  2 nodes configured
  6 resources configured
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-01
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-01
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-01 ]
  Slaves: [ saphana-02 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-01 saphana-02 ]

1.2 After the primary node fails, HAE automatically promotes the secondary node to primary

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 14:40:43 2019
  Last change: Mon Apr 15 14:40:41 2019 by root via crm_attribute on saphana-02
  2 nodes configured
  6 resources configured
  Online: [ saphana-02 ]
  OFFLINE: [ saphana-01 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-02 ]
  Stopped: [ saphana-01 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-02 ]
  Stopped: [ saphana-01 ]

1.3 After the original primary node is repaired, re-register HSR

Before reconfiguring HSR, always confirm which node is the primary and which is the secondary; a wrong configuration may cause data to be overwritten or even lost.
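
One way to confirm the current roles is to check the replication state on each node as the HANA instance user. This is a minimal sketch; hdbnsutil -sr_state is standard HANA tooling but is not shown elsewhere in this document. Its output (mode, site name) indicates which site currently acts as primary.

  h01adm@saphana-01:/usr/sap/H01/HDB00> hdbnsutil -sr_state
  h01adm@saphana-02:/usr/sap/H01/HDB00> hdbnsutil -sr_state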

Log on to the original primary node as the HANA instance user and configure HSR.

  h01adm@saphana-01:/usr/sap/H01/HDB00> hdbnsutil -sr_register --remoteHost=saphana-02 --remoteInstance=00 --replicationMode=syncmem --name=saphana-01 --operationMode=logreplay
  adding site ...
  checking for inactive nameserver ...
  nameserver saphana-01:30001 not responding.
  collecting information ...
  updating local ini files ...
  done.

1.4 Check the SBD status

If the state of a node slot is not "clear", set it to "clear".

  # sbd -d /dev/vdc list
  0 saphana-01 reset saphana-02
  1 saphana-02 reset saphana-01

  # sbd -d /dev/vdc message saphana-01 clear
  # sbd -d /dev/vdc message saphana-02 clear
  # sbd -d /dev/vdc list
  0 saphana-01 clear saphana-01
  1 saphana-02 clear saphana-01

1.5 Start the Pacemaker service; HAE automatically brings up the HANA service

  # systemctl start pacemaker

At this point, the original secondary node has become the new primary node. The current HAE status is as follows:

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 15:10:58 2019
  Last change: Mon Apr 15 15:09:56 2019 by root via crm_attribute on saphana-02
  2 nodes configured
  6 resources configured
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-02 ]
  Slaves: [ saphana-01 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-01 saphana-02 ]

1.6 Check the HANA HSR status

1.6.1 Check with the Python script shipped with HANA

Log on to the primary node as the HANA instance user and make sure that the "Replication Status" of all HANA processes is "ACTIVE".

  saphana-02:~ # su - h01adm
  h01adm@saphana-02:/usr/sap/H01/HDB00> cdpy
  h01adm@saphana-02:/usr/sap/H01/HDB00/exe/python_support> python systemReplicationStatus.py
  | Database | Host | Port | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary | Replication | Replication | Replication |
  | | | | | | | | Host | Port | Site ID | Site Name | Active Status | Mode | Status | Status Details |
  | -------- | ---------- | ----- | ------------ | --------- | ------- | ---------- | ---------- | --------- | --------- | ---------- | ------------- | ----------- | ----------- | -------------- |
  | SYSTEMDB | saphana-02 | 30001 | nameserver | 1 | 2 | saphana-02 | saphana-01 | 30001 | 1 | saphana-01 | YES | SYNCMEM | ACTIVE | |
  | H01 | saphana-02 | 30007 | xsengine | 3 | 2 | saphana-02 | saphana-01 | 30007 | 1 | saphana-01 | YES | SYNCMEM | ACTIVE | |
  | H01 | saphana-02 | 30003 | indexserver | 2 | 2 | saphana-02 | saphana-01 | 30003 | 1 | saphana-01 | YES | SYNCMEM | ACTIVE | |
  status system replication site "1": ACTIVE
  overall system replication status: ACTIVE
  Local System Replication State
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  mode: PRIMARY
  site id: 2
  site name: saphana-02
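
The script's exit code can also be used as a quick check. On common HANA revisions an exit code of 15 indicates that the overall replication status is ACTIVE; this is an assumption about the HANA revision in use and should be verified for your system.

  h01adm@saphana-02:/usr/sap/H01/HDB00/exe/python_support> python systemReplicationStatus.py
  h01adm@saphana-02:/usr/sap/H01/HDB00/exe/python_support> echo $?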

1.6.2 Use the SAPHanaSR tool provided by SUSE to check the replication status and make sure that the sync_state of the secondary node is SOK

  saphana-02:~ # SAPHanaSR-showAttr
  Global cib-time
  --------------------------------
  global Mon Apr 15 15:17:12 2019
  Hosts clone_state lpa_h01_lpt node_state op_mode remoteHost roles site srmode standby sync_state version vhost
  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  saphana-01 DEMOTED 30 online logreplay saphana-02 4:S:master1:master:worker:master saphana-01 syncmem SOK 2.00.020.00.1500920972 saphana-01
  saphana-02 PROMOTED 1555312632 online logreplay saphana-01 4:P:master1:master:worker:master saphana-02 syncmem off PRIM 2.00.020.00.1500920972 saphana-02

1.7 Reset the failure count (optional)

If a resource fails, it is restarted automatically, but each failure increases the resource's failure count. If migration-threshold is set for the resource, the node is no longer allowed to run the resource once the failure count reaches that threshold, so the failure count has to be cleared manually.

The command to clean up the failure count is as follows:

  # crm resource cleanup [resource name] [node]

For example, after the rsc_SAPHana_HDB resource on node saphana-01 has been repaired, clean up the monitoring alert with the following command:

  # crm resource cleanup rsc_SAPHana_HDB saphana-01
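
If you want to inspect the current failure count before cleaning it up, crmsh also provides a failcount sub-command. This is a sketch only; it is not used elsewhere in this document, so verify the syntax against your crmsh version.

  # crm resource failcount rsc_SAPHana_HDB show saphana-01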

2. Handling a Secondary Node Failure

When the secondary node fails, the primary node is not affected and HAE does not trigger a failover. After the secondary node is repaired, starting the Pacemaker service automatically brings up the HANA service; the primary/secondary roles do not change and no manual intervention is required.

In this example, the primary node is saphana-02 and the secondary node is saphana-01.

2.1 Normal status of SUSE HAE

Log on to either node and run the crm status command to check the normal status of HAE.

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 15:34:52 2019
  Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02
  2 nodes configured
  6 resources configured
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-02 ]
  Slaves: [ saphana-01 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-01 saphana-02 ]

2.2 After the secondary node is recovered, check SBD first and then start Pacemaker
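
The SBD check is the same as in section 1.4. A minimal sketch is shown below; the device path /dev/vdc follows the earlier example and may differ in your environment.

  # sbd -d /dev/vdc list
  If any node slot is not "clear", clear it before starting Pacemaker:
  # sbd -d /dev/vdc message saphana-01 clear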

  # systemctl start pacemaker

HSR keeps the original primary/secondary relationship. The current HAE status is as follows:

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 15:43:28 2019
  Last change: Mon Apr 15 15:43:25 2019 by root via crm_attribute on saphana-01
  2 nodes configured
  6 resources configured
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-02 ]
  Slaves: [ saphana-01 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-01 saphana-02 ]

2.3 Check the HANA HSR status

2.4 Reset the failure count (optional)

3. Offline Maintenance of Both the Primary and Secondary Nodes

Set the cluster to maintenance mode, then shut down the secondary node and the primary node in turn.

In this example, the primary node is saphana-02 and the secondary node is saphana-01.

3.1 Normal status of SUSE HAE

Log on to either node and run the crm status command to check the normal status of HAE.

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 15:34:52 2019
  Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02
  2 nodes configured
  6 resources configured
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-02 ]
  Slaves: [ saphana-01 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-01 saphana-02 ]

3.2 Put the cluster and the master/slave resource sets into maintenance mode

Log on to the primary node and set the cluster to maintenance mode.

  # crm configure property maintenance-mode=true

Put the master/slave resource sets into maintenance mode. In this example, the master/slave resource sets are rsc_SAPHana_HDB and rsc_SAPHanaTopology_HDB.

  # crm resource maintenance rsc_SAPHana_HDB true
  Performing update of 'maintenance' on 'msl_SAPHana_HDB', the parent of 'rsc_SAPHana_HDB'
  Set 'msl_SAPHana_HDB' option: id=msl_SAPHana_HDB-meta_attributes-maintenance name=maintenance=true
  # crm resource maintenance rsc_SAPHanaTopology_HDB true
  Performing update of 'maintenance' on 'cln_SAPHanaTopology_HDB', the parent of 'rsc_SAPHanaTopology_HDB'
  Set 'cln_SAPHanaTopology_HDB' option: id=cln_SAPHanaTopology_HDB-meta_attributes-maintenance name=maintenance=true

3.3 The current HAE status is as follows

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 16:02:13 2019
  Last change: Mon Apr 15 16:02:11 2019 by root via crm_resource on saphana-02
  2 nodes configured
  6 resources configured
  *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02 (unmanaged)
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02 (unmanaged)
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB] (unmanaged)
  rsc_SAPHana_HDB (ocf::suse:SAPHana): Slave saphana-01 (unmanaged)
  rsc_SAPHana_HDB (ocf::suse:SAPHana): Master saphana-02 (unmanaged)
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB] (unmanaged)
  rsc_SAPHanaTopology_HDB (ocf::suse:SAPHanaTopology): Started saphana-01 (unmanaged)
  rsc_SAPHanaTopology_HDB (ocf::suse:SAPHanaTopology): Started saphana-02 (unmanaged)

3.4 Stop the HANA services on the secondary and primary nodes, then shut down the ECS instances for maintenance

Log on to both nodes as the HANA instance user. Stop the HANA service on the secondary node first, then stop the HANA service on the primary node.

  saphana-01:~ # su - h01adm
  h01adm@saphana-01:/usr/sap/H01/HDB00> HDB stop
  hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
  Stopping instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400
  15.04.2019 16:46:42
  Stop
  OK
  Waiting for stopped instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2
  15.04.2019 16:46:54
  WaitforStopped
  OK
  hdbdaemon is stopped.

  saphana-02:~ # su - h01adm
  h01adm@saphana-02:/usr/sap/H01/HDB00> HDB stop
  hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
  Stopping instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400
  15.04.2019 16:47:05
  Stop
  OK
  Waiting for stopped instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2
  15.04.2019 16:47:35
  WaitforStopped
  OK
  hdbdaemon is stopped.

3.5 Start the primary and secondary HANA ECS instances, and restore the cluster and resource sets to normal mode

Log on to the primary node and then the secondary node, and start the Pacemaker service.

  # systemctl start pacemaker

Restore the cluster and the resource sets to normal mode.

  # crm configure property maintenance-mode=false
  # crm resource maintenance rsc_SAPHana_HDB false
  Performing update of 'maintenance' on 'msl_SAPHana_HDB', the parent of 'rsc_SAPHana_HDB'
  Set 'msl_SAPHana_HDB' option: id=msl_SAPHana_HDB-meta_attributes-maintenance name=maintenance=false
  # crm resource maintenance rsc_SAPHanaTopology_HDB false
  Performing update of 'maintenance' on 'cln_SAPHanaTopology_HDB', the parent of 'rsc_SAPHanaTopology_HDB'
  Set 'cln_SAPHanaTopology_HDB' option: id=cln_SAPHanaTopology_HDB-meta_attributes-maintenance name=maintenance=false

The SUSE HAE cluster automatically brings up the HANA services on the primary and secondary nodes, and the original primary/secondary roles remain unchanged.

3.6 The current HAE status is as follows

  # crm status
  Stack: corosync
  Current DC: saphana-01 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 16:56:49 2019
  Last change: Mon Apr 15 16:56:43 2019 by root via crm_attribute on saphana-01
  2 nodes configured
  6 resources configured
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-01
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-02 ]
  Slaves: [ saphana-01 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-01 saphana-02 ]

3.7 Check the HANA HSR status

3.8 Reset the failure count (optional)

4. Offline Maintenance of the Primary Node

Put the secondary node into maintenance mode and standby mode, then shut down the primary node.

In this example, the primary node is saphana-02 and the secondary node is saphana-01.

4.1 Normal status of SUSE HAE

Log on to either node and run the crm status command to check the normal status of HAE.

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 15:34:52 2019
  Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02
  2 nodes configured
  6 resources configured
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-02 ]
  Slaves: [ saphana-01 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-01 saphana-02 ]

4.2 Put the secondary node into maintenance mode and standby mode

In this example the secondary node is saphana-01. First, put it into maintenance mode.

  # crm node maintenance saphana-01

Then put the secondary node into standby mode.

  # crm node standby saphana-01

4.3 The current HAE status is as follows

  # crm status
  Stack: corosync
  Current DC: saphana-01 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 17:07:56 2019
  Last change: Mon Apr 15 17:07:38 2019 by root via crm_attribute on saphana-02
  2 nodes configured
  6 resources configured
  Node saphana-01: standby
  Online: [ saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-01 (unmanaged)
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  rsc_SAPHana_HDB (ocf::suse:SAPHana): Slave saphana-01 (unmanaged)
  Masters: [ saphana-02 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  rsc_SAPHanaTopology_HDB (ocf::suse:SAPHanaTopology): Started saphana-01 (unmanaged)
  Started: [ saphana-02 ]

4.4 Stop the HANA service on the primary node, then shut down the ECS instance for maintenance

Log on to the primary node as the HANA instance user and stop the HANA service.

  saphana-02:~ # su - h01adm
  h01adm@saphana-02:/usr/sap/H01/HDB00> HDB stop
  hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
  Stopping instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400
  15.04.2019 16:47:05
  Stop
  OK
  Waiting for stopped instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2
  15.04.2019 16:47:35
  WaitforStopped
  OK
  hdbdaemon is stopped.

4.5 Start the primary HANA ECS instance and restore the nodes to normal mode

Log on to the primary node and start the Pacemaker service.

  # systemctl start pacemaker

In some cases the rsc_sbd resource is not running on the primary node and needs to be migrated to the primary node manually.

  The current primary node is saphana-02, so rsc_sbd needs to be migrated over manually:
  rsc_sbd (stonith:external/sbd): Started saphana-01
  # crm resource migrate rsc_sbd saphana-02
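
Note that crm resource migrate creates a location constraint for rsc_sbd. Once the resource is running on the intended node, you may want to remove that constraint again; this step is not part of the original procedure and is shown here as a sketch.

  # crm resource unmigrate rsc_sbd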

The primary node role is the same as before the maintenance:

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 17:57:56 2019
  Last change: Mon Apr 15 17:57:22 2019 by root via crm_attribute on saphana-02
  2 nodes configured
  6 resources configured
  Node saphana-01: standby
  Online: [ saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  rsc_SAPHana_HDB (ocf::suse:SAPHana): Slave saphana-01 (unmanaged)
  Masters: [ saphana-02 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  rsc_SAPHanaTopology_HDB (ocf::suse:SAPHanaTopology): Started saphana-01 (unmanaged)
  Started: [ saphana-02 ]

Restore the secondary node to normal mode.

  saphana-02:~ # crm node ready saphana-01
  saphana-02:~ # crm node online saphana-01

The SUSE HAE cluster automatically brings up the HANA service on the secondary node, and the original primary/secondary roles remain unchanged.

4.6 The current HAE status is as follows

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 18:02:33 2019
  Last change: Mon Apr 15 18:01:31 2019 by root via crm_attribute on saphana-02
  2 nodes configured
  6 resources configured
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-02 ]
  Slaves: [ saphana-01 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-01 saphana-02 ]

4.7 Check the HANA HSR status

4.8 Reset the failure count (optional)

5. Offline Maintenance of the Secondary Node

Put the secondary node into maintenance mode.

In this example, the primary node is saphana-02 and the secondary node is saphana-01.

5.1 Normal status of SUSE HAE

Log on to either node and run the crm status command to check the normal status of HAE.

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 15:34:52 2019
  Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02
  2 nodes configured
  6 resources configured
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-02 ]
  Slaves: [ saphana-01 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-01 saphana-02 ]

5.2 Put the secondary node into maintenance mode

  # crm node maintenance saphana-01

After the setting takes effect, the HAE status is as follows:

  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 18:18:10 2019
  Last change: Mon Apr 15 18:17:49 2019 by root via crm_attribute on saphana-01
  2 nodes configured
  6 resources configured
  Node saphana-01: maintenance
  Online: [ saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  rsc_SAPHana_HDB (ocf::suse:SAPHana): Slave saphana-01 (unmanaged)
  Masters: [ saphana-02 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  rsc_SAPHanaTopology_HDB (ocf::suse:SAPHanaTopology): Started saphana-01 (unmanaged)
  Started: [ saphana-02 ]

5.3 Stop the HANA service on the secondary node, then shut down the ECS instance for maintenance

Log on to the secondary node as the HANA instance user and stop the HANA service.

  saphana-01:~ # su - h01adm
  h01adm@saphana-01:/usr/sap/H01/HDB00> HDB stop
  hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
  Stopping instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400
  15.04.2019 16:47:05
  Stop
  OK
  Waiting for stopped instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2
  15.04.2019 16:47:35
  WaitforStopped
  OK
  hdbdaemon is stopped.

5.4 Start the secondary HANA ECS instance and restore the node to normal mode

Log on to the secondary node and start the Pacemaker service.

  # systemctl start pacemaker

Restore the secondary node to normal mode.

  saphana-02:~ # crm node ready saphana-01

The SUSE HAE cluster automatically brings up the HANA service on the secondary node, and the original primary/secondary roles remain unchanged.

5.5 The current HAE status is as follows

  # crm status
  Stack: corosync
  Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
  Last updated: Mon Apr 15 18:02:33 2019
  Last change: Mon Apr 15 18:01:31 2019 by root via crm_attribute on saphana-02
  2 nodes configured
  6 resources configured
  Online: [ saphana-01 saphana-02 ]
  Full list of resources:
  rsc_sbd (stonith:external/sbd): Started saphana-02
  rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
  Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
  Masters: [ saphana-02 ]
  Slaves: [ saphana-01 ]
  Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
  Started: [ saphana-01 saphana-02 ]

5.6 Check the HANA HSR status

5.7 Reset the failure count (optional)