SAP System High Availability Environment Maintenance Guide

Important

This document contains important notices. Ignoring them may affect your business, so read them carefully.

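Version history

| Version | Revision date | Change type                                                                         | Effective date |
| ------- | ------------- | ----------------------------------------------------------------------------------- | -------------- |
| 1.0     | 2019/4/15     |                                                                                     |                |
| 1.1     | 2019/7/30     | 1. Updated the fail count description. 2. Updated the start/stop order description. | 2019/7/30      |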

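Overview of SAP high availability environment maintenance

This document applies to SAP application systems and SAP HANA ECS instances that are deployed on a SUSE HAE 12 cluster and need operations and maintenance work. It describes the preparation and follow-up steps for scenarios such as scaling an ECS instance type up or down, upgrading the SAP application or database, routine maintenance of the primary or secondary node, and recovery after a node failure triggers a failover.

For an SAP system managed by SUSE HAE, performing maintenance on a cluster node may require stopping the resources running on that node, moving them, or shutting down or rebooting the node. You may also need to temporarily take over control of resources from the cluster.

The scenarios below use SAP HANA high availability as the example. Maintenance of SAP application high availability is similar.

Important

This document does not replace the standard SUSE and SAP installation and administration documentation. For more guidance on maintaining high availability environments, refer to the official SUSE and SAP documentation.

For the SUSE HAE operations manual, see:

For the SAP HANA HSR configuration manual, see: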

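Common SAP HANA high availability maintenance scenarios

SUSE Pacemaker provides several options for different maintenance needs:

Setting the cluster to maintenance mode

The global cluster property maintenance-mode puts all resources into maintenance state at once. The cluster stops monitoring these resources.

Setting a node to maintenance mode

Puts all resources running on the specified node into maintenance state at once. The cluster stops monitoring these resources.

Setting a node to standby mode

A node in standby mode can no longer run resources. All resources running on it are moved away, or stopped if no other node is available to run them. In addition, all monitoring operations on the node stop, except those set with role="Stopped".

Use this option when you need to stop one node in the cluster while the services keep running on another node.

Setting a resource to maintenance mode

Once a resource is in this mode, no monitoring operations are triggered for it. Use this option when you need to manually adjust the service managed by the resource and do not want the cluster to run any monitoring operations on it in the meantime.

Setting a resource to unmanaged mode

The is-managed attribute lets you temporarily "release" a resource from management by the cluster stack, so that you can manually adjust the service it manages. The cluster keeps monitoring the resource, however, and continues to report errors. If you also want the cluster to stop monitoring the resource, use per-resource maintenance mode instead.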
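As a quick reference, the five options above can be mapped to crmsh commands. The following is a minimal sketch, not an official tool: maintenance_cmd and its scope names are hypothetical, and the function only prints the command instead of running it. The crm resource unmanage subcommand does not appear elsewhere in this document; it is assumed here to be the crmsh way of setting is-managed=false, so verify it against your crmsh version.

```shell
# Hypothetical helper: print (do not run) the crmsh command that enters
# the given maintenance scope. Scope names are made up for this sketch.
# Usage: maintenance_cmd <scope> [node-or-resource-name]
maintenance_cmd() {
    scope=$1
    name=$2
    case "$scope" in
        cluster)              echo "crm configure property maintenance-mode=true" ;;
        node-maintenance)     echo "crm node maintenance $name" ;;
        node-standby)         echo "crm node standby $name" ;;
        resource-maintenance) echo "crm resource maintenance $name true" ;;
        # Assumption: "crm resource unmanage" sets is-managed=false.
        resource-unmanaged)   echo "crm resource unmanage $name" ;;
        *) echo "unknown scope: $scope" >&2; return 1 ;;
    esac
}

# Example: maintenance_cmd node-standby saphana-02
```

Printing the command first makes it easy to review the scope and target before anything is changed on a production cluster.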

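1. Handling a primary node failure

Important

When the primary node fails, HAE triggers a failover and promotes the original secondary node (Node B) to primary. The original primary node (Node A), however, still holds the primary role in its own HSR configuration. Therefore, after the fault on Node A is fixed and before you start the Pacemaker service on it, you must manually reconfigure HANA HSR and register Node A as the secondary.

Note

In this example, the initial primary node is saphana-01 and the secondary node is saphana-02.

1.1 Check the normal state of SUSE HAE

Log on to either node and run crm status to check the normal state of HAE.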

# crm status
Stack: corosync
Current DC: saphana-01 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 14:33:22 2019
Last change: Mon Apr 15 14:33:19 2019 by root via crm_attribute on saphana-01

2 nodes configured
6 resources configured

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-01
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-01
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-01 ]
     Slaves: [ saphana-02 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 saphana-02 ]

After the primary node fails, HAE automatically promotes the secondary node to primary.

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 14:40:43 2019
Last change: Mon Apr 15 14:40:41 2019 by root via crm_attribute on saphana-02

2 nodes configured
6 resources configured

Online: [ saphana-02 ]
OFFLINE: [ saphana-01 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-02 ]
     Stopped: [ saphana-01 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-02 ]
     Stopped: [ saphana-01 ]
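Before HSR is re-registered, it helps to confirm which node the cluster currently reports as master. The sketch below assumes the crm status layout shown above, where the master appears on a "Masters: [ ... ]" line; current_master is a hypothetical helper name, not part of crmsh.

```shell
# Hypothetical helper: print the node name(s) from the "Masters: [ ... ]"
# line of `crm status` output read on stdin.
current_master() {
    # Fields on that line are: "Masters:", "[", <node>..., "]"
    awk '$1 == "Masters:" { for (i = 3; i < NF; i++) print $i }'
}

# Example: crm status | current_master
```

With the failover output above, this would print the node that must be used as the remote host when the failed node is registered as secondary.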

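1.2 Re-register HSR to repair the original primary node

Warning

Before reconfiguring HSR, always confirm which node is primary and which is secondary. A wrong configuration can cause data to be overwritten or even lost.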

Log on to the original primary node as the SAP HANA instance user and configure HSR.

h01adm@saphana-01:/usr/sap/H01/HDB00> hdbnsutil -sr_register --remoteHost=saphana-02 --remoteInstance=00 --replicationMode=syncmem --name=saphana-01 --operationMode=logreplay
adding site ...
checking for inactive nameserver ...
nameserver saphana-01:30001 not responding.
collecting information ...
updating local ini files ...
done.

1.3 Check SBD status

If the status of a node slot is not "clear", set it to "clear".

# sbd -d /dev/vdc list
0       saphana-01      reset   saphana-02
1       saphana-02      reset   saphana-01
# sbd -d /dev/vdc message saphana-01 clear
# sbd -d /dev/vdc message saphana-02 clear

# sbd -d /dev/vdc list
0       saphana-01      clear   saphana-01
1       saphana-02      clear   saphana-01
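The slot check can be scripted. The following sketch assumes the sbd list layout shown above (slot number, node name, message, sender); sbd_nodes_not_clear is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of nodes whose SBD slot message is
# not "clear". Reads `sbd -d <device> list` output on stdin.
sbd_nodes_not_clear() {
    # Column 2 is the node name, column 3 is the slot message.
    awk 'NF >= 3 && $3 != "clear" { print $2 }'
}

# Example: sbd -d /dev/vdc list | sbd_nodes_not_clear
```

An empty result means every slot is already clear and no sbd message command is needed.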

1.4 Start the pacemaker service

Run the following command to start the pacemaker service. Once pacemaker is running, HAE automatically brings up the SAP HANA service.

# systemctl start pacemaker

The original secondary node is now the new primary. The current HAE state is as follows:

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:10:58 2019
Last change: Mon Apr 15 15:09:56 2019 by root via crm_attribute on saphana-02

2 nodes configured
6 resources configured

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-02 ]
     Slaves: [ saphana-01 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 saphana-02 ]

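1.5 Check SAP HANA HSR status

  • Check with the Python script shipped with SAP HANA

    Log on to the primary node as the SAP HANA instance user and make sure the Replication Status of every SAP HANA process is ACTIVE.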

    saphana-02:~ # su - h01adm
    h01adm@saphana-02:/usr/sap/H01/HDB00> cdpy
    h01adm@saphana-02:/usr/sap/H01/HDB00/exe/python_support> python systemReplicationStatus.py 
    | Database | Host       | Port  | Service Name | Volume ID | Site ID | Site Name  | Secondary  | Secondary | Secondary | Secondary  | Secondary     | Replication | Replication | Replication    | 
    |          |            |       |              |           |         |            | Host       | Port      | Site ID   | Site Name  | Active Status | Mode        | Status      | Status Details | 
    | -------- | ---------- | ----- | ------------ | --------- | ------- | ---------- | ---------- | --------- | --------- | ---------- | ------------- | ----------- | ----------- | -------------- | 
    | SYSTEMDB | saphana-02 | 30001 | nameserver   |         1 |       2 | saphana-02 | saphana-01 |     30001 |         1 | saphana-01 | YES           | SYNCMEM     | ACTIVE      |                | 
    | H01      | saphana-02 | 30007 | xsengine     |         3 |       2 | saphana-02 | saphana-01 |     30007 |         1 | saphana-01 | YES           | SYNCMEM     | ACTIVE      |                | 
    | H01      | saphana-02 | 30003 | indexserver  |         2 |       2 | saphana-02 | saphana-01 |     30003 |         1 | saphana-01 | YES           | SYNCMEM     | ACTIVE      |                |
    
    status system replication site "1": ACTIVE
    overall system replication status: ACTIVE
    
    Local System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    mode: PRIMARY
    site id: 2
    site name: saphana-02
  • Check the replication state with the SAPHanaSR tool provided by SUSE, and make sure the sync_state of the secondary node is SOK.

    saphana-02:~ # SAPHanaSR-showAttr
    Global cib-time                 
    --------------------------------
    global Mon Apr 15 15:17:12 2019 
    
    
    Hosts      clone_state lpa_h01_lpt node_state op_mode   remoteHost roles                            site       srmode  standby sync_state version                vhost      
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    saphana-01 DEMOTED     30          online     logreplay saphana-02 4:S:master1:master:worker:master saphana-01 syncmem         SOK        2.00.020.00.1500920972 saphana-01 
    saphana-02 PROMOTED    1555312632  online     logreplay saphana-01 4:P:master1:master:worker:master saphana-02 syncmem off     PRIM       2.00.020.00.1500920972 saphana-02
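Both checks above can be reduced to a pass/fail result. This is a minimal sketch that relies only on the output text shown in this section; hsr_overall_active and secondary_in_sync are hypothetical helper names.

```shell
# Hypothetical helper: succeed if systemReplicationStatus.py output (stdin)
# reports an overall replication status of ACTIVE.
hsr_overall_active() {
    grep -q '^[[:space:]]*overall system replication status: ACTIVE'
}

# Hypothetical helper: succeed if the SAPHanaSR-showAttr row (stdin) for
# host $1 contains the token SOK. The row is scanned by token because the
# standby column can be empty, which shifts whitespace-separated fields.
secondary_in_sync() {
    awk -v host="$1" '
        $1 == host { for (i = 2; i <= NF; i++) if ($i == "SOK") ok = 1 }
        END { exit (ok ? 0 : 1) }
    '
}

# Example (as the instance user on the primary node, in python_support):
#   python systemReplicationStatus.py | hsr_overall_active && echo "HSR active"
# Example (as root):
#   SAPHanaSR-showAttr | secondary_in_sync saphana-01 && echo "secondary in sync"
```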

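1.6 (Optional) Reset the fail count

When a resource fails, it is restarted automatically, but each failure increments the fail count of the resource. If a migration-threshold is set for the resource, the node is no longer allowed to run the resource once the fail count reaches the threshold, so the fail count must be cleaned up manually.

The command for cleaning up the fail count is as follows:

# crm resource cleanup [resource name] [node]

For example, once the rsc_SAPHana_HDB resource on node saphana-01 has been repaired, clean up its failure record with the following command:

# crm resource cleanup rsc_SAPHana_HDB saphana-01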

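2. Handling a secondary node failure

Important

When the secondary node fails, the primary node is not affected and no failover is triggered. After the secondary node recovers, starting the pacemaker service automatically brings up the SAP HANA service. The primary and secondary roles do not change, and no manual intervention is required.

Note

In this example, the initial primary node is saphana-02 and the secondary node is saphana-01.

2.1 Check the normal state of HAE

Log on to either node and run crm status to check the normal state of HAE.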

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:34:52 2019
Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02

2 nodes configured
6 resources configured

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-02 ]
     Slaves: [ saphana-01 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 saphana-02 ]

2.2 Start pacemaker

After the secondary node recovers, check SBD first (see 1.3 Check SBD status), then start pacemaker.

# systemctl start pacemaker

HSR keeps the original primary/secondary relationship. The current HAE state is as follows:

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:43:28 2019
Last change: Mon Apr 15 15:43:25 2019 by root via crm_attribute on saphana-01

2 nodes configured
6 resources configured

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-02 ]
     Slaves: [ saphana-01 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 saphana-02 ]

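2.3 Check SAP HANA HSR status

For details, see 1.5 Check SAP HANA HSR status.

2.4 Reset the fail count (optional)

For details, see 1.6 (Optional) Reset the fail count.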

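3. Shutting down both the primary and secondary nodes for maintenance

Important

Set the cluster to maintenance mode, then shut down the secondary node and the primary node, in that order.

Note

In this example, the initial primary node is saphana-02 and the secondary node is saphana-01.

3.1 Check the normal state of HAE

Log on to either node and run crm status to check the normal state of HAE.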

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:34:52 2019
Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02

2 nodes configured
6 resources configured

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-02 ]
     Slaves: [ saphana-01 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 saphana-02 ]

3.2 Set the cluster and the master/slave resource sets to maintenance mode

Log on to the primary node and set the cluster to maintenance mode.

# crm configure property maintenance-mode=true

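Set the resource sets to maintenance mode. In this example, the resources are rsc_SAPHana_HDB (master/slave set msl_SAPHana_HDB) and rsc_SAPHanaTopology_HDB (clone set cln_SAPHanaTopology_HDB).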

# crm resource maintenance rsc_SAPHana_HDB true
Performing update of 'maintenance' on 'msl_SAPHana_HDB', the parent of 'rsc_SAPHana_HDB'
Set 'msl_SAPHana_HDB' option: id=msl_SAPHana_HDB-meta_attributes-maintenance name=maintenance=true

# crm resource maintenance rsc_SAPHanaTopology_HDB true
Performing update of 'maintenance' on 'cln_SAPHanaTopology_HDB', the parent of 'rsc_SAPHanaTopology_HDB'
Set 'cln_SAPHanaTopology_HDB' option: id=cln_SAPHanaTopology_HDB-meta_attributes-maintenance name=maintenance=true

The current HAE state is as follows:

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 16:02:13 2019
Last change: Mon Apr 15 16:02:11 2019 by root via crm_resource on saphana-02

2 nodes configured
6 resources configured

              *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-02 (unmanaged)
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02 (unmanaged)
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB] (unmanaged)
     rsc_SAPHana_HDB    (ocf::suse:SAPHana):    Slave saphana-01 (unmanaged)
     rsc_SAPHana_HDB    (ocf::suse:SAPHana):    Master saphana-02 (unmanaged)
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB] (unmanaged)
     rsc_SAPHanaTopology_HDB    (ocf::suse:SAPHanaTopology):    Started saphana-01 (unmanaged)
     rsc_SAPHanaTopology_HDB    (ocf::suse:SAPHanaTopology):    Started saphana-02 (unmanaged)
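Stopping SAP HANA by hand while the cluster still manages it could be treated as a failure, so it is worth confirming that maintenance mode is really in effect first. The sketch below keys on the "Resource management is DISABLED" banner shown in the output above; cluster_in_maintenance is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if `crm status` output (stdin) carries the
# banner that Pacemaker prints while cluster-wide maintenance mode is on.
cluster_in_maintenance() {
    grep -q 'Resource management is DISABLED'
}

# Example:
#   crm status | cluster_in_maintenance || echo "cluster is NOT in maintenance mode" >&2
```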

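3.3 Stop the SAP HANA service on the secondary node and then on the primary node, and shut down the ECS instances

Log on to both nodes as the SAP HANA instance user. Stop the SAP HANA service on the secondary node first, then on the primary node.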

saphana-01:~ # su - h01adm
h01adm@saphana-01:/usr/sap/H01/HDB00> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400

15.04.2019 16:46:42
Stop
OK
Waiting for stopped instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2


15.04.2019 16:46:54
WaitforStopped
OK
hdbdaemon is stopped.

saphana-02:~ # su - h01adm
h01adm@saphana-02:/usr/sap/H01/HDB00> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400

15.04.2019 16:47:05
Stop
OK
Waiting for stopped instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2


15.04.2019 16:47:35
WaitforStopped
OK
hdbdaemon is stopped.

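3.4 Start the primary and secondary SAP HANA ECS instances, and return the cluster and resource sets to normal mode

Log on to the primary node and then the secondary node, and run the following command on each to start the pacemaker service.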

# systemctl start pacemaker

Return the cluster and resource sets to normal mode.

# crm configure property maintenance-mode=false
# crm resource maintenance rsc_SAPHana_HDB false
Performing update of 'maintenance' on 'msl_SAPHana_HDB', the parent of 'rsc_SAPHana_HDB'
Set 'msl_SAPHana_HDB' option: id=msl_SAPHana_HDB-meta_attributes-maintenance name=maintenance=false
# crm resource maintenance rsc_SAPHanaTopology_HDB false
Performing update of 'maintenance' on 'cln_SAPHanaTopology_HDB', the parent of 'rsc_SAPHanaTopology_HDB'
Set 'cln_SAPHanaTopology_HDB' option: id=cln_SAPHanaTopology_HDB-meta_attributes-maintenance name=maintenance=false

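The SUSE HAE cluster automatically brings up the SAP HANA service on both nodes and keeps the original primary and secondary roles.

The current HAE state is as follows: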

# crm status
Stack: corosync
Current DC: saphana-01 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 16:56:49 2019
Last change: Mon Apr 15 16:56:43 2019 by root via crm_attribute on saphana-01

2 nodes configured
6 resources configured

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-01
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-02 ]
     Slaves: [ saphana-01 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 saphana-02 ]

3.5 Check SAP HANA HSR status

For details, see 1.5 Check SAP HANA HSR status.

3.6 Reset the fail count (optional)

For details, see 1.6 (Optional) Reset the fail count.

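4. Shutting down the primary node for maintenance

Important

The primary node is put into standby mode, and the cluster triggers a failover.

Note

In this example, the initial primary node is saphana-02 and the secondary node is saphana-01.

4.1 Check the normal state of SUSE HAE

Log on to either node and run crm status to check the normal state of HAE.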

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:34:52 2019
Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02

2 nodes configured
6 resources configured

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-02 ]
     Slaves: [ saphana-01 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 saphana-02 ]

4.2 Put the primary node into standby mode

In this example, the primary node is saphana-02.

# crm node standby saphana-02

The cluster stops SAP HANA on saphana-02 and promotes SAP HANA on saphana-01 to primary.

The current HAE state is as follows:

# crm status
Stack: corosync
Current DC: saphana-01 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 17:07:56 2019
Last change: Mon Apr 15 17:07:38 2019 by root via crm_attribute on saphana-02

2 nodes configured
6 resources configured

Node saphana-02: standby
Online: [ saphana-01 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-01
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-01
 Clone Set: msl_SAPHana_HDB [rsc_SAPHana_HDB] (promotable)
     Masters: [ saphana-01 ]
     Stopped: [ saphana-02 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 ]
     Stopped: [ saphana-02 ]

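4.3 Shut down the ECS instance and perform the maintenance tasks

4.4 Start the maintained node and re-register HSR

Log on to the maintained node as the SAP HANA instance user and register HSR.

h01adm@saphana-02:/usr/sap/H01/HDB00> hdbnsutil -sr_register --remoteHost=saphana-01 --remoteInstance=00 --replicationMode=syncmem --name=saphana-02 --operationMode=logreplay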
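Because registering in the wrong direction can overwrite data, it can help to generate the registration command from explicit arguments and review it before running it. The following is a minimal sketch: sr_register_cmd is a hypothetical helper that only prints the command, and it reuses the instance number, replication mode, and operation mode from this document. It also assumes, as in this document, that the site name equals the host name, which is why identical arguments are rejected.

```shell
# Hypothetical helper: print (do not run) the hdbnsutil registration command.
# Usage: sr_register_cmd <remote_host> <local_site_name>
#   remote_host     = current primary node
#   local_site_name = site name of the node being registered as secondary
sr_register_cmd() {
    remote_host=$1
    site_name=$2
    if [ -z "$remote_host" ] || [ -z "$site_name" ]; then
        echo "usage: sr_register_cmd <remote_host> <local_site_name>" >&2
        return 1
    fi
    # Assumption: site names equal host names, so identical values
    # indicate a mix-up of the replication direction.
    if [ "$remote_host" = "$site_name" ]; then
        echo "remote host and local site name are identical; check the direction" >&2
        return 1
    fi
    printf 'hdbnsutil -sr_register --remoteHost=%s --remoteInstance=00 --replicationMode=syncmem --name=%s --operationMode=logreplay\n' \
        "$remote_host" "$site_name"
}

# Example: sr_register_cmd saphana-01 saphana-02
```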

4.5 Start the pacemaker service and bring the standby node back online

# systemctl start pacemaker
# crm node online saphana-02

The SUSE HAE cluster automatically brings up the SAP HANA service on the secondary node.

The current HAE state is as follows:

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 18:02:33 2019
Last change: Mon Apr 15 18:01:31 2019 by root via crm_attribute on saphana-02

2 nodes configured
6 resources configured

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-01
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-01
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-01 ]
     Slaves: [ saphana-02 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 saphana-02 ]

4.6 Check SAP HANA HSR status

For details, see 1.5 Check SAP HANA HSR status.

4.7 Reset the fail count (optional)

For details, see 1.6 (Optional) Reset the fail count.

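5. Shutting down the secondary node for maintenance

Important

Put the secondary node into maintenance mode.

Note

In this example, the initial primary node is saphana-02 and the secondary node is saphana-01.

5.1 Check the normal state of HAE

Log on to either node and run crm status to check the normal state of HAE.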

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:34:52 2019
Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02

2 nodes configured
6 resources configured

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-02 ]
     Slaves: [ saphana-01 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 saphana-02 ]

5.2 Put the secondary node into maintenance mode

# crm node maintenance saphana-01

After the setting takes effect, the HAE state is as follows:

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 18:18:10 2019
Last change: Mon Apr 15 18:17:49 2019 by root via crm_attribute on saphana-01

2 nodes configured
6 resources configured

Node saphana-01: maintenance
Online: [ saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     rsc_SAPHana_HDB    (ocf::suse:SAPHana):    Slave saphana-01 (unmanaged)
     Masters: [ saphana-02 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     rsc_SAPHanaTopology_HDB    (ocf::suse:SAPHanaTopology):    Started saphana-01 (unmanaged)
     Started: [ saphana-02 ]

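5.3 Stop the SAP HANA service on the secondary node and shut down the ECS instance for maintenance

Log on to the secondary node as the SAP HANA instance user and stop the SAP HANA service.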

saphana-01:~ # su - h01adm
h01adm@saphana-01:/usr/sap/H01/HDB00> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400

15.04.2019 16:47:05
Stop
OK
Waiting for stopped instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2


15.04.2019 16:47:35
WaitforStopped
OK
hdbdaemon is stopped.

5.4 Start the secondary SAP HANA ECS instance and return the node to normal mode

Log on to the secondary node and start the pacemaker service.

# systemctl start pacemaker

Return the secondary node to normal mode.

saphana-02:~ # crm node ready saphana-01

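The SUSE HAE cluster automatically brings up the SAP HANA service on the secondary node and keeps the original primary and secondary roles.

The current HAE state is as follows: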

# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 18:02:33 2019
Last change: Mon Apr 15 18:01:31 2019 by root via crm_attribute on saphana-02

2 nodes configured
6 resources configured

Online: [ saphana-01 saphana-02 ]

Full list of resources:

rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-02
 Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
     Masters: [ saphana-02 ]
     Slaves: [ saphana-01 ]
 Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
     Started: [ saphana-01 saphana-02 ]

5.5 Check SAP HANA HSR status

For details, see 1.5 Check SAP HANA HSR status.

5.6 Reset the fail count (optional)

For details, see 1.6 (Optional) Reset the fail count.