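This document describes how to deploy a high-availability (HA) SAP HANA environment within a single zone on Alibaba Cloud public cloud.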

Background information

Supported images: SLES for SAP 11, 12, and 15.

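Terms

  • VPC

    A Virtual Private Cloud (VPC) is an isolated network environment built on Alibaba Cloud. VPCs are logically isolated from one another, and each VPC is a private network on the cloud that belongs exclusively to you.

  • ECS

    Elastic Compute Service (ECS) is an IaaS (Infrastructure as a Service) cloud computing service from Alibaba Cloud that offers high performance, stability, reliability, and elastic scaling.

  • ENI

    An Elastic Network Interface (ENI) is a virtual network interface that can be attached to an ECS instance of the VPC type. With ENIs you can build high-availability clusters, implement low-cost failover, and manage the network at a fine granularity.

  • HAVIP

    A Private High-Availability Virtual IP Address (HAVIP) is a private IP resource that can be created and released independently. What makes it special is that an ECS instance can announce the IP address by using ARP.

  • Shared block storage

    Shared block storage is a block-level storage device that supports concurrent reads and writes from multiple ECS instances, with high concurrency, high performance, and high reliability. It is commonly used for high-availability database clusters such as Oracle RAC (Real Application Clusters) and for high-availability server clusters.

  • Region

    A region is a physical data center. After a resource is created, its region cannot be changed.

  • Zone

    A zone is a physical area within a region that has its own independent power supply and network. Network latency between instances in the same zone is lower.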

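Architecture

Alibaba Cloud public cloud supports high-availability deployment of SAP HANA within a single zone. SAP HANA System Replication is configured and combined with SUSE HAE (High Availability Extension) to achieve automatic failover.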


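Preparations

  • SAP installation media
    | Tool | Access method | Remarks |
    | --- | --- | --- |
    | Windows jump server | Install SAP Download Manager on the jump server, download the media, and then upload it to OSS or mount it directly on the ECS instances | The jump server needs Internet access through an EIP or NAT |
    | OSS tool | Upload the media from your local machine to your OSS bucket with an OSS tool such as ossutil | |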
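  • Network plan
    | Network | Region and zone | Purpose | Subnet |
    | --- | --- | --- | --- |
    | Business network | China East 2 (Shanghai), Zone F | Business traffic and System Replication (SR) | 192.168.10.0/24 |
    | Heartbeat network | China East 2 (Shanghai), Zone F | HA heartbeat | 192.168.20.0/24 |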
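  • Host plan
    | Hostname | Role | Heartbeat address | Business address | High-availability virtual IP (HAVIP) |
    | --- | --- | --- | --- | --- |
    | saphana-01 | HANA primary node | 192.168.20.19 | 192.168.10.168 | 192.168.10.12 |
    | saphana-02 | HANA standby node | 192.168.20.20 | 192.168.10.169 | |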
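  • File system plan
    | Type | Size | File system | VG | LVM striping | Mount point |
    | --- | --- | --- | --- | --- | --- |
    | Data disk | 800G | XFS | hanavg | Yes | /hana/data |
    | Data disk | 400G | XFS | hanavg | Yes | /hana/log |
    | Data disk | 300G | XFS | hanavg | Yes | /hana/shared |
    | Data disk | 50G | XFS | sapvg | No | /usr/sap |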
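
Before creating any resources, it can help to confirm that every planned address falls inside its planned subnet. The following is a minimal sketch in POSIX shell; the function names `ip_to_int`, `ip_in_cidr`, and `check_plan` are illustrative helpers introduced here, not part of any Alibaba Cloud or SUSE tool. The address list is copied from the network and host plans above.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  echo "$1" | { IFS=. read -r a b c d; echo $(( (a << 24) | (b << 16) | (c << 8) | d )); }
}

# Return 0 if address $1 lies inside CIDR block $2 (for example 192.168.10.0/24).
ip_in_cidr() {
  cidr_net=${2%/*}
  cidr_bits=${2#*/}
  cidr_mask=$(( (4294967295 << (32 - cidr_bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$1") & cidr_mask )) -eq $(( $(ip_to_int "$cidr_net") & cidr_mask )) ]
}

# Read "address subnet" pairs from stdin and report each one.
check_plan() {
  rc=0
  while read -r ip cidr; do
    if ip_in_cidr "$ip" "$cidr"; then
      echo "OK   $ip in $cidr"
    else
      echo "FAIL $ip not in $cidr"
      rc=1
    fi
  done
  return $rc
}

# Planned addresses from this example (business, HAVIP, heartbeat).
check_plan <<'EOF'
192.168.10.168 192.168.10.0/24
192.168.10.169 192.168.10.0/24
192.168.10.12 192.168.10.0/24
192.168.20.19 192.168.20.0/24
192.168.20.20 192.168.20.0/24
EOF
```

`check_plan` returns a non-zero status if any address is outside its subnet, so it can be used as a gate in a provisioning script.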

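Cloud resource configuration

Create the VPC and ECS instances

Before you use ECS, you must create a VPC and vSwitches. Create them according to your actual requirements.

  1. Log on to the VPC console at https://vpc.console.aliyun.com/
  2. As planned, in Zone F of China East 2 (Shanghai), create a VPC with the CIDR block 192.168.0.0/16 and two subnets: 192.168.10.0/24 (business) and 192.168.20.0/24 (heartbeat).
  3. Log on to the ECS console at https://ecs.console.aliyun.com/
  4. Create the two HANA ECS instances as planned.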

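Create other cloud resources

Before deploying the single-zone SAP high-availability environment, create the shared block storage device and the high-availability virtual IP (HAVIP).

The shared block storage device serves as the STONITH device of the HA cluster and is used to fence a failed node.

The HAVIP is attached to the active primary node of the cluster as the cluster's virtual IP. In this example it is the virtual IP address through which the HANA instance provides services.

  1. Log on to the Alibaba Cloud console and choose ECS > Storage & Snapshots > Shared Block Storage. Create a 20 GB SSD shared block storage device in the same region and zone as the ECS instances.
  2. After it is created, attach it to both ECS instances in the cluster.
  3. Log on to the Alibaba Cloud console and choose ECS > Network & Security > VPC > HAVIP. Create an HAVIP (192.168.10.12 in this plan) and attach it to the two HANA ECS instances that you just created.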

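HANA ECS configuration

  • Hostnames and DNS resolution

    On both HANA servers of the HA cluster, configure hostname resolution between the two HANA ECS instances.

    /etc/hosts in this example: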

    127.0.0.1       localhost
    192.168.10.168  saphana-01      saphana-01
    192.168.10.169  saphana-02      saphana-02
    
    192.168.20.19   hana-ha01 hana-ha01
    192.168.20.20   hana-ha02 hana-ha02
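  • SSH trust between the ECS instances

    The two HANA ECS instances of the HA cluster must trust each other over SSH.

    • Configure the public keys

      Run the following commands on the HANA primary node: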

      saphana-01:~ # ssh-keygen -t rsa
      Generating public/private rsa key pair.
      Enter file in which to save the key (/root/.ssh/id_rsa): (press Enter)
      Enter passphrase (empty for no passphrase): (press Enter)
      Enter same passphrase again: (press Enter)
      Your identification has been saved in /root/.ssh/id_rsa.
      Your public key has been saved in /root/.ssh/id_rsa.pub.
      The key fingerprint is:
      SHA256:6lX54zFixfUF7Ni+yEn8+lzBjj4XSF4QoVjznKNx15M root@saphana-01
      The key's randomart image is:
      +---[RSA 2048]----+
      |           o ++  |
      |          o =.o.o|
      |         . o XoEo|
      |           o=o*oo|
      |        S oo=.oo.|
      |       . . ooo+..|
      |      . . oo== oo|
      |     . . . o=*oo |
      |      .     o+=  |
      +----[SHA256]-----+
      saphana-01:~ # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.10.169
      /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
      The authenticity of host '192.168.10.169 (192.168.10.169)' can't be established.
      ECDSA key fingerprint is
      SHA256:iD5aepnspZcREGbGJpExnMd3YGXPM8FcmSq66KLCgsk.
      Are you sure you want to continue connecting (yes/no)? yes
      /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are
      already installed
      /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to
      install the new keys
      Password: (enter the root password of the standby node)
      
      Number of key(s) added: 1
      
      Now try logging into the machine, with:   "ssh 'root@192.168.10.169'"
      and check to make sure that only the key(s) you wanted were added.
      
      saphana-01:~ #

      Run the following commands on the HANA standby node:

      saphana-02:~ # ssh-keygen -t rsa
      Generating public/private rsa key pair.
      Enter file in which to save the key (/root/.ssh/id_rsa):
      Enter passphrase (empty for no passphrase):
      Enter same passphrase again:
      Your identification has been saved in /root/.ssh/id_rsa.
      Your public key has been saved in /root/.ssh/id_rsa.pub.
      The key fingerprint is:
      SHA256:116JLe/MTR494dejsZkgrtvfFdL6+WwGnmcc9QD38Zc root@saphana-02
      The key's randomart image is:
      +---[RSA 2048]----+
      |                 |
      |            . .. |
      |             o .+|
      |           . ooE=|
      |        S . +.+=+|
      |         . . +=.*|
      |         . ..+.X*|
      |        o . o+@=%|
      |       ooo.. *+B*|
      +----[SHA256]-----+
      saphana-02:~ # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.10.168
      /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
      The authenticity of host '192.168.10.168 (192.168.10.168)' can't be established.
      ECDSA key fingerprint is
      SHA256:iD5aepnspZcREGbGJpExnMd3YGXPM8FcmSq66KLCgsk.
      Are you sure you want to continue connecting (yes/no)? yes
      /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are
      already installed
      /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to
      install the new keys
      Password:
      
      Number of key(s) added: 1
      
      Now try logging into the machine, with:   "ssh 'root@192.168.10.168'"
      and check to make sure that only the key(s) you wanted were added.
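    • Verify the configuration

      On each node, log on to the other node over SSH. If no password is requested, the trust relationship has been established.

      Verify on the HANA primary node: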

      saphana-01:~ # ssh saphana-02
      Last login: Mon Apr 22 23:36:01 2019 from 192.168.10.168
      
      Welcome to Alibaba Cloud Elastic Compute Service !
      saphana-02:~ #

      Verify on the HANA standby node:

      saphana-02:~ # ssh saphana-01
      Last login: Mon Apr 22 23:36:21 2019 from 192.168.10.169
      
      Welcome to Alibaba Cloud Elastic Compute Service !
      
      saphana-01:~ #
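  • Deploy the ECS Metrics Collector for SAP monitoring agent

    ECS Metrics Collector is a monitoring agent that collects, for SAP systems running on the cloud platform, the required virtual machine configuration and underlying physical resource usage information for later performance statistics and problem analysis. It must be installed on every SAP application server and database server. For deployment instructions, see the ECS Metrics Collector for SAP deployment guide.

  • File system layout

    Following the file system plan above, use LVM to manage and configure the cloud disks (on both cluster nodes).

    For an introduction to LVM, see the LVM HOWTO.

    • Create the PVs and VGs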
      # pvcreate /dev/vdb /dev/vdc /dev/vdd /dev/vdg
        Physical volume "/dev/vdb" successfully created
        Physical volume "/dev/vdc" successfully created
        Physical volume "/dev/vdd" successfully created
        Physical volume "/dev/vdg" successfully created
      # vgcreate hanavg /dev/vdb /dev/vdc /dev/vdd
        Volume group "hanavg" successfully created
      # vgcreate sapvg /dev/vdg
        Volume group "sapvg" successfully created
    • Create the LVs
      # lvcreate -l 100%FREE -n usrsaplv sapvg
        Logical volume "usrsaplv" created.
      Stripe the three 500 GB SSD cloud disks:
      # lvcreate -L 800G -n datalv -i 3 -I 64 hanavg
        Rounding size (204800 extents) up to stripe boundary size (204801 extents).
        Logical volume "datalv" created.
      # lvcreate -L 400G -n loglv -i 3 -I 64 hanavg
        Rounding size (102400 extents) up to stripe boundary size (102402 extents).
        Logical volume "loglv" created.
      # lvcreate -l 100%FREE -n sharedlv -i 3 -I 64 hanavg
        Rounding size (38395 extents) down to stripe boundary size (38394 extents)
        Logical volume "sharedlv" created.
    • Create the mount points and format the file systems
      # mkdir -p /usr/sap /hana/data /hana/log /hana/shared
      # mkfs.xfs /dev/sapvg/usrsaplv
      # mkfs.xfs /dev/hanavg/datalv
      # mkfs.xfs /dev/hanavg/loglv
      # mkfs.xfs /dev/hanavg/sharedlv
    • Mount the file systems and add them to the boot-time mounts
      # vim /etc/fstab
      Add the following entries:
      /dev/mapper/hanavg-datalv  /hana/data  xfs defaults    0   0
      /dev/mapper/hanavg-loglv   /hana/log   xfs defaults    0   0
      /dev/mapper/hanavg-sharedlv    /hana/shared    xfs defaults    0   0
      /dev/mapper/sapvg-usrsaplv /usr/sap    xfs defaults    0   0
      /dev/vde    swap    swap    defaults    0   0
      # mount -a
      # df -h
      Filesystem                    Size  Used Avail Use% Mounted on
      devtmpfs                       32G     0   32G   0% /dev
      tmpfs                          48G   55M   48G   1% /dev/shm
      tmpfs                          32G  768K   32G   1% /run
      /dev/vda1                      99G   30G   64G  32% /
      tmpfs                         6.3G   16K  6.3G   1% /run/user/0
      /dev/mapper/hanavg-datalv    800G   34M  800G   1% /hana/data
      /dev/mapper/sapvg-usrsaplv    50G   33M   50G   1% /usr/sap
      /dev/mapper/hanavg-loglv     400G   33M  400G   1% /hana/log
      /dev/mapper/hanavg-sharedlv  300G   33M  300G   1% /hana/shared
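
The `df -h` check above can also be scripted so that it runs identically on both nodes. The sketch below reads mount information in `/proc/mounts` format from standard input and reports whether each planned mount point is present with an XFS file system. The function name `check_hana_mounts` is a hypothetical helper, and the mount point list is taken from the file system plan in this document.

```shell
#!/bin/sh
# Print one status line per planned mount point.
# Input: text in /proc/mounts format (device, mount point, fs type, ...).
check_hana_mounts() {
  mounts=$(cat)
  rc=0
  for mp in /hana/data /hana/log /hana/shared /usr/sap; do
    # Look up the file system type recorded for this mount point.
    fstype=$(printf '%s\n' "$mounts" | awk -v mp="$mp" '$2 == mp { print $3; exit }')
    case "$fstype" in
      xfs) echo "OK      $mp (xfs)" ;;
      "")  echo "MISSING $mp"; rc=1 ;;
      *)   echo "WRONG   $mp ($fstype, expected xfs)"; rc=1 ;;
    esac
  done
  return $rc
}

# Usage on a cluster node:
#   check_hana_mounts < /proc/mounts
```

The function returns a non-zero status when any mount point is missing or has a different file system type, which makes it easy to compare the two nodes before installing SAP HANA.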

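Install SAP HANA and configure HANA System Replication

  • Install SAP HANA

    The System ID and Instance ID of the HANA primary node and standby node must be the same. In this example, the HANA System ID is H01 and the Instance ID is 00.

    For the installation of SAP HANA, see SAP HANA Platform.

  • Configure HANA System Replication

    For the configuration of SAP HANA System Replication, see How To Perform System Replication for SAP HANA.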

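SLES HAE installation and configuration

  • Install the SLES HAE software

    For the SLES HAE documentation, see SUSE Linux Enterprise High Availability Extension 12.

    On the primary and standby nodes, check whether the HAE and SAPHanaSR components are installed.

    Note: This example uses the SLES for SAP 12 SP3 CSP (Cloud Service Provider) image, which is preconfigured for the Alibaba Cloud SUSE SMT server, so the components can be checked and installed directly. If you use a custom image, first purchase a SUSE subscription and register with the official SUSE SMT server, or configure the zypper repositories manually, before you continue.

    The following components are required to configure SLES HAE and to install and manage the SAP HANA resources: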

    • patterns-ha-ha_sles
    • SAPHanaSR
    • sap_suse_cluster_connector
    • patterns-sle-gnome-basic

    Install the required components with the following command:

    zypper in patterns-ha-ha_sles SAPHanaSR sap_suse_cluster_connector
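  • Configure the cluster

    This example uses VNC to open a graphical session and configures Corosync on the HANA primary node.

    yast2 cluster
    • Configure communication channels
      • For Channel, select the heartbeat subnet; for Redundant Channel, select the business subnet.
      • Add the member addresses in the correct order (heartbeat address first, then business address).
      • Expected Votes: 2.
      • Transport: Unicast.
    • Configure Security

      Select Enable Security Auth and click Generate Auth Key File.
    • Configure Csync2
      • Add the sync hosts.
      • Click Add Suggested Files.
      • Click Generate Pre-Shared-Keys.
      • Click Turn csync2 ON.

      Keep the defaults for the Configure conntrackd step and go to the next step.
    • Configure Service

      Make sure that the cluster service is not set to start at boot.

      After the configuration is complete, save and exit. Copy the Corosync configuration files to the HANA standby node by running the following command on the primary node:

      # sudo scp -pr /etc/corosync/authkey /etc/corosync/corosync.conf root@saphana-02:/etc/corosync/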
  • Verify the cluster status

    Run the following command on both nodes to start the pacemaker service.

    # systemctl start pacemaker

    Confirm that both nodes in the cluster are online.

    # crm_mon -r
    Stack: corosync
    Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
    Last updated: Tue Apr 23 11:22:38 2019
    Last change: Tue Apr 23 11:22:36 2019 by hacluster via crmd on saphana-02
    
    2 nodes configured
    0 resources configured
    
    Online: [ saphana-01 saphana-02 ]
    
    No resources

    Activate the hawk2 web service.

    # passwd hacluster
    New password:
    Retype new password:
    passwd: password updated successfully
    
    # systemctl restart hawk.service

    Enable the hawk2 service to start at boot.

    # systemctl enable hawk.service
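
The online check above can be automated by parsing one-shot `crm_mon` output. The sketch below assumes the `Online: [ ... ]` line format shown in the sample output in this section; `nodes_online` is a hypothetical helper name, and newer Pacemaker versions may format this line differently.

```shell
#!/bin/sh
# Succeed only if every node named on the command line appears in the
# "Online: [ ... ]" line of crm_mon output supplied on stdin.
nodes_online() {
  # Extract the space-separated node list between the brackets.
  online=$(sed -n 's/^ *Online: *\[ *\(.*\) *\]/\1/p')
  for node in "$@"; do
    case " $online " in
      *" $node "*) ;;
      *) echo "$node is not online"; return 1 ;;
    esac
  done
  echo "all nodes online: $*"
}

# Usage on a cluster node (crm_mon -1 prints the status once and exits):
#   crm_mon -r -1 | nodes_online saphana-01 saphana-02
```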

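Configure SBD (quorum disk)

Make sure that the 20 GB shared block storage device is attached to both ECS instances as planned. In this example, the SBD disk is /dev/vdf.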

saphana-01:~ # lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    253:0    0  100G  0 disk
└─vda1 253:1    0  100G  0 part /
vdb    253:16   0  500G  0 disk
vdc    253:32   0  500G  0 disk
vdd    253:48   0  500G  0 disk
vde    253:64   0   64G  0 disk
vdf    253:80   0   20G  0 disk

saphana-02:~ # lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    253:0    0  100G  0 disk
└─vda1 253:1    0  100G  0 part /
vdb    253:16   0  500G  0 disk
vdc    253:32   0  500G  0 disk
vdd    253:48   0  500G  0 disk
vde    253:64   0   64G  0 disk
vdf    253:80   0   20G  0 disk
  • Configure the watchdog (both cluster nodes)
    # echo "modprobe softdog" >> /etc/init.d/boot.local
    # echo "softdog" > /etc/modules-load.d/watchdog.conf
    # modprobe softdog
    
    Check the watchdog configuration:
    saphana-01:~ # ls -l /dev/watchdog
    crw------- 1 root root 10, 130 Apr 23 12:09 /dev/watchdog
    saphana-01:~ # lsmod | grep -e wdt -e dog
    softdog                16384  0
    saphana-01:~ # grep -e wdt -e dog /etc/modules-load.d/watchdog.conf
    softdog
  • Configure SBD (both cluster nodes)
    # sbd -d /dev/vdf create
    Initializing device /dev/vdf
    Creating version 2.1 header on device 4 (uuid: e3874a81-47ae-4578-b7a2-4b32bd139e07)
    Initializing 255 slots on device 4
    Device /dev/vdf is initialized.
    # sbd -d /dev/vdf dump
    ==Dumping header on disk /dev/vdf
    Header version     : 2.1
    UUID               : e3874a81-47ae-4578-b7a2-4b32bd139e07
    Number of slots    : 255
    Sector size        : 512
    Timeout (watchdog) : 5
    Timeout (allocate) : 2
    Timeout (loop)     : 1
    Timeout (msgwait)  : 10
    ==Header on disk /dev/vdf is dumped
    
    Configure the SBD parameters:
    # vim /etc/sysconfig/sbd
    
    Modify the following parameters:
    SBD_DEVICE="/dev/vdf"   (change to the device ID of your SBD cloud disk)
    SBD_STARTMODE="clean"
    SBD_OPTS="-W"
  • Verify the SBD service

    Start sbd on both nodes:

    # /usr/share/sbd/sbd.sh start

    Verify the SBD processes:

    # ps -ef | grep sbd
    root     16148     1  0 14:02 pts/0    00:00:00 sbd: inquisitor
    root     16150 16148  0 14:02 pts/0    00:00:00 sbd: watcher: /dev/vdf - slot: 1 - uuid: 9b620112-1031-48b8-9510-8e1b77032472
    root     16151 16148  0 14:02 pts/0    00:00:00 sbd: watcher: Pacemaker
    root     16152 16148  0 14:02 pts/0    00:00:00 sbd: watcher: Cluster
    root     16162 15254  0 14:05 pts/0    00:00:00 grep --color=auto sbd

    Check the SBD status and make sure that both nodes are in the clear state:

    # sbd -d /dev/vdf list
    0       saphana-01      clear
    1       saphana-02      clear

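    Verify SBD fencing:

    Note: Make sure that important service processes on the node to be fenced have been stopped.

    In this example, log on to the primary node saphana-01 and fence the standby node saphana-02:

    saphana-01 # sbd -d /dev/vdf message saphana-02 reset

    If the standby node saphana-02 restarts normally, the SBD disk is configured correctly.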
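
After the fencing test, the SBD slots can be checked again from a script rather than by eye. The sketch below parses the output of `sbd -d /dev/vdf list` (slot number, node name, message, as shown above) and fails if any slot is not `clear`. The helper name `sbd_all_clear` is an assumption made for illustration.

```shell
#!/bin/sh
# Read "sbd ... list" output on stdin; succeed only if every slot is "clear".
sbd_all_clear() {
  awk '
    NF >= 3 {
      seen++
      if ($3 != "clear") { bad++; print "slot " $1 " (" $2 ") is " $3 }
    }
    END {
      if (seen == 0) { print "no SBD slots found"; exit 1 }
      if (bad > 0) exit 1
      print seen " slot(s) clear"
    }
  '
}

# Usage on a cluster node:
#   sbd -d /dev/vdf list | sbd_all_clear
```

A slot that still holds a `reset` message after the fenced node has rebooted would be reported here, along with the slot number and node name.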

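Integrate SAP HANA with SLES HAE

  • Configure the SAP HANA resources with SAPHanaSR

    On either cluster node, create a script file and replace three parameters in it: the HANA SID, the Instance Number, and the HAVIP. In this example, the HANA SID is H01, the Instance Number is 00, the HAVIP is 192.168.10.12, and the script file is named HANA_HA_script.sh.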

    # SAPHanaTopology is a resource agent that monitors and analyzes the HANA
    # landscape and communicates the status between the two nodes.
    primitive rsc_SAPHanaTopology_HDB ocf:suse:SAPHanaTopology \
            operations $id=rsc_SAPHanaTopology_HDB-operations \
            op monitor interval=10 timeout=600 \
            op start interval=0 timeout=600 \
            op stop interval=0 timeout=300 \
            params SID=H01 InstanceNumber=00
    # The following primitives define the HANA resource, the SBD fencing
    # device, and the virtual IP in the cluster.
    primitive rsc_SAPHana_HDB ocf:suse:SAPHana \
            operations $id=rsc_SAPHana_HDB-operations \
            op start interval=0 timeout=3600 \
            op stop interval=0 timeout=3600 \
            op promote interval=0 timeout=3600 \
            op monitor interval=60 role=Master timeout=700 \
            op monitor interval=61 role=Slave timeout=700 \
            params SID=H01 InstanceNumber=00 PREFER_SITE_TAKEOVER=true \
            DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
    # SBD (STONITH) resource
    primitive rsc_sbd stonith:external/sbd \
            op monitor interval=20 timeout=15 \
            meta target-role=Started maintenance=false
    # Virtual IP (HAVIP) resource
    primitive rsc_vip IPaddr2 \
            operations $id=rsc_vip-operations \
            op monitor interval=10s timeout=20s \
            params ip=192.168.10.12
    ms msl_SAPHana_HDB rsc_SAPHana_HDB \
            meta is-managed=true notify=true clone-max=2 clone-node-max=1 \
            target-role=Started interleave=true maintenance=false
    clone cln_SAPHanaTopology_HDB rsc_SAPHanaTopology_HDB \
            meta is-managed=true clone-node-max=1 target-role=Started interleave=true \
            maintenance=false
    colocation col_saphana_ip_HDB 2000: rsc_vip:Started msl_SAPHana_HDB:Master
    order ord_SAPHana_HDB 2000: cln_SAPHanaTopology_HDB msl_SAPHana_HDB
    property cib-bootstrap-options: \
            have-watchdog=true \
            cluster-infrastructure=corosync \
            cluster-name=cluster \
            no-quorum-policy=ignore \
            stonith-enabled=true \
            stonith-action=reboot \
            stonith-timeout=150s
    op_defaults op-options: \
            timeout=600 \
            record-pending=true

    Run the following command so that HAE takes over management of SAP HANA:

    crm configure load update HANA_HA_script.sh
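  • Verify the cluster status

    In a healthy cluster, the resources are in the following state:

    • The sbd and vip resources run on the current primary node.
    • The SAPHana_HDB resource is green on both the master node and the slave node.
    • The SAPHanaTopology resource is green on both the master node and the slave node.

    You can manage and configure the HAE resources with crmsh or with the Hawk graphical web interface.

    • Manage with the Hawk web interface

      Log on to https://<ECS IP address>:7630
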
    • Manage with crmsh
      # crm_mon -r
      Stack: corosync
      Current DC: saphana-01 (version 1.1.16-4.8-77ea74d) - partition with quorum
      Last updated: Wed Apr 24 11:48:38 2019
      Last change: Wed Apr 24 11:48:35 2019 by root via crm_attribute on saphana-01
      
      2 nodes configured
      6 resources configured
      
      Online: [ saphana-01 saphana-02 ]
      
      Full list of resources:
      
      rsc_sbd (stonith:external/sbd): Started saphana-01
      rsc_vip (ocf::heartbeat:IPaddr2):       Started saphana-01
       Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
           Masters: [ saphana-01 ]
           Slaves: [ saphana-02 ]
       Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
           Started: [ saphana-01 saphana-02 ]

Related documents