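Backup and Restore

Last updated: 2022-07-19 15:17:24

To keep customer workloads highly available and restore normal service quickly after a failure, the ADP base provides storage backup and restore for customer runtime data at the cluster, product, and component levels.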

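Feature Overview

The ADP base provides Kubernetes-native data backup and restore:

  • Policy-based backup and restore, for example backing up the whole cluster or backing up per product.

  • Backup and restore of a single workload's data.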

Spec Definitions

Backup storage configuration - BackupStorageLocation

# S3 repo configuration for file backups; make sure the bucket has already been created
# reference: https://velero.io/docs/v1.7/api-types/backupstoragelocation/
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: opcc-default
  # ADP base control plane namespace, i.e., acs-system
  namespace: acs-system
spec:
  # Marks this as the Velero default storage location; set to true to share one bucket with Velero backups
  default: true
  provider: aws
  config:
    # region - object storage region; for AlibabaCloud OSS, enter the matching region ID, e.g., cn-shanghai
    region: minio
    # s3ForcePathStyle - must be 'true' for MinIO; for AlibabaCloud OSS, which is S3-compatible, set it to 'false' (virtual-hosted-style addressing)
    s3ForcePathStyle: "true"
    # s3Url - S3 service URL, e.g., http://minio-minio-svc.default:9000 ; for AlibabaCloud OSS S3 endpoints, see https://help.aliyun.com/document_detail/31837.htm?spm=a2c4g.11186623.0.0.3cc223c94uOcwY#concept-zt4-cvy-5db
    s3Url: <s3 URL>
  objectStorage:
    # bucket name
    bucket: <bucket>
  
  credential:
    key: cloud
    # name of the Secret that holds the credential
    name: opcc-default-storage
---
apiVersion: v1
kind: Secret
metadata:
  name: opcc-default-storage
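  # ADP base control plane namespace, i.e., acs-system
  namespace: acs-system
# stringData accepts plain-text values; under data, every value would have to be base64-encoded
stringData: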
  AWS_ACCESS_KEY_ID: <access key ID>
  AWS_SECRET_ACCESS_KEY: <secret>
  # AWS_SESSION_TOKEN: <session token>
  cloud: |
    [default]
    aws_access_key_id=<access key ID>
    aws_secret_access_key=<secret>
    # aws_session_token=<session token>
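
As a sketch of the OSS variant described in the comments above, the following BackupStorageLocation swaps the MinIO settings for AlibabaCloud OSS values. The bucket name `adp-backup` and the endpoint `https://oss-cn-shanghai.aliyuncs.com` are assumptions chosen for illustration; take the actual endpoint for your region from the OSS endpoint documentation linked above.

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: opcc-default
  namespace: acs-system
spec:
  default: true
  provider: aws
  config:
    # OSS region ID (assumed here: cn-shanghai)
    region: cn-shanghai
    # OSS is addressed in virtual-hosted style, so path style is disabled
    s3ForcePathStyle: "false"
    # Assumed public endpoint for cn-shanghai; verify it against the OSS docs
    s3Url: https://oss-cn-shanghai.aliyuncs.com
  objectStorage:
    # Hypothetical bucket name; the bucket must already exist
    bucket: adp-backup
  credential:
    key: cloud
    name: opcc-default-storage
```

The `credential` section is unchanged, so the same `opcc-default-storage` Secret can be reused as long as it holds an access key that is valid for the OSS bucket.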

Create a policy-based data backup - Backup

# Data backup; reference: https://velero.io/docs/v1.7/api-types/backup/
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: myapp-snap-202111051600
  # ADP base control plane namespace, i.e., acs-system
  namespace: acs-system
  annotations:
    # Binds this resource to the ADP base OPCC Policy Backup watcher
    "opcc.cnx.aliyun-inc.com/bound": "true"
spec:
  includedNamespaces:
  - '*'
  includedResources:
  - '*'
  excludedResources:
  - storageclasses.storage.k8s.io
  # Individual objects must match this label selector to be included in the backup. Optional.
  labelSelector:
    matchLabels:
      'adp.aliyuncs.com/application-name': adp

  snapshotVolumes: true
  # Where to store the tarball and logs.
  storageLocation: opcc-default
  ttl: 24h0m0s

status:
  # The version of this Backup. The only version supported is 1.
  version: 1
  # The date and time when the Backup is eligible for garbage collection.
  expiration: null
  # The current phase. Valid values are New, FailedValidation, InProgress, Completed, PartiallyFailed, Failed.
  phase: ""
  # An array of any validation errors encountered.
  validationErrors: null
  # Date/time when the backup started being processed.
  startTimestamp: 2019-04-29T15:58:43Z
  # Date/time when the backup finished being processed.
  completionTimestamp: 2019-04-29T15:58:56Z
  # Number of volume snapshots that Velero tried to create for this backup.
  volumeSnapshotsAttempted: 2
  # Number of volume snapshots that Velero successfully created for this backup.
  volumeSnapshotsCompleted: 1
  # Number of warnings that were logged by the backup.
  warnings: 2
  # Number of errors that were logged by the backup.
  errors: 0
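
The overview mentions per-product backups as well as cluster-wide ones. A minimal sketch of a narrower backup, assuming a hypothetical product that runs in namespace `myapp-prod` and carries the label value `myapp`, could restrict both the namespaces and the label selector. The `status` section is left out because Velero fills it in.

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  # Hypothetical name following the <app>-snap-<timestamp> pattern used above
  name: myapp-snap-202111051700
  namespace: acs-system
  annotations:
    "opcc.cnx.aliyun-inc.com/bound": "true"
spec:
  # Only back up the (assumed) product namespace
  includedNamespaces:
  - myapp-prod
  # Only objects carrying this (assumed) product label are included
  labelSelector:
    matchLabels:
      'adp.aliyuncs.com/application-name': myapp
  snapshotVolumes: true
  storageLocation: opcc-default
  # Keep this backup for 72 hours
  ttl: 72h0m0s
```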

Create a scheduled policy-based backup - Schedule

# Scheduled data backup; reference: https://velero.io/docs/v1.7/api-types/schedule/
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: myapp-schedule
  # ADP base control plane namespace, i.e., acs-system
  namespace: acs-system
  annotations:
    # Binds this resource to the ADP base OPCC Policy Backup watcher
    "opcc.cnx.aliyun-inc.com/bound": "true"
spec:
  # schedule - cron expression that sets the backup interval
  schedule: 0 7 * * *
  # template - Backup spec template
  template:
    includedNamespaces:
    - '*'
    includedResources:
    - '*'
    excludedResources:
    - storageclasses.storage.k8s.io
    # Individual objects must match this label selector to be included in the backup. Optional.
    labelSelector:
      matchLabels:
        'adp.aliyuncs.com/application-name': adp
  
    snapshotVolumes: true
    # Where to store the tarball and logs.
    storageLocation: opcc-default
    ttl: 24h0m0s

status:
  # phase - status of the scheduled backup: New, FailedValidation, InProgress, Completed, PartiallyFailed, Failed.
  phase: ""
  # lastBackup - time of the last backup
  lastBackup:
  # validationErrors - validation error messages
  validationErrors: []
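
As an illustration of a different cadence, the sketch below runs once a week and keeps each backup for 30 days. It assumes the Velero server evaluates cron expressions in UTC, in which case `0 19 * * 6` (19:00 UTC on Saturday) corresponds to 03:00 on Sunday in Asia/Shanghai. The schedule name is hypothetical.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  # Hypothetical schedule name
  name: myapp-weekly
  namespace: acs-system
  annotations:
    "opcc.cnx.aliyun-inc.com/bound": "true"
spec:
  # Quoted so the cron expression is always read as a single string
  schedule: "0 19 * * 6"
  template:
    includedNamespaces:
    - '*'
    labelSelector:
      matchLabels:
        'adp.aliyuncs.com/application-name': adp
    snapshotVolumes: true
    storageLocation: opcc-default
    # 30 days
    ttl: 720h0m0s
```

Velero names each backup created from a schedule after the schedule plus a timestamp suffix, so a later Restore can point `backupName` at one of those generated backups.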

Create a policy-based data restore - Restore

# Data restore; reference: https://velero.io/docs/v1.7/api-types/restore/
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: myapp-snap-202111051600
  # Restore namespace. Must be the namespace of the Velero server. Required.
  namespace: acs-system
  annotations:
    # Binds this resource to the OPCC Policy Backup watcher
    "opcc.cnx.aliyun-inc.com/bound": "true"
spec:
  # BackupName is the unique name of the Velero backup to restore from.
  backupName: myapp-snap-202111051600

  # NamespaceMapping is a map of source namespace names to
  # target namespace names to restore into. Any source namespaces not
  # included in the map will be restored into namespaces of the same name.
  namespaceMapping:
    namespace-backup-from: namespace-to-restore-to


# RestoreStatus captures the current status of a Velero restore. Users should not set any data here.
status:
  # The current phase. Valid values are New, FailedValidation, InProgress, Completed, PartiallyFailed, Failed.
  phase: ""
  # An array of any validation errors encountered.
  validationErrors: null
  # Number of warnings that were logged by the restore.
  warnings: 2
  # Errors is a count of all error messages that were generated
  # during execution of the restore. The actual errors are stored in object
  # storage.
  errors: 0
  # FailureReason is an error that caused the entire restore
  # to fail.
  failureReason:
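
When the data should go back into namespaces of the same name, `namespaceMapping` can be left out. The following sketch restores only one namespace from the backup and also restores its persistent volumes; the restore name and the namespace `myapp-prod` are hypothetical.

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  # Hypothetical restore name
  name: myapp-restore-202111051700
  namespace: acs-system
  annotations:
    "opcc.cnx.aliyun-inc.com/bound": "true"
spec:
  # Backup to restore from; it must already exist
  backupName: myapp-snap-202111051600
  # Restore only this (assumed) namespace out of everything in the backup
  includedNamespaces:
  - myapp-prod
  # Also restore persistent volumes from the backup's volume snapshots
  restorePVs: true
```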

Create a workload data backup - BackupJob

apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: BackupJob
metadata:
  name: <backupJobName>
  namespace: <instanceNS>
spec:
  instanceName: <instanceName>
  instanceKind: <instanceKind>
  instanceAPIGroup: <apiGroup>
  instanceAPIVersion: <apiVersion>
  # repo - storage repo in which the backup files are stored
  repo:
  # ttl - optional; how long the backup is retained, e.g., 24h0m0s
  ttl: 

status:
  # phase - The current phase. Valid values are New, InProgress, Completed, FailedValidation, PartiallyFailed, Failed, Deleting
  phase: 
  # expiration - optional expiration timestamp
  expiration: "<RFC3339-Timestamp>"
  # validationErrors - An array of any validation errors encountered.
  validationErrors: []
  # startTimestamp - optional start timestamp
  startTimestamp: "<RFC3339-Timestamp>"
  # completionTimestamp - optional completion timestamp
  completionTimestamp: "<RFC3339-Timestamp>"
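
To show how the placeholders might be filled in, here is a sketch for a hypothetical StatefulSet named `mysql` in namespace `myapp-prod`. Which workload kinds BackupJob accepts is not stated on this page, so the `instanceKind`, `instanceAPIGroup`, and `instanceAPIVersion` values are assumptions, as is using the BackupStorageLocation name `opcc-default` for `repo`.

```yaml
apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: BackupJob
metadata:
  # Hypothetical job name
  name: mysql-backup-001
  # Same namespace as the workload being backed up
  namespace: myapp-prod
spec:
  # Assumed target: apps/v1 StatefulSet "mysql"
  instanceName: mysql
  instanceKind: StatefulSet
  instanceAPIGroup: apps
  instanceAPIVersion: v1
  # Assumed to reference the storage location defined earlier
  repo: opcc-default
  # Retain the backup for one day
  ttl: 24h0m0s
```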

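Create a scheduled workload data backup - BackupCronJob

Scheduled backup: at the times given by the cron expression, a 'backupjobs.opcc.cnx.aliyun-inc.com' CR is created. You can also specify how many backups to retain and when they expire.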

apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: BackupCronJob
metadata:
  name: <backupCronjobName>
  namespace: <instanceNS>
spec:
  instanceName: <instanceName>
  instanceKind: <instanceKind>
  instanceAPIGroup: <apiGroup>
  instanceAPIVersion: <apiVersion>
  # schedule - cron expression evaluated in UTC; e.g., 3 AM daily in Asia/Shanghai is '0 19 * * *'. The actual start time is approximate and depends on the backup task manager
  schedule:
  # ttl - how long each backup is retained, e.g., 1y5m7d2h. The minimum duration is 1h
  ttl: 
  # repo - optional storage repository name; if omitted, the `acs-system/default` BackupStorageLocation is used
  repo: 

status:
  # lastBackup - last backup timestamp
  lastBackup: "<RFC3339-Timestamp>"
  # phase - The current phase. Valid values are New, Enabled, FailedValidation.
  phase: Enabled
  # validationErrors - An array of any validation errors encountered.
  validationErrors: []
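
A filled-in sketch of the template above, reusing the `0 19 * * *` schedule from its comment (03:00 daily in Asia/Shanghai). The target workload, the job name, and the `7d` retention are assumptions; the `ttl` value follows the duration format shown in the comment, and `repo` is omitted so the default storage location applies.

```yaml
apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: BackupCronJob
metadata:
  # Hypothetical name
  name: mysql-daily-backup
  namespace: myapp-prod
spec:
  # Assumed target: apps/v1 StatefulSet "mysql"
  instanceName: mysql
  instanceKind: StatefulSet
  instanceAPIGroup: apps
  instanceAPIVersion: v1
  # 19:00 UTC, which is 03:00 the next day in Asia/Shanghai
  schedule: "0 19 * * *"
  # Assumed retention of seven days
  ttl: 7d
```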

Create a workload data restore - RestoreJob

apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: RestoreJob
metadata:
  name: <snapshotID>
  namespace: <instanceNS>
spec:
  instanceName: <instanceName>
  instanceKind: <instanceKind>
  instanceAPIGroup: <apiGroup>
  instanceAPIVersion: <apiVersion>

  restoreFrom:
    # type - backup or snapshot contents to restore from. Valid values are backupset, volumesnapshot.
    type:
    # name - associated BackupJob name or SnapshotJob name
    name:

status:
  # The current phase. Valid values are New, FailedValidation, InProgress, Completed, PartiallyFailed, Failed.
  phase: ""
  # An array of any validation errors encountered.
  validationErrors: []
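
As a sketch of restoring from a workload backup, the example below sets `restoreFrom.type` to `backupset` and points `restoreFrom.name` at a BackupJob that is assumed to exist in the same namespace. All names, including the restore job name and the StatefulSet target, are hypothetical.

```yaml
apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: RestoreJob
metadata:
  # Hypothetical name; the template above uses a snapshot ID here
  name: mysql-restore-001
  namespace: myapp-prod
spec:
  # Assumed target: apps/v1 StatefulSet "mysql"
  instanceName: mysql
  instanceKind: StatefulSet
  instanceAPIGroup: apps
  instanceAPIVersion: v1
  restoreFrom:
    # Restore from a file backup rather than a volume snapshot
    type: backupset
    # Name of an existing BackupJob (assumed)
    name: mysql-backup-001
```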
