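Backup and Restore
To keep customer workloads highly available and allow fast recovery from failures, the ADP base provides storage backup and restore for customers' runtime data at the cluster, product, and component levels.
Feature Overview
The ADP base provides Kubernetes-native data backup and restore:
Policy-based data backup and restore, for example backing up the whole cluster or backing up per product.
Data backup and restore for a single Workload.
Spec Definitions
Backup storage configuration - BackupStorageLocation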
# S3 repo configuration for file backups; make sure the bucket has already been created
# reference: https://velero.io/docs/v1.7/api-types/backupstoragelocation/
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: opcc-default
  # ADP base control-plane namespace, e.g., acs-system
  namespace: acs-system
spec:
  # Marks this as the Velero default storage location; set to true to share one bucket with Velero backups
  default: true
  provider: aws
  config:
    # region - object storage region; for AlibabaCloud OSS, fill in the corresponding region ID, e.g., cn-shanghai
    region: minio
    # s3ForcePathStyle - must be 'true' for MinIO; set to 'false' for AlibabaCloud OSS, which is fully S3-compatible
    s3ForcePathStyle: "true"
    # S3 service URL, e.g., http://minio-minio-svc.default:9000 ; AlibabaCloud OSS S3 endpoints documentation: https://help.aliyun.com/document_detail/31837.htm?spm=a2c4g.11186623.0.0.3cc223c94uOcwY#concept-zt4-cvy-5db
    s3Url: <s3 URL>
  objectStorage:
    # bucket name
    bucket: <bucket>
  credential:
    key: cloud
    # name of the Secret that holds the credential
    name: opcc-default-storage
---
apiVersion: v1
kind: Secret
metadata:
  name: opcc-default-storage
  # ADP base control-plane namespace, e.g., acs-system
  namespace: acs-system
# stringData accepts plain-text values; values under data would have to be base64-encoded
stringData:
  AWS_ACCESS_KEY_ID: <access key ID>
  AWS_SECRET_ACCESS_KEY: <secret>
  # AWS_SESSION_TOKEN: <session token>
  cloud: |
    [default]
    aws_access_key_id=<access key ID>
    aws_secret_access_key=<secret>
    # aws_session_token=<session token>
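Kubernetes expects values under a Secret's `data` field to be base64-encoded, while `stringData` accepts plain text. The following is a minimal sketch of rendering the `cloud` credentials entry and encoding it for use under `data`. The helper names and the `EXAMPLE_*` values are placeholders introduced for illustration and are not part of ADP or Velero.

```python
import base64


def render_cloud_credentials(access_key_id: str, secret_access_key: str) -> str:
    """Render the AWS-style credentials file stored under the Secret's `cloud` key."""
    return (
        "[default]\n"
        f"aws_access_key_id={access_key_id}\n"
        f"aws_secret_access_key={secret_access_key}\n"
    )


def encode_for_secret_data(plain_text: str) -> str:
    """Base64-encode a plain-text value so it can be placed under `data`."""
    return base64.b64encode(plain_text.encode("utf-8")).decode("ascii")


# Placeholder credentials; substitute real values from your object storage.
cloud = render_cloud_credentials("EXAMPLE_KEY_ID", "EXAMPLE_SECRET")
encoded = encode_for_secret_data(cloud)
print(encoded)
```

The printed string can be pasted as the value of `cloud` under `data`; with `stringData`, the unencoded text is used directly.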
Create a policy-based data backup - Backup
# Data backup https://velero.io/docs/v1.7/api-types/backup/
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: myapp-snap-202111051600
  # ADP base control-plane namespace, e.g., acs-system
  namespace: acs-system
  annotations:
    # Binds this resource to the ADP base OPCC Policy Backup watcher
    "opcc.cnx.aliyun-inc.com/bound": "true"
spec:
  includedNamespaces:
  - '*'
  includedResources:
  - '*'
  excludedResources:
  - storageclasses.storage.k8s.io
  # Individual objects must match this label selector to be included in the backup. Optional.
  labelSelector:
    matchLabels:
      'adp.aliyuncs.com/application-name': adp
  snapshotVolumes: true
  # Where to store the tarball and logs.
  storageLocation: opcc-default
  ttl: 24h0m0s
status:
  # The version of this Backup. The only version supported is 1.
  version: 1
  # The date and time when the Backup is eligible for garbage collection.
  expiration: null
  # The current phase. Valid values are New, FailedValidation, InProgress, Completed, PartiallyFailed, Failed.
  phase: ""
  # An array of any validation errors encountered.
  validationErrors: null
  # Date/time when the backup started being processed.
  startTimestamp: 2019-04-29T15:58:43Z
  # Date/time when the backup finished being processed.
  completionTimestamp: 2019-04-29T15:58:56Z
  # Number of volume snapshots that Velero tried to create for this backup.
  volumeSnapshotsAttempted: 2
  # Number of volume snapshots that Velero successfully created for this backup.
  volumeSnapshotsCompleted: 1
  # Number of warnings that were logged by the backup.
  warnings: 2
  # Number of errors that were logged by the backup.
  errors: 0
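The `ttl` value uses Go duration syntax (`24h0m0s`), and `status.expiration` records when the backup becomes eligible for garbage collection. As a rough sketch of how the two relate, the snippet below parses the hour/minute/second subset of that syntax and adds it to a creation time. It assumes expiration is simply creation time plus `ttl`; the controller computes the real value. The `snapshot_name` helper only mirrors the timestamp-style name used in the example manifest and is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per unit for the h/m/s subset of Go duration syntax used by `ttl`.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_ttl(ttl: str) -> timedelta:
    """Parse a duration such as '24h0m0s' into a timedelta (h, m, s units only)."""
    parts = re.findall(r"(\d+)([hms])", ttl)
    # Reject input that contains anything besides <integer><unit> groups.
    if not parts or "".join(number + unit for number, unit in parts) != ttl:
        raise ValueError(f"unsupported ttl: {ttl!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def snapshot_name(app: str, created: datetime) -> str:
    """Hypothetical naming helper: <app>-snap-<YYYYMMDDHHMM>."""
    return f"{app}-snap-{created.strftime('%Y%m%d%H%M')}"


created = datetime(2021, 11, 5, 16, 0, tzinfo=timezone.utc)
# Assumption: expiration = creation time + ttl.
expiration = created + parse_ttl("24h0m0s")
print(snapshot_name("myapp", created), expiration.isoformat())
```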
Create a scheduled policy-based backup - Schedule
# Scheduled data backup https://velero.io/docs/v1.7/api-types/schedule/
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: myapp-schedule
  # ADP base control-plane namespace, e.g., acs-system
  namespace: acs-system
  annotations:
    # Binds this resource to the ADP base OPCC Policy Backup watcher
    "opcc.cnx.aliyun-inc.com/bound": "true"
spec:
  # schedule - CRON expression that defines the backup period
  schedule: 0 7 * * *
  # template - Backup spec template
  template:
    includedNamespaces:
    - '*'
    includedResources:
    - '*'
    excludedResources:
    - storageclasses.storage.k8s.io
    # Individual objects must match this label selector to be included in the backup. Optional.
    labelSelector:
      matchLabels:
        'adp.aliyuncs.com/application-name': adp
    snapshotVolumes: true
    # Where to store the tarball and logs.
    storageLocation: opcc-default
    ttl: 24h0m0s
status:
  # phase - current phase of the Schedule. Valid values are New, Enabled, FailedValidation.
  phase: ""
  # lastBackup - time of the most recent backup
  lastBackup:
  # validationErrors - validation error messages
  validationErrors: []
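Because `spec.template` of a Schedule carries the same fields as a Backup `spec`, an existing backup definition can be reused when setting up a schedule. The sketch below wraps a Backup spec, held as a plain dictionary, in a Schedule manifest. The function name is hypothetical, and the namespace and annotation are copied from the examples in this document.

```python
import json


def schedule_from_backup_spec(name: str, cron: str, backup_spec: dict) -> dict:
    """Wrap an existing Backup spec in a Schedule manifest (as a plain dict)."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {
            "name": name,
            "namespace": "acs-system",
            "annotations": {"opcc.cnx.aliyun-inc.com/bound": "true"},
        },
        # The Backup spec becomes the Schedule's template unchanged.
        "spec": {"schedule": cron, "template": dict(backup_spec)},
    }


backup_spec = {
    "includedNamespaces": ["*"],
    "includedResources": ["*"],
    "excludedResources": ["storageclasses.storage.k8s.io"],
    "labelSelector": {"matchLabels": {"adp.aliyuncs.com/application-name": "adp"}},
    "snapshotVolumes": True,
    "storageLocation": "opcc-default",
    "ttl": "24h0m0s",
}
schedule = schedule_from_backup_spec("myapp-schedule", "0 7 * * *", backup_spec)
print(json.dumps(schedule, indent=2))
```

The JSON output can be saved to a file and applied with `kubectl apply -f`, which accepts JSON manifests as well as YAML.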
Create a policy-based data restore - Restore
# Data restore https://velero.io/docs/v1.7/api-types/restore/
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: myapp-snap-202111051600
  # Restore namespace. Must be the namespace of the Velero server. Required.
  namespace: acs-system
  annotations:
    # Binds this resource to the OPCC Policy Backup watcher
    "opcc.cnx.aliyun-inc.com/bound": "true"
spec:
  # BackupName is the unique name of the Velero backup to restore from.
  backupName: myapp-snap-202111051600
  # NamespaceMapping is a map of source namespace names to
  # target namespace names to restore into. Any source namespaces not
  # included in the map will be restored into namespaces of the same name.
  namespaceMapping:
    namespace-backup-from: namespace-to-restore-to
# RestoreStatus captures the current status of a Velero restore. Users should not set any data here.
status:
  # The current phase. Valid values are New, FailedValidation, InProgress, Completed, PartiallyFailed, Failed.
  phase: ""
  # An array of any validation errors encountered.
  validationErrors: null
  # Number of warnings that were logged by the restore.
  warnings: 2
  # Errors is a count of all error messages that were generated
  # during execution of the restore. The actual errors are stored in object
  # storage.
  errors: 0
  # FailureReason is an error that caused the entire restore
  # to fail.
  failureReason:
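The `namespaceMapping` rule described in the comments above (mapped namespaces are redirected, all others keep their name) can be illustrated with a short sketch. The function below is purely illustrative and is not part of Velero; `other-namespace` is a made-up name.

```python
def target_namespace(source_namespace: str, namespace_mapping: dict) -> str:
    """Return the namespace a resource is restored into.

    Namespaces present in the mapping are redirected; any other
    namespace is restored under its original name.
    """
    return namespace_mapping.get(source_namespace, source_namespace)


mapping = {"namespace-backup-from": "namespace-to-restore-to"}
print(target_namespace("namespace-backup-from", mapping))
print(target_namespace("other-namespace", mapping))
```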
Create a workload data backup - BackupJob
apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: BackupJob
metadata:
  name: <backupJobName>
  namespace: <instanceNS>
spec:
  instanceName: <instanceName>
  instanceKind: <instanceKind>
  instanceAPIGroup: <apiGroup>
  instanceAPIVersion: <apiVersion>
  # repo - the storage repo in which the backup files are stored
  repo:
  # ttl - optional retention period for the backup, e.g., 24h0m0s
  ttl:
status:
  # phase - The current phase. Valid values are New, InProgress, Completed, FailedValidation, PartiallyFailed, Failed, Deleting
  phase:
  # expiration - optional expiration timestamp
  expiration: "<RFC3339-Timestamp>"
  # validationErrors - An array of any validation errors encountered.
  validationErrors: []
  # startTimestamp - optional start timestamp
  startTimestamp: "<RFC3339-Timestamp>"
  # completionTimestamp - optional completion timestamp
  completionTimestamp: "<RFC3339-Timestamp>"
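A Kubernetes `apiVersion` string has the form `<group>/<version>`, or just `<version>` for the core group. The sketch below fills a BackupJob `spec` from such a string, assuming that `instanceAPIGroup` and `instanceAPIVersion` correspond to those two halves; the source does not confirm this, so verify it against the CRD before relying on it. The instance name `my-db` and both helper functions are hypothetical.

```python
def split_api_version(api_version: str) -> tuple:
    """Split 'apps/v1' into ('apps', 'v1'); the core group 'v1' gives ('', 'v1')."""
    group, _, version = api_version.rpartition("/")
    return group, version


def backup_job_spec(name: str, kind: str, api_version: str, repo: str, ttl=None) -> dict:
    """Build a BackupJob spec dict. Assumes the group/version split described above."""
    group, version = split_api_version(api_version)
    spec = {
        "instanceName": name,
        "instanceKind": kind,
        "instanceAPIGroup": group,
        "instanceAPIVersion": version,
        "repo": repo,
    }
    # ttl is optional, so it is only emitted when provided.
    if ttl is not None:
        spec["ttl"] = ttl
    return spec


spec = backup_job_spec("my-db", "StatefulSet", "apps/v1", "opcc-default", "24h0m0s")
print(spec)
```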
Create a scheduled workload data backup - ScheduledBackup (kind: BackupCronJob)
Scheduled backup: creates a 'backupjobs.opcc.cnx.aliyun-inc.com' CR at the times specified by the CRON expression, and lets you specify how many backups to retain and when they expire.
apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: BackupCronJob
metadata:
  name: <backupCronjobName>
  namespace: <instanceNS>
spec:
  instanceName: <instanceName>
  instanceKind: <instanceKind>
  instanceAPIGroup: <apiGroup>
  instanceAPIVersion: <apiVersion>
  # schedule - Cron expression in UTC, e.g., 3 AM daily in the Asia/Shanghai time zone is '0 19 * * *'. The start time is approximate: the backup task manager decides when a run actually starts, so it may not match the cron expression exactly
  schedule:
  # ttl - retention period for the backup, e.g., 1y5m7d2h. The minimum duration is 1h
  ttl:
  # repo - optional storage repository name; if not provided, the `acs-system/default` BackupStorageLocation is used
  repo:
status:
  # lastBackup - last backup timestamp
  lastBackup: "<RFC3339-Timestamp>"
  # phase - The current phase. Valid values are New, Enabled, FailedValidation.
  phase: Enabled
  # validationErrors - An array of any validation errors encountered.
  validationErrors: []
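Since `schedule` is interpreted in UTC, a local wall-clock time has to be shifted before it goes into the cron expression. The sketch below does that for a daily job. It handles whole-hour UTC offsets only and ignores daylight saving time, which is sufficient for Asia/Shanghai (UTC+8, no DST); the function name is hypothetical.

```python
def daily_cron_in_utc(local_hour: int, local_minute: int, utc_offset_hours: int) -> str:
    """Return a daily cron expression in UTC for a given local wall-clock time."""
    if not (0 <= local_hour <= 23 and 0 <= local_minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Subtract the offset and wrap around midnight.
    utc_hour = (local_hour - utc_offset_hours) % 24
    return f"{local_minute} {utc_hour} * * *"


# 3 AM in Asia/Shanghai (UTC+8) is 19:00 UTC on the previous day.
print(daily_cron_in_utc(3, 0, 8))
```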
Create a workload data restore - RestoreJob
apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: RestoreJob
metadata:
  name: <snapshotID>
  namespace: <instanceNS>
spec:
  instanceName: <instanceName>
  instanceKind: <instanceKind>
  instanceAPIGroup: <apiGroup>
  instanceAPIVersion: <apiVersion>
  restoreFrom:
    # type - backup or snapshot contents to restore from. Valid values are backupset, volumesnapshot.
    type:
    # name - associated BackupJob name or SnapshotJob name
    name:
status:
  # The current phase. Valid values are New, FailedValidation, InProgress, Completed, PartiallyFailed, Failed.
  phase: ""
  # An array of any validation errors encountered.
  validationErrors: []
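The `restoreFrom` block accepts only two `type` values, so it can be worth checking the input before submitting a RestoreJob. The following sketch performs that check on the client side. The function and the example name `my-backup-job` are hypothetical; the controller's own validation, reported through `status.validationErrors`, remains authoritative.

```python
# Values listed in the RestoreJob spec comment for restoreFrom.type.
VALID_RESTORE_SOURCES = {"backupset", "volumesnapshot"}


def restore_from(source_type: str, source_name: str) -> dict:
    """Build the restoreFrom block, rejecting unknown types and empty names."""
    if source_type not in VALID_RESTORE_SOURCES:
        raise ValueError(
            f"type must be one of {sorted(VALID_RESTORE_SOURCES)}, got {source_type!r}"
        )
    if not source_name:
        raise ValueError("name of the BackupJob or SnapshotJob is required")
    return {"type": source_type, "name": source_name}


print(restore_from("backupset", "my-backup-job"))
```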