Restore downloaded cloud disk backup data to a self-managed MongoDB database

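This topic describes how to use mongorestore to restore the cloud disk backup set files of an ApsaraDB for MongoDB instance to a self-managed MongoDB database.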

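Background

MongoDB provides a pair of official backup and restore tools: mongodump and mongorestore. Logical backups of ApsaraDB for MongoDB are generated by mongodump. To restore a logical backup to a self-managed MongoDB database, use mongorestore.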

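Usage notes

MongoDB is updated continuously, and older versions of mongorestore are not compatible with newer versions of MongoDB. Choose a mongorestore version that is compatible with your MongoDB version. For guidance on choosing a version, see mongorestore.
Even if a collection contains very little data and has only one BSON file, such as myDatabase/myCollection/data/myCollection_0_part0.bson, you must still merge or rename the BSON file, because mongorestore takes the file name prefix into account when it processes BSON files.
For empty collections whose schema is retained, the cloud disk backup download also produces an empty BSON file that carries the database and collection names. mongorestore handles such empty files correctly.
For sharded cluster instances, the downloaded cloud disk backup files no longer contain the shard routing information, so the backup data can be restored to any standalone, replica set, or sharded cluster instance. If you want to restore to a sharded cluster instance, you must pre-shard the collections yourself.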
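The pre-sharding mentioned in the last note could look roughly like the sketch below. It only builds the standard enableSharding and shardCollection admin commands and sends them through a pymongo-style client that is assumed to be connected to a mongos. The function names, the database and collection names, and the hashed `_id` shard key are illustrative assumptions, not values taken from the backup; choose the shard key that fits your workload.

```python
def build_presharding_commands(db, collection, shard_key):
    """Build the admin commands that shard an (empty) collection before a restore.

    shard_key is an assumption supplied by the caller, for example {"_id": "hashed"}.
    """
    namespace = "{}.{}".format(db, collection)
    return [
        {"enableSharding": db},
        {"shardCollection": namespace, "key": shard_key},
    ]


def apply_presharding(client, db, collection, shard_key):
    """Send the commands through a pymongo-style client connected to a mongos.

    The client is assumed to expose client.admin.command(<command document>),
    as pymongo.MongoClient does.
    """
    results = []
    for command in build_presharding_commands(db, collection, shard_key):
        results.append(client.admin.command(command))
    return results
```

Run this before mongorestore so that the restored documents are distributed across shards as they are inserted.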

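Prerequisites

MongoDB, in the same version as the ApsaraDB for MongoDB instance, is downloaded and installed on the client where the self-managed MongoDB database resides (an on-premises server or an ECS instance). For installation instructions, see Install MongoDB.
The logical backup has been downloaded. If it has not, see Download backup files.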
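Before starting, it can help to confirm that the command-line tools used in this topic are available on the client. The following is a minimal sketch that only checks PATH; the function name and the default tool list are assumptions made for illustration.

```python
import shutil


def find_missing_tools(tools=("mongorestore", "zstd", "tar")):
    """Return the names of the tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]
```

An empty list means all tools were found. To compare versions, run `mongorestore --version` and check the result against the version of the source instance.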

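Procedure

Copy the downloaded backup file to the client where the self-managed MongoDB database resides (that is, the client on which mongorestore is installed).
Decompress the backup file package.
Backup files are downloaded in one of two formats, tar.zst or tar.gz, which use the zstd and gzip compression algorithms respectively. You can choose the download format with the UseZstd parameter of the CreateDownload API.
tar.zst (console download)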
```
zstd -d -c <tar.zst backup file> | tar -xvf - -C <extraction directory>
```
Make sure that the zstd tool is installed locally and that the extraction directory already exists.
Example:
```
mkdir -p ./download_test/test1
zstd -d -c test1.tar.zst | tar -xvf - -C /Users/xxx/Desktop/download_test/test1/
```
tar.gz (default format for OpenAPI downloads)
```
tar -zxvf <tar.gz backup file> -C <extraction directory>
```
Make sure that the extraction directory already exists.
Example:
```
mkdir -p ./download_test/test1
tar -zxvf testDB.tar.gz -C /Users/xxx/Desktop/download_test/test1/
```
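If you prefer to script the extraction, both formats can be handled from Python. The sketch below is one possible approach, not part of the official tooling: it uses the standard tarfile module for tar.gz and pipes the output of the zstd command into tarfile for tar.zst, and it creates the extraction directory if it is missing. The function name `extract_backup` is hypothetical.

```python
import os
import shutil
import subprocess
import tarfile


def extract_backup(archive_path, target_dir):
    """Extract a .tar.gz or .tar.zst backup archive into target_dir."""
    os.makedirs(target_dir, exist_ok=True)  # the shell commands require this directory to exist
    if archive_path.endswith(".tar.gz"):
        with tarfile.open(archive_path, "r:gz") as tar:
            tar.extractall(target_dir)
    elif archive_path.endswith(".tar.zst"):
        if shutil.which("zstd") is None:
            raise RuntimeError("zstd was not found on PATH")
        # Equivalent to: zstd -d -c <archive> | tar -xf - -C <target_dir>
        zstd = subprocess.Popen(["zstd", "-d", "-c", archive_path],
                                stdout=subprocess.PIPE)
        with tarfile.open(fileobj=zstd.stdout, mode="r|") as tar:
            tar.extractall(target_dir)
        zstd.stdout.close()
        if zstd.wait() != 0:
            raise RuntimeError("zstd exited with a non-zero status")
    else:
        raise ValueError("unsupported archive format: {}".format(archive_path))
```

For example, `extract_backup("testDB.tar.gz", "./download_test/test1")` mirrors the tar.gz example above.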

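Merge the BSON files.

On a device with a Python environment, save the following script as merge_bson_files.py. The script uses type annotations, so it requires Python 3.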

```
import argparse
import os
import re
import shutil
import struct
from typing import Optional

# Part files are named like <collection>_<n>_part<m>.bson
PART_FILE_PATTERN = re.compile(r'^.+_.+_part\d+\.bson$')


def natural_sort_key(filename: str) -> list:
    """Sort key that compares digit runs as integers, so part2 sorts before part10."""
    # re.split with a capture group puts the digit runs at the odd indexes.
    return [int(token) if index % 2 else token
            for index, token in enumerate(re.split(r'(\d+)', filename))]


def merge_single_bson_dir(input_dir: str, output_dir: str, namespace: str) -> None:
    """
    Merge the BSON part files in a single directory.

    Args:
        input_dir: Directory that contains the BSON part files.
        output_dir: Directory to write the merged file to.
        namespace: Name of the merged file, without the extension.
    """
    try:
        # Collect the part files and sort them in natural (numeric) order
        files = [f for f in os.listdir(input_dir) if PART_FILE_PATTERN.match(f)]
        files.sort(key=natural_sort_key)

        if not files:
            print("No matching .bson files found in {}".format(input_dir))
            return

        output_file = os.path.join(output_dir, "{}.bson".format(namespace))
        if os.path.exists(output_file):
            print("Output file {} already exists, skipping...".format(output_file))
            return

        print("Merging {} files into {}...".format(len(files), output_file))

        # Stream the documents from each part file into the output file
        total_files = len(files)
        with open(output_file, "wb") as out_f:
            for index, filename in enumerate(files, 1):
                file_path = os.path.join(input_dir, filename)
                print("  Processing file {}/{}: {}...".format(index, total_files, filename))

                try:
                    with open(file_path, "rb") as in_f:
                        while True:
                            # Each BSON document starts with its total size (int32, little-endian)
                            size_data = in_f.read(4)
                            if not size_data:
                                break  # clean end of file

                            doc_size = struct.unpack("<i", size_data)[0] if len(size_data) == 4 else 0
                            # The smallest valid document is 5 bytes; anything less is
                            # corrupt and would otherwise loop forever on a size of 0.
                            if doc_size < 5:
                                print("  Warning: invalid document size in {}, stopping".format(filename))
                                break

                            # Read the rest of the document (the size field is already consumed)
                            doc_body = in_f.read(doc_size - 4)
                            if len(doc_body) != doc_size - 4:
                                print("  Warning: truncated document in {}, stopping".format(filename))
                                break

                            out_f.write(size_data)
                            out_f.write(doc_body)
                except Exception as e:
                    print("Error reading {}: {}".format(filename, str(e)))
    except Exception as e:
        print("Error in merge_single_bson_dir: {}".format(str(e)))


def merge_bson_files_recursive(input_root: str, output_root: Optional[str] = None) -> None:
    """
    Walk <input_root>/<database>/<collection>/ and merge all BSON part files.

    Args:
        input_root: Root directory that contains the extracted backup.
        output_root: Root directory for the merged files. Defaults to input_root.
    """
    if output_root is None:
        output_root = input_root

    # Make sure the output root exists
    if not os.path.exists(output_root):
        os.makedirs(output_root)

    print("Scanning directories in {}...".format(input_root))

    # Each directory under the input root is a database
    for item in os.listdir(input_root):
        item_path = os.path.join(input_root, item)

        if os.path.isdir(item_path):
            print("Processing directory: {}".format(item))

            # Create the matching output directory
            output_item_path = os.path.join(output_root, item)
            if not os.path.exists(output_item_path):
                os.makedirs(output_item_path)

            # Each directory under a database is a collection
            for collection_name in os.listdir(item_path):
                sub_item_path = os.path.join(item_path, collection_name)
                # Skip plain files, such as merged .bson files left by an earlier in-place run
                if not os.path.isdir(sub_item_path):
                    continue
                for sub_item in os.listdir(sub_item_path):
                    data_path = os.path.join(sub_item_path, sub_item)
                    # The "data" directory holds the BSON part files to merge
                    if os.path.isdir(data_path) and sub_item == "data":
                        # The namespace is the collection directory name
                        namespace = os.path.basename(sub_item_path)
                        merge_single_bson_dir(data_path, output_item_path, namespace)
                    # Copy .metadata.json files to the output directory unchanged
                    elif sub_item.endswith(".metadata.json"):
                        target_file = os.path.join(output_item_path, sub_item)
                        shutil.copy(data_path, target_file)
                        print("Copied metadata file: {}".format(sub_item))
            print("Finished processing directory: {}".format(item))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Recursively merge BSON part files")
    parser.add_argument("input_root", help="Root directory that contains the BSON files")
    parser.add_argument("-o", "--output_root",
                        help="Root directory for the merged files (defaults to the input root)")

    args = parser.parse_args()
    merge_bson_files_recursive(args.input_root, args.output_root)
```

Run the command:

```
python merge_bson_files.py <input_directory> -o <output_directory>
```
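After merging, a quick sanity check is to count the documents in a merged file and compare the number with what you expect. The sketch below relies only on the BSON framing, where each document starts with its total size as a little-endian int32 and ends with a zero byte. The function name `count_bson_documents` is hypothetical.

```python
import struct


def count_bson_documents(path):
    """Count the documents in a .bson file by walking the length prefixes."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                return count  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated length prefix after {} documents".format(count))
            (size,) = struct.unpack("<i", header)
            if size < 5:  # an empty BSON document is 5 bytes long
                raise ValueError("invalid document size {} after {} documents".format(size, count))
            body = f.read(size - 4)
            # A complete document has size - 4 remaining bytes and ends with 0x00
            if len(body) != size - 4 or body[-1:] != b"\x00":
                raise ValueError("truncated or corrupt document after {} documents".format(count))
            count += 1
```

An empty merged file, as produced for an empty collection, yields a count of 0.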

Use mongorestore to restore the backup data to the database instance.

```
# Restore a single collection
mongorestore --uri=<mongodb-uri> --db <db> --collection <collection> <xxx.bson>
# Example: restore a single collection
mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --db testDB --collection coll1 ./testDB/coll1.bson
# Restore a single database
mongorestore --uri=<mongodb-uri> --db <db> --dir </path/to/bson/dir>
# Example: restore a single database
mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --db testDB --dir ./testDB
# Restore the whole instance
mongorestore --uri=<mongodb-uri> --dir </path/to/bson/dir>
# Example: restore the whole instance
mongorestore --uri='mongodb://127.x.x.x:27017/?authSource=admin' --dir ./
```
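To restore several databases one at a time, the single-database command can be generated per directory. The following sketch assumes the merged output layout of one directory per database and uses only the --uri, --db, and --dir options shown above. The function names and the dry-run default are illustrative choices.

```python
import os
import subprocess


def build_restore_command(uri, db_dir, db_name=None):
    """Build the argument list for a single-database mongorestore run."""
    # By default the database name is taken from the directory name
    db_name = db_name or os.path.basename(os.path.normpath(db_dir))
    return ["mongorestore", "--uri={}".format(uri), "--db", db_name, "--dir", db_dir]


def restore_databases(uri, merged_root, dry_run=True):
    """Build (and, unless dry_run is set, run) one command per database directory."""
    commands = []
    for name in sorted(os.listdir(merged_root)):
        db_dir = os.path.join(merged_root, name)
        if not os.path.isdir(db_dir):
            continue  # only directories are treated as databases
        command = build_restore_command(uri, db_dir, name)
        commands.append(command)
        if not dry_run:
            # Passing a list avoids shell quoting problems in the URI
            subprocess.run(command, check=True)
    return commands
```

With `dry_run=True` the function only returns the commands, so you can review them before running anything against the target instance.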

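Parameters:

<mongodb-uri>: the high-availability connection string (URI) of the self-managed or cloud MongoDB instance. The URI contains the username, the password, and the IP address and port of the server. For details, see the official MongoDB documentation.
<db>: the name of the database to restore.
<collection>: the name of the collection to restore.
<xxx.bson>: the backup BSON file to use for the single-collection restore.
</path/to/bson/dir>: the directory that contains the BSON files to restore.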
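Because the URI embeds the username and password, special characters such as `@`, `:`, or `/` in the credentials must be percent-encoded. The sketch below shows one way to assemble such a URI. The function name, the parameter names, and the sample values are assumptions for illustration.

```python
from urllib.parse import quote


def build_mongodb_uri(user, password, hosts, auth_source="admin", replica_set=None):
    """Assemble a mongodb:// URI with percent-encoded credentials.

    hosts is a list of "ip:port" strings; listing several members gives a
    high-availability address for a replica set.
    """
    credentials = "{}:{}".format(quote(user, safe=""), quote(password, safe=""))
    uri = "mongodb://{}@{}/?authSource={}".format(credentials, ",".join(hosts), auth_source)
    if replica_set:
        uri += "&replicaSet={}".format(replica_set)
    return uri
```

For example, `build_mongodb_uri("root", "p@ss:w/rd", ["127.0.0.1:27017"])` returns `mongodb://root:p%40ss%3Aw%2Frd@127.0.0.1:27017/?authSource=admin`.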

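FAQ

How do I restore data to a self-managed database if the instance type does not support downloading backup files?

You can use DTS to migrate the instance data to the self-managed database. For details, see the migration solutions for a self-managed MongoDB or ApsaraDB for MongoDB source.
Alternatively, use mongodump and mongorestore, the backup and restore tools that ship with MongoDB, to back up and restore the instance.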
