Import and migration (MongoDB-compatible)

After deploying a MongoDB-compatible PolarDB for PostgreSQL Lightweight Edition cluster, you need to migrate data from your existing MongoDB database. This topic covers two methods—online hot migration and offline restoration—to help you choose the best one for your scenario.

Choose a migration method

PolarDB provides the dsync tool for online migration and the mongorestore tool for offline restoration. The following table compares the two methods to help you choose the best migration plan based on your needs for downtime, data volume, and complexity.

| Comparison item | dsync online migration | mongorestore offline restoration |
| --- | --- | --- |
| Migration type | Online hot migration (full + incremental synchronization) | Offline cold migration (point-in-time recovery based on a backup) |
| Business downtime | Minutes. Business writes are interrupted only for a short time during the final cutover. | Hours or longer. The downtime is the total time required for backup and restoration. |
| Data consistency | High. Data can be synchronized up to the last second before the cutover, with no data loss. | Point-in-time consistency. After restoration, data generated after the backup was created is lost. |
| Core advantage | Minimizes business disruption, making it ideal for smooth migration of production systems. | Simple and straightforward, suitable for development and test environments, or scenarios where extended downtime is acceptable. |
| Source database requirements | Requires an active-passive architecture. | No special requirements. The mongodump command must be executable. |
| Recommended scenarios | Production environments that require high business continuity. Migrating dynamic databases with continuous write operations. | Data initialization for development and test environments. Migrating static or read-only databases. Scenarios where a long maintenance window is acceptable. |

For production systems that require high business continuity, online migration using dsync is the recommended method. For all other scenarios, mongorestore provides a simpler, more direct alternative.

Online migration with dsync

The dsync tool performs both full and incremental data synchronization from a source MongoDB database to a destination PolarDB for PostgreSQL Lightweight Edition cluster. This automated process minimizes business downtime.

Prerequisites

Before you begin, ensure that your environment meets the following requirements:

  • Source MongoDB instance

    1. The instance must use an active-passive (replica set) architecture.

    2. The connection endpoint for data synchronization must be the primary node. To confirm this, connect to the source instance using mongosh and run the rs.status() command. In the output, verify that the stateStr field for the connected node is PRIMARY.

    3. Prepare a database account that has the root or readAnyDatabase role so that dsync can read data.

  • Destination PolarDB for PostgreSQL Lightweight Edition cluster

    1. A MongoDB-compatible cluster must be deployed. If you have not deployed a cluster, see Installation and Deployment (MongoDB-Compatible).

    2. Prepare the MongoDB protocol connection string and a high-privilege account for the destination cluster, such as the admin account created during installation.

  • Install the dsync tool

    1. To obtain the RPM package for the dsync tool, submit a ticket.

    2. After you decompress the package, run the following command with root permissions to install:

      sudo rpm -ivh t-polardb-pg-dsync-xxx.an8.x86_64.rpm

    3. Run the following commands to verify the installation. By default, dsync is installed in the /u01/dsync/ directory.

      cd /u01/dsync/
      ./dsync --version

      If a version number is returned, the installation is successful. For example:

      dsync version 0.15-beta (git commit dd1c8xxxx)
      dsync exited successfully
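The primary-node check from the source prerequisites can also be scripted. The following is a minimal sketch, not part of the dsync tooling: the function name is_source_primary is hypothetical, mongosh is assumed to be on the PATH, and the check reads db.hello().isWritablePrimary (available on MongoDB 5.0 and later; older servers expose db.isMaster().ismaster) instead of inspecting stateStr in the rs.status() output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the node behind the URI is a writable primary.
# Returns 0 for a primary, 1 for any other state, 2 if mongosh itself fails.
is_source_primary() {
  local uri="$1" answer
  # --quiet suppresses the connection banner so only the evaluated value is printed.
  answer="$(mongosh "$uri" --quiet --eval 'db.hello().isWritablePrimary')" || return 2
  [ "$answer" = "true" ]
}

# Example call (commented out so that sourcing this file has no side effects):
# is_source_primary "mongodb://<src_username>:<src_password>@<src_ip>:<src_port>/" \
#   && echo "source endpoint is the primary"
```

Running this check just before starting dsync helps catch the case where the endpoint points at a secondary node after a failover.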

Procedure

Step 1: Optimize destination cluster configuration

To improve the performance of the full data import, adjust the write batch size of the destination cluster before migration.

  1. Connect to the destination PolarDB for PostgreSQL Lightweight Edition cluster.

    # Replace localhost with the IP address of your database host.
    PGPASSWORD=postgres /u01/polardb_pg/bin/psql -h localhost -p 1523 -U admin -d admin_db

    Note

    Check the port parameter in the postgresql.conf file to find the database's PostgreSQL protocol port.

  2. Run the following SQL commands to modify the parameter and apply the change. Ensure that you run the commands with an account that has superuser permissions, such as the admin account.

    ALTER SYSTEM SET documentdb.maxWriteBatchSize=100000;
    SELECT pg_reload_conf();

  3. Run the SHOW documentdb.maxWriteBatchSize; command to confirm that the return value is 100000.
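The three steps above can also be run non-interactively. The following is a sketch under stated assumptions: the function name set_write_batch_size and the PSQL variable are hypothetical, the account and database names are taken from the connection example above, and the password is expected to be supplied through PGPASSWORD or a .pgpass file.

```shell
#!/usr/bin/env bash
# Path to psql as used in this topic; override PSQL if your installation differs.
PSQL="${PSQL:-/u01/polardb_pg/bin/psql}"

# Hypothetical helper: set documentdb.maxWriteBatchSize, reload, and verify the value.
# Arguments: host, PostgreSQL protocol port, desired batch size.
set_write_batch_size() {
  local host="$1" port="$2" size="$3" current
  # Separate -c options are sent as separate commands, which ALTER SYSTEM needs
  # because it cannot run inside a multi-statement transaction block.
  "$PSQL" -X -q -h "$host" -p "$port" -U admin -d admin_db \
    -c "ALTER SYSTEM SET documentdb.maxWriteBatchSize=$size" \
    -c "SELECT pg_reload_conf()" >/dev/null || return 1
  # -A -t prints the bare value without column headers or alignment.
  current="$("$PSQL" -X -A -t -h "$host" -p "$port" -U admin -d admin_db \
    -c "SHOW documentdb.maxWriteBatchSize")" || return 1
  [ "$current" = "$size" ]
}

# Example call (commented out):
# PGPASSWORD=postgres set_write_batch_size localhost 1523 100000 && echo "batch size applied"
```

If the verification fails immediately after the reload, rerun the SHOW command after a short pause before concluding that the change was not applied.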

Step 2: Start the migration

The dsync tool uses environment variables to configure connection information for the source and destination databases. After starting, it automatically performs full and incremental synchronization.

  1. Set the environment variables to configure the connection strings (URIs) for the source and destination databases.

    # Set the connection string for the source MongoDB instance.
    export MDB_SRC='mongodb://<src_username>:<src_password>@<src_ip>:<src_port>/<src_db>'
    # Set the connection string for the destination PolarDB instance.
    export FERRETDB_DEST='mongodb://<dest_username>:<dest_password>@<dest_ip>:<dest_port>/<dest_db>'

    Parameter description

    | Parameter | Description |
    | --- | --- |
    | src_username | The source database account, which must have root or readAnyDatabase permissions. |
    | src_password | The password for the source database account. |
    | src_ip | The IP address of the primary node of the source database. |
    | src_port | The port of the source database. |
    | src_db | The name of the source database to migrate. |
    | dest_username | The destination database account. You can use a high-privilege account such as admin. |
    | dest_password | The password for the destination database account. |
    | dest_ip | The IP address of the destination database. |
    | dest_port | The MongoDB protocol port of the destination database. The default is 27030. |
    | dest_db | The name of the target database in the destination cluster. |

    Example

    export MDB_SRC='mongodb://readAnyDatabase:password123@127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.5.0'
    export FERRETDB_DEST='mongodb://test_user:superuser_20250530@127.0.0.1:27030/postgres'

  2. Start the dsync process. We recommend using the --progress parameter to monitor the progress.

    ./dsync --progress --logfile dsync.log "$MDB_SRC" "$FERRETDB_DEST"

    Progress report fields

    | Field | Description | Example |
    | --- | --- | --- |
    | Namespace | The collection being migrated. | .ycsb.new_users: the new_users collection in the ycsb database. |
    | Percent complete | The completion progress of the collection. | 100%: the full synchronization is complete. |
    | Tasks completed | The status of the chunked tasks for the full migration. | 5/21: 5 of 21 tasks are complete. |
    | Docs synced | The number of documents synchronized for the collection. | 282132: 282,132 documents have been synchronized. |
    | Throughput | The number of documents synchronized per second. | - |

    Note

    To prevent an SSH session interruption from stopping the migration, run the dsync command in a terminal multiplexer such as tmux or screen. For example:

    tmux new -s mysession
    export MDB_SRC='mongodb://readAnyDatabase:password123@127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.5.0'
    export FERRETDB_DEST='mongodb://test_user:superuser_20250530@127.0.0.1:27030/postgres'
    ./dsync --progress --logfile dsync.log "$MDB_SRC" "$FERRETDB_DEST"

    If you created a session by using the tmux new -s mysession command, but the window timed out and closed, you can reattach to it by using the tmux attach -t mysession command.
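When you assemble the connection strings for MDB_SRC and FERRETDB_DEST, note that reserved characters in a username or password (for example @, :, / or %) must be percent-encoded in a MongoDB connection string. The following sketch shows one way to do this in the shell. The helper names urlencode and build_mongo_uri are hypothetical, and the encoder assumes ASCII input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode every character outside the RFC 3986
# unreserved set (letters, digits, '.', '_', '~', '-'). ASCII input is assumed.
urlencode() {
  local s="$1" out="" c
  while [ -n "$s" ]; do
    c="${s%"${s#?}"}"   # first character of the remaining string
    s="${s#?}"          # drop that character
    case "$c" in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: build a URI from user, password, host, port, and database.
build_mongo_uri() {
  printf 'mongodb://%s:%s@%s:%s/%s' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$5"
}

# Example call (commented out; the password here is a made-up placeholder):
# export FERRETDB_DEST="$(build_mongo_uri test_user 'p@ss:word' 127.0.0.1 27030 postgres)"
```

Building the URI this way keeps the raw password out of the hand-edited string, so a stray @ or / in the password cannot be misread as a host or path separator.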

Step 3: Monitor the migration progress

After dsync starts, a real-time progress report appears in the terminal. The migration process has two stages:

  1. Full Synchronization (InitialSync)

    This stage copies all existing data from the source database. You can view the synchronization progress for each collection.

    Dsync Progress Report : InitialSync
    Time Elapsed: 00:00:28    4/5 Namespaces synced      Docs Synced: 360935
    Namespace                   Percent Complete       Tasks Completed          Docs Synced          Throughput: Docs/s
    .test.books                 100%                   1/1                      4                    0
    .test_db.test_collection    100%                   1/1                      1                    0
    .ycsb.new_users             100%                   1/1                      3                    0
    .ycsb.usertable_dtstest     38%                    9/21 (3 active)          360924               2183
    .ycsb.users                 100%                   1/1                      3                    0
    [##############################                                              ] 38.10%      2183.14 docs/sec
    Error Logs

  2. Incremental Synchronization (Change Stream)

    After the full synchronization is complete, dsync automatically enters the incremental synchronization stage. It continuously captures and applies data changes (inserts, deletes, and updates) from the source database.

    Dsync Progress Report : ChangeStream
    Time Elapsed: 00:04:26      5/5 Namespaces synced
    Processing change stream events
    Change Stream Events- 8614     Deletes Caught- 0      Events to catch up: 1
    [----------------------------------------------------------------------------->>>---] 365.69 events/sec
    Error Logs
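If you capture the InitialSync report as text, a small filter can list the collections that are still being copied. This is a sketch, not a dsync feature: the function name pending_namespaces is hypothetical, and it assumes the captured text has the same column layout as the sample report above (for example, text obtained with tmux capture-pane -p -t mysession).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an InitialSync report on stdin and print every
# namespace whose "Percent Complete" column is below 100%.
pending_namespaces() {
  awk '
    # Data rows start with a dot-prefixed namespace followed by a percentage.
    $1 ~ /^\./ && $2 ~ /^[0-9]+%$/ {
      pct = $2
      sub(/%/, "", pct)
      if (pct + 0 < 100) print $1, pct "%"
    }
  '
}

# Example call (commented out):
# tmux capture-pane -p -t mysession | pending_namespaces
```

For the sample report shown above, the only line printed would be the one for .ycsb.usertable_dtstest at 38%.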

Step 4: (Optional) Data validation

Before the cutover, you can perform data validation to ensure that the data in the source and destination databases is identical.

Important
  • Stop writes to the source database before performing data validation.

  • The validation process scans entire collections and consumes significant resources. As a lightweight alternative, you can first run the db.collection.countDocuments() command to quickly compare the document counts between the two databases.

To perform a full data validation, first stop (Ctrl + C) the running dsync process, and then restart it with the --verify parameter:

# The --verify parameter indicates a full data validation. Note that the validation command must be run separately.
./dsync --verify --progress --logfile dsync.log "$MDB_SRC" "$FERRETDB_DEST"

If the verification results for all collections are OK, this indicates that the data is completely consistent.
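The lightweight document-count comparison mentioned in the note above can be scripted as follows. This is a sketch under stated assumptions: the helper names count_docs and compare_counts are hypothetical, mongosh is assumed to be on the PATH, and MDB_SRC and FERRETDB_DEST are the connection strings exported in Step 2. Matching counts do not prove that document contents are identical, so this does not replace --verify.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of documents in one collection.
# Arguments: connection URI, database name, collection name.
# Names are interpolated into JavaScript, so they must not contain quotes.
count_docs() {
  mongosh "$1" --quiet \
    --eval "db.getSiblingDB('$2').getCollection('$3').countDocuments({})"
}

# Hypothetical helper: compare source and destination counts for one collection.
# Returns 0 if they match, 1 if they differ, 2 if a query fails.
compare_counts() {
  local db="$1" coll="$2" src dst
  src="$(count_docs "$MDB_SRC" "$db" "$coll")" || return 2
  dst="$(count_docs "$FERRETDB_DEST" "$db" "$coll")" || return 2
  printf '%s.%s source=%s destination=%s\n' "$db" "$coll" "$src" "$dst"
  [ "$src" = "$dst" ]
}

# Example call (commented out):
# compare_counts ycsb new_users || echo "counts differ, investigate before cutover"
```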

Step 5: Cutover

When you have confirmed that the data synchronization is complete and validated, you can perform the final cutover.

  1. Stop application writes to the source MongoDB instance.

  2. Watch the dsync incremental synchronization (ChangeStream) progress report. When the Change Stream Events count stops increasing, all changes have been synchronized to the destination database.

  3. Press Ctrl + C to stop the dsync process.

    Important

    Do not shut down the source MongoDB instance until the dsync process has completely exited. Otherwise, the synchronization state may be lost or an error may occur.

  4. Update your application's configuration to use the connection string for the destination PolarDB for PostgreSQL Lightweight Edition cluster.

  5. Start your application to complete the migration.
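The check in step 2 of the cutover, that the Change Stream Events count has stopped increasing, can be approximated by comparing two captures of the progress report taken some time apart. The following is a sketch only: the helper names are hypothetical, the parsing assumes the "Change Stream Events- N" text shown in the sample report in Step 3, and the captures are assumed to come from a command such as tmux capture-pane -p -t mysession.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a captured ChangeStream report on stdin and print
# the last "Change Stream Events" counter found in it.
change_stream_events() {
  sed -n 's/.*Change Stream Events- *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Hypothetical helper: succeed if two captures show the same event counter.
# Arguments: first capture text, second capture text.
events_settled() {
  local first second
  first="$(printf '%s\n' "$1" | change_stream_events)"
  second="$(printf '%s\n' "$2" | change_stream_events)"
  # An empty counter means the report was not found, which is not "settled".
  [ -n "$first" ] && [ "$first" = "$second" ]
}

# Example call (commented out; the 30-second interval is an arbitrary choice):
# a="$(tmux capture-pane -p -t mysession)"; sleep 30
# b="$(tmux capture-pane -p -t mysession)"
# events_settled "$a" "$b" && echo "no new change stream events"
```

Treat a positive result as a hint rather than proof, and still confirm on the progress report itself that writes to the source have stopped before you stop dsync.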

Offline restoration with mongorestore

To restore data from a backup file, use the official MongoDB mongorestore tool. This is an offline operation.

  1. Have the backup directory created by mongodump ready.

  2. Run the following command to restore data to the destination PolarDB for PostgreSQL Lightweight Edition cluster.

    mongorestore --uri="mongodb://<user>:<password>@<dest_ip>:<dest_port>/<dest_db>" /path/to/backup_dir/

    Example:

    mongorestore --uri="mongodb://admin:postgres@10.0.0.1:27030/ycsb" /root/ycsb/

    The output ends with a summary line similar to the following, which indicates that 15 documents were imported and none failed.

    ...
    2025-08-25T11:05:20.505+0800    15 document(s) restored successfully. 0 document(s) failed to restore.

    Note

    Security tip: Providing the password directly in the --uri parameter poses a security risk. To prevent password exposure, we recommend omitting the password from the command and entering it when prompted.
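The backup and restoration steps can be chained in one script. The following is a minimal sketch, not an official procedure: the function name dump_and_restore is hypothetical, and it assumes that the source URI names the database to export, so that mongodump writes its files to a subdirectory with that database name under the --out directory. The restore call then follows the same shape as the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export one database with mongodump, then restore it
# into the destination cluster with mongorestore.
# Arguments: source URI, destination URI, database name, backup root directory.
dump_and_restore() {
  local src_uri="$1" dest_uri="$2" db="$3" backup_root="$4"
  # mongodump creates one subdirectory per dumped database under --out.
  mongodump --uri="$src_uri" --out="$backup_root" || return 1
  # Restore the dumped database directory into the database named in dest_uri.
  mongorestore --uri="$dest_uri" "$backup_root/$db/"
}

# Example call (commented out; omit the password to be prompted, as the note recommends):
# dump_and_restore "mongodb://<user>@<src_ip>:<src_port>/ycsb" \
#   "mongodb://<user>@<dest_ip>:<dest_port>/ycsb" ycsb /root/backup
```

Because the function stops if mongodump fails, a partial or missing backup is never restored over the destination database.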

FAQ

dsync: Do I need to manually create collections in the destination database during a migration?

No. dsync automatically creates all collections, indexes, and data from the source database in the destination database.

dsync: What happens if the source or destination database restarts during the migration process?

  • Destination database restart: This does not affect the migration. dsync records synchronization checkpoints and automatically resumes from the last checkpoint after the destination database recovers, so no data is lost.

  • Source database restart: dsync attempts to reconnect. However, if the source database is down for too long and its oplog is overwritten, dsync cannot continue the incremental synchronization and will exit. In this case, you must restart the dsync process after the source database recovers.