Considerations and limitations for MongoDB source migration


Before you configure a data migration task with Data Transmission Service (DTS), review the precautions and limitations for migrating data from a MongoDB source database, such as a self-managed MongoDB database or an ApsaraDB for MongoDB instance.

MongoDB source migration scenarios

Review the precautions and limitations for each migration scenario:

Migrate a standalone MongoDB to any architecture

Type

Description

Source database limitations

  • Bandwidth requirement: The source database server must have sufficient egress bandwidth. Otherwise, data migration speed is affected.

  • Collections to be migrated must have a primary key or unique constraint, and the constrained fields must contain unique values. Otherwise, duplicate data may be created in the destination database.

  • If you migrate data at the collection level and need to edit the collections, such as by mapping collection names, a single migration task can migrate a maximum of 1,000 collections. If you exceed this limit, the task reports an error upon submission. In this case, you should split the collections into multiple batches and configure a separate task for each batch, or configure a task to migrate the entire database.

  • A single document in the source database cannot exceed 16 MB. Otherwise, the migration task will fail.

  • Operational limitations on the source database:

    • During the schema migration and full data migration phases, do not perform schema changes on databases or collections, including updating data in arrays. Such changes can cause the migration to fail or lead to data inconsistency between the source and destination databases.

    • This migration scenario does not support incremental data migration. To ensure data consistency, do not write new data to the source MongoDB database during full data migration.

  • If a collection to be migrated contains a Time-to-Live (TTL) index, data inconsistency may occur or instance latency may increase.
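
The uniqueness requirement above can be checked before you configure the task. The following is a minimal sketch, not a DTS feature: it builds an aggregation pipeline that lists values of a candidate unique field that occur more than once. The function name, the collection name orders, and the field name orderNo are hypothetical.

```javascript
// Build an aggregation pipeline that lists values of `field` that occur
// more than once. `field` and `limit` are illustrative parameters.
function duplicateValuesPipeline(field, limit = 10) {
  return [
    // Group documents by the candidate unique field and count each group.
    { $group: { _id: "$" + field, count: { $sum: 1 } } },
    // Keep only the values shared by two or more documents.
    { $match: { count: { $gt: 1 } } },
    // Cap the output so the result stays easy to read.
    { $limit: limit },
  ];
}

// In mongosh (collection and field names are placeholders):
//   db.getCollection("orders").aggregate(duplicateValuesPipeline("orderNo"), { allowDiskUse: true })
```

An empty result suggests that the field holds unique values. Documents that lack the field are grouped under null, and the $group stage scans the whole collection, so run the check during off-peak hours.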

Other limitations

  • If the destination instance has a sharded cluster architecture:

    • You must clear orphaned documents because they can affect migration performance. If documents with _id conflicts are encountered during migration, data inconsistency or task failure can occur.

    • Before starting the task, you must add a shard key to the source data for each sharded collection in the destination. If you cannot add a shard key to the source data, see Migrate data from a MongoDB instance without a shard key to a MongoDB sharded cluster instance.

    • After the task starts, the INSERT command must include the shard key. The UPDATE command cannot change the shard key.

  • If the destination instance has a replica set architecture:

    • When the Access Method is set to "Express Connect, VPN Gateway, or Smart Access Gateway", "Public IP Address", or "Cloud Enterprise Network (CEN)", you must set Domain Name or IP and Port Number to the primary node's address and port, or configure a high-availability connection address. For more information about high-availability connection addresses, see Create a Source or Destination Instance for a High-availability MongoDB Database.

    • When the Access Method is Self-managed Database on ECS, you must set Port Number to the port of the primary node.

  • Incremental data migration is not supported.

  • DTS does not support connecting to a MongoDB database by using an SRV record.

  • If the destination collection has a unique index or its capped attribute is set to true, the collection does not support concurrent replay (only single-threaded writing is supported) during incremental data migration. This may increase task latency.

  • DTS does not support migrating data from the admin, config, or local databases.

  • DTS does not preserve transaction information. It converts transactions from the source database into individual statements in the destination database.

  • When DTS writes data to a destination collection, if a primary key or unique key conflict occurs, DTS skips the conflicting data write statement and retains the existing data in the destination collection.

  • Keep the MongoDB versions of the source and destination databases consistent, or migrate from an earlier version to a later version to ensure compatibility. Migrating from a later version to an earlier version may cause compatibility issues.

  • If the source database is a MongoDB version earlier than 3.6 and the destination database is MongoDB 3.6 or later, field order in the migrated data may differ from the source. The field-value pairs remain correct. This is caused by differences in the database engine's execution plan. If your application logic involves text matching on nested structures, evaluate the potential impact of this field order change.

  • Before you migrate data, evaluate the performance of both the source and destination databases and perform the migration during off-peak hours. During full data migration, DTS consumes read and write resources, which may increase the database load.

  • During full data migration, DTS performs concurrent INSERT operations, which can cause fragmentation in the destination collections. As a result, the destination collections may occupy more storage space than those in the source instance.

  • Verify that the migration precision DTS uses for FLOAT or DOUBLE columns meets your business requirements. DTS reads values of these types by using ROUND(COLUMN,PRECISION). If you do not explicitly define the precision, DTS migrates FLOAT values with a precision of 38 digits and DOUBLE values with a precision of 308 digits.

  • DTS attempts to resume a failed migration task for up to seven days. Before you switch your business to the destination instance, end or release the task, or run the revoke command to revoke the write permissions of the account that DTS uses to access the destination instance. Otherwise, if the task resumes automatically, source data may overwrite data in the destination instance.

  • Because DTS writes data concurrently, the destination database uses 5% to 10% more storage space than the source database.

  • To query the number of documents in the destination MongoDB database, use the db.$collection_name.aggregate([{ $count:"myCount"}]) syntax.

  • Ensure that the destination MongoDB database does not have documents with the same primary key (the _id field by default) as the source database. Otherwise, data loss may occur. If such documents exist, delete those with conflicting _id values from the destination database before migration, provided that it does not affect your business.

  • If a task fails, DTS support staff will attempt to restore it within eight hours. During restoration, they may restart the task or adjust its parameters.

    Note

    Only DTS task parameters are modified, not database parameters. Parameters that may be adjusted include those listed in Modify instance parameters.

  • If the destination database is a MongoDB sharded cluster, after you switch your business to this database, you must ensure that your business operations comply with its requirements for sharded collections.

  • You cannot migrate capped collections when the source database is MongoDB 5.0 or later and the destination database is an earlier version. This can cause the task to fail or result in data inconsistency. This is because the behavior of capped collections changed in MongoDB 5.0, which allows explicit deletions and document size increases on update. Earlier database kernels are not compatible with these features.

  • Time series collections, introduced in MongoDB 5.0, are not supported for migration.
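
The $count query mentioned above can be wrapped in a small helper for comparing source and destination collections. This is a sketch for mongosh, where aggregate(...).toArray() returns an array directly. The function name is hypothetical, and db is assumed to be a mongosh database handle.

```javascript
// Count documents with the $count stage, as the text recommends.
// `db` is a mongosh database handle; `name` is a collection name.
function countWithAggregate(db, name) {
  const rows = db.getCollection(name).aggregate([{ $count: "myCount" }]).toArray();
  // $count emits no document at all when the collection is empty.
  return rows.length === 0 ? 0 : rows[0].myCount;
}

// In mongosh, run the same call on the source and on the destination
// and compare the two numbers:
//   countWithAggregate(db, "orders")
```

The empty-array branch matters because the $count stage produces no output document for an empty collection, so reading rows[0] without the check would fail.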

Special case

If the source is a self-managed MongoDB database, a primary/secondary switchover during migration will cause the task to fail.

Migrate from a MongoDB replica set to a replica set or sharded cluster

Type

Description

Source database limitations

  • Bandwidth requirement: The source database server must have sufficient egress bandwidth. Otherwise, data migration speed is affected.

  • Collections to be migrated must have a primary key or unique constraint, and the constrained fields must contain unique values. Otherwise, duplicate data may be created in the destination database.

  • If you migrate data at the collection level and need to edit the collections, such as by mapping collection names, a single migration task can migrate a maximum of 1,000 collections. If you exceed this limit, the task reports an error upon submission. In this case, you should split the collections into multiple batches and configure a separate task for each batch, or configure a task to migrate the entire database.

  • A single document in the source database cannot exceed 16 MB. Otherwise, the migration task will fail.

  • If the source database is an Azure Cosmos DB for MongoDB or an Amazon DocumentDB elastic cluster, only full data migration is supported.

  • To perform incremental data migration:

    The source database must have the oplog enabled with at least seven days of retention. Alternatively, change streams must be enabled, and DTS must be able to subscribe to data changes from the source database within the last seven days by using change streams. If these requirements are not met, the migration task may fail because it cannot obtain data changes from the source. In extreme cases, this can lead to data inconsistency or loss. Issues arising from this are not covered by the DTS Service Level Agreement (SLA).

    Important
    • We recommend using the oplog to obtain data changes from the source database.

    • Only MongoDB 4.0 and later versions support obtaining data changes through change streams.

    • If the source database is an Amazon DocumentDB (non-elastic) cluster, you must manually enable change streams. When you configure the task, set the Migration Method to ChangeStream and the Architecture to Sharded Cluster.

  • Operational limitations on the source database:

    • During the schema migration and full data migration phases, do not perform schema changes on databases or collections, including updating data in arrays. Such changes can cause the migration to fail or lead to data inconsistency between the source and destination databases.

    • If you perform only full data migration, do not write new data to the source instance. Otherwise, the source and destination databases become inconsistent. To maintain real-time data consistency, select Schema Migration, Full Data Migration, and Incremental Data Migration.

  • If a collection to be migrated contains a Time-to-Live (TTL) index, data inconsistency may occur or instance latency may increase.
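
The seven-day oplog requirement above can be given a rough preliminary check in mongosh. The sketch below assumes the object returned by db.getReplicationInfo(), whose timeDiffHours field reports the span between the oldest and newest oplog entries. The function and constant names are hypothetical.

```javascript
// Seven days expressed in hours, per the requirement in the text.
const REQUIRED_OPLOG_HOURS = 7 * 24;

// `info` is the object returned by db.getReplicationInfo() in mongosh.
// Returns true when the current oplog span covers the required window.
function oplogWindowIsSufficient(info, requiredHours = REQUIRED_OPLOG_HOURS) {
  return typeof info.timeDiffHours === "number" && info.timeDiffHours >= requiredHours;
}

// In mongosh, connected to a replica set member:
//   oplogWindowIsSufficient(db.getReplicationInfo())
```

The reported span reflects the current write rate only. A burst of writes can shrink it, so a passing result is an indication rather than a guarantee of seven days of retention.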

Other limitations

  • If the destination instance has a sharded cluster architecture:

    • You must clear orphaned documents because they can affect migration performance. If documents with _id conflicts are encountered during migration, data inconsistency or task failure can occur.

    • Before starting the task, you must add a shard key to the source data for each sharded collection in the destination. If you cannot add a shard key to the source data, see Migrate data from a MongoDB instance without a shard key to a MongoDB sharded cluster instance.

    • After the task starts, the INSERT command must include the shard key. The UPDATE command cannot change the shard key.

  • If the destination instance has a replica set architecture:

    • When the Access Method is set to "Express Connect, VPN Gateway, or Smart Access Gateway", "Public IP Address", or "Cloud Enterprise Network (CEN)", you must set Domain Name or IP and Port Number to the primary node's address and port, or configure a high-availability connection address. For more information about high-availability connection addresses, see Create a Source or Destination Instance for a High-availability MongoDB Database.

    • When the Access Method is Self-managed Database on ECS, you must set Port Number to the port of the primary node.

  • DTS does not support connecting to a MongoDB database by using an SRV record.

  • Keep the MongoDB versions of the source and destination databases consistent, or migrate from an earlier version to a later version to ensure compatibility. Migrating from a later version to an earlier version may cause compatibility issues.

  • DTS does not support migrating data from the admin, config, or local databases.

  • If the destination collection has a unique index or its capped attribute is set to true, the collection does not support concurrent replay (only single-threaded writing is supported) during incremental data migration. This may increase task latency.

  • DTS does not preserve transaction information. It converts transactions from the source database into individual statements in the destination database.

  • When DTS writes data to a destination collection, if a primary key or unique key conflict occurs, DTS skips the conflicting data write statement and retains the existing data in the destination collection.

  • If the source database is a MongoDB version earlier than 3.6 and the destination database is MongoDB 3.6 or later, field order in the migrated data may differ from the source. The field-value pairs remain correct. This is caused by differences in the database engine's execution plan. If your application logic involves text matching on nested structures, evaluate the potential impact of this field order change.

  • Before you migrate data, evaluate the performance of both the source and destination databases and perform the migration during off-peak hours. During full data migration, DTS consumes read and write resources, which may increase the database load.

  • During full data migration, DTS performs concurrent INSERT operations, which can cause fragmentation in the destination collections. As a result, the destination collections may occupy more storage space than those in the source instance.

  • Verify that the migration precision DTS uses for FLOAT or DOUBLE columns meets your business requirements. DTS reads values of these types by using ROUND(COLUMN,PRECISION). If you do not explicitly define the precision, DTS migrates FLOAT values with a precision of 38 digits and DOUBLE values with a precision of 308 digits.

  • DTS attempts to resume a failed migration task for up to seven days. Before you switch your business to the destination instance, end or release the task, or run the revoke command to revoke the write permissions of the account that DTS uses to access the destination instance. Otherwise, if the task resumes automatically, source data may overwrite data in the destination instance.

  • Because DTS writes data concurrently, the destination database uses 5% to 10% more storage space than the source database.

  • To query the number of documents in the destination MongoDB database, use the db.$collection_name.aggregate([{ $count:"myCount"}]) syntax.

  • Ensure that the destination MongoDB database does not have documents with the same primary key (the _id field by default) as the source database. Otherwise, data loss may occur. If such documents exist, delete those with conflicting _id values from the destination database before migration, provided that it does not affect your business.

  • If a task fails, DTS support staff will attempt to restore it within eight hours. During restoration, they may restart the task or adjust its parameters.

    Note

    Only DTS task parameters are modified, not database parameters. Parameters that may be adjusted include those listed in Modify instance parameters.

  • If the destination database is a MongoDB sharded cluster, after you switch your business to this database, you must ensure that your business operations comply with its requirements for sharded collections.

  • You cannot migrate capped collections when the source database is MongoDB 5.0 or later and the destination database is an earlier version. This can cause the task to fail or result in data inconsistency. This is because the behavior of capped collections changed in MongoDB 5.0, which allows explicit deletions and document size increases on update. Earlier database kernels are not compatible with these features.

  • Time series collections, introduced in MongoDB 5.0, are not supported for migration.
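
The revoke step mentioned above can be sketched as a command document. This assumes that "the revoke command" refers to MongoDB's revokeRolesFromUser command and that the DTS account holds the built-in readWrite role on the migrated database. The account name dts_user and the database name appdb are placeholders.

```javascript
// Build a revokeRolesFromUser command document.
// `user`, `dbName`, and the default role are placeholders, not DTS defaults.
function revokeWriteCommand(user, dbName, role = "readWrite") {
  return { revokeRolesFromUser: user, roles: [{ role: role, db: dbName }] };
}

// In mongosh, run the command on the database where the user is defined
// (assumed here to be admin):
//   db.getSiblingDB("admin").runCommand(revokeWriteCommand("dts_user", "appdb"))
```

Check the roles that the account actually holds before you run the command, because write access can also come from other roles.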

Special cases

If the source database is a self-managed MongoDB database:

  • A primary/secondary switchover on the source database during migration causes the task to fail.

  • DTS calculates latency by comparing the timestamp of the last record migrated to the destination with the current timestamp. If the source database has not been updated for a long time, the reported latency may be inaccurate. If the task shows excessive latency, perform a small update on the source database to refresh the latency value.

Note

If you migrate an entire database, you can also create a heartbeat table that is updated at regular intervals, such as every second.
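
The heartbeat idea in the note can be sketched as a single upsert that an external scheduler runs at a fixed interval. The collection name dts_heartbeat, the _id value, and the field name updatedAt are hypothetical. The collection is assumed to live in a database that the task migrates in full, so that DTS picks up each change.

```javascript
// Upsert one heartbeat document so that the source always has a recent write.
// `db` is a mongosh database handle; the names below are placeholders.
function writeHeartbeat(db, now = new Date()) {
  return db.getCollection("dts_heartbeat").updateOne(
    { _id: "heartbeat" },
    { $set: { updatedAt: now } },
    { upsert: true }
  );
}

// In mongosh, call writeHeartbeat(db) from a job that runs at a regular
// interval, for example once per second.
```

Updating one fixed document keeps the collection at a single record, so the heartbeat adds a steady stream of small oplog entries without growing the data set.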

Migrate a MongoDB sharded cluster to a replica set or sharded cluster

Type

Description

Source database limitations

  • Bandwidth requirement: The server hosting the source database must have sufficient egress bandwidth. Otherwise, the data migration speed will be reduced.

  • Collections to be migrated must have a primary key or unique constraint with unique values. Otherwise, duplicate data may be created in the target database.

  • The _id field in the collections to be migrated must be unique. Otherwise, data inconsistency may occur.

  • If you migrate data at the collection level and need to edit the collections, such as by mapping collection names, a single migration task can migrate a maximum of 1,000 collections. If you exceed this limit, the task reports an error upon submission. In this case, you should split the collections into multiple batches and configure a separate task for each batch, or configure a task to migrate the entire database.

  • A single document in the source database cannot exceed 16 MB. Otherwise, the migration task will fail.

  • During a full data migration, DTS consumes resources on both the source and target databases, increasing their load. If your database experiences high traffic or runs on servers with low specifications, this can further strain the database and potentially cause service outages. Evaluate the potential impact and perform the migration during off-peak hours.

  • For the supported versions and storage engines of MongoDB instances, see Versions and storage engines. If you need to migrate across different versions or storage engines, confirm compatibility in advance.

  • If the source database is an Azure Cosmos DB for MongoDB or an Amazon DocumentDB elastic cluster, only full data migration is supported.

  • To perform incremental data migration:

    The source database must have the oplog enabled with at least seven days of retention. Alternatively, change streams must be enabled, and DTS must be able to subscribe to data changes from the source database within the last seven days by using change streams. If these requirements are not met, the migration task may fail because it cannot obtain data changes from the source. In extreme cases, this can lead to data inconsistency or loss. Issues arising from this are not covered by the DTS Service Level Agreement (SLA).

    Important
    • We recommend using the oplog to obtain data changes from the source database.

    • Only MongoDB 4.0 and later versions support obtaining data changes through change streams.

    • If the source database is an Amazon DocumentDB (non-elastic) cluster, you must manually enable change streams. When you configure the task, set the Migration Method to ChangeStream and the Architecture to Sharded Cluster.

  • DTS does not support migrating data from the admin, config, or local databases.

  • If the source database is MongoDB 8.0 or later and the Migration Method is Oplog, you must ensure that the shard account used for the migration task has the directShardOperations permission. You can grant this permission by running the following command: db.adminCommand({ grantRolesToUser: "username", roles: [{ role: "directShardOperations", db: "admin"}]}).

    Note

    In the command, replace username with the shard account used by the migration task.

  • If the Migration Method is Oplog and the task includes full data migration, you must ensure that the mongos account of the source MongoDB sharded cluster has the permission to run the db.runCommand({"balancerStatus":1}) command. DTS uses this command during the precheck phase to verify that the source balancer is disabled.

  • The source self-managed MongoDB sharded cluster instance cannot have more than 10 mongos nodes.

  • If a collection to be migrated contains a Time-to-Live (TTL) index, data inconsistency may occur or instance latency may increase.

  • Ensure that no orphaned documents exist in the source and target MongoDB sharded cluster instances. Otherwise, data inconsistency or task failure may occur. For more information, see Orphaned Documents in Sharded Clusters and Clean up orphaned documents from a sharded cluster instance.

  • Operational limitations on the source database:

    • While the migration is running, do not run commands such as shardCollection, reshardCollection, unshardCollection, moveCollection, or movePrimary on the source database. Otherwise, data inconsistency may occur.

    • During the schema migration and full data migration phases, do not perform schema changes on databases or collections, including updating data in arrays. Such changes can cause the migration task to fail or lead to data inconsistency between the source and target databases.

    • If you perform only full data migration, do not write new data to the source instance. Otherwise, the source and target databases become inconsistent. To maintain real-time data consistency, select Schema Migration, Full Data Migration, and Incremental Data Migration.

  • If the balancer of the source database is rebalancing data, instance latency may increase.
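
To find the collections affected by the TTL index limitation above, you can inspect each collection's index definitions. The sketch below is not a DTS precheck. It assumes the array returned by getIndexes() in mongosh, in which a TTL index carries an expireAfterSeconds field. The function name is hypothetical.

```javascript
// Return the TTL indexes among the index specs of one collection.
// `indexSpecs` is the array returned by db.<collection>.getIndexes() in mongosh.
function ttlIndexes(indexSpecs) {
  // A TTL index is an index whose spec has an expireAfterSeconds field.
  return indexSpecs.filter((spec) => spec.expireAfterSeconds !== undefined);
}

// In mongosh, list TTL indexes for every collection in the current database:
//   db.getCollectionNames().forEach((name) => {
//     printjson({ name: name, ttl: ttlIndexes(db.getCollection(name).getIndexes()) });
//   });
```

Collections reported here are the ones to watch for inconsistency or increased latency during the migration.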

Other limitations

  • If you purchase a DTS task before you configure it, you must specify the correct number of shards when you purchase the task.

  • If the destination instance has a replica set architecture:

    • When the Access Method is set to "Express Connect, VPN Gateway, or Smart Access Gateway", "Public IP Address", or "Cloud Enterprise Network (CEN)", you must set Domain Name or IP and Port Number to the primary node's address and port, or configure a high-availability connection address. For more information about high-availability connection addresses, see Create a Source or Destination Instance for a High-availability MongoDB Database.

    • When the Access Method is Self-managed Database on ECS, you must set Port Number to the port of the primary node.

  • DTS does not support connecting to a MongoDB database by using an SRV record.

  • If the target MongoDB instance uses a sharded cluster architecture and you do not need to use the schema migration feature provided by DTS (for example, if data sharding is already configured in the target database), do not select Schema Migration for Migration Types on the Configure Objects page. Otherwise, sharding conflicts may cause data inconsistency or task failure.

  • Before the task starts, you must ensure that the data to be migrated contains a field that corresponds to the shard key in the target database. After the task starts, the INSERT command must include the shard key. The UPDATE command cannot be used to change the shard key.

  • You must disable the source database's balancer during the full data migration and keep it disabled until all subtasks enter the incremental migration phase. Otherwise, data inconsistency may occur. For more information about how to manage the balancer, see Manage the balancer of a MongoDB sharded cluster.

  • To prevent data loss, ensure the target database does not have documents with the same primary key (the _id field by default) as those in the source database. If conflicting documents exist, delete them from the target database before migration, provided this does not affect your business.

  • DTS does not preserve transactions. Transactions from the source database are replayed as individual operations in the target database.

  • When DTS writes data to a destination collection, if a primary key or unique key conflict occurs, DTS skips the conflicting data write statement and retains the existing data in the destination collection.

  • If you use the Oplog method for incremental data migration, do not add or remove shards in the source MongoDB sharded cluster while the task is running. Otherwise, the DTS task fails and data inconsistency occurs.

  • To query the number of documents in the target MongoDB database, you must use the db.$collection_name.aggregate([{ $count:"myCount"}]) syntax.

  • Because DTS writes data concurrently, the target database will use 5% to 10% more storage space than the source database.

  • If the destination collection has a unique index or its capped attribute is set to true, the collection does not support concurrent replay (only single-threaded writing is supported) during incremental data migration. This may increase task latency.

  • If the source database is a MongoDB version earlier than 3.6 and the destination database is MongoDB 3.6 or later, field order in the migrated data may differ from the source. The field-value pairs remain correct. This is caused by differences in the database engine's execution plan. If your application logic involves text matching on nested structures, evaluate the potential impact of this field order change.

  • Before you migrate data, evaluate the performance of both the source and target databases. Perform the data migration during off-peak hours. During full data migration, DTS consumes read and write resources, which may increase the load on the databases.

  • During full data migration, DTS performs concurrent INSERT operations, which can cause fragmentation in the target collections. As a result, the collections in the target database will use more storage space than their counterparts in the source database.

  • To prevent a resumed task from overwriting data in the target instance after a cutover, you must end or release the task, or run the revoke command to revoke the write permissions of the account that DTS uses to access the target instance. DTS may automatically resume a failed task for up to seven days.

  • If a task fails, DTS support staff will attempt to restore it within eight hours. During restoration, they may restart the task or adjust its parameters.

    Note

    Only DTS task parameters are modified, not database parameters. Parameters that may be adjusted include those listed in Modify instance parameters.

  • If the destination database is a MongoDB sharded cluster, after you switch your business to this database, you must ensure that your business operations comply with its requirements for sharded collections.

  • You cannot migrate capped collections when the source database is MongoDB 5.0 or later and the destination database is an earlier version. This can cause the task to fail or result in data inconsistency. This is because the behavior of capped collections changed in MongoDB 5.0, which allows explicit deletions and document size increases on update. Earlier database kernels are not compatible with these features.

  • Time series collections, introduced in MongoDB 5.0, are not supported for migration.
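
The balancer requirement above can be verified manually with the same balancerStatus command that DTS runs during the precheck. The following sketch interprets the command's reply. It is an illustration rather than the DTS precheck logic, and the function name is hypothetical.

```javascript
// `status` is the reply of db.adminCommand({ balancerStatus: 1 }),
// run through a mongos of the source sharded cluster.
function balancerIsOff(status) {
  // mode is "off" when the balancer is disabled; inBalancerRound stays
  // true while a round that already started is still in progress.
  return status.ok === 1 && status.mode === "off" && status.inBalancerRound !== true;
}

// In mongosh, connected to a mongos:
//   balancerIsOff(db.adminCommand({ balancerStatus: 1 }))
```

Checking inBalancerRound as well as mode covers the case where the balancer has been disabled but a balancing round that was already running has not finished yet.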