Migrate data using Inventory

This topic describes the considerations, limitations, and steps for migrating source data using Inventory.

Considerations

Keep the following in mind when migrating data with Data Online Migration:

  • Online Migration Service uses the public standard API from the source storage service provider to access source data. The service's behavior depends on how the provider has implemented the API.

  • Online migration consumes resources at both the source and destination addresses, potentially affecting your business operations. For mission-critical services, consider setting a rate limit or running the migration task during off-peak hours to minimize the impact.

  • The service checks files at the source and destination addresses before the migration starts. However, if a file with the same name exists in both locations and the migration task is configured to overwrite files, the service directly overwrites the destination file. If the two files have different content, you must rename one of them or create a backup to prevent data loss.

  • Online migration preserves the last modified time of source files. If a lifecycle rule is configured for the destination bucket, the rule may delete or transition a migrated file to a specified storage class if the file's last modified time meets the rule's criteria.
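
Because the last modified time is preserved, an object migrated today can already be old enough for a lifecycle rule to act on it. The following is a minimal sketch of that check, assuming a hypothetical days-based rule; the function name `affected_by_rule`, the 30-day threshold, and the sample objects are illustrative and are not part of the service.

```python
from datetime import datetime, timedelta, timezone

def affected_by_rule(last_modified: datetime, rule_days: int, now: datetime) -> bool:
    """Return True if the object is at least rule_days old, counted from its last modified time."""
    return now - last_modified >= timedelta(days=rule_days)

# Hypothetical evaluation time and source objects (key -> preserved last modified time).
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
objects = {
    "logs/2019/app.log": datetime(2019, 3, 1, tzinfo=timezone.utc),
    "logs/2024/app.log": datetime(2024, 5, 20, tzinfo=timezone.utc),
}

# Objects that a hypothetical 30-day rule could delete or transition right after migration.
at_risk = [key for key, mtime in objects.items() if affected_by_rule(mtime, 30, now)]
print(at_risk)
```

Running a check like this against the source listing before the migration can show which files a destination lifecycle rule might touch.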

Migration limitations

You can migrate data from only one bucket at a time. You cannot migrate an entire account in a single operation.

The following properties apply to data migrated using a generic inventory:

  • Migratable properties depend on the specific data type, such as OSS or S3. For more information, see the migration tutorial for the corresponding data source.

  • Unsupported properties are ignored during migration.

Step 1: Select a region

  1. Log in to the Data Online Migration console as the RAM user you created.

  2. In the top navigation bar, use the region selector in the upper-left corner to set the migration service deployment region. Select the data source region or the region geographically closest to it, as shown in the following figure (Select a region).

    The console supports the following migration service deployment regions: China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), and China (Ulanqab) in the Chinese mainland, as well as China (Hong Kong), Singapore, Germany (Frankfurt), and US (Virginia).

    Important
    • Data source addresses and migration tasks are specific to each region. Choose your region carefully.

    • Select the region of your data source. If that region is unavailable, create the migration task in the geographically closest region.

    • For cross-border migration, enable Transfer Acceleration to improve migration speed. Buckets with Transfer Acceleration enabled incur transfer acceleration fees. For more information about Transfer Acceleration, see Access OSS by using Transfer Acceleration.

Step 2: Create a source address

  1. In the left-side navigation pane, go to Data Online Migration > Address Management and click Create Address.

  2. In the Create Address panel, set the parameters and click OK.

    Parameter

    Required

    Description

    Name

    Yes

    Enter a name for the source. The name must meet the following requirements:

    • The name must be 3 to 63 characters in length.

    • The name is case-sensitive and can contain only lowercase letters, digits, hyphens (-), and underscores (_).

    • The name cannot start with a hyphen (-) or an underscore (_).

    Type

    Yes

    Select Inventory.

    Data type

    Yes

    Select the data source.

    Domain name

    Depends on the data type

    The source storage service endpoint. For example, the endpoint for an AWS S3 bucket.

    Region

    Yes (if Data type is set to Alibaba OSS)

    Select the region where the source data is located, for example, China (Hangzhou).

    Authorize role

    Yes (if Data type is set to Alibaba OSS)

    AccessKey pair

    Yes

    Enter the AccessKey pair, which consists of an AccessKeyId and a SecretAccessKey, for the account that owns the source data. The pair verifies your identity and confirms your read permissions for the source data.

    Storage

    Yes

    Enter the name of the source bucket.

    Prefix

    No

    You can specify a prefix to migrate source data to a specific directory. The prefix cannot start with a forward slash (/) but must end with a forward slash (/). For example, data/to/oss/.

    • If you specify a prefix: For example, if the source prefix is example/src/ and it contains the file example.jpg, and you set the destination prefix to example/dest/, the full path of the migrated file example.jpg is example/dest/example.jpg.

    • If you do not specify a prefix: The source data is migrated to the root directory of the destination bucket.

    Inventory Location

    Yes

    The storage service that contains the inventory file. Valid values: Alibaba OSS and third-party source.

    Inventory Path

    Yes

    Enter the path of the manifest.json file.

    Inventory Domain Name

    Yes (if Inventory Location is set to third-party source)

    If Inventory Location is third-party source, enter the domain name for accessing the inventory.

    Inventory Region

    Depends on the inventory location

    If you set the Inventory Location parameter to Alibaba OSS, specify the region in which the OSS inventory list resides.

    Authorize role

    Yes (if Inventory Location is set to Alibaba OSS)

    Inventory bucket

    Yes

    Enter the name of the bucket that contains the inventory. The bucket must belong to the Alibaba Cloud account that you are currently using.

    Inventory AccessKey pair

    Yes (if Inventory Location is set to third-party source)

    If Inventory Location is set to third-party source, enter the AccessKey pair, which consists of an AccessKeyId and a SecretAccessKey. The pair verifies your identity and confirms that you have permission to read the inventory file.

    Tunnel

    No

    Select the tunnel that you want to use.

    Important
    • This parameter is required only when you migrate data from self-managed storage to the cloud, or when you migrate data over a dedicated connection or VPN.

    • An agent is required when the destination is a local file system (LocalFs) or when migrating over a dedicated connection for services like Finance Cloud or Apsara Stack.

    Agent

    No

    Select one or more agents.

    Important
    • This parameter is required only when you migrate data from self-managed storage to the cloud, or when you migrate data over a dedicated connection or VPN.

    • You can select up to 200 agents for a specified tunnel.
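
The naming rules for the Name parameter can be checked before you open the console. The following is a minimal sketch, assuming the three rules listed above are the complete rule set; `is_valid_name` is a hypothetical helper and is not part of any SDK.

```python
import re

# 3 to 63 characters; lowercase letters, digits, hyphens, and underscores only;
# the first character cannot be a hyphen or an underscore.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{2,62}")

def is_valid_name(name: str) -> bool:
    """Return True if name satisfies the address/task naming rules described above."""
    return _NAME_RE.fullmatch(name) is not None

for candidate in ["src-inventory_01", "ab", "-starts-with-hyphen", "HasUpperCase"]:
    print(candidate, is_valid_name(candidate))
```

The same rules are listed for destination address names and migration task names, so the helper can be reused for all three.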

Step 3: Create a destination address

  1. In the left-side navigation pane, go to Data Online Migration > Address Management and click Create Address.

  2. In the Create Address panel, configure the following parameters and click OK.

    Parameter

    Required

    Description

    Name

    Yes

    Enter a name for the destination address. The name must meet the following requirements:

    • The name must be 3 to 63 characters in length.

    • The name is case-sensitive and can contain only lowercase letters, digits, hyphens (-), and underscores (_).

    • The name cannot start with a hyphen (-) or an underscore (_).

    Type

    Yes

    Select Alibaba OSS.

    Custom domain name

    No

    Enter the custom domain name for the destination bucket.

    Region

    Yes

    Select the region where the destination bucket is located, for example, China (Hangzhou).

    Role

    Yes

    Bucket

    Yes

    Enter the name of the destination bucket.

    Prefix

    No

    You can specify a prefix to migrate source data to a specific directory. The prefix cannot start with a forward slash (/) but must end with a forward slash (/). For example, data/to/oss/.

    • If you specify a prefix: For example, if a file named example.jpg is under the source prefix example/src/, setting the destination prefix to example/dest/ migrates the file to example/dest/example.jpg.

    • If you do not specify a prefix: The source data is migrated to the root directory of the destination bucket.

    Tunnel

    No

    Select the tunnel that you want to use.

    Important
    • This parameter is required only when you migrate data from self-managed storage to the cloud, or when you migrate data over a dedicated connection or VPN.

    • An agent is required when the destination is a local file system (LocalFs) or when migrating over a dedicated connection for services like Finance Cloud or Apsara Stack.

    Agent

    No

    Select one or more agents.

    Important
    • This parameter is required only when you migrate data from self-managed storage to the cloud, or when you migrate data over a dedicated connection or VPN.

    • You can select up to 200 agents for a specified tunnel.
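
The prefix rules and the path mapping in the example/src/ to example/dest/ example can be sketched as follows. This is an illustration under assumptions, not the service's implementation: `is_valid_prefix` and `destination_key` are hypothetical helpers, and the sketch assumes that the part of the path after the source prefix is appended unchanged to the destination prefix.

```python
def is_valid_prefix(prefix: str) -> bool:
    """An empty prefix means "not specified"; otherwise no leading slash and a required trailing slash."""
    if prefix == "":
        return True
    return not prefix.startswith("/") and prefix.endswith("/")

def destination_key(source_key: str, source_prefix: str, dest_prefix: str) -> str:
    """Map a source object key to its destination key (assumed behavior, see the lead-in)."""
    if not source_key.startswith(source_prefix):
        raise ValueError(f"{source_key!r} is not under source prefix {source_prefix!r}")
    relative = source_key[len(source_prefix):]  # part of the path after the source prefix
    return dest_prefix + relative

print(destination_key("example/src/example.jpg", "example/src/", "example/dest/"))
print(destination_key("example/src/example.jpg", "example/src/", ""))
```

With an empty destination prefix, the file lands in the root directory of the destination bucket, which matches the second bullet above.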

Step 4: Create a migration task

  1. In the navigation pane on the left, choose Data Online Migration > Migration Tasks, and then click Create Task.

  2. On the Select Address page, configure the following parameters, and then click Next.

    Parameter

    Required

    Description

    Name

    Yes

    Enter a name for the migration task. The name must meet the following requirements:

    • The name must be 3 to 63 characters in length.

    • The name is case-sensitive and can contain only lowercase letters, digits, hyphens (-), and underscores (_).

    • The name cannot start with a hyphen (-) or an underscore (_).

    Source Address

    Yes

    Select a previously created source address.

    Destination Address

    Yes

    Select a previously created destination address.

  3. On the Task Configurations page, configure the following parameters.

    Parameter

    Required

    Description

    Basic configurations

    Migration Bandwidth

    No

    Select the migration bandwidth.

    • Default: Uses the maximum available bandwidth. The actual migration speed depends on the file size and the number of files.

    • Specify an upper limit: Specify a bandwidth cap as prompted on the console.

    Important
    • The actual migration bandwidth is affected by factors such as the data source, network conditions, destination-side throttling, and file sizes. The bandwidth may not reach the specified upper limit.

    • Evaluate your data source, destination, business workloads, and network bandwidth to select a reasonable value. Improper throttling may affect your business operations.

    Files Migrated Per Second

    No

    Select the number of files to migrate per second.

    • Default: The default number of files migrated per second.

    • Specify an upper limit: Specify an upper limit as prompted on the console.

    Important
    • The actual migration rate is affected by factors such as the data source, network conditions, destination-side throttling, and file sizes. The rate may not reach the specified upper limit.

    • Evaluate your data source, destination, business workloads, and network bandwidth to select a reasonable value. Improper throttling may affect your business operations.

    Overwrite Mode

    No

    Select how to handle files with the same name at the destination.

    • Do not overwrite: Skips migrating the file.

    • Overwrite All: The source file overwrites the destination file.

    • Overwrite based on the last modification time:

      • The destination file is overwritten if the source file's last modified time is later.

      • If the last modified times are the same, the destination file is overwritten if its Size or Content-Type differs.

    Warning
      • The Overwrite based on the last modification time policy does not guarantee that an older file will not overwrite a newer one.

      • If you select Overwrite based on the last modification time, ensure your source data can return metadata such as last modified time, Size, and Content-Type. Otherwise, the overwrite policy may not work as expected and can lead to unintended migration results.

      • If you select Do not overwrite or Overwrite based on the last modification time, the service requests object metadata from both the source and destination to perform the comparison. This incurs request fees on both the source and destination.

    Auditing

    Migration Report

    Yes

    Specifies whether to push the migration report.

    • Do not push (default): The migration report is not pushed to the destination bucket.

    • Push: The migration report is pushed to the destination bucket. For the detailed path, see What to do next.

    Important
    • Pushing migration reports consumes storage space at the destination.

    • There may be a delay before the report is pushed.

    • Each task execution record has a unique ID. Note that the migration report is pushed only once. Delete it with caution.

    Migration Logs

    Yes

    Specifies how to deliver the migration log.

    • Do not push (Default): The migration log is not pushed.

    • Push: Pushes the migration log to Log Service. You can view the migration log in Log Service.

    • Push only file error logs: Pushes only logs for file migration errors to Log Service. You can view these error logs in Log Service.

    If you select Push or Push only file error logs, Online Migration Service creates a project in Log Service named aliyun-oss-import-log-<Alibaba Cloud account ID>-<current region>. For example: aliyun-oss-import-log-137918634953****-cn-hangzhou.

    Important

    Complete the following actions before you select Push or Push only file error logs. Otherwise, the migration task may fail.

    • You have activated Log Service.

    • You have granted the required permissions on the Authorize page.

    Authorize

    No

    This option appears only when Migration Logs is set to Push or Push only file error logs.

    Click Authorize to go to the Cloud Resource Access Authorization page. The system creates a role named AliyunOSSImportSlsAuditRole and grants permissions to the role. Click Agree to Authorization to complete the authorization.

    Filters

    File Name

    No

    Filters files by name.

    Supports Include and Exclude filter rules. The rules follow the regular expression syntax of the RE2 library (only a subset of the syntax is supported). Examples:

    • .*\.jpg$ matches all files ending with .jpg.

    • ^file.* matches, by default, all files in the root directory whose names start with file.

      If the source address has a prefix, such as data/to/oss/, you need to use ^data/to/oss/file.* to match all files starting with file under the specified prefix.

    • .*/picture/.* matches files in a subdirectory named picture at any level.

    Important
    • When the filter rule is Include, all files that match the rule are migrated. If there are multiple rules, files that match any of the rules are migrated.

      For example, you have two files, picture.jpg and picture.png. If you set an Include rule to filter .*\.jpg$, only picture.jpg is migrated. If you also set an Include rule to filter .*\.png$, both files are migrated.

    • When the filter rule is Exclude, files that match the rule are not migrated. If there are multiple rules, files that match any of the rules are not migrated.

      For example, you have two files, picture.jpg and picture.png. If you set an Exclude rule to filter .*\.jpg$, only picture.png is migrated. If you also set an Exclude rule to filter .*\.png$, neither file is migrated.

    • Exclude rules take priority. If a file matches both an Exclude rule and an Include rule, the file is not migrated.

      For example, for the file file.txt, if you set an Exclude rule to filter .*\.txt$ and an Include rule to filter file.*, the file file.txt is not migrated.

    File Modification Time

    No

    Filters files by their last modified time.

    Migrates only the files last modified within the specified time range. Rules:

    • If you specify only a start time of January 1, 2019, and no end time, only files with a last modified time on or after January 1, 2019 are migrated.

    • If you specify only an end time of January 1, 2022, and no start time, only files with a last modified time on or before January 1, 2022 are migrated.

    • If you specify a start time of January 1, 2019 and an end time of January 1, 2022, only files with a last modified time between January 1, 2019 and January 1, 2022, inclusive, are migrated.

    Migration configurations

    Retain file last modified time

    Yes

    Specifies whether to retain the last modified time of the source file.

    • Retain (default): The destination object's last modified time is set to match the source file's.

    • Do not retain: The last modified time is not set.

    Specify the storage class for destination objects

    No

    Specifies whether to set a storage class for destination objects.

    • Specify: Migrated objects use the specified storage class. Supported storage classes:

      • Standard

      • Infrequent Access

      • Archive

      • Cold Archive

      • Deep Cold Archive

    • Do not specify (default): The storage class is not set. Objects migrated to the destination use the default storage class of the destination bucket.

    Important
    • This option is displayed and configurable only if you have been added to the allowlist.

    • This option is supported only for tasks where the destination is OSS.

    Task scheduling

    Execution time

    No

    Important
    1. If a task is still running when its next execution is scheduled, the current run completes, the scheduled run is skipped, and the task runs at the following scheduled time.

    2. Concurrent migration task limit: Up to 10 in Chinese mainland and China (Hong Kong) regions, and up to 5 in other regions.

    Specify when to run the migration task.

    • Immediately: Runs the task immediately.

    • At the Specified Time: Sets a daily time window for the task to run. By default, the task starts at the specified start time and pauses at the specified stop time.

    • Periodic Scheduling: Runs the task based on a specified frequency and number of executions.

      • Execution Frequency: Supported frequencies are Hourly, Daily, Weekly, Specific days of the week, and Custom. For details, see Execution frequency.

      • Number of Executions: Specifies the number of times the task runs. If not set, the task runs once by default. For the maximum number of executions, refer to the prompt on the console.

    Important

    You can manually start and pause the task at any time, regardless of the scheduled execution time.

  4. Read the Online Migration Service Agreement, select the checkbox "I have understood and confirmed the compliance commitment statement, and I acknowledge my obligation and responsibility to verify the consistency of migrated data after the migration task is completed", and then click Next.

  5. Review the configuration information. If it is correct, click OK and wait for the migration task to run.
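
The File Name and File Modification Time filters configured in this step can be reasoned about with a small local sketch. It is an approximation under stated assumptions: it uses Python's `re` module rather than RE2 (the patterns shown in this topic behave the same in both), it assumes a rule matches when the pattern is found anywhere in the object path, and it assumes that all files pass when no Include rule is set. `should_migrate` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Optional, Sequence

def should_migrate(
    key: str,
    last_modified: datetime,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> bool:
    """Apply Exclude rules first, then Include rules, then the inclusive time range."""
    if any(re.search(pattern, key) for pattern in exclude):
        return False  # Exclude rules take priority over Include rules.
    if include and not any(re.search(pattern, key) for pattern in include):
        return False  # With Include rules set, a file must match at least one of them.
    if start is not None and last_modified < start:
        return False
    if end is not None and last_modified > end:
        return False
    return True

mtime = datetime(2020, 6, 1)
print(should_migrate("picture.jpg", mtime, include=[r".*\.jpg$"]))
print(should_migrate("picture.png", mtime, include=[r".*\.jpg$"]))
print(should_migrate("file.txt", mtime, include=[r"file.*"], exclude=[r".*\.txt$"]))
```

Testing patterns locally against a sample of object paths is a cheap way to catch a rule that matches more or fewer files than intended before the task runs.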

Execution frequency

Execution frequency

Description

Example

Hourly

Run the task once every hour. You can use this option with the maximum number of runs.

The current time is 8:05. The frequency is set to hourly with a maximum of 3 runs. The first run starts at the next hour, 9:00.

  • If a run finishes before the next hour, the second run starts at 10:00. This pattern continues until the specified number of runs is complete.

  • If a run has not finished by the next hour and ends at 12:30, the second run starts at the next hour, 13:00. This pattern continues until the specified number of runs is complete.

Daily

Run the task once a day. You must specify an hour (0-23) for the task to start. You can use this option with the maximum number of runs.

The current time is 8:05. The task is scheduled to run daily at 10:00, with a maximum of 5 runs. The first run starts at 10:00 today.

  • If a run finishes before 10:00 the next day, the second run starts at 10:00 the next day. This pattern continues until the specified number of runs is complete.

  • If a run has not finished by 10:00 the next day and ends at 12:05 the next day, the second run starts at 10:00 on the third day. This pattern continues until the specified number of runs is complete.

Weekly

Run the task once a week. You must specify a day of the week and an hour (0-23) for the task to start. You can use this option with the maximum number of runs.

The current time is Monday, 8:05. The task is scheduled to run every Monday at 10:00, with a maximum of 10 runs. The first run starts at 10:00 today.

  • If a run finishes before 10:00 next Monday, the second run starts at 10:00 next Monday. This pattern continues until the specified number of runs is complete.

  • If a run has not finished by 10:00 next Monday and ends at 12:05 next Monday, the second run starts at 10:00 on the following Monday. This pattern continues until the specified number of runs is complete.

Specific days of the week

Run the task on selected days of the week. You must specify the days and an hour (0-23) for the task to start.

The current time is Wednesday, 8:05. The task is scheduled to run on Mondays, Wednesdays, and Fridays at 10:00. The first run starts at 10:00 today.

  • If a run finishes before 10:00 on Friday, the second run starts at 10:00 on Friday. This pattern continues until the specified number of runs is complete.

  • If a run has not finished by 10:00 on Friday and ends at 12:05 next Monday, the second run starts at 10:00 next Wednesday. This pattern continues until the specified number of runs is complete.

Custom

Use a cron expression to define a custom schedule for the task start time.

Note

A cron expression consists of six space-separated fields that define the execution schedule: second, minute, hour, day of the month, month, and day of the week. The minimum interval is 1 hour.

The following cron expression examples are for reference only. For more options, use a cron expression generator.

  • 0 0 * * * *: Runs the task at the beginning of every hour (0 minutes, 0 seconds).

  • 0 30 0/3 * * ?: Runs the task every 3 hours at 30 minutes past the hour (for example, at 0:30, 3:30, 6:30, 9:30, 12:30, 15:30, 18:30, and 21:30).

  • 0 0 12 * * MON-FRI: Runs the task at 12:00 PM every weekday from Monday to Friday.

  • 0 0 12 1-15 * SAT,SUN: Runs the task at 12:00 PM on weekends (Saturday and Sunday) that fall between the 1st and 15th of the month.

  • 0 30 8 1,15 * *: Runs the task at 8:30 AM on the 1st and 15th of each month.
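
The skip behavior in the Hourly and Daily examples above can be sketched as a small calculation: the next run starts at the first scheduled slot that is not earlier than the moment the previous run ended. This is an illustration of the documented examples, not the scheduler's implementation. `next_start` is a hypothetical helper, the dates are made up, and treating a run that ends exactly on a slot as eligible for that slot is an assumption.

```python
from datetime import datetime, timedelta

def next_start(anchor: datetime, interval: timedelta, not_before: datetime) -> datetime:
    """Return the first slot anchor + k * interval (k >= 0) that is at or after not_before."""
    # Ceiling division of (not_before - anchor) by interval, done with integer floor division.
    slots = -((anchor - not_before) // interval)
    return anchor + max(0, slots) * interval

hour = timedelta(hours=1)
day = timedelta(days=1)

# Hourly: the task is created at 8:05, so the first run starts at 9:00.
print(next_start(datetime(2024, 1, 1, 0, 0), hour, datetime(2024, 1, 1, 8, 5)))
# Hourly: a run that ends at 12:30 is followed by a run at 13:00.
print(next_start(datetime(2024, 1, 1, 0, 0), hour, datetime(2024, 1, 1, 12, 30)))
# Daily at 10:00: a run that ends at 12:05 on day 2 is followed by a run at 10:00 on day 3.
print(next_start(datetime(2024, 1, 1, 10, 0), day, datetime(2024, 1, 2, 12, 5)))
```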

Step 5: Validate data

The Migration Service only transfers data and does not guarantee its consistency or integrity. After a migration task is complete, validate all migrated data to ensure consistency between the source and destination.

Warning

After the migration task is complete, you must verify the migrated data at the destination. You are solely responsible for any data loss and all associated consequences if you delete the source data before confirming the integrity of the destination data.
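
One way to approach this validation is to compare object listings from both sides before deleting anything at the source. The following is a minimal sketch, assuming you have already collected a mapping of object key to size and checksum for the source and the destination with your own tooling. `ObjectInfo` and `compare_listings` are hypothetical names, keys are assumed to be relative to the configured prefixes, and the checksums must be computed the same way on both sides to be comparable.

```python
from typing import Dict, List, NamedTuple

class ObjectInfo(NamedTuple):
    size: int       # object size in bytes
    checksum: str   # any checksum computed identically for source and destination

def compare_listings(
    source: Dict[str, ObjectInfo], destination: Dict[str, ObjectInfo]
) -> Dict[str, List[str]]:
    """Report source objects that are missing at the destination or differ in size or checksum."""
    missing = sorted(key for key in source if key not in destination)
    mismatched = sorted(
        key for key in source if key in destination and source[key] != destination[key]
    )
    return {"missing": missing, "mismatched": mismatched}

# Made-up sample listings for illustration.
source = {
    "a.jpg": ObjectInfo(10, "aaa"),
    "b.jpg": ObjectInfo(20, "bbb"),
    "c.jpg": ObjectInfo(30, "ccc"),
}
destination = {
    "a.jpg": ObjectInfo(10, "aaa"),
    "b.jpg": ObjectInfo(20, "xxx"),
}
print(compare_listings(source, destination))
```

An empty result for both lists is a reasonable precondition for removing source data; anything else points to files that need to be migrated again or inspected.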