Migrate from OSS to a local file system
Review the considerations and limitations, then follow the procedure to migrate data from Alibaba Cloud OSS to a local file system.
Considerations
Keep the following in mind when migrating data:
-
The Directory To Be Migrated must be an absolute path that starts and ends with a forward slash (/). The path cannot contain environment variables or special characters.
-
Ensure the Directory To Be Migrated exists and is valid.
-
The migration service uses the standard public APIs of the source storage provider to access source data. The service's behavior depends on the provider's specific API implementation.
-
Migration consumes resources at both the source and destination. To avoid impacting business-critical workloads, configure bandwidth throttling or run the task during off-peak hours.
-
Before migration, the service checks for files with the same name at the source and destination. If the task's Overwrite Mode allows overwriting, destination files with matching names are overwritten. To prevent data loss when file contents differ, rename or back up the destination files before you start the migration.
After migration completes, you must verify the data integrity at the destination. You are solely responsible for any data loss caused by deleting source data before completing verification.
Migration limitations
-
If static website hosting is enabled for the source bucket, the migration scan may detect nonexistent directories. For example, uploading myapp/resource/1.jpg causes the scan to detect myapp/, myapp/resource/, and myapp/resource/1.jpg. Migration of myapp/ and myapp/resource/ fails, but myapp/resource/1.jpg migrates successfully.
-
If a source object name ends with a forward slash (/), its attributes are applied to the corresponding directory on the destination file system. The task overwrite mode is ignored for such objects.
-
If the source bucket contains objects whose names end with a forward slash (/), they are treated as directories on the destination. For example, if the source contains the objects a/, a/b/, and a/b/c/, the nested directory a/b/c/ is created at the destination after migration.
-
You can migrate data from only one bucket per migration task. To migrate data from an entire account, you must create a separate task for each bucket.
-
Migration from Alibaba Finance Cloud or Alibaba Gov Cloud is not supported.
-
Migrating data from Alibaba Cloud OSS to a local file system does not preserve all object attributes:
-
Supported: LastModifyTime is migrated and set as ModifyTime on the local file system.
-
Unsupported (not exhaustive): x-oss-meta-*, Content-Type, Cache-Control, Content-Encoding, Content-Disposition, Content-Language, Expires, StorageClass, Acl, server-side encryption, tagging, and user-defined headers such as x-oss-persistent-header.
Note: The behavior of unlisted attributes is not guaranteed. Always verify your data at the destination after migration.
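The two directory-related limitations above can be illustrated with a short Python sketch. The helper names are hypothetical and are not part of Data Online Migration. The sketch makes two assumptions: when static website hosting is enabled, the scan reports every parent prefix of an object key; and any key ending in a forward slash becomes a directory on the destination, with its parents created as well.

```python
def implied_entries(key: str) -> list[str]:
    """Return each parent prefix of an object key, followed by the key itself.

    Hypothetical helper: with static website hosting enabled, uploading
    "myapp/resource/1.jpg" can also surface "myapp/" and "myapp/resource/".
    """
    parts = key.rstrip("/").split("/")
    parents = ["/".join(parts[:i]) + "/" for i in range(1, len(parts))]
    return parents + [key]


def directories_created(keys: list[str]) -> set[str]:
    """Return the local directories produced by keys that end with "/".

    Assumption: each slash-terminated key becomes a directory, and its
    parent directories are created too (similar to `mkdir -p`).
    """
    dirs: set[str] = set()
    for key in keys:
        if key.endswith("/"):
            dirs.update(implied_entries(key))
    return dirs


print(implied_entries("myapp/resource/1.jpg"))
# ['myapp/', 'myapp/resource/', 'myapp/resource/1.jpg']
print(sorted(directories_created(["a/", "a/b/", "a/b/c/"])))
# ['a/', 'a/b/', 'a/b/c/']
```

In the first output, the two prefix entries are the ones whose migration fails. Only the object myapp/resource/1.jpg itself migrates.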
Step 1: Select a region
-
Log in to the Data Online Migration console with the RAM user.
-
In the top-left corner of the top navigation bar, select the agent's region.
Important:
-
Tunnels, agents, data addresses, and migration tasks are region-specific. Choose your region carefully.
-
Select the agent's region. If unavailable, select the nearest region.
Step 2: Create a tunnel
-
In the left-side navigation pane, choose Data Online Migration > Channel Management, and then click Create Tunnel.
-
In the Create Tunnel dialog box, configure the following parameters and click OK.
Parameter
Required
Description
Name
Yes
The name of the tunnel.
-
The name cannot be empty and can be up to 100 characters in length.
-
The name can contain letters, digits, hyphens (-), and underscores (_).
Maximum Bandwidth
Yes
The maximum bandwidth that the tunnel can use.
-
If you do not configure this parameter, the default value 0 is used, which indicates that the bandwidth for the tunnel is not limited.
-
If you configure this parameter, enter a value based on the note in the console.
Important: The bandwidth available to the tunnel depends on the actual bandwidth of the network connection.
Requests/s
Yes
The maximum number of requests per second over the tunnel.
-
If you do not configure this parameter, the default value 0 is used, which indicates that the number of requests per second over the tunnel is not limited.
-
If you configure this parameter, enter a value based on the note in the console.
Warning: Evaluate the capabilities of the source storage system before you configure this parameter. A value that is too high may affect your business. Enter a value based on the note in the console.
-
Manage tunnels in Tunnel Management.
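Before you create a tunnel, you can check a proposed name and limits locally. The sketch below encodes the naming rules from the table above, along with the convention that 0 means unlimited. It assumes that "letters" means ASCII letters, and its function names are hypothetical.

```python
import re

# Tunnel names: 1 to 100 characters; letters, digits, hyphens (-), and underscores (_).
# Assumption: "letters" means ASCII letters only.
TUNNEL_NAME = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_tunnel_name(name: str) -> bool:
    """Return True if name satisfies the documented tunnel naming rules."""
    return TUNNEL_NAME.fullmatch(name) is not None


def describe_limit(value: int, unit: str) -> str:
    """Describe a Maximum Bandwidth or Requests/s value; 0 means no limit."""
    if value < 0:
        raise ValueError("limits cannot be negative")
    return "unlimited" if value == 0 else f"{value} {unit}"


print(is_valid_tunnel_name("tunnel-1"))  # True
print(is_valid_tunnel_name("tunnel 1"))  # False: spaces are not allowed
print(describe_limit(0, "requests/s"))   # unlimited
```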
Step 3: Create an agent
-
In the left-side navigation pane, choose Data Online Migration > Agent Management, and then click New Agent.
-
In the New Agent dialog box, configure the following parameters, and then click OK.
Parameter
Required
Description
Name
Yes
The agent's name.
-
The name must be 3 to 63 characters long.
-
The name can contain lowercase letters, digits, hyphens (-), and underscores (_). The name is case-sensitive.
-
The name must be UTF-8 encoded and cannot start with a hyphen (-) or an underscore (_).
Network Type
Yes
The network connection type for the agent. The following options are available:
-
VPC (Recommended): The agent connects to the Data Online Migration service over a VPC. This method requires the machine that hosts the agent to access the internal endpoint of Data Online Migration in the corresponding region. For example, if you use Data Online Migration in the China (Beijing) region, the agent machine must be able to access the internal endpoint {TunnelId}.cn-beijing.mgw-tc-internal.aliyuncs.com. We recommend using an ECS instance in the same region as the Data Online Migration console to deploy the agent.
-
Public network: The agent connects to the Data Online Migration service over the public network. This method requires the machine that hosts the agent to access the public endpoint of Data Online Migration in the corresponding region. For example, if you use Data Online Migration in the China (Beijing) region, the agent machine must be able to access the public endpoint {TunnelId}.cn-beijing.mgw-tc.aliyuncs.com.
Note:
-
{TunnelId} is a placeholder for the tunnel ID.
-
You can use the ping command to test the network connectivity between the agent and the Data Online Migration service.
Deployment method
Yes
The agent's deployment method. Currently, only standalone process deployment is supported.
Tunnel
Yes
The tunnel to associate with the agent. An agent can be associated with only one tunnel. The bandwidth of the agent is limited by the total bandwidth of the tunnel.
For example, a tunnel named tunnel-1 is configured with a maximum bandwidth of 10 Gbit/s. tunnel-1 is associated with three agents: agent-1, agent-2, and agent-3. The combined bandwidth of these three agents cannot exceed 10 Gbit/s. If agent-1 is allocated 3 Gbit/s of bandwidth, only 7 Gbit/s of bandwidth remains available for agent-2 and agent-3. Plan and allocate your bandwidth carefully.
-
Generate the agent deployment script. For instructions, see Generate agent deployment script.
Manage agents in Agent Management.
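Sharing a tunnel's bandwidth among its agents is simple arithmetic. The hypothetical helper below reproduces the tunnel-1 example. It assumes that all values use the same unit (Gbit/s here) and, following Step 2, treats a tunnel limit of 0 as unlimited.

```python
import math


def remaining_bandwidth(tunnel_limit: float, allocations: dict[str, float]) -> float:
    """Return the tunnel bandwidth left after the given per-agent allocations.

    tunnel_limit and allocations use the same unit (Gbit/s in this example).
    A tunnel_limit of 0 means the tunnel bandwidth is not limited.
    Raises ValueError if the agents together exceed the tunnel limit.
    """
    if tunnel_limit == 0:
        return math.inf
    used = sum(allocations.values())
    if used > tunnel_limit:
        raise ValueError(f"agents need {used} Gbit/s but the tunnel allows {tunnel_limit} Gbit/s")
    return tunnel_limit - used


# tunnel-1 allows 10 Gbit/s, and agent-1 is allocated 3 Gbit/s.
print(remaining_bandwidth(10, {"agent-1": 3}))  # 7 (left for agent-2 and agent-3)
```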
Step 4: Create a source address
-
In the left-side navigation pane, choose Data Online Migration > Address Management, and then click Create Address.
-
In the Create Address panel, configure the following parameters and click OK.
Parameter
Required
Description
Name
Yes
Enter a source name. Requirements:
-
3 to 63 characters in length.
-
Case-sensitive. Allows only lowercase letters, digits, hyphens (-), and underscores (_).
-
Cannot start with a hyphen (-) or underscore (_).
Type
Yes
Select OSS.
Region
Yes
Select the region of the source bucket, for example, China (Hangzhou).
RAM role
Yes
-
If the source bucket belongs to your Alibaba Cloud account:
-
If the source bucket belongs to a different Alibaba Cloud account:
Bucket
Yes
Enter the source bucket name.
Agent
No
Select one or more agents.
Important:
-
Required only for self-managed-to-cloud migration, or migration over a dedicated connection or VPN.
-
You can select up to 200 agents for a specified tunnel.
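Agent, address, and task names follow the same rules:

- 3 to 63 characters.
- Only lowercase letters, digits, hyphens, and underscores.
- No leading hyphen or underscore.

A minimal regular-expression check is sketched below. The function name is hypothetical.

```python
import re

# First character: lowercase letter or digit (not "-" or "_").
# Remaining 2 to 62 characters: lowercase letters, digits, "-", or "_".
# Total length is therefore 3 to 63 characters.
RESOURCE_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{2,62}")


def is_valid_resource_name(name: str) -> bool:
    """Check an agent, address, or task name against the documented rules."""
    return RESOURCE_NAME.fullmatch(name) is not None


for candidate in ["oss-src_01", "ab", "-src", "Src"]:
    print(candidate, is_valid_resource_name(candidate))
```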
Step 5: Create a destination address
-
In the left-side navigation pane, choose Data Online Migration > Address Management, and then click Create Address.
-
In the Create Address panel, configure the following parameters and click OK.
Parameter
Required
Description
Name
Yes
Enter a destination name. Requirements:
-
3 to 63 characters in length.
-
Case-sensitive. Allows only lowercase letters, digits, hyphens (-), and underscores (_).
-
Cannot start with a hyphen (-) or underscore (_).
Type
Yes
Select LocalFS.
Directory To Be Migrated
Yes
Path prefix for the destination directory. Source data is migrated to this location.
The path must be an absolute path that starts and ends with a forward slash (/). Environment variables and special characters are not supported.
For example, if the source data address has the prefix /example/src/ that contains the file example.jpg and you set this parameter to /example/dest/, the file is migrated to /example/dest/example.jpg.
Important: If a data address is associated with multiple agents, make sure that each agent can access the directory. Otherwise, some data may fail to migrate.
Tunnel
Yes
Select the tunnel to use.
Important:
-
Required only for self-managed-to-cloud migration, or migration over a dedicated connection or VPN.
-
An agent is required when the destination is a local file system (LocalFS) or when you migrate over a dedicated connection for services such as Finance Cloud or Apsara Stack.
Agent
Yes
Select one or more agents.
Important:
-
Required only for self-managed-to-cloud migration, or migration over a dedicated connection or VPN.
-
You can select up to 200 agents for a specified tunnel.
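The path rules and prefix mapping for Directory To Be Migrated can be sketched as follows. The function names are hypothetical. The documentation does not list which characters count as special, so this sketch assumes that only ASCII letters, digits, ".", "-", "_", and "/" are safe.

```python
import posixpath
import re

# Assumption: characters other than ASCII letters, digits, ".", "-", "_", and "/"
# are treated as special. The console may apply a different rule.
_SAFE_DIR = re.compile(r"/(?:[A-Za-z0-9._-]+/)*")


def validate_dest_dir(path: str) -> None:
    """Raise ValueError unless path looks like /example/dest/."""
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("path must start and end with '/'")
    # Shell-style references such as $HOME, %HOME%, or ~ are rejected.
    if any(marker in path for marker in ("$", "%", "~")):
        raise ValueError("environment variables are not supported")
    if _SAFE_DIR.fullmatch(path) is None:
        raise ValueError("path contains unsupported characters")


def destination_path(src_prefix: str, name: str, dest_dir: str) -> str:
    """Map a source file under src_prefix to its location under dest_dir."""
    validate_dest_dir(dest_dir)
    if not name.startswith(src_prefix):
        raise ValueError(f"{name!r} is not under the source prefix {src_prefix!r}")
    return posixpath.join(dest_dir, name[len(src_prefix):])


print(destination_path("/example/src/", "/example/src/example.jpg", "/example/dest/"))
# /example/dest/example.jpg
```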
Step 6: Create a migration task
-
In the left-side navigation pane, choose Data Online Migration > Migration Tasks, and then click Create Task.
-
On the Select Address page, configure the following parameters, and then click Next.
Parameter
Required
Description
Name
Yes
Enter a task name. Requirements:
-
3 to 63 characters in length.
-
Case-sensitive. Allows only lowercase letters, digits, hyphens (-), and underscores (_).
-
Cannot start with a hyphen (-) or underscore (_).
Source Address
Yes
Select a source address created earlier.
Destination Address
Yes
Select a destination address created earlier.
-
On the Task Configurations page, configure the following parameters.
Parameter
Required
Description
Basic settings
Migration Bandwidth
No
Bandwidth limit for migration.
-
Default: Uses the maximum available bandwidth. The actual migration speed depends on the file size and the number of files.
-
Specify an upper limit: Specify a bandwidth cap as prompted on the console.
Important:
-
Actual bandwidth depends on data source, file sizes, network conditions, and destination throttling. The limit may not be reached.
-
Evaluate your data source, destination, workloads, and network capacity before setting this value. Improper throttling may affect your business.
Files Migrated Per Second
No
The maximum number of files migrated per second.
-
Default: Uses the default migration rate.
-
Specify an upper limit: Specify an upper limit as prompted on the console.
Important:
-
Actual rate depends on data source, file sizes, network conditions, and destination throttling. The limit may not be reached.
-
Evaluate your data source, destination, workloads, and network capacity before setting this value. Improper throttling may affect your business.
Overwrite Mode
No
Behavior when a file with the same name exists at the destination.
-
Do not overwrite: Skips migrating the file.
-
Overwrite All: The source file overwrites the destination file.
-
Overwrite based on the last modification time:
-
The destination file is overwritten if the source file's last modified time is later.
-
If the last modified times are the same, the destination file is overwritten if the Size or Content-Type of the two files differs.
Warning:
-
The Overwrite based on the last modification time policy does not guarantee that an older file will never overwrite a newer one.
-
If you select Overwrite based on the last modification time, make sure that your source data can return metadata such as the last modified time, Size, and Content-Type. Otherwise, the overwrite policy may not work as expected and can lead to unintended migration results.
-
If you select Do not overwrite or Overwrite based on the last modification time, the service requests object metadata from both the source and destination to perform the comparison. This incurs request fees at both the source and destination.
Audit settings
Migration Report
Yes
Delivery method for the migration report.
-
Do not push (Default): The report is not pushed to the destination.
-
Push: The report is pushed to the destination LocalFS. The report path is described in Subsequent operations.
Important:
-
Pushing a migration report consumes storage space at the destination.
-
Report delivery may be delayed. Wait for the report to be generated.
-
Each task execution has a unique ID. Reports are pushed only once. Exercise caution when deleting reports.
Migration Logs
Yes
Migration log delivery method.
-
Do not push (Default): Not pushed.
-
Push: Pushes full migration logs to Log Service.
-
Push only file error logs: Pushes only error logs to Log Service.
If you select Push or Push only file error logs, the service creates a Log Service project named aliyun-oss-import-log-{Alibaba Cloud account ID}-{region}. Example: aliyun-oss-import-log-137918634953****-cn-hangzhou.
Important: Complete the following before you select Push or Push only file error logs. Otherwise, the migration task may fail.
-
Activate Log Service.
-
Grant the required permissions by clicking Authorize.
Authorize
No
This option appears only when Migration Logs is set to Push or Push only file error logs.
Click Authorize to go to the Cloud Resource Access Authorization page. The system creates a role named AliyunOSSImportSlsAuditRole and grants permissions to the role. Click Agree to Authorization to complete the authorization.
Filters
File Name
No
A filter for filenames.
Use Include and Exclude rules based on the RE2 library regular expression syntax (only a subset is supported). Examples:
-
.*\.jpg$ matches all files ending with .jpg.
-
^file.* matches all files in the root directory whose names start with file.
If the source address is configured with a prefix, for example data/to/oss/, you must use ^data/to/oss/file.* to match all files under that prefix whose names start with file.
-
.*/picture/.* matches any file within a subdirectory named picture at any level.
Important:
-
If you configure include rules, any file that matches at least one rule is migrated.
For example, consider two files: picture.jpg and picture.png. If you add an include rule .*\.jpg$, only picture.jpg is migrated. If you also add an include rule .*\.png$, both files are migrated.
-
If you configure exclude rules, any file that matches at least one rule is not migrated.
For example, consider two files: picture.jpg and picture.png. If you add an exclude rule .*\.jpg$, only picture.png is migrated. If you also add an exclude rule .*\.png$, neither file is migrated.
-
Exclude rules take precedence over include rules. A file is not migrated if it matches both an exclude rule and an include rule.
For example, consider the file file.txt. If you configure an exclude rule .*\.txt$ and an include rule file.*, the file file.txt is not migrated.
File Modification Time
No
Filter by last modified time.
Specify a time range to migrate only files modified within that range:
-
If you specify only a start time (for example, January 1, 2019) and no end time, only files last modified on or after January 1, 2019 are migrated.
-
If you specify only an end time (for example, January 1, 2022) and no start time, only files last modified on or before January 1, 2022 are migrated.
-
If you specify a start time of January 1, 2019 and an end time of January 1, 2022, only files last modified on or after January 1, 2019 and on or before January 1, 2022 are migrated.
Migrate special entities
No
Controls whether special entity types are migrated.
Symbolic link:
-
Enabled: Source symbolic links are added to the migration queue and counted in file and storage statistics. Corresponding symbolic link files are created at the destination, with attributes based on the migratable UserMeta attributes. The Target attribute depends on the Convert target option.
-
Disabled: Source symbolic links are ignored and excluded from file count and storage statistics.
Important: Objects that symbolic links point to are not migrated automatically. They are migrated only if they are included in the task scope.
Migration configuration
Convert target
No
Controls whether the Target attribute of source symbolic links is converted so that migrated links point to the correct target files.
Important:
-
This option takes effect only when symbolic link migration is enabled.
-
Regardless of whether the target is converted, the migration service does not check whether the target file exists, whether its type is valid, or whether you have the required access permissions.
Enabled: A string replacement is performed. If the Target attribute of the source symbolic link starts with the source prefix (SrcPrefix), the prefix is replaced with the destination prefix (DestPrefix), and the result is set as the Target attribute of the destination symbolic link.
Note: For example, assume that a migration task is configured with SrcPrefix="cloud_base/" and DestPrefix="/mnt/nas1/", and that a source symbolic link object exists at cloud_base/links/a.lnk.
-
If its Target attribute is "cloud_base/data/a.txt", the prefix matches. The final Target at the destination is "/mnt/nas1/data/a.txt".
-
If its Target attribute is "cloud_outer/data/a.txt", the prefix does not match. The final Target at the destination remains "cloud_outer/data/a.txt".
Disabled: No conversion is performed. The original Target attribute of the source symbolic link is set on the destination symbolic link.
Preserve last modification time
Yes
Controls whether the last modification time of the source file is preserved.
-
Preserve (Default): The last modification time of the source file is set on the destination file.
-
Do not preserve: The last modification time is not preserved.
Task scheduling
Execution time
No
Important:
-
If a task is still running when its next execution is scheduled, the current run continues, the scheduled run is skipped, and the task runs at the next scheduled time.
-
Concurrent migration task limit: Up to 10 in Chinese mainland and China (Hong Kong) regions, and up to 5 in other regions.
Specify when to run the task.
-
Immediately: Runs the task immediately.
-
At the Specified Time: Sets a daily time window for the task to run. By default, the task starts at the specified start time and pauses at the specified stop time.
-
Periodic Scheduling: Runs the task based on a specified frequency and number of executions.
-
Execution Frequency: Supported frequencies are Hourly, Daily, Weekly, Specific days of the week, and Custom. For details, see Execution frequency.
-
Number of Executions: Specifies the number of times the task runs. If not set, the task runs once by default. For the maximum number of executions, refer to the prompt on the console.
Important: You can manually start or pause the task at any time, regardless of the scheduled execution time.
-
Read the Online Migration Service Agreement, select the checkbox for I have understood and confirmed the compliance commitment statement, and I acknowledge my obligation and responsibility to verify the consistency of migrated data after the migration task is completed, and then click Next.
-
Review the configuration, click OK, and wait for the migration task to run.
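The filter and overwrite rules in the Task Configurations table combine into a per-file decision. The Python sketch below is a simplified model for reasoning about a configuration. It is not the service's implementation, and the class and function names are hypothetical. It also makes these assumptions:

- Python's re module stands in for RE2; the two agree on simple patterns like the examples above.
- Rules are matched anywhere in the file name, as with re.search.
- Time-range bounds are inclusive.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional, Sequence


class OverwriteMode(Enum):
    DO_NOT_OVERWRITE = "Do not overwrite"
    OVERWRITE_ALL = "Overwrite All"
    BY_LAST_MODIFIED = "Overwrite based on the last modification time"


@dataclass
class FileMeta:
    name: str  # file name relative to the bucket, e.g. "data/a.jpg"
    last_modified: datetime
    size: int
    content_type: str


def passes_name_filter(name: str, includes: Sequence[str], excludes: Sequence[str]) -> bool:
    # Exclude rules take precedence: matching any one of them skips the file.
    if any(re.search(rule, name) for rule in excludes):
        return False
    # With include rules configured, the file must match at least one of them.
    return not includes or any(re.search(rule, name) for rule in includes)


def passes_time_filter(mtime: datetime, start: Optional[datetime] = None,
                       end: Optional[datetime] = None) -> bool:
    # Both bounds are inclusive ("on or after" and "on or before").
    return (start is None or mtime >= start) and (end is None or mtime <= end)


def should_write(src: FileMeta, dst: Optional[FileMeta], mode: OverwriteMode) -> bool:
    if dst is None:
        return True  # no file with the same name exists at the destination
    if mode is OverwriteMode.DO_NOT_OVERWRITE:
        return False
    if mode is OverwriteMode.OVERWRITE_ALL:
        return True
    # BY_LAST_MODIFIED: a newer source wins; on a tie, compare Size and Content-Type.
    if src.last_modified != dst.last_modified:
        return src.last_modified > dst.last_modified
    return src.size != dst.size or src.content_type != dst.content_type


def should_migrate(src: FileMeta, dst: Optional[FileMeta], mode: OverwriteMode,
                   includes: Sequence[str] = (), excludes: Sequence[str] = (),
                   start: Optional[datetime] = None, end: Optional[datetime] = None) -> bool:
    return (passes_name_filter(src.name, includes, excludes)
            and passes_time_filter(src.last_modified, start, end)
            and should_write(src, dst, mode))


txt = FileMeta("file.txt", datetime(2020, 1, 1), 10, "text/plain")
print(should_migrate(txt, None, OverwriteMode.OVERWRITE_ALL,
                     includes=[r"file.*"], excludes=[r".*\.txt$"]))  # False: the exclude rule wins
```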
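The Convert target option is described as a prefix replacement on a symbolic link's Target attribute. The minimal sketch below uses hypothetical names and reproduces the cloud_base/ example from the task configuration table.

```python
def convert_target(target: str, src_prefix: str, dest_prefix: str,
                   convert: bool = True) -> str:
    """Return the Target to set on the destination symbolic link.

    With conversion enabled, a Target that starts with src_prefix has that
    prefix replaced by dest_prefix; any other Target is copied unchanged.
    The target's existence, type, and permissions are not checked.
    """
    if convert and target.startswith(src_prefix):
        return dest_prefix + target[len(src_prefix):]
    return target


SRC_PREFIX, DEST_PREFIX = "cloud_base/", "/mnt/nas1/"
print(convert_target("cloud_base/data/a.txt", SRC_PREFIX, DEST_PREFIX))   # /mnt/nas1/data/a.txt
print(convert_target("cloud_outer/data/a.txt", SRC_PREFIX, DEST_PREFIX))  # cloud_outer/data/a.txt
```

To create such a link yourself on a local file system, you could pass the result to os.symlink(target, link_path). As the table notes, nothing checks that the target exists.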
Execution frequency
Execution frequency | Description | Example |
Hourly | Run the task once every hour. You can use this option with the maximum number of runs. | The current time is 8:05. The frequency is set to hourly with a maximum of 3 runs. The first run starts at the next hour, 9:00. |
Daily | Run the task once a day. You must specify an hour (0-23) for the task to start. You can use this option with the maximum number of runs. | The current time is 8:05. The task is scheduled to run daily at 10:00, with a maximum of 5 runs. The first run starts at 10:00 today. |
Weekly | Run the task once a week. You must specify a day of the week and an hour (0-23) for the task to start. You can use this option with the maximum number of runs. | The current time is Monday, 8:05. The task is scheduled to run every Monday at 10:00, with a maximum of 10 runs. The first run starts at 10:00 today. |
Specific days of the week | Run the task on selected days of the week. You must specify the days and an hour (0-23) for the task to start. | The current time is Wednesday, 8:05. The task is scheduled to run on Mondays, Wednesdays, and Fridays at 10:00. The first run starts at 10:00 today. |
Custom | Use a cron expression to define a custom schedule for the task start time. | Note: A cron expression consists of six space-separated fields that define the execution schedule: second, minute, hour, day of the month, month, and day of the week. The minimum interval is 1 hour. The following cron expression examples are for reference only. For more options, use a cron expression generator. |
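The first-run times in the table can be reproduced with a small scheduling sketch. This is a hypothetical model, not the service's scheduler. It uses naive local datetimes, treats a start time exactly equal to the current time as already passed, and uses Python weekday numbers (Monday is 0). It does not cover Custom cron schedules.

```python
from datetime import datetime, timedelta
from typing import Iterable


def first_run(now: datetime, frequency: str, hour: int = 0,
              weekdays: Iterable[int] = ()) -> datetime:
    """Return the first start time of a new periodic task.

    frequency: "hourly", "daily", "weekly", or "days" (specific days of the week).
    weekdays: Python weekday numbers (Monday=0 ... Sunday=6); "weekly" takes exactly one.
    """
    if frequency == "hourly":
        # The first run starts at the top of the next hour.
        return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if frequency == "daily":
        allowed = set(range(7))
    elif frequency in ("weekly", "days"):
        allowed = set(weekdays)
        if not allowed or not allowed <= set(range(7)):
            raise ValueError("weekdays must be values from 0 to 6")
        if frequency == "weekly" and len(allowed) != 1:
            raise ValueError("weekly scheduling takes exactly one day")
    else:
        raise ValueError(f"unsupported frequency: {frequency}")
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Checking today plus the next 7 days always finds a matching day.
    for _ in range(8):
        if candidate > now and candidate.weekday() in allowed:
            return candidate
        candidate += timedelta(days=1)
    raise RuntimeError("unreachable: no matching day found")


# Monday, January 1, 2024, 8:05 stands in for the table's "current time".
now = datetime(2024, 1, 1, 8, 5)
print(first_run(now, "hourly"))                         # 2024-01-01 09:00:00
print(first_run(now, "daily", hour=10))                 # 2024-01-01 10:00:00
print(first_run(now, "weekly", hour=10, weekdays=[0]))  # 2024-01-01 10:00:00
```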
Step 7: Verify data
Online Migration Service does not guarantee data consistency or integrity. Validate all migrated data after the task completes.
After migration completes, verify destination data integrity. You are solely responsible for data loss if you delete source data before confirming destination data integrity.
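One way to perform this check is to compare the destination directory against a listing of the source. The sketch below assumes that you have already exported a manifest from the source bucket, for example with the OSS SDK or ossutil. The manifest is a hypothetical mapping of relative path to expected size and, optionally, MD5. The sketch reports missing files and any size or MD5 mismatches. OSS ETags are not a reliable content MD5 for every object (multipart uploads, for example), so the MD5 value is optional.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_destination(dest_dir: str,
                       manifest: dict[str, tuple[int, Optional[str]]]) -> dict[str, list[str]]:
    """Compare local files against a manifest of expected sizes and optional MD5 digests.

    manifest maps a path relative to dest_dir (for example "example.jpg") to
    (size_in_bytes, md5_hex_or_None). Returns the relative paths that are
    missing or whose size or MD5 differs.
    """
    root = Path(dest_dir)
    problems: dict[str, list[str]] = {"missing": [], "size_mismatch": [], "md5_mismatch": []}
    for rel_path, (size, md5) in sorted(manifest.items()):
        local = root / rel_path
        if not local.is_file():
            problems["missing"].append(rel_path)
        elif local.stat().st_size != size:
            problems["size_mismatch"].append(rel_path)
        elif md5 is not None and file_md5(local) != md5.lower():
            problems["md5_mismatch"].append(rel_path)
    return problems
```

For example, verify_destination("/example/dest/", manifest) returns three empty lists only when every expected file is present with the expected size and digest. Delete source data only after that result is clean.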