Migrate data to OSS with ossimport
ossimport migrates local, third-party, or Object Storage Service (OSS) data to any OSS region. This topic walks through a 500 TB migration from Tencent COS to OSS in distributed mode.
Example scenario
You have 500 TB in Tencent Cloud Object Storage (COS) in Guangzhou and need to migrate it to an OSS bucket in China (Hangzhou) within one week without disrupting business.
Deployment modes
ossimport supports two deployment modes:
| Mode | Data volume | Use case |
| --- | --- | --- |
| Standalone mode | Less than 30 TB | Small-scale migrations |
| Distributed mode | More than 30 TB | Large-scale migrations |
This 500 TB scenario requires distributed mode.
Data Online Migration offers a simpler alternative. For details, see Background information.
Prerequisites
Ensure that you have:
- Activated OSS (Activate OSS) and created a bucket (Create buckets) in China (Hangzhou).
- Created a Resource Access Management (RAM) user with OSS access permissions and obtained its AccessKey pair. For details, see Preparations.
- (Optional) Purchased Elastic Compute Service (ECS) instances in the same region as the OSS bucket. Use on-premises machines for small deployments or ECS instances for large ones. This example uses general-purpose instance families (g series). If you plan to release the instances after migration, select a billing method that fits that plan.
- Configured ossimport in distributed mode on the ECS instances, including conf/job.cfg, conf/sys.properties, and concurrency settings. For details, see Overview (discontinued) and Distributed deployment.
Calculate the ECS instance count
Use this formula:
Number of ECS instances = X / Y / (Z / 100)
| Variable | Description |
| --- | --- |
| X | Data to migrate (TB) |
| Y | Migration duration (days) |
| Z | Throughput per ECS instance (Mbit/s) |
Each instance migrates about Z/100 TB per day. At 200 Mbit/s, that is about 2 TB/day.
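The Z/100 rule of thumb can be checked with a direct unit conversion. The following is an illustrative sketch only: the function name `tb_per_day` is hypothetical, and it assumes decimal units (1 TB = 10^12 bytes) and a link that sustains the full rate for 24 hours.

```python
SECONDS_PER_DAY = 86_400


def tb_per_day(mbit_per_s: float) -> float:
    """Convert a sustained throughput in Mbit/s to TB transferred per day."""
    bytes_per_s = mbit_per_s * 1_000_000 / 8      # bits per second -> bytes per second
    return bytes_per_s * SECONDS_PER_DAY / 1e12   # bytes per day -> TB (decimal)


print(round(tb_per_day(200), 2))  # 2.16
```

Under these assumptions the theoretical ceiling at 200 Mbit/s is about 2.16 TB/day, so Z/100 = 2 TB/day is a slightly conservative planning figure.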
Example: 500 TB in 7 days at 200 Mbit/s per instance:
500 / 7 / (200 / 100) ≈ 35.7, rounded up to 36 ECS instances
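The same calculation can be scripted so that different deadlines or bandwidths are easy to compare. This is a minimal sketch of the formula above; `required_instances` is a hypothetical helper name, and rounding up with `math.ceil` is an assumption based on the fact that a fractional instance cannot be provisioned.

```python
import math


def required_instances(data_tb: float, days: float, mbit_per_s: float) -> int:
    """Number of ECS instances = X / Y / (Z / 100), rounded up."""
    if data_tb < 0 or days <= 0 or mbit_per_s <= 0:
        raise ValueError("data_tb must be >= 0; days and mbit_per_s must be > 0")
    tb_per_instance_per_day = mbit_per_s / 100  # rule of thumb from this topic
    return math.ceil(data_tb / days / tb_per_instance_per_day)


print(required_instances(500, 7, 200))  # 36
```

For instance, halving the per-instance throughput to 100 Mbit/s doubles the daily requirement per instance, and the same call with `mbit_per_s=100` returns 72.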
Data flow
Distributed mode transfers data in two stages:
- ossimport pulls data from COS (Guangzhou) to ECS in China (Hangzhou) over the internet.
- ossimport pushes data from ECS to OSS over the internal network (same region).
Fees
Migration costs include:
| Fee type | Description |
| --- | --- |
| Source access fees | Charged by the source provider |
| Destination access fees | OSS request and storage charges |
| Outbound traffic fees | Egress bandwidth charged by the source provider |
| ECS instance fees | Compute costs during migration |
| Data storage fees | Scales with migration duration for volumes over 1 TB |
More ECS instances shorten migration time and can reduce total outbound traffic and storage costs.
Procedure
Step 1: Migrate historical data
Migrate all data last modified before time T1.
T1 is a UNIX timestamp (seconds since January 1, 1970, 00:00:00 UTC). Generate it with:
date +%s
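`date +%s` returns the current time. If T1 should instead be a specific cutoff, the timestamp can be derived from a date. The sketch below uses only the Python standard library; the helper name `to_unix_seconds` and the example cutoff date are illustrative assumptions, not values from this scenario.

```python
from datetime import datetime, timezone


def to_unix_seconds(moment: datetime) -> int:
    """Return whole seconds since 1970-01-01 00:00:00 UTC for an aware datetime."""
    if moment.tzinfo is None:
        # Naive datetimes are interpreted in local time, which is easy to get wrong.
        raise ValueError("use a timezone-aware datetime")
    return int(moment.timestamp())


# Example cutoff only; substitute the real T1 for your migration.
cutoff = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
print(to_unix_seconds(cutoff))  # 1704067200
```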
Step 2: Configure mirroring-based back-to-origin rules
During migration, new data continues arriving at the source. Configure mirroring-based back-to-origin rules on the destination bucket to automatically fetch missing objects.
Step 3: Migrate incremental data
Set importSince to T1 in job.cfg and restart the task to migrate data created or modified between T1 and T2, where T2 is the switchover time recorded in Step 4.
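Updating importSince by hand is simple, but a small script avoids typos when the task is restarted more than once. The following is a hedged sketch, assuming job.cfg consists of plain `key=value` lines; `set_import_since` is a hypothetical helper, and the sample configuration text contains a placeholder `jobName` entry for illustration only.

```python
import re

# Matches an existing "importSince=..." line, keeping the key part in group 1.
_IMPORT_SINCE = re.compile(r"^([ \t]*importSince[ \t]*=).*$", re.MULTILINE)


def set_import_since(cfg_text: str, t1: int) -> str:
    """Return cfg_text with importSince set to t1, appending the key if absent."""
    new_text, count = _IMPORT_SINCE.subn(lambda m: f"{m.group(1)}{t1}", cfg_text)
    if count == 0:
        separator = "" if (not cfg_text or cfg_text.endswith("\n")) else "\n"
        new_text = f"{cfg_text}{separator}importSince={t1}\n"
    return new_text


sample = "jobName=demo\nimportSince=0\n"  # placeholder content, not a real job.cfg
print(set_import_since(sample, 1704067200))
```

The function works on text rather than on a file path so that the result can be reviewed before it is written back to conf/job.cfg.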
Step 4: Switch your business system to OSS
Switch all read/write operations to OSS. Record this time as T2.
After the switchover:
- All read/write operations use OSS.
- Third-party storage data becomes a historical copy. Retain or delete it as needed.
- ossimport migrates and verifies data but does not delete source data.
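The T1 and T2 boundaries used in the procedure can be summarized as a small classifier. This sketch is for reasoning about the timeline only and is not part of ossimport; the function name, the phase labels, and the treatment of objects modified exactly at T1 or T2 are assumptions.

```python
def migration_phase(last_modified: int, t1: int, t2: int) -> str:
    """Classify an object by its last-modified UNIX timestamp.

    historical      -> modified before T1, covered by Step 1
    incremental     -> modified in [T1, T2), covered by the importSince=T1 run
    post-switchover -> modified at or after T2, written directly to OSS
    """
    if t1 > t2:
        raise ValueError("T1 must not be later than T2")
    if last_modified < t1:
        return "historical"
    if last_modified < t2:
        return "incremental"
    return "post-switchover"


print(migration_phase(1_700_000_500, t1=1_700_000_000, t2=1_700_001_000))  # incremental
```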