Migrate data to OSS with ossimport


ossimport migrates data from local storage, third-party storage services, or Object Storage Service (OSS) to OSS buckets in any region. This topic walks through a 500 TB migration from Tencent COS to OSS in distributed mode.

Example scenario

You have 500 TB of data in Tencent Cloud Object Storage (COS) in the Guangzhou region and need to migrate it to an OSS bucket in the China (Hangzhou) region within one week without disrupting your business.

Deployment modes

ossimport supports two deployment modes:

| Mode | Data volume | Use case |
| --- | --- | --- |
| Standalone mode | Less than 30 TB | Small-scale migrations |
| Distributed mode | More than 30 TB | Large-scale migrations |

This 500 TB scenario requires distributed mode.

Note

Data Online Migration offers a simpler alternative to ossimport. For more information, see Background information.

Prerequisites

Ensure that you have:

  • Activated OSS and created a bucket in the China (Hangzhou) region. For more information, see Activate OSS and Create buckets.

  • Created a Resource Access Management (RAM) user with OSS access permissions and obtained its AccessKey pair. For more information, see Preparations.

  • (Optional) Purchased Elastic Compute Service (ECS) instances in the same region as the OSS bucket. On-premises machines are sufficient for small deployments; use ECS instances for large ones. This example uses general-purpose instance families (g series). If you plan to release the instances after the migration, choose a billing method suited to short-term use, such as pay-as-you-go.

  • Configured ossimport in distributed mode on the ECS instances, including conf/job.cfg, conf/sys.properties, and concurrency settings. For more information, see Overview (discontinued) and Distributed deployment.

Calculate the ECS instance count

Use this formula:

Number of ECS instances = X / Y / (Z / 100)

| Variable | Description |
| --- | --- |
| X | Data to migrate (TB) |
| Y | Migration duration (days) |
| Z | Throughput per ECS instance (Mbit/s) |

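Each instance migrates about Z/100 TB per day. This is a conservative round figure: a link saturated at Z Mbit/s carries Z × 86,400 / 8 / 1,000,000 ≈ Z/93 TB per day. At 200 Mbit/s, that is about 2 TB per day.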

Example: 500 TB in 7 days at 200 Mbit/s per instance:

500 / 7 / (200 / 100) ≈ 35.7, rounded up to 36 ECS instances
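The sizing rule can be expressed as a small Python sketch. It uses the same Z/100 TB-per-day approximation as the formula and rounds the result up, because a fractional instance cannot be purchased. The function and parameter names are illustrative, not part of ossimport.

```python
import math


def ecs_instance_count(data_tb: float, days: float, mbps_per_instance: float) -> int:
    """Estimate ECS instances needed: X / Y / (Z / 100), rounded up.

    data_tb: X, data to migrate (TB)
    days: Y, migration window (days)
    mbps_per_instance: Z, sustained throughput of one instance (Mbit/s)
    """
    tb_per_instance_per_day = mbps_per_instance / 100  # approximation used by the formula
    raw = data_tb / days / tb_per_instance_per_day
    return math.ceil(raw)  # a partial instance still needs a whole instance


print(ecs_instance_count(500, 7, 200))  # the 500 TB / 7 days / 200 Mbit/s example
```

If measured throughput per instance turns out lower than planned, rerunning the function with the observed Z gives a revised instance count.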

Data flow

Distributed mode transfers data in two stages:

  1. ossimport pulls data from COS (Guangzhou) to ECS in China (Hangzhou) over the internet.

  2. ossimport pushes data from ECS to OSS over the internal network (same region).

Migration architecture diagram

Fees

Migration costs include:

| Fee type | Description |
| --- | --- |
| Source access fees | Charged by the source provider |
| Destination access fees | OSS request and storage charges |
| Outbound traffic fees | Egress bandwidth charged by the source provider |
| ECS instance fees | Compute costs during migration |
| Data storage fees | Scales with migration duration for volumes over 1 TB |

More ECS instances shorten migration time and can reduce total outbound traffic and storage costs.
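To see how the instance count drives the migration window, the instance formula can be inverted to estimate duration. This Python sketch reuses the Z/100 TB-per-day approximation and assumes throughput scales linearly with the number of instances, which holds only while neither the source egress nor the network becomes the bottleneck. The function name is hypothetical.

```python
import math


def migration_days(data_tb: float, instances: int, mbps_per_instance: float) -> float:
    """Invert X / Y / (Z / 100) to get Y: days needed for a given instance count.

    Assumes aggregate throughput grows linearly with the instance count.
    """
    tb_per_day = instances * mbps_per_instance / 100  # aggregate throughput, TB/day
    return data_tb / tb_per_day


for n in (18, 36, 72):
    days = migration_days(500, n, 200)
    print(f"{n} instances: about {math.ceil(days)} days")
```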

Procedure

Step 1: Migrate historical data

Migrate all data last modified before time T1.

For more information, see Migration.

Important

T1 is a UNIX timestamp (seconds since January 1, 1970, 00:00:00 UTC). Generate it with:

date +%s
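If you record T1 from a script instead of the shell, Python's standard library returns the same value as date +%s. The sketch also converts a recorded timestamp back to a readable UTC date, which helps when noting T1 and T2 in a runbook. The helper names are illustrative.

```python
import time
from datetime import datetime, timezone


def current_unix_timestamp() -> int:
    """Seconds since 1970-01-01 00:00:00 UTC, equivalent to `date +%s`."""
    return int(time.time())


def unix_to_utc(ts: int) -> str:
    """Render a UNIX timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


t1 = current_unix_timestamp()
print(t1, unix_to_utc(t1))
```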

Step 2: Configure mirroring-based back-to-origin rules

During migration, new data continues to arrive at the source. Configure mirroring-based back-to-origin rules on the destination bucket so that, when a requested object does not yet exist in OSS, OSS fetches it from the source automatically.

For more information, see Overview.
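Conceptually, a back-to-origin rule turns a miss on the destination into a fetch from the source. The Python sketch below models that read path with in-memory dictionaries standing in for the two buckets. It is a simplified illustration of the behavior, not the OSS API or SDK, and the class and method names are hypothetical.

```python
from typing import Dict, Optional


class MirroredBucket:
    """Toy model of a destination bucket with a mirroring-based back-to-origin rule."""

    def __init__(self, origin: Dict[str, bytes]) -> None:
        self.objects: Dict[str, bytes] = {}  # objects already in the destination
        self.origin = origin                 # stands in for the source (for example, COS)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        if key in self.objects:
            return self.objects[key]         # hit: serve from the destination
        data = self.origin.get(key)
        if data is not None:
            self.objects[key] = data         # miss: fetch from the origin and keep a copy
        return data                          # None if the origin lacks it too


source = {"img/a.jpg": b"old", "img/new.jpg": b"written after T1"}
dest = MirroredBucket(origin=source)
dest.put("img/a.jpg", b"old")                # already migrated in Step 1
print(dest.get("img/new.jpg"))               # fetched from the source on first read
```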

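Step 3: Switch your business system to OSS

Switch all read/write operations to OSS. Record this time as T2. Objects that have not been migrated yet are still served through the back-to-origin rules configured in Step 2.

Step 4: Migrate incremental data

Set importSince to T1 in job.cfg and restart the migration task to migrate data added or modified between T1 and T2.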
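The importSince change for the incremental pass can be scripted. The Python sketch below assumes job.cfg stores one key=value pair per line (properties style) and rewrites only the importSince line; check the layout of your own job.cfg before relying on it. The function name and the sample text are hypothetical.

```python
import re


def set_import_since(cfg_text: str, t1: int) -> str:
    """Return cfg_text with the importSince value replaced by t1.

    Assumes a line of the form `importSince=<seconds>`; raises if none is found.
    """
    pattern = re.compile(r"^(\s*importSince\s*=).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{t1}", cfg_text)
    if count == 0:
        raise ValueError("importSince not found in job.cfg")
    return new_text


sample = "# hypothetical job.cfg excerpt\nimportSince=0\n"
print(set_import_since(sample, 1700000000))
```

Writing the result back to conf/job.cfg and restarting the task is left to your normal ossimport workflow.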

Note

After the switchover:

  • All read/write operations use OSS.

  • Third-party storage data becomes a historical copy. Retain or delete it as needed.

  • ossimport migrates and verifies data but does not delete source data.

References