Full and incremental data synchronization

This topic covers the use cases, features, benefits, and limitations of full and incremental data synchronization and explains how to create data synchronization tasks.

Use cases

  • Major version upgrades: for example, from HBase 1.x to HBase 2.x.

  • Cross-region migration: for example, from the China (Qingdao) region to the China (Beijing) region.

  • Cross-account migration: for example, migrating data from a cluster in one account to a cluster in another account.

  • Cluster upgrades: for example, migrating from a 4-core, 8 GB cluster to an 8-core, 16 GB cluster.

  • Workload splitting: for example, moving a portion of your workloads to a new cluster.

Features

  • Supports zero-downtime migration between any of the following versions: HBase 0.94, HBase 0.98, HBase 1.x, HBase 2.x, and Lindorm.

  • Supports table schema migration, real-time data replication, and full data migration.

  • Supports migration at the database, namespace, and table levels.

  • Supports renaming tables during migration.

  • Allows you to specify a time range, row key range, or specific columns for migration.

  • Provides an OpenAPI that you can use to create migration tasks.

Benefits

  • Achieves zero-downtime data migration by handling both historical data migration and real-time incremental data synchronization in a single task.

  • The migration process reads data directly from the HDFS of the source cluster without interacting with the HBase service, minimizing the impact on your online services.

  • File-level data copying is more efficient, typically reducing data transfer by over 50% compared to API-level migration.

  • A single node can migrate data at up to 150 MB/s. You can horizontally scale the number of nodes to support stable migration of terabytes or petabytes of data.

  • Includes a robust error retry mechanism, real-time monitoring of task speed and progress, and alerts for task failures.

  • Automatically synchronizes the schema to ensure partition consistency.
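
The per-node rate above can be turned into a rough planning estimate. The following is a minimal sketch, assuming that throughput scales linearly with the number of nodes and that every node sustains the peak rate of 150 MB/s; neither assumption is guaranteed by the service, and the function names are hypothetical.

```python
import math

NODE_THROUGHPUT_MB_S = 150  # peak per-node rate stated in the Benefits list


def estimate_migration_hours(data_tb: float, nodes: int,
                             efficiency: float = 1.0) -> float:
    """Rough lower bound on migration time, in hours.

    Assumes linear scaling across nodes (an assumption, not a documented
    guarantee) and 1 TB = 1024 * 1024 MB. Use efficiency < 1.0 to model
    nodes that run below the peak rate.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    total_mb = data_tb * 1024 * 1024
    rate_mb_s = NODE_THROUGHPUT_MB_S * nodes * efficiency
    return total_mb / rate_mb_s / 3600


def nodes_needed(data_tb: float, deadline_hours: float) -> int:
    """Smallest node count that meets the deadline at the peak rate."""
    one_node_hours = estimate_migration_hours(data_tb, 1)
    return max(1, math.ceil(one_node_hours / deadline_hours))


if __name__ == "__main__":
    print(f"10 TB on 1 node: {estimate_migration_hours(10, 1):.1f} h")
    print(f"Nodes for 10 TB in 5 h: {nodes_needed(10, 5)}")
```

Treat the result as a best case: real throughput also depends on network bandwidth and on the load of the source HDFS.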

Limitations

  • Clusters with Kerberos enabled are not supported.

  • Single-node ApsaraDB for HBase instances are not supported.

  • ApsaraDB for HBase instances in the classic network are not supported due to network limitations.

  • Incremental data synchronization is implemented asynchronously based on HBase write-ahead logging (WAL). Data imported using BulkLoad or written without a WAL will not be synchronized by Lindorm Tunnel Service (LTS).

Log management for incremental synchronization

  • After you enable incremental synchronization, if data is not consumed, logs are retained for 48 hours by default. After this retention period, the subscription is automatically canceled, and the retained data is deleted.

  • Data might not be consumed in the following scenarios: the Lindorm Tunnel Service (LTS) cluster is released before the task is terminated, the synchronization task is suspended, or the task is blocked by an error.
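
To see how much time remains before retained logs are deleted, the 48-hour default can be applied to the time at which consumption stopped. This is a minimal sketch, assuming the retention clock starts when data was last consumed; the exact start of the retention period is not specified here, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION = timedelta(hours=48)  # default retention stated above


def retention_deadline(last_consumed: datetime,
                       retention: timedelta = DEFAULT_RETENTION) -> datetime:
    """Time after which unconsumed logs are expected to be deleted.

    Assumption: the clock starts at the moment data was last consumed.
    """
    return last_consumed + retention


def time_left(last_consumed: datetime, now: datetime,
              retention: timedelta = DEFAULT_RETENTION) -> timedelta:
    """Remaining time before the deadline, never negative."""
    remaining = retention_deadline(last_consumed, retention) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    stopped = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    print("Deadline:", retention_deadline(stopped))
    print("Time left:", time_left(stopped, now))
```

If a task is suspended or blocked by an error, resolve the problem well before this deadline, because the subscription is canceled once the retention period ends.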

Usage notes

  • Before migration, ensure that the target cluster has sufficient HDFS capacity to prevent storage exhaustion during the process.

  • Before you start an incremental data synchronization task, increase the log retention period on the source cluster. This provides extra time to handle potential errors. Set the hbase.master.logcleaner.ttl parameter in hbase-site.xml to more than 12 hours and restart the HMaster. The parameter value is specified in milliseconds, so 12 hours corresponds to 43200000.

  • You do not need to create tables in the target cluster. Lindorm Tunnel Service (LTS) automatically creates tables that match the schema and partition information of the source cluster. If you manually create tables, their partitions might be inconsistent with the source tables, leading to frequent splits and compactions after migration, which can be time-consuming for large tables.

  • If a source table uses a coprocessor, ensure that the target cluster contains the corresponding coprocessor JAR file before creating the target table.
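
The hbase.master.logcleaner.ttl parameter takes a value in milliseconds, so the retention period in hours must be converted before it is written to hbase-site.xml. The following sketch performs the conversion and prints the property entry. The 24-hour value is only an illustration of a setting above the 12-hour minimum, and the helper names are hypothetical.

```python
from xml.sax.saxutils import escape


def hours_to_ms(hours: float) -> int:
    """Convert hours to milliseconds, the unit HBase expects for this TTL."""
    return int(hours * 60 * 60 * 1000)


def hbase_property(name: str, value) -> str:
    """Render one <property> entry in the hbase-site.xml format."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(str(value))}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # 24 hours is an illustrative choice, not a recommended value.
    ttl_ms = hours_to_ms(24)
    print(hbase_property("hbase.master.logcleaner.ttl", ttl_ms))
```

Place the printed entry inside the configuration element of hbase-site.xml on the source cluster, then restart the HMaster for the change to take effect.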

Before you begin

  1. Verify network connectivity between the source cluster, the target cluster, and Lindorm Tunnel Service (LTS).

  2. Add the HBase and Lindorm data sources.

  3. Log on to the Lindorm Tunnel Service (LTS) console.

Create a task

  1. In the left-side navigation pane, choose Lindorm/HBase Migration > Quick Migration.

  2. Click Create Task and configure the following settings:

  • Task name: Optional. The name can contain only letters and digits. If you leave this field empty, the task ID is used as the name.

  • Source Cluster and Target Cluster: Select the source cluster and the target cluster.

  • Select the required operations:

    • Table Schema Migration: Creates tables in the target cluster with the same schema and partition information as the source tables.

    • Real-time Data Replication: Synchronizes incremental data from the source cluster in real time.

    • Historical Data Migration: Migrates all existing data at the file level.

  • Table Mapping: Enter the names of the tables to migrate, separated by line breaks.

  • Advanced Configuration: Optional.
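
Two of the fields above have simple format rules that can be checked before you open the console: the task name allows only letters and digits, and the table mapping lists one table per line. This is a minimal sketch of such a check, assuming that "letters and digits" means ASCII characters. The helper names and the sample table names are hypothetical, and any extra syntax that the Table Mapping field accepts is not covered.

```python
import re

# Empty names are allowed: the task ID is then used as the name.
_TASK_NAME = re.compile(r"[A-Za-z0-9]*")


def is_valid_task_name(name: str) -> bool:
    """True if the name is empty or contains only ASCII letters and digits."""
    return _TASK_NAME.fullmatch(name) is not None


def build_table_mapping(tables) -> str:
    """Join table names with line breaks, as the Table Mapping field expects.

    Blank entries and duplicates are dropped; the original order is kept.
    """
    cleaned = [t.strip() for t in tables if t.strip()]
    unique = list(dict.fromkeys(cleaned))
    return "\n".join(unique)


if __name__ == "__main__":
    print(is_valid_task_name("migrate2024"))
    print(is_valid_task_name("my-task"))
    print(build_table_mapping(["default:orders", "default:users"]))
```

Paste the resulting text into the Table Mapping field as is.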

View a task

  1. In the left-side navigation pane, choose Lindorm/HBase Migration > Quick Migration to view your tasks.

View task details

  1. In the left-side navigation pane, choose Lindorm/HBase Migration > Quick Migration.

  2. Click the name of the task to view its execution status.

Switchover

  1. Wait for the full data migration to complete and for the incremental synchronization latency to drop to a few seconds or a few hundred milliseconds.

  2. Enable data sampling and verification in Lindorm Tunnel Service (LTS). For large tables, use a low sampling rate to avoid impacting online workloads.

  3. Validate your application.

  4. Perform the switchover.
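
The idea behind sampled verification in step 2 can be illustrated as follows. This is not the LTS implementation. It is a minimal sketch that uses in-memory dictionaries as stand-ins for the source and target tables and a hash of the row key to pick a fixed fraction of rows, so that only the sampled rows are compared. All names are hypothetical.

```python
import zlib


def is_sampled(row_key: bytes, rate: float) -> bool:
    """Deterministically select roughly `rate` of all row keys.

    Hashing the key means the same rows are picked on every run,
    which makes a failed check reproducible.
    """
    return zlib.crc32(row_key) % 10_000 < int(rate * 10_000)


def verify_sample(source: dict, target: dict, rate: float) -> list:
    """Return sampled row keys whose value is missing or different in target."""
    mismatches = []
    for key, value in source.items():
        if is_sampled(key, rate) and target.get(key) != value:
            mismatches.append(key)
    return mismatches


if __name__ == "__main__":
    source = {b"row1": b"a", b"row2": b"b", b"row3": b"c"}
    target = {b"row1": b"a", b"row2": b"changed"}
    # A rate of 1.0 compares every row; use a much lower rate for large tables.
    print(verify_sample(source, target, rate=1.0))
```

A lower rate reads fewer rows from both clusters, which is why a low sampling rate reduces the impact on online workloads for large tables.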