Metadata migration


The metadata migration feature provides a visual interface to migrate metadata from your Hive Metastore to Data Lake Formation (DLF).

Limitations

  • Supported Hive versions: 2.3.x and 3.1.x.

  • Supported database type: MySQL.
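You can check a Hive Metastore against these two limitations before you create a task. The following Python code is a minimal sketch. `is_supported_source` is a hypothetical local helper and is not part of DLF. It assumes the Hive version string has the plain `major.minor.patch` form, so it rejects version strings with vendor suffixes.

```python
import re

# Supported combinations from the Limitations list: Hive 2.3.x or 3.1.x on MySQL.
SUPPORTED_HIVE_LINES = {(2, 3), (3, 1)}
SUPPORTED_DATABASE_TYPES = {"mysql"}

# major.minor, optionally followed by .patch (and further numeric parts)
_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.\d+)*")


def is_supported_source(hive_version: str, database_type: str) -> bool:
    """Return True if a Hive Metastore falls within the documented limitations."""
    match = _VERSION_RE.fullmatch(hive_version.strip())
    if match is None:
        return False  # unrecognized version string: treat it as unsupported
    major_minor = (int(match.group(1)), int(match.group(2)))
    return (
        major_minor in SUPPORTED_HIVE_LINES
        and database_type.strip().lower() in SUPPORTED_DATABASE_TYPES
    )


if __name__ == "__main__":
    for version, db in [("3.1.2", "MySQL"), ("2.3.9", "MySQL"), ("3.0.0", "MySQL")]:
        print(version, db, is_supported_source(version, db))
```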

Create a metadata migration task

  1. Log on to the Data Lake Formation console.

  2. From the left-side navigation pane, click Metadata > Migrate Metadata.

  3. On the Migration Task tab, click Create Migration Task.

  4. Configure the source database settings and click Next.

    Database Type: Only MySQL is supported.

    MySQL Type: Select the type of MySQL instance that hosts your Hive Metastore.

      • Alibaba Cloud RDS: An ApsaraDB RDS for MySQL instance. For more information, see ApsaraDB RDS for MySQL. Select an RDS Instance and enter the database Name, Username, and Password.

        Important

        Access to RDS metadata requires an Alibaba Cloud VPC connection.

      • Other MySQL Databases: The built-in MySQL database of an E-MapReduce (EMR) cluster, a self-managed MySQL database, or another type of MySQL database. Enter the JDBC URL, Username, and Password.

        Important

        If you connect over an Alibaba Cloud VPC, enter an internal IP address in the JDBC URL. If you select Internet, enter a public IP address.

    Network Type: Select Alibaba Cloud VPC or Internet, based on the MySQL Type you selected.

      • Alibaba Cloud VPC: Select the VPC, vSwitch, and Security Group associated with your E-MapReduce (EMR) cluster or RDS instance.

      • Internet: Add a rule in the E-MapReduce (EMR) console that opens port 3306 (the default MySQL port) of the EMR cluster to the Elastic IP Address (EIP) of DLF in your region. For the EIP of each region, see DLF region and Elastic IP Address (EIP).

  5. Configure the migration task settings and click Next.

    Task Name: Enter a name for the metadata migration task.

    Task Description: Optional. Enter a description for the task.

    Catalogs: Select the target data catalog.

    Conflict Resolution Strategy:

      • Update existing metadata (Recommended): Updates the metadata in DLF based on the source metadata. Existing metadata is not deleted.

      • Rebuild metadata: Deletes the existing DLF metadata before creating new metadata from the source.

    Log Storage Path: Specify the Object Storage Service (OSS) path where the migration task stores all of its logs.

    Object to Synchronize: Select the object types to synchronize: Database, Function, Table, and Partition. In most cases, select all object types.

    Location Replacement: Optional. Replaces the location of a database or table during migration. For example, when you migrate from a traditional HDFS architecture to an OSS-based architecture that decouples storage and compute, you can replace hdfs:// paths with oss:// paths.

  6. Verify the task settings and click OK.
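Step 4 requires an internal IP address in the JDBC URL for Alibaba Cloud VPC connections and a public IP address for Internet connections. The following Python code is a minimal sketch that checks a JDBC URL against that rule before you submit the form. `check_jdbc_url` and the `"vpc"` and `"internet"` labels are hypothetical. The sketch assumes the URL has the form `jdbc:mysql://<ip>:<port>/<database>` with a literal IP address, and it uses the private and global address classification in Python's `ipaddress` module as a stand-in for what the console accepts.

```python
import ipaddress
from urllib.parse import urlparse

MYSQL_DEFAULT_PORT = 3306


def check_jdbc_url(jdbc_url: str, network_type: str) -> tuple[str, int, str]:
    """Check that a MySQL JDBC URL uses the kind of IP address the network type needs.

    network_type is "vpc" or "internet" (hypothetical labels for the two console options).
    Returns (host, port, database); raises ValueError on any mismatch.
    """
    if network_type not in ("vpc", "internet"):
        raise ValueError(f"unknown network type: {network_type!r}")
    if not jdbc_url.startswith("jdbc:mysql://"):
        raise ValueError("expected jdbc:mysql://<ip>:<port>/<database>")
    # Drop the "jdbc:" prefix so urlparse sees an ordinary mysql:// URL.
    parsed = urlparse(jdbc_url[len("jdbc:"):])
    if parsed.hostname is None:
        raise ValueError("the JDBC URL has no host")
    ip = ipaddress.ip_address(parsed.hostname)  # raises ValueError for host names
    port = parsed.port or MYSQL_DEFAULT_PORT
    database = parsed.path.lstrip("/")
    if network_type == "vpc" and not ip.is_private:
        raise ValueError(f"{ip} is not an internal address; a VPC connection needs one")
    if network_type == "internet" and not ip.is_global:
        raise ValueError(f"{ip} is not a public address; an Internet connection needs one")
    return str(ip), port, database


if __name__ == "__main__":
    # Placeholder address and database name; replace them with your own.
    print(check_jdbc_url("jdbc:mysql://192.168.1.10:3306/hivemeta", "vpc"))
```

If the URL omits the port, the sketch falls back to 3306, which matches the default port mentioned for the Internet option.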
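The difference between the two Conflict Resolution Strategy options in step 5 can be modeled on a simple name-to-metadata mapping. The following Python code is a simplified sketch, not how DLF is implemented. `apply_strategy` is hypothetical, and the sketch assumes that Rebuild metadata deletes every existing object in the scope being migrated. The documentation does not state that scope precisely.

```python
from typing import Dict

Metadata = Dict[str, dict]  # object name (for example "db.table") -> its metadata


def apply_strategy(dlf: Metadata, source: Metadata, strategy: str) -> Metadata:
    """Simplified model of the two conflict resolution strategies.

    "update":  source objects are created or overwritten in DLF; objects that
               exist only in DLF are kept.
    "rebuild": existing DLF metadata is deleted first, so only source objects remain.
    """
    if strategy == "update":
        merged = dict(dlf)
        merged.update(source)
        return merged
    if strategy == "rebuild":
        return dict(source)
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    dlf = {"sales.orders": {"location": "oss://old"}, "sales.dlf_only": {"location": "oss://x"}}
    source = {"sales.orders": {"location": "hdfs://new"}, "sales.customers": {"location": "hdfs://c"}}
    print(sorted(apply_strategy(dlf, source, "update")))   # ['sales.customers', 'sales.dlf_only', 'sales.orders']
    print(sorted(apply_strategy(dlf, source, "rebuild")))  # ['sales.customers', 'sales.orders']
```

The model shows why Update existing metadata is the safer choice: under this model, objects that exist only in DLF survive the migration.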
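Location Replacement in step 5 rewrites hdfs:// locations to oss:// locations. The following Python code is a minimal sketch of such a rewrite, assuming prefix matching in which the first matching rule wins. The documentation does not describe how the console matches rules. `replace_location`, `EXAMPLE_RULES`, the HDFS host and port, and the bucket name are all hypothetical placeholders.

```python
# Hypothetical rule list: (old prefix, new prefix). Host, port, and bucket are placeholders.
EXAMPLE_RULES = [
    ("hdfs://emr-header-1.cluster-12345:9000/user/hive/warehouse", "oss://examplebucket/warehouse"),
]


def replace_location(location: str, rules: list[tuple[str, str]]) -> str:
    """Rewrite a database or table location with the first matching prefix rule."""
    for old_prefix, new_prefix in rules:
        if location.startswith(old_prefix):
            # Keep everything after the matched prefix, for example "/sales.db/orders".
            return new_prefix + location[len(old_prefix):]
    return location  # no rule matched: keep the original location


if __name__ == "__main__":
    old = "hdfs://emr-header-1.cluster-12345:9000/user/hive/warehouse/sales.db/orders"
    print(replace_location(old, EXAMPLE_RULES))  # oss://examplebucket/warehouse/sales.db/orders
```

Plain prefix matching also rewrites sibling paths that share the prefix, such as `.../warehouse2`. Compare a few locations before and after the rewrite when you choose a prefix.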

Manage metadata migration tasks

  1. On the Migration Task tab, find the migration task and choose an option from the Actions column:

    • Click Run to start the metadata migration task.

    • Click Runtime Record to view the details of task runs.

    • Click Modify to modify the Source Database Settings and Migration Task Settings.

    • Click Delete to delete the migration task.

    • Click Stop to stop a running task.

  2. Click the Execution History tab, find the target task, and then click View Logs in the Actions column to view its execution logs.

    After the metadata migration is complete, you can check the log for success or failure messages.

Verify migration results

  1. From the left-side navigation pane, click Metadata > Metadata.

  2. Click the Database tab. Select the target catalog in Catalogs and enter the name of a migrated database in Name to view the database information.

  3. Click the Data Table tab. Select the target catalog in Catalogs and the database in Database Name, and then enter the name of a migrated table in Table Name to view the data table information.
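Checking tables one by one in the console can miss objects. As a complement, you can list the tables recorded in the source Hive Metastore and compare that list with the names you confirmed in DLF. The following Python code is a minimal sketch. `source_tables` and `missing_in_dlf` are hypothetical helpers. The query reads the `DBS` and `TBLS` tables of the Hive Metastore schema through any DB-API 2.0 connection. The demo substitutes an in-memory SQLite database for the MySQL metastore, and the list of names seen in DLF is assumed to be collected by hand from the Data Table tab.

```python
import sqlite3
from typing import Iterable, List, Set


def source_tables(conn) -> Set[str]:
    """Return "database.table" names recorded in a Hive Metastore database.

    conn is any DB-API 2.0 connection to the metastore database.
    """
    cur = conn.cursor()
    cur.execute("SELECT d.NAME, t.TBL_NAME FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID")
    rows = cur.fetchall()
    cur.close()
    return {f"{db}.{table}" for db, table in rows}


def missing_in_dlf(source: Set[str], seen_in_dlf: Iterable[str]) -> List[str]:
    """Return source tables that were not found in DLF, sorted by name."""
    return sorted(source - set(seen_in_dlf))


if __name__ == "__main__":
    # Stand-in for the MySQL metastore: an in-memory SQLite database with the two tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DBS (DB_ID INTEGER, NAME TEXT)")
    conn.execute("CREATE TABLE TBLS (TBL_ID INTEGER, DB_ID INTEGER, TBL_NAME TEXT)")
    conn.executemany("INSERT INTO DBS VALUES (?, ?)", [(1, "sales"), (2, "logs")])
    conn.executemany(
        "INSERT INTO TBLS VALUES (?, ?, ?)",
        [(1, 1, "orders"), (2, 1, "customers"), (3, 2, "events")],
    )
    # Names confirmed on the Data Table tab (hypothetical).
    print(missing_in_dlf(source_tables(conn), ["sales.orders", "logs.events"]))  # ['sales.customers']
```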

Best Practices

Best practices for migrating EMR metadata to DLF

Related documentation

DLF region and Elastic IP Address (EIP)

Region                  Elastic IP Address (EIP)
China (Hangzhou)        121.41.166.235
China (Shanghai)        47.103.63.0
China (Beijing)         47.94.234.203
China (Shenzhen)        39.108.114.206
Singapore               161.117.233.48
Germany (Frankfurt)     8.211.38.47
China (Zhangjiakou)     8.142.121.7
China (Hong Kong)       8.218.148.213
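When Network Type is Internet, the inbound rule must allow the DLF EIP for your region on the MySQL port. The following Python code is a minimal sketch that looks up the EIP from this table and describes the rule with a /32 CIDR block. `inbound_rule_for` and its dictionary keys are hypothetical and do not correspond to any Alibaba Cloud API. The port defaults to 3306, the default MySQL port.

```python
import ipaddress

# DLF Elastic IP addresses per region, copied from the table in this section.
DLF_EIPS = {
    "China (Hangzhou)": "121.41.166.235",
    "China (Shanghai)": "47.103.63.0",
    "China (Beijing)": "47.94.234.203",
    "China (Shenzhen)": "39.108.114.206",
    "Singapore": "161.117.233.48",
    "Germany (Frankfurt)": "8.211.38.47",
    "China (Zhangjiakou)": "8.142.121.7",
    "China (Hong Kong)": "8.218.148.213",
}


def inbound_rule_for(region: str, port: int = 3306) -> dict:
    """Describe an inbound rule that admits only DLF's EIP for a region to the MySQL port.

    The keys are illustrative only; this is not an Alibaba Cloud API payload.
    """
    try:
        eip = DLF_EIPS[region]
    except KeyError:
        raise ValueError(f"no DLF EIP listed for region {region!r}") from None
    cidr = ipaddress.ip_network(f"{eip}/32")  # a single address, not a wider range
    return {"protocol": "TCP", "port": port, "source_cidr": str(cidr)}


if __name__ == "__main__":
    print(inbound_rule_for("China (Beijing)"))
    # {'protocol': 'TCP', 'port': 3306, 'source_cidr': '47.94.234.203/32'}
```

The sketch uses a /32 block so that the rule opens the port to the DLF EIP only and not to a wider address range.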