Migrate data between Apsara File Storage for HDFS and a MySQL database

更新时间:
复制 MD 格式

This topic describes how to use Sqoop to migrate data between Apsara File Storage for HDFS and a MySQL database.

Prerequisites

  • You have activated the Apsara File Storage for HDFS service and created a file system instance and a mount point. For more information, see Quick Start for HDFS.

  • You have deployed a Hadoop cluster. We recommend that you use Hadoop 2.7.2 or later. This topic uses Apache Hadoop 2.8.5 as an example.

  • You have installed Java Development Kit (JDK) 1.8 or later on all nodes in the Hadoop cluster. For more information, see Download JDK.

Background information

Sqoop is an open-source tool designed to efficiently transfer bulk data between Hadoop and structured datastores such as relational databases. Use Sqoop to import data from a relational database (such as MySQL, Oracle, or PostgreSQL) into Apsara File Storage for HDFS, or to export data from Apsara File Storage for HDFS to a relational database.

Step 1: Mount an Apsara File Storage for HDFS instance on a Hadoop cluster

Mount the Apsara File Storage for HDFS file system on the Hadoop cluster. For more information, see Mount a file system.

Step 2: Install Sqoop

Sqoop is available in two versions, Sqoop1 and Sqoop2, which are incompatible. This topic uses Sqoop 1.4.7, a stable version of Sqoop1, as an example.

  1. Download Sqoop 1.4.7.

  2. Run the following command to extract the installation package.

    tar -zxf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/
  3. Configure environment variables.

    1. Run the vim /etc/profile command to open the configuration file and add the following content:

      export SQOOP_HOME=/usr/local/sqoop-1.4.7.bin__hadoop-2.6.0
      export PATH=$PATH:$SQOOP_HOME/bin
    2. Run the source /etc/profile command for the configuration to take effect.

  4. Add the database driver.

    1. Download the MySQL Connector package.

      wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.48.tar.gz
    2. Extract the MySQL Connector package.

      tar -zxf mysql-connector-java-5.1.48.tar.gz
    3. Copy the MySQL Connector JAR file to the lib directory of your Sqoop installation.

      cp ./mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar ${SQOOP_HOME}/lib/
  5. Modify the configuration file.

    1. Run the following command to copy sqoop-env-template.sh and rename it to sqoop-env.sh.

      cp ${SQOOP_HOME}/conf/sqoop-env-template.sh ${SQOOP_HOME}/conf/sqoop-env.sh
    2. Run the vim ${SQOOP_HOME}/conf/sqoop-env.sh command to open the configuration file and add the following content.

      export HADOOP_COMMON_HOME=/usr/local/hadoop-2.8.5
      export HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME
      export HIVE_HOME=/usr/local/apache-hive-2.3.9-bin   # This configuration is not required if Hive is not installed.
    3. Run the cp ${HIVE_HOME}/lib/hive-common-2.3.9.jar ${SQOOP_HOME}/lib/ command to copy the file.

  6. Run the following command to verify the database connection.

    sqoop list-databases --connect jdbc:mysql://<dburi> --username 'username' --password 'password' 

    Parameter

    Description

    dburi

    The database connection URL. For example: jdbc:mysql://0.0.0.0:3306/.

    username

    The username for logging in to the database.

    password

    The user's password.

    If the output lists the names of your MySQL databases, the connection is successful. The following output indicates a successful connection.

    19/08/21 11:07:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    19/08/21 11:07:12 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    information_schema
    hivemeta
    hue
    mysql
    performance_schema
    test

Step 3: Migrate data

The following sections describe different data migration methods and how to verify the results.

Practice 1: Migrate data from Apsara File Storage for HDFS to MySQL

To migrate data from Apsara File Storage for HDFS to MySQL, first create a table with a matching schema in MySQL. Then, run the sqoop export command on a Sqoop node.

This example shows how to migrate data from the mysqltest.txt file in the /sqoop2mysql/table/ HDFS directory. The file contains the following data.

6,test_user6,2019-08-10,Male
7,test_user7,2019-08-11,Male
8,test_user8,2019-08-12,Male
9,test_user9,2019-08-13,Female
10,test_user10,2019-08-14,Female
  1. Run the create database sqoop_migrate; command to create a MySQL database.

  2. Run the following commands to create a table.

    use sqoop_migrate;
    CREATE TABLE `employee` (
      `e_id` varchar(20) NOT NULL DEFAULT '',
      `e_name` varchar(20) NOT NULL DEFAULT '',
      `e_birth` varchar(20) NOT NULL DEFAULT '',
      `e_sex` varchar(10) NOT NULL DEFAULT '',
      PRIMARY KEY (`e_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 
  3. Run the following command to migrate the data.

     sqoop export --connect jdbc:mysql://localhost:3306/sqoop_migrate  --username 'userName' --password 'userPW'  --num-mappers 1  --table employee  --columns "e_id,e_name,e_birth,e_sex"  --export-dir '/sqoop2mysql/table/mysqltest.txt'  --fields-terminated-by ',' 

    Command syntax: sqoop export --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --export-dir <hdfs-dir>

    Parameter

    Description

    dburi

    The database connection URL. For example: jdbc:mysql://localhost:3306/. If your connection URL contains parameters, enclose it in single quotation marks. For example: 'jdbc:mysql://localhost:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8'.

    dbname

    The name of the database. Example: user.

    username

    The username for logging in to the database.

    password

    The user's password.

    tablename

    The name of the table in the MySQL database.

    hdfs-dir

    The Apsara File Storage for HDFS directory that stores the data to be migrated.

  4. Verify the migration.

    Run the select * from employee; command to view the table data. If the table contains the following data, the migration was successful.

    mysql> select * from employee;
    +------+-------------+------------+--------+
    | e_id | e_name      | e_birth    | e_sex  |
    +------+-------------+------------+--------+
    | 10   | test_user10 | 2019-08-14 | Female |
    | 6    | test_user6  | 2019-08-10 | Male   |
    | 7    | test_user7  | 2019-08-11 | Male   |
    | 8    | test_user8  | 2019-08-12 | Male   |
    | 9    | test_user9  | 2019-08-13 | Female |
    +------+-------------+------------+--------+
    5 rows in set (0.00 sec)

Practice 2: Migrate data from MySQL to HDFS

On a Sqoop node, use the sqoop import command to migrate data from MySQL to Apsara File Storage for HDFS.

This example shows how to migrate the employee table from MySQL. The table contains the following data.

mysql> select * from employee;
+------+-------------+------------+--------+
| e_id | e_name      | e_birth    | e_sex  |
+------+-------------+------------+--------+
| 10   | test_user10 | 2019-08-14 | Female |
| 6    | test_user6  | 2019-08-10 | Male   |
| 7    | test_user7  | 2019-08-11 | Male   |
| 8    | test_user8  | 2019-08-12 | Male   |
| 9    | test_user9  | 2019-08-13 | Female |
+------+-------------+------------+--------+
5 rows in set (0.00 sec)
  1. Run the following command to migrate the data.

    sqoop import --connect jdbc:mysql://localhost:3306/sqoop_migrate --username 'userid' --password 'userPW' --table employee  --target-dir /mysql2sqoop/table/sqoop_migrate  --num-mappers 1  --columns "e_id,e_name,e_birth,e_sex"  --direct

    Command syntax: sqoop import --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --check-column <col> --incremental <mode> --last-value <value> --target-dir <hdfs-dir>

    For more information about the parameters, see Sqoop Import.

    Parameter

    Description

    dburi

    The database connection URL. For example: jdbc:mysql://172.x.x.x:3306/.

    If your connection URL contains parameters, enclose it in single quotation marks. For example: 'jdbc:mysql://172.x.x.x.235:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8'.

    dbname

    The name of the database. Example: user.

    username

    The username for logging in to the database.

    password

    The user's password.

    tablename

    The name of the table in the MySQL database.

    col

    The name of a column in the table to be migrated.

    mode

    Specifies how Sqoop identifies new rows. Valid values are append and lastmodified.

    value

    The maximum value of the check column from the previous import.

    hdfs-dir

    The target directory in Apsara File Storage for HDFS.

  2. Check the migration result.

    1. Run the hadoop fs -ls /mysql2sqoop/table/sqoop_migrate command to list the migrated files.

      [root@master1 test]# /usr/local/hadoop-2.8.5/bin/hadoop fs -ls /mysql2sqoop/table/sqoop_migrate
      Found 2 items
      -rwxrwxrwx   3 root root          0 2021-11-09 11:34 /mysql2sqoop/table/sqoop_migrate/_SUCCESS
      -rwxrwxrwx   3 root root        157 2021-11-09 11:34 /mysql2sqoop/table/sqoop_migrate/part-m-00000
    2. Run the hadoop fs -cat /mysql2sqoop/table/sqoop_migrate/part-m-00000 command to view the contents of the file.

      If the part-m-00000 file contains the following content, the migration was successful.

      [root@master1 test]# /usr/local/hadoop-2.8.5/bin/hadoop fs -cat /mysql2sqoop/table/sqoop_migrate/part-m-00000
      10,test_user10,2019-08-14,Female
      6,test_user6,2019-08-10,Male
      7,test_user7,2019-08-11,Male
      8,test_user8,2019-08-12,Male
      9,test_user9,2019-08-13,Female

Example 3: MySQL to Hive

On a Sqoop node, use the sqoop import command to migrate data from MySQL to Hive.

This example shows how to migrate the employee table from MySQL. The table contains the following data.

mysql> select * from employee;
+------+-------------+------------+--------+
| e_id | e_name      | e_birth    | e_sex  |
+------+-------------+------------+--------+
| 10   | test_user10 | 2019-08-14 | Female |
| 6    | test_user6  | 2019-08-10 | Male   |
| 7    | test_user7  | 2019-08-11 | Male   |
| 8    | test_user8  | 2019-08-12 | Male   |
| 9    | test_user9  | 2019-08-13 | Female |
+------+-------------+------------+--------+
5 rows in set (0.00 sec)
  1. Run the following command to migrate the data.

    sqoop import --connect jdbc:mysql://localhost:3306/sqoop_migrate --username 'userid' --password 'PW'   --table employee   --hive-import --hive-database default  --create-hive-table --hive-overwrite  -m 1

    Command syntax: sqoop import --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --target-dir <hdfs-dir> --hive-table <hive-tablename>

    Parameter

    Description

    dburi

    The database connection URL. For example: jdbc:mysql://localhost:3306/. If your connection URL contains parameters, enclose it in single quotation marks. For example: 'jdbc:mysql://localhost:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8'.

    dbname

    The name of the database. Example: user.

    username

    The username for logging in to the database.

    password

    The user's password.

    tablename

    The name of the table in the MySQL database.

    hdfs-dir

    The target directory in Apsara File Storage for HDFS.

    hive-tablename

    The name of the corresponding table in Hive.

  2. Verify the migration.

    Run the select * from default.employee; command to view the table data. If the table contains the expected data, the migration was successful.

Example 4: Hive to MySQL

To migrate data from Hive to MySQL, first create a table with a matching schema in MySQL. Then, run the sqoop export command on a Sqoop node.

This example shows how to migrate data from the default.employee table in Hive. The table contains the following data.

hive> select * from default.employee;
OK
10	test_user10	2019-08-14	Female
6	test_user6	2019-08-10	Male
7	test_user7	2019-08-11	Male
8	test_user8	2019-08-12	Male
9	test_user9	2019-08-13	Female
Time taken: 1.153 seconds, Fetched: 5 row(s)
  1. Create the target table in the sqoop_migrate database in MySQL.

    use sqoop_migrate;
    CREATE TABLE `employee_from_hive`(
      `id` VARCHAR(20),
      `name` VARCHAR(20) NOT NULL DEFAULT '',
      `birth` VARCHAR(20) NOT NULL DEFAULT '',
      `sex` VARCHAR(10) NOT NULL DEFAULT '',
      PRIMARY KEY(`id`)
    );
  2. Run the DESCRIBE FORMATTED default.employee; command to view the table information.

    hive> DESCRIBE FORMATTED default.employee;
    OK
    # col_name            data_type            comment
    e_id                 string
    e_name               string
    e_birth              string
    e_sex                string
    
    # Detailed Table Information
    Database:            default
    Owner:               root
    CreateTime:          Tue Nov 09 15:08:51 CST 2021
    LastAccessTime:      UNKNOWN
    Retention:           0
    Location:            dfs://f-xxx.cn-beijing.dfs.aliyuncs.com:10290/user/hive/warehouse/employee
    Table Type:          MANAGED_TABLE
    Table Parameters:
            comment              Imported by sqoop on 2021/11/09 15:08:42
            numFiles             1
            numRows              0
            rawDataSize          0
            totalSize            157
            transient_lastDdlTime 1636441731
    
    # Storage Information
    SerDe Library:       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    InputFormat:         org.apache.hadoop.mapred.TextInputFormat
    OutputFormat:        org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    Compressed:          No
    Num Buckets:         -1
    Bucket Columns:      []
    Sort Columns:        []
    Storage Desc Params:
            field.delim          \u0001
            line.delim           \n
            serialization.format \u0001
    Time taken: 0.038 seconds, Fetched: 35 row(s)
  3. Run the following command to migrate the data.

    sqoop export --connect jdbc:mysql://localhost:3306/sqoop_migrate --username 'userid' --password 'userPW' --table employee_from_hive -m 1 --fields-terminated-by '\0001'  --export-dir /user/hive/warehouse/employee

    Command syntax: sqoop export --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --export-dir <hive-dir> --fields-terminated-by <delimiter>

    Parameter

    Description

    dburi

    The database connection URL. For example: jdbc:mysql://localhost:3306/. If your connection URL contains parameters, enclose it in single quotation marks. For example: 'jdbc:mysql://localhost:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8'.

    dbname

    The name of the database. Example: user.

    username

    The username for logging in to the database.

    password

    The user's password.

    tablename

    The name of the table in the MySQL database.

    hive-dir

    The Apsara File Storage for HDFS directory that stores the data to be migrated.

    delimiter

    The delimiter used by the data in the Hive table.

  4. Verify the migration.

    Run the select * from sqoop_migrate.employee_from_hive; command to view the table data. If the table contains the following data, the migration was successful.

    mysql> select * from sqoop_migrate.employee_from_hive;
    +------+-----------------+------------+------+
    | id   | name            | birth      | sex  |
    +------+-----------------+------------+------+
    |   10 | test_user10     | 2019-08-14 | Female |
    |    6 | test_user6      | 2019-08-10 | Male   |
    |    7 | test_user7      | 2019-08-11 | Male   |
    |    8 | test_user8      | 2019-08-12 | Male   |
    |    9 | test_user9      | 2019-08-13 | Female |
    +------+-----------------+------------+------+
    5 rows in set (0.00 sec)