Prerequisites
-
You have activated the Apsara File Storage for HDFS service and created a file system instance and a mount point. For more information, see Quick Start for HDFS.
-
You have deployed a Hadoop cluster. We recommend that you use Hadoop 2.7.2 or later. This topic uses Apache Hadoop 2.8.5 as an example.
-
You have installed Java Development Kit (JDK) 1.8 or later on all nodes in the Hadoop cluster. For more information, see Download JDK.
Background information
Sqoop is an open-source tool designed to efficiently transfer bulk data between Hadoop and structured datastores such as relational databases. Use Sqoop to import data from a relational database (such as MySQL, Oracle, or PostgreSQL) into Apsara File Storage for HDFS, or to export data from Apsara File Storage for HDFS to a relational database.
Step 1: Mount an Apsara File Storage for HDFS instance on a Hadoop cluster
Mount the Apsara File Storage for HDFS file system on the Hadoop cluster. For more information, see Mount a file system.
Step 2: Install Sqoop
Sqoop is available in two versions, Sqoop1 and Sqoop2, which are incompatible. This topic uses Sqoop 1.4.7, a stable version of Sqoop1, as an example.
-
Download Sqoop 1.4.7.
-
Run the following command to extract the installation package.
tar -zxf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/
-
Configure environment variables.
-
Run the vim /etc/profile command to open the configuration file and add the following content:
export SQOOP_HOME=/usr/local/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin
-
Run the source /etc/profile command for the configuration to take effect.
-
Add the database driver.
-
Download the MySQL Connector package.
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.48.tar.gz
-
Extract the MySQL Connector package.
tar -zxf mysql-connector-java-5.1.48.tar.gz
-
Copy the MySQL Connector JAR file to the lib directory of your Sqoop installation.
cp ./mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar ${SQOOP_HOME}/lib/
-
Modify the configuration file.
-
Run the following command to copy sqoop-env-template.sh and rename it to sqoop-env.sh.
cp ${SQOOP_HOME}/conf/sqoop-env-template.sh ${SQOOP_HOME}/conf/sqoop-env.sh
-
Run the vim ${SQOOP_HOME}/conf/sqoop-env.sh command to open the configuration file and add the following content.
export HADOOP_COMMON_HOME=/usr/local/hadoop-2.8.5
export HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME
export HIVE_HOME=/usr/local/apache-hive-2.3.9-bin # This configuration is not required if Hive is not installed.
-
Run the cp ${HIVE_HOME}/lib/hive-common-2.3.9.jar ${SQOOP_HOME}/lib/ command to copy the file.
-
Run the following command to verify the database connection.
sqoop list-databases --connect jdbc:mysql://<dburi> --username 'username' --password 'password'
|
Parameter
|
Description
|
|
dburi
|
The database connection URL. For example: jdbc:mysql://0.0.0.0:3306/.
|
|
username
|
The username for logging in to the database.
|
|
password
|
The user's password.
|
If the output lists the names of your MySQL databases, the connection is successful. The following output indicates a successful connection.
19/08/21 11:07:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/08/21 11:07:12 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hivemeta
hue
mysql
performance_schema
test
Step 3: Migrate data
The following sections describe different data migration methods and how to verify the results.
Practice 1: Migrate data from Apsara File Storage for HDFS to MySQL
To migrate data from Apsara File Storage for HDFS to MySQL, first create a table with a matching schema in MySQL. Then, run the sqoop export command on a Sqoop node.
This example shows how to migrate data from the mysqltest.txt file in the /sqoop2mysql/table/ HDFS directory. The file contains the following data.
6,test_user6,2019-08-10,Male
7,test_user7,2019-08-11,Male
8,test_user8,2019-08-12,Male
9,test_user9,2019-08-13,Female
10,test_user10,2019-08-14,Female
-
Run the create database sqoop_migrate; command to create a MySQL database.
-
Run the following commands to create a table.
use sqoop_migrate;
CREATE TABLE `employee` (
`e_id` varchar(20) NOT NULL DEFAULT '',
`e_name` varchar(20) NOT NULL DEFAULT '',
`e_birth` varchar(20) NOT NULL DEFAULT '',
`e_sex` varchar(10) NOT NULL DEFAULT '',
PRIMARY KEY (`e_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-
Run the following command to migrate the data.
sqoop export --connect jdbc:mysql://localhost:3306/sqoop_migrate --username 'userName' --password 'userPW' --num-mappers 1 --table employee --columns "e_id,e_name,e_birth,e_sex" --export-dir '/sqoop2mysql/table/mysqltest.txt' --fields-terminated-by ','
Command syntax: sqoop export --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --export-dir <hdfs-dir>
|
Parameter
|
Description
|
|
dburi
|
The database connection URL. For example: jdbc:mysql://localhost:3306/. If your connection URL contains parameters, enclose it in single quotation marks. For example: 'jdbc:mysql://localhost:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8'.
|
|
dbname
|
The name of the database. Example: user.
|
|
username
|
The username for logging in to the database.
|
|
password
|
The user's password.
|
|
tablename
|
The name of the table in the MySQL database.
|
|
hdfs-dir
|
The Apsara File Storage for HDFS directory that stores the data to be migrated.
|
-
Verify the migration.
Run the select * from employee; command to view the table data. If the table contains the following data, the migration was successful.
mysql> select * from employee;
+------+-------------+------------+--------+
| e_id | e_name | e_birth | e_sex |
+------+-------------+------------+--------+
| 10 | test_user10 | 2019-08-14 | Female |
| 6 | test_user6 | 2019-08-10 | Male |
| 7 | test_user7 | 2019-08-11 | Male |
| 8 | test_user8 | 2019-08-12 | Male |
| 9 | test_user9 | 2019-08-13 | Female |
+------+-------------+------------+--------+
5 rows in set (0.00 sec)
Practice 2: Migrate data from MySQL to HDFS
On a Sqoop node, use the sqoop import command to migrate data from MySQL to Apsara File Storage for HDFS.
This example shows how to migrate the employee table from MySQL. The table contains the following data.
mysql> select * from employee;
+------+-------------+------------+--------+
| e_id | e_name | e_birth | e_sex |
+------+-------------+------------+--------+
| 10 | test_user10 | 2019-08-14 | Female |
| 6 | test_user6 | 2019-08-10 | Male |
| 7 | test_user7 | 2019-08-11 | Male |
| 8 | test_user8 | 2019-08-12 | Male |
| 9 | test_user9 | 2019-08-13 | Female |
+------+-------------+------------+--------+
5 rows in set (0.00 sec)
-
Run the following command to migrate the data.
sqoop import --connect jdbc:mysql://localhost:3306/sqoop_migrate --username 'userid' --password 'userPW' --table employee --target-dir /mysql2sqoop/table/sqoop_migrate --num-mappers 1 --columns "e_id,e_name,e_birth,e_sex" --direct
Command syntax: sqoop import --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --check-column <col> --incremental <mode> --last-value <value> --target-dir <hdfs-dir>
For more information about the parameters, see Sqoop Import.
|
Parameter
|
Description
|
|
dburi
|
The database connection URL. For example: jdbc:mysql://172.x.x.x:3306/.
If your connection URL contains parameters, enclose it in single quotation marks. For example: 'jdbc:mysql://172.x.x.x.235:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8'.
|
|
dbname
|
The name of the database. Example: user.
|
|
username
|
The username for logging in to the database.
|
|
password
|
The user's password.
|
|
tablename
|
The name of the table in the MySQL database.
|
|
col
|
The name of a column in the table to be migrated.
|
|
mode
|
Specifies how Sqoop identifies new rows. Valid values are append and lastmodified.
|
|
value
|
The maximum value of the check column from the previous import.
|
|
hdfs-dir
|
The target directory in Apsara File Storage for HDFS.
|
-
Check the migration result.
-
Run the hadoop fs -ls /mysql2sqoop/table/sqoop_migrate command to list the migrated files.
[root@master1 test]# /usr/local/hadoop-2.8.5/bin/hadoop fs -ls /mysql2sqoop/table/sqoop_migrate
Found 2 items
-rwxrwxrwx 3 root root 0 2021-11-09 11:34 /mysql2sqoop/table/sqoop_migrate/_SUCCESS
-rwxrwxrwx 3 root root 157 2021-11-09 11:34 /mysql2sqoop/table/sqoop_migrate/part-m-00000
-
Run the hadoop fs -cat /mysql2sqoop/table/sqoop_migrate/part-m-00000 command to view the contents of the file.
If the part-m-00000 file contains the following content, the migration was successful.
[root@master1 test]# /usr/local/hadoop-2.8.5/bin/hadoop fs -cat /mysql2sqoop/table/sqoop_migrate/part-m-00000
10,test_user10,2019-08-14,Female
6,test_user6,2019-08-10,Male
7,test_user7,2019-08-11,Male
8,test_user8,2019-08-12,Male
9,test_user9,2019-08-13,Female
Example 3: MySQL to Hive
On a Sqoop node, use the sqoop import command to migrate data from MySQL to Hive.
This example shows how to migrate the employee table from MySQL. The table contains the following data.
mysql> select * from employee;
+------+-------------+------------+--------+
| e_id | e_name | e_birth | e_sex |
+------+-------------+------------+--------+
| 10 | test_user10 | 2019-08-14 | Female |
| 6 | test_user6 | 2019-08-10 | Male |
| 7 | test_user7 | 2019-08-11 | Male |
| 8 | test_user8 | 2019-08-12 | Male |
| 9 | test_user9 | 2019-08-13 | Female |
+------+-------------+------------+--------+
5 rows in set (0.00 sec)
-
Run the following command to migrate the data.
sqoop import --connect jdbc:mysql://localhost:3306/sqoop_migrate --username 'userid' --password 'PW' --table employee --hive-import --hive-database default --create-hive-table --hive-overwrite -m 1
Command syntax: sqoop import --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --target-dir <hdfs-dir> --hive-table <hive-tablename>
|
Parameter
|
Description
|
|
dburi
|
The database connection URL. For example: jdbc:mysql://localhost:3306/. If your connection URL contains parameters, enclose it in single quotation marks. For example: 'jdbc:mysql://localhost:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8'.
|
|
dbname
|
The name of the database. Example: user.
|
|
username
|
The username for logging in to the database.
|
|
password
|
The user's password.
|
|
tablename
|
The name of the table in the MySQL database.
|
|
hdfs-dir
|
The target directory in Apsara File Storage for HDFS.
|
|
hive-tablename
|
The name of the corresponding table in Hive.
|
-
Verify the migration.
Run the select * from default.employee; command to view the table data. If the table contains the expected data, the migration was successful.
Example 4: Hive to MySQL
To migrate data from Hive to MySQL, first create a table with a matching schema in MySQL. Then, run the sqoop export command on a Sqoop node.
This example shows how to migrate data from the default.employee table in Hive. The table contains the following data.
hive> select * from default.employee;
OK
10 test_user10 2019-08-14 Female
6 test_user6 2019-08-10 Male
7 test_user7 2019-08-11 Male
8 test_user8 2019-08-12 Male
9 test_user9 2019-08-13 Female
Time taken: 1.153 seconds, Fetched: 5 row(s)
-
Create the target table in the sqoop_migrate database in MySQL.
use sqoop_migrate;
CREATE TABLE `employee_from_hive`(
`id` VARCHAR(20),
`name` VARCHAR(20) NOT NULL DEFAULT '',
`birth` VARCHAR(20) NOT NULL DEFAULT '',
`sex` VARCHAR(10) NOT NULL DEFAULT '',
PRIMARY KEY(`id`)
);
-
Run the DESCRIBE FORMATTED default.employee; command to view the table information.
hive> DESCRIBE FORMATTED default.employee;
OK
# col_name data_type comment
e_id string
e_name string
e_birth string
e_sex string
# Detailed Table Information
Database: default
Owner: root
CreateTime: Tue Nov 09 15:08:51 CST 2021
LastAccessTime: UNKNOWN
Retention: 0
Location: dfs://f-xxx.cn-beijing.dfs.aliyuncs.com:10290/user/hive/warehouse/employee
Table Type: MANAGED_TABLE
Table Parameters:
comment Imported by sqoop on 2021/11/09 15:08:42
numFiles 1
numRows 0
rawDataSize 0
totalSize 157
transient_lastDdlTime 1636441731
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \u0001
line.delim \n
serialization.format \u0001
Time taken: 0.038 seconds, Fetched: 35 row(s)
-
Run the following command to migrate the data.
sqoop export --connect jdbc:mysql://localhost:3306/sqoop_migrate --username 'userid' --password 'userPW' --table employee_from_hive -m 1 --fields-terminated-by '\0001' --export-dir /user/hive/warehouse/employee
Command syntax: sqoop export --connect jdbc:mysql://<dburi>/<dbname> --username <username> --password <password> --table <tablename> --export-dir <hive-dir> --fields-terminated-by <delimiter>
|
Parameter
|
Description
|
|
dburi
|
The database connection URL. For example: jdbc:mysql://localhost:3306/. If your connection URL contains parameters, enclose it in single quotation marks. For example: 'jdbc:mysql://localhost:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8'.
|
|
dbname
|
The name of the database. Example: user.
|
|
username
|
The username for logging in to the database.
|
|
password
|
The user's password.
|
|
tablename
|
The name of the table in the MySQL database.
|
|
hive-dir
|
The Apsara File Storage for HDFS directory that stores the data to be migrated.
|
|
delimiter
|
The delimiter used by the data in the Hive table.
|
-
Verify the migration.
Run the select * from sqoop_migrate.employee_from_hive; command to view the table data. If the table contains the following data, the migration was successful.
mysql> select * from sqoop_migrate.employee_from_hive;
+------+-----------------+------------+------+
| id | name | birth | sex |
+------+-----------------+------------+------+
| 10 | test_user10 | 2019-08-14 | Female |
| 6 | test_user6 | 2019-08-10 | Male |
| 7 | test_user7 | 2019-08-11 | Male |
| 8 | test_user8 | 2019-08-12 | Male |
| 9 | test_user9 | 2019-08-13 | Female |
+------+-----------------+------------+------+
5 rows in set (0.00 sec)