This topic describes how to create an ApsaraDB for MySQL external table and write data to it over the public network or a VPC.
Introduction
ApsaraDB for RDS (Relational Database Service) is a managed relational database service from Alibaba Cloud. You typically access ApsaraDB for RDS instances by using an internal endpoint. MaxCompute lets you read data from and write data to tables in an ApsaraDB for MySQL instance.
Limitations
-
Region limitations: The following regions are supported: China (Beijing), China (Shanghai), China (Zhangjiakou), China (Ulanqab), China (Hangzhou), China (Shenzhen), China (Hong Kong), China (Shanghai) Finance Cloud (Zone F), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).
-
Engine limitations: Only ApsaraDB for MySQL versions 5.x and 8.0 are supported. Other ApsaraDB for RDS engines are not supported.
-
PrivateZone domain names are not supported.
-
ApsaraDB for MySQL external tables do not support the cluster property.
-
When writing large amounts of data to an ApsaraDB for MySQL external table, MaxCompute uses a parallel multi-process writing method. In rare cases, a write process can rewrite data, resulting in duplicates.
-
Scale limit: For RDS external tables in MaxCompute, the
DECIMALdata type has a default scale of 18 that cannot be modified. The data type is created asDECIMAL(38,18). If you need fewer decimal places, you can define the data type asStringwhen you create the external table. Then, use theCASTfunction to perform a type conversion when you use the data.
Notes on schema mismatch
If the schema of the ApsaraDB for MySQL source table does not match the schema of the external table:
-
Column count mismatch: If the ApsaraDB for MySQL source table has fewer columns than specified in the DDL of the external table, the system reports an error when it reads the data. An example error is
Unknown column 'xxx' in 'field list'. If the ApsaraDB for MySQL source table has more columns than specified in the DDL of the external table, the system discards the data from the extra columns. -
Column type mismatch: You cannot read STRING data from the source table as the INT type. While you can read INT data as the STRING type, it is not recommended.
Create an external table
Syntax
Table and column names are case-insensitive. Forcing case conversion is not supported.
-- Enable Hive-compatible mode.
SET odps.sql.hive.compatible = true;
CREATE EXTERNAL TABLE <table_name>(
<col_name1> <data_type>,
<col_name2> <data_type>,
......
)
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler' -- The handler for JDBC-connected data sources.
location '<jdbc:mysql://<realm_name:port>/<rds_database_name>?useSSL=false&user=<user_name>&password=<password_value>&table=<rds_table_name>>'
TBLPROPERTIES(
['odps.federation.jdbc.colmapping'='<col_name1:rdstable_colname1|select_alias1>,[<col_name2:rdstable_colname2|select_alias2>,...]',]
'mcfed.mapreduce.jdbc.input.query'='<select_sentence>',
'networklink'='<networklink_name>');
Parameters
-
table_name: Required. The name of the external table.
-
col_name: Required. The name of a column in the external table.
-
data_type: Required. The data type of the column.
-
jdbc:mysql://realm_name:port/rds_database_name?useSSL=false&user=user_name&password=password_value&table=rds_table_name: Required. The connection string for the ApsaraDB for MySQL source table.
If the connection string contains special characters, you must URL-encode them. For more information, see URL_ENCODE.
-
realm_name:port: The internal endpoint and port of the ApsaraDB for RDS instance.
-
Log on to the RDS console.
-
In the navigation pane on the left, click Instances. Then, select a region in the upper-left corner.
-
On the Instances page, click the target instance's Instance ID/Name to open its details page.
-
In the left navigation pane, click Database Connection.
-
You can view the database's Internal Endpoint, Public Endpoint, and Internal Port.
-
-
rds_database_name: The name of the ApsaraDB for MySQL database.
-
user_name: The username of the ApsaraDB for MySQL database account.
-
password_value: The password of the ApsaraDB for MySQL database account.
-
rds_table_name: The name of the ApsaraDB for MySQL source table.
-
-
TBLPROPERTIES:
-
odps.federation.jdbc.colmapping: Optional.
The mapping between the columns of the MaxCompute external table and the columns of the ApsaraDB for MySQL source table. The number of mapped columns must match the number of columns that are defined in the MaxCompute external table.
In the mapping, rdstable_colname is a column name in the ApsaraDB for MySQL source table (used to map all columns), and select_alias is a column alias in the query result (used to map specified columns).
-
If you do not configure this parameter, MaxCompute maps columns by name.
-
If you map only some columns, MaxCompute maps them as specified and attempts to map the remaining columns by name. An error occurs if a name or type mismatch is found for the automatically mapped columns.
-
-
mcfed.mapreduce.jdbc.input.query: Optional.
Specifies a query to read data from the ApsaraDB for MySQL source table. The schema of the external table (column count, names, and data types) must exactly match the schema of the query result, including any aliases. The format of
select_sentenceisSELECT xxx FROM <rds_database_name>.<rds_table_name>. -
networklink: Required. The name of the MaxCompute network connection for the VPC where the ApsaraDB for RDS instance resides.
-
Log on to the MaxCompute console, and select a region in the upper-left corner.
-
In the navigation pane on the left, choose .
-
On the Network Connection page, find the name of the network connection for the VPC where your ApsaraDB for RDS instance resides.
After you log in to the RDS console and select an instance, click Database Connection in the left navigation bar to view the VPC where the database is located.
-
-
Procedure
Follow these steps to create a MaxCompute external table mapped to an ApsaraDB for MySQL data source and load data into it:
-
Ensure that a network connection is established between MaxCompute and ApsaraDB for RDS. For more information, see Solutions for accessing the public network.
After the network connection is established, MaxCompute can connect only to the network of the specified VPC ID. To access other regions or other VPCs in the same region, you must establish connectivity between the connected VPC and the target VPCs based on your existing VPC connectivity solution.
-
Log on to the ApsaraDB for MySQL database, create a table, and insert data. For more information, see Log on to an ApsaraDB for MySQL instance by using Data Management Service (DMS).
-
Log on to the RDS console.
-
In the navigation pane on the left, click Instances. Then, select a region in the upper-left corner.
-
If you do not have any instances, click Create Instance on the Instances page. Otherwise, click the Instance ID/Name of the target instance to open its details page.
When you create an instance, set the RDS engine to ApsaraDB for MySQL 5.x or 8.0. Other ApsaraDB for RDS engines are not supported.
-
In the left navigation pane, click Databases.
-
Click Create Database. Configure the following parameters:
Parameter
Required
Description
Example
Database name
Yes
-
The name must be 2 to 64 characters in length.
-
The name must start with a letter and end with a letter or a digit.
-
The name can contain lowercase letters, digits, underscores (_), and hyphens (-).
-
The database name must be unique within the instance.
-
If a database name contains a
-, the-in the name of the created database folder is changed to@002d.
rds_mc_testSupported character set
Yes
Select a character set based on your business requirements.
utf8Authorized account
Optional
-
Select an account that is authorized to access the database. You can leave this parameter empty and attach an account after the database is created.
-
Only standard accounts are displayed. Privileged accounts have permissions to access all databases and do not require authorization.
DefaultDescription
Optional
An optional description for the database, up to 256 characters, to help with management.
Test database for an ApsaraDB for MySQL external table -
Click Log On to Database. In the left navigation pane, select Database Instances. Double-click the database that you created. On the SQLConsole page, execute the following statements to create a test table and write test data.
If the instance exists but the target database is not displayed after you expand the instance, the reason may be one of the following:
The logon account does not have access to the target database: You can go to the Accounts page of the RDS instance to modify the account permissions or change the logon database account.
The metadata is not synchronized, which prevents the directory from being displayed: Hover the mouse pointer over the instance that contains the target database. Click the
button to the right of the instance name to refresh the database list.
-
Example statements:
CREATE TABLE `rds_mc_external` ( `id` int(11) DEFAULT NULL, `name` varchar(32) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8; INSERT INTO `rds_mc_external`(`id` ,`name` ) VALUES(1,"Alice"); INSERT INTO `rds_mc_external`(`id` ,`name` ) VALUES(1,"Bob");
-
-
Create an external table in the MaxCompute client that maps to the ApsaraDB for MySQL data source
Map all columns
-
In the MaxCompute client, create an external table whose column names match the column names in the ApsaraDB for MySQL table. Example:
SET odps.sql.hive.compatible = true; CREATE EXTERNAL TABLE mc_vpc_rds_external ( id INT, name STRING) STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler' location 'jdbc:mysql://rm-2ze01y92y1tzp****.mysql.rds.aliyuncs.com:3306/rds_mc_test?useSSL=false&user=<your_username>&password=<your_password>&table=rds_mc_external' TBLPROPERTIES( 'odps.federation.jdbc.colmapping'='key:id,value:name', 'mcfed.mapreduce.jdbc.input.query'='select * from rds_mc_test.rds_mc_external', 'networklink'='<your_network_connection_name>'); -
Insert data into the new MaxCompute table.
INSERT INTO TABLE mc_vpc_rds_external VALUES(2,"Zoey"); -
Query the result.
-- Query the data insertion result. SELECT * FROM mc_vpc_rds_external; -- The following result is returned: +------------+------------+ | id | name | +------------+------------+ | 1 | Alice | | 1 | Bob | | 2 | Zoey | +------------+------------+
Map specified columns
-
In the MaxCompute client, create an external table and map its columns to specific columns in the ApsaraDB for MySQL table. Example:
SET odps.sql.hive.compatible = true; CREATE EXTERNAL TABLE mc_vpc_rds_external_mapping ( id INT, name STRING ) STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler' location 'jdbc:mysql://rm-2ze01y92y1tzp****.mysql.rds.aliyuncs.com:3306/rds_mc_test?useSSL=false&user=<your_username>&password=<your_password>&table=rds_mc_external' TBLPROPERTIES( 'mcfed.mapreduce.jdbc.input.query'='select * from rds_mc_test.rds_mc_external', 'networklink'='<your_network_connection_name>'); -
Insert data into the new MaxCompute table.
INSERT INTO TABLE mc_vpc_rds_external_mapping VALUES(4,"Lisa"); -
Query the data insertion result.
SELECT * FROM mc_vpc_rds_external_mapping; -- The following result is returned: +------------+------------+ | id | name | +------------+------------+ | 1 | Alice | | 1 | Bob | | 4 | Lisa | +------------+------------+
-