Data Warehouse Edition quick start
Welcome to the getting started guide for AnalyticDB for MySQL Data Warehouse Edition. AnalyticDB for MySQL is a cloud-managed, petabyte-scale, high-concurrency, real-time data warehouse built for online analytical processing (OLAP) workloads. This guide shows you how to use a Data Warehouse Edition cluster.
Data Warehouse Edition is no longer available for new purchases. You can purchase Enterprise Edition or Basic Edition. If you have already purchased a Data Warehouse Edition cluster, you can still refer to this guide to continue using the Data Warehouse Edition cluster.
Video tutorial: Data Warehouse Edition
Workflow
If you are new to AnalyticDB for MySQL Data Warehouse Edition, we recommend that you read the following topics first:
Product overview: Learn about the product concepts, benefits, and use cases of AnalyticDB for MySQL.
Pricing: Learn about the pricing details and billing methods for AnalyticDB for MySQL.
Quick start (this guide): Provides a tutorial on how to use an AnalyticDB for MySQL Data Warehouse Edition cluster.
Step 1: Create a database account
AnalyticDB for MySQL supports two types of database accounts: privileged accounts and standard accounts. For more information about the differences between them, see Database account types.
Create a privileged account
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, click Accounts.
On the Accounts page, click Create Privileged Account.
In the Create Account panel, configure the parameters.
Parameter
Description
Database Account
The name of the privileged account. Enter a name that meets the on-screen requirements.
Account Type
For a Data Warehouse Edition cluster, this parameter is automatically set to Privileged Account and cannot be changed.
New Password
The password for the privileged account. Enter a password that meets the on-screen requirements.
Confirm Password
Enter the password again to confirm.
Description
Optional. Enter a description for the account for future reference.
Click OK to create the account.
Create and authorize a standard account
Standard accounts created by running SQL statements on a Data Warehouse Edition cluster are not displayed in the console.
To create a database account, see CREATE USER.
To grant permissions to a database account, see GRANT.
To revoke permissions from a database account, see REVOKE.
To change the name of a database account, see RENAME USER.
To delete a database account, see DROP USER.
Step 2: Configure IP whitelist
By default, an IP address whitelist contains only 127.0.0.1, which prevents any device from accessing the cluster. Add IP addresses to the whitelist to allow access from other devices. For example, you can add the CIDR block 10.10.10.0/24 to allow any IP address in the 10.10.10.X range to access the cluster. To add multiple IP addresses or CIDR blocks, separate them with a comma (,) without any spaces. For example: 192.168.0.1,172.16.213.9.
WarningDo not enter 0.0.0.0 into the IP address whitelist.
If your public IP address changes frequently and you want to allow access from all public IP addresses to your AnalyticDB for MySQL cluster, contact technical support.
An IP address whitelist provides a high level of security for your AnalyticDB for MySQL cluster. We recommend that you maintain your whitelist regularly.
Configuring an IP address whitelist does not affect the normal operation of your AnalyticDB for MySQL cluster. Changes to the whitelist take effect within one minute.
Procedure
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, click Data Security.
On the Whitelist Settings page, find the default whitelist group and click Modify in the Actions column.
NoteYou can also click Create Whitelist to create a custom group.
In the Edit Whitelist dialog box, delete the default IP address 127.0.0.1. Enter the IP addresses or CIDR blocks that need to access the cluster, and then click OK.
NoteTo add your client's public IP address to the whitelist, you must first determine the IP address. For more information, see Connections.
Step 3: Connect to the cluster
AnalyticDB for MySQL supports connecting to an AnalyticDB for MySQL cluster by using Data Management Service (DMS), MySQL clients such as Navicat for MySQL, DBeaver, DBVisualizer, and SQL WorkBench/J, BI visualization tools, or the MySQL command-line tool. You can also connect to an AnalyticDB for MySQL cluster from an application by configuring information such as the cluster connection address, port, and database account.
Using DMS to connect to AnalyticDB for MySQL
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
On the Cluster Information page, click Log On to Database in the upper-right corner.
In the dialog box that appears, enter the logon information.
Parameter
Description
Database Type
The default value is AnalyticDB for MySQL 3.0. You do not need to change it.
Instance Region
The default value is the region where the current instance is located. You do not need to change it.
Cluster ID
The default value is the cluster ID of the current cluster. You do not need to change it.
Database Account
The name of the database account for the cluster.
Database Password
The password for the database account.
NoteYou can select Remember Password to automatically log on to the current AnalyticDB for MySQL cluster in the future without entering the database account and password.
NoteWhen you use DMS to log in to an AnalyticDB for MySQL cluster for the first time, the control mode is set to Flexible Management by default. After you log in, you can also change the control mode by editing the instance. For more information, see Edit an instance and Control modes.
After you configure the logon parameters, you can click Test Connection in the lower-left corner. If the test connection fails, check the cluster information, such as the account or password, based on the error message.
The system automatically adds the server IP addresses of DMS to the IP address whitelist of the database. If the IP addresses are not added automatically, add them manually. For more information, see Step 2: Configure an IP address whitelist and DMS IP address whitelist.
Click Log On.
Connect programmatically to AnalyticDB for MySQL
Use the MySQL command-line tool to connect to AnalyticDB for MySQL
Use the MySQL command-line client to connect to an AnalyticDB for MySQL cluster
Use a client to connect to AnalyticDB for MySQL
Connect AnalyticDB for MySQL to data visualization tools
Step 4: Create a database
You can create a maximum of 2,048 databases in each cluster.
On the SQL INFORMATION_SCHEMA tab, enter a CREATE DATABASE statement in the SQL Console to create a database.
Syntax:
CREATE DATABASE [IF NOT EXISTS] $db_nameParameter description:
db_nameis the database name. The name must start with a lowercase letter, can contain letters, digits, and underscores (_), and cannot contain two or more consecutive underscores (_). The name can be up to 64 characters in length.NoteThe database name cannot be
analyticdbbecause it is a built-in database.Examples:
create database adb_demo;create database if not exists adb_demo2;
Click Execute in the upper-left corner. The database is created.
Step 5: Import and query data
Prerequisites
Create a directory in Object Storage Service (OSS) to store AnalyticDB for MySQL data.
Activate OSS. For more information, see Activate OSS.
Create a bucket. For more information, see Create buckets.
ImportantThe OSS bucket must be in the same region as your AnalyticDB for MySQL cluster.
Create a directory. For more information, see Create a directory.
Upload a test object. For more information, see Upload objects.
In this example, the
oss_import_test_data.txtobject is uploaded to the<bucket-name>.oss-cn-hangzhou.aliyuncs.com/adb/directory in OSS. Rows are delimited by line breaks and columns are delimited by semicolons (;). The sample data is as follows:uid;other 12;hello_world_1 27;hello_world_2 28;hello_world_3 33;hello_world_4 37;hello_world_5 40;hello_world_6 ...
Follow the AnalyticDB for MySQL Getting Started Guide to complete preparations, such as creating a cluster, setting up an IP address whitelist, and creating an account and a database. For more information, see Procedure.
Procedure
Run the CREATE TABLE statement to create an external table in the
adb_demodatabase. For information about the syntax for creating an OSS external table for data in CSV, Parquet, or TEXT format, see Syntax for OSS external tables.Query the OSS data.
Querying an external table uses the same syntax as querying an AnalyticDB for MySQL internal table. You can query the data directly, as shown in the following example:
select uid, other from oss_import_test_external_table where uid < 100 limit 10;For large CSV or TEXT data objects, we strongly recommend that you import the data into AnalyticDB for MySQL before querying for better query performance.
For Parquet data objects, direct queries generally have high performance. You can decide whether to import the data into AnalyticDB for MySQL before querying based on your needs.
Run the CREATE TABLE statement to create a destination table named
adb_oss_import_testin theadb_demodatabase to store the data imported from OSS.CREATE TABLE IF NOT EXISTS adb_oss_import_test ( uid string, other string ) DISTRIBUTED BY HASH(uid);-
Run an INSERT statement to import data from the OSS external table into AnalyticDB for MySQL.
ImportantBy default, the
INSERT INTOandINSERT OVERWRITE SELECTstatements import data synchronously. If you import a large amount of data, such as hundreds of gigabytes, the connection between the client and the AnalyticDB for MySQL server must be maintained for a long time. During this period, network issues may interrupt the connection, which causes the data import to fail. Therefore, if you are importing a large amount of data, we recommend that you use theSUBMIT JOB INSERT OVERWRITE SELECTstatement to import data asynchronously.-
Method 1: Run the
INSERT INTOstatement to import data. If a primary key has duplicate values, the new data is ignored and the existing data is not updated. This behavior is equivalent to usingINSERT IGNORE INTO. For more information, see INSERT INTO. The following is a sample statement:INSERT INTO adb_oss_import_test SELECT * FROM oss_import_test_external_table; -
Method 2: Run the INSERT OVERWRITE statement to import data. This statement overwrites the existing data in the table. The following is a sample statement:
INSERT OVERWRITE adb_oss_import_test SELECT * FROM oss_import_test_external_table; -
Method 3: Run the
INSERT OVERWRITEstatement to import data asynchronously. You can use theSUBMIT JOBstatement to submit an asynchronous task. To accelerate the write task, you can add the/*+ direct_batch_load=true*/hint. For more information, see Asynchronous write. The following is a sample statement:SUBMIT JOB INSERT OVERWRITE adb_oss_import_test SELECT * FROM oss_import_test_external_table;The following result is returned:
+---------------------------------------+ | job_id | +---------------------------------------+ | 2020112122202917203100908203303****** |For more information about how to submit an asynchronous task, see Submit an asynchronous import task.
-
Run the following command to query the data in the
adb_oss_import_testtable.SELECT * FROM adb_oss_import_test;
Syntax for OSS external tables
Non-partitioned OSS external tables
CREATE TABLE [IF NOT EXISTS] table_name
(column_name column_type[, …])
ENGINE='OSS'
TABLE_PROPERTIES='{
"endpoint":"endpoint",
"url":"OSS_LOCATION",
"accessid":"accesskey_id",
"accesskey":"accesskey_secret",
"format":"CSV|ORC|Parquet",
"delimiter":";",
"skip_header_line_count":1,
"charset":"utf-8"
}';
|
External table type |
Parameter |
Required |
Description |
|
External tables for files in CSV, Parquet, or OSS ORC format |
ENGINE='OSS' |
Yes |
The table engine. Set this parameter to OSS. |
|
endpoint |
The Endpoint of the OSS bucket. AnalyticDB for MySQL can access OSS only over a VPC. Note
Log on to the OSS console. Click the name of the destination bucket. On the Overview page of the bucket, view the Endpoint. |
||
|
url |
The path of the OSS file or folder.
|
||
|
accessid |
The AccessKey ID of the Alibaba Cloud account or a RAM user that has the management permissions on OSS. For more information about how to obtain an AccessKey ID, see Accounts and Permissions. |
||
|
accesskey |
The AccessKey secret of the Alibaba Cloud account or a RAM user that has the management permissions on OSS. For more information about how to obtain an AccessKey secret, see Accounts and Permissions. |
||
|
External tables for CSV files |
delimiter |
Defines the column delimiter for the CSV data file. |
|
|
External tables for Parquet or OSS ORC files |
format |
The format of the data file.
Note
|
|
|
External tables for CSV files |
null_value |
No |
Defines the Important
This parameter is supported only by clusters whose kernel version is 3.1.4.2 or later. |
|
ossnull |
The rule that maps values to the
Note
The preceding examples assume that |
||
|
skip_header_line_count |
The number of header rows to skip at the beginning of the file when you import data. The first row of a CSV file is the table header. If you set this parameter to 1, the first row is automatically skipped during data import. The default value is 0, which means no rows are skipped. |
||
|
oss_ignore_quote_and_escape |
Specifies whether to ignore quotation marks and escape characters in field values. The default value is false, which means quotation marks and escape characters are not ignored. Important
This parameter is supported only by clusters whose kernel version is 3.1.4.2 or later. |
||
|
charset |
The character set of the OSS external table. Valid values:
Important
This parameter is supported only by clusters whose kernel version is 3.1.10.4 or later. |
-
The column names and their order in the CREATE EXTERNAL TABLE statement must match those in the source Parquet or ORC file. Column names are case-insensitive.
-
You can create an external table using a subset of columns from the source file. Columns not specified in the CREATE EXTERNAL TABLE statement are ignored.
-
If the CREATE EXTERNAL TABLE statement includes a column that does not exist in the Parquet or ORC file, queries on that column return NULL.
AnalyticDB for MySQL lets you read and write Hive TEXT files using OSS external tables in CSV format. The following is a sample statement to create such a table:
CREATE TABLE adb_csv_hive_format_oss (
a tinyint,
b smallint,
c int,
d bigint,
e boolean,
f float,
g double,
h varchar,
i varchar, -- binary
j timestamp,
k DECIMAL(10, 4),
l varchar, -- char(10)
m varchar, -- varchar(100)
n date
) ENGINE = 'OSS' TABLE_PROPERTIES='{
"format": "csv",
"endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
"accessid":"LTAI****************",
"accesskey":"yourAccessKeySecret",
"url":"oss://testBucketname/adb_data/",
"delimiter": "\\1",
"null_value": "\\\\N",
"oss_ignore_quote_and_escape": "true",
"ossnull": 2
}';
When you create an OSS external table in CSV format to read Hive TEXT files, note the following:
-
The default column delimiter for Hive TEXT files is \1. If you want to read and write Hive TEXT files using an OSS external table in CSV format, you must escape
\1as\\1in thedelimiterparameter. -
The default
NULLvalue for Hive TEXT files is \N. If you want to read and write Hive TEXT files using an OSS external table in CSV format, you must escape\Nas\\\\Nin thenull_valueparameter. -
Other primitive data types in Hive, such as
BOOLEAN, map one-to-one to data types in AnalyticDB for MySQL. However, theBINARY,CHAR(n), andVARCHAR(n)types all map to theVARCHARtype in AnalyticDB for MySQL.
Partitioned OSS external tables
If your OSS data source contains partitions, the data is stored in a hierarchical directory structure in OSS. The following is an example:
parquet_partition_classic/
├── p1=2020-01-01
│ ├── p2=4
│ │ ├── p3=SHANGHAI
│ │ │ ├── 000000_0
│ │ │ └── 000000_1
│ │ └── p3=SHENZHEN
│ │ └── 000000_0
│ └── p2=6
│ └── p3=SHENZHEN
│ └── 000000_0
├── p1=2020-01-02
│ └── p2=8
│ ├── p3=SHANGHAI
│ │ └── 000000_0
│ └── p3=SHENZHEN
│ └── 000000_0
└── p1=2020-01-03
└── p2=6
├── p2=HANGZHOU
└── p3=SHENZHEN
└── 000000_0
In the preceding data, p1, p2, and p3 are the level-1, level-2, and level-3 partitions, respectively. For this type of data source, you typically query data by partition. To do this, you must specify the partition key columns when you create the OSS external table. The following statement is an example of how to create a partitioned OSS external table for a Parquet file:
CREATE TABLE [IF NOT EXISTS] table_name
(column_name column_type[, …])
ENGINE='OSS'
TABLE_PROPERTIES='{
"endpoint":"endpoint",
"url":"OSS_LOCATION",
"accessid":"accesskey_id",
"accesskey":"accesskey_secret",
"format":"parquet",
"partition_column":"p1, p2, p3"
}';
-
The
partition_columnparameter inTABLE_PROPERTIESspecifies the partition key columns, such as p1, p2, and p3. The partition key columns must be declared in thepartition_columnparameter in order from level-1 to level-3 partitions. -
The column definition must include the partition key columns, such as p1, p2, and p3, and their data types. The partition key columns must be placed at the end of the column definition.
-
The order of the partition key columns in the column definition must match the order in the
partition_columnparameter. -
Partition key columns support the following data types:
BOOLEAN,TINYINT,SMALLINT,INT,INTEGER,BIGINT,FLOAT,DOUBLE,DECIMAL,VARCHAR,STRING,DATE, andTIMESTAMP. -
When you query data, partition key columns are displayed and used in the same way as other data columns.
-
If you do not specify the format, the default format is CSV.
-
For more information about other parameters, see Parameter description.
References
For more ways to import data, see Supported data sources.