Data Warehouse Edition quick start

更新时间:
复制 MD 格式

Welcome to the getting started guide for AnalyticDB for MySQL Data Warehouse Edition. AnalyticDB for MySQL is a cloud-managed, petabyte-scale, high-concurrency, real-time data warehouse built for online analytical processing (OLAP) workloads. This guide shows you how to use a Data Warehouse Edition cluster.

Important

Data Warehouse Edition is no longer available for new purchases. You can purchase Enterprise Edition or Basic Edition. If you have already purchased a Data Warehouse Edition cluster, you can still refer to this guide to continue using the Data Warehouse Edition cluster.

Video tutorial: Data Warehouse Edition

Workflow

If you are new to AnalyticDB for MySQL Data Warehouse Edition, we recommend that you read the following topics first:

  • Product overview: Learn about the product concepts, benefits, and use cases of AnalyticDB for MySQL.

  • Pricing: Learn about the pricing details and billing methods for AnalyticDB for MySQL.

  • Quick start (this guide): Provides a tutorial on how to use an AnalyticDB for MySQL Data Warehouse Edition cluster.

image

Step 1: Create a database account

AnalyticDB for MySQL supports two types of database accounts: privileged accounts and standard accounts. For more information about the differences between them, see Database account types.

Create a privileged account

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, click Accounts.

  3. On the Accounts page, click Create Privileged Account.

  4. In the Create Account panel, configure the parameters.

    Parameter

    Description

    Database Account

    The name of the privileged account. Enter a name that meets the on-screen requirements.

    Account Type

    For a Data Warehouse Edition cluster, this parameter is automatically set to Privileged Account and cannot be changed.

    New Password

    The password for the privileged account. Enter a password that meets the on-screen requirements.

    Confirm Password

    Enter the password again to confirm.

    Description

    Optional. Enter a description for the account for future reference.

  5. Click OK to create the account.

Create and authorize a standard account

Standard accounts created by running SQL statements on a Data Warehouse Edition cluster are not displayed in the console.

  • To create a database account, see CREATE USER.

  • To grant permissions to a database account, see GRANT.

  • To revoke permissions from a database account, see REVOKE.

  • To change the name of a database account, see RENAME USER.

  • To delete a database account, see DROP USER.

Step 2: Configure IP whitelist

  • By default, an IP address whitelist contains only 127.0.0.1, which prevents any device from accessing the cluster. Add IP addresses to the whitelist to allow access from other devices. For example, you can add the CIDR block 10.10.10.0/24 to allow any IP address in the 10.10.10.X range to access the cluster. To add multiple IP addresses or CIDR blocks, separate them with a comma (,) without any spaces. For example: 192.168.0.1,172.16.213.9.

    Warning

    Do not enter 0.0.0.0 into the IP address whitelist.

  • If your public IP address changes frequently and you want to allow access from all public IP addresses to your AnalyticDB for MySQL cluster, contact technical support.

  • An IP address whitelist provides a high level of security for your AnalyticDB for MySQL cluster. We recommend that you maintain your whitelist regularly.

  • Configuring an IP address whitelist does not affect the normal operation of your AnalyticDB for MySQL cluster. Changes to the whitelist take effect within one minute.

Procedure

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, click Data Security.

  3. On the Whitelist Settings page, find the default whitelist group and click Modify in the Actions column.

    Note

    You can also click Create Whitelist to create a custom group.

  4. In the Edit Whitelist dialog box, delete the default IP address 127.0.0.1. Enter the IP addresses or CIDR blocks that need to access the cluster, and then click OK.

    Note

    To add your client's public IP address to the whitelist, you must first determine the IP address. For more information, see Connections.

Step 3: Connect to the cluster

AnalyticDB for MySQL supports connecting to an AnalyticDB for MySQL cluster by using Data Management Service (DMS), MySQL clients such as Navicat for MySQL, DBeaver, DBVisualizer, and SQL WorkBench/J, BI visualization tools, or the MySQL command-line tool. You can also connect to an AnalyticDB for MySQL cluster from an application by configuring information such as the cluster connection address, port, and database account.

Using DMS to connect to AnalyticDB for MySQL

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

  2. On the Cluster Information page, click Log On to Database in the upper-right corner.

  3. In the dialog box that appears, enter the logon information.

    Parameter

    Description

    Database Type

    The default value is AnalyticDB for MySQL 3.0. You do not need to change it.

    Instance Region

    The default value is the region where the current instance is located. You do not need to change it.

    Cluster ID

    The default value is the cluster ID of the current cluster. You do not need to change it.

    Database Account

    The name of the database account for the cluster.

    Database Password

    The password for the database account.

    Note

    You can select Remember Password to automatically log on to the current AnalyticDB for MySQL cluster in the future without entering the database account and password.

    Note
    • When you use DMS to log in to an AnalyticDB for MySQL cluster for the first time, the control mode is set to Flexible Management by default. After you log in, you can also change the control mode by editing the instance. For more information, see Edit an instance and Control modes.

    • After you configure the logon parameters, you can click Test Connection in the lower-left corner. If the test connection fails, check the cluster information, such as the account or password, based on the error message.

    • The system automatically adds the server IP addresses of DMS to the IP address whitelist of the database. If the IP addresses are not added automatically, add them manually. For more information, see Step 2: Configure an IP address whitelist and DMS IP address whitelist.

  4. Click Log On.

Connect programmatically to AnalyticDB for MySQL

Use the MySQL command-line tool to connect to AnalyticDB for MySQL

Use the MySQL command-line client to connect to an AnalyticDB for MySQL cluster

Use a client to connect to AnalyticDB for MySQL

Connect AnalyticDB for MySQL to data visualization tools

Step 4: Create a database

Note

You can create a maximum of 2,048 databases in each cluster.

  1. On the SQL INFORMATION_SCHEMA tab, enter a CREATE DATABASE statement in the SQL Console to create a database.

    • Syntax: CREATE DATABASE [IF NOT EXISTS] $db_name

    • Parameter description: db_name is the database name. The name must start with a lowercase letter, can contain letters, digits, and underscores (_), and cannot contain two or more consecutive underscores (_). The name can be up to 64 characters in length.

      Note

      The database name cannot be analyticdb because it is a built-in database.

    • Examples:

      create database adb_demo;                          
      create database if not exists adb_demo2;                         
  2. Click Execute in the upper-left corner. The database is created.

Step 5: Import and query data

Prerequisites

  • Create a directory in Object Storage Service (OSS) to store AnalyticDB for MySQL data.

    1. Activate OSS. For more information, see Activate OSS.

    2. Create a bucket. For more information, see Create buckets.

      Important

      The OSS bucket must be in the same region as your AnalyticDB for MySQL cluster.

    3. Create a directory. For more information, see Create a directory.

    4. Upload a test object. For more information, see Upload objects.

      In this example, the oss_import_test_data.txt object is uploaded to the <bucket-name>.oss-cn-hangzhou.aliyuncs.com/adb/ directory in OSS. Rows are delimited by line breaks and columns are delimited by semicolons (;). The sample data is as follows:

      uid;other
      12;hello_world_1
      27;hello_world_2
      28;hello_world_3
      33;hello_world_4
      37;hello_world_5
      40;hello_world_6
      ...     
  • Follow the AnalyticDB for MySQL Getting Started Guide to complete preparations, such as creating a cluster, setting up an IP address whitelist, and creating an account and a database. For more information, see Procedure.

Procedure

  1. Run the CREATE TABLE statement to create an external table in the adb_demo database. For information about the syntax for creating an OSS external table for data in CSV, Parquet, or TEXT format, see Syntax for OSS external tables.

  2. Query the OSS data.

    Querying an external table uses the same syntax as querying an AnalyticDB for MySQL internal table. You can query the data directly, as shown in the following example:

    select uid, other from oss_import_test_external_table where uid < 100 limit 10;
    • For large CSV or TEXT data objects, we strongly recommend that you import the data into AnalyticDB for MySQL before querying for better query performance.

    • For Parquet data objects, direct queries generally have high performance. You can decide whether to import the data into AnalyticDB for MySQL before querying based on your needs.

  3. Run the CREATE TABLE statement to create a destination table named adb_oss_import_test in the adb_demo database to store the data imported from OSS.

    CREATE TABLE IF NOT EXISTS adb_oss_import_test
    (
        uid string,
        other string
    )
    DISTRIBUTED BY HASH(uid);                  
  4. Run an INSERT statement to import data from the OSS external table into AnalyticDB for MySQL.

    Important

    By default, the INSERT INTO and INSERT OVERWRITE SELECT statements import data synchronously. If you import a large amount of data, such as hundreds of gigabytes, the connection between the client and the AnalyticDB for MySQL server must be maintained for a long time. During this period, network issues may interrupt the connection, which causes the data import to fail. Therefore, if you are importing a large amount of data, we recommend that you use the SUBMIT JOB INSERT OVERWRITE SELECT statement to import data asynchronously.

    • Method 1: Run the INSERT INTO statement to import data. If a primary key has duplicate values, the new data is ignored and the existing data is not updated. This behavior is equivalent to using INSERT IGNORE INTO. For more information, see INSERT INTO. The following is a sample statement:

      INSERT INTO adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;
    • Method 2: Run the INSERT OVERWRITE statement to import data. This statement overwrites the existing data in the table. The following is a sample statement:

      INSERT OVERWRITE adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;
    • Method 3: Run the INSERT OVERWRITE statement to import data asynchronously. You can use the SUBMIT JOB statement to submit an asynchronous task. To accelerate the write task, you can add the /*+ direct_batch_load=true*/ hint. For more information, see Asynchronous write. The following is a sample statement:

      SUBMIT JOB INSERT OVERWRITE adb_oss_import_test
      SELECT * FROM oss_import_test_external_table;

      The following result is returned:

      +---------------------------------------+
      | job_id                                |
      +---------------------------------------+
      | 2020112122202917203100908203303****** |

      For more information about how to submit an asynchronous task, see Submit an asynchronous import task.

  5. Run the following command to query the data in the adb_oss_import_test table.

    SELECT * FROM adb_oss_import_test;

Syntax for OSS external tables

Non-partitioned OSS external tables

CREATE TABLE [IF NOT EXISTS] table_name
(column_name column_type[, …])
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"endpoint",
    "url":"OSS_LOCATION",
    "accessid":"accesskey_id",
    "accesskey":"accesskey_secret",
    "format":"CSV|ORC|Parquet",
    "delimiter":";",
    "skip_header_line_count":1,
    "charset":"utf-8"
}';

External table type

Parameter

Required

Description

External tables for files in CSV, Parquet, or OSS ORC format

ENGINE='OSS'

Yes

The table engine. Set this parameter to OSS.

endpoint

The Endpoint of the OSS bucket. AnalyticDB for MySQL can access OSS only over a VPC.

Note

Log on to the OSS console. Click the name of the destination bucket. On the Overview page of the bucket, view the Endpoint.

url

The path of the OSS file or folder.

  • Specify the absolute path of the OSS file. Example: oss://testBucketname/adb/oss_import_test_data.csv.

  • The path of a folder must end with a forward slash (/). Example: oss://testBucketname/adb/.

    Note

    If you specify the path of a folder, the external table contains all data in the folder after the table is created.

  • You can add a wildcard character (*) to the end of the path to match all files or folders that follow the specified pattern. Example: oss://testBucketname/adb/list_file_with_prefix/test*

    Note

    This fuzzy query example matches all files and folders that have the specified prefix, such as oss://testBucketname/adb/list_file_with_prefix/testfile1 and

    oss://testBucketname/adb/list_file_with_prefix/test1/file2.

accessid

The AccessKey ID of the Alibaba Cloud account or a RAM user that has the management permissions on OSS.

For more information about how to obtain an AccessKey ID, see Accounts and Permissions.

accesskey

The AccessKey secret of the Alibaba Cloud account or a RAM user that has the management permissions on OSS.

For more information about how to obtain an AccessKey secret, see Accounts and Permissions.

External tables for CSV files

delimiter

Defines the column delimiter for the CSV data file.

External tables for Parquet or OSS ORC files

format

The format of the data file.

  • To create an external table for a Parquet file, set this parameter to parquet.

  • To create an external table for an ORC file, set this parameter to orc.

Note
  • Specify this parameter only when you create an external table for an OSS Parquet or OSS ORC file.

  • If you do not specify this parameter, the default format is CSV.

External tables for CSV files

null_value

No

Defines the NULL value for the CSV data file. By default, an empty value is defined as NULL, which is "null_value": "".

Important

This parameter is supported only by clusters whose kernel version is 3.1.4.2 or later.

ossnull

The rule that maps values to the NULL value in the CSV data file. Valid values:

  • 1 (default): EMPTY_SEPARATORS. Only empty values are defined as NULL.

    Example: a,"",,c --> "a","",NULL,"c"

  • 2: EMPTY_QUOTES. Only "" is defined as NULL.

    Example: a,"",,c --> "a",NULL,"","c"

  • 3: BOTH. Both empty values and "" are defined as NULL.

    Example: a,"",,c --> "a",NULL,NULL,"c"

  • 4: NEITHER. Neither empty values nor "" are defined as NULL.

    Example: a,"",,c --> "a","","","c"

Note

The preceding examples assume that "null_value": "".

skip_header_line_count

The number of header rows to skip at the beginning of the file when you import data. The first row of a CSV file is the table header. If you set this parameter to 1, the first row is automatically skipped during data import.

The default value is 0, which means no rows are skipped.

oss_ignore_quote_and_escape

Specifies whether to ignore quotation marks and escape characters in field values. The default value is false, which means quotation marks and escape characters are not ignored.

Important

This parameter is supported only by clusters whose kernel version is 3.1.4.2 or later.

charset

The character set of the OSS external table. Valid values:

  • utf-8 (default)

  • gbk

Important

This parameter is supported only by clusters whose kernel version is 3.1.10.4 or later.

Note
  • The column names and their order in the CREATE EXTERNAL TABLE statement must match those in the source Parquet or ORC file. Column names are case-insensitive.

  • You can create an external table using a subset of columns from the source file. Columns not specified in the CREATE EXTERNAL TABLE statement are ignored.

  • If the CREATE EXTERNAL TABLE statement includes a column that does not exist in the Parquet or ORC file, queries on that column return NULL.

AnalyticDB for MySQL lets you read and write Hive TEXT files using OSS external tables in CSV format. The following is a sample statement to create such a table:

CREATE TABLE adb_csv_hive_format_oss (
  a tinyint,
  b smallint,
  c int,
  d bigint,
  e boolean,
  f float,
  g double,
  h varchar,
  i varchar, -- binary
  j timestamp,
  k DECIMAL(10, 4),
  l varchar, -- char(10)
  m varchar, -- varchar(100)
  n date
) ENGINE = 'OSS' TABLE_PROPERTIES='{
    "format": "csv",
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "accessid":"LTAI****************",
    "accesskey":"yourAccessKeySecret",
    "url":"oss://testBucketname/adb_data/",
    "delimiter": "\\1",
    "null_value": "\\\\N",
    "oss_ignore_quote_and_escape": "true",
    "ossnull": 2
}';
Note

When you create an OSS external table in CSV format to read Hive TEXT files, note the following:

  • The default column delimiter for Hive TEXT files is \1. If you want to read and write Hive TEXT files using an OSS external table in CSV format, you must escape \1 as \\1 in the delimiter parameter.

  • The default NULL value for Hive TEXT files is \N. If you want to read and write Hive TEXT files using an OSS external table in CSV format, you must escape \N as \\\\N in the null_value parameter.

  • Other primitive data types in Hive, such as BOOLEAN, map one-to-one to data types in AnalyticDB for MySQL. However, the BINARY, CHAR(n), and VARCHAR(n) types all map to the VARCHAR type in AnalyticDB for MySQL.

Partitioned OSS external tables

If your OSS data source contains partitions, the data is stored in a hierarchical directory structure in OSS. The following is an example:

parquet_partition_classic/
├── p1=2020-01-01
│ ├── p2=4
│ │ ├── p3=SHANGHAI
│ │ │ ├── 000000_0
│ │ │ └── 000000_1
│ │ └── p3=SHENZHEN
│ │     └── 000000_0
│ └── p2=6
│     └── p3=SHENZHEN
│         └── 000000_0
├── p1=2020-01-02
│ └── p2=8
│     ├── p3=SHANGHAI
│     │ └── 000000_0
│     └── p3=SHENZHEN
│         └── 000000_0
└── p1=2020-01-03
    └── p2=6
        ├── p2=HANGZHOU
        └── p3=SHENZHEN
            └── 000000_0

In the preceding data, p1, p2, and p3 are the level-1, level-2, and level-3 partitions, respectively. For this type of data source, you typically query data by partition. To do this, you must specify the partition key columns when you create the OSS external table. The following statement is an example of how to create a partitioned OSS external table for a Parquet file:

CREATE TABLE [IF NOT EXISTS] table_name
(column_name column_type[, …])
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"endpoint",
    "url":"OSS_LOCATION",
    "accessid":"accesskey_id",
    "accesskey":"accesskey_secret",
    "format":"parquet",
    "partition_column":"p1, p2, p3"
}';
Note
  • The partition_column parameter in TABLE_PROPERTIES specifies the partition key columns, such as p1, p2, and p3. The partition key columns must be declared in the partition_column parameter in order from level-1 to level-3 partitions.

  • The column definition must include the partition key columns, such as p1, p2, and p3, and their data types. The partition key columns must be placed at the end of the column definition.

  • The order of the partition key columns in the column definition must match the order in the partition_column parameter.

  • Partition key columns support the following data types: BOOLEAN, TINYINT, SMALLINT, INT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, STRING, DATE, and TIMESTAMP.

  • When you query data, partition key columns are displayed and used in the same way as other data columns.

  • If you do not specify the format, the default format is CSV.

  • For more information about other parameters, see Parameter description.

References

For more ways to import data, see Supported data sources.