Export data to OSS

更新时间:
复制 MD 格式

AnalyticDB for MySQL lets you export data from AnalyticDB for MySQL Data Warehouse Edition (V3.0) to Object Storage Service (OSS) using external tables and the INSERT INTO statement. This export feature supports only the CSV and Parquet file formats.

Prerequisites

  • Create a directory in Object Storage Service (OSS) for data from AnalyticDB for MySQL:

    1. Activate OSS.

      Note

      The OSS bucket and your AnalyticDB for MySQL Data Warehouse Edition (V3.0) cluster must be in the same region.

    2. Create a bucket.

    3. Create a directory in OSS.

      For example, create a directory named adb_data/ in OSS. Data exported from AnalyticDB for MySQL Data Warehouse Edition (V3.0) is stored in this directory.

  • You have created a Data Warehouse Edition (V3.0) cluster, configured its whitelist, and created an account and a database. For more information, see Quick start for AnalyticDB for MySQL.

Note

If the AnalyticDB for MySQL Data Warehouse Edition cluster is in elastic mode, log on to the AnalyticDB for MySQL console to enable the ENI (Elastic Network Interface) network. On the Network Information panel of the instance, turn on the ENI Network switch. The ENI network must be enabled for instances in elastic mode to import data by using external tables.

Procedure

本示例将AnalyticDB for MySQL数仓版adb_demo库中的source_table表数据导出至OSSadb_data文件夹下。

  1. Connect to the target cluster and go to the source database.

  2. In the adb_demo database, create an external table. For more information, see Non-partitioned OSS external tables.

  3. Execute a write statement to copy the source data to the previously created external table. For information about the supported syntax for different external table types, see Syntax for non-partitioned external tables and Syntax for partitioned external tables.

  4. After the write job in Step 3 completes, log on to the OSS console to view the exported data files in the destination folder. You can also directly query the data exported to the external table from AnalyticDB for MySQL.

Syntax for non-partitioned external tables

When you use INSERT INTO or INSERT OVERWRITE SELECT to write data, the job runs synchronously by default. If you write a large amount of data, such as several hundred gigabytes, the connection between the client and the AnalyticDB for MySQL server might time out, causing the job to fail. Therefore, if you write a large amount of data, we recommend running the INSERT OVERWRITE SELECT statement asynchronously by adding SUBMIT JOB before the statement. Example: SUBMIT JOB INSERT OVERWRITE adb_table SELECT * FROM oss_table;. For more information about submitting jobs asynchronously, see asynchronous import job submission.

Important

By default, files exported to OSS contain only data and do not include column headers. To export files with column headers, add the /*+sql_output_oss_file_head_enable=true*/ hint before the statement.

  • Insert select from

    • Description: If your data already exists in another table, you can use INSERT SELECT FROM to copy it to an external table. This statement writes data from the source table to the external table's OSS location. Each write operation generates new OSS data files.

      Important

      You must write all columns to the external table. You cannot specify only a subset of columns.

      Syntax:

      INSERT INTO table_name 
      SELECT select_statement FROM from_statement;       

      Example:

      insert into oss_table select col1, col2, col3 from source_table;
  • Insert overwrite select

    • Description: Use INSERT OVERWRITE SELECT to perform a bulk insert into a table. If the destination external table already contains data, this statement first deletes all existing data files in the external table's path and then generates new OSS data files.

      Important

      You must write all columns to the external table. You cannot specify only a subset of columns.

      Syntax:

      INSERT OVERWRITE table_name 
      SELECT select_statement FROM from_statement;

      Example:

      INSERT OVERWRITE oss_table 
      SELECT col1, col2, col3 FROM source_table;
  • Replace select from

    • Description: Because OSS external tables do not support primary keys, REPLACE SELECT FROM behaves identically to INSERT SELECT FROM. It copies data to another table. Existing data in the destination table remains unchanged, and new data is appended as new OSS data files.

  • Export to a single OSS file (CSV format only; not supported for Parquet format)

    • Description: You can use a hint to export data to a single specified OSS file. If the OVERWRITE keyword is included, any existing file with the same name in the directory defined in TABLE_PROPERTIES of the external table is overwritten. Other files in the directory are not affected.

      Important

      You must write all columns to the external table. You cannot specify only a subset of columns.

      Version notes:

      • For AnalyticDB for MySQL clusters earlier than V3.1.2: This feature is not supported. The system automatically generates the file names and exports the data into multiple files. The number of destination files is dynamically determined by the job concurrency.

      • For AnalyticDB for MySQL clusters of version 3.1.2 and later, you can use a hint to export regular external tables in the AnalyticDB for MySQL CSV format to a single file in OSS. You can also customize the filename during the export. Without a hint, the export behavior is consistent with that of versions earlier than 3.1.2: the data is exported to multiple files.

      Syntax:

      /*+output_filename=adb.txt*/INSERT [OVERWRITE] table_name 
      SELECT select_statement FROM from_statement;

      Example:

      /*+output_filename=adb.txt*/INSERT [OVERWRITE] oss_table 
      SELECT * FROM source_table;

      If you export data without a hint, file names are automatically generated by the system, and multiple files are exported.

      When you export to a single file, the exported file, such as adb.txt, is stored in the OSS directory defined for the external table, such as adb_data/.

Syntax for partitioned external tables

When you use INSERT INTO or INSERT OVERWRITE SELECT to write data, the job runs synchronously by default. If you write a large amount of data, such as several hundred gigabytes, the connection between the client and the AnalyticDB for MySQL server might time out, causing the job to fail. Therefore, if you write a large amount of data, we recommend running the INSERT OVERWRITE SELECT statement asynchronously by adding SUBMIT JOB before the statement. Example: SUBMIT JOB INSERT OVERWRITE adb_table SELECT * FROM oss_table;. For more information about submitting jobs asynchronously, see asynchronous import job submission.

Important

By default, files exported to OSS contain only data and do not include column headers. To export files with column headers, add the /*+sql_output_oss_file_head_enable=true*/ hint before the statement.

When you write data from a partitioned table, the data files do not include the data of partition key columns. The partition information is stored in the OSS directory structure.

Important

Exporting data from a partitioned external table to a single OSS file is not supported.

For example, consider a partitioned table that has two partition key columns and three regular columns. The first-level partition key is pcol1 with a value of 1, and the second-level partition key is pcol2 with a value of a. The data from the partitioned table is exported to the OSS path adb_data/. When you write data to the partition where pcol1=1 and pcol2=a, the relative path of the data file is adb_data/pcol1=1/pcol2=a/. The resulting CSV or Parquet data files do not contain the values for pcol1 and pcol2. They contain only the values of the three regular columns.

  • Insert into partition select from

Function: The INSERT INTO PARTITION SELECT FROM command bulk inserts data into a partitioned external table. When you write data, you can use the PARTITION clause to specify all partition columns and values. Alternatively, you can specify only the high-level partition values and allow the low-level partitions to be generated dynamically. You can also omit the PARTITION clause entirely, which dynamically generates all partition levels.

When data is written, it is appended to the corresponding partition. Each write operation generates new OSS data files.

You must write all columns to the external table. You cannot specify only a subset of columns.

Fully static partitions

Syntax

INSERT INTO table_name PARTITION(par1=val1,par2=val2,...) 
SELECT select_statement FROM from_statement;

Example

INSERT INTO oss_table_par PARTITION(par1=val1,par2=val2) 
select col1, col2, col3 FROM source_table;

Partially static and partially dynamic partitions

Syntax

INSERT INTO table_name PARTITION(par1=val1,par2,...) 
SELECT select_statement FROM from_statement;

Example

INSERT INTO oss_table_par PARTITION(par1=val1,par2) 
SELECT col1, col2, col3, par2col FROM source_table;

Fully dynamic partitions

Syntax

INSERT INTO table_name 
SELECT select_statement FROM from_statement;

Example

INSERT INTO oss_table_par
SELECT col1, col2, col3, par1col, par2col FROM source_table;
  • Insert overwrite partition select

Function: The usage is the same as INSERT INTO PARTITION SELECT. However, this command also deletes the existing data files from the target partitions involved in the execution. Data files in partitions that do not receive new data are not deleted.

Syntax:

INSERT OVERWRITE table_name PARTITION(par1=val1,par2=val2,...)[IF NOT EXISTS] 
SELECT select_statement FROM from_statement;

Example:

INSERT OVERWRITE oss_table_par PARTITION(par1=val1,par2=val2) IF NOT EXISTS 
SELECT col1, col2, col3 FROM source_table;
  • Replace into partition select from

Description: Because external tables do not support primary keys, REPLACE INTO PARTITION SELECT FROM behaves identically to INSERT INTO PARTITION SELECT FROM.

Unsupported syntax

AnalyticDB for MySQL external tables do not support row-level write syntax. Specifically, the INSERT INTO VALUES and REPLACE INTO VALUES statements are not supported.