Common questions and solutions for writes and queries in AnalyticDB for MySQL-AnalyticDB(AnalyticDB)-阿里云帮助中心

FAQ overview

Can I use JDBC to query data from Hudi tables?
Can I read data from Hudi tables in OSS?
Can a cluster automatically switch between a XIHE MPP job and a XIHE BSP job?
How do I choose between XIHE MPP and XIHE BSP to run a job?
How can I check the status of a XIHE BSP job?
How can I implement resource isolation to reduce interference between SQL jobs?
What should I do if too many items are specified in an IN clause?
What should I do if the "Query exceeded maximum time limit of 1800000.00ms" error is reported when I query data?
What should I do if the "Out of Memory Pool size pre cal. available 0 require 3" error is reported when I query data?
What should I do if the "STORAGE_INDEX_ERROR, msg:Index does not exist" error is reported when I query data?
What should I do if the "multi-statement found" error is reported when I use the multi-statement feature to run multiple SQL statements?
How can I troubleshoot an issue where time values in query results are truncated?
How can I fix an error with the built-in AES_ENCRYPT function?
Why do my query results change unexpectedly?
Why does an ORDER BY query on a single table take a long time to complete?
Why is the total number of scanned rows in an execution plan not equal to the total number of rows scanned by the query?
Why does data duplication occur when I use INSERT OVERWRITE on a table that does not have a primary key?
Why do I receive the "Column 'XXX' not in GROUP BY clause" error when I run a SELECT * FROM ... GROUP BY ... query?
Why is data in the original table not overwritten after an INSERT OVERWRITE SELECT statement is executed?
Is there a limit on the number of values in an IN operator when query results are returned in JSON format?
Can GZIP-compressed CSV files in OSS be used as a data source for an external table?
Is INSERT ON DUPLICATE KEY supported?
Can I use a JOIN clause in an UPDATE statement?
Can I set variables in an SQL statement?
Can I use the Logstash plug-in to batch insert data with an INSERT ON DUPLICATE KEY UPDATE statement?
What are the prerequisites for loading a built-in dataset?
How can I verify that a built-in dataset is loaded successfully?
What should I do if loading a built-in dataset fails or takes a long time?
How can a standard database account use a built-in dataset?
How can I test a built-in dataset after it is loaded?

Do Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters support querying data from Hudi tables via JDBC?

Yes. After you create a Hudi table in an Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster, you can directly query the Hudi table by using JDBC.

Do Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters support reading Hudi table data from OSS?

Yes. For more information about reading data from Hudi tables in OSS by using an external table, see Use an external table to import data into a Data Lakehouse Edition cluster.

Enterprise Edition, Basic Edition, and Data Lakehouse Edition: Do clusters support automatic switching between XIHE MPP jobs and XIHE BSP jobs?

No. When you submit a job, you must manually specify whether to submit the job to an Interactive or a Job resource group. This determines whether the job runs as a XIHE MPP job or a XIHE BSP job.

How do you choose between XIHE MPP and XIHE BSP to run jobs on Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters?

By default, a XIHE BSP job is submitted asynchronously. The difference between synchronous and asynchronous submission is whether the client must wait for the query to complete.

Asynchronous submission has the following limits:

The result set can contain a maximum of 10,000 rows.
A maximum of 1,000 result sets, including their CSV file download links, can be retained for up to 30 days.

We recommend that you use asynchronous submission for queries that have long execution times and high computational loads but return small result sets, such as INSERT INTO SELECT, INSERT OVERWRITE SELECT, or CREATE TABLE AS SELECT.

Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters: How to view the status of a XIHE BSP job?

If you submit a XIHE BSP job by using the job editor in a Data Lakehouse Edition cluster, go to the Job Editor > SQL Development page and view the job status on the Execution Records tab at the bottom of the page.
If you did not submit the XIHE BSP job from the job editor, you can query its status from an internal system table. Run the following statement:
```
SELECT status FROM information_schema.kepler_meta_elastic_job_list WHERE process_id='<job_id>';
```
To get a count of all XIHE BSP jobs by status, run the following statement:
```
SELECT status,count(*) FROM information_schema.kepler_meta_elastic_job_list GROUP BY status;
```

Resource isolation for SQL jobs

Data Warehouse Edition (Elastic Mode) clusters and Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters support resource groups. For more information about resource group types, see Resource group overview (Data Warehouse Edition) and Resource group overview (Data Lakehouse Edition). You can create different types of resource groups and submit SQL jobs to the appropriate groups for resource isolation.

Handling too many items in an IN clause

AnalyticDB for MySQL limits the number of items in an IN list. By default, the limit is 2,000, but you can adjust this value.

Note

The number of items in an IN list cannot exceed 5,000. A larger number may degrade performance.

For example, to set the limit to 3,000, run the following statement:

SET ADB_CONFIG max_in_items_count=3000;

Resolving the "Query exceeded maximum time limit" error

This error occurs because the SQL query exceeded the default execution timeout of 1,800,000 ms (30 minutes) set in AnalyticDB for MySQL. You can configure the query timeout for a single query or for all queries in the cluster:

Set the timeout for a single query.

/*+ QUERY_TIMEOUT=xxx */SELECT count(*) FROM t;

Set the timeout for all queries in the cluster.
```
SET ADB_CONFIG QUERY_TIMEOUT=xxx;
```
For more information, see Common configuration parameters.

Resolving out of memory errors

Cause: There may be large, long-running SQL queries in the AnalyticDB for MySQL cluster. These queries consume a large amount of memory, which causes an out-of-memory error when you submit a new SQL query.

Solution: Wait a few minutes and run the SQL statement again.

Resolving the "STORAGE_INDEX_ERROR" error

Cause: This error is caused by a known defect in some earlier AnalyticDB for MySQL kernel versions. Under specific conditions, this defect can cause inconsistencies in a table's index metadata. As a result, the index cannot be found during a query, which triggers the STORAGE_INDEX_ERROR exception.

Solution: Upgrade the kernel version of your cluster. This issue is fixed in later kernel versions.

Important

Upgrade duration: The kernel version upgrade typically takes about 30 minutes.
Service interruption: During the upgrade, a transient connection disruption of a few seconds occurs.
Recommendation: Perform the upgrade during a scheduled maintenance window or off-peak hours. Ensure that your application has an automatic reconnection mechanism to smoothly handle the transient connection disruption.

Resolve the "multi-statement found" error

Only clusters with kernel version 3.1.9.3 or later support the multi-statement feature. First, verify that your cluster's kernel version is 3.1.9.3 or later. If the kernel version is earlier than 3.1.9.3, contact technical support to upgrade. If the kernel version is 3.1.9.3 or later but the error persists, the multi-statement feature may be disabled on the client.

For example, when you use a MySQL JDBC client to connect to a cluster, you must not only execute the SET ADB_CONFIG ALLOW_MULTI_QUERIES=true; command to manually enable the Multi-Statement feature, but also set the allowMultiQueries JDBC connection property to true.

Troubleshooting truncated time values in query results

First, verify the result using a MySQL client. If the time value is displayed correctly in the MySQL client, the issue might be caused by another client tool that is processing the result set.

Fixing errors with the AES_ENCRYPT function

The following statement reports an error.

 SELECT CONVERT(AES_DECRYPT(AES_ENCRYPT('ABC123','key_string'),'key_string'),char(10));

Cause: The error occurs because the data type of the first argument x in the AES_ENCRYPT(varbinary x, varchar y) function must be varbinary. The following example shows a valid statement:

SELECT CONVERT(AES_DECRYPT(AES_ENCRYPT(CAST('ABC123' AS VARBINARY), 'key_string'), 'key_string'),char(10));

Unexpected changes in query results

If you confirm that the data has not been updated, the query results may change unexpectedly for the following reasons:

A LIMIT clause is used without an ORDER BY clause. AnalyticDB for MySQL is a distributed database that runs queries across multiple nodes in multiple threads. If some threads return enough rows to satisfy the LIMIT clause, the query terminates. Therefore, without an ORDER BY clause, the order of results is not guaranteed because the system cannot ensure a fixed thread response order.
In a grouped aggregate query, if a field in the SELECT list is not in an aggregate function and is not included in the GROUP BY clause, a random value from the group is returned for that field.

If the issue persists, contact technical support.

Slow ORDER BY queries on a single table

Cause: The data is not sorted at the storage layer and is stored in a scattered manner. This can trigger a large amount of unnecessary data reads, which increases query time.

Solution: Create a clustered index on the field specified in the ORDER BY clause. With a clustered index, data is partially sorted at the storage layer. ORDER BY queries then read less data, which improves performance. For more information about how to create a clustered index, see Add a clustered index.

Note

Each table supports only one clustered index. If a clustered index already exists on another field, you must delete it before you create a new one on the field specified in the ORDER BY clause.
After you add a clustered index to a large table, the time required for BUILD jobs increases, which in turn affects the CPU utilization of storage nodes.

Discrepancy in scanned row counts

This issue is typically caused by replicated tables. In AnalyticDB for MySQL, a copy of a replicated table is stored on each shard. When you query a replicated table, the number of scanned rows is counted repeatedly for each copy.

Data duplication with INSERT OVERWRITE

Automatic deduplication is not supported for tables in AnalyticDB for MySQL that do not have a primary key.

Resolving the "Column not in GROUP BY clause" error

In a grouped query, you cannot use the query statement SELECT * FROM table GROUP BY key to retrieve all fields. You must explicitly list all fields. The following is an SQL example.

SELECT nation.name FROM nation GROUP BY nation.nationkey

Data not overwritten by INSERT OVERWRITE SELECT

Reason: An INSERT OVERWRITE statement overwrites data in a partition by replacing the old partition with a new one. When the SELECT result is empty, no new partition is generated. Therefore, the existing data in the original table is not overwritten.

Solution: To overwrite data in the table, modify the INSERT OVERWRITE SELECT statement to ensure that the SELECT clause does not return an empty result set.

Limit on IN operator values in JSON results

For AnalyticDB for MySQL clusters with a kernel version of 3.1.4 or earlier, the number of values specified in an IN operator cannot exceed 16. For clusters with a kernel version later than 3.1.4, no limit is imposed. For information about how to check the kernel version of your cluster, see View the kernel version of a cluster.

Using GZIP-compressed CSV files from OSS as a data source

AnalyticDB for MySQL supports using GZIP-compressed CSV files from OSS as a data source for an external table. To do this, you must add compress_type=gzip to the external table definition. For more information about the OSS external table syntax, see Non-partitioned OSS external tables.

Is INSERT ON DUPLICATE KEY supported?

AnalyticDB for MySQL supports only value equality updates, not arithmetic expressions.

Using a JOIN clause in UPDATE statements

This feature is available only for AnalyticDB for MySQL clusters with kernel version 3.1.6.4 or later. For more information, see UPDATE.

Can I set variables in an SQL statement?

AnalyticDB for MySQL does not support setting variables in SQL statements.

Can you use the `INSERT ON DUPLICATE KEY UPDATE` statement to bulk insert data with the Logstash plugin?

Yes. When you use the INSERT ON DUPLICATE KEY UPDATE statement to insert data in batches, you do not need to add ON DUPLICATE KEY UPDATE after every VALUES() statement. You only need to add it after the last VALUES() statement.

For example, to batch insert 3 records into the student_course table, execute the following statement:

INSERT INTO student_course(`id`, `user_id`, `nc_id`, `nc_user_id`, `nc_commodity_id`, `course_no`, `course_name`, `business_id`)
VALUES(277943, 11056941, '1001EE1000000043G2T5', '1001EE1000000043G2TO', '1001A5100000003YABO2', 'kckm303', 'Industrial Accounting Practice V9.0--77', 'kuaiji'),
(277944, 11056943, '1001EE1000000043G2T5', '1001EE1000000043G2TO', '1001A5100000003YABO2', 'kckm303', 'Industrial Accounting Practice V9.0--88', 'kuaiji'),
(277945, 11056944, '1001EE1000000043G2T5', '1001EE1000000043G2TO', '1001A5100000003YABO2', 'kckm303', 'Industrial Accounting Practice V9.0--99', 'kuaiji')
ON DUPLICATE KEY UPDATE
course_name = 'Industrial Accounting Practice V9.0--77',
business_id = 'kuaiji';

Prerequisites for loading a built-in dataset

The cluster must have at least 24 ACUs of reserved storage resources, and the user_default resource group must have at least 16 ACUs of reserved compute resources.

Verifying the loading of a built-in dataset

You can view the loading progress on the Job Development > SQL Development page. The dataset is loaded successfully if the icon on the Load Built-in Dataset button is grayed out, and the ADB_SampleData_TPCH database and its tables are visible on the Databases and Tables tab.

Handling failures when loading a built-in dataset

You must first run the DROP TABLE table_name; SQL statement to delete all tables in the database. Then, run the DROP DATABASE ADB_SampleData_TPCH; SQL statement to delete the database of the built-in dataset. After the ADB_SampleData_TPCH database is deleted, reload the dataset.

Using a built-in dataset with a standard account

The built-in dataset feature follows the permission management rules of AnalyticDB for MySQL. Even if a built-in dataset is loaded into the cluster, a standard database account cannot use it without permissions on the ADB_SampleData_TPCH database. A privileged account must grant the required permissions to the standard account by running the following statement:

GRANT select ON ADB_SampleData_TPCH.* TO <user_name>;

Testing a built-in dataset

After the dataset is loaded successfully, AnalyticDB for MySQL provides a set of corresponding query scripts. On the SQL Development page, open the Scripts tab and run the sample query statements. For more information about the queries, see TPC-H test queries.