This topic answers frequently asked questions (FAQs) about performing Data Manipulation Language (DML) operations on tables.
If an error occurs during an INSERT operation, is the original data corrupted?
No, the original data is not corrupted. MaxCompute ensures atomicity. If an INSERT operation succeeds, the data is updated. If the operation fails, the data is rolled back.
What do I do if the "Table xxx has n columns, but query has m columns" error is reported when I run an INSERT INTO or INSERT OVERWRITE statement?
When you run an INSERT INTO or INSERT OVERWRITE statement, the fields returned by the SELECT clause must match the fields of the destination table in number, order, and data type. MaxCompute does not support inserting data into specific fields of a table while setting other fields to NULL or their default values. To achieve the same effect, supply NULL for those fields in the SELECT clause. For example: `select 'a', null, col_name from table_name;`
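As a minimal sketch of the NULL-padding approach, assume a hypothetical destination table `dest_tbl` with the columns `c1 string, c2 bigint, c3 string` and a hypothetical source table `src_tbl` that only supplies a value for `c3`. The SELECT list provides every destination column in order and uses an explicit NULL for the column that has no source value:

```sql
-- dest_tbl and src_tbl are hypothetical names used for illustration only.
-- Assumed destination columns: c1 string, c2 bigint, c3 string.
insert overwrite table dest_tbl
select 'a'                  as c1,  -- constant value
       cast(null as bigint) as c2,  -- placeholder for a column with no source value
       col_name             as c3   -- value taken from the source table
from src_tbl;
```

Casting the NULL to the destination column type is optional in this sketch. It makes the intended type explicit so that the select list visibly matches the destination table.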
What do I do if the "a single instance cannot output data to more than 10000 partitions" error is reported when I run an INSERT INTO or INSERT OVERWRITE statement?
Symptoms
When you run an INSERT INTO or INSERT OVERWRITE statement, the following error is returned.
FAILED: ODPS-0123031:Partition exception - a single instance cannot output data to more than 10000 partitions
Cause
A MaxCompute table can have up to 60,000 partitions. However, a single job can write data to a maximum of 10,000 partitions in an output table. This error usually occurs because the partition field is incorrectly set. For example, partitioning a table by an ID field can create too many partitions.
Solution
A job that outputs several thousand dynamic partitions is considered large. If a job outputs more than 10,000 partitions, this may indicate an issue with the business logic or SQL syntax. If you confirm that the logic and syntax are correct, you can prevent this error by changing the partition field of the partitioned table or by splitting the business logic into multiple jobs.
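One way to check the partition count before you run the job is to count the distinct values of the intended partition field. The following sketch assumes a hypothetical source table `src_tbl` and a hypothetical partition field `part_col`:

```sql
-- src_tbl and part_col are hypothetical names used for illustration only.
-- Each distinct value of the partition field becomes one output partition.
select count(distinct part_col) as partition_cnt
from src_tbl;
```

If `partition_cnt` approaches or exceeds 10,000, the field is probably too fine-grained to serve as a partition key, or the job needs to be split into several jobs that each write a subset of the partitions.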
What do I do if the "invalid dynamic partition value" error is reported when I insert data into a dynamic partition of a MaxCompute table?
Symptoms
When you insert data into a dynamic partition, the following error is returned.
FAILED: ODPS-0123031:Partition exception - invalid dynamic partition value: province=Shanghai
Cause
This error occurs because an invalid value is used for a dynamic partition. Dynamic partitioning creates partitions based on specified fields. The values of these partition fields cannot contain special characters or Chinese characters.
Solution
When you insert data into dynamic partitions, note the following:
When you insert data into dynamic partitions in a distributed environment, a single process can output a maximum of 512 dynamic partitions.
A dynamic partition SQL statement cannot generate more than 2,000 dynamic partitions.
Dynamically generated partition values cannot be NULL.
If the destination table has multiple partition levels, you can specify some partitions as static when you run an INSERT statement. However, the static partitions must be the higher-level partitions.
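The last two points can be sketched as follows. The example assumes a hypothetical destination table `sale_detail_dypart` that is partitioned by `sale_date` (level 1) and `region` (level 2), and a hypothetical source table `src_sales`. The higher-level partition is static and the lower-level partition is dynamic:

```sql
-- sale_detail_dypart and src_sales are hypothetical names used for illustration only.
-- sale_date is the higher-level partition and is specified statically.
-- region is the lower-level partition and is filled dynamically from the
-- last column of the select list.
insert overwrite table sale_detail_dypart partition (sale_date='2013', region)
select shop_name, customer_id, total_price, region
from src_sales
where region is not null;  -- dynamic partition values cannot be NULL
```

Making `region` static while leaving `sale_date` dynamic is not allowed, because the static partitions must be the higher-level ones.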
What do I do if an error is reported when I insert FLOAT data into a MaxCompute table?
For more information about the basic data types that MaxCompute V2.0 supports, see Data type versions. The FLOAT data type does not have a constant definition. To insert FLOAT data, use the CAST function to convert the value to FLOAT. For example, `cast(5.1 as float)` converts the constant 5.1 to the FLOAT value 5.1.
When you use new data types such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY in MaxCompute SQL, you must enable them using one of the following methods:
Session level: Add `set odps.sql.type.system.odps2=true;` before the SQL statement and submit them together.
Project level: The project owner runs `setproject odps.sql.type.system.odps2=true;` to enable new data types for the project.
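A session-level sketch that combines the flag with the CAST approach is shown below. The table `float_demo` is a hypothetical name, and the statements are assumed to be submitted together in the same session:

```sql
-- Enable the MaxCompute V2.0 data types for this session.
set odps.sql.type.system.odps2=true;

-- float_demo is a hypothetical table used for illustration only.
create table if not exists float_demo (f float);

-- FLOAT has no constant definition, so the value is produced with CAST.
insert into table float_demo
select cast(5.1 as float);
```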
Why are the results of an INSERT SELECT operation and a SELECT operation different for the same data?
Symptoms
When you run SQL statements on the same STRING field, the number of decimal places is inconsistent. A SELECT operation retains two decimal places, but an INSERT SELECT operation shows multiple decimal places.
Causes
In an INSERT SELECT operation, when a STRING field is implicitly converted to a DECIMAL type, the value is first converted to a DOUBLE type before a ROUND operation is performed. Because the DOUBLE type is imprecise, the result of the ROUND operation may still have multiple decimal places.
Solution
Use the `CAST` function to explicitly convert the data to the `DECIMAL` type before rounding, as in the following expression.
case when pcm.abc is null then 0 else round(cast(pcm.abc as decimal) ,2) end abc
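Placed in a complete statement, the expression above might look like the following sketch. The tables `result_tbl` and `src_tbl` are hypothetical. Only the alias `pcm` and the field `abc` come from the original expression:

```sql
-- result_tbl and src_tbl are hypothetical names used for illustration only.
insert overwrite table result_tbl
select case
         when pcm.abc is null then 0
         -- Cast the STRING value to DECIMAL explicitly so that ROUND does not
         -- operate on an intermediate DOUBLE value.
         else round(cast(pcm.abc as decimal), 2)
       end as abc
from src_tbl pcm;
```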
If the data type of a field in the destination table is VARCHAR(10), is an error reported when data overflow occurs during an insert operation?
No. When you insert data into a field of the VARCHAR(10) data type, the data is truncated if its length exceeds the limit.
What do I do if the "Transaction timeout because cannot acquire exclusive lock" error is reported when I run a MaxCompute SQL statement?
Symptoms
The following error occurs when a MaxCompute SQL statement is executed.
Failed to run ddltask - Modify DDL meta encounter exception : ODPS-0121096:MetaStore transaction conflict - Reached maximum retry times because of OTSStorageTxnLockKeyFail(Inner exception: Transaction timeout because cannot acquire exclusive lock.)
Cause
MaxCompute allows multiple jobs to write data to a single table concurrently. When multiple jobs enter the metadata commit phase at the same time, each job must acquire a lock on the table's metadata, write to it, and then release the lock. If many jobs attempt this simultaneously, the metadata can remain locked for an extended period. This can cause other jobs to time out while waiting to acquire the lock, resulting in the `cannot acquire exclusive lock` error. A job times out if it cannot acquire a lock within approximately 30 seconds. The lock granularity is at the table level.
Solution
Check whether multiple read and write operations are performed on the same table or table partition at the same time. Avoid performing multiple read and write operations on a table or table partition simultaneously.
How do I update data in a MaxCompute table or partition?
MaxCompute supports row-level updates to data in transactional tables using the update operation.
If you run an update operation on a non-transactional table, the following error is reported: `trying to update a non-transactional table is not allowed. Set tblproperties ("transactional" = "true") to use this feature`.
You can import data from a source partition or table to a destination partition or table and execute update operations during the import. The destination can also be the source itself, which is known as an in-place update.
For more information about how to create a transactional table, see Table operations. MaxCompute lets you set the transactional property only when you create a table. You cannot use alter table to modify the transactional property of an existing table.
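A minimal sketch of a row-level update is shown below. The table `acid_update` and its columns are hypothetical. The `transactional` property is set at creation time because it cannot be added later:

```sql
-- acid_update is a hypothetical table used for illustration only.
-- The transactional property must be set when the table is created.
create table if not exists acid_update (id bigint, name string)
tblproperties ("transactional"="true");

-- Load two sample rows.
insert overwrite table acid_update values (1, 'a'), (2, 'b');

-- Update a single row in place.
update acid_update set name = 'c' where id = 2;
```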
How do I delete data from a MaxCompute table or partition?
MaxCompute supports row-level deletions of data in transactional tables using the delete operation.
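A minimal sketch of a row-level deletion, assuming a hypothetical transactional table named `acid_delete`:

```sql
-- acid_delete is a hypothetical table used for illustration only.
create table if not exists acid_delete (id bigint, name string)
tblproperties ("transactional"="true");

-- Load two sample rows.
insert overwrite table acid_delete values (1, 'a'), (2, 'b');

-- Remove only the rows that match the condition.
delete from acid_delete where id = 2;
```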
If the table is not a transactional table, you can delete data in the following ways:
Delete the entire table by running the `DROP` command.
For a non-partitioned table, run the `truncate table table_name;` command to delete all data from the table, or use the `insert overwrite` command to achieve a similar result.
Example 1: Delete data where `Col=1` from `TableA`.
insert overwrite table TableA select a,b,c.... from TableA where Col <> 1;
Example 2: Delete all data.
insert overwrite table TableA select a,b,c.... from TableA where 1=2;
For a partitioned table, run the `alter table table_name drop if exists partition(partition_name='specific_partition_value')` command to delete the corresponding partition.
For example, the partition key column of the `testtable` table is `ds`. Run the following command to delete the `ds='20170520'` partition.
alter table testtable drop if exists partition (ds='20170520');
Use an INSERT statement with a WHERE clause to import the required data into a new partition or table. The source table and destination table for the INSERT statement can be the same.
insert overwrite table sale_detail select * from sale_detail where name='mengyonghui';
If a non-partitioned table contains a large volume of data, how do I delete duplicate data from it?
If all columns in the duplicate rows are identical, you can perform a GROUP BY operation on all columns. For example, if the non-partitioned table table1 has columns c1, c2, and c3, you can run the following command.
insert overwrite table table1 select c1, c2, c3 from table1 group by c1, c2, c3;
Before you perform this operation, back up your data. Based on the data volume, evaluate whether the cost of this method is lower than the cost of re-importing the data.
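The backup and deduplication steps can be sketched as follows. The backup table name `table1_bak` is a hypothetical choice. `table1` and its columns `c1`, `c2`, and `c3` are the ones used in the example above:

```sql
-- Step 1: back up the original data. table1_bak is a hypothetical name.
create table table1_bak as select * from table1;

-- Step 2: rewrite the table, keeping one row per distinct (c1, c2, c3).
insert overwrite table table1
select c1, c2, c3
from table1
group by c1, c2, c3;

-- Step 3: compare row counts to see how many duplicate rows were removed.
select count(*) as rows_before from table1_bak;
select count(*) as rows_after from table1;
```

After you verify the result, you can drop the backup table to avoid paying for the extra storage.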