MaxCompute SQL provides other common functions for development. You can select functions as needed. This topic describes the command formats, parameters, and examples of functions such as CAST, FAILIF, and HASH.
Function | Feature |
Filters data that meets specified range conditions. | |
Returns different values based on the result of an expression. | |
Converts the result of an expression to a target data type. | |
Returns the first non-NULL value in a parameter list. | |
Compresses an input parameter of the STRING or BINARY type using the GZIP algorithm. | |
Calculates the cyclic redundancy check value of a string or binary data. | |
Decompresses an input parameter of the BINARY type using the GZIP algorithm. | |
Returns true or a custom error message based on the result of an expression. | |
Returns the current age based on a Chinese ID card number. | |
Returns the date of birth based on a Chinese ID card number. | |
Returns the gender based on a Chinese ID card number. | |
Obtains the ID of the current account. | |
Calculates a hash value based on input parameters. | |
Checks whether a specified condition is true. | |
Returns the maximum value of a level-1 partition in a partitioned table. | |
Compares two input parameters to check whether they are equal. | |
A result is returned if a parameter has a NULL value. | |
Sorts input variables in ascending order and returns the value at a specified position. | |
Queries whether a specified partition exists. | |
Samples all read column values and filters out rows that do not meet the sampling conditions. | |
Calculates the SHA-1 hash value of a string or binary data. | |
Calculates the SHA-1 hash value of a string or binary data. | |
Calculates the SHA-2 hash value of a string or binary data. | |
Splits a specified group of parameters into a specified number of rows. | |
Splits a string into key-value pairs based on specified separators. | |
Queries whether a specified table exists. | |
A user-defined table-valued function (UDTF) that transposes one row into multiple rows. It converts an array that is stored in a column and delimited by a fixed separator into multiple rows. | |
A UDTF that transposes one row into multiple rows. It splits different columns into different rows. | |
Returns a random ID. This function is more efficient than the UUID function. | |
Returns a random ID. |
BASE64
Command format
string base64(binary <value>)Description
Converts the binary value to a BASE64-encoded string.
Parameters
value: Required. A value of the BINARY type.
Return value
Returns a value of the STRING type. If the input is NULL, NULL is returned.
Examples
Example 1: Convert the binary result of
cast('alibaba' as binary)to a BASE64-encoded string.-- YWxpYmFiYQ== is returned. select base64(cast ('alibaba' as binary));Example 2: The input parameter is NULL. The command is:
-- NULL is returned. select base64(null);
BETWEEN AND expression
Command format
<a> [NOT] between <b> and <c>Description
Filters data where the value of a is between b and c, or not between b and c.
Parameters
a: Required. The field to be filtered.
b, c: Required. The start and end of the specified range. The data types must be the same as the data type of a.
Return value
Returns the data that meets the specified condition.
If a, b, or c is NULL, the expression evaluates to NULL.
Examples
The following data is in the
emptable.| empno | ename | job | mgr | hiredate| sal| comm | deptno | 7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20 7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30 7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30 7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20 7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30 7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30 7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10 7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20 7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10 7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30 7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20 7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30 7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20 7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10 7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10 7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10 7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10The following example queries for data where the value of
salis greater than or equal to 1000 and less than or equal to 1500.select * from emp where sal between 1000 and 1500;The following result is returned.
+-------+-------+-----+------------+------------+------------+------------+------------+ | empno | ename | job | mgr | hiredate | sal | comm | deptno | +-------+-------+-----+------------+------------+------------+------------+------------+ | 7521 | WARD | SALESMAN | 7698 | 1981-02-22 00:00:00 | 1250.0 | 500.0 | 30 | | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 00:00:00 | 1250.0 | 1400.0 | 30 | | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 00:00:00 | 1500.0 | 0.0 | 30 | | 7876 | ADAMS | CLERK | 7788 | 1987-05-23 00:00:00 | 1100.0 | NULL | 20 | | 7934 | MILLER | CLERK | 7782 | 1982-01-23 00:00:00 | 1300.0 | NULL | 10 | | 7956 | TEBAGE | CLERK | 7748 | 1982-12-30 00:00:00 | 1300.0 | NULL | 10 | +-------+-------+-----+------------+------------+------------+------------+------------+
CASE WHEN expression
Command format
MaxCompute provides the following two
case whenformats:case <value> when <value1> then <result1> when <value2> then <result2> ... else <resultn> endcase when (<_condition1>) then <result1> when (<_condition2>) then <result2> when (<_condition3>) then <result3> ... else <resultn> end
Description
Returns different result values based on the evaluation of value or _condition.
Parameters
value: Required. The value to be compared.
_condition: Required. The condition to be evaluated.
result: Required. The return value.
Return value
If result contains only BIGINT or DOUBLE values, the function converts them to DOUBLE before returning the result.
If result contains a value of the STRING type, the function converts all values to the STRING type before returning the result. If a data type conversion is not supported, an error is returned. For example, data of the BOOLEAN type cannot be converted to the STRING type.
Conversions between other data types are not allowed.
Examples
The
sale_detailtable contains the fieldsshop_name string, customer_id string, total_price doubleand the following data.+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | | null | c5 | NULL | 2014 | shanghai | | s6 | c6 | 100.4 | 2014 | shanghai | | s7 | c7 | 100.5 | 2014 | shanghai | +------------+-------------+-------------+------------+------------+The following command is an example.
select case when region='china' then 'default_region' when region like 'shang%' then 'sh_region' end as region from sale_detail;The following result is returned.
+------------+ | region | +------------+ | default_region | | default_region | | default_region | | sh_region | | sh_region | | sh_region | +------------+
CAST
Command format
cast(<expr> as <type>)Description
Converts the result of expr to the target data type type.
Parameters
expr: Required. The source data to be converted.
type: Required. The target data type.
cast(double as bigint): Converts a value of the DOUBLE data type to the BIGINT data type.cast(string as bigint): When you convert a string to the BIGINT data type, if the string represents an integer, it is directly converted to BIGINT. If the string represents a floating-point number or a number in exponential notation, it is first converted to DOUBLE and then to BIGINT.cast(string as datetime)orcast(datetime as string): The default date formatyyyy-mm-dd hh:mi:ssis used.
Return value
The return value is of the target data type.
If you run the
setproject odps.function.strictmode=falsecommand, the function returns the digits that appear before any letters.If you run the
setproject odps.function.strictmode=truecommand, an error is returned.When you convert a value to the DECIMAL type, if you set
odps.sql.decimal.tostring.trimzero=true, trailing zeros after the decimal point are removed. If you setodps.sql.decimal.tostring.trimzero=false, trailing zeros after the decimal point are retained.ImportantThe
odps.sql.decimal.tostring.trimzeroparameter applies to both data retrieved from tables and static values.
Examples
Example 1: Common usage.
--Returns 1. select cast('1' as bigint);Example 2: Convert a STRING value to a BOOLEAN value. If the STRING is empty,
falseis returned. Otherwise,trueis returned.The STRING is empty.
select cast("" as boolean); --Returns +------+ | _c0 | +------+ | false | +------+The STRING is not empty.
select cast("false" as boolean); --Returns true +------+ | _c0 | +------+ | true | +------+
Example 3: Convert a string to a date.
--Convert a string to a date. select cast("2022-12-20" as date); --Returns +------------+ | _c0 | +------------+ | 2022-12-20 | +------------+ --Convert a date string with a time part to a date. select cast("2022-12-20 00:01:01" as date); --Returns +------------+ | _c0 | +------------+ | NULL | +------------+ --To ensure the value is displayed correctly, set the following parameter: set odps.sql.executionengine.enable.string.to.date.full.format= true; select cast("2022-12-20 00:01:01" as date); --Returns +------------+ | _c0 | +------------+ | 2022-12-20 | +------------+NoteBy default, the
odps.sql.executionengine.enable.string.to.date.full.formatparameter isfalse. To convert a date string that includes a time part, set this parameter totrue.Example 4 (Incorrect command example): Invalid usage. An exception is thrown if a conversion fails or is not supported. The following command is an example of incorrect usage.
select cast('abc' as bigint);Example 5: Example of a scenario in which
setproject odps.function.strictmode=falseis set.setprojectodps.function.strictmode=false; select cast('123abc'as bigint); --Returns +------------+ |_c0| +------------+ |123| +------------+Example 6: Example of a scenario in which
setproject odps.function.strictmode=trueis set.setprojectodps.function.strictmode=true; select cast('123abc' as bigint); --Returns FAILED:ODPS-0130071:[0,0]Semanticanalysisexception-physicalplangenerationfailed:java.lang.NumberFormatException:ODPS-0123091:Illegaltypecast-Infunctioncast,value'123abc'cannotbecastedfromStringtoBigint.Example 7: Example of a scenario in which
odps.sql.decimal.tostring.trimzerois set.--Create a table. create table mf_dot (dcm1 decimal(38,18), dcm2 decimal(38,18)); --Insert data. insert into table mf_dot values (12.45500BD,12.3400BD); --When the flag is true or not set. set odps.sql.decimal.tostring.trimzero=true; --Remove trailing zeros after the decimal point. select cast(round(dcm1,3) as string),cast(round(dcm2,3) as string) from mf_dot; --Return value +------------+------------+ | _c0 | _c1 | +------------+------------+ | 12.455 | 12.34 | +------------+------------+ --When the flag is false. set odps.sql.decimal.tostring.trimzero=false; --Keep trailing zeros after the decimal point. select cast(round(dcm1,3) as string),cast(round(dcm2,3) as string) from mf_dot; --Return value +------------+------------+ | _c0 | _c1 | +------------+------------+ | 12.455 | 12.340 | +------------+------------+ --This parameter also applies to static values. set odps.sql.decimal.tostring.trimzero=false; select cast(round(12345.120BD,3) as string); --Returns: +------------+ | _c0 | +------------+ | 12345.120 | +------------+
COALESCE
Command format
coalesce(<expr1>, <expr2>, ...)Description
Returns the first non-NULL value in
<expr1>, <expr2>, ....Parameters
expr: Required. The value to be evaluated.
Return value
The return value is of the same data type as the parameters.
Examples
Example 1: Common usage.
-- 1 is returned. select coalesce(null,null,1,null,3,5,7);Example 2: If a parameter's value type is undefined, an error is returned.
Incorrect command example
-- The data type of the parameter abc is not defined. The system engine cannot recognize it, and an error is returned. select coalesce(null,null,1,null,abc,5,7);Correct command example
select coalesce(null,null,1,null,'abc',5,7);
Example 3: If all parameter values are NULL and data is not read from a table, an error is returned. The following is an incorrect command example.
-- An error is returned, indicating that at least one parameter must not be NULL. select coalesce(null,null,null,null);Example 4: Reading from a table returns NULL if all parameter values are empty.
Source data table:
+-----------+-------------+------------+ | shop_name | customer_id | toal_price | +-----------+-------------+------------+ | ad | 10001 | 100.0 | | jk | 10002 | 300.0 | | ad | 10003 | 500.0 | | tt | NULL | NULL | +-----------+-------------+------------+In the source data table, all values for tt are NULL. When you execute the following statement, NULL is returned.
select coalesce(customer_id,total_price) from sale_detail where shop_name='tt';
COMPRESS
Command format
binary compress(string <str>) binary compress(binary <bin>)Description
Compresses str or bin using the GZIP algorithm.
Parameters
str: Required. A value of the STRING type.
bin: Required. A value of the BINARY type.
Return value
Returns a value of the BINARY type. If the input is NULL, NULL is returned.
Examples
-- Returns: 1F=8B=08=00=00=00=00=00=00=03=CBH=CD=C9=C9=07=00=86=A6=106=05=00=00=00 select compress('hello');Example 2: The input parameter is an empty string.
--Returns=1F=8B=08=00=00=00=00=00=00=03=03=00=00=00=00=00=00=00=00=00 select compress('');Example 3: The input parameter is NULL.
-- NULL is returned. select compress(null);
CRC32
Command format
bigint crc32(string|binary <expr>)Description
Calculates the cyclic redundancy check value of the string or binary expr.
Parameters
expr: Required. A value of the STRING or BINARY type.
Return value
Returns a value of the BIGINT type. The following rules apply:
If the input parameter is NULL, NULL is returned.
If the input parameter is an empty string, 0 is returned.
Examples
Example 1: Calculate the cyclic redundancy check value of the string
ABC.-- 2743272264 is returned. select crc32('ABC');Example 2: The input parameter is NULL.
-- NULL is returned. select crc32(null);
DECODE
Command format
decode(<expression>, <search>, <result>[, <search>, <result>]...[, <default>])Description
Implements the
if-then-elsebranching logic.Parameters
expression: Required. The expression to be compared.
search: Required. The search item to be compared with expression.
result: Required. The value that is returned if the value of search matches the value of expression.
default: Optional. If no search item matches, the default value is returned. If this parameter is not specified, NULL is returned.
NoteAll result values must have the same data type or be NULL. If the data types are inconsistent, an error is returned.
All search and expression values must have the same data type. Otherwise, an error is returned.
Return value
If a match is found, result is returned.
If no match is found, default is returned.
If default is not specified, NULL is returned.
If a duplicate search option exists and a match is found, the result corresponding to the first match is returned.
MaxCompute SQL typically returns NULL when it evaluates
NULL=NULL. However, in this function, NULL is equal to NULL.
Examples
The
sale_detailtable contains the fieldsshop_name string, customer_id string, total_price doubleand the following data.+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | | null | c5 | NULL | 2014 | shanghai | | s6 | c6 | 100.4 | 2014 | shanghai | | s7 | c7 | 100.5 | 2014 | shanghai | +------------+-------------+-------------+------------+------------+The following command is an example.
-- If the value of customer_id is c1, Taobao is returned. If the value is c2, Alipay is returned. If the value is c3, Aliyun is returned. If the value is NULL, N/A is returned. Otherwise, Others is returned. select decode(customer_id, 'c1', 'Taobao', 'c2', 'Alipay', 'c3', 'Aliyun', Null, 'N/A', 'Others') as result from sale_detail; -- This is equivalent to the following statement. if customer_id = c1 then result := 'Taobao'; elsif customer_id = c2 then result := 'Alipay'; elsif customer_id = c3 then result := 'Aliyun'; ... else result := 'Others'; end if;The following is the result.
+------------+ | result | +------------+ | Others | | Others | | Others | | Taobao | | Alipay | | Aliyun | +------------+
DECOMPRESS
Command format
binary decompress(binary <bin>)Description
Decompresses bin using the GZIP algorithm.
Parameters
bin: Required. A value of the BINARY type.
Return value
Returns a value of the BINARY type. If the input is NULL, NULL is returned.
Examples
Example 1: Decompress the compressed result of the string
hello, worldand convert it to a string.-- hello, world is returned. select cast(decompress(compress('hello, world')) as string);Example 2: The input parameter is NULL.
-- NULL is returned. select decompress(null);
GET_IDCARD_AGE
Command format
get_idcard_age(<idcardno>)Description
Returns the current age based on a Chinese ID card number. The age is the difference between the current year and the birth year in the ID card number.
Parameters
idcardno: Required. A 15- or 18-digit Chinese ID card number of the STRING type. The function validates the ID card number based on the province code and the last check digit. If the validation fails, NULL is returned.
Return value
Returns a value of the BIGINT type. If the input is NULL, NULL is returned.
GET_IDCARD_BIRTHDAY
Command format
get_idcard_birthday(<idcardno>)Description
Returns the date of birth based on a Chinese ID card number.
Parameters
idcardno: Required. A 15- or 18-digit Chinese ID card number of the STRING type. The function validates the ID card number based on the province code and the last check digit. If the validation fails, NULL is returned.
Return value
Returns a value of the DATETIME type. If the input is NULL, NULL is returned.
GET_IDCARD_SEX
Command format
get_idcard_sex(<idcardno>)Description
Returns the gender based on a Chinese ID card number. The value is
M(male) orF(female).Parameters
idcardno: Required. A 15- or 18-digit Chinese ID card number of the STRING type. The function validates the ID card number based on the province code and the last check digit. If the validation fails, NULL is returned.
Return value
Returns a value of the STRING type. If the input is NULL, NULL is returned.
GET_USER_ID
Command format
get_user_id()Description
Obtains the ID of the current account, which is the user ID or UID.
Parameters
No input parameters are required.
Return value
Returns the ID of the current account.
Examples
select get_user_id(); -- The following result is returned. +------------+ | _c0 | +------------+ | 1117xxxxxxxx8519 | +------------+
GREATEST
Command format
greatest(<var1>, <var2>[,...])Description
Returns the maximum value among the input parameters.
Parameters
var1, var2: Required. A value of the BIGINT, DOUBLE, DECIMAL, DATETIME, or STRING type.
Return value
Returns the maximum value among the input parameters. If no implicit conversion exists, the return value is of the same data type as the input parameters.
NULL is the minimum value.
If the input parameters have different data types, they are promoted to a common data type for comparison. For example, a mix of DOUBLE, BIGINT, DECIMAL, and STRING values results in a DOUBLE return type. A mix of STRING and DATETIME values results in a DATETIME return type. Other implicit conversions are not allowed.
When
set odps.sql.hive.compatible=true;is set, if any input parameter is NULL, NULL is returned.
HASH
Command format
If the MaxCompute project is in Hive-compatible mode, the command format is as follows.
INT HASH(<value1>, <value2>[, ...]);If the MaxCompute project is not in Hive-compatible mode, the command format is as follows.
BIGINT HASH(<value1>, <value2>[, ...]);
Description
Performs a hash operation on value1 and value2 to obtain a hash value.
Parameters
value1, value2: Required. The parameters from which to calculate the hash value. The parameters can be of different data types. The supported data types vary between Hive-compatible and non-Hive-compatible modes:
Hive-compatible mode: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, BOOLEAN, STRING, CHAR, VARCHAR, DATETIME, and DATE.
Non-Hive-compatible mode: BIGINT, DOUBLE, BOOLEAN, STRING, and DATETIME.
NoteFor two identical inputs, the returned hash values are always the same. However, two identical hash values do not guarantee that the input values are the same. A hash collision may occur.
Return value
Returns a value of the INT or BIGINT type. If an input parameter is an empty string or NULL, 0 is returned.
Examples
Example 1: Calculate the hash value of input parameters of the same data type.
-- 66 is returned. SELECT HASH(0L, 2L, 4L);Example 2: Calculate the hash value of input parameters of different data types.
-- 97 is returned. SELECT HASH(0L, 'a');Example 3: An input parameter is an empty string or NULL.
-- 0 is returned. SELECT HASH(0L, null); -- 0 is returned. SELECT HASH(0L, '');
IF
Command format
if(<testCondition>, <valueTrue>, <valueFalseOrNull>)Description
Checks whether testCondition is true. If it is true, the value of valueTrue is returned. Otherwise, the value of valueFalseOrNull is returned.
Parameters
testCondition: Required. The expression to be evaluated. It must be of the BOOLEAN type.
valueTrue: Required. The value that is returned if the testCondition expression is True.
valueFalseOrNull: The value that is returned if the testCondition expression is False. This can be set to NULL.
Return value
The return value is of the same data type as the valueTrue or valueFalseOrNull parameters.
Examples
-- 200 is returned. select if(1=2, 100, 200);
LEAST
Command format
least(<var1>, <var2>[,...])Description
Returns the minimum value among the input parameters.
Parameters
var: Required. An input parameter. It can be of the BIGINT, DOUBLE, DECIMAL, DATETIME, or STRING type.
Return value
The minimum value among the input parameters. If no implicit conversion exists, the return value is of the same data type as the input parameters.
When an implicit type conversion occurs, the resulting type is DOUBLE for conversions among DOUBLE, BIGINT, and STRING, DATETIME for conversions between STRING and DATETIME, and DECIMAL for conversions among DECIMAL, DOUBLE, BIGINT, and STRING. No other implicit type conversions are allowed.
NULL is the minimum value.
If all parameter values are NULL, NULL is returned.
Examples
-- 2 is returned. select least(5, 2, 7);
MAX_PT
Command format
MAX_PT(<table_full_name>)Description
Returns the maximum value of the level-1 partitions that contain data in a partitioned table. The partitions are sorted alphabetically.
Notes
The
MAX_PTfunction can also be implemented using standard SQL.SELECT * FROM table WHERE pt=MAX_PT("table");can be rewritten asSELECT * FROM table WHERE pt = (SELECT MAX(pt) FROM table);.NoteMaxCompute does not provide the
MIN_PTfunction. You cannot use the SQL statementSELECT * FROM table WHERE pt=MIN_PT("table");to perform an operation analogous toMAX_PT. However, you can use the standard SQL statementSELECT * FROM table WHERE pt= (SELECT MIN(pt) FROM table);to achieve the same result.If all partitions in the table are empty, the
MAX_PTfunction fails. Make sure that at least one partition contains data.OSS foreign tables also support the MAX_PT function. The behavior is the same as that for internal tables.
Parameters
table_full_name: Required. The name of the table. It is of the STRING type. You must have read permission on the table.
Return value
Returns the value of the largest level-1 partition.
NoteIf you create a partition using only the
ALTER TABLEstatement and the partition does not contain any data, this partition is not returned.Examples
Example 1: The tbl table is a partitioned table that contains the partitions 20120901 and 20120902. Both partitions contain data. In the following statement,
MAX_PTreturns'20120902'. The MaxCompute SQL statement reads data from the partition wherept='20120902'.SELECT * FROM tbl WHERE pt= MAX_PT('tbl'); -- This is equivalent to the following statement. SELECT * FROM tbl WHERE pt= (SELECT MAX(pt) FROM tbl);Example 2: In a multi-level partitioning scenario, use standard SQL to obtain data from the largest partition.
SELECT * FROM table WHERE pt1 = (SELECT MAX(pt1) FROM table) AND pt2 = (SELECT MAX(pt2) FROM table WHERE pt1= (SELECT MAX(pt1) FROM table));
NULLIF
Command format
T nullif(T <expr1>, T <expr2>)Description
Compares the values of expr1 and expr2. If they are equal, NULL is returned. Otherwise, expr1 is returned.
Parameters
expr1, expr2: Required. An expression of any type.
Trepresents the input data type, which can be any data type supported by MaxCompute.Return value
Returns NULL or expr1.
Examples
-- 2 is returned. select nullif(2, 3); -- NULL is returned. select nullif(2, 2); -- 3 is returned. select nullif(3, null);
NVL
Command format
nvl(T <value>, T <default_value>)Description
If the value of value is NULL, default_value is returned. Otherwise, value is returned. The two parameters must be of the same data type.
Parameters
value: Required. An input parameter.
Trepresents the input data type, which can be any data type supported by MaxCompute.default_value: Required. The replacement value. It must be of the same data type as value.
Examples
The
t_datatable has three columns:c1 string,c2 bigint, andc3 datetime. The table contains the following data.+----+------------+------------+ | c1 | c2 | c3 | +----+------------+------------+ | NULL | 20 | 2017-11-13 05:00:00 | | ddd | 25 | NULL | | bbb | NULL | 2017-11-12 08:00:00 | | aaa | 23 | 2017-11-11 00:00:00 | +----+------------+------------+Use the
nvlfunction to replace NULL values inc1with 00000, NULL values inc2with 0, and NULL values inc3with-.select nvl(c1,'00000'),nvl(c2,0),nvl(c3,'-') from nvl_test; -- The following result is returned. +-----+------------+-----+ | _c0 | _c1 | _c2 | +-----+------------+-----+ | 00000 | 20 | 2017-11-13 05:00:00 | | ddd | 25 | - | | bbb | 0 | 2017-11-12 08:00:00 | | aaa | 23 | 2017-11-11 00:00:00 | +-----+------------+-----+
ORDINAL
Command format
ordinal(bigint <nth>, <var1>, <var2>[,...])Description
Sorts the input variables in ascending order and returns the value at the nth position.
Parameters
nth: Required. The position number, starting from 1. It is of the BIGINT type. If nth is NULL, NULL is returned.
var: Required. The value to be sorted. It can be of the BIGINT, DOUBLE, DATETIME, or STRING type.
Return value
The value at the nth position. If no implicit conversion exists, the return value is of the same data type as the input parameters.
If the input parameters have different data types, they are promoted to a common data type. The return value is of this common data type. For example, a mix of DOUBLE, BIGINT, and STRING values results in a DOUBLE return type. A mix of STRING and DATETIME values results in a DATETIME return type. Other implicit conversions are not allowed.
NULL is the minimum value.
Examples
-- 3 is returned. SELECT ordinal(3, 1, 3, 7, 5, 2, 4, 6);
PARTITION_EXISTS
Command format
boolean partition_exists(string <table_name>, string... <partitions>)Description
Checks whether a specified partition exists.
Parameters
table_name: Required. The name of the table. It is of the STRING type. You can specify the project name in the table name, for example,
my_proj.my_table. If you do not specify a project name, the current project is used by default.partitions: Required. The name of the partition. It is of the STRING type. Specify the partition values in the order of the partition key columns. The number of partition values must be the same as the number of partition key columns.
Return value
Returns a value of the BOOLEAN type. If the specified partition exists, True is returned. Otherwise, False is returned.
Examples
-- Create a partitioned table named foo. create table foo (id bigint) partitioned by (ds string, hr string); -- Add a partition to the foo table. alter table foo add partition (ds='20190101', hr='1'); -- Query whether the partition ds='20190101' and hr='1' exists. The result is True. select partition_exists('foo', '20190101', '1');
SAMPLE
Command format
boolean sample(<x>, <y>, [<column_name1>, <column_name2>[,...]])Description
Samples data based on the values read from the column_name columns and the settings for x and y, filtering out rows that do not meet the sampling conditions.
Parameters
x, y: x is required. It must be a BIGINT constant greater than 0. This parameter specifies that the data is hashed into x parts and the yth part is selected.
y is optional. If it is omitted, the first part is selected by default. If you omit the y parameter, you must also omit the column_name parameter.
If x or y is of another data type or is less than or equal to 0, an error occurs. If y is greater than x, an error also occurs. If x or y is NULL, NULL is returned.
column_name: Optional. The column based on which to sample. If this parameter is omitted, random sampling is performed based on the values of x and y. The column can be of any data type and its value can be NULL. No implicit type conversion is performed. If column_name is a constant NULL, an error is returned.
NoteTo prevent data skew caused by NULL values, NULL values in the column_name column are evenly hashed into x parts. If you do not specify column_name, the output may not be uniform when the data volume is small. In this case, specify column_name to obtain a more uniform sample.
Currently, random sampling is supported only for columns of the following data types: bigint, datetime, boolean, double, string, binary, char, and varchar.
Return value
Returns a value of the BOOLEAN type.
Examples
A table named
tblaexists and contains a column namedcola.-- This indicates that the values are hashed into four parts based on the cola column, and the first part is taken. The return value is True. select * from tbla where sample (4, 1 , cola); -- This indicates that each row of data is randomly hashed into four parts, and the second part is taken. The return value is True. select * from tbla where sample (4, 2);
SHA
Command format
string sha(string|binary <expr>)Description
Calculates the SHA-1 hash value of the string or binary expr and returns the value as a hexadecimal string.
Parameters
expr: Required. A value of the STRING or BINARY type.
Return value
Returns a value of the STRING type. If the input is NULL, NULL is returned.
Examples
Example 1: Calculate the SHA hash value of the string
ABC.-- 3c01bdbb26f358bab27f267924aa2c9a03fcfdb8 is returned. select sha('ABC');Example 2: The input parameter is NULL.
-- NULL is returned. select sha(null);
SHA1
Command format
string sha1(string|binary <expr>)Description
Calculates the SHA-1 hash value of the string or binary expr and returns the value as a hexadecimal string.
Parameters
expr: Required. A value of the STRING or BINARY type.
Return value
Returns a value of the STRING type. If the input is NULL, NULL is returned.
Examples
Example 1: Calculate the SHA-1 hash value of the string
ABC.-- 3c01bdbb26f358bab27f267924aa2c9a03fcfdb8 is returned. select sha1('ABC');Example 2: The input parameter is NULL.
-- NULL is returned. select sha1(null);
SHA2
Command format
string sha2(string|binary <expr>, bigint <number>)Description
Calculates the SHA-2 hash value of the string or binary expr and returns the value in the specified number format.
Parameters
expr: Required. A value of the STRING or BINARY type.
number: Required. A value of the BIGINT type. The hash bit length. Valid values: 224, 256, 384, 512, and 0 (equivalent to 256).
Return value
Returns a value of the STRING type. The following rules apply:
If any input parameter is NULL, NULL is returned.
If the value of number is not within the valid range, NULL is returned.
Examples
Example 1: Calculate the SHA-2 hash value of the string
ABC.-- b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78 is returned. select sha2('ABC', 256);Example 2: An input parameter is NULL.
-- NULL is returned. select sha2('ABC', null);
STACK
Command format
stack(n, expr1, ..., exprk)Description
Splits
expr1, ..., exprkinto n rows. Unless otherwise specified, the output result uses the default column namescol0, col1....Parameters
n: Required. The number of rows to split into.
expr: Required. Specifies the parameters to be split, such as
expr1, ..., exprk. The parameters must be integers, and the total number of parameters must be an integer multiple of n to be split into n complete rows. An error is returned if these conditions are not met.
Return value
Returns a dataset with n rows. The number of columns is the quotient of the number of parameters divided by n.
Examples
-- Arrange 1, 2, 3, 4, 5, 6 into 3 rows. select stack(3, 1, 2, 3, 4, 5, 6); -- The following result is returned. +------+------+ | col0 | col1 | +------+------+ | 1 | 2 | | 3 | 4 | | 5 | 6 | +------+------+ -- Arrange 'A',10,date '2015-01-01','B',20,date '2016-01-01' into two rows. select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2); -- The following result is returned. +------+------+------+ | col0 | col1 | col2 | +------+------+------+ | A | 10 | 2015-01-01 | | B | 20 | 2016-01-01 | +------+------+------+ -- Arrange a, b, c, d into two rows. If the source table has multiple rows, the stack function is executed row by row. select stack(2,a,b,c,d) as (col,value) from values (1,1,2,3,4), (2,5,6,7,8), (3,9,10,11,12), (4,13,14,15,null) as t(key,a,b,c,d); -- The following result is returned. +------+-------+ | col | value | +------+-------+ | 1 | 2 | | 3 | 4 | | 5 | 6 | | 7 | 8 | | 9 | 10 | | 11 | 12 | | 13 | 14 | | 15 | NULL | +------+-------+ -- Use with lateral view. select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20, date '2016-01-01') tf as col0,col1,col2; -- The following result is returned. +------+------+------+ | col0 | col1 | col2 | +------+------+------+ | A | 10 | 2015-01-01 | | B | 20 | 2016-01-01 | +------+------+------+
STR_TO_MAP
Command format
str_to_map([string <mapDupKeyPolicy>,] <text> [, <delimiter1> [, <delimiter2>]])Description
Splits text into key-value pairs using delimiter1, and then splits each key-value pair into a key and a value using delimiter2.
Parameters
mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:
exception: An error is returned.
last_win: The latter key overwrites the former key.
You can also specify the
odps.sql.map.key.dedup.policyparameter at the session level to configure the method that is used to process duplicate keys. For example, you can setodps.sql.map.key.dedup.policyto exception. If you do not specify this parameter, the default value last_win is used.NoteThe behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of
odps.sql.map.key.dedup.policyis used.text: Required. A value of the STRING type. The string to be split.
delimiter1: Optional. A separator of the STRING type. If this parameter is not specified, the default separator is a comma (
,).delimiter2: Optional. A separator of the STRING type. If this parameter is not specified, the default separator is an equal sign (
=).NoteIf the separator is a regular expression or a special character, you must add two backslashes (\\) before it for escaping. Special characters include colons (:), periods (.), question marks (?), plus signs (+), and asterisks (*).
Return value
Returns a value of the
map<string, string>type. The return value is the result of splitting text using delimiter1 and delimiter2.Examples
-- {test1:1, test2:2} is returned. select str_to_map('test1&1-test2&2','-','&'); -- {test1:1, test2:2} is returned. select str_to_map("test1.1,test2.2", ",", "\\."); -- {test1:1, test2:3} is returned. select str_to_map("test1.1,test2.2,test2.3", ",", "\\.");
TABLE_EXISTS
Command format
boolean table_exists(string <table_name>)Description
Checks whether a specified table exists.
Parameters
table_name: Required. The name of the table. It is of the STRING type. You can specify the project name in the table name, for example,
my_proj.my_table. If you do not specify a project name, the current project is used by default.Return value
Returns a value of the BOOLEAN type. If the specified table exists, True is returned. Otherwise, False is returned.
Examples
-- Use in a select list. select if(table_exists('abd'), col1, col2) from src;
TRANS_ARRAY
Limits
All columns that are used as
keysmust be placed first, and the columns to be transposed must be placed after the key columns.A
selectstatement can contain only one user-defined table-valued function (UDTF) and cannot contain other columns.This function cannot be used with
group by,cluster by,distribute by, orsort by.
Command format
trans_array (<num_keys>, <separator>, <key1>,<key2>,…,<col1>,<col2>,<col3>) as (<key1>,<key2>,...,<col1>, <col2>)Description
A UDTF that transposes one row into multiple rows. It converts an array that is stored in a column and delimited by a fixed separator into multiple rows.
Parameters
num_keys: Required. A BIGINT constant that must be
>=0. This parameter specifies the number of columns to be used as transpositionkeyswhen converting to multiple rows.separator: Required. A STRING constant that is used as a separator to split a string into multiple elements. If this parameter is an empty string, an error is returned.
keys: Required. The columns to be used as
keysfor transposition. The number of key columns is specified by num_keys. If num_keys specifies that all columns are used askeys(that is, num_keys is equal to the total number of columns), only one row is returned.cols: Required. The arrays to be transposed into rows. All columns after the
keyscolumns are considered arrays to be transposed. These columns must be of the STRING type and store arrays in string format, such asHangzhou;Beijing;Shanghai, which is an array delimited by semicolons (;).
Return value
Returns the transposed rows. The new column names are specified by
as. The data types of thekeycolumns remain unchanged. All other columns are of the STRING type. The number of resulting rows is determined by the array with the most elements. Arrays with fewer elements are padded with NULL.Examples
Example 1: The
t_tabletable contains the following data.+----------+----------+------------+ | login_id | login_ip | login_time | +----------+----------+------------+ | wangwangA | 192.168.0.1,192.168.0.2 | 20120101010000,20120102010000 | | wangwangB | 192.168.45.10,192.168.67.22,192.168.6.3 | 20120111010000,20120112010000,20120223080000 | +----------+----------+------------+ -- Execute the SQL statement. select trans_array(1, ",", login_id, login_ip, login_time) as (login_id,login_ip,login_time) from t_table; -- The following result is returned. +----------+----------+------------+ | login_id | login_ip | login_time | +----------+----------+------------+ | wangwangB | 192.168.45.10 | 20120111010000 | | wangwangB | 192.168.67.22 | 20120112010000 | | wangwangB | 192.168.6.3 | 20120223080000 | | wangwangA | 192.168.0.1 | 20120101010000 | | wangwangA | 192.168.0.2 | 20120102010000 | +----------+----------+------------+ -- If the table contains the following data. Login_id LOGIN_IP LOGIN_TIME wangwangA 192.168.0.1,192.168.0.2 20120101010000 -- The arrays with fewer elements are padded with NULL. Login_id Login_ip Login_time wangwangA 192.168.0.1 20120101010000 wangwangA 192.168.0.2 NULLExample 2: The mf_fun_array_test_t table contains the following data.
+------------+------------+------------+------------+ | id | name | login_ip | login_time | +------------+------------+------------+------------+ | 1 | Tom | 192.168.100.1,192.168.100.2 | 20211101010101,20211101010102 | | 2 | Jerry | 192.168.100.3,192.168.100.4 | 20211101010103,20211101010104 | +------------+------------+------------+------------+ -- Use two keys, id and name, for array transposition. Execute the SQL statement. select trans_array(2, ",", Id,Name, login_ip, login_time) as (Id,Name,login_ip,login_time) from mf_fun_array_test_t; -- The following result is returned. The data has been split and grouped by the keys id and name. +------------+------------+------------+------------+ | id | name | login_ip | login_time | +------------+------------+------------+------------+ | 1 | Tom | 192.168.100.1 | 20211101010101 | | 1 | Tom | 192.168.100.2 | 20211101010102 | | 2 | Jerry | 192.168.100.3 | 20211101010103 | | 2 | Jerry | 192.168.100.4 | 20211101010104 | +------------+------------+------------+------------+
TRANS_COLS
Limits
All columns that are used as
keysmust be placed first, and the columns to be transposed must be placed after the key columns.A
selectstatement can contain only one UDTF and cannot contain other columns.
Command format
trans_cols (<num_keys>, <key1>,<key2>,…,<col1>, <col2>,<col3>) as (<idx>, <key1>,<key2>,…,<col1>, <col2>)Description
A UDTF that transposes one row into multiple rows. It splits different columns into different rows.
Parameters
num_keys: Required. A BIGINT constant that must be
>=0. This parameter specifies the number of columns to be used as transposition keys when converting to multiple rows.keys: Required. The columns to be used as keys for transposition. The number of key columns is specified by num_keys. If num_keys specifies that all columns are used as keys (that is, num_keys is equal to the total number of columns), only one row is returned.
idx: Required. Specifies the index of the transposed row.
cols: Required. The columns to be transposed into rows.
Return value
Returns the transposed rows. The new column names are specified by the
asclause. The output includes the key columns, an index column for the transposed row (starting from 1), and a value column containing the data from the transposed columns. The data types of the key columns remain unchanged. The value column will have a common supertype of the transposed input columns.Examples
The
t_tabletable contains the following data.+----------+----------+------------+ | Login_id | Login_ip1 | Login_ip2 | +----------+----------+------------+ | wangwangA | 192.168.0.1 | 192.168.0.2 | +----------+----------+------------+ -- Execute the SQL statement. select trans_cols(1, login_id, login_ip1, login_ip2) as (idx, login_id, login_ip) from t_table; -- The following result is returned. idx login_id login_ip 1 wangwangA 192.168.0.1 2 wangwangA 192.168.0.2
UNBASE64
Command format
binary unbase64(string <str>)Description
Converts the BASE64-encoded string str to its binary representation.
Parameters
str: Required. A value of the STRING type. The BASE64-encoded string to be converted.
Return value
Returns a value of the BINARY type. If the input is NULL, NULL is returned.
Examples
Example 1: Convert the string
YWxpYmFiYQ==to its binary value.-- alibaba is returned. select unbase64('YWxpYmFiYQ==');Example 2: The input parameter is NULL.
-- NULL is returned. select unbase64(null);
UNIQUE_ID
Command format
string unique_id()Description
Returns a random unique ID. An example is
29347a88-1e57-41ae-bb68-a9edbdd9****_1. This function is more efficient than the UUID function. The returned ID is longer than that returned by the UUID function and contains an underscore (_) and a number, such as_1.
UUID
Command format
string uuid()Description
Returns a random ID. An example is
29347a88-1e57-41ae-bb68-a9edbdd9****.NoteUUID returns a random ID. The probability of duplication is very low.