MaxCompute provides string functions for truncating, concatenating, converting, comparing, and searching string data in tables.
MaxCompute SQL supports the following string functions.
|
Function |
Description |
|
Returns the ASCII code of the first character in a string. |
|
|
Calculates the number of characters from string A that appear in string B. |
|
|
Converts an ASCII code to a character. |
|
|
Concatenates strings. |
|
|
Concatenates all input strings using a specified delimiter. |
|
|
Decodes a string using a specified encoding format. |
|
|
Encodes a string using a specified encoding format. |
|
|
Finds the position of a specified string in a comma-separated string. |
|
|
Converts a number to a string in a specified format. |
|
|
Converts binary data in a specified encoding format to a UTF-8 string. |
|
|
Returns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and output format. |
|
|
Extracts a specified string from a standard JSON string using a specified method. |
|
|
Converts a string to title case where each word starts with an uppercase letter followed by lowercase letters, separated by spaces. |
|
|
Finds the position of string A in string B. |
|
|
Determines whether a string can be converted from character set A to character set B. |
|
|
Extracts strings corresponding to a set of input keys from a standard JSON string. |
|
|
Splits a string into key-value pairs, separates the pairs, and returns the value corresponding to a specified key. |
|
|
Splits a string into multiple key-value pairs, separates the pairs, and returns values corresponding to multiple specified keys. |
|
|
Calculates the length of a string. |
|
|
Calculates the byte length of a string. |
|
|
Finds the position of a specified substring within a string. |
|
|
Left-pads a string to a specified length. |
|
|
Removes characters from the left end of a string. |
|
|
Returns a hash value calculated from a string expression. |
|
|
Calculates the MD5 hash of a string. |
|
|
Parses a URL and returns information about a specified part. |
|
|
Parses a URL and returns information about multiple parts. |
|
|
Counts substrings that match a specified pattern starting from a specified position. |
|
|
Splits a string into groups based on a specified pattern and returns the string from a specified group. |
|
|
Finds all substrings matching a regular expression pattern in a string and returns them as an array. |
|
|
Returns the start or end position of a substring that matches a specified pattern for a specified occurrence starting from a specified position. |
|
|
Replaces substrings matching a specified pattern for a specified occurrence with another string. |
|
|
Returns a substring that matches a specified pattern for a specified occurrence starting from a specified position. |
|
|
Returns a string repeated a specified number of times. |
|
|
Replaces substrings matching a specified string with another string. |
|
|
Returns the reverse of a string. |
|
|
Right-pads a string to a specified length. |
|
|
Removes characters from the right end of a string. |
|
|
Converts a standard string to a SOUNDEX string. |
|
|
Generates a string of spaces. |
|
|
Splits a string using a delimiter and returns an array. |
|
|
Splits a string using a delimiter and returns a specified part. |
|
|
Returns a substring of a STRING-type string starting from a specified position with a specified length. |
|
|
Returns a substring of a STRING or BINARY-type string starting from a specified position with a specified length. |
|
|
Extracts a substring before or after a specified occurrence of a delimiter. |
|
|
Converts a BOOLEAN, BIGINT, DECIMAL, or DOUBLE value to its STRING representation. |
|
|
Converts a specified complex type to a JSON string. |
|
|
Converts English uppercase letters in a string to lowercase. |
|
|
Converts English lowercase letters in a string to uppercase. |
|
|
Replaces all occurrences of string A in string B with string C. |
|
|
Removes characters from both ends of a string. |
|
|
Converts a string from |
|
|
Encodes a string in |
Usage notes
MaxCompute V2.0 provides additional functions. If the functions that you use involve new data types that are supported in the MaxCompute V2.0 data type edition, you must execute the SET statement to enable the MaxCompute V2.0 data type edition. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
Session level: To use the MaxCompute V2.0 data type edition, you must add
set odps.sql.type.system.odps2=true;before the SQL statement that you want to execute, and commit and execute them together.Project level: The project owner can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. To enable the MaxCompute V2.0 data type edition at the project level, run the following command:
setproject odps.sql.type.system.odps2=true;For more information about
setproject, see Project operations. For more information about the precautions that you must take when you enable the MaxCompute V2.0 data type edition at the project level, see Data type version guide.
Limitations
The following functions support only English characters:
-
TRIM/RTRIM/LTRIM: The trimChars parameter supports only English characters.
-
REVERSE: Supports only English characters in Hive mode.
-
SOUNDEX: Converts only English characters.
-
TOLOWER: Converts English uppercase letters in a string to lowercase.
-
TOUPPER: Converts English lowercase letters in a string to uppercase.
-
INITCAP: Converts the first English letter of each word to uppercase and the rest to lowercase.
ASCII
-
Syntax
bigint ascii(string <str>) -
Description
Returns the ASCII code of the first character in the string str.
-
Parameters
str: required. STRING type. If the input is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted to the STRING type before calculation.
-
Return value
Returns a value of the BIGINT type. The return value follows these rules:
-
If str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
-
If str is NULL, NULL is returned.
-
-
Examples
-
Example 1: Return the ASCII code of the first character in the string
abcde. Sample statement:-- Returns 97. select ascii('abcde'); -
Example 2: The input parameter is NULL. Sample statement:
-- Returns NULL. select ascii(null);
-
CHAR_MATCHCOUNT
-
Syntax
bigint char_matchcount(string <str1>, string <str2>) -
Description
Calculates how many characters in str1 appear in str2.
-
Parameters
str1 and str2: required. STRING type. Both strings must be valid UTF-8 strings. If invalid characters (non-Unicode encoded) are found during comparison, a negative value is returned.
-
Return value
Returns a value of the BIGINT type. If str1 or str2 is NULL, NULL is returned.
-
Examples
-
Example 1: Calculate how many characters from the string
aabcappear inabcde. Sample statement:-- Returns 4. select char_matchcount('aabc','abcde'); -
Example 2: One input parameter is NULL. Sample statement:
-- Returns NULL. select char_matchcount(null,'abcde');
-
CHR
-
Syntax
string chr(bigint <ascii>) -
Description
Converts a specified ASCII code to a character.
-
Parameters
ascii: required. BIGINT type ASCII value. Valid values: 0 to 127. If the input is of the STRING, DOUBLE, or DECIMAL type, it is implicitly converted to the BIGINT type before calculation.
-
Return value
Returns a value of the STRING type. The return value follows these rules:
-
If the ascii value is outside the valid range, an error is returned.
-
If ascii is not of the BIGINT, STRING, DOUBLE, or DECIMAL type, an error is returned.
-
If ascii is NULL, NULL is returned.
-
-
Examples
-
Example 1: Convert the ASCII code
100to a character. Sample statement:-- Returns d. select chr(100); -
Example 2: The input parameter is NULL. Sample statement:
-- Returns NULL. select chr(null); -
Example 3: The input is a STRING-type character. Sample statement:
--The input is implicitly converted to the BIGINT type for the operation, which returns 'd'. select chr('100');
-
CONCAT
-
Syntax
array<T> concat(array<T> <a>, array<T> <b>[,...]) string concat(string <str1>, string <str2>[,...]) -
Description
-
Arrays as inputs: Concatenates all elements of multiple arrays and returns a new array.
-
Strings as inputs: Concatenates multiple strings and returns a new string.
-
-
Parameters
-
a and b: required. These parameters specify arrays.
Tinarray<T>specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type. The null elements are also involved in the operation. -
str1 and str2: required. Values of the STRING type. If the input values are of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, they are implicitly converted into values of the STRING type before calculation. If the input values are of other data types, an error is returned.
-
-
Return value
-
A value of the ARRAY type is returned. If one of the input arrays is null, null is returned.
-
A value of the STRING type is returned. If no input parameters are configured or an input parameter is set to null, null is returned.
-
-
Examples
-
Example 1: Concatenate all elements of
array(10, 20)andarray(20, -20). Sample statement:-- The return value is [10, 20, 20, -20]. select concat(array(10, 20), array(20, -20)); -
Example 2: One of the input arrays contains a null element. Sample statement:
-- The return value is [10, null, 20, -20]. select concat(array(10, null), array(20, -20)); -
Example 3: One of the input arrays is null. Sample statement:
-- The return value is null. select concat(array(10, 20), null); -
Example 4: Concatenate strings
aabcandabcde. Sample statement:-- The return value is aabcabcde. select concat('aabc','abcde'); -
Example 5: The input is empty. Sample statement:
-- The return value is null. select concat(); -
Example 6: One of the input strings is null. Sample statement:
-- The return value is null. select concat('aabc', 'abcde', null);
-
CONCAT_WS
-
Syntax
string concat_ws(string <separator>, string <str1>, string <str2>[,...]) string concat_ws(string <separator>, array<string> <a>) -
Description
Returns a string formed by concatenating all input strings or array elements using the specified delimiter. This function is an extension in MaxCompute V2.0.
-
Parameters
-
separator: required. STRING type delimiter.
-
str1 and str2: At least two strings must be specified. STRING type. If the input is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, it is implicitly converted to the STRING type before calculation.
-
a: required. ARRAY type. Array elements must be of the STRING type.
-
-
Return value
Returns a value of the STRING or STRUCT type. The return value follows these rules:
-
If str1 or str2 is not of the STRING, BIGINT, DECIMAL, DOUBLE, or DATETIME type, an error is returned.
-
If no input parameters are provided or any input parameter is NULL, NULL is returned.
-
-
Examples
-
Example 1: Concatenate strings
nameandhanmeimeiusing:. Sample statement:-- Returns name:hanmeimei. select concat_ws(':','name','hanmeimei');-
-
Example 2: One input parameter is NULL. Sample statement:
-- Returns NULL. select concat_ws(':','avg',null,'34'); -
Example 3: Concatenate elements in array
array('name', 'hanmeimei')using:. Sample statement:-- Returns name:hanmeimei. select concat_ws(':',array('name', 'hanmeimei'));
-
DECODE
-
Syntax
STRING DECODE(BINARY <str>, STRING <charset>) -
Description
Decodes str using the format specified by charset.
-
Parameters
Parameter
Required
Description
str
Yes
BINARY type. The string to decode.
charset
Yes
STRING type. Encoding format. Valid values: UTF-8, UTF-16, UTF-16LE, UTF-16BE, ISO-8859-1, US-ASCII.
NoteISO-8859-1 and US-ASCII encoding formats currently support only English characters, not Chinese.
-
Return value
Returns STRING. If str or charset is NULL, NULL is returned.
-
Examples
-
Example 1: Encode and decode the string English Sample based on the UTF-8 format. Sample statement:
-- Encode and decode the string.SELECT DECODE(ENCODE("English Sample","UTF-8"), "UTF-8");Returns:
+-----+| _c0 |+-----+| English Sample |+-----+ -
Example 2: Set either of the input parameters to NULL. Sample statement:
SELECT DECODE(ENCODE("English Sample","UTF-8"), NULL);Returns:
+-----+| _c0 |+-----+| NULL |+-----+
-
ENCODE
-
Syntax
binary encode(string <str>, string <charset>) -
Description
Encodes str using the format specified by charset.
-
Parameters
-
str: required. STRING type. The string to re-encode.
-
charset: required. STRING type. Encoding format. Valid values: UTF-8, UTF-16, UTF-16LE, UTF-16BE, ISO-8859-1, US-ASCII.
-
-
Return value
Returns a BINARY type. If str or charset is NULL, NULL is returned.
-
Examples
-
Example 1: Encode the string
abcinUTF-8format. Sample statement:-- Returns abc. select encode("abc", "UTF-8"); -
Example 2: Encode the string
abcinUTF-16BEformat. Sample statement:-- Returns =00a=00b=00c. select encode("abc", "UTF-16BE"); -
Example 3: One input parameter is NULL. Sample statement:
-- Returns NULL. select encode("abc", null);
-
FIND_IN_SET
-
Syntax
BIGINT FIND_IN_SET(STRING <str1>, STRING <str2>[, STRING <delimiter>]) -
Description
Finds the position of string str1 in string str2 separated by delimiter, counting from 1.
-
Parameters
Parameter
Required
Description
str1
Yes
STRING type. The string to search for.
str2
Yes
STRING type. The string separated by delimiter.
delimiter
No
STRING constant. Default is comma (
,). The delimiter can be a single character or a string. -
Return value
Returns a BIGINT type. The return value follows these rules:
-
If str2 does not contain str1 or str1 contains delimiter, 0 is returned.
-
If str1 or str2 is
NULL,NULLis returned.
-
-
Examples
-
Example 1: Find the position of the string
abin the stringabc,hello,ab,c, which is separated by commas (,). Sample statement:SELECT FIND_IN_SET('ab', 'abc,hello,ab,c') AS pos;Returns:
+------------+ | pos | +------------+ | 3 | +------------+ -
Example 2: Find the position of string
hiin stringabc,hello,ab,cseparated by commas (,). Sample statement:SELECT FIND_IN_SET('hi', 'abc,hello,ab,c') AS pos;Returns:
+------------+ | pos | +------------+ | 0 | +------------+ -
Example 3: Find the position of the string
abin the underscore (_)-delimited stringabc_hello_ab_c. Sample statement:SELECT FIND_IN_SET('ab', 'abc_hello_ab_c', '_') AS pos;Returns:
+------------+ | pos | +------------+ | 3 | +------------+ -
Example 4: Input parameter
str1orstr2isNULL. Sample statement:SELECT FIND_IN_SET(null, 'abc,hello,ab,c') AS pos;Returns:
+------------+ | pos | +------------+ | NULL | +------------+
-
FORMAT_NUMBER
-
Syntax
STRING FORMAT_NUMBER(DOUBLE|BIGINT|INT|SMALLINT|TINYINT|FLOAT|DECIMAL|STRING <expr1>, INT <expr2>) -
Description
Converts expr1 to a string formatted according to expr2.
-
Parameters
-
expr1: required. A value of the FLOAT, DOUBLE, or DECIMAL type. This parameter specifies the expression that you want to format.
-
expr2: required. A value of the INT type. Valid values: 0 to 340. This parameter specifies the number of decimal places that you want to retain. This parameter can also be expressed in a format similar to
#,###,###.##. The number of decimal places returned varies based on the value of this parameter. -
Return value
A value of the STRING type is returned. The return value varies based on the following rules:
-
If the value of
expr2is greater than 0 and is less than or equal to 340, a value that is rounded to the specified number of decimal places is returned. -
If the value of
expr2is 0, the return value contains only the integer part and does not contain the decimal point or decimal part. -
If the value of
expr2is less than 0 orgreater than 340, an error is returned. -
If expr1 or expr2 is empty or is set to null, null is returned.
-
-
Examples
-
Example 1: Format a number as specified. Sample statements:
-- Returns 5.230. SELECT FORMAT_NUMBER(5.230134523424545456,3); -- Returns 12,332.123. SELECT FORMAT_NUMBER(12332.123456, '#,###,###,###.###'); -
Example 2: One input parameter is empty or NULL. Sample statements:
-- Returns NULL. SELECT FORMAT_NUMBER('',3); -- Returns NULL. SELECT FORMAT_NUMBER(null,3);
-
FROM_CHARSET
-
Syntax
STRING FROM_CHARSET(binary <source>, string <source_charset>, [string <mode>]) -
Description
FROM_CHARSET converts non-UTF-8 encoded binary data to a UTF-8 string for subsequent calculations.
-
Parameters
-
source: required. A value of the BINARY type. This parameter specifies the binary data that you want to convert.
-
source_charset: required. A value of the STRING type. This parameter specifies the original encoding format of the binary data that is specified by source. Valid values: UTF-8, UTF-16, UTF-16LE, UTF-16BE, ISO-8859-1, US-ASCII, GB2312, GBK, and GB18030.
-
mode: optional. A value of the STRING type. This parameter specifies the processing mode if a character cannot be converted when the FROM_CHARSET function converts the binary data specified by source into a string in the specified encoding format. Valid values:
-
NONE: reports an error. No processing is performed. This is the default value.
-
TRANSLIT: replaces the character with a similar character in the specified encoding format.
-
IGNORE: ignores the error and continues to run the command.
-
-
-
Return value
Returns a UTF-8 encoded STRING type. When parameters are NULL or empty strings, the return value follows these rules:
-
If any input parameter is NULL, returns NULL.
-
If any input parameter is an empty string, returns an error.
-
-
Examples
-
Example 1: Convert UTF-8 encoded binary data into a UTF-8 encoded string.
SELECT FROM_CHARSET(unhex('e58aa0e6b2b9e9949fe696a4e68bb70a'),'UTF-8', 'TRANSLIT');Returns:
+------------+| _c0 |+------------+| 加油锟斤拷 |+------------+ -
Example 2: Convert GBK-encoded binary data into a UTF-8 encoded string.
SELECT FROM_CHARSET(unhex('b9feb9febac3a4ce'), 'GBK');Returns:
+------------+| _c0 |+------------+| 哈哈好の |+------------+ -
Example 3: If an input parameter is null, null is returned.
SELECT FROM_CHARSET(unhex('b9feb9febac3a4ce'), null);Returns:
+------------+| _c0 |+------------+| NULL |+------------+
-
FROM_JSON
-
Syntax
from_json(<jsonStr>, <schema>) -
Description
Returns data of the ARRAY, MAP, or STRUCT type based on JSON string jsonStr and output format schema.
-
Parameters
-
jsonStr: required. The JSON string that you entered.
-
schema: required. The schema of the JSON string. The value of this parameter must be in the same format as that in the statement for creating a table, such as
array<bigint>,map<string, array<string>>, orstruct<a:int, b:double, `C`:map<string,string>>.NoteKeys in a struct are case-sensitive. You can also specify a struct in the format of
a BIGINT, b DOUBLE, which is equivalent toSTRUCT<a:BIGINT, b:DOUBLE>.The following table describes the mappings between JSON data types and MaxCompute data types.
JSON data type
MaxCompute data type
OBJECT
STRUCT, MAP, and STRING
ARRAY
ARRAY and STRING
NUMBER
TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, and STRING
BOOLEAN
BOOLEAN and STRING
STRING
STRING, CHAR, VARCHAR, BINARY, DATE, and DATETIME
NULL
All types
NoteThe JSON string of the OBJECT and ARRAY types are parsed as much as possible. If the data type of the JSON string is not mapped to any MaxCompute data type, the JSON string is omitted. For ease of use, all JSON data types can be converted into the STRING data type supported by MaxCompute. When you convert a JSON string of the NUMBER type to a value of the FLOAT, DOUBLE, or DECIMAL type, the precision of the value cannot be ensured. We recommend you convert the JSON string to a value of the STRING type and then convert the obtained value to a value of the FLOAT, DOUBLE, or DECIMAL type.
-
-
Return value
A value of the ARRAY, MAP, or STRUCT type is returned.
-
Examples
-
Example 1: Convert a specific JSON string into a value of a specific data type. Sample statement:
-- The return value is {"a":1,"b":0.8}. select from_json('{"a":1, "b":0.8}', 'a int, b double'); -- The return value is {"time":"26/08/2015"}. select from_json('{"time":"26/08/2015"}', 'time string'); -- The return value is {"a":1,"b":0.8}. select from_json('{"a":1, "b":0.8}', 'a int, b double, c string'); -- The return value is [1,2,3]. select from_json('[1, 2, 3, "a"]', 'array<bigint>'); -- The return value is {"d":"v","a":"1","b":"[1,2,3]","c":"{}"}. select from_json('{"a":1,"b":[1,2,3],"c":{},"d":"v"}', 'map<string, string>'); -
Example 2: Use the
map_keysandfrom_jsonfunctions to obtain all keys in a JSON string. You can also use JSON_KEYS for the same purpose. Sample statement:-- The return value is ["a","b"]. select map_keys(from_json('{"a":1,"b":2}','map<string,string>'));
-
GET_JSON_OBJECT
Usage notes
This function is used to extract a specific string from a standard JSON string by using JSONPath. This function supports input parameters of the following data types:
-
JSON types: If input parameters are of JSON types, standard JSONPath expressions are used.
-
STRING type: If input parameters are of the STRING type, the original JSONPath expressions are used.
Note
-
The JSONPath expressions that are used when input parameters are of JSON types and those that are used when input parameters are of the STRING type comply with different rules. This may cause incompatibility issues.
-
The GET_JSON_OBJECT function does not support the syntax of JSONPath expressions.
Input parameters of JSON data types
-
Syntax
string get_json_object(json <json>, string <json_path>) -
Description
Extracts a single string from a standard JSON string based on JSON PATH.
-
Parameters
-
json: required. A JSON string from which you want to extract a single string.
-
json_path: required. A JSONPath expression based on which you want to extract a single string.
-
-
Return value
A value of a STRING type is returned.
-
Examples
-
Example 1: Extract the value that corresponds to the key a from a JSON string.
SELECT get_json_object(json '{"a":1, "b":2}', '$.a');Returns:
+-----+| _c0 |+-----+| 1 |+-----+ -
Example 2: Extract the value that corresponds to the key c from a JSON string.
SELECT get_json_object(json '{"a":1, "b":2}', '$.c');Returns:
+-----+| _c0 |+-----+| NULL |+-----+ -
Example 3: If an invalid JSON path is specified, the return value is NULL.
SELECT get_json_object(json '{"a":1, "b":2}', '$invalid_json_path');Returns:
+-----+| _c0 |+-----+| NULL |+-----+
-
Input parameter of the STRING data type
-
Syntax
string get_json_object(string <json>, string <path>) -
Description
Extracts a single string from a standard JSON string based on path. The original data is read each time this function is called. Therefore, repeated calls may affect system performance and increase costs. To prevent repeated calls, you can use the
GET_JSON_OBJECTfunction with UDTFs. For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs. -
Parameters
-
json: required. A value of the STRING type. This parameter specifies a standard JSON object in the format of
{Key:Value, Key:Value,...}. If the string contains a double quotation mark ("), use two backslashes (\\) to escape the double quotation mark (") before extraction. If the string contains a single quotation mark ('), use a single backslash (\) to escape the single quotation mark (') before extraction. -
path: required. A value of the STRING type. This parameter specifies the path in the value of the json parameter and starts with
$. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON data from OSS to MaxCompute. Meanings of different characters:-
$: indicates the root node. -
.or['']: indicates a child node. MaxCompute parses JSON objects by using.or['']. If a key in a JSON object contains a period (.), [''] can be used. -
[]([number]): indicates an array subscript, which starts from 0. -
*: indicates the wildcard for[]. If this character is used in the path parameter, an entire array is returned. An asterisk (*) cannot be escaped.
-
-
-
Limits
Only MaxCompute V2.0 allows you to extract data by using
['']in the path parameter. To use [''], you must add theset odps.sql.udf.getjsonobj.new=true;statement before the statement that you want to execute. -
Return value
-
If the json parameter is empty or invalid, null is returned.
-
If the format of json is valid and path exists, the related string is returned.
-
You can specify the
flagodps.sql.udf.getjsonobj.newparameter for a session to determine how this function returns a value.-
If you execute the
set odps.sql.udf.getjsonobj.new=true;statement, this function retains the original strings when it returns a value.We recommend that you use this configuration because it results in more standard function return behavior. This facilitates data processing and improves data processing performance. If a job in which this function escapes JSON reserved characters exists in a MaxCompute project, we recommend that you retain the original escape operation to prevent errors caused by lack of verification. The function complies with the following rules when it returns a value:
-
In this configuration, the return value is still a JSON string, which can be parsed as JSON data, without the need to use the
REPLACEorREGEXP_REPLACEfunction to replace backslashes (\). -
Duplicate keys are allowed in a JSON object. If duplicate keys exist, the data can be parsed.
-- The return value is 1. select get_json_object('{"a":"1","a":"2"}', '$.a'); -
The encoded strings that correspond to emojis are supported. However, DataWorks does not allow you to enter emojis. DataWorks allows you to enter only the encoded strings that correspond to emojis to MaxCompute by using a tool, such as Data Integration. DataWorks uses the
GET_JSON_OBJECTfunction to process the data.-- The return value is an emoji. select get_json_object('{"a":"<Emoji>"}', '$.a'); -
The output results are displayed in alphabetical order.
-- The return value is {"b":"1","a":"2"}. select get_json_object('{"b":"1","a":"2"}', '$');
-
-
If you execute the
set odps.sql.udf.getjsonobj.new=false;statement, this function escapes JSON reserved characters when it returns a value. The function complies with the following rules when it returns a value:-
JSON reserved characters such as line feeds (\n) and quotation marks (") are displayed as
'\n'and'\"'. -
Each key in a JSON object must be unique. If duplicate keys exist, the data may fail to be parsed. Sample statement:
-- The return value is null. select get_json_object('{"a":"1","a":"2"}', '$.a'); -
The encoded strings that correspond to emojis cannot be parsed. Sample statement:
-- The return value is null. select get_json_object('{"a":"<Emoji>"}', '$.a'); -
The output results are displayed in alphabetical order. Sample statement:
-- The return value is {"a":"2","b":"1"}. select get_json_object('{"b":"1","a":"2"}', '$');
-
Note
For MaxCompute projects that were created on or after January 21, 2021, the
GET_JSON_OBJECTfunction retains the original strings when it returns a value. For MaxCompute projects that are created before January 21, 2021, theGET_JSON_OBJECTfunction escapes JSON reserved characters when it returns a value. The following example helps you determine how theGET_JSON_OBJECTfunction returns a value in a MaxCompute project.select get_json_object('{"a":"[\\"1\\"]"}', '$.a'); -- Return JSON reserved characters by using escape characters. [\"1\"] -- Return the original strings. ["1"]You can submit an applicationor search for the DingTalk group ID 11782920 to join the MaxCompute developer community DingTalk group and request MaxCompute technical support engineers to configure the
GET_JSON_OBJECTfunction to retain original strings. This way, you do not need to frequently specify set odps.sql.udf.getjsonobj.new=false; for a session. -
-
-
Examples
-
Example 1: Extract information from the JSON object
src_json.json. Sample statement:-- The JSON string src_json.json contains the following content: +----+ json +----+ {"store": {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" } -- Extract the information of the owner field and return amy. select get_json_object(src_json.json, '$.owner') from src_json; -- Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}. select get_json_object(src_json.json, '$.store.fruit[0]') from src_json; -- Extract the information of the non-existent field and return null. select get_json_object(src_json.json, '$.non_exist_key') from src_json; -
Example 2: Extract information from a JSON object of the ARRAY type. Sample statement:
-- The return value is 2222. select get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]'); -- The return value is ["h0","h1","h2"]. set odps.sql.udf.getjsonobj.new=true; select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]'); -- The return value is ["h0","h1","h2"]. set odps.sql.udf.getjsonobj.new=false; select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]'); -- The return value is h1. select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]'); -
Example 3: Extract information from a JSON object that includes a period
(.). Sample statement:-- Create a table. create table mf_json (id string, json string); -- Insert data into the table. The key in the data contains a period (.). insert into table mf_json (id, json) values ("1", "{ \"China.beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\", \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}"); -- Insert data into the table. The key in the data does not contain a period (.). insert into table mf_json (id, json) values ("2", "{ \"China_beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\", \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}"); -- Query the value of id in the JSON object whose key is China.beijing. 0 is returned. Only [''] can be used to specify the key because the key contains a period (.). This way, MaxCompute can parse the key. select get_json_object(json, "$['China.beijing'].school['id']") from mf_json where id =1; -- Query the value of id in the JSON object whose key is China_beijing. 0 is returned. You can use one of the following statements: select get_json_object(json, "$['China_beijing'].school['id']") from mf_json where id =2; select get_json_object(json, "$.China_beijing.school['id']") from mf_json where id =2; -
Example 4: The json parameter is empty or invalid. Sample statement:
-- The return value is null. select get_json_object('','$.array[1][1]'); -- The return value is null. select get_json_object('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]'); -
Example 5: Escape a JSON string. Sample statement:
set odps.sql.udf.getjsonobj.new=true; -- The return value is "1". select get_json_object('{"a":"\\"1\\"","b":"2"}', '$.a'); -- The return value is '1'. select get_json_object('{"a":"\'1\'","b":"2"}', '$.a');
-
INITCAP
-
Syntax
string initcap(<str>) -
Description
Converts str to title case where words are separated by spaces, with the first letter of each word uppercase and the rest lowercase.
-
Parameters
str: required. STRING type. Input string.
-
Return value
Returns a string with the first letter of each word uppercase and the rest lowercase.
-
Example
-- Returns Odps Sql. SELECT initcap("oDps sql");
INSTR
-
Syntax
bigint instr(string <str1>, string <str2>[, bigint <start_position>[, bigint <nth_appearance>]]) -
Description
Finds the position of substring str2 in string str1.
-
Parameters
-
str1: required. STRING type. Target string to search. If input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it's implicitly converted to STRING. Other types return an error.
-
str2: required. STRING type. Substring to match. If input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it's implicitly converted to STRING. Other types return an error.
-
start_position: optional. A value of the BIGINT type. An error is returned for other data types. This parameter specifies the position of the character in str1 from which the search starts. The default start position is the first character, which is position 1. If start_position is a negative value, the search starts counting backward from the end of the string, where the last character is at position -1, the second-to-last at -2, and so on.
-
nth_appearance: Optional. A value of the BIGINT type that must be greater than 0. This parameter specifies the position of the nth_appearance-th match of str2 in str1. If nth_appearance is of another data type or is less than or equal to 0, an error is returned.
-
-
Return value
Returns a BIGINT type. The return value follows these rules:
-
If str1 does not contain str2, returns 0.
-
If str2 is empty, always matches successfully. Example:
select instr('abc','');returns 1. -
If str1, str2, start_position, or nth_appearance is NULL, returns NULL.
-
-
Examples
-
Example 1: Find position of character
ein stringTech on the net. Sample statement:-- Returns 2. select instr('Tech on the net', 'e'); -
Example 2: Find position of substring
onin stringTech on the net. Sample statement:-- Returns 6. select instr('Tech on the net', 'on'); -
Example 3: Find position of second occurrence of character
ein stringTech on the netstarting from third character. Sample statement:-- Returns 14. select instr('Tech on the net', 'e', 3, 2); -
Example 4: One input parameter is NULL. Sample statement:
-- Returns NULL. select instr('Tech on the net', null);
-
IS_ENCODING
-
Syntax
boolean is_encoding(string <str>, string <from_encoding>, string <to_encoding>) -
Description
Determines if input string str can be converted from character set from_encoding to to_encoding. Also used to detect garbled text—typically set from_encoding to UTF-8 and to_encoding to GBK.
-
Parameters
-
str: required. STRING type. Empty strings are considered valid for any character set.
-
from_encoding, to_encoding: required. STRING type. Source and target character sets.
-
-
Return value
Returns BOOLEAN type. The return value follows these rules:
-
If str converts successfully, returns True; otherwise False.
-
If str, from_encoding, or to_encoding is NULL, returns NULL.
-
-
Examples
-
Example 1: Determine whether the character
测试or測試can be converted from theUTF-8character set to theGBKcharacter set. The following command provides an example.-- Returns true. select is_encoding('测试', 'utf-8', 'gbk'); -- Returns true. select is_encoding('測試', 'utf-8', 'gbk'); -
Example 2: Check if character
Testcan convert fromutf-8togb2312. Sample statement:-- The GB2312 character set does not include these two characters. Returns false. select is_encoding('測試', 'utf-8', 'gb2312'); -
Example 3: One input parameter is NULL. Sample statement:
-- Returns NULL. select is_encoding('Test', null, 'gbk');
-
JSON_TUPLE
-
Syntax
string json_tuple(string <json>, string <key1>, string <key2>,...) -
Description
Extracts strings from a standard JSON string based on a set of input keys, such as
(key1,key2,...). -
Parameters
-
json: required. A value of the STRING type. This parameter specifies a standard JSON string.
-
key: required. A value of the STRING type. This parameter is used to describe the
pathof a JSON object in the JSON string. The value cannot start with a dollar sign ($). You can enter multiple keys at a time. MaxCompute parses JSON objects by using.or['']. If a key in a JSON object includes a period(.),['']can be used.
-
-
Return value
A value of the STRING type is returned.
Note-
If json is empty or invalid, null is returned.
-
If key is empty, invalid, or does not exist in the JSON string, null is returned.
-
If json is valid and key exists, the related string is returned.
-
This function can parse JSON data that contains Chinese characters.
-
This function can parse nested JSON data.
-
This function can parse JSON data that contains nested arrays.
-
The parsing action is equivalent to the execution of GET_JSON_OBJECT along with
set odps.sql.udf.getjsonobj.new=true;. To obtain multiple objects from a JSON string, you must call the GET_JSON_OBJECT function multiple times. As a result, the JSON string is parsed multiple times. The JSON_TUPLE function allows you to enter multiple keys at a time and the JSON string is parsed only once. JSON_TUPLE is more efficient than GET_JSON_OBJECT. -
JSON_TUPLE is a user-defined table-valued function (UDTF). If you want to select some columns from a table, use JSON_TUPLE together with the LATERAL VIEW clause.
-
-
Examples
--Create a table named school. create table school (id string, json string); --Insert data into the table. insert into school (id, json) values ("1", "{ \"school\": \"湖畔大学\", \"地址\":\"杭州\", \"SchoolRank\": \"00\", \"Class1\":{\"Student\":[{\"studentId\":1,\"scoreRankIn3Year\":[1,2,[3,2,6]]}, {\"studentId\":2,\"scoreRankIn3Year\":[2,3,[4,3,1]]}]}}");-
Example 1: Extract information from a JSON object. Sample command:
select json_tuple(school.json,"SchoolRank","Class1") as (item0, item1) from school; --The preceding statement is equivalent to the following statement. select get_json_object(school.json,"$.SchoolRank") item0,get_json_object(school.json,"$.Class1") item1 from school; --The following result is returned. +-------+-------+ | item0 | item1 | +-------+-------+ | 00 | {"Student":[{"studentId":1,"scoreRankIn3Year":[1,2,[3,2,6]]},{"studentId":2,"scoreRankIn3Year":[2,3,[4,3,1]]}]} | +-------+-------+ -
Example 2: Use
['']to extract information from a JSON object. Sample command:select json_tuple(school.json,"school","['Class1'].Student") as (item0, item2) from school where id=1; --The following result is returned. +-------+-------+ | item0 | item2 | +-------+-------+ | 湖畔大学 | [{"studentId":1,"scoreRankIn3Year":[1,2,[3,2,6]]},{"studentId":2,"scoreRankIn3Year":[2,3,[4,3,1]]}] | -
Example 3: Parse JSON data that contains Chinese characters. Sample command:
select json_tuple(school.json,"校名","地址") as (item0,item1) from school; --The following result is returned. +-------+-------+ | item0 | item1 | +-------+-------+ | 湖畔大学 | 杭州 | +-------+-------+ -
Example 4: Parse multi-level nested JSON data. Sample command:
select sc.Id, q.item0, q.item1 from school sc lateral view json_tuple(sc.json,"Class1.Student[*].studentId","Class1.Student[0].scoreRankIn3Year") q as item0,item1; --The following result is returned. +------------+-------+-------+ | id | item0 | item1 | +------------+-------+-------+ | 1 | [1,2] | [1,2,[3,2,6]] | +------------+-------+-------+ -
Example 5: Parse JSON data that contains multi-level nested arrays. Sample command:
select sc.Id, q.item0, q.item1 from school sc lateral view json_tuple(sc.json,"Class1.Student[0].scoreRankIn3Year[2]","Class1.Student[0].scoreRankIn3Year[2][1]") q as item0,item1; --The following result is returned. +------------+-------+-------+ | id | item0 | item1 | +------------+-------+-------+ | 1 | [3,2,6] | 2 | +------------+-------+-------+
-
KEYVALUE
-
Syntax
keyvalue(string <str>,[string <split1>,string <split2>,] string <key>) keyvalue(string <str>,string <key>) -
Description
Splits string str into key-value pairs using split1, separates pairs using split2, and returns the value corresponding to key.
-
Parameters
-
str: required. STRING type. String to split.
-
split1, split2: optional. STRING type. Delimiter strings for splitting. If unspecified, default split1 is
";"and split2 is":". If a substring split by split1 contains multiple split2, the result is undefined. -
key: required. STRING type. After splitting by split1 and split2, returns the value for key.
-
-
Return value
Returns STRING type. The return value follows these rules:
-
If split1 or split2 is NULL, returns NULL.
-
If str or key is NULL or no matching key exists, returns NULL.
-
If multiple key-value pairs match, returns the value for the first matching key.
-
-
Examples
-
Example 1: Split string
0:1/;1:2into key-value pairs and return value for key1. Sample statement:-- Returns 2. select keyvalue('0:1/;1:2', 1);The split1 and split2 parameters are not specified. The default value of split1 is a semicolon (
";") and the default value of split2 is a colon (":").After splitting by split1, key-value pairs are
0:1/,1:2. After splitting by split2:0 1/ 1 2Returns value 2 for key 1.
-
Example 2: Split string
“\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;”using“\;”as split1 and":"as split2, then return value for keytf. Sample statement:-- Returns 21910. select keyvalue("\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;","\;",":","tf");Splitting the string
"\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;"using the delimiter"\;"generates the following key-value pairs.decreaseStore:1,xcard:1,isB2C:1,tf:21910,cart:1,shipping:2,pf:0,market:shoes,instPayAmount:0The result of splitting by
":"is as follows.decreaseStore 1 xcard 1 isB2C 1 tf 21910 cart 1 shipping 2 pf 0 market shoes instPayAmount 0Returns value 21910 for key
tf.
-
KEYVALUE_TUPLE
-
Syntax
KEYVALUE_TUPLE(str, split1, split2, key1, key2, ..., keyN) -
Description
Splits string str into key-value pairs using split1, separates pairs using split2, and returns values for multiple keys.
-
Parameters
-
str: required. STRING type. String to split.
-
split1, split2: required. STRING type. Delimiter strings for splitting. If a substring split by split1 contains multiple split2, the result is undefined.
-
key: required. STRING type. After splitting by split1 and split2, returns the value for key.
-
-
Return value
Returns STRING type. The return value follows these rules:
-
If split1 or split2 is NULL, returns NULL.
-
If str or key is NULL or no matching key exists, returns NULL.
-
-
Example
-- Create table create table mf_user ( user_id string, user_info string ); -- Insert data insert into mf_user values('1','age:18;genda:f;address:abc'),('2','age:20;genda:m;address:bcd'); -- Query SELECT user_id, KEYVALUE(user_info,';',':','age') as age, KEYVALUE(user_info,';',':','genda') as genda, KEYVALUE(user_info,';',':','address') as address FROM mf_user; -- Equivalent to KEYVALUE query SELECT user_id, age, genda, address FROM mf_user LATERAL VIEW KEYVALUE_TUPLE(user_info,';', ':','age','genda','address') ui AS age,genda,address;The result is as follows.
+------------+------------+------------+------------+ | user_id | age | genda | address | +------------+------------+------------+------------+ | 1 | 18 | f | abc | | 2 | 20 | m | bcd | +------------+------------+------------+------------+
LENGTH
-
Syntax
bigint length(string <str>) -
Description
Calculates the length of string str.
-
Parameters
str: required. STRING type. If input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it's implicitly converted to STRING.
-
Return value
Returns BIGINT type. The return value follows these rules:
-
If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.
-
If str is NULL, returns NULL.
-
If str is not UTF-8 encoded, returns -1.
-
-
Examples
-
Example 1: Calculate length of string
Tech on the net. Sample statement:-- Returns 15. select length('Tech on the net'); -
Example 2: Input parameter is NULL. Sample statement:
-- Returns NULL. select length(null);
-
LENGTHB
-
Syntax
bigint lengthb(string <str>) -
Description
Calculates the byte length of string str.
-
Parameters
str: required. STRING type. If input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it's implicitly converted to STRING.
-
Return value
Returns BIGINT type. The return value follows these rules:
-
If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.
-
If str is NULL, returns NULL.
-
-
Examples
-
Example 1: Calculate byte length of string
Tech on the net. Sample statement:-- Returns 15. select lengthb('Tech on the net'); -
Example 2: Input parameter is NULL. Sample statement:
-- Returns NULL. select lengthb(null);
-
LOCATE
-
Syntax
bigint locate(string <substr>, string <str>[, bigint <start_pos>]) -
Description
Finds the position of str substring substr. Use start_pos to specify the starting position (counting from 1).
-
Parameters
-
substr: required. STRING type. Substring to find.
-
str: required. STRING type. String to search.
-
start_pos: optional. BIGINT type. Starting position for search.
-
-
Return value
Returns BIGINT type. The return value follows these rules:
-
If str doesn't contain substr, returns 0.
-
If str or substr is NULL, returns NULL.
-
If start_pos is NULL, returns 0.
-
-
Examples
-
Example 1: Find position of string
abin stringabchelloabc. Sample statement:-- Returns 1. select locate('ab', 'abchelloabc'); -
Example 2: Find position of string
hiin stringabchelloabc. Sample statement:-- Returns 0. select locate('hi', 'abc,hello,ab,c'); -
Example 3: start_pos is NULL. Sample statement:
-- Returns 0. select locate('ab', 'abhelloabc', null);
-
LPAD
-
Syntax
string lpad(string <str1>, int <length>, string <str2>) -
Description
Left-pads string str1 to length characters using string str2. This function is an extension in MaxCompute V2.0.
-
Parameters
-
str1: required. STRING type. String to left-pad.
-
length: required. INT type. Number of characters for left padding.
-
str2: required. Padding string.
-
-
Return value
Returns STRING type. The return value follows these rules:
-
If length is less than str1's length, returns leftmost length characters of str1.
-
If length is 0, returns empty string.
-
If no input parameters or any input parameter is NULL, returns NULL.
-
-
Examples
-
Example 1: Use the string
12to left-pad the stringabcdefghto 10 characters. The sample command is as follows.-- Returns 12abcdefgh. select lpad('abcdefgh', 10, '12'); -
Example 2: Use the string
12to left-pad the stringabcdefghto a length of 5 characters. The following is a sample command.-- Returns abcde. select lpad('abcdefgh', 5, '12'); -
Example 3: length is 0. Sample statement:
-- Returns empty string. select lpad('abcdefgh' ,0, '12'); -
Example 4: One input parameter is NULL. Sample statement:
-- Returns NULL. select lpad(null ,0, '12');
-
LTRIM
-
Syntax
string ltrim(string <str>[, <trimChars>]) string trim(leading [<trimChars>] from <str>) -
Description
Removes characters from the left end of str:
-
If trimChars is unspecified, removes leading spaces by default.
-
If you specify trimChars, the function treats the characters in trimChars as a set and removes from the left end of str the longest possible substring consisting entirely of characters in the set trimChars.
-
-
Parameters
-
str: required. STRING type. String to trim. If input is BIGINT, DECIMAL, DOUBLE, or DATETIME, it's implicitly converted to STRING.
-
trimChars: optional. STRING type. Characters to remove.
-
-
Return value
Returns STRING type. The return value follows these rules:
-
If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.
-
If str or trimChars is NULL, returns NULL.
-
-
Examples
-
Example 1: Remove leading spaces from string
yxTxyomxx. Sample statements:-- Returns yxTxyomxx. select ltrim(' yxTxyomxx '); -- Equivalent to: select trim(leading from ' yxTxyomxx '); -
Example 2: Remove all leading characters from string
yxTxyomxxthat are in setxy.-- Returns Txyomxx. Any leading x or y is removed. select ltrim('yxTxyomxx', 'xy'); -- Equivalent to: select trim(leading 'xy' from 'yxTxyomxx'); -
Example 3: Input parameter is NULL. Sample statements:
-- Returns NULL. select ltrim(null); select ltrim('yxTxyomxx', null);
-
MASK_HASH
-
Syntax
mask_hash(<expr>) -
Description
Returns a hash value calculated from string expression expr. Identical hash values indicate identical expression values.
-
Parameters
expr: required. String expression for hash calculation. Supports STRING, CHAR, VARCHAR, BINARY types.
-
Return value
Returns a 64-byte hash value. For non-string expressions, returns
null(Hive-compatible). -
Example
-- Returns hash for abc select mask_hash("abc"); -- Returns +------------+ | _c0 | +------------+ | ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad | +------------+ -- Returns NULL for non-string input select mask_hash(100); -- Returns +------------+ | _c0 | +------------+ | NULL | +------------+
MD5
-
Syntax
string md5(string <str>) -
Description
Calculates the MD5 hash of string str.
-
Parameters
str: required. STRING type. If input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it's implicitly converted to STRING.
-
Return value
Returns STRING type. The return value follows these rules:
-
If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.
-
If str is NULL, returns NULL.
-
-
Examples
-
Example 1: Calculate MD5 hash of string
Tech on the net. Sample statement:-- Returns ddc4c4796880633333d77a60fcda9af6. select md5('Tech on the net'); -
Example 2: Input parameter is NULL. Sample statement:
-- Returns NULL. select md5(null);
-
PARSE_URL
-
Syntax
string parse_url(string <url>, string <part>[, string <key>]) -
Description
Parses url and extracts information specified by part.
-
Parameters
-
url: required. STRING type. URL link. Invalid URLs return an error.
-
part: required. STRING type. Valid values: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO (case-insensitive).
-
key: optional. When part is QUERY, returns the value for key.
-
-
Return description
Returns STRING type. The return value follows these rules:
-
If url, part, or key is NULL, returns NULL.
-
If part is invalid, returns an error.
-
-
Example
-- Returns example.com. select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'HOST'); -- Returns /over/there/index.dtb. select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'PATH'); -- Returns animal. select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'QUERY', 'type'); -- Returns nose. select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'REF'); -- Returns file. select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'PROTOCOL'); -- Returns username:password@example.com:8042. select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'AUTHORITY'); -- Returns username:password. select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'USERINFO');
PARSE_URL_TUPLE
-
Syntax
string parse_url_tuple(string <url>, string <key1>, string <key2>,...) -
Description
Parses url and extracts strings specified by input keys key1, key2, etc. Similar to PARSE_URL but extracts multiple keys simultaneously with better performance.
-
Parameters
-
url: required. STRING type. URL link. Invalid URLs return an error.
-
key1, key2: required. STRING type. Keys to extract. Valid values:
-
HOST: Host address (domain or IP).
-
PATH: Path to network resource on server.
-
QUERY: Query string (content to query).
-
REF: indicates the fragment identifier of a URL, which is the content after the
#symbol. -
PROTOCOL: Protocol type.
-
AUTHORITY: Server domain/IP, port, and user authentication info (username, password).
-
FILE: Path and query content (PATH + QUERY).
-
USERINFO: User authentication info.
-
QUERY:<KEY>: Value for specified key in query string.
Case-insensitive. Invalid values return an error.
-
-
-
Return Description
Returns STRING type. If url or key is NULL, returns an error.
-
Example
Extract strings for each key from
file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose. Sample statement:select parse_url_tuple('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'AUTHORITY', 'FILE', 'USERINFO', 'QUERY:type', 'QUERY:name') as (item0, item1, item2, item3, item4, item5, item6, item7, item8, item9);Returns:
+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+ | item0 | item1 | item2 | item3 | item4 | item5 | item6 | item7 | item8 | item9 | +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+ | example.com | /over/there/index.dtb | type=animal&name=narwhal | nose | file | username:password@example.com:8042 | /over/there/index.dtb?type=animal&name=narwhal | username:password | animal | narwhal | +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
REGEXP_COUNT
-
Syntax
bigint regexp_count(string <source>, string <pattern>[, bigint <start_position>]) -
Description
Counts substrings in source starting from start_position that match specified pattern.
-
Parameters
-
source: required. STRING type. String to search. Other types return an error.
-
pattern: required. STRING constant or regular expression. Pattern to match. For more regex guidelines, see RLIKE string matching. Empty or invalid pattern returns an error.
-
start_position: optional. A constant of the BIGINT type. The value of this parameter must be greater than 0. If the value is of another data type or is less than or equal to 0, an error is returned. If you do not specify this parameter, the default value is 1. This value indicates that the search starts from the first character of the source string.
-
-
Return value
Returns BIGINT type. The return value follows these rules:
-
If no matches found, returns 0.
-
If source, pattern, or start_position is NULL, returns NULL.
-
-
Examples
-
Example 1: Count matching substrings in
abababcfrom specified position. Sample statements:-- Returns 1. select regexp_count('abababc', 'a.c'); -- Returns 2. select regexp_count('abababc', '[[:alpha:]]{2}', 3); -
Example 2: One input parameter is NULL. Sample statement:
-- Returns NULL. select regexp_count('abababc', null); -
Example 3: Count occurrences of
:in JSON string{"account_id":123456789,"account_name":"allen","location":"hangzhou","bill":100}. Sample statement:-- Returns 4. select regexp_count('{"account_id":123456789,"account_name":"allen","location":"hangzhou","bill":100}',':');
-
REGEXP_EXTRACT
-
Syntax
string regexp_extract(string <source>, string <pattern>[, bigint <groupid>])NoteIn Hive-compatible data type versions,
REGEXP_EXTRACTfollows Java regex specification. In 1.0 and 2.0 data type versions, it follows MaxCompute specification. -
Description
Splits string source into groups using pattern and returns the string from group groupid.
-
Parameters
-
source: required. STRING type. String to split.
-
pattern: required. STRING constant or regular expression. Pattern to match. For more regex guidelines, see RLIKE string matching.
-
groupid: optional. BIGINT constant, must be ≥0.
NoteData is stored in UTF-8 format. Chinese characters can be represented in hexadecimal within range [\\x{4e00},\\x{9fa5}].
-
-
Return value
Returns STRING type. The return value follows these rules:
-
If pattern is an empty string or no group is specified in pattern, an error is returned.
-
groupid is not of the BIGINT type or is less than 0, an error is returned. If you do not specify this parameter, the default value is 1, indicating that the substring in the first group is returned. If groupid is set to 0, the substring that matches the entire pattern is returned.
-
If source, pattern, or groupid is NULL, returns NULL.
-
-
Examples
-
Example 1: Split
foothebarusingfoo(.*?)(bar). Sample statements:-- Returns the. select regexp_extract('foothebar', 'foo(.*?)(bar)'); -- Returns foothebar. select regexp_extract('foothebar', 'foo(.*?)(bar)', 0); -
Example 2: Split
8d99d8using8d(\\d+)d8. Sample statement:-- Returns 99. In MaxCompute client, use two backslashes (\\) as escape characters for regex. select regexp_extract('8d99d8', '8d(\\d+)d8'); -
Example 3: Extract Chinese characters and punctuation from
[Alibaba Cloud]aliyun. Sample statement:-- Returns [Alibaba Cloud]. select regexp_extract('[Alibaba Cloud]aliyun', '([^\\x{00}-\\x{ff}]+)'); -
Example 4: Extract Chinese characters from
[Alibaba Cloud]aliyun. Sample statement:--Returns Alibaba Cloud. select regexp_extract('【Alibaba Cloud】aliyun', '([\\x{4e00}-\\x{9fa5}]+)'); -
Example 5: Extract Chinese characters from
[Alibaba Cloud]aliyunAlibaba Cloud. This complex scenario requires REGEXP_REPLACE. Sample statement:-- Returns [Alibaba Cloud]Alibaba Cloud. select regexp_replace('[Alibaba Cloud]aliyunAlibaba Cloud','([\\x{00}-\\x{ff}])', ''); -
Example 6: No groups defined. Invalid command example:
select regexp_extract('foothebar', 'foothebar');
-
REGEXP_EXTRACT_ALL
-
Syntax
array<T> regexp_extract_all(string <source>, string <pattern>[,bigint <group_id>]) -
Description
Finds all substrings matching a regular expression pattern in a string and returns them as an array.
-
Parameters
-
source: required. STRING type. The string to analyze.
-
pattern: required. STRING type. The pattern to match. This can be a STRING constant or a regular expression. For more regex guidelines, see RLIKE string matching.
-
group_id: optional. A BIGINT value. This parameter specifies the ID of the group used to match the pattern. The value of this parameter must be greater than or equal to
0. If you do not specify this parameter, the group with a group_id of1is used to match the pattern. If you set this parameter to0, the pattern is matched as a whole.
-
-
Return value
An ARRAY value is returned. If you specify group_id, an array containing all matching results for the specified group_id is returned. If you do not specify group_id, an array containing all matching results for group_id
1is returned. -
Example
-
If you do not specify group_id, the default behavior is to return results that match the first group_id.
SELECT regexp_extract_all('100-200, 300-400', '(\\d+)-(\\d+)');Returns:
+------------+ | _c0 | +------------+ | [100,300] | +------------+ -
If you set the group_id value to
2, the matching result for the second group_id is returned.SELECT regexp_extract_all('100-200, 300-400', '(\\d+)-(\\d+)',2);The following is the result.
+------------+ | _c0 | +------------+ | [200,400] | +------------+
-
REGEXP_INSTR
-
Syntax
bigint regexp_instr(string <source>, string <pattern>[,bigint <start_position>[, bigint <occurrence>[, bigint <return_option>]]]) -
Description
Returns the starting or ending position of the specified occurrence of a substring that matches pattern in source, starting from start_position.
-
Parameters
-
source: Required. The source string. The data type must be STRING.
-
pattern: Required. The pattern to match. The value can be a STRING constant or a regular expression. For more information about regular expressions, see RLIKE string matching. An empty pattern returns an error.
-
start_position: Optional. The position from which the search starts. The value must be a BIGINT constant. The default value is 1.
-
occurrence: Optional. The occurrence of the match to find. The value must be a BIGINT constant. The default value is 1, which indicates the first match.
-
return_option: Optional. The position to return. The value must be a BIGINT constant. Valid values: 0 specifies the start position and 1 specifies the end position. The default value is 0. If you specify an invalid value, an error is returned.
-
-
Return value
The return value is of the BIGINT type. The return_option parameter specifies whether to return the starting or ending position of the matched substring in source. The return value is determined by the following rules:
-
If pattern is empty, an error is returned.
-
If the value of start_position or occurrence is not a BIGINT constant or is less than or equal to 0, an error is returned.
-
If the value of source, pattern, start_position, occurrence, or return_option is NULL, NULL is returned.
-
-
Examples
-
Example 1: Return the starting position of the
2nd occurrence of a substring that matcheso[[:alpha:]]{1}in the stringi love www.taobao.com, starting from position3. Sample statement:-- Returns 14. select regexp_instr('i love www.taobao.com', 'o[[:alpha:]]{1}', 3, 2); -
Example 2: Return the ending position of the
secondoccurrence of a substring that matcheso[[:alpha:]]{1}in the stringi love www.taobao.com. The search starts from thethirdcharacter. Sample statement:-- Returns 16. select regexp_instr('i love www.taobao.com', 'o[[:alpha:]]{1}', 3, 2, 1); -
Example 3: An input parameter is NULL. Sample statement:
-- Returns NULL. select regexp_instr('i love www.taobao.com', null, 3, 2);
-
REGEXP_REPLACE
-
Syntax
string regexp_replace(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])NoteFor Hive-compatible data types, the
REGEXP_REPLACEfunction follows the Java regular expression specification. For data types 1.0 and 2.0, it follows the MaxCompute specification. -
Description
Replaces the substring that matches the pattern at the position specified by occurrence in the source string with replace_string and returns the result.
-
Parameters
-
source: Required. A value of the STRING type. This is the source string to be modified.
-
pattern: Required. A STRING constant or a regular expression. This is the pattern to match. For more information about regular expression rules, see RLIKE string matching. If the pattern is empty, an error is returned.
-
replace_string: Required. A value of the STRING type. This is the string that replaces the substring that matches the pattern.
Note-
If replace_string is an empty string, the function deletes the substring that matches the pattern.
-
The replace_string can contain backreferences in the format of \n to insert the substring that matches the nth capturing group in the pattern. The value of n can be an integer from 1 to 9. \0 inserts the entire substring that matches the pattern. You must escape backslashes, such as \\1, or use a raw string, such as R'(\1)'.
-
-
occurrence: Optional. A constant of the BIGINT type, which must be greater than or equal to 0. The value of this parameter indicates that the string that matches the specified pattern at the nth occurrence specified by occurrence is replaced with replace_string. If this parameter is set to 0, all the substrings that match the specified pattern are replaced. If it is of another data type or is less than 0, an error is returned. Default value: 0.
-
-
Return value
A value of the STRING type is returned. The return value is determined by the following rules:
-
If you reference a group that does not exist, the result is undefined.
-
If replace_string is NULL and a match for the pattern is found, NULL is returned.
-
If replace_string is NULL and pattern does not match, the function returns the original string.
-
If source, pattern, or occurrence is NULL, NULL is returned.
-
-
Examples
-
Example 1: Replace substrings based on specified rules.
-- Returns Abcd. select regexp_replace("abcd", "a", "A", 0); -- Returns bcd. select regexp_replace("abcd", "a", "", 0); -- Returns 19700101. select regexp_replace("1970-01-01", "-", "", 0); -- Returns abc. select regexp_replace("a1b2c3", "[0-9]", "", 0); -- Returns a1b2c. select regexp_replace("a1b2c3", "[0-9]", "", 3); -
Example 2: Replace all matches for the pattern
([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4})in the string123.456.7890with(\\1)\\2-\\3. Sample statement:-- Returns (123)456-7890. select regexp_replace('123.456.7890', '([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4})', '(\\1)\\2-\\3', 0); -
Example 3: Replace substrings that match a pattern in the string
abcdbased on specified rules. Sample statements:-- Returns a b c d. select regexp_replace('abcd', '(.)', '\\1 ', 0); -- Returns a bcd. select regexp_replace('abcd', '(.)', '\\1 ', 1); -- Returns d. select regexp_replace("abcd", "(.*)(.)$", "\\2", 0); -
Example 4: The url_set table contains a URL column where the values are in the format
www.simple@xxx.com. Thexxxpart is unique for each row. This example replaces all characters afterwww. Sample statement:-- Returns wwwtest. select regexp_replace(url,'(www)(.*)','wwwtest',0) from url_set; -
Example 5: An input parameter is NULL. Sample statement:
-- Returns NULL. select regexp_replace('abcd', '(.)', null, 0); -
Example 6: Reference a group that does not exist. Sample statements:
-- Pattern defines only one group; second group reference doesn't exist. -- Avoid this usage; result is undefined. regexp_replace("abcd", "(.)", "\\2", 0) = "" or "abcd" -- Pattern has no groups; \1 references non-existent group. -- Avoid this usage; result is undefined. regexp_replace("abcd", "a", "\\1", 0) = "bcd" or "abcd"
-
REGEXP_SUBSTR
-
Syntax
string regexp_substr(string <source>, string <pattern>[, bigint <start_position>[, bigint <occurrence>]]) -
Description
Returns the substring from the start_position in the source string that matches the pattern for the occurrence-th time.
-
Parameters
-
source: Required. STRING type. The string to search.
-
pattern: Required. STRING constant or regular expression. The pattern to match. Regular expression syntax is described in RLIKE string matching.
-
start_position: Optional. BIGINT constant. The value must be greater than 0. If you do not specify this parameter, the default value is 1, which indicates that the search starts from the first character of the source string.
-
occurrence: Optional. BIGINT constant. The value must be greater than 0. The default value is 1, which indicates the first match.
-
-
Return value
Returns a STRING. The return value follows these rules:
-
If the pattern is empty, an error occurs.
-
If no match is found, returns NULL.
-
If start_position or occurrence is not a BIGINT or is less than or equal to 0, an error occurs.
-
If source, pattern, start_position, or occurrence is NULL, returns NULL.
-
-
Examples
-
Example 1: Return the substring that matches the specified pattern in
I love aliyun very much. Sample statement:-- Returns aliyun. select regexp_substr('I love aliyun very much', 'a[[:alpha:]]{5}'); -- Returns have. select regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 1); -- Returns 2. select regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 2); -
Example 2: One input parameter is NULL. Sample statement:
-- Returns NULL. select regexp_substr('I love aliyun very much', null);
-
REPEAT
-
Syntax
string repeat(string <str>, bigint <n>) -
Description
Returns the string str repeated n times.
-
Parameters
-
str: required. STRING type. If the input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it is implicitly converted to STRING.
-
n: required. BIGINT type. The value size must not exceed 2 MB.
-
-
Return value
Returns a value of STRING type. The return value follows these rules:
-
If str is not of type STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, an error is returned.
-
If n is empty, an error is returned.
-
If either str or n is NULL, NULL is returned.
-
-
Examples
-
Example 1: Repeat the string
abc5times. Sample statement:-- Returns abcabcabcabcabc. select repeat('abc', 5); -
Example 2: One input parameter is NULL. Sample statement:
-- Returns NULL. select repeat('abc', null);
-
REPLACE
-
Syntax
string replace(string <str>, string <old>, string <new>) -
Description
Replaces all occurrences of old with new in str and returns the modified str. If no matches are found, the original str is returned. This function is a MaxCompute V2.0 extension function.
-
Parameters
-
str: required. STRING type. The source string.
-
old: Required. The string to compare.
-
new: required. STRING type. The replacement string.
-
-
Return value
Returns a value of the STRING type. If any input parameter is NULL, NULL is returned.
-
Examples
-
Example 1: Replace all occurrences of the substring
ababwith12in the stringababab. Sample statement:-- Returns 12ab. select replace('ababab','abab','12'); -
Example 2: One of the input parameters is NULL. Sample statement:
-- Returns NULL. select replace('123abab456ab',null,'abab');
-
REVERSE
-
Syntax
string reverse(string <str>) -
Description
Returns the reverse of the string str.
-
Parameters
str: Required. A parameter of the STRING type. If the input is a BIGINT, DOUBLE, DECIMAL, or DATETIME value, it is implicitly converted to the STRING type.
-
Return value
Returns a value of the STRING type. The following rules apply:
-
If the data type of str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, an error is returned.
-
If str is NULL, NULL is returned.
-
-
Examples
-
Example 1: Reverse the string
I love aliyun very much. Sample statement:-- Returns hcum yrev nuyila evol I. select reverse('I love aliyun very much'); -
Example 2: The input parameter is NULL. Sample statement:
-- Returns NULL. select reverse(null);
-
RPAD
-
Syntax
string rpad(string <str1>, int <length>, string <str2>) -
Description
Right-pads the string str1 to length characters using the padding string str2. This function is an extension in MaxCompute V2.0.
-
Parameters
-
str1: required. STRING type. The string to right-pad.
-
length: required. INT type. Must be greater than or equal to 0. Specifies the number of characters in the padded result.
-
str2 (required): The string used for padding.
-
-
Return value
Returns STRING type. The return value follows these rules:
-
If length is less than str1's length, returns leftmost length characters of str1.
-
If length is 0, returns empty string.
-
If no input parameters or any input parameter is NULL, returns NULL.
-
-
Examples
-
Example 1: Right-pad the string
abcdefghto 10 characters using the padding string12. Sample statement:-- Returns abcdefgh12. select rpad('abcdefgh', 10, '12'); -
Example 2: Right-pad the string
abcdefghto 5 characters using the padding string12. Sample statement:-- Returns abcde. select rpad('abcdefgh', 5, '12'); -
Example 3: length is 0. Sample statement:
-- Returns empty string. select rpad('abcdefgh' ,0, '12'); -
Example 4: One input parameter is NULL. Sample statement:
-- Returns NULL. select rpad(null ,0, '12');
-
RTRIM
-
Syntax
string rtrim(string <str>[, <trimChars>]) string trim(trailing [<trimChars>] from <str>) -
Description
Removes characters from the right end of str:
-
If trimChars is not specified, spaces are trimmed by default.
-
If trimChars is specified, the characters in trimChars are treated as a collection. The longest substring from the right end of str that consists entirely of characters from the trimChars collection is removed.
-
-
Parameters
-
str: required. STRING type. The string to trim. If the input is BIGINT, DECIMAL, DOUBLE, or DATETIME, it is implicitly converted to STRING.
-
trimChars: optional. STRING type. The characters to remove.
-
-
Return value
Returns STRING type. The return value follows these rules:
-
If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.
-
If str or trimChars is NULL, returns NULL.
-
-
Examples
-
Example 1: Remove trailing spaces from the string
yxTxyomxx. Sample statements:-- Returns yxTxyomxx. select rtrim(' yxTxyomxx '); -- Equivalent to: select trim(trailing from ' yxTxyomxx '); -
Example 2: Remove all trailing characters from the string
yxTxyomxxthat belong to the setxy.-- Returns yxTxyom. Any trailing x or y is removed. select rtrim('yxTxyomxx', 'xy'); -- Equivalent to: select trim(trailing 'xy' from 'yxTxyomxx'); -
Example 3: Input parameter is NULL. Sample statements:
-- Returns NULL. select rtrim(null); select rtrim('yxTxyomxx', null);
-
SOUNDEX
-
Syntax
string soundex(string <str>) -
Description
Converts a standard string to a SOUNDEX string.
-
Parameters
str: Required. The string to convert. This parameter is of STRING type. This function is an extension in MaxCompute V2.0.
-
Return value
The return value is of STRING type. If str is NULL, the function returns NULL.
-
Examples
-
Example 1: Converts the string
helloto a SOUNDEX string. Example statement:-- Returns H400. select soundex('hello'); -
Example 2: The input parameter is NULL. Example statement:
-- Returns NULL. select soundex(null);
-
SPACE
-
Syntax
string space(bigint <n>) -
Description
Generates a string that consists of n spaces.
-
Parameters
n: required. BIGINT type. The value size must not exceed 2 MB.
-
Return value
Returns a STRING. The return value follows these rules:
-
If n is empty, an error is returned.
-
If n is NULL, NULL is returned.
-
-
Example
-- Returns 10. select length(space(10));
SPLIT
-
Syntax
ARRAY<STRING> SPLIT(STRING <source>, STRING <delimiter>[, BOOLEAN <trimTailEmpty>]) -- Standard example. -- Returns ["a","b","c"]. SELECT SPLIT('a,b,c', ','); -
Parameters
-
source: Required. The string to split. The data type is STRING.
delimiter: Required. The separator that is used to split the string. This parameter supports regular expressions. The data type is STRING.
trimTailEmpty: Optional. Specifies whether to keep trailing empty strings. The default value is
true. If you set this parameter tofalse, trailing empty strings are kept. The data type is BOOLEAN.
-
-
Return value
Returns a value of the ARRAY<STRING> type.
-
Example
-- The return value is ["a"," b"," c"]. select split("a, b, c", ","); -- No empty string is returned by default. select split("a, b, c,,", ","); -- The following result is returned: +------------+ | _c0 | +------------+ | ["a"," b"," c"] | +------------+ -- If you need to return an empty string, execute the following statement: select split("a, b, c,,", ",", false); -- The following result is returned: +------------+ | _c0 | +------------+ | ["a"," b"," c","",""] | +------------+
SPLIT_PART
-
Syntax
string split_part(string <str>, string <separator>, bigint <start>[, bigint <end>]) -
Description
Splits the string str using the delimiter separator and returns the substring from part start to end (closed interval).
-
Parameters
-
str: required. STRING type. The string to split. If the input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it is implicitly converted to STRING.
-
separator: required. STRING constant. The delimiter (a single character or string).
-
start: required. BIGINT constant, must be greater than 0. The starting part number (counting from 1).
-
end: a BIGINT constant greater than or equal to start. This parameter specifies the ending segment number to return. If this parameter is not specified, its value defaults to the value of start, and only the segment specified by start is returned.
-
-
Return Value Description
Returns a STRING. The return value follows these rules:
-
If start exceeds the actual number of parts (for example, there are 6 parts but start > 6), an empty string is returned.
-
If separator is not found in str and start = 1, the entire str is returned. If str is empty, an empty string is returned.
-
If separator is empty, the original str is returned.
-
If end exceeds the number of parts, the substring from start to the last part is returned.
-
If str is not of type STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, an error is returned.
-
If separator is not a STRING constant, an error is returned.
-
If start or end is not a BIGINT constant, an error is returned.
-
If any parameter except separator is NULL, NULL is returned.
-
-
Examples
-
Example 1: Split the string
a,b,c,dusing the delimiter,and return the specified parts. Sample statements:-- Returns a. select split_part('a,b,c,d', ',', 1); -- Returns a,b. select split_part('a,b,c,d', ',', 1, 2); -
Example 2: start exceeds the actual number of parts. Sample statement:
-- Returns empty string. select split_part('a,b,c,d', ',', 10); -
Example 3: separator is not found in str. Sample statements:
-- Returns a,b,c,d. select split_part('a,b,c,d', ':', 1); -- Returns empty string. select split_part('a,b,c,d', ':', 2); -
Example 4: separator is empty. Sample statement:
-- Returns a,b,c,d. select split_part('a,b,c,d', '', 1); -
Example 5: end exceeds the actual number of parts. Sample statement:
-- Returns b,c,d. select split_part('a,b,c,d', ',', 2, 6); -
Example 6: Any input parameter except separator is NULL. Sample statement:
-- Returns NULL. select split_part('a,b,c,d', ',', null);
-
SUBSTR
-
Syntax
string substr(string <str>, bigint <start_position>[, bigint <length>]) -
Description
Returns a substring of str starting from start_position with a length of length.
-
Parameters
-
str: required. STRING type. If the input is BIGINT, DECIMAL, DOUBLE, or DATETIME, it is implicitly converted to STRING.
-
start_position: required. BIGINT type. The default start position is 1.
-
Hive-compatible data type version: If start_position=0, it behaves the same as position 1.
-
1.0 and 2.0 data type versions: If start_position=0, it returns NULL.
-
-
length: optional. BIGINT type. The substring length must be >0.
Important-
If
setproject odps.function.strictmode=false: a length<0 returns blank. -
If
setproject odps.function.strictmode=true: a length<0 returns an error.
-
-
-
Return value
Returns STRING type. The return value follows these rules:
-
If str is not STRING, BIGINT, DECIMAL, DOUBLE, or DATETIME, it returns an error.
-
If length is not BIGINT or ≤0, it returns an error.
-
If you omit length, the substring extends to the end of str.
-
If str, start_position, or length is NULL, it returns NULL.
-
-
Examples
-
Example 1: Return a substring of
abcfrom the specified position with the specified length. Sample statements:-- Returns bc. select substr('abc', 2); -- Returns b. select substr('abc', 2, 1); -- Returns bc. select substr('abc',-2 , 2); -
Example 2: When an input parameter is NULL. Sample statement:
-- Returns NULL. select substr('abc', null);
-
SUBSTRING
-
Syntax
string substring(string|binary <str>, int <start_position>[, int <length>]) -
Description
Returns a substring of str starting from start_position with length.
-
Parameters
-
str: required. This parameter is of STRING or BINARY type.
-
start_position: required. This parameter is of INT type. The starting position is 1-based. If start_position is 0, an empty string is returned. If start_position is a negative value, the position is counted backward from the end of the string. In this case, -1 refers to the last character, and counting proceeds backward sequentially.
-
length: optional. This parameter is of BIGINT type. The substring length must be greater than 0.
-
-
Return value
The return value is of STRING type. It adheres to the following rules:
-
If str is not of STRING or BINARY type, an error is returned.
-
If length is not of BIGINT type or is less than or equal to 0, an error is returned.
-
When the length parameter is omitted, a substring up to the end of str is returned.
-
If str, start_position, or length is NULL, NULL is returned.
-
-
Examples
-
Example 1: This example returns a substring of
abcfrom a specified position with a specified length. Sample statements:-- Returns bc. select substring('abc', 2); -- Returns b. select substring('abc', 2, 1); -- Returns bc. select substring('abc',-2,2); -- Returns ab. select substring('abc',-3,2); -- Returns 001. substring(bin(2345), 2, 3); -
Example 2: This example shows a case where one input parameter is NULL. Sample statement:
-- Returns NULL. select substring('abc', null, null);
-
SUBSTRING_INDEX
-
Syntax
string substring_index(string <str>, string <separator>, int <count>) -
Description
This function extracts a substring from the str string, either before or after the count-th occurrence of a delimiter. If count is positive, the search for the delimiter starts from the left. If count is negative, the search starts from the right. This function is a MaxCompute V2.0 extension function.
-
Parameters
-
str: required. A STRING type. This is the string to extract from.
-
separator: required. A STRING type delimiter.
-
count: Required. An INT value that specifies the position of the separator.
-
-
Return value
The function returns a value of STRING type. If any input parameter is NULL, the function returns NULL.
-
Examples
-
Example 1: Extracts from the string
https://help.aliyun.com. Sample statements:-- Returns https://help.aliyun. select substring_index('https://help.aliyun.com', '.', 2); -- Returns aliyun.com. select substring_index('https://help.aliyun.com', '.', -2); -
Example 2: An input parameter is NULL. Sample statement:
-- Returns NULL. select substring_index('https://help.aliyun.com', null, 2);
-
TO_CHAR
-
Syntax
string to_char(boolean <value>) string to_char(bigint <value>) string to_char(double <value>) string to_char(decimal <value>) -
Description
Converts values of type BOOLEAN, BIGINT, DECIMAL, or DOUBLE to their STRING representations.
-
Parameters
value: required. A value of type BOOLEAN, BIGINT, DECIMAL, or DOUBLE.
-
Return value
Returns a value of type STRING. The return value follows these rules:
-
If value is not of type BOOLEAN, BIGINT, DECIMAL, or DOUBLE, the function returns an error.
-
If value is NULL, the function returns NULL.
-
-
Examples
-
Example 1: Convert values to STRING representation. Sample statement:
-- Returns 123. select to_char(123); -- Returns TRUE. select to_char(true); -- Returns 1.23. select to_char(1.23); -
Example 2: Input parameter is NULL. Sample statement:
-- Returns NULL. select to_char(null);
-
TO_JSON
-
Syntax
string to_json(<expr>) -
Description
Converts an expression specified by expr of a given complex data type into a JSON string.
-
Parameters
expr: required. An expression of the ARRAY, MAP, or STRUCT type.
NoteIf the input expression is of the STRUCT type (
struct<key1:value1, key2:value2>), take note of the following points:-
All keys are converted into lowercase letters when you convert the expression into a JSON string.
-
If a
valueis null, the key-value pair to which thevaluebelongs is not included in the JSON string that is returned. For example, ifvalue2is null,key2:value2is not included in the JSON string that is returned.
-
-
Return value
A JSON string is returned.
-
Examples
-
Example 1: Convert an expression of a given complex data type into a JSON string. Sample statement:
-- The return value is {"a":1,"b":2}. select to_json(named_struct('a', 1, 'b', 2)); -- The return value is {"time":"26/08/2015"}. select to_json(named_struct('time', "26/08/2015")); -- The return value is [{"a":1,"b":2}]. select to_json(array(named_struct('a', 1, 'b', 2))); -- The return value is {"a":{"b":1}}. select to_json(map('a', named_struct('b', 1))); -- The return value is {"a":1}. select to_json(map('a', 1)); -- The return value is [{"a":1}]. select to_json(array((map('a', 1)))); -
Example 2: The input expression is of the STRUCT type. Sample statement:
-- The return value is {"a":"B"}. If the expression of the STRUCT type is converted into a JSON string, all keys are converted into lowercase letters. select to_json(named_struct("A", "B")); -- The return value is {"k2":"v2"}. The key-value pair to which null belongs is not included in the JSON string that is returned. select to_json(named_struct("k1", cast(null as string), "k2", "v2"));
-
TOLOWER
-
Syntax
string tolower(string <source>) -
Description
Converts uppercase English letters in the string source to lowercase.
-
Parameters
source: required. A value of the STRING type. If the input is a BIGINT, DOUBLE, DECIMAL, or DATETIME value, it is implicitly converted to the STRING type. This function operates only on English characters.
-
Return value
Returns a value of the STRING type. The returned value is based on the following rules:
-
If the data type of source is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, an error is returned.
-
If source is NULL, NULL is returned.
-
-
Examples
-
Example 1: Convert uppercase letters to lowercase. Sample statements:
-- Returns abcd. select tolower('aBcd'); -- Returns 中国fighting. select tolower('中国Fighting'); -
Example 2: The input parameter is NULL. Sample statement:
-- Returns NULL. select tolower(null);
-
TOUPPER
-
Syntax
string toupper(string <source>) -
Description
Converts the lowercase characters in the source string to their corresponding uppercase characters.
-
Parameters
source: required. STRING type. If the input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it is implicitly converted to STRING. Supports only English characters.
-
Return value
Returns a value of the STRING type. The returned value is based on the following rules:
-
If the data type of source is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, an error is returned.
-
If source is NULL, NULL is returned.
-
-
Examples
-
Example 1: Convert lowercase letters to uppercase. Sample statements:
-- Returns ABCD. select toupper('aBcd'); -- Returns ChinaFIGHTING. select toupper('中国Fighting'); -
Example 2: The input parameter is NULL. Sample statement:
-- Returns NULL. select toupper(null);
-
TRANSLATE
-
Syntax
string translate(string|varchar <str1>, string|varchar <str2>, string|varchar <str3>) -
Description
Replaces each character in str1 that appears in str2 with the corresponding character in str3. If no match is found, no replacement occurs. This function is an extension in MaxCompute V2.0.
-
Return value
Returns a STRING. If any input parameter is NULL, the function returns NULL.
-
Examples
-
Example 1: Replace each character in
abababthat appears inababwith the corresponding character incd. Sample statement:-- Returns cdcdcd. select translate('ababab','abab','cd'); -
Example 2: Replace each character in
abababthat appears inababwith the corresponding character incdefg. Sample statement:-- Returns cdcdcd. select translate('ababab','abab','cdefg'); -
Example 3: One input parameter is NULL. Sample statement:
-- Returns NULL. select translate('ababab','cd',null);
-
TRIM
-
Syntax
string trim(string <str>[,<trimChars>]) string trim([BOTH] [<trimChars>] from <str>) -
Description
Removes characters from both ends of str:
-
If you do not specify trimChars, the function removes spaces by default.
-
If you specify trimChars, the function treats the characters in trimChars as a set and removes the longest possible substrings consisting only of characters from this set from both the left and right ends of str.
-
-
Parameters
-
str: required. STRING type. The string to trim. If the input is BIGINT, DECIMAL, DOUBLE, or DATETIME, it is implicitly converted to STRING.
-
trimChars: optional. STRING type. The characters to remove.
-
-
Return value
Returns STRING type. The return value follows these rules:
-
If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.
-
If str or trimChars is NULL, returns NULL.
-
-
Examples
-
Example 1: Remove spaces from both ends of the string
yxTxyomxx. Sample statements:-- Returns yxTxyomxx. select trim(' yxTxyomxx '); -- Equivalent to: select trim(both from ' yxTxyomxx '); select trim(from ' yxTxyomxx '); -
Example 2: Remove all characters from both ends of the string
yxTxyomxxthat belong to the setxy.-- Returns Txyom. Any x or y on either end is removed. select trim('yxTxyomxx', 'xy'); -- Equivalent to: select trim(both 'xy' from 'yxTxyomxx'); select trim('xy' from 'yxTxyomxx'); -
Example 3: Input parameter is NULL. Sample statements:
-- Returns NULL. select trim(null); select trim('yxTxyomxx', null);
-
URL_DECODE
-
Syntax
string url_decode(string <input>[, string <encoding>]) -
Description
Converts an input string from
application/x-www-form-urlencoded MIMEformat to standard characters. This function is the inverse ofurl_encode. The encoding rules are as follows:-
a–z and A–Z remain unchanged.
-
Period (.), hyphen (-), asterisk (*), and underscore (_) remain unchanged.
-
Plus sign (+) converts to space.
-
Sequences in
%xyformat are converted to byte values. Consecutive bytes are decoded into strings using the specified encoding name. -
All other characters remain unchanged.
-
-
Parameters
-
input: This parameter is required and is of STRING type. It represents the input string.
-
encoding: This parameter is optional and specifies the encoding format. It supports formats such as GBK and UTF-8. If unspecified, the default encoding is UTF-8.
-
-
Return value
This function returns a UTF-8 encoded STRING type. If the input or encoding parameter is NULL, the function returns NULL.
-
Example
-- Returns examplefor url_decode:// (fdsf). select url_decode('%E7%A4%BA%E4%BE%8Bfor+url_decode%3A%2F%2F+%28fdsf%29'); -- Returns Example for URL_DECODE:// dsf(fasfs). select url_decode('Example+for+url_decode+%3A%2F%2F+dsf%28fasfs%29', 'GBK');
URL_ENCODE
-
Syntax
string url_encode(string <input>[, string <encoding>]) -
Description
Encodes an input string in
application/x-www-form-urlencoded MIMEformat. Encoding rules:-
Letters a–z and A–Z remain unchanged.
-
The period (.), hyphen (-), asterisk (*), and underscore (_) remain unchanged.
-
Spaces convert to plus signs (+).
-
All other characters are converted to byte values using the specified encoding. Each byte appears as
%xy, wherexyis the character’s hexadecimal representation.
-
-
Parameters
-
input: Required. STRING type. The string to input.
-
encoding: optional. The encoding format. Supported formats include GBK and UTF-8. The default is UTF-8 if unspecified.
-
-
Return value
Returns a STRING. If input or encoding is NULL, returns NULL.
-
Example
-- Returns %E7%A4%BA%E4%BE%8Bfor+url_encode%3A%2F%2F+%28fdsf%29. select url_encode('examplefor url_encode:// (fdsf)'); -- Returns Example+for+url_encode+%3A%2F%2F+dsf%28fasfs%29. select url_encode('Example for url_encode:// dsf(fasfs)', 'GBK');
References
To extend beyond built-in functions, create user-defined functions (UDFs). For more information, see Overview of MaxCompute UDFs.