String functions

更新时间:
复制 MD 格式

MaxCompute provides string functions for truncating, concatenating, converting, comparing, and searching string data in tables.

MaxCompute SQL supports the following string functions.

Function

Description

ASCII

Returns the ASCII code of the first character in a string.

CHAR_MATCHCOUNT

Calculates the number of characters from string A that appear in string B.

CHR

Converts an ASCII code to a character.

CONCAT

Concatenates strings.

CONCAT_WS

Concatenates all input strings using a specified delimiter.

DECODE

Decodes a string using a specified encoding format.

ENCODE

Encodes a string using a specified encoding format.

FIND_IN_SET

Finds the position of a specified string in a comma-separated string.

FORMAT_NUMBER

Converts a number to a string in a specified format.

FROM_CHARSET

Converts binary data in a specified encoding format to a UTF-8 string.

FROM_JSON

Returns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and output format.

GET_JSON_OBJECT

Extracts a specified string from a standard JSON string using a specified method.

INITCAP

Converts a string to title case where each word starts with an uppercase letter followed by lowercase letters, separated by spaces.

INSTR

Finds the position of string A in string B.

IS_ENCODING

Determines whether a string can be converted from character set A to character set B.

JSON_TUPLE

Extracts strings corresponding to a set of input keys from a standard JSON string.

KEYVALUE

Splits a string into key-value pairs, separates the pairs, and returns the value corresponding to a specified key.

KEYVALUE_TUPLE

Splits a string into multiple key-value pairs, separates the pairs, and returns values corresponding to multiple specified keys.

LENGTH

Calculates the length of a string.

LENGTHB

Calculates the byte length of a string.

LOCATE

Finds the position of a specified substring within a string.

LPAD

Left-pads a string to a specified length.

LTRIM

Removes characters from the left end of a string.

MASK_HASH

Returns a hash value calculated from a string expression.

MD5

Calculates the MD5 hash of a string.

PARSE_URL

Parses a URL and returns information about a specified part.

PARSE_URL_TUPLE

Parses a URL and returns information about multiple parts.

REGEXP_COUNT

Counts substrings that match a specified pattern starting from a specified position.

REGEXP_EXTRACT

Splits a string into groups based on a specified pattern and returns the string from a specified group.

REGEXP_EXTRACT_ALL

Finds all substrings matching a regular expression pattern in a string and returns them as an array.

REGEXP_INSTR

Returns the start or end position of a substring that matches a specified pattern for a specified occurrence starting from a specified position.

REGEXP_REPLACE

Replaces substrings matching a specified pattern for a specified occurrence with another string.

REGEXP_SUBSTR

Returns a substring that matches a specified pattern for a specified occurrence starting from a specified position.

REPEAT

Returns a string repeated a specified number of times.

REPLACE

Replaces substrings matching a specified string with another string.

REVERSE

Returns the reverse of a string.

RPAD

Right-pads a string to a specified length.

RTRIM

Removes characters from the right end of a string.

SOUNDEX

Converts a standard string to a SOUNDEX string.

SPACE

Generates a string of spaces.

SPLIT

Splits a string using a delimiter and returns an array.

SPLIT_PART

Splits a string using a delimiter and returns a specified part.

SUBSTR

Returns a substring of a STRING-type string starting from a specified position with a specified length.

SUBSTRING

Returns a substring of a STRING or BINARY-type string starting from a specified position with a specified length.

SUBSTRING_INDEX

Extracts a substring before or after a specified occurrence of a delimiter.

TO_CHAR

Converts a BOOLEAN, BIGINT, DECIMAL, or DOUBLE value to its STRING representation.

TO_JSON

Converts a specified complex type to a JSON string.

TOLOWER

Converts English uppercase letters in a string to lowercase.

TOUPPER

Converts English lowercase letters in a string to uppercase.

TRANSLATE

Replaces all occurrences of string A in string B with string C.

TRIM

Removes characters from both ends of a string.

URL_DECODE

Converts a string from application/x-www-form-urlencoded MIME format to standard characters.

URL_ENCODE

Encodes a string in application/x-www-form-urlencoded MIME format.

Usage notes

MaxCompute V2.0 provides additional functions. If the functions that you use involve new data types that are supported in the MaxCompute V2.0 data type edition, you must execute the SET statement to enable the MaxCompute V2.0 data type edition. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.

  • Session level: To use the MaxCompute V2.0 data type edition, you must add set odps.sql.type.system.odps2=true; before the SQL statement that you want to execute, and commit and execute them together.

  • Project level: The project owner can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. To enable the MaxCompute V2.0 data type edition at the project level, run the following command:

    setproject odps.sql.type.system.odps2=true; 

    For more information about setproject, see Project operations. For more information about the precautions that you must take when you enable the MaxCompute V2.0 data type edition at the project level, see Data type version guide.

Limitations

The following functions support only English characters:

  • TRIM/RTRIM/LTRIM: The trimChars parameter supports only English characters.

  • REVERSE: Supports only English characters in Hive mode.

  • SOUNDEX: Converts only English characters.

  • TOLOWER: Converts English uppercase letters in a string to lowercase.

  • TOUPPER: Converts English lowercase letters in a string to uppercase.

  • INITCAP: Converts the first English letter of each word to uppercase and the rest to lowercase.

ASCII

  • Syntax

    bigint ascii(string <str>)
  • Description

    Returns the ASCII code of the first character in the string str.

  • Parameters

    str: required. STRING type. If the input is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted to the STRING type before calculation.

  • Return value

    Returns a value of the BIGINT type. The return value follows these rules:

    • If str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.

    • If str is NULL, NULL is returned.

  • Examples

    • Example 1: Return the ASCII code of the first character in the string abcde. Sample statement:

      -- Returns 97.
      select ascii('abcde'); 
    • Example 2: The input parameter is NULL. Sample statement:

      -- Returns NULL.
      select ascii(null);

CHAR_MATCHCOUNT

  • Syntax

    bigint char_matchcount(string <str1>, string <str2>)
  • Description

    Calculates how many characters in str1 appear in str2.

  • Parameters

    str1 and str2: required. STRING type. Both strings must be valid UTF-8 strings. If invalid characters (non-Unicode encoded) are found during comparison, a negative value is returned.

  • Return value

    Returns a value of the BIGINT type. If str1 or str2 is NULL, NULL is returned.

  • Examples

    • Example 1: Calculate how many characters from the string aabc appear in abcde. Sample statement:

      -- Returns 4.
      select char_matchcount('aabc','abcde');
    • Example 2: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select char_matchcount(null,'abcde');

CHR

  • Syntax

    string chr(bigint <ascii>)
  • Description

    Converts a specified ASCII code to a character.

  • Parameters

    ascii: required. BIGINT type ASCII value. Valid values: 0 to 127. If the input is of the STRING, DOUBLE, or DECIMAL type, it is implicitly converted to the BIGINT type before calculation.

  • Return value

    Returns a value of the STRING type. The return value follows these rules:

    • If the ascii value is outside the valid range, an error is returned.

    • If ascii is not of the BIGINT, STRING, DOUBLE, or DECIMAL type, an error is returned.

    • If ascii is NULL, NULL is returned.

  • Examples

    • Example 1: Convert the ASCII code 100 to a character. Sample statement:

      -- Returns d.
      select chr(100);
    • Example 2: The input parameter is NULL. Sample statement:

      -- Returns NULL.
      select chr(null);
    • Example 3: The input is a STRING-type character. Sample statement:

      --The input is implicitly converted to the BIGINT type for the operation, which returns 'd'.
      select chr('100');

CONCAT

  • Syntax

    array<T> concat(array<T> <a>, array<T> <b>[,...])
    string concat(string <str1>, string <str2>[,...])
  • Description

    • Arrays as inputs: Concatenates all elements of multiple arrays and returns a new array.

    • Strings as inputs: Concatenates multiple strings and returns a new string.

  • Parameters

    • a and b: required. These parameters specify arrays. T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type. The null elements are also involved in the operation.

    • str1 and str2: required. Values of the STRING type. If the input values are of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, they are implicitly converted into values of the STRING type before calculation. If the input values are of other data types, an error is returned.

  • Return value

    • A value of the ARRAY type is returned. If one of the input arrays is null, null is returned.

    • A value of the STRING type is returned. If no input parameters are configured or an input parameter is set to null, null is returned.

  • Examples

    • Example 1: Concatenate all elements of array(10, 20) and array(20, -20). Sample statement:

      -- The return value is [10, 20, 20, -20]. 
      select concat(array(10, 20), array(20, -20));
    • Example 2: One of the input arrays contains a null element. Sample statement:

      -- The return value is [10, null, 20, -20]. 
      select concat(array(10, null), array(20, -20));
    • Example 3: One of the input arrays is null. Sample statement:

      -- The return value is null. 
      select concat(array(10, 20), null);
    • Example 4: Concatenate strings aabc and abcde. Sample statement:

      -- The return value is aabcabcde. 
      select concat('aabc','abcde');
    • Example 5: The input is empty. Sample statement:

      -- The return value is null. 
      select concat();
    • Example 6: One of the input strings is null. Sample statement:

      -- The return value is null. 
      select concat('aabc', 'abcde', null);

CONCAT_WS

  • Syntax

    string concat_ws(string <separator>, string <str1>, string <str2>[,...])
    string concat_ws(string <separator>, array<string> <a>)
  • Description

    Returns a string formed by concatenating all input strings or array elements using the specified delimiter. This function is an extension in MaxCompute V2.0.

  • Parameters

    • separator: required. STRING type delimiter.

    • str1 and str2: At least two strings must be specified. STRING type. If the input is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, it is implicitly converted to the STRING type before calculation.

    • a: required. ARRAY type. Array elements must be of the STRING type.

  • Return value

    Returns a value of the STRING or STRUCT type. The return value follows these rules:

    • If str1 or str2 is not of the STRING, BIGINT, DECIMAL, DOUBLE, or DATETIME type, an error is returned.

    • If no input parameters are provided or any input parameter is NULL, NULL is returned.

  • Examples

    • Example 1: Concatenate strings name and hanmeimei using :. Sample statement:

      -- Returns name:hanmeimei.
      select concat_ws(':','name','hanmeimei');

      -

    • Example 2: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select concat_ws(':','avg',null,'34');
    • Example 3: Concatenate elements in array array('name', 'hanmeimei') using :. Sample statement:

      -- Returns name:hanmeimei.
      select concat_ws(':',array('name', 'hanmeimei'));

DECODE

  • Syntax

    STRING DECODE(BINARY <str>, STRING <charset>)
  • Description

    Decodes str using the format specified by charset.

  • Parameters

    Parameter

    Required

    Description

    str

    Yes

    BINARY type. The string to decode.

    charset

    Yes

    STRING type. Encoding format. Valid values: UTF-8, UTF-16, UTF-16LE, UTF-16BE, ISO-8859-1, US-ASCII.

    Note

    ISO-8859-1 and US-ASCII encoding formats currently support only English characters, not Chinese.

  • Return value

    Returns STRING. If str or charset is NULL, NULL is returned.

  • Examples

    • Example 1: Encode and decode the string English Sample based on the UTF-8 format. Sample statement:

      -- Encode and decode the string.SELECT DECODE(ENCODE("English Sample","UTF-8"), "UTF-8");

      Returns:

      +-----+| _c0 |+-----+| English Sample |+-----+
    • Example 2: Set either of the input parameters to NULL. Sample statement:

      SELECT DECODE(ENCODE("English Sample","UTF-8"), NULL);

      Returns:

      +-----+| _c0 |+-----+| NULL |+-----+

ENCODE

  • Syntax

    binary encode(string <str>, string <charset>)
  • Description

    Encodes str using the format specified by charset.

  • Parameters

    • str: required. STRING type. The string to re-encode.

    • charset: required. STRING type. Encoding format. Valid values: UTF-8, UTF-16, UTF-16LE, UTF-16BE, ISO-8859-1, US-ASCII.

  • Return value

    Returns a BINARY type. If str or charset is NULL, NULL is returned.

  • Examples

    • Example 1: Encode the string abc in UTF-8 format. Sample statement:

      -- Returns abc.
      select encode("abc", "UTF-8");
    • Example 2: Encode the string abc in UTF-16BE format. Sample statement:

      -- Returns =00a=00b=00c.
      select encode("abc", "UTF-16BE");
    • Example 3: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select encode("abc", null);

FIND_IN_SET

  • Syntax

    BIGINT FIND_IN_SET(STRING <str1>, STRING <str2>[, STRING <delimiter>])
  • Description

    Finds the position of string str1 in string str2 separated by delimiter, counting from 1.

  • Parameters

    Parameter

    Required

    Description

    str1

    Yes

    STRING type. The string to search for.

    str2

    Yes

    STRING type. The string separated by delimiter.

    delimiter

    No

    STRING constant. Default is comma (,). The delimiter can be a single character or a string.

  • Return value

    Returns a BIGINT type. The return value follows these rules:

    • If str2 does not contain str1 or str1 contains delimiter, 0 is returned.

    • If str1 or str2 is NULL, NULL is returned.

  • Examples

    • Example 1: Find the position of the string ab in the string abc,hello,ab,c, which is separated by commas (,). Sample statement:

      SELECT FIND_IN_SET('ab', 'abc,hello,ab,c') AS pos;

      Returns:

      +------------+
      | pos        |
      +------------+
      | 3          |
      +------------+
    • Example 2: Find the position of string hi in string abc,hello,ab,c separated by commas (,). Sample statement:

      SELECT FIND_IN_SET('hi', 'abc,hello,ab,c') AS pos;

      Returns:

      +------------+
      | pos        |
      +------------+
      | 0          |
      +------------+
    • Example 3: Find the position of the string ab in the underscore (_)-delimited string abc_hello_ab_c. Sample statement:

      SELECT FIND_IN_SET('ab', 'abc_hello_ab_c', '_') AS pos;

      Returns:

      +------------+
      | pos        |
      +------------+
      | 3          |
      +------------+
    • Example 4: Input parameter str1 or str2 is NULL. Sample statement:

      SELECT FIND_IN_SET(null, 'abc,hello,ab,c') AS pos;

      Returns:

      +------------+
      | pos        |
      +------------+
      | NULL       |
      +------------+

FORMAT_NUMBER

  • Syntax

    STRING FORMAT_NUMBER(DOUBLE|BIGINT|INT|SMALLINT|TINYINT|FLOAT|DECIMAL|STRING <expr1>, INT <expr2>)
  • Description

    Converts expr1 to a string formatted according to expr2.

  • Parameters

  • expr1: required. A value of the FLOAT, DOUBLE, or DECIMAL type. This parameter specifies the expression that you want to format.

  • expr2: required. A value of the INT type. Valid values: 0 to 340. This parameter specifies the number of decimal places that you want to retain. This parameter can also be expressed in a format similar to #,###,###.##. The number of decimal places returned varies based on the value of this parameter.

  • Return value

    A value of the STRING type is returned. The return value varies based on the following rules:

    • If the value of expr2 is greater than 0 and is less than or equal to 340, a value that is rounded to the specified number of decimal places is returned.

    • If the value of expr2 is 0, the return value contains only the integer part and does not contain the decimal point or decimal part.

    • If the value of expr2 is less than 0 or greater than 340, an error is returned.

    • If expr1 or expr2 is empty or is set to null, null is returned.

  • Examples

    • Example 1: Format a number as specified. Sample statements:

      -- Returns 5.230.
      SELECT FORMAT_NUMBER(5.230134523424545456,3);
      -- Returns 12,332.123.
      SELECT FORMAT_NUMBER(12332.123456, '#,###,###,###.###');
    • Example 2: One input parameter is empty or NULL. Sample statements:

      -- Returns NULL.
      SELECT FORMAT_NUMBER('',3);
      -- Returns NULL.
      SELECT FORMAT_NUMBER(null,3);

FROM_CHARSET

  • Syntax

    STRING FROM_CHARSET(binary <source>, string <source_charset>, [string <mode>])
  • Description

    FROM_CHARSET converts non-UTF-8 encoded binary data to a UTF-8 string for subsequent calculations.

  • Parameters

    • source: required. A value of the BINARY type. This parameter specifies the binary data that you want to convert.

    • source_charset: required. A value of the STRING type. This parameter specifies the original encoding format of the binary data that is specified by source. Valid values: UTF-8, UTF-16, UTF-16LE, UTF-16BE, ISO-8859-1, US-ASCII, GB2312, GBK, and GB18030.

    • mode: optional. A value of the STRING type. This parameter specifies the processing mode if a character cannot be converted when the FROM_CHARSET function converts the binary data specified by source into a string in the specified encoding format. Valid values:

      • NONE: reports an error. No processing is performed. This is the default value.

      • TRANSLIT: replaces the character with a similar character in the specified encoding format.

      • IGNORE: ignores the error and continues to run the command.

  • Return value

    Returns a UTF-8 encoded STRING type. When parameters are NULL or empty strings, the return value follows these rules:

    • If any input parameter is NULL, returns NULL.

    • If any input parameter is an empty string, returns an error.

  • Examples

    • Example 1: Convert UTF-8 encoded binary data into a UTF-8 encoded string.

      SELECT FROM_CHARSET(unhex('e58aa0e6b2b9e9949fe696a4e68bb70a'),'UTF-8', 'TRANSLIT');

      Returns:

      +------------+| _c0        |+------------+| 加油锟斤拷
           |+------------+
    • Example 2: Convert GBK-encoded binary data into a UTF-8 encoded string.

      SELECT FROM_CHARSET(unhex('b9feb9febac3a4ce'), 'GBK');

      Returns:

      +------------+| _c0        |+------------+| 哈哈好の       |+------------+
    • Example 3: If an input parameter is null, null is returned.

      SELECT FROM_CHARSET(unhex('b9feb9febac3a4ce'), null);

      Returns:

      +------------+| _c0        |+------------+| NULL       |+------------+

FROM_JSON

  • Syntax

    from_json(<jsonStr>, <schema>)
  • Description

    Returns data of the ARRAY, MAP, or STRUCT type based on JSON string jsonStr and output format schema.

  • Parameters

    • jsonStr: required. The JSON string that you entered.

    • schema: required. The schema of the JSON string. The value of this parameter must be in the same format as that in the statement for creating a table, such as array<bigint>, map<string, array<string>>, or struct<a:int, b:double, `C`:map<string,string>>.

      Note

      Keys in a struct are case-sensitive. You can also specify a struct in the format of a BIGINT, b DOUBLE, which is equivalent to STRUCT<a:BIGINT, b:DOUBLE>.

      The following table describes the mappings between JSON data types and MaxCompute data types.

      JSON data type

      MaxCompute data type

      OBJECT

      STRUCT, MAP, and STRING

      ARRAY

      ARRAY and STRING

      NUMBER

      TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, and STRING

      BOOLEAN

      BOOLEAN and STRING

      STRING

      STRING, CHAR, VARCHAR, BINARY, DATE, and DATETIME

      NULL

      All types

      Note

      The JSON string of the OBJECT and ARRAY types are parsed as much as possible. If the data type of the JSON string is not mapped to any MaxCompute data type, the JSON string is omitted. For ease of use, all JSON data types can be converted into the STRING data type supported by MaxCompute. When you convert a JSON string of the NUMBER type to a value of the FLOAT, DOUBLE, or DECIMAL type, the precision of the value cannot be ensured. We recommend you convert the JSON string to a value of the STRING type and then convert the obtained value to a value of the FLOAT, DOUBLE, or DECIMAL type.

  • Return value

    A value of the ARRAY, MAP, or STRUCT type is returned.

  • Examples

    • Example 1: Convert a specific JSON string into a value of a specific data type. Sample statement:

      -- The return value is {"a":1,"b":0.8}. 
      select from_json('{"a":1, "b":0.8}', 'a int, b double');
      -- The return value is {"time":"26/08/2015"}. 
      select from_json('{"time":"26/08/2015"}', 'time string');
      -- The return value is {"a":1,"b":0.8}. 
      select from_json('{"a":1, "b":0.8}', 'a int, b double, c string');
      -- The return value is [1,2,3]. 
      select from_json('[1, 2, 3, "a"]', 'array<bigint>');
      -- The return value is {"d":"v","a":"1","b":"[1,2,3]","c":"{}"}. 
      select from_json('{"a":1,"b":[1,2,3],"c":{},"d":"v"}', 'map<string, string>');
    • Example 2: Use the map_keys and from_json functions to obtain all keys in a JSON string. You can also use JSON_KEYS for the same purpose. Sample statement:

      -- The return value is ["a","b"]. 
      select map_keys(from_json('{"a":1,"b":2}','map<string,string>'));

GET_JSON_OBJECT

Usage notes

This function is used to extract a specific string from a standard JSON string by using JSONPath. This function supports input parameters of the following data types:

  • JSON types: If input parameters are of JSON types, standard JSONPath expressions are used.

  • STRING type: If input parameters are of the STRING type, the original JSONPath expressions are used.

Note

  • The JSONPath expressions that are used when input parameters are of JSON types and those that are used when input parameters are of the STRING type comply with different rules. This may cause incompatibility issues.

  • The GET_JSON_OBJECT function does not support the syntax of JSONPath expressions.

Input parameters of JSON data types

  • Syntax

    string get_json_object(json <json>, string <json_path>)
  • Description

    Extracts a single string from a standard JSON string based on JSON PATH.

  • Parameters

    • json: required. A JSON string from which you want to extract a single string.

    • json_path: required. A JSONPath expression based on which you want to extract a single string.

  • Return value

    A value of a STRING type is returned.

  • Examples

    • Example 1: Extract the value that corresponds to the key a from a JSON string.

      SELECT get_json_object(json '{"a":1, "b":2}', '$.a');

      Returns:

      +-----+| _c0 |+-----+| 1   |+-----+
    • Example 2: Extract the value that corresponds to the key c from a JSON string.

      SELECT get_json_object(json '{"a":1, "b":2}', '$.c');

      Returns:

      +-----+| _c0 |+-----+| NULL |+-----+
    • Example 3: If an invalid JSON path is specified, the return value is NULL.

      SELECT get_json_object(json '{"a":1, "b":2}', '$invalid_json_path');

      Returns:

      +-----+| _c0 |+-----+| NULL |+-----+

Input parameter of the STRING data type

  • Syntax

    string get_json_object(string <json>, string <path>)
  • Description

    Extracts a single string from a standard JSON string based on path. The original data is read each time this function is called. Therefore, repeated calls may affect system performance and increase costs. To prevent repeated calls, you can use the GET_JSON_OBJECT function with UDTFs. For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs.

  • Parameters

    • json: required. A value of the STRING type. This parameter specifies a standard JSON object in the format of {Key:Value, Key:Value,...}. If the string contains a double quotation mark ("), use two backslashes (\\) to escape the double quotation mark (") before extraction. If the string contains a single quotation mark ('), use a single backslash (\) to escape the single quotation mark (') before extraction.

    • path: required. A value of the STRING type. This parameter specifies the path in the value of the json parameter and starts with $. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON data from OSS to MaxCompute. Meanings of different characters:

      • $: indicates the root node.

      • . or ['']: indicates a child node. MaxCompute parses JSON objects by using . or ['']. If a key in a JSON object contains a period (.), [''] can be used.

      • [] ([number]): indicates an array subscript, which starts from 0.

      • *: indicates the wildcard for []. If this character is used in the path parameter, an entire array is returned. An asterisk (*) cannot be escaped.

  • Limits

    Only MaxCompute V2.0 allows you to extract data by using [''] in the path parameter. To use [''], you must add the set odps.sql.udf.getjsonobj.new=true; statement before the statement that you want to execute.

  • Return value

    • If the json parameter is empty or invalid, null is returned.

    • If the format of json is valid and path exists, the related string is returned.

    • You can specify the flagodps.sql.udf.getjsonobj.new parameter for a session to determine how this function returns a value.

      • If you execute the set odps.sql.udf.getjsonobj.new=true; statement, this function retains the original strings when it returns a value.

        We recommend that you use this configuration because it results in more standard function return behavior. This facilitates data processing and improves data processing performance. If a job in which this function escapes JSON reserved characters exists in a MaxCompute project, we recommend that you retain the original escape operation to prevent errors caused by lack of verification. The function complies with the following rules when it returns a value:

        • In this configuration, the return value is still a JSON string, which can be parsed as JSON data, without the need to use the REPLACE or REGEXP_REPLACE function to replace backslashes (\).

        • Duplicate keys are allowed in a JSON object. If duplicate keys exist, the data can be parsed.

          -- The return value is 1. 
          select get_json_object('{"a":"1","a":"2"}', '$.a');
        • The encoded strings that correspond to emojis are supported. However, DataWorks does not allow you to enter emojis. DataWorks allows you to enter only the encoded strings that correspond to emojis to MaxCompute by using a tool, such as Data Integration. DataWorks uses the GET_JSON_OBJECT function to process the data.

          -- The return value is an emoji. 
          select get_json_object('{"a":"<Emoji>"}', '$.a');
        • The output results are displayed in alphabetical order.

          -- The return value is {"b":"1","a":"2"}. 
          select get_json_object('{"b":"1","a":"2"}', '$');
      • If you execute the set odps.sql.udf.getjsonobj.new=false; statement, this function escapes JSON reserved characters when it returns a value. The function complies with the following rules when it returns a value:

        • JSON reserved characters such as line feeds (\n) and quotation marks (") are displayed as '\n' and '\"'.

        • Each key in a JSON object must be unique. If duplicate keys exist, the data may fail to be parsed. Sample statement:

          -- The return value is null. 
          select get_json_object('{"a":"1","a":"2"}', '$.a');
        • The encoded strings that correspond to emojis cannot be parsed. Sample statement:

          -- The return value is null. 
          select get_json_object('{"a":"<Emoji>"}', '$.a');
        • The output results are displayed in alphabetical order. Sample statement:

          -- The return value is {"a":"2","b":"1"}. 
          select get_json_object('{"b":"1","a":"2"}', '$');

      Note

      For MaxCompute projects that were created on or after January 21, 2021, the GET_JSON_OBJECT function retains the original strings when it returns a value. For MaxCompute projects that are created before January 21, 2021, the GET_JSON_OBJECT function escapes JSON reserved characters when it returns a value. The following example helps you determine how the GET_JSON_OBJECT function returns a value in a MaxCompute project.

      select get_json_object('{"a":"[\\"1\\"]"}', '$.a');
      -- Return JSON reserved characters by using escape characters.
      [\"1\"]
      
      -- Return the original strings.
      ["1"]

      You can submit an applicationor search for the DingTalk group ID 11782920 to join the MaxCompute developer community DingTalk group and request MaxCompute technical support engineers to configure the GET_JSON_OBJECT function to retain original strings. This way, you do not need to frequently specify set odps.sql.udf.getjsonobj.new=false; for a session.

  • Examples

    • Example 1: Extract information from the JSON object src_json.json. Sample statement:

      -- The JSON string src_json.json contains the following content: 
      +----+
      json
      +----+
      {"store":
      {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
      "bicycle":{"price":19.95,"color":"red"}
      },
      "email":"amy@only_for_json_udf_test.net",
      "owner":"amy"
      }
      -- Extract the information of the owner field and return amy. 
      select get_json_object(src_json.json, '$.owner') from src_json;
      -- Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}. 
      select get_json_object(src_json.json, '$.store.fruit[0]') from src_json;
      -- Extract the information of the non-existent field and return null. 
      select get_json_object(src_json.json, '$.non_exist_key') from src_json;
    • Example 2: Extract information from a JSON object of the ARRAY type. Sample statement:

      -- The return value is 2222. 
      select get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]');
      -- The return value is ["h0","h1","h2"]. 
      set odps.sql.udf.getjsonobj.new=true;
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]');
      -- The return value is ["h0","h1","h2"]. 
      set odps.sql.udf.getjsonobj.new=false;
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]');
      -- The return value is h1. 
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');
    • Example 3: Extract information from a JSON object that includes a period (.). Sample statement:

      -- Create a table. 
      create table mf_json (id string, json string);
      -- Insert data into the table. The key in the data contains a period (.). 
      insert into table mf_json (id, json) values ("1", "{
      \"China.beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\",
      \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}");
      -- Insert data into the table. The key in the data does not contain a period (.). 
      insert into table mf_json (id, json) values ("2", "{
      \"China_beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\",
      \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}");
      -- Query the value of id in the JSON object whose key is China.beijing. 0 is returned. Only [''] can be used to specify the key because the key contains a period (.). This way, MaxCompute can parse the key. 
      select get_json_object(json, "$['China.beijing'].school['id']") from mf_json where id =1;
      -- Query the value of id in the JSON object whose key is China_beijing. 0 is returned. You can use one of the following statements: 
      select get_json_object(json, "$['China_beijing'].school['id']") from mf_json where id =2;
      select get_json_object(json, "$.China_beijing.school['id']") from mf_json where id =2;
    • Example 4: The json parameter is empty or invalid. Sample statement:

      -- The return value is null. 
      select get_json_object('','$.array[1][1]');
      -- The return value is null. 
      select get_json_object('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');
    • Example 5: Escape a JSON string. Sample statement:

      set odps.sql.udf.getjsonobj.new=true;
      -- The return value is "1". 
      select get_json_object('{"a":"\\"1\\"","b":"2"}', '$.a'); 
      -- The return value is '1'. 
      select get_json_object('{"a":"\'1\'","b":"2"}', '$.a'); 

INITCAP

  • Syntax

    string initcap(<str>)
  • Description

    Converts str to title case where words are separated by spaces, with the first letter of each word uppercase and the rest lowercase.

  • Parameters

    str: required. STRING type. Input string.

  • Return value

    Returns a string with the first letter of each word uppercase and the rest lowercase.

  • Example

    -- Returns Odps Sql.
    SELECT initcap("oDps sql");

INSTR

  • Syntax

    bigint instr(string <str1>, string <str2>[, bigint <start_position>[, bigint <nth_appearance>]])
  • Description

    Finds the position of substring str2 in string str1.

  • Parameters

    • str1: required. STRING type. Target string to search. If input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it's implicitly converted to STRING. Other types return an error.

    • str2: required. STRING type. Substring to match. If input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it's implicitly converted to STRING. Other types return an error.

    • start_position: optional. A value of the BIGINT type. An error is returned for other data types. This parameter specifies the position of the character in str1 from which the search starts. The default start position is the first character, which is position 1. If start_position is a negative value, the search starts counting backward from the end of the string, where the last character is at position -1, the second-to-last at -2, and so on.

    • nth_appearance: Optional. A value of the BIGINT type that must be greater than 0. This parameter specifies the position of the nth_appearance-th match of str2 in str1. If nth_appearance is of another data type or is less than or equal to 0, an error is returned.

  • Return value

    Returns a BIGINT type. The return value follows these rules:

    • If str1 does not contain str2, returns 0.

    • If str2 is empty, always matches successfully. Example: select instr('abc',''); returns 1.

    • If str1, str2, start_position, or nth_appearance is NULL, returns NULL.

  • Examples

    • Example 1: Find position of character e in string Tech on the net. Sample statement:

      -- Returns 2.
      select instr('Tech on the net', 'e');
    • Example 2: Find position of substring on in string Tech on the net. Sample statement:

      -- Returns 6.
      select instr('Tech on the net', 'on');
    • Example 3: Find position of second occurrence of character e in string Tech on the net starting from third character. Sample statement:

      -- Returns 14.
      select instr('Tech on the net', 'e', 3, 2);
    • Example 4: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select instr('Tech on the net', null);

IS_ENCODING

  • Syntax

    boolean is_encoding(string <str>, string <from_encoding>, string <to_encoding>)
  • Description

    Determines if input string str can be converted from character set from_encoding to to_encoding. Also used to detect garbled text—typically set from_encoding to UTF-8 and to_encoding to GBK.

  • Parameters

    • str: required. STRING type. Empty strings are considered valid for any character set.

    • from_encoding, to_encoding: required. STRING type. Source and target character sets.

  • Return value

    Returns BOOLEAN type. The return value follows these rules:

    • If str converts successfully, returns True; otherwise False.

    • If str, from_encoding, or to_encoding is NULL, returns NULL.

  • Examples

    • Example 1: Determine whether the character 测试 or 測試 can be converted from the UTF-8 character set to the GBK character set. The following command provides an example.

      -- Returns true.
      select is_encoding('测试', 'utf-8', 'gbk');
      -- Returns true.
      select is_encoding('測試', 'utf-8', 'gbk');
    • Example 2: Check if character Test can convert from utf-8 to gb2312. Sample statement:

      -- The GB2312 character set does not include these two characters. Returns false.
      select is_encoding('測試', 'utf-8', 'gb2312');
    • Example 3: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select is_encoding('Test', null, 'gbk');

JSON_TUPLE

  • Syntax

    string json_tuple(string <json>, string <key1>, string <key2>,...)
  • Description

    Extracts strings from a standard JSON string based on a set of input keys, such as (key1,key2,...).

  • Parameters

    • json: required. A value of the STRING type. This parameter specifies a standard JSON string.

    • key: required. A value of the STRING type. This parameter is used to describe the path of a JSON object in the JSON string. The value cannot start with a dollar sign ($). You can enter multiple keys at a time. MaxCompute parses JSON objects by using . or ['']. If a key in a JSON object includes a period (.), [''] can be used.

  • Return value

    A value of the STRING type is returned.

    Note
    • If json is empty or invalid, null is returned.

    • If key is empty, invalid, or does not exist in the JSON string, null is returned.

    • If json is valid and key exists, the related string is returned.

    • This function can parse JSON data that contains Chinese characters.

    • This function can parse nested JSON data.

    • This function can parse JSON data that contains nested arrays.

    • The parsing action is equivalent to the execution of GET_JSON_OBJECT along with set odps.sql.udf.getjsonobj.new=true;. To obtain multiple objects from a JSON string, you must call the GET_JSON_OBJECT function multiple times. As a result, the JSON string is parsed multiple times. The JSON_TUPLE function allows you to enter multiple keys at a time and the JSON string is parsed only once. JSON_TUPLE is more efficient than GET_JSON_OBJECT.

    • JSON_TUPLE is a user-defined table-valued function (UDTF). If you want to select some columns from a table, use JSON_TUPLE together with the LATERAL VIEW clause.

  • Examples

    --Create a table named school.
    create table school (id string, json string);
    --Insert data into the table.
    insert into school (id, json) values ("1", "{
    \"school\": \"湖畔大学\", \"地址\":\"杭州\", \"SchoolRank\": \"00\", 
    \"Class1\":{\"Student\":[{\"studentId\":1,\"scoreRankIn3Year\":[1,2,[3,2,6]]}, 
    {\"studentId\":2,\"scoreRankIn3Year\":[2,3,[4,3,1]]}]}}");
    • Example 1: Extract information from a JSON object. Sample command:

      select json_tuple(school.json,"SchoolRank","Class1") as (item0, item1) from school;
      --The preceding statement is equivalent to the following statement.
      select get_json_object(school.json,"$.SchoolRank") item0,get_json_object(school.json,"$.Class1") item1 from school;
      
      --The following result is returned.
      +-------+-------+
      | item0 | item1 |
      +-------+-------+
      | 00    | {"Student":[{"studentId":1,"scoreRankIn3Year":[1,2,[3,2,6]]},{"studentId":2,"scoreRankIn3Year":[2,3,[4,3,1]]}]} |
      +-------+-------+
    • Example 2: Use [''] to extract information from a JSON object. Sample command:

      select json_tuple(school.json,"school","['Class1'].Student") as (item0, item2) from school where id=1;
      --The following result is returned.
      +-------+-------+
      | item0 | item2 |
      +-------+-------+
      | 湖畔大学 | [{"studentId":1,"scoreRankIn3Year":[1,2,[3,2,6]]},{"studentId":2,"scoreRankIn3Year":[2,3,[4,3,1]]}] |
    • Example 3: Parse JSON data that contains Chinese characters. Sample command:

      select json_tuple(school.json,"校名","地址") as (item0,item1) from school;
      --The following result is returned.
      +-------+-------+
      | item0 | item1 |
      +-------+-------+
      | 湖畔大学 | 杭州 |
      +-------+-------+
    • Example 4: Parse multi-level nested JSON data. Sample command:

      select sc.Id, q.item0, q.item1 
      from school sc lateral view json_tuple(sc.json,"Class1.Student[*].studentId","Class1.Student[0].scoreRankIn3Year") q as item0,item1;
      
      --The following result is returned.
      +------------+-------+-------+
      | id         | item0 | item1 |
      +------------+-------+-------+
      | 1          | [1,2] | [1,2,[3,2,6]] |
      +------------+-------+-------+
    • Example 5: Parse JSON data that contains multi-level nested arrays. Sample command:

      select sc.Id, q.item0, q.item1
      from school sc lateral view json_tuple(sc.json,"Class1.Student[0].scoreRankIn3Year[2]","Class1.Student[0].scoreRankIn3Year[2][1]") q as item0,item1;
      --The following result is returned.
      +------------+-------+-------+
      | id         | item0 | item1 |
      +------------+-------+-------+
      | 1          | [3,2,6] | 2     |
      +------------+-------+-------+

KEYVALUE

  • Syntax

    keyvalue(string <str>,[string <split1>,string <split2>,] string <key>)
    keyvalue(string <str>,string <key>) 
  • Description

    Splits string str into key-value pairs using split1, separates pairs using split2, and returns the value corresponding to key.

  • Parameters

    • str: required. STRING type. String to split.

    • split1, split2: optional. STRING type. Delimiter strings for splitting. If unspecified, default split1 is ";" and split2 is ":". If a substring split by split1 contains multiple split2, the result is undefined.

    • key: required. STRING type. After splitting by split1 and split2, returns the value for key.

  • Return value

    Returns STRING type. The return value follows these rules:

    • If split1 or split2 is NULL, returns NULL.

    • If str or key is NULL or no matching key exists, returns NULL.

    • If multiple key-value pairs match, returns the value for the first matching key.

  • Examples

    • Example 1: Split string 0:1/;1:2 into key-value pairs and return value for key 1. Sample statement:

      -- Returns 2.
      select keyvalue('0:1/;1:2', 1);

      The split1 and split2 parameters are not specified. The default value of split1 is a semicolon (";") and the default value of split2 is a colon (":").

      After splitting by split1, key-value pairs are 0:1/,1:2. After splitting by split2:

      0 1/  
      1 2

      Returns value 2 for key 1.

    • Example 2: Split string “\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;” using “\;” as split1 and ":" as split2, then return value for key tf. Sample statement:

      -- Returns 21910.
      select keyvalue("\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;","\;",":","tf");

      Splitting the string "\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;" using the delimiter "\;" generates the following key-value pairs.

      decreaseStore:1,xcard:1,isB2C:1,tf:21910,cart:1,shipping:2,pf:0,market:shoes,instPayAmount:0 

      The result of splitting by ":" is as follows.

      decreaseStore 1  
      xcard 1  
      isB2C 1  
      tf 21910  
      cart 1  
      shipping 2  
      pf 0  
      market shoes  
      instPayAmount 0

      Returns value 21910 for key tf.

KEYVALUE_TUPLE

  • Syntax

    KEYVALUE_TUPLE(str, split1, split2, key1, key2, ..., keyN)
  • Description

    Splits string str into key-value pairs using split1, separates pairs using split2, and returns values for multiple keys.

  • Parameters

    • str: required. STRING type. String to split.

    • split1, split2: required. STRING type. Delimiter strings for splitting. If a substring split by split1 contains multiple split2, the result is undefined.

    • key: required. STRING type. After splitting by split1 and split2, returns the value for key.

  • Return value

    Returns STRING type. The return value follows these rules:

    • If split1 or split2 is NULL, returns NULL.

    • If str or key is NULL or no matching key exists, returns NULL.

  • Example

    -- Create table
    create table mf_user (
    user_id string,
    user_info string
    );
    -- Insert data
    insert into mf_user values('1','age:18;genda:f;address:abc'),('2','age:20;genda:m;address:bcd');
    -- Query
    SELECT user_id,
    KEYVALUE(user_info,';',':','age') as age,
    KEYVALUE(user_info,';',':','genda') as genda,
    KEYVALUE(user_info,';',':','address') as address
    FROM mf_user;
    -- Equivalent to KEYVALUE query
    SELECT user_id,
    age,
    genda,
    address
    FROM mf_user LATERAL VIEW KEYVALUE_TUPLE(user_info,';', ':','age','genda','address') ui AS age,genda,address;

    The result is as follows.

    +------------+------------+------------+------------+
    | user_id    | age        | genda      | address    |
    +------------+------------+------------+------------+
    | 1          | 18         | f          | abc        |
    | 2          | 20         | m          | bcd        |
    +------------+------------+------------+------------+

LENGTH

  • Syntax

    bigint length(string <str>)
  • Description

    Calculates the length of string str.

  • Parameters

    str: required. STRING type. If input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it's implicitly converted to STRING.

  • Return value

    Returns BIGINT type. The return value follows these rules:

    • If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.

    • If str is NULL, returns NULL.

    • If str is not UTF-8 encoded, returns -1.

  • Examples

    • Example 1: Calculate length of string Tech on the net. Sample statement:

      -- Returns 15.
      select length('Tech on the net');
    • Example 2: Input parameter is NULL. Sample statement:

      -- Returns NULL.
      select length(null);

LENGTHB

  • Syntax

    bigint lengthb(string <str>)
  • Description

    Calculates the byte length of string str.

  • Parameters

    str: required. STRING type. If input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it's implicitly converted to STRING.

  • Return value

    Returns BIGINT type. The return value follows these rules:

    • If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.

    • If str is NULL, returns NULL.

  • Examples

    • Example 1: Calculate byte length of string Tech on the net. Sample statement:

      -- Returns 15.
      select lengthb('Tech on the net');
    • Example 2: Input parameter is NULL. Sample statement:

      -- Returns NULL.
      select lengthb(null);

LOCATE

  • Syntax

    bigint locate(string <substr>, string <str>[, bigint <start_pos>]) 
  • Description

    Finds the position of str substring substr. Use start_pos to specify the starting position (counting from 1).

  • Parameters

    • substr: required. STRING type. Substring to find.

    • str: required. STRING type. String to search.

    • start_pos: optional. BIGINT type. Starting position for search.

  • Return value

    Returns BIGINT type. The return value follows these rules:

    • If str doesn't contain substr, returns 0.

    • If str or substr is NULL, returns NULL.

    • If start_pos is NULL, returns 0.

  • Examples

    • Example 1: Find position of string ab in string abchelloabc. Sample statement:

      -- Returns 1.
      select locate('ab', 'abchelloabc');
    • Example 2: Find position of string hi in string abchelloabc. Sample statement:

      -- Returns 0.
      select locate('hi', 'abc,hello,ab,c');
    • Example 3: start_pos is NULL. Sample statement:

      -- Returns 0.
      select locate('ab', 'abhelloabc', null);

LPAD

  • Syntax

    string lpad(string <str1>, int <length>, string <str2>)
  • Description

    Left-pads string str1 to length characters using string str2. This function is an extension in MaxCompute V2.0.

  • Parameters

    • str1: required. STRING type. String to left-pad.

    • length: required. INT type. Number of characters for left padding.

    • str2: required. Padding string.

  • Return value

    Returns STRING type. The return value follows these rules:

    • If length is less than str1's length, returns leftmost length characters of str1.

    • If length is 0, returns empty string.

    • If no input parameters or any input parameter is NULL, returns NULL.

  • Examples

    • Example 1: Use the string 12 to left-pad the string abcdefgh to 10 characters. The sample command is as follows.

      -- Returns 12abcdefgh.
      select lpad('abcdefgh', 10, '12');
    • Example 2: Use the string 12 to left-pad the string abcdefgh to a length of 5 characters. The following is a sample command.

      -- Returns abcde.
      select lpad('abcdefgh', 5, '12');
    • Example 3: length is 0. Sample statement:

      -- Returns empty string.
      select lpad('abcdefgh' ,0, '12'); 
    • Example 4: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select lpad(null ,0, '12');

LTRIM

  • Syntax

    string ltrim(string <str>[, <trimChars>])
    string trim(leading [<trimChars>] from <str>)
  • Description

    Removes characters from the left end of str:

    • If trimChars is unspecified, removes leading spaces by default.

    • If you specify trimChars, the function treats the characters in trimChars as a set and removes from the left end of str the longest possible substring consisting entirely of characters in the set trimChars.

  • Parameters

    • str: required. STRING type. String to trim. If input is BIGINT, DECIMAL, DOUBLE, or DATETIME, it's implicitly converted to STRING.

    • trimChars: optional. STRING type. Characters to remove.

  • Return value

    Returns STRING type. The return value follows these rules:

    • If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.

    • If str or trimChars is NULL, returns NULL.

  • Examples

    • Example 1: Remove leading spaces from string yxTxyomxx . Sample statements:

      -- Returns yxTxyomxx.
      select ltrim(' yxTxyomxx ');
      -- Equivalent to:
      select trim(leading from ' yxTxyomxx ');
    • Example 2: Remove all leading characters from string yxTxyomxx that are in set xy.

      -- Returns Txyomxx. Any leading x or y is removed.
      select ltrim('yxTxyomxx', 'xy');
      -- Equivalent to:
      select trim(leading 'xy' from 'yxTxyomxx');
    • Example 3: Input parameter is NULL. Sample statements:

      -- Returns NULL.
      select ltrim(null);
      select ltrim('yxTxyomxx', null);

MASK_HASH

  • Syntax

    mask_hash(<expr>)
  • Description

    Returns a hash value calculated from string expression expr. Identical hash values indicate identical expression values.

  • Parameters

    expr: required. String expression for hash calculation. Supports STRING, CHAR, VARCHAR, BINARY types.

  • Return value

    Returns a 64-byte hash value. For non-string expressions, returns null (Hive-compatible).

  • Example

    -- Returns hash for abc
    select mask_hash("abc");
    -- Returns
    +------------+
    | _c0        |
    +------------+
    | ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad |
    +------------+
    
    -- Returns NULL for non-string input
    select mask_hash(100);
    -- Returns
    +------------+
    | _c0        |
    +------------+
    | NULL       |
    +------------+

MD5

  • Syntax

    string md5(string <str>)
  • Description

    Calculates the MD5 hash of string str.

  • Parameters

    str: required. STRING type. If input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it's implicitly converted to STRING.

  • Return value

    Returns STRING type. The return value follows these rules:

    • If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.

    • If str is NULL, returns NULL.

  • Examples

    • Example 1: Calculate MD5 hash of string Tech on the net. Sample statement:

      -- Returns ddc4c4796880633333d77a60fcda9af6.
      select md5('Tech on the net');
    • Example 2: Input parameter is NULL. Sample statement:

      -- Returns NULL.
      select md5(null);

PARSE_URL

  • Syntax

    string parse_url(string <url>, string <part>[, string <key>])
  • Description

    Parses url and extracts information specified by part.

  • Parameters

    • url: required. STRING type. URL link. Invalid URLs return an error.

    • part: required. STRING type. Valid values: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO (case-insensitive).

    • key: optional. When part is QUERY, returns the value for key.

  • Return description

    Returns STRING type. The return value follows these rules:

    • If url, part, or key is NULL, returns NULL.

    • If part is invalid, returns an error.

  • Example

    -- Returns example.com.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'HOST');
    -- Returns /over/there/index.dtb.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'PATH');
    -- Returns animal.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'QUERY', 'type');
    -- Returns nose.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'REF');
    -- Returns file.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'PROTOCOL');
    -- Returns username:password@example.com:8042.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'AUTHORITY');
    -- Returns username:password.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'USERINFO');

PARSE_URL_TUPLE

  • Syntax

    string parse_url_tuple(string <url>, string <key1>, string <key2>,...)
  • Description

    Parses url and extracts strings specified by input keys key1, key2, etc. Similar to PARSE_URL but extracts multiple keys simultaneously with better performance.

  • Parameters

    • url: required. STRING type. URL link. Invalid URLs return an error.

    • key1, key2: required. STRING type. Keys to extract. Valid values:

      • HOST: Host address (domain or IP).

      • PATH: Path to network resource on server.

      • QUERY: Query string (content to query).

      • REF: indicates the fragment identifier of a URL, which is the content after the # symbol.

      • PROTOCOL: Protocol type.

      • AUTHORITY: Server domain/IP, port, and user authentication info (username, password).

      • FILE: Path and query content (PATH + QUERY).

      • USERINFO: User authentication info.

      • QUERY:<KEY>: Value for specified key in query string.

      Case-insensitive. Invalid values return an error.

  • Return Description

    Returns STRING type. If url or key is NULL, returns an error.

  • Example

    Extract strings for each key from file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose. Sample statement:

    select parse_url_tuple('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'AUTHORITY', 'FILE', 'USERINFO', 'QUERY:type', 'QUERY:name') as (item0, item1, item2, item3, item4, item5, item6, item7, item8, item9);

    Returns:

    +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
    | item0      | item1      | item2      | item3      | item4      | item5      | item6      | item7      | item8      | item9      |
    +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
    | example.com | /over/there/index.dtb | type=animal&name=narwhal | nose       | file       | username:password@example.com:8042 | /over/there/index.dtb?type=animal&name=narwhal | username:password | animal     | narwhal    |
    +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+

REGEXP_COUNT

  • Syntax

    bigint regexp_count(string <source>, string <pattern>[, bigint <start_position>])
  • Description

    Counts substrings in source starting from start_position that match specified pattern.

  • Parameters

    • source: required. STRING type. String to search. Other types return an error.

    • pattern: required. STRING constant or regular expression. Pattern to match. For more regex guidelines, see RLIKE string matching. Empty or invalid pattern returns an error.

    • start_position: optional. A constant of the BIGINT type. The value of this parameter must be greater than 0. If the value is of another data type or is less than or equal to 0, an error is returned. If you do not specify this parameter, the default value is 1. This value indicates that the search starts from the first character of the source string.

  • Return value

    Returns BIGINT type. The return value follows these rules:

    • If no matches found, returns 0.

    • If source, pattern, or start_position is NULL, returns NULL.

  • Examples

    • Example 1: Count matching substrings in abababc from specified position. Sample statements:

      -- Returns 1.
      select regexp_count('abababc', 'a.c');
      -- Returns 2.
      select regexp_count('abababc', '[[:alpha:]]{2}', 3);
    • Example 2: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select regexp_count('abababc', null);
    • Example 3: Count occurrences of : in JSON string {"account_id":123456789,"account_name":"allen","location":"hangzhou","bill":100}. Sample statement:

      -- Returns 4.
      select regexp_count('{"account_id":123456789,"account_name":"allen","location":"hangzhou","bill":100}',':');

REGEXP_EXTRACT

  • Syntax

    string regexp_extract(string <source>, string <pattern>[, bigint <groupid>])
    Note

    In Hive-compatible data type versions, REGEXP_EXTRACT follows Java regex specification. In 1.0 and 2.0 data type versions, it follows MaxCompute specification.

  • Description

    Splits string source into groups using pattern and returns the string from group groupid.

  • Parameters

    • source: required. STRING type. String to split.

    • pattern: required. STRING constant or regular expression. Pattern to match. For more regex guidelines, see RLIKE string matching.

    • groupid: optional. BIGINT constant, must be ≥0.

    Note

    Data is stored in UTF-8 format. Chinese characters can be represented in hexadecimal within range [\\x{4e00},\\x{9fa5}].

  • Return value

    Returns STRING type. The return value follows these rules:

    • If pattern is an empty string or no group is specified in pattern, an error is returned.

    • groupid is not of the BIGINT type or is less than 0, an error is returned. If you do not specify this parameter, the default value is 1, indicating that the substring in the first group is returned. If groupid is set to 0, the substring that matches the entire pattern is returned.

    • If source, pattern, or groupid is NULL, returns NULL.

  • Examples

    • Example 1: Split foothebar using foo(.*?)(bar). Sample statements:

      -- Returns the.
      select regexp_extract('foothebar', 'foo(.*?)(bar)');
      -- Returns foothebar.
      select regexp_extract('foothebar', 'foo(.*?)(bar)', 0);
    • Example 2: Split 8d99d8 using 8d(\\d+)d8. Sample statement:

      -- Returns 99. In MaxCompute client, use two backslashes (\\) as escape characters for regex.
      select regexp_extract('8d99d8', '8d(\\d+)d8');
    • Example 3: Extract Chinese characters and punctuation from [Alibaba Cloud]aliyun. Sample statement:

      -- Returns [Alibaba Cloud].
      select regexp_extract('[Alibaba Cloud]aliyun', '([^\\x{00}-\\x{ff}]+)');
    • Example 4: Extract Chinese characters from [Alibaba Cloud]aliyun. Sample statement:

      --Returns Alibaba Cloud.
      select regexp_extract('【Alibaba Cloud】aliyun', '([\\x{4e00}-\\x{9fa5}]+)');
    • Example 5: Extract Chinese characters from [Alibaba Cloud]aliyunAlibaba Cloud. This complex scenario requires REGEXP_REPLACE. Sample statement:

      -- Returns [Alibaba Cloud]Alibaba Cloud.
      select regexp_replace('[Alibaba Cloud]aliyunAlibaba Cloud','([\\x{00}-\\x{ff}])', '');
    • Example 6: No groups defined. Invalid command example:

      select regexp_extract('foothebar', 'foothebar');

REGEXP_EXTRACT_ALL

  • Syntax

    array<T> regexp_extract_all(string <source>, string <pattern>[,bigint <group_id>])
  • Description

    Finds all substrings matching a regular expression pattern in a string and returns them as an array.

  • Parameters

    • source: required. STRING type. The string to analyze.

    • pattern: required. STRING type. The pattern to match. This can be a STRING constant or a regular expression. For more regex guidelines, see RLIKE string matching.

    • group_id: optional. A BIGINT value. This parameter specifies the ID of the group used to match the pattern. The value of this parameter must be greater than or equal to 0. If you do not specify this parameter, the group with a group_id of 1 is used to match the pattern. If you set this parameter to 0, the pattern is matched as a whole.

  • Return value

    An ARRAY value is returned. If you specify group_id, an array containing all matching results for the specified group_id is returned. If you do not specify group_id, an array containing all matching results for group_id 1 is returned.

  • Example

    • If you do not specify group_id, the default behavior is to return results that match the first group_id.

      SELECT regexp_extract_all('100-200, 300-400', '(\\d+)-(\\d+)');

      Returns:

      +------------+
      | _c0        |
      +------------+
      | [100,300] |
      +------------+
    • If you set the group_id value to 2, the matching result for the second group_id is returned.

      SELECT regexp_extract_all('100-200, 300-400', '(\\d+)-(\\d+)',2);

      The following is the result.

      +------------+
      | _c0        |
      +------------+
      | [200,400] |
      +------------+

REGEXP_INSTR

  • Syntax

    bigint regexp_instr(string <source>, string <pattern>[,bigint <start_position>[, bigint <occurrence>[, bigint <return_option>]]])
  • Description

    Returns the starting or ending position of the specified occurrence of a substring that matches pattern in source, starting from start_position.

  • Parameters

    • source: Required. The source string. The data type must be STRING.

    • pattern: Required. The pattern to match. The value can be a STRING constant or a regular expression. For more information about regular expressions, see RLIKE string matching. An empty pattern returns an error.

    • start_position: Optional. The position from which the search starts. The value must be a BIGINT constant. The default value is 1.

    • occurrence: Optional. The occurrence of the match to find. The value must be a BIGINT constant. The default value is 1, which indicates the first match.

    • return_option: Optional. The position to return. The value must be a BIGINT constant. Valid values: 0 specifies the start position and 1 specifies the end position. The default value is 0. If you specify an invalid value, an error is returned.

  • Return value

    The return value is of the BIGINT type. The return_option parameter specifies whether to return the starting or ending position of the matched substring in source. The return value is determined by the following rules:

    • If pattern is empty, an error is returned.

    • If the value of start_position or occurrence is not a BIGINT constant or is less than or equal to 0, an error is returned.

    • If the value of source, pattern, start_position, occurrence, or return_option is NULL, NULL is returned.

  • Examples

    • Example 1: Return the starting position of the 2nd occurrence of a substring that matches o[[:alpha:]]{1} in the string i love www.taobao.com, starting from position 3. Sample statement:

      -- Returns 14.
      select regexp_instr('i love www.taobao.com', 'o[[:alpha:]]{1}', 3, 2);
    • Example 2: Return the ending position of the second occurrence of a substring that matches o[[:alpha:]]{1} in the string i love www.taobao.com. The search starts from the third character. Sample statement:

      -- Returns 16.
      select regexp_instr('i love www.taobao.com', 'o[[:alpha:]]{1}', 3, 2, 1);
    • Example 3: An input parameter is NULL. Sample statement:

      -- Returns NULL.
      select regexp_instr('i love www.taobao.com', null, 3, 2);

REGEXP_REPLACE

  • Syntax

    string regexp_replace(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])
    Note

    For Hive-compatible data types, the REGEXP_REPLACE function follows the Java regular expression specification. For data types 1.0 and 2.0, it follows the MaxCompute specification.

  • Description

    Replaces the substring that matches the pattern at the position specified by occurrence in the source string with replace_string and returns the result.

  • Parameters

    • source: Required. A value of the STRING type. This is the source string to be modified.

    • pattern: Required. A STRING constant or a regular expression. This is the pattern to match. For more information about regular expression rules, see RLIKE string matching. If the pattern is empty, an error is returned.

    • replace_string: Required. A value of the STRING type. This is the string that replaces the substring that matches the pattern.

      Note
      • If replace_string is an empty string, the function deletes the substring that matches the pattern.

      • The replace_string can contain backreferences in the format of \n to insert the substring that matches the nth capturing group in the pattern. The value of n can be an integer from 1 to 9. \0 inserts the entire substring that matches the pattern. You must escape backslashes, such as \\1, or use a raw string, such as R'(\1)'.

    • occurrence: Optional. A constant of the BIGINT type, which must be greater than or equal to 0. The value of this parameter indicates that the string that matches the specified pattern at the nth occurrence specified by occurrence is replaced with replace_string. If this parameter is set to 0, all the substrings that match the specified pattern are replaced. If it is of another data type or is less than 0, an error is returned. Default value: 0.

  • Return value

    A value of the STRING type is returned. The return value is determined by the following rules:

    • If you reference a group that does not exist, the result is undefined.

    • If replace_string is NULL and a match for the pattern is found, NULL is returned.

    • If replace_string is NULL and pattern does not match, the function returns the original string.

    • If source, pattern, or occurrence is NULL, NULL is returned.

  • Examples

    • Example 1: Replace substrings based on specified rules.

      -- Returns Abcd.
      select regexp_replace("abcd", "a", "A", 0);
      -- Returns bcd.
      select regexp_replace("abcd", "a", "", 0);
      -- Returns 19700101.
      select regexp_replace("1970-01-01", "-", "", 0);
      -- Returns abc.
      select regexp_replace("a1b2c3", "[0-9]", "", 0);
      -- Returns a1b2c.
      select regexp_replace("a1b2c3", "[0-9]", "", 3);
    • Example 2: Replace all matches for the pattern ([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4}) in the string 123.456.7890 with (\\1)\\2-\\3. Sample statement:

      -- Returns (123)456-7890.
      select regexp_replace('123.456.7890', '([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4})',
      '(\\1)\\2-\\3', 0);
    • Example 3: Replace substrings that match a pattern in the string abcd based on specified rules. Sample statements:

      -- Returns a b c d.
      select regexp_replace('abcd', '(.)', '\\1 ', 0);
      -- Returns a bcd.
      select regexp_replace('abcd', '(.)', '\\1 ', 1);
      -- Returns d.
      select regexp_replace("abcd", "(.*)(.)$", "\\2", 0);
    • Example 4: The url_set table contains a URL column where the values are in the format www.simple@xxx.com. The xxx part is unique for each row. This example replaces all characters after www. Sample statement:

      -- Returns wwwtest.
      select regexp_replace(url,'(www)(.*)','wwwtest',0) from url_set;
    • Example 5: An input parameter is NULL. Sample statement:

      -- Returns NULL.
      select regexp_replace('abcd', '(.)', null, 0);
    • Example 6: Reference a group that does not exist. Sample statements:

      -- Pattern defines only one group; second group reference doesn't exist.
      -- Avoid this usage; result is undefined.
      regexp_replace("abcd", "(.)", "\\2", 0) = "" or "abcd"
      -- Pattern has no groups; \1 references non-existent group.
      -- Avoid this usage; result is undefined.
      regexp_replace("abcd", "a", "\\1", 0) = "bcd" or "abcd"

REGEXP_SUBSTR

  • Syntax

    string regexp_substr(string <source>, string <pattern>[, bigint <start_position>[, bigint <occurrence>]])
  • Description

    Returns the substring from the start_position in the source string that matches the pattern for the occurrence-th time.

  • Parameters

    • source: Required. STRING type. The string to search.

    • pattern: Required. STRING constant or regular expression. The pattern to match. Regular expression syntax is described in RLIKE string matching.

    • start_position: Optional. BIGINT constant. The value must be greater than 0. If you do not specify this parameter, the default value is 1, which indicates that the search starts from the first character of the source string.

    • occurrence: Optional. BIGINT constant. The value must be greater than 0. The default value is 1, which indicates the first match.

  • Return value

    Returns a STRING. The return value follows these rules:

    • If the pattern is empty, an error occurs.

    • If no match is found, returns NULL.

    • If start_position or occurrence is not a BIGINT or is less than or equal to 0, an error occurs.

    • If source, pattern, start_position, or occurrence is NULL, returns NULL.

  • Examples

    • Example 1: Return the substring that matches the specified pattern in I love aliyun very much. Sample statement:

      -- Returns aliyun.
      select regexp_substr('I love aliyun very much', 'a[[:alpha:]]{5}');
      -- Returns have.
      select regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 1);
      -- Returns 2.
      select regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 2);
    • Example 2: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select regexp_substr('I love aliyun very much', null);

REPEAT

  • Syntax

    string repeat(string <str>, bigint <n>)
  • Description

    Returns the string str repeated n times.

  • Parameters

    • str: required. STRING type. If the input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it is implicitly converted to STRING.

    • n: required. BIGINT type. The value size must not exceed 2 MB.

  • Return value

    Returns a value of STRING type. The return value follows these rules:

    • If str is not of type STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, an error is returned.

    • If n is empty, an error is returned.

    • If either str or n is NULL, NULL is returned.

  • Examples

    • Example 1: Repeat the string abc 5 times. Sample statement:

      -- Returns abcabcabcabcabc.
      select repeat('abc', 5); 
    • Example 2: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select repeat('abc', null);

REPLACE

  • Syntax

    string replace(string <str>, string <old>, string <new>)
  • Description

    Replaces all occurrences of old with new in str and returns the modified str. If no matches are found, the original str is returned. This function is a MaxCompute V2.0 extension function.

  • Parameters

    • str: required. STRING type. The source string.

    • old: Required. The string to compare.

    • new: required. STRING type. The replacement string.

  • Return value

    Returns a value of the STRING type. If any input parameter is NULL, NULL is returned.

  • Examples

    • Example 1: Replace all occurrences of the substring abab with 12 in the string ababab. Sample statement:

      -- Returns 12ab.
      select replace('ababab','abab','12');
    • Example 2: One of the input parameters is NULL. Sample statement:

      -- Returns NULL.
      select replace('123abab456ab',null,'abab');

REVERSE

  • Syntax

    string reverse(string <str>)
  • Description

    Returns the reverse of the string str.

  • Parameters

    str: Required. A parameter of the STRING type. If the input is a BIGINT, DOUBLE, DECIMAL, or DATETIME value, it is implicitly converted to the STRING type.

  • Return value

    Returns a value of the STRING type. The following rules apply:

    • If the data type of str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, an error is returned.

    • If str is NULL, NULL is returned.

  • Examples

    • Example 1: Reverse the string I love aliyun very much. Sample statement:

      -- Returns hcum yrev nuyila evol I.
      select reverse('I love aliyun very much');
    • Example 2: The input parameter is NULL. Sample statement:

      -- Returns NULL.
      select reverse(null);

RPAD

  • Syntax

    string rpad(string <str1>, int <length>, string <str2>)
  • Description

    Right-pads the string str1 to length characters using the padding string str2. This function is an extension in MaxCompute V2.0.

  • Parameters

    • str1: required. STRING type. The string to right-pad.

    • length: required. INT type. Must be greater than or equal to 0. Specifies the number of characters in the padded result.

    • str2 (required): The string used for padding.

  • Return value

    Returns STRING type. The return value follows these rules:

    • If length is less than str1's length, returns leftmost length characters of str1.

    • If length is 0, returns empty string.

    • If no input parameters or any input parameter is NULL, returns NULL.

  • Examples

    • Example 1: Right-pad the string abcdefgh to 10 characters using the padding string 12. Sample statement:

      -- Returns abcdefgh12.
      select rpad('abcdefgh', 10, '12');
    • Example 2: Right-pad the string abcdefgh to 5 characters using the padding string 12. Sample statement:

      -- Returns abcde.
      select rpad('abcdefgh', 5, '12');
    • Example 3: length is 0. Sample statement:

      -- Returns empty string.
      select rpad('abcdefgh' ,0, '12'); 
    • Example 4: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select rpad(null ,0, '12');

RTRIM

  • Syntax

    string rtrim(string <str>[, <trimChars>])
    string trim(trailing [<trimChars>] from <str>)
  • Description

    Removes characters from the right end of str:

    • If trimChars is not specified, spaces are trimmed by default.

    • If trimChars is specified, the characters in trimChars are treated as a collection. The longest substring from the right end of str that consists entirely of characters from the trimChars collection is removed.

  • Parameters

    • str: required. STRING type. The string to trim. If the input is BIGINT, DECIMAL, DOUBLE, or DATETIME, it is implicitly converted to STRING.

    • trimChars: optional. STRING type. The characters to remove.

  • Return value

    Returns STRING type. The return value follows these rules:

    • If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.

    • If str or trimChars is NULL, returns NULL.

  • Examples

    • Example 1: Remove trailing spaces from the string yxTxyomxx . Sample statements:

      -- Returns  yxTxyomxx.
      select rtrim(' yxTxyomxx ');
      -- Equivalent to:
      select trim(trailing from ' yxTxyomxx ');
    • Example 2: Remove all trailing characters from the string yxTxyomxx that belong to the set xy.

      -- Returns yxTxyom. Any trailing x or y is removed.
      select rtrim('yxTxyomxx', 'xy');
      -- Equivalent to:
      select trim(trailing 'xy' from 'yxTxyomxx');
    • Example 3: Input parameter is NULL. Sample statements:

      -- Returns NULL.
      select rtrim(null);
      select rtrim('yxTxyomxx', null);

SOUNDEX

  • Syntax

    string soundex(string <str>)
  • Description

    Converts a standard string to a SOUNDEX string.

  • Parameters

    str: Required. The string to convert. This parameter is of STRING type. This function is an extension in MaxCompute V2.0.

  • Return value

    The return value is of STRING type. If str is NULL, the function returns NULL.

  • Examples

    • Example 1: Converts the string hello to a SOUNDEX string. Example statement:

      -- Returns H400.
      select soundex('hello');
    • Example 2: The input parameter is NULL. Example statement:

      -- Returns NULL.
      select soundex(null);

SPACE

  • Syntax

    string space(bigint <n>)
  • Description

    Generates a string that consists of n spaces.

  • Parameters

    n: required. BIGINT type. The value size must not exceed 2 MB.

  • Return value

    Returns a STRING. The return value follows these rules:

    • If n is empty, an error is returned.

    • If n is NULL, NULL is returned.

  • Example

    -- Returns 10.
    select length(space(10));

SPLIT

  • Syntax

    ARRAY<STRING> SPLIT(STRING <source>, STRING <delimiter>[, BOOLEAN <trimTailEmpty>])
    
    -- Standard example.
    -- Returns ["a","b","c"].
    SELECT SPLIT('a,b,c', ',');
  • Parameters

      • source: Required. The string to split. The data type is STRING.

      • delimiter: Required. The separator that is used to split the string. This parameter supports regular expressions. The data type is STRING.

      • trimTailEmpty: Optional. Specifies whether to keep trailing empty strings. The default value is true. If you set this parameter to false, trailing empty strings are kept. The data type is BOOLEAN.

  • Return value

    Returns a value of the ARRAY<STRING> type.

  • Example

    -- The return value is ["a"," b"," c"]. 
    select split("a, b, c", ",");
    
    -- No empty string is returned by default.
    select split("a, b, c,,", ",");
    -- The following result is returned:
    +------------+
    | _c0        |
    +------------+
    | ["a"," b"," c"] |
    +------------+
    
    -- If you need to return an empty string, execute the following statement:
    select split("a, b, c,,", ",", false);
    -- The following result is returned:
    +------------+
    | _c0        |
    +------------+
    | ["a"," b"," c","",""] |
    +------------+

SPLIT_PART

  • Syntax

    string split_part(string <str>, string <separator>, bigint <start>[, bigint <end>])
  • Description

    Splits the string str using the delimiter separator and returns the substring from part start to end (closed interval).

  • Parameters

    • str: required. STRING type. The string to split. If the input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it is implicitly converted to STRING.

    • separator: required. STRING constant. The delimiter (a single character or string).

    • start: required. BIGINT constant, must be greater than 0. The starting part number (counting from 1).

    • end: a BIGINT constant greater than or equal to start. This parameter specifies the ending segment number to return. If this parameter is not specified, its value defaults to the value of start, and only the segment specified by start is returned.

  • Return Value Description

    Returns a STRING. The return value follows these rules:

    • If start exceeds the actual number of parts (for example, there are 6 parts but start > 6), an empty string is returned.

    • If separator is not found in str and start = 1, the entire str is returned. If str is empty, an empty string is returned.

    • If separator is empty, the original str is returned.

    • If end exceeds the number of parts, the substring from start to the last part is returned.

    • If str is not of type STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, an error is returned.

    • If separator is not a STRING constant, an error is returned.

    • If start or end is not a BIGINT constant, an error is returned.

    • If any parameter except separator is NULL, NULL is returned.

  • Examples

    • Example 1: Split the string a,b,c,d using the delimiter , and return the specified parts. Sample statements:

      -- Returns a.
      select split_part('a,b,c,d', ',', 1);
      -- Returns a,b.
      select split_part('a,b,c,d', ',', 1, 2);
    • Example 2: start exceeds the actual number of parts. Sample statement:

      -- Returns empty string.
      select split_part('a,b,c,d', ',', 10);
    • Example 3: separator is not found in str. Sample statements:

      -- Returns a,b,c,d.
      select split_part('a,b,c,d', ':', 1);
      -- Returns empty string.
      select split_part('a,b,c,d', ':', 2);
    • Example 4: separator is empty. Sample statement:

      -- Returns a,b,c,d.
      select split_part('a,b,c,d', '', 1);
    • Example 5: end exceeds the actual number of parts. Sample statement:

      -- Returns b,c,d.
      select split_part('a,b,c,d', ',', 2, 6);
    • Example 6: Any input parameter except separator is NULL. Sample statement:

      -- Returns NULL.
      select split_part('a,b,c,d', ',', null);

SUBSTR

  • Syntax

    string substr(string <str>, bigint <start_position>[, bigint <length>])
  • Description

    Returns a substring of str starting from start_position with a length of length.

  • Parameters

    • str: required. STRING type. If the input is BIGINT, DECIMAL, DOUBLE, or DATETIME, it is implicitly converted to STRING.

    • start_position: required. BIGINT type. The default start position is 1.

      • Hive-compatible data type version: If start_position=0, it behaves the same as position 1.

      • 1.0 and 2.0 data type versions: If start_position=0, it returns NULL.

    • length: optional. BIGINT type. The substring length must be >0.

      Important
      • If setproject odps.function.strictmode=false: a length<0 returns blank.

      • If setproject odps.function.strictmode=true: a length<0 returns an error.

  • Return value

    Returns STRING type. The return value follows these rules:

    • If str is not STRING, BIGINT, DECIMAL, DOUBLE, or DATETIME, it returns an error.

    • If length is not BIGINT or ≤0, it returns an error.

    • If you omit length, the substring extends to the end of str.

    • If str, start_position, or length is NULL, it returns NULL.

  • Examples

    • Example 1: Return a substring of abc from the specified position with the specified length. Sample statements:

      -- Returns bc.
      select substr('abc', 2);
      -- Returns b.
      select substr('abc', 2, 1);
      -- Returns bc.
      select substr('abc',-2 , 2);
    • Example 2: When an input parameter is NULL. Sample statement:

      -- Returns NULL.
      select substr('abc', null);

SUBSTRING

  • Syntax

    string substring(string|binary <str>, int <start_position>[, int <length>])
  • Description

    Returns a substring of str starting from start_position with length.

  • Parameters

    • str: required. This parameter is of STRING or BINARY type.

    • start_position: required. This parameter is of INT type. The starting position is 1-based. If start_position is 0, an empty string is returned. If start_position is a negative value, the position is counted backward from the end of the string. In this case, -1 refers to the last character, and counting proceeds backward sequentially.

    • length: optional. This parameter is of BIGINT type. The substring length must be greater than 0.

  • Return value

    The return value is of STRING type. It adheres to the following rules:

    • If str is not of STRING or BINARY type, an error is returned.

    • If length is not of BIGINT type or is less than or equal to 0, an error is returned.

    • When the length parameter is omitted, a substring up to the end of str is returned.

    • If str, start_position, or length is NULL, NULL is returned.

  • Examples

    • Example 1: This example returns a substring of abc from a specified position with a specified length. Sample statements:

      -- Returns bc.
      select substring('abc', 2);
      -- Returns b.
      select substring('abc', 2, 1);
      -- Returns bc.
      select substring('abc',-2,2);
      -- Returns ab.
      select substring('abc',-3,2);
      -- Returns 001.
      substring(bin(2345), 2, 3);
    • Example 2: This example shows a case where one input parameter is NULL. Sample statement:

      -- Returns NULL.
      select substring('abc', null, null);

SUBSTRING_INDEX

  • Syntax

    string substring_index(string <str>, string <separator>, int <count>)
  • Description

    This function extracts a substring from the str string, either before or after the count-th occurrence of a delimiter. If count is positive, the search for the delimiter starts from the left. If count is negative, the search starts from the right. This function is a MaxCompute V2.0 extension function.

  • Parameters

    • str: required. A STRING type. This is the string to extract from.

    • separator: required. A STRING type delimiter.

    • count: Required. An INT value that specifies the position of the separator.

  • Return value

    The function returns a value of STRING type. If any input parameter is NULL, the function returns NULL.

  • Examples

    • Example 1: Extracts from the string https://help.aliyun.com. Sample statements:

      -- Returns https://help.aliyun.
      select substring_index('https://help.aliyun.com', '.', 2);
      -- Returns aliyun.com.
      select substring_index('https://help.aliyun.com', '.', -2);
    • Example 2: An input parameter is NULL. Sample statement:

      -- Returns NULL.
      select substring_index('https://help.aliyun.com', null, 2);

TO_CHAR

  • Syntax

    string to_char(boolean <value>)
    string to_char(bigint <value>)
    string to_char(double <value>)
    string to_char(decimal <value>)
  • Description

    Converts values of type BOOLEAN, BIGINT, DECIMAL, or DOUBLE to their STRING representations.

  • Parameters

    value: required. A value of type BOOLEAN, BIGINT, DECIMAL, or DOUBLE.

  • Return value

    Returns a value of type STRING. The return value follows these rules:

    • If value is not of type BOOLEAN, BIGINT, DECIMAL, or DOUBLE, the function returns an error.

    • If value is NULL, the function returns NULL.

  • Examples

    • Example 1: Convert values to STRING representation. Sample statement:

      -- Returns 123.
      select to_char(123);
      -- Returns TRUE.
      select to_char(true);
      -- Returns 1.23.
      select to_char(1.23);
    • Example 2: Input parameter is NULL. Sample statement:

      -- Returns NULL.
      select to_char(null);

TO_JSON

  • Syntax

    string to_json(<expr>)
  • Description

    Converts an expression specified by expr of a given complex data type into a JSON string.

  • Parameters

    expr: required. An expression of the ARRAY, MAP, or STRUCT type.

    Note

    If the input expression is of the STRUCT type (struct<key1:value1, key2:value2>), take note of the following points:

    • All keys are converted into lowercase letters when you convert the expression into a JSON string.

    • If a value is null, the key-value pair to which the value belongs is not included in the JSON string that is returned. For example, if value2 is null, key2:value2 is not included in the JSON string that is returned.

  • Return value

    A JSON string is returned.

  • Examples

    • Example 1: Convert an expression of a given complex data type into a JSON string. Sample statement:

      -- The return value is {"a":1,"b":2}. 
      select to_json(named_struct('a', 1, 'b', 2));
      -- The return value is {"time":"26/08/2015"}. 
      select to_json(named_struct('time', "26/08/2015"));
      -- The return value is [{"a":1,"b":2}]. 
      select to_json(array(named_struct('a', 1, 'b', 2)));
      -- The return value is {"a":{"b":1}}. 
      select to_json(map('a', named_struct('b', 1)));
      -- The return value is {"a":1}. 
      select to_json(map('a', 1));
      -- The return value is [{"a":1}]. 
      select to_json(array((map('a', 1))));
    • Example 2: The input expression is of the STRUCT type. Sample statement:

      -- The return value is {"a":"B"}. If the expression of the STRUCT type is converted into a JSON string, all keys are converted into lowercase letters. 
      select to_json(named_struct("A", "B"));
      -- The return value is {"k2":"v2"}. The key-value pair to which null belongs is not included in the JSON string that is returned. 
      select to_json(named_struct("k1", cast(null as string), "k2", "v2"));

TOLOWER

  • Syntax

    string tolower(string <source>)
  • Description

    Converts uppercase English letters in the string source to lowercase.

  • Parameters

    source: required. A value of the STRING type. If the input is a BIGINT, DOUBLE, DECIMAL, or DATETIME value, it is implicitly converted to the STRING type. This function operates only on English characters.

  • Return value

    Returns a value of the STRING type. The returned value is based on the following rules:

    • If the data type of source is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, an error is returned.

    • If source is NULL, NULL is returned.

  • Examples

    • Example 1: Convert uppercase letters to lowercase. Sample statements:

      -- Returns abcd.
      select tolower('aBcd');
      -- Returns 中国fighting.
      select tolower('中国Fighting');
    • Example 2: The input parameter is NULL. Sample statement:

      -- Returns NULL.
      select tolower(null);

TOUPPER

  • Syntax

    string toupper(string <source>)
  • Description

    Converts the lowercase characters in the source string to their corresponding uppercase characters.

  • Parameters

    source: required. STRING type. If the input is BIGINT, DOUBLE, DECIMAL, or DATETIME, it is implicitly converted to STRING. Supports only English characters.

  • Return value

    Returns a value of the STRING type. The returned value is based on the following rules:

    • If the data type of source is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, an error is returned.

    • If source is NULL, NULL is returned.

  • Examples

    • Example 1: Convert lowercase letters to uppercase. Sample statements:

      -- Returns ABCD.
      select toupper('aBcd');
      -- Returns ChinaFIGHTING.
      select toupper('中国Fighting');
    • Example 2: The input parameter is NULL. Sample statement:

      -- Returns NULL.
      select toupper(null);

TRANSLATE

  • Syntax

    string translate(string|varchar <str1>, string|varchar <str2>, string|varchar <str3>)
  • Description

    Replaces each character in str1 that appears in str2 with the corresponding character in str3. If no match is found, no replacement occurs. This function is an extension in MaxCompute V2.0.

  • Return value

    Returns a STRING. If any input parameter is NULL, the function returns NULL.

  • Examples

    • Example 1: Replace each character in ababab that appears in abab with the corresponding character in cd. Sample statement:

      -- Returns cdcdcd.
      select translate('ababab','abab','cd');
    • Example 2: Replace each character in ababab that appears in abab with the corresponding character in cdefg. Sample statement:

      -- Returns cdcdcd.
      select translate('ababab','abab','cdefg');
    • Example 3: One input parameter is NULL. Sample statement:

      -- Returns NULL.
      select translate('ababab','cd',null);

TRIM

  • Syntax

    string trim(string <str>[,<trimChars>])
    string trim([BOTH] [<trimChars>] from <str>)
  • Description

    Removes characters from both ends of str:

    • If you do not specify trimChars, the function removes spaces by default.

    • If you specify trimChars, the function treats the characters in trimChars as a set and removes the longest possible substrings consisting only of characters from this set from both the left and right ends of str.

  • Parameters

    • str: required. STRING type. The string to trim. If the input is BIGINT, DECIMAL, DOUBLE, or DATETIME, it is implicitly converted to STRING.

    • trimChars: optional. STRING type. The characters to remove.

  • Return value

    Returns STRING type. The return value follows these rules:

    • If str is not STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME, returns an error.

    • If str or trimChars is NULL, returns NULL.

  • Examples

    • Example 1: Remove spaces from both ends of the string yxTxyomxx . Sample statements:

      -- Returns yxTxyomxx.
      select trim(' yxTxyomxx ');
      -- Equivalent to:
      select trim(both from ' yxTxyomxx ');
      select trim(from ' yxTxyomxx ');
    • Example 2: Remove all characters from both ends of the string yxTxyomxx that belong to the set xy.

      -- Returns Txyom. Any x or y on either end is removed.
      select trim('yxTxyomxx', 'xy');
      -- Equivalent to:
      select trim(both 'xy' from 'yxTxyomxx');
      select trim('xy' from 'yxTxyomxx');
    • Example 3: Input parameter is NULL. Sample statements:

      -- Returns NULL.
      select trim(null);
      select trim('yxTxyomxx', null);

URL_DECODE

  • Syntax

    string url_decode(string <input>[, string <encoding>])
  • Description

    Converts an input string from application/x-www-form-urlencoded MIME format to standard characters. This function is the inverse of url_encode. The encoding rules are as follows:

    • a–z and A–Z remain unchanged.

    • Period (.), hyphen (-), asterisk (*), and underscore (_) remain unchanged.

    • Plus sign (+) converts to space.

    • Sequences in %xy format are converted to byte values. Consecutive bytes are decoded into strings using the specified encoding name.

    • All other characters remain unchanged.

  • Parameters

    • input: This parameter is required and is of STRING type. It represents the input string.

    • encoding: This parameter is optional and specifies the encoding format. It supports formats such as GBK and UTF-8. If unspecified, the default encoding is UTF-8.

  • Return value

    This function returns a UTF-8 encoded STRING type. If the input or encoding parameter is NULL, the function returns NULL.

  • Example

    -- Returns examplefor url_decode:// (fdsf).
    select url_decode('%E7%A4%BA%E4%BE%8Bfor+url_decode%3A%2F%2F+%28fdsf%29');
    -- Returns Example for URL_DECODE:// dsf(fasfs).
    select url_decode('Example+for+url_decode+%3A%2F%2F+dsf%28fasfs%29', 'GBK');

URL_ENCODE

  • Syntax

    string url_encode(string <input>[, string <encoding>])
  • Description

    Encodes an input string in application/x-www-form-urlencoded MIME format. Encoding rules:

    • Letters a–z and A–Z remain unchanged.

    • The period (.), hyphen (-), asterisk (*), and underscore (_) remain unchanged.

    • Spaces convert to plus signs (+).

    • All other characters are converted to byte values using the specified encoding. Each byte appears as %xy, where xy is the character’s hexadecimal representation.

  • Parameters

    • input: Required. STRING type. The string to input.

    • encoding: optional. The encoding format. Supported formats include GBK and UTF-8. The default is UTF-8 if unspecified.

  • Return value

    Returns a STRING. If input or encoding is NULL, returns NULL.

  • Example

    -- Returns %E7%A4%BA%E4%BE%8Bfor+url_encode%3A%2F%2F+%28fdsf%29.
    select url_encode('examplefor url_encode:// (fdsf)');
    -- Returns Example+for+url_encode+%3A%2F%2F+dsf%28fasfs%29.
    select url_encode('Example for url_encode:// dsf(fasfs)', 'GBK');

References

To extend beyond built-in functions, create user-defined functions (UDFs). For more information, see Overview of MaxCompute UDFs.