Data type

更新时间:
复制 MD 格式

Dataphin currently utilizes MaxCompute 2.0 as its compute engine. This topic outlines the data types that MaxCompute 2.0 supports within Dataphin.

Limitations

The data type system version 2.0 is intended for:

  • MaxCompute projects created after April 2020 that contain no pre-existing data.

  • Environments where all dependent product components are also compatible with the version 2.0 type system.

Basic data types

Type

Constant example

Description

TINYINT

1Y and -127Y

An 8-bit signed integer type.

Valid values range from -128 to 127.

SMALLINT

32767S and -100S

A 16-bit signed integer type.

Valid values range from -32768 to 32767.

INT

1000 and -15645787

A 32-bit signed integer type.

Valid values range from -231 to 231-1.

BIGINT

100000000000L and -1L

A 64-bit signed integer type.

Valid values range from -263+1 to 263 -1.

BINARY

  • unhex('FA34E10293CB42848573A4E39937F479')

  • X'616263'

A binary number with a maximum length of 8 MB.

Note
  • In the format X'num [...]', num is a hexadecimal number, which can be 0~9 or A~F. For example, X'616263' represents abc because the ASCII encoding of a is 0x61, b is 0x62, and c is 0x63. X'616263' is equivalent to unhex('616263').

  • If the string length is odd, the system prepends a 0. For instance, X'616' is equivalent to X'0616'.

  • Use single quotes; double quotes are not valid. For example, X"616263" is not recognized as a BINARY constant.

FLOAT

3.14F and cast(3.14159261E+7 as float)

A 32-bit binary floating-point type.

Note

FLOAT data type calculations may result in precision loss due to computer storage and internal computing logic. For high precision requirements, consider converting FLOAT data to DECIMAL data.

DOUBLE

3.14D and 3.14159261E+7

A 64-bit binary floating-point type.

Note

DOUBLE data type calculations may result in precision loss due to computer storage and internal computing logic. For high precision requirements, consider converting DOUBLE data to DECIMAL data.

DECIMAL(precision,scale)

3.5BD and 99999999999.9999999BD

A precise numeric type based on the decimal system.

  • precision defines the maximum number of representable digits. Valid values: 1 <= precision <= 38.

  • scale defines the number of decimal places. Valid values: 0 <= scale <= 18.

The default is decimal(38,18) if precision and scale are not specified.

Note
  • DECIMAL types from different editions cannot coexist in the same table.

  • Enabling Hive compatible mode with the setproject odps.sql.hive.compatible=true; command will round the number of decimal places in the Decimal(precision, scale) type if it exceeds the scale during Tunnel Upload and SQL operations. Should the integer part surpass the limit, an error will be reported.

  • Setting odps.sql.decimal.tostring.trimzero to true removes trailing zeros after the decimal point, while setting it to false retains them. The default is true. This parameter only affects data read from tables and does not apply to static values.

VARCHAR(n)

None

A variable-length character type where n specifies the length.

Valid values: 1 to 65535.

CHAR(n)

None

The fixed-length character type is defined by n, which specifies its length, with a maximum of 255 characters. When the length is insufficient, spaces are added for padding; however, these spaces are not considered during comparisons.

STRING

"abc", 'bcd', "alibaba", and 'inc'

The string type with a maximum length of 8 MB.

DATE

DATE'2017-11-11'

The date type formatted as yyyy-mm-dd.

Valid values span from 0001-01-01 to 9999-12-31.

DATETIME

DATETIME'2017-11-11 00:00:00'

The DATETIME type.

Valid values range from 0001-01-01 00:00:00.000 to 9999-12-31 23:59:59.999, accurate to the millisecond.

TIMESTAMP

TIMESTAMP'2017-11-11 00:00:00.123456789'

The TIMESTAMP data type.

Valid values span from 0001-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999, accurate to the nanosecond.

Note

The timestamp is time zone agnostic. It stores a date offset from Epoch (UTC 1970-01-01 00:00:00) that is consistent across all time zones. TIMESTAMP data can be manipulated for time zone calculations using built-in functions, such as converting TIMESTAMP to STRING data based on the current time zone with cast(<a timestamp> as string).

TIMESTAMP_NTZ

TIMESTAMP_NTZ '2017-11-11 00:00:00.123456789'

A TIMESTAMP data type that does not consider time zones.

Valid values range from 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999. For more details on the TIMESTAMP_NTZ data type, see the MaxCompute TIMESTAMP_NTZ data type documentation.

BOOLEAN

True and False

The BOOLEAN type.

Valid values: True and False.

Please consider the following points:

  • All data types are nullable.

  • The INT keyword in an SQL statement refers to the 32-bit integer type.

    -- Convert a into a 32-bit integer. 
    CAST(a AS INT)
  • By default, an integer constant is processed as the INT type. For example, integer constant 1 in SELECT 1 + a; is processed as the INT type. If a constant exceeds the value range of the INT type but does not exceed the value range of the BIGINT type, the constant is processed as the BIGINT type. If the constant exceeds the value range of the BIGINT type, the constant is processed as the DOUBLE type.

  • Implicit conversions

    • Certain implicit conversions (e.g., STRING to BIGINT, STRING to DATETIME, DOUBLE to BIGINT, DECIMAL to DOUBLE, DECIMAL to BIGINT) are disabled to prevent potential data loss or errors. Use the CAST function for explicit conversions.

    • Constants of the VARCHAR type can be implicitly converted into the STRING type.

  • Tables, Functions, and UDFs

    • Built-in functions that use 2.0 data types operate correctly in the 2.0 data type system.

    • UDFs are resolved and overloaded based on the 2.0 data type system.

    • Partition columns support STRING, VARCHAR, CHAR, TINYINT, SMALLINT, INT, and BIGINT data types.

  • MaxCompute automatically concatenates multiple adjacent string literals. For example, select 'abc' 'efg' 'ddt'; returns abcefgddt.

  • When inserting a constant into a DECIMAL field, the constant's format must match its definition. For example:

    INSERT INTO test_tb(a) VALUES (3.5BD)
  • DATETIME values do not include milliseconds. To specify the time format with millisecond precision, use the -dfp parameter in the Tunnel command, as in tunnel upload -dfp 'yyyy-MM-dd HH:mm:ss.SSS'. For more information, see Tunnel command documentation.

Complex data types

Data type

Definition

Constructor

ARRAY

  • array<int>

  • array<struct<a:int, b:string>>

  • array(1, 2, 3)

  • array(array(1, 2), array(3, 4))

MAP

  • map<string, string>

  • map<smallint, array<string>>

  • map("k1", "v1", "k2", "v2")

  • map(1S, array('a', 'b'), 2S, array('x', 'y'))

STRUCT

  • struct<x:int, y:int>

  • struct<field1:bigint, field2:array<int>, field3:map<int, int>>

  • named_struct('x', 1, 'y', 2)

  • named_struct('field1', 100L, 'field2', array(1, 2), 'field3', map(1, 100, 2, 200))

Note

MaxCompute supports complex data types, including nested structures. For detailed information on built-in functions, refer to ARRAY, MAP, and STRUCT.