Use Spark SQL to create an internal table

更新时间:
复制 MD 格式

AnalyticDB for MySQL supports Spark SQL for creating C-Store tables, which use the XUANWU storage engine. The Spark SQL syntax for C-Store tables includes AnalyticDB-specific properties not found in open source Spark SQL.

Syntax differences from open source Spark SQL

Creating C-Store tables in AnalyticDB for MySQL requires AnalyticDB-specific TBLPROPERTIES keys (see Parameters). For creating databases and querying data, AnalyticDB for MySQL uses the same syntax as open source Spark SQL. For details, see the open source Spark SQL reference.

Limitations

Spark SQL can only create tables that use the XUANWU storage engine.

Syntax

CREATE TABLE [ IF NOT EXISTS ] table_identifier
    [ ( column_name column_type [ COMMENT col_comment1 ], ... ) ]
    USING adb
    [ COMMENT table_comment ]
    [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]

Parameters

Core parameters

ParameterRequiredDescription
table_identifierYesThe table name. Use the db_name.table_identifier format to distinguish tables with the same name across databases. For naming conventions, see Naming conventions.
column_nameYesThe column name. For naming conventions, see Naming conventions.
column_typeYesThe column data type. For supported types, see Data type mappings.
USING adbYesSpecifies that the table is an AnalyticDB for MySQL table.
COMMENTNoA comment for the table.
TBLPROPERTIESNoKey-value pairs that define table properties. See the TBLPROPERTIES parameters below.

TBLPROPERTIES: required properties

PropertyRequiredDescription
distributeTypeYesThe distribution type. Valid values: HASH (distributes table data across shards based on the hash values of the distribution key columns) and BROADCAST (stores a full copy of the table in each shard; keep replicated tables small).
distributeColumnsYesThe distribution key columns. Accepts multiple columns. The distribution key cannot be changed after the table is created.
storagePolicyYesThe storage policy. Valid values: HOT (default), COLD, and MIXED. Case-insensitive. See Storage policies for details.

TBLPROPERTIES: optional properties

PropertyRequiredDescription
hotPartitionCountRequired when storagePolicy=MIXEDThe number of hot partitions for hybrid storage. Partitions are sorted by partition key value in descending order — the first N partitions are hot partitions stored on SSDs, and the remaining partitions are cold partitions stored in OSS. N must be a positive integer. Do not set this property for HOT or COLD storage policies.
primaryKeyNoThe primary key columns. A primary key must include the distribution key and the partition key. Place the distribution key and partition key at the beginning of a composite primary key.
partitionTypeNoThe partition type. Set to VALUE.
partitionColumnNoThe partition key column (subpartition).
partitionCountNoThe partition lifecycle — the maximum number of subpartitions per shard. If you specify a partition key, you must also set partitionCount, otherwise an error is returned. For example, 'partitionCount'='365' means each shard retains at most 365 subpartitions. If a date column is the partition key, data from day 1 is automatically deleted when new data arrives on day 366.
indexNameNoThe index names, comma-separated. Example: 'indexName'='index1,index2'.
indexTypeNoThe index type for a named index. Valid values: FULLTEXT (full-text index) and NORMAL (regular index). Example: 'indexType.index1'='FULLTEXT'.
indexColumnNoThe indexed column for a named index. Example: 'indexColumn.index1'='c1'.
indexAllNoSpecifies whether to index all columns. Valid values: Y (default) and N.
tableEngineNameNoThe storage engine. Defaults to XUANWU.

Storage policies

PolicyData locationPerformanceCostUse when
HOT (default)All partitions on SSDsBest query performanceHighestData is frequently accessed
COLDAll partitions in OSS with zone-redundant storage (multi-AZ)LowerCost-effectiveData is rarely accessed
MIXEDHot partitions on SSDs; cold partitions in OSSHigh for recent dataBalancedData access frequency varies by age; requires hotPartitionCount

Examples

Create a non-partitioned table

-- Non-partitioned table with HASH distribution and COLD storage
CREATE TABLE orders (
  order_id     BIGINT    NOT NULL COMMENT 'Order ID',
  customer_id  INT       NOT NULL COMMENT 'Customer ID',
  order_status STRING    NOT NULL COMMENT 'Order status',
  total_price  BIGINT    NOT NULL COMMENT 'Order amount',
  order_date   TIMESTAMP NOT NULL COMMENT 'Order date'
)
USING adb
TBLPROPERTIES (
  'distributeType'    = 'HASH',
  'distributeColumns' = 'order_status',
  'storagePolicy'     = 'COLD'
);

Create a partitioned table with a lifecycle

-- Partitioned table with primary key, selective indexing, and partition lifecycle
CREATE TABLE customer (
  customer_id     BIGINT    NOT NULL COMMENT 'Customer ID',
  customer_name   STRING    NOT NULL COMMENT 'Customer name',
  phone_num       BIGINT    NOT NULL COMMENT 'Phone number',
  city_name       STRING    NOT NULL COMMENT 'City',
  sex             INT       NOT NULL COMMENT 'Gender',
  id_number       STRING    NOT NULL COMMENT 'ID card number',
  home_address    STRING    NOT NULL COMMENT 'Home address',
  office_address  STRING    NOT NULL COMMENT 'Office address',
  age             INT       NOT NULL COMMENT 'Age',
  login_time      TIMESTAMP NOT NULL COMMENT 'Logon time'
)
USING adb
TBLPROPERTIES (
  'distributeType'    = 'HASH',
  'primaryKey'        = 'customer_id,phone_num,city_name',
  'distributeColumns' = 'city_name',
  'partitionType'     = 'value',
  'partitionColumn'   = 'city_name',
  'partitionCount'    = 100,
  'indexAll'          = 'N',
  'storagePolicy'     = 'COLD'
);

Data type mappings

Spark SQL typeXIHE SQL type
BOOLEANBOOLEAN
TINYINTTINYINT
SMALLINTSMALLINT
INTINT
INTEGERINTEGER
BIGINTBIGINT
FLOATFLOAT
DOUBLEDOUBLE
BINARYBINARY
DATEDATE
STRINGVARCHAR, TIME, POINT, JSON, ARRAY, MAP
TIMESTAMPDATETIME, TIMESTAMP

Next steps

Reading and writing internal table data