Introduction to the syntax for creating tables by using Spark SQL-AnalyticDB(AnalyticDB)-阿里云帮助中心

AnalyticDB for MySQL supports Spark SQL for creating C-Store tables, which use the XUANWU storage engine. The Spark SQL syntax for C-Store tables includes AnalyticDB-specific properties not found in open source Spark SQL.

Syntax differences from open source Spark SQL

Creating C-Store tables in AnalyticDB for MySQL requires AnalyticDB-specific TBLPROPERTIES keys (see Parameters). For creating databases and querying data, AnalyticDB for MySQL uses the same syntax as open source Spark SQL. For details, see the open source Spark SQL reference.

Limitations

Spark SQL can only create tables that use the XUANWU storage engine.

Syntax

CREATE TABLE [ IF NOT EXISTS ] table_identifier
    [ ( column_name column_type [ COMMENT col_comment1 ], ... ) ]
    USING adb
    [ COMMENT table_comment ]
    [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]

Parameters

Core parameters

Parameter	Required	Description
`table_identifier`	Yes	The table name. Use the `db_name.table_identifier` format to distinguish tables with the same name across databases. For naming conventions, see Naming conventions.
`column_name`	Yes	The column name. For naming conventions, see Naming conventions.
`column_type`	Yes	The column data type. For supported types, see Data type mappings.
`USING adb`	Yes	Specifies that the table is an AnalyticDB for MySQL table.
`COMMENT`	No	A comment for the table.
`TBLPROPERTIES`	No	Key-value pairs that define table properties. See the TBLPROPERTIES parameters below.

TBLPROPERTIES: required properties

Property	Required	Description
`distributeType`	Yes	The distribution type. Valid values: `HASH` (distributes table data across shards based on the hash values of the distribution key columns) and `BROADCAST` (stores a full copy of the table in each shard; keep replicated tables small).
`distributeColumns`	Yes	The distribution key columns. Accepts multiple columns. The distribution key cannot be changed after the table is created.
`storagePolicy`	Yes	The storage policy. Valid values: `HOT` (default), `COLD`, and `MIXED`. Case-insensitive. See Storage policies for details.

TBLPROPERTIES: optional properties

Property	Required	Description
`hotPartitionCount`	Required when `storagePolicy=MIXED`	The number of hot partitions for hybrid storage. Partitions are sorted by partition key value in descending order — the first N partitions are hot partitions stored on SSDs, and the remaining partitions are cold partitions stored in OSS. N must be a positive integer. Do not set this property for `HOT` or `COLD` storage policies.
`primaryKey`	No	The primary key columns. A primary key must include the distribution key and the partition key. Place the distribution key and partition key at the beginning of a composite primary key.
`partitionType`	No	The partition type. Set to `VALUE`.
`partitionColumn`	No	The partition key column (subpartition).
`partitionCount`	No	The partition lifecycle — the maximum number of subpartitions per shard. If you specify a partition key, you must also set `partitionCount`, otherwise an error is returned. For example, `'partitionCount'='365'` means each shard retains at most 365 subpartitions. If a date column is the partition key, data from day 1 is automatically deleted when new data arrives on day 366.
`indexName`	No	The index names, comma-separated. Example: `'indexName'='index1,index2'`.
`indexType`	No	The index type for a named index. Valid values: `FULLTEXT` (full-text index) and `NORMAL` (regular index). Example: `'indexType.index1'='FULLTEXT'`.
`indexColumn`	No	The indexed column for a named index. Example: `'indexColumn.index1'='c1'`.
`indexAll`	No	Specifies whether to index all columns. Valid values: `Y` (default) and `N`.
`tableEngineName`	No	The storage engine. Defaults to `XUANWU`.

Storage policies

Policy	Data location	Performance	Cost	Use when
`HOT` (default)	All partitions on SSDs	Best query performance	Highest	Data is frequently accessed
`COLD`	All partitions in OSS with zone-redundant storage (multi-AZ)	Lower	Cost-effective	Data is rarely accessed
`MIXED`	Hot partitions on SSDs; cold partitions in OSS	High for recent data	Balanced	Data access frequency varies by age; requires `hotPartitionCount`

Examples

Create a non-partitioned table

-- Non-partitioned table with HASH distribution and COLD storage
CREATE TABLE orders (
  order_id     BIGINT    NOT NULL COMMENT 'Order ID',
  customer_id  INT       NOT NULL COMMENT 'Customer ID',
  order_status STRING    NOT NULL COMMENT 'Order status',
  total_price  BIGINT    NOT NULL COMMENT 'Order amount',
  order_date   TIMESTAMP NOT NULL COMMENT 'Order date'
)
USING adb
TBLPROPERTIES (
  'distributeType'    = 'HASH',
  'distributeColumns' = 'order_status',
  'storagePolicy'     = 'COLD'
);

Create a partitioned table with a lifecycle

-- Partitioned table with primary key, selective indexing, and partition lifecycle
CREATE TABLE customer (
  customer_id     BIGINT    NOT NULL COMMENT 'Customer ID',
  customer_name   STRING    NOT NULL COMMENT 'Customer name',
  phone_num       BIGINT    NOT NULL COMMENT 'Phone number',
  city_name       STRING    NOT NULL COMMENT 'City',
  sex             INT       NOT NULL COMMENT 'Gender',
  id_number       STRING    NOT NULL COMMENT 'ID card number',
  home_address    STRING    NOT NULL COMMENT 'Home address',
  office_address  STRING    NOT NULL COMMENT 'Office address',
  age             INT       NOT NULL COMMENT 'Age',
  login_time      TIMESTAMP NOT NULL COMMENT 'Logon time'
)
USING adb
TBLPROPERTIES (
  'distributeType'    = 'HASH',
  'primaryKey'        = 'customer_id,phone_num,city_name',
  'distributeColumns' = 'city_name',
  'partitionType'     = 'value',
  'partitionColumn'   = 'city_name',
  'partitionCount'    = 100,
  'indexAll'          = 'N',
  'storagePolicy'     = 'COLD'
);

Data type mappings

Spark SQL type	XIHE SQL type
BOOLEAN	BOOLEAN
TINYINT	TINYINT
SMALLINT	SMALLINT
INT	INT
INTEGER	INTEGER
BIGINT	BIGINT
FLOAT	FLOAT
DOUBLE	DOUBLE
BINARY	BINARY
DATE	DATE
STRING	VARCHAR, TIME, POINT, JSON, ARRAY, MAP
TIMESTAMP	DATETIME, TIMESTAMP

Next steps

Reading and writing internal table data