云原生数据仓库AnalyticDB MySQL版支持通过CREATE TABLE
创建表,也支持通过CTAS
将查询到的数据写入新表中。
语法
CREATE TABLE [IF NOT EXISTS] table_name
({column_name column_type [column_attributes] [ column_constraints ] [COMMENT 'string']
| table_constraints}
[, ... ] )
table_attribute
[partition_options]
[storage_policy]
[block_size]
[AS] query_expression
COMMENT 'string'
column_attributes:
[DEFAULT default_expr]
[AUTO_INCREMENT]
column_constraints:
[{NOT NULL|NULL} ]
[PRIMARY KEY]
table_constraints:
[{INDEX|KEY} [index_name] (column_name,...)]
[PRIMARY KEY [index_name] (column_name,...)]
[CLUSTERED KEY [index_name] (column_name,...)]
table_attribute:
DISTRIBUTED BY HASH(column_name,...) | DISTRIBUTED BY BROADCAST
partition_options:
PARTITION BY
{VALUE(column_name) | VALUE(date_format(column_name, ?))}
LIFECYCLE N
storage_policy:
STORAGE_POLICY= 'HOT'|'COLD'|'MIXED' [hot_partition_count=N]
block_size:
BLOCK_SIZE= VALUE
参数
参数 | 说明 |
---|---|
table_name | 表名。 表名以字母或下划线(_)开头,可包含字母、数字以及下划线(_),长度为1到127个字符。 支持 |
column_name | 列名。 列名以字母或下划线(_)开头,可包含字母、数字以及下划线(_),长度为1到127个字符。 |
column_type | 要添加的列的数据类型。 AnalyticDB MySQL版支持的数据类型,请参见基础数据类型。 |
column_attributes |
|
column_constraints |
|
table_constraints | 说明 云原生数据仓库AnalyticDB MySQL版不支持创建唯一索引。
|
table_attribute |
|
partition_options | PARTITION BY :普通表中定义分区。 通过 例如, AnalyticDB MySQL版不支持修改分区键。如果需要修改分区键,请参见更改分区键/分布键。 说明
|
storage_policy | 说明 目前仅弹性模式集群版(新版)实例支持冷热数据分层存储功能。
不同存储策略下数据读写性能不同,存储成本不同。为了降低数据存储成本,同时还要保证查询性能,您可以选择将查询频度高的数据(称为热数据)存储在SSD介质;将查询频度低的数据(称为冷数据)存储在HDD介质。 根据业务需求,您还可以按表粒度、表的二级分区粒度独立选择冷、热存储介质。例如,指定这个表数据全部存储在SSD,或者全部存储在HDD,或者指定这个表的一部分二级分区存储在SSD,另一部分二级分区存储在HDD。
|
hot_partition_count=N | 指定MIXED存储策略时热分区的个数。表示按分区键的值的大小倒序排列,最大N个分区为热分区,其他分区为冷分区。
|
block_size | 指定列式存储中每个block存储的Value的个数,也是最小的IO单元,默认值取值说明:
调大或调小会使得每次IO读取的Value个数变大或变小,具体产生的影响需要结合查询特征,例如点查询时,若block_size较大,存储读block的效率会降低,此时可以适当调小block_size。 重要 若不熟悉列式存储原理,建议不要进行更改。 |
注意事项
- 创建表时,AnalyticDB MySQL版集群默认编码格式为utf-8,相当于MySQL中的utf8mb4编码,暂不支持其他编码格式。
- 目前AnalyticDB MySQL版集群支持创建的最大表数目如下所示:
- 集群版:min(节点组数量*256,10000)。
- 基础版:
- T8,500。
- T16和T32,1500。
- T52,2500。
示例
- 新建普通表TEST,
id
为自增列,分布键为id
,按照id
值进行HASH分区。CREATE TABLE test ( id bigint auto_increment, name varchar, value int, ts timestamp ) DISTRIBUTED BY HASH(id);
- 新建普通表CUSTOMER,
customer_id
为分布键,login_time
为分区键,并按login_time
排序,数据保存天数为30天,则第31天写入数据后,系统会自动删除第1天写入的数据,login_time
、customer_id
、phone_num
为组合主键。CREATE TABLE customer ( customer_id bigint NOT NULL COMMENT '顾客ID', customer_name varchar NOT NULL COMMENT '顾客姓名', phone_num bigint NOT NULL COMMENT '电话', city_name varchar NOT NULL COMMENT '所属城市', sex int NOT NULL COMMENT '性别', id_number varchar NOT NULL COMMENT '身份证号码', home_address varchar NOT NULL COMMENT '家庭住址', office_address varchar NOT NULL COMMENT '办公地址', age int NOT NULL COMMENT '年龄', login_time timestamp NOT NULL COMMENT '登录时间', PRIMARY KEY (login_time,customer_id,phone_num) ) DISTRIBUTED BY HASH(customer_id) PARTITION BY VALUE(DATE_FORMAT(login_time, '%Y%m%d')) LIFECYCLE 30 COMMENT '客户信息表';
MySQL语法兼容性说明
AnalyticDB MySQL版标准建表语法中必须包含DISTRIBUTED BY ...
,而MySQL建表语法中没有DISTRIBUTED BY ...
。AnalyticDB MySQL版默认兼容MySQL建表语法,您可以根据实际情况通过以下两种方式处理DISTRIBUTED BY ...
不兼容问题。
- 如果MySQL表含有主键,AnalyticDB MySQL版默认将主键作为
DISTRIBUTED BY COLUMN
。CREATE TABLE t (c1 bigint, c2 int, c3 varchar, PRIMARY KEY(c1,c2));
SHOW CREATE TABLE t;
+-------+---------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------+ | t | Create Table `t` ( `c1` bigint, `c2` int, `c3` varchar, primary key (c1,c2) ) DISTRIBUTED BY HASH(`c1`,`c2`) INDEX_ALL='Y' | +-------+----------------------------------------------------------------------------+ 1 row in set (0.04 sec)
- 如果MySQL表不含主键,AnalyticDB MySQL版将添加一个
__adb_auto_id__
字段作为主键和DISTRIBUTED BY COLUMN
。CREATE TABLE t (c1 bigint, c2 int, c3 varchar);
SHOW CREATE TABLE t;
+-------+----------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+----------------------------------------------------------------------------------------------+ | t | Create Table `t` ( `c1` bigint, `c2` int, `c3` varchar, `__adb_auto_id__` bigint AUTO_INCREMENT, primary key (__adb_auto_id__) ) DISTRIBUTED BY HASH(`__adb_auto_id__`) INDEX_ALL='Y' | +-------+----------------------------------------------------------------------------------------------+ 1 row in set (0.04 sec)
创建表时指定冷热数据存储策略
语法
CREATE TABLE [IF NOT EXISTS] table_name
({column_name column_type [column_attributes] [ column_constraints ] [COMMENT 'string']
| table_constraints}
[, ... ] )
table_attribute
[partition_options]
[storage_policy]
[AS] query_expression
COMMENT 'string'
storage_policy:
STORAGE_POLICY= 'HOT'|'COLD'|'MIXED' [hot_partition_count=N]
示例
- 创建表时指定冷(COLD)存储策略
CREATE TABLE test_table ( L_ORDERKEY bigint NOT NULL, L_LINENUMBER int NOT NULL, L_SHIPDATE date NOT NULL, dummy varchar, primary key (l_orderkey,l_linenumber,l_shipdate) ) DISTRIBUTE BY HASH(l_orderkey) PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' STORAGE_POLICY='COLD';
- 创建表时指定热(HOT)存储策略
CREATE TABLE test_table ( L_ORDERKEY bigint NOT NULL, L_LINENUMBER int NOT NULL, L_SHIPDATE date NOT NULL, dummy varchar, primary key (l_orderkey,l_linenumber,l_shipdate) ) DISTRIBUTE BY HASH(l_orderkey) PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' STORAGE_POLICY='HOT';
- 创建表时指定混合(MIXED)存储策略,同时指定热分区数量为16个
CREATE TABLE test_table ( L_ORDERKEY bigint NOT NULL, L_LINENUMBER int NOT NULL, L_SHIPDATE date NOT NULL, dummy varchar, primary key (l_orderkey,l_linenumber,l_shipdate) ) DISTRIBUTE BY HASH(l_orderkey) PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' STORAGE_POLICY='MIXED' HOT_PARTITION_COUNT=16;
创建带向量索引的表
语法
ann index [index_name] (col_name,...)] [algorithm=HNSW_PQ ] [dis_function=SquaredL2]
示例
CREATE TABLE fact_tb (
xid bigint not null,
cid bigint not null,
uid varchar not null,
vid varchar not null,
wid varchar not null,
short_feature array < smallint >(4),
float_feature array < float >(4),
ann index short_feature_index(short_feature),
ann index float_feature_index(float_feature),
PRIMARY KEY (xid, cid, vid)
) DISTRIBUTE BY HASH(xid) PARTITION BY VALUE(cid) LIFECYCLE 4;
参数说明
short_feature/float_feature
:向量列的名称,用户自定义。array<float>(4)
:向量列的数据类型和向量的维数,用户自定义。必须要指定向量的维度,支持的类型包括: float、byte、short三种数组类型。表示对feature_data的列类型定义为512,示例如下:feature_data` array<float>(512)
ann
:系统关键字。index
:系统关键字。short_feature_index/float_feature_index
:索引名,用户自定义。对feature_data这一列创建向量索引:ann index ecnn_index(`FEATURE_DATA`) algorithm=HNSW_PQ dis_function=SquaredL2
algorithm、dis_function定义可以省略,默认值分别为HNSW_PQ、SquaredL2。
algorithm
:向量距离计算公式使用的算法。AnalyticDB MySQL版支持的向量距离计算公式算法如下表所示。算法 适用场景 适用数据类型 HNSW_PQ 适用于单表数据量在百万级别到千万级别之间,对向量维度敏感的中等规模数据量场景。 short[]、byte[]、float[] dis_function
:向量距离计算公式,默认值为SquaredL2。AnalyticDB MySQL版支持的向量距离计算公式如下表所示。距离计算公式 计算公式 适用数据类型 SquaredEuclidean(简称SquaredL2) (x1-y1)^2+(x2-y2)^2+… byte[]、short[]或者float[]
向表中插入数据
INSERT into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (1,2,'A','B','C','[1,1,1,1]','[1.2,1.5,2,3.0]');
INSERT into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (2,1,'e','v','f','[2,2,2,2]','[1.5,1.15,2.2,2.7]');
INSERT into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (0,6,'d','f','g','[3,3,3,3]','[0.2,1.6,5,3.7]');
INSERT into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (5,4,'j','b','h','[4,4,4,4]','[1.0,4.15,6,2.9]');
INSERT into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (8,5,'Sj','Hb','Dh','[5,5,5,5]','[1.3,4.5,6.9,5.2]');
insert into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (5,4,'x','g','h','[3,4,4,4]','[1.0,4.15,6,2.9]');
insert into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (5,4,'j','r','k','[6,6,4,4]','[1.0,4.15,6,2.9]');
insert into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (5,4,'s','i','q','[2,2,4,4]','[1.0,4.15,6,2.9]');
使用向量索引查询
- 查询short_feature与向量
'[1,1,1,1]'
距离最近的3条记录,按距离排序:
返回结果:SELECT xid, l2_distance(short_feature, '[1,1,1,1]') FROM fact_tb ORDER BY 2 LIMIT 3;
+-------+-------------------------------------------------------------+ | xid | l2_distance(short_feature, '[1,1,1,1]') | +-------+-------------------------------------------------------------+ | 1 | 0.0 | +-------+-------------------------------------------------------------+ | 2 | 4.0 | +-------+-------------------------------------------------------------+ | 0 | 16.0 | +-------+-------------------------------------------------------------+ 3 row in set (0.02 sec)
- 查询xid为5且cid为4,short_feature与向量
'[1,1,1,1]'
距离最近的4条记录,按距离排序:
返回结果:SELECT uid, l2_distance(short_feature, '[1,1,1,1]') FROM fact_tb WHERE xid = 5 AND cid = 4 ORDER BY 2 LIMIT 4;
+-------+-------------------------------------------------------------+ | uid | l2_distance(short_feature, '[1,1,1,1]') | +-------+-------------------------------------------------------------+ | s | 20.0 | +-------+-------------------------------------------------------------+ | x | 31.0 | +-------+-------------------------------------------------------------+ | j | 36.0 | +-------+-------------------------------------------------------------+ | j | 68.0 | +-------+-------------------------------------------------------------+ 4 row in set (0.04 sec)
- 查询short_feature与向量
'[1,1,1,1]'
距离最近的3条记录,按距离排序,且距离不能超过50:
返回结果:SELECT uid, l2_distance(short_feature, '[1,1,1,1]') FROM fact_tb WHERE l2_distance(short_feature, '[1,1,1,1]') < 50.0 AND xid = 5 ORDER BY 2 LIMIT 3;
+-------+-------------------------------------------------------------+ | uid | l2_distance(short_feature, '[1,1,1,1]') | +-------+-------------------------------------------------------------+ | s | 20.0 | +-------+-------------------------------------------------------------+ | x | 31.0 | +-------+-------------------------------------------------------------+ | j | 36.0 | +-------+-------------------------------------------------------------+ 3 row in set (0.05 sec)