This topic describes how to create a Lindorm external table in MaxCompute and map a Lindorm data source.
Introduction
Lindorm is a cloud-native multi-model database designed for scenarios such as the Internet of Things (IoT), internet services, and the Internet of Vehicles. It supports unified access to and integrated processing of diverse data types, including wide tables, time series, text, objects, streams, and spatial data. Lindorm is compatible with multiple standard interfaces, such as SQL, HBase/Cassandra/S3, TSDB, HDFS, Solr, and Kafka, and is one of the core databases that supports Alibaba's key services.
If a network connection is established between MaxCompute and Lindorm, you can create an external table that maps to a Lindorm table. This allows you to:
-
Synchronize data from a Lindorm table to MaxCompute for subsequent ETL processing.
-
Run a federated query by joining the table with other Lindorm tables.
-
Write data from MaxCompute to a Lindorm table.
Limitations
-
Region availability: This feature is only available in the following regions: China (Beijing), China (Shanghai), China (Zhangjiakou), China (Hangzhou), China (Shenzhen), China (Hong Kong), China (Shanghai) Finance Cloud Zone F, Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).
-
Lindorm external tables do not support the cluster property.
Considerations
When you read data from a source Lindorm table, mismatches in the column count are handled as follows: If the source table has fewer columns than defined in the external table's DDL, the missing columns are set to NULL. If the source table has more columns, the extra data is ignored.
Create a Lindorm external table
Syntax
When you create a Lindorm external table, you must specify HBaseStorageHandler in the CREATE EXTERNAL TABLE statement and configure the Lindorm instance.
Table and column names are case-insensitive. Forcing case conversion is not supported. The syntax is as follows:
-- Enable Hive-compatible mode.
SET odps.sql.hive.compatible = true;
-- Configure the odps.sql.hbase.version parameter to switch the HBase version to lindorm.
SET odps.sql.hbase.version=lindorm;
-- Create a Lindorm external table.
CREATE EXTERNAL TABLE if NOT EXISTS <mc_lindorm_external>
(
<col_name1> <data_type>,
<col_name2> <data_type>,
......
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' -- The handler for the Lindorm data source.
WITH SERDEPROPERTIES(
'hbase.columns.mapping'='<:key,col-family:col-name1,col-family:col-name2,...>')
TBLPROPERTIES(
-- The hbase.table.name and hbase.zookeeper.quorum parameters are required.
'hbase.table.name'='',
'hbase.zookeeper.quorum'='',
-- Configure the hbase.client.username and hbase.client.password parameters.
'hbase.client.username'='****','hbase.client.password'='****',
-- The hbase.zookeeper.property.clientPort, mcfed.zookeeper.session.timeout, and hbase.client.retries.number parameters are optional.
-- The default value of hbase.zookeeper.property.clientPort is 30020.
['hbase.zookeeper.property.clientPort'='{30020}',]
['mcfed.zookeeper.session.timeout'='{{VALUE}}',]
['hbase.client.retries.number'='{{VALUE}}',]
'networklink'='{{NETWORKLINK_NAME}}'
);
Parameters
-
mc_lindorm_external: Required. The name of the Lindorm external table that you want to create.
-
col_name: Required. The name of a column in the Lindorm external table.
-
data_type: Required. The data type of the column.
-
WITH SERDEPROPERTIES:
-
'hbase.columns.mapping'='<:key,col-family:col-name1,col-family:col-name2,...>': Required. The column mapping between the Lindorm external table and the source Lindorm table. For more information, see HBaseIntegration-ColumnMapping.
-
The row key (
:key1[,:key2,:keyn]) must be specified first. -
The column format is
col-family:col-name1,col-family:col-name2. A single job can contain only one column family.
-
-
-
TBLPROPERTIES:
-
'hbase.table.name'='<namespace:lindorm_table_name>': Required. The namespace and name of the source Lindorm table. If you do not specify a namespace, the
defaultnamespace is used. You must create the table in the Lindorm console in advance. -
'hbase.zookeeper.quorum'='<VPC ip1|host1:port,VPC ip2|host2:port...>': Required. The ZooKeeper connection string for the Lindorm instance's wide-table engine. You can find this value as the Java API access URL in the Lindorm console.
-
hbase.client.username: Required. To obtain the username, follow these steps:
-
登录Lindorm控制台,在左上角选择地域。
-
在Instances页面,单击目标实例名称,进入该实例详情页。
-
在左侧导航栏,选择数据库连接。
-
在Wide Table Engine页签下,获取Default Username。
-
-
hbase.client.password: Required. To obtain the password, follow these steps:
-
登录Lindorm控制台,在左上角选择地域。
-
在Instances页面,单击目标实例名称,进入该实例详情页。
-
在左侧导航栏,选择数据库连接。
-
在Wide Table Engine页签下,获取Default Initial Password。
-
-
'hbase.zookeeper.property.clientPort'='{2181|30020}': Optional. The ZooKeeper port for Lindorm. The default port is 30020.
-
'mcfed.zookeeper.session.timeout'='<value>': Optional. The ZooKeeper session timeout, in milliseconds. The recommended value is 30000.
-
'hbase.client.retries.number'='<value>': Optional. The maximum number of connection retries.
-
'networklink'='<networklink_name>': Required. The name of the MaxCompute network link used to access the VPC where the Lindorm instance resides.
NoteParameters that start with mcfed.zookeeper or hbase are ZooKeeper-related configurations.
-
Example
-
You have created a MaxCompute project and installed and configured the MaxCompute client.
-
You have activated the Lindorm service and created a Lindorm instance.
-
You have downloaded and configured Lindorm Shell. For more information, see Access the wide-table engine by using Lindorm Shell.
-
Network connection
-
You have established a network connection between MaxCompute and Lindorm through a VPC. For more information about VPC connection solutions, see Network connection process.
-
This connection only allows MaxCompute to access the specified VPC. To access other VPCs or resources across regions, you must connect the VPCs by following the instructions in VPC access (direct connection over a leased line).
-
-
Use Lindorm Shell to create a Lindorm table and insert data.
-
Create the
lindorm_testtable.CREATE 'lindorm_test','f'; -
Insert data into the
lindorm_testtable.put 'lindorm_test', '1122', 'f:col1', 'hello'; put 'lindorm_test', '1122', 'f:col2', 'hbase'; -
Query data from the
lindorm_testtable.scan 'lindorm_test';The following result is returned:
ROW COLUMN+CELL 1122 column=f:col1, timestamp=1634201741745, value=hello 1122 column=f:col2, timestamp=1634201835934, value=hbase 1 row(s) Took 0.0766 seconds
-
-
Create a MaxCompute external table and map it to the Lindorm data source.
-
Run the following command on the MaxCompute client to create a Lindorm external table named
lindorm_test_ext.SET odps.sql.hive.compatible = true; SET odps.sql.hbase.version=lindorm; CREATE EXTERNAL TABLE lindorm_test_ext ( row_key INT, col1 STRING, col2 STRING ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping'=':key,f:col1,f:col2') TBLPROPERTIES ( 'hbase.table.name'='lindorm_test', 'hbase.zookeeper.quorum'='ld-2z****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020', 'hbase.client.username'='<user_name>', 'hbase.client.password'='***', 'networklink'='<your network link name>'); -
Query data from the external table
lindorm_test_ext.SET odps.sql.hive.compatible = true; SET odps.sql.hbase.version=lindorm; SELECT * FROM lindorm_test_ext; -- The following result is returned: +------------+------------+----------+ | key | col1 | col2 | +------------+------------+----------+ | 1122 | hello | hbase | +------------+------------+----------+ -
Insert data into the external table
lindorm_test_ext.SET odps.sql.hive.compatible = true; SET odps.sql.hbase.version = lindorm; INSERT OVERWRITE TABLE lindorm_test_ext SELECT '1123', 'lindorm', 'mftestinsert'; -
Query data from the external table
lindorm_test_extagain.SET odps.sql.hive.compatible = true; SET odps.sql.hbase.version=lindorm; SELECT * FROM lindorm_test_ext; -- The following result is returned: +------------+------------+-------------+ | key | col1 | col2 | +------------+------------+-------------+ | 1122 | hello | hbase | | 1123 | lindorm | mftestinsert| +------------+------------+-------------+ -
Use the
scanstatement to query data from the Lindorm tablelindorm_test.scan 'lindorm_test';The following result is returned:
ROW COLUMN+CELL 1122 column=f:col1, timestamp=1634201741745, value=hello 1122 column=f:col2, timestamp=1634201835934, value=hbase 1123 column=f:col1, timestamp=1634552508947, value=lindorm 1123 column=f:col2, timestamp=1634552508947, value=mftestinsert 2 row(s) Took 0.0683 seconds
-