Lindorm external tables

更新时间:
复制 MD 格式

This topic describes how to create a Lindorm external table in MaxCompute and map a Lindorm data source.

Introduction

Lindorm is a cloud-native multi-model database designed for scenarios such as the Internet of Things (IoT), internet services, and the Internet of Vehicles. It supports unified access to and integrated processing of diverse data types, including wide tables, time series, text, objects, streams, and spatial data. Lindorm is compatible with multiple standard interfaces, such as SQL, HBase/Cassandra/S3, TSDB, HDFS, Solr, and Kafka, and is one of the core databases that supports Alibaba's key services.

If a network connection is established between MaxCompute and Lindorm, you can create an external table that maps to a Lindorm table. This allows you to:

  • Synchronize data from a Lindorm table to MaxCompute for subsequent ETL processing.

  • Run a federated query by joining the table with other Lindorm tables.

  • Write data from MaxCompute to a Lindorm table.

Limitations

  • Region availability: This feature is only available in the following regions: China (Beijing), China (Shanghai), China (Zhangjiakou), China (Hangzhou), China (Shenzhen), China (Hong Kong), China (Shanghai) Finance Cloud Zone F, Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).

  • Lindorm external tables do not support the cluster property.

Considerations

When you read data from a source Lindorm table, mismatches in the column count are handled as follows: If the source table has fewer columns than defined in the external table's DDL, the missing columns are set to NULL. If the source table has more columns, the extra data is ignored.

Create a Lindorm external table

Syntax

When you create a Lindorm external table, you must specify HBaseStorageHandler in the CREATE EXTERNAL TABLE statement and configure the Lindorm instance.

Table and column names are case-insensitive. Forcing case conversion is not supported. The syntax is as follows:

-- Enable Hive-compatible mode.
SET odps.sql.hive.compatible = true;
-- Configure the odps.sql.hbase.version parameter to switch the HBase version to lindorm.
SET odps.sql.hbase.version=lindorm;
-- Create a Lindorm external table.
CREATE EXTERNAL TABLE if NOT EXISTS <mc_lindorm_external>
(
  <col_name1> <data_type>,
  <col_name2> <data_type>,
  ......
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' -- The handler for the Lindorm data source.
WITH SERDEPROPERTIES(
  'hbase.columns.mapping'='<:key,col-family:col-name1,col-family:col-name2,...>')
TBLPROPERTIES(
    -- The hbase.table.name and hbase.zookeeper.quorum parameters are required.
    'hbase.table.name'='',
    'hbase.zookeeper.quorum'='',
    -- Configure the hbase.client.username and hbase.client.password parameters.
    'hbase.client.username'='****','hbase.client.password'='****',
    -- The hbase.zookeeper.property.clientPort, mcfed.zookeeper.session.timeout, and hbase.client.retries.number parameters are optional.
    -- The default value of hbase.zookeeper.property.clientPort is 30020.
    ['hbase.zookeeper.property.clientPort'='{30020}',]
    ['mcfed.zookeeper.session.timeout'='{{VALUE}}',]
    ['hbase.client.retries.number'='{{VALUE}}',]
    'networklink'='{{NETWORKLINK_NAME}}'
);

Parameters

  • mc_lindorm_external: Required. The name of the Lindorm external table that you want to create.

  • col_name: Required. The name of a column in the Lindorm external table.

  • data_type: Required. The data type of the column.

  • WITH SERDEPROPERTIES:

    • 'hbase.columns.mapping'='<:key,col-family:col-name1,col-family:col-name2,...>': Required. The column mapping between the Lindorm external table and the source Lindorm table. For more information, see HBaseIntegration-ColumnMapping.

      • The row key (:key1[,:key2,:keyn]) must be specified first.

      • The column format is col-family:col-name1,col-family:col-name2. A single job can contain only one column family.

  • TBLPROPERTIES:

    • 'hbase.table.name'='<namespace:lindorm_table_name>': Required. The namespace and name of the source Lindorm table. If you do not specify a namespace, the default namespace is used. You must create the table in the Lindorm console in advance.

    • 'hbase.zookeeper.quorum'='<VPC ip1|host1:port,VPC ip2|host2:port...>': Required. The ZooKeeper connection string for the Lindorm instance's wide-table engine. You can find this value as the Java API access URL in the Lindorm console.

    • hbase.client.username: Required. To obtain the username, follow these steps:

      1. 登录Lindorm控制台,在左上角选择地域。

      2. Instances页面,单击目标实例名称,进入该实例详情页。

      3. 在左侧导航栏,选择数据库连接

      4. Wide Table Engine页签下,获取Default Username

    • hbase.client.password: Required. To obtain the password, follow these steps:

      1. 登录Lindorm控制台,在左上角选择地域。

      2. Instances页面,单击目标实例名称,进入该实例详情页。

      3. 在左侧导航栏,选择数据库连接

      4. Wide Table Engine页签下,获取Default Initial Password

    • 'hbase.zookeeper.property.clientPort'='{2181|30020}': Optional. The ZooKeeper port for Lindorm. The default port is 30020.

    • 'mcfed.zookeeper.session.timeout'='<value>': Optional. The ZooKeeper session timeout, in milliseconds. The recommended value is 30000.

    • 'hbase.client.retries.number'='<value>': Optional. The maximum number of connection retries.

    • 'networklink'='<networklink_name>': Required. The name of the MaxCompute network link used to access the VPC where the Lindorm instance resides.

    Note

    Parameters that start with mcfed.zookeeper or hbase are ZooKeeper-related configurations.

Example

  1. You have created a MaxCompute project and installed and configured the MaxCompute client.

  2. You have activated the Lindorm service and created a Lindorm instance.

  3. You have downloaded and configured Lindorm Shell. For more information, see Access the wide-table engine by using Lindorm Shell.

  4. Network connection

    • You have established a network connection between MaxCompute and Lindorm through a VPC. For more information about VPC connection solutions, see Network connection process.

    • This connection only allows MaxCompute to access the specified VPC. To access other VPCs or resources across regions, you must connect the VPCs by following the instructions in VPC access (direct connection over a leased line).

  5. Use Lindorm Shell to create a Lindorm table and insert data.

    1. Create the lindorm_test table.

      CREATE 'lindorm_test','f';
    2. Insert data into the lindorm_test table.

      put 'lindorm_test', '1122', 'f:col1', 'hello';
      put 'lindorm_test', '1122', 'f:col2', 'hbase';
    3. Query data from the lindorm_test table.

      scan 'lindorm_test';

      The following result is returned:

      ROW                                 COLUMN+CELL
       1122                              column=f:col1, timestamp=1634201741745, value=hello
       1122                              column=f:col2, timestamp=1634201835934, value=hbase
      1 row(s)
      Took 0.0766 seconds
  6. Create a MaxCompute external table and map it to the Lindorm data source.

    1. Run the following command on the MaxCompute client to create a Lindorm external table named lindorm_test_ext.

      SET odps.sql.hive.compatible = true;
      SET odps.sql.hbase.version=lindorm;
      CREATE EXTERNAL TABLE lindorm_test_ext
      (
        row_key  INT,
        col1 STRING,
        col2 STRING
      )
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES (
        'hbase.columns.mapping'=':key,f:col1,f:col2')
      TBLPROPERTIES (
        'hbase.table.name'='lindorm_test',
        'hbase.zookeeper.quorum'='ld-2z****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020',
        'hbase.client.username'='<user_name>',
        'hbase.client.password'='***',
        'networklink'='<your network link name>');
    2. Query data from the external table lindorm_test_ext.

      SET odps.sql.hive.compatible = true;
      SET odps.sql.hbase.version=lindorm;
      SELECT * FROM lindorm_test_ext;
      -- The following result is returned:
      +------------+------------+----------+
      | key        | col1       | col2     |
      +------------+------------+----------+
      | 1122       | hello      | hbase    |  
      +------------+------------+----------+
    3. Insert data into the external table lindorm_test_ext.

      SET odps.sql.hive.compatible = true;
      SET odps.sql.hbase.version = lindorm;
      INSERT OVERWRITE TABLE lindorm_test_ext SELECT '1123', 'lindorm', 'mftestinsert';
    4. Query data from the external table lindorm_test_ext again.

      SET odps.sql.hive.compatible = true;
      SET odps.sql.hbase.version=lindorm;
      SELECT * FROM lindorm_test_ext;
      -- The following result is returned:
      +------------+------------+-------------+
      | key        | col1       | col2        |
      +------------+------------+-------------+
      | 1122       | hello      | hbase       | 
      | 1123       | lindorm    | mftestinsert|   
      +------------+------------+-------------+
    5. Use the scan statement to query data from the Lindorm table lindorm_test.

      scan 'lindorm_test';

      The following result is returned:

      ROW                                 COLUMN+CELL
       1122                              column=f:col1, timestamp=1634201741745, value=hello
       1122                              column=f:col2, timestamp=1634201835934, value=hbase
       1123                              column=f:col1, timestamp=1634552508947, value=lindorm
       1123                              column=f:col2, timestamp=1634552508947, value=mftestinsert
      2 row(s)
      Took 0.0683 seconds