Read Lindorm data using Spark SQL-AnalyticDB(AnalyticDB)-阿里云帮助中心

Prerequisites

Create a single-zone Lindorm instance. For more information, see Create an instance.

Important
You must activate LindormTable, Lindorm Distributed Processing System (LDPS), and LindormDFS for the Lindorm instance. To activate LDPS, see Activate a service.
Activate Object Storage Service (OSS), and create a bucket and a directory. For more information, see Activate OSS, Create a bucket, and Manage directories.
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster is created.
A job resource group is created for the AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster.
A database account is created for the AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster.

Preparations

In the Lindorm console, go to the Instance Details page to obtain the vSwitch ID.
In the Lindorm console, go to the Access Control page. Click the Security Group tab to view the security group ID of your Lindorm instance. If no security group is added, see Add a security group.

Accessing Hive data in Lindorm

Obtain the hdfs-site configuration information for your HDFS client.

Note
To obtain the hdfs-site configuration information, contact Lindorm technical support (DingTalk ID: s0s3eg3).
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Job Development > SQL Development.
In the SQLConsole window, select the Spark engine and a job resource group.

In the SQL Console window, enter the following commands:

SET spark.adb.connectors = oss,external_hive;
SET spark.adb.eni.enabled=true;
SET spark.adb.eni.vswitchId=vsw-bp14pj8h0k5p0kwu3****;
SET spark.adb.eni.securityGroupId=sg-bp11m93k021tp4ca****;
SET spark.sql.catalog.lindorm_catalog = org.apache.spark.sql.hive.V2HiveExternalCatalog;
SET spark.adb.eni.adbHostAlias.enabled=true;
SET spark.sql.catalogImplementation = hive;
-- HDFS configurations
SET spark.sql.catalog.lindorm_catalog.spark.hadoop.hive.metastore.uris=thrift://ld-bp1ttz8833x9c****-proxy-ldps-hms.lindorm.aliyuncs.com:9083;
SET spark.hadoop.dfs.nameservices=ld-bp1ttz8833x9c****;
SET spark.hadoop.dfs.client.failover.proxy.provider.ld-bp1ttz8833x9c****=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider;
SET spark.hadoop.dfs.ha.automatic-failover.enabled=true;
SET spark.hadoop.dfs.ha.namenodes.ld-bp1ttz8833x9c****=nn1,nn2;
SET spark.hadoop.dfs.namenode.rpc-address.ld-bp1ttz8833x9c****.nn1=ld-bp1ttz8833x9c-master1-001.lindorm.rds.aliyuncs.com:8020;
SET spark.hadoop.dfs.namenode.rpc-address.ld-bp1ttz8833x9c****.nn2=ld-bp1ttz8833x9c****-master2-001.lindorm.rds.aliyuncs.com:8020;
SET spark.hadoop.dfs.namenode.http-address.ld-bp1ttz8833x9c****.nn1=ld-bp1ttz8833x9c****-master1-001.lindorm.rds.aliyuncs.com:50071;
SET spark.hadoop.dfs.namenode.http-address.ld-bp1ttz8833x9c****.nn2=ld-bp1ttz8833x9c****-master2-001.lindorm.rds.aliyuncs.com:50071;

-- Create an OSS external database and table
CREATE DATABASE external_oss_db location 'oss://testBucketName/warehouse/db';
CREATE TABLE external_oss_db.test(id int, name string) using parquet location 'oss://testBucketName/warehouse/db/test';

-- Insert data from the Hive table into the OSS external table
INSERT INTO external_oss_db.test SELECT * FROM lindorm_catalog.spark_test.test;

-- Read data from the OSS external table
SELECT * FROM external_oss_db.test;

Parameter	Required	Description
spark.adb.connectors	Yes	The names of the built-in connectors of AnalyticDB for MySQL Spark that you want to enable. Separate multiple names with commas (,). Valid values: oss, hudi, delta, adb, odps, external_hive, jindo, and default. This example uses the `oss,external_hive` connectors.
spark.adb.eni.enabled	Yes	Specifies whether to enable Elastic Network Interface (ENI). If you use external tables to access other external data sources, you must enable ENI. Valid values: true: Enable false: Disable In this example, set this parameter to `true`.
spark.adb.eni.vswitchId	Yes	The vSwitch ID of the Lindorm instance that you obtained in the "Preparations" section.
spark.adb.eni.securityGroupId	Yes	The security group ID of the Lindorm instance that you obtained in the "Preparations" section.
spark.sql.catalog.lindorm_catalog	Yes	The implementation class for the data source catalog. Set this parameter to `org.apache.spark.sql.hive.V2HiveExternalCatalog`. Note You can customize the `catalog_name` part of the parameter name. This example uses `lindorm_catalog`.
spark.adb.eni.adbHostAlias.enabled	Yes	Specifies whether to automatically write the domain name resolution information that AnalyticDB for MySQL requires to a mapping table of domain names and IP addresses. Valid values: true: Enable. false: Disable. If you use an ENI to read data from or write data to EMR Hive, you must set this parameter to true.
HDFS configurations	Yes	The `hdfs-site` configuration information that you obtained in Step 1. The format is `set spark.hadoop.<name>=<value>`. Note Replace <name> and <value> with the actual values from your configuration information.
location	Yes	The OSS storage path for the external database and external table. The format is `oss://<bucketname/database/table>`. This example uses `oss://testBucketName/warehouse/db/test`. Note Replace `<bucketname/database/table>` with your actual OSS path.

Click Execute.
After the Spark job completes successfully, go to the Spark JAR Development page. On the Applications tab, find the job and click Logs in the Actions column to view the results. For more information, see Spark development editor.

Accessing data in Lindorm wide tables

Obtain the LindormTable VPC endpoint for access using the HBase Java API. For more information, see View connection endpoints.
Download and decompress the JAR packages required to connect to the Lindorm instance, and then upload the extracted JAR packages to OSS. For more information, see Upload files.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Job Development > SQL Development.
In the SQLConsole window, select the Spark engine and a job resource group.

In the SQL Console window, enter the following commands:

SET spark.adb.eni.enabled=true;
SET spark.adb.eni.vswitchId=vsw-bp14pj8h0k5p0kwu3****;
SET spark.adb.eni.securityGroupId=sg-bp11m93k021tp4ca****;
SET spark.sql.catalog.lindorm_table = com.alibaba.lindorm.ldspark.datasources.v2.LindormCatalog;
SET spark.sql.catalog.lindorm_table.url=ld-bp1ttz8833x9c****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020;
SET spark.sql.catalog.lindorm_table.username=test;
SET spark.sql.catalog.lindorm_table.password=password;
add jar "oss://<bucketname>/hbase_jars/*";

-- Create an OSS external database and table
CREATE DATABASE IF NOT exists external_oss_db location 'oss://testBucketName/warehouse/db';
CREATE TABLE IF NOT exists external_oss_db.lindorm_table_2(id string, name string) using parquet location 'oss://testBucketName/warehouse/db/lindorm_table_2';

-- Insert data from the Lindorm wide table into the OSS external table
INSERT INTO external_oss_db.lindorm_table_2 SELECT * FROM lindorm_table.spark_test.tb;

-- Read data from the OSS external table
SELECT * FROM external_oss_db.lindorm_table_2;

Parameter description:

Parameter	Required	Description
spark.adb.eni.enabled	Yes	Specifies whether to enable Elastic Network Interface (ENI). If you use external tables to access other external data sources, you must enable ENI. Valid values: true: Enable false: Disable In this example, set this parameter to `true`.
spark.adb.eni.vswitchId	Yes	The vSwitch ID of the Lindorm instance that you obtained in the "Preparations" section.
spark.adb.eni.securityGroupId	Yes	The security group ID of the Lindorm instance that you obtained in the "Preparations" section.
spark.sql.catalog.lindorm_table	Yes	The implementation class for the data source catalog. Set this parameter to `com.alibaba.lindorm.ldspark.datasources.v2.LindormCatalog`. Note You can customize the `catalog_name` part of the parameter name. This example uses `lindorm_table`.
spark.sql.catalog.lindorm_table.url	Yes	The VPC endpoint for access using the HBase Java API that you obtained in Step 1.
spark.sql.catalog.lindorm_table.username	Yes	The username and password for LindormTable. If you forget your username or password, you can change the password in the cluster management system of LindormTable. For more information, see Change a user password.
spark.sql.catalog.lindorm_table.password	Yes
jar	Yes	The OSS path to the directory where you uploaded the JAR packages in Step 2. Use `/` to include all JARs in the directory. Example: `oss://testBucketName/hbase_jars/`.
location	Yes	The OSS storage path for the external database and external table. The format is `oss://<bucketname/database/table>`. This example uses `oss://testBucketName/warehouse/db/lindorm_table_2`. Note Replace `<bucketname/database/table>` with your actual OSS path.

Click Execute.
After the Spark job completes successfully, go to the Spark JAR Development page. On the Applications tab, find the job and click Logs in the Actions column to view the results. For more information, see Spark development editor.