Data forwarding and federated computing between Lindorm and external data sources-Lindorm(Lindorm)-阿里云帮助中心

The Lindorm compute engine can access external Java Database Connectivity (JDBC) data sources. You can use Spark SQL to perform data forwarding and federated computing between Lindorm and external data sources.

Prerequisites

The compute engine service is enabled for your Lindorm instance. For more information, see Activate the service.
Ensure that access permissions are enabled for the external data source and that it is in the same virtual private cloud (VPC) as the Lindorm instance.

Parameter settings

An external data source is mapped to a Catalog in the Lindorm compute engine. The Catalog acts as an identifier for the data source. You must specify a custom name for the Catalog. The following table describes the parameters for a Spark SQL job that uses hello_jdbc as the Catalog name.

Configuration item	Description	Example value
spark.sql.catalog.hello_jdbc	Declares an external data source. The value must be `org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog`.	org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog
spark.sql.catalog.hello_jdbc.url	The JDBC URL of the external data source. The Lindorm compute engine currently supports MySQL as an external data source. The compute engine includes the `mysql-connector-java-8.0.27.jar` connector.	jdbc:mysql://192.168.11.2:3306
spark.sql.catalog.hello_jdbc.user	The username to access the external data source.	testUser
spark.sql.catalog.hello_jdbc.password	The password to access the external data source.	testPassword****

Note

For more information about configuration items, see Data Source Option.

Access data

Configure and submit Spark jobs for an external data source using one of the following methods. Then, use Spark SQL to access the data source.

JDBC developer guide
Best practices for developing JAR jobs
Developing Python Jobs

Example 1: Read data from the external JDBC table `db0.mysql_tbl0`, join the data with the Hive table `hive_tbl0`, and write the result to the Hive table `hive_tbl1`.

INSERT INTO spark_catalog.default.hive_tbl1
SELECT A.col0, A.col1, A.col2, B.col1
FROM hello_jdbc.db0.mysql_tbl0 A JOIN spark_catalog.default.hive_tbl0 B
ON A.col0 = B.col0;

Example 2: Filter the Hive table `hive_tbl1` and write the result to the external JDBC table `db0.mysql_tbl1`.

INSERT INTO hello_jdbc.db0.mysql_tbl1
SELECT col0, col1, col2
FROM spark_catalog.default.hive_tbl1
WHERE col0 > 12345;

For more information, see JDBC To Other Databases.