Access a JDBC external table

更新时间:
复制 MD 格式

The Lindorm compute engine can access external Java Database Connectivity (JDBC) data sources. You can use Spark SQL to perform data forwarding and federated computing between Lindorm and external data sources.

Prerequisites

  • The compute engine service is enabled for your Lindorm instance. For more information, see Activate the service.

  • Ensure that access permissions are enabled for the external data source and that it is in the same virtual private cloud (VPC) as the Lindorm instance.

Parameter settings

An external data source is mapped to a Catalog in the Lindorm compute engine. The Catalog acts as an identifier for the data source. You must specify a custom name for the Catalog. The following table describes the parameters for a Spark SQL job that uses hello_jdbc as the Catalog name.

Configuration item

Description

Example value

spark.sql.catalog.hello_jdbc

Declares an external data source. The value must be org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog.

org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog

spark.sql.catalog.hello_jdbc.url

The JDBC URL of the external data source. The Lindorm compute engine currently supports MySQL as an external data source. The compute engine includes the `mysql-connector-java-8.0.27.jar` connector.

jdbc:mysql://192.168.11.2:3306

spark.sql.catalog.hello_jdbc.user

The username to access the external data source.

testUser

spark.sql.catalog.hello_jdbc.password

The password to access the external data source.

testPassword****

Note

For more information about configuration items, see Data Source Option.

Access data

Configure and submit Spark jobs for an external data source using one of the following methods. Then, use Spark SQL to access the data source.

  • JDBC developer guide

  • Best practices for developing JAR jobs

  • Developing Python Jobs

  • Example 1: Read data from the external JDBC table `db0.mysql_tbl0`, join the data with the Hive table `hive_tbl0`, and write the result to the Hive table `hive_tbl1`.

    INSERT INTO spark_catalog.default.hive_tbl1
    SELECT A.col0, A.col1, A.col2, B.col1
    FROM hello_jdbc.db0.mysql_tbl0 A JOIN spark_catalog.default.hive_tbl0 B
    ON A.col0 = B.col0;
  • Example 2: Filter the Hive table `hive_tbl1` and write the result to the external JDBC table `db0.mysql_tbl1`.

    INSERT INTO hello_jdbc.db0.mysql_tbl1
    SELECT col0, col1, col2
    FROM spark_catalog.default.hive_tbl1
    WHERE col0 > 12345;

For more information, see JDBC To Other Databases.