Read and write RDS for PostgreSQL data by using Spark SQL
AnalyticDB for MySQL supports Spark SQL jobs that read from and write to ApsaraDB RDS for PostgreSQL databases over JDBC. This tutorial walks through setting up the JDBC driver, creating a temporary view, and running read and write queries against an RDS for PostgreSQL table.
Prerequisites
Before you begin, ensure that you have:
-
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster
-
A job resource group created for the cluster — Create a resource group
-
A database account for the AnalyticDB for MySQL cluster:
-
Alibaba Cloud account: a privileged account
-
Resource Access Management (RAM) user: a privileged account and a standard account, with the standard account associated with the RAM user
-
-
An RDS for PostgreSQL instance in the same VPC as the AnalyticDB for MySQL cluster
-
A database account for the RDS for PostgreSQL instance
-
A security group configured for the RDS for PostgreSQL instance, with rules that allow inbound and outbound access on the PostgreSQL port
-
The vSwitch network segment associated with the RDS for PostgreSQL instance added to the RDS for PostgreSQL whitelist
Step 1: Prepare data in RDS for PostgreSQL
In your RDS for PostgreSQL instance, create a schema named demo and a table named tbl, then insert sample rows:
CREATE SCHEMA demo;
CREATE TABLE demo.tbl (
id INTEGER,
first_name VARCHAR(32),
last_name VARCHAR(32),
age INTEGER
);
INSERT INTO demo.tbl VALUES(1, 'a', 'b', 5);
INSERT INTO demo.tbl VALUES(2, 'c', 'd', 6);
INSERT INTO demo.tbl VALUES(3, 'e', 'f', 7);
Step 2: Download and upload the PostgreSQL JDBC driver
Spark requires the PostgreSQL JDBC driver on the classpath to open a JDBC connection at runtime. Upload the driver to Object Storage Service (OSS) so that Spark can load it when the job starts.
-
Download the driver that matches your RDS for PostgreSQL version from the official website. This tutorial uses postgresql-42.7.7.jar.
-
Upload the JAR file to OSS.
Step 3: Submit the Spark SQL job
-
Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters, then click the target cluster ID.
-
In the left-side navigation pane, choose Job Development > SQL Development.
-
In the SQLConsole window, select the Spark engine and the job-specific resource group.
-
Enter the following statements. Replace the placeholders with your actual values.
-- Load the PostgreSQL JDBC driver from OSS ADD jar oss://<OSS_BUCKET_NAME>/postgresql-42.7.7.jar; -- Enable elastic network interface (ENI) access so Spark can reach the RDS for PostgreSQL instance SET spark.adb.eni.enabled=true; SET spark.adb.eni.vswitchId=<VSWITCH_ID>; SET spark.adb.eni.securityGroupId=<SECURITY_GROUP_ID>; -- Create a temporary view mapped to the RDS for PostgreSQL table CREATE TEMPORARY VIEW table_tmp USING org.apache.spark.sql.jdbc OPTIONS ( url 'jdbc:postgresql://<RDS_ENDPOINT>:5432/<DB_NAME>', driver 'org.postgresql.Driver', dbtable 'demo.tbl', user '<DB_USER>', password '<DB_PASSWORD>' ); -- Write: insert a new row into the RDS for PostgreSQL table through the view INSERT INTO table_tmp VALUES(4, 'e', 'f', 8); -- Read: query all rows from the RDS for PostgreSQL table through the view SELECT * FROM table_tmp;Replace each placeholder with the appropriate value:
Placeholder Description Example <OSS_BUCKET_NAME>Name of the OSS bucket where you uploaded the driver my-bucket<VSWITCH_ID>ID of the vSwitch associated with the RDS for PostgreSQL instance. Find it on the instance's Database Connection page by hovering over the VPC. vsw-bp1sxxsodv28ey5dl****<SECURITY_GROUP_ID>ID of the security group associated with the RDS for PostgreSQL instance sg-bp19mr685pmg4ihc****<RDS_ENDPOINT>Connection endpoint of the RDS for PostgreSQL instance pgm-t4n37****.pgsql.singapore.rds.aliyuncs.com<DB_NAME>Name of the target database test<DB_USER>Database account of the RDS for PostgreSQL instance myuser<DB_PASSWORD>Password for the database account — The following table describes the key parameters:
Parameter Description Scope ADD jarOSS path of the PostgreSQL JDBC driver JAR Read/Write spark.adb.eni.enabledEnables ENI-based network access. Set to true.Read/Write spark.adb.eni.vswitchIdvSwitch ID for the RDS for PostgreSQL instance Read/Write spark.adb.eni.securityGroupIdSecurity group ID associated with the RDS for PostgreSQL instance Read/Write table_tmpThe name of the temporary view. This tutorial uses table_tmpas an example.Read/Write USING org.apache.spark.sql.jdbcFixed value. Specifies the JDBC data source. Read/Write urlJDBC connection string. Format: jdbc:postgresql://<host>:5432/<db_name>Read/Write driverFixed value: org.postgresql.DriverRead/Write dbtableTable to access. Format: schema_name.table_nameRead/Write userDatabase account of the RDS for PostgreSQL instance Read/Write passwordPassword for the database account Read/Write -
Click Execute.
-
After the job completes, view the query results in Logs on the Applications tab of the Job Development > Spark JAR Development page. For details, see Spark Development Editor.