Read and write data to Delta external tables by using Spark SQL

更新时间:
复制 MD 格式

AnalyticDB for MySQL integrates with Delta, a data lake table format backed by Object Storage Service (OSS) that supports INSERT, UPDATE, and DELETE operations. This integration lets you run Spark SQL to read and write Delta external tables stored in OSS.

AnalyticDB for MySQL includes a built-in Delta package (version 2.0.2). If the built-in version does not meet your requirements, load a custom Delta package instead.

Prerequisites

Before you begin, make sure you have:

  • An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster

  • A job resource group for the cluster

  • A database account for the cluster

Limitations

  • The XIHE engine cannot read or write Delta tables.

  • AnalyticDB for MySQL Spark tracks the corresponding Delta version but does not troubleshoot Delta issues or incompatibility between Delta versions.

Read and write Delta external tables

Follow these steps to set up your environment, create a Delta external table, write data, and query results.

Step 1: Open the SQL editor

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters.

  2. Find the cluster you want to manage and click the cluster ID.

  3. In the left-side navigation pane, choose Job Development > SQL Development.

  4. On the SQLConsole 1 tab, select the Spark engine and a job resource group or a Spark interactive resource group.

Step 2: Configure the Delta environment (custom packages only)

Skip this step if you are using the built-in Delta package.

To avoid version incompatibility, use a Delta package version that is compatible with AnalyticDB for MySQL Spark 3.2.0.

Note

On the Clusters page, click the Data Lakehouse Edition tab, find the cluster you want to manage, and click the Cluster ID before proceeding.

Run the following statements to load your custom Delta JAR files and configure Spark:

-- Load custom Delta packages (upload the JARs to OSS first)
ADD JAR oss://<bucket_name>/path/to/delta-core_xx.jar;
ADD JAR oss://<bucket_name>/path/to/delta-storage-xx.jar;

-- Configure AnalyticDB for MySQL Spark to use the OSS connector
SET spark.adb.connectors=oss;

-- Enable Delta Lake integration with Spark
SET spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension;
SET spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog;

Replace <bucket_name> with your OSS bucket name and update the JAR file paths accordingly.

Step 3: Create an external database and a Delta external table

Note

You can run the following SQL statements in batch processing mode or interactive mode. For more information, see Spark SQL execution modes.

  1. Create a database. Skip this step if the database already exists.

     CREATE DATABASE IF NOT EXISTS external_delta_db
     LOCATION "oss://<bucket_name>/test/";

    Replace "oss://<bucket_name>/test/" with your OSS path.

  2. Create a Delta external table:

     CREATE TABLE IF NOT EXISTS external_delta_db.delta_test_tbl (
       id    INT,
       name  STRING,
       age   INT
     )
     USING delta
     PARTITIONED BY (age)
     LOCATION "oss://<bucket_name>/test/delta_test_tbl";

    Replace "oss://<bucket_name>/test/delta_test_tbl" with the OSS path where you want to store the table data.

Step 4: Write data

Note

You can run the following SQL statements in batch processing mode or interactive mode. For more information, see Spark SQL execution modes.

INSERT

Delta supports four insert patterns. Choose based on your use case:

Statement

Behavior

INSERT INTO

Appends rows without deduplication

INSERT OVERWRITE (no partition)

Replaces all existing data in the table

INSERT OVERWRITE ... PARTITION(col=val)

Replaces data in a specific static partition

INSERT OVERWRITE ... PARTITION(col)

Replaces data in the matching dynamic partitions

Append rows:

INSERT INTO external_delta_db.delta_test_tbl VALUES (1, 'lisa', 10), (2, 'jams', 10);

Replace all data in the table:

INSERT OVERWRITE external_delta_db.delta_test_tbl VALUES (2, 'zhangsan', 10), (4, 'lisi', 30);

Replace data in a static partition:

INSERT OVERWRITE external_delta_db.delta_test_tbl PARTITION (age=17) VALUES (3, 'anna');

Replace data in dynamic partitions:

INSERT OVERWRITE external_delta_db.delta_test_tbl PARTITION (age) VALUES (1, 'bom', 10);

UPDATE

UPDATE external_delta_db.delta_test_tbl SET name = 'box' WHERE id = 1;

This updates the name column to box for the row where id = 1.

DELETE

DELETE FROM external_delta_db.delta_test_tbl WHERE id = 1;

This deletes the row where id = 1.

Step 5: Query data

Note
  • You can run the following SQL statements in batch processing mode or interactive mode. For more information, see Spark SQL execution modes.

  • When you run a Spark SQL statement, the system returns only a message that indicates whether the execution succeeded or failed. No data is returned. View the table data in the Logs on the Application List tab of the Spark Jar Development page. For more information, see Spark editor.

SELECT * FROM external_delta_db.delta_test_tbl;