Spark SQL execution modes
AnalyticDB for MySQL supports executing Spark SQL in two modes: batch and interactive. Both modes connect directly to the AnalyticDB for MySQL metadata service, allowing you to read from and write to AnalyticDB for MySQL databases and tables. This topic describes the usage notes, use cases, features, and startup procedures for both execution modes.
Batch execution
Usage notes
-
When executing SQL in batch mode, begin with the
USE <database_name>;statement to select a database. -
When you specify a table in an SQL statement, you must use the
database_name.table_nameformat. -
When you run DML, DDL, or DQL statements in batch mode, the system returns only a success or failure status, not the data. A successful execution prints a sample of its results to the log. To view the data returned by an SQL statement, see View Spark application information.
Use cases
-
Executing multiple, interdependent SQL statements.
-
Enforcing strict resource isolation for SQL statements.
-
Processing large-scale data, such as running a one-time ETL job with SQL.
-
Uploading and iterating on complex third-party dependency packages.
Features
-
SQL submitted in batch mode runs in a dedicated Spark application, ensuring stability.
-
Supports independent configuration settings defined within an SQL statement, such as
SET spark.sql.adaptive.coalescePartitions.minPartitionSize = 2MB;. -
If a batch SQL statement contains a SELECT statement, the system prints a sample of the results to the log.
Startup method
In the SQL Console, select a job resource group and the Spark engine, enter your SQL statement, and click Run Now.
Interactive execution
Usage notes
-
In interactive mode, DML statements return a maximum of 1,000 rows.
-
DDL statements executed in interactive mode, such as
CREATE TABLE, return only a success or failure status, not data from the table. This behavior is consistent with open-source conventions. -
A Spark interactive resource group takes some time to start. If startup fails, wait a few moments and try again.
Use cases
-
Running ad hoc queries where a full result set is not required.
-
Executing a large number of DDL statements.
-
For immediate execution of DQL statements in environments where resource isolation and occasional failures are acceptable.
Features
-
Resource isolation is thread-level. Multiple users executing SQL in the same Spark application can interfere with each other.
-
You can apply thread-level configuration settings by using SQL statements.
-
Application-level configuration settings take effect only after you restart the Spark interactive resource group. To modify application-level settings, you must first stop the Spark interactive resource group, configure the parameters, and then restart the Spark interactive resource group.
Startup method
In the SQL Console, select a Spark interactive resource group and the Spark engine, enter your SQL statement, and click Run Now.