Spark
This topic answers frequently asked questions about AnalyticDB for MySQL Spark.
FAQ overview
What do I do if the "No space left on device" error occurs when I run a Spark application?
What do I do if a ClassNotFound error appears in Spark application logs?
What do I do if a NoSuchMethod error appears in Spark application logs?
Why does a Spark executor node become Dead when running a Spark application?
Why does a network connection failure occur when Spark accesses an external data source?
Why does a Spark application report an "oss object 403" error?
How do I find the reason why a Spark application runs slowly?
What do I do if a Spark application gets stuck when creating a User-Defined Function (UDF)?
How do I configure Spark Driver/Executor environment variables?
How do I view Spark application information?
On the Spark JAR Development page, search by Application ID to view details about a Spark application. For more information, see Spark editor.
What do I do if the "User %s do not have right permission [ * ] to resource [ * ]" error occurs when I submit a Spark application?
The Resource Access Management (RAM) user lacks permission to call the operation. Grant the required permissions:
Log on to the RAM console and grant the RAM user permission for the resource specified in the error message.
Grant the RAM user the AliyunADBFullAccess and AliyunADBSparkProcessingDataRole permissions, plus read and write permissions on AnalyticDB for MySQL databases and tables. For more information, see Account authorization.
What do I do if the "No space left on device" error occurs when I run a Spark application?
The local disks of the executors have run out of space. To diagnose and fix this:
On the Applications tab of the Spark JAR Development page, click UI in the Actions column for the Spark application.
On the Executors tab, check the stderr log for the affected executor to confirm insufficient disk space.
Increase the executor disk capacity by setting the spark.adb.executorDiskSize parameter. For more information, see Spark application configuration parameters.
The maximum local disk capacity per executor is 100 GiB. If the error persists at 100 GiB, increase the number of executors instead.
What do I do if the "Current Query Elastic Resource Failed" error occurs when I submit a Spark application?
The database account used to submit the XIHE BSP application over Java Database Connectivity (JDBC) is bound to a Job resource group, which slows resource allocation and causes the application to time out. Disassociate the database account from the Job resource group to resolve the issue.
What do I do if a ClassNotFound error appears in Spark application logs?
A class is missing from the JAR package uploaded when the Spark application was submitted. Open the JAR package to check whether the class exists:
Business-related class: Repackage the application to include the missing class.
Third-party JAR: Use one of the following approaches:
Use the Shade or Assembly plugin for Maven to manage dependencies. Add the required dependency packages manually.
Upload the third-party JAR to OSS and configure the jars parameter when submitting the Spark application. For more information, see Introduction to Spark application development.
What do I do if a NoSuchMethod error appears in Spark application logs?
The JAR package you imported conflicts with Spark. Add the following configurations to the Spark conf parameters to log the JAR packages involved in the class loading process, which helps identify the conflicting packages:
"spark.executor.extraJavaOptions":"-verbose:class",
"spark.driver.extraJavaOptions":"-verbose:class"
Once you identify the conflicting packages, use Maven conflict resolution methods such as Provided and Relocation to resolve the error. For more information about conf parameters, see Spark application configuration parameters.
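With -verbose:class enabled, the logs can get long. The following is a minimal Python sketch for finding which JAR supplied the classes of one package. It assumes the two common JVM log layouts (the Java 8 "[Loaded ... from ...]" form and the Java 9+ "[class,load] ... source: ..." form). The function name class_sources and the sample log lines, including the JAR paths, are made up for illustration.

```python
import re
from collections import defaultdict

# Java 8 layout:  [Loaded com.example.Foo from file:/path/foo.jar]
JAVA8 = re.compile(r"\[Loaded (?P<cls>\S+) from (?P<src>[^\]]+)\]")
# Java 9+ layout: [0.123s][info][class,load] com.example.Foo source: file:/path/foo.jar
JAVA9 = re.compile(r"\[class,load\s*\] (?P<cls>\S+) source: (?P<src>.+)$")


def class_sources(log_lines, package_prefix):
    """Map each loaded class under package_prefix to the sources it was loaded from."""
    found = defaultdict(set)
    for line in log_lines:
        match = JAVA8.search(line) or JAVA9.search(line)
        if match and match.group("cls").startswith(package_prefix):
            found[match.group("cls")].add(match.group("src").strip())
    return dict(found)


# Invented sample input; real lines come from the driver or executor logs.
sample_log = [
    "[Loaded com.google.common.base.Preconditions from file:/opt/spark/jars/guava-14.0.1.jar]",
    "[0.512s][info][class,load] com.google.common.collect.Lists source: file:/tmp/app/guava-31.1-jre.jar",
    "[Loaded java.lang.Object from /usr/lib/jvm/jre/lib/rt.jar]",
]

for cls, sources in sorted(class_sources(sample_log, "com.google.common").items()):
    print(cls, "<-", ", ".join(sorted(sources)))
```

If classes from the same package come from different JAR files, those JAR files are good candidates for the conflict.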
What do I do if the "ClassNotFoundException: org.apache.hadoop.hive.serde2.JsonSerDe" error occurs when a Spark SQL application reads a JSON external table?
Download hive-serde-3.1.2.jar and hive-hcatalog-core-2.3.9.jar, upload them to OSS, then add the following statements when submitting the Spark SQL application:
add jar oss://<testBucketName>/hive-hcatalog-core-2.3.9.jar;
add jar oss://<testBucketName>/hive-serde-3.1.2.jar;
Replace <testBucketName> with the name of the OSS bucket that stores the JAR files, and adjust the paths if the files are not in the bucket root.
What do I do if the "No such file or directory" error occurs when a Spark SQL application reads an internal table?
By default, hot data for internal tables is stored on worker nodes. Spark reads data offline from OSS, so if the hot data has not been written to OSS, the Spark SQL query fails.
Before querying hot data from an internal table, use the XIHE engine to run the following SQL statements, then manually build the table:
SET ADB_CONFIG CSTORE_HOT_TABLE_ALLOW_SINGLE_REPLICA_BUILD=true;
SET ADB_CONFIG ELASTIC_ENABLE_HOT_PARTITION_HAS_HDD_REPLICA=true;
SET ADB_CONFIG ELASTIC_PRODUCT_ENABLE_MIXED_STORAGE_POLICY=true;
After the build completes, the hot data is available in OSS and the query succeeds. For more information, see BUILD.
What do I do if the "RAM user[***] is not bound to a ADB user" error occurs when a RAM user uses AnalyticDB for MySQL Spark in the DMS console?
When a privileged database account is created, it is automatically bound to the RAM user, which causes DMS authentication to fail. Fix the binding relationship as follows:
Bind the RAM user to a standard database account.
Bind the Alibaba Cloud account (the account to which the RAM user belongs) to the privileged account.
What do I do if a java.lang.ArrayIndexOutOfBoundsException error occurs when a Spark SQL application reads Snappy-compressed data?
The community Spark plugin is not compatible with reading Snappy-compressed data, including data from self-built Simple Log Service (SLS) delivery jobs.
Download lake-storage-migration-tool.jar, upload it to OSS, and add the following statements to the Spark SQL job:
conf spark.hadoop.io.compression.codec.snappy.native=true;
conf spark.hadoop.io.compression.codecs=com.alibaba.analyticdb.aps.codec.SlsSnappyCodec;
ADD JAR "oss://<testBucketName>/lake-storage-migration-tool.jar";
Set the ADD JAR path to the actual OSS path of the lake-storage-migration-tool.jar file.
Why does a Spark executor node become Dead when running a Spark application?
To confirm the issue, go to the Spark JAR Development page, click UI in the Actions column for the target application, then check the Executors tab in the Spark UI. Look for the error message Failed to connect to /xx.xx.xx.xx:xxxx or an executor with Dead status. For more information about navigating to the Spark UI, see Spark development editor.
The most common causes are:
Container memory exceeded (exit code 137)
In addition to JVM memory, a Spark executor uses off-heap memory for shuffle and cache operations, and memory for Python UDFs. If total container memory exceeds the allowed limit, the OS kills the Spark process. The driver log shows:
ERROR TaskSchedulerImpl: Lost executor xx on xx.xx.xx.xx:The executor with id xx exited with exit code 137.
Increase spark.executor.memoryOverhead, which controls the memory (in MB) available to non-Spark processes inside the container. The default is 30% of total executor container memory—for example, a Medium spec (2 cores, 8 GB) defaults to 2.4 GB. To increase it:
spark.executor.memoryOverhead: 4000MB
java.lang.OutOfMemoryError
Check the stderr or stdout log for the Dead executor on the Executors tab. To fix this:
Optimize the Spark application to reduce memory usage.
Increase the executor specification using spark.executor.resourceSpec.
Dynamic resource allocation shutdown (no action needed)
If dynamic resource allocation is enabled (spark.dynamicAllocation.enabled) and the Dead executor log contains the message Driver command a shutdown, this is expected behavior and does not affect your application. No action is required.
If none of the above applies, check the log for the Dead executor. If the error is in your business code, search for the error message to find a solution or submit a ticket to technical support.
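The default for spark.executor.memoryOverhead described under "Container memory exceeded" is simple arithmetic. A minimal Python sketch follows, assuming only the 30% rule and the Medium spec (8 GB) stated above. The function name default_overhead_gb is hypothetical.

```python
def default_overhead_gb(container_memory_gb, percent=30):
    """Default overhead: a percentage of total executor container memory."""
    return container_memory_gb * percent / 100


# Medium spec from the text above: 2 cores, 8 GB of container memory.
medium_default = default_overhead_gb(8)
print(f"Medium default overhead: {medium_default} GB")
```

A setting such as the 4000MB shown above is higher than this default, which is the point of the override.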
Why does a network connection failure occur when Spark accesses an external data source?
An Elastic Network Interface (ENI) is not enabled for VPC access. In the Spark application configuration, set the following parameters:
spark.adb.eni.enabled
spark.adb.eni.vswitchId
spark.adb.eni.securityGroupId
The values vary depending on the data source. For more information, see Spark application configuration parameters and Access external data sources.
Why do the databases and tables returned by SHOW TABLES or SHOW DATABASES not match the actual databases and tables?
First, confirm that the metastore service version for the Spark SQL application is set to adb. This applies to applications submitted from Enterprise Edition, Basic Edition, or Data Lakehouse Edition clusters.
When the metastore version is adb, the application only displays databases and tables for which the current user has read permissions—others are hidden by design. If a Hive MetaStore version is specified instead, check the permissions and connectivity of your self-managed metastore service. For more information, see Spark application configuration parameters.
Why does a Spark application report an "oss object 403" error?
There are four common causes:
Cross-region access — AnalyticDB for MySQL Spark does not support reading JAR packages or files across regions. Confirm that the OSS bucket containing the JAR packages and files is in the same region as the Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster.
Insufficient OSS read permission — The role specified for spark.adb.roleArn lacks OSS read permission. Grant the required permissions to the RAM user.
Incorrect file path — Specify the correct OSS path in the Spark application code.
Incorrect format — Multiple files must be separated by commas (,) and the configuration must be valid JSON. Check that the Spark application code uses the correct format.
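The format check in the last item can be run locally before you submit. The following Python sketch is a heuristic only: it confirms that the configuration parses as JSON and flags string values where several OSS paths are joined by something other than a comma. The function names and the sample configurations, including the use of a jars field, are illustrative assumptions and not the exact layout that AnalyticDB for MySQL expects.

```python
import json


def iter_strings(value):
    """Yield every string found in a nested JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def check_config(text):
    """Return a list of problems; an empty list means nothing suspicious was found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON (line {err.lineno}, column {err.colno}): {err.msg}"]
    problems = []
    for value in iter_strings(config):
        if "oss://" not in value:
            continue
        for part in value.split(","):
            part = part.strip()
            # Each comma-separated part should be exactly one OSS path.
            if not part.startswith("oss://") or part.count("oss://") != 1:
                problems.append(f"check separators in: {value!r}")
                break
    return problems


good = '{"jars": ["oss://testBucketName/a.jar", "oss://testBucketName/b.jar"]}'
fused = '{"jars": "oss://testBucketName/a.jar;oss://testBucketName/b.jar"}'
broken = '{"jars": ["oss://testBucketName/a.jar" "oss://testBucketName/b.jar"]}'

for name, text in (("good", good), ("fused", fused), ("broken", broken)):
    print(name, check_config(text))
```

The check cannot detect cross-region buckets or missing permissions. Those still need to be verified in the OSS and RAM consoles.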
How do I find the reason why a Spark application runs slowly?
Go to the Spark JAR Development page and click UI in the Actions column for the target application to open the Spark UI. For more information about navigating there, see Spark development editor.
Step 1: Check for exceptions
Executor went Dead — On the Executors tab, check the Status field. If any executor shows Dead, see Why does a Spark executor node become Dead when running a Spark application?.
Task retries in the driver log — On the Executors tab, view the stderr log for the executor whose Executor ID is driver. Use the error message to identify the cause. Most exceptions stem from business logic; search for the error message or check your code.
If an OOM exception occurs, look for large fields or high-memory operations in the business logic. If more memory is needed, use an executor or driver with higher specifications.
Step 2: If no exception is present
Insufficient resources — On the Stages tab, find the slow-running stage and check the Tasks: Succeeded/Total column. If the total number of tasks exceeds Number of executors x Number of executor cores, resources are insufficient and tasks must run in multiple waves.
For example: 100 total tasks, spark.executor.instances=5, spark.executor.resourceSpec=medium (2 cores, 8 GB) → 10 concurrent tasks → 10 waves to complete.
Increase spark.executor.instances or spark.executor.resourceSpec to add total resources. Avoid setting Number of executors x Number of executor cores significantly higher than the total task count to prevent wasting resources.
GC time too long — On the Executors tab, check the Task Time (GC Time) field. If GC time is high for some executors, either optimize the business logic or increase the executor or driver specification (spark.executor.resourceSpec or spark.driver.resourceSpec).
Stack information — On the Executors tab, click Thread Dump to view stack traces.
Thread Dump is only available when the application Status is running.
Refresh the stack multiple times while the application is running:
If a business function appears blocked, the function may contain hot spots or inefficient logic. Optimize that part of the code.
If a Spark internal function appears blocked, search for the error message to find a solution.
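The wave arithmetic from the "Insufficient resources" check above can be sketched in Python as follows. It assumes that each task occupies one core, which is the Spark default (spark.task.cpus=1). The function name task_waves is hypothetical.

```python
import math


def task_waves(total_tasks, executor_instances, cores_per_executor):
    """Return (concurrent task slots, waves needed to run all tasks)."""
    concurrent = executor_instances * cores_per_executor
    return concurrent, math.ceil(total_tasks / concurrent)


# Example from the text: 100 tasks, 5 executors, medium spec (2 cores each).
concurrent, waves = task_waves(100, 5, 2)
print(f"{concurrent} concurrent tasks, {waves} waves")
```

Doubling spark.executor.instances to 10 in this example would halve the number of waves to 5, while going far beyond 50 executors would leave cores idle for a 100-task stage.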
How do I periodically delete Spark application logs?
Find the log storage path:
In the Spark UI, click the Environment tab.
Note: For more information about how to go to the Spark UI, see Spark development editor.
Click Spark Properties and check the value of spark.app.log.rootPath.
Set an OSS lifecycle rule to automatically delete logs after they expire. For more information, see Lifecycle rules based on last modified time.
What do I do if a Spark application gets stuck when creating a UDF?
This is a known bug in the community edition of Spark when downloading files from OSS. Add the following configuration parameters to the Spark application:
SET spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES=oss://testBucketname/udf.jar;
SET spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES=oss://testBucketname/udf.jar;
SET spark.driver.extraClassPath=/tmp/testBucketname/udf.jar;
SET spark.executor.extraClassPath=/tmp/testBucketname/udf.jar;
For more information about these parameters, see Spark application configuration parameters.
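The four statements above follow one pattern, so they can be generated from a single OSS path. The Python sketch below assumes that a file at oss://<bucket>/<key> is downloaded to /tmp/<bucket>/<key>. This mapping is inferred only from the example above and is not a documented guarantee. The function name udf_download_settings is hypothetical.

```python
def udf_download_settings(oss_uri):
    """Build the four SET statements for a UDF JAR stored in OSS."""
    prefix = "oss://"
    if not oss_uri.startswith(prefix):
        raise ValueError(f"expected an oss:// path, got {oss_uri!r}")
    # Assumption taken from the example: oss://bucket/key -> /tmp/bucket/key
    local_path = "/tmp/" + oss_uri[len(prefix):]
    return [
        f"SET spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES={oss_uri};",
        f"SET spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES={oss_uri};",
        f"SET spark.driver.extraClassPath={local_path};",
        f"SET spark.executor.extraClassPath={local_path};",
    ]


for statement in udf_download_settings("oss://testBucketname/udf.jar"):
    print(statement)
```

Generating the statements from one input keeps the bucket name spelled the same way in the OSS path and in the local class path.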
How do I customize the log output format?
Edit the log configuration file based on your Spark version:
Spark versions earlier than 3.5: The configuration file is log4j.properties. Set the log4j.appender.console.layout.ConversionPattern parameter.
Spark 3.5 and later: The configuration file is log4j2.properties. Set the appender.console.layout.pattern parameter. A template file is available to download.
Upload the modified configuration file to OSS and reference it in the business SQL code:
Note: Replace testBucketName with the actual bucket name.
CONF spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES=oss://testBucketName/<log4j2.properties/log4j.properties>;
CONF spark.executor.extraJavaOptions=-Dlog4j.configurationFile=file:/tmp/testBucketName/<log4j2.properties/log4j.properties>;
CONF spark.driver.extraJavaOptions=-Dlog4j.configurationFile=file:/tmp/testBucketName/<log4j2.properties/log4j.properties>;
CONF spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES=oss://testBucketName/<log4j2.properties/log4j.properties>;
How do I configure Spark Driver/Executor environment variables?
Set Driver environment variables:
"spark.kubernetes.driverEnv.key1":"value1",
"spark.kubernetes.driverEnv.key2":"value2"
Set Executor environment variables:
"spark.executorEnv.key1":"value1",
"spark.executorEnv.key2":"value2"
The exact format of configuration parameters varies depending on the Spark development tool. For more information, see Spark application configuration parameters.
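When many variables are involved, the prefixed keys can be built programmatically. The following Python sketch uses only the two prefixes shown above. The function name env_conf and the sample variable names are hypothetical, and the JSON output matches the key-value style used in the examples, which may differ in other development tools.

```python
import json

DRIVER_PREFIX = "spark.kubernetes.driverEnv."
EXECUTOR_PREFIX = "spark.executorEnv."


def env_conf(driver_env, executor_env):
    """Turn plain environment-variable mappings into Spark conf entries."""
    conf = {}
    for name, value in driver_env.items():
        conf[DRIVER_PREFIX + name] = value
    for name, value in executor_env.items():
        conf[EXECUTOR_PREFIX + name] = value
    return conf


conf = env_conf(
    driver_env={"key1": "value1", "key2": "value2"},
    executor_env={"key1": "value1", "key2": "value2"},
)
print(json.dumps(conf, indent=2))
```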