Common Java/Scala class conflict issues and solutions for MaxCompute Spark jobs.
Overview of class conflicts
- Class conflicts typically surface as java.lang.NoClassDefFoundError or method-not-found exceptions (such as java.lang.NoSuchMethodError). Check your POM and exclude the conflicting dependencies.
- Cause: Some dependencies in your custom JAR package may have different versions from those in the Spark client jars directory. The JVM may load your JAR packages first during class loading, causing conflicts.
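As a rough illustration of excluding a conflicting dependency in the POM, the fragment below removes a transitive artifact from a direct dependency. The coordinates com.example:my-library and com.example:conflicting-lib, and the version property, are hypothetical placeholders; substitute whatever your own dependency tree reports as conflicting.

```xml
<!-- Sketch only: all coordinates below are hypothetical placeholders. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>my-library</artifactId>
  <version>${my.library.version}</version>
  <exclusions>
    <!-- Drop the transitive artifact whose version differs from the
         copy shipped in the Spark client jars directory. -->
    <exclusion>
      <groupId>com.example</groupId>
      <artifactId>conflicting-lib</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

An exclusion takes only a groupId and artifactId, with no version, so it removes every version of that artifact that the enclosing dependency would otherwise pull in.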
Differences between provided and compile scopes
- provided: The dependency is required at compile time but not packaged for runtime. The cluster supplies the JAR package at runtime, mainly from the Spark client jars directory. If you do not set these dependencies to provided, class conflicts or class/method-not-found errors may occur.
- compile: The dependency is required at both compile time and runtime. These are typically third-party libraries related to your code logic that do not exist in the cluster and must be included in your JAR package.
The main JAR package must be a fat JAR that includes all compile-scoped dependencies so that the required classes can be loaded at runtime.
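The two scopes and the fat JAR requirement might look like the following POM fragment. This is a sketch under assumptions: the version properties are placeholders you would define yourself, the MySQL connector stands in for any third-party library, and maven-shade-plugin is only one common way to build a fat JAR (the source does not name a specific plugin).

```xml
<!-- Both elements belong inside <project>. Version properties are assumed. -->
<dependencies>
  <!-- Supplied by the cluster at runtime: compile against it, do not package it. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
  <!-- Not present in the cluster: must be packaged into the fat JAR. -->
  <dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>${mysql.connector.version}</version>
    <scope>compile</scope>
  </dependency>
</dependencies>

<build>
  <plugins>
    <!-- Assumed packaging approach: bundles compile-scoped dependencies
         into the main JAR during the package phase. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Because compile is Maven's default scope, the explicit scope element on the second dependency is optional; it is spelled out here only to make the contrast visible.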
POM self-check
JAR packages that must be set to provided
- JAR packages with groupId org.apache.spark: These are community Spark JAR packages already available in the Spark client jars directory. They do not need to be included in your JAR package and are automatically uploaded to the MaxCompute cluster when the Spark client submits a job.
- cupid-sdk: Automatically uploaded to the MaxCompute cluster during job submission.
- odps-sdk: Automatically uploaded to the MaxCompute cluster during job submission.
- hadoop-yarn-client: Used for job uploads. This package may be pulled in as a transitive dependency, so check and exclude it before packaging.
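A self-check against this list could end up with entries like the ones below. Treat this as a sketch: the groupId com.aliyun.odps for cupid-sdk and the version property are assumptions to verify against your own POM, and the artifactSet exclusion applies only if the fat JAR is built with maven-shade-plugin, which the source does not specify.

```xml
<!-- Assumed coordinates: confirm the groupId and version for your environment. -->
<dependency>
  <groupId>com.aliyun.odps</groupId>
  <artifactId>cupid-sdk</artifactId>
  <version>${cupid.sdk.version}</version>
  <scope>provided</scope>
</dependency>

<!-- Assuming maven-shade-plugin builds the fat JAR: keep hadoop-yarn-client
     out of it even when another dependency pulls it in transitively. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <excludes>
        <exclude>org.apache.hadoop:hadoop-yarn-client</exclude>
      </excludes>
    </artifactSet>
  </configuration>
</plugin>
```

Excluding at the packaging step catches the artifact regardless of which dependency introduced it, whereas a per-dependency exclusion has to be repeated for each path in the dependency tree.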
JAR packages that must not be set to provided
- JAR packages used to access external services, such as MySQL or other third-party services.