Spark

MaxCompute Spark is an Apache Spark-compatible computing service built into MaxCompute. Spark jobs run on the same compute resources and under the same permission system as MaxCompute SQL and MapReduce tasks. Because the service is compatible with the open-source Spark APIs, you can bring existing Spark workloads to MaxCompute without rewriting them.

Key features

  • Native multi-version Spark

    MaxCompute Spark runs open-source Apache Spark natively with full API compatibility. Multiple Spark versions are supported, so you can target the version your existing code requires.

  • Shared compute resources

    Spark jobs run on the same compute resources as MaxCompute SQL and MapReduce tasks, with no separate cluster to provision or manage.

  • Unified data and permission management

    Spark jobs operate within the MaxCompute project's permission system, so data access is governed by the same rules applied to all other MaxCompute tasks.

  • Native Spark experience

    The native Spark UI is available for real-time job monitoring. Query history is accessible for post-run analysis.

Supported features

MaxCompute Spark supports the following:

  • Offline computing using standard Spark programming models and libraries: GraphX, MLlib, RDD, Spark SQL, and PySpark.

  • Reading from and writing to MaxCompute tables.

  • Referencing file resources stored in MaxCompute.

  • Accessing services in Alibaba Cloud VPC environments.

  • Accessing Alibaba Cloud OSS for unstructured data storage.

  • Reading MaxCompute OSS external tables.

  • Integration with DataWorks Notebook.
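
As an illustration of reading from and writing to tables with Spark SQL from PySpark, the following is a minimal sketch rather than a reference implementation. It assumes the job has already been configured so that Spark SQL can resolve MaxCompute tables (this page does not describe that configuration). The table names, the `id` and `amount` columns, and the helper functions `build_copy_statement` and `copy_rows` are hypothetical.

```python
import re

# Hypothetical names throughout: the tables, the `id`/`amount` columns, and
# these helper functions are illustrative and are not part of MaxCompute Spark.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def _check_identifier(name: str) -> str:
    """Accept a plain or single-dot-qualified name; reject anything else."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain table identifier: {name!r}")
    return name


def build_copy_statement(source_table: str, target_table: str, min_amount: int) -> str:
    """Build a Spark SQL statement that reads one table and appends to another."""
    src = _check_identifier(source_table)
    dst = _check_identifier(target_table)
    threshold = int(min_amount)  # keep the literal numeric before it reaches SQL
    return (
        f"INSERT INTO TABLE {dst} "
        f"SELECT id, amount FROM {src} WHERE amount >= {threshold}"
    )


def copy_rows(spark, source_table: str, target_table: str, min_amount: int) -> str:
    """Run the statement on an existing session (any object with a .sql method)."""
    statement = build_copy_statement(source_table, target_table, min_amount)
    spark.sql(statement)  # read and write both happen inside this one SQL job
    return statement
```

In a real job, the session would typically come from `SparkSession.builder.appName("copy-rows").getOrCreate()` in `pyspark.sql`, followed by a call such as `copy_rows(spark, "sales_src", "sales_dst", 10)`. Keeping the SQL construction separate from the session makes the statement easy to inspect before the job is submitted.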

Limitations

MaxCompute Spark does not support the following:

  • Interactive shells, including Spark Shell, Spark SQL Shell, and PySpark Shell.

  • MaxCompute built-in functions and MaxCompute user-defined functions (UDFs).

  • MaxCompute external tables other than OSS external tables.