DataWorks official images provide common runtime environments for different node types in Data Studio to meet the execution environment requirements of various tasks. You can use these official images directly in Data Studio or build custom images based on them. This topic describes the official images.
Overview
In Data Studio, if you do not configure a runtime environment image for a node, the Default standard image is used. The Default standard image provides only a basic runtime environment, which may not meet the requirements of specific tasks. Official images are preconfigured base images that provide standardized environments for different task types in Data Studio. You can also create custom images based on them and apply additional configuration to support more scenarios and task types.
Image list
The versions and regions that are actually supported are those shown in the DataWorks console. An image may have multiple versions. The following table describes only the capabilities of the latest version of each image.
DataWorks provides the following images:
| Image name | Image description | Applicable tasks |
| --- | --- | --- |
| dataworks_pyodps_py311_task_pod | The official image for DataWorks PyODPS nodes. This image uses Python 3.11. | |
| dataworks_pairec_task_pod | The official DataWorks PAI-Rec image. It is used to run algorithms generated by PAI-Rec. For the versions of the feature_store SDK and pyfg, see the console. | |
| dataworks_pyodps_task_pod | The official image for DataWorks PyODPS nodes. This image uses Python 3.7. | |
| dataworks_emr_base_task_pod | The base image for EMR clusters. It supports the EMR Serverless Spark, EMR on ECS DataLake, and EMR on ECS Custom cluster types. | |
| dataworks_shell_jdk17_task_pod | The official image for DataWorks Shell nodes. This image uses JDK 17. | |
| dataworks_shell_task_pod | The official image for DataWorks Shell nodes. This image uses JDK 7. If you need to customize the runtime environment and require Subprocess parameter passing, build a custom image based on this image. | |
| dataworks_python_task_pod | The official image for DataWorks Python nodes. System info: py3.11-ubuntu22.04. | |
| dataworks_cdh_custom_task_pod | The base image for DataWorks CDH clusters. It cannot be used directly. You must install | |
| dataworks_controller_task_pod | The official image for DataWorks assignment nodes. If you need to customize the runtime environment and use the assignment node or assignment parameters to pass parameters to downstream nodes, build a custom image based on this image. | |
| dataworks-mcp | Used for task development with DataWorks Agent for third-party clients. System info: py3.11-ubuntu22.04. | |
| dataworks-notebook | Used for basic notebook development. System info: py3.11-ubuntu22.04. | |
| dataworks_notebook_task_pod | The official image for DataWorks Notebook nodes. System info: py3.11-ubuntu22.04. The dataworks-notebook and dataworks-mcp images for the Python environment and personal development environment are identical. | |
| dataworks-maxcompute | Used to build a MaxCompute image in a personal environment. System info: py3.11-ubuntu20.04. | |
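The node-oriented rows of the image list can be summarized as a simple lookup. The following is a minimal sketch, not a DataWorks API: the node-type keys and the names `OFFICIAL_IMAGES` and `official_images_for` are hypothetical labels chosen for illustration, and only the image names come from the list above.

```python
# Node type -> official images, transcribed from the image list.
# The keys are illustrative labels, not DataWorks identifiers.
OFFICIAL_IMAGES = {
    "PyODPS": ["dataworks_pyodps_py311_task_pod", "dataworks_pyodps_task_pod"],
    "Shell": ["dataworks_shell_jdk17_task_pod", "dataworks_shell_task_pod"],
    "Python": ["dataworks_python_task_pod"],
    "Assignment": ["dataworks_controller_task_pod"],
    "Notebook": ["dataworks_notebook_task_pod"],
}


def official_images_for(node_type):
    """Return the official images listed for a node type (empty list if none)."""
    return list(OFFICIAL_IMAGES.get(node_type, []))
```

For example, `official_images_for("Shell")` returns both Shell images, which differ in JDK version, so the choice between them depends on what the task requires.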
Use images
In Data Studio, you can use official images or custom images that are associated with the workspace.
- Use the image in new DataStudio: In the Run Configuration and Scheduling Settings sections on the right side of the node development page, configure the resource group and image for test runs and post-deployment runs.
- Use the image in legacy DataStudio: In the dialog that appears after you click Run with Parameters on the node development page, or in the Scheduling Settings panel on the right side of the node development page, configure the resource group and image for the node's test run and post-deployment run.
- Use an image in the personal development environment: When you create a personal development environment instance, you can select a different official image from the Image Configuration drop-down list.
When configuring the resource group and image, note the following:
- Resource Group for Scheduling: Select a serverless resource group.
- Image: Select an official image or a deployed custom image.
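After you select an image, one way to confirm that a node actually runs on it is to compare the interpreter version with the Python version listed for that image. The following is a minimal sketch under that assumption. `EXPECTED_PYTHON` and `matches_image` are hypothetical names for illustration, not part of any DataWorks SDK, and the entry for dataworks_python_task_pod is inferred from its system info string (py3.11).

```python
import sys

# Python versions taken from the image list in this topic.
# The dataworks_python_task_pod entry is inferred from "py3.11-ubuntu22.04".
EXPECTED_PYTHON = {
    "dataworks_pyodps_py311_task_pod": (3, 11),
    "dataworks_pyodps_task_pod": (3, 7),
    "dataworks_python_task_pod": (3, 11),
}


def matches_image(image_name, version=None):
    """Return True if `version` matches the Python version listed for the image.

    `version` defaults to the running interpreter's (major, minor) version.
    Raises KeyError for images that are not in EXPECTED_PYTHON.
    """
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) == EXPECTED_PYTHON[image_name]
```

Calling `matches_image("dataworks_pyodps_py311_task_pod")` inside a test run should return `True` if the node uses that image, and a `False` result suggests the image setting was not applied to the run.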