Manage runtime environments

The Python environment for EMR Serverless Spark includes matplotlib, NumPy, and pandas by default. To use other third-party libraries, you must create a runtime environment that packages the required libraries.
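
Before you create a runtime environment, it can help to check which libraries a job needs beyond the defaults. The following is a minimal sketch that uses only the Python standard library. The helper name `missing_libraries` and the example module `plotly` are illustrative assumptions, not part of EMR Serverless Spark.

```python
import importlib.util


def missing_libraries(module_names):
    """Return the top-level module names that cannot be imported here."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # matplotlib, numpy, and pandas are the defaults named above;
    # plotly stands in for an extra library that a job might need.
    print(missing_libraries(["matplotlib", "numpy", "pandas", "plotly"]))
```

Note that the import name of a module can differ from its PyPI package name, so any name this check reports still has to be mapped to the package you add to the runtime environment.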

Prerequisites

You have created a workspace. For more information, see Manage workspaces.

Create a runtime environment

  1. Go to the runtime environment management page.

    1. Log on to the E-MapReduce console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. On the Spark page, click the name of the target workspace.

    4. On the EMR Serverless Spark page, click Runtime Environments in the left-side navigation pane.

  2. Click Create Runtime Environment.

  3. On the Create Runtime Environment page, configure the following parameters.

    • Name (required): Enter a name for the runtime environment.

    • Description (optional): Enter a description for the runtime environment.

    • Queue for Environment Initialization (required): Select a resource queue for initialization. Creating the runtime environment consumes 1 CPU core and 4 GB of memory from this queue. These resources are released automatically after initialization.

    • Normal Network Connection (optional): If you need to add PyPI libraries from a source other than the Alibaba Cloud source, select an appropriate network connection. The runtime environment uses this connection to access the source address during creation.

      For more information about how to create a network connection, see Establish network connectivity between EMR Serverless Spark and other VPCs.

    • Python version (required): Defaults to Python 3.8. You can select another version based on your business requirements.

      Ensure that the selected Python version is compatible with your target Python libraries to prevent packaging failures or runtime errors caused by version mismatches.

  4. Add library information.

    1. Click Add Library.

    2. In the Create Library dialog box, select a Source Type, configure the related parameters, and then click OK.

      The available source types are:

      • PyPI

        • PyPI Package: Enter the library name and, optionally, the version. If you omit the version, the system installs the latest version. The Alibaba Cloud source is used by default.

          For example, Plotly or Plotly==4.9.0.

        • Package Source: Specify a custom PyPI source URL. If you leave this field blank, it defaults to the Alibaba Cloud source. If you use a custom source, ensure that you have selected an appropriate network connection.

      • Workspace

        From the Workspace drop-down list, select a file resource from the current workspace. If no resources are available, upload one on the Files page.

        Supported file types: .zip, .tar, .whl, .tar.gz, .jar, and .txt.

        Note

        If you select a .txt file, the system treats it as a requirements file and installs the Python libraries and versions listed in the file.

      • OSS Resource

        In the OSS Resource field, enter the path of a file stored in Object Storage Service (OSS).

        Supported file types: .zip, .tar, .whl, .tar.gz, .jar, and .txt.

        Note

        If you specify a .txt file, the system treats it as a requirements file and installs the Python libraries and versions listed in the file.

  5. Click Create.

    The environment begins initializing after creation.
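
The .txt option in the library step relies on a requirements file. As a rough sketch, assuming the file follows the usual pip style of one `name` or `name==version` entry per line, the snippet below shows what such a file might contain and how you could review its pins before uploading it. The function `parse_requirements`, the `requests` entry, and the file content are hypothetical. The parser is deliberately simplified and does not cover the full requirements syntax.

```python
def parse_requirements(text):
    """Parse simple requirement lines into (name, version) pairs.

    Handles only bare names and exact pins (name==version); other
    specifiers such as >= or extras are out of scope for this sketch.
    """
    requirements = []
    for raw_line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        requirements.append((name.strip(), version.strip() if sep else None))
    return requirements


# Hypothetical requirements file content for a runtime environment.
EXAMPLE = """\
# Libraries for the reporting job
plotly==4.9.0
requests
"""

print(parse_requirements(EXAMPLE))
```

An entry whose version is `None` is unpinned. Pinning it makes the result of a later re-initialization easier to predict.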

Edit a runtime environment

You can edit a runtime environment to update the libraries it contains.

  1. On the Runtime Environments page, find the target runtime environment and click Edit in the Actions column.

  2. On the Modify Runtime Environment page, update the configuration of the runtime environment.

  3. Click Save Changes.

    Saving the changes re-initializes the environment based on the new configuration.

    Note

    After an environment is re-initialized, the changes do not apply to active Notebook sessions immediately. To use the latest runtime environment in a Notebook session, you must restart the Notebook session resources.
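
To confirm that a restarted Notebook session picked up the edited environment, you could print the Python version and the installed library versions from a notebook cell. This is a minimal sketch using the standard library (`importlib.metadata`, available in Python 3.8 and later). The helper name `describe_environment` and the package `plotly` are assumptions chosen for illustration.

```python
import sys
from importlib import metadata


def describe_environment(packages):
    """Report the Python version and installed versions of the given packages."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this interpreter.
            report[name] = None
    return report


print(describe_environment(["plotly"]))
```

A value of `None`, or a version that differs from the one configured in the runtime environment, suggests that the session is still running with the previous environment.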

Use a runtime environment

After a runtime environment enters the Ready state, you can select it in the following job types and sessions.

  • PySpark batch job: When a job starts, the system pre-installs the necessary libraries from the selected runtime environment.

  • Job orchestration: When you add a Notebook node to a workflow, you can select the runtime environment for that node.

  • Notebook session: When a Notebook session starts, the system pre-installs libraries based on the selected environment.

  • Livy Gateway: When you submit a job through Livy Gateway, the system pre-configures the resources required to run the job based on the selected environment.

  • Spark Submit, Apache Airflow, and Livy: When you submit a job with one of these tools, specify the runtime environment by passing its ID as a configuration parameter: --conf spark.emr.serverless.environmentId=<environment_id>.
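
For scripted submissions, the configuration parameter above can be built in code rather than typed by hand. The sketch below only assembles the setting. The helper names and the placeholder ID `env-example-123` are hypothetical, and how each tool accepts the resulting arguments or map is an assumption to check against that tool's own documentation.

```python
ENVIRONMENT_ID_KEY = "spark.emr.serverless.environmentId"


def environment_conf_args(environment_id):
    """Return the --conf arguments that select a runtime environment."""
    if not environment_id:
        raise ValueError("environment_id must be a non-empty string")
    return ["--conf", f"{ENVIRONMENT_ID_KEY}={environment_id}"]


def environment_conf_map(environment_id):
    """Return the same setting as a key/value map for clients that take a conf dictionary."""
    if not environment_id:
        raise ValueError("environment_id must be a non-empty string")
    return {ENVIRONMENT_ID_KEY: environment_id}


# "env-example-123" is a placeholder, not a real environment ID.
print(environment_conf_args("env-example-123"))
print(environment_conf_map("env-example-123"))
```

Keeping the key in one constant avoids typos in the property name across different submission paths.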