Use a RAY resource group

This topic describes how to use a RAY resource group in Alibaba Cloud Lindorm. You will learn how to prepare the environment, submit a job, monitor its status, and view logs.

Important

The RAY resource group is currently in invitational preview. To request access, contact Lindorm technical support (DingTalk ID: s0s3eg3).

Prerequisites

Prepare the environment

  1. Log on to the Lindorm console. In the upper-left corner of the page, select the region of the instance. On the Instances page, click the ID of the target instance or click View Instance Details in the Actions column for the instance.

  2. On the Instance Details page, in the Configurations section, click Resource Groups in the Actions column of Compute Engine.

  3. On the Resource Group Details page, hover over WebUI in the Actions column of the RAY resource group to view its WebUI address. Example: http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/.

  4. Install the Ray client. For Python 3, run pip3 install "ray[default]". The quotation marks keep shells such as zsh from interpreting the square brackets.

    Note

    After the installation, you can run ray --version to verify that the installation was successful.

RAY resource group authentication

A RAY resource group authenticates clients with the token that Lindorm Compute uses for compute resource authentication. Token authentication prevents unauthorized clients from accessing cluster APIs or submitting jobs.

Configure job authentication via CLI

You can configure command-line authentication for a RAY resource group in two ways.

  • Method 1: Pass the token by using the --headers parameter.

    ray job submit \
      --address "http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/" \
      --headers '{"Authorization": "Bearer xxxxx-xxx-xxxx-xxxx-xxxxxxxxxxxx"}' \
      --runtime-env-json '{"working_dir": "."}' \
      -- python yourRayJob.py
  • Method 2: Set the RAY_AUTH_MODE and RAY_AUTH_TOKEN environment variables.

    export RAY_AUTH_MODE=token
    export RAY_AUTH_TOKEN=xxxxx-xxx-xxxx-xxxx-xxxxxxxxxxxx
    
    ray job submit \
      --address "http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/" \
      --runtime-env-json '{"working_dir": "."}' \
      -- python yourRayJob.py
    Note

    This method requires Ray client version 2.52.0 or later.
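Because Method 2 depends on the client version, it can help to check the installed version before relying on the environment variables. The following is a minimal sketch and not part of the Lindorm tooling: the helper names parse_version, supports_env_auth, and installed_ray_version are hypothetical, and only the 2.52.0 threshold comes from the note above. Versions are compared as integer tuples because a plain string comparison would rank 2.9.3 above 2.52.0.

```python
from importlib import metadata

# Minimum client version for RAY_AUTH_MODE / RAY_AUTH_TOKEN, taken from the note above.
MIN_ENV_AUTH_VERSION = (2, 52, 0)


def parse_version(text: str) -> tuple:
    """Turn '2.52.0' or '2.52.0rc1' into (2, 52, 0), ignoring non-numeric suffixes."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_env_auth(version_text: str) -> bool:
    """Return True if this client version can use the environment-variable method."""
    return parse_version(version_text) >= MIN_ENV_AUTH_VERSION


def installed_ray_version():
    """Return the installed Ray version string, or None if Ray is not installed."""
    try:
        return metadata.version("ray")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_ray_version()
    if version is None:
        print("Ray is not installed in this environment.")
    elif supports_env_auth(version):
        print(f"Ray {version}: RAY_AUTH_MODE and RAY_AUTH_TOKEN can be used.")
    else:
        print(f"Ray {version}: pass the token with --headers instead.")
```

If the check fails, Method 1 (--headers) remains available.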

Configure Ray Dashboard authentication

When you first access the WebUI of a RAY resource group, you must enter a token.

Submit a RAY job

  1. On your client, prepare the RAY job.

    In this example, a RAY resource group processes the test-data file in the test-bucket bucket of Object Storage Service (OSS) and uploads the results to OSS. The job logic is defined in the ray-oss-example.py script, located in the ray_job_test directory on the client.

    import sys
    import tempfile
    
    import ray
    from ossfs import OSSFileSystem
    
    # Connect to Ray.
    ray.init()
    
    @ray.remote
    def process(oss_key: str, oss_secret: str, filename: str):
        """Download one OSS file, print its content, and upload a copy as <filename>_result."""
        print("Processing %s" % filename)
        fs = oss_filesystem(oss_key, oss_secret)
        # Download the OSS file to a local temporary file.
        tmp_filename = tempfile.NamedTemporaryFile(delete=False).name
        fs.get_file(filename, tmp_filename)
        print("tmp file name is %s" % tmp_filename)
    
        with open(tmp_filename, 'rb') as f:
            content = f.read()
            print(content)
    
        # Upload the result to OSS next to the source file.
        result_remote_filename = f"{filename}_result"
        fs.put_file(tmp_filename, result_remote_filename)
        return "success"
    
    def oss_filesystem(oss_key: str, oss_secret: str) -> OSSFileSystem:
        """Create an OSS file system client from an AccessKey pair."""
        return OSSFileSystem(
            endpoint="oss-cn-hangzhou-internal.aliyuncs.com",  # OSS endpoint
            key=oss_key,
            secret=oss_secret,
        )
    
    if __name__ == "__main__":
        # Both arguments are required: sys.argv[1] and sys.argv[2] are read below.
        if len(sys.argv) < 3:
            raise ValueError("Usage: python %s oss_key oss_secret" % __file__)
    
        oss_key = sys.argv[1]
        oss_secret = sys.argv[2]
        base = "/test-bucket/test-data/"  # /<bucketname>/path
        fs = oss_filesystem(oss_key, oss_secret)
    
        files = [item['name'] for item in fs.ls(base) if item['name'] != base]
    
        for file in files:
            print("Head processing %s" % file)
            result = ray.get(process.remote(oss_key, oss_secret, file))
            print(f"{file} is processed, status is {result}")
    
    ray.shutdown()

    Parameters

    | Parameter | Example | Description |
    | --- | --- | --- |
    | Endpoint | oss-cn-hangzhou-internal.aliyuncs.com | The OSS endpoint. See Regions and endpoints to find the correct endpoint. |
    | Base | /test-bucket/test-data/ | The OSS path that contains the files to process, in the format /<bucketname>/path. |

    Note
    • Ray allows you to declare the resources required for each task or actor in the @ray.remote() decorator. For example, you can use num_cpus and num_gpus to specify the required CPU and GPU resources. For more information about the parameters, see the Ray documentation.

    • Ray supports pipeline-style scheduling of data processing tasks on heterogeneous resources, such as CPUs and GPUs, across multiple nodes. Compared with traditional batch processing, this approach can improve data processing efficiency. For more information, see the Ray documentation.
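As an illustration of the first note, the following sketch declares a CPU requirement for a task. It is a hedged example rather than part of the job above: count_bytes and run_with_declared_resources are hypothetical names, and the value num_cpus=1 is an arbitrary choice. Ray is imported inside the function so that the plain helper can be used without a Ray installation.

```python
def count_bytes(payload: bytes) -> int:
    """Plain Python function; it becomes a Ray task only when wrapped below."""
    return len(payload)


def run_with_declared_resources(payloads):
    """Run count_bytes as Ray tasks that each reserve one CPU."""
    import ray  # imported lazily; requires the Ray client from the prerequisites

    ray.init(ignore_reinit_error=True)
    # Equivalent to decorating count_bytes with @ray.remote(num_cpus=1).
    # A GPU task would pass num_gpus in the same way.
    remote_count = ray.remote(num_cpus=1)(count_bytes)
    refs = [remote_count.remote(payload) for payload in payloads]
    return ray.get(refs)
```

Ray schedules each task only on a node that has the declared resources free, so the declaration also limits how many of these tasks run concurrently on one node.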

  2. Submit the job to the specified RAY resource group.

    1. Navigate to the directory where the job is located by running cd ray_job_test.

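    2. Submit the job by running the following command:

      ray job submit --headers '{"Authorization": "Bearer RAY_AUTH_TOKEN"}' --runtime-env-json '{"working_dir": "."}' --address RAY_ADDRESS -- python my_job.py oss_key oss_secret

      If the job imports a package that may not be installed on the cluster, such as ossfs, add it to the pip field of --runtime-env-json, as the example below does.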

      Parameters

      | Parameter | Example | Description |
      | --- | --- | --- |
      | RAY_ADDRESS | http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/ | The WebUI address of the RAY resource group. |
      | my_job.py | ray-oss-example.py | The name of the script to run. |
      | oss_key | yourAccessKeyID | The AccessKey ID of the Alibaba Cloud account or RAM user used to access the OSS file. To obtain an AccessKey pair, see Obtain an AccessKey. |
      | oss_secret | yourAccessKeySecret | The AccessKey Secret of the Alibaba Cloud account or RAM user used to access the OSS file. |
      | RAY_AUTH_TOKEN | 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2 | The token required for authentication with the RAY resource group. |

      Example

      ray job submit --headers '{"Authorization": "Bearer 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2"}' --runtime-env-json '{"pip": ["ossfs"], "working_dir": "."}' --address http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/ -- python ray-oss-example.py yourAccessKeyID yourAccessKeySecret

      The command returns the submission ID (SUBMISSION_ID) of the job, such as raysubmit_gmSnPSFqmEXG****. You need this ID to check the job status from the command line.
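Quoting the JSON arguments by hand is error-prone, so the submit command can also be assembled programmatically. The sketch below is one possible approach under stated assumptions: build_submit_command is a hypothetical helper, and it uses only the flags shown in this topic (--headers, --runtime-env-json, --address). The values are the placeholders from the parameter table.

```python
import json
import shlex


def build_submit_command(address, token, script, script_args=(),
                         pip_packages=(), working_dir="."):
    """Return the argv list for `ray job submit` with token authentication."""
    runtime_env = {"working_dir": working_dir}
    if pip_packages:
        # Packages listed here are installed for the job, for example ossfs.
        runtime_env["pip"] = list(pip_packages)
    headers = {"Authorization": f"Bearer {token}"}
    return [
        "ray", "job", "submit",
        "--headers", json.dumps(headers),
        "--runtime-env-json", json.dumps(runtime_env),
        "--address", address,
        "--", "python", script, *script_args,
    ]


cmd = build_submit_command(
    address="http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/",
    token="RAY_AUTH_TOKEN",
    script="ray-oss-example.py",
    script_args=["oss_key", "oss_secret"],
    pip_packages=["ossfs"],
)
# Print a shell-safe version of the command for review before running it.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) runs the command without a shell, which avoids a second round of quoting.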

View job status

To monitor job status, logs, and the job list in real time, open the WebUI address of the RAY resource group in a browser to access the Ray Dashboard.

Manage jobs

You can view and manage jobs in several ways:

  • Command line: Use the ray job subcommand to view and manage jobs. For more information, see the Ray Jobs CLI API.

  • Python SDK: Ray provides a Python library that allows you to use JobSubmissionClient to view and manage jobs. For more information, see the Python SDK API.

  • REST API: Ray provides a REST API that you can use to view and manage jobs by making HTTP requests. For more information, see the Ray Jobs REST API.

    Note

    Use the WebUI address as the entry point URL to access the Ray REST API.
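For the Python SDK option, a minimal sketch might look like the following. It assumes that JobSubmissionClient accepts the WebUI address and the same Authorization header as the CLI; the helper names auth_headers, make_client, submit_example_job, and summarize_job are hypothetical. Ray is imported inside make_client so that the other helpers can be read and tested without a cluster.

```python
import shlex


def auth_headers(token: str) -> dict:
    """Build the same Authorization header that the CLI passes with --headers."""
    return {"Authorization": f"Bearer {token}"}


def make_client(address: str, token: str):
    """Create a job client for the RAY resource group (contacts the cluster)."""
    from ray.job_submission import JobSubmissionClient  # lazy import

    return JobSubmissionClient(address, headers=auth_headers(token))


def submit_example_job(client, oss_key: str, oss_secret: str) -> str:
    """Submit ray-oss-example.py from the current directory; return the submission ID."""
    return client.submit_job(
        entrypoint=shlex.join(["python", "ray-oss-example.py", oss_key, oss_secret]),
        runtime_env={"working_dir": ".", "pip": ["ossfs"]},
    )


def summarize_job(client, submission_id: str) -> dict:
    """Fetch the status and logs of one job."""
    status = client.get_job_status(submission_id)
    return {
        "status": str(getattr(status, "value", status)),
        "logs": client.get_job_logs(submission_id),
    }
```

With a client from make_client(RAY_ADDRESS, RAY_AUTH_TOKEN), client.list_jobs() and client.stop_job(submission_id) correspond to the ray job list and ray job stop commands.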

The following examples show how to view and manage jobs from the command line.

  • View job status.

    ray job status --headers '{"Authorization": "Bearer RAY_AUTH_TOKEN"}' --address RAY_ADDRESS SUBMISSION_ID

    Example:

    ray job status --headers '{"Authorization": "Bearer 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2"}' --address http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/ raysubmit_gmSnPSFqmEXG****
  • View job logs.

    ray job logs --headers '{"Authorization": "Bearer RAY_AUTH_TOKEN"}' --address RAY_ADDRESS SUBMISSION_ID

    Example:

    ray job logs --headers '{"Authorization": "Bearer 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2"}' --address http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/ raysubmit_gmSnPSFqmEXG****
  • View the job list.

    ray job list --headers '{"Authorization": "Bearer RAY_AUTH_TOKEN"}' --address RAY_ADDRESS

    Example:

    ray job list --headers '{"Authorization": "Bearer 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2"}' --address http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/
  • Stop a job.

    ray job stop --headers '{"Authorization": "Bearer RAY_AUTH_TOKEN"}' --address RAY_ADDRESS SUBMISSION_ID

    Example:

    ray job stop --headers '{"Authorization": "Bearer 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2"}' --address http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/ raysubmit_gmSnPSFqmEXG****
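The status command reports a single snapshot, so scripts often poll until the job finishes. The sketch below shows one way to do that under stated assumptions: wait_for_job is a hypothetical helper, the timeout and polling interval are arbitrary defaults, and get_status is any callable that returns the job status, for example the get_job_status method of a JobSubmissionClient. SUCCEEDED, FAILED, and STOPPED are treated as the final Ray job states.

```python
import time

# Ray job states after which the status no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(get_status, submission_id, timeout_s=600.0, poll_s=5.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(submission_id) until the job reaches a final state.

    Returns the final state name, or raises TimeoutError if the job is still
    pending or running after timeout_s seconds.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status(submission_id)
        # Accept both plain strings and enum members that carry a .value.
        name = str(getattr(status, "value", status))
        if name in TERMINAL_STATES:
            return name
        if clock() >= deadline:
            raise TimeoutError(f"{submission_id} is still {name} after {timeout_s} seconds")
        sleep(poll_s)
```

For example, wait_for_job(client.get_job_status, "raysubmit_gmSnPSFqmEXG****") blocks until the job succeeds, fails, or is stopped, where client is a JobSubmissionClient connected to the RAY resource group.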