This topic describes how to use a RAY resource group in Alibaba Cloud Lindorm. You will learn how to prepare the environment, submit a job, monitor its status, and view logs.
The RAY resource group is currently in invitational preview. To request access, contact Lindorm technical support (DingTalk ID: s0s3eg3).
Prerequisites
- You have enabled LindormTable.
- You have activated the Lindorm compute engine.
- You have enabled the RAY resource group feature.
- You have added the client IP address to the Lindorm whitelist.
Prepare the environment
Log on to the Lindorm console. In the upper-left corner of the page, select the region of the instance. On the Instances page, click the ID of the target instance or click View Instance Details in the Actions column for the instance.
On the Instance Details page, in the Configurations section, click Resource Groups in the Actions column of Compute Engine.
On the Resource Group Details page, hover over WebUI in the Actions column of the RAY resource group to view its WebUI address. Example:
http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/
Install the Ray client. For Python 3, run:
pip3 install ray[default]
Note: After the installation, you can run ray --version to verify that the installation was successful.
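Because one of the authentication methods described later in this topic requires Ray client 2.52.0 or later, it can help to check the installed version from Python as well. The following is a minimal sketch that uses only the standard library; the helper names (version_tuple, meets_minimum, installed_ray_version) are hypothetical and not part of Ray or Lindorm.

```python
from importlib import metadata
from itertools import takewhile
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '2.52.0' into (2, 52, 0); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "2.52.0") -> bool:
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_ray_version() -> Optional[str]:
    """Return the installed version of the 'ray' package, or None if absent."""
    try:
        return metadata.version("ray")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_ray_version()
    if found is None:
        print("ray is not installed")
    else:
        print(f"ray {found}; environment-variable auth usable: {meets_minimum(found)}")
```

If the check reports an older client, the --headers method remains available.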
RAY resource group authentication
A RAY resource group authenticates clients with a token issued by the Lindorm compute engine. This prevents unauthorized clients from accessing cluster APIs and submitting jobs.
Configure job authentication via CLI
You can configure command-line authentication for a RAY resource group in two ways.
Method 1: Pass the token by using the --headers parameter.
ray job submit \
  --address "http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/" \
  --headers '{"Authorization": "Bearer xxxxx-xxx-xxxx-xxxx-xxxxxxxxxxxx"}' \
  --runtime-env-json '{"working_dir": "."}' \
  -- python yourRayJob.py
Method 2: Set the RAY_AUTH_MODE and RAY_AUTH_TOKEN environment variables.
export RAY_AUTH_MODE=token
export RAY_AUTH_TOKEN=xxxxx-xxx-xxxx-xxxx-xxxxxxxxxxxx
ray job submit \
  --address "http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/" \
  --runtime-env-json '{"working_dir": "."}' \
  -- python yourRayJob.py
Note: This method requires Ray client version 2.52.0 or later.
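If you launch the CLI from a script, the two methods differ only in where the token travels: in the --headers argument or in the process environment. The sketch below builds the argument list and environment for either method. The function name build_submit_command and its parameters are hypothetical. The flags and variable names are the ones shown above.

```python
import json
import os
from typing import Dict, List, Optional, Tuple


def build_submit_command(
    address: str,
    entrypoint: List[str],
    token: str,
    use_env: bool = False,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for `ray job submit` using Method 1 or Method 2."""
    env = dict(os.environ if base_env is None else base_env)
    argv = ["ray", "job", "submit", "--address", address]
    if use_env:
        # Method 2: the token is passed through environment variables.
        env["RAY_AUTH_MODE"] = "token"
        env["RAY_AUTH_TOKEN"] = token
    else:
        # Method 1: the token is passed as an HTTP header.
        argv += ["--headers", json.dumps({"Authorization": f"Bearer {token}"})]
    argv += ["--runtime-env-json", json.dumps({"working_dir": "."})]
    argv += ["--", *entrypoint]
    return argv, env
```

The result can be passed to subprocess.run(argv, env=env, check=True). Passing an argument list instead of a shell string avoids quoting problems with the JSON values.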
Configure Ray Dashboard authentication
When you first access the WebUI of a RAY resource group, you must enter a token.
Submit a RAY job
On your client, prepare the RAY job.
In this example, a RAY resource group processes the test-data file in the test-bucket bucket of Object Storage Service (OSS) and uploads the results to OSS. The job logic is defined in the ray-oss-example.py script, located in the ray_job_test directory on the client.
import sys
import tempfile

import ray
from ossfs import OSSFileSystem

ray.init()


@ray.remote
def process(oss_key: str, oss_secret: str, filename: str):
    print("Processing %s" % filename)
    fs = oss_filesystem(oss_key, oss_secret)
    # Download to local
    tmp_filename = tempfile.NamedTemporaryFile(delete=False).name
    fs.get_file(filename, tmp_filename)
    print("tmp file name is %s" % tmp_filename)
    with open(tmp_filename, 'rb') as f:
        content = f.read()
        print(content)
    # Put to OSS
    result_remote_filename = f"{filename}_result"
    fs.put_file(tmp_filename, result_remote_filename)
    return "success"


def oss_filesystem(oss_key: str, oss_secret: str) -> OSSFileSystem:
    return OSSFileSystem(
        endpoint="oss-cn-hangzhou-internal.aliyuncs.com",  # OSS Endpoint
        key=oss_key,
        secret=oss_secret,
    )


if __name__ == "__main__":
    # The script name plus two arguments (oss_key and oss_secret) are required.
    if len(sys.argv) < 3:
        raise ValueError("python %s oss_key oss_secret" % __file__)
    oss_key = sys.argv[1]
    oss_secret = sys.argv[2]
    base = "/test-bucket/test-data/"  # /<bucketname>/path
    fs = oss_filesystem(oss_key, oss_secret)
    files = [item['name'] for item in fs.ls(base) if item['name'] != base]
    for file in files:
        print("Head processing %s" % file)
        result = ray.get(process.remote(oss_key, oss_secret, file))
        print(f"{file} is processed, status is {result}")
    ray.shutdown()
Parameters
| Parameter | Example | Description |
| --- | --- | --- |
| Endpoint | oss-cn-hangzhou-internal.aliyuncs.com | The OSS endpoint. See Regions and endpoints to find the correct endpoint. |
| Base | /test-bucket/test-data/ | The path to the OSS file that you want to process. |
Note:
Ray allows you to declare the resources required for each task or actor in the @ray.remote() decorator. For example, you can use num_cpus and num_gpus to specify the required CPU and GPU resources. For more information about the parameters, see the documentation.
Ray supports pipeline-style scheduling of data processing tasks on heterogeneous resources, such as CPUs and GPUs, across multiple nodes. This method significantly improves data processing efficiency over traditional batch processing. For more information, see the documentation.
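As an illustration of the resource declaration described in the note, the following sketch requests CPUs per task and submits all tasks before collecting any result, so that tasks can run concurrently across nodes. The names result_path and process_all are hypothetical, and the task body is a placeholder rather than the OSS logic of the example script. Ray is imported inside the function so that the naming helper can be used on machines without Ray installed.

```python
from typing import Iterable, List


def result_path(filename: str) -> str:
    """Mirror the naming rule of the example job: append '_result'."""
    return f"{filename}_result"


def process_all(files: Iterable[str], num_cpus: float = 1) -> List[str]:
    """Run one Ray task per file and return the result paths in input order."""
    import ray  # lazy import: only needed when tasks are actually submitted

    if not ray.is_initialized():
        ray.init()

    # Each task asks the scheduler for `num_cpus` CPUs.
    @ray.remote(num_cpus=num_cpus)
    def process(filename: str) -> str:
        # Placeholder for the download / transform / upload steps.
        return result_path(filename)

    # Submit every task first, then wait for all of them together.
    futures = [process.remote(name) for name in files]
    return ray.get(futures)
```

Calling ray.get once on the full list lets the tasks overlap, whereas calling it inside the loop waits for each task before the next one is submitted.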
Submit the job to the specified RAY resource group.
Navigate to the directory where the job is located by running:
cd ray_job_test
Submit the job by running:
ray job submit --headers '{"Authorization": "Bearer RAY_AUTH_TOKEN"}' --runtime-env-json '{"working_dir": "."}' --address RAY_ADDRESS -- python ray-oss-example.py oss_key oss_secret
Parameters
| Parameter | Example | Description |
| --- | --- | --- |
| RAY_ADDRESS | http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/ | The WebUI address of the RAY resource group. |
| my_job.py | ray-oss-example.py | The name of the script to run. |
| oss_key | yourAccessKeyID | The AccessKey ID of the Alibaba Cloud account or RAM user used to access the OSS file. To obtain an AccessKey pair, see Obtain an AccessKey. |
| oss_secret | yourAccessKeySecret | The AccessKey secret of the Alibaba Cloud account or RAM user used to access the OSS file. |
| RAY_AUTH_TOKEN | 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2 | The token required for authentication with the RAY resource group. |
Example
ray job submit --headers '{"Authorization": "Bearer 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2"}' --runtime-env-json '{"pip": ["ossfs"], "working_dir": "."}' --address http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/ -- python ray-oss-example.py yourAccessKeyID yourAccessKeySecret
The command returns the submission ID (SUBMISSION_ID) of the job, such as raysubmit_gmSnPSFqmEXG****. You need this ID to check the job status from the command line.
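When the submit command runs from a script, the submission ID can be pulled out of the captured output. The sketch below assumes only that the ID starts with raysubmit_ and continues with letters and digits, as in the example above. The exact wording of the CLI output is not relied on, and the function name extract_submission_id is hypothetical.

```python
import re
from typing import Optional

# Assumption: a submission ID is "raysubmit_" followed by letters and digits.
SUBMISSION_ID_PATTERN = re.compile(r"raysubmit_[A-Za-z0-9]+")


def extract_submission_id(cli_output: str) -> Optional[str]:
    """Return the first submission ID found in the text, or None."""
    match = SUBMISSION_ID_PATTERN.search(cli_output)
    return match.group(0) if match else None
```

A return value of None signals that the output did not contain an ID, for example because the submission failed.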
View job status
You can monitor the real-time status, logs, and job list on the Ray Dashboard by opening the WebUI address of the RAY resource group in a browser.
Manage jobs
You can view and manage jobs in several ways:
Command line: Use the ray job subcommand to view and manage jobs. For more information, see the Ray Jobs CLI API.
Python SDK: Ray provides a Python library that allows you to use JobSubmissionClient to view and manage jobs. For more information, see the Python SDK API.
REST API: Ray provides a REST API that you can use to view and manage jobs by making HTTP requests. For more information, see the Ray Jobs REST API.
Note: Use the WebUI address as the entry point URL to access the Ray REST API.
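As a rough illustration of the REST option, the sketch below builds a request for the Ray Jobs REST API paths /api/jobs/ (job list) and /api/jobs/<submission_id> (single job), appended to the WebUI address. It assumes that the resource group accepts the same Authorization: Bearer header as the CLI for REST calls; verify this against your instance. The function names jobs_request and fetch_json are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional


def jobs_request(
    address: str, token: str, submission_id: Optional[str] = None
) -> urllib.request.Request:
    """Build a GET request for the job list, or for one job if an ID is given."""
    url = address.rstrip("/") + "/api/jobs/"
    if submission_id:
        url += submission_id
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="GET",
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, fetch_json(jobs_request(RAY_ADDRESS, RAY_AUTH_TOKEN)) would return the decoded job list if the assumptions above hold.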
The following examples show how to view and manage jobs from the command line.
View job status.
ray job status --headers '{"Authorization": "Bearer RAY_AUTH_TOKEN"}' --address RAY_ADDRESS SUBMISSION_ID
Example:
ray job status --headers '{"Authorization": "Bearer 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2"}' --address http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/ raysubmit_gmSnPSFqmEXG****
View job logs.
ray job logs --headers '{"Authorization": "Bearer RAY_AUTH_TOKEN"}' --address RAY_ADDRESS SUBMISSION_ID
Example:
ray job logs --headers '{"Authorization": "Bearer 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2"}' --address http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/ raysubmit_gmSnPSFqmEXG****
View the job list.
ray job list --headers '{"Authorization": "Bearer RAY_AUTH_TOKEN"}' --address RAY_ADDRESS
Example:
ray job list --headers '{"Authorization": "Bearer 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2"}' --address http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/
Stop a job.
ray job stop --headers '{"Authorization": "Bearer RAY_AUTH_TOKEN"}' --address RAY_ADDRESS SUBMISSION_ID
Example:
ray job stop --headers '{"Authorization": "Bearer 8f2e1a3c-9b4d-4e5f-a6c2-d7b8f9e0a1b2"}' --address http://alb-57k7r581oht8rd****.cn-hangzhou.alb.aliyuncsslb.com/ray/raycg/dashboard/ raysubmit_gmSnPSFqmEXG****
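The same status check can be scripted with the Python SDK. The sketch below polls get_job_status on a JobSubmissionClient until the job reaches a terminal state (SUCCEEDED, FAILED, or STOPPED). The helper names wait_for_job and connect are hypothetical, and passing the token through the client's headers argument is assumed to work the same way as the CLI --headers option.

```python
import time
from typing import Any

# Job states after which the status no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(
    client: Any, submission_id: str, timeout_s: float = 600.0, poll_s: float = 5.0
) -> str:
    """Poll the job status until it is terminal and return the final state name."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.get_job_status(submission_id)
        # Accept either an enum member or a plain string.
        name = str(getattr(status, "value", status))
        if name in TERMINAL_STATES:
            return name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{submission_id} still {name} after {timeout_s}s")
        time.sleep(poll_s)


def connect(address: str, token: str) -> Any:
    """Create a client for the resource group (requires the ray package)."""
    from ray.job_submission import JobSubmissionClient  # lazy import

    return JobSubmissionClient(address, headers={"Authorization": f"Bearer {token}"})
```

A typical call would be wait_for_job(connect(RAY_ADDRESS, RAY_AUTH_TOKEN), SUBMISSION_ID), with the three values taken from the parameter table earlier in this topic.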