Manage Scheduling Clusters

Dataphin enables you to connect multiple scheduling clusters and uniformly manage all schedulable resources across them. Each tenant can use resources from different clusters to create custom resource groups. This approach resolves challenges such as cross-Region data transmission and resource isolation.

Background information

Each Dataphin tenant has a **default scheduling cluster**, which is the **Dataphin cluster**. Dataphin also allows metadata warehouse tenants to register custom clusters for task scheduling and specify which tenants can use each custom cluster. This feature helps resolve issues such as low security, high bandwidth costs, and low transmission efficiency that arise when transferring data across Regions over the public network.

For example, suppose Dataphin is deployed in an on-premises data center and you need to integrate data from cloud business database A into business database B, which resides in the same Region as database A. First, create a Kubernetes (K8s) cluster in the cloud by using Container Service for Kubernetes (ACK), and specify the machines available for Dataphin task scheduling. Then, register this cluster with Dataphin and create a custom resource group under it. When you create the integration task, select that resource group as the scheduling resource group. Data is then transmitted within the same Region and does not pass through the Dataphin cluster.

Limits

  • Only Dataphin deployments on the latest architecture support the scheduling cluster management feature. For details, contact the product O&M team.

  • You can register up to **5** clusters, excluding the default cluster (Dataphin cluster).

Permissions

Only global roles with **resource configuration management permission** can manage scheduling clusters. Global roles with **system settings viewing permission** can view system settings.

Manage Scheduling Clusters

  1. On the Dataphin homepage, in the top menu bar, click Management Center, and then click System Settings.

  2. In the left navigation pane, click Tenant Settings, and then click Resource Settings.

  3. On the Resource Settings page, click the Scheduling Cluster Management tab. By default, the Scheduling Clusters list displays the clusters available to the current tenant, that is, the default cluster.

    The scheduling cluster list shows basic information about each cluster, including Scheduling Cluster Name/ID, Owner, Total Resources, Status, Description, Last Updated By/Time, and supported management operations.

    • Total Resources: The total available resources in the current cluster.

    • Status: The scheduling cluster status includes **Waiting for Resource Reporting**, **Waiting Timeout**, **Normal**, and **Abnormal**. For more information, see Scheduling Cluster Status.

  4. (Optional) Filter scheduling clusters by entering a cluster name, owner, or status.

  5. Perform the following management operations on scheduling clusters in the list.

    Note

    The default scheduling cluster is a system cluster and supports viewing only.

    | Operation | Description |
    | --- | --- |
    | Edit | Click the edit icon in the Actions column of the target scheduling cluster. In the Edit Scheduling Cluster Basic Information dialog box, modify the following information for the cluster: Cluster Basic Information, MaxCompute Connection Configuration, and Metric Collection Configuration. For more information about the parameters, see Edit Registered Scheduling Clusters. |
    | Cluster Connection Configuration Guide | Click the guide icon in the Actions column of the target scheduling cluster. In the Cluster Connection Configuration Guide dialog box, view how to configure the connection and authorization for custom clusters. Only successfully connected clusters can be used to create custom resource groups. For detailed operations, see How Dataphin Connects to Data Sources in Alibaba Cloud VPC by Registering Scheduling Clusters. |
    | Delete | Click the delete icon in the Actions column of the target scheduling cluster. Only a scheduling cluster for which no custom resource group has been created can be deleted. |

    Important

    After deletion, the Agent application deployed in the target cluster stops running and cannot be recovered. **Contact the target cluster owner** to delete the corresponding container (Pod). The deletion command is: sh uninstall.sh
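The operation rules above, together with the registration limit in Limits, can be summarized in a short Python sketch. It is illustrative only: `SchedulingCluster`, `can_register`, and `allowed_operations` are hypothetical names, not part of any Dataphin API, and the operation labels are placeholders for the actions in the list.

```python
from dataclasses import dataclass

# Limit stated in this document; the default cluster does not count toward it.
MAX_CUSTOM_CLUSTERS = 5


@dataclass
class SchedulingCluster:
    # Hypothetical model for illustration; not a Dataphin data structure.
    name: str
    is_default: bool = False
    custom_resource_groups: int = 0


def can_register(clusters: list[SchedulingCluster]) -> bool:
    """Return True if another custom cluster may still be registered."""
    custom = [c for c in clusters if not c.is_default]
    return len(custom) < MAX_CUSTOM_CLUSTERS


def allowed_operations(cluster: SchedulingCluster) -> set[str]:
    """Return the management operations available for a cluster."""
    if cluster.is_default:
        return {"view"}  # the default cluster is a system cluster: view only
    operations = {"view", "edit", "connection_guide"}
    if cluster.custom_resource_groups == 0:
        # Deletion is offered only when no custom resource group exists.
        operations.add("delete")
    return operations
```

For instance, a custom cluster that still has resource groups would not be offered `delete` in this model, which mirrors the restriction described for the Delete operation.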

Edit Registered Scheduling Clusters

Parameter

Description

Cluster Basic Information

Cluster Name, Owner, Description

The parameter descriptions for cluster basic information are the same as for the creation operation. For details, see Register Scheduling Clusters.

MaxCompute Connection Configuration

Custom Endpoint

The connection configuration that the current cluster uses to access MaxCompute compute sources. By default, it is the same as the configuration in **Management Center** > **Compute Settings**. After you enable this option, a dedicated connection address is added for the current cluster.

If the cluster can reach the MaxCompute VPC endpoint, prefer the VPC address.

Cluster Region

Select the cluster's Region. The available options here are the same as those in **Management Center** > **Compute Settings** > **Region**.

Network Connection Method

You can select Alibaba Cloud VPC Network or Public Network Access.

Note

You can configure this option only when the cluster's Region is **Beijing**, **Shanghai**, **Shenzhen**, **Hangzhou**, or **Chengdu**. You can only select an option that is **different** from the one in **Management Center** > **Compute Settings**. For example, if the network connection method in Compute Settings is public network, then only Alibaba Cloud VPC network can be selected here.

Connection Endpoint

  • When Cluster Region is set to Other, the address defaults to the Endpoint specified in **Management Center** > **Compute Settings**, and you must modify it manually.

  • When Cluster Region is set to Beijing, Shanghai, Shenzhen, Hangzhou, or Chengdu, the system automatically generates the corresponding Endpoint, which cannot be modified.

Metric Collection Configuration

Metric Collection

Collects cluster metrics through the Prometheus HTTP API. This is disabled by default. After you enable it, you can view the resource consumption trend of the current cluster in **O&M** > **Scheduling Resource Dashboard**.

Cluster Type

You can select Alibaba Cloud ACK or Other.

Prometheus HTTP API

Enter the address of the Prometheus HTTP API.

Authentication Type

When you select Alibaba Cloud ACK as the cluster type, you can choose No Authentication, Token Authentication, or AccessKey Authentication. When you select Other as the cluster type, you can choose No Authentication or Token Authentication.

When you select token authentication, you must also specify the token. When you select AccessKey authentication, you must also specify the AccessKey ID and AccessKey Secret.
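Before entering the Prometheus HTTP API address, it can help to confirm that the endpoint answers a query. The following Python sketch builds such a probe request and applies the cluster type and authentication type combinations listed above. It makes several assumptions: the function name `build_probe_request` and the lowercase option labels are hypothetical, the token is assumed to be sent as a standard HTTP bearer token, and AccessKey signing is left out because this document does not describe it. The `/api/v1/query` path is the instant-query endpoint of the Prometheus HTTP API.

```python
import urllib.parse
import urllib.request

# Authentication types this document lists for each cluster type.
ALLOWED_AUTH = {
    "ack": {"none", "token", "accesskey"},
    "other": {"none", "token"},
}


def build_probe_request(base_url, cluster_type="other", auth_type="none", token=None):
    """Build a GET request for the Prometheus instant-query endpoint."""
    if auth_type not in ALLOWED_AUTH[cluster_type]:
        raise ValueError(
            f"{auth_type!r} is not offered for cluster type {cluster_type!r}"
        )
    if auth_type == "accesskey":
        # Not covered here: the signing scheme is not described in this document.
        raise NotImplementedError("AccessKey authentication is outside this sketch")

    query = urllib.parse.urlencode({"query": "up"})
    request = urllib.request.Request(f"{base_url.rstrip('/')}/api/v1/query?{query}")
    if auth_type == "token":
        if not token:
            raise ValueError("token authentication requires a token")
        # Assumption: the token is passed as a bearer token.
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

Sending the request with `urllib.request.urlopen(request, timeout=5)` from a machine that can reach the cluster should return a JSON body whose `status` field is `success` if the address and credentials are valid.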

Scheduling Cluster Status

Scheduling cluster statuses include **Waiting for Resource Reporting**, **Waiting Timeout**, **Normal**, and **Abnormal**. Descriptions for each status are as follows:

Note

Only clusters with a **Normal** status can be used to create custom resource groups. If a custom resource group has already been created under a cluster and the cluster’s status changes to **Abnormal**, the custom resource group becomes unusable.

| Status | Description |
| --- | --- |
| Waiting for Resource Reporting | The cluster has been registered but its connection configuration is not complete, or the connection configuration is complete but Dataphin has not yet received resource information reported by the cluster. For details, see How Dataphin Connects to Data Sources in Alibaba Cloud VPC by Registering Scheduling Clusters. |
| Waiting Timeout | The cluster has not reported information within **2 hours** after registration. Contact the cluster owner to confirm whether the Agent application is deployed and whether the target cluster has available machines. |
| Normal | The cluster is successfully registered and its connection configuration is complete. Dataphin continuously and stably receives resource information reported by the cluster, and the cluster can be used normally. |
| Abnormal | The cluster was previously in the Normal status but has remained unresponsive for a certain period. Check whether the cluster's Agent application is working correctly, or contact the cluster owner to check whether the target cluster has available machines. |
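The status descriptions above can be read as a small decision rule based on when the cluster was registered and when it last reported resources. The following Python sketch expresses that reading. It is not Dataphin's implementation: the function name and parameters are hypothetical, and because this document only says "a certain period" for the Abnormal status, the threshold `unresponsive_after` is a caller-supplied assumption. The sketch also assumes that a cluster that reports after a timeout is treated as Normal, which this document does not state.

```python
from datetime import datetime, timedelta
from typing import Optional

# Stated in this document: no report within 2 hours of registration means timeout.
REPORT_TIMEOUT = timedelta(hours=2)


def cluster_status(
    registered_at: datetime,
    last_report_at: Optional[datetime],
    now: datetime,
    unresponsive_after: timedelta,
) -> str:
    """Derive a scheduling cluster status from its reporting history."""
    if last_report_at is None:
        # The cluster has never reported resource information.
        if now - registered_at >= REPORT_TIMEOUT:
            return "Waiting Timeout"
        return "Waiting for Resource Reporting"
    # The cluster has reported before; check how stale the last report is.
    if now - last_report_at >= unresponsive_after:
        return "Abnormal"
    return "Normal"
```

In this model, only a cluster for which the function returns "Normal" would be eligible for creating custom resource groups, matching the note at the start of this section.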