Create a Flink compute source
A Flink compute source provides Flink-based compute resources for Dataphin projects. Only projects bound to a Flink compute source can run Flink compute jobs.
Prerequisites
-
Apache Flink is enabled as the real-time compute engine for your tenant. For more information, see Set Real-time Compute Engine.
-
Your account has the New Compute Source permission through a custom role, Super Admin, or Project Admin. For more information, see Data Warehouse Planning Permission List.
Procedure
-
In the top menu bar on the Dataphin homepage, select Planning > Compute Source.
-
On the Compute Source page, click Add Compute Source and select Flink Compute Source.
-
On the Create Compute Source page, configure the parameters.
-
Basic Information
Parameter
Description
Compute Type
Select Flink.
Compute Source Name
The compute source name must meet these requirements:
-
Can contain Chinese characters, letters, digits, underscores (_), and hyphens (-).
-
Cannot exceed 64 characters.
Compute Source Description
Optional. Maximum 128 characters.
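The naming rules above can be checked before you fill in the form. The following is a minimal sketch, not Dataphin's own validation: the function name is hypothetical, "Chinese characters" is approximated by the CJK Unified Ideographs block, "letters" is assumed to mean ASCII letters, and an empty name is assumed to be invalid.

```python
import re

# Assumptions (not confirmed by the documentation):
# - "Chinese characters" is approximated by U+4E00..U+9FFF.
# - "letters" means ASCII letters.
# - The name must contain at least one character.
_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")
MAX_DESCRIPTION_LEN = 128


def validate_compute_source(name: str, description: str = "") -> list[str]:
    """Return a list of rule violations; an empty list means the input passes."""
    problems = []
    if not _NAME_RE.fullmatch(name):
        problems.append(
            "name must be 1 to 64 characters: Chinese characters, letters, "
            "digits, underscores, or hyphens"
        )
    if len(description) > MAX_DESCRIPTION_LEN:
        problems.append(f"description exceeds {MAX_DESCRIPTION_LEN} characters")
    return problems
```

For example, `validate_compute_source("flink_cs-01")` returns an empty list, while a name that contains a space or exceeds 64 characters returns one violation.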
-
-
Select a deployment mode and configure its parameters
Dataphin supports Yarn and Kubernetes deployment modes. Parameters vary by mode.
Yarn
-
Cluster Basic Information
Parameter
Description
Configuration File
Upload the cluster's yarn-site.xml, core-site.xml, and hdfs-site.xml configuration files.
Cluster Kerberos
Kerberos is a symmetric-key authentication protocol that enables SSO across services such as HBase and HDFS.
If your cluster uses Kerberos, enable this option and upload a Krb5 file or configure the KDC server address.
-
Krb5 Authentication File: Upload the Krb5 file for Kerberos authentication.
-
KDC Server Address: The KDC server address. Separate multiple addresses with commas (,).
Cluster Type (Optional)
Select your cluster type for connection testing. Supported types: Aliyun E-MapReduce 5.x, CDH 5.x Hadoop, CDH 6.x Hadoop, Cloudera Data Platform 7.x, AsiaInfo DP 5.3 Hadoop, and Transwarp TDH 6.x Hadoop.
Important: The connection test can succeed without a cluster type, but selecting one prevents potential failures.
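Before uploading, it can help to confirm that all three configuration files are present and that core-site.xml parses as a standard Hadoop configuration. The sketch below is illustrative only: the helper names are hypothetical, and it relies on the standard Hadoop property hadoop.security.authentication to guess whether the Cluster Kerberos option is likely to be needed.

```python
import xml.etree.ElementTree as ET

# The three files the Configuration File parameter asks for.
REQUIRED_FILES = ("yarn-site.xml", "core-site.xml", "hdfs-site.xml")


def missing_files(uploaded: list[str]) -> list[str]:
    """Return the required file names that are not in the uploaded list."""
    return [name for name in REQUIRED_FILES if name not in uploaded]


def parse_hadoop_conf(xml_text: str) -> dict[str, str]:
    """Parse a Hadoop *-site.xml document into a {name: value} dictionary."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def uses_kerberos(core_site: dict[str, str]) -> bool:
    """Hadoop defaults to 'simple' authentication when the property is absent."""
    value = core_site.get("hadoop.security.authentication", "simple")
    return value.lower() == "kerberos"
```

If `uses_kerberos` returns True for your core-site.xml, the cluster most likely requires the Cluster Kerberos option described above.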
-
-
Flink Compute Resources
Parameter
Description
Compute Resource Type
Select Resource Queue or Session Cluster.
Resource Queue
Enter the YARN queue name for Flink job submission. The name must follow these rules:
-
Length: Maximum 256 characters.
-
Character limit: English letters, numbers, spaces, and the following characters: - _ . @ ' ( )
-
Case-sensitivity: Case-sensitive.
-
Uniqueness: Must be unique within the compute source.
To add multiple queues, click + Add.
Note:
-
You can add a maximum of 10 resource queues.
-
To remove a queue, click the delete icon. At least one queue must remain. After a queue is deleted, jobs that depend on it can no longer be submitted.
Session Cluster
Select one or more session clusters. The list shows all clusters from , regardless of status.
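The resource queue rules above (length, allowed characters, case-sensitive uniqueness, and the limit of 1 to 10 queues) can be expressed as a small check. This is a sketch under those stated rules only; the function name is hypothetical and Dataphin may apply additional checks.

```python
import re

# Letters, digits, spaces, and the characters - _ . @ ' ( ), up to 256 characters.
_QUEUE_RE = re.compile(r"[A-Za-z0-9 \-_.@'()]{1,256}")
MAX_QUEUES = 10


def validate_queues(queues: list[str]) -> list[str]:
    """Return a list of rule violations for a set of YARN queue names."""
    problems = []
    if not 1 <= len(queues) <= MAX_QUEUES:
        problems.append(f"expected 1 to {MAX_QUEUES} queues, got {len(queues)}")
    seen = set()
    for queue in queues:
        if not _QUEUE_RE.fullmatch(queue):
            problems.append(f"invalid queue name: {queue!r}")
        # Names are case-sensitive, so compare them exactly as entered.
        if queue in seen:
            problems.append(f"duplicate queue name: {queue!r}")
        seen.add(queue)
    return problems
```

Because names are compared exactly, `root.default` and `root.Default` count as two different queues.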
-
-
Flink Kerberos Authentication
Note: Flink Kerberos authentication is available only with the Resource Queue compute resource type.
-
Flink Kerberos: Enable this if your Flink cluster uses Kerberos authentication. Upload a keytab file and configure a principal.
-
Keytab File: Upload the keytab file from the Flink server.
-
Principal: The Kerberos username for the Flink keytab file.
-
-
Username: Required when Flink Kerberos is disabled. The cluster username for submitting Flink jobs.
-
-
CheckPoint Storage
File system: HDFS, OSS-HDFS, or Aliyun OSS (Flink 1.14 and 1.15 only). Parameters vary by file system.
Note: OSS-HDFS is supported only with Aliyun E-MapReduce 5.x Hadoop.
-
HDFS parameters:
Directory Path: The checkpoint storage path. Flink must have access to this path. Example: hdfs://cdh-cluster-00001:8020/openflink/savepoint/. For HA clusters, use hdfs://nameservice/path.
-
OSS-HDFS parameters:
-
Directory Path: The checkpoint storage path. Flink must have access to this path. Example: hdfs://cdh-cluster-00001:8020/openflink/savepoint/. For HA clusters, use hdfs://nameservice/path.
-
AccessKey ID and AccessKey Secret: The credentials to access the cluster's OSS. Use an existing AccessKey pair or create one. For more information, see Create an AccessKey.
Note: The AccessKey Secret is displayed only at creation and cannot be retrieved later.
-
-
Aliyun OSS parameters:
-
Endpoint: The OSS service endpoint.
-
Directory path: The format is oss://{Bucket}/{Object}.
-
AccessKey ID and AccessKey Secret: The credentials to access the cluster's OSS. Use an existing AccessKey pair or create one. For more information, see Create an AccessKey.
Note: The AccessKey Secret is displayed only at creation and cannot be retrieved later.
-
Important: The AccessKey credentials you configure here override any credentials set in the core-site.xml file.
-
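The directory path formats for HDFS and Aliyun OSS shown above can be sanity-checked before submission. The following sketch splits a path into scheme, authority (NameNode address, nameservice, or bucket), and directory. The function name is hypothetical, and the check covers only the two formats documented above; it does not verify that Flink can actually access the path.

```python
from urllib.parse import urlparse


def split_checkpoint_path(path: str) -> tuple[str, str, str]:
    """Split a checkpoint path into (scheme, authority, directory).

    Accepts hdfs://host:port/dir, hdfs://nameservice/dir, and
    oss://{Bucket}/{Object}; raises ValueError for anything else.
    """
    parts = urlparse(path)
    if parts.scheme not in ("hdfs", "oss"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("missing NameNode address, nameservice, or bucket")
    return parts.scheme, parts.netloc, parts.path


# Example using the HDFS path from this page.
print(split_checkpoint_path("hdfs://cdh-cluster-00001:8020/openflink/savepoint/"))
```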
Kubernetes
-
Cluster Basic Information
Not required for Kubernetes.
-
Flink Compute Engine Configuration
Select a checkpoint file system: NFS, S3, or Azure Blob Storage. Parameters vary by file system.
NFS
Parameter
Description
Server
The NFS server domain name.
Version
NFSv3 or NFSv4.
Contents
The checkpoint storage path on NFS. Example: /data/checkpoint.
Maximum capacity
Maximum NFS storage capacity in GiB. Exceeding this limit disrupts checkpoint storage.
S3
Parameter
Description
Endpoint (Optional)
The S3 endpoint. Example: http://s3.us-east-2.amazonaws.com.
Note: Optional for Amazon S3. Required for other S3-compatible services.
Directory Path
The checkpoint storage path. Format: s3://{YOUR-BUCKET}/{path}. Use a dedicated directory and clean it up regularly.
Access Key, Secret Key
The credentials for your S3-compatible storage. Click the icon to view the plain text.
Azure Blob Storage
Parameter
Description
Protocol
Only ABFS is supported.
Authentication Method
Only Shared Key is supported.
Directory Path
The checkpoint storage path. Format: abfs://{YOUR-CONTAINER}@{YOUR-AZURE-ACCOUNT}.dfs.core.windows.net/{object-path}.
Access Key
The access key for your Azure Blob Storage account. Click the icon to view the plain text.
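The S3 and ABFS directory path formats above are easy to mistype, so a small helper can assemble them from their parts. This is only an illustration of the two documented formats; the function names and the sample bucket, container, and account values are hypothetical.

```python
def s3_checkpoint_path(bucket: str, path: str) -> str:
    """Build a path in the form s3://{YOUR-BUCKET}/{path}."""
    return f"s3://{bucket}/{path.strip('/')}"


def abfs_checkpoint_path(container: str, account: str, object_path: str) -> str:
    """Build a path in the form
    abfs://{YOUR-CONTAINER}@{YOUR-AZURE-ACCOUNT}.dfs.core.windows.net/{object-path}.
    """
    return (
        f"abfs://{container}@{account}.dfs.core.windows.net/"
        f"{object_path.strip('/')}"
    )


# Hypothetical values, for illustration only.
print(s3_checkpoint_path("my-bucket", "flink/checkpoints"))
print(abfs_checkpoint_path("my-container", "myaccount", "flink/checkpoints"))
```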
-
-
-
Click Test Connection to verify connectivity with the cluster.
Kubernetes mode does not support connection testing. In this mode, skip the test and click Submit.
-
After the test succeeds, click Submit.
Next steps
Bind the Flink compute source to a project. For more information, see Create a general project.