Create a Flink DataStream job
Create a Flink DataStream job based on the open source Flink real-time engine.
Prerequisites
-
Enable the real-time engine and configure an open source Flink compute source for your project. For more information, see Create a general-purpose project.
-
Upload the JAR package for your DataStream job to the Dataphin platform. For more information, see Upload and reference resources.
Permissions
A Super Admin, Project Admin, or Developer can create a Flink DataStream job.
Step 1: Create a Flink DataStream job
-
In the top navigation bar of the Dataphin homepage, choose Develop > Data Development.
-
In the top navigation bar, select a Project. If you are in Dev-Prod mode, you also need to select an Environment.
-
In the left-side navigation pane, choose Data Processing > Compute Task. In the compute task list on the right, click the
icon and choose Flink DataStream. -
In the Create Flink DataStream Job dialog box, configure the parameters.
Flink on Kubernetes
Parameter
Description
Task name
The name must:
-
Contain only lowercase letters, digits, and underscores (_).
-
Be 4 to 63 characters long.
-
Be unique within the project.
-
Start with a letter.
Image type
You can select System Image or Custom Image. The default is System Image. If you select Custom Image, you must also provide the image address.
Image address
Enter the image address. Before using a custom image, you must download the metrics reporting plug-in and Checkpoint storage extension plug-in from the prompt on the page and package them into your image.
Production environment resource queue
You can select any resource group configured for real-time jobs, including those in externally registered clusters.
Engine version
Select the Flink engine version for the job. Only version 1.20.1 is supported.
Development environment resource queue and engine version
You can select System Default Settings or Custom Configuration.
-
System Default Settings: This is the default option. It uses the same environment cluster and engine version as the production environment.
-
Custom Configuration: You can select a specific environment cluster and engine version for the development environment job. This option is not available for projects in Basic mode.
Storage path
Select the storage directory for the job.
If a directory does not exist, you can create a folder:
-
Above the compute task list on the left, click the
icon to open the Create Folder dialog box. -
In the Create Folder dialog box, enter a folder Name and, if needed, select a directory for it.
-
Click OK.
Resource package path/Select resources
-
If you set Image type to Custom Image, enter the resource package path within the image, such as
local:///opt/flink/usrlib/flink-sql-job-1.15.3.jar. -
If you set Image type to System Image, select the resource package that the Flink DataStream job depends on.
Class name
Enter the fully qualified class name of the resource.
Use template
Enable this option to reference an existing real-time template.
Description
Enter a brief description for the Flink DataStream job. The description can be up to 1,000 characters.
Flink on YARN
Parameter
Description
Task name
The name must:
-
Contain only lowercase letters, digits, and underscores (_).
-
Be 4 to 63 characters long.
-
Be unique within the project.
-
Start with a letter.
Production environment resource queue
Lists all resource queues and Session clusters managed by the Flink compute source bound to the current project.
Production environment engine version
Select the Flink engine version for the job. Dataphin supports the following engine versions:
-
1.20.1
-
1.15.3
-
1.14.2
-
1.13.1
NoteIf you select a Session cluster for the production environment resource queue, you can only select version 1.20.1.
Development environment resource queue and engine version
You can select System Default Settings or Custom Configuration.
-
System Default Settings: This is the default option. It uses the same environment cluster and engine version as the production environment.
-
Custom Configuration: You can select a specific environment cluster and engine version for the development environment job. The available options are the same as for the production environment. This option is not available for projects in Basic mode.
Storage path
Select the storage directory for the job.
If a directory does not exist, you can create a folder:
-
Above the compute task list on the left, click the
icon to open the Create Folder dialog box. -
In the Create Folder dialog box, enter a folder Name and, if needed, select a directory for it.
-
Click OK.
Select resources
Select the resource package that the Flink DataStream job depends on.
Class name
Enter the fully qualified class name of the resource.
Use template
Enable this option to reference an existing real-time template.
Description
Enter a brief description for the Flink DataStream job. The description can be up to 1,000 characters.
-
-
Click OK.
Step 2: Add code and pre-compile job
A Flink DataStream job is a Java JAR job. To pass main arguments, enter them in the code editor, separated by spaces or line breaks. The system parses non-commented content, splits it into an argument array, and passes the array to the main function.
After editing the code, click Pre-compile in the top menu bar to check for syntax and permission issues.
If the pre-compilation succeeds, a Pre-compilation successful message appears. If it fails, a Pre-compilation failed message appears, and you can click Console at the bottom of the page to view the failure logs.
Step 3: Configure the Flink DataStream job
-
In the right-side pane of the current compute task, click Configuration.
-
In the configuration panel, configure the real-time mode for the Flink DataStream job.
ImportantFlink DataStream jobs do not support offline mode.
-
Real-time mode
-
Resource configuration (Required): Configure the resource queues, engine versions, job parallelism, number of TaskManagers, JobManager memory, and TaskManager memory for the production and development environments. For more information, see .
-
Variable configuration: Undeclared variables in your code are automatically parsed and added to the parameter list, where you can configure their types and values. For more information, see .
-
Checkpoint configuration: Configure checkpoints to allow the Flink job to recover to its previous state after an unexpected crash. For more information, see .
-
State configuration: Configure the automatic cleanup period for data in the state. For more information, see .
-
Runtime parameters: Configure runtime parameters to control the execution behavior and performance of your Flink application. For more information, see .
-
Kubernetes YAML: You can view and modify the Pod YAML files (including JobManager Pod and TaskManager Pod) for the job's real-time mode.
-
Dependency files: Configure the resource files that the job depends on. For more information, see .
NoteConfiguring dependency files is not supported for Flink DataStream jobs that use a custom image.
-
Dependencies: Configure job dependencies to help users quickly identify upstream and downstream jobs during troubleshooting. For more information, see .
-
-
-
Click OK.
Step 4: Submit the Flink DataStream job
-
Click the Submit button in the top menu bar.
-
In the Submit dialog box, review the Content to be submitted and Pre-check information, and enter commit notes.
-
Click OK and Submit.
-
Note
If your project is in Dev-Prod mode, you must publish the Flink DataStream job to the production environment. For more information, see Manage publishing tasks.
Next steps
After submitting the job, go to the O&M Center to monitor and manage it. For more information, see Manage real-time tasks.