Create a Flink SQL task based on the open-source Flink real-time engine.
Prerequisites
Make sure that the real-time engine is enabled and Flink is configured as the compute source for your project. For more information, see Create a general-purpose project.
Permissions
Only super administrators, project administrators, and developers can create Flink SQL compute tasks.
Step 1: Create a Flink SQL task
-
In the top navigation bar of the Dataphin homepage, choose Develop > Data Development.
-
In the top navigation bar, select a Project. If you are in Dev-Prod mode, also select an Environment.
-
In the left-side navigation pane, choose Data Processing > Compute Task. In the compute task list on the right, click the icon and select Flink SQL.
-
In the Create Flink SQL Task dialog box, configure the parameters.
Parameter
Description
Task Name
The name must:
-
Contain only lowercase letters, digits, and underscores (_).
-
Be 4 to 63 characters in length.
-
Be unique within the project.
-
Start with a letter.
Production Environment Resource Queue/Development Environment Resource Queue
-
When a project is bound to a Flink compute source with a Kubernetes deployment mode, you can select all resource groups that are configured for real-time jobs (including resource groups in externally registered clusters).
-
When the deployment mode of the Flink compute source is YARN, the drop-down list contains all resource queues and Session clusters that are managed by the Flink compute source bound to the current project.
Note: If your project workspace is in Basic mode, you can configure only the resource queue for the production environment.
Production Environment Engine Version/Development Environment Engine Version
Select the Flink engine version for the task. Dataphin supports the following versions:
-
1.20.1
-
1.15.3
-
1.14.2
-
1.13.1
Note:
-
When you select a Session cluster for the production/development resource queue, you can select only version 1.20.1.
-
If your project workspace is in Basic mode, you can configure only the engine version for the production environment.
Storage location
Select a directory to store the task.
If no directory exists, create a folder as follows:
-
Above the compute task list on the left, click the icon to open the Create Folder dialog box.
-
In the Create Folder dialog box, enter a Name for the folder and Select Directory to specify its location.
-
Click OK.
Creation Method
You can choose from Create Blank Task, Use Sample Code, and Use Template.
-
Create Blank Task: Creates an empty Flink SQL task.
-
Use Sample Code: Creates a task using built-in sample code.
-
Use Template: Creates a task based on a real-time compute task template.
Description
Enter a brief description for the Flink SQL task. The description can be up to 1,000 characters.
-
Click OK.
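The rules in the parameter table above can be checked before you open the dialog box, for example when task names are generated by a script. The following is a minimal Python sketch, not part of Dataphin: the function name `validate_task_request` and its parameters are hypothetical, and the check for uniqueness within the project is left out because it requires access to the project itself.

```python
import re

# Engine versions listed in the parameter table above.
SUPPORTED_ENGINE_VERSIONS = {"1.20.1", "1.15.3", "1.14.2", "1.13.1"}
# The only version the table allows when a Session cluster is selected.
SESSION_CLUSTER_VERSION = "1.20.1"

# A lowercase letter first, then lowercase letters, digits, or underscores,
# for a total length of 4 to 63 characters.
TASK_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{3,62}")


def validate_task_request(name, engine_version, uses_session_cluster=False, description=""):
    """Return a list of rule violations; an empty list means every check passed.

    Hypothetical helper for illustration only. It does not call Dataphin and
    cannot verify that the name is unique within the project.
    """
    problems = []
    if not TASK_NAME_PATTERN.fullmatch(name):
        problems.append("task name must be 4-63 chars of [a-z0-9_] and start with a letter")
    if engine_version not in SUPPORTED_ENGINE_VERSIONS:
        problems.append(f"unsupported engine version: {engine_version}")
    elif uses_session_cluster and engine_version != SESSION_CLUSTER_VERSION:
        problems.append("Session clusters only support engine version 1.20.1")
    if len(description) > 1000:
        problems.append("description exceeds 1,000 characters")
    return problems


if __name__ == "__main__":
    print(validate_task_request("orders_stream_01", "1.20.1"))
    print(validate_task_request("Orders", "1.15.3", uses_session_cluster=True))
```

Returning a list instead of raising on the first failure lets a caller report every violated rule in one pass.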
Step 2: Develop and precompile Flink SQL code
-
In the Flink SQL task code editor, write your code.
Dataphin supports creating metatables from native DDL statements. If Dataphin detects a native CREATE TABLE or CREATE TEMPORARY TABLE statement, click the hint icon in the editor to quickly create a metatable. For more information, see Develop Flink SQL tasks.
After writing the code, click the Format button in the top menu bar to automatically format the SQL code.
-
Click Precompile in the top menu bar to check for syntax and permission errors.
If the pre-compilation succeeds, a Pre-compilation successful message appears. If it fails, a Pre-compilation failed message appears. Click Console at the bottom of the page to view the failure log.
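To make "native DDL statement" concrete, the sketch below holds a small Flink SQL script in a Python string and lists the tables it declares with CREATE TABLE or CREATE TEMPORARY TABLE. It is an illustration only and does not describe how Dataphin performs its detection. The table names, the `find_native_ddl_tables` helper, and the choice of the `datagen` and `print` connectors are assumptions; whether those connectors are available depends on your Flink compute source.

```python
import re

# Sample Flink SQL with two native DDL statements and one INSERT.
# Table names and connector choices are illustrative assumptions.
SAMPLE_SQL = """
CREATE TEMPORARY TABLE orders_source (
    order_id BIGINT,
    amount   DOUBLE
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '5'
);

CREATE TEMPORARY TABLE orders_sink (
    order_id BIGINT,
    amount   DOUBLE
) WITH (
    'connector' = 'print'
);

INSERT INTO orders_sink
SELECT order_id, amount FROM orders_source WHERE amount > 100;
"""

# Matches CREATE TABLE / CREATE TEMPORARY TABLE at the start of a line,
# with an optional IF NOT EXISTS clause, and captures the table name.
DDL_PATTERN = re.compile(
    r"^\s*CREATE\s+(?:TEMPORARY\s+)?TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([`\w.]+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_native_ddl_tables(sql):
    """Return the names of tables declared by native DDL, in order of appearance."""
    return [name.strip("`") for name in DDL_PATTERN.findall(sql)]


if __name__ == "__main__":
    print(find_native_ddl_tables(SAMPLE_SQL))
```

A regular expression is enough for a quick scan like this, but it is not a SQL parser: DDL inside comments or string literals would also match.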
Step 3: Configure the Flink task
-
In the editor sidebar, click Configuration.
-
In the configuration panel, configure task settings for Real-time Mode and Batch Mode.
Note: Dataphin supports stream-batch integration with a unified processing engine. You can configure both stream and batch settings for the same code to generate instances for different modes from a single codebase. To enable batch processing, turn on batch mode on the task configuration page and configure resources, scheduling, dependencies, and other settings.
-
Real-time Mode
-
Resource Configuration (Required): Configure resource queues for the production and development environments, engine version, task parallelism, number of Task Managers, Job Manager memory, and Task Manager memory. For more information, see .
-
Variable configuration: Assign values to variables used in the compute task code. These variables are automatically replaced with their corresponding values during execution. For more information, see .
-
Checkpoint configuration: Checkpoints help a Flink SQL job recover to its pre-crash state upon restart. For configuration details, see .
-
State configuration: Configure the automatic data cleanup period for the State. For more information, see .
-
Runtime Parameters: Configure runtime parameters to control Flink application execution behavior and performance. For more information, see .
-
Dependency Files: Configure resource files that the task depends on. For more information, see .
-
Dependencies: Configure dependencies to identify upstream and downstream task relationships during debugging. For more information, see .
-
Batch Mode (Beta)
-
Resource Configuration (Required): Configure resource queues for the production and development environments, engine version, task parallelism, number of Task Managers, Job Manager Memory, and Task Manager Memory. For configuration instructions, see .
-
Variable configuration: Assign values to variables in the compute task code so that variable parameters are automatically replaced with the corresponding values. For configuration instructions, see .
-
Runtime Parameters: Configure runtime parameters to control Flink application execution behavior and performance. For more information, see .
-
Dependency Files: Configure resource files that the Flink SQL task depends on. For more information, see .
-
Scheduling Configuration (Required): Configure how the node is periodically scheduled in the production environment, including the scheduling cycle, effective date, and other properties. For more information, see .
-
Dependencies (Required): Configure dependencies to identify upstream and downstream task relationships during debugging. For more information, see .
-
Click OK.
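The variable configuration described above replaces variables in the task code with their assigned values at execution time. The following Python sketch mimics that replacement locally so you can preview the resulting SQL. It assumes a `${name}` placeholder syntax and a hypothetical `render_sql` helper; neither is confirmed by this page, so check the variable syntax that your Dataphin version actually uses.

```python
import re

# Assumed placeholder syntax: ${variable_name}. This is an assumption for
# illustration, not a statement of Dataphin's variable format.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def render_sql(sql, variables):
    """Replace every ${name} in sql with variables[name].

    Raises KeyError that lists all placeholders without a configured value,
    so a missing variable is caught before the SQL is used.
    """
    missing = sorted({name for name in PLACEHOLDER.findall(sql) if name not in variables})
    if missing:
        raise KeyError(f"no value configured for: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda match: str(variables[match.group(1)]), sql)


if __name__ == "__main__":
    template = "SELECT order_id FROM orders WHERE ds = '${bizdate}' AND amount > ${min_amount}"
    print(render_sql(template, {"bizdate": "20240101", "min_amount": 100}))
```

Failing on an unassigned variable, rather than leaving the placeholder in the text, surfaces the problem in the preview instead of at run time.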
Step 4: Debug the Flink code
-
Click the Debug button in the top menu bar. This samples data and performs local debugging to verify your code.
-
In the debugging configuration dialog box, select Real-time Mode - FLINK Stream Task for real-time debugging or Batch Mode - FLINK Batch Task for batch debugging.
-
Real-time debugging: Samples real-time data and performs local debugging in Flink Stream mode. For more information, see Debug in real-time mode.
-
Batch debugging: Samples data from the corresponding offline physical table and performs local debugging in Flink Batch mode. For more information, see Debug in batch mode.
-
You can debug only one mode at a time. For the selected mode, you must sample data from the corresponding table type.
Step 5: Submit the Flink SQL task
-
Click the Submit button in the top menu bar.
-
In the Submit dialog box, review the Submission Content and Pre-check information, and enter Submission Comments.
-
Click OK and Submit.
If your project is in Dev-Prod mode, you must publish the Flink SQL task to the production environment. For more information, see Manage publishing tasks.
Next steps
After the task is submitted, you can view and manage Flink SQL jobs in the O&M Center. For more information, see Manage real-time jobs.