Create Flink SQL tasks based on the Ververica Flink engine to process real-time and batch data.
Prerequisites
Before you begin, make sure that the project has enabled the real-time engine and configured the Ververica Flink compute source. For more information, see Create a general project.
Permissions
Only super administrators, project administrators, and developers can create Flink SQL compute tasks.
Step 1: Create a Flink SQL task
-
In the top menu bar of the Dataphin home page, select Development > Data Development.
-
In the top menu bar, select Project. If in Dev-Prod mode, also select Environment.
-
In the navigation pane on the left, select Data Processing > Compute Task. In the list of compute tasks on the right, click the icon and select Flink SQL.
-
In the Create Flink SQL Task dialog box, configure the parameters.
Parameter
Description
Task Name
The naming conventions are as follows:
-
Only lowercase English letters, numbers, and underscores (_) are allowed.
-
The name must be 4 to 63 characters in length.
-
Duplicate names are not allowed within the same project.
-
The name must start with an English letter.
Production Environment Cluster
Select the cluster for the Flink SQL task.
Production Engine Version
Select the engine version for running tasks in production.
Note: If your project is in Basic mode, this parameter is named Engine Version.
Development Environment Cluster And Engine Version
You can select System Default Configuration or Custom Configuration.
-
System Default Configuration: The default option. Uses the same environment cluster and engine version as production.
-
Custom Configuration: Manually select the environment cluster and engine version for running tasks in development.
Note: If your project is in Basic mode, this parameter does not apply.
Storage Directory
Select the directory for the task.
If no directory exists, create a folder:
-
Above the compute task list on the left side of the page, click the icon to open the Create Folder dialog box.
-
In the Create Folder dialog box, enter the folder Name and select the Directory location as needed.
-
Click Confirm.
Creation Method
The following methods are supported: Create Blank, Reference Sample Code, and Use Template.
-
Create Blank: Create a normal, blank Flink SQL task.
-
Reference Sample Code: Quickly create a task by referencing built-in sample code.
-
Use Template: Quickly create a task based on a real-time computing task template.
Description
Enter a description of the Flink SQL task, up to 1,000 characters.
-
Click OK.
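The Task Name rules in the table above can be checked locally before you open the dialog box. The following is a minimal sketch, not part of Dataphin: the function name check_task_name and the existing_names argument are hypothetical, and Dataphin itself performs the authoritative validation, including the uniqueness check within the project.

```python
import re


def check_task_name(name, existing_names=()):
    """Return a list of rule violations; an empty list means the name passes.

    `existing_names` is a hypothetical, caller-supplied collection of task
    names already used in the project. Dataphin does the real uniqueness check.
    """
    problems = []
    # Rule: the name must be 4 to 63 characters in length.
    if not 4 <= len(name) <= 63:
        problems.append("length must be 4 to 63 characters")
    # Rule: the name must start with an English letter (lowercase, because
    # only lowercase letters are allowed anywhere in the name).
    if not re.match(r"[a-z]", name):
        problems.append("must start with a lowercase English letter")
    # Rule: only lowercase English letters, numbers, and underscores.
    if re.search(r"[^a-z0-9_]", name):
        problems.append("only lowercase letters, numbers, and underscores are allowed")
    # Rule: duplicate names are not allowed within the same project.
    if name in existing_names:
        problems.append("name already exists in the project")
    return problems


if __name__ == "__main__":
    for candidate in ("flink_orders_01", "abc", "My_Task"):
        print(candidate, check_task_name(candidate))
```

Returning every violation at once, rather than stopping at the first, makes it easier to correct a name in a single pass.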
Step 2: Develop and precompile the Flink SQL node code
-
On the Flink SQL node code page, write the node code.
After writing the code, click Format in the menu bar to auto-format the SQL code.
-
Click Precompile to check for syntax and permission issues.
If precompilation is successful, a Precompilation Successful message appears. If it fails, a Precompilation Failed message appears. Click Console at the bottom of the page to view the failure log.
Step 3: Configure the Flink SQL task
-
Click Configuration in the right sidebar of the current compute task.
-
On the configuration panel, configure the settings for the Flink SQL node in Real-time mode and Offline mode.
Note: Dataphin real-time computing supports stream-batch integrated tasks that use a unified compute engine. You can configure Stream + Batch on a single piece of code to generate instances in different modes. To enable batch processing, activate offline mode on the task configuration page and configure resources, schedule dependencies, and other settings.
-
Real-time mode
-
Resource configuration (required): Configure the cluster, engine version, Job Manager CPUs, and Job Manager Memory for the production and development environments. For configuration instructions, see Configure resources for Ververica Flink real-time mode.
-
Variable Configuration: Variables for this node can be defined directly in the code without prior declaration. The system automatically extracts them into the parameter list, where you can adjust their types and values. For configuration instructions, see Real-time Mode Variable Configuration.
-
Checkpoint Configuration: Configuring checkpoints for a Flink SQL task enables recovery to the pre-failure state if an unexpected failure occurs. For configuration instructions, see Real-time mode Checkpoint configuration.
-
State Configuration: Set the interval for automatic data cleanup in the State. For configuration instructions, see Real-time Mode State Configuration.
-
Run parameters: Control the execution behavior and performance of Flink applications by configuring run parameters. For configuration instructions, see Real-time mode run parameter configuration.
-
Dependency files: Configure the resource files required by the task. For configuration instructions, see Real-time mode dependency file configuration.
-
Dependency relationships: Configuring dependency relationships helps you quickly identify upstream and downstream tasks during troubleshooting. For configuration instructions, see Real-time mode dependency relationship configuration.
-
Offline mode (Beta)
-
Schedule Configuration (Required): Schedule configuration defines the recurring schedule pattern of a node in the production environment. Use schedule properties to set the scheduling cycle and effective date. For configuration instructions, see Offline Mode Schedule Configuration.
-
Resource Configuration (Required): Configure the cluster, engine version, degree of parallelism, number of Task Managers, Job Manager Memory, and Task Manager Memory for the production and development environments. For configuration instructions, see Configure resources for Ververica Flink offline mode.
-
Runtime parameters: Control the execution behavior and performance of Flink applications by configuring runtime parameters. For configuration instructions, see Offline mode runtime parameter configuration.
-
Dependency files: Configure the resource files required by the Flink SQL task. For configuration instructions, see Offline mode dependency file configuration.
-
Dependency Relationships (Required): Configuring dependency relationships helps you quickly identify upstream and downstream tasks during troubleshooting. For more information, see Offline Mode Dependency Relationship Configuration.
-
Click OK.
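The Variable Configuration item above says that variables used in the code are extracted into the parameter list automatically. The following sketch illustrates the idea only. It assumes a ${name} placeholder syntax, which is not stated in this topic, and the function name extract_variables is hypothetical. Dataphin performs the real extraction, so confirm the actual syntax in Real-time Mode Variable Configuration.

```python
import re

# Assumption: variables are written as ${variable_name} in the SQL text.
# This syntax is illustrative and is not confirmed by this topic.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def extract_variables(sql_text):
    """Return placeholder names in order of first appearance, without duplicates."""
    names = []
    for name in _PLACEHOLDER.findall(sql_text):
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    sample_sql = (
        "SELECT order_id FROM orders "
        "WHERE region = '${region}' AND ds = '${bizdate}'"
    )
    print(extract_variables(sample_sql))
```

Listing the placeholders before you open the configuration panel gives you a checklist of the variables that should then appear in the parameter list.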
Step 4: Test the Flink SQL node code
-
Test your Flink SQL code in Dataphin. Click Test in the top menu bar to sample data from the code node and run local tests to verify correctness.
-
In the test configuration dialog box, select Real-time Pattern - FLINK Stream Node to test in real-time mode or Offline Pattern - FLINK Batch Node to test in offline mode.
-
Real-time Pattern Testing: Samples the corresponding real-time physical data and runs a local test in Flink Stream mode. For more information, see real-time pattern testing.
-
Offline Pattern Testing: Reads data from the corresponding offline physical table and runs a local test in Flink Batch mode. For more information, see documentation.
-
Currently, you can test only one mode at a time. After you select a mode, you can sample data from the tables that belong to that mode.
Step 5: Submit the Flink SQL task
-
Click Submit in the top menu bar.
-
In the Submit dialog box, review the Submission Content and Pre-check information, and fill in the Submission Remarks.
-
Click Confirm And Submit.
Note: If your project is in Dev-Prod mode, you must also publish the Flink SQL node to the production environment. For detailed instructions, see here.
What to do next
In the Operation Center, view and manage Flink SQL nodes to ensure they run as expected. For more information, see View and manage real-time instances or View and manage real-time nodes.