Flink SQL job

更新时间:
复制 MD 格式

This topic guides you through creating, deploying, and starting a Flink SQL job, outlining the basic development and operational workflow.

Prerequisites

Step 1: Create an SQL draft

  1. Go to the SQL draft creation page.

    1. Log on to the Realtime Compute console.

    2. Find the target Flink workspace and click Console in the Actions column.

    3. In the navigation pane, click Development > ETL.

  2. Click the image icon, and then click New Blank Stream Draft. Enter a File Name and select an Engine Version.

    Realtime Compute for Apache Flink provides a variety of code templates and data synchronization templates. Each template includes use case descriptions, code examples, and instructions. You can click a template to quickly learn about product features and syntax to implement your business logic. For more information, see Code templates and Data synchronization templates.

    Parameter

    Description

    Example

    File Name

    The name of the SQL draft.

    Note

    The name must be unique within the current project.

    flink-test

    Engine Version

    The Flink engine version for the SQL draft.

    We recommend using versions with the Recommend or Stable tag for better reliability and performance. For more information about engine versions, see Release notes and Engine versions.

    vvr-8.0.8-flink-1.17

  3. Click Create.

Step 2: Write SQL and view draft configurations

  1. Write SQL code.

    Copy the following SQL into the editor. This example uses a Datagen connector to generate a random data stream and a Print connector to write the output to the console logs. For more information about supported connectors, see Supported connectors.

    -- Create a temporary source table named datagen_source.
    CREATE TEMPORARY TABLE datagen_source(
      randstr VARCHAR
    ) WITH (
      'connector' = 'datagen' -- Use the Datagen connector.
    );
    
    -- Create a temporary sink table named print_table.
    CREATE TEMPORARY TABLE print_table(
      randstr  VARCHAR
    ) WITH (
      'connector' = 'print',   -- Use the Print connector.
      'logger' = 'true'        -- Write the output to logs.
    );
    
    -- Select a substring from the randstr field and insert it into the sink table.
    INSERT INTO print_table
    SELECT SUBSTRING(randstr,0,8) from datagen_source;
    Note
    • This example uses an INSERT INTO statement to write data to a single sink table. You can also use the INSERT INTO statement to write data to multiple sink tables. For more information, see INSERT INTO statement.

    • In production, use tables registered in Catalogs instead of temporary tables. For more information, see Catalogs.

  2. View draft configurations.

    On the right of the SQL editor, you can view or configure settings on several tabs.

    Tab

    Description

    More configurations

    • Engine version: For more information, see Engine versions and Lifecycle policies. We recommend that you use a recommended or stable version. The version tags are as follows:

      • Recommend: The latest minor version of the current major version.

      • Stable: The latest minor version of a major version within its service period, with known defects fixed.

      • Normal: Other minor versions that are still within their service periods.

      • Deprecated: Versions that have passed their end-of-life (EOL) date.

    • Additional dependencies: Additional dependencies required for the job, such as temporary functions.

    • Kerberos Authentication: Enable Kerberos authentication and configure a registered Kerberos cluster and principal information. If you have not registered a Kerberos cluster, see Register a Hive Kerberos cluster.

    Code structure

    • Data flow: Use the data flow diagram to quickly view the data lineage.

    • Tree structure: Use the tree structure diagram to quickly view the data sources.

    Version information

    You can view the version history of the SQL draft here. For more information about the functions in the Actions column, see Manage draft versions.

(Optional) Step 3: Validate and debug the SQL draft

  1. Validate the SQL draft.

    Validation checks the SQL semantics, network connectivity, and metadata of the tables used in your draft. After validation, you can click SQL Optimization in the results area to view potential SQL risks and optimization suggestions.

    1. In the upper-right corner of the SQL editor, click Deep Check.

    2. In the Deep Check dialog box, click Confirm.

    Note

    If a timeout error occurs, you might see the following message: The RPC times out maybe because the SQL parsing is too complicated. Please consider enlarging the flink.sqlserver.rpc.execution.timeout option in flink-configuration, which by default is 120 s.

    Solution: Add the following configuration parameter at the top of your SQL editor.

    SET 'flink.sqlserver.rpc.execution.timeout' = '600s';
  2. Debug the SQL draft.

    The debug feature lets you simulate a job run to check its output and verify your SELECT or INSERT logic. This feature improves development efficiency and reduces data quality risks.

    Note

    The debug feature does not write data to the sink table.

    1. In the upper-right corner of the SQL editor, click Debug.

    2. In the Debug dialog box, select a debug cluster and click Next.

      If no session cluster is available, you must create one. The session cluster must use the same engine version as the SQL draft and be running. For more information, see Step 1: Create a session cluster.

    3. Configure the debug data and click OK.

      For more information about the configuration, see Step 2: Debug a job.

Step 4: Deploy the SQL draft

In the upper-right corner of the SQL editor, click Deploy. In the Deploy New Version dialog box, configure the parameters as needed and click OK.

When deploying the draft, you can select a queue or a session cluster as the Deployment target. The following table compares these two options.

Deployment target

Environment

Key features

Queue

Production

  • Exclusive resources: Resources are dedicated to the job and are not preempted, which ensures stability.

  • Resource isolation: You can add resource queues to isolate and manage resources.

  • Use cases: Suitable for long-running or high-priority jobs.

Session cluster

Development and testing

  • Shared resources: Multiple jobs share a JobManager (JM), improving resource utilization.

  • Fast startup: Jobs start quickly by reusing initialized resources.

  • Use cases: Suitable for development, testing, and lightweight jobs. You must plan resource quotas carefully to prevent the resource sharing mechanism from affecting job stability.

Important

Logs are not available for jobs that run on a session cluster.

Step 5: Start job and view results

  1. In the navigation pane, click O&M > Deployments.

  2. Find the target job and click Start in the Actions column.

    Select stateless start, and then click Start. The job is running when its status changes to Running. For more information about start parameters, see Start a job.

  3. On the Deployment details page, view the Flink job results.

    1. On the O&M > Deployments page, click the name of the target job.

    2. On the Logs tab, click the Running TaskManagers subtab. In the Path, ID column, click a TaskManager.

    3. Click the Logs tab and search for logs related to PrintSinkOutputWriter.

      If you find log entries containing the chain Source: datagen_source → Calc → Sink: print_table, the data stream is being processed correctly.

(Optional) Step 6: Stop the job

To apply changes to a job (such as code modifications, WITH parameter updates, or version changes), you must redeploy, stop, and then restart it. A restart is also required for a stateless start or to apply non-dynamic configuration changes. For more information about stopping a job, see Stop a job.

  1. On the O&M > Deployments page, find the target job and click Cancel in the Actions column.

  2. Click OK.

Related documents