Create an integration task by using a single pipeline

A single offline pipeline is a script that processes offline integration tasks. You can use this pipeline to sync one or more tables from one or more source data sources to one or more target data sources. This topic describes how to configure an integration task using a single offline pipeline.

Prerequisites

Before you configure the integration task, configure the source and target data sources that it will use. The development components in the pipeline read from and write to databases through the data sources you select for them, so these data sources must exist first. For more information about the data sources that offline pipelines support, see Data sources supported by Data Integration.

Procedure

Step 1: Create a single offline pipeline

  1. On the Dataphin home page, select Development > Data Integration from the top menu bar.

  2. On the Data Integration page, select a Project from the top menu bar. If you are in Dev-Prod mode, you must also select an Environment.

  3. In the Batch Integration list, click the icon and select Batch Pipeline. The Create Offline Pipeline dialog box appears.

  4. In the Create Offline Pipeline dialog box, configure the following parameters:

    • Pipeline Name: Enter a name for the pipeline. The name can be up to 64 characters long and cannot contain the following special characters: vertical bar (|), forward slash (/), backslash (\), colon (:), question mark (?), angle brackets (<>), asterisk (*), or single quotation mark (').

    • Schedule Type: Select a schedule type for the pipeline. Valid values:

      • Recurring Task Node: A task that runs on a regular schedule.

      • Manual Node: A task that has no dependencies and must be triggered manually.

    • Description: Enter a brief description of the single offline pipeline. The description can be up to 1,000 characters long.

    • Select Directory: Select the folder in which to store the task. If no folder exists yet, create one:

      1. Above the compute task list on the left, click the icon. The New Folder dialog box opens.

      2. In the New Folder dialog box, enter a Name for the folder and select a location under Directory.

      3. Click OK.

  5. Click OK.
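The naming and length rules in the Create Offline Pipeline dialog can be checked before you type them in, for example when you generate many pipeline names from a naming convention. The following is a minimal sketch that applies only the limits stated above (64-character name, forbidden characters, 1,000-character description). `PipelineSettings`, `ScheduleType`, and `validate_settings` are hypothetical local names for illustration, not part of any Dataphin API, and Dataphin may enforce additional rules that this sketch does not model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

# Limits taken from the parameter descriptions in Step 1.
MAX_NAME_LENGTH = 64
MAX_DESCRIPTION_LENGTH = 1000
FORBIDDEN_NAME_CHARS = set("|/\\:?<>*'")


class ScheduleType(Enum):
    RECURRING = "Recurring Task Node"  # runs on a regular basis
    MANUAL = "Manual Node"             # no dependencies; triggered manually


@dataclass
class PipelineSettings:
    # Hypothetical local model of the dialog fields, not a Dataphin API object.
    name: str
    schedule_type: ScheduleType
    description: str = ""
    directory: str = ""


def validate_settings(settings: PipelineSettings) -> List[str]:
    """Return the rule violations found; an empty list means the settings pass."""
    errors: List[str] = []
    if len(settings.name) > MAX_NAME_LENGTH:
        errors.append(f"Pipeline Name exceeds {MAX_NAME_LENGTH} characters.")
    bad_chars = sorted(FORBIDDEN_NAME_CHARS.intersection(settings.name))
    if bad_chars:
        errors.append("Pipeline Name contains unsupported characters: " + " ".join(bad_chars))
    if len(settings.description) > MAX_DESCRIPTION_LENGTH:
        errors.append(f"Description exceeds {MAX_DESCRIPTION_LENGTH} characters.")
    return errors


print(validate_settings(PipelineSettings("ods_orders_sync", ScheduleType.RECURRING)))  # []
print(validate_settings(PipelineSettings("orders/daily*", ScheduleType.MANUAL)))
# ['Pipeline Name contains unsupported characters: * /']
```

Returning a list of messages instead of raising on the first problem lets you report every violation for a name at once.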

Step 2: Develop the offline pipeline script

A single offline pipeline is developed with visual components. To build the pipeline script, find a component in the Component Library in the sidebar of the canvas and drag it onto the canvas.

  • Component types: The component library contains five types of components: Input, Transform, Flow, Output, and Custom. Each type is designed for a different scenario. Choose the types that match your requirements. For more information, see Development of the integration component library.

  • Components: Components are the functional modules from which you build pipeline scripts. Each component provides a different feature. Choose the components that match your requirements. For more information, see Component configuration.
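Conceptually, a pipeline built on the canvas moves rows from Input components, through Transform components, to Output components, which is how one pipeline can sync one or more source tables to one or more target tables. The following is a minimal in-memory sketch of that data flow under this simplifying assumption. `run_pipeline` and `Row` are hypothetical names; the sketch does not model Flow or Custom components and does not describe how Dataphin actually executes a pipeline.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def run_pipeline(
    inputs: List[Iterable[Row]],             # Input components, one per source table
    transforms: List[Callable[[Row], Row]],  # Transform components, applied in canvas order
    outputs: List[List[Row]],                # Output components, one per target table
) -> None:
    """Read every input row, apply each transform in order, and write the result to every output."""
    for source in inputs:
        for row in source:
            for transform in transforms:
                row = transform(row)
            for target in outputs:
                target.append(dict(row))  # each target receives its own copy of the row


# Usage: two source tables feed two targets; one transform casts "amount" to float.
orders_a = [{"id": 1, "amount": "10.5"}]
orders_b = [{"id": 2, "amount": "3"}]
warehouse: List[Row] = []
backup: List[Row] = []
run_pipeline(
    [orders_a, orders_b],
    [lambda r: {**r, "amount": float(r["amount"])}],
    [warehouse, backup],
)
print(warehouse)  # [{'id': 1, 'amount': 10.5}, {'id': 2, 'amount': 3.0}]
```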

Step 3: Configure pipeline scheduling

  1. In the menu bar of the canvas, click the Scheduling Configuration button to configure the schedule.

  2. On the scheduling configuration page, configure the Basic information, Scheduling configuration, Scheduling dependencies, Scheduling parameters, Runtime configuration, and Resource configuration for the integration pipeline. The following list describes these configurations:

    • Basic information: You can configure the developer, owner, and description for the integration pipeline task. For more information, see Configure basic information for an offline integration pipeline.

    • Scheduling configuration: For a recurring task node, this configuration defines how the integration pipeline task is scheduled in the production environment. You can use the scheduling properties to configure the scheduling type, scheduling cycle, scheduling logic, and execution mode. For more information, see Scheduling configuration for an offline integration pipeline.

    • Scheduling dependencies: For a recurring task node, this configuration defines the nodes on which the integration pipeline task depends. Dataphin runs the nodes in a business flow in the order defined by the configured scheduling dependencies, which ensures that business data is produced correctly and on time. For more information, see Scheduling dependency configuration for an offline pipeline.

    • Runtime configuration: Based on your business scenario, you can configure a task-level runtime timeout period and a retry policy for failed tasks. The timeout keeps long-running tasks from wasting resources, and the retry policy makes task execution more reliable. For more information, see Runtime configuration for an offline pipeline task.

    • Resource configuration: You can configure the resource group for the offline integration task. The task uses the resources from this resource group for scheduling. For more information, see Configure resources for an offline integration pipeline task.
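The effect of scheduling dependencies and a retry policy can be illustrated with a small local simulation: each node runs only after all of its upstream nodes have finished, and a failed node is retried a limited number of times. The following is a minimal sketch that uses a topological sort from Python's standard `graphlib` module (Python 3.9+). `run_in_dependency_order` and `max_retries` are hypothetical names, the sketch does not model the runtime timeout, and it is not Dataphin's scheduler.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_in_dependency_order(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Set[str]],
    max_retries: int = 2,
) -> List[str]:
    """Run every task after its upstream tasks, retrying each failure up to max_retries times."""
    # Every task is a graph node, even if nothing depends on it.
    graph = {name: upstream.get(name, set()) for name in tasks}
    # static_order() yields upstream nodes first and raises graphlib.CycleError on a cycle.
    finished: List[str] = []
    for name in TopologicalSorter(graph).static_order():
        attempts = 0
        while True:
            try:
                tasks[name]()
                break
            except Exception:
                attempts += 1
                if attempts > max_retries:
                    raise  # retries exhausted: surface the last error
        finished.append(name)
    return finished


# Usage: "report" depends on two upstream tasks; "sync_orders" fails once, then succeeds.
log: List[str] = []
calls = {"sync_orders": 0}


def sync_orders() -> None:
    calls["sync_orders"] += 1
    if calls["sync_orders"] == 1:
        raise RuntimeError("transient failure")
    log.append("sync_orders")


tasks = {
    "sync_orders": sync_orders,
    "load_dim": lambda: log.append("load_dim"),
    "report": lambda: log.append("report"),
}
run_in_dependency_order(tasks, {"report": {"sync_orders", "load_dim"}})
print(log)  # "report" is always last; the order of the two upstream tasks may vary
```

Bounding retries matters: without a limit, a task that fails permanently would block every downstream node indefinitely.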

Step 4: Save and submit the offline integration task

  1. At the top of the canvas, click the Save icon to save the pipeline task.

  2. At the top of the canvas, click the Submit icon. In the Commit Remarks dialog box, enter your remarks and click OK And Submit.

    When you submit the task, Dataphin parses its data lineage and runs submission checks. For more information, see Instructions on how to submit an integration task.

What to do next

  • If your development mode is Dev-Prod, you must publish the task. For more information, see Manage publish tasks.

  • If your development mode is Basic, the task is scheduled in the production environment after you submit it. You can go to the Operation Center to view your published tasks. For more information, see Operation Center.
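The two development modes differ only in whether a publish step follows submission. The following minimal sketch encodes that rule as a lookup so the difference is explicit. `DevelopmentMode` and `steps_until_scheduled` are hypothetical names for illustration and do not correspond to any Dataphin API.

```python
from enum import Enum
from typing import List


class DevelopmentMode(Enum):
    BASIC = "Basic"
    DEV_PROD = "Dev-Prod"


def steps_until_scheduled(mode: DevelopmentMode) -> List[str]:
    """List the actions needed before the task is scheduled in the production environment."""
    steps = ["Save", "Submit"]
    if mode is DevelopmentMode.DEV_PROD:
        # Dev-Prod: a submitted task must still be published.
        steps.append("Publish")
    # Basic: the task is scheduled in production as soon as it is submitted.
    return steps


print(steps_until_scheduled(DevelopmentMode.BASIC))     # ['Save', 'Submit']
print(steps_until_scheduled(DevelopmentMode.DEV_PROD))  # ['Save', 'Submit', 'Publish']
```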