Configure the GaussDB (DWS) Output Component

The GaussDB (DWS) output component writes data to a GaussDB (DWS) data source. When syncing data from another data source to GaussDB (DWS), configure the target data source in the GaussDB (DWS) output component after you complete the source data configuration. This topic describes how to configure the GaussDB (DWS) output component.

Prerequisites

  • You have created a GaussDB (DWS) data source. For more information, see Create a GaussDB (DWS) data source.

  • The account used to configure the GaussDB (DWS) output component must have write-through permission for the data source. If you lack this permission, request it. For more information, see Request data source permissions.

Procedure

  1. On the Dataphin homepage, in the top menu bar, choose R&D > Data Integration.

  2. On the integration page, in the top menu bar, select a project. In Dev-Prod mode, also select an environment.

  3. In the left navigation pane, click Offline Integration. In the Offline Integration list, click the target offline pipeline to open its configuration page.

  4. In the upper-right corner, click Component Library to open the Component Library panel.

  5. In the left navigation pane of the Component Library panel, select Output. In the output component list on the right, locate the GaussDB (DWS) component and drag it onto the canvas.

  6. Drag the connection icon on the upstream input, transform, or flow component to the GaussDB (DWS) output component to connect the two.

  7. On the GaussDB (DWS) output component card, click the configuration icon to open the GaussDB (DWS) output configuration dialog box.

  8. In the GaussDB (DWS) output configuration dialog box, configure the following parameters.

    Parameter

    Description

    Basic Information

    Step Name

    The name of the GaussDB (DWS) output component. Dataphin auto-generates a step name, but you can modify it based on your business scenario. Follow these naming rules:

    • Use only letters, digits, underscores (_), and Chinese characters.

    • Do not exceed 64 characters.
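
The naming rules above can be checked before you type a name into the dialog box. The following is a minimal sketch, not Dataphin's own validation logic. It assumes that "letters" means ASCII letters and that "Chinese characters" means the basic CJK Unified Ideographs block; the function name is_valid_step_name is hypothetical.

```python
import re

# Assumption: "letters" = ASCII letters, "Chinese characters" = U+4E00 to U+9FFF.
# The {1,64} quantifier enforces the 64-character limit and rejects empty names.
_STEP_NAME_PATTERN = re.compile(r"[A-Za-z0-9_\u4e00-\u9fff]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and is 1 to 64 long."""
    return _STEP_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["gaussdb_dws_output_1", "输出_订单表", "bad-name", "x" * 65]:
        print(repr(candidate[:20]), is_valid_step_name(candidate))
```

A hyphen or a name longer than 64 characters fails the check, which mirrors the two rules listed above.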

    Datasource

    Select the target data source. The drop-down list shows all GaussDB (DWS) data sources, including those with and without write-through permission. Click the copy icon to copy the current data source name.

    Schema (optional)

    Select a schema to choose a table across schemas. If you do not specify a schema, Dataphin uses the schema configured in the data source by default.

    Table

    Select the target table for output data. Enter a keyword to search for tables, or enter the exact table name and click Exact Search. After you select a table, Dataphin automatically checks its status. Click the copy icon to copy the selected table name.

    If the target table does not exist in the GaussDB (DWS) data source, use the one-click table creation feature to quickly generate it. Follow these steps:

    1. Click One-Click Table Creation. Dataphin auto-generates SQL code to create the target table, including the table name (defaulting to the source table name) and field types (preliminarily converted based on Dataphin fields).

    2. Modify the SQL script as needed, then click Create. After successful creation, Dataphin automatically sets the new table as the output target.

      Note

      If a table with the same name exists in the development environment, Dataphin returns an error when you click Create.

    Production Table Missing Policy

    Choose how to handle missing production tables: Do Nothing or Automatic Creation. The default is Automatic Creation. If you select Do Nothing, Dataphin skips table creation during task publishing. If you select Automatic Creation, Dataphin creates a table with the same name in the target environment during publishing.

    • Do Nothing: If the target table does not exist, Dataphin shows an error during submission but still allows publishing. You must manually create the table in the production environment before running the task.

    • Automatic Creation: Click Edit Table Creation Statement to adjust the auto-filled SQL. Use the placeholder ${table_name} for the table name; no other placeholder format is supported. Dataphin replaces it with the actual table name at runtime.

      If the target table does not exist, Dataphin runs the table creation statement first. If creation fails, publishing fails, and you must fix the SQL based on the error message before republishing. If the table already exists, Dataphin skips creation.

    Note

    This setting is available only for projects in Dev-Prod mode.
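
To illustrate how the ${table_name} placeholder behaves in a table creation statement, the following sketch performs the same kind of substitution locally. It is an illustration only and does not reproduce how Dataphin implements the replacement; the function name render_create_statement and the sample column definitions are hypothetical.

```python
PLACEHOLDER = "${table_name}"


def render_create_statement(template: str, table_name: str) -> str:
    """Replace the ${table_name} placeholder with the actual table name.

    Raises ValueError if the template does not contain the placeholder,
    because no other placeholder format is supported.
    """
    if PLACEHOLDER not in template:
        raise ValueError("template must contain ${table_name}")
    # Plain string replacement, so other '$' characters in the SQL are left alone.
    return template.replace(PLACEHOLDER, table_name)


if __name__ == "__main__":
    # Hypothetical statement; adjust column definitions to your own table.
    sql = "CREATE TABLE ${table_name} (id INT, name VARCHAR(64));"
    print(render_create_statement(sql, "orders"))
```

Checking for the placeholder up front makes a missing or misspelled placeholder visible before the statement is used.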

    Loading Policy

    Choose between insert and copy strategies.

    • Insert strategy: Uses the GaussDB (DWS) insert into...values... statement to write data. If a primary key or unique index conflict occurs, the conflicting row becomes dirty data and fails to write. Use this strategy by default.

    • Copy strategy: Uses the GaussDB (DWS) copy from command to load data from standard input into a table. On conflict, it follows a conflict resolution policy. Use this strategy only if you encounter performance issues. You must also configure the Conflict Resolution Policy, which includes Error on Conflict and Overwrite on Conflict.

      Important

      The conflict resolution policy applies only in Copy mode and only when the AnalyticDB for PostgreSQL kernel version is greater than 4.3. If the kernel version is less than 4.3 or unknown, use this policy with caution to avoid task failure.

    Batch Write Data Volume (optional)

    The maximum data volume written in a single batch. You can also set Batch Write Record Count. Dataphin writes data as soon as either limit is reached. The default is 32 MB.

    Batch Write Record Count (optional)

    The default is 2,048 records. During data sync, Dataphin uses batch writing based on two parameters: Batch Write Record Count and Batch Write Data Volume.

    • When the accumulated data reaches either limit (volume or record count), Dataphin writes the batch immediately.

    • We recommend setting Batch Write Data Volume to 32 MB and adjusting Batch Write Record Count to your record size so that each batch is as full as possible. For example, if each record is about 1 KB and you set Batch Write Data Volume to 16 MB, set Batch Write Record Count to a value greater than 16,384 (16 MB ÷ 1 KB). A record count of 20,000 ensures that the 16 MB volume limit is what triggers each write.
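
The interaction between the two limits can be worked through numerically. The sketch below assumes fixed-size records and binary units (1 MB = 1,024 KB), matching the 16 MB ÷ 1 KB = 16,384 arithmetic above. The function name first_limit_reached is hypothetical, and real record sizes vary.

```python
KB = 1024
MB = 1024 * KB


def first_limit_reached(record_bytes: int, volume_limit_bytes: int, record_limit: int):
    """Return which limit triggers the write and how many records the batch holds.

    Assumes every record has the same size. If both limits are reached at the
    same record, the volume limit is reported.
    """
    # Ceiling division: records needed before the accumulated size reaches the volume limit.
    records_for_volume = -(-volume_limit_bytes // record_bytes)
    if records_for_volume <= record_limit:
        return "volume", records_for_volume
    return "count", record_limit


if __name__ == "__main__":
    # 1 KB records, 16 MB volume limit, 20,000 record limit: volume triggers first.
    print(first_limit_reached(1 * KB, 16 * MB, 20_000))
    # 1 KB records with the defaults (32 MB, 2,048 records): count triggers first.
    print(first_limit_reached(1 * KB, 32 * MB, 2_048))
```

With the defaults and 1 KB records, each batch holds only about 2 MB, which is why raising the record count is suggested when records are small.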

    Preparation Statement (optional)

    An SQL script executed before data import.

    For example, to maintain service availability, you might create a temporary table Target_A before writing data, write to Target_A, rename the live table Service_B to Temp_C, rename Target_A to Service_B, and finally delete Temp_C.

    Completion Statement (optional)

    An SQL script executed after data import.
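
The table swap described for the preparation statement can be split between the two settings. The sketch below only builds the SQL text. It assumes PostgreSQL-style syntax (CREATE TABLE ... (LIKE ...), ALTER TABLE ... RENAME TO, DROP TABLE), which you should confirm against your GaussDB (DWS) version. The function name build_swap_statements is hypothetical, and the output component is assumed to write to the staging table.

```python
def build_swap_statements(staging: str = "target_a",
                          live: str = "service_b",
                          backup: str = "temp_c"):
    """Build preparation and completion statements for a staging-table swap.

    Assumption: PostgreSQL-style DDL is accepted by the target database.
    """
    # Runs before data import: create an empty staging table shaped like the live table.
    preparation = [f"CREATE TABLE {staging} (LIKE {live});"]
    # Runs after data import: move the live table aside, promote staging, drop the old copy.
    completion = [
        f"ALTER TABLE {live} RENAME TO {backup};",
        f"ALTER TABLE {staging} RENAME TO {live};",
        f"DROP TABLE {backup};",
    ]
    return preparation, completion


if __name__ == "__main__":
    prep, done = build_swap_statements()
    print("\n".join(prep))
    print("\n".join(done))
```

Keeping the rename and drop steps in the completion statement means the live table is replaced only after the import has finished.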

    Field Mapping

    Input Fields

    Shows input fields based on the upstream component's output.

    Output Fields

    Shows output fields. Click Field Management to select output fields.

    • Click the icon next to a selected input field to move it to the unselected input fields list.

    • Click the icon next to an unselected input field to move it to the selected input fields list.

    Mapping

    Manually map fields based on upstream input and target table fields. Mapping includes Row-Based Mapping and Name-Based Mapping.

    • Name-Based Mapping: Maps fields with identical names.

    • Row-Based Mapping: Maps fields by row position, which is useful when source and target field names differ. Only fields in the same row are mapped to each other.
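
The difference between the two mapping modes can be sketched as follows. This is an illustration under stated assumptions, not Dataphin's implementation: name matching is assumed to be case-sensitive, and the function names map_by_name and map_by_row are hypothetical.

```python
def map_by_name(input_fields, target_fields):
    """Name-based mapping: pair each input field with the target field of the same name."""
    target_set = set(target_fields)
    # Assumption: names must match exactly, including case.
    return [(field, field) for field in input_fields if field in target_set]


def map_by_row(input_fields, target_fields):
    """Row-based mapping: pair fields that sit in the same row, regardless of name."""
    # zip stops at the shorter list, so fields without a counterpart row stay unmapped.
    return list(zip(input_fields, target_fields))


if __name__ == "__main__":
    inputs = ["id", "name", "ts"]
    targets = ["id", "full_name", "ts"]
    print(map_by_name(inputs, targets))  # "name" has no same-named target field
    print(map_by_row(inputs, targets))   # "name" is paired with "full_name" by position
```

In this example, name-based mapping leaves the name field unmapped, while row-based mapping pairs it with full_name because both sit in the second row.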

  9. Click Confirm to complete the configuration of the GaussDB (DWS) output component.