Configure the Amazon Redshift output component

更新时间:
复制 MD 格式

The Amazon Redshift output component writes data to an Amazon Redshift data source. When you synchronize data to Amazon Redshift, you must configure this component as the target.

Prerequisites

Procedure

  1. In the top menu bar of the Dataphin home page, choose R&D > Data Integration.

  2. In the top menu bar of the integration page, select a project. If you use the Dev-Prod mode, you must also select an environment.

  3. In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.

  4. In the upper-right corner of the page, click Component Library to open the Component Library panel.

  5. In the navigation pane on the left of the Component Library panel, choose Output. Find the Amazon Redshift component in the output component list on the right and drag it to the canvas.

  6. You can click and drag the image icon of the target input, transform, or flow component to connect it to the current Amazon Redshift output component.

  7. Click the image icon on the Amazon Redshift output component card to open the Amazon Redshift Output Configuration dialog box.

    image

  8. In the Amazon Redshift Output Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Basic Settings

    Step Name

    The name of the Amazon Redshift output component. Dataphin automatically generates a name, which you can change. The name must meet the following requirements:

    • Can contain only Chinese characters, letters, underscores (_), and digits.

    • Cannot exceed 64 characters in length.

    Datasource

    The drop-down list displays all Amazon Redshift data sources, regardless of whether you have write-through permissions. Click the image icon to copy the data source name.

    Time Zone

    Time-formatted data is processed based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be changed.

    Note

    For nodes created before version 5.1.2, you can select Data source default configuration or Channel configuration time zone. The default is Channel configuration time zone.

    • Data source default configuration: The default time zone of the selected data source.

    • Channel configuration time zone: The time zone configured in Properties > Channel Configuration for the current integration node.

    Schema (Optional)

    You can select tables across schemas. Select the schema of the target table. If you do not specify a schema, the schema configured in the data source is used by default.

    Table

    Select the target table for data output. You can enter a keyword to search for the table name, or enter the exact name and click Exact Search. After you select a table, the system automatically checks the table status. Click the image icon to copy the selected table name.

    If no target table exists in the Amazon Redshift data source, you can use the one-click DDL feature to quickly generate one:

    1. Click One-click DDL. Dataphin automatically generates the DDL statement to create the target table, including the table name (the source table name by default) and field types converted based on Dataphin fields.

    2. Modify the SQL script as needed, and then click Create. After the table is created, Dataphin automatically uses it as the target table.

      Note
      • If a table with the same name already exists in the development environment, an error is reported when you click Create.

      • If there are no matching items, you can still perform integration based on a manually entered table name.

    Policy for missing production table

    The policy when a production table does not exist. You can select Do not process or Automatic creation. The default is Automatic creation. If you select Do not process, the production table is not created when the node is published. If you select Automatic creation, a table with the same name is created in the target environment when the node is published.

    • Do not process: If the target table does not exist, a message is displayed when you submit, but you can still publish the node. In this case, you must manually create the target table in the production environment before you can run the node.

    • Automatic creation: You must edit the DDL statement. The DDL statement for the selected table is populated by default and can be modified. Use the placeholder ${table_name} for the table name. This is the only supported placeholder and is replaced with the actual table name during execution.

      If the target table does not exist, it is created based on the DDL statement. If the creation fails, the publishing check fails. You can modify the DDL statement and try again. If the target table already exists, the DDL statement is not executed.

    Note

    This parameter is supported only in projects that use the Dev-Prod mode.

    Batch write data volume (Optional)

    The amount of data to write in a single batch. You can also set Batch write count. The system writes data when either limit is reached. The default is 32 MB.

    Batch write count (Optional)

    The default is 2048 records. Data is written using a batching strategy controlled by Batch write count and Batch write data volume.

    • When the accumulated data reaches either limit (data volume or record count), the system considers the batch full and writes it to the target.

    • For example, set the batch write data volume to 32 MB. You can adjust the batch write count based on the actual size of a single record. A large value is usually set to take full advantage of batch writing. For example, if a single record is about 1 KB, you can set the batch write data volume to 16 MB. In this case, set the batch write count to a value greater than 16 MB divided by 1 KB (16,384). For example, set it to 20,000. With this configuration, the system triggers a batch write when the accumulated data reaches 16 MB.

    Pre-SQL statement (Optional)

    The SQL script to run on the database before data import.

    For example, to ensure continuous service availability, you can create a target table Target_A before writing data. Write data to Target_A. After writing completes, rename the service table Service_B to Temp_C, rename Target_A to Service_B, and then delete Temp_C.

    Post-SQL statement (Optional)

    The SQL script to run on the database after data import.

    Field Mapping

    Input Fields

    Displays the input fields based on the output of the upstream component.

    Output Fields

    Displays the output fields. You can perform the following operations:

    • Field Management: Click Field Management to select output fields.

      image

      • Click the gaagag icon to move fields from Selected Input Fields to Unselected Input Fields.

      • Click the agfag icon to move fields from Unselected Input Fields to Selected Input Fields.

    • Batch Add: Click Batch Add to configure fields in batches using JSON, TEXT, or DDL format.

      • To configure in JSON format, for example:

        // Example:
        [{
          "name": "user_id",
          "type": "String"
         },
         {
          "name": "user_name",
          "type": "String"
         }]
        Note

        name indicates the field name, and type indicates the field type after import. For example, "name":"user_id","type":"String" imports the field user_id and sets its type to String.

      • To configure in TEXT format, for example:

        // Example:
        user_id,String
        user_name,String
        • The row delimiter separates the information for each field. The default is a line feed (\n). Semicolons (;) and periods (.) are also supported.

        • The column delimiter separates the field name from the field type. The default is a comma (,).

      • To configure in DDL format, for example:

        CREATE TABLE tablename (
            id INT PRIMARY KEY,
            name VARCHAR(50),
            age INT
        );
    • Create Output Field: Click + Create Output Field, enter a Column name, and select a Type. After you configure the current row, click the image icon to save.

    Mapping

    You can manually map fields based on the upstream input and target table fields. Quick Mapping includes Map by Row and Map by Name.

    • Map by Name: Maps fields with the same name.

    • Map by Row: Maps fields in the same row when the source and target field names differ but the data corresponds.

  9. Click Confirm to complete the configuration of the Amazon Redshift output component.