Configure Amazon Redshift Input Component

更新时间: 2026-03-26 04:24:28

The Amazon Redshift input component retrieves data from an Amazon Redshift data source. To synchronize data from Amazon Redshift to another data source, first configure the Amazon Redshift input component to read from the source data source. Then, configure the destination data source for data synchronization. This topic describes how to configure the Amazon Redshift input component.

Prerequisites

Procedure

  1. In the Dataphin homepage top menu bar, select Development > Data Integration.

  2. On the Integration page, in the top menu bar, select Project. In Dev-Prod mode, select an environment.

  3. In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the top-right corner of the page to open the Component Library panel.

  5. In the left navigation pane of the Component Library panel, select Input. In the list of input components on the right, locate the Amazon Redshift component and drag it to the canvas.

  6. Click the image icon in the Amazon Redshift input component card to open the Amazon Redshift Input Configuration dialog box.

  7. In the Amazon Redshift Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the Amazon Redshift input component. Dataphin automatically generates the step name. Modify it as needed. Naming rules are as follows:

    • Can contain only Chinese characters, letters, underscores (_), and numbers.

    • Length cannot exceed 64 characters.

    Datasource

    The data source drop-down list displays all Amazon Redshift data sources, including those for which you have read-through permissions and those for which you do not. Click the image icon to copy the current data source name.

    Time Zone

    The system processes time-formatted data based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be modified.

    Note

    For tasks created before V5.1.2, you can select Data Source Default Configuration or Channel Configuration Time Zone. The default option is Channel Configuration Time Zone.

    • Data Source Default Configuration: The default time zone of the selected data source.

    • Channel Configuration Time Zone: The time zone configured for the current integration task under Properties > Channel Configuration.

    Schema (Optional)

    Supports selecting tables across schemas. Select the schema where the table resides. If not specified, the default is the schema configured in the data source.

    Source Table Quantity

    Select the number of source tables. The options are Single Table and Multiple Tables:

    • Single Table: Use this for scenarios where you sync business data from one table to one destination table.

    • Multiple Tables: Use this for scenarios where you sync business data from multiple tables to the same destination table. When data from multiple tables is written to the same data table, the union algorithm is used.

      For more information about UNION, see INTERSECT, UNION, and EXCEPT.

    Table Matching Method

    You can select Generic Class Rule or Database Regular Expression.

    Note

    This option is configurable only when Multiple Tables is selected for source table quantity.

    Table

    Select the source table:

    • If you select Single Table for Number of Source Tables, you can search for a table by entering a keyword, or enter the exact table name and click Exact Search. After you select a table, the system automatically checks its status. Click the image icon to copy the name of the currently selected table.

    • If Multiple Tables is selected for source table quantity, enter different expressions to add tables based on the table matching method.

      • If General Rules is selected for table matching method: In the input box, enter a table expression to filter for tables with the same structure. The system supports enumeration, regular expression-like forms, and mixed forms. For example, table_[001-100];table_102;.

      • If Database Regular Expression is selected for table matching method: In the input box, enter the regular expression supported by the current database. The system matches tables in the destination database based on this regular expression. During task runtime, the system instantly matches new table ranges for synchronization based on the database regular expression.

      After you enter the expression, click Exact Match to view the list of matched tables in the Confirm Match Details dialog box.

    Split Key (Optional)

    The system partitions data based on the configured split key field. Use this with concurrency configuration to achieve concurrent reads. You can use a column from the source data table as the split key. Additionally, use a primary key or an indexed column as the split key to ensure transfer performance.

    Important

    When you select a date/time type, the system performs a brute-force split based on the total time range and concurrency by identifying the maximum and minimum values. This does not guarantee an even split.

    Batch Read Count (Optional)

    The number of data records read at one time. When reading data from the source database, configure a specific batch read count (such as 1024 records) instead of reading records one by one. This reduces interactions with the data source, improves I/O efficiency, and lowers network latency.

    Input Filter (Optional)

    Enter the filter information for the input field, for example, ds=${bizdate}. The Input Filter applies to the following two scenarios:

    • A fixed subset of data.

    • Parameter filtering.

    Output Fields

    The output fields area displays all fields from the selected table and those matched by the filter conditions. It supports the following operations:

    • Field Management: If you do not need to output certain fields to downstream components, delete the corresponding fields:

      • Delete single field: To delete a few fields, click the sgaga icon in the Actions column to remove redundant fields.

      • Batch field deletion scenario: To delete many fields, click Field Management, select multiple fields in the Field Management dialog box, click the image left-moving icon to move the selected input fields to the unselected input fields, and click OK to complete batch field deletion.

        image..png

    • Batch Add: Click Batch Add to perform batch configuration in JSON, TEXT, or DDL format.

      Note

      After batch addition is complete, clicking OK will overwrite the configured field information.

      • Batch configure in JSON format. For example:

        // Example:
          [{
             "index": 0,
             "name": "id",
             "type": "int(10)",
             "mapType": "Long",
             "comment": "comment1"
           },
           {
             "index": 1,
             "name": "user_name",
             "type": "varchar(255)",
             "mapType": "String",
             "comment": "comment2"
         }]
        Note

        Index indicates the column number of the specified object. Name indicates the field name after import. Type indicates the field type after import. For example, "index":3,"name":"user_id","type":"String" means to import the fourth column from the file, with the field name user_id and field type String.

      • Batch configure in TEXT format. For example:

        // Example:
        0,id,int(10),Long,comment1
        1,user_name,varchar(255),Long,comment2
        • The row delimiter separates information for each field. The default is a line feed (\n). It supports line feed (\n), semicolon (;), and period (.).

        • The column delimiter is used to separate the field name and field type. The default is a comma (,), and it supports','. The field type can be omitted, and the default is','.

      • Batch configure in DDL format. For example:

        CREATE TABLE tablename (
        	user_id serial,
        	username VARCHAR(50),
        	password VARCHAR(50),
        	email VARCHAR (255),
        	created_on TIMESTAMP,
        );
    • Add an output field: Click + Add Output Field. As prompted, specify the Column, Type, and Description, and select a Mapping Type. After you complete the configuration for the current row, click the image icon to save.

  8. Click Confirm to complete the property configuration of the Amazon Redshift input component.

上一篇: Configure the Snowflake input component 下一篇: Configure the Amazon RDS for PostgreSQL Input Component
阿里云首页 智能数据建设与治理 Dataphin 相关技术圈