Configure the Amazon RDS for PostgreSQL Input Component

Last updated: 2026-06-17 11:23:25

The Amazon RDS for PostgreSQL input component reads data from an Amazon RDS for PostgreSQL data source. To synchronize data from Amazon RDS for PostgreSQL to another data source, configure the input component to specify the source, and then configure the destination data source.

Prerequisites

Procedure

  1. On the Dataphin homepage, in the top menu bar, click Develop, and then click Data Integration.

  2. On the Integration page, in the top menu bar, select a Project. If you are using Dev-Prod mode, also select an environment.

  3. In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop. The configuration page for that offline pipeline opens.

  4. In the upper-right corner of the page, click Component Library. The Component Library panel opens.

  5. In the left navigation pane of the Component Library panel, click Input. In the list of input components on the right, locate the Amazon RDS for PostgreSQL component, and drag it onto the canvas.

  6. Click the icon on the Amazon RDS for PostgreSQL input component card to open the Amazon RDS for PostgreSQL Input Configuration dialog box.

  7. In the Amazon RDS for PostgreSQL Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the input component. Dataphin generates a default step name that you can change. The name must follow these rules:

    • Use only Chinese characters, letters, underscores (_), and digits.

    • Keep the name no longer than 64 characters.

    Datasource

    Lists all Amazon RDS for PostgreSQL data sources, including those you have read-through permission for and those you do not. Click the copy icon to copy the data source name.

    Schema (optional)

    The schema where the table resides. Selecting a schema lets you access tables across schemas. If not specified, the schema configured in the data source is used.

    Source Table Count

    The number of source tables. Options are Single Table and Multiple Tables:

    • Single Table: Syncs data from one source table to one destination table.

    • Multiple Tables: Syncs data from multiple source tables to one destination table using the union algorithm.

      For more information about union, see .

    Table Matching Method

    Select either Generic Rules or Database Regex.

    Note

    This setting is available only when you select Multiple Tables for Source Table Count.

    Table

    Select the source table:

    • If you selected Single Table for Source Table Count, enter a keyword in the table name field to search, or enter the exact table name and click Exact Match. After you select a table, the system detects its status automatically. Click the copy icon to copy the name of the selected table.

    • If you selected Multiple Tables for Source Table Count, enter an expression based on the table matching method.

      • If you selected Generic Rules for table matching, enter an expression in the field to filter tables with the same structure. The system supports enumeration, regex-like syntax, or a mix of both. For example: table_[001-100];table_102;.

      • If you selected Database Regex for table matching, enter a regular expression supported by the database. The system matches tables in the source database using this regex. At runtime, the task matches new tables dynamically based on the regex.

      After you enter the expression, click Exact Match. In the Confirm Match Details dialog box, view the list of matched tables.
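To make the Generic Rules syntax more concrete, the following Python sketch expands an expression such as table_[001-100];table_102; into a list of table names. It is an illustration only, not Dataphin's matching code: the function name expand_table_expression is hypothetical, and the sketch assumes that a bracketed range is inclusive and keeps the zero padding of its lower bound.

```python
import re


def expand_table_expression(expression: str) -> list[str]:
    """Expand an expression such as 'table_[001-100];table_102;' into table names."""
    names = []
    for part in expression.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece left by a trailing semicolon
        match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if match is None:
            names.append(part)  # a plain, enumerated table name
            continue
        prefix, start, end, suffix = match.groups()
        width = len(start)  # assumption: keep the zero padding of the lower bound
        for number in range(int(start), int(end) + 1):
            names.append(f"{prefix}{number:0{width}d}{suffix}")
    return names


tables = expand_table_expression("table_[001-100];table_102;")
print(len(tables), tables[0], tables[99], tables[-1])
# prints: 101 table_001 table_100 table_102
```

The Confirm Match Details dialog box remains the authoritative view of which tables an expression actually matches.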

    Split Key (optional)

    The column used to partition data for concurrent reads. Pair this with concurrency settings to read data in parallel. Any column from the source table can serve as the split key. For best performance, use a primary key or an indexed column.

    Important

    If you select a date-time type, the system performs brute-force partitioning based on the full time range and concurrency setting. This does not guarantee even distribution.
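The idea behind a split key can be sketched as follows: take the minimum and maximum of the key, cut that span into one contiguous range per concurrent reader, and give each reader its own range filter. This is a minimal Python sketch under stated assumptions, not Dataphin's implementation. The helper names split_ranges and range_filters are hypothetical, and the key is assumed to be an integer column.

```python
def split_ranges(min_value: int, max_value: int, concurrency: int) -> list[tuple[int, int]]:
    """Divide [min_value, max_value] into at most `concurrency` contiguous closed ranges."""
    if min_value > max_value or concurrency < 1:
        raise ValueError("expected min_value <= max_value and concurrency >= 1")
    total = max_value - min_value + 1
    parts = min(concurrency, total)  # never create more ranges than key values
    base, extra = divmod(total, parts)
    ranges = []
    start = min_value
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def range_filters(column: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Build one WHERE-clause fragment per range."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]


ranges = split_ranges(1, 10, 3)
print(ranges)  # prints: [(1, 4), (5, 7), (8, 10)]
for clause in range_filters("id", ranges):
    print(clause)
```

The ranges are equal in key space, not in row count. If the rows cluster in one part of the span, as often happens with date-time values, some readers receive far more rows than others, which is why an evenly distributed primary key or indexed column tends to work better.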

    Batch Read Size (optional)

    The number of records to read per batch. Setting a batch size (for example, 1024) reduces round trips to the data source, improves I/O efficiency, and lowers network latency.
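The effect of a batch size can be illustrated with the standard Python database API, where cursor.fetchmany returns up to a given number of rows per call. The sketch below uses the built-in sqlite3 module as a stand-in so that it runs without a PostgreSQL server. The table demo and the helper read_in_batches are hypothetical, and how Dataphin fetches rows internally is not shown here.

```python
import sqlite3


def read_in_batches(cursor, batch_size):
    """Yield lists of rows, fetching at most batch_size rows per call."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# In-memory stand-in table with 2,500 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.executemany(
    "INSERT INTO demo (user_name) VALUES (?)",
    [(f"user_{i}",) for i in range(2500)],
)

cursor = conn.execute("SELECT id, user_name FROM demo ORDER BY id")
batch_sizes = [len(batch) for batch in read_in_batches(cursor, 1024)]
print(batch_sizes)  # prints: [1024, 1024, 452]
conn.close()
```

With a batch size of 1024, the 2,500 rows arrive in three fetches instead of 2,500 single-row fetches.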

    Input Filter (optional)

    Filter conditions for input fields. For example: ds=${bizdate}. Common scenarios for Input Filter:

    • A fixed subset of data.

    • Parameter-based filtering.
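As a rough illustration of parameter-based filtering, the sketch below substitutes a value for ${bizdate} and appends the result to a query as a WHERE clause. It assumes plain text substitution, which Python's string.Template happens to support with the same ${name} syntax. The table name orders, the sample date, and both helper functions are hypothetical, and Dataphin's own parameter handling may differ.

```python
from string import Template


def render_filter(filter_expression: str, params: dict[str, str]) -> str:
    """Replace ${name} placeholders in the filter with the supplied values."""
    return Template(filter_expression).substitute(params)


def build_query(table: str, filter_sql: str) -> str:
    """Attach the rendered filter to a simple SELECT as its WHERE clause."""
    return f"SELECT * FROM {table} WHERE {filter_sql}"


rendered = render_filter("ds=${bizdate}", {"bizdate": "20260616"})
query = build_query("orders", rendered)
print(rendered)  # prints: ds=20260616
print(query)     # prints: SELECT * FROM orders WHERE ds=20260616
```

A fixed-subset filter works the same way without placeholders, for example a constant condition on a status column.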

    Output Fields

    All fields from the selected table and filtered results. You can manage fields as follows:

    • Field Management: Remove fields you do not need downstream:

      • Delete One Field: Click the delete icon in the Actions column to remove a single field.

      • Delete Multiple Fields: Click Field Management, select the fields to remove in the Field Management dialog box, click the left-arrow icon to move them to the unselected list, and click OK.

    • Bulk Add: Click Bulk Add to add fields in JSON, TEXT, or DDL format.

      Note

      After you click OK, the bulk-add operation overwrites existing field configurations.

      • Batch configuration in JSON format. For example:

        // Example:
        [{
            "index": 0,
            "name": "id",
            "type": "int(10)",
            "mapType": "Long",
            "comment": "comment1"
        },
        {
            "index": 1,
            "name": "user_name",
            "type": "varchar(255)",
            "mapType": "String",
            "comment": "comment2"
        }]
        Note

        The index is the zero-based column number of the specified object, the name is the field name after import, and the type is the field type after import. For example, "index":3,"name":"user_id","type":"String" maps the fourth column to a field named user_id with the field type String.

      • Batch configuration in TEXT format. For example:

        // Example:
        0,id,int(10),Long,comment1
        1,user_name,varchar(255),String,comment2
        • The row delimiter separates each field’s information. By default, it is a line feed (\n). You can also use a semicolon (;) or a period (.).

        • The column delimiter separates the field name from the field type. By default, it is a comma (,). The field type is optional.

      • Add fields in DDL format. Example:

        CREATE TABLE tablename (
            user_id serial,
            username VARCHAR(50),
            password VARCHAR(50),
            email VARCHAR(255),
            created_on TIMESTAMP
        );
    • Add a New Output Field: Click + Add Output Field. Enter values for Column, Type, and Comment. Select a Mapping Type. Click the save icon to save the row.
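To show how the TEXT and JSON formats for Bulk Add relate, the following Python sketch parses TEXT rows into the same structure as the JSON example. It is not Dataphin's parser. The function parse_text_fields is hypothetical, the column order (index, name, type, mapType, comment) is inferred from the examples, and the sample rows are modeled on the JSON example. A type that itself contains the column delimiter, such as decimal(10,2), would need different handling.

```python
import json

# Column order inferred from the TEXT example: index,name,type,mapType,comment
KEYS = ("index", "name", "type", "mapType", "comment")


def parse_text_fields(text: str, row_delimiter: str = "\n", column_delimiter: str = ",") -> list[dict]:
    """Parse TEXT-format field rows into a list of dicts shaped like the JSON format."""
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # ignore blank rows
        # Split into at most five columns so a comment may contain the delimiter.
        columns = [c.strip() for c in row.split(column_delimiter, len(KEYS) - 1)]
        columns += [""] * (len(KEYS) - len(columns))  # pad omitted trailing columns
        field = dict(zip(KEYS, columns))
        field["index"] = int(field["index"])
        fields.append(field)
    return fields


sample = "0,id,int(10),Long,comment1\n1,user_name,varchar(255),String,comment2"
fields = parse_text_fields(sample)
print(json.dumps(fields, indent=2))
```

Passing row_delimiter=";" parses the same rows when they are written on one line and separated by semicolons.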

  8. Click OK to finish configuring the Amazon RDS for PostgreSQL input component.
