Configure the PostgreSQL input component

更新时间: 2026-06-23 10:59:10

The PostgreSQL input component reads data from a PostgreSQL data source. To sync data from a PostgreSQL source to another destination, configure this input component first, and then configure the destination component.

Prerequisites

Procedure

  1. You can select Developer > Data Integration from the top menu bar on the Dataphin homepage.

  2. On the Integration page, in the top menu bar, select a Project. In Dev-Prod mode, you must also select an environment.

  3. In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page. This opens the Component Library panel.

  5. You can select Input in the left navigation pane of the Component Library panel. Then, locate the PostgreSQL component in the input component list on the right and drag it to the canvas.

  6. Click the image icon on the PostgreSQL input component card. This opens the PostgreSQL Input Configuration dialog box.

  7. In the PostgreSQL Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the PostgreSQL input component. Dataphin automatically generates a default name, which you can modify. Naming rules:

    • Can contain only Chinese characters, letters, underscores (_), and numbers.

    • Length cannot exceed 64 characters.

    Datasource

    Lists all PostgreSQL data sources, including those you have read-through permissions for and those you do not. Click the image icon to copy the current data source name.

    Time Zone

    Time-formatted data is processed based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be modified.

    Note

    For tasks created before V5.1.2, select Data Source Default Configuration or Channel Configuration Time Zone. The default selection is Channel Configuration Time Zone.

    • Data Source Default Configuration: The default time zone of the selected data source.

    • Channel Configuration Time Zone: The time zone configured for the current integration task in Properties > Channel Configuration.

    Schema (Optional)

    Allows cross-schema table selection. Select the schema where the table resides. If not specified, the schema configured in the data source is used.

    Source Table Quantity

    Select Single Table or Multiple Tables:

    • Single Table: Syncs data from one source table to one destination table.

    • Multiple Tables: Syncs data from multiple source tables to the same destination table using the union algorithm.

      For more information about UNION, see INTERSECT, UNION, and EXCEPT.

    Table Matching Method

    Select General Class Rule or Database Regex.

    Note

    This option applies only when Source Table Quantity is set to Multiple Tables.

    Table

    Select the source table:

    • If Source Table Quantity is set to Single Table, enter a table name keyword to search. Alternatively, enter the exact table name and click Precise Search. After selecting a table, the system automatically detects its status. Click the image icon to copy the name of the selected table.

    • If Source Table Quantity is set to Multiple Tables, add tables by entering different expressions based on the table matching method.

      • If General Rules is selected for the table matching method: In the input box, enter a table expression to filter for tables with the same structure. The system supports enumeration, regex-like, and mixed forms. For example, table_[001-100];table_102;.

      • If Database Regex is selected for the table matching method: In the input box, enter a regular expression supported by the current database. The system matches tables in the destination database based on this regular expression. During runtime, the task immediately matches new table ranges for synchronization based on the database regular expression.

      After entering the expression, click Precise Search. This displays a list of matched tables in the Confirm Match Details dialog box.

    Shard Key (Optional)

    Partitions data based on the specified shard key field. Combine with concurrency settings to enable concurrent reads. Use a primary key or indexed column as the shard key for optimal transfer performance.

    Important

    When you select a datetime type, the system identifies the maximum and minimum values and performs a split based on the total time range and concurrency. This does not guarantee an even distribution.

    Batch Read Count (Optional)

    The number of records read per batch. Setting a batch size (such as 1024 records) instead of reading one record at a time reduces interactions with the data source and improves I/O efficiency.

    Input Filter (Optional)

    A filter condition for the input data, for example, ds=${bizdate}. Input Filtering applies to the following scenarios:

    • A fixed subset of data.

    • Parameter filtering.

    Output Fields

    Displays all fields in the selected table that match the filter conditions. Supported operations:

    • Field Management: To exclude certain fields from downstream components, delete them:

      • Single Field Deletion Scenario: Click the sgaga icon in the Operation column to remove individual fields.

      • Batch field deletion scenario: Click Field Management. In the Field Management dialog box, select the fields to remove, click the image left-moving icon to move them to the unselected list, and click OK.

        image..png

    • Batch Add: Click Batch Add to configure fields in JSON, TEXT, or DDL format.

      Note

      After batch addition, clicking OK overwrites the existing field information.

      • Batch configure in JSON format. For example:

        // Example:
          [{
             "index": 0,
             "name": "id",
             "type": "int(10)",
             "mapType": "Long",
             "comment": "comment1"
           },
           {
             "index": 1,
             "name": "user_name",
             "type": "varchar(255)",
             "mapType": "String",
             "comment": "comment2"
         }]
        Note

        Index indicates the column number of the specified object. Name indicates the field name after import. Type indicates the field type after import. For example, "index":3,"name":"user_id","type":"String" means to import the fourth column from the file, with the field name user_id and field type String.

      • Batch configure in TEXT format. For example:

        // Example:
        0,id,int(10),Long,comment1
        1,user_name,varchar(255),Long,comment2
        • The row delimiter separates information for each field. The default is a line feed (\n). It supports line feed (\n), semicolon (;), and period (.).

        • The column delimiter separates the field name and field type. The default is a comma (,). It supports ','. The field type is optional, and the default is ','.

      • Batch configure in DDL format. For example:

        CREATE TABLE tablename (
        	user_id serial,
        	username VARCHAR(50),
        	password VARCHAR(50),
        	email VARCHAR (255),
        	created_on TIMESTAMP,
        );
    • Create Output Field: Click + Create Output Field, and as prompted, specify Column, Type, Description, and select a Mapping Type. After you configure this row, click the image icon to save.

  8. Click OK to complete the property configuration for the PostgreSQL input component.

上一篇: Configure PolarDB-X (formerly DRDS) input component 下一篇: Configure the Microsoft SQL Server Input Component
阿里云首页 智能数据建设与治理 Dataphin 相关技术圈