Configure the TDSQL for PostgreSQL input component

更新时间:
复制 MD 格式

The TDSQL for PostgreSQL input component reads data from a TDSQL for PostgreSQL data source. To synchronize data from a TDSQL for PostgreSQL data source to another, first configure the input component to read from the source, and then configure the destination component.

Prerequisites

  • You have created a TDSQL for PostgreSQL data source. For more information, see .

  • Your account must have read permission for data synchronization on the data source. If not, apply for it. For more information, see Apply for data source permissions.

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.

  2. In the top menu bar on the Integration page, select Project (in Dev-Prod mode, you need to select an Environment).

  3. In the left-side navigation pane, click Batch Integration. In the Batch Integration list, click the batch pipeline that you want to develop to open its configuration page.

  4. In the upper-right corner of the page, click Component library to open the Component library panel.

  5. In the Component library panel, select Input from the left-side navigation pane. Find the TDSQL for PostgreSQL component in the list on the right and drag it to the canvas.

  6. On the TDSQL for PostgreSQL input component card, click the image icon to open the TDSQL for PostgreSQL input configuration dialog box.

  7. In the TDSQL for PostgreSQL input configuration dialog box, configure the parameters.

    Parameter

    Description

    Step name

    The name of the TDSQL for PostgreSQL input component. Dataphin automatically generates a name, which you can change. The name must meet the following requirements:

    • Can contain only Chinese characters, letters, underscores (_), and digits.

    • Must be no more than 64 characters long.

    Data source

    Lists all TDSQL for PostgreSQL data sources in Dataphin, including those with and without read permission for data synchronization. Click the image icon to copy the data source name.

    • For a data source that you do not have permission for, click Apply next to it to request read permission for data synchronization. For more information, see Apply for data source permissions.

    • If no TDSQL for PostgreSQL data source is available, click Create new data source in the drop-down list to create one. For more information, see .

    Schema

    Select the schema of the source table. You can select tables from different schemas. If you do not specify a schema, Dataphin uses the one configured in the data source by default.

    Source table mode

    Select the source table mode. Options include Single table and Multiple tables.

    • Single table: Synchronizes data from a single source table to a single destination table.

    • Multiple tables: Synchronizes data from multiple source tables to a single destination table. The system combines the data with a union operation.

      For more information about union operations, see INTERSECT, UNION, and EXCEPT.

    Table matching mode

    Select General rule or Database regular expression.

    Note

    This parameter is available only when Source table mode is set to Multiple tables.

    Table

    Select the source table or tables.

    • If you set Source table mode to Single table, you can enter a keyword to search for the table, or enter the exact table name and click Exact match. After you select a table, the system automatically checks its status. Click the image icon to copy the name of the selected table.

    • If you set Source table mode to Multiple tables, you can use expressions to add tables based on the selected table matching mode.

      • If Table matching mode is set to General rule, enter an expression in the input box to filter for tables with the same structure. The system supports enumerated lists, regular expression-like patterns, and a combination of both. For example: table_[001-100];table_102;.

      • If Table matching mode is set to Database regular expression, enter a regular expression supported by the current database. The system uses this expression to match tables in the database. When the task runs, it dynamically matches the latest table scope and synchronizes new tables accordingly.

      After entering the expression, click Exact match to view the list of matched tables in the Confirm match details dialog box.

    Split key (Optional)

    The column used to partition data for parallel reading. The system splits data into chunks based on this key, which works with the concurrency setting to enable parallel reads. For optimal performance, use a primary key or an indexed column.

    Important

    When you select a date or time column, the system identifies the minimum and maximum values and performs a brute-force split based on the total time range and concurrency. The resulting data chunks may not be equal in size.

    Fetch size (Optional)

    The number of records to read from the database in a single batch. A larger fetch size reduces round trips to the data source, improving I/O efficiency and reducing network latency. For example, you can set the fetch size to 1024.

    Filter (Optional)

    Filter source data by specifying an expression supported by TDSQL for PostgreSQL, such as ds=${bizdate}. The Filter parameter is typically used in the following scenarios:

    • To read a specific, fixed subset of data.

    • To filter data dynamically by using parameters.

    Output fields

    Displays all fields from the selected tables that match the filter criteria. You can perform the following operations:

    • Field management: Remove fields you do not need to pass to downstream components:

      • To delete a single field: Click the sgaga icon in the Actions column to remove the field.

      • To delete multiple fields in bulk: Click Field management. In the Field management dialog box, select the fields that you want to remove, click the image left arrow icon to move them to the unselected list, and then click OK.

        image..png

    • Batch add: Click Batch add to configure fields in bulk by using JSON, TEXT, or DDL format.

      Note

      Adding fields in bulk overwrites all existing field configurations.

      • To add fields in bulk using the JSON format, use the following example:

        // Example:
          [{
             "index": 0,
             "name": "id",
             "type": "int(10)",
             "mapType": "Long",
             "comment": "comment1"
           },
           {
             "index": 1,
             "name": "user_name",
             "type": "varchar(255)",
             "mapType": "String",
             "comment": "comment2"
         }]
        Note

        In the JSON structure, index is the column number, name is the field name, and type is the field type.

        For example, "index":3,"name":"user_id","type":"String" imports the fourth column from the file, names the field user_id, and sets the field type to String.

      • To add fields in bulk using the TEXT format, use the following example:

        // Example:
        0,id,int(10),Long,comment1
        1,user_name,varchar(255),Long,comment2
        • The row delimiter separates the information for each field. The default is a newline character (\n). Semicolons (;) and periods (.) are also supported.

        • The column delimiter separates field names from field types and defaults to a half-width comma (,). The comma character ',' is supported, the field type is optional, and the delimiter defaults to ','.

      • To add fields in bulk using the DDL format, use the following example:

        CREATE TABLE tablename (
        	user_id serial,
        	username VARCHAR(50),
        	password VARCHAR(50),
        	email VARCHAR (255),
        	created_on TIMESTAMP,
        );
    • Add output field: Click + Add output field. Enter the Field, Type, and Comment, and select a Mapping Type as prompted. After you configure the row, click the image icon to save.

  8. Click Confirm to complete the configuration of the TDSQL for PostgreSQL input component.