Configure the openGauss input component

The openGauss input component reads data from an openGauss data source. To synchronize data from an openGauss data source to another data source, configure the openGauss input component to specify the source data, and then configure the destination data source.

Prerequisites

Procedure

  1. On the Dataphin home page, choose Develop > Data Integration from the top menu bar.

  2. In the top menu bar of the Data Integration page, select a Project. If you are in Dev-Prod mode, you must also select an environment.

  3. In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.

  4. In the upper-right corner of the page, click Component Library to open the Component Library panel.

  5. In the navigation pane on the left of the Component Library panel, select Input. Find the openGauss component in the list of input components on the right and drag it to the canvas.

  6. Click the icon on the openGauss input component card to open the openGauss Input Configuration dialog box.

  7. In the openGauss Input Configuration dialog box, configure the parameters.

    Parameter

    Description

    Step Name

    The name of the openGauss input component. Dataphin automatically generates a step name, which you can change as needed. The naming conventions are as follows:

    • Can contain only Chinese characters, letters, underscores (_), and numbers.

    • Cannot exceed 64 characters in length.
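The naming conventions above can be checked before you type a name into the dialog box. The following is a minimal sketch, not Dataphin's own validation logic: the function name `is_valid_step_name` is hypothetical, "Chinese characters" is approximated by the CJK Unified Ideographs range, and an empty name is assumed to be invalid.

```python
import re

# Assumption: Chinese characters are approximated by U+4E00..U+9FFF.
# Assumption: a step name must contain at least one character.
_STEP_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and is 1 to 64 characters long."""
    return _STEP_NAME_RE.fullmatch(name) is not None


print(is_valid_step_name("openGauss_input_1"))  # allowed characters only
print(is_valid_step_name("bad-name"))           # hyphen is not allowed
```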

    Datasource

    Lists all openGauss data sources in the current Dataphin project, including those for which you have read-through permissions and those for which you do not. Click the copy icon to copy the current data source name.

    Schema

    Cross-schema table reads are supported. Select the schema where the source table is located.

    Number of source tables

    Select the number of source tables. Valid values: Single table and Multiple tables.

    • Single table: Synchronizes data from one source table to one destination table.

    • Multiple tables: Synchronizes data from multiple source tables to a single destination table. A union algorithm is used to merge data from the source tables.

      For more information about unions, see Intersection, Union, and Except.

    Table match pattern

    Select General rule or Database regex.

    Note

    This parameter is available only when Number of source tables is set to Multiple tables.

    Table

    Select the source table or tables:

    • If Number of source tables is set to Single table, you can enter a keyword to search for the table name, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the copy icon to copy the name of the selected table.

    • If Number of source tables is set to Multiple tables, enter an expression to add tables based on the selected match pattern.

      • If Table match pattern is set to General rule, enter an expression in the input box to filter for tables with the same structure. The system supports enumerations, regular expression-like patterns, and a mix of both. For example: table_[001-100];table_102;.

      • If Table match pattern is set to Database regex, enter a regular expression supported by the current database. The system uses this expression to match tables in the destination database. At runtime, the node evaluates the regular expression again, so the set of synchronized tables reflects the tables that match at that time.

      After you enter the expression, click Exact Match to view a list of matched tables in the Confirm Match Details dialog box.
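To make the General rule example `table_[001-100];table_102;` concrete, the sketch below expands such an expression into a list of table names. It is an illustration under stated assumptions, not Dataphin's matching logic: the function name `expand_table_expression` is hypothetical, items are assumed to be separated by semicolons, and a bracketed range is assumed to keep the zero padding of its lower bound.

```python
import re

# One optional numeric range per item, for example "table_[001-100]".
_RANGE_RE = re.compile(r"^(.*)\[(\d+)-(\d+)\](.*)$")


def expand_table_expression(expr: str) -> list[str]:
    """Expand enumerations and bracketed numeric ranges into table names."""
    names = []
    for item in expr.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after a trailing semicolon
        match = _RANGE_RE.match(item)
        if match is None:
            names.append(item)  # plain enumeration entry
            continue
        prefix, start, end, suffix = match.groups()
        width = len(start)  # assumption: padding follows the lower bound
        for number in range(int(start), int(end) + 1):
            names.append(f"{prefix}{str(number).zfill(width)}{suffix}")
    return names


tables = expand_table_expression("table_[001-100];table_102;")
print(len(tables), tables[0], tables[-1])
```

Use the Confirm Match Details dialog box as the authoritative result; the sketch only helps you reason about an expression before you enter it.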

    Split key

    An integer column in the source table used as the split key. A primary key or an indexed column is recommended. The system partitions data based on the split key to enable concurrent reads, which improves synchronization efficiency.
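The idea behind a split key can be sketched as dividing the key's value range into contiguous sub-ranges that separate readers scan concurrently. This is only an illustration: the function name `split_ranges` and the even-split strategy are assumptions, and the way Dataphin actually partitions data is not described here.

```python
def split_ranges(min_key: int, max_key: int, parts: int) -> list[tuple[int, int]]:
    """Split [min_key, max_key] into at most `parts` contiguous inclusive ranges."""
    if parts < 1 or max_key < min_key:
        raise ValueError("parts must be >= 1 and max_key must be >= min_key")
    total = max_key - min_key + 1
    parts = min(parts, total)  # never create empty ranges
    base, extra = divmod(total, parts)
    ranges = []
    low = min_key
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        high = low + size - 1
        ranges.append((low, high))
        low = high + 1
    return ranges


# Each range could back a query such as: WHERE id BETWEEN low AND high
for low, high in split_ranges(1, 10, 3):
    print(low, high)
```

An even split like this works well only when key values are distributed fairly uniformly, which is one reason a primary key or an indexed column is recommended.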

    Batch read size

    The number of records to read per batch. Setting a batch size, such as 1024 records, reduces the number of interactions with the data source, improves I/O efficiency, and lowers network latency.
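The effect of a batch read size can be illustrated with the `fetchmany` method of a Python database cursor. In this sketch an in-memory SQLite database stands in for the openGauss data source, and the table `t` and the helper `read_in_batches` are hypothetical; it shows the general technique, not how Dataphin reads data.

```python
import sqlite3


def read_in_batches(cursor, batch_size=1024):
    """Yield lists of rows, fetching at most `batch_size` rows per call."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Assumption: SQLite is used only so that the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(2500)])

cur = conn.execute("SELECT id FROM t ORDER BY id")
batch_sizes = [len(batch) for batch in read_in_batches(cur, 1024)]
print(batch_sizes)  # 2500 rows are read in three fetches instead of 2500
conn.close()
```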

    Input filter

    The filter conditions for the input data. For example, ds=${bizdate}. The Input filter applies to the following scenarios:

    • Filtering a fixed portion of data.

    • Parameter-based filtering.
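Parameter-based filtering can be pictured as substituting a value for a placeholder such as `${bizdate}` before the filter is applied. The sketch below uses Python's `string.Template`, whose `${name}` syntax happens to match the example; the helper `render_filter` and the date value `20240101` are assumptions made for illustration, and Dataphin performs its own parameter substitution at run time.

```python
from string import Template


def render_filter(filter_expr: str, params: dict[str, str]) -> str:
    """Replace ${name} placeholders in a filter expression with parameter values."""
    return Template(filter_expr).substitute(params)


# Parameter-based filter: the placeholder is resolved for each run.
print(render_filter("ds=${bizdate}", {"bizdate": "20240101"}))

# Fixed filter: no placeholders, so the expression is returned unchanged.
print(render_filter("status='active'", {}))
```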

    Output fields

    Displays all fields from the selected tables that match the filter conditions. The following operations are supported:

    • Field management: Remove fields that you do not need to pass to a downstream component:

      • To delete a single field: Click the delete icon in the Actions column.

      • To delete multiple fields in a batch: Click Field Management. In the Field Management dialog box, select multiple fields, click the left arrow icon to move the selected input fields to the unselected input fields list, and then click OK.

    • Batch add: Click Batch Add to configure fields in batch using JSON, TEXT, or DDL format.

      Note

      After you add fields in batch and click OK, the existing field configuration is overwritten.

      • To configure in JSON format, for example:

        [
          {
            "index": 1,
            "name": "id",
            "type": "int(10)",
            "mapType": "Long",
            "comment": "comment1"
          },
          {
            "index": 2,
            "name": "user_name",
            "type": "varchar(255)",
            "mapType": "String",
            "comment": "comment2"
          }
        ]

        Note

        index specifies the column number of the object. name specifies the field name after import. type specifies the field type after import. For example, "index":3,"name":"user_id","type":"String" means that the fourth column of the file is imported with the field name user_id and the field type String.

      • To configure in TEXT format, for example:

        1,id,int(10),Long,comment1
        2,user_name,varchar(255),String,comment2

        • The row delimiter separates the information for each field. The default delimiter is a line feed (\n). Semicolons (;) and periods (.) are also supported.

        • The column delimiter separates the items within a field definition, such as the field name and the field type. The default delimiter is a half-width comma (,). The field type is optional.

      • To configure in DDL format, for example:

        CREATE TABLE tablename (
            user_id serial,
            username VARCHAR(50),
            password VARCHAR(50),
            email VARCHAR(255),
            created_on TIMESTAMP
        );

    • Create an output field: Click + Create Output Field. Enter the Column, Type, and Comment, and select the Mapping Type. Click the save icon to save.

  8. Click Confirm to save the configuration of the openGauss input component.