Configure the openGauss input component

The openGauss input component reads data from an openGauss data source. To synchronize data from an openGauss data source to another data source, configure the openGauss input component to specify the source data, and then configure the destination data source.

Prerequisites

An openGauss data source is created. For more information, see Create an openGauss data source.
The account that you use to configure the openGauss input component must have read-through permissions on the data source. If the account lacks the required permissions, request them. For more information, see Request, renew, and return data source permissions.

Procedure

On the Dataphin home page, choose Develop > Data Integration from the top menu bar.
In the top menu bar of the Data Integration page, select a Project. If you are in Dev-Prod mode, you must also select an environment.
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the navigation pane on the left of the Component Library panel, select Input. Find the openGauss component in the list of input components on the right and drag it to the canvas.
Click the icon on the openGauss input component card to open the openGauss Input Configuration dialog box.

In the openGauss Input Configuration dialog box, configure the parameters.

Parameter	Description
Step Name	The name of the openGauss input component. Dataphin automatically generates a step name, which you can change as needed. The name can contain only Chinese characters, letters, digits, and underscores (_), and cannot exceed 64 characters.
Datasource	Lists all openGauss data sources in the current Dataphin project, including those for which you have read-through permissions and those for which you do not. Click the icon to copy the current data source name. For data sources where you lack read-through permissions, click Request next to the data source to request the permissions. For more information, see Request, renew, and return data source permissions. If you do not have an openGauss data source, click Create to create one. For more information, see Create an openGauss data source.
Schema	Cross-schema table reads are supported. Select the schema where the source table is located.
Number of source tables	Select the number of source tables. Valid values: Single table and Multiple tables. Single table: Synchronizes data from one source table to one destination table. Multiple tables: Synchronizes data from multiple source tables to a single destination table. A union algorithm is used to merge data from the source tables. For more information about unions, see Intersection, Union, and Except.
Table match pattern	Select General rule or Database regex. Note: This parameter is available only when Number of source tables is set to Multiple tables.
Table	Select the source table or tables. If Number of source tables is set to Single table, enter a keyword to search for the table, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the icon to copy the name of the selected table. If Number of source tables is set to Multiple tables, enter an expression that selects tables according to the chosen Table match pattern. If Table match pattern is set to General rule, enter an expression that selects tables with the same structure. Enumerations, regular-expression-style ranges, and combinations of both are supported, for example `table_[001-100];table_102;`. If Table match pattern is set to Database regex, enter a regular expression supported by the current database. The system uses this expression to match tables in the source database, and at runtime the node re-evaluates it so that the tables matching the expression at that time are synchronized. After you enter an expression, click Exact Match to view the matched tables in the Confirm Match Details dialog box.
Split key	An integer column in the source table used as the split key. A primary key or an indexed column is recommended. The system partitions data based on the split key to enable concurrent reads, which improves synchronization efficiency.
Batch read size	The number of records to read per batch, for example 1024. Reading in batches reduces the number of round trips to the data source, which improves I/O efficiency and reduces the overhead caused by network latency.
Input filter	The filter condition applied to the input data, for example `ds=${bizdate}`. Use an input filter to read a fixed subset of the data, or to filter the data by a parameter such as `${bizdate}`.
Output fields	Displays all fields from the selected tables that match the filter conditions. The following operations are supported: Field management: Remove fields that you do not need to pass to downstream components. To delete a single field, click the icon in the Actions column. To delete multiple fields at once, click Field Management, select the fields in the Field Management dialog box, click the left arrow icon to move them from the selected input fields list to the unselected input fields list, and then click OK. Batch add: Click Batch Add to configure fields in JSON, TEXT, or DDL format. Note: When you click OK after a batch add, the existing field configuration is overwritten. JSON format example: `[{"index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1"}, {"index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2"}]`. Note: index specifies the column number of the object, name specifies the field name after import, and type specifies the field type after import. For example, `"index":3,"name":"user_id","type":"String"` means that the fourth column is imported with the field name user_id and the field type String. TEXT format example, with one field per row: `1,id,int(10),Long,comment1` and `2,user_name,varchar(255),String,comment2`. The row delimiter separates the definitions of individual fields. The default is a line feed (\n); semicolons (;) and periods (.) are also supported. The column delimiter separates the values within a field definition, such as the field name and the field type. Only the half-width comma (,) is supported, and it is the default. The field type can be omitted. DDL format example: `CREATE TABLE tablename (user_id serial, username VARCHAR(50), password VARCHAR(50), email VARCHAR(255), created_on TIMESTAMP);` Create an output field: Click + Create Output Field, enter the Column, Type, and Comment, and select the Mapping Type. Click the icon to save.
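The General rule expression for the Table parameter can be illustrated with a small expander. This is a minimal Python sketch, not Dataphin's implementation. It assumes that items are separated by semicolons, that `[start-end]` denotes an inclusive numeric range, that the zero padding of `start` is kept in the generated names, and that each item contains at most one range. The names `expand_general_rule` and `RANGE_PATTERN` are hypothetical. The sketch does not cover Database regex, which uses the regular-expression syntax of the current database.

```python
import re

# Hypothetical pattern for a numeric range such as [001-100] (assumed syntax).
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_general_rule(expression):
    """Expand a General rule expression into a list of table names (sketch)."""
    names = []
    for item in expression.split(";"):
        item = item.strip()
        if not item:
            continue  # a trailing ";" produces an empty item
        match = RANGE_PATTERN.search(item)
        if match is None:
            names.append(item)  # an enumerated table name
            continue
        first, last = match.group(1), match.group(2)
        width = len(first)  # keep zero padding, e.g. 001
        prefix, suffix = item[:match.start()], item[match.end():]
        # Only the first range in an item is expanded (assumption).
        for number in range(int(first), int(last) + 1):
            names.append(f"{prefix}{number:0{width}d}{suffix}")
    return names


tables = expand_general_rule("table_[001-100];table_102;")
print(len(tables), tables[0], tables[-2], tables[-1])  # 101 table_001 table_100 table_102
```

Under these assumptions, the example expression selects 101 tables: `table_001` through `table_100`, plus `table_102`.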

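The Split key, Batch read size, and Input filter parameters all affect how source data is read. The following Python sketch shows one common way to build such a reader. It does not describe how Dataphin splits data internally. The sketch assumes three things: the input filter is combined with each split range by using AND, the key range between the MIN and MAX of the split key is divided evenly, and each range is fetched in batches with the DB-API `fetchmany` method. The standard-library `sqlite3` module stands in for an openGauss connection so that the example runs anywhere. `split_ranges`, `read_split`, and the `orders` table are hypothetical. The `${bizdate}` placeholder is filled by Python's `string.Template`. In a scheduled Dataphin node, the platform supplies this value; the date used here is made up.

```python
import sqlite3
from string import Template


def split_ranges(lo, hi, parts):
    """Divide the inclusive key range [lo, hi] into at most `parts` half-open ranges."""
    step = max(1, -(-(hi - lo + 1) // parts))  # ceiling division
    return [(start, min(start + step, hi + 1)) for start in range(lo, hi + 1, step)]


def read_split(conn, table, split_key, input_filter, params, parts=4, batch_size=1024):
    """Read all rows that pass the filter, one split range at a time, in batches."""
    # Resolve placeholders such as ${bizdate} in the filter expression.
    where = Template(input_filter).substitute(params)
    # Identifiers cannot be bound as DB-API parameters, so they are formatted
    # into the SQL text; only pass trusted table and column names.
    lo, hi = conn.execute(
        f"SELECT MIN({split_key}), MAX({split_key}) FROM {table} WHERE {where}"
    ).fetchone()
    if lo is None:
        return []  # no rows pass the filter
    rows = []
    for start, end in split_ranges(lo, hi, parts):
        # A real job would hand each range to a separate concurrent reader.
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE ({where}) "
            f"AND {split_key} >= ? AND {split_key} < ?",
            (start, end),
        )
        # Fetch up to batch_size rows per call instead of one row at a time.
        while batch := cur.fetchmany(batch_size):
            rows.extend(batch)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, ds INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, 20240101 if i % 2 else 20231231, i * 1.5) for i in range(1, 11)],
)
rows = read_split(conn, "orders", "id", "ds=${bizdate}", {"bizdate": "20240101"},
                  parts=3, batch_size=2)
print([row[0] for row in rows])  # [1, 3, 5, 7, 9]
```

Splitting on an integer key makes each range an independent query, which is why the documentation recommends a primary key or an indexed column: each range predicate can then use the index.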
Click Confirm to save the configuration of the openGauss input component.
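The TEXT and JSON batch-add formats for Output fields carry the same information. The following Python sketch converts TEXT rows into the JSON structure. It assumes that each row has the five comma-separated values `index,name,type,mapType,comment` in the order used by the TEXT example, and that the line feed is the row delimiter unless another delimiter is passed. The function name `text_fields_to_json` is hypothetical, and this is not Dataphin's parser. The sketch requires all five values even though the field type can be omitted in the TEXT format, and it does not handle types that contain commas, such as `decimal(10,2)`.

```python
import json


def text_fields_to_json(text, row_delimiter="\n"):
    """Convert TEXT-format field rows into the JSON batch-add structure (sketch)."""
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # ignore blank rows, such as a trailing delimiter
        # Split on the first four commas so that a comment may contain commas.
        index, name, col_type, map_type, comment = row.split(",", 4)
        fields.append({
            "index": int(index),
            "name": name,
            "type": col_type,
            "mapType": map_type,
            "comment": comment,
        })
    return json.dumps(fields)


text = "1,id,int(10),Long,comment1\n2,user_name,varchar(255),String,comment2"
print(text_fields_to_json(text))
```

For the sample input, the output has the same structure and values as the JSON example in the Output fields description.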