Configure GBase 8c Input Component
The GBase 8c input component reads data from a GBase 8c data source. To synchronize data from GBase 8c to another data source, configure this input component to specify the source, and then configure the destination data source.
Prerequisites
-
You have created a GBase 8c data source. For detailed instructions, see .
-
The account used to configure GBase 8c input component properties must have read-through permission on the data source. If you lack this permission, request it. For more information, see Request, renew, and revoke data source permissions.
Procedure
-
On the Dataphin homepage, in the top menu bar, select Development > Data Integration.
-
On the Integration page, select Project from the top menu bar. If you are in Dev-Prod mode, select an environment.
-
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline you want to develop to open its configuration page.
-
In the upper-right corner of the page, click Component Library to open the Component Library panel.
-
In the left navigation pane of the Component Library panel, select Input. In the input component list on the right, locate the GBase 8c component and drag it to the canvas.
-
Click the icon on the GBase 8c input component card to open the GBase 8c Input Configuration dialog box.
-
In the GBase 8c Input Configuration dialog box, configure the following parameters.
Parameter
Description
Step Name
The name of the GBase 8c input component. Dataphin auto-generates this name, but you can modify it. Naming conventions:
-
Can contain only Chinese characters, letters, underscores (_), and numbers.
-
Cannot exceed 64 characters in length.
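You can check the naming conventions above with a short validator. This is a minimal sketch in Python, not Dataphin's own check. It assumes that "letters" means ASCII letters and approximates "Chinese characters" with the CJK Unified Ideographs block (U+4E00 to U+9FFF). The function name is hypothetical.

```python
import re

# Hypothetical validator mirroring the documented Step Name rules:
# Chinese characters, letters, underscores, and digits only; 1 to 64 characters.
# Assumption: "Chinese characters" ~ CJK Unified Ideographs U+4E00-U+9FFF,
# "letters" ~ ASCII A-Z and a-z.
STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if the whole name satisfies the Step Name conventions."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["GBase_8c_input_1", "输入组件_01", "bad-name", "x" * 65]:
        print(candidate[:20], is_valid_step_name(candidate))
```

Using `fullmatch` makes the character rule and the length limit apply to the entire name, not just a prefix.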
Datasource
The drop-down list shows all GBase 8c data sources, regardless of whether you have read-through permission. Click the icon to copy the data source name.
-
For data sources without read-through permission, click Apply next to the data source to request permission. For more information, see Apply for, Renew, and Return Data Source Permissions.
-
If you do not have a GBase 8c data source, click New Data Source to create a data source. For more information, see .
Schema (Optional)
Lets you select tables across schemas. Select the schema where the table resides. If you leave this empty, the system uses the schema configured in the data source.
Source Table Quantity
Specify whether to read from one table or multiple tables. Valid values:
-
Single Table: Use this for scenarios where you synchronize business data from one table to one destination table.
-
Multiple Tables: Use this for scenarios where you synchronize business data from multiple tables to the same destination table. When data from multiple tables is written to the same destination table, the system combines the tables by using a UNION operation.
For more information about UNION, see .
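Combining several source tables with the same structure can be pictured as a UNION over identical column lists. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in for GBase 8c. The table names are hypothetical. The sketch assumes UNION ALL semantics (rows are appended without deduplication). This page does not state whether the component removes duplicate rows, and if it does, plain UNION would apply instead.

```python
import sqlite3

# Stand-in source: in-memory SQLite instead of GBase 8c (assumption).
# Table names orders_001 / orders_002 are hypothetical.
conn = sqlite3.connect(":memory:")
sample = {"orders_001": [(1, "a"), (2, "b")], "orders_002": [(3, "c")]}
for name, rows in sample.items():
    conn.execute(f"CREATE TABLE {name} (id INTEGER, item TEXT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)

source_tables = ["orders_001", "orders_002"]
# Every table must expose the same columns in the same order for the union to work.
# UNION ALL keeps duplicates; this deduplication behavior is an assumption.
query = " UNION ALL ".join(f"SELECT id, item FROM {t}" for t in source_tables)
combined = conn.execute(query + " ORDER BY id").fetchall()
print(combined)
```

This requirement for identical column lists is also why the Multiple Tables option expects tables with the same structure.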
Table Matching Method
Select General Rules or Database Regular Expression.
Note: Configure this option only when you select Multiple Tables for Source Table Quantity.
Table
Select the source table:
-
If you select Single Table for Source Table Quantity, enter table name keywords to search, or enter the exact table name and click Precise Search. After you select a table, the system automatically detects its status. Click the icon to copy the name of the selected table.
-
If you select Multiple Tables for Source Table Quantity, enter different expressions to add tables based on the table matching method.
-
If you select General Rules for the table matching method, enter table expressions in the input box to filter for tables with the same structure. The system supports enumeration, regular expression-like, and mixed forms. For example: table_[001-100];table_102;
-
If you select Database Regular Expression for the table matching method, enter a regular expression supported by the current database in the input box. The system matches tables in the database against this regular expression. At runtime, the system matches the table range again based on the regular expression and synchronizes the newly matched tables.
After entering the expression, click Precise Search to view the list of matched tables in the Confirm Match Details dialog box.
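The following sketch shows how the two matching methods can resolve to a list of table names. It is a minimal Python sketch under stated assumptions, not Dataphin's parser. `expand_general_rule` and `match_database_regex` are hypothetical helpers. The General Rules grammar is assumed to be `;`-separated names with at most one `[start-end]` numeric range per item, with the zero padding of `start` preserved. Python's `re` with full-string matching stands in for the database's own regular expression dialect.

```python
import re

# Hypothetical: one numeric range such as [001-100] inside a table-name item.
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_general_rule(expression: str) -> list[str]:
    """Expand e.g. 'table_[001-100];table_102;' into concrete table names."""
    names = []
    for part in expression.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate the trailing ';'
        m = RANGE.search(part)
        if m is None:
            names.append(part)  # plain enumerated table name
            continue
        start, end = m.group(1), m.group(2)
        width = len(start)  # keep zero padding, e.g. 001
        for n in range(int(start), int(end) + 1):
            names.append(part[: m.start()] + str(n).zfill(width) + part[m.end():])
    return names


def match_database_regex(pattern: str, existing_tables: list[str]) -> list[str]:
    """Keep tables whose whole name matches; Python re stands in for the DB dialect."""
    return [t for t in existing_tables if re.fullmatch(pattern, t)]


if __name__ == "__main__":
    tables = expand_general_rule("table_[001-100];table_102;")
    print(len(tables), tables[:2], tables[-1])
    print(match_database_regex(r"table_\d{3}", ["table_001", "table_x", "table_102"]))
```

For the documented example, the range yields table_001 through table_100, and the enumerated item adds table_102, for 101 tables in total.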
-
Split Key (Optional)
The system partitions data based on the split key field to enable concurrent reads. You can use any column from the source table as the split key. For optimal transfer performance, use a primary key or indexed column.
Important: If you select a date/time column as the split key, the system finds the minimum and maximum values and splits the total time range evenly according to the concurrency. This brute-force split does not guarantee that the data is evenly distributed across the splits.
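The min/max range split can be sketched as follows, which also shows why it can be uneven. This is an illustration under assumptions, not Dataphin's implementation. The helper names are hypothetical, each reader is assumed to take a half-open range `lo <= key < hi`, and the last range is assumed to include the maximum. The same arithmetic works for numbers and for `datetime` values, because `timedelta` supports division by an integer.

```python
from datetime import datetime


def split_ranges(min_value, max_value, concurrency):
    """Divide [min_value, max_value] into `concurrency` equal-width sub-ranges."""
    step = (max_value - min_value) / concurrency  # float or timedelta
    bounds = [min_value + step * i for i in range(concurrency)] + [max_value]
    return list(zip(bounds[:-1], bounds[1:]))


def count_per_range(values, ranges):
    """Count values per range: lo <= v < hi, with the last range closed at hi."""
    counts = [0] * len(ranges)
    for v in values:
        for i, (lo, hi) in enumerate(ranges):
            last = i == len(ranges) - 1
            if lo <= v < hi or (last and v == hi):
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    # 50 rows clustered on Jan 1 plus one outlier on Jan 31 (hypothetical data).
    events = [datetime(2024, 1, 1, 0, m) for m in range(50)] + [datetime(2024, 1, 31)]
    ranges = split_ranges(min(events), max(events), 4)
    print(count_per_range(events, ranges))
```

With four readers, the 30-day span is cut into 7.5-day slices. All 50 clustered rows land in the first slice, and the outlier lands in the last, so one reader does almost all of the work.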
Batch Read Count (Optional)
The number of records read per batch. Setting a batch size (for example, 1024) instead of reading one record at a time reduces round trips to the data source and improves I/O efficiency.
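The idea of reading a fixed number of records per batch can be illustrated with the Python DB-API `cursor.fetchmany(size)`. This is a minimal sketch. An in-memory SQLite table with the hypothetical name `src` stands in for the GBase 8c source, and `read_in_batches` is a hypothetical helper. Whether batching actually reduces network round trips depends on the database driver. Dataphin's reader handles this internally.

```python
import sqlite3


def read_in_batches(cursor, query, batch_size=1024):
    """Yield lists of up to batch_size rows instead of one row at a time."""
    cursor.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Stand-in source table with 2,500 rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER)")
conn.executemany("INSERT INTO src VALUES (?)", [(i,) for i in range(2500)])

batch_sizes = [len(b) for b in read_in_batches(conn.cursor(), "SELECT id FROM src", 1024)]
print(batch_sizes)  # [1024, 1024, 452]
```

With a batch size of 1024, the 2,500 rows arrive in three batches instead of 2,500 single-row fetches.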
Input Filter (Optional)
Enter a filter condition for the input data, such as ds=${bizdate}. Input Filter applies to the following scenarios:
-
A fixed subset of data.
-
Parameter filtering.
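The two scenarios differ only in whether the filter contains a parameter placeholder. The following sketch shows how a filter could resolve into the WHERE clause of the read query. It rests on several assumptions. Python's `string.Template` stands in for Dataphin's parameter substitution, which the scheduler performs at runtime. `bizdate` is assumed to be rendered as yyyyMMdd. `build_query` and the table name `orders` are hypothetical.

```python
from string import Template


def build_query(table, columns, input_filter, params):
    """Resolve ${...} placeholders in the filter and append it as a WHERE clause."""
    where = Template(input_filter).substitute(params)
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    return f"{sql} WHERE {where}" if where else sql


if __name__ == "__main__":
    # Parameter filtering: the value changes with each scheduled run.
    print(build_query("orders", ["id", "amount"], "ds=${bizdate}", {"bizdate": "20240101"}))
    # A fixed subset of data: no placeholders, the condition is used as written.
    print(build_query("orders", ["id"], "status = 'paid'", {}))
```

The first call prints `SELECT id, amount FROM orders WHERE ds=20240101`. The second call leaves the fixed condition unchanged.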
Output Fields
Displays all fields from the selected table that match the filter conditions. Supported operations:
-
Field Management: If you do not need to output certain fields to downstream components, delete the corresponding fields:
-
Delete Single Field: To delete a few fields, click the icon in the Actions column to remove unnecessary fields.
-
Batch Delete Fields: To delete many fields, click Field Management. In the Field Management dialog box, select the fields, click the left-move icon to move them from the selected input fields to the unselected input fields, and then click OK to delete them in a batch.
-
-
Batch Add: Click Batch Add. This supports batch configuration in JSON, TEXT, and DDL formats.
Note: After you finish the batch addition, clicking OK overwrites the field information that is already configured.
-
Batch configure in JSON format. For example:
// Example: [{ "index": 0, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]
Note: index indicates the column number of the specified object, name indicates the field name after import, type indicates the field type after import, mapType indicates the mapping type, and comment indicates the field remarks.
For example, "index":3,"name":"user_id","type":"String" means that the fourth column in the file is imported with the field name user_id and the field type String.
-
Batch configure in TEXT format. For example:
// Example:
0,id,int(10),Long,comment1
1,user_name,varchar(255),String,comment2
-
The row delimiter separates information for each field. The default is a line feed (\n). It supports line feed (\n), semicolon (;), and period (.).
-
The column delimiter separates the field name and field type. The default is a comma (,), and ',' is the supported value. The field type is optional; the default is ','.
-
-
Batch configure in DDL format. For example:
CREATE TABLE tablename (
  user_id serial,
  username VARCHAR(50),
  password VARCHAR(50),
  email VARCHAR(255),
  created_on TIMESTAMP
);
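All three Batch Add formats describe the same thing: an ordered list of fields with a name and a type. The following minimal Python sketch normalizes each format into `(index, name, type)` tuples. It is not Dataphin's parser, and all three helper names are hypothetical. For TEXT, it assumes the column order from the example (index, name, type, mapType, comment) and a plain split on the delimiter, so a type such as `decimal(10,2)` would break it. For DDL, it lets in-memory SQLite parse the statement and reads the declared columns back with `PRAGMA table_info`, which only works for DDL that SQLite also accepts.

```python
import json
import sqlite3


def fields_from_json(text):
    """JSON format: keep index, name, type (mapType and comment are ignored here)."""
    return [(f["index"], f["name"], f["type"]) for f in json.loads(text)]


def fields_from_text(text, row_delimiter="\n", column_delimiter=","):
    """TEXT format: one field per row; assumed order index,name,type,mapType,comment."""
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue
        parts = row.split(column_delimiter)
        field_type = parts[2] if len(parts) > 2 else None  # the field type may be omitted
        fields.append((int(parts[0]), parts[1], field_type))
    return fields


def fields_from_ddl(ddl):
    """DDL format: stand-in parser using SQLite; works only for SQLite-compatible DDL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        (table,) = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchone()
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    finally:
        conn.close()
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    return [(cid, name, decl_type) for cid, name, decl_type, *_ in rows]


if __name__ == "__main__":
    json_text = ('[{"index": 0, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1"},'
                 ' {"index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2"}]')
    text_rows = "0,id,int(10),Long,comment1\n1,user_name,varchar(255),String,comment2"
    print(fields_from_json(json_text) == fields_from_text(text_rows))  # True
    print(fields_from_ddl("CREATE TABLE users (user_id serial, username VARCHAR(50), created_on TIMESTAMP)"))
```

The JSON and TEXT examples from this page yield identical field lists. This also explains why the second TEXT row uses String as its mapping type, matching the JSON example.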
-
-
New Output Field: Click +New Output Field. Fill in Column, Type, and Remarks as prompted on the page, and select Mapping Type. After configuring the current row, click the icon to save.
-
-
Click OK to complete the property configuration for the GBase 8c input component.