Configure the PostgreSQL input component
The PostgreSQL input component reads data from a PostgreSQL data source. To sync data from a PostgreSQL source to another destination, configure this input component first, and then configure the destination component.
Prerequisites
-
A PostgreSQL data source has been added. For more information, see Create a PostgreSQL data source.
-
Configuring the PostgreSQL input component requires an account with read-through permission on the data source. If you do not have this permission, apply for data source permissions. For more information, see Apply for, renew, or release data source permissions.
Procedure
-
You can select Developer > Data Integration from the top menu bar on the Dataphin homepage.
-
On the Integration page, in the top menu bar, select a Project. In Dev-Prod mode, you must also select an environment.
-
In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.
-
Click Component Library in the upper-right corner of the page. This opens the Component Library panel.
-
You can select Input in the left navigation pane of the Component Library panel. Then, locate the PostgreSQL component in the input component list on the right and drag it to the canvas.
-
Click the
icon on the PostgreSQL input component card. This opens the PostgreSQL Input Configuration dialog box. -
In the PostgreSQL Input Configuration dialog box, configure the following parameters.
Parameter
Description
Step Name
The name of the PostgreSQL input component. Dataphin automatically generates a default name, which you can modify. Naming rules:
-
Can contain only Chinese characters, letters, underscores (_), and numbers.
-
Length cannot exceed 64 characters.
Datasource
Lists all PostgreSQL data sources, including those you have read-through permissions for and those you do not. Click the
icon to copy the current data source name.-
For data sources without read-through permission, click Request after the data source to request read permission. For details, see Request, renew, and return data source permissions.
-
If you don't have a PostgreSQL data source yet, click New Data Source to create one. For details, see Creating a PostgreSQL Data Source.
Time Zone
Time-formatted data is processed based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be modified.
NoteFor tasks created before V5.1.2, select Data Source Default Configuration or Channel Configuration Time Zone. The default selection is Channel Configuration Time Zone.
-
Data Source Default Configuration: The default time zone of the selected data source.
-
Channel Configuration Time Zone: The time zone configured for the current integration task in Properties > Channel Configuration.
Schema (Optional)
Allows cross-schema table selection. Select the schema where the table resides. If not specified, the schema configured in the data source is used.
Source Table Quantity
Select Single Table or Multiple Tables:
-
Single Table: Syncs data from one source table to one destination table.
-
Multiple Tables: Syncs data from multiple source tables to the same destination table using the union algorithm.
For more information about UNION, see INTERSECT, UNION, and EXCEPT.
Table Matching Method
Select General Class Rule or Database Regex.
NoteThis option applies only when Source Table Quantity is set to Multiple Tables.
Table
Select the source table:
-
If Source Table Quantity is set to Single Table, enter a table name keyword to search. Alternatively, enter the exact table name and click Precise Search. After selecting a table, the system automatically detects its status. Click the
icon to copy the name of the selected table. -
If Source Table Quantity is set to Multiple Tables, add tables by entering different expressions based on the table matching method.
-
If General Rules is selected for the table matching method: In the input box, enter a table expression to filter for tables with the same structure. The system supports enumeration, regex-like, and mixed forms. For example,
table_[001-100];table_102;. -
If Database Regex is selected for the table matching method: In the input box, enter a regular expression supported by the current database. The system matches tables in the destination database based on this regular expression. During runtime, the task immediately matches new table ranges for synchronization based on the database regular expression.
After entering the expression, click Precise Search. This displays a list of matched tables in the Confirm Match Details dialog box.
-
Shard Key (Optional)
Partitions data based on the specified shard key field. Combine with concurrency settings to enable concurrent reads. Use a primary key or indexed column as the shard key for optimal transfer performance.
ImportantWhen you select a datetime type, the system identifies the maximum and minimum values and performs a split based on the total time range and concurrency. This does not guarantee an even distribution.
Batch Read Count (Optional)
The number of records read per batch. Setting a batch size (such as 1024 records) instead of reading one record at a time reduces interactions with the data source and improves I/O efficiency.
Input Filter (Optional)
A filter condition for the input data, for example,
ds=${bizdate}. Input Filtering applies to the following scenarios:-
A fixed subset of data.
-
Parameter filtering.
Output Fields
Displays all fields in the selected table that match the filter conditions. Supported operations:
-
Field Management: To exclude certain fields from downstream components, delete them:
-
Single Field Deletion Scenario: Click the
icon in the Operation column to remove individual fields. -
Batch field deletion scenario: Click Field Management. In the Field Management dialog box, select the fields to remove, click the
left-moving icon to move them to the unselected list, and click OK.
-
-
Batch Add: Click Batch Add to configure fields in JSON, TEXT, or DDL format.
NoteAfter batch addition, clicking OK overwrites the existing field information.
-
Batch configure in JSON format. For example:
// Example: [{ "index": 0, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]NoteIndex indicates the column number of the specified object. Name indicates the field name after import. Type indicates the field type after import. For example,
"index":3,"name":"user_id","type":"String"means to import the fourth column from the file, with the field name user_id and field type String. -
Batch configure in TEXT format. For example:
// Example: 0,id,int(10),Long,comment1 1,user_name,varchar(255),Long,comment2-
The row delimiter separates information for each field. The default is a line feed (\n). It supports line feed (\n), semicolon (;), and period (.).
-
The column delimiter separates the field name and field type. The default is a comma (,). It supports
','. The field type is optional, and the default is','.
-
-
Batch configure in DDL format. For example:
CREATE TABLE tablename ( user_id serial, username VARCHAR(50), password VARCHAR(50), email VARCHAR (255), created_on TIMESTAMP, );
-
-
Create Output Field: Click + Create Output Field, and as prompted, specify Column, Type, Description, and select a Mapping Type. After you configure this row, click the
icon to save.
-
-
Click OK to complete the property configuration for the PostgreSQL input component.