Configure the AnalyticDB for PostgreSQL Input Component
The AnalyticDB for PostgreSQL input component reads data from an AnalyticDB for PostgreSQL data source for synchronization to a destination. Configure this component to specify the source before configuring the destination.
Prerequisites
-
An AnalyticDB for PostgreSQL data source has been created. For more information, see Create an AnalyticDB for PostgreSQL data source.
-
The account that configures this component has read-through permission on the data source. To request permission, see Request data source permissions.
Procedure
-
In the top menu bar of the Dataphin homepage, choose Development > Data Integration.
-
In the top menu bar of the integration page, select a project. In Dev-Prod mode, also select an environment.
-
In the left navigation pane, click Offline Integration. In the Offline Integration list, click the target offline pipeline to open its configuration page.
-
In the upper-right corner, click Component Library to open the Component Library panel.
-
In the Component Library panel, select Input, find AnalyticDB for PostgreSQL, and drag it onto the canvas.
-
Click the icon on the component card to open the AnalyticDB for PostgreSQL Input Configuration dialog box.
-
In the AnalyticDB for PostgreSQL Input Configuration dialog box, configure the following parameters.
Parameter
Description
Step Name
The component name. Dataphin generates a default name that you can modify. Naming rules:
-
Can contain only Chinese characters, letters, underscores (_), and digits.
-
Must be no longer than 64 characters.
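The naming rules above can be checked before a name is entered. The following is a minimal sketch in Python, not Dataphin's own validation. The function name `is_valid_step_name` is hypothetical, an empty name is assumed to be invalid, and "Chinese characters" is assumed to mean the basic CJK Unified Ideographs range (U+4E00 to U+9FFF), which this page does not specify.

```python
import re

# Assumption: "Chinese characters" is read as the basic CJK Unified
# Ideographs block (U+4E00-U+9FFF); the exact rule is not documented here.
_STEP_NAME = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if name uses only allowed characters and is 1-64 long."""
    return _STEP_NAME.fullmatch(name) is not None
```

For example, `is_valid_step_name("adb_pg_input_1")` passes, while a name containing a hyphen or a 65-character name is rejected.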
Datasource
Lists all AnalyticDB for PostgreSQL data sources and project-level resources in the current Dataphin instance, regardless of read-through permission. Click the icon to copy the data source name. For data sources without read-through permission, click Request to apply. For more information, see Request data source permissions.
To create a data source, click Create Data Source. For more information, see Create an AnalyticDB for PostgreSQL data source.
Time Zone
Time-format data is processed in the current time zone. Defaults to the time zone of the selected data source and cannot be changed.
Note: For tasks created before V5.1.2, choose Data Source Default Configuration or Channel Configuration Time Zone. Default: Channel Configuration Time Zone.
-
Data Source Default Configuration: The default time zone of the selected data source.
-
Channel Configuration Time Zone: The time zone configured under Properties > Channel Configuration for the current integration task.
Schema (optional)
Select a schema to choose tables from a schema other than the default. Defaults to the schema configured in the data source.
Source Table Quantity
Select the number of source tables. Options: Single Table and Multiple Tables:
-
Single Table: Synchronize data from one source table to one destination table.
-
Multiple Tables: Synchronize data from multiple source tables into one destination table using the union algorithm.
Table Matching Method
Choose between General Rule and Database Regex.
Note: Available only when Source Table Quantity is set to Multiple Tables.
Table
Select the source table(s):
-
If Source Table Quantity is Single Table, search by keyword or enter the exact name and click Exact Search. The system checks the table status automatically. Click the icon to copy the table name.
-
If Source Table Quantity is Multiple Tables, enter expressions based on the table matching method:
-
General Rule: Enter an expression to filter tables with identical structures. Supports enumeration, regex-like patterns, or a mix. Example: table_[001-100];table_102;
-
Database Regex: Enter a database-supported regular expression. The system matches tables based on this regex. At runtime, new tables matching the pattern are dynamically included.
Click Exact Search to view matched tables in the Confirm Matching Details dialog box.
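To make the General Rule example concrete, the sketch below expands `table_[001-100];table_102;` into individual table names. It is an illustration only: the function name `expand_general_rule` is hypothetical, and the reading of the syntax (items separated by `;`, `[start-end]` as an inclusive numeric range that keeps zero padding) is an assumption inferred from the example, not a documented grammar.

```python
import re

# Hypothetical reading of the General Rule syntax: items are separated by ';',
# and '[start-end]' is an inclusive numeric range that keeps zero padding.
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_general_rule(expression: str) -> list[str]:
    """Expand an expression such as 'table_[001-100];table_102;' into names."""
    names: list[str] = []
    for item in filter(None, (part.strip() for part in expression.split(";"))):
        match = _RANGE.search(item)
        if match is None:
            names.append(item)  # plain enumeration entry
            continue
        start, end = match.group(1), match.group(2)
        width = len(start)  # pad generated numbers to the width of 'start'
        prefix, suffix = item[:match.start()], item[match.end():]
        for number in range(int(start), int(end) + 1):
            names.append(prefix + str(number).zfill(width) + suffix)
    return names
```

Under these assumptions the example expression yields 101 names: `table_001` through `table_100`, followed by `table_102`.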
Split Key (optional)
The column used to split the data into ranges that are read concurrently. Any source table column can be used. For optimal performance, use a primary key or indexed column.
Important: For datetime types, the system performs brute-force splitting based on the min/max time range and concurrency level. This does not guarantee even distribution.
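The splitting behavior for datetime columns can be pictured with a short sketch. This is not Dataphin's implementation. It assumes that "brute-force splitting" means cutting the min/max time range into equal-width intervals, one per concurrent reader, and the function name `split_range` is hypothetical.

```python
from datetime import datetime


def split_range(low: datetime, high: datetime, concurrency: int) -> list[tuple[datetime, datetime]]:
    """Cut [low, high] into `concurrency` equal-width, contiguous intervals."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    step = (high - low) / concurrency  # width of each interval as a timedelta
    bounds = [low + step * i for i in range(concurrency)] + [high]
    return list(zip(bounds[:-1], bounds[1:]))
```

Each interval covers the same amount of time, not the same number of rows. If most rows fall within one interval, the reader assigned to that interval does most of the work, which is why even distribution is not guaranteed.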
Batch Read Size (optional)
The number of records read per batch. Set a batch size (for example, 1024) to reduce data source interactions, improve I/O efficiency, and lower network latency.
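As a rough illustration of why batch size matters, the sketch below counts how many fetches are needed to read a table. It assumes, as a simplification, that each batch costs one interaction with the data source. The function name `round_trips` is hypothetical.

```python
import math


def round_trips(total_rows: int, batch_size: int) -> int:
    """Number of fetches needed to read total_rows at batch_size rows per fetch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(total_rows / batch_size)
```

Under this assumption, reading 1,000,000 rows takes 977 fetches with a batch size of 1024, compared with 1,000,000 fetches when rows are read one at a time.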
Input Filter (optional)
Filtering conditions for data extraction:
-
Static value: ds=20210101
-
Variable parameter: ds=${bizdate}
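The difference between the two filter forms is that a variable parameter is resolved to a concrete value before the filter is applied. The sketch below imitates that step with Python's `string.Template`, which happens to use the same `${name}` placeholder syntax. How Dataphin resolves parameters internally is not described on this page, and the function name `render_filter` is hypothetical.

```python
from string import Template


def render_filter(condition: str, params: dict[str, str]) -> str:
    """Replace ${name} placeholders in a filter condition with values."""
    # substitute() raises KeyError if a placeholder has no value in params.
    return Template(condition).substitute(params)
```

With `bizdate` set to `20210101`, `render_filter("ds=${bizdate}", {"bizdate": "20210101"})` produces the same condition as the static example.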
Output Fields
Displays all fields from the selected table(s) matching the filter conditions. Available operations:
-
Field Management: Delete fields to exclude them from downstream processing:
-
Delete a single field: Click the icon in the Actions column.
-
Batch deletion: Click Field Management, select fields in the Field Management dialog box, click the icon to move them to the unselected list, and click OK.
-
Bulk Add: Click Bulk Add to add fields in JSON, TEXT, or DDL format.
Note: After you click OK, the existing field configuration is overwritten.
-
JSON format example:
// Example:
[
  { "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" },
  { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }
]
Note: index specifies the column number, name sets the field name, and type defines the field type. For example, "index":3,"name":"user_id","type":"String" imports the fourth column as user_id with type String.
-
TEXT format example:
// Example:
1,id,int(10),Long,comment1
2,user_name,varchar(255),Long,comment2
-
Row delimiter: line feed (\n) by default. Semicolon (;) and period (.) are also supported.
-
Column delimiter: comma (,) by default. The field type is optional and defaults to ','.
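The TEXT and JSON formats carry the same information, so one can be converted to the other. The sketch below parses TEXT rows into the field dictionaries used by the JSON format. It is an illustration under assumptions: the column order (index, name, type, mapType, comment) is inferred from the examples, and the function name `text_to_fields` is hypothetical.

```python
import json


def text_to_fields(text: str, row_sep: str = "\n", col_sep: str = ",") -> list[dict]:
    """Parse 'index,name,type,mapType,comment' rows into field dictionaries."""
    # Assumed column order, inferred from the TEXT and JSON examples.
    keys = ("index", "name", "type", "mapType", "comment")
    fields = []
    for row in filter(None, (r.strip() for r in text.split(row_sep))):
        values = [v.strip() for v in row.split(col_sep)]
        field = dict(zip(keys, values))
        field["index"] = int(field["index"])
        fields.append(field)
    return fields


if __name__ == "__main__":
    rows = "1,id,int(10),Long,comment1\n2,user_name,varchar(255),Long,comment2"
    print(json.dumps(text_to_fields(rows), indent=2))
```

This simple split breaks on types that contain the column delimiter, such as `decimal(10,2)`. Such rows would need a different delimiter or a more careful parser.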
-
DDL format example:
CREATE TABLE tablename (
    user_id serial,
    username VARCHAR(50),
    password VARCHAR(50),
    email VARCHAR(255),
    created_on TIMESTAMP
);
-
Create Output Field: Click + Create Output Field, specify Column, Type, Comment, and Mapping Type, then click the icon to save.
-
Click OK to complete the AnalyticDB for PostgreSQL input component configuration.