Configure the Amazon RDS for PostgreSQL Input Component
The Amazon RDS for PostgreSQL input component reads data from an Amazon RDS for PostgreSQL data source. To synchronize data from Amazon RDS for PostgreSQL to another data source, configure the input component to specify the source, and then configure the destination data source.
Prerequisites
-
You have created an Amazon RDS for PostgreSQL data source. For more information, see Create an Amazon RDS for PostgreSQL Data Source.
-
The account used to configure the input component must have read-through permission on the data source. If it does not, request permission. For more information, see Request, Renew, or Release Data Source Permissions.
Procedure
-
On the Dataphin homepage, in the top menu bar, click Develop, and then click Data Integration.
-
On the Integration page, in the top menu bar, select a Project. If you are using Dev-Prod mode, also select an environment.
-
In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop. The configuration page for that pipeline opens.
-
In the upper-right corner of the page, click Component Library. The Component Library panel opens.
-
In the left navigation pane of the Component Library panel, click Input. In the list of input components on the right, locate the Amazon RDS for PostgreSQL component, and drag it onto the canvas.
-
Click the icon on the Amazon RDS for PostgreSQL input component card to open the Amazon RDS for PostgreSQL Input Configuration dialog box.
-
In the Amazon RDS for PostgreSQL Input Configuration dialog box, configure the following parameters.
Parameter
Description
Step Name
The name of the input component. Dataphin generates a default step name that you can change. The name must follow these rules:
-
Use only Chinese characters, letters, underscores (_), and digits.
-
Keep the name no longer than 64 characters.
Datasource
Lists all Amazon RDS for PostgreSQL data sources, including those you have read-through permission for and those you do not. Click the icon to copy the data source name.
-
If you do not have read-through permission for a data source, click Request next to the data source to request read permission. For more information, see Request, Renew, or Release Data Source Permissions.
-
If you do not have an Amazon RDS for PostgreSQL data source, click Create Data Source to create one. For more information, see Create an Amazon RDS for PostgreSQL Data Source.
Schema (optional)
The schema where the table resides. Selecting a schema lets you access tables across schemas. If not specified, the schema configured in the data source is used.
Source Table Count
The number of source tables. Options are Single Table and Multiple Tables:
-
Single Table: Syncs data from one source table to one destination table.
-
Multiple Tables: Syncs data from multiple source tables to one destination table using the union algorithm.
For more information about union, see .
Table Matching Method
Select either Generic Rules or Database Regex.
Note: This setting is available only when you select Multiple Tables for Source Table Count.
Table
Select the source table:
-
If you selected Single Table for Source Table Count, enter a keyword in the table name field to search for the table, or enter the exact table name and click Exact Match. After you select a table, the system detects its status automatically. Click the icon to copy the name of the selected table.
-
If you selected Multiple Tables for Source Table Count, enter an expression based on the table matching method.
-
If you selected Generic Rules for table matching, enter an expression in the field to filter tables with the same structure. The system supports enumeration, regex-like syntax, or a mix of both. For example: table_[001-100];table_102;
-
If you selected Database Regex for table matching, enter a regular expression supported by the database. The system matches tables in the source database using this regex. At runtime, the task matches new tables dynamically based on the regex.
After you enter the expression, click Exact Match. In the Confirm Match Details dialog box, view the list of matched tables.
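As an illustration of how a Generic Rules expression such as table_[001-100];table_102; could resolve to table names, here is a minimal Python sketch. It assumes that entries are separated by semicolons and that a bracketed range is inclusive and zero-padded to the width of its lower bound. The function name expand_generic_rule is hypothetical, and Dataphin's actual matching logic may differ.

```python
import re

# Matches one numeric range such as [001-100] inside an entry.
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_generic_rule(expression):
    """Expand an expression like 'table_[001-100];table_102;' into table names.

    Assumption: ranges are inclusive and padded to the width of the lower bound.
    Only the first range in each entry is expanded.
    """
    names = []
    for item in expression.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after a trailing semicolon
        match = RANGE.search(item)
        if match is None:
            names.append(item)  # plain enumeration entry
            continue
        start, end = match.group(1), match.group(2)
        width = len(start)
        prefix, suffix = item[:match.start()], item[match.end():]
        for n in range(int(start), int(end) + 1):
            names.append(prefix + str(n).zfill(width) + suffix)
    return names


tables = expand_generic_rule("table_[001-100];table_102;")
print(len(tables), tables[0], tables[-1])
```

Under these assumptions the example expression covers table_001 through table_100 plus table_102, which is the list you would expect to review in the Confirm Match Details dialog box.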
Split Key (optional)
The column used to partition data for concurrent reads. Pair this with concurrency settings to read data in parallel. Any column from the source table can serve as the split key. For best performance, use a primary key or an indexed column.
Important: If you select a date-time type, the system performs brute-force partitioning based on the full time range and concurrency setting. This does not guarantee even distribution.
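To make the idea of a split key concrete, the following Python sketch divides the value range of an integer key into contiguous slices, one per concurrent reader. It is an illustration only: the function split_ranges, the table name orders, and the column id are hypothetical, and the way Dataphin actually computes its slices is not described here.

```python
def split_ranges(min_key, max_key, concurrency):
    """Divide the inclusive range [min_key, max_key] into contiguous slices.

    Returns at most `concurrency` (low, high) pairs that cover the range
    without gaps or overlaps.
    """
    if concurrency < 1 or max_key < min_key:
        raise ValueError("need concurrency >= 1 and max_key >= min_key")
    total = max_key - min_key + 1
    parts = min(concurrency, total)
    base, extra = divmod(total, parts)
    ranges = []
    start = min_key
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + size - 1))
        start += size
    return ranges


ranges = split_ranges(1, 1000, 4)
for low, high in ranges:
    # Hypothetical query that each concurrent reader would run.
    print(f"SELECT * FROM orders WHERE id BETWEEN {low} AND {high}")
```

The slices are equal in key range, not in row count. If the key values are skewed, as is common with date-time columns, some readers receive far more rows than others, which is why a primary key or an evenly distributed indexed column tends to work better.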
Batch Read Size (optional)
The number of records to read per batch. Setting a batch size (for example, 1024) reduces the number of round trips to the data source, which improves I/O efficiency and reduces the time lost to network latency.
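The effect of a batch size can be sketched in a few lines of Python. The example below groups an arbitrary stream of records into batches. It stands in for a database cursor and is not Dataphin's reader, and the function name read_in_batches is hypothetical.

```python
from itertools import islice


def read_in_batches(rows, batch_size):
    """Yield lists of up to `batch_size` records from any iterable of rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        yield batch


# 2,500 records read with a batch size of 1024 need three fetches, not 2,500.
batches = list(read_in_batches(range(2500), 1024))
print([len(b) for b in batches])
```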
Input Filter (optional)
Filter conditions for input fields. For example: ds=${bizdate}. Common scenarios for Input Filter:
-
A fixed subset of data.
-
Parameter-based filtering.
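As a rough illustration of parameter-based filtering, the Python sketch below substitutes a value for ${bizdate} before the filter is applied. The function render_filter and the sample date are hypothetical. In Dataphin the platform resolves the parameter at run time.

```python
from string import Template


def render_filter(filter_expression, params):
    """Replace ${name} placeholders in a filter expression with values."""
    return Template(filter_expression).substitute(params)


# Parameter-based filtering: the value changes with each run.
rendered = render_filter("ds=${bizdate}", {"bizdate": "20240101"})
print(rendered)

# A fixed subset of data: no placeholders, so the filter is unchanged.
fixed = render_filter("id < 100", {})
print(fixed)
```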
Output Fields
All fields from the selected table and filtered results. You can manage fields as follows:
-
Field Management: Remove fields you do not need downstream:
-
Delete One Field: Click the icon in the Actions column to remove a single field.
-
Delete Multiple Fields: Click Field Management, select the fields to remove in the Field Management dialog box, click the left arrow icon to move them to the unselected list, and then click OK.
-
Bulk Add: Click Bulk Add to add fields in JSON, TEXT, or DDL format.
Note: After you click OK, the bulk-add operation overwrites existing field configurations.
-
Add fields in JSON format. For example:
// Example:
[
  { "index": 0, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" },
  { "index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }
]
Note: index indicates the column number of the specified object (numbering starts at 0), name indicates the field name after import, and type specifies the field type after import. For example, "index":3,"name":"user_id","type":"String" maps the fourth column to a field named user_id with the field type String.
-
Add fields in TEXT format. For example:
// Example:
0,id,int(10),Long,comment1
1,user_name,varchar(255),String,comment2
-
The row delimiter separates each field’s information. By default, it is a line feed (\n). You can also use a semicolon (;) or a period (.).
-
The column delimiter separates field names and types. By default, it is a comma (,). You can also use ','. The field type is optional. If omitted, the default is ','.
-
Add fields in DDL format. For example:
CREATE TABLE tablename (
  user_id serial,
  username VARCHAR(50),
  password VARCHAR(50),
  email VARCHAR(255),
  created_on TIMESTAMP
);
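The JSON and TEXT formats for Bulk Add carry the same five attributes per field. The Python sketch below converts TEXT rows into the JSON structure. It assumes the default delimiters (line feed between rows, comma between columns) and the attribute order index, name, type, mapType, comment seen in the examples. The function text_to_fields is hypothetical and is not part of Dataphin.

```python
import json

# Attribute order assumed from the TEXT example rows.
FIELD_KEYS = ("index", "name", "type", "mapType", "comment")


def text_to_fields(text, row_delimiter="\n", column_delimiter=","):
    """Parse TEXT-format field definitions into a list of dictionaries.

    Limitation: a type that itself contains the column delimiter,
    such as numeric(10,2), is not handled by this simple split.
    """
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue
        values = [value.strip() for value in row.split(column_delimiter)]
        field = dict(zip(FIELD_KEYS, values))
        field["index"] = int(field["index"])  # JSON uses a numeric index
        fields.append(field)
    return fields


text = "0,id,int(10),Long,comment1\n1,user_name,varchar(255),String,comment2"
fields = text_to_fields(text)
print(json.dumps(fields, indent=2))
```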
-
Add a New Output Field: Click + Add Output Field. Enter values for Column, Type, and Comment. Select a Mapping Type. Click the icon to save the row.
-
Click OK to finish configuring the Amazon RDS for PostgreSQL input component.