Configure the Amazon RDS for PostgreSQL Input Component

The Amazon RDS for PostgreSQL input component reads data from an Amazon RDS for PostgreSQL data source. To synchronize data from Amazon RDS for PostgreSQL to another data source, configure the input component to specify the source, and then configure the destination data source.

Prerequisites

You have created an Amazon RDS for PostgreSQL data source. For more information, see Create an Amazon RDS for PostgreSQL Data Source.
The account used to configure the input component must have read-through permission on the data source. If it does not, request permission. For more information, see Request, Renew, or Release Data Source Permissions.

Procedure

On the Dataphin homepage, in the top menu bar, click Develop, and then click Data Integration.
On the Integration page, in the top menu bar, select a project. If you use the Dev-Prod mode, also select an environment.
In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop. The configuration page for that offline pipeline opens.
In the upper-right corner of the page, click Component Library. The Component Library panel opens.
In the left navigation pane of the Component Library panel, click Input. In the list of input components on the right, locate the Amazon RDS for PostgreSQL component, and drag it onto the canvas.
Click the icon on the Amazon RDS for PostgreSQL input component card to open the Amazon RDS for PostgreSQL Input Configuration dialog box.

In the Amazon RDS for PostgreSQL Input Configuration dialog box, configure the following parameters.

Parameter	Description
Step Name	The name of the input component. Dataphin generates a default name, which you can change. The name can contain only Chinese characters, letters, digits, and underscores (_), and can be up to 64 characters in length.
Datasource	Lists all Amazon RDS for PostgreSQL data sources, including those for which you have read-through permission and those for which you do not. Click the icon to copy the name of the data source. If you do not have read-through permission on a data source, click Request next to it to request the permission. For more information, see Request, Renew, or Release Data Source Permissions. If no Amazon RDS for PostgreSQL data source exists, click Create Data Source to create one. For more information, see Create an Amazon RDS for PostgreSQL Data Source.
Schema (optional)	The schema where the table resides. Selecting a schema lets you access tables across schemas. If not specified, the schema configured in the data source is used.
Source Table Count	The number of source tables. Valid values: Single Table and Multiple Tables. Single Table: Syncs data from one source table to one destination table. Multiple Tables: Syncs data from multiple source tables to one destination table by using the union algorithm. For more information about union, see .
Table Matching Method	Select Generic Rules or Database Regex. Note: This parameter is available only when Source Table Count is set to Multiple Tables.
Table	Select the source table. If Source Table Count is set to Single Table, enter a keyword to search for the table, or enter the exact table name and click Exact Match. After you select a table, the system automatically detects its status. Click the icon to copy the name of the selected table. If Source Table Count is set to Multiple Tables, enter an expression that corresponds to the table matching method. If you selected Generic Rules, enter an expression that matches tables with the same structure. Enumeration, regex-like syntax, or a mix of both is supported. For example: `table_[001-100];table_102;`. If you selected Database Regex, enter a regular expression that the database supports. The system uses this regex to match tables in the database of the selected data source, and at runtime the task dynamically matches newly added tables. After you enter the expression, click Exact Match and view the list of matched tables in the Confirm Match Details dialog box.
Split Key (optional)	The column used to split the data so that it can be read concurrently. Use it together with the concurrency setting to read data in parallel. Any column of the source table can be the split key, but a primary key or an indexed column gives the best performance. Important: If you select a date-time column, the system performs brute-force splitting of the full time range based on the concurrency setting. This does not guarantee that data is evenly distributed across the splits.
Batch Read Size (optional)	The number of records to read in each batch. Reading in batches (for example, 1,024 records per batch) reduces the number of requests sent to the data source, which improves I/O efficiency and reduces the time spent on network round trips.
Input Filter (optional)	The filter condition applied to the input data. For example: `ds=${bizdate}`. Input Filter is typically used in two scenarios: to read a fixed subset of the data, or to filter data by a parameter such as `${bizdate}`.
Output Fields	Displays all fields returned by the selected table and filter conditions. You can manage the fields as follows: Field Management: Remove fields that are not needed downstream. Delete One Field: Click the icon in the Actions column of the field. Delete Multiple Fields: Click Field Management. In the Field Management dialog box, select the fields to delete, click the left arrow icon to move them to the unselected list, and then click OK. Bulk Add: Click Bulk Add to add fields in JSON, TEXT, or DDL format. Note: After you click OK, the added fields overwrite the existing field configuration. JSON format example: `[{ "index": 0, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]` Note: `index` is the column number of the field in the specified object, `name` is the field name after import, and `type` is the field type after import. For example, `"index":3,"name":"user_id","type":"String"` maps the fourth column to a field named `user_id` of type `String`. TEXT format example, with one field per row: `0,id,int(10),Long,comment1` (first row) and `1,user_name,varchar(255),String,comment2` (second row). The row delimiter separates the definitions of different fields. By default, it is a line feed (\n). You can also use a semicolon (;) or a period (.). The column delimiter separates field names and types. By default, it is a comma (,). You can also use `','`. The field type is optional. If omitted, the default is `','`. DDL format example: `CREATE TABLE tablename (user_id serial, username VARCHAR(50), password VARCHAR(50), email VARCHAR(255), created_on TIMESTAMP);` Add Output Field: Click + Add Output Field, enter the Column, Type, and Comment values, select a Mapping Type, and then click the icon to save the row.
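To make the Generic Rules syntax of the Table parameter more concrete, the following Python sketch expands an expression such as `table_[001-100];table_102;` into the table names it would match. The grammar is an assumption inferred from that single example: entries are separated by semicolons, each entry contains at most one `[start-end]` numeric range, and numbers are zero-padded to the width of `start`. It is not Dataphin's documented parser, and `expand_generic_rule` is a hypothetical helper.

```python
import re

# Matches one bracketed numeric range such as "[001-100]".
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_generic_rule(expression: str) -> list[str]:
    """Expand a generic-rule expression into concrete table names.

    Assumed syntax (inferred from the documented example, not an official
    grammar): entries are separated by ';', and each entry may contain at
    most one [start-end] range, zero-padded to the width of 'start'.
    """
    tables: list[str] = []
    for entry in expression.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip the empty item left by a trailing ';'
        match = RANGE.search(entry)
        if match is None:
            tables.append(entry)  # plain enumerated table name
            continue
        start_text, end_text = match.group(1), match.group(2)
        width = len(start_text)  # "001" -> pad numbers to 3 digits
        prefix, suffix = entry[: match.start()], entry[match.end():]
        for n in range(int(start_text), int(end_text) + 1):
            tables.append(f"{prefix}{n:0{width}d}{suffix}")
    return tables


names = expand_generic_rule("table_[001-100];table_102;")
print(len(names), names[0], names[-2], names[-1])
```

Under these assumptions, the example expression resolves to 101 tables: `table_001` through `table_100`, plus `table_102`. Database Regex expressions use the database's own regex syntax, so Python's `re` dialect is not a reliable stand-in for them.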

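The Split Key warning about date-time columns is easier to see with numbers. This Python sketch shows a generic range-splitting approach: an integer key range is divided into contiguous ranges, and a time range is cut into equal-length windows, one per concurrent reader. It illustrates the idea under stated assumptions and is not Dataphin's actual splitting algorithm. `split_int_range`, `split_time_range`, and the sample timestamps are hypothetical.

```python
from datetime import datetime


def split_int_range(low: int, high: int, parts: int) -> list[tuple[int, int]]:
    """Divide the inclusive range [low, high] into contiguous inclusive ranges."""
    parts = max(1, min(parts, high - low + 1))  # never create empty ranges
    step, extra = divmod(high - low + 1, parts)
    ranges, start = [], low
    for i in range(parts):
        size = step + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def split_time_range(start: datetime, end: datetime, parts: int) -> list[tuple[datetime, datetime]]:
    """Cut the half-open interval [start, end) into `parts` equal-length windows."""
    width = (end - start) / parts  # timedelta divided by an int is a timedelta
    bounds = [start + i * width for i in range(parts)] + [end]
    return list(zip(bounds, bounds[1:]))


# Integer primary key 1..10 read by 3 concurrent readers.
print(split_int_range(1, 10, 3))

# Hypothetical timestamps: 1 row in January, 9 rows in late December.
events = [datetime(2024, 1, 15)] + [datetime(2024, 12, 20 + d) for d in range(9)]
windows = split_time_range(datetime(2024, 1, 1), datetime(2025, 1, 1), 4)
counts = [sum(lo <= e < hi for e in events) for lo, hi in windows]
print(counts)
```

Because the hypothetical rows cluster in December, one window receives 9 of the 10 rows while two windows receive none. This is the uneven distribution the Important note warns about. A primary key or indexed column with evenly spread values avoids this skew, and each range query can use the index.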
Click OK to finish configuring the Amazon RDS for PostgreSQL input component.
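The TEXT format accepted by Bulk Add under Output Fields carries the same information as the JSON format. One way to check a TEXT definition before you paste it is to convert it to JSON. The following Python sketch does this for the default delimiters. The column order (`index`, `name`, `type`, `mapType`, `comment`) is an assumption inferred from the documented examples, and `parse_text_fields` is a hypothetical helper, not part of Dataphin.

```python
import json


def parse_text_fields(text: str, row_sep: str = "\n", col_sep: str = ",") -> list[dict]:
    """Convert TEXT-format field definitions into JSON-format dictionaries.

    Assumed column order (inferred from the examples): index, name, type, mapType, comment.
    """
    keys = ["index", "name", "type", "mapType", "comment"]
    fields = []
    for row in text.split(row_sep):
        row = row.strip()
        if not row:
            continue  # ignore blank rows, e.g. a trailing row delimiter
        # Naive split: a type that itself contains the column delimiter,
        # such as numeric(10,2), would be cut apart here.
        values = row.split(col_sep)
        field = dict(zip(keys, values))  # missing trailing columns are simply omitted
        field["index"] = int(field["index"])
        fields.append(field)
    return fields


text = "0,id,int(10),Long,comment1\n1,user_name,varchar(255),String,comment2"
fields = parse_text_fields(text)
print(json.dumps(fields, indent=2))
```

The printed JSON has the same structure as the JSON-format example in the Output Fields parameter. Pass `row_sep=";"` to handle definitions that use a semicolon as the row delimiter.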
