Configure Amazon Redshift Input Component-Dataphin(Dataphin)-阿里云帮助中心

The Amazon Redshift input component reads data from an Amazon Redshift data source. To synchronize data from Amazon Redshift to another data source, configure this component as the source, and then configure the destination data source.

Prerequisites

You have created an Amazon Redshift data source. For more information, see Creating Amazon Redshift Data Source.
The account you use to configure the Amazon Redshift input component properties must have read-through permission for the data source. If you do not have this permission, request it for the data source. For more information, see Request, renew, and revoke data source permissions.

Procedure

In the Dataphin homepage top menu bar, select Development > Data Integration.
On the Integration page, in the top menu bar, select Project. In Dev-Prod mode, select an environment.
In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.
Click Component Library in the top-right corner of the page to open the Component Library panel.
In the left navigation pane of the Component Library panel, select Input. In the list of input components on the right, locate the Amazon Redshift component and drag it to the canvas.
Click the icon in the Amazon Redshift input component card to open the Amazon Redshift Input Configuration dialog box.

In the Amazon Redshift Input Configuration dialog box, configure the following parameters.

Parameter	Description
Step Name	The name of the Amazon Redshift input component. Dataphin generates a default name that you can modify. Naming rules: Can contain only Chinese characters, letters, underscores (_), and numbers. Length cannot exceed 64 characters.
Datasource	The drop-down list displays all Amazon Redshift data sources, including those you have read-through permissions for and those you do not. Click the icon to copy the data source name. For a data source you do not have read-through permission for, click Request next to it to request read permission. For more information, see Request, renew, and revoke data source permissions. If you don't have an Amazon Redshift data source, click Create Data Source to create one. For more information, see Creating an Amazon Redshift Data Source.
Time Zone	Time-formatted data is processed based on the time zone configured in the selected data source. This value cannot be modified. Note For tasks created before V5.1.2, you can select Data Source Default Configuration or Channel Configuration Time Zone. The default option is Channel Configuration Time Zone. Data Source Default Configuration: The default time zone of the selected data source. Channel Configuration Time Zone: The time zone configured for the current integration task under Properties > Channel Configuration.
Schema (Optional)	Allows selecting tables across schemas. Select the schema where the table resides. If not specified, the schema configured in the data source is used by default.
Source Table Quantity	Select the number of source tables. Options: Single Table and Multiple Tables. Single Table: Syncs data from one source table to one destination table. Multiple Tables: Syncs data from multiple source tables to the same destination table using the UNION algorithm. For more information about UNION, see .
Table Matching Method	You can select Generic Class Rule or Database Regular Expression. Note This option is configurable only when Multiple Tables is selected for source table quantity.
Table	Select the source table: If you select Single Table for Number of Source Tables, you can search for a table by entering a keyword, or enter the exact table name and click Exact Search. After you select a table, the system automatically checks its status. Click the icon to copy the name of the currently selected table. If Multiple Tables is selected for source table quantity, enter different expressions to add tables based on the table matching method. If General Rules is selected for table matching method: In the input box, enter a table expression to filter for tables with the same structure. The system supports enumeration, regular expression-like forms, and mixed forms. For example, `table_[001-100];table_102;`. If Database Regular Expression is selected for table matching method: In the input box, enter the regular expression supported by the current database. The system matches tables in the destination database based on this regular expression. During task runtime, the system instantly matches new table ranges for synchronization based on the database regular expression. After you enter the expression, click Exact Match to view the list of matched tables in the Confirm Match Details dialog box.
Split Key (Optional)	The system partitions data based on the split key field and uses concurrency settings to enable parallel reads. You can use any column from the source table as the split key. Use a primary key or indexed column for optimal performance. Important When you select a date/time type, the system splits data based on the total time range and concurrency by identifying the maximum and minimum values. This does not guarantee an even split.
Batch Read Count (Optional)	The number of records read per batch. Configure a batch size (such as 1024 records) to reduce interactions with the data source, improve I/O efficiency, and lower network latency.
Input Filter (Optional)	Enter a filter condition for input data, for example, `ds=${bizdate}`. The Input Filter applies to the following scenarios: A fixed subset of data. Parameter filtering.
Output Fields	Displays all fields from the selected table and those matched by the filter conditions. Supported operations: Field Management: To exclude fields from downstream components, delete them: Delete single field: Click the icon in the Actions column to remove a field. Batch field deletion scenario: Click Field Management, select fields in the Field Management dialog box, click the left-moving icon to move them to the unselected list, and click OK. Batch Add: Click Batch Add to perform batch configuration in JSON, TEXT, or DDL format. Note After batch addition, clicking OK overwrites the existing field configuration. Batch configure in JSON format. For example: `// Example: [{ "index": 0, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]` Note Index indicates the column number of the specified object. Name indicates the field name after import. Type indicates the field type after import. For example, `"index":3,"name":"user_id","type":"String"` means to import the fourth column from the file, with the field name user_id and field type String. Batch configure in TEXT format. For example: `// Example: 0,id,int(10),Long,comment1 1,user_name,varchar(255),Long,comment2` The row delimiter separates information for each field. The default is a line feed (\n). It supports line feed (\n), semicolon (;), and period (.). The column delimiter is used to separate the field name and field type. The default is a comma (,), and it supports`','`. The field type can be omitted, and the default is`','`. Batch configure in DDL format. For example: `CREATE TABLE tablename ( user_id serial, username VARCHAR(50), password VARCHAR(50), email VARCHAR (255), created_on TIMESTAMP, );` Add an output field: Click + Add Output Field. Specify the Column, Type, Description, and Mapping Type, then click the icon to save.

Click Confirm to complete the property configuration of the Amazon Redshift input component.