Configure Amazon Redshift Input Component
The Amazon Redshift input component retrieves data from an Amazon Redshift data source. To synchronize data from Amazon Redshift to another data source, first configure the Amazon Redshift input component to read from the source data source. Then, configure the destination data source for data synchronization. This topic describes how to configure the Amazon Redshift input component.
Prerequisites
You have created an Amazon Redshift data source. For more information, see Creating Amazon Redshift Data Source.
The account you use to configure the Amazon Redshift input component properties must have read-through permission for the data source. If you do not have this permission, request it for the data source. For more information, see Request, renew, and revoke data source permissions.
Procedure
In the Dataphin homepage top menu bar, select Development > Data Integration.
On the Integration page, in the top menu bar, select Project. In Dev-Prod mode, select an environment.
In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.
Click Component Library in the top-right corner of the page to open the Component Library panel.
In the left navigation pane of the Component Library panel, select Input. In the list of input components on the right, locate the Amazon Redshift component and drag it to the canvas.
Click the
icon in the Amazon Redshift input component card to open the Amazon Redshift Input Configuration dialog box.In the Amazon Redshift Input Configuration dialog box, configure the following parameters.
Parameter
Description
Step Name
The name of the Amazon Redshift input component. Dataphin automatically generates the step name. Modify it as needed. Naming rules are as follows:
Can contain only Chinese characters, letters, underscores (_), and numbers.
Length cannot exceed 64 characters.
Datasource
The data source drop-down list displays all Amazon Redshift data sources, including those for which you have read-through permissions and those for which you do not. Click the
icon to copy the current data source name.For a data source for which you do not have read-through permission, you can click Request next to the data source to request the required read permission. For more information, see Request, renew, and revoke data source permissions.
If you don't have an Amazon Redshift data source, click Create Data Source to create a data source. For detailed steps, see Creating an Amazon Redshift Data Source.
Time Zone
The system processes time-formatted data based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be modified.
NoteFor tasks created before V5.1.2, you can select Data Source Default Configuration or Channel Configuration Time Zone. The default option is Channel Configuration Time Zone.
Data Source Default Configuration: The default time zone of the selected data source.
Channel Configuration Time Zone: The time zone configured for the current integration task under Properties > Channel Configuration.
Schema (Optional)
Supports selecting tables across schemas. Select the schema where the table resides. If not specified, the default is the schema configured in the data source.
Source Table Quantity
Select the number of source tables. The options are Single Table and Multiple Tables:
Single Table: Use this for scenarios where you sync business data from one table to one destination table.
Multiple Tables: Use this for scenarios where you sync business data from multiple tables to the same destination table. When data from multiple tables is written to the same data table, the union algorithm is used.
For more information about UNION, see INTERSECT, UNION, and EXCEPT.
Table Matching Method
You can select Generic Class Rule or Database Regular Expression.
NoteThis option is configurable only when Multiple Tables is selected for source table quantity.
Table
Select the source table:
If you select Single Table for Number of Source Tables, you can search for a table by entering a keyword, or enter the exact table name and click Exact Search. After you select a table, the system automatically checks its status. Click the
icon to copy the name of the currently selected table.If Multiple Tables is selected for source table quantity, enter different expressions to add tables based on the table matching method.
If General Rules is selected for table matching method: In the input box, enter a table expression to filter for tables with the same structure. The system supports enumeration, regular expression-like forms, and mixed forms. For example,
table_[001-100];table_102;.If Database Regular Expression is selected for table matching method: In the input box, enter the regular expression supported by the current database. The system matches tables in the destination database based on this regular expression. During task runtime, the system instantly matches new table ranges for synchronization based on the database regular expression.
After you enter the expression, click Exact Match to view the list of matched tables in the Confirm Match Details dialog box.
Split Key (Optional)
The system partitions data based on the configured split key field. Use this with concurrency configuration to achieve concurrent reads. You can use a column from the source data table as the split key. Additionally, use a primary key or an indexed column as the split key to ensure transfer performance.
ImportantWhen you select a date/time type, the system performs a brute-force split based on the total time range and concurrency by identifying the maximum and minimum values. This does not guarantee an even split.
Batch Read Count (Optional)
The number of data records read at one time. When reading data from the source database, configure a specific batch read count (such as 1024 records) instead of reading records one by one. This reduces interactions with the data source, improves I/O efficiency, and lowers network latency.
Input Filter (Optional)
Enter the filter information for the input field, for example,
ds=${bizdate}. The Input Filter applies to the following two scenarios:A fixed subset of data.
Parameter filtering.
Output Fields
The output fields area displays all fields from the selected table and those matched by the filter conditions. It supports the following operations:
Field Management: If you do not need to output certain fields to downstream components, delete the corresponding fields:
Delete single field: To delete a few fields, click the
icon in the Actions column to remove redundant fields.Batch field deletion scenario: To delete many fields, click Field Management, select multiple fields in the Field Management dialog box, click the
left-moving icon to move the selected input fields to the unselected input fields, and click OK to complete batch field deletion.
Batch Add: Click Batch Add to perform batch configuration in JSON, TEXT, or DDL format.
NoteAfter batch addition is complete, clicking OK will overwrite the configured field information.
Batch configure in JSON format. For example:
// Example: [{ "index": 0, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]NoteIndex indicates the column number of the specified object. Name indicates the field name after import. Type indicates the field type after import. For example,
"index":3,"name":"user_id","type":"String"means to import the fourth column from the file, with the field name user_id and field type String.Batch configure in TEXT format. For example:
// Example: 0,id,int(10),Long,comment1 1,user_name,varchar(255),Long,comment2The row delimiter separates information for each field. The default is a line feed (\n). It supports line feed (\n), semicolon (;), and period (.).
The column delimiter is used to separate the field name and field type. The default is a comma (,), and it supports
','. The field type can be omitted, and the default is','.
Batch configure in DDL format. For example:
CREATE TABLE tablename ( user_id serial, username VARCHAR(50), password VARCHAR(50), email VARCHAR (255), created_on TIMESTAMP, );
Add an output field: Click + Add Output Field. As prompted, specify the Column, Type, and Description, and select a Mapping Type. After you complete the configuration for the current row, click the
icon to save.
Click Confirm to complete the property configuration of the Amazon Redshift input component.