How to configure a PolarDB input component to read data from a data source

The PolarDB input component reads data from a PolarDB data source. Configure this component before synchronizing data from PolarDB to another data source.

Prerequisites

A PolarDB data source has been created. For more information, see Create a PolarDB data source.
The account that you use to configure the PolarDB input component must have read-through permissions for the data source. If the account does not have the required permissions, you must request them. For more information, see Request, renew, and return data source permissions.

Procedure

On the Dataphin home page, choose Develop > Data Integration from the top menu bar.
In the top menu bar of the integration page, select a project. In Dev-Prod mode, also select an environment.
In the left navigation pane, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the Component Library panel, select Input from the navigation pane on the left. Find the PolarDB component in the list of input components on the right and drag it to the canvas.
Click the icon on the PolarDB input component card to open the PolarDB Input Configuration dialog box.

In the PolarDB Input Configuration dialog box, configure the parameters.

Parameter	Description
Step Name	The name of the PolarDB input component. Dataphin automatically generates a name, which you can change. Naming rules: The name can contain only Chinese characters, letters, underscores (_), and digits, and cannot exceed 64 characters in length.
Datasource	Lists all PolarDB data sources, regardless of whether you have read-through permissions. Click the icon to copy the data source name. For data sources for which you do not have read-through permissions, click Request after the data source name to request the permissions. For more information, see Request, renew, and return data source permissions. If you do not have a PolarDB data source, click New to create one. For more information, see Create a PolarDB data source.
Time Zone	Time-formatted data is processed based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be changed. Note: For nodes created before V5.1.2, you can select Data Source Default Configuration or Channel Configuration Time Zone; the default selection is Channel Configuration Time Zone. Data Source Default Configuration uses the default time zone of the selected data source. Channel Configuration Time Zone uses the time zone configured in Properties > Channel Configuration for the current integration node.
Number of source tables	The number of source tables for data synchronization. Options: Single Table: Synchronizes data from one source table to one target table. Multiple Tables: Synchronizes data from multiple source tables to the same target table by using the union algorithm. For more information about union, see INTERSECT, UNION, and EXCEPT.
Table matching method	Currently, you can only select General Rule. Note: This parameter is available only when you select Multiple Tables for Number of source tables.
Table	Select the source table. Single Table: If you selected Single Table for Number of source tables, enter a keyword to search for the table, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the icon to copy the name of the selected table. Multiple Tables: If you selected Multiple Tables for Number of source tables, add tables as follows: (1) In the input box, enter an expression to filter for tables with the same structure. The system supports enumeration, regular expression-like patterns, and a mix of both, for example, `table_[001-100];table_102`. (2) Click Exact Match, and then view the list of matched tables in the Confirm Match Details dialog box. (3) Click Confirm.
Split Key (Optional)	Partitions data for concurrent reads when used with the concurrency setting. Specify a column from the source table. For best performance, use the primary key or an indexed column. Important: If you select a column of a date and time type, the system identifies the minimum and maximum values and performs a brute-force split based on the total time range and the concurrency. The splits are not guaranteed to contain even amounts of data.
Batch Read Size (Optional)	The number of records to read per batch (for example, 1024). Reading in batches reduces the number of interactions with the data source, which improves I/O efficiency and reduces the overhead caused by network latency.
Input Filter (Optional)	Filter conditions for the input fields, for example, `ds=${bizdate}`. Typical uses: reading a fixed portion of the data, and parameter-based filtering.
Output Fields	Displays all fields from the selected tables that match the filter criteria. Available operations:
Field Management: If you do not need to output certain fields to downstream components, delete them. Deleting a single field: To delete a small number of fields, click the icon in the Actions column of each unwanted field. Deleting fields in batch: To delete many fields, click Field Management. In the Field Management dialog box, select multiple fields, click the left arrow icon to move the selected input fields to the unselected input fields list, and then click OK.
Batch Add: Click Batch Add to configure fields in batch in JSON, TEXT, or DDL format. Note: After you add fields in batch and click OK, the existing field information is overwritten.
JSON format, for example: `[{ "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]` Note: The `index` field specifies the column number of the object. The `name` field specifies the name of the imported field. The `type` field specifies the data type of the imported field. For example, `"index": 3, "name": "user_id", "type": "String"` indicates that the fourth column in the file is imported as the field `user_id` of type `String`.
TEXT format, for example, the two rows `1,id,int(10),Long,comment1` and `2,user_name,varchar(255),String,comment2`. The row delimiter separates the information for each field. The default is a line feed (\n). Semicolons (;) and periods (.) are also supported. The column delimiter separates field names from field types. The default is a half-width comma (,). The field type is optional.
DDL format, for example: `CREATE TABLE tablename ( user_id serial, username VARCHAR(50), password VARCHAR(50), email VARCHAR(255), created_on TIMESTAMP );`
Create Output Field: Click +Create Output Field. Follow the prompts to enter the Column, Type, and Comment, and select the Mapping Type. After you finish configuring the current row, click the icon to save.
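
The Table parameter accepts expressions such as `table_[001-100];table_102` when Multiple Tables is selected. The following Python sketch shows one plausible reading of that expression, to help you predict which tables Exact Match is likely to return. It is not Dataphin's implementation. The function name `expand_table_expression` is hypothetical, and the sketch assumes that `;` separates items, that `[start-end]` is an inclusive numeric range, and that the zero padding of the lower bound is preserved. Always check the actual result in the Confirm Match Details dialog box.

```python
import re


def expand_table_expression(expr: str) -> list[str]:
    """Expand a ';'-separated table expression into table names.

    Assumed semantics (not confirmed by the Dataphin documentation):
    - plain items are taken literally (enumeration)
    - 'prefix[start-end]suffix' is an inclusive numeric range that keeps
      the zero padding of the lower bound
    """
    names = []
    for part in expr.split(";"):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if match is None:
            names.append(part)  # plain enumerated table name
            continue
        prefix, start, end, suffix = match.groups()
        width = len(start)  # keep zero padding, e.g. 001, 002, ...
        for n in range(int(start), int(end) + 1):
            names.append(f"{prefix}{n:0{width}d}{suffix}")
    return names


tables = expand_table_expression("table_[001-100];table_102")
print(len(tables), tables[0], tables[99], tables[-1])
# prints: 101 table_001 table_100 table_102
```

Under these assumptions, the example expression covers `table_001` through `table_100` plus `table_102`, and skips `table_101`.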

Click Confirm to complete the configuration of the PolarDB input component.
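
The Split Key description notes that splits on a date and time column are based on the total time range and the concurrency, and are not guaranteed to be even. The following Python sketch illustrates why. It assumes an equal-width split of the range between the minimum and maximum values, which is one simple way to split by range and is not necessarily what Dataphin does. The function name `split_time_range` and the sample data are hypothetical.

```python
from datetime import datetime


def split_time_range(lo: datetime, hi: datetime, concurrency: int):
    """Cut [lo, hi] into `concurrency` equal-width (start, end) intervals."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    step = (hi - lo) / concurrency
    # Use `hi` itself as the last bound so the range is covered exactly.
    bounds = [lo + step * i for i in range(concurrency)] + [hi]
    return list(zip(bounds[:-1], bounds[1:]))


# Hypothetical data: 20 rows clustered on January 1 and one row on January 4.
rows = [datetime(2024, 1, 1, hour) for hour in range(20)]
rows.append(datetime(2024, 1, 4, 12))

ranges = split_time_range(min(rows), max(rows), concurrency=4)

counts = []
for i, (start, end) in enumerate(ranges):
    is_last = i == len(ranges) - 1
    # Intervals are half-open, except the last one, which includes `end`.
    counts.append(
        sum(1 for t in rows if start <= t < end or (is_last and t == end))
    )

print(counts)
# prints: [20, 0, 0, 1]
```

Each interval spans the same amount of time, but almost all rows fall into the first one, so one reader does nearly all the work. A primary key or an evenly distributed indexed column usually produces more balanced splits.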
