How to configure the StarRocks input component to read data from a data source - Dataphin - Alibaba Cloud Help Center

The StarRocks input component reads data from StarRocks data sources. To synchronize data from StarRocks to another data source, configure the StarRocks input component first, and then configure the destination data source.

Prerequisites

Create a StarRocks data source. For more information, see Create a StarRocks Data Source.
The account used to configure the StarRocks input component properties must have read-through permissions for the data source. If you do not have these permissions, request them. For more information, see Request, Renew, and Return Data Source Permissions.

Procedure

On the Dataphin homepage, in the top menu bar, choose Development > Data Integration.
On the Integration page, in the top menu bar, select Project. If your project is in Dev-Prod mode, select an environment.
In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline you want to develop to open its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the left navigation pane of the Component Library panel, select Input. In the input component list on the right, locate the StarRocks component and drag it to the canvas.
Click the icon on the StarRocks input component card to open the StarRocks Input Configuration dialog box.

In the StarRocks Input Configuration dialog box, configure the following parameters.

Parameter	Description
Step Name	The name of the StarRocks input component. Dataphin generates this name automatically, and you can change it. The name can contain only Chinese characters, letters, underscores (_), and digits, and cannot exceed 64 characters.
Datasource	Lists all StarRocks data sources in Dataphin, regardless of whether you have read-through permissions. Click the icon to copy the data source name. For data sources without read-through permissions, click Request next to the data source to request permissions. For more information, see Request Data Source Permissions. If you do not have a StarRocks data source, click Create Data Source to create one. For more information, see Create a StarRocks Data Source.
Source Table Quantity	Select the number of source tables. Options: Single Table and Multiple Tables. Single Table: synchronizes business data from one table to one destination table. Multiple Tables: synchronizes business data from multiple tables to the same destination table by using the union algorithm. For more information about union, see Intersection, Union, and Complement.
Table Matching Method	Select General Rules or Database Regular Expression. Note: This option is configurable only when Source Table Quantity is set to Multiple Tables.
Table	Select the source table. If Source Table Quantity is set to Single Table, enter table name keywords to search, or enter the exact table name and click Precise Search. After you select a table, the system automatically checks the table status. Click the icon to copy the name of the selected table. If Source Table Quantity is set to Multiple Tables, add tables by entering an expression that fits the table matching method. If Table Matching Method is set to General Rules: in the input box, enter a table expression to select tables that have the same structure. The system supports enumeration, regular expression-like forms, and a mix of the two. For example, `table_[001-100];table_102;`. If Table Matching Method is set to Database Regular Expression: in the input box, enter a regular expression supported by the current database. The system matches tables in the database of the selected data source against this regular expression. At run time, the task re-matches the expression so that newly matching tables are included in the synchronization. After you enter the expression, click Precise Search to view the matched tables in the Confirm Match Details dialog box.
Shard Key (Optional)	A column with an integer field type in the source table, used for data partitioning during concurrent reads. We recommend that you use a primary key or an indexed column as the shard key to improve synchronization efficiency.
Batch Read Count (Optional)	The number of records read per batch. Reading in batches (for example, 1024 records at a time) instead of one record at a time reduces the number of round trips to the data source, which improves I/O efficiency and reduces network overhead.
Input Filter (Optional)	A filter condition applied to the input data, such as `ds=${bizdate}`. Typical uses: reading a fixed subset of the data, and filtering by parameter.
Output Fields	Displays all fields from the selected table and those matched by the filter conditions. You can create, batch add, or delete output fields as needed. Batch Add: Click Batch Add to configure fields in batches in JSON, TEXT, or DDL format. Note: After you finish a batch add, clicking OK overwrites the field information that is already configured. Configure in batches using JSON format, for example: `// Example: [{ "name": "user_id", "type": "String" }, { "name": "user_name", "type": "String" }]` Note: `name` specifies the field name, and `type` specifies the field type after import. For example, `"name":"user_id","type":"String"` imports the field user_id with the type String. Configure in batches using TEXT format, for example: `// Example: user_id,String user_name,String` The row delimiter separates the entries for individual fields. The default is a line feed (\n). Supported row delimiters are line feed (\n), semicolon (;), and period (.). The column delimiter separates the field name from the field type. The default is a comma (,). Configure in batches using DDL format, for example: `CREATE TABLE tablename ( id INT PRIMARY KEY, name VARCHAR(50), age INT );` Create Output Field: Click +Create Output Field, then enter the Column and select the Type as prompted on the page. Delete a Field Individually: Click the icon in the Actions column for the target field to remove it. Note: When the compute engine is StarRocks, you can view field classification and grading for the output fields of the StarRocks input component. Other compute engines do not support this. Batch Delete Fields: To delete multiple fields at once, click Field Management. In the Field Management dialog box, select the fields, click the left-arrow icon to move them to the unselected list, and click Confirm.
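
The JSON and TEXT batch formats described for Output Fields carry the same information: a field name and a field type. The following Python sketch illustrates that relationship by converting a TEXT-format definition into the JSON-format structure. It is not part of Dataphin. The function name `text_to_fields` and its parameters are hypothetical, and the parsing rules (trimming whitespace, skipping empty rows) are assumptions made for illustration.

```python
import json


def text_to_fields(text, row_delim="\n", col_delim=","):
    """Convert TEXT-format field definitions into a list of
    {"name": ..., "type": ...} dicts, mirroring the JSON batch format.

    Hypothetical helper: the trimming and empty-row handling are assumptions.
    """
    fields = []
    for row in text.split(row_delim):
        row = row.strip()
        if not row:
            continue  # ignore empty rows, e.g. after a trailing delimiter
        name, _, type_ = row.partition(col_delim)
        name, type_ = name.strip(), type_.strip()
        if not name or not type_:
            raise ValueError(f"expected 'name{col_delim}type', got {row!r}")
        fields.append({"name": name, "type": type_})
    return fields


# Default delimiters: line feed between fields, comma between name and type.
fields = text_to_fields("user_id,String\nuser_name,String")
print(json.dumps(fields, indent=2))
```

Passing `row_delim=";"` handles a definition written on one line, such as `user_id,String;user_name,String;`.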

Click Confirm to complete the property configuration for the StarRocks input component.
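
To preview which tables a table expression such as `table_[001-100];table_102;` (the example given for the Table parameter) might cover, the following Python sketch expands it into a list of names. It assumes that `[001-100]` enumerates zero-padded numbers from 001 through 100 and that `;` separates entries. Dataphin's actual matching rules may differ, and the function `expand_table_expression` is hypothetical. The sketch handles at most one numeric range per entry.

```python
import re

# Matches a numeric range such as [001-100].
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_table_expression(expr):
    """Expand entries like 'table_[001-100]' into explicit table names.

    Assumed semantics (not confirmed by the product documentation):
    ';' separates entries, and a bracketed range enumerates integers
    padded to the width of the range's start value.
    """
    names = []
    for part in expr.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty entry after a trailing ';'
        match = RANGE.search(part)
        if match is None:
            names.append(part)  # plain enumeration entry
            continue
        start, end = match.group(1), match.group(2)
        width = len(start)
        for i in range(int(start), int(end) + 1):
            number = str(i).zfill(width)
            names.append(part[:match.start()] + number + part[match.end():])
    return names


tables = expand_table_expression("table_[001-100];table_102;")
print(len(tables), tables[0], tables[-1])
```

Comparing this list with the tables shown in the Confirm Match Details dialog box is one way to check that an expression selects what you expect.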
