Configure StarRocks Input Component

Last updated: 2026-06-23 13:55:57

The StarRocks input component reads data from StarRocks data sources. To synchronize data from StarRocks to another data source, configure the StarRocks input component first, and then configure the destination data source.

Prerequisites

Procedure

  1. On the Dataphin homepage, in the top menu bar, choose Development > Data Integration.

  2. On the Integration page, in the top menu bar, select Project. If your project is in Dev-Prod mode, select an environment.

  3. In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page to open the Component Library panel.

  5. In the left navigation pane of the Component Library panel, select Input. In the input component list on the right, locate the StarRocks component and drag it to the canvas.

  6. Click the icon on the StarRocks input component card to open the StarRocks Input Configuration dialog box.

  7. In the StarRocks Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the StarRocks input component. Dataphin automatically generates this name, and you can modify it. Naming conventions:

    • Can contain only Chinese characters, letters, underscores (_), and numbers.

    • Cannot exceed 64 characters.

    Datasource

    Lists all StarRocks data sources in Dataphin, regardless of whether you have read-through permissions. Click the copy icon to copy the data source name.

    • For data sources without read-through permissions, click Request next to the data source to request permissions. For more information, see Request Data Source Permissions.

    • If you do not have a StarRocks data source, click Create Data Source to create one. For more information, see Create a StarRocks Data Source.

    Source Table Quantity

    Select the number of source tables. Options: Single Table and Multiple Tables.

    • Single Table: Synchronizes business data from a single table to a single destination table.

    • Multiple Tables: Synchronizes business data from multiple tables to the same destination table. The data from the source tables is combined by using a union.

      For more information about union, see Intersection, Union, and Complement.

    Table Matching Method

    You can select General Rules or Database Regular Expression.

    Note

    This option is available only when **Source Table Quantity** is set to **Multiple Tables**.

    Table

    Select the source table:

    • If **Source Table Quantity** is set to **Single Table**, enter table name keywords to search, or enter the exact table name and click Precise Search. After you select a table, the system automatically checks the table status. Click the copy icon to copy the name of the selected table.

    • If **Source Table Quantity** is set to **Multiple Tables**, add tables by entering an expression that corresponds to the selected table matching method.

      • If **Table Matching Method** is set to **General Rules**: In the input box, enter a table expression to filter for **tables with the same structure**. Enumerations, regular-expression-like ranges, and combinations of the two are supported. For example, table_[001-100];table_102;.

      • If **Table Matching Method** is set to **Database Regular Expression**: In the input box, enter a regular expression in the syntax supported by the current database. The system matches tables in the database of the selected data source against this expression. At run time, the task evaluates the expression again, so newly created tables that match it are included in the synchronization.

      After entering the expression, click Precise Search to view the list of matched tables in the Confirm Match Details dialog box.
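The General Rules example above, table_[001-100];table_102;, can be read as a numeric range plus one enumerated name. The following Python sketch shows one plausible way such an expression expands into table names. It is an illustration only: the function name expand_table_expression is hypothetical, and the inclusive range and the zero-padding taken from the lower bound are assumptions, not documented Dataphin behavior.

```python
import re


def expand_table_expression(expr: str) -> list[str]:
    """Expand an expression such as 'table_[001-100];table_102;' into table names.

    Assumptions (not confirmed by the product documentation):
    - ';' separates entries, and a trailing ';' is allowed;
    - '[start-end]' is an inclusive numeric range;
    - numbers are zero-padded to the width of the lower bound.
    """
    names = []
    for part in expr.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty entry after a trailing semicolon
        match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if match is None:
            names.append(part)  # plain enumerated table name
            continue
        prefix, start, end, suffix = match.groups()
        width = len(start)
        for number in range(int(start), int(end) + 1):
            names.append(f"{prefix}{str(number).zfill(width)}{suffix}")
    return names


tables = expand_table_expression("table_[001-100];table_102;")
print(len(tables), tables[0], tables[-1])
```

Under these assumptions the expression yields table_001 through table_100 followed by table_102. Use the Confirm Match Details dialog box to check what the system actually matches.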

    Shard Key (Optional)

    A column with an **integer** field type in the source table, used for data partitioning during concurrent reads. We recommend that you use a **primary key** or an **indexed column** as the shard key to improve synchronization efficiency.
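To illustrate why an integer shard key helps concurrent reads, the sketch below splits a key range into contiguous slices, one per reader, and turns each slice into a range predicate. This is a generic range-splitting technique, not Dataphin's documented implementation. The names split_shard_ranges and shard_predicates, and the column name id, are hypothetical.

```python
def split_shard_ranges(min_key: int, max_key: int, parallelism: int) -> list[tuple[int, int]]:
    """Split [min_key, max_key] into at most `parallelism` inclusive, contiguous ranges."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    if min_key > max_key:
        return []
    total = max_key - min_key + 1
    parts = min(parallelism, total)      # never create empty ranges
    base, extra = divmod(total, parts)   # the first `extra` ranges get one more key
    ranges = []
    start = min_key
    for index in range(parts):
        size = base + (1 if index < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def shard_predicates(column: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Build one range predicate per shard; each reader would append it to its query."""
    return [f"{column} >= {low} AND {column} <= {high}" for low, high in ranges]


ranges = split_shard_ranges(1, 10, 3)
print(ranges)
print(shard_predicates("id", ranges))
```

Because each reader filters on a range of the shard key, a primary key or indexed column lets the database locate each slice without scanning the whole table.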

    Batch Read Count (Optional)

    The number of records read per batch. Reading records in batches (for example, 1024 per batch) instead of one at a time reduces the number of round trips to the data source, which improves I/O efficiency and reduces network overhead.
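The effect of a batch read count can be sketched with the standard Python DB-API call fetchmany. The example uses an in-memory SQLite table as a stand-in for a StarRocks source so that it runs without a server. The table name orders, its columns, and the helper read_in_batches are hypothetical.

```python
import sqlite3


def read_in_batches(cursor, batch_size=1024):
    """Yield lists of rows, fetching at most `batch_size` rows per call."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Stand-in source: an in-memory SQLite table instead of a StarRocks connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1, 2501)],
)

cursor = conn.execute("SELECT id, amount FROM orders ORDER BY id")
batch_sizes = [len(batch) for batch in read_in_batches(cursor, batch_size=1024)]
print(batch_sizes)  # 2500 rows arrive as batches of 1024, 1024, and 452
```

With a batch size of 1024, the 2,500 rows are retrieved in three fetch calls rather than 2,500 single-row fetches.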

    Input Filter (Optional)

    A filter condition applied to the input data, such as ds=${bizdate}. Applicable scenarios:

    • Reading a fixed subset of the data.

    • Filtering by a parameter, such as the business date.
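As a rough illustration of parameter filtering, the sketch below substitutes a value for ${bizdate} and appends the result to a query as a WHERE clause. It uses Python's string.Template, which happens to share the ${name} placeholder syntax. How Dataphin resolves parameters internally is not described here, so the helpers render_filter and build_query, the table name orders, and the date value 20260622 are assumptions.

```python
from string import Template


def render_filter(filter_template: str, params: dict[str, str]) -> str:
    """Replace ${name} placeholders in a filter condition with parameter values."""
    # substitute() raises KeyError if a placeholder has no value in params.
    return Template(filter_template).substitute(params)


def build_query(table: str, filter_template: str, params: dict[str, str]) -> str:
    """Build a SELECT statement that applies the rendered filter as a WHERE clause."""
    return f"SELECT * FROM {table} WHERE {render_filter(filter_template, params)}"


# Parameter filtering: the business date is supplied at run time.
print(build_query("orders", "ds=${bizdate}", {"bizdate": "20260622"}))

# Fixed subset: a filter without placeholders is passed through unchanged.
print(build_query("orders", "region='cn'", {}))
```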

    Output Fields

    Displays all fields matched by the selected table and the filter conditions. You can create, batch add, or delete output fields as needed.

    • Batch Add: Click Batch Add to configure fields in batches in JSON, TEXT, or DDL format.

      Note

      After you finish the batch configuration and click **OK**, the existing field configuration is overwritten.

      • Configure in batches using JSON format, for example:

        [{
          "name": "user_id",
          "type": "String"
        },
        {
          "name": "user_name",
          "type": "String"
        }]

        Note

        name specifies the field name, and type specifies the field type after import. For example, "name":"user_id","type":"String" imports the field user_id with the type String.

      • Configure in batches using TEXT format, for example:

        user_id,String
        user_name,String

        • The row delimiter separates the entries for individual fields. The default is a line feed (\n). Supported row delimiters are line feed (\n), semicolon (;), and period (.).

        • The column delimiter separates the field name from the field type. The default is a comma (,).

      • Configure in batches using DDL format, for example:

        CREATE TABLE tablename (
            id INT PRIMARY KEY,
            name VARCHAR(50),
            age INT
        );

    • Create Output Field: Click +Create Output Field, then enter a value for Column and select a Type as prompted on the page.

    • Delete a Field Individually: Click the delete icon in the Actions column for the target field to remove it.

      Note

      When the compute engine is StarRocks, you can view the **classification and grading** of the output fields of the StarRocks input component. This is not supported for other compute engines.

    • Batch Delete Fields: To delete multiple fields at once, click Field Management. In the Field Management dialog box, select the fields, click the left-arrow icon to move them to the unselected list, and click Confirm.
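The JSON and TEXT batch formats described under Batch Add both reduce to a list of (field name, field type) pairs. The sketch below parses each format using the delimiters stated above. It is an illustration of the formats, not the product's parser, and the function names parse_json_fields and parse_text_fields are hypothetical.

```python
import json


def parse_json_fields(text: str) -> list[tuple[str, str]]:
    """Parse the JSON batch format: a list of objects with "name" and "type" keys."""
    return [(item["name"], item["type"]) for item in json.loads(text)]


def parse_text_fields(
    text: str, row_delimiter: str = "\n", column_delimiter: str = ","
) -> list[tuple[str, str]]:
    """Parse the TEXT batch format: one 'name,type' entry per row."""
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # ignore blank rows, such as a trailing line feed
        name, field_type = row.split(column_delimiter, 1)
        fields.append((name.strip(), field_type.strip()))
    return fields


json_input = '[{"name": "user_id", "type": "String"}, {"name": "user_name", "type": "String"}]'
text_input = "user_id,String\nuser_name,String\n"

print(parse_json_fields(json_input))
print(parse_text_fields(text_input))
print(parse_text_fields("user_id,String;user_name,String", row_delimiter=";"))
```

All three calls produce the same two fields, which shows that the formats differ only in syntax, not in the information they carry.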


  8. Click Confirm to complete the property configuration for the StarRocks input component.
