Configure a PolarDB input component

Last updated: 2026-06-23 14:54:13

The PolarDB input component reads data from a PolarDB data source. Configure this component before synchronizing data from PolarDB to another data source.

Prerequisites

Procedure

  1. On the Dataphin home page, choose Develop > Data Integration from the top menu bar.

  2. In the top menu bar of the integration page, select a project. In Dev-Prod mode, also select an environment.

  3. In the left navigation pane, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.

  4. In the upper-right corner of the page, click Component Library to open the Component Library panel.

  5. In the Component Library panel, select Input from the navigation pane on the left. Find the PolarDB component in the list of input components on the right and drag it to the canvas.

  6. Click the icon on the PolarDB input component card to open the PolarDB Input Configuration dialog box.

  7. In the PolarDB Input Configuration dialog box, configure the parameters.

    Parameter

    Description

    Step Name

    The name of the PolarDB input component. Dataphin automatically generates a name, which you can change. Naming rules:

    • The name can contain only Chinese characters, letters, underscores (_), and digits.

    • The name cannot exceed 64 characters in length.
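
    The two naming rules above can be checked before you type a name into the dialog box. The following is a minimal sketch, not Dataphin's own validation: the function name is hypothetical, and "Chinese characters" is approximated by the CJK Unified Ideographs block, which may be narrower than what Dataphin accepts.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF). Dataphin may accept a wider range.
# Assumption: an empty name is rejected (the documented rules do not say).
STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if the name satisfies both documented naming rules."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None
```

    For example, is_valid_step_name("polardb_input_1") passes, while a name that contains a hyphen or is longer than 64 characters fails.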

    Datasource

    Lists all PolarDB data sources, regardless of whether you have read-through permissions on them. Click the copy icon to copy the data source name.

    Time Zone

    Time-formatted data is processed based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be changed.

    Note

    For nodes created before version V5.1.2, you can select Data Source Default Configuration or Channel Configuration Time Zone. The default selection is Channel Configuration Time Zone.

    • Data Source Default Configuration: The default time zone of the selected data source.

    • Channel Configuration Time Zone: The time zone configured in Properties > Channel Configuration for the current integration node.

    Number of source tables

    The number of source tables to synchronize data from. Valid values are Single Table and Multiple Tables:

    • Single Table: Synchronizes data from one source table to one target table.

    • Multiple Tables: Synchronizes data from multiple source tables to the same target table using the union algorithm.

      For more information about union, see INTERSECT, UNION, and EXCEPT.

    Table matching method

    Currently, you can only select General Rule.

    Note

    This parameter is available only when you select Multiple Tables for Number of source tables.

    Table

    Select the source table:

    • If you selected Single Table for Number of source tables, you can enter a keyword to search for the table, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the copy icon to copy the name of the selected table.

    • If you selected Multiple Tables for Number of source tables, add tables as follows:

      1. In the input box, enter an expression to filter for tables with the same structure.

        The system supports enumeration, regular expression-like patterns, and a mix of both. For example, table_[001-100];table_102.

      2. Click Exact Match. In the Confirm Match Details dialog box, view the list of matched tables.

      3. Click Confirm.
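
    To preview which names an expression such as table_[001-100];table_102 is likely to cover, the expansion can be sketched as follows. This is an illustration only, under stated assumptions: semicolons separate entries, [start-end] is an inclusive numeric range that keeps the zero padding of the start value, and each entry holds at most one range. The function name is hypothetical, and the actual matching rules are whatever the Confirm Match Details dialog box reports.

```python
import re
from typing import List

# Assumption: a range is written as [start-end] with decimal digits only.
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_table_expression(expression: str) -> List[str]:
    """Expand enumerated names and [start-end] ranges into table names."""
    names = []
    for item in expression.split(";"):
        item = item.strip()
        if not item:
            continue
        match = RANGE_PATTERN.search(item)
        if match is None:
            # Plain enumerated table name, kept as written.
            names.append(item)
            continue
        start, end = match.group(1), match.group(2)
        width = len(start)  # keep the zero padding of the start value
        prefix, suffix = item[:match.start()], item[match.end():]
        for number in range(int(start), int(end) + 1):
            names.append(prefix + str(number).zfill(width) + suffix)
    return names


tables = expand_table_expression("table_[001-100];table_102")
```

    Under these assumptions, the example expression yields table_001 through table_100 plus table_102, and skips table_101.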

    Split Key (Optional)

    Partitions data for concurrent reads when used with the concurrency setting. Specify a column from the source table. For best performance, use the primary key or an indexed column.

    Important

    If you select a column of a date and time type, the system identifies the minimum and maximum values of the column and performs a brute-force split of the total time range based on the concurrency. The resulting splits are not guaranteed to be even.
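
    One way to picture a split on a date and time column is to cut the span between the minimum and maximum values into as many intervals as the concurrency. The sketch below assumes equal-length intervals, which the documentation does not confirm, and the function name is hypothetical. It also suggests why splits can be uneven: intervals of equal duration can still hold very different numbers of rows.

```python
from datetime import datetime
from typing import List, Tuple


def split_time_range(
    minimum: datetime, maximum: datetime, concurrency: int
) -> List[Tuple[datetime, datetime]]:
    """Cut [minimum, maximum] into `concurrency` intervals of equal duration."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    step = (maximum - minimum) / concurrency
    # Use the exact maximum as the last bound to avoid rounding drift.
    bounds = [minimum + step * i for i in range(concurrency)] + [maximum]
    return list(zip(bounds[:-1], bounds[1:]))


ranges = split_time_range(datetime(2024, 1, 1), datetime(2024, 1, 5), 4)
```

    With these inputs, each of the four ranges covers one day. If most rows fall on a single day, one reader still receives most of the data.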

    Batch Read Size (Optional)

    The number of records to read per batch (for example, 1024). Reading in batches reduces interactions with the data source, improves I/O efficiency, and lowers network latency.
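
    The effect of a batch size can be illustrated with any database cursor. The sketch below uses Python's built-in sqlite3 module and an in-memory table purely as a stand-in for PolarDB. The table name demo and the helper read_in_batches are hypothetical, and this is not how Dataphin reads data internally.

```python
import sqlite3
from typing import Iterator, List, Tuple


def read_in_batches(cursor: sqlite3.Cursor, batch_size: int) -> Iterator[List[Tuple]]:
    """Yield lists of at most `batch_size` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        yield rows


# Stand-in data source: an in-memory SQLite table with 2,500 rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE demo (id INTEGER)")
connection.executemany("INSERT INTO demo VALUES (?)", [(i,) for i in range(2500)])

cursor = connection.execute("SELECT id FROM demo ORDER BY id")
batch_sizes = [len(batch) for batch in read_in_batches(cursor, 1024)]
```

    With a batch size of 1024, the 2,500 rows arrive in three fetches instead of 2,500 single-row fetches.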

    Input Filter (Optional)

    A filter condition on the input fields, for example, ds=${bizdate}. Use an input filter in the following scenarios:

    • To read a fixed portion of the data.

    • To filter data based on a parameter.
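
    For a parameter-based filter, the placeholder is replaced with a concrete value when the pipeline runs. The sketch below imitates that substitution with Python's string.Template, which happens to use the same ${name} syntax. The function name and the yyyyMMdd value are assumptions for illustration, and Dataphin resolves parameters itself.

```python
from string import Template
from typing import Dict


def render_filter(filter_expression: str, parameters: Dict[str, str]) -> str:
    """Replace ${name} placeholders in a filter with concrete values."""
    return Template(filter_expression).substitute(parameters)


# Assumption: bizdate is supplied as a yyyyMMdd string.
rendered = render_filter("ds=${bizdate}", {"bizdate": "20240101"})
```

    A filter without placeholders, such as id < 100, passes through unchanged, which corresponds to the fixed-portion scenario.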

    Output Fields

    Displays all fields from the selected tables that match the filter criteria. Available operations:

    • Field Management: If you do not need to output certain fields to downstream components, you can delete them:

      • Deleting a single field: To delete a small number of fields, click the delete icon in the Actions column of each unwanted field.

      • Deleting fields in batch: To delete many fields, click Field Management. In the Field Management dialog box, select multiple fields, click the left arrow icon to move the selected input fields to the unselected input fields list, and then click OK.


    • Batch Add: Click Batch Add to configure fields in batch using JSON, TEXT, or DDL format.

      Note

      After you add fields in batch and click OK, the existing field information is overwritten.

      • To configure in batch using JSON format, for example:

        [
          {
            "index": 1,
            "name": "id",
            "type": "int(10)",
            "mapType": "Long",
            "comment": "comment1"
          },
          {
            "index": 2,
            "name": "user_name",
            "type": "varchar(255)",
            "mapType": "String",
            "comment": "comment2"
          }
        ]

        Note

        The index field specifies the column number of the object. The name field specifies the name of the imported field. The type field specifies the data type of the imported field. For example, "index": 3, "name": "user_id", "type": "String" indicates that the fourth column in the file is imported as the field user_id of type String.

      • To configure in batch using TEXT format, for example:

        1,id,int(10),Long,comment1
        2,user_name,varchar(255),String,comment2

        • The row delimiter separates the information for each field. The default is a line feed (\n). Semicolons (;) and periods (.) are also supported.

        • The column delimiter separates the field name from the field type. The default is a half-width comma (,). The field type is optional.
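
        The TEXT and JSON formats carry the same five pieces of information per field. The sketch below converts TEXT rows into the JSON structure shown earlier. It assumes the column order index, name, type, mapping type, comment, as in the examples, and it is a hypothetical helper rather than Dataphin's parser. It does not handle types that themselves contain the column delimiter, such as decimal(10,2).

```python
import json
from typing import Dict, List


def parse_text_fields(
    text: str, row_delimiter: str = "\n", column_delimiter: str = ","
) -> List[Dict[str, object]]:
    """Convert TEXT-format field rows into JSON-format field objects."""
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue
        # Split into at most five parts so the comment may contain delimiters.
        parts = [part.strip() for part in row.split(column_delimiter, 4)]
        parts += [""] * (5 - len(parts))  # missing trailing columns become empty
        index, name, field_type, map_type, comment = parts
        fields.append(
            {
                "index": int(index),
                "name": name,
                "type": field_type,
                "mapType": map_type,
                "comment": comment,
            }
        )
    return fields


fields = parse_text_fields(
    "1,id,int(10),Long,comment1\n2,user_name,varchar(255),String,comment2"
)
as_json = json.dumps(fields, indent=2)
```

        The value of as_json is then a JSON array with one object per field, in the same shape as the JSON example.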

      • To configure in batch using DDL format, for example:

        CREATE TABLE tablename (
            user_id serial,
            username VARCHAR(50),
            password VARCHAR(50),
            email VARCHAR(255),
            created_on TIMESTAMP
        );

    • Create Output Field: Click +Create Output Field. Follow the prompts to enter the Column, Type, and Comment, and select the Mapping Type. After you finish configuring the current row, click the save icon to save it.

  8. Click Confirm to complete the configuration of the PolarDB input component.
