Configure the SelectDB output component

Last updated: 2026-06-23 10:13:05

The SelectDB output component writes data read from external databases or big data storage systems to SelectDB for integration and further processing.

Prerequisites

  • You have created a SelectDB data source. For more information, see Create a SelectDB data source.

  • The account used to configure the SelectDB output component must have write-through permission on the data source. If it does not, request the permission first. For more information, see Request data source permissions.

Procedure

  1. On the Dataphin home page, in the top menu bar, choose Development > Data Integration.

  2. On the Integration page, select a project in the top menu bar (in Dev-Prod mode, also select an environment).

  3. In the navigation pane on the left, click Offline Integration. Then, in the Offline Integration list, click the target offline pipeline to open its configuration page.

  4. Click Component Library in the upper-right corner of the page to open the Component Library panel.

  5. In the navigation pane on the left of the Component Library panel, select Outputs. In the output component list on the right, find the SelectDB component and drag it to the canvas.

  6. Drag from the connection icon on the target input, transform, or flow component to the SelectDB output component to connect the two.

  7. Click the icon on the SelectDB output component card to open the SelectDB Output Configuration dialog box.

  8. In the SelectDB Output Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Basic Settings

    Step Name

    The name of the SelectDB output component. Dataphin generates a default name that you can modify. Naming conventions:

    • Can contain only Chinese characters, letters, underscores (_), and numbers.

    • Length cannot exceed 64 characters.
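
    The naming rules above can be expressed as a quick client-side check. This is a minimal Python sketch, not Dataphin's own validation: the helper is_valid_step_name is hypothetical, and it assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that length is counted in Unicode characters.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs (U+4E00..U+9FFF).
# Dataphin's exact accepted range is not documented here.
_STEP_NAME = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]+")

MAX_STEP_NAME_LENGTH = 64


def is_valid_step_name(name: str) -> bool:
    """Hypothetical check: allowed characters only, 1 to 64 characters long."""
    if len(name) > MAX_STEP_NAME_LENGTH:
        return False
    # fullmatch requires every character to be allowed; "+" rejects empty names.
    return _STEP_NAME.fullmatch(name) is not None
```

    For example, SelectDB_output_1 passes, while selectdb-output fails because of the hyphen.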

    Datasource

    Lists all SelectDB data sources, including those for which you do not have write-through permission. Click the copy icon to copy the data source name.

    • For data sources that do not have write-through permissions, you can click Request next to the data source to request the permissions. For more information, see Request data source permissions.

    • If you do not have a SelectDB data source, click Create Data Source to create one. For detailed steps, see Create a SelectDB data source.

    Table

    Select the destination table for the output data. You can enter a keyword to search for a table, or enter the exact table name and click Precise Search. After you select a table, the system automatically checks its status. Click the copy icon to copy the name of the selected table.

    If no target table exists for data synchronization, use one-click table creation to generate one:

    1. Click One-Click Table Creation. Dataphin automatically generates the statement for creating the target table, including the table name (the source table name by default) and the field types (preliminarily converted from the Dataphin field types).

    2. Modify the SQL script for creating the target table as needed, and then click New. After the target table is created, Dataphin automatically sets it as the target table for output data.

      Note
      • If a table with the same name already exists in the development environment, Dataphin reports that the table already exists after you click New.

      • If no matching table is found, you can also enter a table name manually and use it as the target table.

    Production Table Missing Strategy

    Specifies how to handle a target table that does not exist in the production environment. Valid values: Take No Action and Automatic Creation. Default: Automatic Creation.

    • Take No Action: The production table is not created when the task is published. If the target table does not exist, a prompt is displayed when you submit the task, but the task can still be published. You must manually create the target table in the production environment before the task can run.

    • Automatic Creation: A table with the same name is created in the target environment when the task is published. You must edit the table creation statement in Edit Table Creation Statement. By default, the statement of the selected table is filled in, and you can modify it. In the statement, the table name is written as the placeholder ${table_name}. This is the only supported placeholder, and it is replaced with the actual table name at execution time.

      If the target table does not exist, the system creates it by running the table creation statement. If creation fails, the publish check fails; modify the statement and publish again. If the target table already exists, table creation is skipped.

    Note

    This option is supported only in Dev-Prod mode projects.
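
    The ${table_name} substitution and the create-or-skip behavior of Automatic Creation can be sketched as follows. This Python sketch is illustrative only: render_create_statement and ensure_production_table are hypothetical helpers, rejecting any other ${...} placeholder is an assumption drawn from "only this placeholder is supported", and existing_tables stands in for a real metadata lookup.

```python
import re
from typing import Optional, Set

PLACEHOLDER = "${table_name}"
# Matches any ${...} placeholder so that unsupported ones can be reported.
_ANY_PLACEHOLDER = re.compile(r"\$\{[^}]*\}")


def render_create_statement(template: str, table_name: str) -> str:
    """Substitute the real table name for ${table_name} in the DDL template."""
    # Assumption: a placeholder other than ${table_name} is treated as an error.
    unsupported = [p for p in _ANY_PLACEHOLDER.findall(template) if p != PLACEHOLDER]
    if unsupported:
        raise ValueError(f"unsupported placeholders: {unsupported}")
    return template.replace(PLACEHOLDER, table_name)


def ensure_production_table(existing_tables: Set[str], template: str,
                            table_name: str) -> Optional[str]:
    """Return the DDL to run, or None when the table exists and creation is skipped."""
    if table_name in existing_tables:
        return None
    return render_create_statement(template, table_name)
```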

    Data Format

    You can select CSV or JSON.

    If you choose CSV, also configure CSV Import Column Delimiter and CSV Import Row Delimiter.

    CSV Import Column Delimiter (Optional)

    The column delimiter for StreamLoad CSV import. Default: _@dp@_. Leave empty to use the default. If your data contains _@dp@_, specify a different delimiter.

    CSV Import Row Delimiter (Optional)

    The row delimiter for StreamLoad CSV import. Default: _#dp#_. Leave empty to use the default. If your data contains _#dp#_, specify a different delimiter.
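
    To see why a delimiter must not occur in the data, the following Python sketch joins rows into one CSV payload with the two delimiters, falling back to the defaults when a field is left empty. The function to_stream_load_csv is hypothetical and does not reproduce Dataphin's actual serialization; it assumes non-null values and no trailing row delimiter.

```python
from typing import Iterable, Sequence

DEFAULT_COLUMN_DELIMITER = "_@dp@_"
DEFAULT_ROW_DELIMITER = "_#dp#_"


def to_stream_load_csv(rows: Iterable[Sequence[object]],
                       column_delimiter: str = "",
                       row_delimiter: str = "") -> str:
    """Serialize rows with the configured delimiters (empty means use the default)."""
    col = column_delimiter or DEFAULT_COLUMN_DELIMITER
    row = row_delimiter or DEFAULT_ROW_DELIMITER
    lines = []
    for values in rows:
        texts = [str(v) for v in values]
        for text in texts:
            # A delimiter inside a value would shift columns or split the row.
            if col in text or row in text:
                raise ValueError(f"{text!r} contains a delimiter; choose a different one")
        lines.append(col.join(texts))
    return row.join(lines)
```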

    Batch Write Data Volume (Optional)

    The maximum data volume written per batch. This parameter works together with Batch Write Record Count: a batch is written as soon as either limit is reached. Default: 32 MB.

    Batch Write Record Count (Optional)

    The maximum number of records written per batch. Default: 2048 records. Data synchronization writes data in batches based on Batch Write Record Count and Batch Write Data Volume.

    • When the accumulated data reaches either limit (data volume or record count), the batch is considered full and is written to the destination.

    • We recommend setting Batch Write Data Volume to 32 MB and sizing Batch Write Record Count from the size of a single record so that the data volume limit is reached first. For example, if a single record is about 1 KB and Batch Write Data Volume is 16 MB, set Batch Write Record Count above 16384 (16 MB / 1 KB), such as 20000. A batch is then written each time 16 MB of data accumulates.
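
    The either-limit-first rule can be sketched as a small buffer. This Python class is hypothetical: it assumes a batch is flushed as soon as the accumulated size or record count reaches its limit (>=), and it records batch sizes instead of writing to SelectDB.

```python
MB = 1024 * 1024


class BatchBuffer:
    """Hypothetical buffer that flushes when either the byte or record limit is hit."""

    def __init__(self, max_bytes: int = 32 * MB, max_records: int = 2048):
        self.max_bytes = max_bytes
        self.max_records = max_records
        self.records = []
        self.size = 0
        self.flushed = []  # record count of each flushed batch (stand-in for a write)

    def add(self, record: bytes) -> None:
        self.records.append(record)
        self.size += len(record)
        # Whichever limit is reached first triggers the write.
        if self.size >= self.max_bytes or len(self.records) >= self.max_records:
            self.flush()

    def flush(self) -> None:
        if self.records:
            self.flushed.append(len(self.records))
            self.records = []
            self.size = 0
```

    With the defaults (32 MB, 2048 records) and 1 KB records, the record limit fires first, so each batch holds only about 2 MB. With 16 MB and 20000 records, the volume limit fires first at 16384 records per batch, as in the recommendation above.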

    Preparation Statement (Optional)

    The SQL script executed on the database before data import.

    For example, to keep the service continuously available, first create the target table Target_A and write the data to it. After the write completes, rename Service_B (the table currently serving requests) to Temp_C, rename Target_A to Service_B, and then delete Temp_C.

    Completion Statement (Optional)

    The SQL script executed on the database after data import.
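
    The Target_A / Service_B / Temp_C example under Preparation Statement splits across the two statements: the table is prepared before the write and swapped after it. The Python sketch below only builds the SQL text in that order. It assumes Apache Doris-style syntax (CREATE TABLE ... LIKE, ALTER TABLE ... RENAME), which SelectDB is built on; verify it against your SelectDB version. swap_statements is a hypothetical helper, and the Table parameter must point to Target_A for the data to land there.

```python
from typing import List, Tuple


def swap_statements(target: str = "Target_A", service: str = "Service_B",
                    temp: str = "Temp_C") -> Tuple[List[str], List[str]]:
    """Hypothetical helper: SQL for the Preparation and Completion Statements."""
    preparation = [
        # Assumption: drop a leftover copy from a previous run, then clone the schema.
        f"DROP TABLE IF EXISTS {target}",
        f"CREATE TABLE {target} LIKE {service}",
    ]
    completion = [
        f"ALTER TABLE {service} RENAME {temp}",     # move the live table aside
        f"ALTER TABLE {target} RENAME {service}",   # promote the freshly loaded table
        f"DROP TABLE IF EXISTS {temp}",             # remove the old data
    ]
    return preparation, completion
```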

    Field Mapping

    Input Fields

    Input fields are displayed based on the output of upstream components.

    Output Fields

    The following operations are supported for output fields:

    • Field Management: Click Field Management to select output fields.


      • Select fields in Selected Input Fields and click the corresponding move icon to move them to Unselected Input Fields.

      • Select fields in Unselected Input Fields and click the corresponding move icon to move them to Selected Input Fields.

    • Batch Add: Click Batch Add to configure fields in batches in JSON, TEXT, or DDL format.

      • Batch configure in JSON format. For example:

        [
          {
            "name": "user_id",
            "type": "String"
          },
          {
            "name": "user_name",
            "type": "String"
          }
        ]
        Note

        The name key specifies the name of the field to import, and the type key specifies the field type after import. For example, "name":"user_id","type":"String" imports the field named user_id and sets its type to String.

      • Configure fields in batches in TEXT format, for example:

        user_id,String
        user_name,String
        • Row delimiters separate field entries. Default: line feed (\n). Also supports semicolon (;) and period (.).

        • Column delimiters separate field names and field types. The default is comma (,).

      • Batch configure in DDL format. For example:

        CREATE TABLE tablename (
            id INT PRIMARY KEY,
            name VARCHAR(50),
            age INT
        );
    • Create Output Field: Click + Create Output Field, enter a value in Column, and select a Type as prompted on the page. After you finish configuring the row, click the save icon to save it.
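
    The TEXT format maps one-to-one onto the JSON format. The Python sketch below parses TEXT input into the same list of name/type objects, using one of the supported row delimiters and a comma as the default column delimiter. parse_text_fields is a hypothetical helper; splitting only at the first column delimiter (so that a type such as DECIMAL(10,2) stays intact) is an assumption about how Dataphin treats such input.

```python
import json

ROW_DELIMITERS = ("\n", ";", ".")  # line feed is the default


def parse_text_fields(text: str, row_delimiter: str = "\n",
                      column_delimiter: str = ",") -> list:
    """Hypothetical parser: TEXT batch input -> JSON-style field list."""
    if row_delimiter not in ROW_DELIMITERS:
        raise ValueError(f"unsupported row delimiter: {row_delimiter!r}")
    fields = []
    for entry in text.split(row_delimiter):
        entry = entry.strip()
        if not entry:
            continue  # ignore blank lines and trailing delimiters
        # Split only at the first column delimiter: name, then type.
        parts = [p.strip() for p in entry.split(column_delimiter, 1)]
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"expected 'name{column_delimiter}type', got {entry!r}")
        fields.append({"name": parts[0], "type": parts[1]})
    return fields


if __name__ == "__main__":
    print(json.dumps(parse_text_fields("user_id,String\nuser_name,String")))
    # [{"name": "user_id", "type": "String"}, {"name": "user_name", "type": "String"}]
```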

    Mapping

    Maps input fields from the source table to output fields in the target table. Supported mapping modes:

    • Same Name Mapping: Maps fields with the same field name.

    • Same Row Mapping: Maps fields in the same row position when source and target field names differ.
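
    The two mapping modes can be illustrated with plain lists of field names. In this Python sketch both functions are hypothetical; same-name matching is assumed to be exact and case-sensitive, and fields without a partner are left unmapped.

```python
from typing import List, Tuple


def same_name_mapping(input_fields: List[str],
                      output_fields: List[str]) -> List[Tuple[str, str]]:
    """Pair each output field with the input field that has the same name."""
    inputs = set(input_fields)
    return [(name, name) for name in output_fields if name in inputs]


def same_row_mapping(input_fields: List[str],
                     output_fields: List[str]) -> List[Tuple[str, str]]:
    """Pair fields by position; extra fields on the longer side stay unmapped."""
    return list(zip(input_fields, output_fields))
```

    Same Row Mapping is the option to use when, for example, the source has uid and uname but the target has user_id and user_name.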

  9. Click Confirm to complete the property configuration of the SelectDB output component.
