Configure an Impala input component

Last updated: 2026-06-08 21:00:38

The Impala input component reads data from an Impala data source and syncs it to other data sources. Configure the input component first, then configure the sync target.

Prerequisites

Procedure

  1. In the top menu bar, choose Develop > Data Integration.

  2. On the Data Integration page, select a project. If the project uses Dev-Prod mode, also select an environment.

  3. In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop.

  4. Click Component Library in the upper-right corner to open the Component Library panel.

  5. In the Component Library panel, click Input, find Impala, and drag it onto the canvas.

  6. Click the configuration icon on the Impala component to open the Impala Input Configuration dialog box.

  7. In the Impala Input Configuration dialog box, configure the parameters.

    Parameter

    Description

    Step name

    The name of the Impala input component. Dataphin generates a name automatically, and you can change it as needed. The name must meet the following requirements:

    • Use only Chinese characters, letters, underscores (_), and digits.

    • Use no more than 64 characters.

    Data source

    Select a data source. The list shows all Impala data sources in Dataphin, including those for which you do not have the sync-read permission. Click the copy icon to copy the data source name.

    Source table count

    Select Single table or Multiple tables:

    • Single table: Sync from one source table to one target table.

    • Multiple tables: Sync from multiple source tables to one target table. Dataphin merges data using the union algorithm.

      For more information, see INTERSECT, UNION, EXCEPT, and MINUS.

    Table matching method

    Only Generic rule is supported.

    Note: This parameter is available only when Source table count is set to Multiple tables.

    Table

    Select the source table:

    • If you set Source table count to Single table, search for the table by keyword and select it. Click the copy icon to copy the table name.

    • If you set Source table count to Multiple tables, add the tables as follows:

      1. In the input box, enter an expression to filter tables with the same structure.

        Supported formats: enumeration, regex-like patterns, or both. Example: table_[001-100];table_102.

      2. Click Exact match and review the matched tables in the Confirm match details dialog box.

      3. Click Confirm.

    Shard key

    Select an integer column from the source table as the shard key. We recommend a primary key or an indexed column. Dataphin splits the data by the values of this column and reads the shards concurrently, which improves sync efficiency.

    Batch read size

    The number of records to read in each batch, for example 1024. Reading records in batches reduces the number of interactions with the data source, which improves I/O efficiency and reduces the cost of network round trips.

    Input filter

    The conditions used to filter the extracted data. Supported forms:

    • Static field. Example: ds=20210101.

    • Variable parameter. Example: ds=${bizdate}.

    Output fields

    Lists all fields from the selected table after applying the input filter. To exclude fields from downstream components, delete them:

    • Delete a single field: Click the delete icon in the Actions column of the field.

    • Delete multiple fields at once: Click Field management. In the Field management dialog box, select the fields, click the left-shift icon to move them to the unselected list, and then click OK.


  8. Click Confirm.
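The Step name rules can be checked before you enter a name in the dialog box. The following Python sketch is illustrative only: the function `is_valid_step_name` is hypothetical, and it assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that "letters" means ASCII letters. Dataphin's own validation may use different character classes.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs (U+4E00 to U+9FFF),
# "letters" = ASCII letters. Dataphin's actual character classes may differ.
_ALLOWED = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]+")
MAX_LEN = 64


def is_valid_step_name(name: str) -> bool:
    """Return True if name follows the documented Step name rules."""
    # Length is counted in characters (Python str length), which is an assumption.
    if not name or len(name) > MAX_LEN:
        return False
    # fullmatch requires every character to be in the allowed set.
    return _ALLOWED.fullmatch(name) is not None
```

For example, `impala_input_1` passes, while `impala-input` fails because the hyphen is not an allowed character.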
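When Source table count is set to Multiple tables, an expression such as table_[001-100];table_102 describes a set of table names. The sketch below shows one plausible reading of that example, assuming that `;` separates entries and that `[start-end]` is an inclusive numeric range zero-padded to the width of `start`. This grammar is inferred from the single example in this topic and is not Dataphin's documented syntax; `expand_table_expression` is a hypothetical helper. Dataphin also matches the expression against tables that actually exist in the data source, which this sketch does not do.

```python
import itertools
import re

# Hypothetical grammar inferred from "table_[001-100];table_102":
# ';' separates entries, '[start-end]' is an inclusive numeric range.
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def _range_values(start: str, end: str) -> list[str]:
    # Zero-pad to the width of the start token, so [001-100] yields 001..100.
    width = len(start)
    return [str(i).zfill(width) for i in range(int(start), int(end) + 1)]


def expand_table_expression(expr: str) -> list[str]:
    """Expand an enumeration/range expression into candidate table names."""
    names: list[str] = []
    for entry in (e.strip() for e in expr.split(";")):
        if not entry:
            continue
        # re.split with two groups yields: literal, start, end, literal, ...
        parts = _RANGE.split(entry)
        choices = []
        for i in range(0, len(parts), 3):
            choices.append([parts[i]])
            if i + 2 < len(parts):
                choices.append(_range_values(parts[i + 1], parts[i + 2]))
        # Cartesian product handles entries that contain several ranges.
        names.extend("".join(combo) for combo in itertools.product(*choices))
    return names
```

Under these assumptions, the example expression expands to 101 candidate names: table_001 through table_100, plus table_102.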

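The Shard key description can be illustrated with a common splitting approach: take the minimum and maximum values of the integer column, divide that range into contiguous sub-ranges, and read each sub-range concurrently with its own WHERE predicate. This is a sketch of the general technique under that assumption, not Dataphin's internal algorithm; the functions `split_shard_ranges` and `shard_predicates` and the column name `id` are hypothetical.

```python
def split_shard_ranges(min_val: int, max_val: int, parts: int) -> list[tuple[int, int]]:
    """Split the inclusive range [min_val, max_val] into up to `parts` contiguous sub-ranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if min_val > max_val:
        return []
    total = max_val - min_val + 1
    parts = min(parts, total)  # never create empty shards
    size, extra = divmod(total, parts)
    ranges = []
    start = min_val
    for i in range(parts):
        # The first `extra` shards each take one additional value.
        end = start + size - 1 + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges


def shard_predicates(column: str, min_val: int, max_val: int, parts: int) -> list[str]:
    """Build one WHERE predicate per shard; each could be read by a separate worker."""
    return [
        f"{column} >= {lo} AND {column} <= {hi}"
        for lo, hi in split_shard_ranges(min_val, max_val, parts)
    ]
```

For example, `shard_predicates("id", 1, 10, 3)` produces three predicates covering 1 to 4, 5 to 7, and 8 to 10. Splitting the value range evenly assumes the keys are roughly uniformly distributed, which is one reason a primary key or indexed column works well. Rows whose shard key is NULL match none of these predicates, so a complete reader would also need a separate `IS NULL` query.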
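A variable parameter in Input filter, such as ds=${bizdate}, is replaced with a concrete value when the task runs, while a static field such as ds=20210101 is used as-is. The sketch below uses Python's `string.Template`, whose `${name}` placeholder syntax happens to match the example. It assumes that `${bizdate}` resolves to the day before the run date in yyyymmdd format; check the Dataphin parameter documentation for the actual definition. The function `render_filter` is hypothetical.

```python
from datetime import date, timedelta
from string import Template


def render_filter(filter_expr: str, run_date: date) -> str:
    """Resolve ${bizdate} in a filter expression (illustrative sketch only)."""
    # Assumption: bizdate = run date minus one day, formatted as yyyymmdd.
    bizdate = (run_date - timedelta(days=1)).strftime("%Y%m%d")
    # safe_substitute leaves unknown ${...} placeholders untouched instead of raising.
    return Template(filter_expr).safe_substitute(bizdate=bizdate)
```

Under this assumption, a task that runs on 2021-01-02 turns ds=${bizdate} into ds=20210101, which is the same condition as the static example.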