Configure Impala input component for data sync

The Impala input component reads data from an Impala data source and syncs it to other data sources. Configure the input component first, then configure the sync target.

Prerequisites

An Impala data source is created. For more information, see Create an IMPALA data source.
Your account has sync-read permission on the data source. If it does not, request the permission. For more information, see Request, renew, or release data source permissions.

Procedure

In the top menu bar, choose Develop > Data Integration.
On the Data Integration page, select a Project. In Dev-Prod mode, also select an environment.
In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop.
Click Component Library in the upper-right corner to open the Component Library panel.
In the Component Library panel, click Input, find Impala, and drag it onto the canvas.
Click the icon on the Impala component to open the Impala Input Configuration dialog box.

In the Impala Input Configuration dialog box, configure the parameters.

Parameter	Description
Step name	The Impala input component name. Auto-generated by Dataphin. Rename as needed. Naming rules: Use only Chinese characters, letters, underscores (_), and digits. Use no more than 64 characters.
Data source	Lists all Impala data sources in Dataphin, including those for which you lack sync-read permission. Click the icon to copy the data source name. If you lack sync-read permission, request it. For more information, see Request, renew, or release data source permissions. To create a data source, click New. For more information, see Create an IMPALA data source.
Source table count	Select Single table or Multiple tables: Single table: Sync from one source table to one target table. Multiple tables: Sync from multiple source tables to one target table. Dataphin merges data using the union algorithm. For more information, see INTERSECT, UNION, EXCEPT, and MINUS.
Table matching method	Only Generic rule is supported. Note: This parameter is available only when Source table count is set to Multiple tables.
Table	Select the source table: If you selected Single table for Source table count, search by keyword. Click the icon to copy the table name. If you selected Multiple tables for Source table count, add tables: In the input box, enter an expression to filter tables with the same structure. Supported formats: enumeration, regex-like patterns, or both. Example: `table_[001-100];table_102`. Click Exact match and review the matched tables in the Confirm match details dialog box. Click Confirm.
Shard key	Select an integer column from the source table as the shard key. A primary key or indexed column is recommended. Dataphin partitions data by this field for concurrent reads, improving sync efficiency.
Batch read size	Number of records to read per batch. Set a value such as 1024 to reduce data source interactions, improve I/O efficiency, and lower network latency.
Input filter	Conditions that filter the extracted data. Use a static field, for example `ds=20210101`, or a variable parameter, for example `ds=${bizdate}`.
Output fields	Lists all fields from the selected table after applying the input filter. To exclude fields from downstream components, delete them: Delete one field at a time: Click the icon in the Actions column to delete extra fields. Delete multiple fields at once: Click Field management, select fields in the Field management dialog box, click the left-shift icon to move them to the unselected list, and click OK.
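
The Table parameter accepts an expression such as `table_[001-100];table_102`. The sketch below illustrates how such an expression could expand into a list of table names. It is not Dataphin's implementation. It assumes that `;` separates items and that `[start-end]` is an inclusive numeric range that keeps the zero padding of `start`. The function name `expand_table_expression` is hypothetical. Always confirm the real result in the Confirm match details dialog box.

```python
import re


def expand_table_expression(expr: str) -> list[str]:
    """Expand an expression such as 'table_[001-100];table_102' into table names.

    Assumed semantics (not confirmed by the Dataphin documentation):
    ';' separates items, and '[start-end]' is an inclusive numeric range
    that keeps the zero padding of 'start'.
    """
    range_pattern = re.compile(r"\[(\d+)-(\d+)\]")
    names = []
    for item in (part.strip() for part in expr.split(";")):
        if not item:
            continue  # ignore empty items, for example after a trailing ';'
        match = range_pattern.search(item)
        if match is None:
            names.append(item)  # plain enumeration entry
            continue
        start, end = match.group(1), match.group(2)
        width = len(start)
        prefix, suffix = item[:match.start()], item[match.end():]
        for number in range(int(start), int(end) + 1):
            names.append(f"{prefix}{str(number).zfill(width)}{suffix}")
    return names


tables = expand_table_expression("table_[001-100];table_102")
print(len(tables), tables[0], tables[-1])
```

Under these assumptions the example expression covers `table_001` through `table_100` plus `table_102`, and skips `table_101`.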

Click Confirm.
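
The Shard key description says that Dataphin partitions data by an integer column for concurrent reads. The sketch below shows one common way to do this: split the key range into contiguous sub-ranges and read each with its own query. It is an illustration under stated assumptions, not Dataphin's actual algorithm. The function names, the `orders` table, the `id` column, and the generated SQL shape are all hypothetical.

```python
def split_shard_ranges(min_key: int, max_key: int, parallelism: int) -> list[tuple[int, int]]:
    """Split [min_key, max_key] into at most `parallelism` contiguous inclusive ranges."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    if min_key > max_key:
        raise ValueError("min_key must not exceed max_key")
    total = max_key - min_key + 1
    parts = min(parallelism, total)
    base, extra = divmod(total, parts)
    ranges = []
    low = min_key
    for index in range(parts):
        # The first `extra` ranges take one additional key so sizes differ by at most 1.
        size = base + (1 if index < extra else 0)
        high = low + size - 1
        ranges.append((low, high))
        low = high + 1
    return ranges


def build_queries(table: str, shard_key: str, ranges: list[tuple[int, int]],
                  input_filter: str = "") -> list[str]:
    """Build one SELECT per range, appending the optional input filter (hypothetical SQL shape)."""
    queries = []
    for low, high in ranges:
        where = f"{shard_key} BETWEEN {low} AND {high}"
        if input_filter:
            where += f" AND ({input_filter})"
        queries.append(f"SELECT * FROM {table} WHERE {where}")
    return queries


for query in build_queries("orders", "id", split_shard_ranges(1, 10, 3), "ds=20210101"):
    print(query)
```

Evenly sized ranges only balance the work when key values are spread fairly uniformly, which is one reason a primary key or indexed integer column is the recommended choice.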
