Configure an Impala input component
The Impala input component reads data from an Impala data source and syncs it to other data sources. Configure the input component first, then configure the sync target.
Prerequisites
-
An Impala data source is created. For more information, see Create an IMPALA data source.
-
Your account has sync-read permission on the data source. If it does not, request the permission. For more information, see Request, renew, or release data source permissions.
Procedure
-
In the top menu bar, choose Develop > Data Integration.
-
On the Data Integration page, select a Project. In Dev-Prod mode, also select an environment.
-
In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop.
-
Click Component Library in the upper-right corner to open the Component Library panel.
-
In the Component Library panel, click Input, find Impala, and drag it onto the canvas.
-
Click the icon on the Impala component to open the Impala Input Configuration dialog box.
-
In the Impala Input Configuration dialog box, configure the parameters.
Parameter
Description
Step name
The name of the Impala input component. Dataphin generates a name automatically; rename it as needed. Naming rules:
-
Use only Chinese characters, letters, underscores (_), and digits.
-
Use no more than 64 characters.
Data source
Lists all Impala data sources in Dataphin, including those you lack sync-read permission for. Click the icon to copy the data source name.
-
If you lack sync-read permission, request it. For more information, see Request, renew, or release data source permissions.
-
To create a data source, click New. For more information, see Create an IMPALA data source.
Source table count
Select Single table or Multiple tables:
-
Single table: Sync from one source table to one target table.
-
Multiple tables: Sync from multiple source tables to one target table. Dataphin merges data using the union algorithm.
For more information, see INTERSECT, UNION, EXCEPT, and MINUS.
Table matching method
Only Generic rule is supported.
Note: Available only when Source table count is set to Multiple tables.
Table
Select the source table:
-
If you selected Single table for Source table count, search by keyword. Click the icon to copy the table name.
-
If you selected Multiple tables for Source table count, add tables:
-
In the input box, enter an expression to filter tables with the same structure.
Supported formats: enumeration, regex-like patterns, or both. Example: table_[001-100];table_102.
-
Click Exact match and review the matched tables in the Confirm match details dialog box.
-
Click Confirm.
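To illustrate how an expression such as table_[001-100];table_102 might resolve to table names, the following is a minimal sketch. It assumes that semicolons separate enumerated entries and that a bracketed range expands to a zero-padded numeric sequence. The function name expand_table_expression is hypothetical and is not part of Dataphin; always review the actual matches in the Confirm match details dialog box.

```python
import re
from typing import List


def expand_table_expression(expr: str) -> List[str]:
    """Expand an expression like 'table_[001-100];table_102' into table names.

    Assumption (not confirmed by the product docs): ';' separates entries and
    '[start-end]' is an inclusive numeric range padded to the width of 'start'.
    """
    names: List[str] = []
    for part in expr.split(";"):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if match is None:
            # Plain enumerated table name.
            names.append(part)
            continue
        prefix, start, end, suffix = match.groups()
        width = len(start)
        for i in range(int(start), int(end) + 1):
            names.append(f"{prefix}{i:0{width}d}{suffix}")
    return names


tables = expand_table_expression("table_[001-100];table_102")
```

Under these assumptions, the example expression covers table_001 through table_100 plus table_102, and skips table_101.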
Shard key
Select an integer column from the source table as the shard key. A primary key or indexed column is recommended. Dataphin partitions data by this field for concurrent reads, improving sync efficiency.
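The idea of partitioning by an integer shard key for concurrent reads can be sketched as follows. This is an illustration only, not Dataphin's actual splitting logic: it assumes the key range is divided into contiguous, roughly equal intervals, and the helper names and the column name id are hypothetical.

```python
from typing import List, Tuple


def split_key_range(min_key: int, max_key: int, splits: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [min_key, max_key] into up to `splits` intervals."""
    if splits < 1 or max_key < min_key:
        raise ValueError("splits must be >= 1 and max_key >= min_key")
    total = max_key - min_key + 1
    step, extra = divmod(total, splits)
    ranges: List[Tuple[int, int]] = []
    lo = min_key
    for i in range(splits):
        # The first `extra` intervals take one additional key each.
        size = step + (1 if i < extra else 0)
        if size == 0:
            continue  # More splits than keys: skip empty intervals.
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def range_predicates(column: str, ranges: List[Tuple[int, int]]) -> List[str]:
    """Build one filter predicate per interval; each could feed a separate reader."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]


ranges = split_key_range(1, 10, 3)
predicates = range_predicates("id", ranges)
```

Evenly distributed key values, which a primary key or indexed column often provides, keep the intervals balanced so that no single reader carries most of the load.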
Batch read size
Number of records to read per batch. A larger value, such as 1024, reduces the number of round trips to the data source, which improves I/O efficiency and reduces network overhead.
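The effect of the batch read size can be sketched with a generic Python DB-API cursor. This example uses an in-memory SQLite database as a stand-in for Impala, and the function name read_in_batches is hypothetical; it only shows how a batch size of 1024 limits the number of fetch calls.

```python
import sqlite3
from typing import Iterator, List, Tuple


def read_in_batches(cursor, batch_size: int) -> Iterator[List[Tuple]]:
    """Yield rows in batches of at most `batch_size` using DB-API fetchmany."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Stand-in data source: 2500 rows in an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2500)])

cur = conn.execute("SELECT id FROM t ORDER BY id")
batch_sizes = [len(batch) for batch in read_in_batches(cur, 1024)]
conn.close()
```

With 2500 rows and a batch size of 1024, the reader issues three non-empty fetches instead of 2500 single-row fetches.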
Input filter
Conditions to filter extracted data:
-
Static field. Example: ds=20210101.
-
Variable parameter. Example: ds=${bizdate}.
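The difference between a static filter and a variable parameter can be sketched with Python's string.Template, which uses the same ${name} placeholder syntax. This is an illustration only: Dataphin resolves ${bizdate} itself when the pipeline runs, and both the render_filter helper and the value assigned to bizdate here are assumptions.

```python
from string import Template
from typing import Dict


def render_filter(filter_expr: str, params: Dict[str, str]) -> str:
    """Replace ${name} placeholders in a filter expression with parameter values."""
    return Template(filter_expr).substitute(params)


# A static filter has no placeholders and is returned unchanged.
static_filter = render_filter("ds=20210101", {})

# A variable filter is resolved from the supplied parameters (assumed value).
variable_filter = render_filter("ds=${bizdate}", {"bizdate": "20210101"})
```

A variable parameter lets the same pipeline read a different partition on each run without editing the filter.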
Output fields
Lists all fields from the selected table after applying the input filter. To exclude fields from downstream components, delete them:
-
Delete one field at a time: Click the icon in the Actions column to delete extra fields.
-
Delete multiple fields at once: Click Field management, select fields in the Field management dialog box, click the left-shift icon to move them to the unselected list, and click OK.
-
Click Confirm.