Configure Elasticsearch Input Components
The Elasticsearch input component reads data from Elasticsearch data sources. To synchronize Elasticsearch data to other data sources, configure this input component before configuring the target data source.
Prerequisites
-
An Elasticsearch data source is created. For more information, see Create Elasticsearch Data Source.
-
The account configuring this component has read-through permission for the data source. To request access, see Request Data Source Permission.
Procedure
-
On the Dataphin home page, select Development > Data Integration from the top menu bar.
-
In the top menu bar of the integration page, select a project. In Dev-Prod mode, also select an environment.
-
Click Batch Pipeline in the left-side navigation pane. Then, in the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.
-
Click Component Library in the upper-right corner to open the Component Library panel.
-
In the Component Library panel's left-side navigation pane, select Input. Find the Elasticsearch component in the list on the right and drag it to the canvas.
-
Click the icon on the Elasticsearch input component card to open the Elasticsearch Input Configuration dialog box.
-
Configure parameters in the Elasticsearch Input Configuration dialog box.
Parameter
Description
Basic Configuration
Step Name
The name of the Elasticsearch input component. Dataphin auto-generates a name, which you can modify. Naming rules:
-
Can only contain Chinese characters, letters, underscores (_), and numbers.
-
Cannot exceed 64 characters.
Datasource
Lists all Elasticsearch data sources in Dataphin with their project levels and read-through permission status. Click the copy icon to copy the data source name.
-
For data sources without read-through permission, click Request next to the data source to apply. For more information, see Request Data Source Permission.
-
If no Elasticsearch data source exists, click Create to add one. For more information, see Create Elasticsearch Data Source.
Query Type
Select whether to read index documents by index name or index alias. Parameters differ by query type.
-
Index.
-
Index Document: The index name in Elasticsearch. Click the copy icon to copy the name of the currently selected index document.
-
Index Document Type: The type name of the index in Elasticsearch.
Note: Index Document and Index Document Type are required in Elasticsearch 6.x and 7.x, and optional in Elasticsearch 8.x.
-
Index Alias.
-
Index Alias: The alias of the index in Elasticsearch.
-
Index Document Type: The type name of the index in Elasticsearch.
Query Conditions
The Elasticsearch query clause used for full or incremental queries. Example: {"match_all": {}} runs a full query.
Cursor Time
The scroll context duration for Elasticsearch pagination.
-
If too small, idle time between page fetches may exceed the scroll duration, causing cursor expiry and data loss.
-
If too large, concurrent queries may exceed the server-side max_open_scroll_context limit, causing query errors.
Example: 5m sets a 5-minute cursor.
Unit suffixes: d (days), h (hours), m (minutes), s (seconds), ms (milliseconds), micros (microseconds), nanos (nanoseconds).
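The Query Conditions and Cursor Time values described above can be sanity-checked before they are pasted into the dialog box. The following is a minimal sketch, not part of Dataphin: the helper names `parse_cursor_time` and `incremental_query` and the field name `updated_at` are hypothetical, and the range clause is one common way to express an incremental query in the Elasticsearch query DSL.

```python
import json
import re
from typing import Tuple

# Unit suffixes listed for Cursor Time, for example "5m" for five minutes.
_CURSOR_TIME = re.compile(r"(\d+)(d|h|m|s|ms|micros|nanos)")


def parse_cursor_time(value: str) -> Tuple[int, str]:
    """Split a Cursor Time string such as "5m" into (amount, unit)."""
    match = _CURSOR_TIME.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid cursor time: {value!r}")
    return int(match.group(1)), match.group(2)


def incremental_query(field: str, start: str, end: str) -> str:
    """Build a range clause that selects start <= field < end."""
    clause = {"range": {field: {"gte": start, "lt": end}}}
    return json.dumps(clause)


# Full query, as in the Query Conditions example.
full_query = json.dumps({"match_all": {}})

# Incremental query; "updated_at" is a hypothetical field name.
delta_query = incremental_query(
    "updated_at", "2024-01-01T00:00:00+0800", "2024-01-02T00:00:00+0800"
)
```

The resulting `full_query` or `delta_query` string is what would go into Query Conditions, and `parse_cursor_time` only confirms that a Cursor Time value uses one of the listed unit suffixes.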
Advanced Configuration
Batch Read Count
Number of records read per batch. Default: 1024. Increasing this value reduces interactions with the data source and improves I/O efficiency.
Connection Timeout
Client connection timeout. Default: 6000 seconds.
Management Timeout
Client read timeout. Default: 6000 seconds.
Date Format
Required when a synchronized date-type field lacks a format in its mapping. Configure the dateFormat parameter. Default ES format: yyyy-MM-dd'T'HH:mm:ssZ.
Output Fields
Configure the output fields.
-
Retrieve Field Information.
When the query type is Index, click Retrieve Field Information to auto-populate fields from the selected index.
-
Batch Add Fields.
-
Click Batch Add.
-
JSON format example:
[{"name":"col_integer","type":"integer"}, {"name":"col_long","type":"long"}, {"name":"col_double","type":"double"}]
Note: name specifies the field name and type specifies its type. Example: "name":"user_id","type":"String" adds a field named user_id with type String.
-
TEXT format example:
col_long,long
col_double,double
-
Row delimiter separates field entries. Default: line feed (\n). Also supports semicolon (;) and period (.).
-
Column delimiter separates the field name and type. Default: comma (,).
-
Click Confirm.
-
Create New Output Field.
Click Create New Output Field and specify the Column name and Type.
-
Manage Output Fields.
Manage added fields:
-
Drag the icon next to Column to reorder fields.
-
Click the edit icon in the Operation column to edit a field.
-
Click the delete icon in the Operation column to delete a field.
-
Click Confirm to save the configuration.
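The two Batch Add formats described under Output Fields can be checked locally before they are pasted into the dialog box. The following is a minimal sketch under stated assumptions: the function names `parse_json_fields` and `parse_text_fields` are hypothetical and not part of Dataphin, and the parsing rules only mirror the documented defaults (line feed as the row delimiter, comma as the column delimiter).

```python
import json
from typing import List, Tuple


def parse_json_fields(text: str) -> List[Tuple[str, str]]:
    """Parse the JSON form: a list of {"name": ..., "type": ...} objects."""
    return [(item["name"], item["type"]) for item in json.loads(text)]


def parse_text_fields(
    text: str, row_delim: str = "\n", col_delim: str = ","
) -> List[Tuple[str, str]]:
    """Parse the TEXT form: one name/type pair per row."""
    fields = []
    for row in text.split(row_delim):
        row = row.strip()
        if not row:
            continue  # ignore blank rows, such as a trailing delimiter
        name, sep, type_ = row.partition(col_delim)
        if not sep or not name.strip() or not type_.strip():
            raise ValueError(f"malformed field entry: {row!r}")
        fields.append((name.strip(), type_.strip()))
    return fields


json_fields = parse_json_fields(
    '[{"name":"col_integer","type":"integer"}, {"name":"col_long","type":"long"}]'
)
text_fields = parse_text_fields("col_long,long\ncol_double,double")
```

Passing `row_delim=";"` covers the semicolon variant, for example `parse_text_fields("col_long,long;col_double,double", row_delim=";")`.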