Configure the Elasticsearch input component

The Elasticsearch input component reads data from Elasticsearch data sources. To synchronize Elasticsearch data to other data sources, configure this input component before configuring the target data source.

Prerequisites

An Elasticsearch data source has been created. For more information, see Create Elasticsearch Data Source.
The account configuring this component has read-through permission for the data source. To request access, see Request Data Source Permission.

Procedure

On the Dataphin home page, select Development > Data Integration from the top menu bar.
In the top menu bar of the Data Integration page, select a project. In Dev-Prod mode, you must also select an environment.
Click Batch Pipeline in the left-side navigation pane. Then, in the Batch Pipeline list, click the offline pipeline that you want to configure to open its configuration page.
Click Component Library in the upper-right corner to open the Component Library panel.
In the Component Library panel's left-side navigation pane, select Input. Find the Elasticsearch component in the list on the right and drag it to the canvas.
Click the icon on the Elasticsearch input component card to open the Elasticsearch Input Configuration dialog box.

Configure parameters in the Elasticsearch Input Configuration dialog box.

Parameter		Description
Basic Configuration	Step Name	The name of the Elasticsearch input component. Dataphin auto-generates a name, which you can modify. Naming rules: Can only contain Chinese characters, letters, underscores (_), and numbers. Cannot exceed 64 characters.
	Datasource	Lists all Elasticsearch data sources in Dataphin with their project levels and read-through permission status. Click the icon to copy the data source name. For a data source without read-through permission, click Request next to the data source to apply. For more information, see Request Data Source Permission. If no Elasticsearch data source exists, click Create to add one. For more information, see Create Elasticsearch Data Source.
	Query Type	Select whether to read index documents by index name (Index) or by index alias (Index Alias). The parameters differ by query type. Index: Index Document is the index name in Elasticsearch (click the icon to copy the name of the currently selected index document), and Index Document Type is the type name of the index in Elasticsearch. Note: Index Document and Index Document Type are required in Elasticsearch 6.x and 7.x, and optional in Elasticsearch 8.x. Index Alias: Index Alias is the alias of the index in Elasticsearch, and Index Document Type is the type name of the index in Elasticsearch.
	Query Conditions	Elasticsearch query parameter for full or incremental queries. Example: `{ "match_all": {}}` runs a full query.
	Cursor Time	The scroll context duration for Elasticsearch pagination, that is, how long each scroll cursor is kept alive. If the value is too small, the idle time between page fetches may exceed the scroll duration, causing the cursor to expire and data to be lost. If the value is too large, concurrent queries may exceed the server-side `max_open_scroll_context` limit, causing query errors. Example: 5m sets a 5-minute cursor. Units: days (d), hours (h), minutes (m), seconds (s), milliseconds (ms), microseconds (micros), nanoseconds (nanos).
Advanced Configuration	Batch Read Count	Number of records read per batch. Default: 1024. Increasing this value reduces interactions with the data source and improves I/O efficiency.
	Connection Timeout	Client connection timeout. Default: 6000 seconds.
	Management Timeout	Client read timeout. Default: 6000 seconds.
	Date Format	Required when a synchronized date-type field lacks a `format` in its `mapping`. Configure the `dateFormat` parameter. Default ES format: `yyyy-MM-dd'T'HH:mm:ssZ`.
Output Fields		Configure the output fields in one of the following ways. Retrieve Field Information: When the query type is Index, click Retrieve Field Information to auto-populate fields from the selected index. Batch Add Fields: Click Batch Add, enter the fields in JSON or TEXT format, and then click Confirm. JSON format example: `[{"name":"col_integer","type":"integer"}, {"name":"col_long","type":"long"}, {"name":"col_double","type":"double"}]`. Note: `name` specifies the field name and `type` specifies its type. For example, `"name":"user_id","type":"String"` adds a field named user_id with type String. TEXT format example, with one field per row: `col_long,long` and `col_double,double`. The row delimiter separates field entries. Default: line feed (\n). Semicolon (;) and period (.) are also supported. The column delimiter separates the field name and type. Default: comma (,). Create New Output Field: Click Create New Output Field and specify the Column name and Type. Manage Output Fields: Drag the icon next to Column to reorder fields. Click the icon in the Operation column to edit a field. Click the icon in the Operation column to delete a field.
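
The two Batch Add formats described in the table can be checked offline before you paste them into the dialog box. The following Python sketch is only an illustration of how the JSON and TEXT formats map to field name and type pairs. The function names `parse_json_fields` and `parse_text_fields` are hypothetical and are not part of Dataphin.

```python
import json


def parse_json_fields(text):
    """Parse the JSON batch-add format into (name, type) pairs."""
    return [(item["name"], item["type"]) for item in json.loads(text)]


def parse_text_fields(text, row_delimiter="\n", column_delimiter=","):
    """Parse the TEXT batch-add format into (name, type) pairs.

    The defaults mirror the documented defaults: line feed between
    rows and a comma between the field name and the field type.
    """
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # skip blank rows, such as one left by a trailing delimiter
        name, field_type = row.split(column_delimiter, 1)
        fields.append((name.strip(), field_type.strip()))
    return fields


json_spec = (
    '[{"name":"col_integer","type":"integer"}, '
    '{"name":"col_long","type":"long"}, '
    '{"name":"col_double","type":"double"}]'
)
text_spec = "col_long,long\ncol_double,double"

print(parse_json_fields(json_spec))
print(parse_text_fields(text_spec))
# The same TEXT input with a semicolon as the row delimiter.
print(parse_text_fields("col_long,long;col_double,double", row_delimiter=";"))
```

A malformed entry, such as a row without a column delimiter, raises an exception in this sketch, which can help catch typos before the fields are added in the dialog box.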

Click Confirm to save the configuration.
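
To see how Query Conditions, Cursor Time, and Batch Read Count relate to each other, it can help to look at the shape of an Elasticsearch scroll search request. The following Python sketch builds such a request without sending it. It assumes that the three parameters correspond to the `query` clause, the `scroll` keep-alive, and the `size` of a scroll search. How Dataphin issues the request internally is not described here, and the helper `build_scroll_request`, the index name `my_index`, and the field `gmt_modified` are hypothetical.

```python
import json
import re

# Duration units that Elasticsearch accepts for time values.
_TIME_VALUE = re.compile(r"\d+(d|h|m|s|ms|micros|nanos)")


def build_scroll_request(index, query, cursor_time="5m", batch_size=1024):
    """Return the (path, body) of a scroll search request without sending it."""
    if not _TIME_VALUE.fullmatch(cursor_time):
        raise ValueError(f"invalid cursor time: {cursor_time!r}")
    path = f"/{index}/_search?scroll={cursor_time}"
    body = {"query": query, "size": batch_size}
    return path, body


# Full query: read every document in the index.
full_path, full_body = build_scroll_request("my_index", {"match_all": {}})
print("POST", full_path)
print(json.dumps(full_body))

# Incremental query: read only documents changed since a given time.
# The field name gmt_modified is an assumption made for illustration.
incremental_query = {"range": {"gmt_modified": {"gte": "2024-01-01T00:00:00+0800"}}}
inc_path, inc_body = build_scroll_request(
    "my_index", incremental_query, cursor_time="10m", batch_size=2048
)
print("POST", inc_path)
print(json.dumps(inc_body))
```

In this sketch, a larger `batch_size` means fewer round trips per scroll, while `cursor_time` only needs to cover the gap between two consecutive page fetches, not the duration of the whole synchronization.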
