Configure Elasticsearch Input Components

Last updated: 2026-06-04 20:54:15

The Elasticsearch input component reads data from Elasticsearch data sources. To synchronize Elasticsearch data to other data sources, configure this input component before configuring the target data source.

Prerequisites

Procedure

  1. On the Dataphin home page, select Development > Data Integration from the top menu bar.

  2. In the top menu bar of the Data Integration page, select a project. In Dev-Prod mode, you must also select an environment.

  3. Click Batch Pipeline in the left-side navigation pane. Then, in the Batch Pipeline list, click the batch pipeline that you want to configure to open its configuration page.

  4. Click Component Library in the upper-right corner to open the Component Library panel.

  5. In the Component Library panel's left-side navigation pane, select Input. Find the Elasticsearch component in the list on the right and drag it to the canvas.

  6. Click the icon on the Elasticsearch input component card to open the Elasticsearch Input Configuration dialog box.

  7. Configure parameters in the Elasticsearch Input Configuration dialog box.

    Parameter

    Description

    Basic Configuration

    Step Name

    The name of the Elasticsearch input component. Dataphin auto-generates a name, which you can modify. Naming rules:

    • Can only contain Chinese characters, letters, underscores (_), and numbers.

    • Cannot exceed 64 characters.

    Datasource

    Lists all Elasticsearch data sources in Dataphin with their project levels and read-through permission status. Click the copy icon to copy the data source name.

    Query Type

    Select whether to read index documents by index name or index alias. Parameters differ by query type.

    • Index.

      • Index Document: The index name in Elasticsearch. Click the copy icon to copy the name of the currently selected index document.

      • Index Document Type: The type name of the index in Elasticsearch.

        Note

        Index Document and Index Document Type are required for Elasticsearch 6.x and 7.x, and optional for Elasticsearch 8.x.

    • Index Alias.

      • Index Alias: The alias of the index in Elasticsearch.

      • Index Document Type: The type name of the index in Elasticsearch.

    Query Conditions

    The Elasticsearch query that selects the documents to read, for either full or incremental synchronization. For example, { "match_all": {}} runs a full query.
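As a rough illustration of the difference between a full and an incremental query, the sketch below builds both as Python dictionaries and serializes them to JSON that could be pasted into Query Conditions. The field name update_time and the date bounds are hypothetical. match_all and range are standard Elasticsearch query clauses, but this sketch does not confirm which clauses Dataphin accepts in this field.

```python
import json


def full_query():
    """Return a query that matches every document in the index."""
    return {"match_all": {}}


def incremental_query(field, start, end):
    """Return a range query selecting documents with start <= field < end."""
    return {"range": {field: {"gte": start, "lt": end}}}


# "update_time" is a hypothetical date field used only for illustration.
full_json = json.dumps(full_query())
incr_json = json.dumps(incremental_query("update_time", "2026-06-03", "2026-06-04"))

print(full_json)
print(incr_json)
```

Using a half-open interval (gte with lt) is one way to keep consecutive incremental runs from reading the same boundary document twice.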

    Cursor Time

    The scroll context duration for Elasticsearch pagination.

    • If too small, idle time between page fetches may exceed the scroll duration, causing cursor expiry and data loss.

    • If too large, concurrent queries may exceed the server-side max_open_scroll_context limit, causing query errors.

    Example: 5m sets a 5-minute cursor.

    Supported units: days (d), hours (h), minutes (m), seconds (s), milliseconds (ms), microseconds (micros), nanoseconds (nanos).
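To make the unit suffixes concrete, here is a minimal sketch that converts a Cursor Time value such as 5m into seconds. The function name cursor_time_to_seconds is hypothetical and is not part of Dataphin or Elasticsearch. It assumes a whole number followed directly by one of the listed unit suffixes.

```python
import re

# Seconds per unit, keyed by the unit suffixes listed for Cursor Time.
UNIT_SECONDS = {
    "d": 86400.0,
    "h": 3600.0,
    "m": 60.0,
    "s": 1.0,
    "ms": 1e-3,
    "micros": 1e-6,
    "nanos": 1e-9,
}

# A whole number followed immediately by exactly one unit suffix.
_PATTERN = re.compile(r"(\d+)(d|h|ms|micros|nanos|m|s)")


def cursor_time_to_seconds(value):
    """Convert a duration string such as '5m' into a number of seconds."""
    match = _PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized cursor time: {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


print(cursor_time_to_seconds("5m"))
print(cursor_time_to_seconds("1d"))
```

A check like this can help compare a candidate value against the longest expected pause between page fetches before settling on a setting.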

    Advanced Configuration

    Batch Read Count

    Number of records read per batch. Default: 1024. Increasing this value reduces interactions with the data source and improves I/O efficiency.
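The effect of Batch Read Count on the number of interactions can be estimated with simple arithmetic. The sketch below assumes one request per batch and ignores any extra final request the reader may issue to detect the end of the data. The function name and the document count are hypothetical.

```python
import math


def fetch_round_trips(total_docs, batch_size=1024):
    """Estimate how many batch requests are needed to read total_docs records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(total_docs / batch_size)


# Hypothetical index of one million documents.
default_trips = fetch_round_trips(1_000_000)       # default batch of 1024
larger_trips = fetch_round_trips(1_000_000, 4096)  # four times larger batch

print(default_trips, larger_trips)
```

Under these assumptions, quadrupling the batch size cuts the request count to roughly a quarter, at the cost of larger responses held in memory per request.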

    Connection Timeout

    Client connection timeout. Default: 6000 seconds.

    Management Timeout

    Client read timeout. Default: 6000 seconds.

    Date Format

    Required when a synchronized date-type field lacks a format in its mapping. Configure the dateFormat parameter. Default ES format: yyyy-MM-dd'T'HH:mm:ssZ.
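To show what a value in the default format looks like, the sketch below parses and re-formats a timestamp in Python. The strptime pattern %Y-%m-%dT%H:%M:%S%z is an assumed approximate equivalent of the Java-style pattern yyyy-MM-dd'T'HH:mm:ssZ, where Z is taken to mean a numeric offset without a colon, such as +0800. The sample timestamp is illustrative only.

```python
from datetime import datetime, timedelta

# Assumed Python counterpart of the pattern yyyy-MM-dd'T'HH:mm:ssZ.
PY_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_es_default(text):
    """Parse a timestamp such as 2026-06-04T20:54:15+0800 into an aware datetime."""
    return datetime.strptime(text, PY_FORMAT)


def format_es_default(moment):
    """Format a timezone-aware datetime back into the same textual layout."""
    return moment.strftime(PY_FORMAT)


sample = "2026-06-04T20:54:15+0800"
parsed = parse_es_default(sample)

print(parsed.utcoffset() == timedelta(hours=8))
print(format_es_default(parsed))
```

If the values stored in a date field do not round-trip through the pattern like this, the field most likely needs its own Date Format setting.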

    Output Fields

    Configure the output fields.

    • Retrieve Field Information.

      When the query type is Index, click Retrieve Field Information to auto-populate fields from the selected index.

    • Batch Add Fields.

      1. Click Batch Add.

        • JSON format example:

          [{"name":"col_integer","type":"integer"},
           {"name":"col_long","type":"long"},
           {"name":"col_double","type":"double"}]

          Note

          name specifies the field name and type specifies its type. Example: "name":"user_id","type":"String" adds a field named user_id with type String.

        • TEXT format example:

          col_long,long
          col_double,double

          • Row delimiter separates field entries. Default: line feed (\n). Also supports semicolon (;) and period (.).

          • Column delimiter separates the field name and type. Default: comma (,).

      2. Click Confirm.

    • Create New Output Field.

      Click Create New Output Field and specify the Column name and Type.

    • Manage Output Fields.

      Manage added fields:

      • Drag the icon next to Column to reorder fields.

      • Click the edit icon in the Operation column to edit a field.

      • Click the delete icon in the Operation column to delete a field.
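The two Batch Add formats described under Output Fields carry the same information, which the following sketch makes explicit by converting a TEXT definition into the equivalent JSON list. The function text_to_fields is a hypothetical helper, not a Dataphin API. It assumes each row holds exactly one field name and one type, separated by the column delimiter.

```python
import json


def text_to_fields(text, row_delimiter="\n", column_delimiter=","):
    """Convert 'name,type' rows into a list of {"name": ..., "type": ...} dicts."""
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # skip blank rows, such as a trailing line feed
        name, _, type_name = row.partition(column_delimiter)
        name, type_name = name.strip(), type_name.strip()
        if not name or not type_name:
            raise ValueError(f"expected 'name{column_delimiter}type', got {row!r}")
        fields.append({"name": name, "type": type_name})
    return fields


fields = text_to_fields("col_long,long\ncol_double,double")
print(json.dumps(fields))

# The same definition using a semicolon as the row delimiter.
same_fields = text_to_fields("col_long,long;col_double,double", row_delimiter=";")
```

Generating the field list from an existing schema in this way can be less error-prone than typing many fields by hand.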

  8. Click Confirm to save the configuration.
