Configure the Data Lake Formation input component

Last updated: 2026-03-14 04:23:01

The Data Lake Formation input component reads data from a Data Lake Formation data source. To sync data from a Data Lake Formation data source to another data source, first configure the Data Lake Formation input component to read from the source and then configure the destination data source. This topic describes how to configure the Data Lake Formation input component.

Prerequisites

Procedure

  1. On the Dataphin home page, in the top menu bar, click Developer and then click Data Integration.

  2. On the integration page, in the top menu bar, select a Project. If you are in Dev-Prod mode, also select an environment.

  3. In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.

  4. In the upper-right corner of the page, click Component Library to open the Component Library panel.

  5. In the left navigation pane of the Component Library panel, click Input. Find the Data Lake Formation component in the list of input components on the right and drag it to the canvas.

  6. Click the icon on the Data Lake Formation input component card to open the Data Lake Formation Input Configuration dialog box.

  7. In the Data Lake Formation Input Configuration dialog box, configure the parameters.

    Parameter

    Description

    Step Name

    The name of the Data Lake Formation input component. Dataphin automatically generates a step name. You can also change the name as needed. The naming convention is as follows:

    • Can contain only Chinese characters, letters, underscores (_), and digits.

    • Cannot exceed 64 characters in length.

    Datasource

    The drop-down list displays all Data Lake Formation data sources, including those for which you have read-through permissions and those for which you do not. Click the copy icon to copy the name of the current data source.

    Table

    Select a source table. You can enter a keyword to search for the table name, or enter the exact table name and click Exact Search. After you select a table, the system automatically checks the table status. Click the copy icon to copy the name of the selected table.

    Partition

    If the selected source table is a partitioned table, you must specify the partition information. For example, state_date='20190101'. You can also use parameters to incrementally obtain data each day. For example, state_date=${bizdate}.

    Output Fields

    The Output Fields section displays all fields that are found in the selected table and that match the filter criteria. The following operations are supported:

    • Field Management: If you do not need to output certain fields to downstream components, you can delete them:

      • To delete a single field: If you need to delete only a few fields, click the delete icon in the Actions column for each unwanted field.

      • To delete fields in a batch: If you need to delete many fields, click Field Management. In the Field Management dialog box, select multiple fields. Then, click the left-shift icon to move the selected input fields to the unselected input fields list. Click Confirm to delete the fields in a batch.

    • Batch Add: Click Batch Add. You can add fields in a batch in JSON, TEXT, or DDL format.

      Note

      After you add fields in a batch and click Confirm, the action overwrites the configured field information.

      • To add fields in a batch in JSON format, use a structure such as the following:

        [
          {
            "index": 1,
            "name": "id",
            "type": "int(10)",
            "mapType": "Long",
            "comment": "comment1"
          },
          {
            "index": 2,
            "name": "user_name",
            "type": "varchar(255)",
            "mapType": "String",
            "comment": "comment2"
          }
        ]

        Note

        `index` specifies the column number of the object. `name` specifies the field name after import. `type` specifies the field type after import.

        For example, "index":3,"name":"user_id","type":"String" indicates that the fourth column of the file is imported. The field name is `user_id` and the field type is `String`.

      • To add fields in a batch in TEXT format, use a structure such as the following:

        1,id,int(10),Long,comment1
        2,user_name,varchar(255),String,comment2

        • The row delimiter separates the information of each field. The default row delimiter is a line feed (\n). You can also use a semicolon (;) or a period (.).

        • The column delimiter separates the field name from the field type. The default column delimiter is a half-width comma (,). The field type is optional.

      • To add fields in a batch in DDL format, use a statement such as the following:

        CREATE TABLE tablename (
            user_id serial,
            username VARCHAR(50),
            password VARCHAR(50),
            email VARCHAR(255),
            created_on TIMESTAMP
        );

    • Create Output Field: Click +Create Output Field. Follow the on-screen instructions to enter the Column, Type, and Remarks, and select a Mapping Type. After you configure the current row, click the save icon to save it.

  8. Click Confirm to complete the property configuration for the Data Lake Formation input component.
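
The JSON and TEXT formats used by Batch Add in step 7 describe the same field information. The following Python sketch is not part of Dataphin. It shows one way to convert TEXT-format rows into the JSON structure before pasting them into the dialog box. It assumes that each TEXT row lists the values in the order index, name, type, mapping type, comment, as the TEXT example suggests. The function name `parse_text_fields` and its parameters are hypothetical.

```python
import json


def parse_text_fields(text, row_delim="\n", col_delim=","):
    """Convert TEXT-format field rows into a list of JSON-style dicts.

    Assumed column order (inferred from the example, not confirmed):
    index, name, type, mapType, comment.
    """
    fields = []
    for row in text.split(row_delim):
        row = row.strip()
        if not row:
            continue  # skip blank rows, e.g. a trailing line feed
        # Split into at most 5 parts so that a comment may contain the delimiter.
        parts = [part.strip() for part in row.split(col_delim, 4)]
        # Pad omitted trailing columns (such as the type) with empty strings.
        parts += [""] * (5 - len(parts))
        index, name, field_type, map_type, comment = parts
        fields.append({
            "index": int(index),
            "name": name,
            "type": field_type,
            "mapType": map_type,
            "comment": comment,
        })
    return fields


if __name__ == "__main__":
    sample = "1,id,int(10),Long,comment1\n2,user_name,varchar(255),String,comment2"
    print(json.dumps(parse_text_fields(sample), indent=2))
```

This sketch splits on the column delimiter naively, so a type that itself contains a comma, such as decimal(10,2), would be split incorrectly. For such types, the JSON format is the safer choice.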
