Configure the Data Lake Formation input component

The Data Lake Formation input component reads data from a Data Lake Formation data source. To sync data from a Data Lake Formation data source to another data source, first configure the Data Lake Formation input component to read from the source and then configure the destination data source. This topic describes how to configure the Data Lake Formation input component.

Prerequisites

You have created a Data Lake Formation data source. For more information, see Create a Data Lake Formation data source.
The account used to configure the properties of the Data Lake Formation input component must have read-through permissions for the data source. If the account does not have the required permissions, request them for the data source. For more information, see Apply for, renew, and return data source permissions.

Procedure

On the Dataphin home page, in the top menu bar, click Developer and then click Data Integration.
On the integration page, in the top menu bar, select a Project. If you are in Dev-Prod mode, also select an environment.
In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the left navigation pane of the Component Library panel, click Input. Find the Data Lake Formation component in the list of input components on the right and drag it to the canvas.
Click the icon on the Data Lake Formation input component card to open the Data Lake Formation Input Configuration dialog box.

In the Data Lake Formation Input Configuration dialog box, configure the parameters.

Parameter	Description
Step Name	The name of the Data Lake Formation input component. Dataphin automatically generates a step name, which you can change as needed. The name can contain only Chinese characters, letters, underscores (_), and digits, and cannot exceed 64 characters in length.
Datasource	The data source drop-down list displays all Data Lake Formation data sources. This includes data sources for which you have read-through permissions and those for which you do not. Click the icon to copy the name of the current data source. For a data source for which you do not have read-through permissions, click Request next to the data source to request the permissions. For more information, see Apply for, renew, and return data source permissions. If you do not have a Data Lake Formation data source, click New to create one. For more information, see Create a Data Lake Formation data source.
Table	Select a source table. You can enter a keyword to search for the table name, or enter the exact table name and click Exact Search. After you select a table, the system automatically checks the table status. Click the icon to copy the name of the selected table.
Partition	If the selected source table is a partitioned table, you must specify the partition information. For example, `state_date='20190101'`. You can also use parameters to incrementally obtain data each day. For example, `state_date=${bizdate}`.
Output Fields	The Output Fields section displays all fields in the selected table that match the filter criteria. The following operations are supported:
Field Management: If certain fields do not need to be output to downstream components, delete them.
To delete a single field or a small number of fields, click the icon in the Actions column for each unwanted field.
To delete many fields in a batch, click Field Management. In the Field Management dialog box, select the fields, click the left-shift icon to move the selected input fields to the unselected input fields list, and then click Confirm.
Batch Add: Click Batch Add to add fields in a batch in JSON, TEXT, or DDL format.
Note: After you add fields in a batch and click Confirm, the new fields overwrite the existing field configuration.
JSON format example: `[{ "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]`
Note: `index` specifies the column number of the object, `name` specifies the field name after import, and `type` specifies the field type after import. For example, `"index":3,"name":"user_id","type":"String"` indicates that the fourth column of the file is imported with the field name `user_id` and the field type `String`.
TEXT format example:
`1,id,int(10),Long,comment1`
`2,user_name,varchar(255),Long,comment2`
The row delimiter separates the information of each field. The default row delimiter is a line feed (\n). A semicolon (;) or a period (.) is also supported.
The column delimiter separates the field name from the field type. The default column delimiter is a half-width comma (,). The field type is optional.
DDL format example: `CREATE TABLE tablename ( user_id serial, username VARCHAR(50), password VARCHAR(50), email VARCHAR(255), created_on TIMESTAMP );`
Create Output Field: Click +Create Output Field. Follow the on-screen instructions to enter the Column, Type, and Remarks, and select a Mapping Type. After you configure the row, click the icon to save it.
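The JSON and TEXT batch-add examples describe the same two fields, so it can help to see how one maps onto the other. The following is a minimal Python sketch, not part of Dataphin: the function name `text_to_json_fields` is hypothetical, and the column order (index, name, type, mapping type, comment) is an assumption inferred from the two examples in this topic.

```python
import json


def text_to_json_fields(text, row_sep="\n", col_sep=","):
    """Convert TEXT-format field rows into a JSON-style field list.

    Assumed column order (inferred from the examples in this topic):
    index, name, type, mapType, comment.
    Limitation: a type that itself contains the column delimiter,
    such as decimal(10,2), is not handled by this simple split.
    """
    fields = []
    for row in text.split(row_sep):
        row = row.strip()
        if not row:
            continue  # skip empty rows, e.g. a trailing line feed
        # Split into at most five parts so delimiters in the comment survive.
        parts = [p.strip() for p in row.split(col_sep, 4)]
        parts += [""] * (5 - len(parts))  # pad optional trailing columns
        index, name, field_type, map_type, comment = parts
        fields.append({
            "index": int(index),
            "name": name,
            "type": field_type,
            "mapType": map_type,
            "comment": comment,
        })
    return fields


# The TEXT example from this topic, with the default line-feed row delimiter.
text_spec = "1,id,int(10),Long,comment1\n2,user_name,varchar(255),Long,comment2"
fields = text_to_json_fields(text_spec)
print(json.dumps(fields, indent=2))
```

Passing `row_sep=";"` would handle a specification that uses the semicolon row delimiter instead of a line feed.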

Click Confirm to complete the property configuration for the Data Lake Formation input component.
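Two of the parameters above follow rules that are easy to check before you open the dialog box: the Step Name naming convention and the `${bizdate}` placeholder in the Partition expression. The following Python sketch illustrates both under stated assumptions. The helpers `is_valid_step_name` and `render_partition` are hypothetical and not Dataphin APIs, "Chinese characters" is approximated by the CJK Unified Ideographs range, and the business date is assumed to be formatted as yyyyMMdd to match the `'20190101'` example. How Dataphin derives the business date at run time is not covered here.

```python
import re
from datetime import date

# Assumption: "Chinese characters" is approximated by CJK Unified Ideographs.
STEP_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name):
    """Check the Step Name rule: allowed characters only, at most 64 characters."""
    return STEP_NAME_RE.fullmatch(name) is not None


def render_partition(expr, biz_date):
    """Substitute ${bizdate} with a yyyyMMdd date string (format assumed)."""
    return expr.replace("${bizdate}", biz_date.strftime("%Y%m%d"))


print(is_valid_step_name("dlf_input_1"))   # True
print(is_valid_step_name("dlf-input"))     # False: hyphen is not allowed
print(render_partition("state_date=${bizdate}", date(2019, 1, 1)))
# state_date=20190101
```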
