Configure the HBase input component

The HBase input component reads data from an HBase data source. To synchronize data from HBase to other data sources, configure the HBase input component first, and then configure the target data source.

Prerequisites

If you want to configure active/standby links for data sources, you have purchased and enabled the high availability (HA) feature of the DataService Studio or Tag Service module.
You have created an HBase data source. For more information, see .
The account used to configure the HBase input component must have read-through permission on the data source. If it does not, request the permission. For more information, see Request, renew, and return permissions on a data source.

Procedure

In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.
In the top navigation bar of the integration page, select a project. In Dev-Prod mode, also select an environment.
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the left-side navigation pane of the Component Library panel, select Inputs. Find the HBase component in the input component list on the right and drag it to the canvas.
Click the icon in the HBase input component card to open the HBase Input Configuration dialog box.

In the HBase Input Configuration dialog box, configure the parameters.

Parameter	Description
Step Name	The name of the HBase input component. Dataphin automatically generates a step name, which you can modify. The name must meet the following requirements: It can contain only Chinese characters, letters, underscores (_), and digits. It cannot exceed 64 characters in length.
Datasource	Select an HBase data source. The list includes all HBase data sources in the current Dataphin instance, including those on which you do not have read-through permission. Click the icon to copy the data source name. For a data source on which you do not have read-through permission, click Request next to it to request the permission. For more information, see Request permission on a data source. If no HBase data source exists, click Create to create one. For more information, see .
Select Link	If the high availability feature of Tag Service is enabled and the selected HBase data source has active/standby links, select either the Active Link or the Standby Link for integration. This setting applies only to the production data source.
Table	You can enter a keyword to search for tables or enter the exact table name and click Exact Match. Click the icon to copy the name of the selected table.
Output Mode	Select an output mode: Normal Mode or Multi-version Mode (Vertical Table).
maxversion	Required if you select Multi-version Mode (Vertical Table) as the output mode. maxversion specifies the number of versions to read. A value of -1 reads all versions.
File Encoding	Select a file encoding format: UTF-8 or GBK.
Start Rowkey	The starting rowkey for scanning. All rows with rowkeys lexicographically greater than or equal to this value are included in the scan results. For example, `aaa` (string) or `10110` (binary).
End Rowkey	The end rowkey for scanning. Only rows whose rowkeys are lexicographically less than this value are scanned; the end rowkey itself is excluded (left-closed, right-open interval). Because rowkeys are compared byte by byte rather than numerically, the end rowkey must sort after the last row you want. For example, to scan all user records from `user0001` to `user9999`, set the start rowkey to `user0001` and the end rowkey to `user:`. The colon (0x3A) sorts immediately after the digit 9 (0x39), so every rowkey from `user0001` through `user9999` falls within the range. An end rowkey of `user10000` does not work here: it sorts before `user1001`, so the scan would stop after `user1000`.
Start Rowkey Type	Select the type of the start rowkey: String or Binary.
Output Fields	The output fields of the component. You can add fields in batches, create output fields one at a time, and manage the fields that you have added, as described below.

Batch add fields: Click Batch Add, configure the fields in JSON or TEXT format, and then click OK.
JSON format. For example: `[{ "name": "cf1:q1", "type": "string" }, { "name": "cf1:q2", "type": "string" }, { "name": "cf1:q3", "type": "string" }]`. `name` specifies the column family and column name, and `type` specifies the field type. For example, `{"name": "cf1:a", "type": "string"}` imports column `a` in column family `cf1` as type `string`.
TEXT format. For example, enter one field per line: `cf1:q1,string`, `cf1:q2,string`, and `cf1:q3,string`. The row delimiter separates fields; the default is a line feed (\n), and the supported delimiters are line feed (\n), semicolon (;), and period (.). The column delimiter separates the field name from the field type; the default is a comma (,).
Create a new output field: Click Create Output Field, enter the Column Family and Column, and select the Type as prompted.
Manage output fields: To change the position of a field, drag the icon next to it in the Column column. To edit or delete a field, click the corresponding icon in the Operation column.
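
The JSON and TEXT formats accepted by Batch Add describe the same list of fields. The Python sketch below parses both into (column family, column, type) tuples to show how they correspond. It is illustrative only: `parse_json_fields` and `parse_text_fields` are hypothetical helpers, not part of Dataphin, and the parsing details (splitting the name on the first colon, skipping blank rows) are assumptions.

```python
import json


def parse_json_fields(text):
    """Parse the JSON batch-add format into (family, column, type) tuples."""
    fields = []
    for item in json.loads(text):
        # Assumption: "name" is "<column family>:<column>"; split on the first colon only.
        family, column = item["name"].split(":", 1)
        fields.append((family, column, item["type"]))
    return fields


def parse_text_fields(text, row_delimiter="\n", column_delimiter=","):
    """Parse the TEXT batch-add format: one field per row, name and type split by the column delimiter."""
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # skip blank rows, such as a trailing line feed
        name, field_type = row.split(column_delimiter, 1)
        family, column = name.strip().split(":", 1)
        fields.append((family, column, field_type.strip()))
    return fields


json_input = '[{"name": "cf1:q1", "type": "string"}, {"name": "cf1:q2", "type": "string"}, {"name": "cf1:q3", "type": "string"}]'
text_input = "cf1:q1,string\ncf1:q2,string\ncf1:q3,string"

# Both formats describe the same three fields.
print(parse_json_fields(json_input) == parse_text_fields(text_input))  # True
print(parse_text_fields(text_input))
# [('cf1', 'q1', 'string'), ('cf1', 'q2', 'string'), ('cf1', 'q3', 'string')]
```

With a semicolon as the row delimiter, the same fields could be written on one line, for example `parse_text_fields("cf1:q1,string;cf1:q2,string", row_delimiter=";")`.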

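To check a Start Rowkey and End Rowkey choice before running the pipeline, you can reproduce the scan range locally. The Python sketch below assumes that HBase orders rowkeys as unsigned byte strings, which is also how Python compares `bytes` values, and models the scan as the half-open interval [Start Rowkey, End Rowkey). `scan` is a hypothetical helper, not a Dataphin or HBase API. It compares `user10000` and `user:` as end rowkeys for rows `user0001` to `user9999`.

```python
def scan(rowkeys, start=b"", end=b""):
    """Return rowkeys in [start, end), compared as raw bytes.

    In this sketch, an empty start or end means the range is unbounded on that side.
    """
    return [k for k in sorted(rowkeys) if k >= start and (not end or k < end)]


# Hypothetical table with rowkeys user0001 .. user9999.
rowkeys = [f"user{i:04d}".encode() for i in range(1, 10000)]

too_short = scan(rowkeys, b"user0001", b"user10000")  # b"user10000" sorts before b"user1001"
complete = scan(rowkeys, b"user0001", b"user:")       # b":" (0x3A) sorts right after b"9" (0x39)

print(len(too_short), too_short[-1])  # 1000 b'user1000'
print(len(complete), complete[-1])    # 9999 b'user9999'
```

The same check applies to any rowkeys that embed numbers as text: unless the numbers are zero-padded to a fixed width, their byte order differs from their numeric order.
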
Click OK to complete the property configuration of the HBase input component.
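
In Multi-version Mode (Vertical Table), each stored version of a cell becomes its own output row, and maxversion caps how many versions are read per cell. The Python sketch below assumes a vertical layout of (rowkey, column, timestamp, value), similar to the multi-version mode of the open-source DataX HBase reader; the actual Dataphin output columns may differ, and `to_vertical` is a hypothetical helper. Versions are emitted newest first, which is the order in which HBase returns them.

```python
def to_vertical(cells, maxversion):
    """Flatten multi-version cells into (rowkey, column, timestamp, value) rows.

    cells maps (rowkey, "family:column") to a list of (timestamp, value) versions.
    maxversion == -1 keeps every version; otherwise only the newest maxversion versions are kept.
    """
    rows = []
    for (rowkey, column), versions in sorted(cells.items()):
        newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
        kept = newest_first if maxversion == -1 else newest_first[:maxversion]
        for timestamp, value in kept:
            rows.append((rowkey, column, timestamp, value))
    return rows


# Hypothetical data: cf1:price has three versions, cf1:name has one.
cells = {
    ("row1", "cf1:price"): [(1000, "10"), (3000, "12"), (2000, "11")],
    ("row1", "cf1:name"): [(1000, "apple")],
}

for row in to_vertical(cells, maxversion=2):
    print(row)
# ('row1', 'cf1:name', 1000, 'apple')
# ('row1', 'cf1:price', 3000, '12')
# ('row1', 'cf1:price', 2000, '11')
```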

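The Step Name rules in the parameter table can be expressed as a single regular expression. In this Python sketch, "Chinese characters" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF) and an empty name is treated as invalid; both are assumptions, and `is_valid_step_name` is a hypothetical helper rather than the check Dataphin actually runs.

```python
import re

# Assumption: Chinese characters are approximated by U+4E00-U+9FFF.
# Allowed: Chinese characters, ASCII letters, digits, and underscores; 1 to 64 characters.
STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if the whole name matches the allowed characters and length."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_step_name("HBase_input_01"))     # True
print(is_valid_step_name("HBase-input"))        # False: hyphens are not allowed
print(is_valid_step_name("输入_" + "a" * 62))   # False: 65 characters
```

The pattern spells out `A-Za-z0-9_` instead of using `\w`, because `\w` in Python also matches letters from other scripts.
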